Need help with text-to-speech engine

Question

Level 1

32 points

I've asked this question before, but the discussion seemed to drift a little off topic. Let me try to get to the answer a little differently.

How do I use the OS X text-to-speech function to get the following sentence to be pronounced correctly:

"I am content with the content in the book I read yesterday and **I** expect **you** to read it tomorrow." (The ** marks indicate where emphasis is wanted.)

I ought to be able to use embedded speech commands to force the right pronunciation of the two "contents", the two "reads", and the emphasized "I" and "you", but I can't make them work. They might work with some voices, but I haven't found which ones. I have noticed that in some circumstances it does know which pronunciation of "read" to use, but not in others; likewise with "content." If no one here knows the answer to my question, where should I ask it so that someone would?

MacBook Pro (Retina, 13-inch, Late 2012), OS X Yosemite (10.10.5)

Posted on Sep 15, 2015 8:32 PM

Answer 1

Best reply

VikingOSX

Community+ 2024

Level 10

111,215 points

Sep 16, 2015 11:13 AM in response to Theodore Lee

"I am con [[emph -]]tent with the content in the book I read yesterday and I [[emph -; slnc 125; rate 150; pmas 90; pmod 60]] expect you [[emph -; slnc 125; rate 150; pmas 90; pmod 60]] to read it tomorrow."

Read in TextEdit (Mavericks 10.9.5) with voice Susan at the following setting. I tested this for quite a while before I felt it was close enough to post, and for you to continue tweaking.

Answer 2

Theodore Lee Author

Level 1

32 points

Sep 16, 2015 11:13 AM in response to VikingOSX

Thanks. I agree that it seems to work better than anything else I've tried, even with the voice I use (Ava). To be honest, I can't figure out why it works, since your speech commands don't behave the way I'd expect from the documentation -- they seem to apply to the *previous* word, not to what follows, as I had understood them to work. I'll have to play with it to see better what's going on. In particular, it seems very strange that [[emph -]] is what you use to create emphasis, not [[emph +]]!

Answer 3

VikingOSX

Sep 16, 2015 11:21 AM in response to Theodore Lee

Tweaking is a time-consuming activity. I tried [[emph +]] and it just sailed right through the word until I applied [[emph -]]. Go figure.

Answer 4

Theodore Lee Author

Level 1

32 points

Sep 16, 2015 11:26 AM in response to VikingOSX

And you didn't use [[emph +]] at all! Do you know if anyone has ever written up a truly useful guide to the speech commands, especially indicating what works with which voices? The developer's "Speech Synthesis Programming Guide" from 2006, which seems to be the most recent version I can find, is not particularly helpful.

Answer 5

Theodore Lee Author

Level 1

32 points

Sep 16, 2015 12:03 PM in response to VikingOSX

OK, I did some more experimenting, both with Ava and Susan. I need to take the "helpful" off! The only commands that had any effect on the speech were the [[slnc 125]]. I took all the other commands (emph, rate, pmas, pmod) out and that made no difference. I had it pronounce the two sentences right after each other -- with and without any commands except [[slnc 125]] -- and couldn't hear any difference. It was breaking "content" into two words -- "con tent" -- that made the first "content" sound a little better, and the two 125 ms delays made the words before them ("I" and "you") seem to have been emphasized, but they weren't really. It didn't matter whether I used Susan or Ava. I'd go back to the drawing board, but I think it isn't taking any ink at all. I am willing to assert that the only useful command that works any more is [[slnc ...]]. The manual has some examples of the phonetic spelling command that work for some voices (like Alex) but not others (neither Susan nor Ava). The manual also has examples of [[emph +]] that don't work with Alex either.
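One way to repeat this kind of A/B comparison is to strip every embedded command except [[slnc ...]] programmatically and speak both versions back to back. The following is only a minimal Python sketch under stated assumptions: the helper names `keep_only` and `speak` are made up for illustration, the marked-up sentence is copied as posted in Answer 1, and the speaking is assumed to go through the `say` command-line tool with its `-v` voice option. Whether a given voice honors any of the commands is exactly what is in question here.

```python
import re
import shutil
import subprocess

# Matches one embedded-command block such as [[emph -; slnc 125; rate 150]].
COMMAND_BLOCK = re.compile(r"\[\[(.*?)\]\]")

# The sentence exactly as posted in Answer 1 (commands not validated here).
MARKED_UP = (
    "I am con [[emph -]]tent with the content in the book I read yesterday "
    "and I [[emph -; slnc 125; rate 150; pmas 90; pmod 60]] expect you "
    "[[emph -; slnc 125; rate 150; pmas 90; pmod 60]] to read it tomorrow."
)


def keep_only(text, allowed=("slnc",)):
    """Drop every embedded command whose selector is not in `allowed`."""
    def filter_block(match):
        commands = [c.strip() for c in match.group(1).split(";")]
        kept = [c for c in commands if c and c.split()[0] in allowed]
        # Remove the whole [[...]] block when nothing in it survives.
        return "[[" + "; ".join(kept) + "]]" if kept else ""
    return COMMAND_BLOCK.sub(filter_block, text)


def speak(text, voice="Susan"):
    """Speak `text` with the `say` tool; return False if it is not installed."""
    if shutil.which("say") is None:
        return False
    subprocess.run(["say", "-v", voice, text], check=True)
    return True


if __name__ == "__main__":
    silence_only = keep_only(MARKED_UP)
    print(silence_only)
    speak(MARKED_UP)      # full markup
    speak(silence_only)   # same sentence with only the [[slnc]] commands left
```

If the two readings sound the same for a voice, that suggests the voice is ignoring everything except the silence command, which is what the experiment above found for Susan and Ava.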

Answer 6

red_menace

Level 6

17,030 points

Sep 16, 2015 2:08 PM in response to Theodore Lee

Those phonetic commands usually only work with the original set of voices from ancient times, such as Vicki, Victoria, Alex, and Bruce. The newer voices, along with the premium ones, pretty much ignore everything, including the basic speech pitch and rate.
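To check this voice by voice, one option is to play the same marked-up phrase through each installed English voice and listen for which ones react. A rough Python sketch follows. It assumes that `say -v '?'` prints one voice per line as name, locale, then a `#` comment with a sample sentence; that layout, the test phrase, and the helper names `parse_voices` and `audition` are assumptions for illustration and may need adjusting on other OS X versions.

```python
import re
import shutil
import subprocess

# Assumed line layout of `say -v '?'`: "<name>   <locale>   # <sample text>"
VOICE_LINE = re.compile(r"^(.+?)\s+([a-z]{2}[_-][A-Za-z0-9]+)\s+#")


def parse_voices(listing, locale_prefix="en"):
    """Return voice names in `listing` whose locale starts with `locale_prefix`."""
    names = []
    for line in listing.splitlines():
        match = VOICE_LINE.match(line)
        if match and match.group(2).startswith(locale_prefix):
            names.append(match.group(1))
    return names


def audition(phrase, locale_prefix="en"):
    """Speak `phrase` once per matching voice and return the voices tried."""
    if shutil.which("say") is None:  # `say` not available on this machine
        return []
    listing = subprocess.run(
        ["say", "-v", "?"], capture_output=True, text=True, check=True
    ).stdout
    voices = parse_voices(listing, locale_prefix)
    for voice in voices:
        print("Voice:", voice)
        subprocess.run(["say", "-v", voice, phrase], check=True)
    return voices


if __name__ == "__main__":
    # Uses only commands already mentioned in this thread.
    audition("Plain sentence first. [[rate 150]] Now with a rate command. "
             "I [[emph +]] said so.")
```

Listening is still the only real test; the script just saves switching voices by hand.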

Answer 7

Theodore Lee Author

Level 1

32 points

Sep 16, 2015 2:39 PM in response to red_menace

That does seem to be what I'm finding out and what the general consensus is. What a shame -- older is better than newer. Sure, on the whole the newer voices do a much better job than the older ones, but that just makes the places they fail all the more obvious! With all the work that I imagine it takes to produce a "voice," you'd think they could do just a little more and add a few of the simple embedded speech commands. If emphasis, phonetics (with stress), and possibly volume were available, that would make it so much better. Oh, and being able to switch voices on the fly (not one of the original speech commands, I believe) would be super. One can only dream...

Answer 8

Hiroto

Level 5

7,461 points

Sep 16, 2015 7:44 PM in response to Theodore Lee

Hello

Here's a relevant thread.

"say" terminal command - how to change emphasis

https://discussions.apple.com/thread/6702327

It's too bad that the text-to-speech engine has become a complete black box now...

H

Answer 9

Theodore Lee Author

Level 1

32 points

Sep 16, 2015 8:05 PM in response to Hiroto

Thanks for responding, but I'd already seen that thread, which just confirms what we've been talking about here -- except for the ability to introduce silence, the embedded speech commands basically don't work any more, except possibly for the "original" voices. Maybe I should ask this as a separate question, but how do we go about getting a definitive answer from Apple, rather than speculation amongst us users?

Answer 10

Hiroto

Level 5

7,461 points

Sep 16, 2015 8:31 PM in response to Theodore Lee

The text-to-speech engine under discussion belongs to a closed kingdom named Apple. There's neither open documentation nor source code we can explore. That's why I used the expression "black box".

You might ask Apple questions on this issue but I'd be surprised if you get any answer.

H

Answer 11

Theodore Lee Author

Level 1

32 points

Sep 17, 2015 5:57 AM in response to Hiroto

Where would I ask the question such that it might get an answer? As best I can tell, Apple doesn't generally join in on the discussions in "Apple Support Communities." I would assume that if I were an official Apple developer I'd have an official channel for asking questions, but I'm not, just an ordinary customer.

Answer 12

Hiroto

Level 5

7,461 points

Sep 17, 2015 8:14 AM in response to Theodore Lee

I have no connection with Apple. Please ask Apple how to ask Apple questions about its text-to-speech engine. I'd guess you'll need at least a paid developer account to ask for any technically meaningful answer. Note that there's no guarantee you will get one.

I wouldn't bother paying to ask about the obvious.

Good luck,

H
