Theodore Lee

Q: Need help with text-to-speech engine

I've asked this question before, but the discussion seemed to drift a little off topic.  Let me try to get to the answer a little differently. 

 

How do I use the OS X text-to-speech function to get the following sentence to be pronounced correctly:

 

"I am content with the content in the book I read yesterday and **I** expect **you** to read it tomorrow."   (**'s indicate emphasis wanted.)


I ought to be able to use embedded speech commands to force the right pronunciation of the two "contents", the two "reads", and the emphasized "I" and "you", but I can't make them work.  They may work with some voices, but I haven't found which ones.  I have noticed that in some circumstances it does know which pronunciation of "read" to use, but not in others; likewise with "content."  If no one here knows the answer to my question, where should I ask it so that someone who does know will see it?

MacBook Pro (Retina, 13-inch, Late 2012), OS X Yosemite (10.10.5)

Posted on Sep 15, 2015 8:32 PM


  • by VikingOSX,Helpful

    VikingOSX Sep 16, 2015 11:13 AM in response to Theodore Lee
    Level 7 (20,591 points)
    Mac OS X

    "I am con [[emph -]]tent with the content in the book I read yesterday and I [[emph -; slnc 125; rate 150; pmas 90; pmod 60]] expect you [[emph -; slnc 125; rate 150; pmas 90; pmod 60]] to read it tomorrow."

     

    Read in TextEdit (Mavericks 10.9.5) with voice Susan at the following setting. I tested this for quite a while before I felt it was close enough to post, and for you to continue tweaking.

    [Screenshot: Screen Shot 2015-09-16 at 1.29.21 PM.png]
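The reply above relies on embedded commands such as [[slnc 125]] placed right after the words to be stressed. As a rough sketch only, the Python below turns the `**word**` emphasis markers from the original question into a trailing [[slnc N]] pause and hands the result to the macOS `say` tool. The function names `add_pauses` and `speak` are made up for this illustration, and whether a given voice honors any embedded command is not guaranteed.

```python
import re
import shutil
import subprocess


def add_pauses(text, pause_ms=125):
    """Replace each **word** marker with 'word [[slnc N]]'.

    The pause only suggests emphasis; it does not change pitch or stress.
    """
    return re.sub(
        r"\*\*(.+?)\*\*",
        lambda m: f"{m.group(1)} [[slnc {pause_ms}]]",
        text,
    )


def speak(text, voice=None):
    """Speak text with the macOS `say` tool; return False where it is missing."""
    if shutil.which("say") is None:
        return False
    argv = ["say"]
    if voice:
        argv += ["-v", voice]  # -v selects the voice by name
    subprocess.run(argv + [text], check=False)
    return True
```

On a Mac, something like `speak(add_pauses("and **I** expect **you** to read it tomorrow."), voice="Susan")` would then read the marked-up sentence aloud, assuming the Susan voice is installed.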

  • by Theodore Lee,

    Theodore Lee Sep 16, 2015 11:13 AM in response to VikingOSX
    Level 1 (16 points)

    Thanks.  I agree that it seems to work better than anything else I've tried, even with the voice I use (Ava).  To be honest, I can't figure out why it works, since your speech commands don't seem to behave the way I'd expect from the documentation -- they seem to apply to the *previous* word, not to what follows, as I had understood them to work.  I'll have to play with it to see better what's going on.   In particular, it seems very strange that [[emph -]] is what you use to create emphasis, not [[emph +]]!

  • by VikingOSX,

    VikingOSX Sep 16, 2015 11:21 AM in response to Theodore Lee
    Level 7 (20,591 points)
    Mac OS X

    Tweaking is a time-consuming activity. I tried [[emph +]] and it was just sailing right through the word until I applied [[emph -]]. Go figure.

  • by Theodore Lee,

    Theodore Lee Sep 16, 2015 11:26 AM in response to VikingOSX
    Level 1 (16 points)

    And you didn't use [[emph +]] at all!   Do you know if anyone has ever written up a truly useful guide to the speech commands, especially indicating what works with which voices?   The developer's "Speech Synthesis Programming Guide" from 2006, which seems to be the most recent version I can find, is not particularly helpful.

  • by Theodore Lee,

    Theodore Lee Sep 16, 2015 12:03 PM in response to VikingOSX
    Level 1 (16 points)

    OK, I did some more experimenting, both with Ava and Susan.  I need to take the "helpful" off!   The only commands that had any effect on the speech were the [[slnc 125]] ones.  I took all the other commands (emph, rate, pmas, pmod) out and that made no difference.  I had it pronounce the two sentences right after each other -- with and without any commands except [[slnc 125]] -- and couldn't hear any difference.  It was breaking "content" into two words -- "con tent" -- that made the first "content" sound a little better, and the two 125 ms delays made the words before them ("I" and "you") seem to have been emphasized, but they weren't really.  It didn't matter whether I used Susan or Ava.  I'd go back to the drawing board, but I think it isn't taking any ink at all.  I am willing to assert that the only useful command that works any more is [[slnc ...]].  The manual has some examples of the phonetic spelling command that work for some voices (like Alex) but not others (neither Susan nor Ava).  The manual also has examples of [[emph +]] that don't work with Alex either.
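The back-to-back comparison described above (the same sentence with and without embedded commands, across more than one voice) can be scripted so that each variant is heard under identical conditions. This is only a sketch: `build_trials` and `run_trials` are hypothetical helper names, the voice names are the ones mentioned in this thread and must be installed, and the script merely plays the variants; judging whether they differ is still done by ear.

```python
import shutil
import subprocess


def build_trials(plain, marked, voices):
    """Return one `say` argv per (voice, variant): plain first, then marked."""
    trials = []
    for voice in voices:
        for text in (plain, marked):
            trials.append(["say", "-v", voice, text])
    return trials


def run_trials(trials):
    """Play each trial in order; return how many were actually spoken."""
    if shutil.which("say") is None:  # not on macOS, nothing to play
        return 0
    spoken = 0
    for argv in trials:
        print("voice:", argv[2], "| text:", argv[3])
        subprocess.run(argv, check=False)
        spoken += 1
    return spoken
```

For example, `run_trials(build_trials("I expect you to read it.", "I [[emph +]] expect you to read it.", ["Ava", "Susan", "Alex"]))` would play six utterances, two per voice.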

  • by red_menace,

    red_menace Sep 16, 2015 2:08 PM in response to Theodore Lee
    Level 6 (15,519 points)
    Desktops

    Those phonetic commands usually only work with the original set of voices from ancient times, such as Vicki, Victoria, Alex, and Bruce.  The newer voices, along with the premium ones, pretty much ignore everything, including the basic speech pitch and rate.

  • by Theodore Lee,

    Theodore Lee Sep 16, 2015 2:39 PM in response to red_menace
    Level 1 (16 points)

    That does seem to be what I'm finding out and what the general consensus is.  What a shame -- older is better than newer.  Sure, on the whole the newer voices do a much better job than the older ones, but that just makes the places they fail all the more obvious!  With all the work that I imagine it takes to produce a "voice" you'd think they could do just a little more so as to add a few of the simple embedded speech commands.  If emphasis, phonetics (with stress), and possibly volume were available, it would make it so much better.  Oh, and being able to switch voices on the fly (not one of the original speech commands, I believe) would be super.   One can only dream...
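Switching voices inside a single utterance is not something this thread found a command for, but a crude approximation is to speak the text as separate segments, one `say` process per segment. The sketch below assumes that approach; `speak_segments` is a hypothetical name, and there will be an audible gap between segments because each one is a separate invocation.

```python
import shutil
import subprocess


def speak_segments(segments, dry_run=False):
    """Speak (voice, text) pairs in order and return the argv lists used.

    With dry_run=True, or where `say` is unavailable, nothing is spoken.
    """
    commands = [["say", "-v", voice, text] for voice, text in segments]
    if not dry_run and shutil.which("say") is not None:
        for argv in commands:
            subprocess.run(argv, check=False)
    return commands
```

A call such as `speak_segments([("Alex", "I am content with the book,"), ("Ava", "and I expect you to read it tomorrow.")])` would read the first clause in one voice and the second in another, provided both voices are installed.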

  • by Hiroto,

    Hiroto Sep 16, 2015 7:44 PM in response to Theodore Lee
    Level 5 (7,276 points)

    Hello

     

    Here's a relevant thread.

     

    "say" terminal command - how to change emphasis

    https://discussions.apple.com/thread/6702327

     

     

    It's too bad that the text-to-speech engine has become a complete black box now...

     

    H

  • by Theodore Lee,

    Theodore Lee Sep 16, 2015 8:05 PM in response to Hiroto
    Level 1 (16 points)

    Thanks for responding, but I'd already seen that thread, which just confirms what we've been talking about here -- except for the ability to introduce silence, the embedded speech commands basically don't work any more, except possibly for the "original" voices.  Maybe I should ask this as a separate question, but how do we go about getting a definitive answer from Apple, rather than speculation amongst us users?

  • by Hiroto,

    Hiroto Sep 16, 2015 8:31 PM in response to Theodore Lee
    Level 5 (7,276 points)

    The text-to-speech engine under discussion belongs to a closed kingdom named Apple. There's neither open documentation nor source code we can explore. That's why I used the expression "black box".

     

    You might ask Apple questions on this issue but I'd be surprised if you get any answer.

     

    H

  • by Theodore Lee,

    Theodore Lee Sep 17, 2015 5:57 AM in response to Hiroto
    Level 1 (16 points)

    Where would I ask the question such that it might get an answer?  As best I can tell, Apple doesn't generally join in on the discussions in "Apple Support Communities."  I would assume that if I were an official Apple developer I'd have an official channel for asking questions, but I'm not; I'm just an ordinary customer.

  • by Hiroto,

    Hiroto Sep 17, 2015 8:14 AM in response to Theodore Lee
    Level 5 (7,276 points)

    I have no connection with Apple. Please ask Apple how to ask Apple questions about its text-to-speech engine. I'd guess you'll need at least a paid developer account to ask for any technically meaningful answer. Note that there's no guarantee you will get one.

     

    I'd not bother myself to pay and ask for the obvious.

     

    Good luck,

    H