
Improving voice dictation results?

Recently bought a MacBook Pro (Retina) and working (or trying to work) with the voice dictation. However the quality of the results actually seems to be getting worse over time, whereas it's supposed to get better. I suspect that I am giving it confusing feedback, but can anyone explain how it is working or how it is supposed to work?


My original theory was that sometimes I am editing silently but with the dictation active, and it was interpreting that as incorrect feedback about recognition errors that weren't there. To prevent this, I tried making sure that the voice input was disengaged before making any editorial changes, but that did not seem to help.


The feedback coming from the Mac's side is also somewhat confusing, but it definitely seems as though the Mac is refining its interpretation for several seconds. After I have said a few more words, it often goes back and changes part of the text that was already displayed, so I think it is building better recognition results based on the additional context. Sometimes it winds up with ambiguous text in blue and the option to select the proper text, but it definitely seems that it offers that option less frequently these days, even though the quality of the transcription in that passage may be poor. (There are times when it seems to be working better, but I also can't find any pattern or speaking style for consistently better results. The average transcription quality seems to be near the breakeven point... Sometimes it is definitely faster than typing, but other times the corrections take longer than just typing it in the first place.)


Related topic, but sometimes I want to go back and change part of what I dictated by inserting new words in the middle, still using the dictation. However, right now I think that is a bad idea, and I am trying to avoid doing it. The Mac ought to be smart enough to figure out what is going on, and even to recognize the nearby content as relevant, but I guess not.


As a constructive suggestion for recognizing the words properly, it would be quite helpful if I could tell the Mac to play the sound it thinks it heard around some word so that I could correct the displayed words in accord with what I actually said. Perhaps with an explicit switch to a no-dictation editing mode for final polishing?

MacBook Pro with Retina display

Posted on May 27, 2015 7:26 PM

22 replies

Oct 24, 2017 3:48 AM in response to shanen0

Considering this thread is from mid-2015 but seems to be the latest word on this question, there seems to be little interest in the topic. There seem to be a number of questions that have been asked but have not been addressed, in preference to the irrelevant but nevertheless tempting Apple versus MS question. Here are the questions that remain open:


1. How does the voice recognition actually work?

2. Is there any better software than the built-in Apple Enhanced Dictation on the market? I haven't been able to find anything, and in the comparisons it is still listed among the top 4.

3. Is there any way to "educate" the computer to your voice or, by editing, to teach it words it has not used before?

4. My results are best when I talk slowly, the words clearly distinguishable from one another, with regular pauses. Is this the best one can do? Is there any way of speeding up the results? What am I missing?

5. Is there any clear guideline or support forum on this question? The available information begins by describing voice recognition with a long discourse and after you've read reams of stuff you already know, ends by telling you about the problems you already know.

6. You can enable Siri, which does appear to give slightly better results, but it introduces the issue of unwanted voice commands when such words appear in the text you are dictating. What is the relationship between the Enhanced Dictation and online voice recognition?

7. Is there any forum or continuous thread where those of us using Dictation software can discuss our problems and suggest improvements to the system, or help in its optimum usage?

May 27, 2015 7:32 PM in response to shanen0

By the way, I've also asked this question to Apple's telephone support without getting anything useful. Ditto a local Apple store, but I didn't know I was supposed to get an appointment over the Web before going...


At one point the phone people referred me to the manuals section of the website, and I also spent some time there, but could not find any manual that seemed to cover this question. If you can point me at an appropriate manual, it would help. Or maybe I need a book or manual targeted at Linux/Windows people entering the Mac world?


My main reaction to my first month of Mac ownership is that "Think Different" must be a joke. "Think OUR Way" seems more plausible at this point. Sorry, but I will persist in thinking my own way.

May 27, 2015 10:34 PM in response to leroydouglas

Been there and suspect I have already found most of the similar pages.


Perhaps it will help to clarify that I am not concerned with commands or meta-commands because of the modality problem. I don't think any of the systems are smart enough to distinguish between modes reliably, and I don't think I could even keep it straight enough to speak precisely about such things. My objective is simply voice input during bulk composition.


What I'm doing now when I dictate, but with only limited success, basically involves thinking for a few seconds and then saying a sentence at moderate speed and with fairly careful pronunciation of each word. Quite often I am ready and start dictating the next sentence before the analysis has been completed on the previous sentence, but the Mac can usually keep up with me pretty well.


I have done some experiments with just talking in a relaxed and informal way, but the results of that approach so far appear to be mostly unusable. At my normal speed, I'm lucky if it catches enough of the important words to allow me to reconstruct what I was thinking about, but my natural speaking speed is unusually rapid. However, this is where my suggestion might be especially helpful, because I am pretty good at recognizing my recorded words even at my fastest speaking tempos.

May 28, 2015 9:39 AM in response to shanen0

I wouldn't expect perfection in the area of voice recognition. Computer Science researchers have been working in this area for fifty or more years. It's not an easy area. Everyone's voice is slightly different. Words are context dependent.


Example: wind

http://dictionary.reference.com/browse/wind


If dictation is your goal, you could try a commercial package. I suspect the authors of a commercial package would have spent more effort than Apple on the package. Would any dictation application be good enough for you? Who would know?


What about background noises? Changing background noises can be a distraction. A quality microphone might help.


Robert

May 28, 2015 6:14 PM in response to rccharles

Actually I supported a research lab for many years, so I'm pretty up-to-speed on the state of voice recognition and transcript editing. Ever heard of a voice font? Not sure it ever went commercial, but Mel Blanc had scores of them in his head... I have experimented with conditions and various background noises, and also with my speaking position, but not with other microphones.


Feels like I'm repeating myself, but I'm not expecting perfection. My original focal concern is whether or not my efforts to train the system are going in the wrong direction. My broader concern is with the tradeoff against typing. Even when I am typing there is a certain loss of efficiency for mistakes, but the lossage of the voice dictation should be as low as possible to save as much input time as possible...


From Apple's description, they seem to think they are offering accurate recognition of natural and continuous speech of the conversational style. I know how difficult that is, but I am not trying to push it that hard. Perhaps my problem is that I'm fighting the system and Apple has truly optimized the parameters in that direction?

Jun 10, 2015 5:28 PM in response to shanen0

Eh? What is this auto-saved version? Seems to be the same as the one that was saved, but I can't figure out which I should delete, or if trying to delete either would destroy both... Ergo, I should just save this version and see what happens to the other one?



Jun 10, 2015 5:42 PM in response to shanen0

Hmm... Okay, as I was about to report before the mysterious interruption of the auto-saved version, just a minor status report on this topic:


The quality of the results from voice dictation now seems stable. They no longer seem to be getting worse, but neither are they getting better. My searches for more information on how it works or how to train it more effectively have all come up dry, including calls to Apple support. My various experiments all failed to produce clear results, at least over the time periods in which I tested various techniques or rules for dictation. I have become a little better using it, so I feel that there is usually a savings in time if I can dictate first, but the savings are marginal and I am not yet ready to assess the overall quality. Composing by speaking is more natural in some ways, but also affects the results, especially their structure and organization.


In terms of the higher goal of learning to like the Mac, no progress. Neither do I dislike it. Perhaps I should envy people who still feel such enthusiasm for any computer, Mac or whatever? As a paying customer, I think I should have bought the Chromebook first, but I'm pretty well resolved to buy a Chromebook and see whether or not that is more satisfactory. (I still despise Microsoft, but I may take advantage of the supposedly free trial of Windows 10 to test if I should despise MS less than I currently think I do...)


Narrow response to the Apple Support Communities is that my three attempts produced almost no useful information (excluding information that I had already found via other channels, which basically means searching the Web). I had high hopes for a pointer to some book like "How I Learned to Love my Mac"?

Jun 10, 2015 7:34 PM in response to shanen0

The beauty of the Mac experience is that the interface is approachable for novice users, but can be sped up (by learning the shortcuts) to service near-expert users as well. Don't know what you don't know? Just mouse around the Menus and read the major choices. Or as a near-expert, type the shortcuts for immediate results; mousing is not required.


It is more approachable, more regular, and far more forgiving than other Operating Systems in use today.


Compare and contrast Unix, where if you make too many mistakes in typing the OS suggests you may need to take a nap. The tools there are extremely sharp, but the only thing novices seem to accomplish is getting themselves cut to ribbons.


I did not say there is no learning curve -- of course there is one. And high-end software comes with an enormous learning curve. But more than most others, Mac OS X allows you to use the computer as a tool rather than an end in itself.

Jun 14, 2015 5:31 PM in response to Grant Bennet-Alder

Another heavy weekend, and only limited time for wrestling with the voice input. No significant progress to report, I'm afraid.


However, on the general interface topic, I think the future interfaces will be more flexible and adaptable to what the individual user wants to accomplish. As the work calls for new features, they should become accessible in accord with the way that person works. The idea of the monolithic "best" interface is a large part of the problems we're fighting, though I'm not sure who to blame... On the one hand, I feel like Apple may have started that arms race, but on the other hand, I feel MS made it profitable... On the third hand, I want to give some credit to Apple for rethinking things, even if they apparently remain committed to the ridiculous search for best. (Unfortunately, even if I know roughly what form the solution will have, I don't expect to live long enough to see it.)

Jun 27, 2015 12:10 PM in response to shanen0

I am currently using 10.10.3, on an iMac. I had read that Yosemite could perform speech recognition offline, and Apple called it "enhanced" dictation...

Well, the results are not perfect, but different from online dictation.

The bad part of the business is that online dictation allows the text to be changed, because it underlines the ambiguous words in blue. The same does not happen using the "enhanced" version! So it is impossible to teach the software! It is also impossible to add the missing words!

Is this version "enhanced"? Wow, fantastic. But, excuse me, let me better understand: enhanced regarding...what?

Apple support was not able to tell me a solution. Really, they looked like they weren't aware of the problem. Can't believe that nobody told to Apple about this issue...

Jun 27, 2015 5:40 PM in response to pinkypinky

Well, I think I have to make some allowances because your English is pretty clearly not one of the mainstream dialects. It differs to the point that it makes your meaning a bit unclear, but let me make sure I understand what you're saying in your conclusion: "Can't believe that nobody told to Apple about this issue..." I think you mean either "I'm surprised that other users are not complaining about this [voice dictation accuracy] problem" or "Hasn't anyone noticed this and asked Apple about it?"


For either interpretation of your intention, I think the explanation has to do with the way Apple relates to the users, which I would actually describe by starting with a comment that a self-confessed fanbois wrote elsewhere on an unrelated topic: "It just works." Now he sees that as a good thing, but I'm seeing that as a problem because I'm not always willing to accept that the way it works is good enough... That's probably the real crux of my "problem" with the voice dictation results. Apple has said that this is how it works and they expect me to be satisfied with those results, whereas I want to improve the results and I think that understanding HOW it works would help me make it work better, possibly by adjusting the ways that I'm using it.


Your [pinkypinky's] observation on the ambiguous word feature is interesting, and I will attempt to do some testing to confirm whether or not it is accurate. My initial feeling is that it probably isn't correct, since I'm usually connected to the network, even though I have installed the offline "enhanced" software modules for several languages. I also (obviously) agree with you that it would be good if Apple support at some level or path was able to address such "issues" before they become undeniable "problems".


In my own case, I think the issue is already mooted. The result was that I mostly quit using the Mac and now regard it as an expensive mistake that will never justify the large investment. I'm probably going to give the machine to a friend and hope that person finds it more useful than I did.


I want to attempt to speak in defense of Apple's support system because I really do feel like the people I spoke to on the phone really made a sincere effort to answer my questions, but they don't have access to the required data. I cannot speak in defense of the Apple store in Ginza, since my visits there were completely useless and did not even feel "sincere", but perhaps that was due to my own failure to make advance reservations over the Internet. However, my access to both of those options is exhausted now, even though I barely used them and certainly received no satisfaction via those channels. Again, that's my own fault for not giving higher priority to this matter during my 3-month new-owner period. I also spent quite a bit of time on the Apple website, and it was quite disappointing. Lastly, I should mention this discussion forum as part of Apple's support system, but that's kind of hard to assess... It seems obvious that Apple has provided incentives to encourage participation, but I didn't receive any useful answers to the tough questions. (Or maybe there were some and I just failed to recognize them?)
