
Question: Old problem with Spotlight and Word .docx files

There is an old problem with Spotlight not finding Word .docx files. You can find old discussions about this from 2008. But despite this problem being ancient it has never been solved.


The essence of the issue is that Spotlight relies on a variety of software modules to index files, choosing the appropriate module for each type of file. These indexing modules all have the file suffix .mdimporter. The name of the module for indexing Microsoft Office files is "Microsoft Office.mdimporter", and it can be found with some other .mdimporter modules at /Library/Spotlight/. The "Microsoft Office.mdimporter" module has copyright info from Microsoft inside its package, so we know who wrote it. But this module is included with all new Macs by Apple. Perhaps this is why this problem has never been fixed: the software is written by Microsoft but delivered by Apple.
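To see which importer bundles are actually installed, something like the following sketch may help. The function name list_importers is made up for illustration. The folder list is an assumption about typical locations: /Library/Spotlight is the one named above, and the other two are the system and per-user plug-in folders. On a Mac, mdimport -L should print the importers Spotlight itself knows about.

```shell
#!/bin/bash
# Hypothetical helper: list the .mdimporter bundles found directly inside
# each directory given as an argument (missing directories are skipped).
list_importers() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name '*.mdimporter' -print
    done | sort
}

# Assumed typical locations on a Mac; ~/Library/Spotlight may not exist.
list_importers /Library/Spotlight /System/Library/Spotlight "$HOME/Library/Spotlight"
```

If the same file type is claimed by more than one bundle in this list, that is worth noting before blaming any single importer.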


The symptoms of the problem are that Spotlight will not find content in some .docx files. It is not clear why it fails on some .docx files but not on others. This problem does not seem to affect other Microsoft formats such as .doc, .xls, .xlsx, etc.


Given that Apple wants to sell Macs to businesses, and that businesses typically make heavy use of MS Office, one would think that this would have been a priority to fix. But given that this problem is 7 years old and still unsolved, I guess not.


Any suggestions on work-arounds would be welcome.

OS X Mountain Lion, latest release of OS X


Question marked as Solved

Dec 14, 2015 2:15 PM in response to Rumboogy22

Upon further investigation I found out that the problem is not with "Microsoft Office.mdimporter" as I had thought. If you run mdimport -d1 <Word file.docx>, you can see which importer is being used. It turns out that for .docx files the importer is "RichText.mdimporter", which has a copyright by Apple. And once you run mdimport on a file, Spotlight will find it. So the issue is not the importer but rather whatever is supposed to trigger the importer to run on modified .docx files. So much for blaming Microsoft for this one...


As a work-around I force indexing of my .docx files with a Bash script that cron launches hourly. The script indexes all the .docx files that were modified in the last 24 hours. If anyone is interested in the details, just post here and I will include the script in another post.
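One way to check from Terminal whether a re-imported file has become searchable is mdfind, the command-line front end to Spotlight. The following is only a rough sketch: docx_hits is a made-up name, and the folder, file name, and search phrase in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: ask Spotlight for matches under a folder and keep
# only the .docx results. Usage: docx_hits FOLDER "search phrase"
docx_hits() {
    mdfind -onlyin "$1" "$2" | grep -i '\.docx$'
}

# Example (macOS only; names are placeholders):
#   mdimport ~/Documents/report.docx
#   docx_hits ~/Documents "a phrase that appears only in report.docx"
```

If the phrase is found after running mdimport but not before, that points at the trigger for indexing rather than at the importer.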

Dec 28, 2015 11:16 AM in response to Rumboogy22

This has been a huge PITA for years - I'd love to have a solution!


The problem, as I understand it, is that while you can force Spotlight to index a docx file, the mdimporter can't distinguish the actual content from all the XML gibberish that it's wrapped in. If you've solved that, you're a hero. (I think "some" files are found because bits of content have made it into the metadata - although I haven't tested this theory.)

Dec 28, 2015 11:45 AM in response to jpdemers

To force indexing of all your .docx files, open the Terminal app and copy and paste the following line:


find ~ -type f -name '*.docx' -print -exec mdimport -d1 {} \;


What will happen is that it will search for all files ending in .docx within your home directory. For each file it will run the mdimport command, which will index (or import, in Apple terminology) the file. In the Terminal window you will see the result of indexing each file. When this is done (when you get back to the command prompt), you can quit the Terminal app. If you have a lot of .docx files, this will take some time.


This is intended to be a one-time operation because it takes so long. In another post I will share some code to index just the recently changed .docx files.

Dec 28, 2015 12:04 PM in response to Rumboogy22

Once all the .docx files have been indexed, you need a way to keep the recently changed ones indexed. The following small script does this.


Open the TextEdit app and copy and paste the following text into its window:


#!/bin/bash
# Forces Spotlight to re-index .docx files in a range of modification/creation times.
# Does this by running the "mdimport" command on the appropriate .docx files.
# THIS IS INTENDED TO BE RUN AT LEAST ONCE EVERY 24 HOURS.

# Set up the time stamp file.
TIME_STAMP=~/.time_stamp_file.txt
# Set the time stamp file to the current date/time.
touch "$TIME_STAMP"
# Adjust the time stamp back 24 hours (the -A value is read as hhmmSS).
touch -A -240000 "$TIME_STAMP"

# Index .docx files that were modified since the time stamp file.
find ~ -type f -name '*.docx' -newer "$TIME_STAMP" -print -exec mdimport -d1 {} \;

exit


Then save this file with the name Spotlight_Reindex_DOCX.command. Make sure that TextEdit does not add a .txt suffix to the file (if the document is in rich-text mode, choose Format > Make Plain Text first). Also make sure that the file is saved to your home directory.


Open a Terminal window and copy and paste the following command:

chmod +x ~/Spotlight_Reindex_DOCX.command


Now the program can be run. Just double click the file to execute it. It will index all the .docx files that were modified in the last 24 hours. It will show you all the files that it has indexed. Once the script is done running, as indicated by the line "[Process completed]", you can close the Terminal window that was opened by the script.
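A possible variation, for anyone who would rather not keep a time stamp file: find can compare modification times against the clock directly with -mmin. This is only a sketch of that idea, not a replacement tested against the script above. The function names are invented here, and the default of 1440 minutes (24 hours) simply mirrors the original window.

```shell
#!/bin/bash
# Hypothetical helper: print .docx files under DIR modified within the
# last MINUTES minutes (default 1440, i.e. 24 hours).
recent_docx() {
    local dir="$1" minutes="${2:-1440}"
    find "$dir" -type f -name '*.docx' -mmin "-$minutes" -print
}

# Hypothetical helper: same selection, but hand each file to mdimport.
# Only meaningful on a Mac, where mdimport exists.
reindex_recent_docx() {
    local dir="$1" minutes="${2:-1440}"
    find "$dir" -type f -name '*.docx' -mmin "-$minutes" -print -exec mdimport {} \;
}

# Example:
#   recent_docx "$HOME"              # dry run: just list the candidates
#   reindex_recent_docx "$HOME" 4320 # re-index anything changed in 3 days
```

Running recent_docx first shows what would be re-indexed without touching the Spotlight index.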

Dec 28, 2015 12:47 PM in response to Rumboogy22

Believe it or not ... just on a whim, I copied the 2009 version of Microsoft Office.mdimporter into ~/Library/Spotlight, replacing the 2011 version that's been there since, I assume, 2011. (The 2009 version comes via the accumulated MS updates to Office 2004; the 2011 version via Office 2008.)


Forced a re-index, and IT WORKS!


(This is on an iMac running El Capitan, so ymmv. I can email you the file if you want to give it a shot.)

Dec 29, 2015 1:03 AM in response to jpdemers

From what I can tell, it is RichText.mdimporter that is used in indexing .docx files. If you run the mdimport -d1 command on one of your .docx files, what importer does it say it is using? Below is an example of what I get. Notice that it says it is using RichText.mdimporter (at the end of the output).


Terminal command:

mdimport -d1 /Users/AAW/Downloads/test.docx

Output of command:

2015-12-29 00:59:38.755 mdimport[9479:543385] Imported '/Users/AAW/Downloads/test.docx' of type 'org.openxmlformats.wordprocessingml.document' with plugIn /System/Library/Spotlight/RichText.mdimporter.
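To check many files at once, the plug-in path can be pulled out of that line with sed. This is a sketch under two assumptions: that the message keeps the "with plugIn <path>." shape shown in the output above, and that mdimport writes it to stderr (hence the 2>&1 in the usage comment). importer_from_log is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: read mdimport debug output on stdin and print the
# importer plug-in path named after "with plugIn", minus the final period.
importer_from_log() {
    sed -n 's/.* with plugIn \(.*\.mdimporter\)\.\{0,1\}$/\1/p'
}

# Example (macOS only):
#   mdimport -d1 ~/Downloads/test.docx 2>&1 | importer_from_log
```

Lines that do not mention a plug-in produce no output, so the helper can be fed the full log.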

Dec 31, 2015 9:28 AM in response to Rumboogy22

Same as you get: RichText.mdimporter


Not at all clear why swapping in the older Microsoft Office.mdimporter had an effect.

Also odd is that Spotlight finds versions 1-5 of a docx document, spanning a period of about two years ... yet for version 6 (a day old), only the "AutoRecovery save" file is found. (This is typical - the AutoRecovery files have always been indexed.)

Running sudo mdutil -E / to force another re-indexing doesn't alter anything; v.6 is still missing from the index.


As I recall, when the problem first cropped up, it was only newer documents that weren't being indexed. This is pretty bizarre.

And it's exactly the situation you create with your Terminal command.


To automate the running of your script, you might have a look at http://superuser.com/questions/126907/how-can-i-get-a-script-to-run-every-day-on-mac-os-x

I imagine some files might get missed if the Mac isn't running at the appointed hour, but you could have it index all files that are, say, 3 days (or a week) old.

Dec 31, 2015 10:53 AM in response to jpdemers

I was not aware of the different versions of .docx. So you are saying that version 6 never gets indexed? I am using Word 2011 with all the updates. Do you know what version of .docx that makes?


I have already automated my script using crontab. I did not want to post that since I thought it might be too involved for this forum. Note that the script I posted searches for all .docx files modified in the last 24 hours. I run that script every hour, so it reaches back overnight when I start up the computer in the morning. Once a month I index all the .docx files just to make sure nothing has slipped through the cracks.
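For anyone curious, an hourly crontab entry along these lines would fit that description. This is a guess at the shape, not the poster's actual crontab: hourly_cron_entry is an invented helper, and the script path assumes the Spotlight_Reindex_DOCX.command file saved to the home directory earlier in the thread.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs SCRIPT at minute 0
# of every hour and discards its output. Paths containing spaces would
# need extra quoting, which this sketch does not handle.
hourly_cron_entry() {
    printf '0 * * * * %s >/dev/null 2>&1\n' "$1"
}

# To install (appends to the current user's crontab; review before running):
#   ( crontab -l 2>/dev/null; hourly_cron_entry "$HOME/Spotlight_Reindex_DOCX.command" ) | crontab -
```

Because the script looks back 24 hours, a job missed while the Mac is asleep is covered by the next run.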

Jan 10, 2016 10:32 AM in response to Rumboogy22

The fix that worked for me: remove Microsoft Office.mdimporter from /Library/Spotlight. Voilà: no more indexing loop. I wouldn't have hit upon the solution without reading this thread. The vital link: the mdimporter redundancy of Microsoft Office.mdimporter and RichText.mdimporter. Since they both index Word files, and since Console and opensnoop in Terminal both showed errors from Microsoft Office.mdimporter, I pulled the latter from the plug-ins folder and restarted. Spotlight finds Word docs (whether it will always find everything is another question I can't answer), but the pulsing dot in the Spotlight menu bar icon is no more.


I should add that I'm working with an early 2008 Macbook with 10.7.5, the last OS its hardware can run. It's a laptop kept offline at an office mainly for writing. I was asked to look at it because of nagging slowness, possibly after an update to 10.7.5. I did a clean install after I was unable to quell the constant indexing by other means. Spotlight finished its initial work without a hitch, but the looping came back after I added two large folders containing, among other things, numerous Word files, and I was able to stop the indexing loop by putting the two folders into the Spotlight privacy list, a less than ideal solution. None of the usual Spotlight interventions involving Terminal commands stopped the looping. Only yanking Microsoft Office.mdimporter did. Because consequently Spotlight may be … spotty, as a supplement I downloaded the free EasyFind. But as far as I can tell with minimal testing, Word documents appear, and the continual indexing that slows the machine is no longer going on.


So, while I'm not using your script as a solution, your research made it possible to solve this nettlesome problem. Thank you!

Jan 10, 2016 11:47 AM in response to jpdemers

Following your lead, I compared the Microsoft Office.mdimporter from an Office 2011 installation on the Macbook with another from a Macbook Air. Both plug-ins (annoyingly) show version 12.3.0, but the one on the 2008 Macbook running 10.7.5 has a creation date from 2009, whereas the Air's, which is running 10.11.2, has a creation date from 2011. I copied the later plug-in onto the old Macbook and ran Disk Utility to fix permissions. I got an error 5 for the plug-in, but Disk Utility fixed its permissions, so I restarted and forced a Spotlight reindexing with sudo mdutil -E / . Console shows errors loading the later mdimporter plug-in. It seems that on the older hardware and OS, replacing the 2009 plug-in with the one from 2011 only causes the reindexing loop to come back.


Like you and Rumboogy22, I find it puzzling that neither Microsoft nor Apple has dealt with this issue, which a quick web search shows has been vexing users for many a year. I wish I could give you a "This helped me" point for your observations about Spotlight quirks with different Word document versions, but I discovered too late that I can give credit only twice on a thread. However your post was likewise very helpful. Thank you.

Feb 9, 2016 8:50 AM in response to Jonathan Brown

I think the reason neither Apple nor Microsoft has done anything about the failure of Spotlight to index docx documents is that it's not an obvious problem for many people who continue using doc files rather than docx, and it's not obvious even if you do use docx files unless you happen to be looking for something that you know is only in a docx file.


In any case, I've run into the problem using Mavericks, and I'm curious about fixes that don't require setting up special scripts.

Question marked as Helpful

Feb 9, 2016 1:12 PM in response to Jonathan Brown

Problem solved here (El Capitan and Office 2016 on a 2012 MBP):

Removed Microsoft Office.mdimporter from /Library/Spotlight, added Macintosh HD to the Privacy tab of the Spotlight preferences, and then reversed this last action.

Now all docx files are found by Spotlight: old, edited, and new.

Feb 9, 2016 1:44 PM in response to Jonathan Brown

Jonathan,


Thanks for your responses. You have gone much deeper with this than I. I have had success using the script that I posted on Dec 28, 2015. This is automated in crontab as I mentioned so I don't think about it anymore. And the problem has been gone for months now so I guess this is a solution that is proven to work (on 10.11 at least).


I don't intend to change versions of Office until I am absolutely forced to when support stops on Office 2011. So I guess I will just stick with what I have until that time then revisit this problem.

Feb 10, 2016 12:42 PM in response to bjfromvelp

bjfromvelp's process worked for me, and I'm on Mavericks and Office 2011 on a 2010 Mac mini.

I removed Microsoft Office.mdimporter from /Library/Spotlight (keeping a copy on the desktop just in case),

added my Documents folder to the Privacy tab of Spotlight (to avoid searching a large file of PDFs in other files),

then removed my Documents folder from the Privacy tab.

Spotlight pondered a bit, and now seems to index both .doc and .docx documents, as well as RTFs and texts.

So far, so good. Maybe something will disappear, but that seems to solve it. Thanks.
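The "remove it but keep a copy" step can be sketched in the shell as below. disable_importer is a hypothetical name, and the paths in the usage comment are the ones mentioned in this thread. The sudo mdutil -E / command quoted earlier in the thread is shown as one way to force a rebuild afterwards; whether it is equivalent to toggling a folder in the Privacy tab is an assumption, not something confirmed here.

```shell
#!/bin/bash
# Hypothetical helper: move an importer bundle out of its plug-in folder
# into a backup folder, so it can be put back later if needed.
disable_importer() {
    local bundle="$1" backup_dir="$2"
    [ -d "$bundle" ] || { echo "not found: $bundle" >&2; return 1; }
    mkdir -p "$backup_dir" && mv "$bundle" "$backup_dir/"
}

# Equivalent by hand on a Mac (needs admin rights):
#   sudo mv "/Library/Spotlight/Microsoft Office.mdimporter" ~/Desktop/
#   sudo mdutil -E /    # erase and rebuild the index (assumed alternative
#                       # to adding/removing a folder in the Privacy tab)
```

Moving the bundle rather than deleting it keeps the change reversible.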

Feb 10, 2016 3:45 PM in response to fiberhome

Turned out today that all file names were found after the changes I made, but not all files could be searched on content. I had to change the permissions of my documents folder: "everyone" to read&write; now all files can be searched. Just can't explain why I had to do this. Also can't explain why only the doc/docx files couldn't be searched... But it seems to work completely now.
