
Old problem with Spotlight and Word .docx files

There is an old problem with Spotlight not finding Word .docx files. You can find old discussions about this from 2008. But despite this problem being ancient it has never been solved.


The essence of the issue is that Spotlight relies on a variety of software modules to index files. Spotlight chooses the appropriate module to index each different type of file. These indexing modules all have the file suffix .mdimporter. The name of the module for indexing Microsoft Office files is "Microsoft Office.mdimporter" and it can be found with some other .mdimporter modules at /Library/Spotlight/. The "Microsoft Office.mdimporter" module has copyright info from Microsoft inside its package, so we know who wrote it. But this module is included with all new Macs by Apple. Perhaps this is why this problem has never been fixed: the software is written by Microsoft but delivered by Apple.


The symptoms of the problem are that Spotlight will not find content in some .docx files. It is not clear why it fails on some .docx files but not on others. This problem does not seem to affect other Microsoft formats such as .doc, .xls, .xlsx, etc.


Given that Apple wants to sell Macs to business, and that businesses make heavy use of MS Office typically, one would think that this would have been a priority to fix. But given that this problem is 7 years old and still unsolved I guess not.


Any suggestions on work-arounds would be welcome.

OS X Mountain Lion, latest release of OS X

Posted on Dec 7, 2015 11:09 AM

39 replies

Mar 19, 2017 6:18 PM in response to VikingOSX

Update: Apple has changed the kind reference for all Word documents (.doc, .docx) back to "Microsoft Word Document." This is true in El Capitan 10.11.6 (15G1217) and macOS Sierra 10.12.3 (16D32).


The following will find the word stranger in every .doc and .docx Word document on the indexed drive, and avoids other Word document extensions:


"stranger" kind:word +.doc OR +.docx

Jun 4, 2017 9:58 PM in response to Rumboogy22

Just come across this thread when I realised that any docx files modified since April this year were not being indexed. What have I discovered:


1. Microsoft Office.mdimporter is supplied by Apple (not from Microsoft as part of Office as I had assumed) and is regularly changed even though the version number remains at 12.3.0 and the bundle's creation date remains at 27 June 2011. New versions are certainly included in the 10.12.4 and 10.12.5 updates.


2. If you delete the importer that may partially fix indexing of .docx files, but means that Excel files are no longer indexed!


3. I have the problem (non indexing of docx) on my desktop Mac, but not on my MacBook. Both are running 10.12.5 and Office 15.34. I have copied Microsoft Office.mdimporter from the MacBook to the desktop. And the problem is fixed. All new/modified documents are being indexed correctly. As far as I can see the two importers are identical.


4. Doing mdimport -d1 <path to docx/xlsx file> in Terminal still gives an error about 'wrong architecture'. But automatic indexing is working.


5. My solution is to replace Microsoft Office.mdimporter with a copy from a Mac which is working correctly. Keeping my fingers crossed that it continues to work for a while.


6. I don't understand this any more than anyone else.
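For anyone who wants to check the "two importers are identical" observation in point 3 rather than eyeball it, here is a minimal sketch. It assumes both bundles are reachable as ordinary folders; the function name and the second path in the usage comment are placeholders of mine, not anything from this thread. It compares file names and contents only, not permissions or extended attributes, so a "same" result does not rule out every possible difference.

```shell
#!/bin/bash
# same_bundle A B: succeed (exit 0) when the two directory trees contain the
# same file names with byte-identical contents; fail otherwise.
# Permissions and extended attributes are NOT compared.
same_bundle() {
  diff -rq "$1" "$2" >/dev/null 2>&1
}

# Hypothetical usage; the second path stands for wherever the copy from the
# other Mac was saved:
#   same_bundle "/Library/Spotlight/Microsoft Office.mdimporter" \
#               "$HOME/Desktop/Microsoft Office.mdimporter" \
#     && echo "same contents" || echo "contents differ"
```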

Jul 6, 2017 11:59 AM in response to Gilby101

Coming back to this after a while, this whole thing makes less sense than ever. I just tried a search for a term that Spotlight found in a .docx file that was closed but not in one that was open. There seems to be no rhyme or reason.


I have lots and lots of files reflecting many years of writing. Could the real problem be that Spotlight can't handle all the word processing documents I have and is losing some -- perhaps only from the .docx files? Do others who see this have lots of files? I also still have a spinning hard drive - could another part of the problem be that spotlight access is so slow with a spinner that it can't find everything?

Sep 2, 2017 8:56 AM in response to mikeheck

Update--the solution I tried above (copying RichText.mdimporter alongside Microsoft Office.mdimporter, from livins2) lasted for a few hours and then Spotlight was no longer indexing .docx file contents.


I ran the terminal command

find ~ -type f -name '*.docx' -print -exec mdimport -d1 {} \;


provided by the original poster and that has at least brought all current files into Spotlight's index...good enough for now.


Still very frustrating that this has not been resolved.

Dec 14, 2015 2:15 PM in response to Rumboogy22

Upon further investigation I found out that the problem is not with "Microsoft Office.mdimporter" as I had thought. If you run mdimport -d1 <Word file.docx> you can see which importer is being used. It turns out that for .docx files the importer is called "RichText.mdimporter", which has a copyright by Apple. And if you run mdimport on a file, that file will then be found by Spotlight. So the issue is not the importer but rather whatever is triggering the importer to run on modified .docx files. So much for blaming Microsoft for this one...


As a work-around I force indexing of my .docx files with a Bash script that is launched by cron. The script indexes all the .docx that were modified in the last 24 hours and is launched hourly by cron. If anyone is interested in the details just post here and I will include the script in another post.

Dec 28, 2015 11:16 AM in response to Rumboogy22

This has been a huge PITA for years - I'd love to have a solution!


The problem, as I understand it, is that while you can force Spotlight to index a docx file, the mdimporter can't distinguish the actual content from all the XML gibberish that it's wrapped in. If you've solved that, you're a hero. (I think "some" files are found because bits of content have made it into the metadata - although I haven't tested this theory.)

Dec 28, 2015 11:45 AM in response to jpdemers

To force indexing of all your .docx files open the Terminal app and cut and paste the following line:


find ~ -type f -name '*.docx' -print -exec mdimport -d1 {} \;


What will happen is that it will search for all files ending in .docx within your home directory. For each file it will run the mdimport command, which will index (or import, in Apple terminology) the file. In the Terminal window you will see the result of it indexing each file. When this is done (when you get back to the command prompt) you can quit the Terminal app. If you have a lot of .docx files this will take some time.


This is intended to be a one-time operation because it takes so long. In another post I will share some code to index just the recently changed .docx files.
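Since the full run can take a long time, it may help to see how many files it would touch before starting. The following is only a sketch: the function name is made up for illustration, and it uses the same find test as the command above, so it is just as case-sensitive (a file ending in .DOCX would not be counted).

```shell
#!/bin/bash
# count_docx DIR: print how many regular files under DIR end in .docx.
# One dot is printed per match and the dots are counted, so file names
# containing spaces or newlines do not throw the count off.
count_docx() {
  find "$1" -type f -name '*.docx' -exec printf . \; | wc -c | tr -d ' '
}

# Hypothetical usage for the home directory:
#   count_docx ~
```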

Dec 28, 2015 12:04 PM in response to Rumboogy22

Once all the .docx files have been indexed then you need a way to keep the recently changed ones indexed. The following is a small script program which does this.


Open the TextEdit app and cut and paste the following text into its window:


#!/bin/bash
# Forces Spotlight to re-index .docx files in a range of modification/creation times.
# Does this by running the "mdimport" command on the appropriate .docx files.
# THIS IS INTENDED TO BE RUN AT LEAST ONCE EVERY 24 HOURS.

# Set up time stamp file.
TIME_STAMP=~/.time_stamp_file.txt
# Set time stamp file to current date/time.
touch "$TIME_STAMP"
# Adjust the date back in time 24 hours (the -A value is hhmmSS).
touch -A -240000 "$TIME_STAMP"

# Index .docx files that were modified since the time stamp file.
find ~ -type f -name '*.docx' -newer "$TIME_STAMP" -print -exec mdimport -d1 {} \;

exit


Then save this file with the name Spotlight_Reindex_DOCX.command. Make sure that TextEdit does not add a .txt suffix to the file. Also make sure that this file was saved to your home directory.


Open a Terminal app window and cut and paste the following command:

chmod +x ~/Spotlight_Reindex_DOCX.command


Now the program can be run. Just double click the file to execute it. It will index all the .docx files that were modified in the last 24 hours. It will show you all the files that it has indexed. Once the script is done running, as indicated by the line "[Process completed]", you can close the Terminal window that was opened by the script.

Dec 28, 2015 12:47 PM in response to Rumboogy22

Believe it or not ... just on a whim, I copied the 2009 version of Microsoft Office.mdimporter into ~/Library/Spotlight, replacing the 2011 version that's been there since, I assume, 2011. (The 2009 version comes via the accumulated MS updates to Office 2004; the 2011 version via Office 2008.)


Forced a re-index, and IT WORKS!


(This is on an iMac running El Capitan, so ymmv. I can email you the file if you want to give it a shot.)

Dec 29, 2015 1:03 AM in response to jpdemers

From what I can tell it is the RichText.mdimporter that is used in indexing .docx files. If you run the mdimport -d1 command on one of your .docx files, what importer does it say it is using? Below is an example of what I get. Notice that it says it is using RichText.mdimporter (at the end of the output line).


Terminal command:

mdimport -d1 /Users/AAW/Downloads/test.docx

Output of command:

2015-12-29 00:59:38.755 mdimport[9479:543385] Imported '/Users/AAW/Downloads/test.docx' of type 'org.openxmlformats.wordprocessingml.document' with plugIn /System/Library/Spotlight/RichText.mdimporter.
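
If you want to check many files without reading each log line, the importer path can be pulled out of that output. This is a rough sketch that assumes the line looks like the sample above (ending in "with plugIn <path>."); the function name is made up, and other OS X versions may word the line differently, in which case it prints nothing.

```shell
#!/bin/bash
# importer_from_log: read mdimport -d1 output on stdin and print only the
# importer path from lines of the form "... with plugIn <path>."
# Lines that do not match that shape are ignored.
importer_from_log() {
  sed -n 's/.* with plugIn \(.*\)\.$/\1/p'
}

# Hypothetical usage (2>&1 in case the log line is written to stderr):
#   mdimport -d1 /Users/AAW/Downloads/test.docx 2>&1 | importer_from_log
```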

Dec 31, 2015 9:28 AM in response to Rumboogy22

Same as you get: RichText.mdimporter


Not at all clear why swapping in the older Microsoft Office.mdimporter had an effect.

Also odd is that Spotlight finds versions 1-5 of a docx document, spanning a period of about two years ... yet for version 6 (a day old), only the "AutoRecovery save" file is found. (This is typical - the AutoRecovery files have always been indexed.)

Running sudo mdutil -E / to force another re-indexing doesn't alter anything; v.6 is still missing from the index.


As I recall, when the problem first cropped up, it was only newer documents that weren't being indexed. This is pretty bizarre.

And it's exactly the situation you create with your Terminal command.


To automate the running of your script, you might have a look at http://superuser.com/questions/126907/how-can-i-get-a-script-to-run-every-day-on-mac-os-x

I imagine some files might get missed if the Mac isn't running at the appointed hour, but you could have it index all files that are, say, 3 days (or a week) old.
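
To make the "3 days (or a week)" idea concrete, here is a sketch of a variant with the look-back window as a parameter. It is not the script posted in this thread: it swaps the time stamp file for find's -mmin test, and the function name and arguments are my own invention. The optional trailing command exists only so the function can be tried without mdimport; by default it runs mdimport -d1 as in the original.

```shell
#!/bin/bash
# reindex_recent_docx DIR DAYS [CMD...]
# Run CMD (default: mdimport -d1) on every .docx under DIR that was modified
# within the last DAYS days, printing each path as it goes.
reindex_recent_docx() {
  local dir="$1" days="$2"
  shift 2
  # find's -mmin works in minutes, so convert days to minutes.
  local minutes=$((days * 1440))
  if [ "$#" -eq 0 ]; then
    set -- mdimport -d1
  fi
  find "$dir" -type f -name '*.docx' -mmin "-$minutes" -print -exec "$@" {} \;
}

# Hypothetical usage: everything changed in the last 3 days in the home folder.
#   reindex_recent_docx ~ 3
```

A wider window means more files get re-imported on each run, so there is a trade-off between catching missed files and the time each run takes.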

Dec 31, 2015 10:53 AM in response to jpdemers

I was not aware of the different versions of .docx. So you are saying that version 6 never gets indexed? I am using Word 2011 with all the updates. Do you know what version of .docx that makes?


I have already automated my script using crontab. I did not want to post that since I thought it might be too involved for this forum. If you note in the script that I posted that it searches for all .docx files modified in the last 24 hours. I run that script every hour so this way it reaches back overnight when I start up the computer in the morning. Once a month I index all the .docx just to make sure nothing has slipped between the cracks.
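
For readers who do want to try cron, the schedule described above (hourly for the 24-hour script, monthly for everything) could look roughly like what the sketch below prints. These are not the poster's actual crontab lines. The function name, the noon-on-the-1st time for the monthly run, and the paths are all assumptions for illustration.

```shell
#!/bin/bash
# print_cron_entries SCRIPT HOME_DIR
# Print two crontab lines: one runs SCRIPT at minute 0 of every hour, the
# other re-imports every .docx under HOME_DIR at 12:00 on the 1st of each
# month (an arbitrary time chosen for this sketch).
print_cron_entries() {
  local script="$1" home="$2"
  printf '0 * * * * "%s"\n' "$script"
  printf '0 12 1 * * find "%s" -type f -name "*.docx" -exec mdimport -d1 {} \\;\n' "$home"
}

# Hypothetical usage: print the lines, then paste them into the editor that
# "crontab -e" opens.
#   print_cron_entries "$HOME/Spotlight_Reindex_DOCX.command" "$HOME"
```

Keep in mind that cron does not catch up on jobs missed while the Mac was off or asleep, which is why the hourly job looks back a full 24 hours.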

Jan 10, 2016 10:32 AM in response to Rumboogy22

The fix that worked for me: remove Microsoft Office.mdimporter from /Library/Spotlight. Voilà: no more indexing loop. I wouldn't have hit upon the solution without reading this thread. The vital link: the mdimporter redundancy of Microsoft Office.mdimporter and RichText.mdimporter. Since they both index Word files, and since Console and opensnoop in Terminal both showed errors from Microsoft Office.mdimporter, I pulled the Microsoft one from the plug-ins folder and restarted. Spotlight finds Word docs (whether it will always find everything is another question I can't answer), but the pulsing dot in the Spotlight menu bar icon is no more.


I should add that I'm working with an early 2008 Macbook with 10.7.5, the last OS its hardware can run. It's a laptop kept offline at an office mainly for writing. I was asked to look at it because of nagging slowness, possibly after an update to 10.7.5. I did a clean install after I was unable to quell the constant indexing by other means. Spotlight finished its initial work without a hitch, but the looping came back after I added two large folders containing, among other things, numerous Word files, and I was able to stop the indexing loop by putting the two folders into the Spotlight privacy list, a less than ideal solution. None of the usual Spotlight interventions involving Terminal commands stopped the looping. Only yanking Microsoft Office.mdimporter did. Because consequently Spotlight may be … spotty, as a supplement I downloaded the free EasyFind. But as far as I can tell with minimal testing, Word documents appear, and the continual indexing that slows the machine is no longer going on.


So, while I'm not using your script as a solution, your research made it possible to solve this nettlesome problem. Thank you!

Jan 10, 2016 11:47 AM in response to jpdemers

Following your lead, I compared the Microsoft Office.mdimporter from an Office 2011 installation on the Macbook with another from a Macbook Air. Both plug-ins (annoyingly) show version 12.3.0, but the one on the 2008 Macbook running 10.7.5 has a creation date from 2009, whereas the Air's, which is running 10.11.2, has a creation date from 2011. I copied the later plug-in onto the old Macbook and ran Disk Utility to fix permissions. I got an error 5 for the plug-in, but Disk Utility fixed its permissions, so I restarted and forced a Spotlight reindexing with sudo mdutil -E / . Console shows errors loading the later mdimporter plug-in. It seems that on the older hardware and OS, replacing the 2009 plug-in with the one from 2011 only causes the reindexing loop to come back.


Like you and Rumboogy22, I find it puzzling that neither Microsoft nor Apple has dealt with this issue, which a quick web search shows has been vexing users for many a year. I wish I could give you a "This helped me" point for your observations about Spotlight quirks with different Word document versions, but I discovered too late that I can give credit only twice on a thread. However your post was likewise very helpful. Thank you.
