Q: Old problem with Spotlight and Word .docx files
There is an old problem with Spotlight not finding Word .docx files. You can find old discussions about this from 2008. But despite this problem being ancient it has never been solved.
The essence of the issue is that Spotlight relies on a variety of software modules to index files, and it chooses the appropriate module for each type of file. These indexing modules all have the file suffix .mdimporter. The name of the module for indexing Microsoft Office files is "Microsoft Office.mdimporter", and it can be found with some other .mdimporter modules at /Library/Spotlight/. The "Microsoft Office.mdimporter" module has copyright info from Microsoft inside its package, so we know who wrote it. But this module is included by Apple with all new Macs. Perhaps this is why the problem has never been fixed - the software is written by Microsoft but delivered by Apple.
The symptoms of the problem are that Spotlight will not find content in some .docx files. It is not clear why it fails on some .docx files but not on others. This problem does not seem to affect other Microsoft formats such as .doc, .xls, .xlsx, etc.
Given that Apple wants to sell Macs to businesses, and that businesses typically make heavy use of MS Office, one would think that fixing this would have been a priority. But given that the problem is 7 years old and still unsolved, I guess not.
Any suggestions on work-arounds would be welcome.
OS X Mountain Lion, latest release of OS X
Posted on Dec 7, 2015 11:09 AM
Upon further investigation I found that the problem is not with "Microsoft Office.mdimporter" as I had thought. Running mdimport -d1 <Word file.docx> shows which importer is being used. It turns out that for .docx files the importer is "RichText.mdimporter", which carries an Apple copyright. And once you run mdimport on a file, Spotlight will find it. So the issue is not the importer but rather whatever is supposed to trigger the importer when a .docx file is modified. So much for blaming Microsoft for this one...
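For anyone who wants to repeat the check, here is a rough sketch of how it could be wrapped in a small Bash function. The function name spotlight_check is made up for illustration. The mdimport -d1 call is the one from the post, and mdls is the stock tool for printing a file's Spotlight metadata. The output format of mdimport may differ between OS X releases, so treat this as a starting point rather than a definitive recipe.

```shell
#!/bin/bash
# Report how Spotlight handles a single file.
# spotlight_check is an illustrative name, not something from the original post.
spotlight_check() {
    local file="$1"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    # Import the file with debug level 1; the debug output names the
    # importer that was used (RichText.mdimporter for .docx in the post).
    mdimport -d1 "$file" 2>&1
    # Print the content type Spotlight has recorded for the file.
    mdls -name kMDItemContentType "$file"
}
```

Usage would be something like spotlight_check "$HOME/Documents/example.docx" (a hypothetical path). Running mdimport -L lists every importer Spotlight currently knows about, which is a quick way to confirm that both importers mentioned in this thread are installed.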
As a work-around I force indexing of my .docx files with a Bash script that is launched hourly by cron. The script re-indexes all the .docx files that were modified in the last 24 hours. If anyone is interested in the details just post here and I will include the script in another post.
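In the meantime, a minimal sketch of what such a script might look like is below. This is not the actual script, just one way to do what is described above. The function name, the INDEX_CMD variable, and the paths in the comments are assumptions made for illustration; find and mdimport are the standard tools.

```shell
#!/bin/bash
# Re-index .docx files modified within the last 24 hours.
# reindex_recent_docx and INDEX_CMD are illustrative names (assumptions).
reindex_recent_docx() {
    local dir="$1"
    # INDEX_CMD can be overridden (e.g. with echo) for a dry run.
    local cmd="${INDEX_CMD:-mdimport}"
    # -mtime -1 matches files modified less than 24 hours ago;
    # -exec ... {} + passes the matches to the command in batches.
    find "$dir" -type f -iname '*.docx' -mtime -1 -exec "$cmd" {} +
}

# Run only when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    reindex_recent_docx "$1"
fi

# Example crontab entry for an hourly run (script path is hypothetical):
# 0 * * * * /Users/me/bin/reindex_docx.sh "$HOME/Documents"
```

Setting INDEX_CMD=echo before running the script prints the files that would be re-indexed without touching the Spotlight index, which is handy for checking the find expression first.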
Posted on Dec 14, 2015 2:15 PM