Rumboogy22

Q: Old problem with Spotlight and Word .docx files

There is an old problem with Spotlight not finding Word .docx files.  You can find old discussions about this from 2008.  But despite this problem being ancient it has never been solved.

 

The essence of the issue is that Spotlight relies on a variety of software modules to index files.  Spotlight chooses the appropriate module to index each type of file.  These indexing modules all have the file suffix .mdimporter.  The name of the module for indexing Microsoft Office files is "Microsoft Office.mdimporter" and it can be found with some other .mdimporter modules at /Library/Spotlight/.  The "Microsoft Office.mdimporter" module has copyright info from Microsoft inside its package so we know who wrote it.  But this module is included with all new Macs by Apple.  Perhaps this is why this problem has never been fixed: the software is written by Microsoft but delivered by Apple.

 

The symptoms of the problem are that Spotlight will not find content in some .docx files.  It is not clear why it fails on some .docx files but not on others.  This problem does not seem to affect other Microsoft formats such as .doc, .xls, .xlsx, etc.

 

Given that Apple wants to sell Macs to businesses, and that businesses typically make heavy use of MS Office, one would think that this would have been a priority to fix.  But given that this problem is 7 years old and still unsolved, I guess not.

 

Any suggestions on work-arounds would be welcome.

OS X Mountain Lion, latest release of OS X

Posted on Dec 7, 2015 11:09 AM


  • by Rumboogy22, Solved answer

    Rumboogy22 Rumboogy22 Dec 14, 2015 2:15 PM in response to Rumboogy22
    Level 1 (0 points)

    Upon further investigation I found out that the problem is not with "Microsoft Office.mdimporter" as I had thought.  If you run mdimport -d1 <Word file.docx> you can see which importer is being used.  It turns out that for .docx files the importer is "RichText.mdimporter", which has an Apple copyright.  And once you run mdimport on a file, Spotlight will find it.  So the issue is not the importer but rather whatever is supposed to trigger the importer when a .docx file is modified.  So much for blaming Microsoft for this one...

     

    As a work-around I force indexing of my .docx files with a Bash script that is launched by cron.  The script indexes all the .docx that were modified in the last 24 hours and is launched hourly by cron.  If anyone is interested in the details just post here and I will include the script in another  post.

  • by jpdemers,

    jpdemers jpdemers Dec 28, 2015 11:16 AM in response to Rumboogy22
    Level 1 (41 points)
    Mac OS X

    This has been a huge PITA for years - I'd love to have a solution!

     

    The problem, as I understand it, is that while you can force Spotlight to index a docx file, the mdimporter can't distinguish the actual content from all the XML gibberish that it's wrapped in.  If you've solved that, you're a hero.  (I think "some" files are found because bits of content have made it into the metadata - although I haven't tested this theory.)

  • by Rumboogy22,

    Rumboogy22 Rumboogy22 Dec 28, 2015 11:45 AM in response to jpdemers
    Level 1 (0 points)

    To force indexing of all your .docx files open the Terminal app and cut and paste the following line:

     

    find ~ -type f -name '*.docx' -print -exec mdimport -d1 {} \;

     

    What will happen is that it will search for all files ending in .docx within your home directory.  For each file it will run the mdimport command, which will index (or import, in Apple terminology) the file.  In the Terminal window you will see the result of indexing each file.  When this is done (when you get back to the command prompt) you can quit the Terminal app.  If you have a lot of .docx files this will take some time.

     

    This is intended to be a one-time operation because it takes so long.  In another post I will share some code to index just the recently changed .docx files.
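Since the one-time run can take a long time, it may help to see how many files it will touch before starting. The following is only a sketch: the function name count_docx is made up for illustration, and -iname is used (instead of the -name in the command above) on the assumption that some files may carry an upper-case .DOCX suffix.

```shell
#!/bin/bash
# Count the .docx files under a directory (default: the home directory).
# count_docx is a hypothetical helper name, not part of the original post.
count_docx() {
    # -iname matches the suffix case-insensitively; tr strips the padding
    # that some versions of wc put in front of the number.
    find "${1:-$HOME}" -type f -iname '*.docx' | wc -l | tr -d ' '
}

# Example use (uncomment to run against the home directory):
# echo "About to index $(count_docx) files"
```

A file name containing a newline would be counted more than once, which should not matter for a rough estimate.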

  • by Rumboogy22,

    Rumboogy22 Rumboogy22 Dec 28, 2015 12:04 PM in response to Rumboogy22
    Level 1 (0 points)

    Once all the .docx files have been indexed then you need a way to keep the recently changed ones indexed.  The following is a small script program which does this.

     

    Open the TextEdit app and cut and paste the following indented text into its window:

     

    #!/bin/bash
    # Forces Spotlight to re-index .docx files in a range of modification/creation times.
    # Does this by running the "mdimport" command on the appropriate .docx files.
    # THIS IS INTENDED TO BE RUN AT LEAST ONCE EVERY 24 HOURS.

    # Set up the time stamp file.
    TIME_STAMP=~/.time_stamp_file.txt
    # Set the time stamp file to the current date/time.
    touch "$TIME_STAMP"
    # Adjust the date back in time 24 hours (the -A value is read as [-]hhmmSS).
    touch -A -240000 "$TIME_STAMP"

    # Index .docx files that were modified since the time stamp file.
    find ~ -type f -name '*.docx' -newer "$TIME_STAMP" -print -exec mdimport -d1 {} \;

    exit

     

    Then save this file with the name Spotlight_Reindex_DOCX.command.  Make sure that the file is saved as plain text (in TextEdit, Format > Make Plain Text) and that TextEdit does not add a .txt suffix to the file.  Also make sure that the file is saved to your home directory.

     

    Open a Terminal app window and cut and paste the following command:

    chmod +x ~/Spotlight_Reindex_DOCX.command

     

    Now the program can be run.  Just double click the file to execute it.  It will index all the .docx files that were modified in the last 24 hours.  It will show you all the files that it has indexed.  Once the script is done running, as indicated by the line "[Process completed]", you can close the Terminal window that was opened by the script.

  • by jpdemers,

    jpdemers jpdemers Dec 28, 2015 12:47 PM in response to Rumboogy22
    Level 1 (41 points)
    Mac OS X

    Believe it or not ... just on a whim, I copied the 2009 version of Microsoft Office.mdimporter into ~/Library/Spotlight, replacing the 2011 version that's been there since, I assume, 2011.  (The 2009 version comes via the accumulated MS updates to Office 2004; the 2011 version via Office 2008.)

     

    Forced a re-index, and IT WORKS!

     

    (This is on an iMac running El Capitan, so ymmv.  I can email you the file if you want to give it a shot.)

  • by Rumboogy22,

    Rumboogy22 Rumboogy22 Dec 29, 2015 1:03 AM in response to jpdemers
    Level 1 (0 points)

    From what I can tell it is RichText.mdimporter that is used to index .docx files.  If you run the mdimport -d1 command on one of your .docx files, which importer does it say it is using?  Below is an example of what I get.  Notice that it says it is using RichText.mdimporter (at the end of the output line).

     

    Terminal command:

    mdimport -d1 /Users/AAW/Downloads/test.docx

    Output of command:

    2015-12-29 00:59:38.755 mdimport[9479:543385] Imported '/Users/AAW/Downloads/test.docx' of type 'org.openxmlformats.wordprocessingml.document' with plugIn /System/Library/Spotlight/RichText.mdimporter.
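To check many files without reading each line by eye, the importer name can be pulled out of a line like the one above. This is a minimal sketch: the function name importer_from_line is invented here, and it assumes the "with plugIn <path>." wording shown in the output, which may differ on other OS X versions.

```shell
#!/bin/bash
# Print the importer bundle name found in one line of "mdimport -d1" output.
# importer_from_line is a hypothetical helper; it relies on the line ending
# in "with plugIn <path to .mdimporter>." as in the sample output.
importer_from_line() {
    local line=$1 path
    path=${line##* with plugIn }          # keep the text after " with plugIn "
    [ "$path" != "$line" ] || return 1    # marker not present in this line
    path=${path%.}                        # drop the sentence-ending period
    basename "$path"                      # strip the directory part
}
```

Assuming the debug line is written to standard error, it could be fed with something like importer_from_line "$(mdimport -d1 test.docx 2>&1)".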

  • by jpdemers,

    jpdemers jpdemers Dec 31, 2015 9:28 AM in response to Rumboogy22
    Level 1 (41 points)
    Mac OS X

    Same as you get:  RichText.mdimporter

     

    Not at all clear why swapping in the older Microsoft Office.mdimporter had an effect.

    Also odd is that Spotlight finds versions 1-5 of a docx document, spanning a period of about two years ... yet for version 6 (a day old), only the "AutoRecovery save" file is found. (This is typical - the AutoRecovery files have always been indexed.)

    Running sudo mdutil -E / to force another re-indexing doesn't alter anything; v.6 is still missing from the index.

     

    As I recall, when the problem first cropped up, it was only newer documents that weren't being indexed. This is pretty bizarre.

    And it's exactly the situation you create with your Terminal command.

     

    To automate the running of your script, you might have a look at http://superuser.com/questions/126907/how-can-i-get-a-script-to-run-every-day-on-mac-os-x

    I imagine some files might get missed if the Mac isn't running at the appointed hour, but you could have it index all files that are, say, 3 days (or a week) old.
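The longer look-back window suggested here could be sketched as follows. This is an assumption-laden variant, not the posted script: it swaps the time stamp file for find's -mtime test, and the function name recent_docx and its default of 3 days are made up for illustration. The exact rounding of -mtime differs a little between find implementations, so the window should be treated as approximate.

```shell
#!/bin/bash
# List .docx files under a directory that were modified in roughly the
# last N days. recent_docx and the default of 3 days are illustrative.
recent_docx() {
    local dir=$1 days=${2:-3}
    find "$dir" -type f -name '*.docx' -mtime "-$days" -print
}

# Example use (uncomment to re-index a week's worth of files):
# recent_docx "$HOME" 7 | while IFS= read -r f; do mdimport -d1 "$f"; done
```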

  • by Rumboogy22,

    Rumboogy22 Rumboogy22 Dec 31, 2015 10:53 AM in response to jpdemers
    Level 1 (0 points)

    I was not aware of the different versions of .docx.  So you are saying that version 6 never gets indexed?  I am using Word 2011 with all the updates.  Do you know what version of .docx that makes?

     

    I have already automated my script using crontab.  I did not want to post that since I thought it might be too involved for this forum.  Note that the script I posted searches for all .docx files modified in the last 24 hours.  I run that script every hour, so it reaches back overnight when I start up the computer in the morning.  Once a month I index all the .docx files just to make sure nothing has slipped through the cracks.
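The actual crontab was not posted, so the following is only a guess at what an hourly plus monthly schedule might look like. It assumes the script sits at ~/Spotlight_Reindex_DOCX.command as described earlier; the function name print_cron_lines and the 03:30 monthly time are arbitrary choices. The sketch only prints the two lines and installs nothing.

```shell
#!/bin/bash
# Assumed location of the script from the earlier post.
SCRIPT="$HOME/Spotlight_Reindex_DOCX.command"

# Print two candidate crontab lines (hypothetical schedule, nothing installed).
print_cron_lines() {
    # Minute 0 of every hour: re-index recently modified .docx files.
    printf '0 * * * * "%s" >/dev/null 2>&1\n' "$SCRIPT"
    # 03:30 on the 1st of each month: re-index every .docx under the home directory.
    printf '30 3 1 * * find "%s" -type f -name "*.docx" -exec mdimport {} \\; >/dev/null 2>&1\n' "$HOME"
}

print_cron_lines
```

The printed lines could then be pasted into the editor opened by crontab -e. Cron does not catch up on runs missed while the Mac is off or asleep, which is why the hourly job looks back a full 24 hours.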

  • by Jonathan Brown,

    Jonathan Brown Jonathan Brown Jan 10, 2016 10:32 AM in response to Rumboogy22
    Level 1 (25 points)

    The fix that worked for me: remove Microsoft Office.mdimporter from /Library/Spotlight. Voilà: no more indexing loop. I wouldn't have hit upon the solution without reading this thread. The vital link: the redundancy of Microsoft Office.mdimporter and RichText.mdimporter. Since they both index Word files, and since Console and opensnoop in Terminal both showed errors from Microsoft Office.mdimporter, I pulled the latter from the plug-ins folder and restarted. Spotlight finds Word docs (whether it will always find everything is another question I can't answer), but the pulsing dot in the Spotlight menu bar icon is no more.

     

    I should add that I'm working with an early 2008 Macbook with 10.7.5, the last OS its hardware can run. It's a laptop kept offline at an office mainly for writing. I was asked to look at it because of nagging slowness, possibly after an update to 10.7.5. I did a clean install after I was unable to quell the constant indexing by other means. Spotlight finished its initial work without a hitch, but the looping came back after I added two large folders containing, among other things, numerous Word files, and I was able to stop the indexing loop by putting the two folders into the Spotlight privacy list, a less than ideal solution. None of the usual Spotlight interventions involving Terminal commands stopped the looping. Only yanking Microsoft Office.mdimporter did. Because consequently Spotlight may be … spotty, as a supplement I downloaded the free EasyFind. But as far as I can tell with minimal testing, Word documents appear, and the continual indexing that slows the machine is no longer going on.

     

    So, while I'm not using your script as a solution, your research made it possible to solve this nettlesome problem. Thank you!

  • by Jonathan Brown,

    Jonathan Brown Jonathan Brown Jan 10, 2016 11:47 AM in response to jpdemers
    Level 1 (25 points)

    Following your lead, I compared the Microsoft Office.mdimporter from an Office 2011 installation on the Macbook with another from a Macbook Air. Both plug-ins (annoyingly) show version 12.3.0, but the one on the 2008 Macbook running 10.7.5 has a creation date from 2009, whereas the Air's, which is running 10.11.2, has a creation date from 2011. I copied the later plug-in onto the old Macbook and ran Disk Utility to fix permissions. I got an error 5 for the plug-in, but Disk Utility fixed its permissions, so I restarted and forced a Spotlight reindexing with sudo mdutil -E / . Console shows errors loading the later mdimporter plug-in. It seems that on the older hardware and OS, replacing the 2009 plug-in with the one from 2011 only causes the reindexing loop to come back.

     

    Like you and Rumboogy22, I find it puzzling that neither Microsoft nor Apple has dealt with this issue, which a quick web search shows has been vexing users for many a year. I wish I could give you a "This helped me" point for your observations about Spotlight quirks with different Word document versions, but I discovered too late that I can give credit only twice on a thread. However your post was likewise very helpful. Thank you.

  • by fiberhome,

    fiberhome fiberhome Feb 9, 2016 8:50 AM in response to Jonathan Brown
    Level 1 (4 points)
    Mac OS X

    I think the reason neither Apple nor Microsoft has done anything about the failure of Spotlight to index docx documents is that it's not an obvious problem for many people who continue using doc files rather than docx, and it's not obvious even if you do use docx files unless you happen to be looking for something that you know is only in a docx file.

     

    In any case, I've run into the problem using Mavericks, and I'm curious about fixes that don't require setting up special scripts.

  • by bjfromvelp,

    bjfromvelp bjfromvelp Feb 9, 2016 1:12 PM in response to Jonathan Brown
    Level 1 (0 points)

    Problem solved here (El Capitan and Office 2016 on a 2012 MacBook Pro):

    Removed Microsoft Office.mdimporter from /Library/Spotlight, added Macintosh HD to privacy tab of spotlight preferences, and reversed this last action.

    Now all docx files are found by spotlight, old, edited and new.

  • by Rumboogy22,

    Rumboogy22 Rumboogy22 Feb 9, 2016 1:44 PM in response to Jonathan Brown
    Level 1 (0 points)

    Jonathan,

     

    Thanks for your responses.  You have gone much deeper with this than I.  I have had success using the script that I posted on Dec 28, 2015.  This is automated in crontab as I mentioned so I don't think about it anymore.  And the problem has been gone for months now so I guess this is a solution that is proven to work (on 10.11 at least).

     

    I don't intend to change versions of Office until I am absolutely forced to when support stops on Office 2011.  So I guess I will just stick with what I have until that time then revisit this problem.

  • by fiberhome,

    fiberhome fiberhome Feb 10, 2016 12:42 PM in response to bjfromvelp
    Level 1 (4 points)
    Mac OS X

    bjfromvelp's process worked for me, and I'm on Mavericks and office 2011 on a 2010 MacMini.

    I removed Microsoft Office.mdimporter from /Library/Spotlight (keeping a copy on the desktop just in case)

    added my Documents folder to the privacy tab of spotlight (to avoid searching a large file of PDFs in other files)

    Then removed my Documents folder from the privacy tab.

    Spotlight pondered a bit, and now seems to index both .doc and .docx documents, as well as RTFs and texts.

    So far, so good. Maybe something will disappear, but that seems to solve it. Thanks.
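For anyone who prefers Terminal to dragging the plug-in in Finder, the "remove but keep a copy" step might be sketched like this. The function name set_aside_importer is invented for illustration, and both paths are passed in as arguments so nothing is hard-coded or deleted.

```shell
#!/bin/bash
# Move a plug-in bundle out of its folder into a backup folder.
# set_aside_importer is a hypothetical helper; nothing is deleted.
set_aside_importer() {
    local bundle=$1 backup_dir=$2
    # Refuse to continue if the bundle is not where we expect it.
    [ -e "$bundle" ] || { echo "not found: $bundle" >&2; return 1; }
    # Create the backup folder if needed, then move the bundle into it.
    mkdir -p "$backup_dir" && mv "$bundle" "$backup_dir"/
}

# Example use (uncomment; moving out of /Library/Spotlight will probably
# need administrator rights, in which case run the mv itself with sudo):
# set_aside_importer "/Library/Spotlight/Microsoft Office.mdimporter" "$HOME/Desktop"
```

After the move, Spotlight still has to re-index, either with the privacy-tab toggle described above or with the sudo mdutil -E / command mentioned earlier in the thread.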
