How to search files on a Time Machine backup disk only once per file

Greetings,


Goal: I want to find all user files in a set of Time Machine backups residing on a single disk that contain the string "escalator". Now, there are multiple hard links to each file, so a simple search would search each file multiple times. That would take a very long time. Is there a way that I can search (grep) all text files and mail files only once and list those that contain the string? Note that I only want to search user directories; that is,


/Volumes/Time Machine Backups/Backups.backupdb/username’s iMac/Latest/Macintosh HD/Users


where "Latest" can also be any of the dates of all the backups.


Thanks!

Posted on Aug 17, 2014 11:08 AM

5 replies

Aug 17, 2014 12:32 PM in response to betaneptune

Well, to clarify:


I want to find all user text files and mail messages on a Time Machine backup disk that contain the word "escalator".


By user files I mean those in


/Volumes/Time Machine Backups/Backups.backupdb/<username>iMac/*/Macintosh HD/Users


and subdirectories thereof.


I was trying to use the find command, but it was taking a very long time, so I aborted it. I then realized that most of the files have many hard links, and therefore will be searched many times. That, of course, is a great waste.


Is there some way to search each file only once?


Thanks.

Jul 22, 2015 6:57 PM in response to brianbaughn

Sorry I haven't already posted it. But when I solved this problem it might have already been months later and I didn't think about my post. I don't really know. Anyway, I have since found the answer . . .


What you need to do is add -atime +1 to your find command. But first you must leave the disk completely alone (no reads, no writes) for at least 24 hours! My reasoning: on the first pass over a file, its atime is updated to the current time. Then on the second and subsequent passes (through the file's "extra" hard links), find sees that the file was last accessed less than a day ago and skips it. I'd think this is at least useful for avoiding repeated searches of large files. I don't know how much overhead a skipped file costs, so I can't tell you how much time it saves on the smaller files.


Note: If your search takes more than 24 hours you'll probably begin to search at least some of the files again, as they will have been read more than 24 hours prior.
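For anyone wondering why a read through one hard link affects all the others: the access time lives in the inode, and every hard link to a file points at the same inode. Here is a small sketch that shows the shared inode in a scratch directory. The file names are made up for the demo, and the trick as a whole assumes the backup volume actually records access times (that is, it isn't mounted with noatime).

```shell
#!/bin/sh
# Demo in a throwaway directory; nothing here touches a real backup disk.
tmp=$(mktemp -d)
printf 'take the escalator\n' > "$tmp/original.txt"
ln "$tmp/original.txt" "$tmp/second-link.txt"   # a hard link, not a copy

# ls -i prints the inode number in the first column.
inode_a=$(ls -i "$tmp/original.txt" | awk '{print $1}')
inode_b=$(ls -i "$tmp/second-link.txt" | awk '{print $1}')

# Both names should resolve to one inode, so they share one atime.
if [ "$inode_a" = "$inode_b" ]; then
    same_inode=yes
else
    same_inode=no
fi
echo "same inode: $same_inode"

rm -rf "$tmp"
```

The same ls -i check can be pointed at two dated copies of a file on the backup disk to see whether they are really hard links to one file.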


I can't guarantee the accuracy of this, but it seemed to work fine for me. It still takes a while!


Good luck on your search.

Jul 26, 2015 9:57 AM in response to brianbaughn

Here's an example of the command to use:


find /Volumes/My\ Book\ for\ Mac/ -name '*.txt' -atime +1 -ls -exec grep alato {} ';' > my_book_for_mac_escalator_search_take1.out


This command searches all *.txt files on the disk for alato. (I'm looking for all files that contain the word escalator, but am using "alato" to make the search go faster. The search string "alato" is just long enough to keep the number of false hits down to a reasonably small number. I'm not sure how much time you save with this trick, though. And it's too time-consuming to test it. [Note that you also have to keep in mind that recently read files are cached, which speeds up the next search considerably, thereby thwarting your test.])
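A possible alternative to shortening the search string, offered as a sketch and not something timed in this thread: search for the whole word as a fixed string and only list the matching file names. The helper name list_matches and its three arguments are hypothetical. -F tells grep to treat the word as a literal string, -l prints just the file name (grep implementations typically stop reading a file at its first match), and ending -exec with + hands many files to each grep process instead of starting one grep per file.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR whose names match GLOB
# and whose contents include the literal string WORD.
list_matches() {
    dir=$1
    glob=$2
    word=$3
    # -F  literal string, no regular expression
    # -l  print only the names of files that contain a match
    # {} + batches many files into each grep invocation
    find "$dir" -type f -name "$glob" -exec grep -F -l -- "$word" {} +
}
```

Usage would look like: list_matches "/Volumes/My Book for Mac" '*.txt' escalator > matches.out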


I tested this "-atime +1" trick and it works!


One of the files in the output has an inode number of 11570349. If I search for that in the output of the command that uses "-atime +1", I see it only once. If I search for it in the output of the same command without "-atime +1", I get 115 hits. Unfortunately, the run with "-atime +1" took 11h42m, while the one without took 13h01m, a saving of only about 10%. So the overhead per hard link is significant. OTOH, you get an output file without multiple hits in it.


With "-atime +1":


find /Volumes/My\ Book\ for\ Mac/ -name '*.txt' -atime +1 -ls -exec grep alato {} ';' > my_book_for_mac_escalator_search_take1.out


new-host:ESCALATOR SEARCH root# grep 11570349 my_book_for_mac_escalator_search_take1.out | wc -l

1


Without "-atime +1":


find /Volumes/My\ Book\ for\ Mac/ -name '*.txt' -ls -exec grep alato {} ';' > my_book_for_mac_escalator_search_noatime.out


new-host:ESCALATOR SEARCH root# grep 11570349 my_book_for_mac_escalator_search_noatime.out | wc -l

115


The file sizes:


-rw-r--r-- 1 root staff 144463769 Jul 26 01:45 my_book_for_mac_escalator_search_noatime.out

-rw-r--r-- 1 root staff 3455675 Jul 24 08:24 my_book_for_mac_escalator_search_take1.out


If there's a way to search by inode you'd be guaranteed to search each file only once. But then you have to find a hard link that goes with that inode. I don't know if that would help much.
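Here is a rough sketch of that inode idea, untested on a real Time Machine disk. The helper name search_once is hypothetical. It lists every matching file together with its inode number, keeps only the first path seen for each inode, and greps just those paths. It assumes everything searched lives on one filesystem (inode numbers are only unique within a filesystem) and that no file name contains a newline. Note that find still has to walk every hard link to build the list, so this only saves the repeated reads, not the directory traversal.

```shell
#!/bin/sh
# Hypothetical helper: grep each inode once, however many hard links it has.
# Usage: search_once DIR GLOB PATTERN
search_once() {
    dir=$1
    glob=$2
    pattern=$3
    # ls -i prints "<inode> <path>" for each file that find hands over.
    find "$dir" -type f -name "$glob" -exec ls -i {} + |
        # Keep the first path per inode, then strip the inode column.
        awk '!seen[$1]++ { sub(/^[ \t]*[0-9]+[ \t]+/, ""); print }' |
        # NUL-separate the paths so names with spaces survive xargs.
        tr '\n' '\0' |
        # -l lists only the names of files containing the pattern.
        xargs -0 grep -l -- "$pattern"
}
```

Usage would look like: search_once "/Volumes/My Book for Mac" '*.txt' escalator > matches.out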
