Q: Check for duplicate files
I've seen "duplicate" scenarios before, and even a bunch of Mac apps that help find and/or get rid of duplicate files on the hard disk to free up space. That's not my case this time, though, and by the way, this is NOT a weird case. I'm sure many of us find ourselves in a similar situation every once in a while.
So please, keep on reading.
I have (1) a small folder full of single files (hundreds of them) with no subfolders in it, and (2) a big folder full of files, folders, subfolders and more files (thousands of them), arranged in a complex directory scheme.
Most of the files in folder (1) have a copy in folder (2), and that's OK. In fact, what I need is to make sure that ALL files in folder (1) have an exact copy somewhere in folder (2), regardless of where that copy sits in the hierarchy.
I could go through the files one by one and, once I find a file's copy in folder (2), delete it from folder (1). That way I'd end up with a tiny folder (1) containing only the few files whose copies couldn't be found in folder (2).
Is there any way to automate this process? Using Automator, perhaps? Do you know of an app that can help me achieve that?
Thanks a lot!
Posted on Jun 21, 2016 11:53 AM
The following script (minimally tested!) should do what you want. Copy the script into a new Script Editor document and run it. (The delete command is a bit of a misnomer: it only moves the files to the Trash, so there's still a chance of recovery.)
set folder1 to (choose folder with prompt "Please select the folder to be cleaned")
set folder2 to (choose folder with prompt "Please select the folder to compare")
-- fast way to get a list of the names of all files in folder2 and its subfolders
set fileList to do shell script "/usr/bin/find " & quoted form of POSIX path of folder2 & " -type f -exec basename {} \\;"
set fileList to paragraphs of fileList
tell application "Finder"
    set folder1Files to every file of folder1
    repeat with eachFile in folder1Files
        set fName to name of eachFile
        if fName is in fileList then
            -- we have a filename match, so look for a file in folder2 with the same name and size (in bytes)
            set f1Size to size of eachFile as integer
            set matchingf2File to do shell script "/usr/bin/find " & quoted form of POSIX path of folder2 & " -type f -name " & quoted form of fName & " -size " & f1Size & "c"
            if matchingf2File is not "" then
                -- we have a duplicate, so move folder1's copy to the Trash:
                delete eachFile
            end if
        end if
    end repeat
end tell
It probably needs some explanation...
It starts off by prompting for two folders. The first should be the flat directory that you want to clean up (folder1); the second should be the hierarchical directory tree you want to search in (folder2).
set folder1 to (choose folder with prompt "Please select the folder to be cleaned")
set folder2 to (choose folder with prompt "Please select the folder to compare")
Then it uses the shell command find to get a listing of all the files in folder2. I do this because the Finder is notoriously slow at traversing large directory trees, so even though using the Finder would be simpler, it would be much slower.
set fileList to do shell script "/usr/bin/find " & quoted form of POSIX path of folder2 & " -type f -exec basename {} \\;"
set fileList to paragraphs of fileList
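For anyone who wants to try the listing step outside Script Editor, here is a minimal POSIX shell sketch of that first find command. The function name list_names is hypothetical, and piping through sed instead of using -exec basename (which starts one basename process per file) is my substitution; it assumes no file name contains a newline.

```shell
# Print the base name of every regular file under a directory tree,
# one per line (the equivalent of the script's first find command).
# Assumption: no file name contains a newline.
list_names() {
  # sed deletes everything up to the last "/", leaving just the file name,
  # so no separate basename process is started for each file.
  find "$1" -type f | sed 's|.*/||'
}
```

For example, `list_names ~/BigFolder | sort | uniq -d` (with a hypothetical ~/BigFolder) would show the names that occur more than once in the tree.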
Now I iterate through the files in folder1, checking the name of the file against the cached list of files in folder2.
set folder1Files to every file of folder1
repeat with eachFile in folder1Files
    set fName to name of eachFile
    if fName is in fileList then
If there's no match, I move on to the next file; otherwise we at least have a file somewhere in folder2 with the same name, so we need to check its size.
set f1Size to size of eachFile as integer
set matchingf2File to do shell script "/usr/bin/find " & quoted form of POSIX path of folder2 & " -type f -name " & quoted form of fName & " -size " & f1Size & "c"
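Here I use another shell trick. I first get the size of the current file, then run a second find that looks for a file with the same name and the same size (-size Nc matches files of exactly N bytes). If I get back an empty result, the sizes differ, so I leave the file alone; if a file with the same name and size exists, I treat it as a duplicate and delete (i.e. trash) the copy in folder1. Bear in mind that a matching name and size is a strong hint, not proof, that the contents are identical, because the bytes themselves are never compared.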
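If name and size aren't enough reassurance, the check can be tightened. The POSIX shell sketch below (has_exact_copy is a hypothetical name) keeps the same -size Nc filter, but compares names as plain strings, since find -name treats `*`, `?` and `[` as wildcards, and then confirms the match byte for byte with `cmp -s`. The cmp step is an addition of mine; the AppleScript above does not do it.

```shell
# Return 0 if some file under the tree has the same name and byte size as
# the given file AND identical contents; return 1 otherwise.
# The name/size test mirrors the script's second find; the cmp step is an
# extra, stricter check that the original script does not perform.
has_exact_copy() {
  file=$1 tree=$2
  name=${file##*/}
  size=$(wc -c < "$file" | tr -d ' ')   # byte count; macOS wc pads with spaces
  list=$(mktemp) || return 2
  # -size Nc matches files of exactly N bytes.
  find "$tree" -type f -size "${size}c" > "$list"
  found=1
  while IFS= read -r cand; do
    # Plain string comparison, so wildcard characters in names are harmless.
    [ "${cand##*/}" = "$name" ] || continue
    if cmp -s -- "$file" "$cand"; then   # identical bytes
      found=0
      break
    fi
  done < "$list"
  rm -f "$list"
  return "$found"
}
```

Because it only succeeds when an identical copy exists, it can gate whatever move or delete step you prefer.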
I know this may sound convoluted, but with a large directory tree and a large number of files in folder1, a full-depth traversal of folder2 for every file would be very slow, so I first cache the list of file names once and only do the secondary search for files whose names match.
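To see the caching strategy at work before anything is trashed, here is a dry-run sketch in POSIX shell (report_leftovers is a hypothetical name). It traverses folder2 once to cache the names, runs the size-filtered second find only for files whose names are in the cache, and prints the files that would remain in folder1 instead of deleting anything. Like the script, it treats same name plus same size as a duplicate. It assumes no file names contain newlines and looks only at the top level of folder1 (dot files are skipped by the glob).

```shell
# Dry run: print the files in the flat folder that would be LEFT behind,
# i.e. those with no same-name, same-size file anywhere under the tree.
# Nothing is moved or deleted.
report_leftovers() {
  flat=$1 tree=$2
  cache=$(mktemp) || return 1
  # One full traversal of the big tree, caching just the file names.
  find "$tree" -type f | sed 's|.*/||' > "$cache"
  for f in "$flat"/*; do
    [ -f "$f" ] || continue               # skip subfolders and an unmatched glob
    name=${f##*/}
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$name" "$cache"; then
      # Name matches somewhere: only now pay for a second, size-filtered find.
      size=$(wc -c < "$f" | tr -d ' ')
      if find "$tree" -type f -size "${size}c" | sed 's|.*/||' |
           grep -Fx -- "$name" > /dev/null; then
        continue                          # same name and size: the script would trash it
      fi
    fi
    printf '%s\n' "$name"
  done
  rm -f "$cache"
}
```

Note that a file whose name and size match but whose contents differ is still counted as a duplicate here, exactly as in the AppleScript.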
Posted on Jun 23, 2016 1:00 AM