
Can I copy files – but with certain restrictions?

I have just finished a large project, the archives of which involve about 5000 "base" files stored on about 80 CDs and 50 DVDs – about 50,000 files in total. Each of the "base" files may have had up to 30 incremental versions; i.e. a certain text file may have been revised 23 times, and each revision was saved and archived to (probably) a different disk with a different suffix – a, b, c and so on. But sometimes the suffix didn't change even though the file was edited. I might have done a bit more dust removal on an image and just overwritten the old file (already archived), so the new one was archived on a different disk.

I now have 130 disks from which I would like to extract all the files and collapse them into one large archive – probably spanning about 20 disks by the time I delete the files I don't need. That way I can easily search for all versions of, say, GB097, by going to the particular DVD that has the "G" files on it. Up would come:

GB097
GB097a
GB097b
GB097b-1
GB097b-2
GB097c
... and so on.

This is what I would like to do:

1. Grab the first archive disk, open every folder, and copy all the files to the one folder on a hard drive.

2. Open the second disk and repeat step (1), but with these two provisos:

(a) If a file is identical to a previously copied file (maybe I archived it twice), the file isn't copied. However...

(b) If a file has the same name as a previously copied file, but the data within that file is different (i.e. I removed some dust from an image file, but left the name unchanged), I'd like that file to be copied with a numbered suffix, the same way that Trash treats identically named files.

Any suggestions how I could do this?
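To pin down the behaviour, here's a rough sketch in shell terms (illustrative paths only – not tested, and not necessarily how it should actually be done):

#!/bin/bash
# Sketch of the desired behaviour -- illustrative paths, untested.
# Copies every file from one disc into a single folder, skipping exact
# duplicates and numbering same-name-but-different-content files.
SRC="/Volumes/Archive Disc 1"       # example source disc
DEST="$HOME/Consolidated"           # example destination folder

find "$SRC" -type f | while IFS= read -r f; do
  name=$(basename "$f")
  target="$DEST/$name"
  if [ ! -e "$target" ]; then
    cp -p "$f" "$target"                          # new name: just copy
  elif [ "$(md5 -q "$f")" = "$(md5 -q "$target")" ]; then
    :                                             # identical content: skip
  else
    base="${name%.*}"; ext="${name##*.}"          # assumes the file has an extension
    n=2
    while [ -e "$DEST/$base #$n.$ext" ]; do n=$((n+1)); done
    cp -p "$f" "$DEST/$base #$n.$ext"             # same name, new content: number it
  fi
done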

G5 iSight, Mac OS X (10.4.11)

Posted on May 18, 2010 3:27 AM


May 23, 2010 6:47 PM in response to Guy Burns

I couldn't get consolidate8 to do anything at all. It is within a folder on the desktop along with my test files.

1. I double-clicked consolidate8.command as downloaded. Terminal came back with a message: "The .command file ... could not be opened. Most likely it is not executable."

2. I opened consolidate8.command in iText Express and saved it as plain text. It became consolidate8.txt.

3. I dragged that file to Terminal, added a space, dragged in a test file, and Terminal responded with "Permission denied".

4. As a check that something hadn't gone haywire elsewhere, I tried using consolidate6 as per step 3, and it worked okay.

5. I know next to nothing about Terminal, but I thought I'd look at consolidate6 and consolidate8 side by side to see if there was anything really obviously wrong with the latter. Nothing stood out, but I did see they were different near the start. Consolidate8 has:

echo
echo "Consolidate script"
echo

whereas 6 didn't. So I copied those three lines to 6, saved it, tried to run it and it came back "permission denied".

I don't know why those echoes cause a problem, but they seem to. When I removed them and tried to run consolidate8, permission was still denied.

This is Terminal's output:

Last login: Mon May 24 11:13:07 on ttyp1
Welcome to Darwin!
jenny-pearces-imac-g5:~ Jenny$ /Users/Jenny/Desktop/Consolidate\ Test\ Files/consolidate8.txt /Users/Jenny/Desktop/Consolidate\ Test\ Files/Test\ Folder/
-bash: /Users/Jenny/Desktop/Consolidate Test Files/consolidate8.txt: Permission denied
jenny-pearces-imac-g5:~ Jenny$

May 23, 2010 6:53 PM in response to Guy Burns

Sorry, I forgot to zip it up! The execute bit wasn't preserved because the file wasn't zipped. To fix it, run Terminal and type

chmod +x

followed by a space, then drag the consolidate8.command file to the Terminal window, click back in the window and press Return. That sets the execute bit for the file, which allows the system to execute (run) it.
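For example, if the file is in the test folder on the Desktop, the finished line would look something like this (your path may differ):

chmod +x /Users/Jenny/Desktop/Consolidate\ Test\ Files/consolidate8.command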

Here's 9, zipped. I made a very minor change between 8 and 9 so they are almost identical.

http://www.mediafire.com/?0n2qztmwzzw

May 23, 2010 10:17 PM in response to JulieJulieJulie

I am still testing consolidate9. In the meantime, three more questions.

Q1: To save me watching the progress of my tests for timing purposes, I have started to put a "date" at the start and end of the script, then I can work out the run time. Is there a simple statement like "Start Timer" and "Stop Timer" that will output the time the script ran?

Q2: When I run the script a second time on a folder, it is much faster. Is this because whatever checking was done the first time is not duplicated the second time (because certain files have already been deleted, for instance), or because the script looks up the index files that are added to the folder and uses the information in there?

Q3: Running the script once on all my archive DVDs may take 5-6 hours. I will probably copy 5-10 disks a day in the background, and I'm thinking it may be best to run the script each day. Would running the script each time a DVD is added result in more, but shorter script-runs?

May 24, 2010 10:00 AM in response to Guy Burns

If you run the script from the command line (instead of double-clicking), you can preface the line with 'time ' to have the system time it.

As in:
time <drag consolidate9.command into the window> <drag your folder to be scanned into the window>

I'll add time output to 10. 🙂
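For the curious, one simple way to do in-script timing (a sketch of the approach, not necessarily the exact code 10 will use) is bash's built-in SECONDS variable, which counts the seconds since it was last assigned:

SECONDS=0        # reset the counter at the start
# ... do the work here ...
echo "Elapsed time in seconds since start of script: $SECONDS"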


The increased speed on the second run is probably because all duplicates have already been removed, so the checksum comparison is very fast (since no two files still have the same checksum, there isn't much to compare). The index file is not re-used because it references all the files, including those that were removed. It's there for you in case you discover a file is missing because it was removed as a duplicate: you can open Index_Archive.txt, command-f to find the file by name, copy its checksum, and command-f command-v to find which other file had the same content.

Running the script each time a disc is added should result in more total time than a single run. It has the md5 utility produce a checksum for every file or bundle in the folder, every time it runs – so running it after each of 10 discs, the non-duplicate files from the first disc would have their md5 sums calculated 10 times, the files from the second disc 9 times, and so on, as opposed to every file just once.
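For illustration, the per-file checksum pass amounts to something like this (a guess at its shape, not the actual script code):

find "${MainFolder}" -type f -exec md5 {} \;

Every file is read in full on every run, which is where the time goes.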

Message was edited by: JulieJulieJulie

May 24, 2010 11:01 AM in response to JulieJulieJulie

Consolidate10

Shows elapsed time in seconds since start of script after each group of operations. Sorts and merges Index.txt and Index_Archive.txt, leaving only Index_Archive.txt.

http://www.mediafire.com/?o2konivyhnt

Also, regarding running the script after each disc: it wouldn't hurt anything aside from total time, although it would affect the renaming of files. After disc 1, a file with a duplicate name such as "name.txt" would become "name #2.txt". If the same file name exists on disc 3, then that file would become "name #3.txt".

Whereas if the script is run only once after all discs are copied, it is possible that the file from disc 3 might become "name #2.txt" instead of "name #3.txt". (It depends entirely upon the alphabetical order of the folders enclosing the files. If each disc were copied into a folder named by disc number, it should produce the same outcome as running the script after each disc is copied.)

Actually now that I think about it, the list is sorted by checksum so no, alphabetical order isn't the primary key. Hmm.

Message was edited by: JulieJulieJulie

May 24, 2010 11:31 AM in response to JulieJulieJulie

Consolidate11

http://www.mediafire.com/?mnzidgzw2wi

In looking at the renaming routine I discovered an error and fixed it. Essentially, the script was not updating the changed name in the list, so a file could be renamed to "file #3.txt" and then renamed again to "file #2.txt": the script was still using the original name (file.txt) for comparison, but the correct filepath "/some/where/file #3.txt".

Also, the list of names IS sorted alphabetically (the list is sorted for checksum comparison first, then re-sorted by name.)
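For anyone following along, the renaming pass now works roughly like this – a simplified sketch with made-up variable names and file list, not the actual consolidate11 code:

prev_name=""                      # name currently being compared against
count=1
# names.txt (hypothetical) holds one filepath per line, sorted by filename
while IFS= read -r filepath; do
  name=$(basename "$filepath")
  if [ "$name" = "$prev_name" ]; then
    count=$((count+1))
    base="${name%.*}"; ext="${name##*.}"
    mv "$filepath" "$(dirname "$filepath")/$base #$count.$ext"
    # keep comparing against the ORIGINAL name, but any later operation
    # must use the new filepath -- this is what the fix corrected
  else
    prev_name="$name"             # new name group: reset the counter
    count=1
  fi
done < names.txt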

May 24, 2010 8:25 PM in response to JulieJulieJulie

Comments on Consolidate11.

1. The time function might be better started from when the processing of the files begins, not from when the script starts.

2. I have uploaded a folder that is presenting problems for the script:

http://www.mediafire.com/?2znljmrztll

This is a subset of a larger folder (about 200 MB) that took ~10 minutes to process in one of the earlier versions of consolidate. The folder contains GIS (Geographic Information System) data for generating maps with programs like QGIS ( http://www.qgis.org/ ). I used such data for generating maps in my books. The problem is: the whole folder appears to be ignored. I have placed a duplicate PDF file within a subfolder, and both copies are still present at the end of the run.

After looking at the script I now know why the folder was ignored -- it has a dot in the name. However, the folder is still useful for testing, because it can be used to illustrate another problem (see point 3).

If you want a simple introduction to QGIS, download this file: http://www.mediafire.com/?4me1ttbvgb2

3. Further on mapping data: I ran the script over my entire folder called "Mapping" and at the end of the run a few thousand of the "shapefiles" had been lifted out of their folders to the top folder. Not all of them, just some. This means that those uplifted shapefiles can no longer be used to draw maps because the data has to remain inside defined folders.

You can see the effect by running the script over the folder in point 2.

I don't think there would be any easy way of overcoming this type of problem, except by leaving the files in their original folders or by asking the user right at the end of the process: "Do you want to move files to the top folder?" -- with a stern warning, repeated twice, that moving files may cause some files to become non-functioning.

Overall, I think moving files to the top folder automatically may be undesirable, but it would be good to have the option there under control of the user.

To stop the script uplifting files to the top folder, I assume I delete this part of the script:

####### Moving items to top-level
...
find -d "${MainFolder}" -type d -empty -delete 2>/dev/null

May 24, 2010 9:46 PM in response to Guy Burns

- The script will treat any folder with a period in its name as a bundle.
- It will not individually checksum, compare, rename, or move any file within a bundle.
- It will checksum, compare, rename and move the whole bundle as one item.

So if you run the script on a single folder named "Australia 2.5M Test", not much happens because the entire source folder is treated as a bundle (and thus a duplicate PDF within that folder is never compared to a single-file PDF outside that folder.) Since there is no other item to compare with the one source folder bundle, nothing is moved, renamed etc. Basically nothing happens.

I created a test folder named "JJJExampleFolder" with two subfolders into each of which I copied two of your mapping folders as examples.
-->"The files in these folders will be moved to the top level"
(contains "Alick Creek" and "Beames Brook" folders)
and
-->"These folders will be moved to the top level intact"
(contains "Albert River 2.5M" and "Alice River 2.5M" folders.)

A 'before' picture of the folder may be viewed at the following URL:
http://img571.imageshack.us/img571/6950/beforeg.png

An 'after' picture of the folder may be viewed at the following URL:
http://img532.imageshack.us/img532/3712/afterp.png

What happened in between: when run, the script first looked for individual files while ignoring any folder with a period in its name... so the "Albert River 2.5M" and "Alice River 2.5M" folders were skipped during the finding of individual files. The files within the "Alick Creek" and "Beames Brook" folders were found and individually checksummed, then added to the list of items to process as individual files.

Next, the script looked for bundles/packages and found the "Albert River 2.5M" and "Alice River 2.5M" folders, because they are folders with a period in their names. The script then generated a checksum for the entire content of each folder (instead of for its individual files) and added each folder and its total checksum to the list of items to process.
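For reference, one way to produce a single checksum for a whole folder – a sketch of the idea, not necessarily the script's exact method – is to checksum every file inside, sort the hashes for a stable order, then checksum that list:

bundle_md5() {
  # one MD5 for a folder's entire file content ($1 is the folder path);
  # note this hashes contents only, not file names or structure
  find "$1" -type f -exec md5 -q {} \; | sort | md5 -q
}
bundle_md5 "Albert River 2.5M"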

Here is the output from the 'consolidate12' run for this example folder:

./consolidate11.command /testpix/JJJExampleFolder/

Consolidate12 script revised 2010-05-24 8:00 P.M.

--> Finding files and ignoring any folder whose name contains a period and it's content (potential packages).
Elapsed time in seconds since start of script: 0

--> Generating list of MD5 checksums for file content.
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.dbf e7eac6212bbddbb6d529362f89cce800
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.prj e729936bf5360b37a15365fc295a1901
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shp acfee71e413e11896505f23ebc069947
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shx 4742fa4533022e192c40ab85e26ba8f4
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.dbf ab24e4770eaebf7fbc66f7066726ee17
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.prj e729936bf5360b37a15365fc295a1901
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shp e8a455ba8d8b5241f21b4d4ab98e2048
/testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shx 477dfdd0e54ec53a4f8ba5c24152688e
Elapsed time in seconds since start of script: 1

--> Finding potential packages/bundles.
Elapsed time in seconds since start of script: 1

--> Generating list of MD5 checksums for potential packages/bundles.
/testpix/JJJExampleFolder//These folders will be moved to the top level intact/Albert River 2.5M 7237ff1ed62224b18d5392356dfa9b05
/testpix/JJJExampleFolder//These folders will be moved to the top level intact/Alice River 2.5M 23f78fe58cc832dababaa6b894287e71
Elapsed time in seconds since start of script: 2

--> Sorting list of files and potential bundles by checksum.
Elapsed time in seconds since start of script: 2

Checking checksum for: /testpix/JJJExampleFolder//These folders will be moved to the top level intact/Alice River 2.5M
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shx
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shx
Checking checksum for: /testpix/JJJExampleFolder//These folders will be moved to the top level intact/Albert River 2.5M
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.dbf
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shp
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.prj
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.dbf
Checking checksum for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shp
Elapsed time in seconds since start of script: 2

--> Sorting list of names.
Checking name for: /testpix/JJJExampleFolder//These folders will be moved to the top level intact/Albert River 2.5M
Checking name for: /testpix/JJJExampleFolder//These folders will be moved to the top level intact/Alice River 2.5M
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.dbf
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.prj
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shp
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Alick Creek/Alick Creek.shx
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.dbf
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shp
Checking name for: /testpix/JJJExampleFolder//The files in these folders will be moved to the top level/Beames Brook/Beames Brook 2.5M.shx
Elapsed time in seconds since start of script: 3

--> Finding possible packages/bundles and moving them to the top-level.
--> Finding files and moving them to the top-level.
--> Deleting .DS_Store files from subfolders.
--> Deleting empty subfolders.
--> Sorting index file.

DONE.
NOTE: Unknown packages, invisible files, and files with unresolved name conflicts may be left within any remaining subfolders!
Elapsed time in seconds since start of script: 3



So, the mapping project folders which had "2.5M" in their names were treated as bundles. Those without a period in the folder name were dealt with as subfolders of individual files.

As you can see from the 'after' picture, the files that were dealt with individually were moved up to the top level, but they still sort together by name, with all the parts in the same folder, and should still function with the mapping software. They just look less organized.

The two "bundles" were also moved up to the top level as bundles, which is probably what you would want to have happen?

We have several choices:
1] Rely on the user to ensure that all mapping project folder names contain a period.

2] Before other operations, the script could first look for .shp files within folders which do NOT have 2.5M in their names. When one is found, it would determine whether all files in that folder share the same root name (as in Alick Creek.dbf, Alick Creek.prj, Alick Creek.shp and Alick Creek.shx). If so, the script could rename the folder by appending "2.5M"; the rest of the script would then handle all the mapping project folders consistently as bundles.

3] At completion, the script could offer to find *.shp files in the top-level folder only (not in subfolders), create a new folder using the shp file name without the extension and with 2.5M appended, and then move all files with the same root name and any extension into that folder. For instance, on finding "Alick Creek.shp" in the top-level folder, it could make a new folder "Alick Creek 2.5M" and move the files "Alick Creek.*" (the asterisk is a wild-card matching anything) into it. This would re-group the mapping project files into their own subfolders (a sketch of this option appears after the list).

4] We could leave it as is since the mapping data should still be functional (all required parts are in the same folder - they just have a lot of company.)
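To make option 3] concrete, here is a minimal sketch (illustrative only – the "2.5M" naming follows the example above, and this hasn't been tested against your data):

TOP="$1"                               # the top-level folder
for shp in "$TOP"/*.shp; do
  [ -e "$shp" ] || continue            # no .shp files found: do nothing
  root=$(basename "$shp" .shp)         # e.g. "Alick Creek"
  mkdir -p "$TOP/$root 2.5M"           # the period makes it a bundle next time
  mv "$TOP/$root".* "$TOP/$root 2.5M"/ # Alick Creek.dbf, .prj, .shp, .shx ...
done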

What would be best for your situation?

May 24, 2010 10:00 PM in response to JulieJulieJulie

Option #2 actually won't work with some of your other folders, such as "contours", because not all of the files have identical names prior to the extension. We could generalize "aus25cgd" with a wild-card after it to include "aus25cgd_l", "aus25cgd_p" and "aus25cgd_r" as well, which would work for that folder...

But what if you have a folder containing both "Rivers inAustralia.shp" and "Rivers inNewZealand.shp"? They might end up being treated as one project with the name "Rivers_in 2.5M" even though you may regard them as two projects? Or is their presence in the same folder enough to consider them a single project together?

May 24, 2010 11:30 PM in response to JulieJulieJulie

Consolidate 12

http://www.mediafire.com/?k5dtdkmmk3m

Changes:
--Allows a -n switch/option when invoked from the command line, to disable the moving of files (a sketch of the logic appears after this list).

--Prompts user to allow or disable the moving of files if the option was not specified on the command line.

--If only one bundle (folder with a period in its name) is found, no checksum will be generated (there can't possibly be a matching bundle when there is only one.)

--The "Elapsed time in seconds" is now "since start of processing" (After human interaction is complete thus it isn't timing the delay for the user to respond to questions.)

--Presence of index and temporary files is ensured prior to accessing them, by 'touch'ing the files first.
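The -n handling amounts to something like this (an illustrative sketch, not the verbatim consolidate12 code):

MoveFiles="yes"
if [ "$1" = "-n" ]; then
  MoveFiles="no"                 # -n on the command line: never move files
  shift                          # the remaining argument is the folder to scan
else
  echo "Would you like to disable the moving of files and bundles to the top level?"
  read -p "Type 'n' to disable this option, otherwise press any key: " answer
  [ "$answer" = "n" ] && MoveFiles="no"
fi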


Still to resolve: What to do about mapping project folders?

May 25, 2010 12:43 AM in response to Guy Burns

I've just finished a short test of consolidate12 with all four combinations of "dot" or "no-dot" in the folder name, and "yes" or "no" to the uplifting of files. All worked as expected.

This little script (correction – "big" script now) shouldn't be made too specific. It has become a script that could be used generally, and it shouldn't be limited by my mapping folder data – or any other data structure. I suggest it be left as is; if users want to protect a certain folder structure, they can do so by putting a dot in the name, or by choosing not to move any files at all. Seems to me that the script now has a good amount of flexibility.

However, I think the "turning off" option is asked back to front -- too many negatives:

Would you like to disable the moving of files and bundles to the top level?
Type 'n' to disable this option, otherwise press any key.

To disable the option, the answer should, strictly speaking, be "y", not "n". The use of negatives can lead to confusion. What about:

Do you want to move all files and bundles to the top level at the end of the run?
Press "n" to NOT move files and keep the folder structure intact; otherwise press any key.

An accidental wrong keystroke here could mean undesirable folder changes. If "move all files" is selected, it may be a good idea to double-check:

Moving files out of folders can, in some cases, cause problems. Go ahead anyway?
Press "n" to NOT move files and keep the folder structure intact; otherwise press any key.

Message was edited by: Guy Burns

May 25, 2010 1:56 AM in response to Guy Burns

One of the benefits of sharing the source code of a program is that it's easy for others to modify (as opposed to a compiled program with no source code, where it can be difficult merely to follow the code, let alone change it). With comments in the script detailing what each segment does, it's even easier... so I have no problem with adding specific routines for any given task (which can be enabled/disabled easily enough within the script), and if nothing else, it's a good example for anyone wishing to do similar tasks. I can think of dozens of times when I've been stuck on a programming issue and found the inspiration for a solution in code posted by someone else for a slightly different purpose. No public effort at resolving a problem, no matter how specific, is wasted. Neither is the experience of figuring out a solution! 🙂

So, if there are situations specific to your work, we could add the routines and have them activated or deactivated with a single question: "Run this script in Guy Burns mode? (y/N): "

I agree regarding the choice of -n and the wording for moving files; I was thinking more about the coding than the usage. 🙂 I changed the wording to your suggestion, along with a WARNING preceding the question.

Control-c is the break key for command-line utilities. (The more traditionally Mac "command-period" also works most of the time.) So even if a user were to answer incorrectly, they need only press control-c to stop the script before it starts moving files. Or just close the Terminal window to utterly terminate the script.

I intend to rewrite the main 'find' and checksum routines so that they are only one loop instead of one for individual files and one for bundles.
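The idea being that a single find can return both kinds of item in one pass – roughly like this (a sketch, assuming BSD find as shipped with 10.4):

# prints bundle folders (period in the name, not starting with one)
# without descending into them, plus ordinary files outside bundles
find "${MainFolder}" \
  \( -type d -name "*.*" ! -name ".*" -print -prune \) \
  -o -type f ! -name ".*" -print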

I hope that it saves you more time on the real run than it has taken for the testing! 🙂

Consolidate13
http://www.mediafire.com/?jz2czmygej2

May 25, 2010 7:18 PM in response to JulieJulieJulie

I have tested consolidate13.

Times to completion
1 x 18 files ... 2 seconds (6 removed)
8 x 18 files ... 16 seconds (132 removed)
32 x 18 files ... 65 seconds (564 removed)

One hitch: I did another test on a different folder, which has a 16 MB subfolder called Recordings, within which are more subfolders containing only PDFs and TextEdit files. There were also 4 invisible items inside Recordings:

.DS_Store (28 KB)
.FBCIndex (1.6 MB)
.FBCIndexcopy (1.6 MB)
.FBCLockFolder

The script moved all my files to the top, but at the end of the run Recordings (now empty of my files) still existed as a 3.2 MB folder containing the two FBCIndex files, but the other two invisible files were at the top level.

I have uploaded the 3.2 MB folder Recordings in case you want to see why this folder wasn't emptied of files: http://www.mediafire.com/?vyywlmm4ii2

May 25, 2010 10:26 PM in response to Guy Burns

The .DS_Store file is intentionally deleted. The system recreates a .DS_Store file when the window is closed for any folder whose window settings were changed in the GUI. In other words, all .DS_Store files should have been deleted at the end of the script, but by opening and closing the test folder to look inside it, you may have caused OS X to save a new .DS_Store file in that folder. (Check the time on the .DS_Store file and see if it was created AFTER the script ran.)

The .FBCLockFolder was moved because, in the command that finds and moves items to the top level, I neglected to specifically exclude folders whose names begin with a period (as opposed to having a period somewhere in the middle of the name). Fixed in 14.

I did not forget to specifically exclude the moving of files whose names begin with a period, though, and that is why the .FBCIndex files remained (as they should have.)

Those files are Find By Content indexes of the text of documents in the same folder... but they have not been used since Spotlight was introduced with Mac OS X 10.4 / Tiger, so they are remnants of the prior indexing system used in 10.3 and older. In their particular case it's safe to delete these invisible files: 10.3 and prior would rebuild them automatically, and 10.4 and later won't use them at all, so I will have the script remove them when it deletes the .DS_Store files.

Your .FBCLockFolder probably contains a file named ".FBCSemaphoreFile", another invisible file from the 10.3-and-earlier Find By Content system. It's safe to delete that too, and once that is done the folder will be empty and thus deleted automagically.
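The cleanup amounts to adding the Find By Content remnants to the same pass that removes .DS_Store files – something like this (a sketch, not the exact consolidate14 code; the second line is the existing empty-folder deletion from the script):

# delete the obsolete Find By Content files along with .DS_Store
find "${MainFolder}" \( -name ".DS_Store" -o -name ".FBCIndex*" \
  -o -name ".FBCSemaphoreFile" \) -type f -delete
# the now-empty .FBCLockFolder folders go with the other empty folders
find -d "${MainFolder}" -type d -empty -delete 2>/dev/null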

Consolidate14
http://www.megaupload.com/?d=46FJ4TPM

(MediaFire wasn't loading so I used MegaUpload.)

May 26, 2010 12:24 AM in response to JulieJulieJulie

I'm impressed with your knowledge. Yes, there was a semaphore file inside .FBCLockFolder. I deleted all the files you mentioned, ran consolidate14, and the problem folder was removed.

Times to completion
1 x 18 files ... 1 sec (Test x1, 6 removed)
8 x 18 files ... 12 secs (Test x8, 132 removed)
32 x 18 files ... 64 secs (Test x32, 564 removed)
1 x 400 files ... 25 secs (Transcripts, 0 removed)

I'll run the script over some larger test folders tonight, and get back with results.
