Is there a way to write this script using “System Events”?

Question

I have this great script.

What it does is process the folder, look in cats_names.txt and, if it finds a picture whose name contains a name from this .txt file, move that picture to cats_folder. All other pictures go to dogs_folder. Later I'll have more folders and lists (parrots + names, fish + names, etc.).

I just processed 40,000 files and it took about 20 minutes. Is there a way to use System Events or something similar instead? I guess the shell part is what makes it so slow. I can't even imagine how long Finder would take to do the same job.

Also, I think something in the script is wrong. It assigned the folder's name to all the files in it and moved them to the right folder, but it also moved five times more photos than I actually had into one of the folders.

property source_folder : alias "path:to:source_folder"
property dogs_folder : alias "path:to:dogs_folder"
property cats_folder : alias "path:to:cats_folder"
property cats_list : alias "path:to:cats_names.txt" as string
process_folder(source_folder)
on process_folder(this_folder)
  set these_items to list folder this_folder without invisibles
  set container_name to name of (info for this_folder)
  repeat with i from 1 to the count of these_items
    set this_item to alias ((this_folder as Unicode text) & (item i of these_items))
    if folder of (info for this_item) is true then
      process_folder(this_item)
    else
      process_item(this_item, container_name, i)
    end if
  end repeat
end process_folder
on process_item(this_item, c, i)
  set i to text -4 thru -1 of ((10000 + i) as text)
  set r to (random number from 0 to 9999)
  set r to text -4 thru -1 of ((10000 + r) as text)
  tell application "System Events"
    set e to name extension of this_item
    set new_name to "" & r & "" & c & "" & i & "." & e
    set name of this_item to new_name
  end tell
end process_item
set args to ""
#repeat with a in {folder_1 list_1 ... folder_n list_n, dogs}
repeat with a in {source_folder, cats_folder, cats_list, dogs_folder}
  set args to args & (a as alias)'s POSIX path's quoted form & space
end repeat
do shell script "/bin/bash -s <<'EOF' - " & args & "
#
# $1 : source_directory
# $2.. : destination_1 list_1 ... destination_n list_n destination_r
#  
# - file in source_directory is moved to destination_k if the name contains some name in list_k for k = 1..n
# - file in source_directory which has not been moved to destination_1..n is moved to destination_r if destination_r is given
# - destination_i (i = 1..n, r) cannot be descendant of source_directory
# - if there's no corresponding destination_k for list_k, list_i (i = k..n) is ignored.
# - destination_k and list_k may be either interleaved or separated in arguments list
# 
SOURCE=$1
DEST=() # array of destination directories
LIST=() # array of name list files
shift
for a in \"$@\"
do
  [[ -d $a ]] && DEST+=( \"$a\" )
  [[ -f $a ]] && LIST+=( \"$a\" )
done
# move files in $SOURCE whose name contains some name in ${LIST[i]} to ${DEST[i]}
for (( i = 0; i < ${#LIST[@]}; i++ ))
do
  [[ -z ${DEST[i]} ]] && break # break if no corresponding destination
  awk '1' \"${LIST[i]}\" |
  while read n
  do
    [[ -z \"$n\" ]] && continue # skip blank line
    find \"$SOURCE\" -type f -name \"*$n*\" -print0 | xargs -0 -J% mv % \"${DEST[i]}\"
  done
done
[[ -z ${DEST[i]} ]] && exit # exit if no destination left
# move rest of files in $SOURCE (except for dot files) to ${DEST[i]}
find \"$SOURCE\" -type f ! -name '.*' -print0 | xargs -0 -J% mv % \"${DEST[i]}\"
EOF"
display notification "All images were processed." with title "New" sound name "Glass.aiff"
tell me to quit

By the way, System Events' move command would work in this script if you put it somewhere in the process_item handler.

MacBook Pro with Retina display

Posted on Nov 20, 2015 12:56 PM

Answer 1

Hiroto

Nov 20, 2015 9:03 PM in response to bagrov

Hello

It's quite unlikely that a shell script is slower than System Events at processing file system nodes. Indeed, you should try invoking a shell script in lieu of System Events.

E.g., try replacing:

process_folder(source_folder)
on process_folder(this_folder)
  -- omitted
end process_folder
on process_item(this_item, c, i)
  -- omitted
end process_item

with:

(*
  -- context memo
  property source_folder : "path:to:source_folder"
*)
set args to (source_folder as alias)'s POSIX path's quoted form
do shell script "/bin/bash -s <<'EOF' - " & args & "
# $1 : source_directory
#
# - file named after name.extension in source_directory is renamed after RRRRnameIIII.extension, where
#     RRRR is four-digit random number (0000..9999),
#     IIII is four-digit ordinal index number of the file in its parent directory
# - entire tree rooted at source_directory is processed
# - dot file is ignored
#
SOURCE=$1
traverse_l() # lexical order
{
  # $1 : node
  # $2.. : argv
  [[ -d $1 ]] || { process_node \"$@\"; return; }
  local i=0
  find \"$1\" ! -name '.*' -depth 1 -print0 | while read -d $'\\0' f
  do
    traverse_l \"$f\" $(( ++i ))
  done
}
process_node()
{
  # $1 = file
  # $2 = index in the parent or nil (nil implies 0)
  n=${1##*/} # file name
  e=${1##*.} # file name extension
  d=${1%/*} # parent
  dn=${d##*/} # parent name
  n1=$( printf '%04d%s%04d.%s' \"$(jot -r 1 0 9999)\" \"$dn\" \"$2\" \"$e\" )
  mv \"$1\" \"$d/$n1\"
}
traverse_l \"$SOURCE\"
EOF"

You may compare the execution time of the original System Events script and this shell script.

Note that this shell script simulates the behaviour of your original script in getting the ordinal index of an item in a directory; that is, the index is given in lexical order of item names regardless of whether the item is a file or a directory.

E.g., given a.jpg, b_directory and c.jpg in a directory named dirname, a.jpg and c.jpg are renamed to ####dirname0001.jpg and ####dirname0003.jpg respectively, where #### is a 4-digit random number, because index 2 is given to b_directory. If you want c.jpg to be renamed to ####dirname0002.jpg, it is easy to amend the script accordingly.
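To see the two numbering schemes side by side without renaming anything, here is a rough bash sketch. The function name preview_index_names and its all/files mode switch are hypothetical. The random RRRR prefix and the extension are left out so the output is deterministic, and the glob is assumed to expand in lexical order (bash sorts glob results, subject to the locale's collation).

```shell
#!/bin/bash
# Hypothetical helper: print "name -> dirnameIIII" for each regular file in
# one directory. The random RRRR prefix and the extension are omitted so the
# output is deterministic.
#   mode "all"   : every entry (file or directory) takes an index, as above
#   mode "files" : only regular files take an index
preview_index_names() {
  local dir=$1 mode=${2:-all} i=0 f dn
  dn=${dir%/}
  dn=${dn##*/}                          # parent directory name
  shopt -s nullglob                     # empty directory -> no iterations
  for f in "$dir"/*; do                 # bash sorts glob results
    if [[ $mode == files && ! -f $f ]]; then
      continue                          # directories get no index in "files" mode
    fi
    i=$((i + 1))                        # ordinal index of this entry
    if [[ ! -f $f ]]; then
      continue                          # only files are renamed
    fi
    printf '%s -> %s%04d\n' "${f##*/}" "$dn" "$i"
  done
}
```

With a.jpg, b_directory and c.jpg in dirname, "all" mode yields dirname0001 and dirname0003, while "files" mode yields dirname0001 and dirname0002.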

The script has been briefly tested under OS X 10.6.8, but no warranties. Please make sure you have a backup of your files and directories before running this sort of script.

Good luck,

H

Answer 2

bagrov Author

Nov 21, 2015 1:37 AM in response to Hiroto

Hello Hiroto,

I'm actually worried about the existing shell script part.

Is there a way to put it inside the System Events part?

I know it only became this slow after I added the shell part that you kindly helped me with.

Before that, System Events moved lots of files in no time.

When I try to put the shell script inside the System Events part, I get an error:

Can’t make quoted form of «class posx» of alias "path:to:source" into type Unicode text.

Answer 3

Hiroto

Nov 21, 2015 2:57 AM in response to bagrov

Just wondering: how many names are in the name list?

If there are thousands of names to match, the current shell script should be rewritten to reduce the number of find(1) invocations. In other words, if [number of files in directory tree] < [number of names in name list], iterating through files should be more efficient than iterating through names.
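To illustrate iterating through files rather than names, here is a minimal bash sketch, assuming one name list and one destination. The list is read once, find(1) runs once, and each file name is compared against every name. classify_once is a hypothetical name, and it only prints the planned moves. The per-name comparisons still happen in bash, so this mainly saves find(1) invocations; whether it is faster overall would depend on the numbers of files and names.

```shell
#!/bin/bash
# Hypothetical sketch: one pass over the files instead of one find(1) per name.
# Prints "file<TAB>destination" for every file whose name contains a name
# from the list; nothing is moved.
classify_once() {
  local source=$1 list=$2 dest=$3 n f base
  local -a names=()
  # read the list once; IFS= and -r keep spaces and backslashes intact,
  # and the || test keeps a last line that has no trailing newline
  while IFS= read -r n || [[ -n $n ]]; do
    if [[ -n $n ]]; then names+=("$n"); fi   # skip blank lines
  done < "$list"
  # single find(1) run; NUL-separated paths survive any file name
  while IFS= read -r -d '' f; do
    base=${f##*/}                            # match against the leaf name only
    for n in "${names[@]}"; do
      if [[ $base == *"$n"* ]]; then         # quoted: literal substring match
        printf '%s\t%s\n' "$f" "$dest"
        break                                # first matching name is enough
      fi
    done
  done < <(find "$source" -type f ! -name '.*' -print0)
}
```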

H

Answer 4

bagrov Author

Nov 21, 2015 4:01 AM in response to Hiroto

75 right now; it will be about 100-150. Later I'll add more .txt files with another 100-150 names in each.

It doesn't really matter which number comes before and which after the name. I just need the random number so that files with identical names don't overwrite each other.

Answer 5

bagrov Author

Nov 21, 2015 7:03 AM in response to Hiroto

I just tested both scripts again.

There were 50,000 files and 3 folders to move them into (20,000 / 20,000 / 10,000).

Original: 4 min 21 sec (everything was right)

New shell part + old shell part: 7 min 17 sec (somehow about 2,700 pictures were lost and a bunch of files were moved incorrectly)

The question is: is 4:21 a good time for processing 50,000 files?

Answer 6

bagrov Author

Nov 23, 2015 4:12 AM in response to Hiroto

All right, I believe 4:21 isn't such a bad result.

I just have three last questions for you.

1. If I provide a list for every name I have, so there's no need to keep the 'destination_r' folder, do I just remove the following lines from the script?

# move rest of files in $SOURCE (except for dot files) to ${DEST[i]}

find \"$SOURCE\" -type f ! -name '.*' -print0 | xargs -0 -J% mv % \"${DEST[i]}\"

EOF"

2. What if the same names appear in two lists and I want the same files in 2 or more folders? How do I duplicate files and put them in, for example, both the 'destination_1' and 'destination_2' folders?

3. What if the same names appear in two lists and I want the files from the source folder to be divided equally between 2 or more folders?

(2 folders = 50% to each destination folder; 3 folders = 33.3% to each destination folder; 4 folders = 25% to each destination folder; etc.)

Thank you big time.

Answer 7

Hiroto

Nov 23, 2015 1:19 PM in response to bagrov

Hello

Sorry for the late reply. I've been stuck with something.

Regarding execution time, it depends on where the 50K files reside (e.g., on an internal SSD, internal HDD, external HDD, remote network drive or elsewhere) and on processor speed. If they're on an internal SATA-2 HDD in a 2-core or 4-core i7 machine, 300 sec for 50K files is 10 times slower than I'd expect.

Regarding execution results, I noticed some possible reasons for the incorrect results:

a) the mismatch between NFD and NFC forms of HFS+ names is not handled by the current find command in the shell script;
b) file names containing a backslash (\) and file names ending with white space are not handled correctly by the current read command in the shell script;
c) a nonexistent name extension can be extracted incorrectly by the current shell code, e.g., given the file name a.b c, "b c" is extracted as the extension but should not be;
d) the invisible Icon\r file is not ignored by the current shell code.
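Issues b) and c) can be reproduced in a few lines of bash. This is an illustrative sketch only (all function names are hypothetical); the checked extension logic mirrors what the revised scripts do with m=${n%.*} and the [^.[:alnum:]] test, with $m quoted so that glob characters in a file name are matched literally.

```shell
#!/bin/bash
# b) Reading NUL-separated names: without IFS= and -r, read treats "\" as an
#    escape character and trims leading/trailing white space.
naive_read() { local f; while read -d '' f; do printf '[%s]\n' "$f"; done; }
safe_read()  { local f; while IFS= read -r -d '' f; do printf '[%s]\n' "$f"; done; }

# c) Extracting the extension: ${n##*.} returns everything after the last dot,
#    or the whole name when there is no dot.
ext_naive() { local n=$1; printf '%s' "${n##*.}"; }

# Checked version, mirroring the revised scripts: strip the shortest ".*"
# suffix, then reject anything that is not a dot plus alphanumerics.
ext_checked() {
  local n=$1 m e
  m=${n%.*}                              # name without extension
  e=${n#"$m"}                            # extension including the dot
  if [[ $e =~ [^.[:alnum:]] ]]; then
    e=''                                 # ".b c" in "a.b c" is not an extension
  fi
  printf '%s' "$e"
}
```

For example, feeding the names a\b.jpg and "trailing  " to naive_read prints [ab.jpg] and [trailing], while safe_read keeps both names intact.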

By the way, if you're checking the results in Finder, you'd better restart Finder after running the script. As far as I can tell, Finder is a poor beast: its cache can get out of sync and may not reflect the current state of a file system that has been modified via the Unix layer. Also note that the destination directory may NOT be a descendant of the source directory.

Anyway, here are some revised scripts that address issues a), b), c) and d) noted above, in addition to an optimisation that reduces the number of subshell invocations in the shell script.

1) script to rename files (reduced number of subshell invocations)

set source to "path:to:source"
set args to (source as alias)'s POSIX path's quoted form
do shell script "/bin/bash -s <<'EOF' - " & args & "
#
# $1 : source_directory
#
# - file named after name.extension in source_directory is renamed after RRRRnameIIII.extension, where
#     RRRR is four-digit random number (0000..9999),
#     IIII is four-digit ordinal index number of the file in its parent directory
# - entire tree rooted at source_directory is processed
# - dot file is ignored
# - Icon\\r file is ignored
#
SOURCE=$1
shopt -s nullglob
traverse_l() # lexical order
{
  # $1 : node
  # $2.. : argv
  [[ -d $1 ]] || { process_node \"$@\"; return; }
  local i=0
  for f in \"$1\"/*
  do
    n=${f##*/}
    [[ $n == Icon$'\\r' ]] && continue
    traverse_l \"$f\" $(( ++i ))
  done
}
process_node()
{
  # $1 = file
  # $2 = index in the parent or nil (nil implies 0)
  n=${1##*/} # file name
  m=${n%.*} # file name w/o extension
  e=${n#\"$m\"} # file name extension including . ($m quoted so it matches literally)
  [[ $e =~ [^.[:alnum:]] ]] && e='' # extension check: it may not contain [^.[:alnum:]]
  d=${1%/*} # parent
  dn=${d##*/} # parent name
  r=0000$((RANDOM % 10000))
  j=0000$2
  n1=\"${r: -4}${dn}${j: -4}${e}\" # new file name
  mv \"$1\" \"$d/$n1\"
}
traverse_l \"$SOURCE\"
EOF"

2) another script to rename files (new script using find(1) to traverse directory tree)

set source to "path:to:source"
set args to (source as alias)'s POSIX path's quoted form
do shell script "/bin/bash -s <<'EOF' - " & args & "
#
# $1 : source_directory
#
# - file named after name.extension in source_directory is renamed after RRRRnameIIII.extension, where
#     RRRR is four-digit random number (0000..9999),
#     IIII is four-digit ordinal index number of the file in its parent directory
# - entire tree rooted at source_directory is processed
# - dot file is ignored
# - Icon\\r file is ignored
#
SOURCE=$1
shopt -s nullglob
while IFS= read -r -d $'\\0' d
do
  (( i = 0 ))
  for f in \"$d\"/*
  do
    [[ -d $f ]] && continue # skip directory
    n=${f##*/} # file name
    [[ $n == Icon$'\\r' ]] && continue # skip Icon\\r file
    m=${n%.*} # file name w/o extension
    e=${n#\"$m\"} # file name extension including . ($m quoted so it matches literally)
    [[ $e =~ [^.[:alnum:]] ]] && e='' # extension check: it may not contain [^.[:alnum:]]
    dn=${d##*/} # parent name
    r=0000$((RANDOM % 10000))
    j=0000$(( ++i ))
    n1=\"${r: -4}${dn}${j: -4}${e}\" # new file name
    mv \"$f\" \"$d/$n1\"
  done
done < <(find -d \"$SOURCE\" -type d -print0)
EOF"

3) script to move files (reduced number of find(1) invocations by combining search names into regex pattern)

set source to "path:to:source"
set c1 to "path:to:category1"
set c2 to "path:to:category2"
set c3 to "path:to:category3"
set c4 to "path:to:category4"
set c1_names to "path:to:category1_names.txt"
set c2_names to "path:to:category2_names.txt"
set c3_names to "path:to:category3_names.txt"
set args to ""
repeat with a in {source, c1, c1_names, c2, c2_names, c3, c3_names, c4}
  set args to args & (a as alias)'s POSIX path's quoted form & space
end repeat
do shell script "/bin/bash -s <<'EOF' - " & args & "
#
# $1 : source_directory
# $2.. : destination_1 list_1 ... destination_n list_n destination_r
#
# - file in source_directory is moved to destination_k if the name contains some name in list_k for k = 1..n
# - file in source_directory which has not been moved to destination_1..n is moved to destination_r if destination_r is given
# - destination_i (i = 1..n, r) cannot be descendant of source_directory
# - if there's no corresponding destination_k for list_k, list_k is ignored.
# - destination_k and list_k may be either interleaved or separated in arguments list
# - list_k is assumed to have text in UTF-8.
# - dot file is ignored
# - Icon\\r file is ignored
#
SOURCE=$1
DEST=() # array of destination directories
LIST=() # array of name list (in UTF-8 NFD)
shift
for a in \"$@\"
do
  [[ -d $a ]] && DEST+=( \"$a\" )
  [[ -f $a ]] && LIST+=( \"$(perl -CSDA -MUnicode::Normalize <<'EOF' - \"$a\"
my @a = map {chomp; $_ ne '' ? quotemeta NFD($_) : () } <>;
while (@a) {
  printf \"(%s)\\n\", join '|', splice(@a, 0, 100);
} # max number of alternations in single pattern = 100
EOF)\" )
done
# move files in $SOURCE whose name contains some name in ${LIST[i]} to ${DEST[i]}
for (( i = 0; i < ${#LIST[@]}; i++ ))
do
  [[ -z ${DEST[i]} ]] && break # break if no corresponding destination
  while IFS= read -r p
  do
    [[ -z $p ]] && continue # skip blank line
    find -E \"$SOURCE\" -type f ! -name '.*' -and ! -name 'Icon'$'\\r' -regex \".*$p[^/]*$\" -print0 | xargs -0 -J% mv % \"${DEST[i]}\"
  done <<< \"${LIST[i]}\"
done
(( i = ${#LIST[@]} ))
[[ -z ${DEST[i]} ]] && exit # exit if no destination for the rest
# move rest of files in $SOURCE (except for dot files and Icon\\r files) to ${DEST[i]}
find \"$SOURCE\" -type f ! -name '.*' -and ! -name 'Icon'$'\\r' -print0 | xargs -0 -J% mv % \"${DEST[i]}\"
EOF"
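For readers without Perl at hand, the pattern-building step of script 3 can be approximated in bash and sed, as sketched below under the assumption of plain ASCII names (no NFD normalisation). Unlike Perl's quotemeta, which escapes every non-word character, this sed expression only escapes ERE metacharacters. build_patterns is a hypothetical name; 100 is the same chunk size the script uses.

```shell
#!/bin/bash
# Hypothetical sketch: build ERE alternations "(name1|name2|...)" from a name
# list, at most $2 names per pattern (default 100). ASCII names are assumed;
# there is no NFD normalisation here.
build_patterns() {
  local list=$1 max=${2:-100} n
  local -a chunk=()
  while IFS= read -r n || [[ -n $n ]]; do
    if [[ -z $n ]]; then continue; fi           # skip blank lines
    # backslash-escape characters that are special in an ERE
    chunk+=("$(printf '%s' "$n" | sed 's/[][\.|$(){}?+*^]/\\&/g')")
    if (( ${#chunk[@]} == max )); then
      (IFS='|'; printf '(%s)\n' "${chunk[*]}")  # join with | in a subshell
      chunk=()
    fi
  done < "$list"
  if (( ${#chunk[@]} > 0 )); then
    (IFS='|'; printf '(%s)\n' "${chunk[*]}")
  fi
}
```

Each printed pattern would then go into the script's `-regex ".*PATTERN[^/]*$"` test, which restricts the match to the last path component. `find -E` is the BSD/macOS spelling; GNU find uses `-regextype posix-extended` instead.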

The scripts have been briefly tested under OS X 10.6.8, but no warranties.

Good luck,

H

PS. I've just noticed your additional questions from Nov 23. I'll answer them later.

Answer 8

Hiroto

Nov 23, 2015 2:06 PM in response to bagrov

Hello

As for Q1, destination_r is optional. If you do not specify it in the script's arguments, the script will not move the rest.

E.g., the following script will leave the files not moved to c1, c2 or c3 in their original directories.

set source to "path:to:source"
set c1 to "path:to:category1"
set c2 to "path:to:category2"
set c3 to "path:to:category3"
set c1_names to "path:to:category1_names.txt"
set c2_names to "path:to:category2_names.txt"
set c3_names to "path:to:category3_names.txt"
set args to ""
repeat with a in {source, c1, c1_names, c2, c2_names, c3, c3_names}
  set args to args & (a as alias)'s POSIX path's quoted form & space
end repeat
do shell script "/bin/bash -s <<'EOF' - " & args & "
#
# $1 : source_directory
# $2.. : destination_1 list_1 ... destination_n list_n destination_r
#
# rest omitted
EOF"

As for Q2, use cp(1) in lieu of mv(1) in shell script.

E.g., replace:

mv X Y

with:

cp -pR X Y

also replace:

xargs -0 -J% mv % Y

with:

xargs -0 -J% cp -pR % Y

Note that if you leave the files in the source and specify destination_r, those files will all be moved or copied to destination_r at the end.

As for Q3, I'm not sure I understand the question, but I guess that, e.g., given a name ABC in three lists X_names.txt, Y_names.txt and Z_names.txt, the script should find the files whose names contain ABC, divide them into 3 subsets and move (or copy) each subset to X, Y and Z respectively. Well, it is possible (indeed, almost anything well-defined is possible), but it is very complicated and requires an entirely new script. If I find spare time, I may try to solve this puzzle... but not within the next few days.
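One simple way to split the shared files evenly is round-robin: sort them and send the k-th file to destination k mod n, which is the rotation distribute.pl later in this thread uses (h = J[k % |J|]). Here is a minimal bash sketch, assuming the list of shared files is already known. round_robin and its "--" separator are hypothetical, and nothing is moved.

```shell
#!/bin/bash
# Hypothetical sketch: split files evenly across destinations round-robin.
# Usage: round_robin DEST... -- FILE...   (prints "file -> dest" only)
round_robin() {
  local -a dests=()
  while [[ $# -gt 0 && $1 != -- ]]; do   # collect destinations up to "--"
    dests+=("$1")
    shift
  done
  shift                                  # drop the "--" separator
  local k=0 f
  for f in "$@"; do                      # k-th file -> destination k mod n
    printf '%s -> %s\n' "$f" "${dests[k % ${#dests[@]}]}"
    k=$((k + 1))
  done
}
```

With destinations c1 c2 c3 and files abc1..abc4, this prints abc1 -> c1, abc2 -> c2, abc3 -> c3, abc4 -> c1, matching the example in distribute.pl's header.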

Regards,

H

Answer 9

bagrov Author

Nov 23, 2015 10:41 PM in response to Hiroto

I totally understand, and thank you big time for your help.

Q1: I will test it today, as well as the new scripts.

About Q2: the best way to do that is to first move all the files to one destination folder and then copy them from there to the other destination folders, so there is no need for destination_r.
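The move-then-copy approach could look roughly like this in bash. It is a sketch, not one of the scripts from this thread. move_then_copy is a hypothetical name, file paths are read NUL-separated from stdin (as produced by find ... -print0), and a plain loop replaces BSD `xargs -J` so that it also runs with GNU tools. A file with the same name already in a destination is overwritten, just as with plain mv and cp.

```shell
#!/bin/bash
# Hypothetical sketch: move each file (NUL-separated paths on stdin) into the
# first destination, then copy it from there into every further destination.
# Usage: find ... -print0 | move_then_copy FIRST_DEST [OTHER_DEST...]
move_then_copy() {
  local first=$1 f base d
  shift
  while IFS= read -r -d '' f; do
    base=${f##*/}
    mv -- "$f" "$first/" || continue          # skip this file if the move failed
    for d in "$@"; do
      cp -pR -- "$first/$base" "$d/"          # same cp options as suggested above
    done
  done
}
```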

Q3: yes, you got it right.

Answer 10

bagrov Author

Nov 24, 2015 11:21 AM in response to Hiroto

1. Script to rename files: I couldn't run it. There was an error: the script was processing folders, or even the path to the folder. I got something like "blah-blah-blah 1234"path:to:source_folder"1234 was not found".

2. Script to rename files (2): 48,000 files went I don't know where; I can't find them on my Mac. I guess they are just gone. 🙂 I got only 2,000 files in the right folder.

3. Script to move files: it didn't make it faster; I still hit 4:20-4:30. I guess I'll just leave it as is. It's not so bad, and I only have 50,000 files at the beginning of the process, when I create something new. After that there will be far fewer, so I think the script will run very slowly.

By the way, my MBP is not the best: it has only 4 GB RAM, a 128 GB SSD and a 2.4 GHz i5 processor. Maybe on such a machine this time is okay, don't you think?

Answer 11

bagrov Author

Dec 5, 2015 2:01 AM in response to Hiroto

Could I count on your help with these two scripts?

1. What if the same names appear in two lists and I want the same files in 2 or more folders? How do I duplicate files and put them in, for example, both the 'destination_1' and 'destination_2' folders? (Files should first be moved to destination_1 and then copied to destination_2/destination_3/etc.)

2. What if the same names appear in two lists and I want the files from the source folder to be divided equally between 2 or more folders?

(2 folders = 50% to each destination folder; 3 folders = 33.3% to each destination folder; 4 folders = 25% to each destination folder; etc.)

(As in the first case, nothing should be left in source_folder after running the script.)

Answer 12

Hiroto

Dec 5, 2015 1:58 PM in response to bagrov

Hello

Sorry for the late reply. Here's an entirely new script to address your Q3. Since the required process is fairly complicated, I've written it in Perl, which I feel more comfortable with. You can call it from AppleScript using the do shell script command.

Recipe.

1) Save the Perl script listed below as a plain text file named distribute.pl in ~/Desktop.

2) Run the following command in Terminal to make it executable (if it isn't already):

#!/bin/bash
chmod u+x ~/Desktop/distribute.pl

3) Use something like the following AppleScript script to invoke the Perl script:

(* distribute.pl is assumed to be located at ~/Desktop/distribute.pl *)
set source to "path:to:source"
set c1 to "path:to:category1"
set c2 to "path:to:category2"
set c3 to "path:to:category3"
set c4 to "path:to:category4"
set c1_names to "path:to:category1_names.txt"
set c2_names to "path:to:category2_names.txt"
set c3_names to "path:to:category3_names.txt"
set args to ""
repeat with a in {source, c1, c1_names, c2, c2_names, c3, c3_names, c4}
  set args to args & (a as alias)'s POSIX path's quoted form & space
end repeat
set command to (path to desktop)'s POSIX path & "distribute.pl"
do shell script command's quoted form & " " & args

Notes.

- The script has three operation modes: 0) echo (test), 1) move and 2) copy. You can set the mode with $OPERATION, which is currently set to 1 for move.

- The script has a debug mode, which makes it print some internal data structures. You can set it with $DEBUG, which is currently set to 1 for debug on.

- This script allows a destination inside the source tree.

- Briefly tested with Perl v5.10.0 built for darwin-thread-multi-2level under OS X 10.6.8, but no warranties.
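If you prefer to keep everything in bash, the same 0/1/2 operation switch can be sketched with a command array, so that echo stands in for mv or cp during a test run. This is only an illustration under that assumption; select_op, run_op and the OP array are hypothetical names, not part of distribute.pl.

```shell
#!/bin/bash
# Hypothetical sketch of a 0/1/2 operation switch: the command is kept in an
# array so "echo" can stand in for mv/cp during a dry run.
select_op() {
  case $1 in
    0) OP=(echo) ;;            # test: only print what would run
    1) OP=(mv) ;;              # move
    2) OP=(cp -pR) ;;          # copy, preserving attributes, recursively
    *) printf 'Invalid operation: %s\n' "$1" >&2; return 1 ;;
  esac
}
run_op() {                     # run_op DEST FILE...
  local dest=$1
  shift
  "${OP[@]}" "$@" "$dest"
}
```

For example, after `select_op 0`, `run_op /tmp/dest a.jpg 'b c.jpg'` only prints the arguments that mv or cp would have received.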

Perl script:

#!/usr/bin/perl -w
#
# file:
#     distribute.pl
#
# arguments:
#     $ARGV[0]   : source_directory
#     $ARGV[1].. : destination_0 list_0 ... destination_n list_n destination_r
#
# function:
#     - file in source directory is classified to set_i if the name contains some name in list_i for any i in I = [0, n]; and
#       file_k for any k in K = [0, m] shared in set_j for all j in J which is a subset of I is distributed to destination_h,
#       where h = J[k % |J|] that is (k % |J|)'th index in J. J is processed in descending order of its size.
#       This way, files shared in set_j for all j in J are evenly distributed amongst destination_j for any j in J.
#
#     - file in source_directory which has not been classified to set_i for any i in I is distributed to destination_r
#       if destination_r is specified.
#
#     - if there's no corresponding destination_i for list_i, list_i is ignored.
#     - if there's no corresponding list_i for destination_i except for destination_r, destination_i is ignored.
#     - destination_i and list_i may be either interleaved or separated in arguments list
#     - list_i is assumed to have text in UTF-8.
#
#     - dot file is ignored
#     - Icon\r file is ignored
#
#     - operation mode is specified by $OPERATION in script (currently set to 1):
#         $OPERATION = 0|1|2
#             0 : $OP = $ECHO = ["echo"]
#             1 : $OP = $MV = ["mv"]
#             2 : $OP = $CP = ["cp", "-pR"]
#     - debug mode is specified by $DEBUG in script (currently set to 1):
#         $DEBUG = 0|1
#             0 : no debug output
#             1 : debug output for internal data structures
#
# version:
#     v0.12d2
#
# written by Hiroto, 2015-12
#
# E.g.:
#     Given files in source tree:
#         source/abc1
#         source/abc2
#         source/abc3
#         source/abc4
#         source/abd
#         source/bcd
#
#     destination directories:
#         c1, c2, c3, c4
#
#     name list files:
#         c1_names.txt => abc
#         c2_names.txt => abc
#         c3_names.txt => ab
#
#     command:
#         ./distribute.pl source c1 c1_names.txt c2 c2_names.txt c3 c3_names.txt c4
#
#     distribution result will be:
#         c1/abc1
#         c1/abc4
#
#         c2/abc2
#
#         c3/abc3
#         c3/abd
#
#         c4/bcd
#
use strict;
use Encode;
use open qw(:std :utf8);    # UTF-8 layers for open() and STDIN/STDOUT/STDERR (in place of "use encoding 'utf8'", which newer Perls no longer support)
use Unicode::Normalize;
use Data::Dumper;

my $DEBUG = 1;              # debug flag (0|1): 0 => no debug output, 1 => debug output
my $OPERATION = 1;          # operation of this script (0|1|2): 0 => echo (test), 1 => move, 2 => copy
my $MAX_ALTERNATIONS = 100; # max number of alternations in single pattern
my $EXCLUSIONS = [          # find(1) expressions to exclude certain files
    "!", "-name", ".*",     # - exclude dot file
    "!", "-name", "Icon\r", # - exclude Icon\r file
];
my $ECHO = ["echo"];        # echo command for test
my $MV = ["mv"];            # mv(1) command and options
my $CP = ["cp", "-pR"];     # cp(1) command and options
my $OP = $OPERATION == 0 ? $ECHO : $OPERATION == 1 ? $MV : $OPERATION == 2 ? $CP : undef;
unless ($OP) {
    printf STDERR "Invalid operation: %d\n", $OPERATION;
    exit 1;
}

my @dest = ();      # array of destination directories
my @list = ();      # array of name list files
my @re_list = ();   # array of array of alternations regex patterns; subarray per name list file
my %dtable = ();    # distribution table: { file => string of name list indices } where index is terminated by ; e.g., 0;2;3;
my %dtable_r = ();  # reverse distribution table: { string of name list indices => array of files }
my @files = ();     # array of array of files; subarray per destination directory
my $i = 0;          # name list index

@ARGV = map { decode('utf8', $_) } @ARGV;
for (@ARGV) {
    if (-d $_) {
        push @dest, $_;
        next;
    }
    if (-e $_) {
        push @list, $_;
        open(LIST, "<", $_) or die "$!";
        my @a = map {chomp; $_ ne '' ? quotemeta NFD($_) : () } <LIST>;
        close LIST;
        while (@a) {
            push @{$re_list[$i]}, sprintf "(%s)", join '|', splice(@a, 0, $MAX_ALTERNATIONS);
        }
        ++$i;
        next;
    }
    {
        printf STDERR "No such file or directory: %s\n", $_;
        exit 1;
    }
}
my $source = shift @dest;   # source directory
unless (@dest) {
    print STDERR "No destination directory is specified\n";
    exit 1;
}

# build %dtable
for $i (0 .. @list - 1) {
    my @ff = ();
    for my $re ( @{$re_list[$i]} ) {
        local $/ = "\0";
        open(PIPEIN, "-|", "find", "-E", $source, "-type", "f",
            @{$EXCLUSIONS},                 # exclude dot file and Icon\r file
            "-regex", ".*${re}[^/]*\$",     # match $re in leaf node name
            "-print0") or die "$!";
        push @ff, map { chomp; $_ } <PIPEIN>;
        close PIPEIN or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
    }
    my %uniq = map { $_ => 1 } @ff;
    for (keys %uniq) {
        exists $dtable{$_} ? ( $dtable{$_} .= "$i;") : ( $dtable{$_} = "$i;" );
    }
}

# build %dtable_r
while (my ($k, $v) = each %dtable) {
    exists $dtable_r{$v} ? ( push @{$dtable_r{$v}}, $k ) : ( $dtable_r{$v} = [$k] );
}
for (keys %dtable_r) {
    $dtable_r{$_} = [ sort @{$dtable_r{$_}} ];  # sort is optional if lexically unordered distribution is fine
}
if ($DEBUG) {
    print Data::Dumper->Dump([\%dtable]);
    print Data::Dumper->Dump([\%dtable_r]);
}

# build @files
for my $m (reverse 1 .. @list) {                    # e.g., $m in (3, 2, 1)
    my $cc = &combination([0 .. @list - 1], $m);    # e.g., $cc = [[0, 1], [0, 2], [1, 2]] for $m = 2
    for my $c ( @{$cc} ) {                          # e.g., $c = [0, 1]
        my $d = sprintf '%d;' x $m, @{$c};          # e.g., $d = '0;1;' for $c = [0, 1]
        my $k = 0;
        for my $f ( @{$dtable_r{$d}} ) {
            my $j = $c->[$k++ % $m];                # rotate distribution index
            push @{$files[$j]}, $f;
        }
    }
}
# - retrieve the rest and set $files[0 + @list] to it
if (@list < @dest) {
    local $/ = "\0";
    open(PIPEIN, "-|", "find", "-E", $source, "-type", "f",
        @{$EXCLUSIONS},     # exclude dot file and Icon\r file
        "-print0") or die "$!";
    my @rest = map { chomp; exists $dtable{$_} ? () : $_ } <PIPEIN>;
    close PIPEIN or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
    $files[0 + @list] = @rest ? [ @rest ] : undef;
}
@files = map { $_ ? [sort @{$_}] : undef } @files;  # sort is optional (it does not affect results but possibly mv/cp performance)
if ($DEBUG) {
    print Data::Dumper->Dump(\@files);
}

# distribute files
for $i (0 .. @list) {
    next unless $files[$i] and $dest[$i];
    local $, = "\0";
    local $\ = "\0";
    open(PIPEOUT, "|-", "xargs", "-0", "-J%", @{$OP}, "%", $dest[$i]) or die "$!";
    print PIPEOUT @{$files[$i]};
    close PIPEOUT or warn $! ? "Error closing pipe-out: $!" : "Wait status from pipe-out: $?";
}

sub combination($$) {
    # $ : (array ref) list of elements
    # $ : (int) number of elements in each combination
    # return (array ref) array of combinations
    #
    # * array may contain repeated elements
    my ($aa, $n) = @_;
    my ($bb, $cc) = ({}, []);
    return [] if $n < 1;
    if ($n == 1) {
        for my $e ( @{$aa} ) {
            push @{$cc}, [$e] unless exists $bb->{$e};
            $bb->{$e} = 1;
        }
        return $cc;
    }
    my $aa1 = [ @{$aa} ];   # local copy
    for ( 0 .. @{$aa1} - $n ) {
        my $e = shift @{$aa1};
        if ( exists $bb->{$e} ) {}  # prune this branch
        else {
            for my $c ( @{&combination($aa1, $n - 1)} ) {
                push @{$cc}, [$e, @{$c}];
            }
            $bb->{$e} = 1;          # add this to prune list
        }
    }
    return $cc;
}

Good luck,

H

Answer 13

Hiroto

Dec 6, 2015 1:06 AM in response to bagrov

Here's another Perl script that directly addresses your Q2. The recipe is basically the same as for the previous script.

You may change $OPERATION and $DEBUG as you see fit. They are both currently set to 1. Read the comments in the script for more details.

Regards,

H

#!/usr/bin/perl -w
#
# file:
#     distribute_copies.pl
#
# arguments:
#     $ARGV[0]   : source_directory
#     $ARGV[1].. : destination_0 list_0 ... destination_n list_n destination_r
#
# function:
#     - file in source directory is classified to set_i if the name contains some name in list_i for any i in I = [0, n]; and
#       copy of every file in set_i for any i in I is distributed to destination_i.
#       If $OPERATION == 1 (move), the original file in set_i for any i in I and the original file classified as the rest
#       if destination_r is specified is removed at end.
#
#     - file in source_directory which has not been classified to set_i for any i in I is distributed to destination_r
#       if destination_r is specified.
#
#     - if there's no corresponding destination_i for list_i, list_i is ignored.
#     - if there's no corresponding list_i for destination_i except for destination_r, destination_i is ignored.
#     - destination_i and list_i may be either interleaved or separated in arguments list
#     - list_i is assumed to have text in UTF-8.
#
#     - dot file is ignored
#     - Icon\r file is ignored
#
#     - operation mode is specified by $OPERATION in script (currently set to 1):
#         $OPERATION = 0|1|2
#             0 : $OP = $ECHO = ["echo"]
#             1 : $OP = $MV = ["cp", "-pR"] (source files copied to any destination will be removed at end)
#             2 : $OP = $CP = ["cp", "-pR"]
#     - debug mode is specified by $DEBUG in script (currently set to 1):
#         $DEBUG = 0|1
#             0 : no debug output
#             1 : debug output for internal data structures
#
# versions:
#     v0.10d1 -
#
# written by Hiroto, 2015-12
#
# E.g.:
#     Given files in source tree:
#         source/abc1
#         source/abc2
#         source/abc3
#         source/abc4
#         source/abd
#         source/bcd
#
#     destination directories:
#         c1, c2, c3, c4
#
#     name list files:
#         c1_names.txt => abc
#         c2_names.txt => abc
#         c3_names.txt => ab
#
#     command:
#         ./distribute_copies.pl source c1 c1_names.txt c2 c2_names.txt c3 c3_names.txt c4
#
#     distribution result will be:
#         c1/abc1
#         c1/abc2
#         c1/abc3
#         c1/abc4
#
#         c2/abc1
#         c2/abc2
#         c2/abc3
#         c2/abc4
#
#         c3/abc1
#         c3/abc2
#         c3/abc3
#         c3/abc4
#         c3/abd
#
#         c4/bcd
#
use strict;
use Encode;
use open qw(:std :utf8);    # UTF-8 layers for open() and STDIN/STDOUT/STDERR (in place of "use encoding 'utf8'", which newer Perls no longer support)
use Unicode::Normalize;
use Data::Dumper;

my $DEBUG = 1;              # debug flag (0|1): 0 => no debug output, 1 => debug output
my $OPERATION = 1;          # operation mode (0|1|2): 0 => echo (test), 1 => move, 2 => copy
my $MAX_ALTERNATIONS = 100; # max number of alternations in single pattern
my $EXCLUSIONS = [          # find(1) expressions to exclude certain files
    "!", "-name", ".*",     # - exclude dot file
    "!", "-name", "Icon\r", # - exclude Icon\r file
];
my $ECHO = ["echo"];        # echo command for test
my $MV = ["cp", "-pR"];     # cp(1) command and options
my $CP = ["cp", "-pR"];     # cp(1) command and options
my $RM = ["rm"];            # rm(1) command
my $OP = $OPERATION == 0 ? $ECHO : $OPERATION == 1 ? $MV : $OPERATION == 2 ? $CP : undef;
unless ($OP) {
    printf STDERR "Invalid operation: %d\n", $OPERATION;
    exit 1;
}

my @dest = ();      # array of destination directories
my @list = ();      # array of name list files
my @re_list = ();   # array of array of alternations regex patterns; subarray per name list file
my %dtable = ();    # distribution table: { file => string of name list indices } where index is terminated by ; e.g., 0;2;3;
my @files = ();     # array of array of files; subarray per destination directory
my $i = 0;          # name list index

@ARGV = map { decode('utf8', $_) } @ARGV;
for (@ARGV) {
    if (-d $_) {
        push @dest, $_;
        next;
    }
    if (-e $_) {
        push @list, $_;
        open(LIST, "<", $_) or die "$!";
        my @a = map {chomp; $_ ne '' ? quotemeta NFD($_) : () } <LIST>;
        close LIST;
        while (@a) {
            push @{$re_list[$i]}, sprintf "(%s)", join '|', splice(@a, 0, $MAX_ALTERNATIONS);
        }
        ++$i;
        next;
    }
    {
        printf STDERR "No such file or directory: %s\n", $_;
        exit 1;
    }
}
my $source = shift @dest;   # source directory
unless (@dest) {
    print STDERR "No destination directory is specified\n";
    exit 1;
}

# build @files
for $i (0 .. @list - 1) {
    last unless defined $dest[$i];  # list_i without destination_i is ignored, so its matches are never removed without being copied
    my @ff = ();
    for my $re ( @{$re_list[$i]} ) {
        local $/ = "\0";
        open(PIPEIN, "-|", "find", "-E", $source, "-type", "f",
            @{$EXCLUSIONS},                 # exclude dot file and Icon\r file
            "-regex", ".*${re}[^/]*\$",     # match $re in leaf node name
            "-print0") or die "$!";
        push @ff, map { chomp; $_ } <PIPEIN>;
        close PIPEIN or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
    }
    my %uniq = map { $_ => 1 } @ff;
    for (keys %uniq) {
        exists $dtable{$_} ? ( $dtable{$_} .= "$i;") : ( $dtable{$_} = "$i;" );
        exists $files[$i] ? ( push @{$files[$i]}, $_ ) : ( $files[$i] = [$_] );
    }
}
# - retrieve the rest and set $files[0 + @list] to it
if (@list < @dest) {
    local $/ = "\0";
    open(PIPEIN, "-|", "find", "-E", $source, "-type", "f",
        @{$EXCLUSIONS},     # exclude dot file and Icon\r file
        "-print0") or die "$!";
    my @rest = map { chomp; exists $dtable{$_} ? () : $_ } <PIPEIN>;
    close PIPEIN or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
    $files[0 + @list] = @rest ? [ @rest ] : undef;
}
@files = map { $_ ? [sort @{$_}] : undef } @files;  # sort is optional (it does not affect results but possibly mv/cp performance)
if ($DEBUG) {
    print Data::Dumper->Dump([\%dtable]);
    print Data::Dumper->Dump(\@files);
}

# distribute files
for $i (0 .. @list) {
    next unless $files[$i] and $dest[$i];
    local $, = "\0";
    local $\ = "\0";
    open(PIPEOUT, "|-", "xargs", "-0", "-J%", @{$OP}, "%", $dest[$i]) or die "$!";
    print PIPEOUT @{$files[$i]};
    close PIPEOUT or warn $! ? "Error closing pipe-out: $!" : "Wait status from pipe-out: $?";
}

# remove the original distributed to any destination if $OPERATION == 1
unless ($OPERATION == 2) {          # unless copy
    $OP = $RM if $OPERATION == 1;   # if move
    local $, = "\0";
    local $\ = "\0";
    open(PIPEOUT, "|-", "xargs", "-0", @{$OP}) or die "$!";
    print PIPEOUT sort keys %dtable;                            # the matched (sort is optional)
    print PIPEOUT @{$files[0 + @list]} if $files[0 + @list];    # the rest if defined
    close PIPEOUT or warn $! ? "Error closing pipe-out: $!" : "Wait status from pipe-out: $?";
}

Answer 14

bagrov Author

Dec 10, 2015 3:09 AM in response to Hiroto

Unfortunately, it doesn't work as I expected.

Maybe I could somehow 'hire' you, so that you wouldn't be working just for a 'thank you'?

Answer 15

Hiroto

Dec 12, 2015 7:17 AM in response to bagrov

Answering questions here is like solving a puzzle for me, and the solution is its own reward. Being hired means being obliged, which defeats the pleasure of solving a puzzle freely.

If you want to continue this thread, show minimal negative examples that demonstrate the "unexpected" results, along with the "expected" results, so that I can test them. Otherwise I'm done with this, because for me the problems have already been solved reasonably.

H
