Hello
Sorry for late reply. Here's an entirely new script to address your Q3. Since the requried process is fairly complicated, I've written it in Perl which I feel more comfortable with. You may call it from AppleScript using do shell script command.
Recipe.
1) Save the Perl script listed below as plain text file named distribute.pl in ~/Desktop.
2) Run the following command in Terminal to make it executable (if it's not yet so):
#!/bin/bash
chmod u+x ~/Desktop/distribute.pl
3) Use something like the following AppleScript script to invoke the Perl script:
(*
distribute.pl is assumed to be located at
~/Desktop/distribute.pl
*)
set source to "path:to:source"
set c1 to "path:to:category1"
set c2 to "path:to:category2"
set c3 to "path:to:category3"
set c4 to "path:to:category4"
set c1_names to "path:to:category1_names.txt"
set c2_names to "path:to:category2_names.txt"
set c3_names to "path:to:category3_names.txt"
set args to ""
repeat with a in {source, c1, c1_names, c2, c2_names, c3, c3_names, c4}
set args to args & (a as alias)'s POSIX path's quoted form & space
end repeat
set command to (path to desktop)'s POSIX path & "distribute.pl"
do shell script command's quoted form & " " & args
Notes.
- Script has three operation modes: 0) echo (test), 1) move and 2) copy. You may set it by $OPERATION, which is currently set to 1 for move.
- Script has debug mode, which will let it print some internal data structures. You may set it by $DEBUG, which is currently set to 1 for debug on.
- This script allows destination in source tree.
- Briefly tested with Perl v5.10.0 built for darwin-thread-multi-2level under OS X 10.6.8 but no warranties.
Perl script:
#!/usr/bin/perl -w
#
# file:
# distribute.pl
#
# arguments:
# $ARGV[0] : source_directory
# $ARGV[1].. : destination_0 list_0 ... destionation_n list_n destination_r
#
# function:
# - file in source directory is classified to set_i if the name contains some name in list_i for any i in I = [0, n]; and
# file_k for any k in K = [0, m] shared in set_j for all j in J which is a subset of I is distributed to destination_h,
# where h = J[k % |J|] that is (k % |J|)'th index in J. J is processed in descending order of its size.
# This way, files shared in set_j for all j in J are evenly distributed amongst destination_j for any j in J.
#
# - file in source_directory which has not been classified to set_i for any i in I is distributed to destination_r
# if destination_r is specified.
#
# - if there's no corresponding destination_i for list_i, list_i is ignored.
# - if there's no corresponding list_i for destination_i except for destination_r, destination_i is ignored.
# - destination_i and list_i may be either interleaved or separated in arguments list
# - list_i is assumed to have text in UTF-8.
#
# - dot file is ignored
# - Icon\r file is ignored
#
# - operation mode is specified by $OPERATION in script (currently set to 1):
# $OPERATION = 0|1|2
# 0 : $OP = $ECHO = ["echo"]
# 1 : $OP = $MV = ["mv"]
# 2 : $OP = $CP = ["cp", "-pR"]
# - debug mode is specified by $DEBUG in script (currently set to 0):
# $DEBUG = 0|1
# 0 : no debug output
# 1 : debug output for internal data structures
#
# version:
# v0.12d2
#
# written by Hiroto, 2015-12
#
# E.g.:
# Given files in source tree:
# source/abc1
# source/abc2
# source/abc3
# source/abc4
# source/abd
# source/bcd
#
# destination directories:
# c1, c2, c3, c4
#
# name list files:
# c1_names.txt => abc
# c2_names.txt => abc
# c3_names.txt => ab
#
# command:
# ./distribute.pl source c1 c1_names.txt c2 c2_names.txt c3 c3_names.txt c4
#
# distribution result will be:
# c1/abc1
# c1/abc4
#
# c2/abc2
#
# c3/abc3
# c3/abd
#
# c4/bcd
#
use strict;
use Encode;
use encoding 'utf8';
use open IO => ':utf8';
use Unicode::Normalize;
use Data::Dumper;
my $DEBUG = 1; # debug flag (0|1): 0 => no debug output, 1 => debug output
my $OPERATION = 1; # operation of this script (0|1|2): 0 => echo (test), 1 => move, 2 => copy
my $MAX_ALTERNATIONS = 100; # max number of alternations in single pattern
my $EXCLUSIONS = [ # find(1) expressions to exclude certain files
"!", "-name", ".*", # - exclude dot file
"!", "-name", "Icon\r", # - exclude Icon\r file
];
my $ECHO = ["echo"]; # echo command for test
my $MV = ["mv"]; # mv(1) command and options
my $CP = ["cp", "-pR"]; # cp(1) command and options
my $OP = $OPERATION == 0 ? $ECHO :
$OPERATION == 1 ? $MV :
$OPERATION == 2 ? $CP : undef;
unless ($OP) { printf STDERR "Invalid operation: %d\n", $OPERATION; exit 1; }
my @dest = (); # array of destination directories
my @list = (); # array of name list files
my @re_list = (); # array of array of alternations regex patterns; subarray per name list file
my %dtable = (); # distribution table: { file => string of name list indices } where index is terminated by ; e.g., 0;2;3;
my %dtable_r = (); # reverse distribution table: { string of name list indices => array of files }
my @files = (); # array of array of files; subarray per destination directory
my $i = 0; # name list index
@ARGV = map { decode('utf8', $_) } @ARGV;
for (@ARGV) {
if (-d $_) {
push @dest, $_;
next;
}
if (-e $_) {
push @list, $_;
open(LIST, "<", $_) or die "$!";
my @a = map {chomp; $_ ne '' ? quotemeta NFD($_) : () } <LIST>;
close LIST;
while (@a) {
push @{$re_list[$i]}, sprintf "(%s)", join '|', splice(@a, 0, $MAX_ALTERNATIONS);
}
++$i;
next;
}
{ printf STDERR "No such file or directory: %s\n", $_; exit 1; }
}
my $source = shift @dest; # source directory
unless (@dest) { print STDERR "No destination directory is specified\n"; exit 1; }
# build %dtable
for $i (0 .. @list - 1) {
my @ff = ();
for my $re ( @{$re_list[$i]} ) {
local $/ = "\0";
open(PIPEIN, "-|",
"find", "-E", $source, "-type", "f",
@{$EXCLUSIONS}, # exclude dot file and Icon\r file
"-regex", ".*${re}[^/]*\$", # match $re in leaf node name
"-print0") or die "$!";
push @ff, map { chomp; $_ } <PIPEIN>;
close PIPEIN
or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
}
my %uniq = map { $_ => 1 } @ff;
for (keys %uniq) {
exists $dtable{$_} ? ( $dtable{$_} .= "$i;") : ( $dtable{$_} = "$i;" );
}
}
# build %dtable_r
while (my ($k, $v) = each %dtable) {
exists $dtable_r{$v} ? ( push @{$dtable_r{$v}}, $k ) : ( $dtable_r{$v} = [$k] );
}
for (keys %dtable_r) {
$dtable_r{$_} = [ sort @{$dtable_r{$_}} ]; # sort is optional if lexically unordered distribution is fine
}
if ($DEBUG) {
print Data::Dumper->Dump([\%dtable]);
print Data::Dumper->Dump([\%dtable_r]);
}
# build @files
for my $m (reverse 1 .. @list) { # e.g., $m in (3, 2, 1)
my $cc = &combination([0 .. @list - 1], $m); # e.g., $cc = [[0, 1], [0, 2], [1, 2]] for $m = 2
for my $c ( @{$cc} ) { # e.g., $c = [0, 1]
my $d = sprintf '%d;' x $m, @{$c}; # e.g., $d = '0;1;' for $c = [0, 1]
my $k = 0;
for my $f ( @{$dtable_r{$d}} ) {
my $j = $c->[$k++ % $m]; # rotate distribution index
push @{$files[$j]}, $f;
}
}
}
# - retrieve the rest and set $files[0 + @list] to it
if (@list < @dest) {
local $/ = "\0";
open(PIPEIN, "-|",
"find", "-E", $source, "-type", "f",
@{$EXCLUSIONS}, # exclude dot file and Icon\r file
"-print0") or die "$!";
my @rest = map { chomp; exists $dtable{$_} ? () : $_ } <PIPEIN>;
close PIPEIN
or warn $! ? "Error closing pipe-in: $!" : "Wait status from pipe-in: $?";
$files[0 + @list] = @rest ? [ @rest ] : undef;
}
@files = map { $_ ? [sort @{$_}] : undef } @files; # sort is optional (it does not affect results but possibly mv/cp performance)
if ($DEBUG) {
print Data::Dumper->Dump(\@files);
}
# distribute files
for $i (0 .. @list) {
next unless $files[$i] and $dest[$i];
local $, = "\0";
local $\ = "\0";
open(PIPEOUT, "|-", "xargs", "-0", "-J%", @{$OP}, "%", $dest[$i]) or die "$!";
print PIPEOUT @{$files[$i]};
close PIPEOUT
or warn $! ? "Error closing pipe-out: $!" : "Wait status from pipe-out: $?";
}
sub combination($$) {
# $ : (array ref) list of elements
# $ : (int) number of elements in each combination
# return (array ref) array of combinations
#
# * array may contain repeated elements
my ($aa, $n) = @_;
my ($bb, $cc) = ({}, []);
return [] if $n < 1;
if ($n == 1) {
for my $e ( @{$aa} ) {
push @{$cc}, [$e] unless exists $bb->{$e};
$bb->{$e} = 1;
}
return $cc;
}
my $aa1 = [ @{$aa} ]; # local copy
for ( 0 .. @{$aa1} - $n ) {
my $e = shift @{$aa1};
if ( exists $bb->{$e} ) {} # prune this branch
else {
for my $c ( @{&combination($aa1, $n - 1)} ) {
push @{$cc}, [$e, @{$c}];
}
$bb->{$e} = 1; # add this to prune list
}
}
return $cc;
}
Good luck,
H