Why won't Numbers 09 open when a tab delimited file is placed over the Numbers icon in Applications
Am I doing something wrong? That's how Help told me to do it.
Mac mini (Late 2012), OS X Mountain Lion (10.8.3)
Am I doing something wrong? That's how Help told me to do it.
Mac mini (Late 2012), OS X Mountain Lion (10.8.3)
I've finally figured out one way to allow the user to specify the delimiter (separator character) used in a source file without having to go in and modify the script. After right-clicking a file in Finder and choosing from the Services contextual menu, this dialog will pop up:
Revised Automator Service here (Dropbox download).
SG
Hello SG,
Here's my version of a CSV to TSV converter written in AppleScript.
It took me a while to write new csv2array() handler afresh, for logic of N.G.'s script seems overly complicated and a little unclear for me to follow. New handler implements some heuristic logic to probe record separator and field separator in given csv file. Since the probed value is merely educated guess, script will show a list of possible field separators with the probed value pre-selected and let the user choose one. On the other hand, it will use the probed value of record separator without asking the user, for this guess is more plausible.
The array2tsv() hadler will convert the 2d-array to tsv, where tab and possbile record separators in field data are replaced with predefind replacement characters, that is, tab by ⇥ U+21E5 RIGHTWARDS ARROW TO BAR; CRLF, LF, CR and U+2029 PARAGRAPH SEPARATOR by ↩ U+21A9 LEFTWARDS ARROW WITH HOOK. These replacement characters may be replaced with the original after imported into spreadsheet. (You may copy and paste these arrow charaters into find & replace dialogue or you may directly input them by using Unicode Hex Input text input source, that is, by holding down option key and typing the hexadecimal digits of code point such as 21E5 and 21A9.)
In its current form, the script is to be run as stand-alone script, for instance, by running it in AppleScript Editor or from script menu or as applet. In order to use it as Run AppleScript action in Automator workflow to make a service, change the run handler from (B) to (A) at the beginning of script.
By the way, csv2array() and array2tsv() handlers have some options which are not used in this script.
Codes have been moderately tested under 10.6.8.
Have fun!
All the best.
H
(*
CSV to TSV in clipboard
csv2array v0.21
array2tsv v0.11
*)
-- (A) run handler for automator action
(*
on run {input, parameters}
-- input = alias list of csv file(s)
_main(input's item 1's POSIX path)
end run
*)
-- (B) (implicit) run handler for stand-alone script
set f to (choose file with prompt "Choose source csv file")'s POSIX path
_main(f)
on _main(f)
(*
string f : POSIX path of csv file
*)
script o
-- FS options = , ; | HT SPACE*4 SPACE*2 SPACE
property |*FS| : {",", ";", "|", tab, (space & space & space & space), (space & space), space}
try
-- UTF-8
set csv to read POSIX file f as «class utf8»
on error
do shell script "f=" & f's quoted form & ";[[ $(file -I \"$f\") =~ charset=utf-16 ]] && echo 1 || echo 0"
if result + 0 = 1 then
-- UTF-16LE or UTF_16BE
set csv to read POSIX file f as Unicode text
else
-- system's primary encoding
set csv to read POSIX file f
end if
end try
set {|FS|:|~FS|, |RS|:|~RS|} to csv2array(csv, {|PROBE|:1})
set cc to unicode_codepoint_and_name(|*FS|)
set cc1 to {}
repeat with c in cc
set c to c's contents
if _split(c, " U+")'s item 1 = |~FS| then
set cc1 to {c}
exit repeat
end if
end repeat
tell application "SystemUIServer"
activate
set r to choose from list cc default items cc1 ¬
with title ("Select CSV Field Separator") ¬
with prompt ("current selection is the probed value")
end tell
if r = false then error number -128
set |~FS| to _split(r's item 1, " U+")'s item 1
--set t0 to (do shell script "/usr/bin/ruby -e 'puts Time.now.to_f'") as number
set array to csv2array(csv, {|FS|:|~FS|, |RS|:|~RS|})
set tsv to array2tsv(array, {|FS|:tab, |RS|:linefeed})
--set t1 to (do shell script "/usr/bin/ruby -e 'puts Time.now.to_f'") as number
set the clipboard to tsv
tell application "SystemUIServer"
activate
display dialog "TSV text is in clipboard." with title "Complete" giving up after 2
end tell
--return {t1 - t0, tsv}
return tsv
end script
tell o to run
end _main
on array2tsv(aa, options) -- v0.11
(*
list aa : source 2d-array
record options : {|FS|:FS, |RS|:RS, |%FS|:FS_ESCAPE, |%RS|:RS_ESCAPE, |QUOTE|:QUOTE}
FS = (string) field separator; default = HT (tab)
RS = (string) record separator; default = LF (linefeed)
FS_ESCAPE = (string) escape sequence for FS in original field data; default = U+21E5 RIGHTWARDS ARROW TO BAR ⇥
RS_ESCAPE = (string) escape sequence for RS in original field data; default = U+21A9 LEFTWARDS ARROW WITH HOOK ↩
QUOTE = (integer) quoting option for field data; default = 0
0 = no quoting
1 = quote every field by placing ' (U+0027 APOSTROPHE) at the beginning of field data
2 = quote every field not starting with = by placing ' (U+0027 APOSTROPHE) at the beginning of field data
(i.e., leave formula in spreadsheet alone)
return string : tsv representation of aa,
where FS and RS in field data are escaped as FS_ESCAPE and RS_ESCAPE respectively,
and CRLF, LF, CR, U+2029 PARAGRAPH SEPARATOR in field data are all escaped as RS_ESCAPE.
* because of the limitation of tsv data representation which does not have decent quoting mechanism,
some tokens in the original field data will be replaced by other characters.
I.e, as defined in the properties below,
|FS| is replaced by |%FS|
|RS| and |*RS| = {CRLF, LF, CR, U+2029} are replaced by |%RS|
*)
script o
property pp : aa
property qq : {}
property |U+21E5| : character id 8677 -- U+21E5 RIGHTWARDS ARROW TO BAR ⇥
property |U+21A9| : character id 8617 -- U+21A9 LEFTWARDS ARROW WITH HOOK ↩
--property |U+2028| : character id 8232 -- U+2028 LINE SEPARATOR -- [1]
property |U+2029| : character id 8233 -- U+2029 PARAGRAPH SEPARATOR
property |_FS| : character id 65534 -- U+FFFE <not a character>
property |_RS| : character id 65535 -- U+FFFF <not a character>
property |*RS| : {return & linefeed, linefeed, return, |U+2029|} -- CRLF, LF, CR, U+2029 PARAGRAPH SEPARATOR -- [2]
property opt : options & {|FS|:tab, |RS|:linefeed, |%FS|:|U+21E5|, |%RS|:|U+21A9|, |QUOTE|:0}
property |FS| : opt's |FS|
property |RS| : opt's |RS|
property |%FS| : opt's |%FS|
property |%RS| : opt's |%RS|
if |%RS| is in |*RS| then error "Invalid escape for record separator: id = " & _join(|%RS|'s id as list, " ") number 8000
if |%FS| is in {tab} & |*RS| then error "Invalid escape for field separator: id = " & _join(|%FS|'s id as list, " ") number 8001
if opt's |QUOTE| = 1 then
-- quote every cell by prepending ' except for the already quoted
repeat with i from 1 to count my pp
repeat with j from 1 to count my pp's item i
set p to my pp's item i's item j
if p starts with "'" then
else
set my pp's item i's item j to "'" & p
end if
end repeat
end repeat
else if opt's |QUOTE| = 2 then
-- quote non-formula cell by prepending ' except for the already quoted
repeat with i from 1 to count my pp
repeat with j from 1 to count my pp's item i
set p to my pp's item i's item j
if p starts with "'" or p starts with "=" then
else
set my pp's item i's item j to "'" & p
end if
end repeat
end repeat
end if
repeat with p in my pp
set my qq's end to _join(p's contents, |_FS|)
end repeat
set t to _join(qq, |_RS|)
repeat with c in my |*RS| -- [2]
set t to _gsub(t, c's contents, |%RS|)
end repeat
if |RS| is not in |*RS| then set t to _gsub(t, |RS|, |%RS|) -- not probable for tsv but possible (?)
set t to _gsub(t, |FS|, |%FS|)
set t to _gsub(t, |_RS|, |RS|)
set t to _gsub(t, |_FS|, |FS|)
return t
(*
[1] U+2028 often shows strange behaviour in cell, e.g., it separates lines only when text in the cell has focus.
So it is not advised to use it as |%RS|.
[2] These characters are treated as record separator in TSV importer (in paste operation) in OSX.
So they must all be escaped.
*)
end script
considering case -- to increase performance in some text processing
tell o to run
end considering
end array2tsv
on csv2array(t, options) -- v0.12
(*
string t : source csv text
record options : {|FS|:FS, |RS|:RS, |PROBE|:PROBE}
FS = (string) field separator, default = COMMA (,)
RS = (string) record separator; default = CRLF (return & linefeed)
PROBE = (integer) probe option; default = 0
0 = no probe
1 = probe csv spec and return the spec
2 = probe csv spec and process the csv using the spec
return list (PROBE = 0, 2) : 2d-array (list of lists) representing records of fields in csv
return record (PROBE = 1) : {|RS|:probed_RS, |FS|: probed_FS, |RS_ID|:probed_RS_id, |FS_ID|:probed_FS_id}
* empty records are ignored
*)
script o
property pp : {}
property qq : {}
property |_FS| : character id 28 -- U+001C INFORMATION SEPARATOR FOUR
property |_RS| : character id 30 -- U+001E INFORMATION SEPARATOR TWO
property |_TS| : character id 29 -- U+001D INFORMATION SEPARATOR THREE
property |PROBE_LENGTH| : 1000
property |*RS| : {return & linefeed, linefeed, return} -- RS candidates = CRLF, LF, CR
--property |*FS| : {",", ";", "|", tab} -- FS candidates = , ; | HT
property |*FS| : {",", ";", "|", tab, space & space & space & space, space & space, space} -- FS candidates = , ; | HT SP*4 SP*2 SP
property opt : options & {|FS|:|*FS|'s item 1, |RS|:|*RS|'s item 1, |PROBE|:0}
property |FS| : opt's |FS|
property |RS| : opt's |RS|
property |QQ| : "\""
on |?:|(a, b, c)
if a then return b
return c
end |?:|
on _probe()
local len, s, p, q, qx
set len to |?:|((count t) < |PROBE_LENGTH|, count t, |PROBE_LENGTH|)
set s to t's text 1 thru len
set pp to _split(s, |QQ|)
-- probe RS
(* get the first line terminator sequence in the un-quoted substrings in probe range *)
set |~RS| to |RS| -- fallback
repeat with i from 1 to count pp by 2
set p to my pp's item i -- non-quoted substring when i is odd integer
if (count p's paragraphs) > 1 then
tell p
set |~RS| to (text 1 thru paragraph 2)'s text ((count paragraph 1) + 1) thru -((count paragraph 2) + 1)
end tell
exit repeat
end if
end repeat
-- probe FS
(* get the FS which can yield the maxium and the equal number of fields in records in probe range *)
set |~FS| to |FS| -- fallback
set qx to 0
repeat with c in |*FS| -- for each FS candidates
set c to c's contents
set pp to csv2array(s, {|FS|:c, |RS|:|~RS|, |PROBE|:0})
set q to count (my pp's item 1) -- count of fields in the first record
repeat with i from 2 to (count my pp) - 1 -- ignore the last record which may be incomplete
if q - (count my pp's item i) ≠ 0 then -- field count is not constant
set q to 0 -- thus reject this
exit repeat
end if
end repeat
if q > qx then
set qx to q -- maximum count of fields by this FS thus far
if qx > 0 then set |~FS| to c
end if
end repeat
set {|RS|, |FS|} to {|~RS|, |~FS|}
return {|RS|:|~RS|, |FS|:|~FS|, |RS_ID|:|~RS|'s id as list, |FS_ID|:|~FS|'s id as list}
end _probe
(*
(0) probe csv spec
*)
if opt's |PROBE| = 1 then
return _probe()
else if opt's |PROBE| = 2 then
_probe()
end if
(*
(1) generate escape sequences for |FS| and |RS|, |_FS| and |_RS| respectively, which are not present in source text.
and temporary separator |_TS| to be used in joining and splitting non-quoted substrings
*)
set c to |_FS|
repeat while c is in t
set c to c & |_FS|
end repeat
set |_FS| to c
set c to |_RS|
repeat while c is in t
set c to c & |_RS|
end repeat
set |_RS| to c
set c to |_TS|
repeat while c is in t
set c to c & |_TS|
end repeat
set |_TS| to c
(*
(2) preprocess the source text such that -
a) |FS| and |RS| are replaced with |_FS| and |_RS| respectively in non-quoted parts; and
b) |QQ| are removed except for consecutive double |QQ|, which are replaced with single |QQ|
*)
-- split t by quote character |QQ|
set pp to _split(t, |QQ|)
set ix to count my pp
-- extract odd substrings, which are non-quoted substrings
repeat with i from 1 to ix by 2
set my qq's end to my pp's item i -- non-quoted substring when i is odd integer
end repeat
-- escape |FS| and |RS| in qq
set s to _join(qq, |_TS|)
set s to _gsub(s, |FS|, |_FS|)
set s to _gsub(s, |RS|, |_RS|)
set qq to _split(s, |_TS|)
-- replace empty string in qq with |QQ| except for those at the ends of pp
repeat with j from 2 to ix div 2
(* empty string in qq except for those at the ends of pp is derived from consecutive double |QQ| in t,
which is here replaced with single |QQ|. *)
if my qq's item j = "" then set my qq's item j to |QQ|
end repeat
-- replace odd items in pp with qq pre-processed
repeat with i from 1 to ix by 2
set my pp's item i to my qq's item ((i + 1) div 2)
end repeat
-- join pp with null joiner, which effectively removs original |QQ| in t
set t to _join(pp, "")
(*
(3) convert t to 2d-array by using |_RS| and |_FS|
*)
set pp to _split(t, |_RS|)
set qq to {}
repeat with p in my pp
set p to p's contents
if p ≠ "" then set my qq's end to _split(p, |_FS|) -- empty record is ignored
end repeat
return my qq's contents
end script
considering case -- to increase performance in some text processing
tell o to run
end considering
end csv2array
on _split(t, d) -- stack overflow safe and fast for generating large list
(*
text t : source text
text d : delimiter string
return list : list of text items delimited by d (d is not inclusive)
*)
local asitd0, ix, j, tt
set tt to {}
try
set astid0 to AppleScript's text item delimiters
set AppleScript's text item delimiters to {} & d
set ix to count t's text items
repeat with i from 1 to ix by 3500
set j to i + 3499
if j > ix then set j to ix
set tt to tt & t's text items i thru j
end repeat
set AppleScript's text item delimiters to astid0
on error errs number errn
set AppleScript's text item delimiters to astid0
error "_split(): " & errs number errn
end try
return tt
end _split
on _join(tt, d)
(*
list tt : source list
string d : separator
return string : tt joined with d
*)
local astid0, t
try
set astid0 to AppleScript's text item delimiters
set AppleScript's text item delimiters to {} & d
set t to "" & tt
set AppleScript's text item delimiters to astid0
on error errs number errn
set AppleScript's text item delimiters to astid0
error "_join(): " & errs number errn
end try
return t
end _join
on _gsub(t, s, r)
(*
string t, s, r : source string, search string, replace string
return string : every occurence of s being replaced by r in t
*)
return _join(_split(t, s), r)
end _gsub
on unicode_codepoint_and_name(cc)
(*
list cc : character list
return list : list of character, codepoint and name
e.g., given cc = ("a", "人人"}, result =
{"é U+00E9 LATIN SMALL LETTER E ACUTE;",
"人人 U+4EBA CJK UNIFIED IDEOGRAPH-4EBA; U+4EBA CJK UNIFIED IDEOGRAPH-4EBA;"}
*)
set argv to ""
repeat with c in cc
set argv to argv & " " & c's quoted form
end repeat
do shell script "/usr/bin/perl -CSDA <<'EOF' - " & argv & "
use Unicode::UCD 'charinfo';
for (@ARGV) {
print $_;
for (split //) {
my $ci = charinfo(ord $_);
my ($code, $name, $uc10) = @{$ci}{'code', 'name', 'unicode10'};
print qq( U+$code ) . ($uc10 ? $uc10 : $name) . qq(;);
}
print qq(\\n);
}
EOF"
result's paragraphs
end unicode_codepoint_and_nameThanks H!
Got a lot to learn here.🙂
So far its guesses are right on the mark.
Best,
SG
Why won't Numbers 09 open when a tab delimited file is placed over the Numbers icon in Applications