Why won't Numbers 09 open when a tab delimited file is placed over the Numbers icon in Applications

Question

nathanschwam Author

Level 1

9 points

Why won't Numbers 09 open when a tab delimited file is placed over the Numbers icon in Applications

Am I doing something wrong? That's how Help told me to do it.

Mac mini (Late 2012), OS X Mountain Lion (10.8.3)

Posted on Feb 28, 2014 5:40 AM

Reply

Answer 1

SGIII

Level 8

37,550 points

Mar 4, 2014 7:57 PM in response to SGIII

I've finally figured out one way to allow the user to specify the delimiter (separator character) used in a source file without having to go in and modify the script. After right-clicking a file in Finder and choosing from the Services contextual menu, this dialog will pop up:

Revised Automator Service here (Dropbox download).

SG

Reply

Answer 2

Hiroto

Level 5

7,467 points

Mar 6, 2014 7:50 PM in response to SGIII

Hello SG,

Here's my version of a CSV to TSV converter written in AppleScript.

It took me a while to write new csv2array() handler afresh, for logic of N.G.'s script seems overly complicated and a little unclear for me to follow. New handler implements some heuristic logic to probe record separator and field separator in given csv file. Since the probed value is merely educated guess, script will show a list of possible field separators with the probed value pre-selected and let the user choose one. On the other hand, it will use the probed value of record separator without asking the user, for this guess is more plausible.

The array2tsv() hadler will convert the 2d-array to tsv, where tab and possbile record separators in field data are replaced with predefind replacement characters, that is, tab by ⇥ U+21E5 RIGHTWARDS ARROW TO BAR; CRLF, LF, CR and U+2029 PARAGRAPH SEPARATOR by ↩ U+21A9 LEFTWARDS ARROW WITH HOOK. These replacement characters may be replaced with the original after imported into spreadsheet. (You may copy and paste these arrow charaters into find & replace dialogue or you may directly input them by using Unicode Hex Input text input source, that is, by holding down option key and typing the hexadecimal digits of code point such as 21E5 and 21A9.)

In its current form, the script is to be run as stand-alone script, for instance, by running it in AppleScript Editor or from script menu or as applet. In order to use it as Run AppleScript action in Automator workflow to make a service, change the run handler from (B) to (A) at the beginning of script.

By the way, csv2array() and array2tsv() handlers have some options which are not used in this script.

Codes have been moderately tested under 10.6.8.

Have fun!

All the best.

H

(*
    CSV to TSV in clipboard
    
        csv2array v0.21
        array2tsv v0.11
*)
-- (A) run handler for automator action
(*
on run {input, parameters}
    -- input = alias list of csv file(s)
    _main(input's item 1's POSIX path)
end run
*)

-- (B) (implicit) run handler for stand-alone script
set f to (choose file with prompt "Choose source csv file")'s POSIX path
_main(f)

on _main(f)
    (*
        string f : POSIX path of csv file
    *)
    script o
        -- FS options = , ; | HT SPACE*4 SPACE*2 SPACE
        property |*FS| : {",", ";", "|", tab, (space & space & space & space), (space & space), space}
        
        try
            -- UTF-8
            set csv to read POSIX file f as «class utf8»
        on error
            do shell script "f=" & f's quoted form & ";[[ $(file -I \"$f\") =~ charset=utf-16 ]] && echo 1 || echo 0"
            if result + 0 = 1 then
                -- UTF-16LE or UTF_16BE
                set csv to read POSIX file f as Unicode text
            else
                -- system's primary encoding
                set csv to read POSIX file f
            end if
        end try
        
        set {|FS|:|~FS|, |RS|:|~RS|} to csv2array(csv, {|PROBE|:1})
        set cc to unicode_codepoint_and_name(|*FS|)
        set cc1 to {}
        repeat with c in cc
            set c to c's contents
            if _split(c, " U+")'s item 1 = |~FS| then
                set cc1 to {c}
                exit repeat
            end if
        end repeat
        tell application "SystemUIServer"
            activate
            set r to choose from list cc default items cc1 ¬
                with title ("Select CSV Field Separator") ¬
                with prompt ("current selection is the probed value")
        end tell
        if r = false then error number -128
        set |~FS| to _split(r's item 1, " U+")'s item 1
        
        --set t0 to (do shell script "/usr/bin/ruby -e 'puts Time.now.to_f'") as number
        set array to csv2array(csv, {|FS|:|~FS|, |RS|:|~RS|})
        set tsv to array2tsv(array, {|FS|:tab, |RS|:linefeed})
        --set t1 to (do shell script "/usr/bin/ruby -e 'puts Time.now.to_f'") as number
        set the clipboard to tsv
        
        tell application "SystemUIServer"
            activate
            display dialog "TSV text is in clipboard." with title "Complete" giving up after 2
        end tell
        --return {t1 - t0, tsv}
        return tsv
    end script
    tell o to run
end _main

on array2tsv(aa, options) -- v0.11
    (*
        list aa : source 2d-array
        record options : {|FS|:FS, |RS|:RS, |%FS|:FS_ESCAPE, |%RS|:RS_ESCAPE, |QUOTE|:QUOTE}
            FS = (string) field separator; default = HT (tab)
            RS = (string) record separator; default = LF (linefeed)
            FS_ESCAPE = (string) escape sequence for FS in original field data; default = U+21E5 RIGHTWARDS ARROW TO BAR ⇥
            RS_ESCAPE = (string) escape sequence for RS in original field data; default = U+21A9 LEFTWARDS ARROW WITH HOOK ↩
            QUOTE = (integer) quoting option for field data; default = 0
                0 = no quoting
                1 = quote every field by placing ' (U+0027 APOSTROPHE) at the beginning of field data
                2 = quote every field not starting with = by placing ' (U+0027 APOSTROPHE) at the beginning of field data
                    (i.e., leave formula in spreadsheet alone)
        return string : tsv representation of aa,
            where FS and RS in field data are escaped as FS_ESCAPE and RS_ESCAPE respectively,
            and CRLF, LF, CR, U+2029 PARAGRAPH SEPARATOR in field data are all escaped as RS_ESCAPE.
        
        * because of the limitation of tsv data representation which does not have decent quoting mechanism,
         some tokens in the original field data will be replaced by other characters.
         I.e, as defined in the properties below,
             |FS| is replaced by |%FS|
            |RS| and |*RS| = {CRLF, LF, CR, U+2029} are replaced by |%RS|
    *)
    script o
        property pp : aa
        property qq : {}
        property |U+21E5| : character id 8677 -- U+21E5 RIGHTWARDS ARROW TO BAR ⇥
        property |U+21A9| : character id 8617 -- U+21A9 LEFTWARDS ARROW WITH HOOK ↩
        --property |U+2028| : character id 8232 -- U+2028 LINE SEPARATOR -- [1]
        property |U+2029| : character id 8233 -- U+2029 PARAGRAPH SEPARATOR
        property |_FS| : character id 65534 -- U+FFFE <not a character>
        property |_RS| : character id 65535 -- U+FFFF <not a character>
        property |*RS| : {return & linefeed, linefeed, return, |U+2029|} -- CRLF, LF, CR, U+2029 PARAGRAPH SEPARATOR -- [2]
        
        property opt : options & {|FS|:tab, |RS|:linefeed, |%FS|:|U+21E5|, |%RS|:|U+21A9|, |QUOTE|:0}
        property |FS| : opt's |FS|
        property |RS| : opt's |RS|
        property |%FS| : opt's |%FS|
        property |%RS| : opt's |%RS|
        
        if |%RS| is in |*RS| then error "Invalid escape for record separator: id = " & _join(|%RS|'s id as list, " ") number 8000
        if |%FS| is in {tab} & |*RS| then error "Invalid escape for field separator: id = " & _join(|%FS|'s id as list, " ") number 8001
        if opt's |QUOTE| = 1 then
            -- quote every cell by prepending ' except for the already quoted
            repeat with i from 1 to count my pp
                repeat with j from 1 to count my pp's item i
                    set p to my pp's item i's item j
                    if p starts with "'" then
                    else
                        set my pp's item i's item j to "'" & p
                    end if
                end repeat
            end repeat
        else if opt's |QUOTE| = 2 then
            -- quote non-formula cell by prepending ' except for the already quoted
            repeat with i from 1 to count my pp
                repeat with j from 1 to count my pp's item i
                    set p to my pp's item i's item j
                    if p starts with "'" or p starts with "=" then
                    else
                        set my pp's item i's item j to "'" & p
                    end if
                end repeat
            end repeat
        end if
        
        repeat with p in my pp
            set my qq's end to _join(p's contents, |_FS|)
        end repeat
        set t to _join(qq, |_RS|)
        repeat with c in my |*RS| -- [2]
            set t to _gsub(t, c's contents, |%RS|)
        end repeat
        if |RS| is not in |*RS| then set t to _gsub(t, |RS|, |%RS|) -- not probable for tsv but possible (?)
        set t to _gsub(t, |FS|, |%FS|)
        set t to _gsub(t, |_RS|, |RS|)
        set t to _gsub(t, |_FS|, |FS|)
        return t
        (*
            [1] U+2028 often shows strange behaviour in cell, e.g., it separates lines only when text in the cell has focus.
                So it is not advised to use it as |%RS|.
            [2] These characters are treated as record separator in TSV importer (in paste operation) in OSX.
                So they must all be escaped.
        *)
    end script
    considering case -- to increase performance in some text processing
        tell o to run
    end considering
end array2tsv

on csv2array(t, options) -- v0.12 
    (*
        string t : source csv text
        record options : {|FS|:FS, |RS|:RS, |PROBE|:PROBE}
            FS = (string) field separator, default = COMMA (,)
            RS = (string) record separator; default = CRLF (return & linefeed)
            PROBE = (integer) probe option; default = 0
                0 = no probe
                1 = probe csv spec and return the spec
                2 = probe csv spec and process the csv using the spec
            
        return list (PROBE = 0, 2) : 2d-array (list of lists) representing records of fields in csv
        return record (PROBE = 1)  : {|RS|:probed_RS, |FS|: probed_FS, |RS_ID|:probed_RS_id, |FS_ID|:probed_FS_id}
        
        * empty records are ignored
    *)
    script o
        property pp : {}
        property qq : {}
        property |_FS| : character id 28 -- U+001C INFORMATION SEPARATOR FOUR
        property |_RS| : character id 30 -- U+001E INFORMATION SEPARATOR TWO
        property |_TS| : character id 29 -- U+001D INFORMATION SEPARATOR THREE
        
        property |PROBE_LENGTH| : 1000
        property |*RS| : {return & linefeed, linefeed, return} -- RS candidates = CRLF, LF, CR
        --property |*FS| : {",", ";", "|", tab} -- FS candidates = , ; | HT
        property |*FS| : {",", ";", "|", tab, space & space & space & space, space & space, space} -- FS candidates = , ; | HT SP*4 SP*2 SP
        
        property opt : options & {|FS|:|*FS|'s item 1, |RS|:|*RS|'s item 1, |PROBE|:0}
        property |FS| : opt's |FS|
        property |RS| : opt's |RS|
        property |QQ| : "\""
        
        on |?:|(a, b, c)
            if a then return b
            return c
        end |?:|
        
        on _probe()
            local len, s, p, q, qx
            set len to |?:|((count t) < |PROBE_LENGTH|, count t, |PROBE_LENGTH|)
            set s to t's text 1 thru len
            set pp to _split(s, |QQ|)
            
            -- probe RS
            (* get the first line terminator sequence in the un-quoted substrings in probe range *)
            set |~RS| to |RS| -- fallback
            repeat with i from 1 to count pp by 2
                set p to my pp's item i -- non-quoted substring when i is odd integer
                if (count p's paragraphs) > 1 then
                    tell p
                        set |~RS| to (text 1 thru paragraph 2)'s text ((count paragraph 1) + 1) thru -((count paragraph 2) + 1)
                    end tell
                    exit repeat
                end if
            end repeat
            
            -- probe FS
            (* get the FS which can yield the maxium and the equal number of fields in records in probe range *)
            set |~FS| to |FS| -- fallback
            set qx to 0
            repeat with c in |*FS| -- for each FS candidates
                set c to c's contents
                set pp to csv2array(s, {|FS|:c, |RS|:|~RS|, |PROBE|:0})
                set q to count (my pp's item 1) -- count of fields in the first record
                repeat with i from 2 to (count my pp) - 1 -- ignore the last record which may be incomplete
                    if q - (count my pp's item i) ≠ 0 then -- field count is not constant
                        set q to 0 -- thus reject this
                        exit repeat
                    end if
                end repeat
                if q > qx then
                    set qx to q -- maximum count of fields by this FS thus far
                    if qx > 0 then set |~FS| to c
                end if
            end repeat
            
            set {|RS|, |FS|} to {|~RS|, |~FS|}
            return {|RS|:|~RS|, |FS|:|~FS|, |RS_ID|:|~RS|'s id as list, |FS_ID|:|~FS|'s id as list}
        end _probe
        
        (*
            (0) probe csv spec
        *)
        if opt's |PROBE| = 1 then
            return _probe()
        else if opt's |PROBE| = 2 then
            _probe()
        end if
        
        (*
            (1) generate escape sequences for |FS| and |RS|, |_FS| and |_RS| respectively, which are not present in source text.
                and temporary separator |_TS| to be used in joining and splitting non-quoted substrings
        *)
        set c to |_FS|
        repeat while c is in t
            set c to c & |_FS|
        end repeat
        set |_FS| to c
        set c to |_RS|
        repeat while c is in t
            set c to c & |_RS|
        end repeat
        set |_RS| to c
        set c to |_TS|
        repeat while c is in t
            set c to c & |_TS|
        end repeat
        set |_TS| to c
        
        (*
            (2) preprocess the source text such that -
                a) |FS| and |RS| are replaced with |_FS| and |_RS| respectively in non-quoted parts; and
                b) |QQ| are removed except for consecutive double |QQ|, which are replaced with single |QQ|
        *)
        -- split t by quote character |QQ|
        set pp to _split(t, |QQ|)
        set ix to count my pp
        -- extract odd substrings, which are non-quoted substrings
        repeat with i from 1 to ix by 2
            set my qq's end to my pp's item i -- non-quoted substring when i is odd integer
        end repeat
        -- escape |FS| and |RS| in qq
        set s to _join(qq, |_TS|)
        set s to _gsub(s, |FS|, |_FS|)
        set s to _gsub(s, |RS|, |_RS|)
        set qq to _split(s, |_TS|)
        -- replace empty string in qq with |QQ| except for those at the ends of pp
        repeat with j from 2 to ix div 2
            (* empty string in qq except for those at the ends of pp is derived from consecutive double |QQ| in t,
                which is here replaced with single |QQ|. *)
            if my qq's item j = "" then set my qq's item j to |QQ|
        end repeat
        -- replace odd items in pp with qq pre-processed
        repeat with i from 1 to ix by 2
            set my pp's item i to my qq's item ((i + 1) div 2)
        end repeat
        -- join pp with null joiner, which effectively removs original |QQ| in t
        set t to _join(pp, "")
        
        (*
            (3) convert t to 2d-array by using |_RS| and |_FS|
        *)
        set pp to _split(t, |_RS|)
        set qq to {}
        repeat with p in my pp
            set p to p's contents
            if p ≠ "" then set my qq's end to _split(p, |_FS|) -- empty record is ignored
        end repeat
        return my qq's contents
    end script
    considering case -- to increase performance in some text processing
        tell o to run
    end considering
end csv2array

on _split(t, d) -- stack overflow safe and fast for generating large list
    (*
        text t : source text
        text d : delimiter string
        return list : list of text items delimited by d (d is not inclusive)
    *)
    local asitd0, ix, j, tt
    set tt to {}
    try
        set astid0 to AppleScript's text item delimiters
        set AppleScript's text item delimiters to {} & d
        set ix to count t's text items
        repeat with i from 1 to ix by 3500
            set j to i + 3499
            if j > ix then set j to ix
            set tt to tt & t's text items i thru j
        end repeat
        set AppleScript's text item delimiters to astid0
    on error errs number errn
        set AppleScript's text item delimiters to astid0
        error "_split(): " & errs number errn
    end try
    return tt
end _split

on _join(tt, d)
    (*
        list tt : source list
        string d : separator
        return string : tt joined with d
    *)
    local astid0, t
    try
        set astid0 to AppleScript's text item delimiters
        set AppleScript's text item delimiters to {} & d
        set t to "" & tt
        set AppleScript's text item delimiters to astid0
    on error errs number errn
        set AppleScript's text item delimiters to astid0
        error "_join(): " & errs number errn
    end try
    return t
end _join

on _gsub(t, s, r)
    (*
        string t, s, r : source string, search string, replace string
        return string : every occurence of s being replaced by r in t
    *)
    return _join(_split(t, s), r)
end _gsub

on unicode_codepoint_and_name(cc)
    (*
        list cc : character list
        return list : list of character, codepoint and name
        
        e.g., given cc = ("a", "人人"}, result =
            {"é U+00E9 LATIN SMALL LETTER E ACUTE;",
            "人人 U+4EBA CJK UNIFIED IDEOGRAPH-4EBA; U+4EBA CJK UNIFIED IDEOGRAPH-4EBA;"}
    *)
    set argv to ""
    repeat with c in cc
        set argv to argv & " " & c's quoted form
    end repeat
    do shell script "/usr/bin/perl -CSDA <<'EOF' - " & argv & "
use Unicode::UCD 'charinfo';
for (@ARGV) {
    print $_;
    for (split //) {
        my $ci = charinfo(ord $_);
        my ($code, $name, $uc10) = @{$ci}{'code', 'name', 'unicode10'};
        print qq( U+$code ) . ($uc10 ? $uc10 : $name) . qq(;);
    }
    print qq(\\n);
}
EOF"
    result's paragraphs
end unicode_codepoint_and_name

Reply