mingsai

Q: How can I use Automator to extract substring of text based on pattern?

I have inbound text in a workflow and I want to extract a substring from the text.

 

(inbound text)

 

我 wǒ 代 ① (指一人) (用作主语) I (用作宾语) me (表所属关系) my 告诉我 tell me 我为人人,人人为我 one for all and all for one 我爸/妈 my father/mother 我的祖国 my homeland 我现在没空。 I am busy at the moment. 我认为我行! I think I can manage it. ② (指两人或以上) (用作主语) we (用作宾语) us (表所属关系) our 我厂/国/校/军 our factory/country/school/army 敌军被我全歼。 The enemy was annihilated by us. → 我方, 敌我矛盾 ③ (表泛指) [used together with 你 in parallel structures] anyone 大家你一言,我一语,献计献策。 They had a brainstorming session with anyone and everyone joining in. 市场里你来我往非常热闹。 The market is bustling with people coming and going. → 尔虞我诈, 你死我活 ④ (指自我) self → 忘我, 自我

 

I basically want only the non-double-byte characters between the first character and the first occurrence of ① (see sample):

 

代 ①

 

Using regex101.com, I have determined that this regex pattern should produce the required results, but I need help getting these results into Automator:

 

/ ([a-z].) /
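
A minimal Ruby sketch of one way to approach this. It assumes the wanted result is the Latin-script reading (here wǒ) found between the first character and the first ①. The helper name `reading_before_marker` is hypothetical, and the pattern is a swapped-in alternative to the one above, not a tested Automator recipe. A script like this could be run from an Automator "Run Shell Script" action with the shell set to Ruby.

```ruby
# encoding: utf-8

# Return the Latin-script words found between the first character of the
# entry and the first occurrence of the marker, or nil if nothing is found.
def reading_before_marker(entry, marker = "①")
  # Skip the first character, then lazily capture up to the marker.
  head = entry[/\A.(.*?)#{Regexp.escape(marker)}/m, 1]
  return nil if head.nil?
  # Keep only runs of Latin letters (plus combining marks for tone accents).
  words = head.scan(/[\p{Latin}\p{Mn}]+/)
  words.empty? ? nil : words.join(' ')
end

entry = "我 wǒ 代 ① (指一人) (用作主语) I (用作宾语) me"
puts reading_before_marker(entry)
```

The Unicode property classes are used instead of `[a-z]` because the tone-marked vowels in pinyin fall outside the ASCII range.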

MacBook Air (13-inch Mid 2012), Mac OS X (10.7.5), Love the mac (OSX 10.9+)

Posted on Sep 9, 2014 3:13 PM

  • by Hiroto,

    Hiroto Hiroto Sep 18, 2014 6:30 AM in response to SGIII
    Level 5 (7,348 points)

    Hello SG,

     

    You may save the script anywhere you want, not necessarily in /usr/local/bin, and call it by specifying its path.

     

    Anyway, to install it in /usr/local/bin, you can do as follows, provided you saved the script as a UTF-8 plain text file named "hanzi2pinyin" on the Desktop. A name extension is optional, but omitting it is better for a script used as a user command.

     

     

    #!/bin/bash
    #
    #      install /usr/local/bin/hanzi2pinyin
    #
    #     1) save the script as utf8 plain text file in ~/desktop/hanzi2pinyin
    #     2) run the following commands in Terminal.app
    #     
    cd ~/desktop || exit
    chmod a+x hanzi2pinyin
    [[ -d /usr/local/bin ]] || sudo mkdir -p /usr/local/bin
    sudo cp -pP hanzi2pinyin /usr/local/bin
    

     

     

    You may type these commands manually in Terminal, or you may save the above code as a plain text file named "install.command" on the Desktop and double-click it. Either way, the script will ask you to enter your account password.

     

     

    Once the command is installed, the AppleScript would look something like this.

     

    --set dictf to "/Library/Dictionaries/Simplified Chinese - English.dictionary"
    --set dictf to "/Library/Dictionaries/The Standard Dictionary of Contemporary Chinese.dictionary"
    --set dictf to "/Library/Dictionaries/小词典.dictionary"
    --set dictf to "/Library/Dictionaries/小词典-繁体字.dictionary"
    set dictf to "/Library/Dictionaries/CC-CEDICT.dictionary"
    
    set query to "悟空捣鬼花果山"
    --hanzi2pinyin(dictf, 10, 0, query)
    hanzi2pinyin(dictf, 10, 1, query)
    --hanzi2pinyin(dictf, 10, 2, query)
    
    on hanzi2pinyin(dictf, max_count, output_format, query)
         (*
              string dictf : POSIX path of dictionary file
              integer max_count : Max record count to retrieve
              integer output_format : Output format.
                      0 = interleaved : H[p] H[p]...
                      1 = separate    : H H...[p p...]
                      2 = pinyin only : p p...
              string query : query string
              return string : Hanzi[pinyin] in specified output format
         *)
         do shell script "d=" & dictf's quoted form & "; c=" & max_count & "; o=" & output_format & "
    /usr/local/bin/hanzi2pinyin -d \"$d\" -c \"$c\" -o \"$o\" -e -- " & query's quoted form
    end hanzi2pinyin
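
The three output formats listed in the handler comment can be illustrated with a small Ruby sketch. This is not the script's own code: `format_pairs` and the hard-coded `pairs` list are hypothetical stand-ins for whatever the dictionary lookup returns, and alternate readings are left out for brevity.

```ruby
# encoding: utf-8

# pairs is a list of [hanzi, pinyin] entries, one per matched term.
def format_pairs(pairs, output_format)
  hanzi  = pairs.map { |h, _| h }
  pinyin = pairs.map { |_, p| p }
  case output_format
  when 0 then pairs.map { |h, p| "#{h}[#{p}]" }.join(' ')   # interleaved
  when 1 then "#{hanzi.join(' ')}[#{pinyin.join(' ')}]"     # separate
  when 2 then pinyin.join(' ')                              # pinyin only
  else raise ArgumentError, "unknown output format: #{output_format}"
  end
end

pairs = [["我", "wǒ"], ["的", "de"], ["母亲", "mǔqīn"]]
(0..2).each { |fmt| puts format_pairs(pairs, fmt) }
```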
    

     

     

    Good luck,

    H

  • by SGIII,

    SGIII SGIII Sep 18, 2014 6:00 PM in response to Hiroto
    Level 6 (10,782 points)
    Mac OS X

    Hi H,

     

    Thanks for your expert, clear instructions!

     

    I think I have installed it properly. In Terminal:

     

    $ cd /usr/local/bin/

    $ ls

     

    resulted in:

     

    hanzi2pinyin       pin      pip-2.7

     

    But the first time I tried it, the AppleScript just kept running with no results. I force quit AppleScript Editor (though there was no message that it was not responding) and launched it again. Now it throws an error:

     

         error "An error of type 100035 has occurred." number 100035

     

    I realize it's tough to troubleshoot from a different version of OS X. But does that give you any clue as to what I might try next?

     

    (I checked to make sure I had CC-CEDICT.dictionary in /Library/Dictionaries)

     

    SG

  • by Hiroto,

    Hiroto Hiroto Sep 19, 2014 6:21 AM in response to SGIII
    Level 5 (7,348 points)

    Hello SG,

     

    All I can say now is that error 100035 is a POSIX error EAGAIN (Resource temporarily unavailable).

     

    /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/Headers/MacErrors.h:  kPOSIXErrorEAGAIN             = 100035, /* Resource temporarily unavailable */
    
    /usr/include/sys/errno.h:#define    EAGAIN        35        /* Resource temporarily unavailable */
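
As a worked example of the relationship between the two header lines above, here is a small Ruby sketch. The offset of 100000 is inferred from those two lines (100035 versus 35), and the constant and method names are hypothetical.

```ruby
# Offset between the Carbon-style POSIX error codes and plain errno values,
# inferred from the header lines quoted above (assumption).
POSIX_ERROR_BASE = 100_000

# Convert a Carbon-style POSIX error number into the underlying errno value.
def posix_errno_from_mac_error(code)
  code - POSIX_ERROR_BASE
end

errno = posix_errno_from_mac_error(100_035)
puts "errno #{errno}"   # compare with the EAGAIN definition in sys/errno.h above
```

Note that errno numbering is platform-specific, so the value 35 for EAGAIN applies to OS X and should not be assumed elsewhere.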
    

     

    My first suspicion is that dictionary access is blocked by something introduced in later OS versions. But it is also quite possible that the DictionaryServices.framework functions have been updated in such a way that my script, which is based on 10.6.8, won't work under 10.9.

     

    All the best,

    H

  • by SGIII,

    SGIII SGIII Sep 19, 2014 9:14 AM in response to Hiroto
    Level 6 (10,782 points)
    Mac OS X

    Hi H,

     

    Yes, my guess is that this probably has something to do with the increased security features. There is probably a way around them by changing permissions somewhere but, alas, I don't know enough to know where to look and what to do. Tantalizingly close, though.

     

    SG

  • by mingsai,

    mingsai mingsai Sep 19, 2014 9:43 AM in response to Hiroto
    Level 1 (30 points)

    Hiroto,

     

    This script has taken the concept another step forward! Thanks for sharing! After reviewing the script in my OS X 10.10 environment and adjusting for changes in the resource paths, I have not been successful at getting it to work. The script logic did give me pause to consider the basic algorithm that we have been using.

     

    It seems to me that what would work best against any dictionary file is a query to produce structured data able to reliably handle the many dictionary variances (xml seems like a good possibility). Notwithstanding the fact that I don't currently know how to extract all dictionary definitions as xml, I was able to find the Unicode Han dictionary and apply a regex handler to always grab the first pinyin for each specific character (this dictionary doesn't have compound words so the transliterations are literal but not contextual, which can lead to some inaccuracies).

     

    The markers in this database are consistently formatted, and because it's titled Unicode I suspect it covers an adequate character range. Here's my regex to parse the definitions:

     

    ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:


     

    Explanation of the Regex:

     

    • /ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i
      • ReadingsMandarin: matches the characters ReadingsMandarin: literally (case insensitive)
      • 1st Capturing group (.+?)
        • .+? matches any character (except newline)
          • Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
      • (?:\,.*){0,1} Non-capturing group
        • Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
        • \, matches the character , literally
        • .* matches any character (except newline)
          • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      • Cantonese: matches the characters Cantonese: literally (case insensitive)
      • 2nd Capturing group (.+?)
        • .+? matches any character (except newline)
          • Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
      • (?:\,.*){0,1} Non-capturing group
        • Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
        • \, matches the character , literally
        • .* matches any character (except newline)
          • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      • On’yomi: matches the characters On’yomi: literally (case insensitive)
      • i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
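
A short Ruby sketch of how the pattern explained above could be applied. The `sample` string is made up to resemble the field labels in the regex; the real dictionary text may be laid out differently, and `first_readings` is a hypothetical helper name.

```ruby
# encoding: utf-8

# The pattern from the post, unchanged apart from Ruby's literal syntax.
READINGS_RE = /ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i

# Return the first Mandarin and Cantonese readings, or nil if no match.
def first_readings(entry_text)
  m = READINGS_RE.match(entry_text)
  return nil unless m
  # The lazy groups can pick up a trailing space, so strip it.
  { :mandarin => m[1].strip, :cantonese => m[2].strip }
end

# Made-up sample text (assumption), not copied from the dictionary.
sample = "ReadingsMandarin: wǒ, wo Cantonese: ngo5 On’yomi: GA"
p first_readings(sample)
```

Because `.` does not match a newline here, this only works when the three labels appear on one line of the extracted text.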

     

     


     

  • by mingsai,

    mingsai mingsai Sep 19, 2014 10:16 AM in response to SGIII
    Level 1 (30 points)

    Hi SGIII,

     

    I'm curious: if you execute hanzi2pinyin directly in a Terminal window, what results are produced?

  • by SGIII,

    SGIII SGIII Sep 19, 2014 7:15 PM in response to mingsai
    Level 6 (10,782 points)
    Mac OS X

    When I execute in a terminal window, I get continual Fork ... Resource Temporarily Unavailable interspersed with what looks like the lines I had entered (it flashes by too quickly to see exactly).  In the end I had to force quit Terminal.

     

    SG

  • by Hiroto,

    Hiroto Hiroto Sep 23, 2014 1:49 AM in response to SGIII
    Level 5 (7,348 points)

    Hello SG and mingsai,

     

    Just in case, here's a minor update. I'm not sure it will help, but something tells me that the call to the DCSDictionaryCreate() function might be the cause of the reported error, so this version avoids it and uses DCSCopyAvailableDictionaries() instead. The script is also now pure Ruby; bash had been used only for its process substitution facility, to create a temporary bridgesupport file. The calling convention is the same as in the previous version.

     

    Tested under 10.6.8. (Sorry for not testing this under later OSes, which I don't use.)

     

    Good luck,

    H

     

     

     

    hanzi2pinyin

     

    #!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w
    # coding: utf-8
    # 
    #   ARGV = options query [query ...]
    #       -d, --dictionary DICTIONARY      Dictionary file.
    #       -c, --count COUNT                Max record count to retrieve (=10).
    #       -o, --output FORMAT              Output format (=0).
    #                                          0 = interleaved : H[p] H[p]...
    #                                          1 = separate    : H H...[p p...]
    #                                          2 = pinyin only : p p...
    #       -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
    #                                        Given no CHARACTER, query is echoed.
    #       -h, --help                       Display this help.
    # 
    #   v0.34d
    #   written by Hiroto, 2014-09
    # 
    #   v0.34d -
    #       using DCSGetActiveDictionaries() or DCSCopyAvailableDictionaries() and DCSDictionaryGetURL() to get the specified dictionary ref
    #           (instead of using DCSDictionaryCreate())
    #       using DCSGetActiveDictionaries() to get the default dictionary in case no -d option is specified.
    #           (DCSGetActiveDictionaries().first is the 1st dictionary in the preferences of Dictionary.app.
    #           Previously, DCSGetDefaultDictionary() was used, which returns a fixed dictionary regardless of the preferences order)
    # 
    #   v0.34 -
    #       pure ruby version
    #           without using bash's process substitution to create temporary bridgesupport file
    # 
    #       * this is noticeably faster than v0.33
    # 
    # 
    require 'optparse'
    require 'osx/cocoa'
    include OSX
    # OSX.require_framework '/System/Library/Frameworks/CoreServices.framework/Frameworks/DictionaryServices.framework'     # [1]
    
    while File.exist?(BSFILE = "/tmp/DictionaryServices.#{rand(1e6)}.bridgesupport") do end
    # while File.exist?(BSFILE =  File.expand_path("~/desktop/DictionaryServices.#{rand(1e6)}.bridgesupport")) do end
    
    Signal.trap("EXIT") { File.delete BSFILE if File.exist?(BSFILE) }
    
    File.open(BSFILE, "w") { |f| f.print DATA.read }
    OSX.load_bridge_support_file BSFILE     # [2]
    File.delete BSFILE if File.exist?(BSFILE)
    
    # -----------------------------------------------------
    #   * some DictionaryServices.framework functions (OS X 10.6.8)
    # 
    #   (undocumented)
    #   
    #   extern CFArrayRef DCSGetActiveDictionaries (void)
    #   extern CFSetRef DCSCopyAvailableDictionaries (void)
    #   extern DCSDictionaryRef DCSGetDefaultDictionary (void)
    #   extern DCSDictionaryRef DCSGetDefaultThesaurus (void)
    #   extern DCSDictionaryRef DCSDictionaryCreate (CFURLRef)
    #   extern CFURLRef DCSDictionaryGetURL (DCSDictionaryRef)
    #   extern CFStringRef DCSDictionaryGetName (DCSDictionaryRef)
    #   extern CFStringRef DCSDictionaryGetIdentifier (DCSDictionaryRef)
    #   
    #   extern CFArray DCSCopyRecordsForSearchString (DCSDictionaryRef, CFStringRef, unsigned long long, long long)
    #       unsigned long long method
    #           0   = exact match
    #           1   = forward match (prefix match)
    #           2   = partial query match (matching (leading) part of query; including ignoring diacritics, four tones in Chinese, etc)
    #           >=3 = ? (exact match?)
    #       
    #       long long max_record_count
    # 
    #   extern CFStringRef DCSRecordGetString (DCSRecordRef) 
    #   extern CFStringRef DCSRecordGetHeadword (DCSRecordRef) 
    #   extern CFStringRef DCSRecordGetRawHeadword (DCSRecordRef) 
    #   extern CFStringRef DCSRecordGetTitle (DCSRecordRef) 
    #   extern CFStringRef DCSRecordGetAnchor (DCSRecordRef) 
    #   extern CFURLRef DCSRecordGetDataURL (DCSRecordRef) 
    # 
    #   extern CFStringRef DCSRecordCopyData (DCSRecordRef, long)
    #       long output_style
    #           0 = XML XHTML <html> string
    #           1 = XML XHTML <html> string
    #           2 = XML XHTML <html> string
    #           3 = plain text
    #           4 = XML XHTML <text> string (single element)
    #       * corresponding to (?)
    #           Transform.xsl
    #           TransformApp.xsl
    #           TransformPanel.xsl
    #           TransformSimpleText.xsl
    #           TransformText.xsl
    # 
    #   (documented)
    #   
    #   CFStringRef DCSCopyTextDefinition (DCSDictionaryRef, CFStringRef, CFRange)
    #   CFRange DCSGetTermRangeInString (DCSDictionaryRef, CFStringRef, CFIndex)
    # 
    # -----------------------------------------------------
    
    def hanzi2pinyin(argv)
        # 
        #   argv = options query [query ...]
        #       -d, --dictionary DICTIONARY      Dictionary file.
        #       -c, --count COUNT                Max record count to retrieve (=10).
        #       -o, --output FORMAT              Output format (=0).
        #                                          0 = interleaved : H[p] H[p]...
        #                                          1 = separate    : H H...[p p...]
        #                                          2 = pinyin only : p p...
        #       -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
        #                                        Given no CHARACTER, query is echoed.
        #       -h, --help                       Display this help.
        # 
        args = {
            :dictf  => nil,
            :count  => 10,
            :output => 0,
            :echo   => '',
        }
        op = OptionParser.new do|o|
            o.banner = "Usage: #{File.basename($0)} options query [query ...]"      
            o.on('-d', '--dictionary DICTIONARY', String, "Dictionary file.") do |f|
                args[:dictf] = f
            end
            o.on('-c', '--count COUNT', Integer, "Max record count to retrieve (=10).") do |i|
                raise OptionParser::InvalidArgument, i unless i.to_i > 0
                args[:count] = i.to_i
            end
            o.on('-o', '--output FORMAT', Integer, "Output format (=0).", 
                "  0 = interleaved : H[p] H[p]...", 
                "  1 = separate    : H H...[p p...]", 
                "  2 = pinyin only : p p...") do |i|
                raise OptionParser::InvalidArgument, i unless [0, 1, 2].include?(i.to_i)
                args[:output] = i.to_i
            end
            o.on('-e', '--echo [CHARACTER]', String, "Character(s) to be echoed for no result.",
                "Given no CHARACTER, query is echoed.") do |s|
                args[:echo] = s || ''
            end
            o.on( '-h', '--help', 'Display this help.' ) do
                $stderr.puts o; exit 1
            end
        end
        begin
            op.parse!(argv)
        rescue => ex
            $stderr.puts "#{ex.class} : #{ex.message}"
            $stderr.puts op.help(); exit 1
        end
        if argv.length == 0
            $stderr.puts op.help(); exit 1
        end
    
        if (dctf = args[:dictf])
            unless File.exist?(dctf)
                $stderr.puts "No such dictionary: %s" % dctf
                exit 1
            end
            url = NSURL.fileURLWithPath(dctf)
            # dct = DCSDictionaryCreate(url)
            # dct, = DCSGetActiveDictionaries().select { |d| DCSDictionaryGetURL(d).path == url.path }
            # dcts = DCSCopyAvailableDictionaries()
            dcts = dcts.allObjects if (dcts = DCSCopyAvailableDictionaries()).is_a? NSSet   # [5]
            dct, = dcts.select { |d| DCSDictionaryGetURL(d).path == url.path }
            unless dct
                $stderr.puts "Failed to get dictionary for: %s" % dctf
                exit 2
            end
        else
            # dct = DCSGetDefaultDictionary()
            dct, = DCSGetActiveDictionaries()
            unless dct
                $stderr.puts "Failed to get the 1st active dictionary"
                exit 2
            end
        end
    
        query_method = 0                # exact match
        max_record_count = args[:count] # max record count to be retrieved
        output_format = args[:output]   # output format option
                                        #   0 = interleaved : H[p] H[p]...
                                        #   1 = separate    : H H...[p p...]
                                        #   2 = pinyin only : p p...
                                        #
                                        # e.g., given query '我的母亲'
                                        #   0 => 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
                                        #   1 => 我 的 母亲[wǒ de(dī,dí,dì) mǔqīn]
                                        #   2 => wǒ de(dī,dí,dì) mǔqīn
    
        trim_chars = "\t\n |"           # characters to be trimmed at both ends of pronunciation string
        echo_query = ''                 # special character to let it echo query if result is not found
        echo_char = args[:echo]         # character(s) to be echoed if no result is found for query
                                        # if echo_query is specified, query string is echoed for no result
        
        trim_chars_set = NSCharacterSet.characterSetWithCharactersInString(trim_chars)
        echo_ns = echo_char.to_ns
    
        argv.map {|a| a.to_ns }.each do |q|     # [3]
            dd = []
            while true do
                # 
                #   Until given query string (q) is exhausted, repeat as follows -
                #     get longest leading substring (qu) of the query string matching a term in dictionary,
                #     look the substring up in dictionary and retrieve title and pronunciation of the matching entry.
                # 
                u = DCSGetTermRangeInString(dct, q, 0)              # try to find longest leading range matching a term in dictionary
                u = NSMakeRange(0, 1) if u.location == KCFNotFound  # fallback [4]
                qu = q.substringWithRange(u)
                rr = DCSCopyRecordsForSearchString(dct, qu, query_method, max_record_count)
                unless rr
                    c = q.substringWithRange(NSMakeRange(0, 1))     # give up one character at the beginning
                    dd << [[c, echo_char == echo_query ? c : echo_ns]]
                    break if q.length < 2
                    q = q.substringFromIndex(1)
                else
                    tt, pp = [], {}
                    rr.each do |r|  # r = DCSRecordRef
                        # 
                        #   parse xml representation of record entry to get title and pronunciation
                        # 
                        xml = DCSRecordCopyData(r, 0)
                        err = OCObject.new
                        doc = NSXMLDocument.alloc.objc_send(
                            :initWithXMLString, xml,
                            :options, 0,
                            :error, err)
                        unless doc
                            $stderr.puts "Failed to obtain XML document for %s: %s" % [qu, err.description]
                            next
                        end
                        nn = doc.objc_send(
                            :nodesForXPath, '//d:entry/@d:title',   # d:title attribute
                            :error, nil)
                        title = (nn.nil? || nn == []) ? echo_ns : nn.first.stringValue   # also guard against a nil result
                        nn = doc.objc_send(
                            :nodesForXPath, '//d:entry//span[@d:pr]',   # span element with d:pr attribute
                            :error, nil)
                        pron = (nn.nil? || nn == []) ? echo_ns : nn.first.stringValue    # also guard against a nil result
                        pron = pron.stringByTrimmingCharactersInSet(trim_chars_set).
                            stringByReplacingOccurrencesOfString_withString(' ', '').lowercaseString
                        
                        tt << title unless tt.include?(title)
                        title_s = title.to_s    # for use as hash key in ruby
                        if not pp.key?(title_s)
                            pp[title_s] = [pron]
                        elsif not pp[title_s].include?(pron)
                            pp[title_s] << pron
                        end
                    end
                    # 
                    #   Let query_{k} denote sub-query for k-th substring defined by range u,
                    #       title_{k,i} denote i-th found title for query_{k},
                    #       pron_{k,i,j} denote j-th pronunciation for title_{k,i};
                    # 
                    #   array cc_k holds each collection of pronunciations per title_{k,i} found for query_{k}:
                    #       cc_k    = [c_{k,1}, c_{k,2}, ...]
                    #       c_{k,i} = [ title_{k,i},  pron_{k,i,1} *1( '(' pron_{k,i,2} ',' pron_{k,i,3} ',' ... ')' ) ]
                    # 
                    #   array dd holds list of cc_k for every query_{k}
                    #       dd  = [cc_1, cc_2, ...]
                    # 
                    cc_k = tt.map do |t|
                        a = pp[t.to_s]
                        [t,  a.shift + (a == [] ? '' : "(%s)" % a.join(','))]
                    end
                    dd << cc_k
    
                    k = u.location + u.length
                    break unless k < q.length
                    q = q.substringFromIndex(k)
                end
            end
    
            case output_format
            #   0 = interleaved : H[p] H[p]...
            #   1 = separate    : H H...[p p...]
            #   2 = pinyin only : p p...
            when 0
                ee = dd.map do |cc|
                    next '' if cc == []
                    ("%s[%s]" % cc.shift) + (cc == [] ? '' : "(%s)" % cc.map {|c| "%s[%s]" % c}.join(','))  
                end
                puts ee.join(' ')
            when 1
                aa = dd.map do |cc|
                    a, b = cc.transpose
                    next '' unless a
                    (a.shift) + (a == [] ? '' : "(%s)" % a.join(','))   
                end
                bb = dd.map do |cc|
                    a, b = cc.transpose
                    next '' unless b
                    (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))   
                end
                puts "%s[%s]" % [aa.join(' '), bb.join(' ')]
            when 2
                bb = dd.map do |cc|
                    a, b = cc.transpose
                    next '' unless b
                    (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))   
                end
                puts bb.join(' ')
            end
        end
    end
    
    hanzi2pinyin(ARGV)
    
    # 
    #   [1] DictionaryServices.framework/Resources/BridgeSupport/DictionaryServices.bridgesupport has problem to be fixed.
    #       I.e., in signatures of DCSCopyTextDefinition(), DCSGetTermRangeInString() function etc,
    #           {??=qq} should have been {_CFRange=qq}
    #           {??=ii} should have been {_CFRange=ii}
    #   [2] Fixed and extended bridgesupport file is loaded by OSX.load_bridge_support_file.
    #       It now includes signatures for several undocumented functions as well.
    #   [3] argv.to_ns is required to handle unicode characters correctly (in ruby 1.8).
    #   [4] DCSGetTermRangeInString(dct, q, 0) returning range [KCFNotFound, 0] does not necessarily mean q's 1st character
    #       as query may not match any term in dictionary.  It is necessary to use DCSCopyRecordsForSearchString() 
    #       for the 1st character in order to know the (existence of) matching term(s).
    #   [5] DCSCopyAvailableDictionaries() returns CFSetRef under 10.6.8 but may return CFArrayRef in later OSes.
    # 
    
    
    __END__
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE signatures SYSTEM "file://localhost/System/Library/DTDs/BridgeSupport.dtd">
    <signatures version="0.9">
        <function name="DCSCopyTextDefinition">
            <arg type="^{__DCSDictionary=}"></arg>
            <arg type="^{__CFString=}"></arg>
            <arg type64="{_CFRange=qq}" type="{_CFRange=ii}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSGetTermRangeInString">
            <arg type="^{__DCSDictionary=}"></arg>
            <arg type="^{__CFString=}"></arg>
            <arg type64="q" type="l"></arg>
            <retval type64="{_CFRange=qq}" type="{_CFRange=ii}"></retval>
        </function>
        <function name="DCSDictionaryCreate">
            <arg type="^{__CFURL=}"></arg>
            <retval type="^{__DCSDictionary=}"></retval>
        </function>
        <function name="DCSGetActiveDictionaries">
            <retval type="^{__CFArray=}"></retval>
        </function>
        <function name="DCSCopyAvailableDictionaries">
            <retval type="^{__CFSet=}"></retval>
        </function>
        <function name="DCSGetDefaultDictionary">
            <retval type="^{__DCSDictionary=}"></retval>
        </function>
        <function name="DCSGetDefaultThesaurus">
            <retval type="^{__DCSDictionary=}"></retval>
        </function>
        <function name="DCSDictionaryGetURL">
            <arg type="^{__DCSDictionary=}"></arg>
            <retval type="^{__CFURL=}"></retval>
        </function>
        <function name="DCSDictionaryGetName">
            <arg type="^{__DCSDictionary=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSDictionaryGetIdentifier">
            <arg type="^{__DCSDictionary=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSCopyRecordsForSearchString">
            <arg type="^{__DCSDictionary=}"></arg>
            <arg type="^{__CFString=}"></arg>
            <arg type="l"></arg>
            <arg type="l"></arg>
            <retval type="^{__CFArray=}"></retval>
        </function>
        <function name="DCSRecordGetHeadword">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSRecordGetString">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSRecordGetRawHeadword">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSRecordGetTitle">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSRecordGetAnchor">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
        <function name="DCSRecordGetDataURL">
            <arg type="^{__DCSRecord=}"></arg>
            <retval type="^{__CFURL=}"></retval>
        </function>
        <function name="DCSRecordCopyData">
            <arg type="^{__DCSRecord=}"></arg>
            <arg type="l"></arg>
            <retval type="^{__CFString=}"></retval>
        </function>
    </signatures>
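
The lookup loop in the script above (repeatedly take the longest leading substring of the query that matches a dictionary term) can be sketched in plain Ruby without DictionaryServices. This is a toy illustration only: `TOY_DICT` and `segment` are hypothetical, a Hash stands in for the dictionary, and unmatched characters are simply echoed.

```ruby
# encoding: utf-8

# Toy stand-in for a dictionary: headword => pinyin (assumption).
TOY_DICT = { "我" => "wǒ", "的" => "de", "母" => "mǔ", "母亲" => "mǔqīn" }

# Split the query into [term, pinyin] pairs, always preferring the longest
# leading substring that is a key of dict; unknown characters are echoed.
def segment(query, dict)
  chars = query.chars.to_a
  max_len = dict.keys.map { |k| k.length }.max || 1
  out = []
  i = 0
  while i < chars.length
    len = [max_len, chars.length - i].min
    # Shrink the candidate until it matches or is a single character.
    len -= 1 while len > 1 && !dict.key?(chars[i, len].join)
    term = chars[i, len].join
    out << [term, dict.fetch(term, term)]
    i += len
  end
  out
end

p segment("我的母亲", TOY_DICT)
```

Preferring the longest match is what lets a compound such as 母亲 come out as one word instead of two separate syllables.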
    
  • by SGIII,

    SGIII SGIII Sep 24, 2014 6:21 PM in response to Hiroto
    Level 6 (10,782 points)
    Mac OS X

    Hi H,

     

    Thanks so much for all the time you've spent on this.

     

    The AppleScript no longer seems to be in an endless loop on my machine.

     

    Here's what appears in the Replies panel of AppleScript Editor:

     

    Screen Shot 2014-09-24 at 9.14.27 PM.png

     

    If you have any ideas where I should look now, I'll give it a shot.

     

    SG

  • by Hiroto,

    Hiroto Hiroto Sep 25, 2014 7:27 AM in response to SGIII
    Level 5 (7,348 points)

    Hello SG,

     

    I suspect your hanzi2pinyin script starts with the line 'hanzi2pinyin'. It should start with the shebang line instead:

     

    #!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w
    

     

     

    The 'hanzi2pinyin' I put before the source code in the post(s) above is just a label in the message text, not part of the source code.
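
A quick way to check for this kind of problem is to look at the first line of the installed file. The following Ruby sketch is illustrative only; `shebang_ok?` is a hypothetical helper, and the demo uses throwaway temporary files instead of the real /usr/local/bin/hanzi2pinyin.

```ruby
require 'tempfile'

# True if the first line of the file at path starts with "#!".
def shebang_ok?(path)
  first = File.open(path, 'r') { |f| f.gets }
  !first.nil? && first.start_with?('#!')
end

# Demo files: one correct script, one with a stray label line on top.
good = Tempfile.new('good')
good.write("#!/usr/bin/ruby -w\nputs 1\n")
good.flush

bad = Tempfile.new('bad')
bad.write("hanzi2pinyin\n#!/usr/bin/ruby -w\nputs 1\n")
bad.flush

puts shebang_ok?(good.path)
puts shebang_ok?(bad.path)
```

Without a shebang on the first line, the shell may try to interpret the file as a shell script instead of handing it to Ruby.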

     

    All the best,

    H

  • by SGIII,

    SGIII SGIII Sep 25, 2014 8:52 AM in response to Hiroto
    Level 6 (10,782 points)
    Mac OS X

    Hi H,

     

    Yes that was it, an extraneous line. After correcting hanzi2pinyin to start with the shebang line, running your AppleScript example (using all three parameters) produces the expected results in Mavericks... and presumably will in Yosemite too.  This is great, because it spaces the pinyin into "words."

     

    Thanks so much.

     

    SG

  • by mingsai,

    mingsai mingsai Sep 25, 2014 9:40 AM in response to Hiroto
    Level 1 (30 points)

    Hi Hiroto,

     

    Thanks for posting the update. I am running Yosemite (currently at Beta 8) and seeing the following issue after I adjust the path for my ruby installation.

     

    /usr/local/bin/h2p:277: warning: shadowing outer local variable - c

    /usr/local/bin/h2p:282: warning: assigned but unused variable - b

    /usr/local/bin/h2p:287: warning: assigned but unused variable - a

    /usr/local/bin/h2p:294: warning: assigned but unused variable - a

    /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- osx/cocoa (LoadError)

      from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'

      from /usr/local/bin/h2p:33:in `<main>'

  • by Hiroto,

    Hiroto Hiroto Sep 25, 2014 10:34 AM in response to SGIII
    Level 5 (7,348 points)

    My pleasure! I'm really glad to hear it worked at last!

     

    Hiroto

  • by Hiroto,

    Hiroto Hiroto Sep 25, 2014 10:41 AM in response to mingsai
    Level 5 (7,348 points)

    Hello mingsai

     

    The core part of the script is written in RubyCocoa which only works with Ruby 1.8. The default Ruby under OSX 10.9 or later is Ruby 2.0 or later and that is why I specified the full path of Ruby 1.8 interpreter in my script.

     

    So please specify the full path of Ruby 1.8 under 10.10. If there's no Ruby 1.8 under OSX 10.10, RubyCocoa script won't work and you'd need to translate the script to C or Objective-C proper. (Or it would be possible to call the C functions by using DL module in Ruby if you wish.)
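
To see which situation applies on a given machine, a small diagnostic along these lines may help. It is only a sketch: it checks whether the Ruby 1.8 interpreter exists at the path used in the script's shebang line, and whether the running Ruby can load osx/cocoa. The constant and method names are hypothetical.

```ruby
# Path taken from the shebang line of the hanzi2pinyin script.
RUBY18 = '/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby'

# True if the running interpreter can load RubyCocoa, false on LoadError.
def rubycocoa_loadable?
  require 'osx/cocoa'
  true
rescue LoadError
  false
end

puts "running Ruby #{RUBY_VERSION}"
puts "Ruby 1.8 interpreter present: #{File.executable?(RUBY18)}"
puts "osx/cocoa loadable here: #{rubycocoa_loadable?}"
```

If the interpreter is missing and osx/cocoa cannot be loaded, the script cannot run as written on that system.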

     

    Good luck,

    H

  • by Hiroto,

    Hiroto Hiroto Sep 25, 2014 12:05 PM in response to Hiroto
    Level 5 (7,348 points)