
How can I use Automator to extract substring of text based on pattern?

I have inbound text in a workflow and I want to extract a substring from the text.


(inbound text)


我 wǒ 代 ① (指一人) (用作主语) I (用作宾语) me (表所属关系) my 告诉我 tell me 我为人人,人人为我 one for all and all for one 我爸/妈 my father/mother 我的祖国 my homeland 我现在没空。 I am busy at the moment. 我认为我行! I think I can manage it. ② (指两人或以上) (用作主语) we (用作宾语) us (表所属关系) our 我厂/国/校/军 our factory/country/school/army 敌军被我全歼。 The enemy was annihilated by us. → 我方, 敌我矛盾 ③ (表泛指) [used together with 你 in parallel structures] anyone 大家你一言,我一语,献计献策。 They had a brainstorming session with anyone and everyone joining in. 市场里你来我往非常热闹。 The market is bustling with people coming and going. → 尔虞我诈, 你死我活 ④ (指自我) self → 忘我, 自我


I basically want only the non-double-byte characters that lie between the first character and the first occurrence of ① (see sample):


代 ①


Using regex101.com, I have determined that this regex pattern should produce the required results, but I need help getting these results into Automator:


/ ([a-z].) /

MacBook Air (13-inch Mid 2012), Mac OS X (10.7.5), Love the mac (OSX 10.9+)

Posted on Sep 9, 2014 2:59 PM

Question marked as Best reply

Posted on Sep 9, 2014 7:24 PM

If your pinyin substring never has spaces in it, then you can use AppleScript to extract it like this:


User uploaded file



This is the script (with the sample input). Copy/paste into AppleScript Editor and click the green triangle 'Run' button:



set input to "我 wǒ 代 ① (指一人) (用作主语) I (用作宾语) me (表所属关系) my 告诉我 tell me 我为人人,人人为我 one for all and all for one 我爸/妈 my father/mother 我的祖国 my homeland 我现在没空。 I am busy at the moment. 我认为我行! I think I can manage it. ② (指两人或以上) (用作主语) we (用作宾语) us (表所属关系) our 我厂/国/校/军 our factory/country/school/army 敌军被我全歼。 The enemy was annihilated by us. → 我方, 敌我矛盾 ③ (表泛指) [used together with 你 in parallel structures] anyone 大家你一言,我一语,献计献策。 They had a brainstorming session with anyone and everyone joining in. 市场里你来我往非常热闹。 The market is bustling with people coming and going. → 尔虞我诈, 你死我活 ④ (指自我) self → 忘我, 自我"

set pyCharSet to {"a", "ā", "á", "ǎ", "à", "b", "c", "d", "e", "ē", "é", "ě", "è", "f", "g", "h", "i", "ī", "í", "ǐ", "ì", "j", "k", "l", "m", "n", "o", "ō", "ó", "ǒ", "ò", "p", "q", "r", "s", "t", "u", "ū", "ú", "ǔ", "ù", "ü", "ǖ", "ǘ", "ǚ", "ǜ", "w", "x", "y", "z"}

-- Split the text at ① so that item 1 is everything before its first occurrence
set {oTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "①"}
-- Drop the first character (the headword) and keep the rest as a list of characters
set cc to (input as string)'s text items's item 1's characters 2 thru -1
set py to ""
repeat with c in cc
    -- Keep only characters that can appear in pinyin
    if c is in pyCharSet then set py to py & c
end repeat
set AppleScript's text item delimiters to oTID
return py
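For comparison, here is a rough Ruby sketch of the same idea, in case a "Run Shell Script" action is easier to wire into a workflow than AppleScript. It assumes the intent is to keep only pinyin letters found between the first character and the first ①. The names PINYIN_CHARS and extract_pinyin are made up for this illustration and are not from the thread.

```ruby
# Sketch only: PINYIN_CHARS and extract_pinyin are illustrative names.
# Letters that may appear in a pinyin syllable, including tone-marked vowels.
PINYIN_CHARS = "aāáǎàbcdeēéěèfghiīíǐìjklmnoōóǒòpqrstuūúǔùüǖǘǚǜwxyz".chars.freeze

# Return the pinyin letters found between the first character of text
# and the first occurrence of marker.
def extract_pinyin(text, marker = "①")
  head = text.split(marker, 2).first.to_s   # everything before the first marker
  head.chars.drop(1)                        # skip the headword character
      .select { |c| PINYIN_CHARS.include?(c) }
      .join
end

puts extract_pinyin("我 wǒ 代 ① (指一人) (用作主语) I (用作宾语) me")
```

With the sample entry this prints wǒ. If the text contains no ① at all, the whole string is scanned, so English letters further along would leak into the result.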



The above view is AppleScript Editor, not Automator. What do you plan to do with the results in Automator?


SG

51 replies

Sep 18, 2014 6:00 PM in response to Hiroto

Hi H,


Thanks for your expert clear instructions!


I think I have installed it properly. In Terminal:


$ cd /usr/local/bin/

$ ls


resulted in:


hanzi2pinyin pin pip-2.7


But the first time I tried it, the AppleScript just kept running with no results. I force-quit AppleScript Editor (though there was no message that it was not responding) and launched it again. Now it throws an error:


error "An error of type 100035 has occurred." number 100035


I realize it's tough to troubleshoot from a different version of OS X. But does that give you any clue as to what I might try next?


(I checked to make sure I had CC-CEDICT.dictionary in /Library/Dictionaries)


SG

Sep 19, 2014 6:21 AM in response to SGIII

Hello SG,


All I can say now is that error 100035 is a POSIX error EAGAIN (Resource temporarily unavailable).


/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/Headers/MacErrors.h:  kPOSIXErrorEAGAIN             = 100035, /* Resource temporarily unavailable */

/usr/include/sys/errno.h:#define    EAGAIN        35        /* Resource temporarily unavailable */
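The relationship between the two numbers can be sketched as follows. This assumes that these Carbon-style codes are simply the POSIX errno value plus 100000, which is what the two header lines above suggest (100035 and 35). The constant and method names are invented for the example, and the lookup table holds only the one entry quoted in this thread.

```ruby
# Assumption: Carbon-style POSIX error codes are errno + 100000,
# consistent with kPOSIXErrorEAGAIN = 100035 and EAGAIN = 35 quoted above.
POSIX_ERROR_BASE = 100_000

# Only the errno quoted in this thread; extend from errno.h as needed.
ERRNO_NAMES = { 35 => "EAGAIN (Resource temporarily unavailable)" }.freeze

# Translate a Carbon-style code into a readable errno description.
def describe_carbon_posix_error(code)
  errno = code - POSIX_ERROR_BASE
  return "not in the POSIX error range: #{code}" unless errno > 0
  "errno #{errno}: #{ERRNO_NAMES.fetch(errno, 'unknown errno')}"
end

puts describe_carbon_posix_error(100035)
```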


My first suspicion is that the dictionary access is blocked by something introduced in later OSes. But it is also quite possible that the DictionaryServices.framework functions have changed in such a way that my script, which is based on 10.6.8, won't work under 10.9.


All the best,

H

Sep 19, 2014 9:43 AM in response to Hiroto

Hiroto,


This script has taken the concept another step forward! Thanks for sharing! After reviewing the script in my OS X 10.10 environment and adjusting for changes in the resource paths, I have not been successful at getting it to work. The script logic did give me pause to consider the basic algorithm that we have been using.


It seems to me that what would work best against any dictionary file is a query to produce structured data able to reliably handle the many dictionary variances (xml seems like a good possibility). Notwithstanding the fact that I don't currently know how to extract all dictionary definitions as xml, I was able to find the Unicode Han dictionary and apply a regex handler to always grab the first pinyin for each specific character (this dictionary doesn't have compound words so the transliterations are literal but not contextual, which can lead to some inaccuracies).


The markers in this database are consistently formatted, and because it's titled Unicode I suspect it covers an adequate character range. Here's my regex to parse the definitions:


ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:



Explanation of the Regex:


/ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i

  • ReadingsMandarin: matches the characters ReadingsMandarin: literally (case insensitive)
  • 1st Capturing group (.+?)
    • .+? matches any character (except newline)
      • Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
  • (?:\,.*){0,1} Non-capturing group
    • Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
    • \, matches the character , literally
    • .* matches any character (except newline)
      • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • Cantonese: matches the characters Cantonese: literally (case insensitive)
  • 2nd Capturing group (.+?)
    • .+? matches any character (except newline)
      • Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
  • (?:\,.*){0,1} Non-capturing group
    • Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
    • \, matches the character , literally
    • .* matches any character (except newline)
      • Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  • On’yomi: matches the characters On’yomi: literally (case insensitive)
  • i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
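
To see the pattern at work outside regex101, here is a small Ruby sketch. The sample string is made up for illustration and only imitates the "ReadingsMandarin: ... Cantonese: ... On’yomi:" layout described above; real dictionary text may differ. The method name first_readings is likewise hypothetical.

```ruby
# The pattern from the post, unchanged apart from Ruby's regex delimiters.
READINGS = /ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i

# Return the first Mandarin and first Cantonese reading, or nil if the
# text does not follow the expected layout.
def first_readings(text)
  m = READINGS.match(text)
  return nil unless m
  { mandarin: m[1].strip, cantonese: m[2].strip }
end

# Hypothetical flattened entry text, invented for this example.
sample = "ReadingsMandarin: wǒ, wòCantonese: ngo5, ngo6On’yomi: ga"

readings = first_readings(sample)
puts "Mandarin: #{readings[:mandarin]}, Cantonese: #{readings[:cantonese]}"
```

Because both capture groups are lazy and the optional comma group swallows everything up to the next label, only the first reading of each language is kept.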





Sep 23, 2014 1:49 AM in response to SGIII

Hello SG and mingsai,


Just in case, here's a minor update. I'm not sure whether it helps, but something tells me that calling the DCSDictionaryCreate() function might be the cause of the reported error. So this version no longer calls it and uses DCSCopyAvailableDictionaries() instead. The script is also now pure Ruby; bash had been used only for its process substitution facility to create the temporary bridgesupport file. The calling convention is the same as in the previous version.


Tested under 10.6.8. (Sorry for not testing this under later OSes, which I don't use.)


Good luck,

H




hanzi2pinyin


#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w
# coding: utf-8
#
#  ARGV = options query [query ...]
#      -d, --dictionary DICTIONARY      Dictionary file.
#      -c, --count COUNT                Max record count to retrieve (=10).
#      -o, --output FORMAT              Output format (=0).
#                                         0 = interleaved : H[p] H[p]...
#                                         1 = separate : H H...[p p...]
#                                         2 = pinyin only : p p...
#      -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
#                                       Given no CHARACTER, query is echoed.
#      -h, --help                       Display this help.
#
#  v0.34d
#  written by Hiroto, 2014-09
#
#  v0.34d -
#      using DCSActiveDictionaries() or DCSCopyAvailableDictionaries() and DCSDictionaryGetURL() to get the specified dictionary ref
#      (instead of using DCSDicionaryCreate())
#      using DCSGetActiveDictionaries() to get the default dictionary in case no -d option is specified.
#      (DCSGetActiveDictionaries().first is the 1st dictionary in the preferences of Dictionary.app
#      Previously, DCSGetDefaultDictionary() is used, which returns fixed dictionary regardless of the preferences order)
#
#  v0.34 -
#      pure ruby version
#      without using bash's process substitution to create temporary bridgesupport file
#
#      * this is noticeably faster than v0.33
#
#
require 'optparse'
require 'osx/cocoa'
include OSX

# OSX.require_framework '/System/Library/Frameworks/CoreServices.framework/Frameworks/DictionaryServices.framework' # [1]
while File.exist?(BSFILE = "/tmp/DictionaryServices.#{rand(1e6)}.bridgesupport") do end
# while File.exist?(BSFILE = File.expand_path("~/desktop/DictionaryServices.#{rand(1e6)}.bridgesupport")) do end
Signal.trap("EXIT") { File.delete BSFILE if File.exist?(BSFILE) }
File.open(BSFILE, "w") { |f| f.print DATA.read }
OSX.load_bridge_support_file BSFILE # [2]
File.delete BSFILE if File.exist?(BSFILE)

# -----------------------------------------------------
#  * some DictionaryServices.framework functions (OS X 10.6.8)
#
#  (undocumented)
#
#  extern CFArrayRef DCSGetActiveDictionaries (void)
#  extern CFSetRef DCSCopyAvailableDictionaries (void)
#  extern DCSDictionaryRef DCSGetDefaultDictionary (void)
#  extern DCSDictionaryRef DCSGetDefaultThesaurus (void)
#  extern DCSDictionaryRef DCSDictionaryCreate (CFURLRef)
#  extern CFURLRef DCSDictionaryGetURL (DCSDictionaryRef)
#  extern CFStringRef DCSDictionaryGetName (DCSDictionaryRef)
#  extern CFStringRef DCSDictionaryGetIdentifier (DCSDictionaryRef)
#
#  extern CFArray DCSCopyRecordsForSearchString (DCSDictionaryRef, CFStringRef, unsigned long long, long long)
#      unsigned long long method
#          0 = exact match
#          1 = forward match (prefix match)
#          2 = partial query match (matching (leading) part of query; including ignoring diacritics, four tones in Chinese, etc)
#          >=3 = ? (exact match?)
#
#      long long max_record_count
#
#  extern CFStringRef DCSRecordGetString (DCSRecordRef)
#  extern CFStringRef DCSRecordGetHeadword (DCSRecordRef)
#  extern CFStringRef DCSRecordGetRawHeadword (DCSRecordRef)
#  extern CFStringRef DCSRecordGetTitle (DCSRecordRef)
#  extern CFStringRef DCSRecordGetAnchor (DCSRecordRef)
#  extern CFURLRef DCSRecordGetDataURL (DCSRecordRef)
#
#  extern CFStringRef DCSRecordCopyData (DCSRecordRef, long)
#      long output_style
#          0 = XML XHTML <html> string
#          1 = XML XHTML <html> string
#          2 = XML XHTML <html> string
#          3 = plain text
#          4 = XML XHTML <text> string (single element)
#          * corresponding to (?)
#              Transform.xsl
#              TransformApp.xsl
#              TransformPanel.xsl
#              TransformSimpleText.xsl
#              TransformText.xsl
#
#  (documented)
#
#  CFStringRef DCSCopyTextDefinition (DCSDictionaryRef, CFStringRef, CFRange)
#  CFRange DCSGetTermRangeInString (DCSDictionaryRef, CFStringRef, CFIndex)
#
# -----------------------------------------------------
def hanzi2pinyin(argv)
    #
    #  argv = options query [query ...]
    #      -d, --dictionary DICTIONARY      Dictionary file.
    #      -c, --count COUNT                Max record count to retrieve (=10).
    #      -o, --output FORMAT              Output format (=0).
    #                                         0 = interleaved : H[p] H[p]...
    #                                         1 = separate : H H...[p p...]
    #                                         2 = pinyin only : p p...
    #      -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
    #                                       Given no CHARACTER, query is echoed.
    #      -h, --help                       Display this help.
    #
    args = {
        :dictf => nil,
        :count => 10,
        :output => 0,
        :echo => '',
    }
    op = OptionParser.new do|o|
        o.banner = "Usage: #{File.basename($0)} options query [query ...]"
        o.on('-d', '--dictionary DICTIONARY', String, "Dictionary file.") do |f|
            args[:dictf] = f
        end
        o.on('-c', '--count COUNT', Integer, "Max record count to retrieve (=10).") do |i|
            raise OptionParser::InvalidArgument, i unless i.to_i > 0
            args[:count] = i.to_i
        end
        o.on('-o', '--output FORMAT', Integer, "Output format (=0).",
            " 0 = interleaved : H[p] H[p]...",
            " 1 = separate : H H...[p p...]",
            " 2 = pinyin only : p p...") do |i|
            raise OptionParser::InvalidArgument, i unless [0, 1, 2].include?(i.to_i)
            args[:output] = i.to_i
        end
        o.on('-e', '--echo [CHARACTER]', String, "Character(s) to be echoed for no result.",
            "Given no CHARACTER, query is echoed.") do |s|
            args[:echo] = s || ''
        end
        o.on( '-h', '--help', 'Display this help.' ) do
            $stderr.puts o; exit 1
        end
    end

    begin
        op.parse!(argv)
    rescue => ex
        $stderr.puts "#{ex.class} : #{ex.message}"
        $stderr.puts op.help(); exit 1
    end
    if argv.length == 0
        $stderr.puts op.help(); exit 1
    end

    if (dctf = args[:dictf])
        unless File.exists?(dctf)
            $stderr.puts "No such dictionary: %s" % dctf
            exit 1
        end
        url = NSURL.fileURLWithPath(dctf)
        # dct = DCSDictionaryCreate(url)
        # dct, = DCSGetActiveDictionaries().select { |d| DCSDictionaryGetURL(d).path == url.path }
        # dcts = DCSCopyAvailableDictionaries()
        dcts = dcts.allObjects if (dcts = DCSCopyAvailableDictionaries()).is_a? NSSet # [5]
        dct, = dcts.select { |d| DCSDictionaryGetURL(d).path == url.path }
        unless dct
            $stderr.puts "Failed to get dictionary for: %s" % dctf
            exit 2
        end
    else
        # dct = DCSGetDefaultDictionary()
        dct, = DCSGetActiveDictionaries()
        unless dct
            $stderr.puts "Failed to get the 1st active dictionary"
            exit 2
        end
    end

    query_method = 0                    # exact match
    max_record_count = args[:count]     # max record count to be retrieved
    output_format = args[:output]       # output format option
        # 0 = interleaved : H[p] H[p]...
        # 1 = separate : H H...[p p...]
        # 2 = pinyin only : p p...
        #
        # e.g., given query '我的母亲'
        # 0 => 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
        # 1 => 我 的 母亲[wǒ de(dī,dí,dì) mǔqīn]
        # 2 => wǒ de(dī,dí,dì) mǔqīn
    trim_chars = "\t\n |"               # characters to be trimmed at both ends of pronunciation string
    echo_query = ''                     # special character to let it echo query if result is not found
    echo_char = args[:echo]             # character(s) to be echoed if no result is found for query
                                        # if echo_query is specified, query string is echoed for no result

    trim_chars_set = NSCharacterSet.characterSetWithCharactersInString(trim_chars)
    echo_ns = echo_char.to_ns

    argv.map {|a| a.to_ns }.each do |q| # [3]
        dd = []
        while true do
            #
            #  Until given query string (q) is exhausted, repeat as follows -
            #      get longest leading substring (qu) of the query string matching a term in dictionary,
            #      look the substring up in dictionary and retrieve title and pronunciation of the matching entry.
            #
            u = DCSGetTermRangeInString(dct, q, 0)              # try to find longest leading range matching a term in dictionary
            u = NSMakeRange(0, 1) if u.location == KCFNotFound  # fallback [4]
            qu = q.substringWithRange(u)
            rr = DCSCopyRecordsForSearchString(dct, qu, query_method, max_record_count)
            unless rr
                c = q.substringWithRange(NSMakeRange(0, 1))     # give up one character at the beginning
                dd << [[c, echo_char == echo_query ? c : echo_ns]]
                break if q.length < 2
                q = q.substringFromIndex(1)
            else
                tt, pp = [], {}
                rr.each do |r| # r = DCSRecordRef
                    #
                    #  parse xml representation of record entry to get title and pronunciation
                    #
                    xml = DCSRecordCopyData(r, 0)
                    err = OCObject.new
                    doc = NSXMLDocument.alloc.objc_send(
                        :initWithXMLString, xml,
                        :options, 0,
                        :error, err)
                    unless doc
                        $stderr.puts "Failed to obtain XML document for %s: %s" % [qu, err.description]
                        next
                    end
                    nn = doc.objc_send(
                        :nodesForXPath, '//d:entry/@d:title',       # d:title attribute
                        :error, nil)
                    title = nn && nn == [] ? echo_ns : nn.first.stringValue
                    nn = doc.objc_send(
                        :nodesForXPath, '//d:entry//span[@d:pr]',   # span element with d:pr attribute
                        :error, nil)
                    pron = nn && nn == [] ? echo_ns : nn.first.stringValue
                    pron = pron.stringByTrimmingCharactersInSet(trim_chars_set).
                        stringByReplacingOccurrencesOfString_withString(' ', '').lowercaseString
                    tt << title unless tt.include?(title)
                    title_s = title.to_s    # for use as hash key in ruby
                    if not pp.key?(title_s)
                        pp[title_s] = [pron]
                    elsif not pp[title_s].include?(pron)
                        pp[title_s] << pron
                    end
                end
                #
                #  Let query_{k} denote sub-query for k-th substring defined by range u,
                #      title_{k,i} denote i-th found title for query_{k},
                #      pron_{k,i,j} denote j-th pronunciation for title_{k,i};
                #
                #  array cc_k holds each collection of pronunciations per tile_{k,i} found for query_{k}:
                #      cc_k = [c_{k,1}, c_{k,2}, ...]
                #      c_{k,i} = [ title_{k,i}, pron_{k,i,1} *1( '(' pron_{k,i,2} ',' pron_{k,i,3} ',' ... ')' ) ]
                #
                #  array dd holds list of cc_k for every query_{k}
                #      dd = [cc_1, cc_2, ...]
                #
                cc_k = tt.map do |t|
                    a = pp[t.to_s]
                    [t, a.shift + (a == [] ? '' : "(%s)" % a.join(','))]
                end
                dd << cc_k

                k = u.location + u.length
                break unless k < q.length
                q = q.substringFromIndex(k)
            end
        end

        case output_format
            # 0 = interleaved : H[p] H[p]...
            # 1 = separate : H H...[p p...]
            # 2 = pinyin only : p p...
        when 0
            ee = dd.map do |cc|
                next '' if cc == []
                ("%s[%s]" % cc.shift) + (cc == [] ? '' : "(%s)" % cc.map {|c| "%s[%s]" % c}.join(','))
            end
            puts ee.join(' ')
        when 1
            aa = dd.map do |cc|
                a, b = cc.transpose
                next '' unless a
                (a.shift) + (a == [] ? '' : "(%s)" % a.join(','))
            end
            bb = dd.map do |cc|
                a, b = cc.transpose
                next '' unless b
                (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
            end
            puts "%s[%s]" % [aa.join(' '), bb.join(' ')]
        when 2
            bb = dd.map do |cc|
                a, b = cc.transpose
                next '' unless b
                (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
            end
            puts bb.join(' ')
        end
    end
end

hanzi2pinyin(ARGV)

#
#  [1] DictionaryServices.framework/Resources/BridgeSupport/DictionaryServices.bridgesupport has problem to be fixed.
#      I.e., in signatures of DCSCopyTextDefinition(), DCSGetTermRangeInString() function etc,
#          {??=qq} should have been {_CFRange=qq}
#          {??=ii} should have been {_CFRange=ii}
#  [2] Fixed and extended bridgesupport file is loaded by OSX.load_bridge_support_file.
#      It now includes signatures for several undocumented functions as well.
#  [3] argv.to_ns is required to handle unicode characters correctly (in ruby 1.8).
#  [4] DCSGetTermRangeInString(dct, q, 0) returning range [KCFNotFound, 0] does not necessarily mean q's 1st character
#      as query may not match any term in dictionary. It is necessary to use DCSCopyRecordsForSearchString()
#      for the 1st character in order to know the (existence of) matching term(s).
#  [5] DCSCopyAvailableDictionaries() returns CFSetRef under 10.6.8 but may return CFArrayRef in later OSes.
#
__END__
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE signatures SYSTEM "file://localhost/System/Library/DTDs/BridgeSupport.dtd">
<signatures version="0.9">
    <function name="DCSCopyTextDefinition"> <arg type="^{__DCSDictionary=}"></arg> <arg type="^{__CFString=}"></arg> <arg type64="{_CFRange=qq}" type="{_CFRange=ii}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSGetTermRangeInString"> <arg type="^{__DCSDictionary=}"></arg> <arg type="^{__CFString=}"></arg> <arg type64="q" type="l"></arg> <retval type64="{_CFRange=qq}" type="{_CFRange=ii}"></retval> </function>
    <function name="DCSDictionaryCreate"> <arg type="^{__CFURL=}"></arg> <retval type="^{__DCSDictionary=}"></retval> </function>
    <function name="DCSGetActiveDictionaries"> <retval type="^{__CFArray=}"></retval> </function>
    <function name="DCSCopyAvailableDictionaries"> <retval type="^{__CFSet=}"></retval> </function>
    <function name="DCSGetDefaultDictionary"> <retval type="^{__DCSDictionary=}"></retval> </function>
    <function name="DCSGetDefaultThesaurus"> <retval type="^{__DCSDictionary=}"></retval> </function>
    <function name="DCSDictionaryGetURL"> <arg type="^{__DCSDictionary=}"></arg> <retval type="^{__CFURL=}"></retval> </function>
    <function name="DCSDictionaryGetName"> <arg type="^{__DCSDictionary=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSDictionaryGetIdentifier"> <arg type="^{__DCSDictionary=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSCopyRecordsForSearchString"> <arg type="^{__DCSDictionary=}"></arg> <arg type="^{__CFString=}"></arg> <arg type="l"></arg> <arg type="l"></arg> <retval type="^{__CFArray=}"></retval> </function>
    <function name="DCSRecordGetHeadword"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSRecordGetString"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSRecordGetRawHeadword"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSRecordGetTitle"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSRecordGetAnchor"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFString=}"></retval> </function>
    <function name="DCSRecordGetDataURL"> <arg type="^{__DCSRecord=}"></arg> <retval type="^{__CFURL=}"></retval> </function>
    <function name="DCSRecordCopyData"> <arg type="^{__DCSRecord=}"></arg> <arg type="l"></arg> <retval type="^{__CFString=}"></retval> </function>
</signatures>
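
For readers who cannot run RubyCocoa, the three output formats can be tried in plain Ruby. The sketch below is not part of the script above: format_pinyin is a made-up name, and the dd array is filled in by hand with the script's own '我的母亲' example instead of coming from a dictionary lookup.

```ruby
# Sketch of the -o output formats without any dictionary access.
# dd mirrors the script's structure: one array per sub-query, each holding
# [title, pronunciation] pairs.
def format_pinyin(dd, output_format)
  # "first(second,third)" for several alternatives, "first" for one, "" for none
  join_alts = lambda do |items|
    first, *rest = items
    next '' if first.nil?
    rest.empty? ? first : "#{first}(#{rest.join(',')})"
  end

  case output_format
  when 0  # interleaved : H[p] H[p]...
    dd.map { |cc| join_alts.call(cc.map { |title, pron| "#{title}[#{pron}]" }) }.join(' ')
  when 1  # separate : H H...[p p...]
    titles = dd.map { |cc| join_alts.call(cc.map(&:first)) }
    prons  = dd.map { |cc| join_alts.call(cc.map(&:last)) }
    "#{titles.join(' ')}[#{prons.join(' ')}]"
  when 2  # pinyin only : p p...
    dd.map { |cc| join_alts.call(cc.map(&:last)) }.join(' ')
  else
    raise ArgumentError, "unknown output format: #{output_format}"
  end
end

# Hand-built data taken from the example in the script's comments.
dd = [[["我", "wǒ"]], [["的", "de(dī,dí,dì)"]], [["母亲", "mǔqīn"]]]
(0..2).each { |fmt| puts format_pinyin(dd, fmt) }
```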

Sep 25, 2014 7:27 AM in response to SGIII

Hello SG,


I suspect your hanzi2pinyin script starts with the line 'hanzi2pinyin'. It should start with the shebang line instead:


#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w



The 'hanzi2pinyin' I put before the source code in the post(s) above is just a label in the message text, not part of the source code. 🙂


All the best,

H

Sep 25, 2014 8:52 AM in response to Hiroto

Hi H,


Yes that was it, an extraneous line. After correcting hanzi2pinyin to start with the shebang line, running your AppleScript example (using all three parameters) produces the expected results in Mavericks... and presumably will in Yosemite too.🙂 This is great, because it spaces the pinyin into "words."


Thanks so much.


SG

Sep 25, 2014 9:40 AM in response to Hiroto

Hi Hiroto,


Thanks for posting the update. I am running Yosemite (currently at Beta 8) and seeing the following issue after I adjust the path for my ruby installation.


/usr/local/bin/h2p:277: warning: shadowing outer local variable - c

/usr/local/bin/h2p:282: warning: assigned but unused variable - b

/usr/local/bin/h2p:287: warning: assigned but unused variable - a

/usr/local/bin/h2p:294: warning: assigned but unused variable - a

/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- osx/cocoa (LoadError)

    from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'

    from /usr/local/bin/h2p:33:in `<main>'

Sep 25, 2014 10:41 AM in response to mingsai

Hello mingsai


The core part of the script is written in RubyCocoa which only works with Ruby 1.8. The default Ruby under OSX 10.9 or later is Ruby 2.0 or later and that is why I specified the full path of Ruby 1.8 interpreter in my script.


So please specify the full path of Ruby 1.8 under 10.10. If there's no Ruby 1.8 under OS X 10.10, the RubyCocoa script won't work and you'd need to translate it to plain C or Objective-C. (Alternatively, you could call the C functions from Ruby by using the DL module, if you wish.)
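
A quick way to check whether such an interpreter is present is sketched below. The path is the one from the script's shebang line; the method name first_executable is invented for this example, and any further candidate paths would have to be supplied by the reader.

```ruby
# Path taken from the shebang line of the hanzi2pinyin script.
RUBY18 = "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"

# Return the first path that exists as a regular file and is executable, or nil.
def first_executable(paths)
  paths.find { |path| File.file?(path) && File.executable?(path) }
end

found = first_executable([RUBY18])
if found
  puts "Ruby 1.8 found at #{found}"
else
  puts "No Ruby 1.8 interpreter found at the expected path"
end
```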


Good luck,

H

Sep 25, 2014 12:28 PM in response to Hiroto

Thanks for the update. I haven't tried the new RubyCocoa yet, but I was able to work around the issue by copying the prior version of the Ruby frameworks onto my system and pointing the script at that original interpreter. This let me confirm that the original script does work on Yosemite (beta 8). The other dictionaries did not return good results, but the first dictionary in the list produced the desired results.

on run {input}
    set dictf to "/Library/Dictionaries/Simplified Chinese - English.dictionary"
    --set dictf to "/Library/Dictionaries/The Standard Dictionary of Contemporary Chinese.dictionary"
    --set dictf to "/Library/Dictionaries/小词典.dictionary"
    --set dictf to "/Library/Dictionaries/unihan.dictionary"
    --set dictf to "/Library/Dictionaries/小词典-繁体字.dictionary"
    --set dictf to "/Library/Dictionaries/CC-CEDICT.dictionary"

    set query to input as Unicode text

    --hanzi2pinyin(dictf, 10, 0, query)
    --hanzi2pinyin(dictf, 10, 1, query)
    set pinyinText to hanzi2pinyin(dictf, 10, 2, query)

    input & pinyinText
end run

on hanzi2pinyin(dictf, max_count, output_format, query)
    (*
        string dictf : POSIX path of dictionary file
        integer max_count : Max record count to retrieve
        integer output_format : Output format.
            0 = interleaved : H[p] H[p]...
            1 = separate : H H...[p p...]
            2 = pinyin only : p p...
        string query : query string
        return string : Hanzi[pinyin] in specified output format
    *)
    do shell script "d=" & dictf's quoted form & "; c=" & max_count & "; o=" & output_format & "
/usr/local/bin/h2p -d \"$d\" -c \"$c\" -o \"$o\" -e -- " & query's quoted form
end hanzi2pinyin
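
The handler above relies on AppleScript's quoted form so that dictionary paths with spaces and arbitrary query text survive the trip through the shell. As a rough illustration of the same idea from Ruby, here is a sketch using the standard Shellwords library. The method name h2p_command is made up, and the default tool path simply mirrors the /usr/local/bin/h2p used above.

```ruby
require 'shellwords'

# Build the same command line as the AppleScript handler, with every
# argument escaped so that spaces and quotes are passed through intact.
def h2p_command(dictf, max_count, output_format, query, tool = "/usr/local/bin/h2p")
  argv = [tool, "-d", dictf, "-c", max_count.to_s, "-o", output_format.to_s, "-e", "--", query]
  Shellwords.join(argv)
end

puts h2p_command("/Library/Dictionaries/Simplified Chinese - English.dictionary", 10, 2, "我的母亲")
```

The sketch only prints the command line; it does not run the tool, which still needs Ruby 1.8 and RubyCocoa as discussed earlier in the thread.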
