How can I use Automator to extract substring of text based on pattern?

Question

Level 1

30 points

I have inbound text in a workflow and I want to extract a substring from the text.

(inbound text)

我 wǒ 代 ① （指一人）（用作主语） I （用作宾语） me （表所属关系） my 告诉我 tell me 我为人人，人人为我 one for all and all for one 我爸/妈 my father/mother 我的祖国 my homeland 我现在没空。 I am busy at the moment. 我认为我行！ I think I can manage it. ② （指两人或以上）（用作主语） we （用作宾语） us （表所属关系） our 我厂/国/校/军 our factory/country/school/army 敌军被我全歼。 The enemy was annihilated by us. → 我方, 敌我矛盾 ③ （表泛指） [used together with 你 in parallel structures] anyone 大家你一言，我一语，献计献策。 They had a brainstorming session with anyone and everyone joining in. 市场里你来我往非常热闹。 The market is bustling with people coming and going. → 尔虞我诈, 你死我活 ④ （指自我） self → 忘我, 自我

I basically only want the non-double-byte characters between the first character and the first occurrence of ① (see sample):

我 wǒ 代 ①

Using regex101.com, I determined that this regex pattern should produce the required result, but I need help getting it to work in Automator:

/ ([a-z].) /
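
One way to approach this without relying on a single regex match is sketched below in Ruby. Automator's Run Shell Script action can run Ruby if its shell is set to /usr/bin/ruby with input passed to stdin (an assumption about the setup, not something stated in this thread). The helper name pinyin_before_marker is made up for illustration. It takes the text before the first ①, drops the leading headword, and removes the remaining Han characters.

```ruby
# encoding: utf-8

# Hypothetical helper: return the non-Han text that sits between the
# headword (first token) and the first occurrence of the marker.
def pinyin_before_marker(text, marker = "①")
  head = text.split(marker, 2).first.to_s  # everything before the first marker
  rest = head.sub(/\A\S+/, "")             # drop the leading headword
  rest.gsub(/\p{Han}/, "").strip           # remove Han characters, trim spaces
end

sample = "我 wǒ 代 ① （指一人）（用作主语） I （用作宾语） me"
puts pinyin_before_marker(sample)  # prints: wǒ

# Inside Automator (input passed to stdin) the last line could instead be:
#   puts pinyin_before_marker(STDIN.read)
```

Unlike / ([a-z].) /, this does not assume the pinyin is exactly two characters long.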

MacBook Air (13-inch Mid 2012), Mac OS X (10.7.5), Love the mac (OSX 10.9+)

Posted on Sep 9, 2014 2:59 PM

Answer 1

SGIII

Level 8

37,532 points

Sep 18, 2014 6:00 PM in response to Hiroto

Hi H,

Thanks for your expert clear instructions!

I think I have installed it properly. In Terminal:

$ cd /usr/local/bin/

$ ls

resulted in:

hanzi2pinyin pin pip-2.7

But the first time I ran the AppleScript, it just kept running with no results. I force quit AppleScript Editor (though there was no message that it was not responding) and launched it again. Now it throws an error:

error "An error of type 100035 has occurred." number 100035

I realize it's tough to troubleshoot from a different version of OS X. But does that give you any clue as to what I might try next?

(I checked to make sure I had CC-CEDICT.dictionary in /Library/Dictionaries)

SG

Answer 2

Hiroto

Level 5

7,467 points

Sep 19, 2014 6:21 AM in response to SGIII

Hello SG,

All I can say now is that error 100035 is a POSIX error EAGAIN (Resource temporarily unavailable).

/System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/CarbonCore.framework/Versions/A/Headers/MacErrors.h:  kPOSIXErrorEAGAIN             = 100035, /* Resource temporarily unavailable */

/usr/include/sys/errno.h:#define    EAGAIN        35        /* Resource temporarily unavailable */

I first suspect that the dictionary access is blocked by something introduced in later OSes. But it is also quite possible that DictionaryServices.framework functions have been updated in such a way that my script based upon 10.6.8 won't work under 10.9.
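
The two header lines quoted above differ by exactly 100000, so a small Ruby sketch can translate such a code back to a plain errno. The constant and method names here are made up for illustration, and the offset is assumed from those two lines only. The name printed for errno 35 depends on the platform Ruby runs on (EAGAIN on OS X).

```ruby
# Assumed offset between MacErrors.h POSIX codes and errno values,
# taken from kPOSIXErrorEAGAIN = 100035 versus EAGAIN = 35.
POSIX_ERROR_BASE = 100_000

# Hypothetical helper: map a kPOSIXError-style code to its errno,
# or nil if the code is not above the assumed base.
def posix_errno_from_mac_error(code)
  return nil unless code > POSIX_ERROR_BASE
  code - POSIX_ERROR_BASE
end

errno = posix_errno_from_mac_error(100035)
puts errno
# SystemCallError.new picks the Errno subclass known for this errno
# on the current platform, if there is one.
puts SystemCallError.new(nil, errno).class
```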

All the best,

H

Answer 3

SGIII

Level 8

37,532 points

Sep 19, 2014 9:14 AM in response to Hiroto

Hi H,

Yes, I'm guessing this probably has something to do with the increased security features. There is probably a way around them by changing permissions somewhere but, alas, I don't know enough to know where to look or what to do. Tantalizingly close, though.

SG

Answer 4

mingsai Author

Level 1

30 points

Sep 19, 2014 9:43 AM in response to Hiroto

Hiroto,

This script has taken the concept another step forward! Thanks for sharing! After reviewing the script in my OS X 10.10 environment and adjusting for changes in the resource paths, I have not been successful at getting it to work. The script logic did give me pause to consider the basic algorithm that we have been using.

It seems to me that what would work best against any dictionary file is a query that produces structured data able to reliably handle the many dictionary variances (XML seems like a good possibility). Although I don't currently know how to extract all dictionary definitions as XML, I was able to find the Unicode Han dictionary and apply a regex handler that always grabs the first pinyin for each character (this dictionary doesn't have compound words, so the transliterations are literal but not contextual, which can lead to some inaccuracies).

The markers in this database are consistently formatted, and because it's titled Unicode I suspect it covers an adequate character range. Here's my regex to parse the definitions:

ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:

Explanation of the Regex:

/ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i

ReadingsMandarin: matches the characters ReadingsMandarin: literally (case insensitive)
1st Capturing group (.+?)
- .+? matches any character (except newline)
  - Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?:\,.*){0,1} Non-capturing group
- Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
- \, matches the character , literally
- .* matches any character (except newline)
  - Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
Cantonese: matches the characters Cantonese: literally (case insensitive)
2nd Capturing group (.+?)
- .+? matches any character (except newline)
  - Quantifier: Between one and unlimited times, as few times as possible, expanding as needed [lazy]
(?:\,.*){0,1} Non-capturing group
- Quantifier: Between 0 and 1 times, as many times as possible, giving back as needed [greedy]
- \, matches the character , literally
- .* matches any character (except newline)
  - Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
On’yomi: matches the characters On’yomi: literally (case insensitive)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
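
As a rough check of the pattern explained above, here it is applied in Ruby. The sample entry string is invented for illustration and only mimics the marker layout the regex expects; real Unihan dictionary output may differ. The method name first_readings is also hypothetical.

```ruby
# encoding: utf-8

# The pattern from the post, unchanged.
UNIHAN_RE = /ReadingsMandarin: (.+?)(?:\,.*){0,1}Cantonese: (.+?)(?:\,.*){0,1}On’yomi:/i

# Hypothetical helper: return the first Mandarin and Cantonese reading,
# or nil when the markers are not found.
def first_readings(entry)
  m = UNIHAN_RE.match(entry)
  return nil unless m
  { :mandarin => m[1], :cantonese => m[2] }
end

# Invented one-line sample ("." does not match newlines by default).
sample = "ReadingsMandarin: wǒ, wòCantonese: ngo5, ngo6On’yomi: GA"
p first_readings(sample)
```

The lazy (.+?) stops at the first comma, so only the first reading in each list is captured; entries with a single reading and no comma still match.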

Answer 5

mingsai Author

Level 1

30 points

Sep 19, 2014 10:16 AM in response to SGIII

Hi SGIII,

I'm curious: if you run hanzi2pinyin in a Terminal window, what output do you get?

Answer 6

SGIII

Level 8

37,532 points

Sep 19, 2014 7:15 PM in response to mingsai

When I execute in a terminal window, I get continual Fork ... Resource Temporarily Unavailable interspersed with what looks like the lines I had entered (it flashes by too quickly to see exactly). In the end I had to force quit Terminal.

SG

Answer 7

Hiroto

Level 5

7,467 points

Sep 23, 2014 1:49 AM in response to SGIII

Hello SG and mingsai,

Just in case, here's a minor update. I'm not sure whether it helps, but I suspect that calling DCSDictionaryCreate() might be the cause of the reported error, so this version avoids it and uses DCSCopyAvailableDictionaries() instead. The script is also now pure Ruby; bash had been used only for its process substitution, to create the temporary bridgesupport file. The calling convention is the same as in the previous version.

Tested under 10.6.8. (Sorry for not testing this under later OSes, which I don't use.)

Good luck,

H

hanzi2pinyin

#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w
# coding: utf-8
#
#   ARGV = options query [query ...]
#       -d, --dictionary DICTIONARY      Dictionary file.
#       -c, --count COUNT                Max record count to retrieve (=10).
#       -o, --output FORMAT              Output format (=0).
#                                          0 = interleaved : H[p] H[p]...
#                                          1 = separate : H H...[p p...]
#                                          2 = pinyin only : p p...
#       -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
#                                        Given no CHARACTER, query is echoed.
#       -h, --help                       Display this help.
#
#   v0.34d
#   written by Hiroto, 2014-09
#
#   v0.34d -
#       using DCSActiveDictionaries() or DCSCopyAvailableDictionaries() and DCSDictionaryGetURL() to get the specified dictionary ref
#           (instead of using DCSDicionaryCreate())
#       using DCSGetActiveDictionaries() to get the default dictionary in case no -d option is specified.
#           (DCSGetActiveDictionaries().first is the 1st dictionary in the preferences of Dictionary.app
#           Previously, DCSGetDefaultDictionary() is used, which returns fixed dictionary regardless of the preferences order)
#
#   v0.34 -
#       pure ruby version
#           without using bash's process substitution to create temporary bridgesupport file
#
#       * this is noticeably faster than v0.33
#
#
require 'optparse'
require 'osx/cocoa'
include OSX
# OSX.require_framework '/System/Library/Frameworks/CoreServices.framework/Frameworks/DictionaryServices.framework'

# [1]
while File.exist?(BSFILE = "/tmp/DictionaryServices.#{rand(1e6)}.bridgesupport") do end
# while File.exist?(BSFILE = File.expand_path("~/desktop/DictionaryServices.#{rand(1e6)}.bridgesupport")) do end
Signal.trap("EXIT") { File.delete BSFILE if File.exist?(BSFILE) }
File.open(BSFILE, "w") { |f| f.print DATA.read }
OSX.load_bridge_support_file BSFILE   # [2]
File.delete BSFILE if File.exist?(BSFILE)

# -----------------------------------------------------
#   * some DictionaryServices.framework functions (OS X 10.6.8)
#
#   (undocumented)
#
#   extern CFArrayRef DCSGetActiveDictionaries (void)
#   extern CFSetRef DCSCopyAvailableDictionaries (void)
#   extern DCSDictionaryRef DCSGetDefaultDictionary (void)
#   extern DCSDictionaryRef DCSGetDefaultThesaurus (void)
#   extern DCSDictionaryRef DCSDictionaryCreate (CFURLRef)
#   extern CFURLRef DCSDictionaryGetURL (DCSDictionaryRef)
#   extern CFStringRef DCSDictionaryGetName (DCSDictionaryRef)
#   extern CFStringRef DCSDictionaryGetIdentifier (DCSDictionaryRef)
#
#   extern CFArray DCSCopyRecordsForSearchString (DCSDictionaryRef, CFStringRef, unsigned long long, long long)
#       unsigned long long method
#           0 = exact match
#           1 = forward match (prefix match)
#           2 = partial query match (matching (leading) part of query; including ignoring diacritics, four tones in Chinese, etc)
#           >=3 = ? (exact match?)
#
#       long long max_record_count
#
#   extern CFStringRef DCSRecordGetString (DCSRecordRef)
#   extern CFStringRef DCSRecordGetHeadword (DCSRecordRef)
#   extern CFStringRef DCSRecordGetRawHeadword (DCSRecordRef)
#   extern CFStringRef DCSRecordGetTitle (DCSRecordRef)
#   extern CFStringRef DCSRecordGetAnchor (DCSRecordRef)
#   extern CFURLRef DCSRecordGetDataURL (DCSRecordRef)
#
#   extern CFStringRef DCSRecordCopyData (DCSRecordRef, long)
#       long output_style
#           0 = XML XHTML <html> string
#           1 = XML XHTML <html> string
#           2 = XML XHTML <html> string
#           3 = plain text
#           4 = XML XHTML <text> string (single element)
#           * corresponding to (?)
#               Transform.xsl
#               TransformApp.xsl
#               TransformPanel.xsl
#               TransformSimpleText.xsl
#               TransformText.xsl
#
#   (documented)
#
#   CFStringRef DCSCopyTextDefinition (DCSDictionaryRef, CFStringRef, CFRange)
#   CFRange DCSGetTermRangeInString (DCSDictionaryRef, CFStringRef, CFIndex)
#
# -----------------------------------------------------

def hanzi2pinyin(argv)
    #
    #   argv = options query [query ...]
    #       -d, --dictionary DICTIONARY      Dictionary file.
    #       -c, --count COUNT                Max record count to retrieve (=10).
    #       -o, --output FORMAT              Output format (=0).
    #                                          0 = interleaved : H[p] H[p]...
    #                                          1 = separate : H H...[p p...]
    #                                          2 = pinyin only : p p...
    #       -e, --echo [CHARACTER]           Character(s) to be echoed for no result.
    #                                        Given no CHARACTER, query is echoed.
    #       -h, --help                       Display this help.
    #
    args = {
        :dictf => nil,
        :count => 10,
        :output => 0,
        :echo => '',
    }
    op = OptionParser.new do|o|
        o.banner = "Usage: #{File.basename($0)} options query [query ...]"
        o.on('-d', '--dictionary DICTIONARY', String, "Dictionary file.") do |f|
            args[:dictf] = f
        end
        o.on('-c', '--count COUNT', Integer, "Max record count to retrieve (=10).") do |i|
            raise OptionParser::InvalidArgument, i unless i.to_i > 0
            args[:count] = i.to_i
        end
        o.on('-o', '--output FORMAT', Integer, "Output format (=0).",
            " 0 = interleaved : H[p] H[p]...",
            " 1 = separate : H H...[p p...]",
            " 2 = pinyin only : p p...") do |i|
            raise OptionParser::InvalidArgument, i unless [0, 1, 2].include?(i.to_i)
            args[:output] = i.to_i
        end
        o.on('-e', '--echo [CHARACTER]', String, "Character(s) to be echoed for no result.",
            "Given no CHARACTER, query is echoed.") do |s|
            args[:echo] = s || ''
        end
        o.on( '-h', '--help', 'Display this help.' ) do
            $stderr.puts o; exit 1
        end
    end

    begin
        op.parse!(argv)
    rescue => ex
        $stderr.puts "#{ex.class} : #{ex.message}"
        $stderr.puts op.help(); exit 1
    end
    if argv.length == 0
        $stderr.puts op.help(); exit 1
    end

    if (dctf = args[:dictf])
        unless File.exists?(dctf)
            $stderr.puts "No such dictionary: %s" % dctf
            exit 1
        end
        url = NSURL.fileURLWithPath(dctf)
        # dct = DCSDictionaryCreate(url)
        # dct, = DCSGetActiveDictionaries().select { |d| DCSDictionaryGetURL(d).path == url.path }
        # dcts = DCSCopyAvailableDictionaries()
        dcts = dcts.allObjects if (dcts = DCSCopyAvailableDictionaries()).is_a? NSSet   # [5]
        dct, = dcts.select { |d| DCSDictionaryGetURL(d).path == url.path }
        unless dct
            $stderr.puts "Failed to get dictionary for: %s" % dctf
            exit 2
        end
    else
        # dct = DCSGetDefaultDictionary()
        dct, = DCSGetActiveDictionaries()
        unless dct
            $stderr.puts "Failed to get the 1st active dictionary"
            exit 2
        end
    end

    query_method = 0                    # exact match
    max_record_count = args[:count]     # max record count to be retrieved
    output_format = args[:output]       # output format option
        # 0 = interleaved : H[p] H[p]...
        # 1 = separate : H H...[p p...]
        # 2 = pinyin only : p p...
        #
        # e.g., given query '我的母亲'
        # 0 => 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
        # 1 => 我的母亲[wǒ de(dī,dí,dì) mǔqīn]
        # 2 => wǒ de(dī,dí,dì) mǔqīn
    trim_chars = "\t\n |"               # characters to be trimmed at both ends of pronunciation string
    echo_query = ''                     # special character to let it echo query if result is not found
    echo_char = args[:echo]             # character(s) to be echoed if no result is found for query
                                        # if echo_query is specified, query string is echoed for no result

    trim_chars_set = NSCharacterSet.characterSetWithCharactersInString(trim_chars)
    echo_ns = echo_char.to_ns

    argv.map {|a| a.to_ns }.each do |q|     # [3]
        dd = []
        while true do
            #
            #   Until given query string (q) is exhausted, repeat as follows -
            #       get longest leading substring (qu) of the query string matching a term in dictionary,
            #       look the substring up in dictionary and retrieve title and pronunciation of the matching entry.
            #
            u = DCSGetTermRangeInString(dct, q, 0)      # try to find longest leading range matching a term in dictionary
            u = NSMakeRange(0, 1) if u.location == KCFNotFound      # fallback [4]
            qu = q.substringWithRange(u)
            rr = DCSCopyRecordsForSearchString(dct, qu, query_method, max_record_count)
            unless rr
                c = q.substringWithRange(NSMakeRange(0, 1))     # give up one character at the beginning
                dd << [[c, echo_char == echo_query ? c : echo_ns]]
                break if q.length < 2
                q = q.substringFromIndex(1)
            else
                tt, pp = [], {}
                rr.each do |r|      # r = DCSRecordRef
                    #
                    #   parse xml representation of record entry to get title and pronunciation
                    #
                    xml = DCSRecordCopyData(r, 0)
                    err = OCObject.new
                    doc = NSXMLDocument.alloc.objc_send(
                        :initWithXMLString, xml,
                        :options, 0,
                        :error, err)
                    unless doc
                        $stderr.puts "Failed to obtain XML document for %s: %s" % [qu, err.description]
                        next
                    end
                    nn = doc.objc_send(
                        :nodesForXPath, '//d:entry/@d:title',       # d:title attribute
                        :error, nil)
                    title = nn && nn == [] ? echo_ns : nn.first.stringValue
                    nn = doc.objc_send(
                        :nodesForXPath, '//d:entry//span[@d:pr]',   # span element with d:pr attribute
                        :error, nil)
                    pron = nn && nn == [] ? echo_ns : nn.first.stringValue
                    pron = pron.stringByTrimmingCharactersInSet(trim_chars_set).
                        stringByReplacingOccurrencesOfString_withString(' ', '').lowercaseString
                    tt << title unless tt.include?(title)
                    title_s = title.to_s    # for use as hash key in ruby
                    if not pp.key?(title_s)
                        pp[title_s] = [pron]
                    elsif not pp[title_s].include?(pron)
                        pp[title_s] << pron
                    end
                end
                #
                #   Let query_{k} denote sub-query for k-th substring defined by range u,
                #       title_{k,i} denote i-th found title for query_{k},
                #       pron_{k,i,j} denote j-th pronunciation for title_{k,i};
                #
                #   array cc_k holds each collection of pronunciations per tile_{k,i} found for query_{k}:
                #       cc_k = [c_{k,1}, c_{k,2}, ...]
                #       c_{k,i} = [ title_{k,i}, pron_{k,i,1} *1( '(' pron_{k,i,2} ',' pron_{k,i,3} ',' ... ')' ) ]
                #
                #   array dd holds list of cc_k for every query_{k}
                #       dd = [cc_1, cc_2, ...]
                #
                cc_k = tt.map do |t|
                    a = pp[t.to_s]
                    [t, a.shift + (a == [] ? '' : "(%s)" % a.join(','))]
                end
                dd << cc_k

                k = u.location + u.length
                break unless k < q.length
                q = q.substringFromIndex(k)
            end
        end

        case output_format
            # 0 = interleaved : H[p] H[p]...
            # 1 = separate : H H...[p p...]
            # 2 = pinyin only : p p...
        when 0
            ee = dd.map do |cc|
                next '' if cc == []
                ("%s[%s]" % cc.shift) + (cc == [] ? '' : "(%s)" % cc.map {|c| "%s[%s]" % c}.join(','))
            end
            puts ee.join(' ')
        when 1
            aa = dd.map do |cc|
                a, b = cc.transpose
                next '' unless a
                (a.shift) + (a == [] ? '' : "(%s)" % a.join(','))
            end
            bb = dd.map do |cc|
                a, b = cc.transpose
                next '' unless b
                (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
            end
            puts "%s[%s]" % [aa.join(' '), bb.join(' ')]
        when 2
            bb = dd.map do |cc|
                a, b = cc.transpose
                next '' unless b
                (b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
            end
            puts bb.join(' ')
        end
    end
end

hanzi2pinyin(ARGV)

#
#   [1] DictionaryServices.framework/Resources/BridgeSupport/DictionaryServices.bridgesupport has problem to be fixed.
#       I.e., in signatures of DCSCopyTextDefinition(), DCSGetTermRangeInString() function etc,
#           {??=qq} should have been {_CFRange=qq}
#           {??=ii} should have been {_CFRange=ii}
#   [2] Fixed and extended bridgesupport file is loaded by OSX.load_bridge_support_file.
#       It now includes signatures for several undocumented functions as well.
#   [3] argv.to_ns is required to handle unicode characters correctly (in ruby 1.8).
#   [4] DCSGetTermRangeInString(dct, q, 0) returning range [KCFNotFound, 0] does not necessarily mean q's 1st character
#       as query may not match any term in dictionary. It is necessary to use DCSCopyRecordsForSearchString()
#       for the 1st character in order to know the (existence of) matching term(s).
#   [5] DCSCopyAvailableDictionaries() returns CFSetRef under 10.6.8 but may return CFArrayRef in later OSes.
#
__END__
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE signatures SYSTEM "file://localhost/System/Library/DTDs/BridgeSupport.dtd">
<signatures version="0.9">
    <function name="DCSCopyTextDefinition">
        <arg type="^{__DCSDictionary=}"></arg>
        <arg type="^{__CFString=}"></arg>
        <arg type64="{_CFRange=qq}" type="{_CFRange=ii}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSGetTermRangeInString">
        <arg type="^{__DCSDictionary=}"></arg>
        <arg type="^{__CFString=}"></arg>
        <arg type64="q" type="l"></arg>
        <retval type64="{_CFRange=qq}" type="{_CFRange=ii}"></retval>
    </function>
    <function name="DCSDictionaryCreate">
        <arg type="^{__CFURL=}"></arg>
        <retval type="^{__DCSDictionary=}"></retval>
    </function>
    <function name="DCSGetActiveDictionaries">
        <retval type="^{__CFArray=}"></retval>
    </function>
    <function name="DCSCopyAvailableDictionaries">
        <retval type="^{__CFSet=}"></retval>
    </function>
    <function name="DCSGetDefaultDictionary">
        <retval type="^{__DCSDictionary=}"></retval>
    </function>
    <function name="DCSGetDefaultThesaurus">
        <retval type="^{__DCSDictionary=}"></retval>
    </function>
    <function name="DCSDictionaryGetURL">
        <arg type="^{__DCSDictionary=}"></arg>
        <retval type="^{__CFURL=}"></retval>
    </function>
    <function name="DCSDictionaryGetName">
        <arg type="^{__DCSDictionary=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSDictionaryGetIdentifier">
        <arg type="^{__DCSDictionary=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSCopyRecordsForSearchString">
        <arg type="^{__DCSDictionary=}"></arg>
        <arg type="^{__CFString=}"></arg>
        <arg type="l"></arg>
        <arg type="l"></arg>
        <retval type="^{__CFArray=}"></retval>
    </function>
    <function name="DCSRecordGetHeadword">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSRecordGetString">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSRecordGetRawHeadword">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSRecordGetTitle">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSRecordGetAnchor">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
    <function name="DCSRecordGetDataURL">
        <arg type="^{__DCSRecord=}"></arg>
        <retval type="^{__CFURL=}"></retval>
    </function>
    <function name="DCSRecordCopyData">
        <arg type="^{__DCSRecord=}"></arg>
        <arg type="l"></arg>
        <retval type="^{__CFString=}"></retval>
    </function>
</signatures>
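
For readers without DictionaryServices or RubyCocoa at hand, the segmentation idea in the script (repeatedly take the longest leading substring that matches a dictionary term, otherwise give up one character) can be sketched in plain Ruby against an in-memory Hash. Everything here is illustrative: TOY_DICT, segment, format_pron and render are made-up names, the toy data stands in for a real dictionary, and this is not the script's actual lookup path.

```ruby
# encoding: utf-8

# Toy stand-in for a dictionary: term => list of pronunciations.
TOY_DICT = {
  "我"   => ["wǒ"],
  "的"   => ["de", "dī", "dí", "dì"],
  "母"   => ["mǔ"],
  "母亲" => ["mǔqīn"],
}

# Greedy longest-leading-match segmentation.
# Returns [[term, pronunciations_or_nil], ...].
def segment(query, dict)
  chars = query.chars
  out = []
  i = 0
  while i < chars.length
    # Try the longest remaining prefix first, then shorter ones.
    len = (chars.length - i).downto(1).find { |n| dict.key?(chars[i, n].join) }
    if len
      term = chars[i, len].join
      out << [term, dict[term]]
      i += len
    else
      out << [chars[i], nil]  # no match: give up one character
      i += 1
    end
  end
  out
end

# "de(dī,dí,dì)" style: first reading, alternatives in parentheses.
def format_pron(prons)
  return "" if prons.nil? || prons.empty?
  first, *rest = prons
  rest.empty? ? first : "#{first}(#{rest.join(',')})"
end

# Output formats loosely following the -o option: 0, 1 or 2.
def render(segments, output_format)
  prons = segments.map { |_, ps| format_pron(ps) }
  case output_format
  when 0 then segments.each_with_index.map { |(t, _), k| "#{t}[#{prons[k]}]" }.join(" ")
  when 1 then "#{segments.map(&:first).join(' ')}[#{prons.join(' ')}]"
  when 2 then prons.join(" ")
  end
end

segs = segment("我的母亲", TOY_DICT)
puts render(segs, 0)  # 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
puts render(segs, 2)  # wǒ de(dī,dí,dì) mǔqīn
```

Because the longest prefix is tried first, 母亲 is looked up as one term rather than as 母 followed by an unknown character, which is what keeps the pinyin grouped into words.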

Answer 8

SGIII

Level 8

37,532 points

Sep 24, 2014 6:21 PM in response to Hiroto

Hi H,

Thanks so much for all the time you've spent on this.

The AppleScript no longer seems to be in an endless loop on my machine.

Here's what appears in the Replies panel of AppleScript Editor:

If you have any ideas where I should look now, I'll give it a shot.

SG

Answer 9

Hiroto

Level 5

7,467 points

Sep 25, 2014 7:27 AM in response to SGIII

Hello SG,

I suspect your hanzi2pinyin script starts with the line 'hanzi2pinyin'. It should start with the shebang line instead:

#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w

The 'hanzi2pinyin' I put before the source code in the post(s) above is just a label in the message text and not part of the source code. 🙂
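
A quick way to catch this kind of paste error is to check the first line of the installed file. The Ruby sketch below is only an illustration; shebang_of is a made-up helper and the demo file is a temporary stand-in, not the real /usr/local/bin/hanzi2pinyin. A plausible reading of the earlier fork errors is that, without a shebang, the file was run as a shell script whose first command was hanzi2pinyin itself, though that is an inference rather than something confirmed in this thread.

```ruby
require "tempfile"

# Hypothetical helper: return the shebang line of a file,
# or nil if the first line does not start with "#!".
def shebang_of(path)
  first_line = File.open(path, "rb") { |f| f.gets }
  return nil unless first_line && first_line.start_with?("#!")
  first_line.chomp
end

# Demo with a temporary file that has a stray label as its first line.
Tempfile.create("h2p-demo") do |f|
  f.write("hanzi2pinyin\n#!/usr/bin/ruby -w\nputs 'hi'\n")
  f.flush
  p shebang_of(f.path)  # nil, because the label comes first
end
```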

All the best,

H

Answer 10

SGIII

Level 8

37,532 points

Sep 25, 2014 8:52 AM in response to Hiroto

Hi H,

Yes that was it, an extraneous line. After correcting hanzi2pinyin to start with the shebang line, running your AppleScript example (using all three parameters) produces the expected results in Mavericks... and presumably will in Yosemite too.🙂 This is great, because it spaces the pinyin into "words."

Thanks so much.

SG

Answer 11

mingsai Author

Level 1

30 points

Sep 25, 2014 9:40 AM in response to Hiroto

Hi Hiroto,

Thanks for posting the update. I am running Yosemite (currently at Beta 8) and seeing the following issue after I adjust the path for my ruby installation.

/usr/local/bin/h2p:277: warning: shadowing outer local variable - c

/usr/local/bin/h2p:282: warning: assigned but unused variable - b

/usr/local/bin/h2p:287: warning: assigned but unused variable - a

/usr/local/bin/h2p:294: warning: assigned but unused variable - a

/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require': cannot load such file -- osx/cocoa (LoadError)

from /System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:55:in `require'

from /usr/local/bin/h2p:33:in `<main>'

Answer 12

Hiroto

Level 5

7,467 points

Sep 25, 2014 10:34 AM in response to SGIII

My pleasure! I'm really glad to hear it worked at last! 🙂

Hiroto

Answer 13

Hiroto

Level 5

7,467 points

Sep 25, 2014 10:41 AM in response to mingsai

Hello mingsai

The core part of the script is written in RubyCocoa which only works with Ruby 1.8. The default Ruby under OSX 10.9 or later is Ruby 2.0 or later and that is why I specified the full path of Ruby 1.8 interpreter in my script.

So please specify the full path of Ruby 1.8 under 10.10. If there's no Ruby 1.8 under OSX 10.10, RubyCocoa script won't work and you'd need to translate the script to C or Objective-C proper. (Or it would be possible to call the C functions by using DL module in Ruby if you wish.)
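
To see which situation applies on a given machine, a small probe like the following can be run with each candidate interpreter. It only reports whether the osx/cocoa library loads; rubycocoa_available? is a made-up name, and the sketch makes no claim about which Ruby versions ship with which OS X release.

```ruby
# Hypothetical probe: true if the RubyCocoa bridge can be loaded
# by the interpreter running this file, false otherwise.
def rubycocoa_available?
  require "osx/cocoa"
  true
rescue LoadError
  false
end

puts "Ruby #{RUBY_VERSION}, RubyCocoa loadable: #{rubycocoa_available?}"
```

Running it once per interpreter path shows whether the shebang line of the script points at a Ruby that can actually load RubyCocoa.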

Good luck,

H

Answer 14

Hiroto

Level 5

7,467 points

Sep 25, 2014 12:05 PM in response to Hiroto

Good news!

rubycocoa 1.2.0 now supports ruby 2.0.

http://rubycocoa.sourceforge.net/

http://sourceforge.net/p/rubycocoa/svn/HEAD/tree/trunk/src/NEWS

http://sourceforge.net/projects/rubycocoa/files/RubyCocoa/

H

Answer 15

mingsai Author

Level 1

30 points

Sep 25, 2014 12:28 PM in response to Hiroto

Thanks for the update. I haven't tried the new RubyCocoa yet, but I was able to work around the issue by copying the prior version of the Ruby frameworks into my system and pointing the script at that original interpreter. This let me confirm that the original script does work on Yosemite (beta 8). The other dictionaries did not return good results, but the first one produced the desired results.

on run {input}
    set dictf to "/Library/Dictionaries/Simplified Chinese - English.dictionary"
    --set dictf to "/Library/Dictionaries/The Standard Dictionary of Contemporary Chinese.dictionary"
    --set dictf to "/Library/Dictionaries/小词典.dictionary"
    --set dictf to "/Library/Dictionaries/unihan.dictionary"
    --set dictf to "/Library/Dictionaries/小词典－繁体字.dictionary"
    --set dictf to "/Library/Dictionaries/CC-CEDICT.dictionary"
    set query to input as Unicode text
    --hanzi2pinyin(dictf, 10, 0, query)
    --hanzi2pinyin(dictf, 10, 1, query)
    set pinyinText to hanzi2pinyin(dictf, 10, 2, query)
    input & pinyinText
end run

on hanzi2pinyin(dictf, max_count, output_format, query)
    (*
        string dictf : POSIX path of dictionary file
        integer max_count : Max record count to retrieve
        integer output_format : Output format.
            0 = interleaved : H[p] H[p]...
            1 = separate : H H...[p p...]
            2 = pinyin only : p p...
        string query : query string
        return string : Hanzi[pinyin] in specified output format
    *)
    do shell script "d=" & dictf's quoted form & "; c=" & max_count & "; o=" & output_format & "
/usr/local/bin/h2p -d \"$d\" -c \"$c\" -o \"$o\" -e -- " & query's quoted form
end hanzi2pinyin
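
For comparison, the same command line can be assembled in Ruby as an argument array, which sidesteps shell quoting altogether. This is only a sketch: h2p_argv is a made-up helper, the path is the one used in the handler above, and the commented-out Open3 call would work only on a machine where h2p is actually installed.

```ruby
# encoding: utf-8

H2P_PATH = "/usr/local/bin/h2p"  # path taken from the handler above

# Hypothetical helper: build the argument vector matching
# h2p -d DICT -c COUNT -o FORMAT -e -- QUERY
def h2p_argv(dictf, max_count, output_format, query)
  [H2P_PATH, "-d", dictf, "-c", max_count.to_s,
   "-o", output_format.to_s, "-e", "--", query]
end

argv = h2p_argv("/Library/Dictionaries/Simplified Chinese - English.dictionary",
                10, 2, "我的母亲")
p argv

# To run it where h2p exists (each element is passed as one argument):
#   require "open3"
#   out, status = Open3.capture2(*argv)
```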
