Hello SG and mingsai,
Just in case, here's a minor update. I'm not sure whether it helps, but something tells me that the DCSDictionaryCreate() function might be the cause of the reported error. So this version doesn't call it and uses DCSCopyAvailableDictionaries() instead. Also, the script is now written entirely in Ruby. It no longer uses bash, which had been used only for its process substitution to create the temporary bridgesupport file. The calling convention is the same as in the previous version.
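The new lookup for the -d option scans the available dictionaries for one whose file path matches the given path, instead of asking DictionaryServices to create a dictionary ref from the URL. Below is a minimal pure-Ruby sketch of just that selection logic. It does not call DictionaryServices. `Dict` is a hypothetical stand-in for a DCSDictionaryRef, modeling only the path that DCSDictionaryGetURL(d).path would return, and the dictionary names and paths are made-up examples. As note [5] in the script says, the collection may be a set (10.6.8) or possibly an array (later OSes), so it is normalized with `to_a` first. Expanding both paths with File.expand_path is my own assumption for comparing relative and absolute spellings; the script compares NSURL paths directly.

```ruby
require 'set'

# Hypothetical stand-in for a DCSDictionaryRef: only the name and the
# file path (what DCSDictionaryGetURL(d).path would give) are modeled.
Dict = Struct.new(:name, :path)

# Return the first dictionary whose path equals +path+, or nil.
# +dicts+ may be a Set or an Array; to_a normalizes both.
def find_dictionary(dicts, path)
  wanted = File.expand_path(path)
  dicts.to_a.find { |d| File.expand_path(d.path) == wanted }
end

# Made-up example data, not real DictionaryServices output.
available = Set.new([
  Dict.new('Oxford', '/Library/Dictionaries/Oxford.dictionary'),
  Dict.new('Chinese', '/Library/Dictionaries/Chinese.dictionary')
])

dict = find_dictionary(available, '/Library/Dictionaries/Chinese.dictionary')
puts(dict ? dict.name : 'not found')
```

When nothing matches, the sketch returns nil. That corresponds to the script printing "Failed to get dictionary for: ..." and exiting with status 2.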
Tested under 10.6.8. (Sorry, I haven't tested it under later OSes, which I don't use.)
Good luck,
H
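The main loop of the script below is a greedy scan. It takes the longest leading substring of the query that is a term in the dictionary, looks it up, and continues with the rest of the query. A leading character with no entry is emitted on its own, following the echo option. Below is a rough pure-Ruby sketch of that loop together with output format 0. `LEXICON` is a hypothetical Hash (term => readings, primary reading first) standing in for DCSGetTermRangeInString() and DCSCopyRecordsForSearchString(), and the method names are illustrative only. The sketch keeps one title per term, whereas the script can collect several titles per sub-query.

```ruby
# encoding: utf-8

# Hypothetical toy lexicon standing in for the dictionary:
# term => readings (the first one is the primary reading).
LEXICON = {
  '我'   => ['wǒ'],
  '的'   => ['de', 'dī', 'dí', 'dì'],
  '母'   => ['mǔ'],
  '母亲' => ['mǔqīn']
}

# Length of the longest leading substring of +q+ found in +lexicon+, or 0.
def longest_leading_term(q, lexicon)
  max = lexicon.keys.map(&:length).max || 0
  [max, q.length].min.downto(1) do |n|
    return n if lexicon.key?(q[0, n])
  end
  0
end

# Join readings like output format 0: "p1(p2,p3,...)".
def format_readings(readings)
  first, *rest = readings
  rest.empty? ? first : "#{first}(#{rest.join(',')})"
end

# Greedy segmentation. +echo+ nil echoes an unmatched character itself
# (the script's default); otherwise +echo+ replaces the missing reading.
def hanzi2pinyin_sketch(query, lexicon, echo = nil)
  out = []
  q = query
  until q.empty?
    n = longest_leading_term(q, lexicon)
    if n.zero?
      c = q[0, 1] # no term found: give up one character
      out << "#{c}[#{echo || c}]"
      q = q[1..-1]
    else
      term = q[0, n]
      out << "#{term}[#{format_readings(lexicon[term])}]"
      q = q[n..-1]
    end
  end
  out.join(' ')
end

puts hanzi2pinyin_sketch('我的母亲', LEXICON)
# prints: 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
```

For an unmatched character, `hanzi2pinyin_sketch('我X', LEXICON, '?')` gives `我[wǒ] X[?]`. That mirrors what the script prints in format 0 with `-e '?'`.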
hanzi2pinyin
#!/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby -w
# coding: utf-8
#
# ARGV = options query [query ...]
# -d, --dictionary DICTIONARY Dictionary file.
# -c, --count COUNT Max record count to retrieve (=10).
# -o, --output FORMAT Output format (=0).
# 0 = interleaved : H[p] H[p]...
# 1 = separate : H H...[p p...]
# 2 = pinyin only : p p...
# -e, --echo [CHARACTER] Character(s) to be echoed for no result.
# Given no CHARACTER, query is echoed.
# -h, --help Display this help.
#
# v0.34d
# written by Hiroto, 2014-09
#
# v0.34d -
# using DCSGetActiveDictionaries() or DCSCopyAvailableDictionaries() and DCSDictionaryGetURL() to get the specified dictionary ref
# (instead of using DCSDictionaryCreate())
# using DCSGetActiveDictionaries() to get the default dictionary in case no -d option is specified.
# (DCSGetActiveDictionaries().first is the 1st dictionary in the preferences of Dictionary.app.
# Previously, DCSGetDefaultDictionary() was used, which returns a fixed dictionary regardless of the preference order)
#
# v0.34 -
# pure ruby version
# without using bash's process substitution to create the temporary bridgesupport file
#
# * this is noticeably faster than v0.33
#
#
require 'optparse'
require 'osx/cocoa'
include OSX
# OSX.require_framework '/System/Library/Frameworks/CoreServices.framework/Frameworks/DictionaryServices.framework' # [1]
while File.exist?(BSFILE = "/tmp/DictionaryServices.#{rand(1e6)}.bridgesupport") do end
# while File.exist?(BSFILE = File.expand_path("~/desktop/DictionaryServices.#{rand(1e6)}.bridgesupport")) do end
Signal.trap("EXIT") { File.delete BSFILE if File.exist?(BSFILE) }
File.open(BSFILE, "w") { |f| f.print DATA.read }
OSX.load_bridge_support_file BSFILE # [2]
File.delete BSFILE if File.exist?(BSFILE)
# -----------------------------------------------------
# * some DictionaryServices.framework functions (OS X 10.6.8)
#
# (undocumented)
#
# extern CFArrayRef DCSGetActiveDictionaries (void)
# extern CFSetRef DCSCopyAvailableDictionaries (void)
# extern DCSDictionaryRef DCSGetDefaultDictionary (void)
# extern DCSDictionaryRef DCSGetDefaultThesaurus (void)
# extern DCSDictionaryRef DCSDictionaryCreate (CFURLRef)
# extern CFURLRef DCSDictionaryGetURL (DCSDictionaryRef)
# extern CFStringRef DCSDictionaryGetName (DCSDictionaryRef)
# extern CFStringRef DCSDictionaryGetIdentifier (DCSDictionaryRef)
#
# extern CFArrayRef DCSCopyRecordsForSearchString (DCSDictionaryRef, CFStringRef, unsigned long long, long long)
# unsigned long long method
# 0 = exact match
# 1 = forward match (prefix match)
# 2 = partial query match (matching (leading) part of query; including ignoring diacritics, four tones in Chinese, etc)
# >=3 = ? (exact match?)
#
# long long max_record_count
#
# extern CFStringRef DCSRecordGetString (DCSRecordRef)
# extern CFStringRef DCSRecordGetHeadword (DCSRecordRef)
# extern CFStringRef DCSRecordGetRawHeadword (DCSRecordRef)
# extern CFStringRef DCSRecordGetTitle (DCSRecordRef)
# extern CFStringRef DCSRecordGetAnchor (DCSRecordRef)
# extern CFURLRef DCSRecordGetDataURL (DCSRecordRef)
#
# extern CFStringRef DCSRecordCopyData (DCSRecordRef, long)
# long output_style
# 0 = XML XHTML <html> string
# 1 = XML XHTML <html> string
# 2 = XML XHTML <html> string
# 3 = plain text
# 4 = XML XHTML <text> string (single element)
# * corresponding to (?)
# Transform.xsl
# TransformApp.xsl
# TransformPanel.xsl
# TransformSimpleText.xsl
# TransformText.xsl
#
# (documented)
#
# CFStringRef DCSCopyTextDefinition (DCSDictionaryRef, CFStringRef, CFRange)
# CFRange DCSGetTermRangeInString (DCSDictionaryRef, CFStringRef, CFIndex)
#
# -----------------------------------------------------
def hanzi2pinyin(argv)
#
# argv = options query [query ...]
# -d, --dictionary DICTIONARY Dictionary file.
# -c, --count COUNT Max record count to retrieve (=10).
# -o, --output FORMAT Output format (=0).
# 0 = interleaved : H[p] H[p]...
# 1 = separate : H H...[p p...]
# 2 = pinyin only : p p...
# -e, --echo [CHARACTER] Character(s) to be echoed for no result.
# Given no CHARACTER, query is echoed.
# -h, --help Display this help.
#
args = {
:dictf => nil,
:count => 10,
:output => 0,
:echo => '',
}
op = OptionParser.new do |o|
o.banner = "Usage: #{File.basename($0)} options query [query ...]"
o.on('-d', '--dictionary DICTIONARY', String, "Dictionary file.") do |f|
args[:dictf] = f
end
o.on('-c', '--count COUNT', Integer, "Max record count to retrieve (=10).") do |i|
raise OptionParser::InvalidArgument, i unless i.to_i > 0
args[:count] = i.to_i
end
o.on('-o', '--output FORMAT', Integer, "Output format (=0).",
" 0 = interleaved : H[p] H[p]...",
" 1 = separate : H H...[p p...]",
" 2 = pinyin only : p p...") do |i|
raise OptionParser::InvalidArgument, i unless [0, 1, 2].include?(i.to_i)
args[:output] = i.to_i
end
o.on('-e', '--echo [CHARACTER]', String, "Character(s) to be echoed for no result.",
"Given no CHARACTER, query is echoed.") do |s|
args[:echo] = s || ''
end
o.on( '-h', '--help', 'Display this help.' ) do
$stderr.puts o; exit 1
end
end
begin
op.parse!(argv)
rescue => ex
$stderr.puts "#{ex.class} : #{ex.message}"
$stderr.puts op.help(); exit 1
end
if argv.length == 0
$stderr.puts op.help(); exit 1
end
if (dctf = args[:dictf])
unless File.exists?(dctf)
$stderr.puts "No such dictionary: %s" % dctf
exit 1
end
url = NSURL.fileURLWithPath(dctf)
# dct = DCSDictionaryCreate(url)
# dct, = DCSGetActiveDictionaries().select { |d| DCSDictionaryGetURL(d).path == url.path }
# dcts = DCSCopyAvailableDictionaries()
dcts = dcts.allObjects if (dcts = DCSCopyAvailableDictionaries()).is_a? NSSet # [5]
dct, = dcts.select { |d| DCSDictionaryGetURL(d).path == url.path }
unless dct
$stderr.puts "Failed to get dictionary for: %s" % dctf
exit 2
end
else
# dct = DCSGetDefaultDictionary()
dct, = DCSGetActiveDictionaries()
unless dct
$stderr.puts "Failed to get the 1st active dictionary"
exit 2
end
end
query_method = 0 # exact match
max_record_count = args[:count] # max record count to be retrieved
output_format = args[:output] # output format option
# 0 = interleaved : H[p] H[p]...
# 1 = separate : H H...[p p...]
# 2 = pinyin only : p p...
#
# e.g., given query '我的母亲'
# 0 => 我[wǒ] 的[de(dī,dí,dì)] 母亲[mǔqīn]
# 1 => 我 的 母亲[wǒ de(dī,dí,dì) mǔqīn]
# 2 => wǒ de(dī,dí,dì) mǔqīn
trim_chars = "\t\n |" # characters to be trimmed at both ends of pronunciation string
echo_query = '' # sentinel: if echo_char equals this, the unmatched query character itself is echoed
echo_char = args[:echo] # character(s) to be echoed if no result is found for query
# (the default '' equals echo_query, so by default the query character is echoed)
trim_chars_set = NSCharacterSet.characterSetWithCharactersInString(trim_chars)
echo_ns = echo_char.to_ns
argv.map {|a| a.to_ns }.each do |q| # [3]
dd = []
while true do
#
# Until given query string (q) is exhausted, repeat as follows -
# get longest leading substring (qu) of the query string matching a term in dictionary,
# look the substring up in dictionary and retrieve title and pronunciation of the matching entry.
#
u = DCSGetTermRangeInString(dct, q, 0) # try to find longest leading range matching a term in dictionary
u = NSMakeRange(0, 1) if u.location == KCFNotFound # fallback [4]
qu = q.substringWithRange(u)
rr = DCSCopyRecordsForSearchString(dct, qu, query_method, max_record_count)
unless rr
c = q.substringWithRange(NSMakeRange(0, 1)) # give up one character at the beginning
dd << [[c, echo_char == echo_query ? c : echo_ns]]
break if q.length < 2
q = q.substringFromIndex(1)
else
tt, pp = [], {}
rr.each do |r| # r = DCSRecordRef
#
# parse xml representation of record entry to get title and pronunciation
#
xml = DCSRecordCopyData(r, 0)
err = OCObject.new
doc = NSXMLDocument.alloc.objc_send(
:initWithXMLString, xml,
:options, 0,
:error, err)
unless doc
$stderr.puts "Failed to obtain XML document for %s: %s" % [qu, err.description]
next
end
nn = doc.objc_send(
:nodesForXPath, '//d:entry/@d:title', # d:title attribute
:error, nil)
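title = (nn.nil? || nn == []) ? echo_ns : nn.first.stringValue # guard against nil (XPath error) as well as no match
nn = doc.objc_send(
:nodesForXPath, '//d:entry//span[@d:pr]', # span element with d:pr attribute
:error, nil)
pron = (nn.nil? || nn == []) ? echo_ns : nn.first.stringValue # guard against nil (XPath error) as well as no match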
pron = pron.stringByTrimmingCharactersInSet(trim_chars_set).
stringByReplacingOccurrencesOfString_withString(' ', '').lowercaseString
tt << title unless tt.include?(title)
title_s = title.to_s # for use as hash key in ruby
if not pp.key?(title_s)
pp[title_s] = [pron]
elsif not pp[title_s].include?(pron)
pp[title_s] << pron
end
end
#
# Let query_{k} denote sub-query for k-th substring defined by range u,
# title_{k,i} denote i-th found title for query_{k},
# pron_{k,i,j} denote j-th pronunciation for title_{k,i};
#
# array cc_k holds each collection of pronunciations per title_{k,i} found for query_{k}:
# cc_k = [c_{k,1}, c_{k,2}, ...]
# c_{k,i} = [ title_{k,i}, pron_{k,i,1} *1( '(' pron_{k,i,2} ',' pron_{k,i,3} ',' ... ')' ) ]
#
# array dd holds list of cc_k for every query_{k}
# dd = [cc_1, cc_2, ...]
#
cc_k = tt.map do |t|
a = pp[t.to_s]
[t, a.shift + (a == [] ? '' : "(%s)" % a.join(','))]
end
dd << cc_k
k = u.location + u.length
break unless k < q.length
q = q.substringFromIndex(k)
end
end
case output_format
# 0 = interleaved : H[p] H[p]...
# 1 = separate : H H...[p p...]
# 2 = pinyin only : p p...
when 0
ee = dd.map do |cc|
next '' if cc == []
("%s[%s]" % cc.shift) + (cc == [] ? '' : "(%s)" % cc.map {|c| "%s[%s]" % c}.join(','))
end
puts ee.join(' ')
when 1
aa = dd.map do |cc|
a, b = cc.transpose
next '' unless a
(a.shift) + (a == [] ? '' : "(%s)" % a.join(','))
end
bb = dd.map do |cc|
a, b = cc.transpose
next '' unless b
(b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
end
puts "%s[%s]" % [aa.join(' '), bb.join(' ')]
when 2
bb = dd.map do |cc|
a, b = cc.transpose
next '' unless b
(b.shift) + (b == [] ? '' : "(%s)" % b.join(','))
end
puts bb.join(' ')
end
end
end
hanzi2pinyin(ARGV)
#
# [1] DictionaryServices.framework/Resources/BridgeSupport/DictionaryServices.bridgesupport has a problem that needs fixing:
# in the signatures of DCSCopyTextDefinition(), DCSGetTermRangeInString(), etc.,
# {??=qq} should have been {_CFRange=qq}
# {??=ii} should have been {_CFRange=ii}
# [2] The fixed and extended bridgesupport file (embedded after __END__) is loaded by OSX.load_bridge_support_file.
# It also includes signatures for several undocumented functions.
# [3] Converting each argument with to_ns is required to handle Unicode characters correctly (in ruby 1.8).
# [4] DCSGetTermRangeInString(dct, q, 0) returning the range {KCFNotFound, 0} does not necessarily mean that q's 1st character
# has no entry in the dictionary. Hence the fallback to the range (0, 1): DCSCopyRecordsForSearchString() is still called
# for the 1st character in order to find out whether any matching term exists.
# [5] DCSCopyAvailableDictionaries() returns a CFSetRef under 10.6.8 but may return a CFArrayRef under later OSes,
# so an NSSet result is converted to an array with allObjects before searching it.
#
__END__
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE signatures SYSTEM "file://localhost/System/Library/DTDs/BridgeSupport.dtd">
<signatures version="0.9">
<function name="DCSCopyTextDefinition">
<arg type="^{__DCSDictionary=}"></arg>
<arg type="^{__CFString=}"></arg>
<arg type64="{_CFRange=qq}" type="{_CFRange=ii}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSGetTermRangeInString">
<arg type="^{__DCSDictionary=}"></arg>
<arg type="^{__CFString=}"></arg>
<arg type64="q" type="l"></arg>
<retval type64="{_CFRange=qq}" type="{_CFRange=ii}"></retval>
</function>
<function name="DCSDictionaryCreate">
<arg type="^{__CFURL=}"></arg>
<retval type="^{__DCSDictionary=}"></retval>
</function>
<function name="DCSGetActiveDictionaries">
<retval type="^{__CFArray=}"></retval>
</function>
<function name="DCSCopyAvailableDictionaries">
<retval type="^{__CFSet=}"></retval>
</function>
<function name="DCSGetDefaultDictionary">
<retval type="^{__DCSDictionary=}"></retval>
</function>
<function name="DCSGetDefaultThesaurus">
<retval type="^{__DCSDictionary=}"></retval>
</function>
<function name="DCSDictionaryGetURL">
<arg type="^{__DCSDictionary=}"></arg>
<retval type="^{__CFURL=}"></retval>
</function>
<function name="DCSDictionaryGetName">
<arg type="^{__DCSDictionary=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSDictionaryGetIdentifier">
<arg type="^{__DCSDictionary=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSCopyRecordsForSearchString">
<arg type="^{__DCSDictionary=}"></arg>
<arg type="^{__CFString=}"></arg>
<arg type="l"></arg>
<arg type="l"></arg>
<retval type="^{__CFArray=}"></retval>
</function>
<function name="DCSRecordGetHeadword">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSRecordGetString">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSRecordGetRawHeadword">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSRecordGetTitle">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSRecordGetAnchor">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFString=}"></retval>
</function>
<function name="DCSRecordGetDataURL">
<arg type="^{__DCSRecord=}"></arg>
<retval type="^{__CFURL=}"></retval>
</function>
<function name="DCSRecordCopyData">
<arg type="^{__DCSRecord=}"></arg>
<arg type="l"></arg>
<retval type="^{__CFString=}"></retval>
</function>
</signatures>