Subscript text vs unicode

I have had a lot of help from Hiroto with this script:

The script has been working fine for years, but I would also like it to work with subscript text.
When I have a subscript 2, it becomes a normal 2.
How can I get a subscript 2 recognized as a subscript 2?

Any help would be wonderful!
Many thanks!



-- SCRIPT
(*
This script will modify given text (in Latin-1 encoding) such that
(1) every semicolon is replaced by tab
(2) its text encoding is converted from Latin-1 to Mac-Roman

* Using Satimage OSAX's
change
convert to Mac

Preparation:
Install Satimage OSAX in:
under OS9
System Folder:Scripting Additions:
under OSX (either of the following)
/Library/ScriptingAdditions/
~/Library/ScriptingAdditions/

* OSAX download sources:
http://www.satimage.fr/software/en/soft9.html
http://www.satimage.fr/software/en/downloads_osaxen.html
*)
main()
on main()
    set fdp to "HD Server Data 1:DATABASES map:TEST:"
    set fda to fdp as alias
    tell application "Finder"
        try
            set aa to (files of fda whose name extension is "txt") as alias list
        on error
            set aa to (files of fda whose name extension is "txt") as alias as list
        end try
    end tell
    repeat with a in aa
        set fp0 to a as text
        set fp1 to (fp0's text 1 thru -5) & "_mac.txt"
        set t to read file fp0
        set t to «event SATIRPLl» ";" given «class by »:tab, «class $in »:t
        set t to «event SATIWn2M» t
        writeData(fp1, t, {_class:string, _append:false}) -- # if you need text in Mac-Roman
        --writeData(fp1, t, {_class:«class utf8», _append:false}) -- # if you need text in UTF-8
    end repeat
end main
-- The record labels in the parameter pattern must match those used by the caller (_class, _append)
on writeData(fp, x, {_class:typeclass, _append:_append})
    (*
    text fp: output file path
    data x: anything to be written to output file
    type class typeclass: class as which the data is written ("" for not-specified)
    boolean _append: whether to append data or to replace data
    *)
    local a
    try
        open for access (file fp) with write permission
        set a to fp as alias
        if not _append then set eof a to 0
        if typeclass = "" then
            write x to a starting at eof
        else
            write x to a as typeclass starting at eof
        end if
        close access a
    on error e
        try
            close access file fp
        on error --
        end try
        error e
    end try
end writeData
-- END OF SCRIPT

Different workstations from 10.3.9 to latest// G5 server - OS 10.3.9, Mac OS X (10.5.6)

Posted on Sep 7, 2009 7:48 AM

7 replies

Sep 10, 2009 6:07 AM in response to Colin @ mac.com

Hello

I guess you are talking about SUPERSCRIPT digits perhaps?
ISO-8859-1 contains SUPERSCRIPT ONE, TWO, THREE, but no subscript digits.

And if that is the case, I'm afraid you cannot convert such characters to MacRoman because MacRoman does not map them. Satimage OSAX's 'convert to Mac' command is unavoidably substituting them with normal digits.

You can of course convert them to Unicode (UTF-8, for instance), but then your output file will be encoded in UTF-8, which may not be acceptable in your workflow.
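As a quick illustration of the point above, here is a small Python sketch (Python is used only for illustration and is not part of the AppleScript workflow) that asks which of the three encodings can represent SUPERSCRIPT TWO. The helper name can_encode is made up for this example; the codec names are Python's standard ones.

```python
# Check which encodings can represent SUPERSCRIPT TWO (U+00B2).
SUPERSCRIPT_TWO = "\u00b2"


def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`.

    Hypothetical helper for this illustration only.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for name in ("iso-8859-1", "mac_roman", "utf-8"):
    print(name, can_encode(SUPERSCRIPT_TWO, name))
```

ISO-8859-1 and UTF-8 should report True and MacRoman False, which is why a conversion to MacRoman has to substitute something else for the character.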

If UTF-8 output is fine, we can rewrite the script as such.

All the best,
H

Sep 14, 2009 6:36 AM in response to Hiroto

Hello Hiroto,

I switched to the UTF-8 line of the existing script, but it doesn't work. Do you have any suggestions, so that I can set it up the way you think it is needed?

When I open the database on a PC, it's all right...
When I open the database on a Mac, the superscript 2 becomes ≤;
as you said before, the Mac doesn't recognize the superscript.
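A likely explanation for the ≤, sketched in Python purely for illustration: ISO-8859-1 stores SUPERSCRIPT TWO as the single byte 0xB2, and the same byte means ≤ when the file is read as MacRoman. The sample text "m²" and the helper name decode_as are assumptions made for this example, not data from the actual database.

```python
# The byte sequence ISO-8859-1 produces for "m" followed by SUPERSCRIPT TWO.
raw = "m\u00b2".encode("iso-8859-1")  # the superscript becomes byte 0xB2


def decode_as(data, encoding):
    """Interpret the same bytes under a given encoding (illustrative helper)."""
    return data.decode(encoding)


print(decode_as(raw, "iso-8859-1"))  # the text as it was written on the PC
print(decode_as(raw, "mac_roman"))   # the same bytes read as MacRoman
```

The bytes are never changed here; only the interpretation differs, which matches seeing correct text on the PC and ≤ on the Mac.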

Using the script with the UTF-8 change doesn't work either...

With the UTF-8 part, a lot of the text is wrong and the superscript still doesn't work. I don't know what the next step is.

Do you have any suggestions, or something I can try?
Normally I batch the files and then import the database into other programs, but I lose the superscript text. That's the only thing I would also like to work.

I can't use a find-and-replace solution in my workflow. It's all one to one, so the database needs to be perfect.

Many thanks in advance.

Sep 14, 2009 9:46 AM in response to Colin @ mac.com

Hello Colin,

We need to convert ISO-8859-1 to UTF-8 directly, not going through MacRoman, in order to preserve SUPERSCRIPT TWO. So we cannot use the 'convert to Mac' command of the Satimage OSAX anymore.

Try SCRIPT1 below, which will hopefully yield UTF-8 output.
It will -
a) get the input file's content by cat(1); and
b) convert ISO-8859-1 to UTF-8 by iconv(1); and
c) change semicolon to tab by perl(1).
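For readers who want to see the three steps in isolation, they can be sketched in Python as follows. This is only an illustration of the same idea, not the script itself; the function name convert_file and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path


def convert_file(src, dst):
    """Read ISO-8859-1 text, turn semicolons into tabs, write UTF-8.

    Hypothetical helper mirroring steps a) to c) above.
    """
    data = Path(src).read_bytes()                 # a) get the raw file content
    text = data.decode("iso-8859-1")              # b) interpret it as ISO-8859-1
    text = text.replace(";", "\t")                # c) semicolon becomes tab
    Path(dst).write_bytes(text.encode("utf-8"))   # store the result as UTF-8


# Small demonstration on a throwaway file (made-up sample data).
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "sample.txt"
    target = Path(folder) / "sample_mac.txt"
    source.write_bytes(b"10 m\xb2;ok\n")          # 0xB2 is SUPERSCRIPT TWO in ISO-8859-1
    convert_file(source, target)
    print(target.read_bytes())
```

Because the text never passes through MacRoman, the superscript survives as a regular Unicode character in the output.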

The rest is the same as the original script.
Hope this may help,
Hiroto


--SCRIPT1
main()
on main()
    set fdp to "HD Server Data 1:DATABASES map:TEST:"
    set fda to fdp as alias
    tell application "Finder"
        try
            set aa to (files of fda whose name extension is "txt") as alias list
        on error
            set aa to (files of fda whose name extension is "txt") as alias as list
        end try
    end tell
    repeat with a in aa
        set fp0 to POSIX path of a
        set fp1 to (fp0's text 1 thru -5) & "_mac.txt"

        set sh to "cat " & (quoted form of fp0) & ¬
            " | iconv -f iso-8859-1 -t utf-8" & ¬
            " | perl -Mutf8 -pe 's/;/\t/og;' > " & (quoted form of fp1)
        do shell script sh
    end repeat
end main
--END OF SCRIPT

Sep 15, 2009 2:43 AM in response to Hiroto

Good morning Hiroto,

Thanks for helping me out. When I test your script, it works: I manually checked the UTF-8 file in a spreadsheet program.

But my workflow doesn't support it. That's such a setback for me.
This is the first time I see that we would be better off with a PC workflow for this. Right now it's Mac based.
On a PC there wouldn't be anything difficult about this.

For it to work in the workflow, I would need to be able to convert it back to MacRoman, and that's not possible (I understand your explanation).

I will check with my suppliers; maybe they have a solution so that I can import the UTF-8 file directly. That is possible now, but it converts the text incorrectly.

Many thanks for your quick help!
