Hi Gary,
Thank you very much for all this valuable information. Let me explain the situation. One of my friends has some old Windows files (approx. 1000 of them) in a Western encoding that contain Turkish characters. Each line of these files contains one output record of an old DOS program, with every value enclosed in double quotes ("") and separated by commas. What he wanted was to convert these files to a Microsoft Office format, as he works in Windows. So I wrote a shell script that parses these files and converts them to LaTeX files, adding the required titles and entries (for example, the first field of each line is the name field, let's say "John Smith"; it is translated to: \textbf{\textit{name: }} John Smith\\). Then, using latex2rtf, I could convert it to an RTF file.
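A minimal sketch of the per-line conversion described above, assuming each record is a single line of "..."-quoted values separated by "," with no embedded quotes or commas inside a value. The function name line_to_latex and the labels "address" and "phone" are hypothetical; only the "name" label comes from the post.

```shell
#!/bin/sh
# Sketch: convert one record of the DOS program's output, e.g.
#   "John Smith","Ankara","555 1234"
# into one LaTeX line per field.
# Only the "name" label comes from the post; "address" and "phone"
# are hypothetical placeholders for the real field titles.

line_to_latex() {
    # Windows files usually end lines with CR LF, so drop the CR first,
    # then strip the outer double quotes and split on the "," separator.
    printf '%s\n' "$1" | tr -d '\r' | sed -e 's/^"//' -e 's/"$//' |
    awk -F '","' '
        BEGIN { split("name,address,phone", label, ",") }
        {
            # print (not printf) so backslashes are emitted exactly once
            for (i = 1; i <= NF; i++)
                print "\\textbf{\\textit{" label[i] ": }} " $i "\\\\"
        }'
}

# Usage: convert every line of one input file.
# while IFS= read -r line; do line_to_latex "$line"; done < input.txt > output.tex
```

This sketch does not escape LaTeX special characters such as &, %, # or _; values containing them would need an extra sed step before being written out.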
I opened them with a text editor and saw that the Turkish characters were unreadable. I replaced these characters (Ä -> Ç, for example) and saved the file in UTF-8 format with Text Editor. Then I ran:
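Instead of replacing characters by hand for about 1000 files, the conversion to UTF-8 could be done in one pass with iconv. This is a sketch under an assumption: the post does not name the source encoding, so Windows-1254 (Turkish) is used as a default, and CP857 (the DOS Turkish code page) is the obvious alternative to try. `iconv -l` lists the encoding names your iconv accepts. The function name to_utf8 and the SRC_ENC variable are hypothetical.

```shell
#!/bin/sh
# Sketch: convert the files to UTF-8 with iconv instead of replacing
# characters by hand in an editor.
# ASSUMPTION: the source encoding is Windows-1254 (Turkish). Files written
# by a DOS program may instead be CP857; override with SRC_ENC=CP857.
SRC_ENC=${SRC_ENC:-CP1254}

to_utf8() {
    # $1 = input file, $2 = output file
    iconv -f "$SRC_ENC" -t UTF-8 "$1" > "$2"
}

# Usage (hypothetical directory layout):
# mkdir -p utf8
# for f in *.txt; do to_utf8 "$f" "utf8/$f"; done
```

A quick way to pick the right encoding is to convert one sample file with each candidate and check which one turns the Turkish letters (Ş, Ğ, İ, ı) into the expected characters.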
cat -ev file > newfile
to be able to distinguish the end of each line. I use the extra characters that cat adds at the end of each line to let the script know that all the data has been taken from a line. Then I ran the script with newfile as the input, but I saw that the output LaTeX files do not have the correct characters.
I tried again just now and saw that the -ev option to cat causes this error. If I use plain cat, I can see the Turkish characters, but if I use cat -ev I can't. I think I have to find another way instead of cat -ev.
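The -v part of cat -ev is the likely culprit: it prints every byte above 127 in "M-" notation, so the two UTF-8 bytes of Ç (0xC3 0x87) come out as M-CM-^G, while -e only appends the $ marker. Below is a sketch of two alternatives, assuming the script reads its input line by line: add a literal $ with sed, which leaves all other bytes untouched, or drop the marker and let a read loop deliver one complete line at a time. The names add_marker and process_file, and the LINE: handler, are hypothetical placeholders.

```shell
#!/bin/sh
# Demonstration: cat -v turns the UTF-8 bytes of "Ç" into M-CM-^G.
#   printf '\303\207\n' | cat -v

# Option 1: keep an end-of-line marker, but add it with sed,
# which does not rewrite non-ASCII bytes.
add_marker() {
    sed 's/$/$/' "$1"
}

# Option 2: no marker at all; read already returns one full line.
process_file() {
    # IFS= keeps leading/trailing spaces, -r keeps backslashes intact.
    # The extra test handles a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf 'LINE: %s\n' "$line"   # hypothetical per-line handler
    done < "$1"
}
```

Option 2 is usually simpler, because the loop body always sees exactly one record and the script never has to search for a marker.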
Thank you very much for your help....
haris
Hi hsaybasili,
I have one more thought that is a reach. The HFS+ filesystem stores file names in a "decomposed" format, where an accented character consists of the base character followed by a "combining" accent character. The combining accent character is above ASCII and is encoded in UTF-8, so technically the whole thing is "legal UTF-8", just unusual. Maybe your file was created with a text editor that uses this "decomposition", which UNIX text editors don't handle.

Come to think of it, was this file produced from the output of the "ls" command? That output is also always "decomposed", which would explain why UNIX editors might not deal with it as expected.
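One way to test this theory is to look at the raw bytes. The sketch below shows the two UTF-8 forms of "Ç": precomposed U+00C7 is the two bytes c3 87, while decomposed "C" plus U+0327 (combining cedilla) is 43 cc a7. The helper name hex_bytes is hypothetical; running `od -An -tx1` on the actual file shows which form it contains.

```shell
#!/bin/sh
# Show the bytes of a precomposed and a decomposed "Ç".
# Precomposed: U+00C7                          -> c3 87
# Decomposed:  "C" + U+0327 combining cedilla  -> 43 cc a7
hex_bytes() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

precomposed=$(printf '\303\207')
decomposed=$(printf 'C\314\247')
hex_bytes "$precomposed"; echo
hex_bytes "$decomposed"; echo
```

If the file turns out to be decomposed, macOS's iconv is commonly used with `-f UTF-8-MAC -t UTF-8` to recompose it; that encoding name is Apple-specific and may not exist in other iconv implementations.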
--
Gary
~~~~
It is so soon that I am done for,
I wonder what I was begun for.
-- Epitaph, Cheltenham Churchyard