French accents

I have been talked into putting together the local village magazine. I am English, but this is France, so the magazine is in French. I'm having a lot of problems with the accents and special characters, which come across as strange symbols. I don't understand why, if these are plain text files and I use the same font as the original writer, the accents don't show up correctly. It must be something to do with ASCII characters. Does anyone have any experience with this?
cheers

2x2.66 dual Intel Xeon Mac Pro, Mac OS X (10.5.5), RME hardware

Posted on Nov 3, 2008 10:59 AM

23 replies

Nov 3, 2008 11:59 AM in response to oddshoes

I do not understand why if they are plain text files and then I use the same font as the original writer, the accents don't show up correctly?


The font doesn't matter; it's the encoding. Perhaps people are giving you text files produced in Latin-1, while Pages expects UTF-8 Unicode. You may need to open the text files (for example with TextEdit) as Latin-1 and then copy/paste into Pages.
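To see why the font makes no difference, here is a small Python sketch (Python is just a convenient illustration, not a tool anyone in this thread used). It assumes a contributor's file was saved in Latin-1 and reads the very same bytes as MacRoman, as UTF-8, and as Latin-1; the sample text is made up.

```python
# Hypothetical village-magazine text, assumed to be saved by the contributor in Latin-1.
original = "Fête du village : café, crème brûlée"
latin1_bytes = original.encode("latin-1")

# Same bytes, wrong encoding: the accents turn into "strange symbols".
as_mac_roman = latin1_bytes.decode("mac_roman")
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")  # invalid sequences become U+FFFD

# Same bytes, right encoding: the accents come back.
as_latin1 = latin1_bytes.decode("latin-1")

for label, text in [("MacRoman", as_mac_roman), ("UTF-8", as_utf8), ("Latin-1", as_latin1)]:
    print(f"{label:8} -> {text}")
```

No font is involved at any point: the bytes are identical in all three cases, and only the encoding used to interpret them changes.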

Nov 3, 2008 12:12 PM in response to Tom Gewecke

while Pages expects UTF-8 Unicode.


Are you sure of that, Tom?

When I work under 10.4.11 and copy text from AppleWorks, it's not UTF-8 but Mac OS Roman,
and I can paste it into Pages without any problem.

In its documents, Pages doesn't use UTF-8 encoding.
It uses plain ASCII plus character entities.

Yvan KOENIG (from FRANCE, Monday 3 November 2008 21:12:55)

Nov 3, 2008 12:30 PM in response to KOENIG Yvan

while Pages expects UTF-8 Unicode.


Are you sure of that Tom ?


You are right: when you try to open a text file via File > Open in Pages, it expects MacRoman, not UTF-8. Still, if you try to open a French document produced on Windows, for example in Latin-1, it will not display correctly.
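A quick way to see why a Latin-1 file opened as MacRoman goes wrong is to compare the bytes each encoding uses for the same accented letters. The Python sketch below uses the standard codec names `latin-1`, `mac_roman` and `utf-8`; the choice of letters is only illustrative.

```python
# French letters that commonly go wrong, and their bytes in each encoding.
LETTERS = "éèêàçôù"
ENCODINGS = ("latin-1", "mac_roman", "utf-8")

table = {ch: tuple(ch.encode(enc) for enc in ENCODINGS) for ch in LETTERS}

# Print one row per letter with the hex bytes for each encoding.
print("char  " + "  ".join(f"{enc:>10}" for enc in ENCODINGS))
for ch, encoded in table.items():
    print(f"{ch:4}  " + "  ".join(f"{b.hex(' '):>10}" for b in encoded))
```

For example, é is byte E9 in Latin-1 but 8E in MacRoman, so a Latin-1 é read as MacRoman comes out as a different character.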

I may paste it in Pages without any kind of problem.


When text is displayed correctly in an app, you can copy/paste it into Pages without problems, regardless of the encoding of the original. It's therefore better to copy/paste than to try to open a text file directly with Pages (which may be what the OP was doing).

In its documents, Pages doesn't use the Utf 8 encoding.


Right, the example you gave uses HTML-style ASCII escape codes.
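Escapes of this kind let a file stay pure ASCII without losing any characters, because each non-ASCII character is written as its Unicode code point. The Python sketch below illustrates the idea with decimal numeric character references; it is an assumption that this matches the exact form Pages wrote internally.

```python
import html

# Hypothetical sample sentence.
text = "Clémentine achète des œufs."

# Keep ASCII as-is and turn every other character into a numeric entity.
escaped = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(escaped)  # Cl&#233;mentine ach&#232;te des &#339;ufs.

# Going back: each entity carries the code point, so nothing is lost.
restored = html.unescape(escaped)
print(restored)
```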

Nov 3, 2008 1:15 PM in response to Tom Gewecke

Ironically, Pages will open a UTF-8 text file correctly if the file begins with a BOM (byte order mark). But normally only Windows apps create UTF-8 files with BOMs, and it is not supposed to be necessary to have one (TextEdit does not offer the option).


I knew that, but as I wrote, Pages is able to work with non-UTF-8 data such as the 'styled text' encoded as Mac OS Roman that is pasted from AppleWorks or other old beasts.

Yvan KOENIG (from FRANCE, Monday 3 November 2008 22:15:40)
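If you need to produce a UTF-8 file with a BOM yourself, Python's standard `utf-8-sig` codec is one way to add it. A minimal sketch (file handling left out; `has_utf8_bom` is a hypothetical helper name):

```python
import codecs

text = "Fête du village"

# "utf-8-sig" writes the 3-byte BOM (EF BB BF) before the UTF-8 data;
# plain "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))  # True False
```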

Nov 3, 2008 2:00 PM in response to KOENIG Yvan

Pages is able to work with non-UTF-8 data such as the 'styled text' encoded as Mac OS Roman that is pasted from AppleWorks or other old beasts.


When you are copy/pasting correctly displayed text, the encoding of the original document, whether MacRoman, UTF-8 or Chinese GB2312, doesn't matter. As I understand it, everything becomes UTF-8 when put on the clipboard (if not earlier), and that is what gets transferred to Pages, even though Pages later stores its saved data as ASCII escapes.
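The idea that the source encoding stops mattering once text is correctly decoded can be illustrated in Python, where a decoded `str` is a sequence of Unicode code points with no encoding attached. This is only a model of the idea; it does not reproduce how the Mac OS X clipboard actually converts data, and the encodings chosen are illustrative.

```python
# Hypothetical: the same French sentence as it might arrive in different files.
sentence = "Clémentine achète des œufs."
sources = {enc: sentence.encode(enc) for enc in ("mac_roman", "cp1252", "utf-8", "utf-16")}

# Decode each file with its own (correct) encoding ...
decoded = {enc: data.decode(enc) for enc, data in sources.items()}

# ... and every result is the same sequence of characters, whatever the source bytes were.
for enc, data in sources.items():
    print(f"{enc:9} {len(data):3} bytes -> {decoded[enc]}")
```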

When you do File > Open in Pages, the encoding of the document does matter; I think only MacRoman and UTF-8 with a BOM will be displayed correctly.
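The File > Open behaviour described above can be written down as a tiny decision rule. The Python function below is a hypothetical model of that description (the name `open_like_pages` is invented for illustration); it is not how Pages is actually implemented.

```python
import codecs

def open_like_pages(data: bytes) -> str:
    """Hypothetical model, not Apple's code: UTF-8 is recognised only when
    a BOM is present; everything else is read as MacRoman."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("mac_roman")

word = "café"
print(open_like_pages(word.encode("mac_roman")))  # café
print(open_like_pages(word.encode("utf-8-sig")))  # café
print(open_like_pages(word.encode("utf-8")))      # garbled: no BOM, so read as MacRoman
print(open_like_pages(word.encode("latin-1")))    # garbled: Latin-1 read as MacRoman
```

Under this model, a Latin-1 file would open correctly after being re-saved as UTF-8 with a BOM, e.g. `data.decode("latin-1").encode("utf-8-sig")`, which mirrors the TextEdit route suggested earlier.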

Nov 4, 2008 1:29 AM in response to Tom Gewecke

When you are copy/pasting a (correctly) displayed text, the encoding of the original document, whether MacRoman or UTF-8 or Chinese GB-2312, doesn't matter. As I understand it, everything becomes UTF-8 when put on the clipboard


I repeat: are you sure of that?

AppleWorks is unable to use Unicode encoding, yet it is able to receive text from Pages.

In fact, what is copied from AppleWorks remains MacRoman on the clipboard.
Here is an example:
tell current application
the clipboard as record
{styled Clipboard text:«data styl000100000000000F000A00150000000C000000000000»,
string:"Clémentine achète des œufs."}
end tell

What is copied from Pages is stored on the clipboard like this.

Running:
set zz to the clipboard as record

I get:

tell current application
the clipboard as record
{Unicode text:"Clémentine achète des œufs.",
styled Clipboard text:«data styl000100000000000C00090015001B000C000000000000»,
string:"Clémentine achète des œufs.",
uniform styles:«data ustl0000000200000090000000000000001400000020000000010000001B00000000000000010000006C000000040000000000000000000001020000000100000000000001050000002C6E616D640000002400000001000000040000000100000000000000000000000948656C7665746963610000000000010600000004000C000000000107000000060000000000000000»,
«class ut16»:"Clémentine achète des œufs.",
«class utf8»:"Clémentine achète des œufs.",
«class RTF »:«data RTF 7B5C727466315C6D61635C616E736963706731303030305C636F636F617274663832345C636F636F617375627274663438300A7B5C666F6E7474626C5C66305C6673776973735C666368617273657437372048656C7665746963613B7D0A7B5C636F6C6F7274626C3B5C7265643235355C677265656E3235355C626C75653235353B7D0A5C6465667461623730380A5C706172645C74783536305C7478313132305C7478313638305C7478323234305C7478323830305C7478333336305C7478333932305C7478343438305C7478353034305C7478353630305C7478363136305C7478363732305C7061726465667461623730385C7061726469726E61747572616C0A0A5C66305C66733234205C636630205C6578706E64305C6578706E647477305C6B65726E696E67300A5C757030205C6E6F7375706572737562205C756C6E6F6E65205C6F75746C305C7374726F6B65776964746830205C7374726F6B65633020436C5C2738656D656E74696E65206163685C273866746520646573205C2763667566732E7D»,
«class rtfd»:«data rtfd72746664000000000300000002000000070000005458542E727466010000002E900100002B00000001000000880100007B5C727466315C6D61635C616E736963706731303030305C636F636F617274663832345C636F636F617375627274663438300A7B5C666F6E7474626C5C66305C6673776973735C666368617273657437372048656C7665746963613B7D0A7B5C636F6C6F7274626C3B5C7265643235355C677265656E3235355C626C75653235353B7D0A5C6465667461623730380A5C706172645C74783536305C7478313132305C7478313638305C7478323234305C7478323830305C7478333336305C7478333932305C7478343438305C7478353034305C7478353630305C7478363136305C7478363732305C7061726465667461623730385C7061726469726E61747572616C0A0A5C66305C66733234205C636630205C6578706E64305C6578706E647477305C6B65726E696E67300A5C757030205C6E6F7375706572737562205C756C6E6F6E65205C6F75746C305C7374726F6B65776964746830205C7374726F6B65633020436C5C2738656D656E74696E65206163685C273866746520646573205C2763667566732E7D010000002300000001000000070000005458542E7274661000000000000000B60100000000000000000000»}
end tell

So the program we paste into grabs whichever format it understands.
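The RTF flavour in the AppleWorks-to-Pages dump tells the same story: decoded from hex, it begins `{\rtf1\mac\ansicpg10000` (code page 10000 is Mac Roman) and ends with `Cl\'8ementine ach\'8fte des \'cfufs.`, i.e. each accented letter is stored as a `\'hh` escape holding its MacRoman byte. The Python sketch below expands just those escapes, plus RTF `\uN` Unicode escapes; the helper name is made up and it is deliberately not a full RTF parser.

```python
import re

# \'hh (one byte in the document's code page), \uN (a Unicode code point with
# one optional delimiting space), or \ucN (fallback count, assumed to be 0 here).
_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})|\\u(-?\d+) ?|\\uc\d+ ?")

def decode_rtf_escapes(fragment: str, codepage: str = "mac_roman") -> str:
    """Hypothetical helper: expand RTF \\'hh and \\uN escapes in a plain text run.
    Groups, fonts and every other control word are ignored (assumption)."""
    parts = []
    pos = 0
    for m in _ESCAPE.finditer(fragment):
        parts.append(fragment[pos:m.start()])
        if m.group(1) is not None:
            # One byte, interpreted in the declared code page.
            parts.append(bytes([int(m.group(1), 16)]).decode(codepage))
        elif m.group(2) is not None:
            n = int(m.group(2))
            parts.append(chr(n + 65536 if n < 0 else n))  # RTF writes large values as negative
        pos = m.end()
    parts.append(fragment[pos:])
    return "".join(parts)

print(decode_rtf_escapes(r"Cl\'8ementine ach\'8fte des \'cfufs."))
# Clémentine achète des œufs.
```

The `\uN` branch matters for characters outside MacRoman, which RTF stores as decimal code points rather than code-page bytes.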

Yvan KOENIG (from FRANCE, Tuesday 4 November 2008 10:29:52)

Nov 4, 2008 6:36 AM in response to Tom Gewecke

Tom Gewecke wrote:
If you run the system in Chinese, Japanese or Russian, I assume it reads local encodings instead of MacRoman.


I don't know, that would be interesting to test, especially since there are various alternative local encodings.


I know. That's the reason I did not test it straight away. Come to think of it, there must be a "default local" encoding for each language that is used when TextEdit saves as text, for example. That may be the one.

I'll check next time I'm using a mac. This is typed using... ehm... another operating system.

Nov 4, 2008 7:11 AM in response to KOENIG Yvan

As I understand it, everything becomes UTF-8 when put on the clipboard


I repeat: are you sure of that ?


No, I'm not completely sure how the clipboard works in conjunction with the Text Encoding Converter. One reference I found is:

http://hsivonen.iki.fi/kesakoodi/clipboard/

Whatever the mechanism, it seems clear that there is no problem copy/pasting correctly displayed text into Pages, no matter what the encoding of the original document (so this is what the OP should do to avoid the problem he described).

As for going the other direction, copy/pasting from Pages into Appleworks, it seems that some kinds of text will work but others will not.

Nov 4, 2008 10:08 AM in response to SermoDaturCunctis

Come to think about it, there must be a "default local" encoding for each language, that is used when TextEdit saves as text, for example. That may be the one.


I tried setting my OS to Russian. In that case, the encoding Pages uses when it exports Cyrillic as text is Cyrillic (MacOS), and a document in that encoding can be opened via File > Open in Pages without problem. So perhaps the default is always the "MacOS" version of the appropriate script.
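If the default really follows the system language, the French problem has a Russian twin: the same word has different bytes in Apple's legacy Cyrillic encoding and in the Windows one. A Python sketch, assuming a Windows-1251 file from a PC (an illustrative assumption, not something tested in this thread; `mac_cyrillic` and `cp1251` are standard Python codec names):

```python
# Hypothetical Russian snippet.
word = "Привет"

mac = word.encode("mac_cyrillic")  # Cyrillic (MacOS)
win = word.encode("cp1251")        # Windows Cyrillic, common on PCs

print(mac.hex(" "))
print(win.hex(" "))

# A Windows-1251 file read as Mac Cyrillic does not give the original text back.
misread = win.decode("mac_cyrillic")
print(misread)
```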

Nov 4, 2008 11:39 AM in response to Tom Gewecke

Whatever the mechanism, it seems clear that there is no problem copy/pasting correctly displayed text into Pages no matter what the encoding of the origin document


Interesting, including the bit on the byte order mark.

One would have thought that a byte order mark was obligatory, since code units whose byte order is not identified could be considered meaningless.

Management of character information and colour information (should) begin by explicitly identifying the source structure within which the information is supposed to be represented.

/hh

Nov 4, 2008 11:51 AM in response to Tom Gewecke

As for going the other direction, copy/pasting from Pages into Appleworks, it seems that some kinds of text will work but others will not.



Sure, because AppleWorks is unable to work with Unicode encoding.

If I type
Éléphant ⌘ ǞẴǢCÇ
in Pages and then copy it,
the clipboard will contain:

tell current application
the clipboard as record
{Unicode text:"Éléphant ???CÇ",
styled Clipboard text:«data styl000300000000000C000900150047000C00000000000000000009000F000C04000047000C0000000000000000000A000C000900150047000C000000000000»,
string:"Éléphant ? ???CÇ",
uniform styles:«data ustl0000000200000090000000000000001400000020000000010000001000000000000000010000006C000000040000000000000000000001020000000100000000000001050000002C6E616D640000002400000001000000040000000100000000000000000000000948656C7665746963610000000000010600000004000C000000000107000000060000000000000000»,
«class ut16»:"Éléphant ???CÇ",
«class utf8»:"Éléphant ???CÇ",
«class RTF »:«data RTF 7B5C727466315C6D61635C616E736963706731303030305C636F636F617274663832345C636F636F617375627274663438300A7B5C666F6E7474626C5C66305C6673776973735C666368617273657437372048656C7665746963613B7D0A7B5C636F6C6F7274626C3B5C7265643235355C677265656E3235355C626C75653235353B7D0A5C6465667461623730380A5C706172645C74783536305C7478313132305C7478313638305C7478323234305C7478323830305C7478333336305C7478333932305C7478343438305C7478353034305C7478353630305C7478363136305C7478363732305C7061726465667461623730385C7061726469726E61747572616C0A0A5C66305C66733234205C636630205C6578706E64305C6578706E647477305C6B65726E696E67300A5C757030205C6E6F7375706572737562205C756C6E6F6E65205C6F75746C305C7374726F6B65776964746830205C7374726F6B656330205C2738336C5C2738657068616E74205C7563305C753839383420205C75343738205C7537383630205C7534383220435C2738327D»,
«class rtfd»:«data rtfd72746664000000000300000002000000070000005458542E727466010000002E9F0100002B00000001000000970100007B5C727466315C6D61635C616E736963706731303030305C636F636F617274663832345C636F636F617375627274663438300A7B5C666F6E7474626C5C66305C6673776973735C666368617273657437372048656C7665746963613B7D0A7B5C636F6C6F7274626C3B5C7265643235355C677265656E3235355C626C75653235353B7D0A5C6465667461623730380A5C706172645C74783536305C7478313132305C7478313638305C7478323234305C7478323830305C7478333336305C7478333932305C7478343438305C7478353034305C7478353630305C7478363136305C7478363732305C7061726465667461623730385C7061726469726E61747572616C0A0A5C66305C66733234205C636630205C6578706E64305C6578706E647477305C6B65726E696E67300A5C757030205C6E6F7375706572737562205C756C6E6F6E65205C6F75746C305C7374726F6B65776964746830205C7374726F6B656330205C2738336C5C2738657068616E74205C7563305C753839383420205C75343738205C7537383630205C7534383220435C2738327D010000002300000001000000070000005458542E72746661000000000000000B60100000000000000000000»}
end tell
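The `string` entry in that dump (`Éléphant ? ???CÇ`) is what you would expect if the text were forced into MacRoman with every unencodable character replaced by a question mark. A Python sketch reproducing that effect; the replacement policy is an assumption about what the system does, which happens to match the dump:

```python
text = "\u00c9l\u00e9phant \u2318 \u01de\u1eb4\u01e2C\u00c7"  # Éléphant ⌘ ǞẴǢCÇ

# MacRoman has no ⌘, Ǟ, Ẵ or Ǣ; errors="replace" substitutes "?" for each one.
mac_bytes = text.encode("mac_roman", errors="replace")
print(mac_bytes.decode("mac_roman"))  # Éléphant ? ???CÇ
```

É, é and Ç survive because MacRoman has them; the other four characters are lost in this flavour, while the Unicode-capable flavours keep them.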

During this test, I discovered a funny thing: AppleWorks is in fact able to handle some East Asian characters, even with no special extension.

If I copy the character CJK UNIFIED IDEOGRAPH-56A1 (U+56A1) and then paste it into AppleWorks, it is displayed correctly.

It is taken from a font whose name ends with "Pro W3".

Yvan KOENIG (from FRANCE, Tuesday 4 November 2008 20:51:35)

Nov 4, 2008 12:15 PM in response to Henrik Holmegaard

One would have thought that a byte order mark was obligatory since code elements without identification of which way is the important one could be considered meaningless.


UTF-8 has only one possible byte order, so a BOM is optional, and it can cause problems in some circumstances. In some forms of UTF-16 (the explicitly labelled UTF-16BE and UTF-16LE schemes) a BOM is not permitted. Some more detail is here:

http://unicode.org/faq/utf_bom.html#BOM
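These points are easy to check with Python's standard codecs: UTF-8 bytes do not depend on byte order, the generic `utf-16` codec prepends a BOM, and the explicitly little-endian `utf-16-le` does not. A minimal sketch:

```python
import codecs

text = "é"

# UTF-8 has a single byte order: the bytes are always C3 A9.
print(text.encode("utf-8").hex(" "))  # c3 a9

# Generic "utf-16" writes a BOM so the reader knows the byte order ...
generic = text.encode("utf-16")
print(generic[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# ... while the explicitly labelled UTF-16LE scheme writes none.
labelled = text.encode("utf-16-le")
print(labelled.hex(" "))  # e9 00

# If a BOM does appear in UTF-16LE data, it is read as an ordinary U+FEFF character.
print(repr((codecs.BOM_UTF16_LE + labelled).decode("utf-16-le")))  # '\ufeffé'
```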
