French accents

I have been talked into putting together the local village magazine. I am English but this is France, so the mag is in French. I'm having a lot of problems with the accents and special characters, which come across as strange symbols. I do not understand why if they are plain text files and then I use the same font as the original writer, the accents don't show up correctly? Must be something to do with ASCII characters. Does anyone have any experience of this subject?
cheers

2x2.66 dual Intel Xeon Mac Pro, Mac OS X (10.5.5), RME hardware

Posted on Nov 3, 2008 10:59 AM

Nov 3, 2008 11:59 AM in response to oddshoes

I do not understand why if they are plain text files and then I use the same font as the original writer, the accents don't show up correctly?


The fonts don't matter, it's the encoding. Perhaps people are giving you text files produced in Latin-1, while Pages expects UTF-8 Unicode. You may need to open the text files (for example with TextEdit) as Latin-1 and then copy/paste into Pages.
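
To illustrate why the encoding matters and the font does not, here is a minimal Python sketch. Python is only my choice for illustration, and the sample sentence and variable names are assumptions, not anything from the OP's files. It shows the same bytes turning into different characters depending on which encoding is used to read them.

```python
# Same text, two encodings: the bytes differ, so reading them with the
# wrong encoding produces wrong characters regardless of the font.
original = "Les élèves fêtent Noël à côté"

latin1_bytes = original.encode("latin-1")  # one byte per accented letter
utf8_bytes = original.encode("utf-8")      # two bytes per accented letter

# Reading Latin-1 bytes as UTF-8: the invalid sequences are replaced
# by U+FFFD, the replacement character.
latin1_read_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Reading UTF-8 bytes as Latin-1: each accented letter turns into two symbols.
utf8_read_as_latin1 = utf8_bytes.decode("latin-1")

# Reading with the matching encoding restores the text.
correct = latin1_bytes.decode("latin-1")

print(latin1_read_as_utf8)
print(utf8_read_as_latin1)
print(correct)
```

The "strange symbols" the OP describes are typical of the second and third cases: nothing is wrong with the file, only with the guess about how it was encoded.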

Nov 3, 2008 12:12 PM in response to Tom Gewecke

while Pages expects UTF-8 Unicode.


Are you sure of that Tom ?

When I work under 10.4.11 and copy text from AppleWorks, it is not UTF-8 but 'Mac OS Roman'.
I can paste it into Pages without any kind of problem.

In its documents, Pages doesn't use the UTF-8 encoding.
It uses plain ASCII and entities of this kind:
[screenshot not preserved]

Yvan KOENIG (from FRANCE Monday 3 November 2008 21:12:55)

Nov 3, 2008 12:30 PM in response to KOENIG Yvan

while Pages expects UTF-8 Unicode.


Are you sure of that Tom ?


You are right: when you try to open a text file via File > Open in Pages, it expects MacRoman, not UTF-8. Still, if you try to open a French document produced on Windows in Latin-1, for example, it will not display correctly.
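
As a rough sketch of that mismatch, the Python below re-encodes Latin-1 bytes as MacRoman so that a MacRoman reader sees the right characters. The function name `convert_encoding` and the sample text are hypothetical; Python's `latin-1` and `mac_roman` codecs are standard.

```python
def convert_encoding(data: bytes, source: str = "latin-1",
                     target: str = "mac_roman") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    text = data.decode(source)
    return text.encode(target)


# Bytes as a Windows app might save them in Latin-1.
windows_bytes = "café à côté".encode("latin-1")

# What a MacRoman reader shows if it is given the Latin-1 bytes unchanged.
misread = windows_bytes.decode("mac_roman")

# After conversion the same reader gets the intended text.
mac_bytes = convert_encoding(windows_bytes)
reread = mac_bytes.decode("mac_roman")

print(misread)
print(reread)
```

A few Latin-1 characters (some fractions and superscripts, for instance) have no MacRoman equivalent, so the conversion raises `UnicodeEncodeError` for them rather than silently losing data.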

I may paste it in Pages without any kind of problem.


When you have a text displayed in an app, you can copy/paste into Pages without problem regardless of the encoding of the original. It's therefore better to copy/paste than to try to open some text file directly with Pages (which may be what the OP was doing).

In its documents, Pages doesn't use the Utf 8 encoding.


Right, the example you gave is using HTML ASCII escape codes.
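
The screenshot is not preserved here, so the following is only a sketch that assumes the escapes are numeric character references of the `&#233;` kind. It shows how accented text can be stored as pure ASCII and recovered afterwards; the variable names are mine.

```python
import html

text = "Clémentine achète des œufs."

# Replace every non-ASCII character with a numeric character reference,
# leaving a string that contains only ASCII.
escaped = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# Turn the references back into the original characters.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

Stored this way, the file has no encoding ambiguity at all, since every byte is plain ASCII.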

Nov 3, 2008 1:15 PM in response to Tom Gewecke

Ironically, Pages will open a UTF-8 text file correctly if the file begins with a BOM (byte order mark). But normally only Windows apps create UTF-8 files with BOMs, and a BOM is not supposed to be necessary (TextEdit does not offer the option).


I knew that, but as I wrote, Pages is able to work with non-UTF-8 data like the 'styled text' encoded as Mac OS Roman which is pasted from AppleWorks or other old beasts.

Yvan KOENIG (from FRANCE Monday 3 November 2008 22:15:40)

Nov 3, 2008 2:00 PM in response to KOENIG Yvan

Pages is able to work with non-UTF-8 data like the 'styled text' encoded as Mac OS Roman which is pasted from AppleWorks or other old beasts.


When you are copy/pasting a (correctly) displayed text, the encoding of the original document, whether MacRoman or UTF-8 or Chinese GB-2312, doesn't matter. As I understand it, everything becomes UTF-8 when put on the clipboard (if not earlier) and that is what gets transferred to Pages, even though Pages later stores its saved data as ASCII escapes.

When you do File/Open in Pages, the encoding of the document does matter, as I think only MacRoman and UTF-8 with BOM will be correctly displayed.
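
The behaviour described for File > Open can be sketched in Python as follows. This is only a model of what was observed in this thread, not Apple's actual code, and the function name `open_like_described` is hypothetical.

```python
import codecs


def open_like_described(data: bytes) -> str:
    """Decode as UTF-8 only when a UTF-8 BOM is present, else as MacRoman."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("mac_roman")


sample = "Clémentine achète des œufs."
with_bom = codecs.BOM_UTF8 + sample.encode("utf-8")
without_bom = sample.encode("utf-8")
mac_roman = sample.encode("mac_roman")

print(open_like_described(with_bom))     # intended text
print(open_like_described(mac_roman))    # intended text
print(open_like_described(without_bom))  # garbled accents
```

Under that model, a UTF-8 file without a BOM is the case that goes wrong, which matches what the OP would see with files saved by most Mac apps.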

Nov 4, 2008 1:29 AM in response to Tom Gewecke

When you are copy/pasting a (correctly) displayed text, the encoding of the original document, whether MacRoman or UTF-8 or Chinese GB-2312, doesn't matter. As I understand it, everything becomes UTF-8 when put on the clipboard


I repeat: are you sure of that ?

AppleWorks is unable to use Unicode encoding and it is able to receive text from Pages.

In fact what is copied from AppleWorks remains MacRoman in the clipboard.
Here is an example.
tell current application
the clipboard as record
{styled Clipboard text:«data styl000100000000000F000A00150000000C000000000000»,
string:"Clémentine achète des œufs."}
end tell

What is copied from Pages is stored this way in the clipboard:

running:
set zz to the clipboard as record

I get:

tell current application
the clipboard as record
{Unicode text:"Clémentine achète des œufs.",
styled Clipboard text:«data styl000100000000000C00090015001B000C000000000000»,
string:"Clémentine achète des œufs.",
uniform styles:«data ustl0000000200000090000000000000001400000020000000010000001B00000000000000010000006C000000040000000000000000000001020000000100000000000001050000002C6E616D640000002400000001000000040000000100000000000000000000000948656C7665746963610000000000010600000004000C000000000107000000060000000000000000»,
«class ut16»:"Clémentine achète des œufs.",
«class utf8»:"Clémentine achète des œufs.",
«class RTF »:«data»,
«class rtfd»:«data rtfd»}
end tell

So the program into which we paste picks the format that it understands.
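
A toy Python model of that selection, under my own assumptions: the clipboard is represented as a dictionary of flavors, and each app has an ordered list of the flavors it can read. The names `pick_flavor`, `unicode_app` and `legacy_app` are hypothetical, and the real pasteboard API is not shown.

```python
from typing import Optional, Tuple


def pick_flavor(clipboard: dict, understood: list) -> Optional[Tuple[str, bytes]]:
    """Return the first flavor the receiving app understands, if any."""
    for flavor in understood:
        if flavor in clipboard:
            return flavor, clipboard[flavor]
    return None


sentence = "Clémentine achète des œufs."

# The same sentence offered in several flavors, as in the record above.
clipboard = {
    "utf8": sentence.encode("utf-8"),
    "ut16": sentence.encode("utf-16"),
    "string": sentence.encode("mac_roman"),
}

unicode_app = ["ut16", "utf8", "string"]  # prefers Unicode flavors
legacy_app = ["string"]                   # reads only the MacRoman flavor

print(pick_flavor(clipboard, unicode_app)[0])
print(pick_flavor(clipboard, legacy_app)[0])
```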

Yvan KOENIG (from FRANCE Tuesday 4 November 2008 10:29:52)

Nov 4, 2008 6:36 AM in response to Tom Gewecke

Tom Gewecke wrote:
If you run the system in Chinese, Japanese or Russian, I assume it reads local encodings instead of MacRoman.


I don't know, that would be interesting to test, especially since there are various alternative local encodings.


I know. That's the reason I did not test it straight away. Come to think of it, there must be a "default local" encoding for each language, which is used when TextEdit saves as text, for example. That may be the one.

I'll check next time I'm using a Mac. This is typed using... ehm... another operating system.

Nov 4, 2008 7:11 AM in response to KOENIG Yvan

As I understand it, everything becomes UTF-8 when put on the clipboard


I repeat: are you sure of that ?


No, I'm not completely sure how the clipboard works in conjunction with the Text Encoding converter. One reference I found is:

http://hsivonen.iki.fi/kesakoodi/clipboard/

Whatever the mechanism, it seems clear that there is no problem copy/pasting correctly displayed text into Pages no matter what the encoding of the origin document (and so this is what the OP should do to avoid the problem he described).

As for going the other direction, copy/pasting from Pages into AppleWorks, it seems that some kinds of text will work but others will not.
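
One way to predict which text will survive, assuming the limit is simply what MacRoman can represent, is to test each character against Python's `mac_roman` codec. The helper name `unencodable` and the sample strings are my own.

```python
def unencodable(text: str, encoding: str = "mac_roman") -> list:
    """List the characters of text that the given encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# French text fits entirely in MacRoman.
print(unencodable("Clémentine achète des œufs."))

# Symbols outside MacRoman are reported.
print(unencodable("Éléphant ⌘ Ǟ"))
```

An empty list means the text should paste into a MacRoman-only app intact; anything listed is a candidate for being replaced by a question mark.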

Nov 4, 2008 10:08 AM in response to SermoDaturCunctis

Come to think about it, there must be a "default local" encoding for each language, that is used when TextEdit saves as text, for example. That may be the one.


I tried setting my OS to Russian. The encoding used when Pages exports Cyrillic as text in that case is Cyrillic (MacOS), and a document in that encoding can be opened via Pages > File > Open without problem. So perhaps the default is always the "MacOS" version of the appropriate script.

Nov 4, 2008 11:39 AM in response to Tom Gewecke

Whatever the mechanism, it seems clear that there is no problem copy/pasting correctly displayed text into Pages no matter what the encoding of the origin document


Interesting, including the bit on the byte order mark.

One would have thought that a byte order mark was obligatory since code elements without identification of which way is the important one could be considered meaningless.

Management of character information and colour information (should) begin by explicitly identifying the source structure within which the information is supposed to be represented.

/hh

Nov 4, 2008 11:51 AM in response to Tom Gewecke

As for going the other direction, copy/pasting from Pages into AppleWorks, it seems that some kinds of text will work but others will not.



Sure, because AppleWorks is unable to work with Unicode encoding.

If I type:
Éléphant ⌘ ǞẴǢCÇ
in Pages then copy to the clipboard,
this one will contain:

tell current application
the clipboard as record
{Unicode text:"Éléphant ???CÇ",
styled Clipboard text:«data styl000300000000000C000900150047000C00000000000000000009000F000C04000047000C0000000000000000000A000C000900150047000C000000000000»,
string:"Éléphant ? ???CÇ",
uniform styles:«data ustl0000000200000090000000000000001400000020000000010000001B00000000000000010000006C000000040000000000000000000001020000000100000000000001050000002C6E616D640000002400000001000000040000000100000000000000000000000948656C7665746963610000000000010600000004000C000000000107000000060000000000000000»,
«class ut16»:"Éléphant ???CÇ",
«class utf8»:"Éléphant ???CÇ",
«class RTF »:«data»,
«class rtfd»:«data rtfd»}
end tell

During this attempt, I discovered a 'funny' thing. AppleWorks is in fact able to grab some Eastern characters, even with no special extension.

If I copy the character CJK UNIFIED IDEOGRAPH-56A1 (Unicode 56A1) then paste it into AppleWorks, it is correctly displayed.
[screenshot not preserved]

It is taken from a font whose name ends with "Pro W3".

Yvan KOENIG (from FRANCE Tuesday 4 November 2008 20:51:35)

Nov 4, 2008 12:15 PM in response to Henrik Holmegaard

One would have thought that a byte order mark was obligatory since code elements without identification of which way is the important one could be considered meaningless.


UTF-8 has only one byte order, so a BOM is optional and can cause problems in some circumstances. In the UTF-16BE and UTF-16LE encoding schemes, where the byte order is already stated by the label, a BOM is not permitted. Some more detail is here:

http://unicode.org/faq/utf_bom.html#BOM
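
For anyone who wants to see the bytes, here is a small Python sketch. The codec names are Python's, and the sample character is my own choice. It compares UTF-8 with and without a BOM, and UTF-16 with an explicit byte order against UTF-16 with a BOM.

```python
import codecs

sample = "é"

utf8_plain = sample.encode("utf-8")         # no BOM, single possible byte order
utf8_with_bom = sample.encode("utf-8-sig")  # same bytes preceded by EF BB BF

utf16_be = sample.encode("utf-16-be")       # order stated by the label, no BOM
utf16_le = sample.encode("utf-16-le")       # same character, bytes swapped
utf16_auto = sample.encode("utf-16")        # a BOM records the order used

for name, data in [("utf-8", utf8_plain), ("utf-8-sig", utf8_with_bom),
                   ("utf-16-be", utf16_be), ("utf-16-le", utf16_le),
                   ("utf-16", utf16_auto)]:
    print(name, data.hex(" "))
```

The two explicit UTF-16 forms show why a mark is useful there: the same character gives two different byte sequences, whereas UTF-8 gives only one.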
