characters with accents incorrectly displayed

Question

Level 1

0 points

Hello,

When writing and sending Rich Text emails with Mail 2.1, characters with accents such as "é", "è", "à" and "ù" are incorrectly displayed as "࠭", "饳" or "顬" in MS Outlook, or as "ŕ " in Gmail.

What can I do to avoid this problem? I don't want to switch to Plain Text!

Thanks a lot,

Felipe

Powerbook G4 Mac OS X (10.4.8) Mail 2.1

Posted on Dec 7, 2006 6:45 AM

Answer 1

Best reply

thomas_r.

Level 7

31,989 points

Dec 7, 2006 7:00 AM in response to avilarey

That's a text encoding problem -- in other words, the map Mail is using to convert numbers (since all text is just a string of bytes) into letters is different than the map being used by Outlook or GMail. Although I haven't used either of these, my advice would be to try using UTF-8 as the text encoding. Go to the Message menu and choose Unicode (UTF-8) from the Text Encoding submenu.

BTW, GMail has an FAQ on this topic:

http://mail.google.com/support/bin/answer.py?answer=22840
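
The mismatch described above can be reproduced in a few lines. This is only a sketch: the two charsets chosen here (UTF-8 for the sender, ISO-8859-1 for the receiver) are assumptions for illustration, not what Mail, Outlook or GMail necessarily used in this thread.

```python
# Illustrative only: the charsets are assumed, not taken from the thread.
text = "é"

sent = text.encode("utf-8")         # sender turns the character into bytes
right = sent.decode("utf-8")        # receiver uses the same map
wrong = sent.decode("iso-8859-1")   # receiver uses a different map

print(sent)    # the two bytes that travel in the message
print(right)   # the character the sender typed
print(wrong)   # two unrelated characters instead of one "é"
```

The bytes never change in transit; only the map used to read them differs, which is why the fix is to make both ends agree on the encoding.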

Answer 2

Tom Gewecke

Level 10

115,888 points

Dec 7, 2006 7:14 AM in response to avilarey

When writing and sending Rich Text emails with Mail
2.1, characters with accents such as "é", "è", "à"
and "ù" are incorrectly displayed as "࠭", "饳" or "顬" in
MS Outlook, or as "ŕ " in Gmail.

See this note for information on the problem and possible fixes, at least for Win Outlook:

http://homepage.mac.com/thgewecke/woutlook.html

As for GMail and other webmail systems, trying to accommodate all their bugs and idiosyncrasies is sometimes impossible.

Answer 3

R C-R

Level 6

17,782 points

Dec 7, 2006 8:53 AM in response to avilarey

In addition to what the others have said, note that this problem may occur with both web mail & email client access to any messages (or other text content) that use an extended character set (anything other than 7 bit ASCII).

This has nothing to do with Rich Text, which is a method for marking up or tagging text with metadata about how it is intended to be formatted. (HTML is another method for doing the same thing.) The only thing that matters is the text encoding. This can be a little confusing, but this may help:

A character & its display style are two different things. An "é" doesn't require Rich Text, just a character set that includes that character. But if you want a bold "é" you have to use Rich Text or HTML or some similar form of marking up the plain text with a 'bold' tag of some sort. The tag itself is not displayed in an application aware of its purpose as a formatting instruction.

The trick is that even if the app is aware of the tag's purpose, it still has to decide what character set to use for the text. Since not all character sets contain all characters, using the wrong set leads to display problems, no matter how the text is marked up.

The best you can do is to use apps that handle text encoding properly, & hope that others do too.

Answer 4

Tom Gewecke

Level 10

115,888 points

Dec 7, 2006 9:12 AM in response to R C-R

In addition to what the others have said, note that
this problem may occur with both web mail & email
client access to any messages (or other text content)
that use an extended character set (anything other
than 7 bit ASCII).

This has nothing to do with Rich Text, which is a
method for marking up or tagging text with metadata
about how it is intended to be formatted. (HTML is
another method for doing the same thing.) The only
thing that matters is the text encoding.

Sorry, this is not correct. The problem being referred to here, accented characters turning into Chinese, is specific to Win Outlook and occurs only with Rich Text messages created by Tiger Mail. Those messages are in fact HTML and not RTF format, and Mail sends them with correct but mixed encodings. Going to Plain Text will fix it, but many users find Rich Text essential. The link provided earlier explains it in detail.

Answer 5

R C-R

Level 6

17,782 points

Dec 7, 2006 10:30 AM in response to Tom Gewecke

1. As you note, Tiger Mail's formatted text is not sent as Rich Text but as HTML. Thus, my statement that this has nothing to do with Rich Text is correct.

2. As you note, Outlook's particular problem is with mixed text encodings. Thus, my statement that what matters is the text encoding is also correct.

3. Both Felipe & another reply included references to similar problems with Gmail. Thus, the issue here is not confined to Mail/Outlook interaction, but is in fact a general one relating to text encoding.

4. "Plain Text" is often mistaken as meaning 7 bit ASCII. I wanted to make it clear that there are other character sets that can be used when characters not in the 7 bit ASCII set are required. This was in response to Felipe's comment that he did not want to use plain text, implying (to me, anyway) that he may have assumed using plain text would prevent him from using accented characters while using Rich Text would not.

5. Your link adequately addresses the Mail/Outlook interaction issue, but it does not go into sufficient detail for the more general issue discussed here. It may even mislead some with such things as the reference to "anything beyond basic ascii" if taken out of context. This too is a reason I wanted to make the point of #4 clear.

Answer 6

Tom Gewecke

Level 10

115,888 points

Dec 7, 2006 12:40 PM in response to R C-R

1. As you note, Tiger Mail's formatted text is not
sent as Rich Text but as HTML. Thus, my statement
that this has nothing to do with Rich Text is
correct.

I think that you are confusing the terminology. Mail does not use the term "formatted text." It offers the user the choice between "Rich Text" and "Plain Text." In Tiger Mail, choosing "Rich Text" means HTML. When you use the term "Rich Text" to mean something else, like .rtf, your statements can become misleading or incorrect. As a practical matter, the problem posed has a lot to do with "Rich Text," as this term is used by Apple/Mail, because when the user selects "Plain Text" instead of "Rich Text" in the Mail menus, the problem will not exist.

2. As you note, Outlook's particular problem is with
mixed text encodings. Thus, my statement that what
matters is the text encoding is also correct.

Except that the text encoding does not matter when you set Mail to Plain Text instead of Rich Text.

5. Your link adequately addresses the Mail/Outlook
interaction issue, but it does not go into sufficient
detail for the more general issue discussed here. It
may even mislead some with such things as the
reference to "anything beyond basic ascii" if taken
out of context.

I'm always open to corrections and improvements of the note. If you could explain how the reference to characters beyond basic ascii could be misleading, I could try to fix it. The intended point is that getting bad characters in Win Outlook at the other end is only likely to be a problem when you have characters beyond basic ascii in your text (in addition, of course, to using the Rich Text setting in Mail).

Answer 7

R C-R

Level 6

17,782 points

Dec 8, 2006 3:53 AM in response to Tom Gewecke

I used terminology appropriate for the general issue under discussion, which includes display problems both in MS Outlook & in Gmail. You have said that something I said about this is not correct, but it appears you are instead quibbling about how applicable it is to the one aspect of the issue you focus on.

The point I'm trying to make is simple: display problems of this type are the result of incorrect text decoding by the viewing application, not by its handling of formatting metadata. If there is something you feel is incorrect about this, please explain what it is.

Regarding the use of the "Rich Text" terminology in general, please remember that this actually properly refers to a proprietary Microsoft standard. Mail 1.x didn't actually send formatted text in "Rich Text" format but in enriched text format, which is often confused with it. Mail 2.x sends formatted text in HTML. Apple still refers to this as "rich text" in Mail (& even as "RTF" in a few places, like in the popup "tooltip" for the Mail 2 help topic "Enabling use of text styles and pictures in email") but this is not correct -- the only supported choice for formatted text is HTML.

Moreover, Mail's use of "Plain Text" with attachments is as likely to cause problems with some email clients as is HTML. This is because Mail always uses the "inline" MIME declaration for attachments in plain text messages. Although there is nothing about this that violates any MIME standard, some email client apps misbehave when they encounter the "inline" declaration in the middle of a multipart, plain text message. For this reason, when using plain text format with attachments in Mail, it is best to place all attachments at the end of the text.
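
To make the "inline" declaration concrete, here is a minimal sketch built with Python's standard `email` package. It is not Mail's actual output; the subject, file name and attachment bytes are hypothetical, and the only point is to show where the Content-Disposition value sits when the text comes first and the attachment last.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Photo"                       # hypothetical subject
msg.set_content("See the picture below.\n")    # plain text body comes first

# Hypothetical attachment bytes; a real message would read a file here.
fake_png = b"\x89PNG\r\n\x1a\n"

# disposition="inline" asks the receiving client to show the part in place.
# Without it, add_attachment() declares the part as "attachment".
msg.add_attachment(fake_png, maintype="image", subtype="png",
                   filename="photo.png", disposition="inline")

parts = list(msg.iter_parts())
for part in parts:
    # Prints each part's MIME type and its declared disposition (or None).
    print(part.get_content_type(), part.get_content_disposition())
```

With the attachment added after the text, the inline part is the last one in the multipart message, which is the layout recommended above for clients that mishandle inline parts in the middle of the text.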

Please note that all of these considerations deal with bugs in other applications, but none of them actually are triggered by the presence of text formatting metadata as found in Rich Text, HTML, enriched text, or any similar method for marking up the text part of a message. This is widely referred to as "formatted text" but if you prefer some other term, I have no objection, as long as the distinction remains clear.

Answer 8

Tom Gewecke

Level 10

115,888 points

Dec 8, 2006 7:14 AM in response to R C-R

The point I'm trying to make is simple: display
problems of this type are the result of incorrect
text decoding by the viewing application, not by its
handling of formatting metadata. If there is
something you feel is incorrect about this, please
explain what it is.

No problem with that. The point I am making is a very practical one, namely that when someone asks here in the forums "How can I stop my accented characters from looking like Chinese at the other end?" (which has occurred hundreds of times since Tiger was released), one of the correct answers is "switch to Plain Text instead of Rich Text," and not "This has nothing to do with Rich Text." Because for the person with the problem, who can't change things at the other end, Rich Text does matter. This isn't really a "general issue," but one involving some very specific choices in Mail parameters and what platform/program is being used at the other end.

Regarding the use of the "Rich Text" terminology in
general, please remember that this actually properly
refers to a proprietary Microsoft standard.

People who are involved in text encoding issues, and currently Apple's menus as well, generally use this term in a broader sense, including both proprietary and open varieties, such as found in the Unicode standard's glossary of definitions: "Rich Text. Also known as styled text. The result of adding information to plain text. Examples of information that can be added include font data, color, formatting information, phonetic annotations, interlinear text, and so on."

Moreover, Mail's use of "Plain Text" with attachments
is as likely to cause problems with some email
clients as is HTML.

As far as I know, those "problems" do not include the misreading or mislabeling of character encodings, but if you have an example, I would love to see it.

Please note that all of these considerations deal
with bugs in other applications, but none of them
actually are triggered by the presence of text
formatting metadata

I agree with that.

Answer 9

thomas_r.

Level 7

31,989 points

Dec 8, 2006 7:14 AM in response to Tom Gewecke

Except that the text encoding does not matter when
you set Mail to Plain Text instead of Rich Text.

Of course it does! The text encoding means everything!

A text encoding is a map that translates bytes into human-readable characters. All text in a computer is encoded as a string of bytes according to some text encoding, and different text encodings map numeric values to different characters. In order to get accurate text from a string of bytes, you must use the same text encoding to turn the bytes into characters as was used to turn characters into bytes.

Unfortunately, there are many different text encodings, and each of them encodes characters differently. UTF-8 is as close to a "universal" text encoding as there is these days, and it's a huge step forward compared to encodings like MacRoman.
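
A small sketch of what "each of them encodes characters differently" means in practice. The codec names are Python's; which encodings a given mail client actually offers is not something this example claims.

```python
ch = "é"

# One character, three different byte sequences.
for codec in ("utf-8", "iso-8859-1", "mac_roman"):
    print(codec, ch.encode(codec))

# One byte, two different characters, depending on the map used to read it.
byte = b"\x8e"
as_mac = byte.decode("mac_roman")   # the MacRoman reading
as_win = byte.decode("cp1252")      # the Windows Latin 1 reading
print(as_mac, as_win)
```

A receiver that is given only the byte, with no reliable charset label, has no way to tell which of the two readings the sender meant.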

Also, referring to "ASCII text" is technically incorrect. There isn't really such a thing any more. ASCII is a 7-bit character subset that happens to be common to most -- but not all by any means -- text encodings. Which means that text encoding problems most often show up with characters outside the ASCII set, like accented characters, bullets, trademark symbols, etc.

Plain text in Mail is encoded using some text encoding, which is totally different from text containing special formatting codes, which Mail calls "Rich Text", such as HTML or RTF. The problem described here happens when the map used to decode the byte string in the mail client on the receiving end does not match what Mail used to encode the byte string.

Using UTF-8 as your text encoding gives you the best chance these days of having your text interpreted as you meant it to be.

Answer 10

Tom Gewecke

Level 10

115,888 points

Dec 8, 2006 7:29 AM in response to thomas_r.

Except that the text encoding does not matter when
you set Mail to Plain Text instead of Rich Text.

Of course it does! The text encoding means
everything!

What is being stated here is that when Plain Text is used in Mail and Win Outlook is being used at the other end, the Mail user does not need to be concerned about text encoding, as it will automatically be handled correctly. With Rich Text the Mail user will need to be concerned, because changing his encoding may be necessary to ensure correct display.

Also, referring to "ASCII text" is technically
incorrect. There isn't really such a thing any more.

Yes there is. If you send a simple plain text English message in Mail, the charset will normally be declared as US-ASCII, for example.
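
The same kind of automatic charset declaration can be observed outside Mail. The sketch below uses Python's standard `email` package purely as an analogy; it shows that library's behavior, not Mail's, and the message texts are made up.

```python
from email.mime.text import MIMEText

# No charset is passed, so the library picks one from the text itself.
plain_english = MIMEText("See you at noon.")
accented = MIMEText("Déjà vu à midi.")

print(plain_english.get_content_charset())   # declared for ASCII-only text
print(accented.get_content_charset())        # declared once accents appear
```

The declared charset ends up in the Content-Type header of the message, which is what the receiving client is supposed to consult before decoding the body.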

Using UTF-8 as your text encoding gives you the best
chance these days of having your text interpreted as
you meant it to be.

Though this is generally true, UTF-8 causes some major problems for non-Latin scripts on cell phone email systems, which are heavily used in Asia, for example. See this note:

http://discussions.apple.com/thread.jspa?threadID=121808&tstart=60

Answer 11

R C-R

Level 6

17,782 points

Dec 8, 2006 9:43 AM in response to Tom Gewecke

The point I am making is a very practical one,
namely that when someone asks here in the
forums "How can I stop my accented characters
from looking like Chinese at the other end?" ...

Tom! Please reread the original post & initial replies carefully. This is not an accurate representation of what Felipe asked. Your initial reply acknowledged you addressed only a part of the overall issue ("at least for Win Outlook"). I tried to make it explicitly clear that I was supplying additional information ("In addition to what the others have said") that might be helpful with understanding the overall issue, not just how it impacted Outlook users.

As the discussion has progressed, I have tried to do more of that, which unfortunately means delving into how the terminology is sometimes used without much regard for accuracy or appropriate context. Regarding the Glossary of Unicode Terms entry for Rich Text, the full entry illustrates exactly what I have been talking about:

Rich Text. Also known as styled text. The result of adding information to plain text. Examples of information that can be added include font data, color, formatting information, phonetic annotations, interlinear text, and so on. The Unicode Standard does not address the representation of rich text. It is expected that systems and applications will implement proprietary forms of rich text. Some public forms of rich text are available (for example, ODA, HTML, and SGML). When everything except primary content is removed from rich text, only plain text should remain.

Part of the confusion is because "Rich Text" appears capitalized in headings & in Apple menus, making it difficult to determine if it means the generic, lower case "rich text" term found in carefully worded documentation (including much of Apple's) or the proprietary "Rich Text Format" of Microsoft. It would be helpful if notes & commentary by users were as careful about this, but I have little hope that will ever happen.

The same goes for "ASCII" which is in fact only a 7-bit coded character set that (quoting the same Glossary of Unicode Terms), "has been incorrectly used to refer to various 8-bit character encodings that include ASCII characters in the first 128 positions." Similarly, "plain text" properly refers to any character set, the only requirement being that it be free of other formatting or structural information.

We all occasionally get sloppy about this, using "Plain Text" to mean ASCII or vice versa, or making some careless statement about how using plain text frees us from any concerns of text encoding problems, which it most certainly does not.

Answer 12

R C-R

Level 6

17,782 points

Dec 8, 2006 10:44 AM in response to Tom Gewecke

If you believe using "simple plain text English" frees you from concerns about text encoding, try sending someone a message using simple English sentences like "It's 20° in North Texas this morning," or "I wish I had a cheap 99¢ cap to keep my ears warm." Those are very 'US English' sentences ... they just aren't US-ASCII code-able ones.

Plain text ≠ ASCII text, even for English writers. (View this message with Safari, using various text encodings, for graphic evidence of this.)
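
A quick way to check the two sentences above, sketched in Python. The helper name `ascii_offenders` is made up for this example, and the third sample sentence is added only for contrast.

```python
samples = [
    "It's 20° in North Texas this morning,",
    "I wish I had a cheap 99¢ cap to keep my ears warm.",
    "Plain English with no special characters.",
]

def ascii_offenders(s):
    """Return the characters in s that fall outside 7-bit ASCII."""
    return [c for c in s if ord(c) > 127]

for s in samples:
    # Lists the non-ASCII characters, or says the sentence is pure ASCII.
    print(ascii_offenders(s) or "pure ASCII", "-", s)
```

The degree sign and the cent sign are the only offenders, which is enough to force a charset other than US-ASCII for the whole message.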

Answer 13

Tom Gewecke

Level 10

115,888 points

Dec 8, 2006 11:49 AM in response to R C-R

If you believe using "simple plain text English"
frees you from concerns about text encoding,

Agreed, Plain Text only frees you from some of these concerns. In particular, using Plain Text in Mail means you do not have to worry about playing with your text encoding menu to ensure that the millions of people using Win Outlook to receive email will see it correctly, regardless of the language you are writing in. This is of daily practical importance to a great many Mail users. If you use Rich Text, especially when you also have an attachment, you do need to worry about the encoding menu.

As for what webmail users will see, I think the only way to figure that out is by experiment, because these systems do not necessarily handle encodings other than ascii, or html messages, in any easily predictable way. How GMail could display an r-acute in place of an accented vowel is a mystery, not conforming to any encoding misreading I'm familiar with so far (more Google innovation?).
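
For what it is worth, one misreading that yields exactly this result can be reproduced. In ISO-8859-1 an "à" is the single byte 0xE0, and ISO-8859-2 (the Central European set) assigns that same byte to r-acute. Whether GMail actually did this in Felipe's case is an assumption; nothing in this thread confirms it.

```python
# Assumed scenario: sender encodes as ISO-8859-1, viewer reads ISO-8859-2.
sent = "à".encode("iso-8859-1")    # a single byte
seen = sent.decode("iso-8859-2")   # same byte, Central European map

print(sent)
print(seen)   # r with acute accent
```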

Plain text ≠ ASCII text, even for English writers.

Agreed. I'd be the last person to equate Plain Text with ASCII, or to assume anyone else was doing so. But English can be written in ASCII, and often is.

Answer 14

Barolo

Level 1

0 points

Feb 5, 2007 9:05 AM in response to thomas_r.

That's a text encoding problem -- in other words, the
map Mail is using to convert numbers (since all text
is just a string of bytes) into letters is different
than the map being used by Outlook or GMail.
Although I haven't used either of these, my advice
would be to try using UTF-8 as the text encoding.
Go to the Message menu and choose Unicode (UTF-8)
from the Text Encoding submenu.

BTW, GMail has an FAQ on this topic:

http://mail.google.com/support/bin/answer.py?answer=22840

This solution works for me, but every time I open a new mail I have to reset the text encoding to UTF-8. Is there a way to do it by default?

Answer 15

Tom Gewecke

Level 10

115,888 points

Feb 5, 2007 9:19 AM in response to Barolo

This solution works for me, but every time I open a
new mail I have to reset the text encoding to UTF-8.
Is there a way to do it by default?

See this note for ways to do this:

http://homepage.mac.com/thgewecke/woutlook.html
