Address Book will not export in Unicode (UTF-8)

Hi,

I have a small problem with Address Book (10.4.8).
Although I have set the encoding to Unicode (UTF-8) and the vCard format to v2.1, Address Book exports any vCard containing special characters (like ä, ö, ü, ß, é etc.) with "CHARSET=LATIN1:" in front of the value, for example in the name field.

Is there any way to persuade Address Book to export the correct Unicode characters and to place "CHARSET=UTF-8:" before the name field?

Thank you very much in advance !
Cheers,
Thorsten

MacBook Pro 15'' 2.33GHz Core2, Mac OS X (10.4.8)

Posted on Dec 11, 2006 12:54 PM

5 replies

Dec 12, 2006 6:15 AM in response to Thor777

I have a small problem with Address Book (10.4.8).
Although I have set the encoding to Unicode (UTF-8)
and the vCard format to v2.1, Address Book exports
any vCards with special characters (like ä,ö,ü,ß,é
etc) by putting "CHARSET=LATIN1:" before the name
field for example.


Looks like a bug to me.

Is there any chance to persuade Address Book to
export the correct Unicode characters and placing a
"CHARSET=UTF-8:" before the name field ?


Try including a character outside the Latin 1 range somewhere in the card. The Euro should work, or others from the lists on this page:

http://www.alanwood.net/demos/charsetdiffs.html

Dec 12, 2006 6:49 AM in response to Tom Gewecke

Thank you for your hint !
After including an "€" in the field, it was correctly exported with "CHARSET=UTF-8:". Unfortunately, Address Book only does this for fields that actually contain e.g. an € sign in the text. The other fields still get exported as LATIN1 ... it drives me crazy!

Any further ideas ?

Thank you very much !
Cheers,
Thorsten
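One possible workaround is to post-process the exported file instead of fighting the export dialog. The following is only a sketch in Python, not anything Address Book provides: the function names `line_to_utf8` and `vcard_to_utf8` are made up for illustration, and it assumes that each property line is stored in the charset it declares, that values are not quoted-printable encoded, and that no property is folded across several lines. None of this has been checked against real Address Book output.

```python
import re

# Matches the CHARSET parameter as quoted in this thread, e.g. "N;CHARSET=LATIN1:..."
_LATIN1_PARAM = re.compile(rb"CHARSET=(LATIN1|ISO-8859-1)", re.IGNORECASE)


def line_to_utf8(raw_line: bytes) -> bytes:
    """Re-encode one property line from ISO-8859-1 to UTF-8 and fix its label."""
    head, sep, value = raw_line.partition(b":")
    if not sep or not _LATIN1_PARAM.search(head):
        return raw_line  # no Latin-1 declaration: leave the bytes untouched
    new_head = _LATIN1_PARAM.sub(b"CHARSET=UTF-8", head)
    return new_head + sep + value.decode("iso-8859-1").encode("utf-8")


def vcard_to_utf8(data: bytes) -> bytes:
    """Convert every Latin-1 labelled line of a vCard file, keeping line endings."""
    return b"".join(line_to_utf8(line) for line in data.splitlines(keepends=True))
```

Lines that already declare UTF-8, or that declare nothing, pass through unchanged, so the result should be a file that is UTF-8 throughout if the assumptions above hold.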

Dec 12, 2006 11:55 AM in response to Thor777

FWIW, the vCard 2.1 standard specifies that the default character set is ASCII but can be overridden for an individual property value by using the "CHARSET" property parameter. This document also says it does not intend to define the implementation of the specification, except for some fundamental capabilities.

IOW, there is no vCard specification that requires any field value to use a particular character set in Address Book, only that if it does so, it must use a character set registered with the Internet Assigned Numbers Authority (IANA), using the specified format for that declaration.

Note also that "Latin1" (or "LATIN1" -- it is case insensitive) is in fact an IANA alias for ISO-8859-1. This is not the same thing as "Windows Latin-1", "the ANSI character set," or "Windows 1252" - these are all names for the proprietary 8 bit character set of Microsoft. Nor is it the same as the Latin-1 Supplement in Unicode, sometimes referred to as the Latin 1 range.

Significantly, ISO-8859-1 does not map a code point to the Euro symbol "€." Both Windows 1252 & Unicode encodings do, but not to the same code point.
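The point about the Euro sign can be checked with a few lines of Python, using only the standard codecs. The helper name `euro_bytes` is just for illustration.

```python
EURO = "\u20ac"  # the Euro sign, Unicode code point U+20AC


def euro_bytes(codec):
    """Return the bytes for the Euro sign in the given codec, or None if it has none."""
    try:
        return EURO.encode(codec)
    except UnicodeEncodeError:
        return None


print(euro_bytes("iso-8859-1"))  # None: ISO-8859-1 has no Euro sign
print(euro_bytes("cp1252"))      # b'\x80': Windows 1252 puts it at 0x80
print(euro_bytes("utf-8"))       # b'\xe2\x82\xac': UTF-8 form of U+20AC
```

So a field containing "€" cannot be written as LATIN1 at all, which is consistent with Address Book switching that one field to a Unicode encoding.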

Taking all this into account, I believe Address Book doesn't have a bug so much as a feature (yeah, yeah, I know) that may not be welcome: it exports vCards in the most economical format possible. When a field value can be encoded in a 7 or 8 bit character set, it does so. When it can't, it uses Unicode. For the 8 bit character sets or for Unicode, it allows you to choose any of the available, appropriate encodings. For Unicode, one of these is UTF-8.

IOW, you can set the encoding but not the character set.
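To make the idea concrete, here is a rough Python model of the per-field behaviour described above. It is a guess at the logic based on the observed output, not Address Book's actual code, and `pick_charset` and `export_property` are hypothetical names.

```python
def pick_charset(value, unicode_encoding="utf-8"):
    """Pick the smallest charset that can represent the value."""
    for label, codec in (("US-ASCII", "ascii"), ("LATIN1", "iso-8859-1")):
        try:
            value.encode(codec)
            return label
        except UnicodeEncodeError:
            continue  # value does not fit, try the next larger charset
    # Only now does the user's Unicode encoding choice come into play.
    return unicode_encoding.upper()


def export_property(name, value, unicode_encoding="utf-8"):
    """Format one vCard 2.1 property line, declaring CHARSET only when needed."""
    charset = pick_charset(value, unicode_encoding)
    if charset == "US-ASCII":
        return f"{name}:{value}"  # ASCII is the vCard 2.1 default
    return f"{name};CHARSET={charset}:{value}"


print(export_property("N", "Müller;Jörg;;;"))
print(export_property("NOTE", "costs 5 €"))
```

Under this model "Müller" fits ISO-8859-1 and is labelled LATIN1 whatever the encoding setting is, while the note with "€" falls through to UTF-8.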

Dec 12, 2006 12:02 PM in response to R C-R

Thank you very much for your detailed answer !
I feared that it had something to do with such "things" ...

In the end I have just finished writing an AppleScript that does most of the stuff I needed - and getting UTF-8 encoding was again the most time-consuming part of it!

But now, it's working - more or less.

In case anyone stumbles across a similar problem (e.g. he or she wants to export their contacts to an LG K810 😉), you can use the AppleScript (v0.1 or so) attached below.

Thanks to all again !
Cheers,
Thorsten

PS: Take care with the line wrapping when copying the script ...


tell application "Address Book"
    
    set path_name to "Path:To:Folder:" -- set your destination folder here
    
    set num_persons to count of people
    
    repeat with i from 1 to num_persons
        tell person i
            
            set first_name to first name as text
            set last_name to last name as text
            set p_name to name as text
            
            set p_note to note as text
            set is_company to company
            set company_name to organization as text
            
            -- These lookups can fail when a person has no entry with the
            -- given label, so start from "" and ignore the error.
            set h_phone to ""
            set w_phone to ""
            set c_phone to ""
            set p_email to ""
            try
                set h_phone to (value of first phone where label is "home") as text
            end try
            try
                set w_phone to (value of first phone where label is "work") as text
            end try
            try
                set c_phone to (value of first phone where label is "mobile") as text
            end try
            
            -- Prefer the home address, otherwise fall back to the first one.
            try
                set p_email to (value of first email where label is "home") as text
            end try
            try
                if p_email is "" then set p_email to (value of first email) as text
            end try
            
            --if i is 8 then display dialog (department as text)
            
        end tell
        
        -- Companies have no personal name; a missing last name arrives
        -- here as the text "missing value" because of the coercion above.
        if is_company then
            set last_name to company_name
            set p_name to company_name
            set first_name to ""
        else
            if last_name is "missing value" then
                set last_name to first_name
                set first_name to ""
            end if
        end if
        
        set vcard_2 to ""
        
        set vcard_2 to (vcard_2 & "BEGIN:VCARD" & return)
        set vcard_2 to (vcard_2 & "VERSION:2.1" & return)
        -- Every free-text property declares UTF-8, since the whole file is written as UTF-8 below.
        set vcard_2 to (vcard_2 & "N;CHARSET=UTF-8:" & last_name & ";" & first_name & ";;;" & return)
        set vcard_2 to (vcard_2 & "FN;CHARSET=UTF-8:" & p_name & return)
        
        if c_phone is not "" then set vcard_2 to (vcard_2 & "TEL;CELL;VOICE:" & c_phone & return)
        if h_phone is not "" then set vcard_2 to (vcard_2 & "TEL;HOME;VOICE:" & h_phone & return)
        if w_phone is not "" then set vcard_2 to (vcard_2 & "TEL;WORK;VOICE:" & w_phone & return)
        
        if p_email is not "" then set vcard_2 to (vcard_2 & "EMAIL;PREF;INTERNET:" & p_email & return)
        if p_note is not "missing value" then set vcard_2 to (vcard_2 & "NOTE;CHARSET=UTF-8:" & p_note & return)
        
        set vcard_2 to (vcard_2 & "END:VCARD")
        
        --set vcard_str to vcard_2 as text
        --set vcard_utf8 to vcard_2 as «class utf8»
        
        -- One file per person; the index keeps names unique.
        set target_file to path_name & p_name & i & ".vcf" as text
        try
            set the open_file to open for access file target_file with write permission
            set eof of the open_file to 0
            write vcard_2 as «class utf8» to the open_file starting at eof
            close access the open_file
        on error theErrMsg number theErrNumber
            -- open_file may not exist if opening failed, so close by path and ignore errors.
            try
                close access file target_file
            end try
            return theErrNumber
        end try
    end repeat
end tell
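To check what such a script produced, a small sanity test on one of the .vcf files can help. The following Python sketch is only an illustration: `check_vcard` is a made-up name, and the checks are limited to what this thread is about (valid UTF-8, the BEGIN/END lines, and a CHARSET declaration on non-ASCII values), not a full vCard 2.1 validation.

```python
def check_vcard(data: bytes) -> list:
    """Return a list of problems found in one exported vCard (empty if none)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    lines = [ln for ln in text.splitlines() if ln]
    problems = []
    if not lines or lines[0] != "BEGIN:VCARD":
        problems.append("missing BEGIN:VCARD")
    if not lines or lines[-1] != "END:VCARD":
        problems.append("missing END:VCARD")
    for ln in lines:
        head, _, value = ln.partition(":")
        # vCard 2.1 defaults to ASCII, so other characters need a CHARSET parameter.
        if not value.isascii() and "CHARSET=" not in head.upper():
            problems.append("non-ASCII value without CHARSET: " + head)
    return problems
```

It can be run on the raw bytes of a file, for example `check_vcard(open("Some Name1.vcf", "rb").read())`, where the file name is only a placeholder.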

Dec 12, 2006 4:46 PM in response to R C-R

Taking all this into account, I believe Address Book
doesn't have a bug so much as a feature (yeah, yeah,
I know) that may not be welcome: it exports vCards in
the most economical format possible. When a field
value can be encoded in a 7 or 8 bit character set,
it does so. When it can't, it uses Unicode. For the 8
bit character sets or for Unicode, it allows you to
choose any of the available, appropriate
encodings. For Unicode, one of these is
UTF-8.


As far as I can tell, Address Book (in 2.1 mode) just won't let you encode those characters that can be covered by ISO Latin-1 as UTF-8. Everything else -- MacRoman and Win1252 stuff not in ISO Latin-1, everything in other ISO 8-bit charsets (Greek, Cyrillic, etc), everything not in these but in Unicode -- can be made UTF-8 by setting the button. Some stuff can be made UTF-8 but not UTF-16, but Chinese/Japanese can be made either. Quite a mess.

Is it correct that in format 3.0 the button plays no role and everything not in ASCII is encoded as UTF-16? A lot simpler if so.
