How to set the system default file character encoding to UTF-8?

Hi all. This is driving me nuts, on both my Windows box and Snow Leopard; I figure much more chance of finding the answer for OS X.

My language and locale are set to Australian English. $LANG=en_AU.UTF-8

However, as I believe is expected, OS X (and Windows for that matter) will create files by default with character encoding of Cp1252 (Latin-1). That is, the FILE encoding in the file metadata - the Byte Order Mark I believe. The file itself, not the characters written to it.

This, in a word, bites. I don't want to be restricted to only ASCII by default, and it is causing me problems with certain software (a Firefox plugin) that creates text files, passing in UTF-8 encoded content, which is then mangled because the file encoding itself is still Cp1252. (I know, I've tested this by changing the file encoding manually and having it overwritten again by the plugin: works correctly.)

As a simple example, just `touch somefile` from terminal creates a file in Cp1252 -- I'm obtaining that info by opening in jEdit by the way (anyone know of something better?).

In other locales that are not English-based, I believe the default file encoding is UTF-8. But surely this can be controlled independently? There must be a system configuration value somewhere that specifies file encoding default. Can someone please tell me what it is?

Thanks!

MBP 15", Mac OS X (10.6.2)

Posted on Feb 2, 2010 10:41 PM


Feb 3, 2010 5:44 AM in response to civilisationshift

However, as I believe is expected, OS X (and Windows for that matter) will create files by default with character encoding of Cp1252 (Latin-1). That is, the FILE encoding in the file metadata - the Byte Order Mark I believe. The file itself, not the characters written to it.


Apps like TextEdit and Mail have settings that let you choose the encoding of the text they produce. The default normally depends on the character content of the file, ranging from ASCII for basic English, to Windows Latin-1 (Win 1252) or ISO Latin-1 (ISO 8859-1), to UTF-8 for other content.

Win 1252 is not ASCII. It is an 8-bit encoding that contains ASCII as its first half, so it has roughly twice as many characters.
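To make the relationship between these encodings concrete, here is a small Python sketch. The sample strings and the helper name `encode_all` are made up for illustration. It shows that text containing only ASCII characters produces exactly the same bytes in ASCII, Win 1252, and UTF-8, so a file like that cannot be told apart by its contents alone, while an accented character produces different bytes in each.

```python
# Hypothetical sample strings, chosen only for illustration.
samples = ["plain English", "café"]
codecs_to_try = ["ascii", "cp1252", "utf-8"]

def encode_all(text):
    """Map each codec name to the encoded bytes, or None if it cannot encode text."""
    results = {}
    for name in codecs_to_try:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            # ASCII has no code for characters such as "é".
            results[name] = None
    return results

for text in samples:
    print(encode_all(text))
```

For the first string all three byte sequences are identical. For the second, ASCII fails, Win 1252 uses one byte for the accented letter, and UTF-8 uses two. An editor that reports "Cp1252" for an ASCII-only file is probably just showing its own fallback guess.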

A Byte Order Mark is something totally different: it's a particular character placed at the start of a file to signal certain encodings.

http://en.wikipedia.org/wiki/Byte_order_mark
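If it helps, the following Python sketch checks the first bytes of some data for a Byte Order Mark. The function name `describe_bom` is my own, not part of any library; the BOM constants come from Python's standard `codecs` module. It also creates an empty file, the same thing `touch somefile` produces, to show that such a file holds zero bytes, so there is nothing in it for an editor to read an encoding from.

```python
import codecs
import os
import tempfile

def describe_bom(data: bytes) -> str:
    """Return the name of the BOM at the start of data, or 'none'."""
    # Check the UTF-32 marks first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    boms = [
        (codecs.BOM_UTF32_LE, "UTF-32 LE"),
        (codecs.BOM_UTF32_BE, "UTF-32 BE"),
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return "none"

# Create an empty file, as `touch somefile` would, and read it back as raw bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "somefile")
    open(path, "wb").close()
    with open(path, "rb") as f:
        empty = f.read()

print(len(empty), describe_bom(empty))          # 0 none
print(describe_bom("x".encode("utf-8-sig")))    # UTF-8 (this codec writes a BOM)
print(describe_bom("x".encode("utf-8")))        # none (plain UTF-8 has no BOM)
```

Note that plain UTF-8 text usually has no BOM at all, so the absence of one says nothing about whether a file is UTF-8.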

As a simple example, just `touch somefile` from terminal creates a file in Cp1252 -- I'm obtaining that info by opening in jEdit by the way (anyone know of something better?).


For what Terminal does and how to change it, it might be best to post in the Unix forum:

http://discussions.apple.com/forum.jspa?forumID=735

For problems with a Firefox plugin, it might be good to ask on their own forums as well.

Mar 13, 2010 3:43 PM in response to civilisationshift

I would really like an answer to this as well. I have a person using the same s/w I am using, but he is in Canada and his computer (which is identical) saves files from the same s/w as Western (MacRoman), which mangles the text pretty badly. Same s/w, different encoding! If I could figure out how he can reset his computer to make it save text files as UTF-8, we'd have our problem solved. Anyone know how to do this???

Mar 13, 2010 4:03 PM in response to Patrick Besong

Patrick Besong wrote:
I would really like an answer to this as well. I have a person using the same s/w I am using, but he is in Canada and his computer (which is identical) saves files from the same s/w as Western (MacRoman), which mangles the text pretty badly. Same s/w, different encoding! If I could figure out how he can reset his computer to make it save text files as UTF-8, we'd have our problem solved. Anyone know how to do this???


Most programs can save files in a number of different encodings. It depends on the program.

Mar 13, 2010 4:03 PM in response to Patrick Besong

his computer (which is identical) saves files from the same s/w as Western (MacRoman), which mangles the text pretty badly. Same s/w, different encoding!


What software is this exactly? I don't think you can change the OS the way you want. Normally you do such things in each app. E.g. in TextEdit you can choose dozens of encodings for saving or opening.
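As a stopgap while the app itself is sorted out, a file that was saved as MacRoman can be converted to UTF-8 after the fact. Below is a rough Python sketch of that idea. The function name `reencode_file` and the file names are hypothetical, and it assumes you already know the file really is MacRoman; if the source encoding is guessed wrong, the output will be garbled too.

```python
import os
import tempfile

def reencode_file(src_path, dst_path, src_encoding="mac_roman", dst_encoding="utf-8"):
    """Read src_path using src_encoding and write the same text to dst_path."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text

# Demonstration with a throwaway file standing in for the Canadian user's output.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "macroman.txt")
    dst = os.path.join(tmp, "utf8.txt")
    with open(src, "wb") as f:
        f.write("café".encode("mac_roman"))
    reencode_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

# The converted file now holds the UTF-8 bytes for the same text.
print(converted == "café".encode("utf-8"))

# The mangling itself: UTF-8 bytes misread as MacRoman turn "é" into "√©".
garbled = "café".encode("utf-8").decode("mac_roman")
print(garbled == "caf√©")
```

Both lines print `True`. The second check shows the kind of mangling you get when one side writes UTF-8 and the other reads it as MacRoman.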

Mar 13, 2010 5:30 PM in response to Patrick Besong

so what exactly controls which encoding is used by the operating system to save text files? apparently different languages must have something to do with it.


Maybe the developer forum could help:

http://discussions.apple.com/forum.jspa?forumID=727

In apps I am familiar with, e.g. Mail, the default encoding is determined by the character content of the text. The default encoding for some kinds of text may also differ depending on the language the OS is set to. I don't know if that is documented anywhere; I have always had to find out by testing.

Mar 13, 2010 5:55 PM in response to Patrick Besong

Patrick Besong wrote:
the app is built on realbasic, not xcode.


That doesn't matter.

so what exactly controls which encoding is used by the operating system to save text files?


The operating system doesn't save text files.

apparently different languages must have something to do with it.


The app in question may look at the current locale to determine what format to use when saving files. Still, that is something the app is doing. You could easily check this by switching your computer to Canadian English or Français canadien, whatever our friend happens to use, and trying your app.
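To illustrate the point that this is the app's decision, here is a minimal Python sketch of how a program might pick an encoding when saving. The function `save_text` is hypothetical. `locale.getpreferredencoding` is a real standard-library call that reports the encoding implied by the user's locale settings. I am not claiming this is what the REALbasic app does, only that a program either falls back to something locale-derived or names an encoding explicitly.

```python
import locale
import os
import tempfile

def save_text(path, text, encoding=None):
    """Write text to path; with no explicit encoding, fall back to the locale's."""
    # An app that does this will produce different files on differently
    # configured machines. Passing encoding="utf-8" removes that dependence.
    chosen = encoding or locale.getpreferredencoding(False)
    with open(path, "w", encoding=chosen, newline="") as f:
        f.write(text)
    return chosen

# What this machine's locale would select if the app did not choose for itself.
print(locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    used = save_text(path, "café", encoding="utf-8")
    with open(path, "rb") as f:
        raw = f.read()

print(used, raw)
```

With the explicit argument the bytes are the same on every machine, whatever the locale. Without it, the result follows the locale, which is the sort of difference you are seeing between the two computers.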
