Java API 1.7 & character encoding / Site XML

I'm not sure what is causing this, but characters with accent marks are showing up as '?' in my site XML. I first noticed this when using the Java API. I then tried the Automator actions and still got '?' marks. I tried different encoding options (UTF-8, Western, etc.) in Automator, but they don't seem to help.

Any ideas? I tried specifying the encoding type on my JSP page, but it doesn't help. Is this a limitation of the site XML? Or am I not supposed to paste special characters into iTunes U track names?

Mac mini Core Duo, MBP Core 2 Duo 2.2, Mac OS X (10.5.1)

Posted on Apr 24, 2008 9:25 AM


Apr 24, 2008 11:00 AM in response to ururk

To the best of my knowledge, the Java API uses UTF-8 everywhere. At ASU, we have many titles with accented characters, and I have verified in the past that they could be read and written without corruption. However, I have not tested this recently, and it is entirely possible that the Java API has encoding issues. I will do some tests and let you know what I find out.

I'm not too familiar with Automator, but it sounds like you are having encoding problems whether you use the Java API or not. This suggests that the problem may lie elsewhere. I wouldn't be so bold as to claim that the Java API is not at fault, though. Character encoding is a recurring issue with software development these days, and despite our best efforts, it rarely works the first time (or the second). "Apply more UTF-8 to the affected area" is the usual prescription.

Thanks,
Dave

Apr 25, 2008 4:53 PM in response to ururk

Woolamaloo ... the OS X application and the Automator actions ... send UTF-8 whenever they send a web services request. There is no way for you to change the encoding.

When Woolamaloo gets a reply from Apple, it passes the XML to Apple's Cocoa XML utility classes. These interpret the data based on the encoding specified in the returned XML. I'm going by memory here, but I would be shocked if Apple were saying anything other than UTF-8 is being returned to us.

Because of the way UTF-8 operates, I'm basically on the same page as Dave ("ramen"). The brilliance of UTF-8 is that something encoded as UTF-8 can, for the most part, still be treated as a C string and held in a "char []" style array. Many ANSI C string routines, like "strcat()", work on UTF-8 strings because they're just a bag of non-zero bytes terminated by a zero byte. But therein also lies the danger ... it's entirely possible to interpret a UTF-8 string as if it were a string in a single-byte encoding (ASCII or one of its 8-bit extensions). In a case like that, a two-byte encoding of a character would be interpreted as two distinct characters. This is likely what's going on in deep, dark places within iTunes, possibly way outside the iTunes U realm in the store. 🙂 In other words, we send UTF-8 to Apple's iTunes U team, they get it and pass it to "the store" ... and the store does something funkified behind the scenes, misinterpreting UTF-8 as ASCII (or some other single-byte encoding) somewhere along the line. Any results are returned to us as UTF-8 by the iTunes U gang ... but it's too late ... a two-byte character encoding has been transformed into two one-byte characters.
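To make the "two-byte character becomes two characters" failure concrete, here is a minimal Java sketch. It is only an illustration of the general mechanism, not a claim about what iTunes actually does. The class and method names are made up for this example, and ISO-8859-1 stands in (as an assumption) for whatever single-byte encoding the misbehaving code might be using.

```java
import java.nio.charset.StandardCharsets;

class Utf8Misread {
    // Encode the text as UTF-8, then decode those bytes with the wrong,
    // single-byte charset. ISO-8859-1 is an assumed stand-in here.
    static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";             // "cafe" with e-acute
        String garbled = misreadAsLatin1(original);

        // The e-acute is two bytes in UTF-8 (0xC3 0xA9), so the misread
        // string is one character longer than the original.
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
    }
}
```

Once the text has been split this way, re-encoding it as UTF-8 on the way back out does not undo the damage, which matches the "it's too late" point above.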

In the case of Java ... and, again, going by memory ... strings are held as UTF-16, so each "char" is two bytes long (characters outside the Basic Multilingual Plane take two of them). Still, the "right thing" is almost certainly happening in Dave's jar. When Dave gets data back from Apple, he's likely saying, "Interpret this data using the encoding specified in the XML and save it to Java's native character representation."
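As a rough sketch of what "interpret this data using the encoding specified in the XML" looks like in Java: the standard JAXP parser, given raw bytes, reads the encoding from the XML declaration and hands back ordinary Java strings. This is not taken from Dave's jar. The class name, the method name, and the "Name" element are all hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DeclaredEncodingDemo {
    // Build a tiny XML document in the given charset, then parse the raw
    // bytes. The parser picks the decoder from the XML declaration.
    // "Name" is a hypothetical element, not the real site XML schema.
    static String readName(String title, Charset charset) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>"
                   + "<Name>" + title + "</Name>";
        byte[] wire = xml.getBytes(charset);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(wire));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String title = "caf\u00e9";
        // Different bytes on the wire, same Java string after parsing.
        System.out.println(readName(title, StandardCharsets.UTF_8).equals(title));
        System.out.println(readName(title, StandardCharsets.ISO_8859_1).equals(title));
    }
}
```

The point is that the wire encoding stops mattering once the parser has decoded the bytes correctly. If a '?' is already present in the bytes Apple sends, no choice of decoder on the client side can bring the original character back.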

Just to round out the discussion, .NET uses UTF-16 as its native character encoding for strings. But, again, whenever moving data from one encoding to another, you can say "interpret XML as UTF-8 when saving to native UTF-16."

Apr 29, 2008 12:48 PM in response to ururk

I can confirm that the site XML incorrectly contains '?' characters instead of high-numbered (I'm guessing, above 127) Unicode characters. I can reproduce this with or without the Java API; doing a simple "wget" on the ShowTree URL produces the same result. If I go into the iTunes application, the characters display fine, and I can enter new descriptions with Unicode characters (I tried Chinese, for example) and they work properly.

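I think the web services (Apple's side) are at fault. It looks like they are converting all strings to Latin 1 or ASCII. My guess is the latter, since accented characters, which are available in Latin 1, come out as '?'. Curly quotes come out as '?' too, but they are not part of ISO 8859-1 proper (they live in Windows-1252), so on their own they don't tell the two apart.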
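For anyone who wants to tell the two cases apart, here is a small Java sketch of the kind of lossy conversion I suspect. It is a guess at the mechanism, not Apple's code, and the class and method names are made up. Java's String.getBytes(Charset) substitutes '?' for anything the target charset cannot represent, which produces exactly this symptom. Note that in Java's ISO-8859-1 charset the curly quote U+201C is itself unmappable, so accented letters are the more telling test.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LossyConversion {
    // Encode to the target charset and decode again. Characters the
    // charset cannot represent are replaced with '?' during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String accented = "\u00e9";   // e-acute
        String curly = "\u201c";      // left curly double quote

        // Compare against "?" rather than printing the characters, so the
        // output does not depend on the console's own encoding.
        System.out.println(roundTrip(accented, StandardCharsets.US_ASCII).equals("?"));
        System.out.println(roundTrip(accented, StandardCharsets.ISO_8859_1).equals(accented));
        System.out.println(roundTrip(curly, StandardCharsets.ISO_8859_1).equals("?"));
    }
}
```

All three lines should print true: an accented letter survives Latin 1 but not ASCII, while a curly quote is lost in both.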

Sending Unicode characters to Apple through the web services seems to work, on the other hand. I was able to set a course description with Chinese characters via the Java API and see the correct results in the iTunes application.
