Base64 decoding - perl & openssl vs. html embedded image

I am having a strange problem. I have a TIFF image stored in a SQLite database encoded as Base64. When I retrieve the Base64 characters, save them to a file using an applescript writing a utf-8 text file to disk (or just saving it in TextWrangler), and then try to feed that file into openSSL using the Base64 decoding option, I get an empty file or just a small (not nearly large enough) file that will not open. BUT! When I put that same string into a img tag using base64 decoding in an HTML document, it displays properly in a browser window. Anyone have an idea why this might happen? It does not work using perl either.

The html code is here: http://web.me.com/danaleighton/base64.html

The file I am feeding into openSSL is here: http://web.me.com/danaleighton/base64sm.txt

I am using the following command: openssl enc -base64 -d -in base64sm.txt -out testscript.tiff

This perl command also does not work: perl -i.txt -MMIME::Base64 -e 'undef $/;while(<>){print decode_base64($_);}' base64sm

Thanks in advance for any help!!!

MPB (Early 2008) 2.24GHz, 4GB, 200GB SATA, Mac OS X (10.6.4)

Posted on Jul 27, 2010 11:32 PM


Jul 28, 2010 4:04 AM in response to Dana Leighton

Hi Dana -

The decode command line you posted is correct, so it's most likely the text you're extracting from the database is either corrupted or wasn't encoded as you expect. Note that your success with the img tag decoding may be misleading. I can't give you details that apply to this case, but browsers typically use smarter decoding tools which are more robust than a single command line with fixed options. So unless your target display is in fact a browser, I'm not sure it's useful to try and find out how your browser is making sense of that text file.

1) Was the text in the database produced by an internal encoder or was it obtained either from a standard e-mail attachment or produced by this command or its equivalent?:
openssl enc -base64 -in testscript.tiff -out base64sm.txt

2) Are you sure you're retrieving exactly the same bytes that were stored in the database?

To isolate the problem, you might use the above command to encode a small image file, e.g.:
openssl enc -base64 -in picture.png -out picture.txt

Load picture.txt into your database (as text, without any further processing), and then extract that text as best you can, saving the result in picture2.txt. Then:
diff picture.txt picture2.txt

These files should be identical, so if you get any output from diff, you need to look at how you stored and/or retrieved the bytes to/from the database. If the files are identical, you need to look at how your tiff file was encoded.
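If diff reports a difference but the files look the same in an editor, a byte-level comparison can show where they first diverge. The following is a minimal Python sketch of that check. The two byte strings are made-up stand-ins for the contents of picture.txt and picture2.txt, and first_difference is a hypothetical helper name. In practice you would read both files in binary mode, e.g. open("picture.txt", "rb").read().

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up example data: the retrieved copy picked up a CR before the LF.
original = b"TU0AKgAAAAg=\n"
retrieved = b"TU0AKgAAAAg=\r\n"

offset = first_difference(original, retrieved)
print("first differing offset:", offset)
```

A difference that sits at a line ending, as in this made-up case, would point at the copy/paste or export step rather than at the encoder.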

Hope that helps!
- Ray

Jul 28, 2010 6:58 AM in response to Dana Leighton

Dana Leighton wrote:
I have a TIFF image stored in a SQLite database encoded as Base64.


Why? SQLite can do BLOBs.

When I retrieve the Base64 characters, save them to a file using an applescript writing a utf-8 text file to disk


You don't need UTF for Base64, just ASCII.

Anyone have an idea why this might happen?


Yes. Your Base64 is incorrect. Each line can be no more than 76 characters long. Here is the RFC for your review: http://www.ietf.org/rfc/rfc2045.txt

Jul 28, 2010 9:57 AM in response to RayNewbie

it's most likely the text you're extracting from the database is either corrupted or wasn't encoded as you expect. Note that your success with the img tag decoding may be misleading. I can't give you details that apply to this case, but browsers typically use smarter decoding tools which are more robust ...
1) Was the text in the database produced by an internal encoder or was it obtained either from a standard e-mail attachment or produced by this command or its equivalent?:

Thanks Ray - I was afraid of that. I am extracting the image from a database written by a commercial application. I checked with the developers, and they said it was base64, so that's all I have to go on. I reported this anomaly to them but haven't heard back in over a week.

2) Are you sure you're retrieving exactly the same bytes that were stored in the database?

As sure as I can be. For testing, I invoke sqlite from a shell to do the select statement and dump the result to the terminal. Then copy and paste from terminal to a TextWrangler document. I have done the same with an Applescript.

One complicating factor: the text stored in the database has these weird markup tags that I can't find any documentation. They look like this: <iimg><preferredFilename>Embedded image</preferredFilename><segment>TU0AK...=
</segment><segment>UdSVL...=
</segment></iimg>

I strip these out of the field coming from the database.
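One thing that may be worth trying, on the assumption that each <segment> element holds its own independently encoded (and independently padded) Base64 chunk: decode every segment separately and join the resulting bytes, rather than stripping the tags and decoding the text as one stream. The Python sketch below illustrates the idea. The tag names come from the sample above, but the field contents are made up and decode_segments is a hypothetical helper, not anything documented by the application. (For what it is worth, the leading characters TU0AK are consistent with the bytes MM\0*, which is how a big-endian TIFF file begins.)

```python
import base64
import re


def decode_segments(field: str) -> bytes:
    """Decode each <segment> body on its own, then join the raw bytes."""
    parts = re.findall(r"<segment>(.*?)</segment>", field, flags=re.DOTALL)
    decoded = []
    for part in parts:
        # Remove line breaks and other whitespace inside the segment.
        compact = "".join(part.split())
        decoded.append(base64.b64decode(compact, validate=True))
    return b"".join(decoded)


# Made-up stand-in for the database field (not the real image data).
payload = b"MM\x00*" + bytes(range(40))
first, second = payload[:20], payload[20:]
field = (
    "<iimg><preferredFilename>Embedded image</preferredFilename>"
    "<segment>" + base64.b64encode(first).decode("ascii") + "\n</segment>"
    "<segment>" + base64.b64encode(second).decode("ascii") + "\n</segment></iimg>"
)

image_bytes = decode_segments(field)
print(len(image_bytes), "bytes decoded")
```

If the assumption holds, writing image_bytes to a file opened in binary mode should give a usable TIFF. If it does not hold, validate=True makes the decoder raise an error instead of silently returning a truncated result.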

To isolate the problem...

Unfortunately, since I am not creating the database, I am loath to write to it. I could, but since the problem is likely in the way the developers are encoding or storing the data, it wouldn't do much to solve my problem... 😟

Thanks so much for your suggestions!! Very helpful.

Jul 28, 2010 10:19 AM in response to etresoft

etresoft wrote:
Why? SQLite can do BLOBs.

As I wrote to Ray above, this data is generated by the developers of the application whose data I am reading. They are storing the image and regular character data in the same field, so I presume that's why they chose to do it this way.

You don't need UTF for Base64, just ASCII.

thanks. I get the same results when I store it as regular ASCII.

Anyone have an idea why this might happen?


Yes. Your Base64 is incorrect. Each line can be no more than 76 characters long. Here is the RFC for your review: http://www.ietf.org/rfc/rfc2045.txt

OK - I reviewed that. The data I am sending is 73 characters followed by a line break, which is within the spec.

Jul 28, 2010 11:30 AM in response to Dana Leighton

Dana Leighton wrote:
OK - I reviewed that. The data I am sending is 73 characters followed by a line break, which is within the spec.


OK. I get it now. I saw the content and thought my text editor was wrapping the lines. It has been a few years since I've hacked up base64.

Your base64 is definitely incorrect. The only place you need a '=' is at the end. You could have 0, 1, or 2 '=' characters at the end, depending on the size of your input data. The format of your data looks like you have two levels of encoding in it somehow. Each line is 73 characters, but then you have an '=' every 677 characters. That doesn't add up in base64-ese any way you look at it.
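To make the padding rule concrete, here is a small Python sketch. It checks that the number of trailing '=' characters depends only on the input length modulo 3, and it locates any '=' that is followed by more data, which is what you would see if separately encoded chunks had been joined together. The helper names and sample strings are made up for illustration, and the idea that the data consists of concatenated chunks is only a guess.

```python
import base64
from typing import List


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters base64 appends for an input of n_bytes."""
    return (3 - n_bytes % 3) % 3


def interior_padding_offsets(text: str) -> List[int]:
    """Offsets of '=' that are followed by more data (whitespace ignored)."""
    compact = "".join(text.split())
    body_len = len(compact.rstrip("="))
    return [i for i, ch in enumerate(compact[:body_len]) if ch == "="]


# Padding depends only on the input length modulo 3.
for n in (3, 4, 5, 6):
    encoded = base64.b64encode(bytes(n)).decode("ascii")
    assert encoded.count("=") == padding_count(n)

# Two chunks encoded separately and then joined leave a '=' mid-stream.
a = base64.b64encode(b"first chunk").decode("ascii")
b = base64.b64encode(b"second chunk!").decode("ascii")
print(interior_padding_offsets(a))
print(interior_padding_offsets(a + b))
```

A well-formed single stream should give an empty list. Any offsets reported mark where one encoded chunk ends and another begins.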

Tell those developers they are doing base64 wrong and forward a link to that RFC document.
