What tool(s) are preferred for processing binary files?

494 Views 12 Replies Latest reply: Apr 27, 2011 9:20 AM by Resn8tor
Resn8tor
Apr 26, 2011 3:23 PM

What I want to do is open binary files, extract some data that is in big-endian nibble format, convert it to a character format (e.g. x00 x02 = x20), and save it as text to a CSV file.  There are other data in the file that I want to use to index into tables.  I also need to traverse folders and handle multiple files.

 

Assuming I will have to learn the tool(s) from scratch, what tool or tools would be preferred for something like this?  Performance is not important, but learning curve is.

SR MBP 2.4 4GB, Fusion: XP Pro/Win 7, Mac OS X (10.6.7), Samsung 226BW, Wired aluminum keyboard
  • etresoft Level 7 (23,905 points)

    What is "nibble" format? If this is text data you want to put into a CSV, the data is probably in UTF-16 or something.

  • Cyclosaurus Level 6 (12,915 points)
    What is "nibble" format?

    nibble < byte

    in old days: nibble = 4 bits
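
To make that definition concrete, here is a minimal sketch in Python (the language is an assumption; the thread does not settle on one). The helper name `split_nibbles` is made up for illustration. It splits one 8-bit value into its upper and lower 4-bit halves.

```python
def split_nibbles(byte):
    """Return the (high, low) 4-bit halves of an 8-bit value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in 0..255")
    high = (byte >> 4) & 0x0F  # upper four bits
    low = byte & 0x0F          # lower four bits
    return high, low


if __name__ == "__main__":
    # 0x54 is ASCII "T"; its nibbles are 5 and 4.
    print(split_nibbles(0x54))  # prints (5, 4)
```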

  • MrHoffman Level 6 (11,720 points)

    The classic approach is a bash pipeline script...

     

    Probably use xxd to convert the file to text, and then process the text.

    Depending on what you're up to, possibly awk to parse the text file.

    The bulk of the glue code would be bash.

     

    If you want to bring out somewhat more formal tools...

     

    Lua and its add-on parsing, or any of the available parsing libraries for php, python, ruby, java or perl.

     

    Or for the industrial-strength tools, Xcode with Objective-C and Cocoa, maybe ParseKit.  Probably haul the whole file into memory, and munge it.

     

    But then this is Unix, so there are a gazillion different ways to do this sort of stuff.
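
As one illustration of the scripting-language route, here is a rough Python sketch of the "traverse folders, haul each file into memory, write a CSV" skeleton. The function name `dump_tree_to_csv`, the column names, and the choice to record only the first few bytes as hex are all placeholders, not anything taken from the original poster's file format. The real extraction logic would replace the hex preview.

```python
import csv
from pathlib import Path


def dump_tree_to_csv(root, csv_path, preview_len=8):
    """Write one CSV row per file under root: path, size, first bytes as hex.

    Keep csv_path outside root, otherwise the output file itself is listed.
    Returns the number of files processed.
    """
    rows = 0
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "size", "first_bytes_hex"])
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue  # skip directories
            data = path.read_bytes()  # whole file in memory, fine for small files
            writer.writerow([str(path), len(data), data[:preview_len].hex()])
            rows += 1
    return rows
```

Calling `dump_tree_to_csv("dumps", "report.csv")` would walk a hypothetical `dumps` folder and all of its subfolders. Reading each file whole keeps the code simple, which suits the stated priority of learning curve over performance.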

  • BobHarris Level 6 (12,505 points)

    There is also hexdump.

     

    man hexdump

     

    If you really have text strings, you could use the 'strings' command.

     

    man strings

    iMac, Mac OS X (10.6.6), 27" i7 w/Magic Trackpad
  • etresoft Level 7 (23,905 points)

    That is not difficult per se, but may be if you aren't used to such things. Converting those two bytes into one is easy. The bigger issue is finding the ASCII data in the file and locating the index data. You need detailed information on the protocol in order to find where that data lives in the file. I did a quick check on the MIDI sysex format and it isn't trivial. I suggest a third-party solution such as: http://www.snoize.com/SysExLibrarian/
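
For the "two bytes into one" step, a minimal Python sketch might look like the following. It assumes each input byte carries a single nibble in its low four bits, and it follows the example from the question (x00 x02 = x20), where the first byte of a pair is the low nibble. The function name `join_nibbles` and the `low_first` switch are illustrative; the actual nibble order depends on the device's format and should be checked against its documentation.

```python
def join_nibbles(data, low_first=True):
    """Combine consecutive nibble-bytes into full bytes.

    low_first=True matches the question's example (00 02 -> 20):
    first byte of each pair is the low nibble, second is the high nibble.
    low_first=False treats the first byte as the high nibble instead.
    """
    if len(data) % 2:
        raise ValueError("need an even number of nibble-bytes")
    out = bytearray()
    for i in range(0, len(data), 2):
        first = data[i] & 0x0F
        second = data[i + 1] & 0x0F
        if low_first:
            out.append((second << 4) | first)
        else:
            out.append((first << 4) | second)
    return bytes(out)


if __name__ == "__main__":
    print(join_nibbles(b"\x00\x02").hex())  # prints 20
```

Once the bytes are joined, `join_nibbles(...).decode("ascii")` turns them into text, provided the data really is ASCII.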

  • MrHoffman Level 6 (11,720 points)

    You're going to have to munge the dump results to match your requirements, which is why xxd is part of the solution, but not the whole solution, if you choose the bash path.

     

    Here is an example of what I am referring to:

     

    $ xxd x.x > x.y
    $ cat x.y
    0000000: 5468 6520 7175 6963 6b20 6272 6f77 6e20  The quick brown 
    0000010: 666f 7820 6a75 6d70 6564 206f 7665 7220  fox jumped over 
    0000020: 7468 6520 6c61 7a79 2064 6f67 2e0a       the lazy dog..
    $ vi x.y  # swap the nibbles
    $ cat x.y
    0000000: 4568 6520 7175 6963 6b20 6272 6f77 6e20  The quick brown 
    0000010: 666f 7820 6a75 6d70 6564 206f 7665 7220  fox jumped over 
    0000020: 7468 6520 6c61 7a79 2064 6f67 2e0a       the lazy dog..
    $ xxd -r x.y > x.z
    $ cat x.z
    Ehe quick brown fox jumped over the lazy dog.
    $

     

    In summary, the encoding of the word "The" is 54 68 65 in 8-bit ASCII shown in hexadecimal, which just happens to be 5 4 6 8 6 5 in hexadecimal nibbles.  Swap the 5 4 of the "T" over to a 4 5 in that first byte (I used vim to do that), and you get an "E" when you xxd -r the file back to "binary", and man ascii says that hexadecimal 45 is "E" in ASCII. 
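
The same swap can be sketched without an editor. This is a small Python illustration (the helper name `swap_nibbles` is made up here) that exchanges the two nibbles of the first byte of "The" and shows the result.

```python
def swap_nibbles(byte):
    """Exchange the high and low four bits of an 8-bit value."""
    return ((byte & 0x0F) << 4) | ((byte >> 4) & 0x0F)


if __name__ == "__main__":
    original = b"The"
    # Swap only the first byte: 0x54 ("T") becomes 0x45 ("E").
    patched = bytes([swap_nibbles(original[0])]) + original[1:]
    print(original.hex(), "->", patched.hex())  # prints 546865 -> 456865
    print(patched.decode("ascii"))              # prints Ehe
```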

     

    As for a demonstration of nibble-swapping in a truly binary file, QED.

     

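    If you are inclined, switch xxd to its binary mode (xxd -b) and you can look at individual bits directly.  Check man xxd before trying to edit that output, though; the -r switch has traditionally not worked with bit dumps.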

     

    There's also the Hex Fiend tool, which can be useful for exploring a low-level file format.

     

    If you're working with MIDI-format data files, then there may well be (are?) editing and dump tools available.
