
How can I use Automator to extract specific Data from a text file?

3686 Views 26 Replies Latest reply: Apr 14, 2014 6:51 PM by Tony T1
ChrisAbreu Level 1 (0 points)
Oct 12, 2013 1:19 PM

I have several hundred text files that contain a bunch of information. I only need six values from each file, and ideally I need them as columns in an Excel file.

 

How can I use Automator to extract specific data from the text files and create either a new text file or an Excel file with the info? I have looked all over but can't find a solution. If anyone could please help I would be eternally grateful!!! If there is another, better solution than Automator, please let me know!

 

Example of File Contents:

 

 

Link Time =DD/MMM/YYYY
RandomText

161 179

bytes of CODE    memory (+                68 range fill )
16 789bytes of DATA    memory (+    59 absolute )
1 875bytes of XDATA   memory (+ 1 855 absolute )
90 783bytes of FARCODE memory

 

What I would like to have as a final file:

 

Column 1     Column 2    Column 3   Column 4   Column 5   Column 6
MM/DD/YYYY   filename1   161179     16789      1875       90783
MM/DD/YYYY   filename2   xxxxx      xxxxx      xxxxx      xxxxx
MM/DD/YYYY   filename3   xxxxx      xxxxx      xxxxx      xxxxx

 

Is this possible? I can't imagine having to go through each and every file one by one. Please help!!!

Automator
  • Tony T1 Level 6 (8,115 points)

    Need more information.  Is Link Time always the 1st line? Is Random Text always the 2nd line, CODE line 3, etc.?

    Are there tab delimiters in the file, or is it delimited by spaces?

    AWK or Perl might be the best tool for this (I don't know AWK or Perl, so I can't help with that).

  • Bernard Harte Level 4 (3,025 points)

    I don't know much about Automator, but it's certainly possible to do what you want with AppleScript.

     

    I have a couple of questions:

     

    1.     Are all the files located in the same folder?  How are they named?

     

    2.     From the layout of the example file contents, there is some ambiguity about what they actually contain, for example: is "DD/MMM/YYYY" literally that, or does it contain some (useful) date string?

     

    3.     By the way the data are set out, it's unclear what separators are used: is it just spaces or a mix of tabs and spaces?

     

    4.     Is this all the file contains, or just the first few lines?

     

    If it's easier, you can get my email from my profile and send me one of the files to answer Q2-Q4!

  • Frank Caggiano Level 7 (22,755 points)

    The description you gave of the data file is not sufficient to work on this.

     

    Post one of the files to something like Dropbox, or at least cut and paste a section of one of the files and post it here.

  • Tony T1 Level 6 (8,115 points)

    Here's a bash script that will work on the data that you posted.

    Change ~/Downloads/* to the directory with the files to process.

    A text file named Report.txt will be on your Desktop.

    (you can wrap this in Automator (Run Shell Script Action) if you want):

     

    #!/bin/bash
    
    Report=~/Desktop/Report.txt
    
    for f in ~/Downloads/*
    do
         if [ -f "$f" ] ; then
              echo >> "$Report"
              /usr/bin/grep -E -o '[0-9]{2}/[0-9]{2}/[0-9]{4}' "$f" | sed 's/\([0-9][0-9]\/\)\([0-9][0-9]\/*\)/\2\1/' | tr '\n' ' ' >> "$Report"
              echo -n "${f##*/} " >> "$Report"
              /usr/bin/grep -E -o '^[ [:blank:] | [:digit:] ].*bytes' "$f" | sed 's/[^0-9]*//g' | tr '\n' ' ' >> "$Report"
         fi
    done
    

     

     

    Note:  I assumed that DD/MMM/YYYY was a typo and that you meant DD/MM/YYYY

    If this is not the case, please clarify and I can adjust the script.
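
To make the two `sed` steps in the script easier to follow, here is a minimal sketch that applies them to single strings. The function names `swap_date` and `digits_only` are hypothetical helpers introduced only for illustration; the `sed` expressions are the ones used in the script, and the inputs assume a DD/MM/YYYY date and a space-grouped byte count.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the script above; they only isolate
# the two sed expressions so each can be tried on a single string.

# Swap the first two fields of a DD/MM/YYYY date, giving MM/DD/YYYY.
swap_date() {
    printf '%s\n' "$1" | sed 's/\([0-9][0-9]\/\)\([0-9][0-9]\/*\)/\2\1/'
}

# Delete every character that is not a digit ("161 179 bytes" -> "161179").
digits_only() {
    printf '%s\n' "$1" | sed 's/[^0-9]*//g'
}

swap_date '14/10/2013'        # prints 10/14/2013
digits_only '161 179 bytes'   # prints 161179
```

If the dates really are DD/MMM/YYYY (a three-letter month), the date pattern would not match and would need to be changed.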

    MacBook Air, MacBook, Mac mini, OS X Mountain Lion (10.8.5)
  • Tony T1 Level 6 (8,115 points)

    Note:  I assumed that DD/MMM/YYYY was a typo and that you meant 14/10/2013 (i.e. today's date)

     

    It also assumes that the order in the file is always bytes of CODE, bytes of DATA, bytes of XDATA, bytes of FARCODE, and that each label always begins with "bytes of".

     

     

     

    Report.txt output is:

         10/14/2013 LCRC_0.28.0_QC.map 161179 16789 1875 90783
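
One way to check the ordering assumption before trusting the report is to list the section labels in the order they occur. This is only a sketch: `section_order` is a hypothetical helper, and the sample lines assume each count sits on the same line as its "bytes of" label.

```shell
#!/bin/bash
# Hypothetical check, not part of the script above: read a file on stdin
# and print the memory-section labels in the order they appear.
section_order() {
    grep -E -o 'bytes of [A-Z]+' | sed 's/^bytes of //' | paste -s -d ' ' -
}

# Sample lines are illustrative; a real run would use: section_order < file.map
printf '%s\n' \
    '161 179 bytes of CODE    memory' \
    '16 789bytes of DATA    memory' \
    '1 875bytes of XDATA   memory' \
    '90 783bytes of FARCODE memory' | section_order
# prints CODE DATA XDATA FARCODE
```

Any file that prints a different sequence would end up with its numbers in the wrong columns.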

  • Tony T1 Level 6 (8,115 points)

    This is a little bit more efficient. 

    grep stops searching the file after the 1st match (-m1) of "DD/MM/YYYY" and after the 4th match (-m4) of "bytes"

     

     

    #!/bin/bash
    
    Report=~/Desktop/Report.txt
    
    for f in ~/Downloads/*
    do
         if [ -f "$f" ] ; then
              echo >> "$Report"
              /usr/bin/grep -E -o -m1 '[0-9]{2}/[0-9]{2}/[0-9]{4}' "$f" | sed 's/\([0-9][0-9]\/\)\([0-9][0-9]\/*\)/\2\1/' | tr '\n' ' ' >> "$Report"
              echo -n "${f##*/} " >> "$Report"
              /usr/bin/grep -E -o -m4 '^[ [:blank:] | [:digit:] ].*bytes' "$f" | sed 's/[^0-9]*//g' | tr '\n' ' ' >> "$Report"
         fi
    done
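
The effect of `-m` can be seen on a small inline sample. This is a sketch under assumptions: the sample text, including the second date and the extra "bytes" line, is made up for illustration, and `first_date` and `byte_lines` are hypothetical names.

```shell
#!/bin/bash
# Inline sample standing in for one .map file (contents are illustrative).
sample='Link Time =14/10/2013
Build Time =15/10/2013
161 179 bytes of CODE    memory
16 789 bytes of DATA    memory
1 875 bytes of XDATA   memory
90 783 bytes of FARCODE memory
512 bytes of OTHER   memory'

# -m1: stop reading after the first line that contains a date.
first_date() {
    printf '%s\n' "$sample" | grep -E -o -m1 '[0-9]{2}/[0-9]{2}/[0-9]{4}'
}

# -m4: stop reading after the fourth line that contains "bytes";
# -c prints how many matching lines were seen before stopping.
byte_lines() {
    printf '%s\n' "$sample" | grep -c -m4 'bytes'
}

first_date    # prints 14/10/2013
byte_lines    # prints 4
```

Note that `-m` counts matching lines, not individual matches, so a line holding two dates would still print both with `-o`.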
    
  • Hiroto Level 5 (4,810 points)

    Hello

     

    You may try the following AppleScript. It asks you to choose a root folder, searches it for *.map files, and creates a CSV file named "out.csv" on the Desktop, which you can then import into Excel.

     

     

    set f to (choose folder with prompt "Choose the root folder to start searching")'s POSIX path
    if f ends with "/" then set f to f's text 1 thru -2
    
    do shell script "/usr/bin/perl -CSDA -w <<'EOF' - " & f's quoted form & " > ~/Desktop/out.csv
    use strict;
    use open IN => ':crlf';
    
    chdir $ARGV[0] or die qq($!);
    local $/ = qq(\\0);
    my @ff = map {chomp; $_} qx(find . -type f -iname '*.map' -print0);
    local $/ = qq(\\n);
    
    # 
    #     CSV spec
    # 
    #     - record separator is CRLF
    #     - field separator is comma
    #     - every field is quoted
    #     - text encoding is UTF-8
    # 
    local $\\ = qq(\\015\\012);    # CRLF
    local $, = qq(,);            # COMMA
    
    # print column header row
    my @dd = ('column 1', 'column 2', 'column 3', 'column 4', 'column 5', 'column 6');
    print map { s/\"/\"\"/og; qq(\").$_.qq(\"); } @dd;
    
    # print data row per each file
    while (@ff) {
        my $f = shift @ff;    # file path
        if ( ! open(IN, '<', $f) ) {
            warn qq(Failed to open $f: $!);
            next;
        }
        $f =~ s%^.*/%%og;    # file name
        @dd = ('', $f, '', '', '', '');
        while (<IN>) {
            chomp;
            $dd[0] = \"$2/$1/$3\" if m%Link Time\\s+=\\s+([0-9]{2})/([0-9]{2})/([0-9]{4})%o;
            ($dd[2] = $1) =~ s/ //g if m/([0-9 ]+)\\s+bytes of CODE\\s/o;
            ($dd[3] = $1) =~ s/ //g if m/([0-9 ]+)\\s+bytes of DATA\\s/o;
            ($dd[4] = $1) =~ s/ //g if m/([0-9 ]+)\\s+bytes of XDATA\\s/o;
            ($dd[5] = $1) =~ s/ //g if m/([0-9 ]+)\\s+bytes of FARCODE\\s/o;
            last unless grep { /^$/ } @dd;
        }
        close IN;
        print map { s/\"/\"\"/og; qq(\").$_.qq(\"); } @dd;
    }
    EOF
    "
    

     

    Hope this may help,

    H

  • Tony T1 Level 6 (8,115 points)

    That AppleScript looks a lot like Perl

  • Tony T1 Level 6 (8,115 points)

    This will handle any order of CODE, DATA, etc.

    Change ~/Downloads/* to the directory with the files to process

    Run as a bash script, or just copy into Automator

     

    #!/bin/bash
    
    Report=~/Desktop/Report.txt
    
    for f in ~/Downloads/*
    do
         if [ -f "$f" ] ; then
              DATE=$(/usr/bin/grep -E -o -m1 '[0-9]{2}/[0-9]{2}/[0-9]{4}' "$f" | sed 's/\([0-9][0-9]\/\)\([0-9][0-9]\/*\)/\2\1/')
              CODE=$(/usr/bin/grep -E -o -m1 '^[ [:blank:] | [:digit:] ].*bytes of CODE' "$f" | sed 's/[^0-9]*//g')
              DATA=$(/usr/bin/grep -E -o -m1 '^[ [:blank:] | [:digit:] ].*bytes of DATA' "$f" | sed 's/[^0-9]*//g')
              XDATA=$(/usr/bin/grep -E -o -m1 '^[ [:blank:] | [:digit:] ].*bytes of XDATA' "$f" | sed 's/[^0-9]*//g')
              FARCODE=$(/usr/bin/grep -E -o -m1 '^[ [:blank:] | [:digit:] ].*bytes of FARCODE' "$f" | sed 's/[^0-9]*//g')
              echo $DATE ${f##*/} $CODE $DATA $XDATA $FARCODE >> "$Report"
         fi
    done
    

     

    Open in Excel as a space delimited file.
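
A space-delimited report splits a file name across columns if the name contains a space. One possible alternative, sketched here, is to write quoted CSV fields instead, as the Perl version in this thread does. `csv_row` is a hypothetical helper, not part of the script above, and the sample file name is made up.

```shell
#!/bin/bash
# csv_row is a hypothetical helper: it prints its arguments as one CSV
# record, quoting every field and doubling any embedded double quote.
csv_row() {
    local q='"' out='' field
    for field in "$@"; do
        field=${field//$q/$q$q}                 # " becomes ""
        out="${out}${out:+,}${q}${field}${q}"   # comma before all but the first
    done
    printf '%s\n' "$out"
}

csv_row '10/14/2013' 'my file.map' 161179 16789 1875 90783
# prints "10/14/2013","my file.map","161179","16789","1875","90783"
```

In the loop, the final echo line could then become `csv_row "$DATE" "${f##*/}" "$CODE" "$DATA" "$XDATA" "$FARCODE" >> "$Report"`, with the report named Report.csv so that Excel opens it as comma-separated.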
