11 Replies Latest reply: Feb 28, 2013 11:59 AM by rccharles
Pelorus1 Level 1 (35 points)

Hi,

 

I have a file of the form (testfile.txt):

 

here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here 
$file1.txt$
Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more $file2.txt$
Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here 
$file3.txt$

I also have text files with names of the form "filen.txt" which have Markdown tables in them.

--------------------
age   weight   sex  
----- -------- -----
15    45       m    

20    56       f    

25    65       f    

30    72       m    
--------------------

 

I want to run a script to search for all the markers, and "inject" the relevant table at the marker by opening the file of that name in the same directory and doing a replace with it.

 

I have the "shell" of the script (sorry about the pun). I cannot however get it to do the injecting part.

 

I've tried Awk, sed, perl....you name it.

 

The script so far is:

#!/bin/bash

line=""
places=""

while read -r line; do
    places=$(awk '/\$*\$/ { print $0 }')
done < testfile.txt

cp testfile.txt try.txt

echo "$places" | {
    while read line; do
        strip=$(echo $line | sed 's/^\$//')
        strip=$(echo $strip | sed 's/\$$//')
        curtext=$(cat $strip)
         cat try.txt | {
            while read line; do

            # Something in here to take $strip add a $ at each end and replace
            #+with $curtext

            done
            }
    done
}

I would really appreciate some help to fill in the area where the comment is. I am successfully getting all the other bits to work.

 

Many thanks

Mike

  • Hiroto Level 5 (6,170 points)

    Hello

     

    If I understand it correctly, you may try something like this. It's actually all in Perl, though.

     

    #!/bin/bash
    
    perl -CSD <<'EOF' - testfile.txt > try.txt
    while (<>) {
        s/\$(.*?)\$/&readfile($1)/oge;
        print;
    }
    
    sub readfile ($) {
        my $f = shift;
        open F, "<$f" or return "[## $!: $f ##]";
        local $/;
        my $t = <F>;
        close F;
        return $t;
    }
    EOF
    

     

    Good luck,

    H

  • Pelorus1 Level 1 (35 points)

    G'day Hiroto

     

    You're going to have to explain that to me...it's beyond me

     

    Kind regards

    Mike

  • Pelorus1 Level 1 (35 points)

    So I've made some progress

     

    When I execute this in Terminal it works:

     

     

    cat try.txt | sed 's/@file3.txt@/Counter: 3/g'
    

     

    It works perfectly. It does the substitution. Both the find pattern and the substitution pattern are variable driven, but the way I got the command was to echo it from a script...so I know that the variable expansion is working OK.

     

    If I execute the _same_ command inside a while loop that's working on another file, even though the variables expand properly, it doesn't do the substitution. If I echo the result nothing has happened.

     

    I'm really certain that I have a fundamental blind spot here...any suggestions?

     

    To the poster who suggested the Perl approach, many thanks, but without help I think the learning curve for me and Perl is too great for this job.

     

    Regards

    Mike

  • Hiroto Level 5 (6,170 points)

    OK. Sorry for the late reply. It is much easier to provide code than to explain it.

    Some annotations follow.

     

    #!/bin/bash
    
    perl -CSD <<'EOF' - testfile.txt > try.txt          # 1
    while (<>) {                                        # 2
        s/\$(.*?)\$/&readfile($1)/oge;                  # 3
        print;                                          # 4
    }
    
    sub readfile ($) {                                  # 5
        my $f = shift;                                  # 6
        open F, "<$f" or return "[## $!: $f ##]";       # 7
        local $/;                                       # 8
        my $t = <F>;                                    # 9
        close F;                                        # 10
        return $t;                                      # 11
    }
    EOF
    

     

    # 1. -C option specifies text encodings. See perlrun manpage for details.

    S = IOE = STDIN, STDOUT and STDERR are all assumed to be in UTF-8.

    D = io = UTF-8 is the default PerlIO layer for input and output streams.

     

    <<'EOF' denotes Here-document. See bash manpage for details.

    A lone - stands in for the program file name: perl reads the program from STDIN and treats what follows the - as arguments. A here-document is a form of STDIN redirection.

     

    # 2. <X> is the line input operator. It reads a line from filehandle X, terminated by the input record separator defined in the global variable $/, whose default value is linefeed.

    while(<>) is a special form meaning while(<X>), where X is a filehandle opened for each file named in the command-line argument list @ARGV, or STDIN if @ARGV is empty. Until X's end-of-file is reached, <X> returns the line read, which evaluates as true in boolean context. When X's end-of-file is reached, <X> returns the undefined value, which evaluates as false in boolean context and terminates the while loop. In a while(<>) loop, the line read is assigned to the global variable $_.

     

    # 3. s/X/Y/oge; is a regular expression replacement statement, which here is short form of $_ =~ s/X/Y/oge;. $_ is the read line.

    Options are as follows.

    o = optimize, which means to compile the regexp pattern once. (It makes no difference here, since the pattern contains no interpolated variables.)

    g = global, which means to replace every occurrence of X.

    e = evaluate, which means Perl evaluates Y as an expression.

     

    \$(.*?)\$ is the regular expression pattern to match. *? is the quantifier for a minimal match, whereas * gives a maximal match.

    &readfile($1) is subroutine call with $1 as its argument. $1 is back reference of the 1st regexp group matched in X; in this case the minimal string between two $'s. As a whole, this statement replaces the $filename$ with the contents read from filename.

     

    # 4. print; is short form of print STDOUT $_;, which prints the current line to STDOUT.

     

    # 5. subroutine definition. ($) means it accepts one scalar argument.

     

    # 6. shift; is short form of shift @_;. @_ is argument list for subroutine and thus $f is set to the first item in argument list, which is supposed to be a filename. The function "my" limits the scope of the variable to the current block.

     

    # 7. open filehandle for given filename. If it fails, this subroutine returns string "[## $!: $f ##]" where $! is interpolated by error message and $f by filename.

     

    # 8. A safer way to undefine the global variable $/, the input record separator whose default value is linefeed. The function "local" limits the effect of this statement to the current block. This changes the definition of "line" and lets the following statement (# 9) read the entire contents of the file as a single line.

     

    # 9. read line from the filehandle.

     

    # 10. close the filehandle.

     

    # 11. return the read contents.
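
For readers more at home in Python, the difference between the minimal and maximal match in annotation # 3 can be tried out with a small sketch. It is only an illustration: the sample line and the helper fake_readfile are made up here, and the helper returns a label rather than reading a real file. Passing a function to re.sub plays roughly the role that the e option plays in the Perl statement.

```python
import re

# Made-up sample line holding two markers.
line = "intro $file1.txt$ middle $file2.txt$ end"

# Minimal (non-greedy) match: each marker is captured on its own.
minimal = re.findall(r"\$(.*?)\$", line)

# Maximal (greedy) match: a single capture running from the first $ to the last $.
maximal = re.findall(r"\$(.*)\$", line)

def fake_readfile(match):
    # Hypothetical stand-in for readfile(): returns a label, not file contents.
    return "<contents of " + match.group(1) + ">"

# A function as the replacement is called once per match, much like Perl's /e.
replaced = re.sub(r"\$(.*?)\$", fake_readfile, line)

print(minimal)
print(maximal)
print(replaced)
```

With the greedy pattern the two markers and the text between them are swallowed as one "file name", which is why the script uses *? instead.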

     

     

    I hope this may suffice.

    And Perl is well worth learning.

     

    Good luck,

    H

  • Pelorus1 Level 1 (35 points)

    Hello Hiroto,

     

    Thank you very much for such a great explanation. You have inspired me to put in the effort to learn Perl.

     

    My final solution, however, has been in sed. I found the 'r' command, which reads in a file and outputs it after the line matching the search pattern. The final code looked like this, with the 'd' command deleting the line that held the pattern. The quoting is incredibly sensitive and it _only_ works with quotes like this on Mac (ask me how I know).

     

     

    while read line1; do
        sed '/'"@$line1@"'/ {
                r '"$line1"'
                d
            }' <try.txt >delete.txt
            cat delete.txt > try.txt
    done < search.txt
    

     

     

    Where 'search.txt' contains the list of markers that we've extracted from the file and stripped back so that they are the filename; 'try.txt' is a working copy of the file containing the markers and 'delete.txt' is a file that we use to write each cycle of output before we send it to the input file again.
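
The extraction step that produces 'search.txt' is not shown above. As a rough sketch only, and not the script actually used, it could look like this in Python. The function name extract_markers, the delim parameter and the sample text are all assumptions made for the illustration; the @ delimiter follows the sed command above.

```python
import re

def extract_markers(text, delim="@"):
    """Return the names found between pairs of delim, in order, without repeats."""
    d = re.escape(delim)
    names = re.findall(d + r"(.*?)" + d, text)
    unique = []
    for name in names:
        if name not in unique:
            unique.append(name)
    return unique

# Made-up sample standing in for the working copy of the document.
sample = "Intro text\n@file1.txt@\nMore text\n@file2.txt@\n@file1.txt@\n"
markers = extract_markers(sample)

# One file name per line, which is the shape the while loop reads from search.txt.
print("\n".join(markers))
```

Dropping repeats means a file that is referenced twice is handled in a single pass of the loop, since sed replaces every matching line in that pass.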

     

    It may not be an elegant solution, and perhaps I should use variables instead of files...but I do clean up afterwards and it is robust.

     

    Many thanks for everyone's assistance.

     

    Regards

    Mike

  • Hiroto Level 5 (6,170 points)

    Hello Mike,

     

    No problem. And thanks for the feedback.

     

    One thing to note, though. sed's d command deletes the pattern space, which at that point is the whole input line, not just the matched part. In your case, the sed command deletes the entire line that contains the marker. If the marker is on its own line, that's fine. Otherwise it is not, I'm afraid.

     

    E.g.

     

    #!/bin/bash
    
    t="Humpty Dumpty sat on a wall,
    Humpty Dumpty had a great fall;
    All the king's horses and all the king's men
    Couldn't put Humpty together again."
    
    s="horses"
    r="<REPLACED>"
    
    cd ~/Desktop
    echo "$r" > file.txt
    
    echo "$t" | \
    sed '/'"$s"'/ {
        r file.txt
        d
    }'
    

     

    Result:

     

    Humpty Dumpty sat on a wall,
    Humpty Dumpty had a great fall;
    <REPLACED>
    Couldn't put Humpty together again.
    

     

    I say this because your original sample indicated the marker can be a part of a line.

    If you have changed the structure of source text, that's fine and disregard this message.
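
The same contrast can be sketched in Python, purely as an illustration. It uses a shortened version of the rhyme and plain substring tests rather than sed's regular expressions, and the variable names are made up for the example.

```python
# Shortened, made-up sample text.
text = (
    "Humpty Dumpty sat on a wall,\n"
    "All the king's horses and all the king's men\n"
    "Couldn't put Humpty together again."
)
marker = "horses"
replacement = "<REPLACED>"

# Whole-line behaviour, like sed's r followed by d: any line that contains
# the marker is dropped and the replacement is emitted in its place.
whole_line = "\n".join(
    replacement if marker in line else line
    for line in text.split("\n")
)

# In-line behaviour: only the matched text changes; the rest of the line stays.
in_line = text.replace(marker, replacement)

print(whole_line)
print("----")
print(in_line)
```

The first form matches what the sed command does; the second is what the Perl one-liner earlier in the thread does.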

     

    Kind regards,

    Hiroto

  • twtwtw Level 5 (4,900 points)

    sed gives me a headache.  It has its place in the world, but I would never use it unless I had no other option. You do. Here's the same thing in AppleScript:

     

    -- set main paths
    set master_file to "/path/to/master file.txt"
    set main_folder to "/path/to/folder/"

    -- read master text, and break it down around '$' marks
    set master_text to read master_file
    set text_chunks to tid({input:master_text, delim:"$"})

    -- loop through chunks, replacing file names with file contents
    repeat with this_chunk in text_chunks
        if this_chunk ends with ".txt" then
            set contents of this_chunk to (read (main_folder & this_chunk))
        end if
    end repeat

    -- write it all back out to the master file
    set fp to open for access master_file with write permission
    write tid({input:text_chunks, delim:""}) to fp
    close access fp

    on tid({input:input, delim:delim})
        -- generic handler for text delimiters
        set {oldTID, my text item delimiters} to {my text item delimiters, delim}
        if class of input is list then
            set output to input as text
        else
            set output to text items of input
        end if
        set my text item delimiters to oldTID
        return output
    end tid

     

     

    A bit wordier, but so much easier on the brain.

  • Pelorus1 Level 1 (35 points)

    Hi Hiroto,

     

    as it happens that's exactly the behaviour that I want...but I understand how it could be a real issue!

     

    Hi twtwtw,

     

    It's interesting that you say AppleScript is easier on your brain...I find it all too hard!!

     

    My problem is that this solution is designed to be menu-driven and accessed from an ssh session from iPad. I use the same menu-driven interface to send Markdown docs to docx (or LaTeX or HTML or PDF or ePub...) using Pandoc; to generate Pandoc Markdown tables using R and in this case to insert those tables into Markdown documents.

     

    It works superbly for my wife as well and she doesn't normally use Terminal. I'm not aware that I can use AppleScript as easily (or perhaps at all really) from an ssh session.

     

    Regards

    Mike

  • twtwtw Level 5 (4,900 points)

    I suppose it's what you're used to.  The difference is that sed (which I can program; I just don't like to) makes you talk like a machine, whereas AppleScript at least throws a bone to human language structures.

     

    You can use AppleScript from any terminal using osascript:  just enter osascript followed by the script path.  You can pass it parameters just like a shell script as well, though you need to twiddle the script a bit to get it to see the parameters.  There's no problem using the script over ssh if the script is resident on the remote machine, but you may run into security issues if you try to run a local AppleScript on a remote machine (or maybe not, I've never actually tried).

     

    But use what makes you comfortable.  I only added this because (as I said) sed gives me a headache, and I wanted a more me-friendly version out there for other sedophobes.

  • rccharles Level 5 (6,655 points)

    I'd suggest learning Python.

     

    You can program in a procedural style, it has a large library of functions/methods, it has a consistent syntax, and it is designed to avoid syntax surprises.

     

    Robert

  • rccharles Level 5 (6,655 points)

    With help from the Stack Overflow community and in particular sotapme, I am presenting a Python solution.

    http://stackoverflow.com/questions/15098789/python-regular-expression-for-r-find all

     

     

    #!/usr/bin/env python
    
    #
    # https://discussions.apple.com/message/21202021#21202021
    #
    # fyi:
    # Python uses \n as the record delimiter.  It converts other record delimiters
    #   as needed.
    
    import argparse
    import datetime
    import re
    import sys
    import time
    
    # ------------------------------------------------------
    #  
    # Learning Python: Powerful Object-Oriented Programming [Paperback]
    #   by Mark Lutz 
    #
    # The Python Standard Library by Example (Developer's Library) [Paperback]
    #   by Doug Hellmann
    #   It's online at:
    #     http://www.doughellmann.com/PyMOTW/contents.html
    #
    # ------------------------------------------------------
    
    
    # ----------------------------------------------------------
    # insert include file
    def readFile ( m ):
        global debug
        if debug >= 2 :
            print "in readFile.  ", m.group(1)
        # read insertion file
        with open( m.group(1),"r" ) as moreDataFile:
            moreData = moreDataFile.read()
        return moreData
        
        
    # ============================== Main =======================
        
    # Parse input arguments
    # -h to print help.  ( Automatically generated )
    parser = argparse.ArgumentParser(
        description="Merge Demonstration code.",
        epilog="See Apple discussions." +   
        "  https://discussions.apple.com/message/21202021#21202021",
        version="version 0.999")
    
    
    parser.add_argument('-d', 
                        action="store", 
                        dest="debug",
                        default=0,
                        type=int,
                        choices=(0,1,2),
                        help="debug levels: 0 for no debugging, 1 minimal but reasonable amount, 2 everything thinkable")
    
    parser.add_argument('-if', 
                        action="store", 
                        dest="inputFile",
                        default="",
                        help="input filename")
                        
    parser.add_argument('-of', 
                        action="store", 
                        dest="outputFile",
                        default="mergedOutput.txt",
                        help="output filename")                
    
    options = parser.parse_args()
    
    debug = options.debug
    
    if debug >= 1 :
        print "Welcome to " + __file__ + "  " + str( datetime.datetime.now() )
    
    if debug >= 2 :
        print "input arguments:", sys.argv
        
    if options.inputFile == "" :
        print "You need to specify an input file.  -h for help"
        sys.exit(1)
    
    # Read in complete file
    # The file is closed automatically when the with block ends.
    with open( options.inputFile,"r" ) as mergeInputData:
        allData = mergeInputData.read()
    
    mergedData = re.sub(r'\$(.*?)\$', readFile, allData)
    
    # Output mergedData
    with open( options.outputFile,"w" ) as mergeOutputFile:
        mergeOutputFile.write(mergedData)
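
The script above is written for Python 2 and stops with an error if a marker names a file that does not exist. What follows is a minimal Python 3 sketch of the same merge step, not a replacement for the script. It assumes markers of the form $name$ on a single line, and it borrows the idea from the Perl version of leaving a visible note when a file cannot be read. The names merge, read_marker and base_dir are made up for the illustration, and the demonstration writes its sample file into a temporary directory.

```python
import os
import re
import tempfile

MARKER = re.compile(r"\$(.*?)\$")

def merge(text, base_dir="."):
    """Replace each $name$ marker in text with the contents of base_dir/name."""
    def read_marker(match):
        name = match.group(1)
        try:
            with open(os.path.join(base_dir, name), "r") as handle:
                return handle.read()
        except OSError as err:
            # Leave a visible note in the output instead of stopping.
            return "[## {}: {} ##]".format(err.strerror, name)
    return MARKER.sub(read_marker, text)

# Self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file1.txt"), "w") as handle:
        handle.write("age weight\n15  45\n")
    merged = merge("before\n$file1.txt$\nafter $missing.txt$\n", tmp)

print(merged)
```

Because the replacement is a function, whatever it returns is inserted literally, so backslashes or dollar signs inside a table file are not reinterpreted.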