11 Replies Latest reply: Feb 28, 2013 11:59 AM by rccharles
Pelorus1 Level 1 (35 points)



I have a file of the form (testfile.txt):


here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here 
Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more $file2.txt$
Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here 

I also have text files with names of the form "filen.txt" which have Markdown tables in them.

age   weight   sex  
----- -------- -----
15    45       m    
20    56       f    
25    65       f    
30    72       m    


I want to run a script to search for all the markers, and "inject" the relevant table at the marker by opening the file of that name in the same directory and doing a replace with it.


I have the "shell" of the script (sorry about the pun). I cannot however get it to do the injecting part.


I've tried Awk, sed, perl....you name it.


The script so far is:



while read -r line; do
    places=$(awk '/\$*\$/ { print $0 }')
done < testfile.txt

cp testfile.txt try.txt

echo "$places" | {
    while read line; do
        strip=$(echo $line | sed 's/^\$//')
        strip=$(echo $strip | sed 's/\$$//')
        curtext=$(cat $strip)
         cat try.txt | {
            while read line; do
                # Something in here to take $strip add a $ at each end and replace
                #+with $curtext
                :
            done
        }
    done
}


I would really appreciate some help to fill in the area where the comment is. I am successfully getting all the other bits to work.
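For readers following along, the first half of the job (collecting the marker names, which the script above attempts with awk and sed) can be sketched in Python roughly as follows. This is only a sketch: it assumes markers have the form $name$ and never span a line break, and `find_markers` is a hypothetical helper name, not something from this thread.

```python
import re

# Assumption: a marker is a file name wrapped in dollar signs on one line.
MARKER = re.compile(r"\$([^$\n]+)\$")

def find_markers(text):
    """Return marker names in order of first appearance, without duplicates."""
    seen = []
    for name in MARKER.findall(text):
        if name not in seen:
            seen.append(name)
    return seen

if __name__ == "__main__":
    sample = "Some text in here and more $file2.txt$\nand then $file3.txt$ again $file2.txt$\n"
    print(find_markers(sample))
```

Each returned name is already stripped of its dollar signs, so it can be used directly as a file name.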


Many thanks


  • Hiroto Level 5 (6,295 points)



    If I understand it correctly, you may try something like this. It's actually all in Perl, though.


    perl -CSD <<'EOF' - testfile.txt > try.txt
    while (<>) {
        s/\$(.*?)\$/&readfile($1)/oge;
        print;
    }
    sub readfile ($) {
        my $f = shift;
        open F, "<$f" or return "[## $!: $f ##]";
        local $/;
        my $t = <F>;
        close F;
        return $t;
    }
    EOF


    Good luck,


  • Pelorus1 Level 1 (35 points)

    G'day Hiroto


    You're going to have to explain that to me...it's beyond me


    Kind regards


  • Pelorus1 Level 1 (35 points)

    So I've made some progress


    When I execute this in Terminal it works:



    cat try.txt | sed 's/@file3.txt@/Counter: 3/g'


    It works perfectly. It does the substitution. Both the find pattern and the substitution pattern are variable driven, but the way I got the command was to echo it from a script...so I know that the variable expansion is working OK.


    If I execute the _same_ command inside a while loop that's working on another file, even though the variables expand properly, it doesn't do the substitution. If I echo the result nothing has happened.


    I'm really certain that I have a fundamental blind spot here...any suggestions?


    To the poster who suggested the Perl approach, many thanks, but without help I think the learning curve for me and Perl is too great for this job.




  • Hiroto Level 5 (6,295 points)

    OK. Sorry for the late reply. It is much easier to provide code than to explain it.

    Some annotations follow.


    perl -CSD <<'EOF' - testfile.txt > try.txt          # 1
    while (<>) {                                        # 2
        s/\$(.*?)\$/&readfile($1)/oge;                  # 3
        print;                                          # 4
    }
    sub readfile ($) {                                  # 5
        my $f = shift;                                  # 6
        open F, "<$f" or return "[## $!: $f ##]";       # 7
        local $/;                                       # 8
        my $t = <F>;                                    # 9
        close F;                                        # 10
        return $t;                                      # 11
    }
    EOF


    # 1. -C option specifies text encodings. See perlrun manpage for details.

    S = IOE = STDIN, STDOUT and STDERR are all assumed to be in UTF-8.

    D = io = UTF-8 is the default PerlIO layer for input and output streams.


    <<'EOF' denotes a here-document. See the bash manpage for details.

    - tells perl to read the program from STDIN and that the arguments follow the -. A here-document is a form of STDIN redirection.


    # 2. <X> is the line input operator. It reads a line from filehandle X, terminated by the input record separator defined in the global variable $/, whose default value is linefeed.

    while(<>) is a special form meaning while(<X>), where X is a filehandle opened for each argument in the command-line argument list @ARGV in turn, or STDIN if @ARGV is empty. Until X's end-of-file is reached, <X> returns the line read and the loop continues. When X's end-of-file is reached, <X> returns the undefined value, which terminates the while loop. In a while(<>) loop, the line read is assigned to the global variable $_.


    # 3. s/X/Y/oge; is a regular expression replacement statement, which here is short form of $_ =~ s/X/Y/oge;. $_ is the read line.

    Options are as follows.

    o = optimize, which means to compile the regexp pattern once. (It makes no difference here, since the pattern contains no interpolated variables.)

    g = global, which means to replace every occurrence of X.

    e = evaluate, which means Perl evaluates Y as an expression.


    \$(.*?)\$ is the regular expression pattern to match. *? is a quantifier that asks for a minimal match, while * asks for a maximal match.

    &readfile($1) is a subroutine call with $1 as its argument. $1 is the back reference to the first regexp group matched in X; in this case, the shortest string between two $'s. As a whole, this statement replaces each $filename$ with the contents read from filename.
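To see why the minimal quantifier matters when a line holds more than one marker, here is a small Python illustration (Python's `re` module uses the same `*?` and `*` quantifiers). The sample line is made up for the demonstration.

```python
import re

# Made-up sample line with two markers on it.
line = "before $a.txt$ middle $b.txt$ after"

# Minimal (non-greedy) match: each marker is matched separately.
minimal = re.findall(r"\$(.*?)\$", line)

# Maximal (greedy) match: one match running from the first $ to the last $.
maximal = re.findall(r"\$(.*)\$", line)

print(minimal)  # ['a.txt', 'b.txt']
print(maximal)  # ['a.txt$ middle $b.txt']
```

With the maximal form, the whole stretch between the first and last $ would be treated as one file name, which is not what is wanted here.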


    # 4. print; is short form of print STDOUT $_;, which prints the current line to STDOUT.


    # 5. subroutine definition. ($) means it accepts one scalar argument.


    # 6. shift; is short form of shift @_;. @_ is argument list for subroutine and thus $f is set to the first item in argument list, which is supposed to be a filename. The function "my" limits the scope of the variable to the current block.


    # 7. open filehandle for given filename. If it fails, this subroutine returns string "[## $!: $f ##]" where $! is interpolated by error message and $f by filename.


    # 8. A safer way to undefine the global variable $/, the input record separator, whose default value is linefeed. The function "local" limits the effect of this statement to the current block. This changes the definition of "line" and lets the following statement (# 9) read the entire contents of the file as a single line.


    # 9. read line from the filehandle.


    # 10. close the filehandle.


    # 11. return the read contents.
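For comparison, steps 5 to 11 could be written in Python along these lines. It is a rough equivalent rather than a translation: `read_file` is a hypothetical name, UTF-8 is assumed to mirror the -CSD switch, and the error text comes from the operating system, so it may differ in wording from Perl's $!.

```python
import os
import tempfile

def read_file(name):
    """Return the whole contents of `name`, or a visible placeholder on failure."""
    try:
        with open(name, "r", encoding="utf-8") as handle:
            # A single read() returns the entire file, which is the effect
            # `local $/;` has on the following <F> in the Perl version.
            return handle.read()
    except OSError as err:
        # Same shape as the Perl placeholder: [## error message: filename ##]
        return "[## {}: {} ##]".format(err.strerror, name)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "file2.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("age   weight   sex\n")
        print(read_file(path))
        print(read_file(os.path.join(folder, "missing.txt")))
```

Returning a placeholder instead of raising keeps the merge going when one table file is missing, and leaves a visible trace in the output at the spot where the table should have been.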



    I hope this may suffice.

    And Perl is well worth learning.


    Good luck,


  • Pelorus1 Level 1 (35 points)

    Hello Hiroto,


    thank you very much for such a great explanation. You have inspired me to put in effort to learn Perl.


    My final solution, however, has been in sed. I found the 'r' command, which allows you to read in a file to replace your search pattern. The final code looked like this, with the 'd' command deleting the found pattern. The quoting is incredibly sensitive and it _only_ works with quotes like this on Mac (ask me how I know).



    while read line1; do
        sed '/'"@$line1@"'/ {
                r '"$line1"'
                d
            }' <try.txt >delete.txt
            cat delete.txt > try.txt
    done < search.txt



    Where 'search.txt' contains the list of markers that we've extracted from the file and stripped back so that they are the filename; 'try.txt' is a working copy of the file containing the markers and 'delete.txt' is a file that we use to write each cycle of output before we send it to the input file again.


    It may not be an elegant solution, and perhaps I should use variables instead of files...but I do clean up afterwards and it is robust.


    Many thanks for everyone's assistance.




  • Hiroto Level 5 (6,295 points)

    Hello Mike,


    No problem. And thanks for the feedback.


    One thing to note, though: sed's d command deletes the pattern space, which initially holds the whole input line, not just the part that matched. In your case, the sed command deletes the whole line that contains the marker. If the marker is on its own line, that's fine. Otherwise it isn't, I'm afraid.




    t="Humpty Dumpty sat on a wall,
    Humpty Dumpty had a great fall;
    All the king's horses and all the king's men
    Couldn't put Humpty together again."
    cd ~/desktop
    echo "$r" > file.txt
    echo "$t" | \
    sed '/'"$s"'/ {
        r file.txt
        d
    }'




    Humpty Dumpty sat on a wall,
    Humpty Dumpty had a great fall;
    Couldn't put Humpty together again.


    I say this because your original sample indicated the marker can be a part of a line.

    If you have changed the structure of source text, that's fine and disregard this message.
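The difference between the two behaviours can be sketched in Python like this. Both function names are hypothetical, and the first one only roughly mimics what the sed r and d pair does (it assumes the replacement text ends with a newline).

```python
def replace_whole_line(text, marker, replacement):
    """Roughly mimic sed's r + d pair: a line containing the marker is
    dropped entirely and the replacement is emitted in its place."""
    out = []
    for line in text.splitlines(keepends=True):
        if marker in line:
            out.append(replacement)
        else:
            out.append(line)
    return "".join(out)

def replace_inline(text, marker, replacement):
    """Replace only the marker itself and keep the rest of the line."""
    return text.replace(marker, replacement)

if __name__ == "__main__":
    text = "keep this\nbefore @t.txt@ after\nkeep that\n"
    print(replace_whole_line(text, "@t.txt@", "TABLE\n"))
    print(replace_inline(text, "@t.txt@", "TABLE\n"))
```

In the first result the words "before" and "after" are gone along with the marker; in the second they survive on either side of the inserted text.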


    Kind regards,


  • twtwtw Level 5 (4,900 points)

    sed gives me a headache.  It has its place in the world, but I would never use it unless I had no other option. You do. Here's the same thing in AppleScript:


    -- set main paths
    set master_file to "/path/to/master file.txt"
    set main_folder to "/path/to/folder/"

    -- read master text, and break it down around '$' marks
    set master_text to read master_file
    set text_chunks to tid({input:master_text, delim:"$"})

    -- loop through chunks, replacing file names with file contents
    repeat with this_chunk in text_chunks
        if this_chunk ends with ".txt" then
            set contents of this_chunk to (read (main_folder & this_chunk))
        end if
    end repeat

    -- write it all back out to the master file
    set fp to open for access master_file with write permission
    write tid({input:text_chunks, delim:""}) to fp
    close access fp

    on tid({input:input, delim:delim})
        -- generic handler for text delimiters
        set {oldTID, my text item delimiters} to {my text item delimiters, delim}
        if class of input is list then
            set output to input as text
        else
            set output to text items of input
        end if
        set my text item delimiters to oldTID
        return output
    end tid



    A bit wordier, but so much easier on the brain.
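The same split-and-rejoin idea can be sketched in Python for anyone who wants to compare. `merge_by_splitting` and its `load` parameter are hypothetical names; `load` stands in for whatever reads a table file, so the sketch can be tried without any files on disk.

```python
def merge_by_splitting(text, load):
    """Split the text on '$', swap chunks that look like file names for
    their contents, and join everything back together."""
    chunks = text.split("$")
    for index, chunk in enumerate(chunks):
        if chunk.endswith(".txt"):
            chunks[index] = load(chunk)
    return "".join(chunks)

if __name__ == "__main__":
    # Stand-in loader: a dictionary plays the role of the table files.
    tables = {"file2.txt": "age   weight   sex\n"}
    print(merge_by_splitting("Some text and more $file2.txt$ here", tables.get))
```

One thing to be aware of with this approach: any chunk that happens to end in ".txt" is replaced, even ordinary text that merely precedes a stray $, so it relies on $ not being used for anything else in the document.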

  • Pelorus1 Level 1 (35 points)

    Hi Hiroto,


    as it happens that's exactly the behaviour that I want...but I understand how it could be a real issue!


    Hi twtwtw,


    It's interesting that you say AppleScript is easier on your brain...I find it all too hard!!


    My problem is that this solution is designed to be menu-driven and accessed from an ssh session from iPad. I use the same menu-driven interface to send Markdown docs to docx (or LaTeX or HTML or PDF or ePub...) using Pandoc; to generate Pandoc Markdown tables using R and in this case to insert those tables into Markdown documents.


    It works superbly for my wife as well and she doesn't normally use Terminal. I'm not aware that I can use AppleScript as easily (or perhaps at all really) from an ssh session.




  • twtwtw Level 5 (4,900 points)

    I suppose it's what you're used to.  The difference is that sed (which I can program; I just don't like to) makes you talk like a machine, whereas applescript at least throws a bone to human language structures. 


    You can use applescript from any terminal using osascript:  just enter osascript followed by the script path.  You can pass it parameters just like a shell script as well, though you need to twiddle the script a bit to get it to see the parameters.  There's no problem using the script over ssh if the script is resident on the remote machine, but you may run into security issues if you try to run a local applescript on a remote machine (or maybe not, I've never actually tried).


    But use what makes you comfortable.  I only added this because (as I said) sed gives me a headache, and I wanted a more me-friendly version out there for other sedophobes.

  • rccharles Level 5 (6,825 points)

    I'd throw out the idea of learning Python.


    You can program in a procedural style, it has a large library of functions and methods, its syntax is consistent, and it is designed to avoid syntax surprises.



  • rccharles Level 5 (6,825 points)

    With help from the Stack Overflow community and in particular sotapme, I am presenting a Python solution.

    http://stackoverflow.com/questions/15098789/python-regular-expression-for-r-find all



    #!/usr/bin/env python
    # https://discussions.apple.com/message/21202021#21202021
    # fyi:
    # Python uses \n as the record delimiter.  It converts other record delimiters
    #   as needed.
    import argparse
    import datetime
    import re
    import sys
    import time
    # ------------------------------------------------------
    # Learning Python: Powerful Object-Oriented Programming [Paperback]
    #   by Mark Lutz 
    # The Python Standard Library by Example (Developer's Library) [Paperback]
    #   by Doug Hellmann
    #   It's online at:
    #     http://www.doughellmann.com/PyMOTW/contents.html
    # ------------------------------------------------------
    # ----------------------------------------------------------
    # insert include file
    def readFile ( m ):
        global debug
        if debug >= 2 :
            print( "in readFile.  " + m.group(1) )
        # read insertion file
        with open( m.group(1),"r" ) as moreDataFile:
            moreData = moreDataFile.read()
        return moreData
    # ============================== Main =======================
    # Parse input arguments
    # -h to print help.  ( Automatically generated )
    parser = argparse.ArgumentParser(
        description="Merge Demonstration code.",
        epilog="See Apple discussions." +
        "  https://discussions.apple.com/message/21202021#21202021")
    parser.add_argument("--version", action="version", version="version 0.999")
    # The option names below are an assumption: the add_argument calls were cut
    # off in the post, so they are rebuilt from the attribute names used later
    # (options.debug, options.inputFile, options.outputFile).
    parser.add_argument("--debug", type=int, default=0,
                        help="debug levels: 0 for no debugging, 1 minimal but reasonable amount, 2 everything thinkable")
    parser.add_argument("--inputFile", default="",
                        help="input filename")
    parser.add_argument("--outputFile", default="",
                        help="output filename")
    options = parser.parse_args()
    debug = options.debug
    if debug >= 1 :
        print( "Welcome to " + __file__ + "  " + str( datetime.datetime.now() ) )
    if debug >= 2 :
        print( "input arguments: " + str( sys.argv ) )
    if options.inputFile == "" :
        print( "You need to specify an input file.  -h for help" )
        sys.exit(1)
    # Read in complete file
    # explicitly closed when block is done.
    with open( options.inputFile,"r" ) as mergeInputData:
        allData = mergeInputData.read()
    mergedData = re.sub(r'\$(.*?)\$', readFile, allData)
    # Output mergedData
    with open( options.outputFile,"w" ) as mergeOutputFile: