Bash script to replace markers in file

Question

Hi,

I have a file of the form (testfile.txt):

here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here $file1.txt$ Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more $file2.txt$ Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here Some text in here and more here $file3.txt$

I also have text files with names of the form "filen.txt" (file1.txt, file2.txt and so on), each containing a Markdown table like this:

--------------------
age   weight   sex
----- -------- -----
15    45       m
20    56       f
25    65       f
30    72       m
--------------------

I want to run a script that searches for all the markers and "injects" the relevant table at each one, by opening the file of that name in the same directory and replacing the marker with its contents.

I have the "shell" of the script (sorry about the pun). I cannot however get it to do the injecting part.

I've tried awk, sed, Perl... you name it.

The script so far is:

#!/bin/bash
line=""
places=""
while read -r line; do
    places=$(awk '/\$*\$/ { print $0 }')
done < testfile.txt
cp testfile.txt try.txt
echo "$places" | {
    while read line; do
        strip=$(echo $line | sed 's/^\$//')
        strip=$(echo $strip | sed 's/\$$//')
        curtext=$(cat $strip)
        cat try.txt | {
            while read line; do
            # Something in here to take $strip add a $ at each end and replace
            #+with $curtext
            done
            }
    done
}

I would really appreciate some help to fill in the area where the comment is. I am successfully getting all the other bits to work.
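A side note on the marker-finding step: in the awk pattern /\$*\$/, \$* means "zero or more $", so the pattern matches any line that contains a $ and prints the whole line rather than the marker names. A minimal Python sketch of collecting just the names (find_markers is a hypothetical helper; it assumes a marker never contains a $ itself):

```python
import re

# Marker syntax from the question: $name$, where name is a filename.
# The non-greedy .*? stops at the first closing $, so several markers
# on one line are found separately.
MARKER = re.compile(r"\$(.*?)\$")


def find_markers(text):
    """Return the names inside $...$ markers, in order of appearance."""
    return MARKER.findall(text)


sample = "here Some text $file1.txt$ more text $file2.txt$ and $file3.txt$"
print(find_markers(sample))  # ['file1.txt', 'file2.txt', 'file3.txt']
```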

Many thanks

Mike

Posted on Feb 5, 2013 7:42 PM

Answer 1

Hiroto

Feb 6, 2013 3:29 AM in response to Pelorus1

Hello

If I understand it correctly, you may try something like this. It's actually all in Perl, though.

#!/bin/bash

perl -CSD <<'EOF' - testfile.txt > try.txt
while (<>) {
    s/\$(.*?)\$/&readfile($1)/oge;
    print;
}

sub readfile ($) {
    my $f = shift;
    open F, "<$f" or return "[## $!: $f ##]";
    local $/;
    my $t = <F>;
    close F;
    return $t;
}
EOF

Good luck,

H

Answer 2

Pelorus1 Author

Feb 6, 2013 4:36 AM in response to Hiroto

G'day Hiroto

You're going to have to explain that to me...it's beyond me 😐

Kind regards

Mike

Answer 3

Pelorus1 Author

Feb 6, 2013 8:38 PM in response to Pelorus1

So I've made some progress.

When I execute this in Terminal it works:

cat try.txt | sed 's/@file3.txt@/Counter: 3/g'

It does the substitution perfectly. In the script, both the find pattern and the substitution pattern are variable-driven; I got this command by echoing it from the script... so I know the variable expansion is working OK.

If I execute the _same_ command inside a while loop that's working on another file, it doesn't do the substitution, even though the variables expand properly. If I echo the result, nothing has changed.

I'm really certain that I have a fundamental blind spot here... any suggestions?

To the poster who suggested the Perl approach, many thanks, but without help I think the learning curve for me and Perl is too great for this job.

Regards

Mike

Answer 4

Hiroto

Feb 7, 2013 12:12 PM in response to Pelorus1

OK. Sorry for the late reply. It is much easier to provide code than to explain it.

Some annotations follow.

#!/bin/bash

perl -CSD <<'EOF' - testfile.txt > try.txt          # 1
while (<>) {                                        # 2
    s/\$(.*?)\$/&readfile($1)/oge;                  # 3
    print;                                          # 4
}

sub readfile ($) {                                  # 5
    my $f = shift;                                  # 6
    open F, "<$f" or return "[## $!: $f ##]";       # 7
    local $/;                                       # 8
    my $t = <F>;                                    # 9
    close F;                                        # 10
    return $t;                                      # 11
}
EOF

# 1. The -C option specifies text encodings. See the perlrun manpage for details.

S = IOE = STDIN, STDOUT and STDERR are all assumed to be in UTF-8.

D = io = UTF-8 is the default PerlIO layer for input and output streams.

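<<'EOF' denotes a here-document. See the bash manpage for details. Because the delimiter is quoted, bash performs no expansion inside the here-document, so $1, $f, $! and so on reach perl unchanged.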

- tells perl to read the program itself from STDIN and to treat the remaining arguments (here, testfile.txt) as the program's arguments. A here-document is a form of STDIN redirection.

# 2. <X> is the line input operator. It reads a line from filehandle X, terminated by the input record separator held in the global variable $/, whose default value is a linefeed.

while(<>) is a special form meaning while(<X>), where X is a filehandle opened in turn for each argument in the command-line argument list @ARGV, or STDIN if @ARGV is empty. Until X reaches end-of-file, <X> returns the line read, which counts as true in boolean context. At end-of-file, <X> returns the undefined value, which counts as false and terminates the while loop. (Strictly, Perl tests while(defined($_ = <>)), so even a last line consisting of just "0" does not end the loop early.) In a while(<>) loop, each line read is assigned to the global variable $_.
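Python's standard fileinput module behaves much like while(<>): it walks the lines of each file named on the command line, or standard input if there are none. A sketch for comparison (read_lines is a hypothetical helper, not part of the original reply):

```python
import fileinput


def read_lines(paths=None):
    """Collect (filename, line) pairs the way Perl's while (<>) sees them.

    paths=None makes FileInput use sys.argv[1:], falling back to standard
    input when that list is empty, much like <> with @ARGV.
    """
    collected = []
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:          # each line keeps its "\n", like $_
            collected.append((stream.filename(), line))
    return collected
```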

# 3. s/X/Y/oge; is a regular-expression substitution statement, here short for $_ =~ s/X/Y/oge;, where $_ is the line read.

Options are as follows.

o = optimize: compile the regexp pattern only once. This only matters when the pattern interpolates variables, so it has no effect here.

g = global: replace every occurrence of X, not just the first.

e = evaluate: Perl evaluates Y as an expression and uses its result as the replacement.

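\$(.*?)\$ is the regular-expression pattern to match: a literal $, the shortest possible run of characters (captured as group 1), then another literal $. *? is the minimal (non-greedy) quantifier, while * is maximal (greedy).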
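Python's re module uses the same quantifier syntax, so the difference is easy to see on a line holding two markers (a small illustration, not from the original reply):

```python
import re

line = "text $file1.txt$ more text $file2.txt$ end"

# Greedy: .* runs to the LAST $ on the line, swallowing both markers.
greedy = re.findall(r"\$(.*)\$", line)

# Minimal (non-greedy): .*? stops at the FIRST closing $.
minimal = re.findall(r"\$(.*?)\$", line)

print(greedy)   # ['file1.txt$ more text $file2.txt']
print(minimal)  # ['file1.txt', 'file2.txt']
```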

&readfile($1) is a subroutine call with $1 as its argument. $1 holds the text captured by the first parenthesized group in X; in this case, the minimal string between two $ signs. As a whole, this statement replaces each $filename$ with the contents read from filename.
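In Python the counterpart of the /e flag is passing a function to re.sub. A minimal sketch, where the tables dictionary is a hypothetical stand-in for the table files on disk:

```python
import re

# Hypothetical stand-in for the table files: filename -> contents.
tables = {"file1.txt": "<table 1>", "file2.txt": "<table 2>"}


def replacement(match):
    # match.group(1) plays the role of Perl's $1
    return tables[match.group(1)]


text = "before $file1.txt$ middle $file2.txt$ after"
# re.sub calls replacement() for every match (all of them by default,
# like /g) and substitutes whatever it returns, much like /e.
result = re.sub(r"\$(.*?)\$", replacement, text)
print(result)  # before <table 1> middle <table 2> after
```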

# 4. print; is short for print $_;, which prints the current line to the currently selected output filehandle (STDOUT by default).

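# 5. Subroutine definition. The prototype ($) declares a single scalar argument. (Calling the subroutine with &, as in # 3, bypasses the prototype, so here it mainly serves as documentation.)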

# 6. shift; inside a subroutine is short for shift @_;. @_ is the subroutine's argument list, so $f is set to the first argument, which is expected to be a filename. my declares $f as a lexical variable whose scope is limited to the current block.

# 7. Open a filehandle for the given filename. If that fails, the subroutine returns the string "[## $!: $f ##]", with $! interpolated as the system error message and $f as the filename, so a missing file shows up in the output instead of stopping the script.
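The same "return an error note instead of dying" idea, sketched in Python for comparison (read_file is a hypothetical name; the UTF-8 encoding is an assumption chosen to match -CSD):

```python
def read_file(path):
    """Return the whole file, or an inline error note if it cannot be opened.

    Modelled on the Perl readfile: open ... or return "[## $!: $f ##]".
    """
    try:
        # encoding="utf-8" roughly matches perl -CSD; read() slurps the
        # whole file, like reading with $/ undefined
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as err:
        # err.strerror is the system error text, similar to what $! gives
        return f"[## {err.strerror}: {path} ##]"
```

A missing file then appears in the merged output as something like "[## No such file or directory: file9.txt ##]".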

# 8. local $/; is a safe way to undefine the global variable $/, the input record separator, whose default value is a linefeed. local saves the current value and restores it when the enclosing block exits, so the change is confined to this block (and anything it calls). This changes the definition of "line" and lets the following statement (# 9) read the entire contents of the file as a single line.
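Python has no direct counterpart of local, but its save, change and restore behaviour can be sketched with a context manager. This is only an analogy; input_record_separator is a hypothetical module-level variable standing in for $/:

```python
import contextlib

# Hypothetical module-level setting standing in for Perl's global $/.
input_record_separator = "\n"


def current_separator():
    # Reads the module-level value at call time, so code called inside the
    # with block sees the temporary value (like Perl's dynamic scoping).
    return input_record_separator


@contextlib.contextmanager
def local_separator(value):
    """Temporarily change input_record_separator, restoring it on exit."""
    global input_record_separator
    saved = input_record_separator
    input_record_separator = value
    try:
        yield
    finally:
        # Runs even if the block raises, just as local's effect ends
        # however the enclosing block is left.
        input_record_separator = saved
```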

# 9. Read a "line" from the filehandle, which, with $/ undefined, is the whole file.

# 10. close the filehandle.

# 11. return the read contents.

I hope this suffices.

And Perl is well worth learning.

Good luck,

H

Answer 5

Pelorus1 Author

Feb 8, 2013 1:47 PM in response to Hiroto

Hello Hiroto,

thank you very much for such a great explanation. You have inspired me to put in effort to learn Perl.

My final solution, however, has been in sed. I found the 'r' command, which reads in a file at the line matching your search pattern. The final code looked like this, with the 'd' command deleting the matched line. The quoting is incredibly sensitive, and it _only_ works with quotes like this on Mac (ask me how I know 😉).

while read line1; do
    sed '/'"@$line1@"'/ {
            r '"$line1"'
            d
        }' <try.txt >delete.txt
    cat delete.txt > try.txt
done < search.txt

Here, 'search.txt' contains the list of markers that we've extracted from the file and stripped back to bare filenames; 'try.txt' is a working copy of the file containing the markers; and 'delete.txt' is a file we use to hold each cycle's output before copying it back over the input file.

It may not be an elegant solution; perhaps I should use variables instead of files... but I do clean up afterwards, and it is robust 🙂
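The idea of holding the working copy in a variable instead of try.txt and delete.txt can be sketched in Python. This is a hypothetical equivalent of the sed loop above, not a drop-in replacement: inject_tables and its read parameter (any function that maps a filename to its contents) are illustrative names, and the marker test is a plain substring match rather than a sed regex:

```python
def inject_tables(text, names, read):
    """Replace every line containing @name@ with read(name), for each name.

    Mirrors the sed loop: the marked line is dropped (d) and the file's
    contents are emitted in its place (r). The working copy lives in the
    variable text instead of try.txt/delete.txt.
    """
    for name in names:                  # like: while read line1 ... < search.txt
        marker = "@" + name + "@"       # plain substring test, not a regex
        out = []
        for line in text.splitlines(keepends=True):
            if marker in line:
                # d drops the whole line; r inserts the file contents instead
                out.append(read(name))
            else:
                out.append(line)
        text = "".join(out)             # like: cat delete.txt > try.txt
    return text
```

Note that, exactly as with sed's d, any other text on the marked line is lost.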

Many thanks for everyone's assistance.

Regards

Mike

Answer 6

Hiroto

Feb 9, 2013 11:54 AM in response to Pelorus1

Hello Mike,

No problem. And thanks for the feedback.

One thing to note, though. sed's d command deletes the pattern space, which initially holds the whole input line, not just the matched part. In your case, the sed command deletes the entire line that contains the marker. If the marker is on a line of its own, that's fine. Otherwise, I'm afraid it isn't.

E.g.

#!/bin/bash

t="Humpty Dumpty sat on a wall,
Humpty Dumpty had a great fall;
All the king's horses and all the king's men
Couldn't put Humpty together again."

s="horses"
r="<REPLACED>"

cd ~/Desktop
echo "$r" > file.txt

echo "$t" | \
sed '/'"$s"'/ {
    r file.txt
    d
}'

Result:

Humpty Dumpty sat on a wall,
Humpty Dumpty had a great fall;
<REPLACED>
Couldn't put Humpty together again.

I mention this because your original sample showed markers embedded within lines of text.

If you have changed the structure of the source text so that each marker sits on its own line, that's fine; please disregard this message.
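The difference between the two behaviours can be sketched in Python with the same rhyme (a minimal illustration, not part of the original reply): the first version mimics the r/d combination, the second substitutes only the matched text, as s/// does:

```python
import re

t = ("Humpty Dumpty sat on a wall,\n"
     "Humpty Dumpty had a great fall;\n"
     "All the king's horses and all the king's men\n"
     "Couldn't put Humpty together again.")

# Line-based, like sed's r + d: any line containing "horses" is replaced whole.
by_line = "\n".join("<REPLACED>" if "horses" in line else line
                    for line in t.split("\n"))

# Match-based, like s/// in sed or Perl: only the matched text is replaced.
by_match = re.sub("horses", "<REPLACED>", t)

print(by_line.split("\n")[2])   # <REPLACED>
print(by_match.split("\n")[2])  # All the king's <REPLACED> and all the king's men
```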

Kind regards,

Hiroto

Answer 7

twtwtw

Feb 9, 2013 1:15 PM in response to Pelorus1

sed gives me a headache. It has its place in the world, but I would never use it unless I had no other option. You do. Here's the same thing in AppleScript:

-- set main paths
set master_file to "/path/to/master file.txt"
set main_folder to "/path/to/folder/"

-- read master text, and break it down around '$' marks
set master_text to read master_file
set text_chunks to tid({input:master_text, delim:"$"})

-- loop through chunks, replacing file names with file contents
repeat with this_chunk in text_chunks
    if this_chunk ends with ".txt" then
        set contents of this_chunk to (read (main_folder & this_chunk))
    end if
end repeat

-- write it all back out to the master file
set fp to open for access master_file with write permission
write tid({input:text_chunks, delim:""}) to fp
close access fp

on tid({input:input, delim:delim})
    -- generic handler for text delimiters
    set {oldTID, my text item delimiters} to {my text item delimiters, delim}
    if class of input is list then
        set output to input as text
    else
        set output to text items of input
    end if
    set my text item delimiters to oldTID
    return output
end tid

A bit wordier, but so much easier on the brain.
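For comparison, the same split-and-rejoin idea can be sketched in Python. merge_by_split and its read parameter (a function mapping a filename to its contents) are hypothetical names; like the AppleScript, it treats any chunk ending in ".txt" as a filename:

```python
def merge_by_split(master_text, read):
    """Split on "$" and swap chunks that look like filenames for file contents.

    Mirrors the AppleScript: text item delimiters "$" give a list of chunks;
    chunks ending in ".txt" are replaced; the chunks are rejoined with "".
    """
    chunks = master_text.split("$")
    for i, chunk in enumerate(chunks):
        if chunk.endswith(".txt"):
            chunks[i] = read(chunk)
    return "".join(chunks)
```

One design caveat: an ordinary text chunk that happens to end in ".txt" would also be treated as a filename. Checking only the odd-numbered chunks (indices 1, 3, 5, ... after the split) is an alternative, assuming every $ in the file belongs to a marker pair.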

Answer 8

Pelorus1 Author

Feb 9, 2013 5:35 PM in response to twtwtw

Hi Hiroto,

as it happens that's exactly the behaviour that I want...but I understand how it could be a real issue!

Hi twtwtw,

It's interesting that you say AppleScript is easier on your brain...I find it all too hard!!

My problem is that this solution is designed to be menu-driven and accessed from an ssh session on an iPad. I use the same menu-driven interface to convert Markdown docs to docx (or LaTeX or HTML or PDF or ePub...) using Pandoc, to generate Pandoc Markdown tables using R, and, in this case, to insert those tables into Markdown documents.

It works superbly for my wife as well and she doesn't normally use Terminal. I'm not aware that I can use AppleScript as easily (or perhaps at all really) from an ssh session.

Regards

Mike

Answer 9

twtwtw

Feb 9, 2013 5:46 PM in response to Pelorus1

I suppose it's what you're used to. The difference is that sed (which I can program; I just don't like to) makes you talk like a machine, whereas AppleScript at least throws a bone to human language structures. 🙂

You can use AppleScript from any terminal using osascript: just enter osascript followed by the script path. You can pass it parameters just like a shell script as well, though the script needs an on run argv handler to receive them (as a list). There's no problem using the script over ssh if the script is resident on the remote machine, but you may run into security issues if you try to run a local AppleScript on a remote machine (or maybe not, I've never actually tried).

But use what makes you comfortable. I only added this because (as I said) sed gives me a headache, and I wanted a more me-friendly version out there for other sedophobes.

Answer 10

rccharles

Feb 10, 2013 10:57 AM in response to twtwtw

I'd throw out another suggestion: learn Python.

It lets you program in a procedural style, has a large library of functions and methods, has a consistent syntax, and is designed to avoid syntax surprises.

Robert

Answer 11

rccharles

Feb 28, 2013 11:59 AM in response to rccharles

With help from the Stack Overflow community and in particular sotapme, I am presenting a Python solution.

http://stackoverflow.com/questions/15098789/python-regular-expression-for-r-find all

#!/usr/bin/env python3

#
# https://discussions.apple.com/thread/4780717?answerId=21202021022#21202021022
#
# fyi:
# Python uses \n as its record delimiter.  In text mode it converts other
#   record delimiters (\r\n, \r) to \n as needed.

import argparse
import datetime
import re
import sys

# ------------------------------------------------------
#
# Learning Python: Powerful Object-Oriented Programming [Paperback]
#   by Mark Lutz
#
# The Python Standard Library by Example (Developer's Library) [Paperback]
#   by Doug Hellmann
#   It's online at:
#     http://www.doughellmann.com/PyMOTW/contents.html
#
# ------------------------------------------------------


# ----------------------------------------------------------
# insert include file
# re.sub() calls this once per $filename$ match; the returned
# text replaces the whole match, markers included.
def readFile(m):
    if debug >= 2:
        print("in readFile.  ", m.group(1))
    # read insertion file (path is relative to the current directory)
    with open(m.group(1), "r") as moreDataFile:
        moreData = moreDataFile.read()
    return moreData


# ============================== Main =======================

# Parse input arguments
# -h to print help.  ( Automatically generated )
parser = argparse.ArgumentParser(
    description="Merge Demonstration code.",
    epilog="See Apple discussions." +
    "  https://discussions.apple.com/thread/4780717?answerId=21202021022#21202021022")

# Python 3's ArgumentParser no longer takes version=; use a --version option.
parser.add_argument('--version', action='version', version="version 0.999")

parser.add_argument('-d',
                    action="store",
                    dest="debug",
                    default=0,
                    type=int,
                    choices=(0, 1, 2),
                    help="debug levels: 0 for no debugging, 1 minimal but reasonable amount, 2 everything thinkable")

parser.add_argument('-if',
                    action="store",
                    dest="inputFile",
                    default="",
                    help="input filename")

parser.add_argument('-of',
                    action="store",
                    dest="outputFile",
                    default="mergedOutput.txt",
                    help="output filename")

options = parser.parse_args()

debug = options.debug

if debug >= 1:
    print("Welcome to " + __file__ + "  " + str(datetime.datetime.now()))

if debug >= 2:
    print("input arguments:", sys.argv)

if options.inputFile == "":
    print("You need to specify an input file.  -h for help")
    sys.exit(1)

# Read in complete file
# (explicitly closed when the with block is done)
with open(options.inputFile, "r") as mergeInputData:
    allData = mergeInputData.read()

# Replace every $filename$ with that file's contents; .*? keeps each
# match to the shortest text between two $ signs.
mergedData = re.sub(r'\$(.*?)\$', readFile, allData)

# Output mergedData
with open(options.outputFile, "w") as mergeOutputFile:
    mergeOutputFile.write(mergedData)
