Using the POSIX path of your file name alters it by adding a leading slash, so your path isn't being set correctly (POSIX path is meant for converting from a Finder path). Scripts run from the Terminal can't have user interaction (e.g. the choose file dialog), so to use it there you can cheat a bit by asking another application to do that part:
tell application "System Events" to set input to POSIX path of (choose file)
You can also use the commands directly, dragging the various files to the Terminal window (this pastes in the path to the file), or use a here-document.
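As a rough sketch of the here-document route, the lines to be filtered can be pasted between the markers instead of naming a file. The four sample lines and the function name `demo_heredoc` are made up for illustration, and this sed expression turns the "+" line into an empty line rather than inserting a carriage return the way the AppleScript version does, so treat those as assumptions:

```shell
# demo_heredoc is a hypothetical name; the function only exists so the
# example can be rerun. 'EOF' is quoted so the shell leaves characters
# such as $ and ! in the pasted text alone.
demo_heredoc() {
    grep -v '[!@#$%&?/.:]' <<'EOF' | sed 's/^+$//'
@read1/1
ACGTACGTAC
+
!!#%&:..?/
EOF
}

# Prints the surviving sequence line followed by an empty line.
demo_heredoc
```

To work on a real file instead, drop the here-document and put the path after the grep pattern (dragging the file into the Terminal window pastes the path for you).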
I didn't think you were familiar with the Terminal though, which is why I used an AppleScript. You can just paste the following script into the *AppleScript Editor* application and run it from there, or save it as an application. The script can also be used in an Automator workflow (for example as a Service), although as I mentioned earlier, Automator chokes pretty badly on large files if you want to use the results in another action. Post back with more details about your workflow and we can see about wrapping it around the shell script.
<pre style="
font-family: Monaco, 'Courier New', Courier, monospace;
font-size: 10px;
font-weight: normal;
margin: 0px;
padding: 5px;
border: 1px solid #000000;
width: 720px;
color: #000000;
background-color: #DAFFB6;
overflow: auto;"
title="this text can be pasted into the AppleScript Editor">
set input to POSIX path of (choose file)
set output to POSIX path of (((path to desktop folder) as text) & "results.txt")
do shell script "grep -v '[!@#$%&?/.:]' " & quoted form of input & " | sed 's/+/" & return & "/g' > " & quoted form of output
-- do shell script "grep -E '(([ACTG]{6,})|(^\\+$))' " & quoted form of input & " | sed 's/+/" & return & "/g' > " & quoted form of output
</pre>
In the above script I included a couple of different do shell script statements in case something slips through (just use one of them at a time, though). The first one uses the -v option, which reverses the grep matching (it selects lines that do not match), with a pattern that matches various characters that only appear in the metadata lines. It then pipes the results to sed, which converts the lines that are "+" (this character was deliberately left out of the grep pattern) into an extra return.
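For anyone who wants to try the first filter outside of AppleScript, here is a minimal sketch of the same pipeline as a shell function. The name `filter_by_exclusion` and the sample lines are hypothetical, and the sed step leaves an empty line where the "+" was instead of inserting a carriage return:

```shell
# filter_by_exclusion FILE  (hypothetical helper, not part of the script above)
# grep -v keeps only lines containing none of the listed characters;
# sed then empties any line that consists solely of "+".
filter_by_exclusion() {
    grep -v '[!@#$%&?/.:]' "$1" | sed 's/^+$//'
}

# Usage with a made-up four-line record written to a temporary file.
sample=$(mktemp)
printf '%s\n' '@read1/1' 'ACGTACGTAC' '+' '!!#%&:..?/' > "$sample"
filter_by_exclusion "$sample"
rm -f "$sample"
```

Note that a metadata line that happens to contain none of those characters would slip through, which is the reason for having a second variant.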
The second do shell script (commented out) is another one that I was playing with. It differs in that grep looks for lines that contain 6 or more consecutive base pair characters, and lines that consist of only a single "+" character.
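The second approach can be sketched the same way. Again, `filter_by_inclusion` and the sample lines are hypothetical, and the "+" lines end up as empty lines here rather than carriage returns:

```shell
# filter_by_inclusion FILE  (hypothetical helper, not part of the script above)
# grep -E keeps lines containing a run of 6 or more of A, C, T, G,
# plus lines that are exactly "+"; sed then empties the "+" lines.
filter_by_inclusion() {
    grep -E '([ACTG]{6,})|(^\+$)' "$1" | sed 's/^+$//'
}

# Usage: the second record's sequence is only 4 characters long, so it is dropped.
sample=$(mktemp)
printf '%s\n' '@read1/1' 'ACGTACGTAC' '+' '!!#%&:..?/' \
              '@read2/1' 'ACGT' '+' '??..::!!##' > "$sample"
filter_by_inclusion "$sample"
rm -f "$sample"
```

The trade-off is the reverse of the first variant: sequences shorter than 6 characters are lost, and any metadata line that happens to contain a run of 6 or more of those letters is kept.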
I've tested both of these on the example text you posted earlier and on the large file that mns579 found. Both methods appear to work OK (I didn't go through the entire results for the large document).