
Can applescript search one file for non-matches in another file?

I'd like to know if it's possible for Applescript to do the following:


Take a text file with 600+ URLs and compare it to another text file with 100,000+ URLs and set out as a result any URL from the first small list that is NOT included in the second larger list?


If it is possible to do this, is there anyone out there willing to create such a script for a modest fee?

Posted on Oct 6, 2013 2:18 PM

46 replies

Oct 6, 2013 3:19 PM in response to cdworin

set f1 to POSIX path of (choose file with prompt "Select smaller file to compare:" default location alias (the path to desktop folder as text))

set f2 to POSIX path of (choose file with prompt "Select larger file to compare:" default location alias (the path to desktop folder as text))

do shell script "comm -2 -3" & space & f1 & space & f2 & space & "> ~/Desktop/diff.txt"



No charge 😉

Oct 6, 2013 4:01 PM in response to cdworin

Just remembered that the above will only work if both files are sorted.

This will sort the files into temporary files "sorted1.txt" and "sorted2.txt" in ~/Library/Caches/TemporaryItems/ and then delete the temp files after the comparison.

The result will be output to your Desktop as "diff.txt"



set f1 to POSIX path of (choose file with prompt "Select smaller file to compare:" default location alias (the path to desktop folder as text))

set f2 to POSIX path of (choose file with prompt "Select larger file to compare:" default location alias (the path to desktop folder as text))

set t1 to (POSIX path of (path to "temp" from user domain)) & "sorted1.txt"

set t2 to (POSIX path of (path to "temp" from user domain)) & "sorted2.txt"

do shell script "sort" & space & f1 & space & ">" & space & t1

do shell script "sort" & space & f2 & space & ">" & space & t2

do shell script "comm -2 -3" & space & t1 & space & t2 & space & "> ~/Desktop/diff.txt"

do shell script "rm" & space & t1 & space & t2
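
For anyone who wants to try the sort-then-compare idea directly in Terminal, here is a minimal sketch of the same steps. The function name only_in_first and the example.com URLs are made up for illustration, and it assumes every line of each file is a bare entry to compare as a whole.

```shell
# Print lines that appear in the first file but not in the second.
# comm needs both inputs sorted the same way, so sort temporary copies first.
only_in_first() {
    s1=$(mktemp) || return 1
    s2=$(mktemp) || { rm -f "$s1"; return 1; }
    LC_ALL=C sort "$1" > "$s1"
    LC_ALL=C sort "$2" > "$s2"
    # -2 hides lines unique to the second file, -3 hides lines common to both
    LC_ALL=C comm -2 -3 "$s1" "$s2"
    rm -f "$s1" "$s2"
}

# Demo with two tiny throwaway files (hypothetical data)
small=$(mktemp); large=$(mktemp)
printf '%s\n' 'http://example.com/b.jpg' 'http://example.com/a.jpg' > "$small"
printf '%s\n' 'http://example.com/c.jpg' 'http://example.com/a.jpg' > "$large"
only_in_first "$small" "$large"
rm -f "$small" "$large"
```

Forcing LC_ALL=C on both sort and comm keeps the two tools agreeing on the sort order, which comm relies on.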

Oct 6, 2013 4:10 PM in response to Tony T1

Tony,


Thanks very much for this. I appreciate it. Unfortunately, I don't think your solution will work for me, as the URLs are embedded in a string of other characters, and sorting will not put the URLs in any kind of useful sort order.


To be more specific, my smaller list has hundreds of URLs in a list like this:


The Langham, Bostonhttp://images.travelnow.com/hotels/1000000/10000/2600/2558/2558_66_b.jpg

And the much longer list has 100,000 lines something like this:


4110|Resorts Casino Hotel Atlantic City||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_b.jpg|Expedia Hotels|350|350||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_t.jpg|False


I'm trying to determine whether the image URL for The Langham, Boston (and the URLs of the other 600 or so images) no longer appears embedded in the strings in the longer list.

Oct 6, 2013 4:21 PM in response to cdworin

Perhaps this will clarify what I want, at least conceptually:


Look for first URL in File1

Search File2 for that URL

If URL is found in File2, go to next step

If URL is not found in File2, output that URL to a list and go to next step


Find second URL in File1

repeat

repeat

repeat

..... until all URLs in File1 have been searched for in File2
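
The loop described above can be sketched in the shell roughly as follows. This is only an illustration under some assumptions: the function name missing_urls is made up, a URL is taken to end at whitespace or a "|" separator, and the second line of the small demo file is invented to match the sample line from the long list.

```shell
# For every URL found in the small file, report it if it occurs nowhere in the big file.
missing_urls() {
    # grep -o prints only the matched part; the bracket expression stops the URL
    # at whitespace or a '|' field separator (an assumption about the data).
    grep -o -E 'https?://[^|[:space:]]+' "$1" |
    while IFS= read -r url; do
        # -F: treat the URL as a literal string, -q: no output, only an exit status
        if ! grep -q -F -- "$url" "$2"; then
            printf '%s\n' "$url"
        fi
    done
}

# Demo using lines shaped like the ones quoted in this thread
file1=$(mktemp); file2=$(mktemp)
printf '%s\n' \
  'The Langham, Bostonhttp://images.travelnow.com/hotels/1000000/10000/2600/2558/2558_66_b.jpg' \
  'Resorts Casino Hotel Atlantic Cityhttp://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_b.jpg' > "$file1"
printf '%s\n' \
  '4110|Resorts Casino Hotel Atlantic City||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_b.jpg|Expedia Hotels|350|350||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_t.jpg|False' > "$file2"
missing_urls "$file1" "$file2"
rm -f "$file1" "$file2"
```

This runs one grep over File2 per URL in File1, so with about 600 URLs it scans the long list about 600 times. No sorting is needed because each URL is searched for as a literal substring.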

Oct 6, 2013 6:15 PM in response to cdworin

So you are just interested in the URLs, not in any of the other text? And the output is just the URLs, not the surrounding text or anything else?


What I see as one possible solution is to first go through each file and strip out the URLs into two separate files, and then use those files to generate the list. But this will only work if you are not interested in the context the URLs are in.
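
As a rough sketch of the stripping step, something like the following pulls the bare URLs out of a file, one per line. The function name extract_urls is made up, and the pattern assumes a URL ends at whitespace or at a "|" separator, which may not hold for every line in the real data.

```shell
# Print each distinct http/https URL found in a file, one per line, sorted.
extract_urls() {
    # -o: print only the part of the line that matched
    # [^|[:space:]]+ : keep going until a '|' or whitespace (assumed URL end)
    grep -o -E 'https?://[^|[:space:]]+' "$1" | LC_ALL=C sort -u
}

# Demo on a line shaped like the long-list sample from this thread
sample=$(mktemp)
printf '%s\n' '4110|Resorts Casino Hotel Atlantic City||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_b.jpg|Expedia Hotels|350|350||http://images.travelnow.com/hotels/1000000/50000/40200/40186/40186_12_t.jpg|False' > "$sample"
extract_urls "$sample"
rm -f "$sample"
```

Each input line can contribute more than one URL, so the extracted file may have more lines than the original.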

Oct 6, 2013 6:51 PM in response to Frank Caggiano

Frank,


You're correct that all I need as the output is the bare URL of any URL that appears in File1 but not in File2. Each week I don't expect that the script would generate more than a handful of results. So, if the output of the script was, for example, five URLs from File1 that no longer appear in File2 I could do a subsequent manual search of File2 for those five URLs and find the specific context.


But, if the pre-requisite to your possible solution is to strip out all the URLs from File2 it seems to me that the scripting required to accomplish that would be as complex as the script I'm requesting that simply searched File2 for each URL in File1, consecutively, and flagged in some way any URL search that results in a null set. But then I'm not a programmer... 🙂


Thanks,


Chris

Oct 6, 2013 7:28 PM in response to cdworin

OK, really quick and dirty, and it will need work to make it a final script, but this

(*


The pattern file is the smaller file with the URLs that we will look for in the URL file


*)


set file1 to POSIX path of (choose file with prompt "Select pattern file:" default location alias (the path to desktop folder as text))

set file2 to POSIX path of (choose file with prompt "Select URL file:" default location alias (the path to desktop folder as text))



do shell script "grep -o -E '(https?|ftp|file)://.+' " & file1 & " > ~/patternFile"


do shell script "grep -o -E '(https?|ftp|file)://.+' " & file2 & " > ~/urlFile"


do shell script "grep -v -f ~/patternFile ~/urlFile > ~/missingUrlFile"

I do believe will do what you want.


The first prompt will ask for the pattern file, that is, the file that has the URLs that should be in the second file.

The second prompt will ask for the URL file, that is, the file that has all the URLs.


The output will be a file in your home directory called missingUrlFile and should have the URLs that are in the pattern file but not in the URL file.
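
One thing worth checking with grep -v -f is which file plays which role: grep -v -f PATTERNS FILE prints the lines of FILE that match none of the patterns, so the file whose URLs should be reported goes last. A small sketch of that behaviour follows, assuming both files already hold one bare URL per line; the function name not_listed and the example.com URLs are made up.

```shell
# Report the lines of the first file that do not appear in the second.
# Assumes each file holds one bare URL per line (e.g. already extracted).
not_listed() {
    # -f "$2": read the patterns from the second file
    # -F: patterns are literal strings, not regular expressions
    # -x: a pattern must match a whole line
    # -v: print the lines of "$1" that match none of the patterns
    grep -v -x -F -f "$2" "$1"
}

wanted=$(mktemp); present=$(mktemp)
printf '%s\n' 'http://example.com/a.jpg' 'http://example.com/b.jpg' > "$wanted"
printf '%s\n' 'http://example.com/b.jpg' 'http://example.com/c.jpg' > "$present"

echo "In the small list but not the big one:"
not_listed "$wanted" "$present"
echo "Swapping the arguments answers the opposite question:"
not_listed "$present" "$wanted"
rm -f "$wanted" "$present"
```

The -F and -x options are additions here: without them the dots and question marks in a URL are treated as regular-expression characters, and a pattern can match part of a longer line.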


As I said, it needs work: there is no error checking, and the temporary files are left behind so that if it doesn't work I can see what was going on.


Give it a shot and see what happens.


regards

Oct 6, 2013 7:56 PM in response to Frank Caggiano

Thanks so much, Frank. I tried putting the entire code you sent to me into the Applescript editor and ran it. It did appear to run and it successively asked me to select the two files, which I did. But it then found errors. Those files are on my desktop, so I'm not sure why the "no such files or directory" error message is coming up.


User uploaded file


I also tried it by deleting the first lines, so that the script started with "Set file1...." Same result.


Thanks,


Chris

Oct 6, 2013 8:25 PM in response to cdworin

Hard to read the screenshot. Next time, copy and paste the text into the reply. Also select Replies as the display for the window in AppleScript Editor.


About the error you get, "no such file or directory": are you sure that the file exists? It looks like the file is called List and it is on your desktop?


Just tried it here and it is working OK


Here is another copy just in case something got messed up in the first one:


(*


The pattern file is the smaller file with the URLs that we will look for in the URL file


*)


set file1 to POSIX path of (choose file with prompt "Select pattern file:" default location alias (the path to desktop folder as text))

set file2 to POSIX path of (choose file with prompt "Select URL file:" default location alias (the path to desktop folder as text))



do shell script "grep -o -E '(https?|ftp|file)://.+' " & file1 & " > ~/patternFile"


do shell script "grep -o -E '(https?|ftp|file)://.+' " & file2 & " > ~/urlFile"


do shell script "grep -v -f ~/patternFile ~/urlFile > ~/missingUrlFile"



--- stop copying above this line

Just so you can see what the output in the Replies window of Applescript will look like:

tell current application


path to desktop as text


--> "Mac OS Lion:Users:frank:Desktop:"

end tell

tell application "AppleScript Editor"


choose file with prompt "Select pattern file:" default location alias "Mac OS Lion:Users:frank:Desktop:"


--> alias "Mac OS Lion:Users:frank:Desktop:f1"

end tell

tell current application


path to desktop as text


--> "Mac OS Lion:Users:frank:Desktop:"

end tell

tell application "AppleScript Editor"


choose file with prompt "Select URL file:" default location alias "Mac OS Lion:Users:frank:Desktop:"


--> alias "Mac OS Lion:Users:frank:Desktop:f2"

end tell

tell current application


do shell script "grep -o -E '(https?|ftp|file)://.+' /Users/frank/Desktop/f1 > ~/patternFile"


--> ""


do shell script "grep -o -E '(https?|ftp|file)://.+' /Users/frank/Desktop/f2 > ~/urlFile"


--> ""


do shell script "grep -v -f ~/patternFile ~/urlFile > ~/missingUrlFile"


--> ""

end tell

Result:

""


Message was edited by: Frank Caggiano - See Tony's post below

Oct 6, 2013 8:30 PM in response to Tony T1

That's probably it but do you see spaces in the filename he input?


I took the liberty of adding your changes into the script to make it easier for the OP, thanks.


Revised script with "quoted form of" added in:



(*


The pattern file is the smaller file with the URLs that we will look for in the URL file


*)


set file1 to POSIX path of (choose file with prompt "Select pattern file:" default location alias (the path to desktop folder as text))

set file2 to POSIX path of (choose file with prompt "Select URL file:" default location alias (the path to desktop folder as text))



do shell script "grep -o -E '(https?|ftp|file)://.+' " & quoted form of file1 & " > ~/patternFile"


do shell script "grep -o -E '(https?|ftp|file)://.+' " & quoted form of file2 & " > ~/urlFile"


do shell script "grep -v -f ~/patternFile ~/urlFile > ~/missingUrlFile"
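
To see why the quoting matters, here is a small shell sketch of what happens when a path containing a space is pasted into a command string, which is roughly what do shell script does with the concatenated text. The helper sh_quote is a made-up stand-in that mimics quoted form of (wrap in single quotes, escape any embedded single quote), and the file name "URL List.txt" is hypothetical.

```shell
# Wrap a string in single quotes so a shell reads it back as one word.
# Embedded single quotes become '\'' (close quote, escaped quote, reopen).
sh_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

demo_dir=$(mktemp -d)
list_file="$demo_dir/URL List.txt"
printf '%s\n' 'http://example.com/a.jpg' > "$list_file"

# Unquoted: the command string is split at the space, so grep is handed two
# file names that do not exist and reports "No such file or directory".
sh -c "grep -c http $list_file" 2>/dev/null || echo "unquoted: grep could not find the file"

# Quoted: the whole path reaches grep as a single argument.
sh -c "grep -c http $(sh_quote "$list_file")"
rm -rf "$demo_dir"
```

In AppleScript itself, quoted form of does this work, so no helper like sh_quote is needed there.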


Message was edited by: Frank Caggiano
