Oct 9, 2012 5:09 AM (in response to SadLion)
Are either or both files compressed?
The grep(1) man page indicates that fgrep is faster than grep or egrep on fixed patterns. Try the following to see if your elapsed time improves:
fgrep -x -f patternfile basefile > outputfile
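With -x and fixed strings, a line matches only if it is exactly equal to one of the patterns, so the check can be a single hash lookup per line instead of a search through 20,000 patterns. Here is a minimal Python sketch of the same filter as the fgrep command above. It assumes plain-text files in UTF-8 (or ASCII) with one pattern per line. The script name exact_match.py and the function names are hypothetical, and this is not how fgrep itself is implemented.

```python
import sys


def whole_line_matches(patterns, lines):
    """Yield each line that is exactly equal to one of the patterns.

    Intended to mirror `fgrep -x -f patternfile basefile`: fixed strings,
    whole-line matches only. Trailing newlines are ignored when comparing,
    and every yielded line ends with exactly one newline.
    """
    wanted = {p.rstrip("\n") for p in patterns}  # one hash set, built once
    for line in lines:
        key = line.rstrip("\n")
        if key in wanted:  # hash lookup instead of trying every pattern
            yield key + "\n"


def main(pattern_path, base_path):
    with open(pattern_path, encoding="utf-8") as pf:
        patterns = pf.readlines()
    with open(base_path, encoding="utf-8") as bf:
        # Matching lines go to stdout, as with grep, so they can be redirected.
        sys.stdout.writelines(whole_line_matches(patterns, bf))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python3 exact_match.py patternfile basefile", file=sys.stderr)
```

Run it as python3 exact_match.py patternfile basefile > outputfile. The set is built in one pass over the pattern file. After that, each line of basefile costs time proportional to its own length rather than to the number of patterns.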
Oct 9, 2012 5:13 AM (in response to VikingOSX)
fgrep -x -f patternfile basefile > outfile
Mac mini, OS X Mountain Lion (10.8.2), • 8GB • Vertex 4 128GB SSD • 500GB
Oct 9, 2012 6:11 AM (in response to VikingOSX)
No, neither file is compressed, and I'm afraid fgrep has made no significant difference: it is still running two hours later, whereas grep under Snow Leopard took less than a minute to do the same job.
Oct 9, 2012 7:02 AM (in response to SadLion)
Is it possible to reorder the pattern file from highest to lowest probability of matching the basefile?
Mountain Lion has dropped GNU grep (as shipped in Snow Leopard) in favor of FreeBSD grep, for licensing reasons outside Apple's control. The Perl-compatible regex (-P) code that made the Snow Leopard grep so fast was the sacrificial lamb. Even though you stated you were not explicitly using the -P option in Snow Leopard, it may have been used implicitly in the pattern matching anyway.
I have spent some time searching Google for optimization techniques for FreeBSD grep and have so far found mostly rhetoric. It is possible to download the GNU grep source and build it locally with the Xcode command line tools, but it has some prerequisites and poor build documentation that make this a migraine.
If you have programming skills, perhaps you can write something in Ruby, Python, or Perl that does the pattern matching faster than FreeBSD grep.
Oct 9, 2012 7:14 AM (in response to SadLion)
Test in safe mode. Same?
Oct 9, 2012 8:02 AM (in response to VikingOSX)
No, I'm afraid there is no way of ordering the pattern file, as no pattern is intrinsically more likely to match than any other (it's biological data).
Interestingly, the pattern file is generated by running awk on a different file, also ~25,000 rows long, and that works just fine in well under a minute. I'm therefore inclined to agree that the new FreeBSD grep is the problem; it's obviously rubbish.
I have only a very basic knowledge of Terminal. Is there a way I can use awk to do the job of grep? I think I tried at the beginning and ran into problems because the patterns to be matched are so odd. Here's an example of a typical pattern:
(Note that the first quotation mark is part of the pattern to be matched.)
I would need awk to take each pattern in turn from a file with around 20,000 patterns as odd as the one above and search for it in a separate CSV file. If it found that pattern (the whole pattern, that is, not part of it) in any field of the file (unlike before, it doesn't have to be the only thing in the field; there may be more text before or after), it should print the whole row to a new file. Does anyone know how to do that?
Old grep did do that job easily in under a minute. I have a newfound respect for whoever wrote old grep.
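What is being asked for is close to what grep -F -f patternfile file.csv does: print every row that contains any pattern as a literal substring. A naive awk loop that calls index() once per pattern per row performs about 20,000 searches for every row of the CSV file. A multi-pattern automaton such as Aho-Corasick instead reads each row only once. Below is a minimal Python sketch of that technique, under these assumptions:

- There is one pattern per line, matched literally, so the leading quotation mark is just an ordinary character.
- Matching is done against the whole row rather than against parsed CSV fields, so a pattern that itself contains a comma could match across two fields.
- The script name rows_with_patterns.py and the function names are hypothetical.

```python
import sys
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton for a list of literal patterns.

    Returns (goto, fail, out): goto[s] maps a character to the next state,
    fail[s] is the failure link of state s, and out[s] is True if some
    pattern ends at s, either directly or via its failure links.
    """
    goto, fail, out = [{}], [0], [False]
    for pat in patterns:
        if not pat:
            continue  # skip blank lines (grep would match every row instead)
        state = 0
        for ch in pat:  # insert the pattern into the trie
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(False)
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state] = True
    # Breadth-first pass: a state's failure link points to the longest
    # proper suffix of its string that is also a prefix in the trie.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] or out[fail[nxt]]
    return goto, fail, out


def row_matches(automaton, row):
    """Return True if any pattern occurs in row as a literal substring."""
    goto, fail, out = automaton
    state = 0
    for ch in row:
        while state and ch not in goto[state]:
            state = fail[state]  # fall back instead of restarting the scan
        state = goto[state].get(ch, 0)
        if out[state]:
            return True
    return False


def filter_rows(patterns, rows):
    """Yield each row that contains at least one pattern (like grep -F -f)."""
    automaton = build_automaton(patterns)
    for row in rows:
        text = row.rstrip("\n")
        if row_matches(automaton, text):
            yield text + "\n"


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as pf:
            pats = [line.rstrip("\n") for line in pf]
        with open(sys.argv[2], encoding="utf-8") as df:
            sys.stdout.writelines(filter_rows(pats, df))
    else:
        print("usage: python3 rows_with_patterns.py patternfile csvfile", file=sys.stderr)
```

Run it as python3 rows_with_patterns.py patternfile data.csv > matches.csv. Unlike grep, this sketch skips blank lines in the pattern file; grep treats an empty pattern as matching every line.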
Oct 10, 2012 12:13 PM (in response to SadLion)
Short of downloading the latest stable GNU grep, compiling it, and installing it in /usr/local/bin so it doesn't step on the FreeBSD grep, there probably is no livable grep solution for you on Mountain Lion. Depending on your coding skill and energy, you may or may not get an AWK or other regular-expression solution working sooner than the option in the next paragraph.
I just revisited 10.7.5, and it has GNU grep. If you originally purchased Lion, you could redownload its installer under Mountain Lion (but not run it), write it to a USB stick, and then perform a clean 10.7.5 install on an external USB drive. Then reboot from the 10.7.5 drive and let fly with GNU grep as before. Given the amount of time FreeBSD grep is costing you, the Lion download and installation would probably finish first.
Nov 10, 2012 5:05 PM (in response to SadLion)
I had the same problem on Mountain Lion. Running grep -v -f patterns-file on my 24MB test data file took more than 10 s. When I used the grep from my Lion partition instead, it took only 0.2 s!
Nov 27, 2012 7:17 PM (in response to codinglamp)
Here is my solution: use MacPorts (https://www.macports.org) to install the GNU version of grep.
$ sw_vers -productVersion
$ sudo port install grep
$ which grep
$ grep --version
grep (GNU grep) 2.14
Copyright (C) 2012 Free Software Foundation, Inc.
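The which and grep --version steps above confirm which grep the shell actually runs. GNU grep names itself as "grep (GNU grep)" on the first line of its version output, as shown. Here is a minimal Python sketch that performs the same check, assuming Python 3.7 or later; the function names are hypothetical. If it reports /usr/bin/grep, the system copy still comes first on your PATH.

```python
import shutil
import subprocess


def is_gnu_grep_banner(version_text):
    """Return True if the first line of `grep --version` output names GNU grep."""
    first_line = version_text.splitlines()[0] if version_text else ""
    return "GNU grep" in first_line


def describe_grep():
    """Return (path, is_gnu) for the grep found first on PATH, or None."""
    path = shutil.which("grep")
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, is_gnu_grep_banner(result.stdout)


if __name__ == "__main__":
    info = describe_grep()
    if info is None:
        print("no grep found on PATH")
    else:
        grep_path, gnu = info
        print(grep_path, "is GNU grep" if gnu else "is not GNU grep")
```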
Jan 2, 2013 7:25 AM (in response to vogelw)
Thanks, Vogelw. That worked perfectly and was relatively easy.