Find every number in a PDF?

Hey Apple Support,

I was given the task of finding every number in a 70 page PDF document. Instead of just going through manually, would there be a way of searching for ANYTHING including a number? I don't want it to find a specific number, I want it to find every single number in the document. Is there any way this is possible? You can link any sort of software that may help with this. I also do have Acrobat Pro, so that is an option for use. Let me know what you got!

MacBook Pro, OS X Mountain Lion (10.8.3)

Posted on Mar 19, 2014 12:55 PM


Mar 19, 2014 5:39 PM in response to KooKilla

This Automator Workflow will find any number (including page numbers)

It converts the PDF to a text file on the Desktop, then uses grep to write a report of only the numbers found (the "o" grep option), each prefixed with the line number where it was found (the "n" grep option). The report is saved to a file named Report.txt on the Desktop.


[Uploaded image of the workflow]



The Run Shell Script Action is:



textutil -stdout -convert txt "$1" | grep -Eno '[0-9]{1,}' > ~/Desktop/Report.txt


If you want to see the full text of each line that contains a number, rather than just the numbers, use grep -En instead.
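To make the difference between the two option sets concrete, here is a small sketch run against a made-up three-line sample. The sample text and the variable names are assumptions for illustration only; they are not taken from the PDF or from the workflow.

```shell
#!/bin/bash
# Hypothetical sample text standing in for the converted PDF.
sample='I am 20. My friend is 30.
No digits on this line.
Her husband is 40.'

# -o prints only each matched number; -n prefixes the line number.
numbers_only=$(printf '%s\n' "$sample" | grep -Eno '[0-9]{1,}')

# Without -o, grep prints the whole matching line after the line number.
full_lines=$(printf '%s\n' "$sample" | grep -En '[0-9]{1,}')

printf '%s\n' "$numbers_only"
echo "---"
printf '%s\n' "$full_lines"
```

The first report lists one "line:number" pair per match, while the second lists each matching line once, however many numbers it contains.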

Mar 20, 2014 5:11 PM in response to KooKilla

This bash script will work on the Report.txt file created with Automator:


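#!/bin/bash

# Start from an empty file so repeated runs do not append duplicates.
: > ~/Desktop/Report2.txt

# Read Report.txt line by line, preserving whitespace and backslashes.
while IFS= read -r line
do
    # Collect every run of digits on this line into an array.
    s=$( printf '%s\n' "$line" | grep -Eo '[0-9]{1,}' )
    a=($s)
    for i in "${a[@]}"
    do
        # Write one "number: line" entry per number found.
        printf "%s: %s\n" "$i" "$line" >> ~/Desktop/Report2.txt
    done
done < ~/Desktop/Report.txt
# Sort numerically so that, for example, 100 comes after 20.
sort -n ~/Desktop/Report2.txt > ~/Desktop/ReportSorted.txt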


This can easily be incorporated into the original Automator Workflow.

Just replace the Run Shell Script Action with:


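# Keep only the lines that contain at least one number.
textutil -stdout -convert txt "$1" | grep -E '[0-9]{1,}' > ~/Desktop/Report.txt
while IFS= read -r line
do
    # Collect every run of digits on this line into an array.
    s=$( printf '%s\n' "$line" | grep -Eo '[0-9]{1,}' )
    a=($s)
    for i in "${a[@]}"
    do
        # Write one "number: line" entry per number found.
        printf "%s: %s\n" "$i" "$line" >> ~/Desktop/Report2.txt
    done
done < ~/Desktop/Report.txt
# Sort numerically so that, for example, 100 comes after 20.
sort -n ~/Desktop/Report2.txt > ~/Desktop/ReportSorted.txt
# Remove the input text file and the intermediate reports.
rm "$1" ~/Desktop/Report.txt ~/Desktop/Report2.txt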


The Workflow is:


[Two uploaded images of the workflow]


So, if the PDF is:


"I am 20. My friend is 30. Her husband is 40."

"I am 30. My friend is 40. Her husband is 50."


ReportSorted.txt will be:


20: "I am 20. My friend is 30. Her husband is 40."

30: "I am 20. My friend is 30. Her husband is 40."

30: "I am 30. My friend is 40. Her husband is 50."

40: "I am 20. My friend is 30. Her husband is 40."

40: "I am 30. My friend is 40. Her husband is 50."

50: "I am 30. My friend is 40. Her husband is 50."
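For getting this report into Excel or Numbers, one option is to turn each "number: line" entry into a tab-separated row, since most spreadsheet applications can open tab-separated text. The sketch below is only an illustration: `to_tsv` is a hypothetical helper name, and the two report lines are made up.

```shell
#!/bin/bash
# to_tsv is a hypothetical helper: it reads "number: text" lines on stdin
# and writes "number<TAB>text" lines on stdout.
to_tsv() {
    awk '{
        sep = index($0, ": ")   # position of the first ": " on the line
        if (sep > 0) {
            print substr($0, 1, sep - 1) "\t" substr($0, sep + 2)
        }
    }'
}

# Example with two made-up report lines.
printf '%s\n' '20: "I am 20. My friend is 30."' '30: "I am 20. My friend is 30."' | to_tsv
```

With the real report you would run something like `to_tsv < ~/Desktop/ReportSorted.txt > ~/Desktop/ReportSorted.tsv` (the .tsv file name is an assumption) and open the result in the spreadsheet, where the first column holds the number and the second the matching line.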

Mar 20, 2014 5:56 AM in response to KooKilla

Another approach.


Using Adobe Reader (and I imagine it's true for Acrobat Pro as well):


Next to the Find field, click the dropdown triangle, then select the Open Full Reader (or perhaps Acrobat?) Search option. You'll see a new search window.


In that window, click Use Advanced Search Options. You'll see additional options pop up. Set options as follows:


Look in: The Current Document

What word or phrase . . . ? 0 1 2 3 4 5 6 7 8 9

Return results containing: Match Any of the words


Click Search to find all the numbers.

Mar 20, 2014 11:38 AM in response to Tony T1

Beginners are often unaware of the power of many built-in features in OS X that are automatically inherited by Cocoa apps. Unfortunately, they often think third-party apps with blatant but over-complicated UIs are more powerful.


In part, this is due to Apple's unfortunate reluctance to provide "in your face" user documentation (after all, they claim it should all be 'intuitive') that would help you discover these things for yourself; and in part, it's due to their most admirable design choice to keep UIs as minimalist as possible.


You win some, you lose some. 😉

Mar 20, 2014 12:03 PM in response to KooKilla

KooKilla wrote:


Wow, the Automator workflow worked great! Would there be any way that I can organize this stuff now using Excel or Numbers or something else to create a chart? Something that will take anything that has, say, the number 16 and put it into a 16 category with all other lines that include that number?


Possible, but I need an example of what your output looks like (redact any sensitive data), and an example of how you would like it organized in Excel.


(Also, keep in mind that the line numbers listed in the extracted text probably will not match the line numbers in the original PDF.)

Mar 20, 2014 12:07 PM in response to Tony T1

I didn't use the line number options, so that's fine. An example of the data I might have is something like "Brain Development: Research shows that the portion of the brain that assesses risk and danger does not fully develop until the mid 20’s". I would want this to be put under a "20" category with all the other points that mention something about 20.

Mar 20, 2014 12:21 PM in response to KooKilla

If you want to do something like this, you're going to have to plump for a particular application and a particular format, as the specific scripting answer is going to depend on those.


Preview is, alas, not highly scriptable, and the only way you're going to get anywhere close to what you're trying to do with that app is via some dodgy and unpredictable GUI scripting. There used to be (10.6 or so) a highly "applescriptable" PDF app called 'Skim' that would have made this easier, but I'm not sure what its development status is now. I have no knowledge of scripting Adobe Acrobat, I'm afraid.


Ideally, if possible, you'd extract this data out of the PDF into RTF or, even better, plain text format. If that were possible, then sorting your data becomes a fairly standard exercise. Whether you can either find a scriptable PDF app or get the data into a more manageable format is what really determines the next move.

Mar 20, 2014 12:33 PM in response to KooKilla

KooKilla wrote:


I didn't use the line number options, so that's fine. An example of the data I might have is something like "Brain Development: Research shows that the portion of the brain that assesses risk and danger does not fully develop until the mid 20’s". I would want this to be put under a "20" category with all the other points that mention something about 20.


If you have this in plain text, then you're 90% of the way there. Let's assume you have something like


"I am 20. My friend is 30. Her husband is 40."


With this kind of data you can search by number and end up with:


10:

20: I am 20.

30: My friend is 30.

40: Her husband is 40.

50:



Is this what we're talking about? The more specific you can be the more tailored an answer we can provide.
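A rough sketch of the per-sentence grouping shown above, assuming the text is already plain text and that sentences are separated by a period followed by a space. `group_by_sentence` is a hypothetical name, and the splitting rule is a simplification that will misfire on abbreviations or decimals.

```shell
#!/bin/bash
# group_by_sentence is a hypothetical helper: it reads plain text on stdin
# and writes one "number: sentence" line per number found, sorted by number.
group_by_sentence() {
    awk '{
        # Split the line into sentences (assumes ". " separates them).
        count = split($0, sentences, /\. /)
        for (k = 1; k <= count; k++) {
            s = sentences[k]
            # Restore the period that split() consumed, except on the last piece.
            if (k < count) s = s "."
            rest = s
            # Emit "number: sentence" for every run of digits in the sentence.
            while (match(rest, /[0-9]+/)) {
                print substr(rest, RSTART, RLENGTH) ": " s
                rest = substr(rest, RSTART + RLENGTH)
            }
        }
    }' | sort -n
}

# Example using the sample sentence from this thread.
printf '%s\n' 'I am 20. My friend is 30. Her husband is 40.' | group_by_sentence
```

Unlike the sample output above, this sketch only lists numbers that actually occur, so empty categories such as 10 and 50 would not appear.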

Mar 20, 2014 12:36 PM in response to Phil Stokes

Yeah, something like this will work. I just need to get the numbers in the same group so I can put them into a table. If I can do something like you said, then it's as easy as copying and pasting. Or it can also be like...


"I am 20. My friend is 30. Her husband is 40."


10:

20: "I am 20. My friend is 30. Her husband is 40."

30: "I am 20. My friend is 30. Her husband is 40."

40: "I am 20. My friend is 30. Her husband is 40."

50:


Where it will include the whole line and any other numbers. I just need to get the numbers and their text together in the same area so I can create a table.

Mar 20, 2014 5:10 PM in response to KooKilla

KooKilla wrote:


Yeah, something like this will work. I just need to get the numbers in the same group so I can put them into a table. If I can do something like you said, then it's as easy as copying and pasting. Or it can also be like...


"I am 20. My friend is 30. Her husband is 40."


10:

20: "I am 20. My friend is 30. Her husband is 40."

30: "I am 20. My friend is 30. Her husband is 40."

40: "I am 20. My friend is 30. Her husband is 40."

50:


Where it will include the whole line and any other numbers. I just need to get the numbers and their text together in the same area so I can create a table.


Got it. You just need to make a change to the Run Shell Script Action in the Automator Workflow.

See the post dated Mar 20, 2014 5:11 PM.
