
Applescript: Extract data from pdf

I am trying to extract a string from a PDF and then rename the PDF with that string. The string varies in length, but always comes between "Name:" and "ID:".

Ideally I could drop a PDF with multiple pages onto the script, and it would split out the individual pages and save each one as a new document named with this string.


From another thread, I've tried using this shell script (the Citing Patent and Classifications were the delimiters):

for f in "$@"

do

echo "$f" >> ~/Desktop/Patent01.txt

cat "$f" | sed -n '/Citing Patent/,/CLASSIFICATIONS/p' | sed 's/CLASSIFICATIONS//p' >> ~/Desktop/Patent01.txt

done
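For the "Name:" / "ID:" case, the same sed idea might look like the sketch below. It assumes the PDF has already been converted to plain text and that both markers sit on the same line; `extract_name` is a hypothetical helper name, not something from the other thread.

```shell
# extract_name: read plain text on stdin and print whatever sits between
# "Name:" and "ID:" on the same line, with surrounding whitespace trimmed.
# Assumption: both markers appear on one line of already-converted text.
extract_name() {
    sed -n '/Name:.*ID:/{
        s/.*Name:[[:space:]]*//
        s/[[:space:]]*ID:.*//
        p
    }'
}
```

It would be called as `extract_name < page.txt`, where `page.txt` stands in for whatever text file the conversion step produces.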

Thanks!

iMac, OS X Yosemite (10.10.1)

Posted on Feb 24, 2015 5:46 PM

2 replies

Feb 25, 2015 2:30 PM in response to gandrew1055

Replying to self with progress, and maybe someone can help.

Drop the single-page PDF onto the script; it calls an Automator workflow that converts the PDF to plain text; AppleScript then reads the text file for the data I want; the last step is to rename the original file with the string I want (a name).


I'm getting error -10006, can't rename. Below is the script and a screenshot of the automator.



on open fileList

tell application "Finder"



--set thePDFfile to (choose file)

repeat with thePDFfile in fileList

set theInfo to info for thePDFfile

set theFile to name of theInfo


set qtdstartpath to quoted form of (POSIX path of thePDFfile)

set workflowpath to "/Users/Galen/PDFextract/NoInput.workflow"

set qtdworkflowpath to quoted form of (POSIX path of workflowpath)

set command to "/usr/bin/automator -i " & qtdstartpath & " " & qtdworkflowpath

set output to do shell script command



--do shell script "automator /Users/Galen/PDFextract/NoInput.workflow"

set AppleScript's text item delimiters to "

set thetext to text items of (read "/Users/Galen/Desktop/ExtractOutput.txt")

set studentName to item 2 of thetext

set AppleScript's text item delimiters to " "

set thetext to text items of studentName

set lastName to item 2 of thetext

set firstName to item 3 of thetext

set lastFirst to (lastName & " " & firstName)


--return lastFirst

set AppleScript's text item delimiters to ""



set the name of theFile to ((lastFirst as text) & ".pdf")

end repeat

end tell


end open
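The text-splitting part of the script above boils down to taking the second and third words of the name line. A shell sketch of the same step, assuming the line looks like `Name: Last First` (the layout is inferred from the item numbers in the script, and `last_first` is a hypothetical helper name):

```shell
# last_first: given one line such as "Name: Smith John", print the second
# and third whitespace-separated words ("Smith John").
# Assumption: word 1 is the "Name:" label, word 2 the last name, word 3 the first name.
last_first() {
    printf '%s\n' "$1" | {
        # read splits on whitespace; anything after the third word lands in _rest
        read -r _label last first _rest
        printf '%s %s\n' "$last" "$first"
    }
}
```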

[Screenshot of the Automator workflow]

Feb 25, 2015 3:37 PM in response to gandrew1055

I'm getting error -10006, can't rename. Below is the script and a screenshot of the automator.

Well, let's look at that line:


set the name of theFile to ((lastFirst as text) & ".pdf")

At face value, this should tell the Finder (yep, we checked we're in a tell block) to rename theFile to the relevant file name.


So where could this go wrong? Let's look at what theFile is...


set theFile to name of theInfo

Oops. theFile is the name of the file - 'some.pdf'. Nothing more. The Finder cannot rename a file based on the name alone - it needs a reference to the file itself, i.e. its full path (there could be hundreds of "some.pdf" files on disk). So there's your error. Looking at the script, changing this one line to:


set the name of thePDFfile to ((lastFirst as text) & ".pdf")

should take care of it.
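The same name-versus-path distinction applies if the rename is ever moved to the shell side of the script: mv needs the file's full path, and the new name has to be joined onto the original folder. A minimal sketch, assuming a hypothetical helper `rename_pdf` that receives the dropped file's POSIX path and the extracted name:

```shell
# rename_pdf: rename the PDF at full path $1 to "<new name>.pdf" in the same folder.
# $1 = full POSIX path of the original file, $2 = new base name (no extension).
rename_pdf() {
    pdf_path=$1
    new_name=$2
    # Keep the file in its original folder; only the file name changes.
    dir=$(dirname "$pdf_path")
    # -n makes mv refuse to overwrite an existing file with the same name.
    mv -n "$pdf_path" "$dir/$new_name.pdf"
}
```

From the AppleScript it could be fed `quoted form of (POSIX path of thePDFfile)` and `quoted form of lastFirst` through do shell script.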

