Using Ruby, Rename PDF with text from input PDF file

I have this script running pretty well thanks to the great support on this site. It currently opens a PDF, searches for a text string, and then saves the PDF with the string as the file name.


For a different project I need to check that the string matches the original filename, so my plan is to read the string and append it to the original filename like this: Originalname_Foundstring.pdf. Then I can take over and compare the first and second parts of the filename.


This script searches for a string beginning "1000…." then saves the file with a new name based on the string.


How can I grab the input filename and then append the string like this: Originalname_Foundstring.pdf?

(I am only a beginner in AppleScript and even more clueless in Ruby.)



_main()

on _main()
    script o
        property aa : choose file with prompt ("Choose PDF Files.") of type {"com.adobe.pdf"} ¬
            default location (path to desktop) with multiple selections allowed

        set my aa's beginning to choose folder with prompt ("Choose Destination Folder.") ¬
            default location (path to desktop)

        set args to ""
        repeat with a in my aa
            set args to args & a's POSIX path's quoted form & space
        end repeat

        considering numeric strings
            if (system info)'s system version < "10.9" then
                set ruby to "/usr/bin/ruby"
            else
                set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby"
            end if
        end considering

        do shell script ruby & " <<'EOF' - " & args & "
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'

outdir = ARGV.shift.chomp('/')

ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
    url = NSURL.fileURLWithPath(f)
    doc = PDFDocument.alloc.initWithURL(url)
    path = doc.documentURL.path
    pcnt = doc.pageCount
    (0 .. (pcnt - 1)).each do |i|
        page = doc.pageAtIndex(i)
        page.string.to_s =~ /1000\\S*/
        name = $&
        unless name
            puts \"no matching string in page #{i + 1} of #{path}\"
            next # ignore this page
        end
        newname = name[1..-1]

        doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
        unless doc1.writeToFile(\"#{outdir}/#{newname}.pdf\")
            puts \"failed to save page #{i + 1} of #{path}\"
        end
    end
end
EOF"
    end script
    tell o to run
end _main

Posted on Apr 14, 2015 7:11 AM

24 replies

Dec 7, 2016 1:16 AM in response to Phillip Briggs

Hello


Or you may directly interpolate ARGV[2] into pattern like this:



page.string.to_s =~ /#{ARGV[2]}\S*/




It is safer, though, to build the pattern without string interpolation. Interpolation inserts the tainted string from the run-time argument into the pattern unescaped, so any regular expression metacharacters in it are interpreted as pattern syntax. Escaping the metacharacters first avoids that:



page.string.to_s =~ Regexp.new(Regexp.escape(ARGV[2]) + '\S*')
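
To see the difference between the two forms, here is a small sketch that runs without PDFKit. The sample text and the search string "1000." are made up for illustration; in the real script the search string would come from ARGV[2].

```ruby
# Hypothetical sample values, not taken from any real PDF.
text   = "Order 1000.25-A shipped; ref 1000x25-B pending"
needle = "1000."   # imagine this arrived as ARGV[2]

# Interpolated: the "." stays a metacharacter and matches any character.
loose = Regexp.new("#{needle}\\S*")   # equivalent to /#{needle}\S*/

# Escaped: the "." matches only a literal dot.
strict = Regexp.new(Regexp.escape(needle) + '\S*')

loose_hits  = text.scan(loose)
strict_hits = text.scan(strict)

puts loose_hits.inspect    # both "1000.25-A" and "1000x25-B"
puts strict_hits.inspect   # only "1000.25-A"
```

With a plain search string such as "1000" both forms behave the same; the escaped form only matters once the argument contains characters like ".", "(" or "[".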




All the best,

H

Dec 7, 2016 4:50 PM in response to Phillip Briggs

Hello Phillip Briggs,


You're quite welcome! 🙂


And you might try the following script, which searches the first page for the string and saves the whole document (all pages) under the new name in the specified output directory.



#!/bin/bash
#
#     for Esko Automation Engine Script Runner
#
#     $1 : input files separated by :
#     $2 : output directory
#     $3.. : additional parameters
#     exit : 0 = OK, 1 = Warning, 2 = Error
#
ruby1=/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby
ruby2=/System/Library/Frameworks/Ruby.framework/Versions/2.0/usr/bin/ruby
[[ -e $ruby2 ]] && ruby=$ruby2 || ruby=$ruby1

$ruby <<'EOF' - "$@"
#
#     ARGV[0] : input files separated by :
#     ARGV[1] : output directory
#     ARGV[2..] : additional parameters
#         ARGV[2] => search string in page 1
#
require 'osx/cocoa'
include OSX
require_framework 'PDFKit'

ret = 0
outdir = ARGV[1].chomp('/')
ARGV[0].split(':').select {|f| f =~ /\.pdf$/i }.each do |f|
    url = NSURL.fileURLWithPath(f)
    doc = PDFDocument.alloc.initWithURL(url)
    path = doc.documentURL.path
    bname = File.basename(f, File.extname(f))
    page = doc.pageAtIndex(0)
    page.string.to_s =~ Regexp.new(Regexp.escape(ARGV[2]) + '\S*')
    name = $&
    unless name
        ret = [ret, 1].max
        $stderr.puts "No matching string in page 1 of #{path}."
        next # ignore this document
    end
    newname = name[1..-1]
    path1 = "#{outdir}/#{bname}_#{newname}.pdf"
    unless doc.writeToFile(path1)
        ret = [ret, 2].max
        $stderr.puts "Failed to save as #{path1}."
    end
end
exit ret
EOF



Briefly tested under OS X 10.6.8.
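
The naming step on its own, plus the later comparison of the two filename parts, might look roughly like the sketch below. It uses only Ruby's File methods, so it can be tried without PDFKit. The helper names appended_path and name_parts_match? and the sample paths are hypothetical, and the comparison assumes the found string itself contains no underscore.

```ruby
# Hypothetical helper: builds "<outdir>/<original>_<found>.pdf".
def appended_path(input_path, outdir, found)
  # Strip the directory and the extension (whatever its case) from the input.
  bname = File.basename(input_path, File.extname(input_path))
  File.join(outdir.chomp('/'), "#{bname}_#{found}.pdf")
end

# Hypothetical helper: true when the part before the last "_" equals the
# part after it. Assumes the found string contains no underscore.
def name_parts_match?(path)
  stem = File.basename(path, File.extname(path))
  original, _sep, found = stem.rpartition('_')
  !original.empty? && original == found
end

# Made-up sample paths for illustration only.
good = appended_path('/Users/me/Desktop/100012345.PDF', '/tmp/out/', '100012345')
bad  = appended_path('/Users/me/Desktop/100012345.pdf', '/tmp/out', '100099999')

puts good                      # /tmp/out/100012345_100012345.pdf
puts name_parts_match?(good)   # true
puts name_parts_match?(bad)    # false
```

Splitting on the last underscore with rpartition keeps the check working when the original filename itself contains underscores.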


Best wishes,

Hiroto
