Need help with applescript for folder action

Question

Level 1

8 points


As a non-programmer, I am trying to cobble together an AppleScript that will monitor a folder for any new PDF files and then convert them to TIFF format. Using another AppleScript, I will convert the TIFF back to a flat PDF and then run OCR on the PDF file to make it searchable. I have all the components of this workflow working except the AppleScript for PDF to TIFF conversion. I know there are other ways to create the TIFF from the PDF, but with those I have had problems getting all pages of the PDF included in the TIFF file. So, here is what I need help with.

The following script works as desired. It opens the PDF file in Preview, duplicates the PDF, and saves the duplicate in TIFF format. The script then deletes the original PDF file. It is perfect, except that I have to run the script manually every time I want to convert a PDF file.

set this_file to choose file without invisibles

tell application "Preview"
  activate
  open this_file
end tell

tell application "System Events" to tell process "Preview"
  click menu item 9 of menu 1 of menu bar item "File" of menu bar 1
  tell sheet 1 of window 1
    click pop up button 1 of group 1
    click menu item "TIFF" of menu of pop up button 1 of group 1
    delay 0.5
    click button "Save"
    delay 0.5
  end tell
  click button 1 of window 1
end tell

--delete the original PDF file
tell application "Finder"
  delete this_file
end tell

So, then I decided to try to create a folder action script to monitor a folder and convert any added PDF file to TIFF format and then delete the original PDF file. Well, the following script works . . . sort of. It opens the PDF file in Preview. Preview duplicates the PDF file and saves the duplicate as a TIFF file. At this point things go haywire. Instead of closing Preview and deleting the original PDF, the following happens. Preview opens the original PDF file again and attempts to duplicate and convert the file again. The script hangs here with an open dialogue window requiring my intervention and the original PDF file is not deleted. Here is the buggy script:

on adding folder items to this_folder after receiving filelist
  tell application "System Events"
    set these_files to (every file in this_folder whose file type is "PDF" or name extension is "pdf")
  end tell

  repeat with i from 1 to the count of these_files
    set this_file to POSIX path of (item i of these_files as alias)

    tell application "Preview"
      activate
      open this_file
    end tell

    tell application "System Events" to tell process "Preview"
      click menu item 9 of menu 1 of menu bar item "File" of menu bar 1
      tell sheet 1 of window 1
        click pop up button 1 of group 1
        click menu item "TIFF" of menu of pop up button 1 of group 1
        delay 0.5
        click button "Save"
        delay 0.5
      end tell
      --Closes Preview of new TIFF file
      click button 1 of window 1
    end tell

    tell application "Finder"
      delete this_file
    end tell
  end repeat
end adding folder items to

Any help fixing this second applescript would be greatly appreciated.

Mac mini (Late 2012), iOS 10.3.2, Cinema Display

Posted on Jul 2, 2017 11:45 PM


Answer 1

VikingOSX

Level 10

123,470 points

Jul 21, 2017 9:27 AM in response to Hiroto

Hi Hiroto,

On OS X 10.11.6, and the standard System Python 2.7.10, the Python script (with your odir updates) gets an AttributeError as follows. It makes no difference if I use full, explicit paths in argv.

odin: ~/Desktop$ pdf2tiffh.py pythonbrew275.pdf Test
Traceback (most recent call last):
  File "./pdf2tiffh.py", line 167, in <module>
    main()
  File "./pdf2tiffh.py", line 87, in main
    idst = CGIO.CGImageDestinationCreateWithData(data, 'public.tiff', pcnt, None)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/PyObjC/objc/_lazyimport.py", line 143, in __getattr__
    raise AttributeError(name)
AttributeError: CGImageDestinationCreateWithData

Here is the entire __getattr__ function from _lazyimport.py:


Answer 2

Hiroto

Level 5

7,467 points

Jul 31, 2017 1:13 PM in response to VikingOSX

Hello

Ah, I had not noticed the transparent background issue in my brief tests with limited samples. Here's revised code that fills the canvas with opaque white before drawing the PDF content.

#!/usr/bin/python2.6
# coding: utf-8
#
#   file:
#       pdf2tiff.py
#
#   function:
#       convert pdf to multi-page tiff file
#
#   usage:
#       ./pdf2tiff.py pdf [pdf ...] [outdir]
#
#       pdf    : input pdf file
#       outdir : output directory
#
#   * Input file without name extension as .pdf is ignored.
#   * If outdir directory is specified and present, tiff for each input file is saved in the directory
#     with the same basename without extension followed by '.tiff'; e.g., src/a.pdf => outdir/a.tiff.
#     Otherwise, tiff is saved in the same directory as the original with new file name as
#     basename without extension followed by .tiff; e.g., src/a.pdf => src/a.tiff.
#   * Resolution of output tiff is defined by DPI constant, currently DPI = 150.0.
#   * If PDF_DISPOSITION == 1, source pdf is removed when its tiff version is created successfully.
#   * If LOGGING == 1, each conversion is logged to stdout.
#
#   version:
#       0.12
#           - invoking /usr/bin/python2.6 in order to avoid lazy importer error of pyobjc bundled with python 2.7
#           - added code to fill CGContext with white background before drawing PDF page
#
#       0.11c
#           - using CGImageDestinationCreateWithData() in lieu of CGImageDestinationCreateWithURL() and
#             invoking writeToURL:options:error: method on CFData so as to treat certain errors properly.
#
#       0.11a
#           - added code to handle source file disposition
#           - added code to log each conversion
#       0.11
#           - increased pyobjc performance (for recent versions of pyobjc) by using:
#               - import Quartz.CoreGraphics as CG
#               - import Quartz.ImageIO as CGIO
#       0.10
#           - draft
#
#   written by Hiroto, 2017-07
#
import sys, os
import re, math, time
import Quartz.CoreGraphics as CG
import Quartz.ImageIO as CGIO

def usage():
    sys.stderr.write('Usage: %s pdf [pdf...] [outdir]\n' % os.path.basename(sys.argv[0]))
    sys.exit(2)

def tstamp():
    return time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime())

def main():
    uargv = [ a.decode('utf-8') for a in sys.argv ]
    odir = uargv.pop(-1).rstrip('/') if os.path.isdir(uargv[-1]) else None
    if len(uargv) < 2: usage()

    err = 0
    cspace = CG.CGColorSpaceCreateWithName(CG.kCGColorSpaceAdobeRGB1998)
    blanc1 = CG.CGColorCreate(cspace, [1.0, 1.0, 1.0, 1.0])
    for f in [ a for a in uargv[1:] if re.search(r'\.pdf$', a, re.I | re.U) ]:
        url = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f, CG.kCFURLPOSIXPathStyle, False)
        pdf = CG.CGPDFDocumentCreateWithURL(url)
        if not pdf:
            err += 1
            sys.stderr.write('%-24s Not a pdf file: %s\n' % (tstamp(), f.encode('utf-8')))
            continue
        pcnt = CG.CGPDFDocumentGetNumberOfPages(pdf)
        if pcnt < 1: continue  # ignore blank pdf
        n = os.path.basename(f)
        m, ext = os.path.splitext(n)
        if not odir:
            odir = os.path.dirname(f)
        if odir != '':
            f1 = '%s/%s.tiff' % (odir, m)
        else:
            f1 = '%s.tiff' % (m)
        data = CG.CFDataCreateMutable(CG.kCFAllocatorDefault, 0)
        idst = CGIO.CGImageDestinationCreateWithData(data, 'public.tiff', pcnt, None)
        err1 = 0
        for i in range(0, pcnt):
            page = CG.CGPDFDocumentGetPage(pdf, i + 1)
            if not page:
                err += 1
                err1 += 1
                sys.stderr.write('%-24s Could not get page %d of %s\n' % (tstamp(), i + 1, f.encode('utf-8')))
                continue
            scale = DPI / 72.0
            r = CG.CGPDFPageGetBoxRect(page, CG.kCGPDFMediaBox)
            w = math.ceil(r.size.width * scale)
            h = math.ceil(r.size.height * scale)
            ctx = CG.CGBitmapContextCreate(
                None, w, h, 8, 0, cspace, CG.kCGImageAlphaPremultipliedLast)
            CG.CGContextSaveGState(ctx)
            # fill white
            CG.CGContextSetFillColorWithColor(ctx, blanc1)
            CG.CGContextFillRect(ctx, CG.CGRectMake(0.0, 0.0, w, h))
            # draw pdf page
            CG.CGContextScaleCTM(ctx, scale, scale)
            CG.CGContextDrawPDFPage(ctx, page)
            CG.CGContextRestoreGState(ctx)
            cgi = CG.CGBitmapContextCreateImage(ctx)
            CGIO.CGImageDestinationAddImage(idst, cgi,
                {
                    CGIO.kCGImagePropertyDPIHeight : DPI,
                    CGIO.kCGImagePropertyDPIWidth : DPI,
                    CGIO.kCGImagePropertyTIFFDictionary :
                        {
                            CGIO.kCGImagePropertyTIFFCompression : 5
                                # kCGImagePropertyTIFFCompression
                                #   1     : no compression
                                #   5     : LZW
                                #   32773 : PackBits
                        }
                })
            del ctx
            del cgi
        b = CGIO.CGImageDestinationFinalize(idst)
        del idst
        del pdf
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to finalize destination data: %s\n' % (tstamp(), f1.encode('utf-8')))
            del data
            continue
        url1 = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f1, CG.kCFURLPOSIXPathStyle, False)
        b, e = data.writeToURL_options_error_(url1, 0, None)
        del data
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to write destination file: %s: %s\n' % (tstamp(), f1.encode('utf-8'), e.description().encode('utf-8')))
            continue
        if err1 == 0 and PDF_DISPOSITION == 1:
            # source file disposition
            try:
                os.remove(f)
            except Exception as e:
                sys.stderr.write('%-24s Failed to delete source file: %s: %s\n' % (tstamp(), f.encode('utf-8'), e.__repr__().encode('utf-8')))
        if LOGGING == 1:
            sys.stdout.write('%-24s Completed conversion: %s => %s\n' % (tstamp(), f.encode('utf-8'), f1.encode('utf-8')))
    sys.exit(1 if err > 0 else 0)

DPI = 150.0
PDF_DISPOSITION = 0
    # PDF_DISPOSITION
    #   0 : leave pdf alone
    #   1 : remove pdf if tiff is created successfully
LOGGING = 1
    # LOGGING
    #   0 : without logging
    #   1 : with logging
main()

---

And here's its AppleScript wrapper if it helps the original poster.

--APPLESCRIPT
on run
  choose file of type {"pdf"} with multiple selections allowed
  pdf2tiff(result)
end run

on open aa
  repeat with a in aa
    if a's POSIX path ends with ".pdf" then
      set a's contents to a as alias
    else
      set a's contents to false
    end if
  end repeat
  if (count aa's aliases) > 0 then pdf2tiff(aa's aliases)
end open

on adding folder items to d after receiving aa
  repeat with a in aa
    if a's POSIX path ends with ".pdf" then
      set a's contents to a as alias
    else
      set a's contents to false
    end if
  end repeat
  if (count aa's aliases) > 0 then pdf2tiff(aa's aliases & (d's POSIX path & "tiff"))
end adding folder items to

on pdf2tiff(argv)
  (*
    list argv : list of alias or POSIX path of pdf files, optionally last item as output directory
  *)
  set args to ""
  repeat with a in argv
    if a's class is in {alias, «class bmrk»} then set a to a's POSIX path
    set args to args & a's quoted form & space
  end repeat
  do shell script "/bin/bash -s <<'EOF' - " & args & "
LOG=${1%/*}/_log.txt
exec >> \"$LOG\" 2>&1
/usr/bin/python2.6 <<'END' - \"$@\"; exit 0
# coding: utf-8
#
#   file:
#       pdf2tiff.py
#
#   function:
#       convert pdf to multi-page tiff file
#
#   usage:
#       ./pdf2tiff.py pdf [pdf ...] [outdir]
#
#       pdf    : input pdf file
#       outdir : output directory
#
#   * Input file without name extension as .pdf is ignored.
#   * If outdir directory is specified and present, tiff for each input file is saved in the directory
#     with the same basename without extension followed by '.tiff'; e.g., src/a.pdf => outdir/a.tiff.
#     Otherwise, tiff is saved in the same directory as the original with new file name as
#     basename without extension followed by .tiff; e.g., src/a.pdf => src/a.tiff.
#   * Resolution of output tiff is defined by DPI constant, currently DPI = 150.0.
#   * If PDF_DISPOSITION == 1, source pdf is removed when its tiff version is created successfully.
#   * If LOGGING == 1, each conversion is logged to stdout.
#
#   version:
#       0.12
#           - invoking /usr/bin/python2.6 in order to avoid lazy importer error of pyobjc bundled with python 2.7
#           - added code to fill CGContext with white background before drawing PDF page
#
#       0.11c
#           - using CGImageDestinationCreateWithData() in lieu of CGImageDestinationCreateWithURL() and
#             invoking writeToURL:options:error: method on CFData so as to treat certain errors properly.
#
#       0.11a
#           - added code to handle source file disposition
#           - added code to log each conversion
#       0.11
#           - increased pyobjc performance (for recent versions of pyobjc) by using:
#               - import Quartz.CoreGraphics as CG
#               - import Quartz.ImageIO as CGIO
#       0.10
#           - draft
#
#   written by Hiroto, 2017-07
#
import sys, os
import re, math, time
import Quartz.CoreGraphics as CG
import Quartz.ImageIO as CGIO

def usage():
    sys.stderr.write('Usage: %s pdf [pdf...] [outdir]\\n' % os.path.basename(sys.argv[0]))
    sys.exit(2)

def tstamp():
    return time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime())

def main():
    uargv = [ a.decode('utf-8') for a in sys.argv ]
    odir = uargv.pop(-1).rstrip('/') if os.path.isdir(uargv[-1]) else None
    if len(uargv) < 2: usage()

    err = 0
    cspace = CG.CGColorSpaceCreateWithName(CG.kCGColorSpaceAdobeRGB1998)
    blanc1 = CG.CGColorCreate(cspace, [1.0, 1.0, 1.0, 1.0])
    for f in [ a for a in uargv[1:] if re.search(r'\\.pdf$', a, re.I | re.U) ]:
        url = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f, CG.kCFURLPOSIXPathStyle, False)
        pdf = CG.CGPDFDocumentCreateWithURL(url)
        if not pdf:
            err += 1
            sys.stderr.write('%-24s Not a pdf file: %s\\n' % (tstamp(), f.encode('utf-8')))
            continue
        pcnt = CG.CGPDFDocumentGetNumberOfPages(pdf)
        if pcnt < 1: continue  # ignore blank pdf
        n = os.path.basename(f)
        m, ext = os.path.splitext(n)
        if not odir:
            odir = os.path.dirname(f)
        if odir != '':
            f1 = '%s/%s.tiff' % (odir, m)
        else:
            f1 = '%s.tiff' % (m)
        data = CG.CFDataCreateMutable(CG.kCFAllocatorDefault, 0)
        idst = CGIO.CGImageDestinationCreateWithData(data, 'public.tiff', pcnt, None)
        err1 = 0
        for i in range(0, pcnt):
            page = CG.CGPDFDocumentGetPage(pdf, i + 1)
            if not page:
                err += 1
                err1 += 1
                sys.stderr.write('%-24s Could not get page %d of %s\\n' % (tstamp(), i + 1, f.encode('utf-8')))
                continue
            scale = DPI / 72.0
            r = CG.CGPDFPageGetBoxRect(page, CG.kCGPDFMediaBox)
            w = math.ceil(r.size.width * scale)
            h = math.ceil(r.size.height * scale)
            ctx = CG.CGBitmapContextCreate(
                None, w, h, 8, 0, cspace, CG.kCGImageAlphaPremultipliedLast)
            CG.CGContextSaveGState(ctx)
            # fill white
            CG.CGContextSetFillColorWithColor(ctx, blanc1)
            CG.CGContextFillRect(ctx, CG.CGRectMake(0.0, 0.0, w, h))
            # draw pdf page
            CG.CGContextScaleCTM(ctx, scale, scale)
            CG.CGContextDrawPDFPage(ctx, page)
            CG.CGContextRestoreGState(ctx)
            cgi = CG.CGBitmapContextCreateImage(ctx)
            CGIO.CGImageDestinationAddImage(idst, cgi,
                {
                    CGIO.kCGImagePropertyDPIHeight : DPI,
                    CGIO.kCGImagePropertyDPIWidth : DPI,
                    CGIO.kCGImagePropertyTIFFDictionary :
                        {
                            CGIO.kCGImagePropertyTIFFCompression : 5
                                # kCGImagePropertyTIFFCompression
                                #   1     : no compression
                                #   5     : LZW
                                #   32773 : PackBits
                        }
                })
            del ctx
            del cgi
        b = CGIO.CGImageDestinationFinalize(idst)
        del idst
        del pdf
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to finalize destination data: %s\\n' % (tstamp(), f1.encode('utf-8')))
            del data
            continue
        url1 = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f1, CG.kCFURLPOSIXPathStyle, False)
        b, e = data.writeToURL_options_error_(url1, 0, None)
        del data
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to write destination file: %s: %s\\n' % (tstamp(), f1.encode('utf-8'), e.description().encode('utf-8')))
            continue
        if err1 == 0 and PDF_DISPOSITION == 1:
            # source file disposition
            try:
                os.remove(f)
            except Exception as e:
                sys.stderr.write('%-24s Failed to delete source file: %s: %s\\n' % (tstamp(), f.encode('utf-8'), e.__repr__().encode('utf-8')))
        if LOGGING == 1:
            sys.stdout.write('%-24s Completed conversion: %s => %s\\n' % (tstamp(), f.encode('utf-8'), f1.encode('utf-8')))
    sys.exit(1 if err > 0 else 0)

DPI = 150.0
PDF_DISPOSITION = 1
    # PDF_DISPOSITION
    #   0 : leave pdf alone
    #   1 : remove pdf if tiff is created successfully
LOGGING = 1
    # LOGGING
    #   0 : without logging
    #   1 : with logging
main()
END
EOF"
end pdf2tiff
--END OF APPLESCRIPT

All the best,

Hiroto

PS. As for system's python version under 10.13, I found this:

https://forums.developer.apple.com/thread/78891


Answer 3

red_menace

Level 6

17,066 points

Jul 3, 2017 3:51 AM in response to castrohouse

First, adding to or renaming files in a watched folder will result in the folder action being triggered again (which is your main problem), so you don't want to do your processing in the watched folder. In the script below, items are moved to a processing folder in the watched folder to prevent triggering the folder action again - change the folderName property to whatever you like, but the folder needs to already be in the watched folder. Also, any items dropped into a watched folder are already passed to the folder action, so there is no need to go through all the other files in the folder.
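The move-before-processing idea above is not specific to AppleScript. As a rough illustration only, here is a minimal Python sketch of the same staging step: it looks only at the items handed to the handler, keeps the PDFs, and moves them into a processing subfolder that must already exist. The function name `stage_pdfs` and its parameters are hypothetical and not part of any script in this thread; the default subfolder name is borrowed from the folderName property mentioned above.

```python
import shutil
from pathlib import Path


def stage_pdfs(added_items, watched_folder, processing_name=" Processed Items"):
    """Move newly added PDFs into a processing subfolder; return their new paths.

    Only the items passed in are examined, so files that were already sitting
    in the watched folder are never picked up a second time.
    """
    processing = Path(watched_folder) / processing_name
    if not processing.is_dir():
        # Like the folder action described above, this sketch expects the
        # processing folder to exist already.
        raise FileNotFoundError("processing folder missing: %s" % processing)

    staged = []
    for item in map(Path, added_items):
        # Skip anything that is not a regular file with a .pdf extension.
        if item.is_file() and item.suffix.lower() == ".pdf":
            target = processing / item.name
            staged.append(Path(shutil.move(str(item), str(target))))
    return staged
```

Conversion would then run against the returned paths, so the output is written into the subfolder rather than directly into the watched folder.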

For this kind of thing I start with a template that has run and open handlers in it, so that the script can be run as an applet/droplet/folder action or from the Script Editor for testing. The final result would be something like:

#
# Convert a PDF file to TIFF.
# The converted file will be in the same location as the original - if a folder action, the
# converted file will be in a process folder in the watched folder, which must already exist.
# The original PDF file(s) are moved to the trash.
#

property folderName : " Processed Items" -- the name of the processing folder (leading space to sort to the top in list/column view)
global processFolder -- this will be the path to the processing folder in a folder action

on run -- application double-clicked or script run from the Script Editor
  set processFolder to missing value -- process in place
  process(choose file without invisibles)
end run

on open theItems -- droplet
  set processFolder to missing value -- process in place
  repeat with anItem in theItems
    process(anItem)
  end repeat
end open

on adding folder items to this_folder after receiving filelist -- Folder Action
  set processFolder to ((this_folder as text) & folderName) as alias -- move to another folder for processing
  repeat with anItem in filelist
    process(anItem)
  end repeat
end adding folder items to

to process(theFile) -- check if theFile is PDF and convert
  tell application "System Events"
    set theType to theFile's file type
    set theExtension to theFile's name extension
  end tell
  if theType is "PDF" or theExtension is "pdf" then
    if processFolder is not missing value then -- move out of any FA folder
      tell application "Finder" to move theFile to processFolder
    end if
    tell application "Preview"
      activate
      open theFile
    end tell
    convert(theFile)
  end if
end process

to convert(this_file)
  tell application "System Events" to tell process "Preview"
    click menu item 9 of menu 1 of menu bar item "File" of menu bar 1

    tell sheet 1 of window 1
      click pop up button 1 of group 1
      click menu item "TIFF" of menu of pop up button 1 of group 1
      delay 0.5

      click button "Save"
      delay 0.5
    end tell

    click button 1 of window 1
  end tell

  --delete the original PDF file
  tell application "Finder" to delete this_file

end convert


Answer 4

VikingOSX

Level 10

123,470 points

Jul 3, 2017 9:39 AM in response to red_menace

Could not one use the following in a do shell script directly on the dropped PDF to generate a TIFF image at 300 dpi? This will ignore any PDF pages after the first page. I am concerned about Apple's history of changing application UI elements that unexpectedly break GUI scripting.

sips -s format TIFF -s formatOptions best -s dpiHeight 300.0 -s dpiWidth 300.0 --out page1.tiff page1.pdf

If ghostscript were installed with a package manager (e.g. homebrew), then the following command would extract each sequentially numbered PDF page as a 300 dpi TIFF image in the current directory's Extracted folder. If one omitted the %03d formatting, then a multi-page TIFF is produced from a multi-page PDF.

gs -q -dNOPAUSE -dSAFER -dBATCH -sDEVICE=tiff24nc -r300 -dTextAlphaBits=4 -dGraphicsAlphaBits=4 \
-sOutputFile=./Extracted/foo_%03d.tiff foo.pdf

Output in Extracted folder for a 3-page PDF: foo_001.tiff, foo_002.tiff, foo_003.tiff.

Once one has OCR'd the tiff images back to PDF, one can use Ghostscript to merge all of those sequentially numbered PDF files back into a single PDF v1.4 document. Or Automator without Ghostscript.

gs -q -dBATCH -dSAFER -dNOPAUSE -dDetectDuplicateImages -sDEVICE=pdfwrite \
-dCompressFonts=true -dCompatibilityLevel=1.4 -dPDFSETTINGS=/prepress -sOutputFile=out.pdf *.pdf
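If the Ghostscript route is driven from a script rather than typed by hand, building the argument list in one place avoids shell quoting problems with file names that contain spaces. The following is only a sketch under assumptions: it assumes a `gs` binary is on the PATH, the helper names `gs_pdf_to_tiff_args` and `run_gs` are hypothetical, and the flags are the ones from the extraction command above (with the device selected via `-sDEVICE`, which is how Ghostscript documents that option).

```python
import subprocess


def gs_pdf_to_tiff_args(pdf_path, out_pattern, dpi=300):
    """Build the Ghostscript argument list for PDF to TIFF conversion.

    An out_pattern containing %03d yields one numbered TIFF per page; per the
    post above, omitting it yields a single multi-page TIFF.
    """
    return [
        "gs", "-q", "-dNOPAUSE", "-dSAFER", "-dBATCH",
        "-sDEVICE=tiff24nc",
        "-r%d" % dpi,
        "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",
        "-sOutputFile=%s" % out_pattern,
        pdf_path,
    ]


def run_gs(args):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(args, check=True)
```

For example, `run_gs(gs_pdf_to_tiff_args("foo.pdf", "./Extracted/foo_%03d.tiff"))` would correspond to the extraction command above, provided the Extracted folder exists.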


Answer 5

castrohouse Author

Level 1

8 points

Jul 3, 2017 10:58 PM in response to red_menace

Thanks red_menace. Starting with a template that has run and open handlers, so the script can be run as an applet/droplet/folder action or from the Script Editor for testing, is a great idea. I tested your script as a folder action and was unable to get it to work properly. It moves the PDF file from the source folder to the Processed Items folder without generating the TIFF file. Any thoughts?


Answer 6

red_menace

Level 6

17,066 points

Jul 4, 2017 10:57 AM in response to castrohouse

Sorry about that, there seems to be some weirdness with the file path between Finder and Preview. Folder actions fail silently, so I've added some error handling to narrow down the problem if something else comes up (I've also made the GUI scripting a little more robust). Don't forget to add the various applications to System Preferences > Security & Privacy > Accessibility, and if you are running the script as an application, it will also need to be code signed or made run-only.

If making individual pages will work, in addition to VikingOSX's solution, you can also do something similar with Automator.

#
# Convert a PDF file to TIFF.
# The converted file will be in the same location as the original - if a folder action, the
# converted file will be in a process folder in the watched folder, which must already exist.
# The original PDF file(s) are moved to the trash (unless there is an error).
#

property folderName : " Processed Items" -- the name of the processing folder (leading space to sort to the top in list/column view)
global processFolder -- this will be the path to the processing folder in a folder action

on run -- application double-clicked or script run from the Script Editor
  set processFolder to missing value -- process in place
  process(choose file without invisibles)
end run

on open theItems -- droplet
  set processFolder to missing value -- process in place
  repeat with anItem in theItems
    process(anItem)
  end repeat
end open

on adding folder items to this_folder after receiving filelist -- Folder Action
  set processFolder to ((this_folder as text) & folderName) as alias -- use the specified folder for processing
  repeat with anItem in filelist
    process(anItem)
  end repeat
end adding folder items to

to process(theFile) -- check if theFile is PDF and convert
  tell application "System Events"
    set theName to name of theFile
    set theType to theFile's file type
    set theExtension to theFile's name extension
  end tell
  try
    if theType is "PDF" or theExtension is "pdf" then
      if processFolder is not missing value then -- move out of any FA folder
        tell application "Finder" to move theFile to processFolder
        set theFile to the result as alias -- not sure why this is needed to make the folder action work
        delay 0.25
      end if
      tell application "Preview"
        activate
        open theFile
      end tell
      convert(theFile)
    end if
  on error errmess
    display alert "Error with processing item" message theName & ": " & return & return & errmess
  end try
end process

to convert(this_file)
  tell application "System Events" to set theName to name of this_file
  try
    tell application "System Events" to tell process "Preview"
      repeat until window 1 exists
        delay 0.25
      end repeat
      click menu item 9 of menu 1 of menu bar item "File" of menu bar 1

      repeat until sheet 1 of window 1 exists
        delay 0.25
      end repeat
      tell sheet 1 of window 1
        click pop up button 1 of group 1
        click menu item "TIFF" of menu of pop up button 1 of group 1
        delay 0.5

        click button "Save"
        delay 0.5
      end tell

      click button 1 of window 1
    end tell

    --delete the original PDF file (unless there is an error)
    tell application "Finder" to delete this_file

  on error errmess
    display alert "Error with converting file" message theName & return & return & errmess
  end try
end convert


Answer 7

castrohouse Author

Level 1

8 points

Jul 4, 2017 8:41 PM in response to red_menace

Red_Menace, this script now works as a folder action if there is only one PDF in the folder to be processed and the PDF has only a few pages. I did encounter several unexpected issues. First, if a file with the same name is already present, the script stops and requires my input to cancel or replace the existing file. Second, if multiple PDF files are added to the watched folder, the script cycles too quickly, opening multiple instances of the Finder application. The Finder application eventually hangs with a spinning wheel. Third, if a PDF with many pages is added to the watched folder, the script cycles and attempts to reopen the same file multiple times while the first iteration is still processing the pages of the PDF. These issues may require more effort than your time allows. But if time does allow, your help is very much appreciated!


Answer 8

red_menace

Level 6

17,066 points

Jul 5, 2017 1:37 AM in response to castrohouse

That is pretty much a limitation of GUI scripting - a lot depends on the timing of the various user interface items, and getting feedback on the progress is problematic. I only had smaller files to test with, but there sure are a lot of windows and menus flashing about.

A random/unique file name could be used to prevent issues with duplicates, but then you would have to deal with losing or changing the original name. Watching for file activity in the processing folder might help with keeping things from jamming up, versus trying to figure out what window and menu items are open or not. Unfortunately, it also isn't very difficult to overload the folder actions themselves if you repeatedly drop bunches of files into a watched folder while it is still trying to work with previously dropped files.
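Both ideas in the paragraph above can be sketched without any GUI scripting. The Python below is a minimal illustration, not something from the scripts in this thread: `non_colliding_path` keeps the original name and only appends a counter when a file with that name already exists, and `wait_until_stable` polls a file's size as a crude way of watching for file activity. Both function names, the counter format, and the polling defaults are assumptions chosen for the example.

```python
import os
import time
from pathlib import Path


def non_colliding_path(folder, filename):
    """Return folder/filename, or 'name 1.ext', 'name 2.ext', ... if taken."""
    folder = Path(folder)
    stem, suffix = Path(filename).stem, Path(filename).suffix
    candidate = folder / filename
    counter = 1
    while candidate.exists():
        candidate = folder / ("%s %d%s" % (stem, counter, suffix))
        counter += 1
    return candidate


def wait_until_stable(path, interval=0.5, checks=3, timeout=60.0):
    """Return True once the file size is unchanged for `checks` polls in a row.

    Returns False if that does not happen before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    last_size, stable = -1, 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file not there (yet)
        if size >= 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0
        last_size = size
        time.sleep(interval)
    return False
```

A size that stops changing is only a heuristic for "the writer has finished", so the interval and number of checks would need tuning for large multi-page files.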

Automator could also be used to avoid the GUI scripting, but the way the Render PDF Pages as Images action works is to output a separate image file for each page, so I guess it depends on your actual workflow.


Answer 9

Hiroto

Level 5

7,467 points

Jul 18, 2017 12:19 AM in response to castrohouse

Hello

It's been a while, but here's another option you might try: converting the PDF to a multi-page TIFF using the CoreGraphics and ImageIO frameworks via the PyObjC bridge. The code below is a Python script to be run in a shell.

#!/usr/bin/python
# coding: utf-8
#
#   file:
#       pdf2tiff.py
#
#   function:
#       convert pdf to multi-page tiff file
#
#   usage:
#       ./pdf2tiff.py pdf [pdf ...] [outdir]
#
#       pdf    : input pdf file
#       outdir : output directory
#
#   * Input file without name extension as .pdf is ignored.
#   * If outdir directory is specified and present, tiff for each input file is saved in the directory
#     with the same basename without extension followed by '.tiff'; e.g., src/a.pdf => outdir/a.tiff.
#     Otherwise, tiff is saved in the same directory as the original with new file name as
#     basename without extension followed by .tiff; e.g., src/a.pdf => src/a.tiff.
#   * Resolution of output tiff is defined by DPI constant, currently DPI = 150.0.
#   * If PDF_DISPOSITION == 1, source pdf is removed when its tiff version is created successfully.
#   * If LOGGING == 1, each conversion is logged to stdout.
#
#   version:
#       0.11c
#           - using CGImageDestinationCreateWithData() in lieu of CGImageDestinationCreateWithURL() and
#             invoking writeToURL:options:error: method on CFData so as to treat certain errors properly.
#
#       0.11a
#           - added code to handle source file disposition
#           - added code to log each conversion
#       0.11
#           - increased pyobjc performance (for recent versions of pyobjc) by using:
#               - import Quartz.CoreGraphics as CG
#               - import Quartz.ImageIO as CGIO
#       0.10
#           - draft
#
#   written by Hiroto, 2017-07
#
import sys, os
import re, math, time
import Quartz.CoreGraphics as CG
import Quartz.ImageIO as CGIO

def usage():
    sys.stderr.write('Usage: %s pdf [pdf...] [outdir]\n' % os.path.basename(sys.argv[0]))
    sys.exit(2)

def tstamp():
    return time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime())

def main():
    uargv = [ a.decode('utf-8') for a in sys.argv ]
    odir = uargv.pop(-1).rstrip('/') if os.path.isdir(uargv[-1]) else None
    if len(uargv) < 2: usage()

    err = 0
    for f in [ a for a in uargv[1:] if re.search(r'\.pdf$', a, re.I | re.U) ]:
        url = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f, CG.kCFURLPOSIXPathStyle, False)
        pdf = CG.CGPDFDocumentCreateWithURL(url)
        if not pdf:
            err += 1
            sys.stderr.write('%-24s Not a pdf file: %s\n' % (tstamp(), f.encode('utf-8')))
            continue
        pcnt = CG.CGPDFDocumentGetNumberOfPages(pdf)
        if pcnt < 1: continue  # ignore blank pdf
        n = os.path.basename(f)
        m, ext = os.path.splitext(n)
        if odir:
            f1 = '%s/%s.tiff' % (odir, m)
        else:
            f1 = '%s/%s.tiff' % (os.path.dirname(f), m)
        data = CG.CFDataCreateMutable(CG.kCFAllocatorDefault, 0)
        idst = CGIO.CGImageDestinationCreateWithData(data, 'public.tiff', pcnt, None)
        err1 = 0
        for i in range(0, pcnt):
            page = CG.CGPDFDocumentGetPage(pdf, i + 1)
            if not page:
                err += 1
                err1 += 1
                sys.stderr.write('%-24s Could not get page %d of %s\n' % (tstamp(), i + 1, f.encode('utf-8')))
                continue
            scale = DPI / 72.0
            r = CG.CGPDFPageGetBoxRect(page, CG.kCGPDFMediaBox)
            w = math.ceil(r.size.width * scale)
            h = math.ceil(r.size.height * scale)
            ctx = CG.CGBitmapContextCreate(
                None, w, h, 8, 0,
                CG.CGColorSpaceCreateWithName(CG.kCGColorSpaceAdobeRGB1998),
                CG.kCGImageAlphaPremultipliedLast)
            CG.CGContextSaveGState(ctx)
            CG.CGContextScaleCTM(ctx, scale, scale)
            CG.CGContextDrawPDFPage(ctx, page)
            CG.CGContextRestoreGState(ctx)
            cgi = CG.CGBitmapContextCreateImage(ctx)
            CGIO.CGImageDestinationAddImage(idst, cgi,
                {
                    CGIO.kCGImagePropertyDPIHeight : DPI,
                    CGIO.kCGImagePropertyDPIWidth : DPI,
                    CGIO.kCGImagePropertyTIFFDictionary :
                        {
                            CGIO.kCGImagePropertyTIFFCompression : 5
                                # kCGImagePropertyTIFFCompression
                                #   1     : no compression
                                #   5     : LZW
                                #   32773 : PackBits
                        }
                })
            del ctx
            del cgi
        b = CGIO.CGImageDestinationFinalize(idst)
        del idst
        del pdf
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to finalize destination data: %s\n' % (tstamp(), f1.encode('utf-8')))
            del data
            continue
        url1 = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f1, CG.kCFURLPOSIXPathStyle, False)
        b, e = data.writeToURL_options_error_(url1, 0, None)
        del data
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to write destination file: %s: %s\n' % (tstamp(), f1.encode('utf-8'), e.description().encode('utf-8')))
            continue
        if err1 == 0 and PDF_DISPOSITION == 1:
            # source file disposition
            try:
                os.remove(f)
            except Exception as e:
                sys.stderr.write('%-24s Failed to delete source file: %s: %s\n' % (tstamp(), f.encode('utf-8'), e.__repr__().encode('utf-8')))
        if LOGGING == 1:
            sys.stdout.write('%-24s Completed conversion: %s => %s\n' % (tstamp(), f.encode('utf-8'), f1.encode('utf-8')))
    sys.exit(1 if err > 0 else 0)

DPI = 150.0
PDF_DISPOSITION = 0
    # PDF_DISPOSITION
    #   0 : leave pdf alone
    #   1 : remove pdf if tiff is created successfully
LOGGING = 0
    # LOGGING
    #   0 : without logging
    #   1 : with logging
main()

And the following is an AppleScript wrapper for it, which you may use as a folder action script. Currently it is set to delete the source PDF file once it has been converted to TIFF successfully, and to log each conversion in a log file named '_log.txt' in the watched folder. The TIFF file is saved in the 'tiff' directory in the watched folder if that directory is present, or in the watched folder itself if it is not. An existing TIFF file will be overwritten.

--APPLESCRIPT
on run
    choose file of type {"pdf"} with multiple selections allowed
    pdf2tiff(result)
end run

on open aa
    repeat with a in aa
        if a's POSIX path ends with ".pdf" then
            set a's contents to a as alias
        else
            set a's contents to false
        end if
    end repeat
    if (count aa's aliases) > 0 then pdf2tiff(aa's aliases)
end open

on adding folder items to d after receiving aa
    repeat with a in aa
        if a's POSIX path ends with ".pdf" then
            set a's contents to a as alias
        else
            set a's contents to false
        end if
    end repeat
    if (count aa's aliases) > 0 then pdf2tiff(aa's aliases & (d's POSIX path & "tiff"))
end adding folder items to

on pdf2tiff(argv)
    (*
        list argv : list of alias or POSIX path of pdf files, optionally last item as output directory
    *)
    set args to ""
    repeat with a in argv
        if a's class is in {alias, «class bmrk»} then set a to a's POSIX path
        set args to args & a's quoted form & space
    end repeat
    do shell script "/bin/bash -s <<'EOF' - " & args & "
LOG=${1%/*}/_log.txt
exec >> \"$LOG\" 2>&1
/usr/bin/python <<'END' - \"$@\"; exit 0
# coding: utf-8
#
#   file:
#       pdf2tiff.py
#
#   function:
#       convert pdf to multi-page tiff file
#
#   usage:
#       ./pdf2tiff.py pdf [pdf ...] [outdir]
#
#   pdf    : input pdf file
#   outdir : output directory
#
#   * Input file without name extension as .pdf is ignored.
#   * If outdir directory is specified and present, tiff for each input file is saved in the directory
#     with the same basename without extension followed by '.tiff'; e.g., src/a.pdf => outdir/a.tiff.
#     Otherwise, tiff is saved in the same directory as the original with new file name as
#     basename without extension followed by .tiff; e.g., src/a.pdf => src/a.tiff.
#   * Resolution of output tiff is defined by DPI constant, currently DPI = 150.0.
#   * If PDF_DISPOSITION == 1, source pdf is removed when its tiff version is created successfully.
#   * If LOGGING == 1, each conversion is logged to stdout.
#
#   version:
#       0.11c
#           - using CGImageDestinationCreateWithData() in lieu of CGImageDestinationCreateWithURL() and
#             invoking writeToURL:options:error: method on CFData so as to treat certain errors properly.
#
#       0.11a
#           - added code to handle source file disposition
#           - added code to log each conversion
#       0.11
#           - increased pyobjc performance (for recent versions of pyobjc) by using:
#               - import Quartz.CoreGraphics as CG
#               - import Quartz.ImageIO as CGIO
#       0.10
#           - draft
#
#   written by Hiroto, 2017-07
#
import sys, os
import re, math, time
import Quartz.CoreGraphics as CG
import Quartz.ImageIO as CGIO

def usage():
    sys.stderr.write('Usage: %s pdf [pdf...] [outdir]\\n' % os.path.basename(sys.argv[0]))
    sys.exit(2)

def tstamp():
    return time.strftime('%Y-%m-%d %H:%M:%S %Z', time.localtime())

def main():
    uargv = [ a.decode('utf-8') for a in sys.argv ]
    odir = uargv.pop(-1).rstrip('/') if os.path.isdir(uargv[-1]) else None
    if len(uargv) < 2: usage()
    err = 0
    for f in [ a for a in uargv[1:] if re.search(r'\\.pdf$', a, re.I | re.U) ]:
        url = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f, CG.kCFURLPOSIXPathStyle, False)
        pdf = CG.CGPDFDocumentCreateWithURL(url)
        if not pdf:
            err += 1
            sys.stderr.write('%-24s Not a pdf file: %s\\n' % (tstamp(), f.encode('utf-8')))
            continue
        pcnt = CG.CGPDFDocumentGetNumberOfPages(pdf)
        if pcnt < 1: continue  # ignore blank pdf
        n = os.path.basename(f)
        m, ext = os.path.splitext(n)
        if odir:
            f1 = '%s/%s.tiff' % (odir, m)
        else:
            f1 = '%s/%s.tiff' % (os.path.dirname(f), m)
        data = CG.CFDataCreateMutable(CG.kCFAllocatorDefault, 0)
        idst = CGIO.CGImageDestinationCreateWithData(data, 'public.tiff', pcnt, None)
        err1 = 0
        for i in range(0, pcnt):
            page = CG.CGPDFDocumentGetPage(pdf, i + 1)
            if not page:
                err += 1
                err1 += 1
                sys.stderr.write('%-24s Could not get page %d of %s\\n' % (tstamp(), i + 1, f.encode('utf-8')))
                continue
            scale = DPI / 72.0
            r = CG.CGPDFPageGetBoxRect(page, CG.kCGPDFMediaBox)
            w = math.ceil(r.size.width * scale)
            h = math.ceil(r.size.height * scale)
            ctx = CG.CGBitmapContextCreate(
                None, w, h, 8, 0,
                CG.CGColorSpaceCreateWithName(CG.kCGColorSpaceAdobeRGB1998),
                CG.kCGImageAlphaPremultipliedLast)
            CG.CGContextSaveGState(ctx)
            CG.CGContextScaleCTM(ctx, scale, scale)
            CG.CGContextDrawPDFPage(ctx, page)
            CG.CGContextRestoreGState(ctx)
            cgi = CG.CGBitmapContextCreateImage(ctx)
            CGIO.CGImageDestinationAddImage(idst, cgi, {
                CGIO.kCGImagePropertyDPIHeight : DPI,
                CGIO.kCGImagePropertyDPIWidth : DPI,
                CGIO.kCGImagePropertyTIFFDictionary : {
                    CGIO.kCGImagePropertyTIFFCompression : 5
                    # kCGImagePropertyTIFFCompression
                    #     1     : no compression
                    #     5     : LZW
                    #     32773 : PackBits
                }
            })
            del ctx
            del cgi
        b = CGIO.CGImageDestinationFinalize(idst)
        del idst
        del pdf
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to finalize destination data: %s\\n' % (tstamp(), f1.encode('utf-8')))
            del data
            continue
        url1 = CG.CFURLCreateWithFileSystemPath(CG.kCFAllocatorDefault, f1, CG.kCFURLPOSIXPathStyle, False)
        b, e = data.writeToURL_options_error_(url1, 0, None)
        del data
        if not b:
            err += 1
            sys.stderr.write('%-24s Failed to write destination file: %s: %s\\n' % (tstamp(), f1.encode('utf-8'), e.description().encode('utf-8')))
            continue
        if err1 == 0 and PDF_DISPOSITION == 1:  # source file disposition
            try:
                os.remove(f)
            except Exception as e:
                sys.stderr.write('%-24s Failed to delete source file: %s: %s\\n' % (tstamp(), f.encode('utf-8'), e.__repr__().encode('utf-8')))
        if LOGGING == 1:
            sys.stdout.write('%-24s Completed conversion: %s => %s\\n' % (tstamp(), f.encode('utf-8'), f1.encode('utf-8')))
    sys.exit(1 if err > 0 else 0)

DPI = 150.0
PDF_DISPOSITION = 1
    # PDF_DISPOSITION
    #   0 : leave pdf alone
    #   1 : remove pdf if tiff is created successfully
LOGGING = 1
    # LOGGING
    #   0 : without logging
    #   1 : with logging

main()
END
EOF"
end pdf2tiff
--END OF APPLESCRIPT
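For anyone adjusting the DPI constant, here is a minimal sketch of the size arithmetic the script performs for each page, assuming the usual PDF unit of 1 point = 1/72 inch. The helper name pixel_size is hypothetical and is not part of the script above. It multiplies before dividing, which is only meant to reduce the chance of floating-point rounding nudging the ceiling up by one pixel.

```python
import math

POINTS_PER_INCH = 72.0  # default PDF user space unit: 1 point = 1/72 inch


def pixel_size(width_pt, height_pt, dpi):
    """Bitmap size in pixels for a page measured in points.

    Hypothetical helper mirroring the w/h computation in the script.
    """
    # Multiply first, then divide, so common page sizes come out exact.
    w = int(math.ceil(width_pt * dpi / POINTS_PER_INCH))
    h = int(math.ceil(height_pt * dpi / POINTS_PER_INCH))
    return w, h


# US Letter is 612 x 792 points.
print(pixel_size(612, 792, 150.0))  # (1275, 1650)
print(pixel_size(612, 792, 300.0))  # (2550, 3300)
```

Doubling the DPI doubles each dimension, so the uncompressed bitmap for each page is roughly four times larger.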

Briefly tested with pyobjc 2.2b3 and python 2.6.1 under OS X 10.6.8.

Good luck,

H

Answer 10

Hiroto

Level 5

7,467 points

Jul 18, 2017 1:10 AM in response to Hiroto

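Oops. The output path will be wrong if the input file is given without a directory part (i.e., it is assumed to be in the current directory) and no outdir is specified on the command line: os.path.dirname(f) then returns an empty string, so the tiff path becomes '/name.tiff' at the root of the filesystem. The correction is as follows: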

WRONG:

if odir:
    f1 = '%s/%s.tiff' % (odir, m)
else:
    f1 = '%s/%s.tiff' % (os.path.dirname(f), m)

CORRECT:

if not odir: odir = os.path.dirname(f)
if odir != '':
    f1 = '%s/%s.tiff' % (odir, m)
else:
    f1 = '%s.tiff' % (m)
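The same path rule can also be sketched as a small standalone function. This is only an illustration: the name tiff_path is hypothetical and does not appear in the script. It uses os.path.join, which returns just the file name when the directory part is empty. It also keeps the fallback directory in a local variable, so that, as I read the snippet above, a directory taken from one input file is not carried over to later files in the same run.

```python
import os


def tiff_path(pdf_path, outdir=None):
    """Return the .tiff path for pdf_path (hypothetical helper)."""
    base = os.path.splitext(os.path.basename(pdf_path))[0]
    # Fall back to the pdf's own directory for this file only;
    # outdir itself is never reassigned.
    target = outdir if outdir else os.path.dirname(pdf_path)
    # os.path.join('', 'a.tiff') returns 'a.tiff', so an empty directory
    # part gives a path relative to the current directory, not '/a.tiff'.
    return os.path.join(target, base + ".tiff")


print(tiff_path("src/a.pdf", "out"))  # out/a.tiff
print(tiff_path("src/a.pdf"))         # src/a.tiff
print(tiff_path("a.pdf"))             # a.tiff
```

The printed paths assume a POSIX path separator, as on OS X.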

Sorry for any confusion this may have caused.

Hiroto

Answer 11

Hiroto

Level 5

7,467 points

Jul 28, 2017 8:58 PM in response to VikingOSX

Hello

I recently had a brief opportunity to test my code under OS X 10.12 and observed the said error. It seems to me to be a bug in the lazy importer of the pyobjc bundled with python 2.7 under recent OS X.

The simplest workaround would be to invoke python 2.6 using a shebang line as:

#!/usr/bin/python2.6

in lieu of:

#!/usr/bin/python

Also, I have confirmed that the error can be avoided by bypassing the lazy importer with the following import statements:

from Quartz.CoreGraphics import *
from Quartz.ImageIO import *

in lieu of:

import Quartz.CoreGraphics as CG
import Quartz.ImageIO as CGIO

in which case we have to remove every CG and CGIO qualifier from relevant symbols.
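Removing the qualifiers by hand is tedious, so here is a rough sketch of doing it mechanically, assuming the script text is available as a string. The function name unqualify is hypothetical. It is a plain text substitution, so it would also alter any 'CG.' or 'CGIO.' that happened to appear inside a string or comment; check the result before using it.

```python
import re

# Aliased imports and their star-import replacements.
IMPORT_SWAPS = {
    "import Quartz.CoreGraphics as CG": "from Quartz.CoreGraphics import *",
    "import Quartz.ImageIO as CGIO": "from Quartz.ImageIO import *",
}


def unqualify(source):
    """Swap the aliased imports for star imports and drop CG./CGIO. prefixes."""
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped in IMPORT_SWAPS:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + IMPORT_SWAPS[stripped])
        else:
            # \b ensures only the standalone names CG and CGIO are matched.
            out.append(re.sub(r"\b(?:CGIO|CG)\.", "", line))
    return "\n".join(out)


sample = "\n".join([
    "import Quartz.CoreGraphics as CG",
    "import Quartz.ImageIO as CGIO",
    "pdf = CG.CGPDFDocumentCreateWithURL(url)",
    "b = CGIO.CGImageDestinationFinalize(idst)",
])
print(unqualify(sample))
```

Run on the sample, the two import lines become star imports and the calls lose their CG. and CGIO. prefixes.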

Regards,

H

Answer 12

VikingOSX

Level 10

123,470 points

Jul 29, 2017 7:44 AM in response to Hiroto

Hiroto,

I am now generating a multi-page TIFF at 300 dpi using python2.6 on 10.11.6. However, each TIFF page has the text on a transparent background, when viewed in Preview, or Affinity Designer. Was that your original intention?

If you know, what version of Python has Apple put in High Sierra?

Answer 13

VikingOSX

Level 10

123,470 points

Jul 31, 2017 2:02 PM in response to Hiroto

Ok. Working with white tiff background. 😉

Unfortunately, the link you provided requires a paid Developer account. Signing in as an ordinary developer is blocked. Otherwise, googling for that Python version has not produced results.

By the way, more activity over at RubyCocoa, with some source code fixes for Sierra. I don't know how to build and install it on Sierra, as it appears that it has migrated away from the familiar Ruby installation approach. Readme is no help.

Answer 14

Hiroto

Level 5

7,467 points

Jul 31, 2017 4:45 PM in response to VikingOSX

Hello VikingOSX,

As for the link, I can see the page when I am NOT logged in to this ASC site, whilst I cannot see it when I am logged in here. I don't have a paid developer account either. So you might try the link without logging in to ASC (or with cookies temporarily disabled).

And thank you for the news on rubycocoa. I'll take a look later.

Regards,

H

Answer 15

castrohouse Author

Level 1

8 points

Jul 5, 2017 10:33 PM in response to red_menace

Thanks again, Red_Menace. I do appreciate your time and effort! I'll have to find another workaround to get things flowing smoothly. I'll try to learn more about VikingOSX's suggestions.
