Hi, I have lots of PDFs containing scans/photos of receipts. The OCR'd text is not embedded yet.
I just learned it's easy to do that via "Export..." in the Preview app. If Preview recognizes that a PDF does not have text embedded yet, it shows a checkbox "Embed Text".
The AppleScript support for Preview is abysmal and there is no reserved AppleScript verb to open the Export panel.
Back in the summer of 2023, I wrote a GUI AppleScript solution for Preview that forced the Export panel to open, entered a custom Finder tag, changed the output filename to append "_PDFA", and clicked the Create PDF/A and Create Linearized PDF check boxes before saving the PDF. It still works in macOS Sequoia v15.3.1.
The caveat with GUI scripting is that with any future release of macOS, Apple can alter the Preview application's internal framework structure and break the GUI scripting code. If this happens, I am unlikely to fix it due to time constraints.
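One way to soften that fragility is to poll for the Export sheet instead of relying on fixed delays, and to stop with a clear message if it never appears. The following is only a minimal sketch, assuming Preview is already running with a document open and the Export… menu item has just been clicked. The handler name waitForExportSheet and the 5-second timeout are hypothetical and are not part of the script posted below.

```applescript
use scripting additions

-- Hypothetical helper: poll until Preview's Export sheet exists,
-- or give up after roughly maxSeconds seconds (pass a whole number).
on waitForExportSheet(maxSeconds)
	tell application "System Events"
		tell process "Preview"
			repeat (maxSeconds * 10) times
				if exists sheet 1 of window 1 then return true
				delay 0.1
			end repeat
		end tell
	end tell
	return false
end waitForExportSheet

if not waitForExportSheet(5) then
	display alert "Export panel not found" message "Preview's interface may have changed in this macOS release." as critical
	return
end if
```

When calling the handler from inside a tell block, prefix it with my, as in my waitForExportSheet(5).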
I don't have macOS Ventura installed, so I cannot say whether its Preview has an Embed Text checkbox on the Export panel. I have noticed that the Embed Text check box is not available for a scanned PDF in Big Sur or Monterey. In Sonoma 14.7.4, only the Embed Text checkbox is present, and with Sequoia v15.3.1, all check boxes that you showed in your post are available.
I modified my 2023 AppleScript by commenting out the previous Finder tag, PDF/A, and Linearize check box settings; it still appends "_ocr" to the PDF basename on this panel. It checks the operating system version and selects only Embed Text on macOS 14, or all three check boxes on Sequoia.
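The version test can be made independent of how text and integers coerce by comparing version strings inside a considering numeric strings block, which compares runs of digits as numbers. This is a minimal sketch of that idea. The handler name versionAtLeast and the list boxesToCheck are hypothetical names used for illustration; the check box titles are the ones the script clicks.

```applescript
use scripting additions

-- Hypothetical helper: true when version string a is at least version string b.
-- "considering numeric strings" makes "9.0" sort before "14" and "15.3.1" after "15".
on versionAtLeast(a, b)
	considering numeric strings
		return a ≥ b
	end considering
end versionAtLeast

set macOS_product_version to (do shell script "sw_vers -productVersion")

if versionAtLeast(macOS_product_version, "15") then
	set boxesToCheck to {"Embed Text", "Optimize images for screen", "Save images as JPEG"}
else if versionAtLeast(macOS_product_version, "14") then
	set boxesToCheck to {"Embed Text"}
else
	set boxesToCheck to {} -- no Embed Text option expected; the script exits here
end if
return boxesToCheck
```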
This is currently not a batch solution as it only prompts for a single PDF. If that PDF is scanned, and hence contains no fonts, processing will continue. Otherwise, the script will quit as it would be a different Export panel for regular PDF documents. It may take a fair amount of work to make this into a batch solution such as a Shortcut Quick Action that accepts a single PDF or a folder of scanned PDFs.
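A first step toward a batch version could be to collect only the scanned PDFs in a chosen folder, using the same /FontName heuristic as the script. The following is a sketch under that assumption and does not include the per-file Export panel steps. The handler name fontNameCount and the variables pdfFolder, pdfFiles, and scannedPDFs are hypothetical.

```applescript
use scripting additions

-- Hypothetical helper: number of lines in the PDF that mention /FontName.
-- grep -c exits with status 1 when the count is 0, so "|| true" keeps
-- do shell script from raising an error in that case.
on fontNameCount(pdfAlias)
	set cmd to "grep -ac '/FontName' " & quoted form of (POSIX path of pdfAlias) & " || true"
	return (do shell script cmd) as integer
end fontNameCount

set pdfFolder to choose folder with prompt "Select a folder of scanned PDFs"
tell application "Finder"
	set pdfFiles to (every file of folder pdfFolder whose name extension is "pdf") as alias list
end tell

-- keep only the PDFs with no font entries, i.e. scans that have not been OCR'd
set scannedPDFs to {}
repeat with aPDF in pdfFiles
	if fontNameCount(contents of aPDF) = 0 then set end of scannedPDFs to (contents of aPDF)
end repeat
return scannedPDFs -- each item would then go through the Export panel steps
```

The returned list could feed a repeat loop around the existing open/Export/save sequence, with a pause between files so Preview can finish each save.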
In the Finder, press shift+cmd+U to open the Utilities folder, then double-click Script Editor to launch it. Copy/paste the following AppleScript code into Script Editor, click Compile, and then Run to select your scanned PDF. Because the script uses GUI scripting, Script Editor must be allowed under System Settings > Privacy & Security > Accessibility. The script will write the OCR'd PDF into the same filesystem location. I have confirmed that the PDF search works afterward.
Although I am in the U.S., I tested this with scanned German text.
(*
pdf_scan_embed.applescript
Select a PDF that was scanned (PDF wrapper around an image of the scanned text)
and then using GUI scripting, open the PDF in Preview, open the Export panel,
and select Embed Text (OCR), Optimize images for screen, and save images
as JPEG before saving the renamed PDF. This PDF now supports text search.
Revision: 2
Tested: macOS Sequoia v15.3.1 (probably will work on previous versions of
Preview that provide the Embed Text Export item)
Requirements: None
Author: VikingOSX, 2025-03-05, Apple Support Communities, No warranty at all.
*)
use scripting additions
-- property TAG_NAME : "PDF/A"
set macOS_product_version to (do shell script "sw_vers -productVersion")
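-- compare version strings numerically rather than relying on text/integer coercion
considering numeric strings
set isTooOld to (macOS_product_version < "14")
end considering
if isTooOld then
display alert "macOS " & macOS_product_version message "This script requires the Embed Text option in Preview's Export panel (macOS 14 or later)." as critical giving up after 15 # seconds
return
end if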
set thisPDF to (choose file of type {"com.adobe.pdf"} default location (path to desktop))
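-- test if PDF has fonts in it. If scanned but not OCR'ed the font count will be 0
try
-- egrep -c exits with status 1 when the count is 0; that error is caught here and processing continues
set isOCRed to (do shell script "egrep -ac \"(\\/FontName)\" " & (POSIX path of thisPDF)'s quoted form)
if (isOCRed as integer) > 0 then return
end try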
tell application "Preview"
activate
open file thisPDF
tell application "System Events"
tell process "Preview"
set frontmost to true
click menu item "Export…" of menu "File" of its menu bar
tell sheet 1 of window 1
-- remove .pdf from input filename
set temp to text 1 thru -5 of ((value of text field "Export As:" of it) as text)
set value of text field "Export As:" of it to temp & "_ocr.pdf"
(*
set tagx to value of text field "Tags:" of it
key code 48 # tab to Tags field
click text field "Tags:" of it
delay 1
if tagx = "" or tagx = missing value then
keystroke TAG_NAME
end if
key code 53 # send an escape key to dismiss tag menu
delay 2
# the PDF/A and Linearized selections are toggles and without
# the following tests, subsequent clicks on them when already
# set will switch these features off in the export. The value of
# 1 means it is already selected.
if not ((value of checkbox "Create PDF/A" of it) = 1) then
click checkbox "Create PDF/A" of it
end if
delay 1
if not ((value of checkbox "Create Linearized PDF" of it) = 1) then
click checkbox "Create Linearized PDF" of it
end if
delay 1
*)
try
-- only macOS 14 and later have this Export panel checkbox
if not ((value of checkbox "Embed Text" of it) = 1) then
click checkbox "Embed Text" of it
end if
on error
-- for some reason the Export panel did not show checkbox
click button "Cancel" of it
tell application "Preview" to if it is running then quit
return "Either a regular PDF or already OCR'd PDF"
end try
delay 1
-- macOS 15 (Sequoia) adds these two Export panel entries
considering numeric strings
set hasImageOptions to (macOS_product_version ≥ "15")
end considering
if hasImageOptions then
if not ((value of checkbox "Optimize images for screen" of it) = 1) then
click checkbox "Optimize images for screen" of it
end if
delay 1
if not ((value of checkbox "Save images as JPEG" of it) = 1) then
click checkbox "Save images as JPEG" of it
end if
end if
-- save on every supported version, including macOS 14 where only Embed Text is set
click button "Save" of it
end tell
end tell
end tell
close front document
end tell
tell application "Preview" to if it is running then quit
return
Hi VikingOSX, this works like a charm, thank you so much!!!
One quick question: it seems that the new file is not saved into the same folder as the old one, but rather into the folder that was previously selected as an "Export to" location. Is there an easy way to change this?
Here is an update to the AppleScript. It now gets the container folder of the selected PDF and uses that on the Export panel for the folder destination of the output OCR'd PDF.
(*
pdf_scan_embed.applescript
Select a PDF that was scanned (PDF wrapper around an image of the scanned text)
and then using GUI scripting, open the PDF in Preview, open the Export panel,
and select Embed Text (OCR), Optimize images for screen, and save images
as JPEG before saving the renamed PDF. This PDF now supports text search.
Revision: 3
Tested: macOS Sequoia v15.3.1 (probably will work on previous versions of
Preview that provide the Embed Text Export item)
Requirements: None
Author: VikingOSX, 2025-03-05, Apple Support Communities, No warranty at all.
*)
use scripting additions
-- property TAG_NAME : "PDF/A"
set macOS_product_version to (do shell script "sw_vers -productVersion")
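-- compare version strings numerically rather than relying on text/integer coercion
considering numeric strings
set isTooOld to (macOS_product_version < "14")
end considering
if isTooOld then
display alert "macOS " & macOS_product_version message "This script requires the Embed Text option in Preview's Export panel (macOS 14 or later)." as critical giving up after 15 # seconds
return
end if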
set thisPDF to (choose file of type {"com.adobe.pdf"} default location (path to desktop))
-- this is the variable to set in the Where pop up button on the Export panel
tell application "System Events" to set where_location to name of container of thisPDF
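-- test if PDF has fonts in it. If scanned but not OCR'ed the font count will be 0
try
-- egrep -c exits with status 1 when the count is 0; that error is caught here and processing continues
set isOCRed to (do shell script "egrep -ac \"(\\/FontName)\" " & (POSIX path of thisPDF)'s quoted form)
if (isOCRed as integer) > 0 then return
end try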
tell application "Preview"
activate
open file thisPDF
tell application "System Events"
tell process "Preview"
set frontmost to true
click menu item "Export…" of menu "File" of its menu bar
tell sheet 1 of window 1
-- remove .pdf from input filename
set temp to text 1 thru -5 of ((value of text field "Export As:" of it) as text)
set value of text field "Export As:" of it to temp & "_ocr.pdf"
-- set the current files folder location instead of last export location
tell first pop up button of it
click
tell menu 1
click menu item where_location
end tell
end tell
(*
set tagx to value of text field "Tags:" of it
key code 48 # tab to Tags field
click text field "Tags:" of it
delay 1
if tagx = "" or tagx = missing value then
keystroke TAG_NAME
end if
key code 53 # send an escape key to dismiss tag menu
delay 2
# the PDF/A and Linearized selections are toggles and without
# the following tests, subsequent clicks on them when already
# set will switch these features off in the export. The value of
# 1 means it is already selected.
if not ((value of checkbox "Create PDF/A" of it) = 1) then
click checkbox "Create PDF/A" of it
end if
delay 1
if not ((value of checkbox "Create Linearized PDF" of it) = 1) then
click checkbox "Create Linearized PDF" of it
end if
delay 1
*)
try
-- only macOS 14 and later have this Export panel checkbox
if not ((value of checkbox "Embed Text" of it) = 1) then
click checkbox "Embed Text" of it
end if
on error
-- for some reason the Export panel did not show checkbox
click button "Cancel" of it
tell application "Preview" to if it is running then quit
return "Either a regular PDF or already OCR'd PDF"
end try
delay 1
-- macOS 15 (Sequoia) adds these two Export panel entries
considering numeric strings
set hasImageOptions to (macOS_product_version ≥ "15")
end considering
if hasImageOptions then
if not ((value of checkbox "Optimize images for screen" of it) = 1) then
click checkbox "Optimize images for screen" of it
end if
delay 1
if not ((value of checkbox "Save images as JPEG" of it) = 1) then
click checkbox "Save images as JPEG" of it
end if
end if
-- save on every supported version, including macOS 14 where only Embed Text is set
click button "Save" of it
end tell
end tell
end tell
close front document
end tell
tell application "Preview" to if it is running then quit
return
I will look into this. Do you want that Export panel folder location to be reset to the current folder of the file opened in Preview? This will take some testing time…