Is there a free OCR app for PDF files?

I searched for online OCR services such as iLovePDF, but they are not free.

Is there a cheap or free app to OCR PDFs?

MacBook Pro 13″, macOS 13.6

Posted on Apr 14, 2024 3:19 AM

Question marked as Top-ranking reply

Posted on Apr 21, 2024 2:13 AM

Bomiboll wrote:

Luis Sequeira1
The response on April 19 does not mention what the issue was - I thought it may be pdfgear?


I was talking about Preview - the application that comes with every Mac.


For a few versions of macOS now, there is a capability to detect text in images - "Live Text" is the moniker that Apple uses.

It is not just with Preview, but since the discussion was about pdf, my suggestion is this:

1) Open the PDF in Preview (or Safari)

2) See whether it recognizes the text and lets you select it

3) If it does, copy the text and paste it into a text application


I just tried a quick example, using my poor handwriting, and was able to do as above.



Apr 19, 2024 9:43 AM in response to Bomiboll

It will depend on the files, and how they were created. I assume these are scanned images that contain text.


Try opening the pdf in Preview.

If you can select text, there you have it. Select, copy, and then paste, for example, into TextEdit to make any adjustments.

This may not always work, but if the original was printed and scanned, it usually does.

And it is free.

Jun 3, 2024 12:28 PM in response to Bomiboll

I have read some of the back and forth in this post.


To convert HTML to PDF on a Mac, the best way is to open the HTML file or page in a browser and use the Print function. In the lower left of the print dialog there is an option to save as PDF. The resulting PDF will be searchable since the text is embedded.


Many of my PDFs are page images (photographed or scanned). Often these are books. Other than Adobe Acrobat Pro (which is slow, buggy, and expensive), I have also used a command-line tool called "ocrmypdf", which uses the Tesseract engine for the OCR function. Both were installed with Homebrew on this machine. I know you have expressed objection to this in the thread, but it is a free way of adding an OCR layer to a PDF made of page images. It is also faster and makes use of a multi-core Mac system in ways that Acrobat Pro does not.
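As a rough illustration of the ocrmypdf approach described above, here is a minimal Python sketch that assembles and runs the command. It assumes ocrmypdf is already installed and on the PATH (for example via Homebrew). The file names and the two helper functions are hypothetical and only for illustration; `--language`, `--jobs`, and `--skip-text` are ocrmypdf command-line options.

```python
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="eng", jobs=None, skip_text=True):
    """Return the argument list for one ocrmypdf run (hypothetical helper)."""
    cmd = ["ocrmypdf", "--language", language]
    if jobs is not None:
        # Limit (or raise) the number of CPU cores used in parallel.
        cmd += ["--jobs", str(jobs)]
    if skip_text:
        # Leave pages that already contain text untouched.
        cmd.append("--skip-text")
    cmd += [src, dst]
    return cmd


def add_ocr_layer(src, dst, **options):
    """Run ocrmypdf on src and write a searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

Calling `add_ocr_layer("scanned-book.pdf", "scanned-book-ocr.pdf", jobs=4)` would then be roughly equivalent to typing the same ocrmypdf command in Terminal.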


When files are on your central SSD for any length of time, they will be searchable via Spotlight using the "Live Text" (VisionKit) software. This applies particularly to images with typed, typeset, or handwritten text. This is impressive for search. You can copy-paste from it but the OCR errors will become obvious when you do. The search is fuzzy enough to be very effective.


The exact tools to use depend on the nature of your files and your general workflow. I live a good portion of my day at the command line. I know others who are in mortal fear of it. You may be somewhere in the middle.

Apr 14, 2024 6:08 AM in response to Bomiboll

Some paid PDF editors can place a layer of OCR text over the scanned PDF text (which is an image) with the PDF open in the editor. Some paid scanning software (e.g. VueScan Professional) can generate the OCR concurrent with scanning to PDF resulting in a searchable PDF.


Apple's Preview and Adobe's Acrobat Reader are not PDF editors and lack the ability to OCR PDFs.


Apple's Shortcuts app has an action named Extract Text from Image that also works on PDFs. It simply does what it says and the assumption is that you then get that extracted text into a text file. There is no applying OCR text to the original PDF.


There is open-source OCR software such as Tesseract, which would require either installation by a package manager (e.g. Homebrew) or compiling the source code and dependencies yourself. Tesseract has a huge dependency tree.
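For anyone who does install Tesseract, a minimal Python sketch of driving its command-line tool might look like the following. It assumes the `tesseract` binary is on the PATH and that the input is an image file (as far as I know, Tesseract reads images rather than PDFs, so pages of a scanned PDF would first have to be exported as images). The file names and helper functions are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image, output_base, language="eng", make_pdf=True):
    """Return the argument list for one tesseract run (hypothetical helper)."""
    cmd = ["tesseract", image, output_base, "-l", language]
    if make_pdf:
        # The "pdf" config asks Tesseract to write output_base.pdf with a
        # searchable text layer instead of a plain output_base.txt file.
        cmd.append("pdf")
    return cmd


def ocr_image(image, output_base, **options):
    """Run tesseract on one page image."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract was not found on PATH")
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_tesseract_command(image, output_base, **options), check=True)
```

With these assumptions, `ocr_image("page-001.png", "page-001")` would produce `page-001.pdf`, and passing `make_pdf=False` would produce a plain text file instead.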

Apr 14, 2024 10:17 AM in response to Bomiboll

You can use MS Word. Open the PDF in Word; it will convert the PDF to a Word document. It's not exactly OCR, but it's essentially the same. BTW, if you are using Office 365 you will have to enable Connected Experiences in Word's privacy preferences in order to enable conversion from PDF to Word.


If you have an Epson scanner, both Epson Scan 2 and Epson Document Capture can do OCR. So can VueScan Professional.

Apr 20, 2024 12:26 PM in response to Bomiboll

Bomiboll wrote:

Luis Sequeira1 regarding PdfGear
The OCR is not working for me.
BTW the section Convert to Pdf has no Html to Pdf.


I don't know why you are mentioning me and PdfGear in the same phrase. I never mentioned PdfGear - actually I was not even aware of its existence.

And what does html have to do with it? OCR has to do with interpreting characters that are part of an image.

