Is there a free OCR app for PDF files?

Question

Bomiboll Author

Level 1

78 points

I have been searching for online OCR tools such as iLovePDF, but they are not free.

Is there a cheap or free app to OCR PDFs?

MacBook Pro 13″, macOS 13.6

Posted on Apr 14, 2024 3:19 AM

Answer 1

Top-ranking reply

Luis Sequeira1

Level 9

73,242 points

Apr 21, 2024 2:13 AM in response to Bomiboll

Bomiboll wrote:

Luis Sequeira1

The response on April 19 does not mention what the issue was - I thought it may be pdfgear?

I was talking about Preview - the application that comes with every Mac.

For a few versions of macOS now, there is a capability to detect text in images - "Live Text" is the moniker that Apple uses.

It is not just with Preview, but since the discussion was about pdf, my suggestion is this:

1) Open the pdf in Preview (or Safari)

2) See if it recognizes the text and can select it

3) If it does, copy and then paste in a text application

I just tried a quick example, using my poor handwriting, and was able to do as above.
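Outside Preview, a rough way to tell whether a PDF already carries an embedded text layer (as opposed to page images only) is to try extracting text from it. The sketch below is only an illustration: it assumes the third-party pypdf package is installed, and the helper names and the 20-character threshold are hypothetical. It checks for embedded text, which is not the same thing as Live Text recognising text in an image on the fly.

```python
def looks_scanned(page_texts, min_chars=20):
    """Guess whether a PDF is image-only from its per-page extracted text.

    page_texts: list of strings, one per page.
    min_chars:  hypothetical threshold; pages with fewer characters than
                this are counted as having no usable text layer.
    """
    if not page_texts:
        return True
    nearly_empty = sum(
        1 for text in page_texts if len((text or "").strip()) < min_chars
    )
    # Treat the file as scanned when more than half the pages have no text.
    return nearly_empty > len(page_texts) / 2


def extract_page_texts(pdf_path):
    """Return the embedded text of each page (assumes pypdf is installed)."""
    # Imported lazily so looks_scanned() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `looks_scanned(extract_page_texts("scan.pdf"))` on a file of your own would then suggest whether OCR is needed at all; the file name here is a placeholder.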

Answer 2

Apr 19, 2024 9:43 AM in response to Bomiboll

It will depend on the files, and how they were created. I assume these are scanned images that contain text.

Try opening the pdf in Preview.

If you can select text, there you have it. Select, copy, and then paste into TextEdit, for example, to make any adjustments.

This may not always work, but if the original was printed and scanned, it usually does.

And it is free.

Answer 3

keeline

Level 1

24 points

Jun 3, 2024 12:28 PM in response to Bomiboll

I have read some of the back and forth in this post.

For converting from HTML to PDF, the best way to handle that on a Mac is to open the HTML file or page in a browser and use the Print function. In the lower left of the print dialog is an option to save as PDF. This PDF will be searchable since the text is embedded.

Many of my PDFs are page images (photographed or scanned). Often these are books. Other than Adobe Acrobat Pro (which is slow, buggy, and expensive), I have also used a command-line tool called "ocrmypdf", which uses the Tesseract engine for the OCR function. Both were installed with Homebrew on this machine. I know you have expressed objection to this in the thread, but it is a means of adding an OCR layer to a PDF of page images for free. It is also faster and makes use of a multi-core Mac system in ways that Acrobat Pro does not.
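For anyone who wants to try this route, here is a minimal sketch of driving ocrmypdf from Python. It assumes the tool is already on the PATH (for example via `brew install ocrmypdf`); the function names and file names are made up for illustration. The `--skip-text` option leaves pages that already contain text alone, and `--jobs` sets how many CPU cores are used.

```python
import os
import shutil
import subprocess


def build_ocrmypdf_command(src, dst, language="eng", jobs=None):
    """Build the argument list for one ocrmypdf run."""
    if jobs is None:
        # Default to all available cores.
        jobs = os.cpu_count() or 1
    return [
        "ocrmypdf",
        "--language", language,   # Tesseract language code
        "--jobs", str(jobs),      # number of parallel workers
        "--skip-text",            # do not re-OCR pages that already have text
        str(src),
        str(dst),
    ]


def add_ocr_layer(src, dst, **options):
    """Run ocrmypdf and write a searchable copy of src to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on the PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, **options), check=True)
```

Something like `add_ocr_layer("book_scan.pdf", "book_scan_ocr.pdf", jobs=4)` would then produce a second PDF with a text layer, leaving the original untouched.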

When files are on your central SSD for any length of time, they will be searchable via Spotlight using the "Live Text" (VisionKit) software. This applies particularly to images with typed, typeset, or handwritten text. This is impressive for search. You can copy-paste from it but the OCR errors will become obvious when you do. The search is fuzzy enough to be very effective.

The exact tools to use depend on the nature of your files and your general workflow. I live a good portion of my day at the command line. I know others who are in mortal fear of it. You may be somewhere in the middle.

Answer 4

VikingOSX

Level 10

123,254 points

Apr 14, 2024 6:08 AM in response to Bomiboll

Some paid PDF editors can place a layer of OCR text over the scanned PDF text (which is an image) with the PDF open in the editor. Some paid scanning software (e.g. VueScan Professional) can generate the OCR concurrent with scanning to PDF resulting in a searchable PDF.

Apple's Preview and Adobe's Acrobat Reader are not PDF editors and lack the ability to OCR PDFs.

Apple's Shortcuts app has an action named Extract Text from Image that also works on PDFs. It simply does what it says and the assumption is that you then get that extracted text into a text file. There is no applying OCR text to the original PDF.
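If you build such a shortcut, it can also be run without opening the Shortcuts app, using the `shortcuts` command-line tool that ships with recent versions of macOS. The sketch below assumes you have already created a shortcut that accepts a file as input and passes it to Extract Text from Image; the shortcut name "Extract PDF Text" and the helper names are hypothetical.

```python
import subprocess


def build_shortcut_command(shortcut_name, input_path, output_path):
    """Build the argument list for running a shortcut on one file."""
    return [
        "shortcuts", "run", shortcut_name,
        "--input-path", str(input_path),    # file handed to the shortcut
        "--output-path", str(output_path),  # where the shortcut's output is written
    ]


def run_shortcut(shortcut_name, input_path, output_path):
    """Run the shortcut; raises CalledProcessError if it fails."""
    subprocess.run(
        build_shortcut_command(shortcut_name, input_path, output_path),
        check=True,
    )
```

For example, `run_shortcut("Extract PDF Text", "scan.pdf", "scan.txt")` would save the extracted text to a plain text file. As noted above, this does not add a text layer to the PDF itself.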

There is open-source OCR software such as Tesseract, which would require either installation by a package manager (e.g. Homebrew) or compiling the source code and its dependencies yourself. Tesseract has a huge dependency tree.
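Once installed, basic Tesseract use is a single command per page image. As a rough sketch (assuming `tesseract` is on the PATH, e.g. after `brew install tesseract`, and that the scan has been exported as an image such as PNG or TIFF, since Tesseract reads images rather than PDFs), the command could be assembled like this. The function names and file names are illustrative only.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image, language="eng", searchable_pdf=True):
    """Build the argument list for OCR on one page image."""
    image = Path(image)
    # Tesseract appends the extension (.pdf or .txt) to this base name itself.
    output_base = image.with_suffix("")
    cmd = ["tesseract", str(image), str(output_base), "-l", language]
    if searchable_pdf:
        # The "pdf" config asks for a searchable PDF instead of plain text.
        cmd.append("pdf")
    return cmd


def ocr_image(image, **options):
    """Run Tesseract on one image; raises CalledProcessError on failure."""
    subprocess.run(build_tesseract_command(image, **options), check=True)
```

With these defaults, `ocr_image("page1.png")` would write `page1.pdf` next to the image; passing `searchable_pdf=False` would write `page1.txt` instead.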

Answer 5

MartinR

Level 7

24,143 points

Apr 14, 2024 10:17 AM in response to Bomiboll

You can use MS Word. Open the PDF in Word; it will convert the PDF to a Word document. It's not exactly OCR, but it's essentially the same. BTW, if you are using Office 365 you will have to enable Connected Experiences in Word's privacy preferences in order to enable conversion from PDF to Word.

If you have an Epson scanner, both Epson Scan 2 and Epson Document Capture can do OCR. So can VueScan Professional.

Answer 6

Yer_Man

Level 10

165,912 points

Apr 14, 2024 11:22 AM in response to Bomiboll

Several apps include OCR as part of their feature set: DEVONthink, for instance, or EagleFiler. Elucidate is less than $5. You can OCR PDFs in Google Docs too, and with MS OneNote. Lastly, if you just want to extract paragraphs, then an app like TextSniper is excellent for that job.

Answer 7

VikingOSX

Level 10

123,254 points

Apr 14, 2024 12:29 PM in response to MartinR

It is a roll of the dice whether Word accurately converts the PDF text or leaves misspelled or dropped words, but in the long run it is still handy. It is not the same as placing an exactly registered text layer over the scanned PDF image (aka OCR), but the PDF saved from Word would be just as searchable.

Answer 8

a brody

Level 10

85,308 points

Apr 20, 2024 2:03 PM in response to Bomiboll

There are a few web-based OCR tools that are free. But in my experience they get about 90% right if the image was typewritten. Handwritten, forget it.

Answer 9

MartinR

Level 7

24,143 points

Apr 14, 2024 4:56 PM in response to VikingOSX

Fair assessment. I also think the same could be said of the capabilities of most OCR apps, at least every one I have ever tried.

The OP wanted something "free" and on the assumption he may have MS Office I thought it worth suggesting.

Answer 10

Apr 20, 2024 12:26 PM in response to Bomiboll

Bomiboll wrote:

Luis Sequeira1 regarding PdfGear

The OCR is not working for me.

BTW the section Convert to Pdf has no Html to Pdf.

I don't know why you are mentioning me and PdfGear in the same phrase. I never mentioned PdfGear - actually I was not even aware of its existence.

And what does html have to do with it? OCR has to do with interpreting characters that are part of an image.

Answer 11

stedman1

Level 10

264,442 points

Apr 20, 2024 12:43 PM in response to Bomiboll

There was another user in this thread that had posted the link regarding the PDF software. They had posted the same recommendation in numerous threads. It appears those replies may have been considered as Spam.

Answer 12

Owl-53

Level 10

102,523 points

Apr 14, 2024 4:20 AM in response to Bomiboll

Short answer

You get what you pay for.

Answer 13

Bomiboll Author

Level 1

78 points

Apr 14, 2024 7:41 AM in response to VikingOSX

Tesseract is complicated to install and so I don't care.

Usually, so-called "open source" software is complicated.

Answer 14

VikingOSX

Level 10

123,254 points

Apr 14, 2024 7:49 AM in response to Bomiboll

I didn't instruct you to install Tesseract, just mentioned it in passing and yes, it is a complicated beast. I personally don't use it as I have VueScan Professional for OCR while I scan.

Answer 15

Barney-15E

Level 10

122,138 points

Apr 14, 2024 11:28 AM in response to Bomiboll

Many of the scanning software packages come with OCR when you scan. I don't know if it can be used after the fact.
