Find/identify pdfs with corrupted text layers?

Does anyone know a good way to check PDF files and find out which ones have corrupted text layers?


For example, when perfectly good text has been replaced with a mess like this:




.1918., « » .!

, " # , " # $ % - "» # # "«& ' »( ,

# " ( .&..).


Sometimes the originals have bad text layers and require OCR. Other times, applications such as Preview or Ghostscript corrupt the text layer, and it is hard to tell that this has happened until I need to search the document or copy and paste text into translation software.

MacBook Air (11-inch Mid 2013), macOS Sierra (10.12.6)

Posted on Mar 17, 2018 11:30 AM

Question marked as Best reply

Posted on Mar 17, 2018 2:26 PM

There are no layers in a PDF, only random indexes to types of content that can occur anywhere in the PDF. Short of writing a custom application that opens the PDF and walks down the document index requesting a specific object type, there is no means that I am aware of to isolate the “corrupted text” PDF objects.


Some of the content in the PDF can be encrypted, or can appear in a different encoding, which makes the task described in the first paragraph even more onerous.
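For what it's worth, a rough workaround that avoids walking the PDF objects is to extract whatever text each file yields and score how "word-like" it is. The sketch below is only a heuristic, not a reliable detector. It assumes the Poppler `pdftotext` command-line tool is installed and on the PATH. The function names and the 0.6 threshold are made up for illustration and would need tuning on real files. It will also miss corruption that maps text to the wrong letters rather than to punctuation.

```python
import subprocess
import sys
import unicodedata


def letter_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters or digits."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        # No extractable text at all: treat as the worst case.
        return 0.0
    wordlike = sum(
        1 for ch in visible if unicodedata.category(ch)[0] in ("L", "N")
    )
    return wordlike / len(visible)


def looks_corrupted(text: str, threshold: float = 0.6) -> bool:
    """Flag text that is mostly punctuation or symbols.

    The threshold is an arbitrary starting point (an assumption),
    not a value taken from any standard.
    """
    return letter_ratio(text) < threshold


def extract_text(pdf_path: str) -> str:
    """Extract text with Poppler's pdftotext ('-' writes to stdout).

    Assumes pdftotext is installed; raises CalledProcessError on failure.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python check_pdfs.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        text = extract_text(path)
        status = "SUSPECT" if looks_corrupted(text) else "ok"
        print(f"{status}\t{letter_ratio(text):.2f}\t{path}")
```

Files flagged as SUSPECT would still need a manual look. A scanned PDF with no text layer at all scores 0.00 and is flagged too, which is probably what you want if the goal is to find files that need OCR.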
