Copy and paste from PDF in Preview results in gibberish

When I copy and then paste text from a PDF in Preview into Pages, the resulting text is gibberish symbols. The odd thing is that it only happens with some of the text in the PDF; other PDFs show the same behavior.

I have a screen recording to demonstrate my problem. http://blip.tv/file/2823667 View it in full screen to see the text.

MacBook 2.2 GHz Core 2 Duo, Mac OS X (10.6.1), iWork '09

Posted on Nov 8, 2009 5:40 PM

38 replies

Nov 15, 2009 9:21 PM in response to skidogallard

I believe the following topic is related:

http://discussions.apple.com/thread.jspa?threadID=1743102

For me, I've noticed the problem occurs after I make and save a second annotation (a highlight, to be exact). For whatever reason, the first annotation can be saved without breaking the ability to copy text. After the second annotation and save, the ability to copy text is broken.

FWIW, the following line of text:

Most students appear to like the use of overheads

will, after annotations, copy out as:

\\>&.-$ .-52#,-.$ ((#/$ -&$ )+9#$ -"#$ 5.#$ &$ &0#/"#*2.


Each space character appears to be mapped to "$" followed by a space. Lowercase s is mapped to a dot ("."), lowercase a to an asterisk ("*"), and lowercase t to a hyphen ("-"). Is there some kind of pattern? Might this point to an answer or solution?
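For anyone who wants to look for a pattern, here is a minimal Python sketch that builds a substitution table from a known plain/garbled pair such as the one above. The function names and the "$ " word separator are my own assumptions for illustration, not anything Preview documents. The sample as pasted in this post seems to have lost a few characters (for example, "appear" comes out as only four symbols), so words whose lengths do not match are skipped rather than guessed at.

```python
def derive_mapping(plain, garbled, space_marker="$ "):
    """Return a dict mapping each garbled character to its plain character.

    Assumes every space in the plain text shows up as `space_marker`
    in the garbled text, so the two strings can be aligned word by word.
    """
    mapping = {}
    plain_words = plain.split(" ")
    garbled_words = garbled.split(space_marker)
    for plain_word, garbled_word in zip(plain_words, garbled_words):
        if len(plain_word) != len(garbled_word):
            continue  # word was damaged in the paste; cannot align it
        for garbled_char, plain_char in zip(garbled_word, plain_word):
            # Keep the first pairing seen for each garbled character.
            mapping.setdefault(garbled_char, plain_char)
    return mapping


def decode(garbled, mapping, space_marker="$ "):
    """Decode garbled text, writing '?' for characters not in the mapping."""
    words = garbled.split(space_marker)
    return " ".join("".join(mapping.get(c, "?") for c in w) for w in words)


plain = "Most students appear to like the use of overheads"
garbled = r'\\>&.-$ .-52#,-.$ ((#/$ -&$ )+9#$ -"#$ 5.#$ &$ &0#/"#*2.'

mapping = derive_mapping(plain, garbled)
for garbled_char, plain_char in sorted(mapping.items()):
    print(repr(garbled_char), "->", repr(plain_char))
```

A table built this way only covers letters that appear in the sample, so decoding other text from the same file will still leave gaps.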

My source PDF allows me full permissions. There are no security locks.

Nov 16, 2009 10:33 AM in response to etresoft

Three files are posted:

http://www.kinasevych.ca/c/200911/16/test.pdf

http://www.kinasevych.ca/c/200911/16/test.2.pdf

http://www.kinasevych.ca/c/200911/16/test.3.pdf

The first file, test.pdf, was opened in Preview (Version 4.2 (469.5)) and annotations were added. The file was saved as test.2.pdf, opened again, annotations added and deleted, then saved.

In Preview, text can be copied out correctly from test.pdf. Not so with test.2.pdf. My Mac OS X version is 10.5.8.

I opened both files in Acrobat Pro 9.2.0. Copying text produces the same result as in Preview: test.pdf is okay, test.2.pdf is broken.

I created the file test.3.pdf by opening test.pdf in Acrobat Pro, saving as test.3.pdf, and annotating the document, saving, then quitting Acrobat. Opened test.3.pdf again, annotated some more, saved and closed. After this process, text can still be copied correctly from test.3.pdf. This would suggest that Preview is generating the error, but only in certain documents.

Note that test.pdf is a single page from a document where I found this problem. When I created a simple text document in TextWrangler or in OpenOffice, then did Print>Save as PDF, neither produced a file that became corrupted like the test files given here. There is something about test.pdf that lends itself to the error described in this thread.
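If you need to check many saved files for this symptom, a rough heuristic is to look at how much of the copied text is actually letters. The sketch below is only an assumption about what "gibberish" looks like in this thread (mostly punctuation and digits); the function names and the 0.5 threshold are made up for illustration, and getting the text out of each PDF is left to whatever tool you use.

```python
def letter_ratio(text):
    """Fraction of non-whitespace characters that are letters."""
    visible = [c for c in text if not c.isspace()]
    if not visible:
        return 0.0
    return sum(c.isalpha() for c in visible) / len(visible)


def looks_garbled(text, threshold=0.5):
    """Flag text whose share of letters falls below the threshold."""
    return letter_ratio(text) < threshold


good = "Most students appear to like the use of overheads"
bad = r'\\>&.-$ .-52#,-.$ ((#/$ -&$ )+9#$ -"#$ 5.#$ &$ &0#/"#*2.'

print(looks_garbled(good), looks_garbled(bad))
```

This would misjudge text that is legitimately heavy on numbers or symbols, so treat a flag as a prompt to look at the file, not as proof.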

Nov 16, 2009 11:36 AM in response to okinasevych

Excellent job!

Now you need to file the bug report with Apple. Be as specific as possible. All you need to give them is the original. Then say:
1) Add highlight annotation to word "color" in sentence "Professors of color have published poignant accounts of harshly negative student evaluations." and save as test2.pdf
2) Add strikethrough annotation to word "color" and save as test3.pdf.
3) Open test3.pdf and try to copy that sentence.

Nov 22, 2009 11:25 AM in response to etresoft

Here's how I worked around the problem. I had the same problem, but didn't do anything myself to annotate the document I downloaded from the internet. In any case, this requires the full version of Adobe Acrobat or scanner software with OCR (optical character recognition).

IF YOU HAVE THE FULL VERSION OF ADOBE ACROBAT:
1. Use Preview to save the document as a TIFF file, preferably at 300 dpi. This took about 20 seconds per page on my MacBook, so be prepared to wait for longer documents.
2. Use Preview to open the resulting TIFF file (if necessary), and save it again as a PDF. This essentially converts the text to an image, discarding the erroneous character information.
3. Open the resulting document in Acrobat.
4. Choose the menu item "Document > Recognize Text Using OCR > Start" (at least in my version of Acrobat), and choose "all pages" in the resulting dialogue box.
5. Wait (again about 20 seconds per page).
6. Save changes to the document, which now should work exactly as expected.

IF YOU DON'T HAVE ACROBAT, BUT DO HAVE A SCANNER:
1. Print out the PDF.
2. Scan the PDF you just printed out.
3. Use the OCR software options (usually part of the scanning software) to perform optical character recognition on the document you scan.
4. Save the resulting document, which now should include selectable text.

No easy answers, but answers nonetheless.

Nov 24, 2009 12:19 AM in response to Bob Spaulding

I have encountered a comparable problem. I use (and love) Preview to annotate PDFs while copying quotations out of them into my quotation database (normal scientific work). I save the PDFs with the annotations. This worked perfectly (OS X 10.6.2) until today.
I loaded a World Bank document from 2005:
http://siteresources.worldbank.org/INTURBANDEVELOPMENT/Resources/dynamicsurbanexpansion.pdf
At 21 MB it is quite large. After I saved my annotation changes, copying text from the document as described above produced the gibberish described in this thread. Before saving, it did not.
This did not happen with other documents.
Perhaps it's the size or the age of the PDF document.

Feb 7, 2010 10:08 AM in response to MeBeMac

I've been able to replicate the problem with a reasonably straightforward case.

I used a freshly downloaded ebook from O'Reilly, Head First iPhone Development. The file reports that it was created by Adobe PDF Library 9.0 and Adobe InDesign CS4 (6.0). My steps were:
* went to logical page 37 (chapter 1 page 3).
* selected a line of text that read "ported desktop apps"
* copied; SUCCESSFUL
* annotated that line with a highlight
* selected "ported desktop apps" again
* copied; SUCCESSFUL
* saved
* selected "ported desktop apps" again
* copied; SUCCESSFUL
* selected the first occurrence of the word "about" on the page
* annotated with a highlight
* selected "ported desktop apps" again
* copied; SUCCESSFUL
* saved
* selected "ported desktop apps" again
* copied; FAILED!

The results of each copy can be seen in this screenshot showing iClip's contents (last at the top; first at the bottom) after each copy operation: http://dl.dropbox.com/u/1184374/resultsOfCopy.png

I was NOT able to replicate the failure in simple PDFs that were generated
(a) from TextEdit using Lorem Ipsum text, (b) in Acrobat 9 Pro from a webpage [yahoo.com], or (c) from Firefox from a webpage.

-Wil

Feb 22, 2010 4:15 PM in response to skidogallard

I have this issue too, but I'm on 10.5.8 and running Preview.app 4.2 (469.5). I downloaded the Practical symfony PDF [1] and was able to copy and paste successfully until I saved the file back to my hard drive with a different name. I first noticed it when I added a note in a previously downloaded file and couldn't copy and paste anymore without strange characters showing up instead.

Thoughts?

1. http://www.symfony-project.org/get/pdf/jobeet-1.4-doctrine-en.pdf
