While I'm copying some text from a PDF in Preview and pasting it into MS Word, all the text is getting replaced either by blank space or by " ��� ����". What am I supposed to do to retain the text? Please help.


MacBook Pro, iOS 9.3.2

Posted on May 21, 2016 7:14 PM

18 replies

May 26, 2016 11:20 AM in response to abhishekvermag

Since fonts can be embedded in a PDF, you may be attempting to copy text in a font that is not installed on OS X or available to Word, and there is no replacement alternative.
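Some background on why a font problem shows up as blanks or "�": when you copy text out of a PDF, the viewer has to translate each character code in the page back into Unicode, usually through a ToUnicode table stored with the embedded font. Below is a simplified Python sketch of that lookup, assuming 2-byte codes and only the `bfchar` form of the table (real viewers also handle `bfrange` and other fallbacks, so treat this as an illustration, not how Preview works internally). The function names and the sample table are made up for the example. Codes with no mapping come out as U+FFFD, the "�" replacement character.

```python
import re

REPLACEMENT = "\ufffd"  # U+FFFD, displayed as "�" when pasted


def parse_bfchar(cmap_text: str) -> dict:
    """Collect <code> <unicode> pairs from beginbfchar ... endbfchar sections.

    Simplified sketch: ignores bfrange sections and assumes the destination
    values are UTF-16BE hex strings.
    """
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def codes_to_text(codes: bytes, mapping: dict, code_len: int = 2) -> str:
    """Translate raw character codes to Unicode; unmapped codes become U+FFFD."""
    chars = []
    for i in range(0, len(codes), code_len):
        code = int.from_bytes(codes[i:i + code_len], "big")
        chars.append(mapping.get(code, REPLACEMENT))
    return "".join(chars)


# A tiny hypothetical table: code 0x0024 -> "A", code 0x0003 -> space.
SAMPLE_CMAP = """
2 beginbfchar
<0024> <0041>
<0003> <0020>
endbfchar
"""
```

With the sample table, the codes 0x0024 0x0003 decode to "A "; with an empty table the same codes decode to two replacement characters, which is roughly what a paste looks like when the PDF carries no usable mapping.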


Why not capture the attributed text from that PDF into an RTF document? Then open that RTF in TextEdit or Word and see if it is still gibberish. You can build an Automator application to do that.


Steps:

  1. Launchpad : Other : Automator

    New Document

    1. Application
    2. Choose
  2. Library : Files & Folders : Ask for Finder Items. Click and drag/drop it into the large, right-hand workflow area.
  3. Library : Files & Folders : Filter Finder Items. Click and drag/drop it into the workflow area, below the previous action.
  4. Library : PDFs : Extract PDF Text. Click and drag/drop it below the previous action.
  5. Configure the workflow items as shown in the following image.
  6. Automator : File menu : Save as pdf2rtf to your Desktop.


When you double-click this application, it will prompt you for the input PDF and write the PDF text to your Desktop.

[Screenshot: the configured pdf2rtf Automator workflow]
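If you're curious what the Rich Text output of a workflow like this involves, here is a minimal Python sketch that writes plain text into an RTF file. It is not what Automator's Extract PDF Text action does internally; the function names and the Helvetica font are assumptions for illustration. Non-ASCII characters (for example Devanagari) are written as RTF `\uN?` escapes, so the file itself stays pure ASCII.

```python
def _utf16_units(ch: str) -> list:
    """Split one character into UTF-16 code units (two for characters above U+FFFF)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def text_to_rtf(text: str, font: str = "Helvetica") -> str:
    """Wrap plain text in a minimal RTF document (hypothetical helper).

    Escapes RTF's special characters and writes non-ASCII characters as
    \\uN? escapes, where N is a signed 16-bit value and '?' is the fallback
    character that older readers show instead.
    """
    body = []
    for ch in text:
        if ch in "\\{}":
            body.append("\\" + ch)        # backslash and braces must be escaped
        elif ch == "\n":
            body.append("\\par\n")        # paragraph break
        elif ch == "\t":
            body.append("\\tab ")
        elif ord(ch) < 128:
            body.append(ch)               # plain ASCII passes through
        else:
            for unit in _utf16_units(ch):
                signed = unit - 65536 if unit > 32767 else unit
                body.append(f"\\u{signed}?")
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + font + ";}}\\f0 "
    return header + "".join(body) + "}"
```

You would write the returned string to a file with an .rtf extension and open it in TextEdit or Word; because everything outside ASCII is escaped, the file can be saved with any ASCII-compatible encoding.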

May 26, 2016 11:22 AM in response to abhishekvermag

The main problem is the font being used and the current language the OS is set to. If you look at the properties of the PDF files, they all use a font named CairoFont.


[Screenshot: the PDF's font properties, showing CairoFont]


I've looked around for it (and so have others, judging from some of the Google matches that appeared), and without that specific font, you aren't going to get very far. It was easy to find Cairo.ttf, but I couldn't locate CairoFont. Note that the font type in the screenshot shows it's an older Type 1 PostScript font.


Also, once you find that particular font (if at all possible), you would have to set up OS X to change the keyboard layout. Since the PDF is an Indian publication, you can presume it's Hindi, which for keyboard selection means you would use Devanagari in order for the text to appear correctly.
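If you want to test the Hindi assumption against whatever a copy/paste (or an extraction) actually produces, a quick check is to count how many characters fall in the Unicode Devanagari block (U+0900 to U+097F). A minimal Python sketch; the function name is just for illustration, and a low score only tells you the text isn't Devanagari, not what it is instead.

```python
DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block, U+0900..U+097F


def devanagari_share(text: str) -> float:
    """Fraction of non-whitespace characters that fall in the Devanagari block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(ord(ch) in DEVANAGARI for ch in chars) / len(chars)
```

Text that is really Hindi should score close to 1.0; a paste full of "�" characters scores 0.0, since U+FFFD is outside the block.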


Then there's Word. It doesn't support Arabic, or many other foreign languages worth a darn. You'd have to try TextEdit, or something else. Tom Gewecke (the master of font use in apps and keyboard layouts) mentioned a word processor that supports all kinds of languages, but I don't recall what it was.


Edit: Excellent approach by VikingOSX that may work, though you'd likely have to do it for each PDF. The embedded fonts are all subsets, meaning only the characters of a font that are used in the PDF are saved in the document. It's not the full font.
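For what it's worth, the PDF specification makes subset fonts recognizable by name: the font's name starts with a tag of exactly six uppercase letters followed by a plus sign (for example "ABCDEF+SomeFont"; the letters are arbitrary). If you have a list of font names from a PDF's properties, a small Python sketch like this separates the tag from the underlying name. The helper is hypothetical, and getting the font names out of the PDF in the first place isn't shown.

```python
import re
from typing import Optional, Tuple

# Six uppercase letters, a '+', then the real font name.
SUBSET_PREFIX = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(base_font: str) -> Tuple[Optional[str], str]:
    """Return (subset_tag, font_name); subset_tag is None for a full (non-subset) font."""
    match = SUBSET_PREFIX.match(base_font)
    if match:
        return match.group(1), match.group(2)
    return None, base_font
```

A name that comes back with a tag only contains the glyphs used in that document, which is why installing or extracting "the" font from such a PDF won't give you the complete typeface.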

May 26, 2016 1:27 PM in response to abhishekvermag

The problem appears to be related to how the PDF file was created. It seems to have been processed using Ghostscript and/or the Cairo Graphics package, and the font names have been replaced with CairoFont.


[Screenshots of the PDF's font information]


I found this on StackOverflow which seems to explain what is going on.

http://stackoverflow.com/questions/36037460/why-does-ghostscript-replace-fontnames-to-cairofont


You can find out more about the Cairo Graphics package here, not that this will help:

https://www.cairographics.org/


I also opened the file in Adobe Illustrator and got gibberish.


[Screenshot: the file opened in Adobe Illustrator]


In the end I don't think you are going to extract the text out of this document with a simple Copy/Paste. You may have to actually key in the text by hand.

May 26, 2016 2:12 PM in response to Kurt Lang

Because there are different types of PDF files, I added a link in my post.

If you scroll down that page, there is an app, Able2extract Professional 10.

I thought it might be possible to convert the PDF with an OCR app.

I downloaded page 9 and converted it online; the result is not excellent, but it is readable.



http://www.onlineocr.net/



An example of the converted text:

'ALLEGIANCE PLEDGE' IN BENGAL Cong: State matter... Look at Arunachal, Uttarakhand

EXPRESS NEWS SERVICE NEW DELHI, MAY25

ADAYafterThelndian Express re-ported about Congress MIAs in West Bengal being made to sign an undertaking "swearing un-qualified allegiance" to the party led by Sonia Gandhi and Rahul Gandhi, the Congress high com-mand sought to distance itself from the move, saying that the AICC had not issued any instruc-tions in this regard. The decision to make MLAs sign a Rs 100-stamp paper was unanimously agreed upon at a meetingbetween legislators, dis-trict presidents, state party lead-ers and West Bengal Congress president Adhir Chowdhury. On Wednesday,AICCgeneral secretary in-charge of West Bengal C Pjoshi had a word with Chowdhury about the undertak-ing. Later, AICC communication department head Randeep Surjewala said, "I have conferred with Joshi. No instructions in this regard were ever issued by the AICC or by the general secretary in-charge (joshi)." Surjewala said Chowdhury had informed the party high corn-mandthat it was "a voluntary ex-

TWO YEARS OF NDA Cong media blitz on govt's failures New Delhi: A short film on the BJP-led NDA government's al-leged failures and flip-flops in the last two years, a booklet on how the Narendra Modi govemment reneged on the party's poll prom-ises, and a series of press confer-ences in more than 30 cities and towns by top Congress leaders. These are part of a strategy for a media offensive the main Opposition party plans to un-leash in the next three days to counter the Centre's second an-niversary celebrations. Top Congress leaders, includ-ing leaders of both Houses Ghulam Nabi Azad and Mallikarjun Kharge, will address press conferences in as many as 27 cities Thursday, where the short film will be played. The theme of the Congress offensive will be 'Pragati ki thamm gayi chaal, desh ka bum haat. 'Two years on, the Modi gov-ernment has utterly and com-pletely failed on the single-most importantindex of development The govemment has become a government of rhetoric and the-atrics....Twokey issues missingare governance and development," said Randeep Surjewala ENS

ercise and nothing more should be read into it". "Neither is itanex-ercise in imposing the will of the PCC president" Surjewala said He said Chowdhury had told the high command that there was a background to the move. "During the last assembly, 10 Congress legislators were poached in contravention of the rigours and mandate of the anti-defection act Even otherwise, to violate Constitution with im-punity has become fashionable in this country. We saw re-spected Narendra Modi do it, first in Arunachal Pradesh and then in Uttarakhand, for which the Union government has faced severe reprimand from the Uttarakhand high court and the Supreme Court" Surjewala said. "If legislators, by their own volition, get together and reaf-firm their commitment to their own party, it is a subject matter that the PCC and the Congress legislature party of West Bengal are able to handle themselves. Nothing more should be read into it," Surjewala said. On Tuesday, Chowdhury had told The Indian Express: "It is not a bond... It is more of a voluntary pledge by partymen as a gesture of their allegiance."

I copied the whole converted text of page 9 into Word; it came to six pages in total.

If I could attach a .docx, I would.

May 22, 2016 8:08 AM in response to abhishekvermag

Thanks for doing that test. It appears that the problem originates from trying to copy the text from the PDF. It has nothing to do with any problems with MS Word.


There can be a number of reasons for this. As Esquared suggested, it might be due to restrictions placed on the PDF by its creator. You can check this by going to the "Tools" menu in Preview while the document is open, selecting "Show Inspector", and clicking the "Lock" icon. If the PDF has any restrictions on copying text, they will be listed there. I don't think this is the problem, though. If I remember correctly, a protected PDF will give you an error stating that it is protected against copying text, and you wouldn't even be able to select the text in the PDF.
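For reference, the copy restriction mentioned above is, as far as I know, recorded in the permissions number (the /P entry) of an encrypted PDF's encryption dictionary, and "copy or otherwise extract text and graphics" is bit 5 of that number in the PDF specification (bit 1 being the least significant). A minimal Python sketch of testing that bit, assuming you already have the /P value; reading it from the file isn't shown, and the example values in the test are made up.

```python
# Permission bit positions as numbered in the PDF spec (bit 1 = least significant).
PRINT_BIT = 3
MODIFY_BIT = 4
COPY_EXTRACT_BIT = 5
ANNOTATE_BIT = 6


def has_permission(p_value: int, bit: int) -> bool:
    """True if the given 1-based permission bit is set in a /P value.

    /P is stored as a signed 32-bit integer, so it is often negative;
    Python's & treats negative ints as two's complement, so this still works.
    """
    return bool(p_value & (1 << (bit - 1)))
```

For example, a /P of -4 has the copy bit set, while -3904 does not. An unencrypted PDF has no /P entry at all, which fits the view that this isn't the cause here if the text can be selected.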


As leroydouglas suggested, it could also be a font problem. Try his suggestion to restore the fonts in Font Book. Again, I doubt that this is the problem. The reason I had you do the test with TextEdit set to Plain Text format was to rule out a font problem. Plain Text documents don't use styled text; they use the default system font, so any problems with Font Book should not affect them.


It could be a text encoding problem. This would likely originate on the system that created the PDF file, in which case you may not be able to do anything to fix it on your end. Or it could just be a problem with how the Preview app interprets the text when it is copied.
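One way to narrow down an encoding problem is to look at exactly which characters came through the clipboard. If the pasted " ��� ����" really consists of U+FFFD replacement characters, that suggests the text was lost during the copy rather than in Word's display. A small Python sketch, assuming you've saved the pasted text to a UTF-8 file and read it in; the function names are only illustrative.

```python
REPLACEMENT_CHAR = "\ufffd"  # "�", U+FFFD REPLACEMENT CHARACTER


def replacement_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are U+FFFD."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    return sum(ch == REPLACEMENT_CHAR for ch in visible) / len(visible)


def code_points(text: str) -> list:
    """List each character as U+XXXX, to see exactly what was pasted."""
    return [f"U+{ord(ch):04X}" for ch in text]
```

A ratio near 1.0 means essentially nothing readable survived; if instead `code_points` shows ordinary letters, the text came through and the problem is more likely on the display side.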


Here are a couple of things you can try in addition to the suggestions from Esquared and leroydouglas.


Go to adobe.com and download the free Adobe Reader. Open the PDF file in Adobe Reader and see if it corrects the copy/paste problem.


To rule out any problems related to your user account, you could try logging in to the Guest Account, opening the document, and running my test again. If that solves the problem, you will know that there is an error in your user account and can work towards a solution based on that information.


If that PDF file is publicly available on the Internet and you post the address here, I would be willing to download a copy and run some tests on my end to see if I can figure out a solution for you. Or, if you can make the document available somewhere like Dropbox, OneDrive, or Google Drive and send me a download link, I will be able to run some tests on it.

May 26, 2016 10:43 AM in response to abhishekvermag

Hello Gino.
I have tried almost everything that has been mentioned so far, but nothing has helped.
I'm sending you the link to the file, as PDF attachments aren't allowed here.
You will have to download this e-paper of an English daily from India and then try to copy something from it to MS Word. I just hope this will help us.


http://www.readwhere.com/read/download/newspaper/821231?show=v4


And if you want me to email you the PDF file, please tell me your email ID.


With regards.

May 26, 2016 2:10 PM in response to Tom Gewecke

Tom Gewecke wrote:


Raicya wrote:



To copy (select text) & paste to Word is always possible.


That's nonsense. There are countless pdfs, especially in non-Latin scripts, which use junk encoding systems that make copy/paste and search totally impossible.

That is not nonsense. I know there are different types of PDF files; see the link I added in my post.

In case you missed it: TYPES OF PDFS: NATIVE VS. SCANNED PDFS

http://www.investintech.com/resources/articles/pdftypes/


If you scroll down that page, there is an app, Able2extract Professional 10.

With an OCR conversion app, it is almost always possible to convert.

May 26, 2016 2:24 PM in response to Raicya

I have to presume you tried neither of these Able2 apps. Neither the standard nor the pro version would convert anything. You're supposed to be able to select the text you want and then choose a conversion option. Problem: neither one will let you select anything. You can only highlight an entire page as if it were a single embedded image. Attempting to convert to plain text gets you a blank file. Anything else generates an error message.



Welcome to Apple Support Community
A forum where Apple customers help each other with their products.