Preview: PDF annotations disappearing

As many others have, I've found that after making some notes or highlights on a pdf in Preview (and even after saving by hand), closing the pdf and reopening it will result in some (but not all) of the annotations having simply vanished.


What's interesting is that the annotations are still present in the file; they apparently just haven't been saved in a way that Preview or any other PDF viewer can show. (The annotations still don't appear when opening the file with Acrobat, Chrome, etc.)


If I open the file with TextEdit, and search (Command+F) for "/Subtype /Text", for example, I can find my apparently "deleted" notes.


(For anyone else trying to recover their notes, sometimes the text of the notes appears with the string "\000" between every character, so that it appears like this: "\000l\000i\000k\000e\000 \000t\000h\000i\000s". After copying it into another document, simply find and replace "\000" with nothing ("") to recover the original text—though you may find that some other characters, such as single quotes, are still encoded this way, as "\031", for example. This phenomenon appears unrelated to whether the notes are viewable, though.)
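For anyone who would rather not do the find-and-replace by hand, here is a rough sketch of the same recovery in Python. It rests on an assumption that fits the symptoms but that I have not confirmed against this particular file: that the "\000" runs are the high bytes of a UTF-16BE text string (optionally starting with the byte-order mark "\376\377"), which would also explain why a curly apostrophe shows up as a space followed by "\031". The function name `decode_pdf_literal` is made up for illustration, and the input is assumed to be the plain ASCII text between the parentheses of /Contents.

```python
import re

# Backslash escapes other than octal ones that can occur in a PDF literal string.
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "(": "(", ")": ")", "\\": "\\"}


def _unescape(match):
    token = match.group(1)
    if token in _SIMPLE:
        return _SIMPLE[token]
    # Octal escape such as \000 or \031: stands for a single byte.
    return chr(int(token, 8) & 0xFF)


def decode_pdf_literal(raw):
    """Decode the text between the parentheses of a /Contents entry.

    `raw` is the string as copied from a text editor, backslashes included.
    """
    unescaped = re.sub(r"\\([0-7]{1,3}|[nrt()\\])", _unescape, raw)
    data = unescaped.encode("latin-1")  # one byte per character
    if data.startswith(b"\xfe\xff"):
        # Byte-order mark present: the rest is UTF-16BE.
        return data[2:].decode("utf-16-be", errors="replace")
    if data.startswith(b"\x00") and len(data) % 2 == 0:
        # Heuristic (assumption): a fragment copied without its byte-order mark.
        return data.decode("utf-16-be", errors="replace")
    # Otherwise treat it as single-byte text; Latin-1 is only an
    # approximation of PDFDocEncoding.
    return data.decode("latin-1")


print(decode_pdf_literal(r"\000l\000i\000k\000e\000 \000t\000h\000i\000s"))
```

Decoding the byte pairs, rather than deleting "\000", is what lets characters such as curly quotes come back intact instead of as leftover "\031" sequences.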


What's going on here? Is there any way to "repair" a PDF that isn't showing these objects?


(I'm running macOS Catalina 10.15.5.)


By the way, I noticed that one of the distinguishing features the "surviving" notes had was that their object ids (e.g. 1470 in "1470 0 obj") were all referenced near the very end of the file, whereas at least most of the other ones appeared not to be. Two out of the three objects were actually duplicated near the very end of the file, whereas the other one was only referenced by object id there.

Posted on Jun 30, 2020 4:15 PM

Question marked as Top-ranking reply

Posted on Jul 1, 2020 4:16 PM

UPDATE: I managed to repair my PDF file! It was a bit painstaking, but could be automated.


I first downloaded PeePDF to inspect the structure of my pdf file. I noticed that the surviving annotation objects were each referenced by a page object (or an array object which was in turn referenced by a page object). However, the "deleted" annotations were second-degree orphans: they were referenced by some array object, but that array object was never referenced by anything. I'll call these non-referenced array objects "orphanages", since they contain references to our orphaned data (and a collection of orphans is an orphanage).


I noticed further that each page object had a field called /Annots whose value was some array of references, and that each orphanage was a superset of some page's /Annots value. It was as though Preview had built an array object (an orphanage) intended to extend each page's /Annots field—but had forgotten to go back and actually put this new array into the /Annots field!


So, I used PeePDF to modify the value of each /Annots field by replacing it with a reference to the right orphanage. Note that you can't just modify your PDF in a plain text editor, for reasons I don't fully understand, but which I think have to do with the encoding.

Jun 30, 2020 5:42 PM in response to VikingOSX

Thanks, I'll submit that feedback—but in the meantime, while maybe they can't help with changing Preview itself, they might be able to help with repairing what Preview produces!


For instance, after a bit of spelunking with PeePDF, I've managed to find out that the annotation objects that survive are all referenced either by a page object directly, or by an array(?) object which is in turn referenced by a page object. All the other ones that I've checked so far are referenced by an array object which is not referenced by any other object!


This suggests that I might be able to insert an appropriate reference to that array object in the appropriate page object, if I can determine—either by context or by trial and error—which page object it ought to belong in.
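To make the "who references what" idea concrete, here is a small Python sketch of the check on a toy set of objects. It is not PeePDF's implementation; `referrers` and `reaches_page` are hypothetical helper names, the object bodies are invented for illustration, and the sketch ignores which key a reference sits under (so a /Parent or /Popup link counts the same as an entry in an array).

```python
import re

# An indirect reference looks like "580 0 R": object number, generation, R.
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")
PAGE_RE = re.compile(rb"/Type\s*/Page\b")  # \b keeps /Pages from matching


def referrers(objects, target):
    """Ids of the objects whose body contains a reference to `target`."""
    return sorted(
        oid
        for oid, body in objects.items()
        if any(int(m.group(1)) == target for m in REF_RE.finditer(body))
    )


def reaches_page(objects, start):
    """True if following referrers upward from `start` ever hits a page."""
    seen, stack = set(), [start]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if PAGE_RE.search(objects.get(oid, b"")):
            return True
        stack.extend(referrers(objects, oid))
    return False


# Toy data (invented): page 204 lists two annotations; array 580 lists the
# same two plus 1036, but nothing points at 580.
objects = {
    204: b"<< /Type /Page /Annots [ 202 0 R 581 0 R ] >>",
    580: b"[ 202 0 R 581 0 R 1036 0 R ]",
    581: b"<< /Type /Annot /Subtype /Text >>",
    1036: b"<< /Type /Annot /Subtype /Text >>",
}

print(referrers(objects, 1036))     # the array that holds the lost note
print(referrers(objects, 580))      # empty: nothing references that array
print(reaches_page(objects, 1036))  # lost note never reaches a page
print(reaches_page(objects, 581))   # surviving note does
```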

Jul 1, 2020 4:17 PM in response to thorimur

In more detail:


  • I opened my PDF in TextEdit, searched for all occurrences of "/Subtype /Text" (for general annotations, "/Type /Annot"), and copied the object id (here, 1036) into a separate document to keep track of. Shown below is an example object; the object id appears at the top, in 1036 0 obj.
1036 0 obj
<< /Contents (This is the text)
/F 4
/C [ 1 0.92 0.42 ]
/Type /Annot
/Rect [ 22.5588 444.5828 46.5588 468.5828 ]
/Border [ 0 0 0 ]
/DA (/Helvetica 12 Tf 0 g)
/Subtype /Text >>
endobj


  • Searched for all instances of "/Type /Page" (note that there is also a "pages" object, which appears as /Type /Pages, and that's not what we want) and kept track of each one's object id and the value of its /Annots field together in a separate document. Below is an example page object, with object id 204 and /Annots field value [ 202 0 R 581 0 R 582 0 R 583 0 R ].
204 0 obj
<< /Annots [ 202 0 R 581 0 R 582 0 R 583 0 R ]
/Type /Page
/MediaBox [ 0 0 612 792 ]
/Parent 126 0 R
/Resources 203 0 R
/Contents 205 0 R
/Group 172 0 R >>
endobj


  • Downloaded PeePDF, cd'd to peepdf-master, started interactive mode with my file (./peepdf.py -i -f myfile.pdf ; note the -f to continue despite errors), and used 'references to' to determine which orphanages each of my annotations were in. E.g., I'd have run
PPDF> references to 1036

and gotten a list of object ids, something like [580] or [250, 1468, 250], which I wrote down next to each of my annotation object ids. For orphanages, you'll then find that running

PPDF> references to 580

returns "No references!!", whereas surviving annotations will eventually be referenced by a page. We only need to keep track of orphanages, though.


  • I then viewed each resulting object via, e.g.
PPDF> object 580

and, if it were an orphanage, would get something like

[ 202 0 R 583 0 R 581 0 R 582 0 R 627 0 R 625 0 R 626 0 R 628 0 R ]

For the purposes of this example, I've chosen the example object and the page presciently, such that you can see how the start of it matches up with the /Annots field in our example page, which is

[ 202 0 R 581 0 R 582 0 R 583 0 R ]

Note that they are not necessarily in the same order—but nonetheless the value of the page's /Annots field is contained in the orphanage as a subset.


  • I wrote down which orphanage id (580) matched up to which page id (204).


  • To actually update the value, I then used PeePDF's modify functionality to modify the page object:
PPDF> modify object 204

To do so, progress through the fields making no change (n, enter) and then delete (d, enter) the /Annots field. (Attempting to modify it will ask you to modify each individual entry, and that's not what we want to do.) When you reach the end of the object and are asked if you want to add new entries, hit y, Enter; type in /Annots when prompted for a name object; choose to add a reference value (we're going to refer to the orphanage); and then type in the orphanage id followed by "0 R", e.g. 580 0 R.


  • Once you've replaced each page's /Annots value with the right reference, you're done; be sure to run
PPDF> save

to save your modifications.


As you can see, all of this can be automated, but until I write a script, here's the manual procedure!
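As a starting point for that script, here is a rough Python sketch of the bookkeeping part only: it scans the raw bytes of the file, finds the unreferenced arrays ("orphanages"), and reports which page each one appears to extend. It does not modify the PDF; the actual edit is still left to PeePDF as described above. The function name `plan_repairs` is hypothetical, and the sketch assumes the objects are stored uncompressed and readable as plain text (as they evidently were in my file, since searching in TextEdit worked). It will miss anything packed inside object streams.

```python
import re

OBJ_RE = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+\d+\s+R\b")
PAGE_RE = re.compile(rb"/Type\s*/Page\b")  # \b keeps /Pages from matching
# /Annots is either an inline array or a reference to an array object.
ANNOTS_RE = re.compile(rb"/Annots\s*(\[[^\]]*\]|\d+\s+\d+\s+R\b)")


def refs(chunk):
    """Object numbers of every 'N G R' reference found in `chunk`."""
    return [int(m.group(1)) for m in REF_RE.finditer(chunk)]


def plan_repairs(data):
    """Map each page id to the orphanage ids that contain its /Annots.

    Candidates are listed largest first. Pages without /Annots are skipped,
    because an empty set is a subset of everything and proves nothing.
    """
    # If an id occurs more than once, the last copy in the file wins.
    objects = {int(m.group(1)): m.group(2).strip() for m in OBJ_RE.finditer(data)}
    referenced = {r for body in objects.values() for r in refs(body)}

    # Orphanages: bare arrays of references that nothing points at.
    orphanages = {
        oid: set(refs(body))
        for oid, body in objects.items()
        if body.startswith(b"[") and oid not in referenced and refs(body)
    }

    plan = {}
    for oid, body in objects.items():
        found = ANNOTS_RE.search(body)
        if not PAGE_RE.search(body) or not found:
            continue
        value = found.group(1)
        if not value.startswith(b"["):
            # Indirect /Annots: look up the array object it refers to.
            value = objects.get(refs(value)[0], b"")
        current = set(refs(value))
        if not current:
            continue
        candidates = [o for o, members in orphanages.items() if current < members]
        if candidates:
            plan[oid] = sorted(candidates, key=lambda o: len(orphanages[o]), reverse=True)
    return plan


# Usage on a real file (the path is a placeholder):
#     with open("myfile.pdf", "rb") as f:
#         print(plan_repairs(f.read()))
```

Each entry in the result is one "page id, orphanage id" pair to feed into the modify step; where a page has several candidates, check them by hand before changing anything, and work on a copy of the file.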
