So, what I am envisioning is that you want the screen to start out filled with the video of you talking, and then the document pops in while you keep talking, with your video now inset into the document as a PiP (picture-in-picture). Is that right?
If so, you could split the main clip of you talking into Segments 1 and 2 at the desired point in the narrative. Put Segment 1 in front of the document clip. Then insert Segment 2 as the PiP of you talking into the document clip. It would look like this:
Main clip of you talking, as shown in the Preview screen:

Then, as the playhead passes the end of Segment 1, the document comes in, with Segment 2 of the split clip inserted as a PiP so that you keep talking over it:

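If it helps to see the timing spelled out, here is a rough sketch in Python of where each piece lands on the timeline. It is only an illustration, not anything the editor itself runs. The track names ("main" and "pip"), the `plan_timeline` helper, and the 60-second and 20-second figures are all made up for the example, and it assumes the document clip runs until the end of your talking clip.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    name: str     # which piece of footage this is
    track: str    # hypothetical track label: "main" or "pip"
    start: float  # timeline start, in seconds
    end: float    # timeline end, in seconds


def plan_timeline(main_duration: float, split_at: float) -> list[Placement]:
    """Lay out Segment 1, the document clip, and Segment 2 (as the PiP)."""
    if not 0 < split_at < main_duration:
        raise ValueError("split point must fall inside the main clip")
    return [
        # Segment 1 fills the screen from the start up to the split point.
        Placement("Segment 1", "main", 0.0, split_at),
        # The document clip takes over the main track at the split point.
        Placement("Document", "main", split_at, main_duration),
        # Segment 2 plays over the document as the PiP for the same span.
        Placement("Segment 2", "pip", split_at, main_duration),
    ]


if __name__ == "__main__":
    # Example: a 60-second clip of you talking, split at the 20-second mark.
    for p in plan_timeline(main_duration=60.0, split_at=20.0):
        print(f"{p.name:<10} track={p.track:<5} {p.start:>5.1f}s to {p.end:>5.1f}s")
```

The point to notice is that Segment 2 and the document clip share the same start time, so your narration carries on without a gap when the document appears.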
-- Rich