Umm, okay, it's a closeup of the iPhone, Apple will like that.
You may be approaching this all wrong, or I'm still not understanding, but try to think of the hand and the phone as a single object. All you need to do is not apply the greenscreen keyer to the hand-arm-phone object. You can protect that object by using a hand-drawn matte (also known as a garbage matte), but you can create a protection matte many different ways, including pulling a hard luma key.
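If it helps to see the hard luma key idea as plain arithmetic, here's a rough Python sketch, not anything your editing app actually runs. It assumes a frame is just a list of rows of (r, g, b) pixels, uses the Rec. 709 luma weights, and the function names, the threshold of 128 and the keep_dark switch are all made up for illustration. Whether you protect the dark or the bright side depends entirely on your shot.

```python
# Illustration only: a frame is a list of rows of (r, g, b) tuples, channels 0..255.

def luma(pixel):
    """Rec. 709 luma of an (r, g, b) pixel."""
    r, g, b = pixel
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def hard_luma_matte(frame, threshold=128, keep_dark=True):
    """Return a True/False matte: True marks pixels to protect from the keyer.

    'Hard' means no soft edge: each pixel is either fully in or fully out.
    The threshold and keep_dark defaults are assumed values, tune per shot.
    """
    if keep_dark:
        return [[luma(px) < threshold for px in row] for row in frame]
    return [[luma(px) >= threshold for px in row] for row in frame]

# One dark phone-body pixel next to one bright green backdrop pixel.
frame = [[(20, 20, 20), (0, 255, 0)]]
matte = hard_luma_matte(frame)  # dark pixel protected, green pixel not
```

A matte like this is usually too crude on its own, so you'd typically combine it with a rough hand-drawn shape.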
Here is one approach (if Patrick was tuned in, he'd have another, much more elegant, suggestion for you):
You have your video as a layer, yes? You apply the green keyer to it and part of the phone screen drops out, too. That's okay, you're going to fill the phone back in with itself.
You place another copy of your video on top (or even underneath) and you draw a mask around the screen of the phone. It doesn't need to be precise. Maybe you need to keyframe the mask position a few times, maybe you can attach the mask to a tracker. But all you're doing with the mask is allowing the keyed video to show everywhere except in the mask or, if you apply your mask to the keyed video below, you're just allowing the closeup shot to be seen through the hole from below.
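For the code-minded, the two-layer trick boils down to something like the Python sketch below. It's only a toy under my own assumptions: frames are lists of rows of (r, g, b) pixels, green_key_alpha is a crude stand-in for a real keyer (its threshold of 40 is a guess), and every function name here is hypothetical. The point is just that the rough mask forces the phone screen to stay opaque no matter what the keyer thinks.

```python
# Toy sketch: frames are lists of rows of (r, g, b) tuples, channels 0..255.

def green_key_alpha(pixel, threshold=40):
    """Crude stand-in for a keyer: 0.0 where green clearly dominates, else 1.0."""
    r, g, b = pixel
    return 0.0 if g - max(r, b) > threshold else 1.0

def protected_alpha(frame, protect_mask, threshold=40):
    """Key the frame, but force full opacity wherever the rough mask is True."""
    return [
        [1.0 if protect_mask[y][x] else green_key_alpha(px, threshold)
         for x, px in enumerate(row)]
        for y, row in enumerate(frame)
    ]

def composite(frame, alpha, background):
    """Standard 'over' blend: alpha * foreground + (1 - alpha) * background."""
    out = []
    for row, arow, brow in zip(frame, alpha, background):
        out.append([
            tuple(int(round(a * f + (1.0 - a) * b)) for f, b in zip(fpx, bpx))
            for fpx, a, bpx in zip(row, arow, brow)
        ])
    return out

# Three pixels: green backdrop, greenish phone screen, skin tone.
frame = [[(0, 255, 0), (10, 240, 20), (200, 150, 120)]]
mask = [[False, True, False]]          # rough mask drawn over the phone screen
background = [[(0, 0, 255)] * 3]       # whatever replaces the green

alpha = protected_alpha(frame, mask)
result = composite(frame, alpha, background)
# The backdrop pixel is replaced; the phone screen and the skin are kept.
```

Without the mask, the greenish screen pixel would key out along with the backdrop, which is exactly the problem you're seeing.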
Umm, clear as mud, I know. The point I'm trying to make is the task is probably not as complex as you're thinking it is.
If you were using AE, you'd use a four-corner motion tracking setup with trackers in each corner of the screen.
PS: Thank you for the kind words, mostly I just pi$$ folks off around here.
bogiesan