First, you can state that "my usage with my iP4S is comparable to the iP4 it replaced" but you cannot state that "so far the iP4S behaves about the same as the iP4 that it replaced"... because comparable "usage" and comparable "behavior" don't mean the same thing. One reason is that you didn't have to periodically reset the settings on your 4 to achieve such results, nor turn off diagnostics and whatnot. I won't repeat my arguments about marketing and the type of experience one expects from Apple.
Secondly, Anandtech's review is not "proof" of anything. The review provides data and analysis. You can cling to the bolded "potential" statement they make, but Apple didn't make a "mistake" when they printed the standby specs of 200hrs for the 4S vs. 300 for the 4. And herein lies the "enigma" which, as the engineer that you are, you should try to decipher. And if you disclosed your precise usage and setup, and qualified your activities on the phone, we could all try together. But assuming all devices have perfect hardware (and that's already a big "if"), you only really draw attention to the following question: "if Anandtech's power consumption numbers are correct - and they basically state that for everything but gaming the 4S consumes the same or less power than the 4 - then why is the device rated for a full 100hrs less of standby than the 4?". Could it be iOS?
Within Anandtech's review paradigm, you have to look closely at their wifi hotspot test. Just before it, they tested the power consumption of the device under 3G with downstream data - and in that case it's less than the iP4 by 0.4W, which is quite a lot and the greatest difference between the 4 and 4S. Then they use the device as a hotspot and get 0.02 hours less use than with the 4 - ok, that's negligible because it's like what, 1 minute or whatever.
But then they wonder and speculate: "It is surprising that despite the peak power advantages above, we didn't see any improvement in our WiFi hotspot test. The only explanation I have is that the power advantage may not be as pronounced if we're not pushing the limits of the wireless interfaces." And earlier they had pointed out: "Under load however, Apple is bound by the same physical realities as its competitors and the question of battery life becomes one of battery capacity divided by peak power draw."
One can speculate that the last 700 pages explain the difference between the 4 and the 4S, and that iOS explains the difference between what's expected from peak power calculations for the device and real-life device usage. I would call it a "failed balancing act".
Consequently, side by side under heavy non-stop (non 3D intensive) use from 100%, the 4S may even yield slightly better results than the 4, granted*. But as usage spreads over time, the "leak" (conceptually, for lack of a better term) becomes obvious (and thanks for really allowing me to reflect on this further), so much so that Apple actually accounted for it in their specs.
(*) periodic maintenance required, with some options off.
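To put a rough number on that spec gap, here is a quick back-of-the-envelope sketch. It assumes average standby draw is simply battery capacity divided by rated standby hours (the same kind of division Anandtech describes for peak load), and that both phones have roughly the same battery capacity. The equal-capacity part is my assumption, not something taken from the review, and the function name is just made up for illustration.

```python
def implied_standby_draw_ratio(hours_old: float, hours_new: float,
                               capacity_ratio: float = 1.0) -> float:
    """Ratio of the new device's average standby draw to the old device's.

    Average draw is modeled as capacity / rated standby hours, so the ratio is
    (capacity_new / hours_new) / (capacity_old / hours_old).
    capacity_ratio = capacity_new / capacity_old; 1.0 is an ASSUMPTION
    (equal batteries), not a measured figure.
    """
    return capacity_ratio * hours_old / hours_new


# Rated standby from the specs quoted above: 300hrs for the 4, 200hrs for the 4S.
ratio = implied_standby_draw_ratio(hours_old=300, hours_new=200)
print(f"Implied 4S standby draw is {ratio:.2f}x that of the 4")
```

Under those assumptions the ratings alone imply the 4S draws about one and a half times as much as the 4 while idle, which is hard to square with load measurements that show it drawing the same or less.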