I disagree with some of the points about CPU performance, especially this one about pipeline length:
What that means is that it takes the Pentium 4 from our example 31 clock cycles to complete a single instruction before it can start another.
This is very misleading. While it is true that an instruction can't advance more than one pipeline stage per clock, you can't say pipeline length is equal to cycles per instruction. Modern desktop processors are based on superscalar, out-of-order designs that keep many instructions in flight at once, so a new instruction doesn't have to wait for the previous one to leave the pipeline. Determining how many clock cycles an instruction needs to complete is far more complex than just counting the number of pipeline stages.
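To make the latency vs. throughput point concrete, here is a rough sketch in Python. It is an idealized model of my own (no stalls, no mispredictions, no dependencies), the function name is made up, and the depths and width are just illustrative inputs, not measurements of any real chip.

```python
import math

def total_cycles(n_instructions, depth, width=1):
    """Cycles to finish n_instructions on an idealized pipeline.

    Assumptions (not a model of any real CPU): no stalls, no branch
    mispredictions, no data dependencies. The first group of up to
    `width` instructions needs `depth` cycles to get through; after
    that, one more group completes every cycle.
    """
    groups = math.ceil(n_instructions / width)
    return depth + groups - 1

if __name__ == "__main__":
    n = 1000
    for depth in (14, 31):
        cycles = total_cycles(n, depth)
        print(f"depth={depth}: {cycles} cycles, {cycles / n:.3f} cycles per instruction")
    # Same 31-stage depth, but 4 instructions wide (illustrative width).
    print("depth=31, width=4:", total_cycles(n, 31, width=4), "cycles")
```

A single instruction does take 31 cycles to get through a 31-stage pipeline, but 1000 of them take 1030 cycles in this model, not 31000. The depth only shows up as a one-time fill cost, which is why stage count alone tells you very little about cycles per instruction.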
Did you know that out of the 31 stages of the Prescott pipeline, 21 alone are dedicated to branch prediction? You could just as easily group these 21 stages into a single stage labeled "Branch Prediction" and say the Prescott has only an 11-stage pipeline. Sure, you'd still need at least 21 cycles to get through that stage, but if that lets the processor predict the flow of instructions with much more precision, then in the end you come out ahead on cycles.
A longer pipeline means you can split the work into smaller, easier-to-handle pieces (and thus achieve better control logic, such as branch prediction), but when an instruction stalls deep in the pipeline, you lose a lot of cycles. The main problem with long pipelines is that they require high clock frequencies to remain competitive, and this brings severe transistor leakage issues. This is why NetBurst was abandoned.
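The trade-off above can be sketched with the usual back-of-the-envelope formula: effective CPI = base CPI + (fraction of branches) x (misprediction rate) x (flush penalty in cycles). The Python below is only an illustration; the function names are mine, and every number in it is a made-up input chosen to show the shape of the trade-off, not data for Prescott or Conroe.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are counted.

    flush_penalty is the number of cycles lost per mispredicted branch,
    which grows with pipeline depth.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

def ns_per_instruction(cpi, clock_ghz):
    """Average time per instruction in nanoseconds at the given clock."""
    return cpi / clock_ghz

if __name__ == "__main__":
    # Hypothetical inputs: 20% branches, 5% of them mispredicted.
    deep = effective_cpi(1.0, 0.20, 0.05, flush_penalty=30)
    shallow = effective_cpi(1.0, 0.20, 0.05, flush_penalty=14)

    clock = 3.0  # GHz, same for both in this comparison
    print(f"deep:    {deep:.2f} CPI, {ns_per_instruction(deep, clock):.3f} ns/instr")
    print(f"shallow: {shallow:.2f} CPI, {ns_per_instruction(shallow, clock):.3f} ns/instr")

    # Clock the deep design needs just to match the shallow one.
    break_even = clock * deep / shallow
    print(f"deep design breaks even at about {break_even:.2f} GHz")
```

With the same predictor accuracy, the deeper design pays more per misprediction, so it has to run at a higher clock just to break even. A better predictor (a lower mispredict_rate) shrinks that gap, which is the whole point of spending stages on prediction.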
The argument that the Conroe is faster than the Prescott because of a shorter pipeline is wrong. A better (but still superficial) description would be that Conroes are faster because their pipeline is wider.