OK... it looks like people here need a little explanation of how a pipeline works.
The common analogy is to think about washing your clothes. Doing this chore requires several steps. First you have to put the clothes in the washer and wait 30 minutes for the load to finish. Then you have to put the clothes in the dryer and wait about 1 hour for them to dry. Then you have to sit down, fold them, and put them away, which takes about 15 minutes. Done this way, every load of laundry takes about 1 hour and 45 minutes (30 + 60 + 15 minutes).
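To put numbers on it, here's a tiny Python sketch that totals the time when you do loads one after another. The function name `sequential_minutes` is made up for illustration. The stage times are the ones from the laundry example.

```python
# Stage times in minutes, taken from the laundry example above.
WASH, DRY, FOLD = 30, 60, 15

def sequential_minutes(loads: int) -> int:
    """Total time when each load goes through wash, dry, and fold
    before the next load is started (no pipelining)."""
    return loads * (WASH + DRY + FOLD)

print(sequential_minutes(1))  # 105 minutes = 1 hour 45 minutes
print(sequential_minutes(4))  # 420 minutes = 7 hours
```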
However, consider an alternative. Instead of waiting an hour for the dryer to finish its cycle, say you put another load in the washer, so that both machines run at the same time. Likewise, while you are folding and putting away one load, the washer and dryer are already working on the next loads. This way, everything is happening at the same time, which is much more efficient. Let's pretend you have as many loads of laundry as a processor has chunks of data (a lot of loads!). In that case you would finish one load every hour instead of every 1 hour and 45 minutes, because the dryer is the limiting factor. No matter how fast you fold your clothes and put them away, you still have to wait the hour for the dryer to finish. This is PIPELINING, but it's UNBALANCED. In this example, you have 3 pipeline stages: the washer, the dryer, and the folding. Let's take this one step further.
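If you want to check the "one load per hour" claim, here's a rough simulation. It is only a sketch: `pipelined_finish_times` is a hypothetical helper, and it assumes one machine per stage and that a load can wait in a basket between machines until the next one frees up.

```python
def pipelined_finish_times(stage_minutes, loads):
    """Return the time at which each load finishes the last stage.

    Assumes one machine per stage, and that a load which is done with
    one stage can wait in a basket until the next machine frees up.
    """
    free_at = [0] * len(stage_minutes)  # when each machine next becomes free
    done = []
    for _ in range(loads):
        ready = 0  # when this load is ready to enter the next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])  # wait for both the load and the machine
            free_at[s] = start + minutes
            ready = free_at[s]
        done.append(ready)
    return done

times = pipelined_finish_times([30, 60, 15], loads=4)
print(times)                                       # [105, 165, 225, 285]
print([b - a for a, b in zip(times, times[1:])])   # [60, 60, 60]
```

The first load still takes 1 hour and 45 minutes, but after that a load comes out every 60 minutes. That gap is set by the dryer, the slowest stage.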
Suppose you didn't like waiting an hour for the dryer and half an hour for the washer, so you bought new machines. Now you have 2 washers and 4 dryers, staggered so that a washer and a dryer each finish a load every 15 minutes. Remember that it takes you 15 minutes to fold and put away a load, so if at least one washer and one dryer finish a load every 15 minutes, you are at peak efficiency and can put away one load of laundry every 15 minutes. This is like having a 7-stage pipeline in which every stage takes 15 minutes (7 stages of 15 minutes add up to the same 1 hour and 45 minutes per load). A finished load now comes out every 15 minutes instead of every 105 minutes, a 7x speedup over not pipelining at all! This is called BALANCED PIPELINING.
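The same idea as plain arithmetic: a sketch with a made-up helper, `interval_minutes`. It assumes the machines at each stage are perfectly staggered, so a stage with k machines that each take t minutes can hand off a load every t/k minutes, and the slowest stage sets the pace.

```python
def interval_minutes(stages):
    """Steady-state minutes between finished loads.

    `stages` is a list of (minutes_per_load, number_of_machines).
    Assumes perfectly staggered machines, so each stage hands off a
    load every minutes / machines; the slowest stage sets the pace.
    """
    return max(minutes / machines for minutes, machines in stages)

one_of_each = [(30, 1), (60, 1), (15, 1)]
balanced = [(30, 2), (60, 4), (15, 1)]  # 2 washers, 4 dryers, 1 folder (you)

print(interval_minutes(one_of_each))   # 60.0
print(interval_minutes(balanced))      # 15.0
print(105 / interval_minutes(balanced))  # 7.0, the speedup over 105 min/load
```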
I am speculating here, but in the Pentium 4, the ALU may have been like the dryer in the above example: the limiting factor in an otherwise balanced pipeline. By doubling the clock on the ALU, you have effectively made twice as many of them, which is just like buying extra dryers.
With the Pentium 4's 20-stage pipeline, if you were to take only 1 packet of data (just like one load of laundry), it would take 20 clocks to get from one end of the pipe to the other. However, since processors have much more data than you have loads of laundry, there are usually 20 pieces of data in flight, one in each of the pipeline stages. So when the pipeline is operating at top efficiency, one piece of data pops out on every clock of a 1.5GHz Pentium 4.
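Here's a small sketch of that latency vs. throughput difference. It assumes an idealized pipeline where one piece enters every clock and nothing ever stalls. The function name `cycles_to_finish` is just for illustration.

```python
STAGES = 20       # pipeline depth from the post
CLOCK_GHZ = 1.5   # 1.5 GHz Pentium 4

def cycles_to_finish(pieces: int, stages: int = STAGES) -> int:
    """Clocks until `pieces` items have all left an ideal pipeline:
    the first one takes `stages` clocks, each later one adds 1 clock."""
    return stages + pieces - 1

for n in (1, 20, 1000):
    c = cycles_to_finish(n)
    # At CLOCK_GHZ billion clocks per second, c clocks take c / CLOCK_GHZ ns.
    print(f"{n:>5} pieces: {c:>5} clocks, {c / CLOCK_GHZ:.1f} ns")
```

The first piece takes 20 clocks, but every extra piece costs only one more clock. So 1,000 pieces take 1,019 clocks rather than 20,000.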
The problem, however, is that the pipe isn't always full. The analogy is that you accidentally left a pen in your pants pocket and did the laundry anyway. You only notice your mistake when you get to the point where you fold your clothes. Now all your clothes have turned blue, and you have to wash them all over again. It's time to take all the clothes out of all the machines and start over. (Fortunately, you have bleach to get out the stains.)
In the Pentium 4, the ink pen is like a branch misprediction. Like forgetting about your pen, it is rare, but it sure takes a lot of time to recover from the mistake. Fortunately, the Pentium 4 has an excellent branch predictor, so it's like searching your pockets for pens before putting your clothes in the wash. Maybe you'll catch all the pens in your pants pockets, but you might miss a couple in your shirt pocket. OK... here the analogy starts to fall apart, but you get the idea.
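To get a feel for why the predictor matters so much, here's a toy model. The numbers are made-up assumptions, not measured Pentium 4 figures: 1 in 5 instructions is a branch, and each misprediction flushes the pipe and costs about 20 extra clocks, roughly the pipeline depth. `cycles_per_instruction` is a hypothetical helper.

```python
def cycles_per_instruction(branch_fraction, accuracy, flush_penalty):
    """Average clocks per instruction for an otherwise ideal pipeline
    that finishes one instruction per clock, plus `flush_penalty` extra
    clocks every time a branch is mispredicted."""
    mispredicts_per_instruction = branch_fraction * (1.0 - accuracy)
    return 1.0 + mispredicts_per_instruction * flush_penalty

# Hypothetical numbers: 20% of instructions are branches, and a flush
# costs about as many clocks as the pipeline is deep (20).
for accuracy in (0.90, 0.95, 0.99):
    cpi = cycles_per_instruction(0.2, accuracy, flush_penalty=20)
    print(f"predictor {accuracy:.0%} accurate -> {cpi:.2f} clocks per instruction")
```

With these assumed numbers, going from a 90% to a 99% accurate predictor drops the cost from about 1.40 to 1.04 clocks per instruction. That's the "search your pockets first" payoff.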
The 20-stage pipeline allows each stage to take the shortest amount of time, just as the example above gets a load of laundry done every 15 minutes. This allows for much higher clock speeds. But every time a branch is mispredicted, for example, it takes a long time to fix everything, and performance slows down. Overall, though, Intel is hoping that high clock speeds will eventually outweigh the long-pipeline penalty, so that you still get a faster processor.
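Here's a back-of-the-envelope sketch of that bet. Both designs below are hypothetical, and so are the branch numbers (20% branches, 95% prediction accuracy). The only idea taken from the post is that a deeper pipeline buys a higher clock but loses more clocks per misprediction.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    clock_ghz: float     # clock speed in GHz
    flush_penalty: int   # clocks lost per mispredicted branch (assumed ~ pipeline depth)

    def ns_per_instruction(self, branch_fraction: float, accuracy: float) -> float:
        """Average nanoseconds per instruction: clocks per instruction
        (1 plus misprediction stalls) divided by the clock rate in GHz."""
        cpi = 1.0 + branch_fraction * (1.0 - accuracy) * self.flush_penalty
        return cpi / self.clock_ghz

# Two made-up designs: a shorter pipeline at a lower clock, and a
# 20-stage pipeline that reaches a higher clock but pays more per flush.
short_pipe = Design("10-stage @ 1.0 GHz", clock_ghz=1.0, flush_penalty=10)
long_pipe = Design("20-stage @ 1.5 GHz", clock_ghz=1.5, flush_penalty=20)

for d in (short_pipe, long_pipe):
    print(f"{d.name}: {d.ns_per_instruction(0.2, 0.95):.2f} ns per instruction")
```

With these assumed numbers, the deeper pipeline still comes out ahead (about 0.80 ns vs. 1.10 ns per instruction), even though each misprediction costs it twice as many clocks. If its clock advantage shrank, or mispredictions became more common, the answer could flip.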
Hope this helps you guys.
