Quote:
Originally posted by Arcadian:
You seem to think that the Pentium 4 has poor x87 performance, but I think you're wrong. I believe the Pentium 4 has superior floating point performance compared to the Athlon, even without SSE. The only problem is that it will likely need software optimizations to make the most of it. The Athlon benefits from optimizations too, as can be seen in the new DirectX 8, which uses Athlon floating point optimizations in the T&L engine.
The Pentium 4 could also use optimizations. The Athlon may have 3-way superscalar floating point, but when you think about it, the Pentium 4 has 4-way. It's actually 2 superscalar pipelines, but they are both double pumped, which gives the effect of a 4-way engine. For optimized programs, the Pentium 4 should perform better.
One way of optimizing for this is to align data at certain memory addresses so that the variables are more likely to be in cache. Another way is to group floating point arithmetic into sets that can make the most of the superscalar pipelines. Yet another way is to reduce complex math into simpler instructions in order to streamline the Pentium 4 pipeline. In other words, instead of using cos or sin instructions, use optimized routines made from adds and shifts (which are usually included in most modern libraries).
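To make those three ideas a bit more concrete, here is a rough C sketch. It is only an illustration under assumptions of my own, not measured Pentium 4 tuning: the 64-byte alignment is an assumed cache-line size, the function names (alloc_aligned_doubles, is_aligned64, sum_grouped, sin_poly) are made up for the example, and the sine routine is a short polynomial built from multiplies and adds rather than the adds and shifts mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size in bytes; check the target CPU before relying on it. */
#define ASSUMED_LINE 64u

/* Hypothetical helper: allocate room for n doubles on a 64-byte boundary.
   C11 aligned_alloc wants the size to be a multiple of the alignment,
   so the byte count is rounded up. Release the result with free(). */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    size_t rounded = (bytes + ASSUMED_LINE - 1u) / ASSUMED_LINE * ASSUMED_LINE;
    if (rounded == 0)
        rounded = ASSUMED_LINE;
    return aligned_alloc(ASSUMED_LINE, rounded);
}

/* Returns 1 if p sits on the assumed cache-line boundary, 0 otherwise. */
int is_aligned64(const void *p)
{
    return ((uintptr_t)p % ASSUMED_LINE) == 0;
}

/* Grouped arithmetic: four independent running sums, so consecutive adds
   do not wait on each other and can overlap in the floating point pipelines. */
double sum_grouped(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i)          /* leftover elements when n is not a multiple of 4 */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}

/* Simpler math in place of a sin instruction: degree-7 Taylor polynomial
   in Horner form. Only reasonable for small |x| (roughly |x| <= pi/4);
   a real routine would reduce the argument into that range first. */
double sin_poly(double x)
{
    double x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0
                  + x2 * (1.0 / 120.0
                  + x2 * (-1.0 / 5040.0))));
}
```

Whether any of this actually helps depends on the compiler and the chip, so it would need to be timed on real hardware. Note also that splitting a sum into four accumulators changes the order of the additions, which can change the rounding of the result slightly.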
Many ways have been found to optimize for the Pentium III architecture, and I'm sure that soon enough there will be great optimizations for the Pentium 4 architecture as well. The unfortunate thing is that it will likely take a while, and we may not see drastic improvements in current software.
I don't know what you mean by "double pumped" here, but from what I've read, only the ALU will be double pumped.