Pentium 4 pros and cons


Arcadian
10-31-2000, 12:16 AM
We all know a good amount of what will be included in the Pentium 4 architecture. Intel has made a great deal of information available about this. However, one thing we do not have is any reliable performance numbers. So, I wanted to get some hard-core speculation regarding the Pentium 4. This thread has really died down this past week, so I wanted to resuscitate it a little. Come on and help me out. :)

What I'm looking for are people's opinions on which aspects of the Pentium 4 will likely be fairly positive and which will be fairly negative. What do you think has been overrated to the point of marketing hype, and what do you think needs to be discussed more? Do you think the Pentium 4 will surprise everybody, or disappoint? I welcome opinions, but would like it more if you could keep this technical, so we can respond to ideas.

Here are some aspects that could warrant discussion:

- 400MHz Front Side Bus
- Double Pumped Execution Units
- Lack of Dual Processing at Launch
- More than Double the Heat of Pentium III
- Rambus Memory
- High Bandwidth Caches
- 20-stage Pipeline / Branch Misprediction
- High Clock Frequencies
- Higher Clock Frequencies in the Future
- Excellent Branch Predictor
- Small L1 Cache
- Trace Cache
- Large # of Instructions in Flight
- New PC Specification / Large Heat Sinks
- Low Expectations (on Websites)
- Rumors on the Web
- i850 Delays
- Price Issues
- Much More!

Wow... this is a lot to talk about. Hopefully, we get some good discussions going. :)

gamigin
10-31-2000, 06:40 AM
Rambus memory? What for? It's expensive, and is it really that good?
Doubled heat, eh... gotta get a hell of a fan I guess... or just stick with AMD :)

------------------
- GamigiN

jtshaw
10-31-2000, 11:01 AM
I know a lot of people have already counted the P4 out. There has been lots of speculation that it isn't going to be as fast as the Athlon or even the PIII on a MHz-for-MHz basis, and people seem to have fun talking about how crappy a product it is. This all might be true, but it could just as easily be wrong.
There are a few things I see in the P4 that lead me to believe Intel understands what new processors need. One is the 400MHz (or 100MHz "quad pumped") bus. AMD seemed to catch on a little earlier, with the Duron and the Athlon, that bus speed is an important factor (maybe a lesson they learned back when the PII beat them to 100MHz and won the performance title). Intel took a nice jump from 133MHz all the way to 400MHz. Another interesting thing I heard is that Intel is working on a DDR chipset for the P4 and is trying to figure out a way to break the contract with Rambus.
RDRAM is definitely something to talk about with the P4, because even if Intel does create a DDR chipset we won't see it for a while. From what I gather, RDRAM had no real benefit on the PIII because the bus couldn't match the high RAM speed. It is possible that with the P4, RDRAM could actually bring a performance increase because of the added bus bandwidth. The troubles with RDRAM don't look like they are going away, though, and it will still probably be rather expensive and hard to get hold of.
Another major feature of the P4 I think is worth discussing is the 20-stage pipeline, along with branch misprediction. This seems to be the heart of the P4's high clock speeds, but it could also be the reason why the P4 doesn't appear to be all that much better than the Athlon or PIII at this time. It could be that software needs to be optimized to take full advantage of it. The longer pipeline also means that if bad information gets into the pipeline, it will take longer to flush it out...
The double pumped execution units are another feature that I think could be excellent for performance if code is properly optimized to take advantage of them.
One thing many people on these forums, as well as elsewhere, are going to gripe about is the price of the P4. It will be expensive compared to AMD parts. Intel has a pricing strategy that involves supply and demand. Intel hasn't needed to lower the price of their processors as much as AMD has because they are still selling more of them. AMD is making money for the first time ever, but they are still making far less than Intel. Businesses are still more likely to go with Intel parts because they have been buying them for years. I am not saying this is right, but for now it is the way things are.
I have been blabbing on for too long already... to end this for now, I think that the P4 has some features which look like they could be really great for performance. There are also plenty of questions in my mind about the ability of the P4 to really perform. I can't wait to see it come out so we can run it into the ground and see just how well it runs.


------------------
My computer said "WindowsME/2000 or better" so I installed Linux.

LiquidGoop
10-31-2000, 11:34 PM
One thing that you overlooked, so I just had to bring it up, was the fact that the P4 will be TWICE as big as the P3. This means that Intel will only be able to produce half the number of CPUs on the same wafer. However, Intel has said that supply will not be a problem. I think everyone remembers the P3 1.13GHz... err... incident. I'm not trying to bash Intel; the P4 has a lot going for it, especially the 400MHz bus. The 20-stage pipeline might hurt it somewhat, but that lost ground might easily be made up by the excellent branch predictor. Hard to tell at this point; it looks like the pros and cons even out, which is coincidentally the title of the topic. The small L1 cache looks like a bad decision, IMO. Seeing the high clock speeds, you'd think that it would have more L1 cache.

Anybody have any Mustang specs we could compare the P4 to? Wear an asbestos-coated jacket though, we know how things can heat up in here :)

------------------
"A P2 450 and a Savage should be enough for anybody"
-Mike, October 2000

Phoenix
11-01-2000, 03:52 AM
I'm interested to see how the 400MHz FSB helps memory performance, as I have heard arguments from both sides. I don't really think that a lack of dual processing at launch will hurt them too much, as long as AMD doesn't have a good dual setup that performs better than the Intel line. The only large market for dual, or larger, setups is servers for business, which Intel will still have control over, even if AMD does come out with something better. I'm wondering if the small cache will be a good idea, even if the cache is faster. Would a small amount of cache have a bigger hit on performance if it was used in a server? If so, maybe this is why they haven't come up with dual support yet. Price could hurt it, and low expectations could really help the P4 if it turns out good; the public loves a good surprise.

------------------
When asked how World War III would be fought, Einstein replied that he didn't know. But he knew how World War IV would be fought: With sticks and stones!

Arcadian
11-01-2000, 01:04 PM
Originally posted by Phoenix:
I'm interested to see how the 400MHz FSB helps memory performance, as I have heard arguments from both sides. I don't really think that a lack of dual processing at launch will hurt them too much, as long as AMD doesn't have a good dual setup that performs better than the Intel line. The only large market for dual, or larger, setups is servers for business, which Intel will still have control over, even if AMD does come out with something better. I'm wondering if the small cache will be a good idea, even if the cache is faster. Would a small amount of cache have a bigger hit on performance if it was used in a server? If so, maybe this is why they haven't come up with dual support yet. Price could hurt it, and low expectations could really help the P4 if it turns out good; the public loves a good surprise.


I may have some insight regarding your cache size concerns. From what I have heard (from various sources ;)), the 8KB d-cache is sized to allow for scalability into >2.0GHz ranges. This will not apply at first, but remember that Intel would like the design to be around for approximately 5 years, like the P6 architecture.

In terms of servers, there will be a version of the Pentium 4 called Foster (which will probably be named Pentium 4 Xeon). Like the Pentium III Xeon, there will probably be small cache versions and large cache versions, the latter of which will be used in departmental/enterprise servers. The large cache will not be L2, but rather L3, and will probably come in sizes ranging from 1MB to 3MB. From what I know, this product will come later next year, so as to not conflict with the 900MHz Pentium III Xeon (with 1MB or 2MB of cache) due out in a couple of months.

The server market should really be heating up this next year between Sun's UltraSPARC III and Intel's Pentium 4 Xeon. Personally, I think the Pentium 4 Xeon (still not certain on this name?) will kick the pants off the UltraSPARC III :D.

jtshaw
11-01-2000, 05:51 PM
A quick comment about the Pentium 4 Foster/Xeon vs. the UltraSPARC III. If the P4 Xeon is kin to the P4 like the P3 Xeon is to the P3, then it will probably crush the UltraSPARC III on many levels... you ever tried to get upgraded hardware for a SPARCstation? PAIN IN THE BUTT!

------------------
My computer said "WindowsME/2000 or better" so I installed Linux.

-= HaX0r =-
11-02-2000, 02:58 AM
Just go with AMD from now on, Intel is losing their strong hold on the market.

------------------
PIII-800@ 1002mhz
Asus CULS2 mobo
512mb Micron ram
46.1gig ata-100 IBM hdd
30.1gig ata-100 IBM hdd
30.1gig ata-100 IBM hdd
Hercules 3D Prophet DDR-DVI 32MB
Sound Blaster Live! 1024
Kenwood 72x Cd-Rom
Ricoh CD-RW 6/4/32
8x Toshiba DVD-Rom
Linksys LNE100TX nic card
Running WinMe, Win2k, and Whistler

Phoenix
11-02-2000, 03:16 AM
Originally posted by Arcadian:
I may have some insight regarding your cache size concerns. From what I have heard (from various sources ;)), the 8KB d-cache is sized to allow for scalability into >2.0GHz ranges. This will not apply at first, but remember that Intel would like the design to be around for approximately 5 years, like the P6 architecture.

In terms of servers, there will be a version of the Pentium 4 called Foster (which will probably be named Pentium 4 Xeon). Like the Pentium III Xeon, there will probably be small cache versions and large cache versions, the latter of which will be used in departmental/enterprise servers. The large cache will not be L2, but rather L3, and will probably come in sizes ranging from 1MB to 3MB. From what I know, this product will come later next year, so as to not conflict with the 900MHz Pentium III Xeon (with 1MB or 2MB of cache) due out in a couple of months.

The server market should really be heating up this next year between Sun's UltraSPARC III and Intel's Pentium 4 Xeon. Personally, I think the Pentium 4 Xeon (still not certain on this name?) will kick the pants off the UltraSPARC III :D.

Thanks for the info, Arcadian. I have been playing with some Sun SPARCs lately. At the university I attend, some of the members of the Linux User's Group set up a Beowulf cluster of Sun SPARCstation 5s. Not the fastest CPUs in the world, but they were free :)

------------------
When asked how World War III would be fought, Einstein replied that he didn't know. But he knew how World War IV would be fought: With sticks and stones!

Flip
11-02-2000, 04:19 AM
The Pentium 4
New Architecture?
New Instruction Sets?
Quad-pumped 100MHz FSB?

Let's think about this whole thing logically, so we can really see if this new chip will do more for us than a Pentium 3.

First of all, the P4 implements hyper-pipelining, which means that it has a 20-stage pipeline, while the P3 has only a 10-stage pipeline. With this hyper-pipeline there is a possibility for higher clock frequencies, but with the deeper pipe also comes more room for error. Let's think: if the P3 has 10 stages, that means that the ALU is predicting instructions and executing them 10 steps down the line. In simple terms, if there were only two instructions, 1 and 0, there would be 1024 different paths down that pipe, and the algorithm's job is to try very hard to speculate which path out of 1024 to take. Now, if the P4 has 20 stages, do the math for two instructions, and that would be over 1.04 million paths. So can these new and improved algorithms predict, with the same accuracy, which path to take out of 1.04 million? I think not, no matter how many super-geniuses they had working on the new algorithms.
Not to say that this new 20-stage pipeline is all that bad. Intel has also implemented Advanced Dynamic Execution, which can keep up to 126 instructions in flight and helps keep the mispredictions to a minimum. Another way that Intel intends to solve this problem is the variation of L1 cache called the trace cache, which stores decoded x86 instructions in the flow in which they are to be processed. Also, with the 256K of Advanced Transfer Cache running at full speed with a 256-bit path to the double clocked ALUs, this processor will not have much time to rest, and in turn will generate MUCH MORE heat!
About SSE2: of course they had to throw some new instruction sets in, but as of right now there is no software coded to take advantage of them, so we will talk about how good they are when we have a chance to see them in action. And to conclude, there is no clear answer as to which is better (P3 vs. P4). All we can do now is sit back, relax, wait until they are released, and let the benchmarks be the judge.

Flip
(as in the chip)

Arcadian
11-02-2000, 11:14 AM
This is for Flip.

I'm sorry, but you have the wrong idea of how a pipeline works. If I weren't so busy right now, I'd explain it to you. Maybe I'll have time later. I just wanted to clear that up, because your relation of a pipeline to branch prediction is not correct. Thanks for the response, though... I'll try to get back to you later.

Arcadian
11-02-2000, 11:16 AM
Originally posted by -= HaX0r =-:
Just go with AMD from now on, Intel is losing their strong hold on the market.


HaX0r, the idea here is to get some discussion going, not to close the argument with a one line opinion. If you think AMD has a stronger hold on the market, can you please explain why?

jtshaw
11-02-2000, 11:58 AM
Surprisingly, Intel isn't really losing anything. They are still making more money and, I believe, even selling more chips than AMD. Not to say AMD chips are crap, they aren't, but Intel is still the proven solution for businesses, as has been stated time and time again on these forums. I would hardly say Intel's position is in trouble at this time.

Originally posted by -= HaX0r =-:
Just go with AMD from now on, Intel is losing their strong hold on the market.



------------------
My computer said "WindowsME/2000 or better" so I installed Linux.

[This message has been edited by jtshaw (edited November 02, 2000).]

Flip
11-02-2000, 12:16 PM
Arcadian
Sorry, I didn't mean to sound like an idiot. I wasn't completely sure how it works; I just inferred a lot from what I had read. If you could point me to a good article explaining the 20-stage pipeline, it would be much appreciated, or even better, if you gave me an explanation, I'd be very grateful!

Flip

P.S. No hard feelings? I'm pretty new to the deep-down technical inner workings of the microchip.

zombor
11-02-2000, 01:30 PM
Hmmmm... I can't remember offhand how long the PIII pipeline is, but isn't 20 more than double what the P3 had? If this is true, it would take the proc twice as many clocks to process an instruction, but the clock speed could be doubled, right? Then, if the chips debut at 1.4 and 1.5GHz, the "raw processing power" would be that of an effective 700-800MHz P3 (assuming there aren't any P4-optimized instructions being used). If I'm wrong, tell me, because at this point I'm seriously doubting this chip as a contender for games and such.

------------------
Compaq Armada E500
650MHz PIII 128MB RAM
8MB ATI rage mobility
11GB hard drive/DVD
dual booting win98/2k

who said you cant game on a notebook????

Arcadian
11-02-2000, 02:18 PM
OK... it looks like people here need a little explanation of how a pipeline works.

The common analogy is to think about washing your clothes. Doing this chore requires several steps. First you have to put clothes in the washer, and wait 30 minutes for the load to complete. Then you have to put the clothes in the dryer, and wait about 1 hour for them to dry. Then you have to sit down and fold them and put them away, which takes about 15 minutes. Every load of laundry that you do in this kind of example takes about 1 hour and 45 minutes.

However, consider an alternate example. Instead of waiting 1 hour for the dryer to complete its cycle, say you put another load in the washer, so that both things can go on at the same time. In addition, when it comes time to fold and put away the clothes, say you put additional loads in the washer and dryer. This way, everything is happening at the same time. It is much more efficient. Let's pretend you have as many loads of laundry as a processor has chunks of data (a lot of loads! :)). If this were the case, it would take you 1 hour for every load, instead of 1 hour and 45 minutes, because the dryer is the limiting factor. No matter how fast you fold your clothes and put them away, you still have to wait the hour for the dryer to finish. This is PIPELINING, but it's UNBALANCED. In this example, you have 3 pipeline stages: the washer, the dryer, and the folding. Let's take this one step further.

Suppose you didn't like waiting an hour for the dryer and half an hour for the washer, so you bought new machines. Now you have 2 washers and 4 dryers, each staggered in time so that they finish at 15-minute intervals. Remember it takes you 15 minutes to finish folding and putting away your clothes, so if you have clothes being finished by at least one washer and one dryer every 15 minutes, then you have peak efficiency, and can get one load of laundry put away every 15 minutes. This is like having a 7-stage pipeline. It gives you 7x the throughput you would get with no pipelining at all (one load every 15 minutes instead of every 105)! This is called BALANCED PIPELINING.
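
The laundry numbers above can be checked with a few lines of Python. This is only a sketch of the analogy: the function names are made up for illustration, and it assumes the machines are perfectly staggered so the slowest step sets the pace.

```python
# Laundry pipeline arithmetic (illustrative only; names are hypothetical).
WASH, DRY, FOLD = 30, 60, 15  # minutes per step, as in the analogy

def unpipelined_minutes_per_load():
    # One load goes through every step before the next load starts.
    return WASH + DRY + FOLD

def pipelined_minutes_per_load(washers=1, dryers=1):
    # With staggered machines, each step hands over a load every
    # (step time / number of machines) minutes; the slowest step
    # sets the steady-state rate.
    return max(WASH / washers, DRY / dryers, FOLD)

print(unpipelined_minutes_per_load())        # 105
print(pipelined_minutes_per_load())          # 60.0 (unbalanced: dryer-bound)
print(pipelined_minutes_per_load(2, 4))      # 15.0 (balanced)
print(unpipelined_minutes_per_load() / pipelined_minutes_per_load(2, 4))  # 7.0
```

Note that a single load still takes 105 minutes from start to finish in every case; pipelining improves how often a finished load comes out, not how long any one load takes.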

I am speculating here, but in the Pentium 4, it may have been that the ALU was like the dryer in the above example. It was the limiting factor in an otherwise balanced pipeline. By doubling the clock on the ALU unit, you have effectively made twice as many of them, and it's just like buying extra dryers.

Thus, if you were to only take 1 packet of data (just like one load of laundry), it would take you 20 clocks to get from one side of the pipe to the other. However, since processors have much more data than you have loads of laundry, there are usually 20 pieces of data in flight, one in each of the pipeline stages. Thus, when the pipeline is operating at top efficiency, one piece of data will pop out on each of the Pentium 4's 1.5GHz clocks.

The problem, however, is that the pipe isn't always full. The analogy is that you accidentally left a pen in your pants pocket and did the laundry. You only notice your mistake when you get to the point where you fold your clothes. Now, all your clothes have turned blue, and you have to wash them all over again. It's time to take all the clothes out of all the machines, and start all over again. (Fortunately, you have bleach to get out the stains :)).

In the Pentium 4, the ink pen is the same as a branch misprediction. Like forgetting about your pen, it is rare, but it sure takes a lot of time to reverse the mistake. Fortunately, the Pentium 4 has an excellent branch predictor, so it's like searching your pockets for pens before putting them in the wash. Maybe you'll catch all the pens in your pants pockets, but you might miss a couple in your shirt pocket. OK... here the analogy starts to fall apart, but you get the idea.

The 20-stage pipeline allows each stage to take the shortest amount of time, just as each step in the above example only takes 15 minutes. This allows for much higher clock speeds. But every time a branch is mispredicted, for example, it takes a long time to fix everything, and performance will slow down. Overall, though, Intel is hoping that high clock speeds will eventually counteract the long pipeline penalty, and you will still get a faster processor.

Hope this helps you guys :).

Flip
11-02-2000, 03:26 PM
Arcadian,
You sure have a way with analogies. That is a great explanation, thanks for spending the time to show us the light.

Flip

Moridin
11-02-2000, 04:02 PM
Originally posted by Flip:
Arcadian
Sorry, I didn't mean to sound like an idiot. I wasn't completely sure how it works; I just inferred a lot from what I had read. If you could point me to a good article explaining the 20-stage pipeline, it would be much appreciated, or even better, if you gave me an explanation, I'd be very grateful!

Flip

P.S. No hard feelings? I'm pretty new to the deep-down technical inner workings of the microchip.


Instead of re-inventing the wheel and writing a long post on pipelining, I'll give you a link to a discussion thread from a few months back on the topic. It starts off as a G4 discussion but turns into one of the best Q&As on pipelining I have seen.
http://arstechnica.infopop.net/OpenTopic/page?q=Y&a=tpc&s=50009562&f=77909774&m=224092771

Moridin
11-02-2000, 04:56 PM
Fun topic.

- 400MHz Front Side Bus

Nice improvement, but still well below what RISC workstations will be using. Not that x86 has ever had the memory bandwidth of RISC workstations, but it may be losing ground. The Alpha EV7, for example, will have 4 times the memory bandwidth of the P4.
This is probably an absolute minimum for a new processor given the growing gap between processor and memory speed.

- Double Pumped Execution Units

I love this idea. If it scales, it could be the most significant part of the architecture. It doubles your throughput and eliminates Read After Write (RAW) dependencies for the instructions it applies to.

- Lack of Dual Processing at Launch

Not a big factor in my mind. Multi-processor is primarily used for servers and most IT shops will want to give the chip a while to stabilize before trusting anything important to it. PIII Xeons would likely be preferred to the P4 for some time yet.

- More than Double the Heat of Pentium III

Heat is not a concern for the end user. The chip either works or does not. It is up to the system builder to ensure an appropriate thermal solution. The large die size should help this somewhat, since it keeps the W/cm^2 about the same as the PIII. You won't see the P4 in notebooks any time soon though.

- Rambus Memory

DDR memory will be an option at some point. We don't know what will happen to the price of RDRAM if the P4 increases demand, but with Intel rebates current prices are competitive.

- High Bandwidth Caches

No kidding, the bandwidth of the caches is high. The L1 D cache bandwidth is twice that of the PIII, while the L2 bandwidth is more than the processor can use. The L2 can deliver 256 bits per clock, while the L1 D cache is only 128 bits per clock and the (single) decoder is only 32 bits per clock. To add to the mystery (to me anyway), the L1 D cache is single-ported, and the load/store unit is not double pumped but the AGU is. This may mean that the extra-wide data path from the L1 gives something like the effect of a dual-ported L1.
I'd be interested in any comment about this.

- 20-stage Pipeline / Branch Misprediction

A lot has been said about the longer pipeline lowering IPC. It may or may not, given the effect of better branch prediction and new branch hint instructions.
I'll go out on a limb and say that even if IPC is lowered it won't hurt overall performance, and here is why I think that. If you take two balanced pipelines, one 11 stages and the other 22, and both do the same amount of work (same amount of logic), the 22-stage pipeline should clock twice as high. Let's say that the 11-stage pipeline can reach 1 GHz; then the 22-stage should reach 2 GHz. Let's also assume that the branch penalty is 10 and 20 clocks respectively.

If you work out the amount of time each pipeline is stalled by a branch mispredict, it comes to exactly the same value (10 ns). Both will sit idle for the same length of time on a mispredict, but the longer pipeline will be faster the rest of the time and therefore complete any given job more quickly. This only applies if the pipelines are balanced.
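
The stall-time argument above can be worked through as a toy model in Python. The function names, the 1000-instruction job and the 10 mispredicts are invented for illustration, and it assumes one instruction retires per clock and that both pipelines mispredict equally often, which real chips will not match exactly.

```python
# Toy model of the 11-stage vs 22-stage comparison (all figures are
# the post's assumptions, plus a made-up workload for illustration).
def stall_ns(penalty_cycles, clock_ghz):
    # Time lost on one branch mispredict, in nanoseconds.
    return penalty_cycles / clock_ghz

def run_time_ns(instructions, mispredicts, penalty_cycles, clock_ghz):
    # Idealised: one instruction retires per clock, plus a fixed
    # penalty for each mispredict.
    cycles = instructions + mispredicts * penalty_cycles
    return cycles / clock_ghz

short = dict(penalty_cycles=10, clock_ghz=1.0)  # 11-stage pipeline at 1 GHz
long_ = dict(penalty_cycles=20, clock_ghz=2.0)  # 22-stage pipeline at 2 GHz

print(stall_ns(**short), stall_ns(**long_))     # 10.0 10.0
print(run_time_ns(1000, 10, **short))           # 1100.0
print(run_time_ns(1000, 10, **long_))           # 600.0
```

Under these assumptions the deeper pipeline loses the same 10 ns per mispredict but finishes the rest of the work in half the time, so it wins overall without quite reaching a full 2x speedup.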

Again, any comments would be welcome.

I'll finish off in another post to keep this short (er) and somewhat readable.

Arcadian
11-02-2000, 06:04 PM
Thanks for the long and interesting reply, Moridin. I wanted to touch on a few comments that you made.

Originally posted by Moridin:
- High Bandwidth Caches

No kidding, the bandwidth of the caches is high. The L1 D cache bandwidth is twice that of the PIII, while the L2 bandwidth is more than the processor can use. The L2 can deliver 256 bits per clock, while the L1 D cache is only 128 bits per clock and the (single) decoder is only 32 bits per clock. To add to the mystery (to me anyway), the L1 D cache is single-ported, and the load/store unit is not double pumped but the AGU is. This may mean that the extra-wide data path from the L1 gives something like the effect of a dual-ported L1.
I'd be interested in any comment about this.

Hmm... I'm wondering what you're getting at here. Can you explain further, because your idea intrigues me.

I've noticed, too, that the cache bandwidth is almost TOO big for what the processor needs. Either they wanted to make sure the cache was the least of the bottlenecks, or Intel has something sneaky in mind ;).

Originally posted by Moridin:
- 20-stage Pipeline / Branch Misprediction

A lot has been said about the longer pipeline lowering IPC. It may or may not, given the effect of better branch prediction and new branch hint instructions.
I'll go out on a limb and say that even if IPC is lowered it won't hurt overall performance, and here is why I think that. If you take two balanced pipelines, one 11 stages and the other 22, and both do the same amount of work (same amount of logic), the 22-stage pipeline should clock twice as high. Let's say that the 11-stage pipeline can reach 1 GHz; then the 22-stage should reach 2 GHz. Let's also assume that the branch penalty is 10 and 20 clocks respectively.

If you work out the amount of time each pipeline is stalled by a branch mispredict, it comes to exactly the same value (10 ns). Both will sit idle for the same length of time on a mispredict, but the longer pipeline will be faster the rest of the time and therefore complete any given job more quickly. This only applies if the pipelines are balanced.

Again, any comments would be welcome.

I have the feeling that Intel was sort of aiming for bandwidth-intensive programs, which would be somewhat latency tolerant. Programs that wait for user input (word processing) certainly do not need to get any faster, but video encoding, 3D rendering, and background encryption/decryption are very bandwidth-intensive and will probably do well on the Pentium 4. It seems that the large number of pipeline stages was a tradeoff that was needed in order to reach the next generation of clock speeds, though I'm not sure a lot of the zealots on the boards will look at it that way at first.

Most people want immediate satisfaction in speed, and like you said, a lot of optimization is available, so we may have to wait for Pentium 4 to really get better programs written for it. I did not know that there was a Branch Hint instruction available, but that goes to show that optimizations are possible in programs, and that the Pentium 4 will probably not see these advantages at first. Also, SSE-2 needs to be optimized, as well as taking advantage of the Pentium 4's 128 byte cacheline boundaries.

Well, I'm a little off topic right now, but we should really spawn a discussion on optimizations.

Originally posted by Moridin:
I'll finish off in another post to keep this short (er) and somewhat readable.

Please continue to post. I enjoy reading from you.

Arcadian
11-02-2000, 06:12 PM
Originally posted by Flip:
Arcadian,
You sure have a way with analogies. That is a great explanation, thanks for spending the time to show us the light.

Flip

Thanks... I appreciate the kind words. You know, I am happy to share my knowledge with everybody in this board, and actually it was one of the reasons I requested the Highly Technical Forum to be created. If you want me to write concerning a different topic, I would be interested in teaching you more.

Take care.

OOAgentFiruz
11-02-2000, 06:34 PM
Another tutorial : ) http://www.hardwarecentral.com/hardwarecentral/tutorials/2427/1/

OOAgentFiruz
11-02-2000, 06:45 PM
Fortunately, the Pentium 4 has an excellent branch predictor, so it's like searching your pockets for pens before putting them in the wash.
What type of branch prediction is the P4 utilising? Do you have a link to any info : )
F.Y.I
The K7 uses a 2,048-entry branch history table (BHT) with a simple two-bit Smith prediction algorithm. This predictor stands in sharp contrast to the K6's elaborate 8,192-entry BHT with its two-level GAs predictor, a feature that AMD now admits was overkill.
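
For anyone who hasn't seen one, a two-bit Smith-style predictor is just a table of saturating counters. Below is a rough Python sketch: the class name, the modulo indexing and the loop-branch test pattern are all made up for illustration and say nothing about how the K7 actually indexes its table.

```python
# Two-bit saturating counter per table entry (Smith-style predictor).
# Table size and indexing are simplified assumptions for illustration.
class TwoBitPredictor:
    def __init__(self, entries=2048):
        self.entries = entries
        # Counter values 0-1 predict "not taken", 2-3 predict "taken".
        self.table = [1] * entries

    def _index(self, branch_address):
        return branch_address % self.entries

    def predict(self, branch_address):
        return self.table[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        i = self._index(branch_address)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

def count_mispredicts(predictor, address, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(address) != taken:
            misses += 1
        predictor.update(address, taken)
    return misses

# A loop branch: taken 9 times, then not taken once, repeated 3 times.
pattern = ([True] * 9 + [False]) * 3
print(count_mispredicts(TwoBitPredictor(), 0x400, pattern))  # 4
```

The point of the second bit is that a single loop exit only nudges the counter from "strongly taken" to "weakly taken", so the predictor misses once per loop exit instead of twice.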

Arcadian
11-02-2000, 07:20 PM
Originally posted by OOAgentFiruz:
What type of branch prediction is the P4 utilising? Do you have a link to any info : )

F.Y.I
The K7 uses a 2,048-entry branch history table (BHT) with a simple two-bit Smith prediction algorithm. This predictor stands in sharp contrast to the K6's elaborate 8,192-entry BHT with its two-level GAs predictor, a feature that AMD now admits was overkill.

I do indeed have a link for you :). Here is a document that should answer a lot of your questions. It is by the Pentium 4 processor's lead architect, Doug Carmean.
http://www.intel.com/pentium4/download/nbarch.pdf

On Page 28, it says the following regarding the branch predictor of the Pentium 4.

"Accurate branch prediction is key to enabling longer pipelines"

"Dramatic improvement over P6 branch
predictor:
– 8x the size (4K)
– Eliminated 1/3 of the mispredictions"

"Proven to be better than all other
publicly disclosed predictors (g-share, hybrid, etc)"

You should read the rest of the document, too. It has a lot of information!

virusag13
11-02-2000, 09:03 PM
OOAgentFiruz, thanks for the link! That site should keep me busy for a while.

Phlux
11-02-2000, 10:45 PM
A little off the topic of the pipeline in the PIV, but out of curiosity, are there technical limitations of socket 423? Why would Intel be changing the socket of the PIV so soon after its release? Not to say that things cannot change, but is it supposed to debut with the release of Foster? I do not understand this move.

Another question: will the PIV show a significant improvement with more memory bandwidth, or will the dual RDRAM channels keep it happy? 3.2GB/s is quite a bit. I'm also looking for a flow chart for the 850 architecture. Any ideas?


Just a thought

------------------
Computers run on smoke....when it leaks out, you are in trouble.

Arcadian
11-02-2000, 11:38 PM
Originally posted by Phlux:
A little off the topic of the pipeline in the PIV, but out of curiosity, are there technical limitations of socket 423? Why would Intel be changing the socket of the PIV so soon after its release? Not to say that things cannot change, but is it supposed to debut with the release of Foster? I do not understand this move.

Actually, you're not off topic at all. This topic deals with all the pros and cons of the Pentium 4 architecture and platform.

To answer your question, though, there could be any number of reasons why Intel is choosing to change sockets. One thing I do want to point out, though, is that I have yet to see an official press release stating that there will be a change in socket. The only things I have seen have been unsubstantiated rumors, so I can't be sure that there will even be a change in socket. (I would really appreciate a link to verify this, if anybody has one).

Assuming that there is a change in socket, though, I would guess that it is probably for the following reason. Usually the team that works on a product is not the same team that works on that product's successor. I think that, because two different teams were working on Willamette (code name for Pentium 4) and Northwood (Pentium 4 on a .13u shrink), the requirements of the latter product were not received in time to be implemented in the first product. In other words, Willamette was being worked on first by one team, and they came up with a specification based on the requirements for the chip. Another team was working on Northwood, but before they came up with their requirements, the specification for socket 423 was probably already finished, so they created their own specification based on the new requirements. This is only my theory, but since miscommunication between two groups can happen easily, it is a possibility.

Originally posted by Phlux:
Another question: will the PIV show a significant improvement with more memory bandwidth, or will the dual RDRAM channels keep it happy? 3.2GB/s is quite a bit. I'm also looking for a flow chart for the 850 architecture. Any ideas?

Just a thought

RDRAM gives the Pentium 4 the same memory bandwidth as the front side bus bandwidth. Both interfaces allow for 3.2GB/s, and that allows for optimal transfers. It is doubtful that a different amount of memory bandwidth could allow for a better performance. My opinion is that DDR memory may in fact reduce the performance of the Pentium 4, since the bandwidth for PC2100 is only 2.1GB/s. Of course, we will have to wait until there are Pentium 4 boards with different memory interfaces to compare the two.
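
The arithmetic behind those figures can be sketched as peak bandwidth = bus width x transfer rate x channels. The 3.2GB/s and 2.1GB/s totals are the ones quoted above; the bus widths and transfer rates in the code (64-bit bus at 400 MT/s, two 16-bit PC800 RDRAM channels, 64-bit PC2100 at 266 MT/s) are the commonly published figures filled in for illustration, and the function name is made up.

```python
def peak_bandwidth_gb_s(bus_bytes, transfers_per_sec, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return bus_bytes * transfers_per_sec * channels / 1e9


# Assumed widths and rates (not stated in the thread):
fsb_gb_s = peak_bandwidth_gb_s(bus_bytes=8, transfers_per_sec=400e6)
rdram_gb_s = peak_bandwidth_gb_s(bus_bytes=2, transfers_per_sec=800e6, channels=2)
ddr_gb_s = peak_bandwidth_gb_s(bus_bytes=8, transfers_per_sec=266e6)

print(fsb_gb_s, rdram_gb_s, ddr_gb_s)
```

These are peak numbers only; sustained bandwidth and latency depend on the chipset and access pattern, which is why real boards are needed for a fair comparison.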

Also, the flow chart for the i850 will probably be very similar to other Intel chipsets, with the only exception being the dual memory interfaces.

Moridin
11-03-2000, 11:49 AM
Originally posted by Arcadian:
Hmm... I'm wondering what you're getting at here. Can you explain further, because your idea intrigues me.

I'm not quite sure either. It's just that some things don't look like they should. The AGU is double pumped, so the processor can generate addresses for two loads/stores each clock cycle. The Load Store unit is not double pumped, so it can only actually perform one load per clock. If the D cache were dual ported, the processor could perform 2 loads/stores per cycle just like the Alpha and Athlon, but it is not (apparently there are a number of restrictions in the Athlon dual-ported cache).

At the same time the D cache data path is twice as wide as it should be. Why would you have a 128-bit-wide data path to the L1 data cache when you can only load 32-bit integers and 64-bit floating point values?

There are a couple of possible answers. The first would be that the 128-bit data path could be associated with SSE2, and the AGU is double pumped so that the address is ready 1/2 clock cycle sooner, giving your cache extra time to return a value. I think this may be too much coincidence. Why would you have an AGU and D cache data paths that can support loading two 64-bit numbers and build a load store unit that can only load one 64-bit number?

Another more intriguing answer would be that the P4 can load/store 2 values per clock as long as the values are in the same 128-bit section of memory.

Or maybe Intel has some surprise in store for us.

The only reason I can think of for having the L2 bandwidth that the P4 does is so that L1 line fills occur more quickly. This would reduce latency if you had two loads that occur near each other that miss L1 and hit L2. The P4 L2 can fill the 128-byte cache line in 4 cycles, so you may save a few cycles of latency on occasion. My question is whether this is enough to justify the L2 bandwidth.

As I said before, the P4 L2 has nearly twice as much bandwidth as the core can use, so there may be something strange here as well.
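
To put rough numbers on the L2 figures in this post: a 128-byte line filled in 4 cycles works out to 32 bytes per clock. The sketch below also compares that with what the core could consume under one assumption taken from the discussion above, namely at most one 128-bit load per clock. The 1.5GHz clock is the launch speed mentioned in this thread; the variable names are made up.

```python
LINE_BYTES = 128       # cache line size quoted in the post
FILL_CYCLES = 4        # cycles to fill one line, quoted in the post
CLOCK_HZ = 1.5e9       # assumed launch clock of 1.5 GHz

l2_bytes_per_cycle = LINE_BYTES / FILL_CYCLES
l2_gb_per_s = l2_bytes_per_cycle * CLOCK_HZ / 1e9

# Assumption: the core consumes at most one 128-bit (16-byte) load per clock.
core_bytes_per_cycle = 128 / 8
l2_to_core_ratio = l2_bytes_per_cycle / core_bytes_per_cycle

print(l2_bytes_per_cycle, l2_gb_per_s, l2_to_core_ratio)
```

Under that assumption the L2 can supply about twice what the load unit can take, which is the mismatch being pointed out.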

Moridin
11-03-2000, 01:07 PM
I have a few minutes so I'll try to cover my opinions on a few more points.

- High Clock Frequencies

High clock frequencies are a very good thing; not everything, but probably the most important. A lot of people around the web like to say clock speed is not everything and then imply that IPC is. I disagree with this, especially in X86.
The limited number of registers in the X86 architecture means that it has to perform a lot of loads and stores compared to other architectures. IMHO this tends to make the code more linear and less suitable for executing a lot of instructions in parallel. The loads cause an even bigger problem since you can't perform calculations on data that you do not have yet. Limitations on how you can reorder loads and stores make the instruction stream even more linear.

All this combines to make it much more difficult to get ILP out of the X86 architecture compared to other newer architectures. You can get around some of this by using low latency caches, large OOO windows, and even by reducing read-after-write dependencies (all of which the P4 has), but in the longer term the only way of increasing performance beyond a certain level is increasing clock speed.

(Ok more registers would be the real long-term answer but that would likely require new modes, OS support, and rewriting software. AMD has the right idea here in combining this change with the move to 64 bits. I'm not sure if AMD can get the market support this would require though, especially when you consider how long the move from 16 to 32 bits has taken and that is not even done yet.)

Higher clock frequencies have other good effects as well. Even where the P4 has increased latencies and stalls in terms of clock cycles the real latency (and performance lost) in terms of time is likely reduced because each clock cycle is shorter.
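
A quick way to see this point is to convert latencies from cycles into nanoseconds. The cycle counts and clock speeds below are hypothetical, picked only to illustrate the argument, and the function name is made up.

```python
def latency_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_ghz


# Hypothetical numbers, chosen only to illustrate the argument.
slower_chip_ns = latency_ns(cycles=7, clock_ghz=1.0)  # fewer cycles, slower clock
faster_chip_ns = latency_ns(cycles=9, clock_ghz=1.5)  # more cycles, faster clock

print(slower_chip_ns, faster_chip_ns)
```

With these made-up figures the chip that needs more cycles still finishes sooner in real time, because each of its cycles is shorter.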

- Higher Clock Frequencies in the Future

It should be apparent by now that I think the P4 will prove to be much better than many people think. I think Intel has done an excellent job of targeting it to its intended market over its entire life span, and I think this is another example.

The pipeline of the P4 apparently has several stages with NO logic whatsoever. This is to allow signals to travel from one part of the chip to another. Now, this is a large chip, but not that large. The die itself will likely be 40 % wider and 40 % longer than the PIII, so it is unlikely that any wires are more than 40 % longer, and in fact if Intel has done a really good layout job the wires may not be any longer than in the PIII. So why do you need these "Drive" stages in the pipeline?

I think they are there to accommodate changes in process technology over the next few years. Right now we are on the verge of a fundamental change in the way processor speed reacts to a die shrink. A process shrink speeds up transistors, but not the wires. Up till now this has not mattered much since the wires were much faster than the transistors. We have now reached the point where the speed of the wires is often more important than transistor speed. Cu wires and SOI act as a one-time speed-up for the wires, but this only pushes back the problem a little; it does not solve it.

I think these "Drive" stages will make the P4 much more scalable on future process shrinks than processors that do not have this type of stage. In other words I think the P4 will get a better clock boost going from a .13 Cu process to a .1 Cu (and smaller) process relative to the Athlon or PIII. It should also make it less sensitive to running on a .18 Al process, so don't be surprised if the PIII moves to .13 Cu before the P4.

Well I think that’s enough for one post, again any comments would be welcome.

Arcadian
11-03-2000, 01:19 PM
Originally posted by Moridin:
I'm not quite sure either. It's just that some things don't look like they should. The AGU is double pumped, so the processor can generate addresses for two loads/stores each clock cycle. The Load Store unit is not double pumped, so it can only actually perform one load per clock. If the D cache were dual ported, the processor could perform 2 loads/stores per cycle just like the Alpha and Athlon, but it is not (apparently there are a number of restrictions in the Athlon dual-ported cache).

At the same time the D cache data path is twice as wide as it should be. Why would you have a 128-bit-wide data path to the L1 data cache when you can only load 32-bit integers and 64-bit floating point values?

There are a couple of possible answers. The first would be that the 128-bit data path could be associated with SSE2, and the AGU is double pumped so that the address is ready 1/2 clock cycle sooner, giving your cache extra time to return a value. I think this may be too much coincidence. Why would you have an AGU and D cache data paths that can support loading two 64-bit numbers and build a load store unit that can only load one 64-bit number?

Another more intriguing answer would be that the P4 can load/store 2 values per clock as long as the values are in the same 128-bit section of memory.

Or maybe Intel has some surprise in store for us.

The only reason I can think of for having the L2 bandwidth that the P4 does is so that L1 line fills occur more quickly. This would reduce latency if you had two loads that occur near each other that miss L1 and hit L2. The P4 L2 can fill the 128-byte cache line in 4 cycles, so you may save a few cycles of latency on occasion. My question is whether this is enough to justify the L2 bandwidth.

As I said before, the P4 L2 has nearly twice as much bandwidth as the core can use, so there may be something strange here as well.


Well, you certainly seem to know a lot about the Pentium 4 architecture, Moridin! I was wondering about what you said, and it struck me that one possibility is that Intel included these high bandwidths for future tweaking inside the core.

Maybe they wanted some features that weren't able to be implemented on time, so they left room to implement them in a future P7 microarchitecture. Perhaps Foster or Northwood will include some surprises.

I have a few questions for you, though. What do you think is the impact of these specs on the L3 cache that Foster is reported to have? Also, how would these specs allow for future technologies, such as multithreading, to be implemented in some later P7-based processor?

Again, thanks for the response. http://www.sharkyforums.com/ubb/smile.gif

PS... I've been meaning to ask you about your screen name. I am playing Baldur's Gate 2, and if you attack dwarves, they will shout, "By Moridin's Hammer!" Just wondering if that's where you got the name from (not the game, but the same mythical reference that the game is using).

Arcadian
11-03-2000, 02:47 PM
Originally posted by Moridin:
I think they are there to accommodate changes in process technology over the next few years. Right now we are on the verge of a fundamental change in the way processor speed reacts to a die shrink. A process shrink speeds up transistors, but not the wires. Up till now this has not mattered much since the wires were much faster than the transistors. We have now reached the point where the speed of the wires is often more important than transistor speed. Cu wires and SOI act as a one-time speed-up for the wires, but this only pushes back the problem a little; it does not solve it.

I think these "Drive" stages will make the P4 much more scalable on future process shrinks than processors that do not have this type of stage. In other words I think the P4 will get a better clock boost going from a .13 Cu process to a .1 Cu (and smaller) process relative to the Athlon or PIII. It should also make it less sensitive to running on a .18 Al process, so don't be surprised if the PIII moves to .13 Cu before the P4.

Wow... I'm actually glad someone brought this up. As you mention above, clock speeds will become much more fundamental to a fast processor in the future than they are right now. A year or two down the road, I expect the Pentium 4 to scale much further than the Pentium III or Athlon. I also agree with your frequency assessment due to copper interconnects on Intel's .13u process. If the Pentium 4 is able to reach 2.0GHz on aluminum, I imagine it can gain amazing momentum on copper. My understanding concerning SOI, though, is that Intel is cautious about using it. At one point (and I can't remember where), Intel said that SOI can do much less for their future processes than it can do now. That means it wouldn't be worth implementing, since future processes will not gain much of a benefit. Very good post, Moridin!

Moridin
11-03-2000, 03:55 PM
It's a hobby more than anything else; I don't work in the industry so you shouldn't take anything I say as absolute fact. I have an EE background but have mostly worked in IT (information technology, not "it") since I graduated. Processor design was always my favorite topic, but any real knowledge I have is largely outdated since I studied it in the early to mid 90's and a lot has changed since then.

The P4 caught my attention with some interesting design features (many of which you mentioned) that got me interested in finding out anything I can about it (which really isn't much).

Maybe there are unimplemented features, or maybe Intel is hiding some of the real features. Have you read Paul DeMone's recent comments about P4 "Dark transistors" over at aces? He seems to feel that the features Intel has told us about don't account for the number of transistors in the core. Not too long ago I was looking at a floor plan for the chip (I can't remember where I saw it) and was struck by the amount of space that simply wasn't labeled, so maybe he is correct.


To me, the P4 L2 has a lot in common with an L1 unified cache. In fact it probably has a lower real access time than the PA-RISC L1 unified cache. If you look at it this way, with the L2 acting more like an L1 unified cache and the L1 D cache acting more like a buffer to speed up this L1 in the most common situations, adding L3 now makes a lot of sense. This is especially true if you can add L3 in the same way as the Xeon with its large, full-speed, on-package L2.

I am taking my name from Robert Jordan's Wheel of Time series. RJ does use a lot of mythical references in this series. In fact this is worked into the story line to a large degree, so it wouldn't surprise me to see other uses of the name in fiction and mythology. (In WOT world time is kind of circular so that we see elements of our myths in things that are happening in the story and you can even recognize some our world in their myths if you look carefully.)

Moridin is one of the chief bad guys in the series. I was a little uncomfortable with this since some people may interpret this to mean my intention was to be a troll, but the bad guys have all the cool names and I figured my posts would speak for themselves. (In WOT Moridin = death in "the old tongue")

All in all WOT is one of the best series out there if you like this genre. It's very long though (nearing 10,000 pages and not done yet) and all one story so you can't read the books independently.

/WOT Plug

Arcadian
11-03-2000, 05:17 PM
Moridin, can you please post an email address so that I can contact you privately. There is something I'd like to tell you without making it public.

Also, I enjoy the WOT series, but it has been a year or so since I have read the last book. I am very much looking forward to "Winter's Heart" coming out this month. But I forget who Moridin is. Was he a Forsaken? I think the Forsaken had all the coolest names. Asmodean was my favorite http://www.sharkyforums.com/ubb/smile.gif.

theorcus
11-05-2000, 05:34 PM
The size of the level 1 cache was intentionally kept small to make 2-cycle latency possible. A low latency cache is key to keeping the pipelines well-fed. As core frequency scales higher, the limiting factor might become the double-pumped ALU. Distributed clock signals are another neat feature.

P4 really represents a different design philosophy from K7. Many trade-offs were made to achieve high MHz. I do believe that in terms of IPC, K7 will come out ahead. However, in its current form, the K7's rather simplistic branch predictor is seriously hampering its performance potential. I hope Mustang will rectify that problem. That said, the next major performance boost will come from integrating memory controllers on-chip. This will drastically reduce the system memory latency issues. And then multiple cores on the same chip and what not will keep Moore's law valid for a while to come.

peace

theorcus
11-05-2000, 05:38 PM
About those drive stages, I think they are there to account for the propagation delay. Seriously, the signal strength probably needs boosting when travelling from one end of the die to the other. Sorta like the repeater stations for an optical network. Nov. 20th is the D-day; we will all know for sure how P4 performs then.

Moridin
11-06-2000, 12:13 PM
Originally posted by Arcadian:
Moridin, can you please post an email address so that I can contact you privately. There is something I'd like to tell you without making it public.

Also, I enjoy the WOT series, but it has been a year or so since I have read the last book. I am very much looking forward to "Winter's Heart" coming out this month. But I forget who Moridin is. Was he a Forsaken? I think the Forsaken had all the coolest names. Asmodean was my favorite http://www.sharkyforums.com/ubb/smile.gif.

Sorry I took so long to answer, I had a busy weekend and didn't get a chance to go online. You can email me at lomiller1@hotmail.com.

As for the WOT question, I'll start a thread in off topic if there isn't already one there.

Moridin
11-06-2000, 01:29 PM
Hopefully I can finish all the points; it's taken me 3 long posts so I hope people will have been able to read them without falling asleep.

- Excellent Branch Predictor

An absolute requirement with the long pipeline. It is also interesting that it looks like the P4 has 2 branch predictors that work in co-operation. One before the trace cache and one after.

- Small L1 Cache

I like to think of the P4 L1 D cache more as a buffer for frequently accessed data. This only works if you have a very fast inclusive L2, which it looks like the P4 has. (6 cycles ???) This gives you the best of both worlds, high speed L1 and high global hit rate.
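
The trade-off described here can be sketched with the usual average memory access time formula: AMAT = L1 time + L1 miss rate x (L2 time + L2 miss rate x memory time). The 2-cycle L1 and the (uncertain) 6-cycle L2 are figures mentioned in this thread; every hit rate, the 3-cycle "larger L1" alternative, and the 100-cycle memory latency are hypothetical values picked only to show the shape of the argument.

```python
def amat_cycles(l1_time, l1_miss, l2_time, l2_miss, mem_time):
    """Average memory access time in cycles for a two-level cache."""
    return l1_time + l1_miss * (l2_time + l2_miss * mem_time)


# Hypothetical miss rates and memory latency, for illustration only.
L2_TIME = 6       # cycles; the thread itself marks this figure as uncertain
L2_MISS = 0.05
MEM_TIME = 100

small_fast_l1 = amat_cycles(l1_time=2, l1_miss=0.10,
                            l2_time=L2_TIME, l2_miss=L2_MISS, mem_time=MEM_TIME)
large_slow_l1 = amat_cycles(l1_time=3, l1_miss=0.05,
                            l2_time=L2_TIME, l2_miss=L2_MISS, mem_time=MEM_TIME)

print(small_fast_l1, large_slow_l1)
```

With a fast L2 behind it, the small L1 comes out ahead in this made-up example even though it misses twice as often; with a slow L2 the balance would tip the other way.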

- Trace Cache

This is the key design feature of the architecture IMHO. Everything else works around it. This is a big step forward for X86 processors, and brings them one step closer to RISC like performance.

Traditionally the biggest problem with CISC processors is the large, complex decoder that is required. With the trace cache the decoder is no longer in the critical path and is no longer a performance bottleneck in most cases.

In the future it may even be possible to get some benefit from making the decoder more complex, this is unheard of without a trace cache. For example the trace cache could also allow you to optimize code on the fly and even run multiple ISA's on a single core. It already allows some optimization like loop unrolling.

I also see a big parallel with Transmeta's "code morphing software". If you think about it, the trace cache is almost the same thing as code morphing, only it is done in hardware and is therefore much faster.

- Large # of Instructions in Flight

The more instructions in flight, the more ILP you can get and the higher your IPC. The only downside is that complexity grows quickly as you have more instructions in flight and the returns tend to drop off. Modern production processes allow for many more transistors in a processor core and this is one very good way to put those extra transistors to work.

- New PC Specification / Large Heat Sinks

I remember when the P5 first came out and people didn't like the fact that it often needed a fan. This is something that people will get used to. I do find it interesting that the P4 can still use passive cooling in some cases.

I wonder where this is going though. I remember in 1995 looking at some projections for chip power and speed and seeing desktop chips using 100 W by 2005 and 1000 W by 2015. I thought "no way they will make them work"; clearly I was wrong. Think about it though, 100 W is a light bulb and 1000 W is a toaster!!!

- Low Expectations (on Websites)
- Rumors on the Web

People like to speculate; I see nothing wrong with that. It won't change the final results.

- i850 Delays

Better to have a short delay than risk problems after release. P4 production still continues so this may not even change the number of P4's sold this year. It may just increase availability on release.

- Price Issues

Market and economic conditions drive price. Even Intel only has a limited say on the selling price. For a long time, resellers were charging much more for the 1 GHz PIII than Intel was, just due to its limited availability.

- Much More!

What can I add here, you covered just about everything. Hardware prefetch is about all I can think of to add.

Moridin
11-06-2000, 01:37 PM
Originally posted by theorcus:
About those drive stages, I think they are there to account for the propagation delay. Seriously, the signal strength probably needs boosting when travelling from one end of the die to the other. Sorta like the repeater stations for an optical network. Nov. 20th is the D-day; we will all know for sure how P4 performs then.


Yes, I'm sure that is what they are for. The question is why. The P4 may be twice as large as the PIII in area, but that only translates into about 40 % longer per side. Signal paths were not as important when the PIII was designed as they are now and not nearly as important as they are likely to become.

Now, P4 designers might have been able to keep these paths to PIII levels with careful design; instead they chose to use drive stages. I think they did this to allow the processor to run well on future processes where long signal paths would prevent process shrinks unless a drive stage was included.

In other words Intel was really planning ahead.
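
The area-versus-side figure used in this post is just a square root, assuming the die keeps roughly the same aspect ratio (an assumption, since the actual die dimensions are not given in this thread):

```python
import math

area_ratio = 2.0                      # "twice as large in area", from the post
side_ratio = math.sqrt(area_ratio)    # growth per side if the shape is unchanged

side_growth = 0.40                    # "about 40 % longer per side"
area_from_sides = (1 + side_growth) ** 2

print(round(side_ratio, 3), round(area_from_sides, 2))
```

So doubling the area lengthens each side by roughly 41 %, and a 40 % increase per side gives just under twice the area.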

Moridin
11-06-2000, 02:05 PM
I think the P4 will prove to be an excellent design in its intended market. Every processor design (or any other design for that matter) is a series of give and take. Most changes will not be universally good; they will help one thing but may hurt another. Success or failure is often a matter of optimizing for the right thing.

For example, optimizing the P4 for Word likely would have been a mistake, since Word runs fast enough on just about anything. Gaming, on the other hand, will almost always use all the processor power it can get. If it came down to a choice between something that improved performance in Word and something that improved gaming, then Intel would almost certainly emphasize gaming.

I think the P4 will be particularly good at things like games, 3D graphics/eye candy, streaming media applications, and voice recognition. In other words it will be very good at things the typical home user/desktop user is interested in. On the other hand it may fall down a little on Workstation and server type software in professional/CAD and transactional database apps.

I think this is a strong possibility since this is the market IA-64 was supposed to compete in. It seems likely to me that this type of application received lower priority when optimization was considered.

For example, if they wanted to compete strongly in the Workstation market they would have concentrated on making X87 much better, or even implemented a RISC-like technical floating point unit; instead they concentrated on SIMD, which is more suited to consumer software.

Similarly, If they were hoping to concentrate on transactional database work they would have kept the pipeline as short as possible since that type of application consists of hard to predict branches. I expect the P4 to be a poor performer clock for clock on Oracle databases because of this. I don't know if the high clock speed will make up for this or not.

The more I think about it, the more it seems to me Intel really did their homework and that the P4 is going to excel at the processor-intensive consumer apps we will see over the next five years. High end Workstation apps and Server apps may be a different story.

Arcadian
11-06-2000, 02:18 PM
Originally posted by Moridin:
Sorry I took so long to answer, I had a busy weekend and didn't get a chance to go online. You can email me at lomiller1@hotmail.com.

As for the WOT question, I'll start a thread in off topic if there isn't already one there.

Thanks. I read your last comments, and it seems pretty solid. I think I agree with what you say, and I think that the Pentium 4 will be more of a future looking product. We'll have to see how well it performs at launch, but I'm willing to believe it should do pretty well.

By the way, I did email you. Check your email if you haven't already http://www.sharkyforums.com/ubb/smile.gif.

OOAgentFiruz
11-06-2000, 06:47 PM
Arcadian, thanks for the link to the PDF file. I also had a busy weekend, so I will have to read it tonight.

colonel
11-07-2000, 04:26 AM
Hey Arcadian, two great posts, this one and the bottleneck one........

Just a little off topic in regard to the AMD/Intel thing: I think it will be a long time before we see AMD take Intel's crown. I mean, we're all, well, I won't say geeks, let's say extremely computer literate, and we know enough to look at the benchmarks now, but the average user goes for an established name, i.e. Intel, and it's this market which Intel has cornered (Compaq, Dell, Gateway, IBM, Hewlett Packard, Packard Bell, etc.). It will be a while before AMD can take over such an established market.

------------------
'From now on I'm thinking only of me.'

Major Danby replied indulgently with a superior smile: 'But, Yossarian, suppose everyone felt that way.'
'Then,' said Yossarian, 'I'd certainly be a damned fool to feel any other way, wouldn't I?'

Arcadian
11-07-2000, 12:06 PM
Originally posted by colonel:
Hey Arcadian, two great posts, this one and the bottleneck one........

Just a little off topic in regard to the AMD/Intel thing: I think it will be a long time before we see AMD take Intel's crown. I mean, we're all, well, I won't say geeks, let's say extremely computer literate, and we know enough to look at the benchmarks now, but the average user goes for an established name, i.e. Intel, and it's this market which Intel has cornered (Compaq, Dell, Gateway, IBM, Hewlett Packard, Packard Bell, etc.). It will be a while before AMD can take over such an established market.

Thanks for the reply, and also for the compliment, colonel. I think AMD has the speed race covered right now with their Athlon + DDR, but in the spirit of this topic, I don't want to speak too soon. The Pentium 4, while fundamentally flawed in some areas, is immensely powerful in others. I accept these tradeoffs as educated engineering decisions. I have the feeling the Pentium 4 will offer more performance, and much more scaling, than the Athlon processor can.

Ultimately, however, I believe the real winner is who can produce more product. Right now, and in the foreseeable future, that is Intel. AMD is doing well, though, and has increased capacity in their two fabs to help produce more than they ever have before. They are also selling very well.

In terms of growth, though, AMD is not likely to grow much as a company unless they quit their price wars. They can afford to price their products lower because they can produce more, but you can't grow as a company if you're only breaking even. Maybe this is ok for AMD, but Intel is still interested in growing with the industry, I believe, which is why they are very concerned with increasing their own fab capacities. When Intel reaches .13u, and they ramp up on Pentium 4, AMD will have reason to worry, I believe.

Floyd
11-11-2000, 02:11 AM
I have read the explanations of pipelining and followed the links to the G4 thread and Hardware Central's explanation. They have all been great, and I think I now have a pretty good grasp of it. But I don't understand one thing. The P3 at .18 micron can reach a max of about 1.13 GHz. The P4 at .18 micron is apparently not going to be much good because at max speeds of 1.4/1.5 GHz, the branch mispredictions will slow it and the advantages of the longer pipeline compared to the P3 won't be realised at those speeds.

What I don't understand is that if the P4 has a pipeline of double the length, essentially cutting the work of each stage in half, allowing a clock cycle to take half as long as the P3's, why then is the max speed of the P4 at 0.18 micron 1.4/1.5 GHz and not 2.26GHz???

Oh wait, an idea just came to me. Because you have double the pipeline stages and therefore approximately double the transistor count, requiring double the amount of power to run, does this make the chip too hot to run stably at anything past 1.4/1.5 GHz?? If this is the case, then wouldn't Intel have made a mistake with these extra pipe stages, because even though they can theoretically run at double the speed of a P3, the extra heat only allows a clock improvement of about 25%, and is maybe not worth the extra cost of doubling the die size req'd for a chip? Or will the heat scale with the die shrink, allowing a .13 P4 to have a 50% max clock increase over a .13 P3? (Intel reckons P3's will reach 2 GHz on .13 micron.)

Arcadian
11-11-2000, 02:58 AM
Don't worry, Floyd, the 1.4GHz and 1.5GHz speeds of the Pentium 4 are only launch speeds. Some Intel roadmaps released on the web indicate that the Pentium 4 will reach 1.7GHz before March, and 2.0GHz before June, and all this on .18u process. Intel's 130nm process will probably be able to get to nearly 3.0GHz once it is fully optimized. Make no mistake: the Pentium 4 was made for speed. http://www.sharkyforums.com/ubb/smile.gif

Moridin
11-11-2000, 10:54 AM
Originally posted by Floyd:
I have read the explanations of pipelining and followed the links to the G4 thread and Hardware Central's explanation. They have all been great, and I think I now have a pretty good grasp of it. But I don't understand one thing. The P3 at .18 micron can reach a max of about 1.13 GHz. The P4 at .18 micron is apparently not going to be much good because at max speeds of 1.4/1.5 GHz, the branch mispredictions will slow it and the advantages of the longer pipeline compared to the P3 won't be realised at those speeds.

What I don't understand is that if the P4 has a pipeline of double the length, essentially cutting the work of each stage in half, allowing a clock cycle to take half as long as the P3's, why then is the max speed of the P4 at 0.18 micron 1.4/1.5 GHz and not 2.26GHz???

Oh wait, an idea just came to me. Because you have double the pipeline stages and therefore approximately double the transistor count, requiring double the amount of power to run, does this make the chip too hot to run stably at anything past 1.4/1.5 GHz?? If this is the case, then wouldn't Intel have made a mistake with these extra pipe stages, because even though they can theoretically run at double the speed of a P3, the extra heat only allows a clock improvement of about 25%, and is maybe not worth the extra cost of doubling the die size req'd for a chip? Or will the heat scale with the die shrink, allowing a .13 P4 to have a 50% max clock increase over a .13 P3? (Intel reckons P3's will reach 2 GHz on .13 micron.)

Intel has already demonstrated a 2.0GHz P4 made in .18 Aluminum, so like Arcadian said the 1.4/1.5 is just the initial speed. AFAIK the PIII didn’t go above 733 MHz on this process initially.

It is quite possible to double the length of a pipeline and not double your clock speed for a few reasons. The first is if you add additional logic to the pipeline. This would make your processor clock more slowly but would improve your IPC since the additional logic would be there for a reason.

The second possibility is that the pipeline is not balanced. If you split every stage in half except for the longest one, your clock speed will not increase at all. I doubt that the P4 has a problem in this regard. If it fails to run twice as fast as the PIII it is because Intel has added additional logic.
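To make the balance argument concrete, here is a rough sketch in Python. The stage delays are made-up numbers rather than real PIII or P4 figures, and the model ignores latch overhead between stages; it only assumes that the clock period is set by the slowest stage.

```python
def max_clock_ghz(stage_delays_ns):
    """Clock period is limited by the slowest stage, so f = 1 / max(delay)."""
    return 1.0 / max(stage_delays_ns)


def split_stages(stage_delays_ns, skip_index=None):
    """Split every stage into two equal halves, optionally leaving one stage unsplit."""
    result = []
    for i, delay in enumerate(stage_delays_ns):
        if i == skip_index:
            result.append(delay)          # this stage could not be split
        else:
            result.extend([delay / 2, delay / 2])
    return result


# Hypothetical 5-stage pipeline, delays in nanoseconds (illustrative only).
base = [0.9, 0.9, 1.0, 0.9, 0.9]
longest = base.index(max(base))

balanced = split_stages(base)                         # every stage halved
unbalanced = split_stages(base, skip_index=longest)   # longest stage left alone

print(max_clock_ghz(base), max_clock_ghz(balanced), max_clock_ghz(unbalanced))
```

With these numbers the fully split pipeline clocks twice as fast as the original, while the one with the unsplit longest stage gains nothing, even though it has almost twice as many stages.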

One more thing I thought I would point out is that the P4 actually has 26 pipeline stages when you count the decoder. The 6 stages in the decoder only come into play if you have a trace cache (L1 I cache) miss since the trace cache holds decoded instructions.

Do not assume that the benchmarks you have seen reflect the real performance of the chip. The improved branch predictor should be enough to offset the branch penalty of the longer pipeline, and the P4 is as good as or better than the PIII in every other area. I can’t see the P4 performing worse than the PIII clock for clock on anything but legacy code or code with impossible-to-predict branches.
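The predictor-versus-penalty trade-off can be sketched with the usual textbook approximation: effective CPI is the base CPI plus (branch fraction x mispredict rate x penalty in cycles). Every number below is a hypothetical placeholder, not a measured PIII or P4 value.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Simple model: each mispredicted branch costs a fixed pipeline flush penalty."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical shorter pipeline: 10-cycle penalty, 10% of branches mispredicted.
short_pipe = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                           mispredict_rate=0.10, penalty_cycles=10)

# Hypothetical longer pipeline: double the penalty, but half the mispredict rate.
long_pipe = effective_cpi(base_cpi=1.0, branch_fraction=0.2,
                          mispredict_rate=0.05, penalty_cycles=20)

print(short_pipe, long_pipe)
```

In this model, doubling the penalty is cancelled out when the predictor halves the mispredict rate. Code with branches that no predictor can learn keeps the high mispredict rate and pays the full longer penalty.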

Arcadian
11-11-2000, 02:07 PM
Originally posted by Moridin:
One more thing I thought I would point out is that the P4 actually has 26 pipeline stages when you count the decoder. The 6 stages in the decoder only come into play if you have a trace cache (L1 I cache) miss since the trace cache holds decoded instructions.

That's true, but did you know that there are even more stages than that? The 20-stage pipeline is only a best case scenario. I don't know the max amount of pipeline stages, but I bet it is well over 26. http://www.sharkyforums.com/ubb/smile.gif

Marsolin
11-15-2000, 06:04 PM
First I'd like to say that it's nice to see a microarchitecture discussion without a lot of rhetoric and handwaving. With that said I'll add my 2 cents.

Arcadian mentioned a while back about the package change for the P4. The primary benefit of moving from a 423 pin to a 478 pin package will be power delivery. I/O receives no benefit, but with the addition of more power and grounds Intel can better balance their power planes.

This will result in lower power requirements and less noise. In addition to the Northwood processor there will also be a Willamette-478 to ease transitional issues.

The cache lengths were also mentioned earlier. I like the bandwidth availability with 256-bit cache lines, and especially the fact that L2 can load one cache line each clock, versus every other clock for a PIII.

The only downside to this that I can see is latency. 256 bits must always be fetched from memory even if a smaller amount is needed. Yet again we are faced with one of those tradeoffs geared toward high bandwidth applications. In the long run it will turn out to be the right decision, but it also explains why new processors won't always lead in every benchmark even if they are capable of much higher performance.

Arcadian
11-15-2000, 06:49 PM
Originally posted by Marsolin:
First I'd like to say that it's nice to see a microarchitecture discussion without a lot of rhetoric and handwaving. With that said I'll add my 2 cents.

Arcadian mentioned a while back about the package change for the P4. The primary benefit of moving from a 423 pin to a 478 pin package will be power delivery. I/O receives no benefit, but with the addition of more power and grounds Intel can better balance their power planes.

This will result in lower power requirements and less noise. In addition to the Northwood processor there will also be a Willamette-478 to ease transitional issues.

The cache lengths were also mentioned earlier. I like the bandwidth availability with 256-bit cache lines, and especially the fact that L2 can load one cache line each clock, versus every other clock for a PIII.

The only downside to this that I can see is latency. 256 bits must always be fetched from memory even if a smaller amount is needed. Yet again we are faced with one of those tradeoffs geared toward high bandwidth applications. In the long run it will turn out to be the right decision, but it also explains why new processors won't always lead in every benchmark even if they are capable of much higher performance.

Thanks for the compliment. I think this topic has a lot of insight as well, and I appreciate the technical aspects.

My one comment about your post concerns the last paragraph. I don't believe the cache will ever transfer less than 256 bits of data. The reason is that the Pentium 4 uses 128 byte cachelines, and will always transfer that amount of data. Using a 256 bit datapath, that means it takes 4 ticks of the 1.5GHz Pentium 4 clock to transfer the cacheline, and it should always take that long. In this case, the 256 bit wide datapath is a completely necessary addition.
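The arithmetic behind the "4 ticks" figure, as a quick Python sketch. It takes the numbers in this post at face value (128-byte line, 256-bit datapath, 1.5GHz clock) and assumes one full-width transfer per clock.

```python
def cycles_per_line(line_bytes, datapath_bits):
    """Clock ticks needed to move one cache line over the datapath (rounded up)."""
    bytes_per_cycle = datapath_bits // 8
    return -(-line_bytes // bytes_per_cycle)   # ceiling division


cycles = cycles_per_line(line_bytes=128, datapath_bits=256)
clock_ghz = 1.5                      # clock speed quoted in the post
time_ns = cycles / clock_ghz         # 1 GHz means 1 tick per nanosecond

print(cycles, "cycles,", round(time_ns, 2), "ns per cache line")

# For comparison, the same line over a hypothetical 64-bit path:
print(cycles_per_line(128, 64), "cycles over a 64-bit path")
```

The same function shows why the wide path matters: a narrower path multiplies the tick count for every line moved.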

mtl_hed
11-15-2000, 11:22 PM
I know many of you guys are PC gamers, but if you look at the discussions going on right now about the Sony PS2, I have noticed they are very similar to this one in a way.

What's my point?
Sony and Intel have made similar decisions in hardware design for their respective systems.
How?
The very small, but EXTREMELY fast caches and increased memory bandwidth. A twist on the tradition of large caches with low bandwidth and long latencies.

If you think about it this is a very smart move, as more games and apps demand fast streaming info, not recurring large chunks of info.

Just my take.

Marsolin
11-16-2000, 10:14 AM
Originally posted by Arcadian:
My one comment about your post concerns the last paragraph. I don't believe the cache will ever transfer less than 256 bits of data. The reason is that the Pentium 4 uses 128 byte cachelines, and will always transfer that amount of data. Using a 256 bit datapath, that means it takes 4 ticks of the 1.5GHz Pentium 4 clock to transfer the cacheline, and it should always take that long. In this case, the 256 bit wide datapath is a completely necessary addition.

Thanks for the information on that. I guess I still have some confusion in that area and need to read a little more about it.

gaffo
11-17-2000, 01:12 AM
Hi - out of 50+ posts only one (Moridin) has even mentioned x87 performance. While all the marketing departments in AMD/Intel talk about mmx, 3d-now, sse, and now sse-2, still after 3+ years 95 percent of all games and 3d-workstation apps use x87 (not SIMD). No, x87 is not "sexy" like SIMD, but at least it's used in current on-the-shelf software - why even buy a chip that will run software that MIGHT be optimized in two years, when by then said chip will be obsolete (or useless if software doesn't end up optimized for sse, sse-2 etc.)? x87 is not ONLY used in "workstation" type software! It is used in everything that uses the fpu. In the early 90's games were integer based (i.e. Doom); after Quake (in 1996) virtually all 3d-games use x87, not SSE (only Quake-3 has it as an extension, and the basic engine is still x87!). The p-4 in every benchmark shows truly k-6 (and winchip-2) levels of x87 performance (comparing at the same clock). This is 1/2 the speed of the Athlon (k-7). So, a k7 at 700 mhz will run Quake-1, 2, Unreal, etc. at the same speed as a p-4 at 1400 mhz! Until SSE is actually used (if ever), there is no incentive in my mind to buy this thing. BTW CAD is integer based (if 2d), only 3d stuff uses the fpu. 3d-studio will run faster on a k7 due to the fpu. True, high bandwidth is a help - but only so far; the p-4's x87 is truly poor and no amount of bandwidth will make it better. The rest of the chip looks fine to me, but without a good fpu, I'll have no use for it.
I'm not an EE, and no flames please - no disrespect, no ranting intended, just a reminder that x87 is like x86 - a foundation.

Marsolin
11-17-2000, 01:49 PM
To my understanding the FPU on the P4 should be better than that of the PIII, possibly even exceeding the Athlon (which most people, myself included, consider as the current FPU leader). I think the FPU has gotten a lot of knocks because people got a hold of benchmarks run on A0 Si, which I believe had a screwed up FPU.

We'll find out for sure on Monday, but I expect Quake III performance to blow away a 1.2 GHz Athlon on a DDR system. I wouldn't be surprised at more than 200 fps.

gaffo
11-17-2000, 11:49 PM
Originally posted by Marsolin:
To my understanding the FPU on the P4 should be better than that of the PIII, possibly even exceeding the Athlon (which most people, myself included, consider as the current FPU leader). I think the FPU has gotten a lot of knocks because people got a hold of benchmarks run on A0 Si, which I believe had a screwed up FPU.

We'll find out for sure on Monday, but I expect Quake III performance to blow away a 1.2 GHz Athlon on a DDR system. I wouldn't be surprised at more than 200 fps.

Yes Quake-3 will run faster due to sse extensions, but the other current 500+ game titles will run slower - as will rendering programs like 3d-max, lightwave. The p-4 will run streaming apps like mpeg1,2, & 4 better than the p-3 and k-7 due to sse patches. But the vast majority of apps are not sse and will run slower. I've got around 5 flight sims - ef2000, janes f-15, janes ww2, flying corps, and european air war. NONE of these titles have mmx, sse, or 3d-now. All of them get their frame rates from the x87 fpu only! And EAW and WW2 will give you about 40 fps on a 1-gig p-3, so you need all the x87 speed you can get, esp. with 100+ bombers, flak, villages, trains, and low altitude! As far as I know only FPS (first person shooters, and only a few at that) have sse, 3d-now patches - I've looked for them for my sims, there aren't any. http://www.sharkyforums.com/ubb/frown.gif

Arcadian
11-18-2000, 01:54 AM
Originally posted by gaffo:
Yes Quake-3 will run faster due to sse extensions, but the other current 500+ game titles will run slower - as will rendering programs like 3d-max, lightwave. The p-4 will run streaming apps like mpeg1,2, & 4 better than the p-3 and k-7 due to sse patches. But the vast majority of apps are not sse and will run slower. I've got around 5 flight sims - ef2000, janes f-15, janes ww2, flying corps, and european air war. NONE of these titles have mmx, sse, or 3d-now. All of them get their frame rates from the x87 fpu only! And EAW and WW2 will give you about 40 fps on a 1-gig p-3, so you need all the x87 speed you can get, esp. with 100+ bombers, flak, villages, trains, and low altitude! As far as I know only FPS (first person shooters, and only a few at that) have sse, 3d-now patches - I've looked for them for my sims, there aren't any. http://www.sharkyforums.com/ubb/frown.gif

Yeah... I have the feeling Pentium 4 will be a tough sell for Intel in the beginning. I believe they will have to rely on software improvements over time, while they work on hardware tweaking, and gaining megahertz. It looks like Pentium 4 will follow the path of the Pentium Pro in the respect that it will have little improvement in the beginning, but will be able to grow as Intel adds more value through the life of the processor. I believe Sapasion is right: Northwood with DDR will be the better choice for an upgrade, and that won't be available until the second half of 2001. I'm sure Intel will be able to make money with the Pentium 4 due to brand name recognition, but it seems that it will have little value over the Pentium III to begin with. My opinion only. http://www.sharkyforums.com/ubb/smile.gif

gaffo
11-18-2000, 02:29 AM
I agree, once the .13 micron is out then a weak x87 at 3ghz is still fast. I have nothing against "progress" and sse, 3d-now etc. is fine by me. But I just don't see much support for them in my average software collection (most dating from the mid-to-late 90's). It's just maddening to me to know the p-3 has two pipelined (1.5?) x87 units, the k-7 has three, and then the "newest-most advanced" chip has one! I know sse is projected to be the future, and that's fine, but why at the expense of x87? Seems to me at the very least the p-4 could have inherited the p-3's x87 and just added the sse2 feature set. The pentMMx had the same fpu performance as the classic, the p-3 had the same as the p-2 and added sse. SSE2 should not be acquired through paying a price, but added to current expected performance. I just hope the simple p-3 (the .13 micron one) isn't going to go the way of the k-6's, where the one I'd like to buy in a year ends up being phased out from the desktop and only offered for mobile! Last year I waited and waited and waited for a 600 mhz k-6+ to upgrade my winchip, but only the mobile ones were offered in retail - so my socket seven became a total dead end and not really worth upgrading to a 400 mhz k-6-3 http://www.sharkyforums.com/ubb/frown.gif I hope the p-3 will still be around after the p-4 blitz. Hell, a little ATX-flex case with the up-and-coming integrated G-force videocard and a 1.2 ghz p-3 with DDR (maybe ALi's? - don't like Via's memory performance) could prob. be had for 300 bucks in a year.

Arcadian
11-18-2000, 04:42 AM
Originally posted by gaffo:
I agree, once the .13 micron is out then a weak x87 at 3ghz is still fast. I have nothing against "progress" and sse, 3d-now etc. is fine by me. But I just don't see much support for them in my average software collection (most dating from the mid-to-late 90's). It's just maddening to me to know the p-3 has two pipelined (1.5?) x87 units, the k-7 has three, and then the "newest-most advanced" chip has one! I know sse is projected to be the future, and that's fine, but why at the expense of x87? Seems to me at the very least the p-4 could have inherited the p-3's x87 and just added the sse2 feature set. The pentMMx had the same fpu performance as the classic, the p-3 had the same as the p-2 and added sse. SSE2 should not be acquired through paying a price, but added to current expected performance. I just hope the simple p-3 (the .13 micron one) isn't going to go the way of the k-6's, where the one I'd like to buy in a year ends up being phased out from the desktop and only offered for mobile! Last year I waited and waited and waited for a 600 mhz k-6+ to upgrade my winchip, but only the mobile ones were offered in retail - so my socket seven became a total dead end and not really worth upgrading to a 400 mhz k-6-3 http://www.sharkyforums.com/ubb/frown.gif I hope the p-3 will still be around after the p-4 blitz. Hell, a little ATX-flex case with the up-and-coming integrated G-force videocard and a 1.2 ghz p-3 with DDR (maybe ALi's? - don't like Via's memory performance) could prob. be had for 300 bucks in a year.

You seem to think that the Pentium 4 has poor x87 performance, but I think you're wrong. I believe the Pentium 4 has superior floating point performance compared to the Athlon, even without SSE. The only problem is it will likely need software optimizations to make the most use out of it. You see, the Athlon could also use optimizations, as can be seen in the new DirectX 8, which uses Athlon floating point optimizations in the T&L engine.

The Pentium 4 could also use optimizations. The Athlon may have 3-way superscalar floating point, but when you think of it, the Pentium 4 has 4-way. It's actually 2 superscalar pipelines, but they are both double pumped, which gives an effect of a 4-way engine. For optimized programs, the Pentium 4 should perform better.

One way of optimizing for this is to align data at certain memory addresses so that the data variables are more likely to be in cache. Another way is to group floating point arithmetic in sets such that they can make the most use of the superscalar pipelines. Yet another way is to try and reduce complex math into simpler instructions in order to streamline the Pentium 4 pipeline. In other words, instead of using cos or sin instructions, use optimized routines made from adds and shifts (which is usually included in most modern libraries).

There are many ways that have been found to optimize around Pentium III architecture, and I'm sure that soon enough, there will be great optimizations around the Pentium 4 architecture. The unfortunate thing is that it will likely take a while, and we may not see drastic improvements in current software.

Humus
11-18-2000, 06:53 AM
Originally posted by Arcadian:
You seem to think that the Pentium 4 has poor x87 performance, but I think you're wrong. I believe the Pentium 4 has superior floating point performance compared to the Athlon, even without SSE. The only problem is it will likely need software optimizations to make the most use out of it. You see, the Athlon could also use optimizations, as can be seen in the new DirectX 8, which uses Athlon floating point optimizations in the T&L engine.

The Pentium 4 could also use optimizations. The Athlon may have 3-way superscalar floating point, but when you think of it, the Pentium 4 has 4-way. It's actually 2 superscalar pipelines, but they are both double pumped, which gives an effect of a 4-way engine. For optimized programs, the Pentium 4 should perform better.

One way of optimizing for this is to align data at certain memory addresses so that the data variables are more likely to be in cache. Another way is to group floating point arithmetic in sets such that they can make the most use of the superscalar pipelines. Yet another way is to try and reduce complex math into simpler instructions in order to streamline the Pentium 4 pipeline. In other words, instead of using cos or sin instructions, use optimized routines made from adds and shifts (which is usually included in most modern libraries).

There are many ways that have been found to optimize around Pentium III architecture, and I'm sure that soon enough, there will be great optimizations around the Pentium 4 architecture. The unfortunate thing is that it will likely take a while, and we may not see drastic improvements in current software.

Don't know what you mean by "double pumped", but from what I read only the ALU will be double pumped.
Data alignment is nothing one needs to worry about either; all compilers do this for you. You have to request it explicitly if you want it the other way (for example when loading file headers into a struct). My guess is that in about 99% of cases the software on the market today has close to all of its data aligned in almost all situations. So the P4 has nothing to gain from 'future optimizations' in this area.
Btw, the best way to do sin/cos is to use lookup tables. A couple of KB of memory is all it takes to get enough precision for most cases.
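A minimal sketch of the lookup-table idea in Python. The table size, the names fast_sin and fast_cos, and the nearest-entry lookup are all illustrative choices, not anything specified in the post; a real game would do this in C with a float array, where 1024 entries would take about 4 KB.

```python
import math

TABLE_SIZE = 1024   # hypothetical size; 1024 single-precision floats is about 4 KB in C

# Precompute one full period of sine.
SIN_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def fast_sin(x):
    """Approximate sin(x) by picking the nearest precomputed table entry."""
    index = int(round(x * TABLE_SIZE / (2 * math.pi))) % TABLE_SIZE
    return SIN_TABLE[index]


def fast_cos(x):
    """cos(x) is sin(x) shifted by a quarter period, so the same table serves both."""
    return fast_sin(x + math.pi / 2)


# Compare against the library routine for one sample angle.
angle = 1.0
print(fast_sin(angle), math.sin(angle))
print(fast_cos(angle), math.cos(angle))
```

With nearest-entry lookup the angle is off by at most half a table step, so the error stays below roughly pi / TABLE_SIZE (about 0.003 here). Interpolating between neighbouring entries or using a larger table tightens that at the cost of more work or more cache space.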

James
11-18-2000, 04:03 PM
This discussion is what got me into the forums in the first place. I would just like to say kudos to Arcadian for starting some of the most interesting and informative discussions I have seen.

Also, I would like to thank all of you who have contributed to the discussion.

Danke http://www.sharkyforums.com/ubb/smile.gif

------------------
Think outside the box or be forced to live in one.

Arcadian
11-18-2000, 10:31 PM
Originally posted by James:
This discussion is what got me into the forums in the first place. I would just like to say kudos to Arcadian for starting some of the most interesting and informative discussions I have seen.

Also, I would like to thank all of you who have contributed to the discussion.

Danke http://www.sharkyforums.com/ubb/smile.gif

Hey, thanks for the compliment! http://www.sharkyforums.com/ubb/smile.gif

It really means a lot that people are enjoying my posts. I even notice my topics are appearing a lot in the Sharky Forum Spotlight articles on the main page. Pretty cool... I guess I'm making a name for myself. http://www.sharkyforums.com/ubb/biggrin.gif

I've been trying to think of another good topic recently, and I have a couple ideas I may post tomorrow or Monday. Also, feel free to start a topic of your own if there is an area you want to talk about. You don't even need to know much about it... just throw a few questions out, and I'm sure some of us will take off with it.

gaffo
11-19-2000, 12:17 AM
A lot of the benchmarks you see are memory bandwidth dependent - specfp2000 is half memory and half fpu. Whatever happened to benchmarks that stress the fpu only (within the L1 cache)? I'm thinking we're gonna see more fpu benchmark FUD here soon. JC's site has a lot of benches that don't depend on RAM bandwidth. A guy named Tim Wilkins has two or three of them. A relevant site is http://www.tech-report.com/news_reply.x/1519/ --- a benchmark called molecular dynamics is used, and all data fits into the L1 cache and fpu (so memory bandwidth is irrelevant - therefore each chip's fpu alone is stressed, and a fair comparison is possible). The athlon beats the p-4 two to one.

zephyr
11-19-2000, 03:34 AM
There's been a lot of talk about the lack of MMX/SIMD software on this thread so I thought that I'd add my 2 cents.

The reason there are so few apps that use mmx or simd is that you have to program in assembler in order to use mmx or simd. The exception to this is the Intel compiler, which provides mmx/simd primitives in C/C++, but hardly anyone uses Intel's compiler. What's more, Microsoft has been a huge slacker concerning simd. They only recently added inline assembler support for simd.

Arcadian
11-19-2000, 01:27 PM
Originally posted by zephyr:
There's been a lot of talk about the lack of MMX/SIMD software on this thread so I thought that I'd add my 2 cents.

The reason there are so few apps that use mmx or simd is that you have to program in assembler in order to use mmx or simd. The exception to this is the Intel compiler, which provides mmx/simd primitives in C/C++, but hardly anyone uses Intel's compiler. What's more, Microsoft has been a huge slacker concerning simd. They only recently added inline assembler support for simd.

It was my understanding that Microsoft's VC++ has MMX built into it, and that SIMD (if it isn't already there) will be added soon. Usually, Intel's compiler enhancements filter down to Microsoft's compilers as well. In addition, I believe there are a lot more developers than you may think that use Intel's compilers.

Humus
11-19-2000, 02:22 PM
Originally posted by Arcadian:
It was my understanding that Microsoft's VC++ has MMX built into it, and that SIMD (if it isn't already there) will be added soon. Usually, Intel's compiler enhancements filter down to Microsoft's compilers as well. In addition, I believe there are a lot more developers than you may think that use Intel's compilers.

MSVC has support for MMX in inline assembler but does not generate MMX code itself. To get inline assembler support for 3dnow and sse you need to download and install msvc service pack 4 (126MB) and also a special processor pack.

gaffo
11-20-2000, 01:48 PM
Arcadian - I was right! The x87 is poor. Go to aceshardware and see the finest review yet! x87 is still the fpu standard - MDK-2, microstation, povray - all show the p-4 at 60 percent of the speed of the k-7 http://www.sharkyforums.com/ubb/smile.gif. They say racing games and flight sims are not x87 based - racing may be true, but flight sims have no sse support. Only video encoding and quake-3 performed well - and mostly due to superior memory, not sse. When DDR at 333-400 mhz comes out the k-7 will win all benchmarks.

Arcadian
11-20-2000, 04:52 PM
Originally posted by gaffo:
Arcadian - I was right! The x87 is poor. Go to aceshardware and see the finest review yet! x87 is still the fpu standard - MDK-2, microstation, povray - all show the p-4 at 60 percent of the speed of the k-7 http://www.sharkyforums.com/ubb/smile.gif. They say racing games and flight sims are not x87 based - racing may be true, but flight sims have no sse support. Only video encoding and quake-3 performed well - and mostly due to superior memory, not sse. When DDR at 333-400 mhz comes out the k-7 will win all benchmarks.

Yes, Gaffo. I will admit that your prediction ended up proving more true. Without optimizations, the Pentium 4 floating point does seem pretty poor. However, if SPECfp2000 is any indicator of how the Pentium 4 will perform with SSE/SSE-2 enhancements, then I think we can see pretty dramatic score increases. Not like the 5% that MMX gave, but rather some serious 20%-30% improvements. So the jury is still out on the Pentium 4 floating point, except that the x87 without optimization does indeed underperform.

Angelus
11-22-2000, 07:44 AM
Arcadian (or anyone else of course)

I'm very interested in learning more about the discussed subject and other technology related subjects.

You seem to be very well informed, can you point me to resources on the net where I can learn more?

Thanx

------------------
'There are no stupid questions, only stupid answers'

Arcadian
11-22-2000, 11:26 AM
Originally posted by Angelus:
Arcadian (or anyone else of course)

I'm very interested in learning more about the discussed subject and other technology related subjects.

You seem to be very well informed, can you point me to resources on the net where I can learn more?

Thanx

Well, the first place I'd recommend is right here. http://www.sharkyforums.com/ubb/smile.gif

You can learn a lot by speaking to others who have experience in the field. There are other forums I'd recommend as well, including the general forums on www.aceshardware.com (http://www.aceshardware.com) .

I would also recommend a web page by Paul DeMone called www.realworldtech.com (http://www.realworldtech.com) .

However, you must remember that most of the opinions out there are just opinions. Don't believe everything you read, because a lot of it is FUD (Fear, Uncertainty, Doubt). Try to get a general idea from a lot of people's opinions, and then you can form some of your own. Good luck, and if you have any questions, feel free to post in this Highly Technical Forum here, and I'll be sure to read it. http://www.sharkyforums.com/ubb/smile.gif