I wanted to spawn a discussion on system performance. More specifically, what do you think is the biggest bottleneck in a computer system?
These days, we are starting to see diminishing returns for improvements in various hardware. It has gotten to the point where insignificant improvements are taken to be huge leaps in technology. We used to live in a time where the newest video card gave a 20-40% improvement over the competition, whereas today it is more like 5%. CPUs used to give similar increases, and now they, too, are giving < 5% improvements.
So what is the bottleneck? What is preventing our systems from doubling in performance? Is it system memory? Is it the CPU or the video? Is it AGP? Or front side bus speed? Is it the hard disk or removable media? Or is it something else entirely?
Before I give my opinion, I was just curious about what other people thought. So come on... state your opinions. Keep it technical if you can, and cite examples if possible. Web links are really great, too.
nkeezer
10-09-2000, 07:35 PM
As for what's holding up performance on the most general level, I think the answer is easy: hard drives. Everything else in a computer is measured on the order of nanoseconds, yet hard drives are still way up in the millisecond range. As long as things like seek times stay up around 9ms, things won't change much. I mean, the processors will get faster and the computer will be able to crunch more numbers, but in overall use it won't "seem" that much faster. Unfortunately, I'd consider this a limitation of hard drives themselves: until we get away from mechanical forms of storage and into the realm of, say, holographic or other optical storage methods, things won't get dramatically better.
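To put the milliseconds-vs-nanoseconds gap into numbers, here is a quick back-of-the-envelope sketch. The 9 ms seek is from the post; the 877 MHz clock is borrowed from the Celeron mentioned further down, and the 60 ns DRAM figure is an assumed ballpark, not a measurement.

```python
def cycles_during(wait_s: float, clock_hz: float) -> float:
    """Number of CPU clock cycles that go by during a wait of wait_s seconds."""
    return wait_s * clock_hz


CLOCK_HZ = 877e6   # the Celeron @877 mentioned in this thread
SEEK_S = 9e-3      # ~9 ms average seek, from the post above
DRAM_S = 60e-9     # assumed ~60 ns DRAM access, for comparison only

print(f"one seek:        {cycles_during(SEEK_S, CLOCK_HZ):>12,.0f} cycles")
print(f"one DRAM access: {cycles_during(DRAM_S, CLOCK_HZ):>12,.0f} cycles")
print(f"seek / DRAM ratio: {SEEK_S / DRAM_S:,.0f}x")
```

A single seek comes out to roughly 7.9 million core cycles under these assumptions.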
And the comments about smaller performance increases with succeeding generations of technology... I think that's only partly true. If you look at CPU speeds over the past couple of years, processing power has actually beaten the estimate from Moore's Law (due to the chip wars between Intel and AMD). So in that sense, performance is still increasing just as much as before. However, I think the greater scale creates the impression of diminishing returns. For instance, around 1995 Intel had the 100MHz Pentium. They followed up with chips running at 120MHz and 133MHz, increases of 20% and 33%. Pretty hefty. But the numbers themselves are small compared to today's chips... it'd be akin to Intel releasing a chip running at 800MHz, then following it with chips running at 960MHz (20%) and 1066MHz (33%), which of course we know they don't do. In other words, the increments between chips (in MHz) have stayed pretty much the same, but the frequency of releases has increased instead (hell, I remember a few years back it seemed like it took 3 or 4 months for a new chip to be released).
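The "same percentage, bigger numbers" point is easy to check. The 800/960/1066 MHz parts are the post's hypothetical, not real products.

```python
def pct_increase(old_mhz: float, new_mhz: float) -> float:
    """Relative clock increase, in percent, going from old_mhz to new_mhz."""
    return (new_mhz - old_mhz) / old_mhz * 100


# The 1995 steps from the post, then the same ratios scaled up to an 800 MHz part.
steps = [(100, 120), (100, 133), (800, 960), (800, 1066)]
for old, new in steps:
    print(f"{old} -> {new} MHz: +{new - old} MHz, +{pct_increase(old, new):.0f}%")
```

The percentages match, but the absolute step needed at 800 MHz is eight times larger.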
And then of course there's the bigger question: is the extra speed really necessary? For the most part, I'm inclined to think that it isn't. I upgraded my PII 350 to a Celeron II @877 several months ago. To be honest, the difference between the two processors in typical everyday use was negligible at best. To satisfy my own personal curiosity, I even timed how long it took to do certain things on my computer with the two different chips (the results are actually online: http://jpnsystems.8m.com/celeron566/index.html), and it turned out that the new chip didn't make much of a difference at all. The roughly 2.5x jump in clock speed (877 vs. 350MHz) only really came into play with things like video compression, where the extra clock cycles can be put to use. The rest of the time, the chip sat on its *** waiting for the hard drive to catch up.
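One standard way to explain this result (not something the post itself invokes) is Amdahl's law: if only part of the run time is CPU-bound, a faster CPU only speeds up that part. A minimal sketch, assuming the CPU-bound part scales with clock speed; the CPU-bound fractions are made-up examples.

```python
def amdahl_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the CPU-bound share of the run time gets faster.

    cpu_fraction: share of the original run time that is CPU-bound (0..1).
    cpu_speedup:  how much faster that share becomes.
    The rest (e.g. waiting on the hard drive) is assumed not to change at all.
    """
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup)


clock_ratio = 877 / 350  # ~2.5x, assuming CPU-bound work scales with clock speed
for frac in (0.2, 0.5, 0.9):  # hypothetical CPU-bound shares
    print(f"{frac:.0%} CPU-bound: {amdahl_speedup(frac, clock_ratio):.2f}x overall")
```

With, say, half the time spent waiting on the disk, a 2.5x faster CPU buys well under 2x overall.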
I'm not sure what the point of all that was
------------------
www.nweb-design.com (http://www.nweb-design.com) <-- Send me a client, I'll send you a 5% finder's fee
"You have just destroyed one model XQJ-37 nuclear powered pansexual roto-plooker....and you're gonna have to pay for it." -- Frank Zappa
zombor
10-09-2000, 11:28 PM
Originally posted by nkeezer:
As for what's holding up performance on the most general level, I think the answer is easy: hard drives. Everything else in a computer is measured on the order of nanoseconds, yet hard drives are still way up in the millisecond range. As long as things like seek times stay up around 9ms, things won't change much. I mean, the processors will get faster and the computer will be able to crunch more numbers, but in overall use it won't "seem" that much faster. Unfortunately, I'd consider this a limitation of hard drives themselves: until we get away from mechanical forms of storage and into the realm of, say, holographic or other optical storage methods, things won't get dramatically better.
But hard drives have always been measured in ms and memory has always been measured in nanoseconds... that hasn't changed, they've just gotten smaller (proportionally?). I think it's due to the FSB. Think about it: when we had 133MHz Pentiums, wasn't the bus speed 66? I'm not sure, because the first computer I built was a K6-233, and before that I just owned a 486. But a 66MHz bus against a 133 or 233MHz CPU is a much higher memory-speed-to-CPU-speed ratio than, say, a 133MHz bus against an 1133MHz CPU. The problem with just speeding up the bus is the heat issue. I'm assuming the chipset would generate gobs and gobs of heat running at 1133MHz! But if memory, bus, and CPU speeds were equal, the proc wouldn't have to go into "wait states" waiting for the memory to catch up.
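The ratio described here is essentially the clock multiplier: how many core cycles go by for every bus cycle, so every bus clock the CPU waits on costs that many core clocks. A rough sketch with nominal clocks; the 66MHz bus for the K6-233 is assumed (the usual Socket 7 setting), and the 1133MHz part is the post's own example.

```python
def multiplier(cpu_mhz: float, bus_mhz: float) -> float:
    """Core clock cycles that pass for each front-side-bus clock."""
    return cpu_mhz / bus_mhz


systems = {
    "Pentium 133 on a 66 MHz bus": (133, 66),
    "K6-233 on a 66 MHz bus": (233, 66),
    "hypothetical 1133 MHz CPU on a 133 MHz bus": (1133, 133),
}
for name, (cpu, bus) in systems.items():
    print(f"{name}: {multiplier(cpu, bus):.1f} core cycles per bus cycle")
```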
This is all based on some articles I was reading a year or so back, so it may not all be 100% accurate. I'm open to criticism because it is probably partly wrong.
------------------
Compaq Armada E500
650MHz PIII 128MB RAM
8MB ATI rage mobility
11GB hard drive/DVD
dual booting win98/2k
who said you cant game on a notebook????
nkeezer
10-09-2000, 11:42 PM
Originally posted by zombor:
But hard drives have always been measured in ms and memory has always been measured in nanoseconds...that hasn't changed
My point exactly: hard drives are so slow compared to the *other* components in a computer that huge increases in processing power and the like are effectively covered up (in normal use) by the lagging hard drives. So until we can get rid of spinning platters and other mechanical components as a means of data storage, hard drives will continue to be the biggest bottleneck.
zombor
10-10-2000, 12:01 AM
But the hard drive isn't continuously being accessed like memory is. It's used for loading things like games and programs; once they're loaded, the app never goes back to the HDD to load them again anyway. In-game performance has nothing to do with HDD speed (unless you're using virtual memory!). Although a faster HDD will increase overall speed, it's not really the "limiting reactant," as my chem teacher said today.
And Windows would load 10x faster if all the legacy code for the architecture were gone, but that's a whole other topic!
Ymaster
10-10-2000, 12:35 AM
Limiting factors... Slow monitor refresh rates at 1600x resolutions and up. Just wasted FPS...
Parts that work non-async.
There are current techs for faster hard drives with no moving parts. They use crystal and can store the internet... heh... joke
If HDs were as fast as DRAM, there would be no point in DRAM.
I personally think the limiting factors are the motherboard-to-hardware interfaces... PCI, ISA, AGP... If we had AGP slots for all cards... Call it APCI?
When CPU speeds have reached their max, we will just make dual CPUs standard. Multiple CPUs are what shrank the supercomputers that took up rooms into those small Doctor Who-sized TARDISes..
I don't want to see video cards anymore. I want to see GPU sockets! Adding Alphas to our video GPUs like we do to new CPUs... Gives a whole new light to onboard video?
Ymaster
10-10-2000, 12:38 AM
Originally posted by Ymaster:
I don't want to see video cards anymore. I want to see GPU sockets! Adding Alphas to our video GPUs like we do to new CPUs... Gives a whole new light to onboard video?
Now that I think about it... Why not add on-die cash to the GPUs? Hmmmm
Igor
10-10-2000, 12:48 AM
HDs are by far the slowest things computers have... aside from printers.
If you want more speed, go into RAID 0. This will improve the speed. Going SCSI will also help, especially if you get those 15,000 RPM drives.
But for the real boost in performance, we gotta wait till solid-state HDs reach the market for anyone who has less than a spare million sitting around.
Igor
10-10-2000, 12:48 AM
BTW nkeezer, the link you posted is dead
Raptor^
10-10-2000, 06:06 AM
This is an interesting question
I assume we're talking standard PC hardware here.
This very much depends on the sort of application that you're running.
For example, in my old job I was dealing with 3D scenes containing 100s of MB of textures on screen at once and 1000s of polygons. Here the limiting factor for a standard PC was the speed of the AGP bus; even with 4X AGP it was simply slow. If you went to an area where the textures fit in the memory of the card, performance would suddenly increase.
The SGI320 and 540 PCs could texture directly from main memory and performed much more consistently.
I can also claim memory bandwidth (more important than, but linked to, the FSB) as a limiting factor. If I'm running a sorting algorithm on data that fits in RAM, then my memory bandwidth will certainly limit the speed of execution.
IMHO, in normal use the HDD only becomes a limiting factor if you don't have enough RAM. If you're limited by the speed of your disk, buy some more memory instead of a new processor.
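A common textbook way to see why extra RAM beats a faster CPU once you are swapping: the effective access time when some fraction of memory references turn into page faults serviced from disk. Both latencies below are assumed round numbers, not measurements.

```python
def effective_access_ns(ram_ns: float, disk_ns: float, fault_rate: float) -> float:
    """Average memory access time when fault_rate of accesses must be paged in from disk."""
    return (1.0 - fault_rate) * ram_ns + fault_rate * disk_ns


RAM_NS = 100.0          # assumed DRAM access time
DISK_NS = 10_000_000.0  # assumed ~10 ms to service a page fault from the hard drive

for rate in (0.0, 1e-6, 1e-4):
    avg = effective_access_ns(RAM_NS, DISK_NS, rate)
    print(f"fault rate {rate:g}: {avg:,.1f} ns average ({avg / RAM_NS:.1f}x RAM alone)")
```

Even one fault per million accesses adds about 10% to the average; one per ten thousand makes it more than ten times slower than RAM alone.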
HDDs do limit performance during startup and for the first time you load an application, but to me these are unimportant as they occur relatively rarely and are not performance critical.
I think of removable media in the same way. If I install something from CD, for example, I know I'm only going to do it once, and then the speed of the CD drive becomes irrelevant.
CPU speed can still be the limiting factor if you are running an application with a very tight innermost loop. If this loop fits in the L1 cache, then the processor will be running nearly flat out. Even if it misses the L1 and hits the L2 cache, we can consider the processor the limiting factor, as the L2 cache tends to be on-die now. It is only when we get a complete cache miss and have to go across the bus to RAM that everything slows down.
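The cache argument above is usually summarized as average memory access time (AMAT): hit time plus miss rate times miss penalty, applied level by level. A sketch with assumed cycle counts and miss rates (illustrative only, not figures for any particular CPU):

```python
# Assumed latencies in CPU cycles; illustrative only, not any particular chip.
L1_HIT = 3
L2_HIT = 10     # on-die L2
RAM = 100       # full miss: across the bus to main memory


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time for one cache level."""
    return hit_time + miss_rate * miss_penalty


def two_level_amat(l1_miss_rate: float, l2_miss_rate: float) -> float:
    """L1 misses are served by L2; L2 misses go across the bus to RAM."""
    l2_time = amat(L2_HIT, l2_miss_rate, RAM)
    return amat(L1_HIT, l1_miss_rate, l2_time)


print(f"tight loop:    {two_level_amat(0.02, 0.10):.1f} cycles/access")
print(f"cache-busting: {two_level_amat(0.30, 0.80):.1f} cycles/access")
```

With these made-up numbers the tight loop averages 3.4 cycles per access, while the cache-busting case averages 30, most of it spent going across the bus to RAM.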
So, where have the performance increases been recently?
Advances in CPU are most obvious, along with those in 3D gfx cards. Both have been pushing Moore's Law recently.
Memory bandwidth certainly hasn't improved vastly over the last few years, so in the general case this would probably be my choice. My 'old' P2-400 has exactly the same memory bandwidth as my P3-700 at its default FSB, yet the processor speed has almost doubled.
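Rough arithmetic behind that comparison, assuming both machines run a 100 MHz FSB with a 64-bit data path (for the P3-700 this assumes the 100 MHz-bus version) and counting only theoretical peak bandwidth:

```python
BUS_MHZ = 100   # assumed 100 MHz FSB for both machines
BUS_BYTES = 8   # 64-bit data path

# Theoretical peak only: ignores latency, refresh and protocol overhead.
peak_mb_s = BUS_MHZ * BUS_BYTES

for name, core_mhz in [("P2-400", 400), ("P3-700", 700)]:
    per_cycle = peak_mb_s / core_mhz
    print(f"{name}: {peak_mb_s} MB/s peak, {per_cycle:.2f} bytes per core clock")
```

Peak bandwidth stays at 800 MB/s, so the bytes available per core cycle fall from 2.0 to about 1.14 as the clock rises.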
HDDs have increased in capacity at an astounding rate recently, but have not gotten significantly quicker in real-life situations, especially if lots of seeks are involved.
CD drives have hit a plateau in terms of performance, but DVDs are significantly faster. I'm not even considering floppy disks.
In a general situation, I think that memory bandwidth is now the limiting factor in a PC. Advances in this area have been slow, yet memory is used fundamentally in a PC.
Whoa, this has turned into a monster post... enough of my ramblings for now
[This message has been edited by Raptor^ (edited October 10, 2000).]
Humus
10-10-2000, 10:08 AM
The biggest bottleneck is Microsoft. They keep adding new fancy features to the OS and software that we have to learn how to disable. If they would stop adding useless stuff and work for a year or two on increasing performance and removing some bugs, I'd be more than happy.
The next bottleneck is system memory. With DDR the situation will be better and will perhaps hold for a couple of years. I hope FC-RAM becomes mainstream by then.
Originally posted by Ymaster:
Now that I think about it..Why not add on-die cash to the Gpu's? Hmmmm
On-die cash? Would be nice ...
I guess you mean on-die cache... they already have this; all graphics cards out there have a texture cache and a vertex cache.
zombor
10-10-2000, 10:33 AM
Any OS on a PC will be just as slow, relatively, as any Windows OS. And even with DDR memory, it (along with the FSB) still won't come close to being as fast as the proc.
Adisharr
10-10-2000, 11:41 AM
I would have to vote for the system bus. We are still running it relatively slowly compared to other devices in the system. You can almost factor HDs out of the equation if you're lucky enough to have enough memory. We really need some type of high-speed, very wide data bus with very little latency. That would at least put the monkey back on processors somewhat.
$ .02
------------------
- Mouse not found - click to continue..
Sol
10-10-2000, 11:44 AM
Originally posted by Adisharr:
I would have to vote for the system bus. We are still running it relatively slowly compared to other devices in the system. You can almost factor HDs out of the equation if you're lucky enough to have enough memory. We really need some type of high-speed, very wide data bus with very little latency. That would at least put the monkey back on processors somewhat.
$ .02
Yeah, I have to agree with this. The data still has to travel across the bus to get from component to component.
Moridin
10-10-2000, 12:34 PM
The thing limiting PC performance is a basic rule of economics: the law of diminishing returns. Any time you add a new technology you see an immediate jump in performance. Over time the tech is perfected, expanded and extended, but each subsequent change yields smaller gains than the previous iteration of the technology. I may make a separate post about some of these "new" technologies that have allowed processors to keep pace.
A prime example of this would be 3D acceleration. The jump from no 3D to the first 3DFX chips was huge in comparison to the change between generations of GPUs now.
Every once in a while we reach a point where improvements in a given area become so difficult that the area begins to lag behind the rest of the system, and changes to the whole system architecture are necessary to compensate. Two (relatively) recent examples of this are RAM and motherboards (I'll talk a little more about motherboards in another post). A lot of the design features we see today are dictated by these limitations, so I consider these the limiting hardware factors.
HDDs are not really a problem since, if you have enough memory, most consumer apps only use them during startup or when they save data.
I would also like to add that I disagree with some statements made about Moore's law. Moore's law is specifically about the complexity (transistor count) of integrated circuits, including MPUs; it says nothing about overall performance. It turns out that the increased complexity of MPUs has offset limiting factors in other areas, keeping IPC high while clock rates increased. As a result, absolute performance has matched or bettered Moore's law, but Moore's law does not specifically address performance.
I would also disagree with the statements that performance gains have accelerated in the last 2 years. This may be true for the x86 world but if you look at the cutting edge of processor design (the highest performing processors) the opposite is true.
zombor
10-10-2000, 01:08 PM
I was just thinking about this in class:
What if someone built a PC that was all on one die except for the hard drive, of course? Think about it... the proc would have direct access to the GPU, the RAM, the sound card, everything... Sure, you wouldn't be able to upgrade it at all, but wouldn't it be a lot faster than having to go through busses to get to everything? Just an idea.
Arcadian
10-10-2000, 01:34 PM
Originally posted by zombor:
I was just thinking about this in class:
What if someone built a PC that was all on one die except for the hard drive, of course? Think about it... the proc would have direct access to the GPU, the RAM, the sound card, everything... Sure, you wouldn't be able to upgrade it at all, but wouldn't it be a lot faster than having to go through busses to get to everything? Just an idea.
What you refer to has been branded by computer architects as the SoC processor (System on a Chip). SiS is coming out with a form of SoC processor, which has integrated video, sound, northbridge, and southbridge. VIA is making a Cyrix processor with integrated video, sound, and northbridge. Intel's Timna processor was supposed to integrate video and northbridge, and even though that project was cancelled, I don't think we've seen the last of that technology.
Right now, integrating DRAM is a very difficult task. Nintendo's next-generation console, the Gamecube, will use embedded memory on its graphics chip; however, that will only be about 3MB (the console's 24MB of main memory sits off-chip). It will take semiconductor manufacturers a long time before larger amounts can be integrated inexpensively. Though, they might start by including a small amount of embedded DRAM (say, 16 or 32MB) to act as an L3 cache, perhaps.
Zombor, what you say is true! Integration is in the future for both high and low performance systems.
zombor
10-10-2000, 01:44 PM
But is there that much of a speed increase in these integrated systems? I'd think there would be, given the speed jump from moving L2 on-die. And if everything was integrated, nobody better release a new system every 3.546 days
Moridin
10-10-2000, 02:36 PM
A few words about Motherboards and Ram.
Up until about 7 years ago motherboard speed kept up with processor speed. This has changed a lot, and I'd like to take a few minutes to discuss why. But first, some history.
When the IBM XT came out, the ISA bus could be used as an expansion port to add additional RAM. It didn't take all that long before RAM outpaced the ISA bus, but for many years RAM and motherboards at least kept up with the FSB of the processor, even if they took some latency hits along the way.
Back in the 486 days we first started to see processors pull ahead of motherboards. Intel had a 50 MHz 486 on the market for some time, but even though a 50 MHz motherboard spec was available, the chipsets and boards themselves were almost nonexistent. Intel finally got fed up, took matters into its own hands, and produced the 486 DX2-50 and DX2-66, the first x86 processors with clock multipliers. (This was also when they got into the chipset business.) The first generation of Pentiums did not have multipliers, but it didn't take long before they had them too.
The reason motherboards now require multipliers is size. Electrically speaking, they have become huge. I say electrically speaking because in many cases it is better to measure electronics in wavelengths instead of inches or cm. The original XT board was physically large by today's standards, something on the order of 50 cm by 50 cm. (I don't feel like looking up the exact numbers, but these make my calculations easier.)
Using an 8 MHz clock and assuming the signal propagates at about 0.667c (2x10^8 m/s), one wavelength is 200,000,000/8,000,000, or 25 m. The board itself is 1/50 of a wavelength across. This is a relatively simple thing to design. Even with harmonics, it is unlikely you would have to deal with transmission line effects.
I guess at this point I should explain harmonics and transmission lines. If you want to transmit a square wave of a given frequency, it is not good enough to transmit just that frequency; you also have to transmit its harmonics. A harmonic is a sine wave at an integer multiple of the original signal's frequency. A square wave contains only the odd harmonics: 3x, 5x, 7x, etc. the fundamental. You usually need at least a couple of harmonics to get a decent square wave.
For example, to build a square wave you would add a sine wave at F1, the third harmonic at 3xF1, and the fifth harmonic at 5xF1. Each harmonic is smaller than the one below it (for an ideal square wave, the nth harmonic has 1/n the amplitude of the fundamental), so higher harmonics play an ever smaller role in the signal. So you need to allow for frequencies 3-5 times that of your digital signals.
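The odd-harmonic picture can be checked numerically. This sketch sums the standard Fourier series of an ideal +/-1 square wave, where the nth odd harmonic has amplitude 4/(pi*n), and samples it a quarter period in, where the ideal wave equals exactly 1. The 133 MHz fundamental is just an example value.

```python
import math


def square_wave_approx(t: float, f1: float, max_harmonic: int) -> float:
    """Partial Fourier series of a +/-1 square wave with fundamental f1 (Hz).

    Only odd harmonics n = 1, 3, 5, ... appear, each with amplitude 4/(pi*n).
    """
    total = 0.0
    for n in range(1, max_harmonic + 1, 2):
        total += (4 / (math.pi * n)) * math.sin(2 * math.pi * n * f1 * t)
    return total


f1 = 133e6            # example fundamental: a 133 MHz bus clock
t = 1 / (4 * f1)      # quarter period: middle of the "high" half of the wave
for h in (1, 3, 5, 21):
    print(f"harmonics up to {h} ({h * f1 / 1e6:.0f} MHz): {square_wave_approx(t, f1, h):.3f}")
```

Each extra harmonic pulls the value closer to 1 (overshooting and undershooting along the way), and for a 133 MHz clock the 3rd and 5th harmonics already sit at 399 and 665 MHz.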
Transmission line effects come into play any time a signal must travel more than about 1/10 of a wavelength. They cause a number of bizarre effects if not carefully managed, like making open circuits act like short circuits, short circuits act like capacitors, etc.
So on the old XT, even the fifth harmonic could travel all the way across the board without having to worry about transmission line effects. A 33 MHz board 33 cm across would be roughly 1/18 of a wavelength across, but if you are careful with your traces and avoid having any (33 MHz) trace longer than 20 cm, you can at least get the full third harmonic and probably most of the fifth, so you are still OK.
Consider a modern 133 MHz board that is, say, 25 cm across. The board is now 1/6 of a wavelength across. At this frequency even your fundamental may be subject to transmission line effects. You now need to carefully control things like impedance and trace length to send digital signals around the board. (These effects start to kick in around 1/10 of a wavelength but don't get into full gear until about 1/4 of a wavelength.)
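The wavelength arithmetic from the last few paragraphs, redone for all three boards. It uses the post's assumed propagation speed (2x10^8 m/s) and its rough board sizes, and prints each board's size as a fraction of the wavelength of the fundamental and of the 3rd and 5th harmonics:

```python
V = 2e8  # signal speed on the board, ~0.667c as assumed in the post (m/s)


def wavelength_m(freq_hz: float) -> float:
    """Wavelength in metres at the given frequency."""
    return V / freq_hz


def board_fraction(board_m: float, freq_hz: float) -> float:
    """How much of one wavelength a board of size board_m spans."""
    return board_m / wavelength_m(freq_hz)


# (description, board size in m, fundamental clock in Hz); sizes are the post's rough guesses
boards = [
    ("XT, 8 MHz, 50 cm", 0.50, 8e6),
    ("33 MHz, 33 cm", 0.33, 33e6),
    ("133 MHz, 25 cm", 0.25, 133e6),
]
for name, size_m, f in boards:
    for n in (1, 3, 5):  # fundamental plus the 3rd and 5th harmonics
        frac = board_fraction(size_m, n * f)
        print(f"{name}, harmonic {n}: {frac:.3f} wavelengths (about 1/{1 / frac:.0f})")
```

Anything much past 1/10 is where the transmission line effects described above start to matter.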
Around this frequency you start to run into other problems as well. If two signals travel different distances, the difference in travel time can become important. Imagine two memory chips far enough apart that one receives each clock edge half a clock period after the other. By the time they send data back to the CPU (half a period of delay on the way out plus another half on the way back), the second bit of data from the first chip arrives around the same time as the first bit from the second chip. Not a good thing.
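The same kind of arithmetic shows how quickly skew eats into the clock period. A sketch assuming the post's 2x10^8 m/s propagation speed and a hypothetical 15 cm difference in trace length:

```python
V = 2e8  # assumed on-board signal speed (m/s), as in the post


def skew_fraction(extra_trace_m: float, clock_hz: float) -> float:
    """Extra travel time on the longer trace, as a fraction of one clock period."""
    delay_s = extra_trace_m / V
    period_s = 1.0 / clock_hz
    return delay_s / period_s


# Hypothetical 15 cm difference in trace length between two chips.
for mhz in (33, 133, 200, 400):
    print(f"{mhz} MHz: skew = {skew_fraction(0.15, mhz * 1e6):.3f} of a clock period")
```

That 15 cm costs about 2.5% of a period at 33 MHz but 30% at 400 MHz; to be half a period out at 200 MHz, the two paths would need to differ by about 50 cm.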
This is what is likely to stop parallel busses from operating faster than about 200 MHz, and why really fast memory like RDRAM or SLDRAM uses narrow, packet-based busses instead of wide parallel ones.
Evil Twin
10-10-2000, 03:04 PM
I think it's the laws of physics that are starting to take their toll on the materials used nowadays. Anyway, I've always loved wearing turtleneck sweaters. Why not bottleneck sweaters for computer nerds?
nkeezer
10-10-2000, 04:15 PM
Wwwwhhhhhhhooooooosssssshhhhh! That was the sound of 90% of what Moridin wrote flying completely over my head. But at least he summed it up nicely in the last paragraph, so I think I know what he's talking about now. Enlightening
Anyway, I still stand by the hard drive as the biggest bottleneck. Now maybe all you guys ever do with your computers is play games, in which case I guess your HD doesn't matter. But I don't, and (I know this is a shock) a lot of other people don't either. And I know that when I look down at my computer, there are a lot of times when the hard drive light is on and the drive is thrashing away: any time I open an application I haven't used for a while, or save a file and the entire directory structure has to get listed again, or go to a website and the browser is reading and writing its cache folder, etc. Like I said before, all that takes milliseconds. That might not seem like much, unless you consider that everything else the computer's doing is measured in nanoseconds. In other words... it seems to me that every time your hard drive is being used, your computer is being slowed down.
And just one final point: the biggest bottleneck in computers is not really the hard drive, the chip, memory or bus speeds. It's really the average consumer who buys their box at Circuit City and is happy with it for 5 years. If all computer users were a bunch of tech geeks, things would certainly move a hell of a lot faster
Arcadian
10-10-2000, 05:41 PM
Moridin was simply explaining one of the many reasons why it is so difficult to move up speeds on the motherboard. These days, in modern electronics, you can easily crank up the speed of ICs to over 1GHz. In fact, in industries other than CPUs, it is even common to see ICs on the order of 40-80GHz using new technology like SOI or silicon-germanium BJTs. However, once you leave the safe confines of your die, you begin to feel the wrath of onboard electrical wiring.
For the reason Moridin gave, and others, it is very difficult to get wide pathways going in excess of certain speeds. Harmonic noise is just one of the factors you start to have to deal with. The front side bus on the Intel and AMD microprocessors that people use today has 64 bits used for data, and many other bits used for address and protocol. Getting these to run at the speed of a microprocessor is not possible with today's electronics, which is why CPUs have multipliers.
However, getting information to the processor faster than it's going right now is only part of the problem. My opinion is that the system bottleneck is a number of things that keep current systems from performing as well as they could. Processor-to-memory bandwidth is one of them, and perhaps an important one, too. However, even if you were to give a processor an unlimited amount of free information (in other words, the processor receives information instantly without any latency), it still wouldn't be able to process everything it receives.
Perhaps this sounds a little obvious, but consider how much of a bottleneck the CPU really is. My feeling is that the x86 architecture is the bottleneck. Because of the antique nature of the architecture, it is very difficult to improve upon. I am really looking forward to Intel's new architecture, IA64, because that will be the first new architecture Intel has used in its main processor line since the 8086.
In the future, I see faster systems doing the following.
1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am opening myself to the possibility that IA64 might fail and another architecture comes to fill its shoes. On this same topic, we'll call it 1b) Parallelize data. In other words, SMP configurations are good for now, but I see symmetric multiprocessing on the core level in the future. Read another post on this forum for more information, but I believe SMT and CMP to be the future. More parallelized code will also be necessary so that each of the processor cores and threads is given data aplenty to sort through.
2) Integrating everything. Like I said, speed is only found on the die, and as soon as you move off of it, you face nature's wrath. So the solution is to integrate CPU, video, memory, I/O, and the kitchen sink all onto the very same die.
3) Serialize I/O. If you have to move off the chip, it isn't going to be using wide parallel interfaces. Rather, everything will move to high speed serial. Serial can move much faster, and can scale well by adding more channels. Since data from different channels don't have to sync up like in parallel, this gives the designer a lot more room to speed things up.
4) Getting rid of rotating media. Nkeezer is mostly correct in sticking with his argument about hard drives. Frankly, a lot of programs out there are hard drive limited, and there may be even more in the future with all the streaming media that is becoming more common. If we want to speed up hard storage, we have to find a way to create cheap media that is not mechanical in nature. Spinning platters will NEVER reach microseconds of access time, let alone nanoseconds. Unfortunately, technology to do this may be a long way off.
5) Make outside lines optical. So far, we have seen no cable capable of producing more throughput than optical. However, the disadvantage right now is that you can't bend an optical cable because the photons travelling through the cable will leave the cable if the angle is too great (this is a scientific fact... want to mess up data going to a server? bend the optical cable, if there is one). Once we overcome this, which may be very distant from now, we can start wiring our motherboards, and start connecting to online sources through optical cabling, thus enabling very high speed connections.
These 5 things together will move the industry into the next tier of computer performance. Small changes in one area or another aren't likely to affect much. This is my opinion, at least.
Moridin
10-11-2000, 02:02 PM
Originally posted by Arcadian:
1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am opening myself to the possibility that IA64 might fail and another architecture comes to fill its shoes. On this same topic, we'll call it 1b) Parallelize data. In other words, SMP configurations are good for now, but I see symmetric multiprocessing on the core level in the future. Read another post on this forum for more information, but I believe SMT and CMP to be the future. More parallelized code will also be necessary so that each of the processor cores and threads is given data aplenty to sort through.
I am not so sure about IA-64 at this point. When I first heard about it I was excited, but I am beginning to have a lot of doubts. The premise of IA-64 is to make use of VLIW/Explicit Parallelism to avoid the complexities of out-of-order execution and hopefully get more parallelism than OOOE can achieve.
OOOE finds parallelism at run time using hardware; VLIW/EPIC finds it at compile time using software. In theory this should make your hardware simple.
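To make "finds it at compile time" concrete, here is a toy sketch: a greedy list scheduler that packs independent instructions into fixed-width bundles, roughly what a VLIW/EPIC compiler does. The instruction format, the 2-wide bundles and the 3-cycle load latency are all invented for illustration; real IA-64 bundles, templates and stop bits are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    dest: str
    srcs: tuple
    latency: int = 1  # latency the *compiler assumes* when it builds the schedule


def schedule_bundles(prog, width=2):
    """Greedy compile-time list scheduling into fixed-width bundles.

    Assumes each register is written exactly once, so only true (read-after-write)
    dependences matter. Returns one list of instruction texts per cycle; an empty
    list is a cycle where nothing could issue (a NOP bundle).
    """
    produced = {ins.dest for ins in prog}
    ready_at = {}                # register -> cycle its value becomes available
    remaining = list(prog)
    limit = sum(ins.latency for ins in prog) + len(prog)  # guard against circular deps
    bundles = []
    cycle = 0
    while remaining:
        if cycle > limit:
            raise ValueError("circular dependency in program")
        bundle = []
        for ins in remaining:
            if len(bundle) == width:
                break
            if all(r not in produced or ready_at.get(r, float("inf")) <= cycle
                   for r in ins.srcs):
                bundle.append(ins)
        for ins in bundle:
            remaining.remove(ins)
            ready_at[ins.dest] = cycle + ins.latency
        bundles.append([ins.text for ins in bundle])
        cycle += 1
    return bundles


prog = [
    Instr("r1 = load [a]", "r1", (), latency=3),  # compiler bets on a cache hit
    Instr("r2 = load [b]", "r2", (), latency=3),
    Instr("r3 = r1 + r2", "r3", ("r1", "r2")),
    Instr("r4 = x * y", "r4", ("x", "y")),
    Instr("r5 = r3 + r4", "r5", ("r3", "r4")),
]
for cycle, bundle in enumerate(schedule_bundles(prog)):
    print(cycle, bundle or ["nop"])
```

The 3-cycle load latency is baked into the schedule. If one of those loads actually misses the cache and takes far longer, an in-order VLIW machine has to stall at the add, whereas an OOOE core discovers the delay at run time and can keep working on independent instructions, which is the memory-latency point Moridin makes a couple of paragraphs down.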
In other words it is supposed to be small simple and effective. Somewhere along the way IA-64 became very very complicated. The die size is huge and it is too complicated to run at high speed. This is a very bad way to start for a processor whose primary advantage is supposed to be simplicity. This may change with future versions of IA-64 but while IA-64 was under development a few other things happened.
OOOE was developed and perfected. Improved manufacturing process made more transistors available to chip designers. This meant that deep OOO designs are now much more practical. In fact the OOO components of most processes take up less then 20 % of the die space, so it has become relatively cheep to implement OOOE.
On the other side VLIW/EPIC compilers have not advanced nearly as quickly. Memory latency has increased dramatically. This is worse for VLIW then OOOE because memory latency can't be predicted at compile time. Dependencies generated by memory latency can only be resolved at run time using some form of OOOE.
VLIW/EPIC can look much deeper for parallelism then OOOE provided the compiler is good, but OOOE can react to things like memory latency much better then VLIW/EPIC. Ultimately we will probably see a convergence and most ISA's will be similar
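To make the compile-time problem concrete, here is a minimal sketch under assumed, round-number latencies (3 cycles for an L1 hit, 10 for L2, 100 for main memory; these are illustrations, not measurements). The compiler schedules just enough independent one-cycle instructions after a load to hide an L1 hit; the function reports how long an in-order core still stalls when the load actually comes from somewhere slower.

```python
# Assumed latencies in cycles: illustrative round numbers, not measured figures.
LATENCY = {"L1 hit": 3, "L2 hit": 10, "main memory": 100}


def stall_cycles(filled_slots: int, actual_latency: int) -> int:
    """Stall seen by an in-order core running a compile-time schedule.

    The load issues at cycle 0 and its result is ready at cycle
    `actual_latency`. The compiler placed `filled_slots` independent
    one-cycle instructions after it, so the consumer wants to issue at
    cycle filled_slots + 1 and must wait out whatever latency is left.
    """
    return max(0, actual_latency - (filled_slots + 1))


# The compiler assumes an L1 hit and hides exactly that much latency.
slots = LATENCY["L1 hit"] - 1
for level, latency in LATENCY.items():
    print(f"{level:12s}: {stall_cycles(slots, latency):3d} stall cycles")
```

The schedule is perfect for the case the compiler guessed (0 stall cycles) and leaves 97 stall cycles when the same load misses to memory, which is the run-time information an OOO core can react to and a static schedule cannot.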
Moridin
10-11-2000, 02:28 PM
Originally posted by Arcadian:
1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am opening myself to the possibility that IA64 might fail and another architecture might come to fill its shoes
Intel has done a very good job of eliminating the CISC penalty in x86. It is still there, but it is quite small. They still have a lot of problems, though. IMHO, x86 can stay competitive for some time yet if it makes the following changes (not necessarily easy ones, though):
Give the architecture more registers. I don't know if they need 32 like most RISC machines, but 16 is the minimum. Having few registers means more memory operations and a lot less parallelism.
Get rid of the accumulator configuration in the ALU. The x86 ALU can only save the result of a calculation to one register (the accumulator), and on top of that the accumulator is always one of the original operands, so you always destroy one of your values unless you specifically save it to another register first. Copying register contents around to get them into the accumulator adds extra, unnecessary instructions to the program.
SSE can do most of this on the FPU side, so that is a step forward. Now that we have SSE2, which is fully IEEE compliant and can completely replace x87 (SSE could not), a lot of the concerns on the FPU side have been addressed.
Get triadic (three-operand) instructions for both the ALU and FPU. x86 and SSE2 only support 2-operand instructions. They can do things like a=a+b, a=a+c, etc. (the result must be put into a). Most other ISAs support 3-operand instructions and can do a=b+c+d, a=a+b+c, c=a+b+d (the result can go anywhere).
At some point 64-bit operation will be required (for memory access).
These changes, along with a trace cache to remove the CISC decode penalty, would bring x86 a long way toward parity with more modern ISAs.
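The cost of a two-operand (destructive) format can be shown with a tiny sketch that only emits instruction text. It computes dest = src1 op src2 while keeping both sources intact. The three-operand output is a hypothetical RISC-style notation used for comparison, not a real x86 instruction, and the register names are just labels.

```python
COMMUTATIVE = {"ADD", "AND", "OR", "XOR"}


def two_operand(dest, src1, src2, op="ADD"):
    """x86-style 'op dest, src': dest doubles as the first source, so
    dest = src1 op src2 needs a copy first unless dest already holds src1."""
    if dest == src2 and dest != src1:
        if op not in COMMUTATIVE:
            raise ValueError("would clobber src2; needs a scratch register")
        src1, src2 = src2, src1  # a op b == b op a, so reuse dest as-is
    code = []
    if dest != src1:
        code.append(f"MOV {dest}, {src1}")  # the extra instruction
    code.append(f"{op} {dest}, {src2}")
    return code


def three_operand(dest, src1, src2, op="ADD"):
    """Hypothetical RISC-style 'op dest, src1, src2': both sources survive."""
    return [f"{op} {dest}, {src1}, {src2}"]


print(two_operand("EAX", "EBX", "ECX"))    # a = b + c, keeping b and c
print(three_operand("EAX", "EBX", "ECX"))
```

The two-operand version needs an extra MOV whenever the destination is a fresh register; how much that costs in practice depends on how cheaply a given core executes register-to-register moves.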
[This message has been edited by Moridin (edited October 11, 2000).]
Arcadian
10-11-2000, 08:43 PM
Originally posted by Moridin:
Give the architecture more registers. I don't know if they need 32 like most RISC machines, but 16 is the minimum. Having few registers means more memory operations and a lot less parallelism.
Get rid of the accumulator configuration in the ALU. The x86 ALU can only save the result of a calculation to one register (the accumulator), and on top of that the accumulator is always one of the original operands, so you always destroy one of your values unless you specifically save it to another register first. Copying register contents around to get them into the accumulator adds extra, unnecessary instructions to the program.
The Itanium is able to provide these things. It actually has 128 general registers, all 64-bit. It also has 128 floating-point registers and 64 predicate registers. The instructions also accommodate your suggestion regarding the ALU. You said in a previous post that you are having some doubts about IA-64. You shouldn't, because it is progressing quite well. Some people are displeased that it has taken so long, but there is the "five nines" of reliability it must attain, in other words 99.999% uptime. Intel can't release it unless it is at least that stable. Plus, McKinley, the successor to Itanium, is supposed to have a much smaller die and be much faster, too. Intel has a lot in store for IA-64, so I would be very surprised if it doesn't do very well.
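For a sense of scale, "five nines" can be turned into a downtime budget with simple arithmetic. A minimal sketch, assuming availability is averaged over a 365.25-day year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes (365.25-day year assumed)


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of `nines` nines
    (e.g. 5 -> 99.999%), averaged over one year."""
    unavailability = 10.0 ** -nines
    return unavailability * MINUTES_PER_YEAR


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):8.2f} minutes/year")
```

At 99.999% that works out to roughly 5.26 minutes of downtime per year, compared with about 8.8 hours per year at 99.9%.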
Humus
10-12-2000, 09:42 AM
Moridin:
You're wrong on some points. The x86 ALU can save results to any register for many instructions, not just the accumulator (EAX). You can do stuff like
ADD EBX, ECX
AND EBX, ECX
OR EBX, ECX
SHL EBX, 1
You don't need to use the accumulator for those.
However, many instructions (such as MUL and the single-operand form of IMUL) can only work on the accumulator, but those are mainly the complex instructions (which should be avoided anyway).
And a = b + c + d is a 4 operand instruction. A 3 operand should read a = b + c.
Humus
10-12-2000, 09:48 AM
... I would like to add that I have high hopes for IA-64 too. It really solves most of the problems that the x86 architecture has. It removes much of the dependency on high-speed memory, it removes the need for a branch prediction unit, it can do register indexing (meaning you can put small arrays into registers instead of memory, which I have always wanted to be able to do), and its instruction set isn't so damn ugly! :)
Moridin
10-12-2000, 12:39 PM
Originally posted by Humus:
Moridin:
You're wrong on some points. The x86 ALU can save results to any register for many instructions, not just the accumulator (EAX). You can do stuff like
ADD EBX, ECX
AND EBX, ECX
OR EBX, ECX
SHL EBX, 1
You don't need to use the accumulator for those.
However, many instructions (such as MUL and the single-operand form of IMUL) can only work on the accumulator, but those are mainly the complex instructions (which should be avoided anyway).
Thanks, I am not familiar with the full IA-32 instruction set. Most of my assembly programming was done on the 68000, 6800, 8080, and the 8080-compatible Z80. I still think IA-32 is limited in this regard compared to newer ISAs, and that this is a bottleneck in the ISA.
I also understand that the LEA instruction can extend some of these capabilities, but I am not entirely sure what it does.
Moridin
10-12-2000, 12:52 PM
Originally posted by Arcadian:
The Itanium is able to provide these things. It actually has 128 general registers, all 64-bit. It also has 128 floating-point registers and 64 predicate registers. The instructions also accommodate your suggestion regarding the ALU. You said in a previous post that you are having some doubts about IA-64. You shouldn't, because it is progressing quite well. Some people are displeased that it has taken so long, but there is the "five nines" of reliability it must attain, in other words 99.999% uptime. Intel can't release it unless it is at least that stable. Plus, McKinley, the successor to Itanium, is supposed to have a much smaller die and be much faster, too. Intel has a lot in store for IA-64, so I would be very surprised if it doesn't do very well.
The problem with adding more registers is that it adds extra levels of logic to every register access. 16 registers require only 4 levels of logic, while 128 require 7. If you look at the Itanium pipeline, register access is spread over 2 pipeline stages. Every other processor does this in a single stage.
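Moridin's 4-versus-7 figures match a simple model in which each bit of the register number adds one level of 2:1 selection. A sketch under that simplifying assumption (real register files also pay for longer wires, more read/write ports, and larger decoders, none of which this model captures):

```python
def select_levels(num_registers: int) -> int:
    """Depth of a tree of 2:1 multiplexers that picks one of
    num_registers entries: one level per bit of the register number.
    (Simplified model; wire delay and port count are ignored.)"""
    if num_registers < 1:
        raise ValueError("need at least one register")
    return (num_registers - 1).bit_length()


for n in (8, 16, 32, 128):
    print(f"{n:4d} registers -> {select_levels(n)} levels")
```

Under this model 8, 16, 32, and 128 registers need 3, 4, 5, and 7 levels respectively, so the depth grows only logarithmically; the practical cost of a big register file is as much about size and wiring as about this depth.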
IA-64 is an in-order device, so it needs many more registers for the compiler to use to find parallelism. An OOO device does not need nearly as many architectural registers to find parallelism. It does, however, need a set of rename registers for each pipeline stage prior to the execution stage.
I agree with you, though: McKinley looks a lot better than Itanium. I still do not think it will be as fast as EV7, though.
zombor
10-12-2000, 01:44 PM
ok, this just got way out of my league ;)
------------------
Compaq Armada E500
650MHz PIII 128MB RAM
8MB ATI rage mobility
11GB hard drive/DVD
dual booting win98/2k
who said you cant game on a notebook????
Arcadian
10-12-2000, 02:30 PM
Originally posted by Moridin:
The problem with adding more registers is that it adds extra levels of logic to every register access. 16 registers require only 4 levels of logic, while 128 require 7. If you look at the Itanium pipeline, register access is spread over 2 pipeline stages. Every other processor does this in a single stage.
I think you are misunderstanding some of the specifications. I don't know much more myself, but what you said probably doesn't have the impact that you think. At least, I haven't heard anyone else complain about this. Can anyone shed some light on this topic?
Originally posted by Moridin:
IA-64 is an in-order device, so it needs many more registers for the compiler to use to find parallelism. An OOO device does not need nearly as many architectural registers to find parallelism. It does, however, need a set of rename registers for each pipeline stage prior to the execution stage.
For an in-order device, Itanium can be much faster than many out-of-order devices. It is the architecture itself that makes in-order operation work. The reason in-order execution hurts performance on x86 is that there are too many dependencies, and even register renaming can't solve them all. In EPIC, these dependencies are resolved at compile time, so the CPU will not stall as often. In-order execution actually raises the performance of the Itanium.
Originally posted by Moridin:
I agree with you, though: McKinley looks a lot better than Itanium. I still do not think it will be as fast as EV7, though.
There is nothing impressive about the EV7. Actually, EV7 doesn't change the processor architecture at all. It simply allows new levels of SMP parallelism. It's a different protocol that will only raise performance at the system level, not the processor level. If you are talking about IA-64 vs. a single EV7 processor, Itanium will be much faster, let alone McKinley. Only in very large SMP systems will EV7 be faster. At least, this is what I hear.
Moridin
10-12-2000, 03:54 PM
Originally posted by zombor:
ok, this just got way out of my league ;)
Hey, I'm pretty close to that myself. I may be an engineer, but I'm not a chip designer.
chickenboo
10-12-2000, 03:55 PM
Originally posted by nkeezer:
As for what's holding up performance on the most general level, I think the answer to that is easy: hard drives.
I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone who can make my floppy drive go faster??!! :)
Ymaster
10-12-2000, 04:13 PM
Originally posted by chickenboo:
I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone who can make my floppy drive go faster??!! :)
No! Just use a Zip drive...
Moridin
10-12-2000, 04:27 PM
Originally posted by Arcadian:
For an in-order device, Itanium can be much faster than many out-of-order devices. It is the architecture itself that makes in-order operation work. The reason in-order execution hurts performance on x86 is that there are too many dependencies, and even register renaming can't solve them all. In EPIC, these dependencies are resolved at compile time, so the CPU will not stall as often. In-order execution actually raises the performance of the Itanium.
EPIC cannot resolve them all either. Memory dependencies cannot be predicted at compile time and therefore can never be resolved by an in-order machine. As the gap between processor speed and memory speed increases, so does the effect of memory dependencies. A deep OOO design can deal with most of the instruction dependencies and help with memory dependencies as well.
To expand on this a bit: at compile time you don't know whether a given piece of data will be in L1, L2, or main memory. If you try to execute an instruction that operates on that data, an in-order processor, EPIC included, has no choice but to wait until the data is loaded from memory. You can get around this a little by placing the load as early as possible in the instruction stream and working on the data later, but you still have no idea exactly how long you will have to wait before the data is in a register. This is why EPIC needs so many registers.
The situation gets worse when you are accessing a memory location that is determined by a calculation, or even just by the contents of another memory location (pointer chasing). In this case there is little EPIC can do but sit and wait for the data to arrive. Even if it already has in its registers all the data needed to execute some other instruction, it cannot do anything if that instruction is not next in line. A single trip to main memory can stop execution for tens or hundreds of cycles.
An OOO processor, on the other hand, just keeps looking for instructions that can be executed. If the first cannot be, it looks at the next, then the next, and so on until it finds one that can. If the data for an instruction later in the instruction stream arrives before the data for an earlier one, the later instruction can be executed as long as it has no other dependencies.
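The difference described above can be reproduced with a toy model. Everything in this sketch is an assumption made for illustration: one instruction issues per cycle, the OOO window is unlimited, there is no branch or renaming cost, and the latencies (100 cycles for a miss, 3 for a hit, 1 for an ALU op) are made-up round numbers.

```python
from typing import List, Optional, Tuple

# (description, indices of earlier instructions it depends on, latency)
Instr = Tuple[str, List[int], int]

# Assumed latencies: 100-cycle cache miss, 3-cycle hit, 1-cycle ALU ops.
PROGRAM: List[Instr] = [
    ("LOAD a  (cache miss)", [], 100),
    ("ADD  a+1",             [0], 1),
    ("LOAD b  (cache hit)",  [], 3),
    ("ADD  b+1",             [2], 1),
    ("ADD  x+y",             [], 1),
    ("ADD  x+z",             [4], 1),
]


def in_order(prog: List[Instr]) -> int:
    """Single issue, strict program order: a stalled instruction blocks
    everything behind it. Returns the cycle the last result is ready."""
    done: List[int] = []
    next_issue = 0
    for _, deps, lat in prog:
        operands_ready = max((done[d] for d in deps), default=0)
        issue = max(next_issue, operands_ready)  # wait for operands
        done.append(issue + lat)
        next_issue = issue + 1
    return max(done)


def out_of_order(prog: List[Instr]) -> int:
    """Single issue per cycle, but the oldest instruction whose operands
    are ready may go, even past a stalled one (unlimited window)."""
    done: List[Optional[int]] = [None] * len(prog)
    pending = list(range(len(prog)))
    cycle = 0
    while pending:
        for i in pending:  # scan oldest first
            deps = prog[i][1]
            if all(done[d] is not None and done[d] <= cycle for d in deps):
                done[i] = cycle + prog[i][2]
                pending.remove(i)
                break  # only one issue per cycle
        cycle += 1
    return max(d for d in done if d is not None)


print("in-order:    ", in_order(PROGRAM), "cycles")
print("out-of-order:", out_of_order(PROGRAM), "cycles")
```

With these numbers the in-order model finishes at cycle 107 and the out-of-order model at cycle 101: the independent work is hidden under the miss, but the miss itself still dominates. A compiler could reorder this particular straight-line program by hand; the point is the case where it cannot know at compile time which load will miss.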
Originally posted by Arcadian:
There is nothing impressive about the EV7. Actually, EV7 doesn't change the processor architecture at all. It simply allows new levels of SMP parallelism. It's a different protocol that will only raise performance at the system level, not the processor level. If you are talking about IA-64 vs. a single EV7 processor, Itanium will be much faster, let alone McKinley. Only in very large SMP systems will EV7 be faster. At least, this is what I hear.
Given its massive memory bandwidth and the process shrink to 0.18 micron, the EV7 should double the current EV6 SPEC scores, which already lead the industry by a large margin. Intel has already announced that they will not submit SPEC scores for Itanium. The only other company that has done this recently is Sun, with its pathetic USII scores. (Sun did post scores for the USIII, but they look fairly average.)
PDR60
10-13-2000, 12:52 AM
This is an interesting question. On one hand we have hard drives that really haven't even pushed the ATA66 standard for sustained output. I really think that HD technology is topping out. It is a mechanical device that will never approach the speed of an electronic device. You can only spin platters so fast.
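The "only spin platters so fast" limit is easy to quantify. On average the head has to wait half a revolution for the wanted sector to come around, so average rotational latency is half the time of one revolution; seek time comes on top of this. A minimal sketch of that arithmetic:

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter must turn half a revolution before the
    wanted sector reaches the head (seek time is not included)."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


for rpm in (5400, 7200, 10_000, 15_000):
    ms = avg_rotational_latency_ms(rpm)
    print(f"{rpm:6d} RPM: {ms:4.2f} ms  (= {ms * 1e6:,.0f} ns)")
```

Even at 10,000 RPM the average wait is 3 ms, i.e. millions of nanoseconds, and doubling the spindle speed only halves it.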
Then we have system bus technology. I think Intel, VIA, and others are at a loss to find an economical way to increase bus speed while maintaining factors like memory bandwidth. Look at all the incarnations of the 8xx chipset Intel has tried. Rambus was a flop. Then VIA came out with all those K133 series chips, and now it's the 694 series. Both have memory bandwidth problems. It's hard to believe that a chipset as old as the BX is still a viable option. It can still compete with the new chipsets in its overclocked 133 state.
Memory is another bottleneck, maybe not just in the speed category but in the amount that bloated OSes are demanding. Windows 98 running on anything less than 128MB takes a performance hit. Then there's Win2k! I think that DDR RAM may help on the memory front. However, you will still need adequate amounts to satisfy your OS. I run a dual-processor system and am considering going to half a gig of memory.
------------------
BP6 with dual 366@550 and loving it
Phoenix
10-13-2000, 04:24 AM
Originally posted by Arcadian:
I think you are misunderstanding some of the specifications. I don't know much more myself, but what you said probably doesn't have the impact that you think. At least, I haven't heard anyone else complain about this. Can anyone shed some light on this topic?
We recently had a small discussion on this subject in one of my lectures. The professor said that tests have shown about 30 registers to be the optimum for integer (non-floating-point) calculations, while floating-point calculations only needed about 15 registers to reach peak performance. Once you went above 40 registers, performance started to decrease because of the longer register addresses. The only problem is that our discussion was about MIPS processors, and I'm not sure how it would relate to x86 (or x87, for that matter) processors.
Humus
10-13-2000, 08:49 AM
Originally posted by Moridin:
Thanks, I am not familiar with the full IA-32 instruction set. Most of my assembly programming was done on the 68000, 6800, 8080, and the 8080-compatible Z80. I still think IA-32 is limited in this regard compared to newer ISAs, and that this is a bottleneck in the ISA.
I also understand that the LEA instruction can extend some of these capabilities, but I am not entirely sure what it does.
Yes, LEA extends the capabilities some. LEA stands for Load Effective Address and is used for calculating pointers, but can be used for other stuff too. You use it like this:
LEA REG1, [REG2 + s * REG3 + immediate]
So it's a 4-operand instruction where the last one has to be a constant and s has to be 1, 2, 4, or 8.
So if you want to do a = b + c you can write
LEA EAX, [EBX + ECX]
or for a = b + 2 * c you can write
LEA EAX, [EBX + 2 * ECX]
or for a = b + 2 * c + 7 you can do
LEA EAX, [EBX + 2 * ECX + 7]
The LEA instruction is rather powerful, but it is a little limited too. You cannot do a = b - c, only addition. The constant can be negative, though.
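The arithmetic LEA performs can be modelled in a few lines. This is a sketch of the 32-bit form only: it computes base + s * index + constant, never touches memory, and truncates the result to 32 bits (the real instruction also leaves the flags unchanged, which this model does not represent). The last line shows the common compiler idiom of multiplying by 5 with [x + 4*x].

```python
MASK32 = 0xFFFFFFFF


def lea(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Model of 32-bit LEA reg, [base + scale*index + disp]: plain
    arithmetic on unsigned 32-bit operands, no memory access, result
    truncated to 32 bits. (Flags, untouched by real LEA, are not modelled.)"""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + scale * index + disp) & MASK32


b, c = 10, 3
print(lea(b, c))         # a = b + c          -> 13
print(lea(b, c, 2))      # a = b + 2*c        -> 16
print(lea(b, c, 2, 7))   # a = b + 2*c + 7    -> 23
print(lea(b, 0, 1, -1))  # negative constant  -> 9
print(lea(c, c, 4))      # c*5 via [c + 4*c]  -> 15
```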
Rick_James9
10-13-2000, 11:16 AM
Here's some info about AMD's new "Hammer" processor and about Itanium. You can read more about it at: http://www.cpureview.com/art_64bit_a.html
"Key Features of AMD's 64 bit x86 architecture
AMD is stressing compatibility; and with a good reason: it will speed adoption of their upcoming 64 bit processors.
Full backward compatibility with existing 32 bit x86 code base
Same instruction set, registers extended to 64 bits
Full 64 bit flat address model
If you were to ask programmers what the major faults of the existing x86 architecture were, they would give you a short but extremely important list:
Not enough general purpose registers
Stack based floating point processor; with only eight registers
Did I mention not enough registers?
Someone must have been listening. Here are some of the key features of x86-64:
general purpose registers extended to 64 bits
added eight more general purpose 64 bit registers (!!!!!)
added an additional 16 register IEEE standard floating point unit (!!!!!)
They did not stop there. More goodies:
SSE support! with twice the number of registers Intel provides
PC relative addressing for data
same syntax for using 64 bit operations as 16/32 bit operations
prefix byte to allow access to new general purpose registers in 64 bit mode
byte-addressing for the low byte of all 16 general purpose registers
Basically, the changes go a long way to making the instruction set more orthogonal; and the larger register files allow far greater opportunities for code optimization by the compiler writers."
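The "prefix byte" in the list above is what AMD64 calls the REX prefix. In the encoding as eventually shipped, it has the bit layout 0100WRXB: W selects 64-bit operand size, and R, X, and B supply a fourth (high) bit for the register numbers in the ModRM reg field, the SIB index field, and the ModRM r/m (or SIB base) field. That extra bit is what makes registers 8-15 reachable. In 32-bit mode the bytes 0x40-0x4F are the one-byte INC/DEC instructions, which 64-bit mode reuses for REX. A sketch that only assembles the prefix byte, not whole instructions:

```python
def rex_prefix(w: bool, reg: int = 0, index: int = 0, base: int = 0) -> int:
    """Assemble a REX byte 0100WRXB. reg, index and base are 4-bit register
    numbers (0-15); only their high bit lands here, while the low three bits
    stay in the ModRM/SIB fields exactly as in 32-bit code."""
    for r in (reg, index, base):
        if not 0 <= r <= 15:
            raise ValueError("register numbers run from 0 to 15")
    return (0x40
            | (int(w) << 3)
            | ((reg >> 3) << 2)
            | ((index >> 3) << 1)
            | (base >> 3))


print(hex(rex_prefix(True)))                  # 64-bit operand size only
print(hex(rex_prefix(False, reg=8)))          # reg field names r8
print(hex(rex_prefix(True, reg=9, base=12)))  # r9 and r12, 64-bit
```

These print 0x48, 0x44, and 0x4d; 0x48 is the prefix seen in front of most 64-bit register operations.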
------------------
Tech GOD
[This message has been edited by Rick_James9 (edited October 13, 2000).]
Arcadian
10-13-2000, 12:08 PM
Rick_James9, thanks for replying to this post with such a great topic. I hope other people read far enough into this thread to see your link. You might want to repost this as a main topic. But I did want to respond to your post with some insight.
First, the article makes a good point about the compatibility aspect of the x86-64 architecture. It points out that Itanium's 32-bit performance is not strong, and even mentions that Intel does not intend for this processor to run in 32-bit mode, and that the capability is only there for compatibility.
I wanted to mention that this should not be taken as a negative for the IA-64 architecture. You see, Intel has made sure to research what OEMs are looking for in systems. What they found out after talking to many different companies is that large servers are usually sold as complete packages that are further tested by the companies that sell them. Take Hewlett-Packard, for example, which I believe is taking an active part in the Itanium launch.
HP will bundle an Itanium system with the OS and all the software that a customer will want. They will make sure that all the software is 64-bit for optimal compatibility, and at the same time try to offer the customer as many choices as are available. If there is one 32-bit application that the customer requests, then Itanium can certainly be bundled with it without being severely impacted. However, if the customer wants many 32-bit applications, HP will also still have Intel's IA-32 processors, which will continue to progress with new technology and remain compelling choices. Still, Intel believes, from what they've heard from their customers, that most users of these advanced systems will be able to use complete 64-bit solutions. If they weren't sure about this, it is doubtful that Intel would invest so much in Itanium.
As the Itanium platform matures, there will be more software available, which means more choices to the end users, and a larger user base. It's a fairly solid business model that's hard to see from our point of view, Rick_James9.
AMD's business model is different. Even though there will probably be an x86-64 compatible Linux kernel by the time Sledgehammer launches, there will not be much software at first. This will grow with time, but AMD knows that, and is prepared to offer Sledgehammer as a strong 32-bit processor. As more software is developed, Sledgehammer will slowly transition to its more powerful 64-bit side.
However, this takes time, and Intel will quickly gain market share with their 64-bit solution. They've already been nearly guaranteed this through customer feedback and support. AMD will surely move their processor into the workstation market, where its strong floating point should be very compelling. But in terms of the enterprise server, they will have a much tougher time penetrating.
Eventually, AMD wants the Hammer series to be their next desktop chip. With strong 32bit performance, and hopes of more x86-64 software being developed, seeing this processor in the desktop market is inevitable. Intel will be hard pressed to compete with their IA-32 line once 64bit software becomes ubiquitous, but this is very far down the road, and I believe they have solutions already planned.
The article also mentions McKinley, the successor to Itanium. Details about this chip are scarce, but it is said that it will perform in many ways better than Itanium. Before details are released, I think it is presumptuous for the article to already claim Sledgehammer's superiority to McKinley, especially since the two are aimed at different markets.
Itanium is already slated to be available in 8-, 16-, and 32-processor systems from NEC and Unisys, and Sledgehammer will not scale nearly as well. AMD has an emerging 2-processor system, which definitely breaks the surface, but they have a while to go before they are able to implement the 8-way servers that are on their roadmap, and I have yet to hear of anything larger than that.
I believe there is enough market space for both Intel and AMD to be successful in their own way. I hope they are both successful, because they both offer new and interesting technology that will surely put us into the next generation of computing.
Superwormy
10-13-2000, 01:59 PM
The limiting factors are bus speed and memory speed, certainly not HD speed. Look at the computer when you're playing games and such: as long as you have 128 megs of RAM, the little hard drive light rarely comes on. On the other hand, look at the huge FPS difference you get if you go from a bus speed of 66 to 133 with the same processor.
ziadmo
10-20-2000, 01:15 AM
Well, my opinion is the following: FSB and hard drives. I agree that memory is an important piece in speeding up the computer's performance and that a lot of RAM is never enough, but memory can easily be adjusted, so we don't need to worry about it. HDs, on the other hand, are something we cannot just ignore. The HD plays many important roles: it speeds up the boot process and loads programs faster, and here I am talking about large hard drives spinning at 10,000 RPM or more. The more room you have, the more freely you can move around on it. If we install a 2 GB HD in an 800 MHz system with 128 MB of RAM, we will see a bigger difference than from changing any other piece, like memory or video. Also, the FSB is really slow compared to the current speed of processors. A faster FSB means faster connections between the parts, and a faster connection between the RAM and the processor means faster overall performance. The FSB and HDs are not as easy to replace as RAM. I would also like to point out the importance of cache memory (both L1 and L2), as it also helps programs load a lot faster. Imagine we had a cache three times the processor speed: how fast would programs load? For gamers, I think the video card and RAM are the two major factors.
frankmccann
10-20-2000, 06:46 AM
The biggest bottleneck is Gates' software that uses only 640K upper memory.
Busses used to be a problem and they're getting better.
Perhaps they could change RAM designs so that this "640K" barrier can be jumped with hardware rather than software. The differences between hardware and software are blurring more and more.
In five years we'll be laughing about all this.
Originally posted by Arcadian:
I wanted to spawn a discussion on system performance. More specifically, what do you think is the biggest bottleneck in a computer system?
These days, we are starting to see diminishing returns for improvements in various hardware. It has gotten to the point where insignificant improvements are taken to be huge leaps in technology. We used to live in a time where the newest video card gave a 20-40% improvement over the competition, whereas today it is more like 5%. CPUs used to give similar increases, and now they, too, are giving < 5% improvements.
So what is the bottleneck? What is preventing our systems from doubling in performance? Is it system memory? Is it the CPU or the video? Is it AGP? Or front side bus speed? Is it the hard disk or removable media? Or is it something else entirely?
Before I give my opinion, I was just curious on what other people thought. So come on... state your opinions. Keep it technical, if you can, and cite examples if possible. Web links are really great, too.
Arcadian
10-20-2000, 01:40 PM
Originally posted by frankmccann:
The biggest bottleneck is Gates' software that uses only 640K upper memory.
Busses used to be a problem and they're getting better.
Perhaps they could change RAM designs so that this "640K" barrier can be jumped with hardware rather than software. The differences between hardware and software are blurring more and more.
In five years we'll be laughing about all this.
That was a problem of 5 years ago. The x86 architecture in general must be designed to be compatible with the 640K memory boundary (which is actually a 1MB boundary in hardware). Current hardware, in fact everything since the 386, has offered a mode called Protected Mode, which allows for large memory areas. Windows did not fully take advantage of this until Windows 95 gave programs what appeared to be a flat memory region. Windows NT enhances memory mapping even further. Windows is doing a good job, too, because x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance. For now, the OS is barely a bottleneck as far as memory mapping is concerned.
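The "1MB boundary in hardware" comes from how real mode forms addresses: physical address = segment * 16 + offset, with the 8086 providing only 20 address lines. The 640K figure is where the PC memory map reserves 0xA0000 and up for video memory and ROMs. The wrap-around past 1MB, and the A20 gate later added to control it, are standard details included here for illustration; they are not from the thread. A minimal sketch:

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = False) -> int:
    """Real-mode physical address: segment * 16 + offset. With the A20
    line disabled (8086 behaviour) only 20 address bits exist, so the
    result wraps around at 1 MB."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    address = (segment << 4) + offset
    return address if a20_enabled else address & 0xFFFFF


CONVENTIONAL_LIMIT = 640 * 1024  # 0xA0000: start of the region reserved for video
print(hex(CONVENTIONAL_LIMIT))                                   # 0xa0000
print(hex(real_mode_address(0xFFFF, 0x000F)))                    # 0xfffff, last byte below 1 MB
print(hex(real_mode_address(0xFFFF, 0xFFFF)))                    # 0xffef, wrapped
print(hex(real_mode_address(0xFFFF, 0xFFFF, a20_enabled=True)))  # 0x10ffef
```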
Humus
10-20-2000, 05:21 PM
Originally posted by Arcadian:
That was a problem of 5 years ago. The x86 architecture in general must be designed to be compatible with the 640K memory boundary (which is actually a 1MB boundary in hardware). Current hardware, in fact everything since the 386, has offered a mode called Protected Mode, which allows for large memory areas. Windows did not fully take advantage of this until Windows 95 gave programs what appeared to be a flat memory region. Windows NT enhances memory mapping even further. Windows is doing a good job, too, because x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance. For now, the OS is barely a bottleneck as far as memory mapping is concerned.
Would you mind explaining a little better what you mean by "x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance."?
In what way is IA-32 not flat? You address memory with a single 32bit address like MOV EAX, [EBX]. With EBX = 0 in this case you address the first byte of your program and with EBX = 0xFFFFFFFF you address the last byte in your virtual memory.
BTW, OS memory management is a very important factor in system performance. Every time you encounter a "new" or "delete" statement in C++, that's a call to the OS. Same for malloc/calloc/realloc/free, etc.
Those are not considered to be cheap operations.
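How often allocation actually reaches the operating system depends on the runtime. Most C runtimes manage their own heap in user space and only ask the kernel for more memory (for example via sbrk/mmap on Unix or VirtualAlloc on Windows) when that heap runs dry, so many malloc/free calls are library work rather than system calls, though still not free. A toy fixed-size-block allocator illustrates the pattern. Its request_from_os() method is a hypothetical stand-in for a real system call and only counts how often it is reached.

```python
class ToyHeap:
    """Fixed-size-block allocator with a free list. Memory comes from the
    'OS' in large chunks; request_from_os() is a hypothetical stand-in for
    a real system call and only counts how often it is reached."""

    def __init__(self, block_size: int, blocks_per_chunk: int) -> None:
        self.block_size = block_size
        self.blocks_per_chunk = blocks_per_chunk
        self.free_list = []  # addresses of free blocks
        self.next_address = 0
        self.os_calls = 0

    def request_from_os(self) -> None:
        self.os_calls += 1
        start = self.next_address
        self.next_address += self.block_size * self.blocks_per_chunk
        # Carve the new chunk into blocks and put them all on the free list.
        for i in range(self.blocks_per_chunk):
            self.free_list.append(start + i * self.block_size)

    def malloc(self) -> int:
        if not self.free_list:  # only an empty heap reaches the "OS"
            self.request_from_os()
        return self.free_list.pop()

    def free(self, address: int) -> None:
        self.free_list.append(address)  # recycled in user space, no OS call


heap = ToyHeap(block_size=32, blocks_per_chunk=256)
blocks = [heap.malloc() for _ in range(1000)]
for b in blocks:
    heap.free(b)
blocks = [heap.malloc() for _ in range(1000)]
print(heap.os_calls, "OS calls for 2000 allocations")
```

Here 2000 allocations trigger only 4 simulated OS requests; real allocators are far more elaborate (size classes, coalescing, thread safety), which is part of why allocation still isn't cheap.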
[This message has been edited by Humus (edited October 20, 2000).]
TolTas
10-20-2000, 05:22 PM
My vote is for x86 Code.
Unfortunately, until we stop running x86 code, which has an efficiency rating of less than 60% I believe, there is no way to vastly improve computer speed.
------------------
--> £¿Quanza!¿£ <--
[This message has been edited by TolTas (edited October 20, 2000).]
Arcadian
10-20-2000, 07:50 PM
Originally posted by Humus:
Would you mind explaining a little better what you mean by "x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance."?
In what way is IA-32 not flat? You address memory with a single 32bit address like MOV EAX, [EBX]. With EBX = 0 in this case you address the first byte of your program and with EBX = 0xFFFFFFFF you address the last byte in your virtual memory.
BTW, OS memory management is a very important factor in system performance. Every time you encounter a "new" or "delete" statement in C++, that's a call to the OS. Same for malloc/calloc/realloc/free, etc.
Those are not considered to be cheap operations.
Ever notice how the Pentium III, for example, has a 36-bit address range? Yet registers such as EBX are 32-bit. In order to access memory outside of that range, you need segment registers, which include CS and DS, for example. Each program running under Windows is allowed its own protected segment to run in. A flat memory address space, which could only be allowed with 64-bit registers, is preferable, and does not require extra instructions to compute effective addresses.
smtkr
10-20-2000, 08:14 PM
I'd say the biggest bottleneck is bus speed in general. Everything needs to be faster, except the processor, which would still be nice to see run at unreal frequencies. All peripherals, internal and external, need to interact faster with each other.
------------------
I'm consistantly inconsistant!
BikeDude
10-20-2000, 09:20 PM
Originally posted by Arcadian:
Ever notice how the Pentium III, for example, has a 36-bit address range? Yet registers such as EBX are 32-bit. In order to access memory outside of that range, you need segment registers, which include CS and DS, for example. Each program running under Windows is allowed its own protected segment to run in.
I dunno about Win9x, but Windows NT operates with memory pages, where each page can be as small as 4KB or as big as (usually) 2GB. Segments are more or less a thing of the past (and have been, at least in protected mode on the i386 and up).
--
Rune
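The page-based scheme mentioned above can be made concrete. Windows NT sets the code and data segments up flat (base 0, 4 GB limit), and under classic (non-PAE) 32-bit x86 paging a linear address is split 10/10/12: the top 10 bits index the page directory, the next 10 index a page table, and the low 12 are the offset inside a 4 KB page. Large pages are 4 MB in this mode; with PAE, which is how the 36-bit physical addresses of the Pentium Pro and later are reached, the split becomes 2/9/9/12 and large pages are 2 MB. A minimal sketch of the non-PAE split only:

```python
def split_linear_address(address: int) -> tuple:
    """Classic 32-bit x86 paging (no PAE, 4 KB pages): return the
    (page-directory index, page-table index, byte offset) of an address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit linear address")
    directory_index = address >> 22           # bits 31..22
    table_index = (address >> 12) & 0x3FF     # bits 21..12
    offset = address & 0xFFF                  # bits 11..0
    return directory_index, table_index, offset


print(split_linear_address(0x00401000))  # (1, 1, 0)
print(split_linear_address(0xFFFFFFFF))  # (1023, 1023, 4095)
```

On x86 these lookups are performed by the processor's page-walk hardware and cached in the TLB, not by extra instructions in the program.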
Humus
10-20-2000, 10:06 PM
Originally posted by Arcadian:
Ever notice how the Pentium III, for example, has a 36-bit address range? Yet registers such as EBX are 32-bit. In order to access memory outside of that range, you need segment registers, which include CS and DS, for example. Each program running under Windows is allowed its own protected segment to run in. A flat memory address space, which could only be allowed with 64-bit registers, is preferable, and does not require extra instructions to compute effective addresses.
But then we are talking about extensions, which are rarely used by standard applications. Any application using standard 32bit addresses will have a flat memory model and will not require any kind of extra instructions to compute the effective address.
Arcadian
10-21-2000, 12:33 AM
Originally posted by Humus:
But then we are talking about extensions, which are rarely used by standard applications. Any application using standard 32bit addresses will have a flat memory model and will not require any kind of extra instructions to compute the effective address.
Correct, but I was originally discussing processor architecture performance. In terms of x86, the operating system still has to treat the memory model as segmented, which adds overhead. Windows NT does a very good job of making the x86 memory model look flat, but it still takes processor time to do so. Thanks for making this clearer for everybody else, though, Humus. :)
darkamage
10-21-2000, 04:02 AM
Originally posted by Arcadian:
In the future, I see faster systems doing the following.
1) Moving to IA64, or another non-x86 architecture. This will undoubtedly be IA64, but I am open to the possibility that IA64 might fail and another architecture will come to fill its shoes. On this same topic, we'll call it 1b) Parallelize data. In other words, SMP configurations are good for now, but I see symmetric multiprocessing at the core level in the future. Read another post on this forum for more information, but I believe SMT and CMP to be the future. More parallelized code will also be necessary so that each of the processor cores and threads will be given data aplenty to sort through.
2) Integrating everything. Like I said, speed is only found on the die, and as soon as you move off of it, you face nature's wrath. So the solution is to integrate CPU, video, memory, I/O, and the kitchen sink all onto the very same die.
I remember what my architecture professor asked the class: "How many of you have done multiprocessor programming?" Two or three hands went up. "How many of you actually like multiprocessor programming?" None. Judging by that, he predicted that SMP would not reach the mainstream for at least another 5-10 years. He said that ever since the '70s people have been saying that SMP (or RISC, for that matter) is the way, but here we still are. Of course physics will not be defeated and we will get there (unless chemical/biological/DNA-based processing really takes off). A note on compilers, too: one can argue that compilers should bridge the gap between the hardware architecture and programmers. However, keep in mind that architectures and compilers are not developed in sync (and that compiler people are programmers too :) ). Architectures have very long time-span considerations (including backward compatibility and roadmaps), so compilers have to follow the architecture. It's hard for the big processor makers to build a really new and nice architecture, and once they've built it, they keep it for as long as possible. It would be very nice if the two were developed together cleanly.
Integrated memory and logic is actually pretty hard to manufacture, although there are several products already (usually with a small amount of memory). Memory and logic actually use different manufacturing processes, and manufacturers aren't even sure which is better: integrating logic into a memory process or memory into a logic process.
One aspect that can be optimized is the OS. In a way, OS development has fallen victim to the architectural design paradigm, much like compilers have (and no, it's not entirely Microsoft's fault either :) ). There are/were many, many OS designs (and implementations) that are very efficient in terms of bus, disk, or memory utilization, or filesystems, for example. As someone already mentioned, OS system calls are very expensive and can be made a lot more efficient. One specialized network hardware company makes NICs and drivers that are very fast and bypass the OS, so data can be copied (and used) without a lot of context switching by the OS. My other professor believes that in the not-too-distant future, in a LAN situation, a server with a lot of RAM will actually be able to serve data a lot faster than local drives (he also predicts that ubiquitous wireless networking will happen as soon as we eliminate all those nasty road tunnels).
[This message has been edited by darkamage (edited October 21, 2000).]
Moridin
10-23-2000, 04:29 PM
Intel's 8-bit processor, the 8080, used a 16-bit register for storing the memory address to be accessed. This gave it access to a maximum of 64 KB of memory, similar to most of the processors of that generation, like the Z80 and the 6800. This is a hidden register, not a user register.
When Intel developed the 8088 and 8086, they did it as quickly as possible due to threats from the 68000 and Z8000, which were also under development at the time. To maintain backward compatibility and get the processor out fast, the memory address register was left at 16 bits. To allow access to more than 64 KB of memory, Intel came up with a segmented model that allowed the processor to access 16 of these segments for a total of 1 MB of memory.
This was the environment that DOS, and therefore all DOS programs, were written for. DOS assigned the first 10 of these segments for use by programs, thus the 640 KB limit in DOS. DOS could also use the other 6 pages for some, but not all, system-related stuff. Later versions of DOS allowed swapping one of these pages (the 11th???) with another page above 1 MB, allowing some programs to use memory above 1 MB if they were written for it.
When Intel introduced the 386, they expanded the memory address register to 32 bits, or 4 GB of addressable memory. The 4 bits used in the 8088 were still present, allowing you to address 16 x 4 GB = 64 GB. Of course, DOS and all DOS apps still thought that this register was only 16 bits and treated it accordingly, thus the difference between 16-bit "Real" mode and 386 Enhanced mode. (Of course there are other differences as well, like memory protection.)
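For reference, the 8086 did not divide memory into 16 fixed segments: each 16-bit segment register value is multiplied by 16 and added to a 16-bit offset, producing a 20-bit (1 MB) physical address, so segments can begin on any 16-byte boundary and overlap. The 640 KB DOS limit falls where the IBM PC memory map reserves 0xA0000 and up for video memory and ROMs. A minimal sketch of the calculation (the function name is made up for illustration):

```python
def real_mode_physical(segment: int, offset: int) -> int:
    """8086 real mode: physical address = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are both 16-bit values")
    # Shifting the segment left 4 bits (x16) and adding the offset yields a
    # 20-bit address. The 8086 has only 20 address lines, so results above
    # 0xFFFFF wrap around to the bottom of memory.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_physical(0xA000, 0x0000)))  # 0xa0000 = 640 KB, where conventional memory ends
print(hex(real_mode_physical(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_physical(0x1235, 0x0000)))  # 0x12350 again: segments overlap
print(hex(real_mode_physical(0xFFFF, 0x0010)))  # 0x0: wraps past 1 MB on an 8086
```

The overlap in the middle two calls is why a real-mode address has many segment:offset spellings.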
BikeDude
10-24-2000, 03:14 PM
Originally posted by Arcadian:
In terms of x86, the operating system still has to treat the memory model as segmented, which adds overhead.
The OS only needs to create one huge 4GB segment and then this wouldn't be an issue. This was one of the possibilities described in many a book covering the 80386 CPU years ago.
However... If you have an OS that caters to more than one application, and each application fancies a flat address space (and perhaps more memory than what's currently physically available), then I ask you this: Why not introduce the concept of paging?
NT btw was initially developed on workstations running on the MIPS R3000 (or R4000?) CPU. Ironically that's one of the first CPUs MS dropped support for (NT 3.51 still supported it?).
--
Rune
Warpoet
10-25-2000, 08:50 PM
Seems to me that we haven't even begun to exploit the advantages of parallel processing. As far as consumer software goes, I can't think of a single product that attempts to harness more than one processor. Thus, I'd say software is the real bottleneck right now.
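To illustrate what harnessing more than one processor means for the code itself (and Arcadian's earlier point about giving every core and thread its own data), here is a minimal data-parallel sketch: split the data into one slice per worker, compute partial results independently, then combine them. Python's ThreadPoolExecutor is used only to keep the example short; in CPython the global interpreter lock prevents pure-Python threads from computing in parallel, so a real speedup would need processes or compiled code. The helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n: int, workers: int) -> list[tuple[int, int]]:
    """Split range(n) into `workers` contiguous, nearly equal [start, stop) slices."""
    if workers < 1:
        raise ValueError("need at least one worker")
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for i in range(workers):
        stop = start + base + (1 if i < extra else 0)  # first `extra` slices get one more item
        bounds.append((start, stop))
        start = stop
    return bounds


def parallel_sum(data: list[int], workers: int = 4) -> int:
    """Give each worker its own slice, sum the slices independently, then combine."""
    def partial_sum(bounds: tuple[int, int]) -> int:
        start, stop = bounds
        return sum(data[start:stop])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunk_bounds(len(data), workers)))


print(chunk_bounds(10, 4))              # [(0, 3), (3, 6), (6, 8), (8, 10)]
print(parallel_sum(list(range(1000))))  # 499500
```

The hard part the professor's class was wary of is not this split, which is easy for a sum, but finding work that divides this cleanly without the slices needing to talk to each other.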
------------------
-warpoet
BRB
10-28-2000, 01:06 PM
It's gotta be da bus speed!
I run a BX chipset @ 144MHz while my wife runs her BX @ 100MHz. My PIII/720 (667) will walk away from her PIII/800 on any benchmark that we would run (BTW, her PIII/800 is the 256KB ATC variety).
I don't own or use any AMD products, but I have read enough about the T-Bird to know that it is a very, very fast processor and it runs a high FSB.
Secondly, I do strongly believe that all memory is not created equal. I noted a marked improvement in performance and benchmarks, not to mention overclocking capacity, when I replaced my Micron PC133 with Mushkin High-Perf. Rev.2 222.
Bobby
garethr
10-29-2000, 11:34 AM
Bottlenecks depend on the application:
- Windows office applications are usually disk IO-bound, so faster disk subsystems help (and more RAM to avoid paging)
- Browsers are typically network IO-bound, so fatter pipes help
- Transaction-processing systems are typically disk IO-bound, so more spindles help
- 3D Max is typically floating-point compute-bound, so getting an Athlon helps
- Quake III at 1280x1024 High quality is typically bound by the memory bandwidth of the graphics card, so a Geforce DDR works better than an SDR
- The Java-based web application I'm working on right now is disk-write bound (because it's logging a stupid amount of debugging info to disk)
- Server-based applications are often limited in the number of simultaneous users they can support by the size of their level two processor cache, so a Xeon or Sun processor helps
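The list's point, that speeding up the wrong component barely helps, can be put into numbers with Amdahl's law, a standard formula not cited in the thread: if a fraction f of a job's run time is sped up by a factor s, the whole job speeds up by 1 / ((1 - f) + f / s). The 70/30 disk/CPU split below is a made-up example, not a measurement, and the function name is hypothetical.

```python
def overall_speedup(fraction: float, component_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only `fraction` of the run time
    is made `component_speedup` times faster; the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0 or component_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and the speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# Made-up example: a job that spends 70% of its time waiting on the disk and
# 30% computing. Doubling CPU speed barely helps; even an effectively
# infinite CPU speedup caps out at 1 / 0.7.
print(round(overall_speedup(0.3, 2.0), 3))   # 1.176
print(round(overall_speedup(0.3, 1e12), 3))  # 1.429
print(round(overall_speedup(0.7, 2.0), 3))   # 1.538 (doubling disk speed instead)
```

This is why the answer to "what is the bottleneck?" depends on the application: the component worth upgrading is whichever one owns the largest share of the run time.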
Abit400
11-04-2000, 01:20 PM
It is the stupid Microsoft OSs. The computer has so much power, but it is all wasted on Microsoft's add-on crap. For example, with the new Media Player you can't even play an MP3 without the stupid, processor-time-wasting visualization going.
BikeDude
11-05-2000, 05:20 AM
Originally posted by Abit400:
It is the stupid Microsoft OSs. The computer has so much power, but it is all wasted on Microsoft's add-on crap. For example, with the new Media Player you can't even play an MP3 without the stupid, processor-time-wasting visualization going.
Media Player has exactly nothing to do with the OS. Please pick something tangible to ***** about. (If I created my own Linux distro and bundled "BikeDude's Fantastic MP3 Player" with it which sucked CPU like there was no tomorrow, does this make Linux a "stupid OS"?)
--
Rune
colonel
11-07-2000, 03:58 AM
WAIT.....WAIT.....HOLD ON
We're all looking at this the wrong way: the biggest bottleneck isn't the hardware, it's the software. We're all using an OS that was based on 20-year-old code (DOS), and until we go to 32-bit code our systems will be slowed down.....
But it's not just our OS, it's software in general..... It takes 6 months to 2 years to write a game or product, and in that time we will see massive improvements in speed..... I mean, come on, some people are still benchmarking Quake 2 and it's already 3 years old, which means that by Moore's law everyone should have replaced their computers by now.....
Software can only go so fast. Why do you think there are no new video cards coming out for a while? They're all waiting for DX8.... There's no point making a card that will run current products 5% faster......
Hardware can be created and produced much faster than software due to error checking and such, and there will always be this problem. When we had the Pentium 100s (hey, I still use my Pentium 75 to play Civilization on it :) ), we were working with smaller numbers, so each jump was a bigger multiple..... Now we are getting the same speed increments, but as smaller multiples, on the same operating system (Win ME is really just Win 95, which is really just 3.1 with a different GUI, which is really just a different Win 1 and 2, which is really just DOS with a GUI, period). We are increasing our clock speeds in the same 33MHz jumps (900-933), which isn't as big a jump as 100 to 133......
------------------
'From now on I'm thinking only of me.'
Major Danby replied indulgently with a superior smile: 'But, Yossarian, suppose everyone felt that way.'
'Then,' said Yossarian, 'I'd certainly be a damned fool to feel any other way, wouldn't I?'
Arcadian
11-07-2000, 11:57 AM
If that's true, then how come Windows 2000, which is pure 32-bit and 0% DOS, doesn't go that much faster than Windows 98 or ME?
I think the software tries to scale with the hardware, and development times aren't likely to get any shorter. There is still a lot that can be improved in the hardware, so I believe that is still the bottleneck in performance.
BikeDude
11-07-2000, 03:33 PM
Originally posted by Arcadian:
If that's true, then how come Windows 2000, which is pure 32-bit and 0% DOS, doesn't go that much faster than Windows 98 or ME?
It does.
E.g. everything that deals with graphics is faster. Given a halfway decent device driver (i.e. nothing from 3DFX these days), 2000 will sink 98 performance-wise...
2000 sports better file systems, better disk I/O, etc...
But, NT has gotten bigger with the years. It has a pretty hefty shell (explorer is far from lightweight) so you need to give it lotsa room (128MB _minimum_).
It's also a lot smoother multitasking wise. (i.e. you can still do stuff while loading a huge app)
What it _won't_ do is count faster; put differently, anything CPU-bound will still be CPU-bound (but NT is better at distributing CPU time to other tasks that might require it).
Greatly simplified of course, but you get the gist of it.
--
Rune
LordZordec
11-07-2000, 07:21 PM
Well, I'll put my 2 1/2 cents in. What the blonde dude on page one said is right. The hard drive, and really non-volatile storage media, period, is the biggest bottleneck on performance.
Think hard drives are slow? Look at the CD-ROM. Think that's slow? Look at the floppy drive!
I'm gonna try this sometime. I have 320MB RAM. Somehow, I would like to set up a 200MB RAM drive and do a full install of Jedi Knight to it. Right now, even with a 7200 RPM hard drive, every time it goes to load a new sound into memory (like the sound one of those three-eyed ReeYees makes when it sees you: "Erderot-ray!"), the frame rate drops to almost nothing. Why? For that split second, it's waiting for the hard drive to find and copy the sound to memory. With a RAM drive, all this would be instant.
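To put numbers on that stall, here is a minimal sketch of average hard drive access time, assuming the 9 ms average seek quoted on page one and a 7200 RPM spindle (the function names are hypothetical, and transfer time is ignored). Each random read costs on the order of 13 ms before any data moves, while RAM access is measured in tens of nanoseconds, which is why a RAM drive would make those loads effectively instant.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the platter has to turn half a revolution to reach the sector."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


def avg_access_ms(rpm: float, avg_seek_ms: float) -> float:
    """Rough average random access time: seek plus rotational latency (transfer time ignored)."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)


print(round(avg_rotational_latency_ms(7200), 2))  # 4.17
print(round(avg_access_ms(7200, 9.0), 2))         # 13.17
```

Faster spindles shrink only the rotational half of this; the seek is mechanical arm travel and improves much more slowly.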
By some chance, does anybody know where I can find a utility like that? The ramdrive.sys that comes with Windows only supports drives up to 32MB...
Verneir
11-08-2000, 02:42 AM
Dang. You guys work at Intel or what? I've logged many hours reading up on systems, and I still don't know crap about registers and the overhead they produce, etc..
Anyway. Judging by constant disagreements from techs of ALL knowledge levels... I think the simple approach is best: Who cares? Quit trying to fix what's broke, trash it, and recreate a system architecture with today's technology, and an OS for it.
Yeah, yeah.. "real world" expenses and ramifications, whatever.. Microcrud is big enough to do whatever they want. Let alone Wintel.. ;)
BikeDude
11-08-2000, 01:06 PM
Originally posted by LordZordec:
Well, I'll put my 2 1/2 cents in. What the blonde dude on page one said is right. The hard drive, and really non-volatile storage media, period, is the biggest bottleneck on performance.
It's not too hard to put together a drive system (think multiple drives striped together - RAID) that would outrun a regular PCI bus (133MB/s).
--
Rune
Arcadian
11-08-2000, 01:23 PM
Originally posted by BikeDude:
It's not too hard to put together a drive system (think multiple drives striped together - RAID) that would outrun a regular PCI bus (133MB/s).
And just how many people do you think would do this in a PC? Say you had a RAID system (RAID5, for example, since RAID0 only works for 2 drives), and it was SCSI160, so you had enough bandwidth. Since drives typically can't sustain transfer rates of more than 20MB/s for any period of time (buffer runs out, etc.), you would still need more than 5 drives to stress the bandwidth of the PCI bus. I don't know of any PCs that have more than 5 SCSI160 drives. :)
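Arcadian's estimate can be checked with the thread's own figures (about 133 MB/s for standard PCI and about 20 MB/s sustained per drive): six drives give 120 MB/s, so it takes seven to exceed the bus. A quick sketch, with a hypothetical helper name:

```python
import math


def drives_to_exceed(bus_mb_s: float, per_drive_mb_s: float) -> int:
    """Smallest number of drives whose combined sustained rate is strictly above the bus limit."""
    if bus_mb_s <= 0 or per_drive_mb_s <= 0:
        raise ValueError("rates must be positive")
    return math.floor(bus_mb_s / per_drive_mb_s) + 1


print(drives_to_exceed(133, 20))  # 7  (6 drives = 120 MB/s, 7 drives = 140 MB/s)
print(drives_to_exceed(133, 54))  # 3  (using the 54 MB/s Atlas 10K III figure posted later)
```

The answer is very sensitive to the per-drive figure, which is exactly what the rest of this exchange ends up arguing about.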
colonel
11-09-2000, 05:13 AM
Originally posted by Arcadian:
If that's true, then how come Windows 2000, which is pure 32-bit and 0% DOS, doesn't go that much faster than Windows 98 or ME?
I think the software tries to scale with the hardware, and development times aren't likely to get any shorter. There is still a lot that can be improved in the hardware, so I believe that is still the bottleneck in performance.
Because Win 2000 isn't designed for running games and such....... No matter how many people use it differently, it's still a networking OS designed as a workstation platform to be stable rather than fast, and it is weighed down by safety programs.
M/soft had originally planned for Win ME to be true 32-bit code but had problems with the plug and play support, so they scrapped it and just rehashed and re-released Win 98 (Win ME).
Someone said it was the x86 architecture, but the Pentium Pro, on which current processors are based, was designed to be a little faster in 16-bit and a lot faster in 32-bit..... But we're still only using a 16/32 hybrid; we have yet to realise the full potential of our processors.
zombor
11-09-2000, 01:33 PM
Hell, I can tell ya what it is....metal interconnects. Optical all the way! ;)
------------------
Compaq Armada E500
650MHz PIII 128MB RAM
8MB ATI rage mobility
11GB hard drive/DVD
dual booting win98/2k
who said you can't game on a notebook????
BikeDude
11-09-2000, 02:42 PM
Originally posted by Arcadian:
Say you had a RAID system (RAID5, for example, since RAID0 only works for 2 drives), and it was SCSI160, so you had enough bandwidth. Since drives typically can't sustain transfer rates of more than 20MB/s for any period of time (buffer runs out, etc)
First off, is RAID 0 (mirroring) really limited to only two drives? Who says you can't have two drives mirroring the first?
And don't forget, there's RAID 1 and 3 too...
(RAID 1 is IIRC, basically RAID 5 except it doesn't have any parity -- i.e. disk striping without parity in Windows NT lingo)
As for running out of buffers... Well, in theory the drives would get more time filling those buffers (seeing as load is distributed among the drives).
In addition you'd probably want to hook up to a DLT streamer as well. Those suckers want their piece of the bandwidth too.
You'd be lucky if you could service 5 disks and a DLT streamer using the PCI bus. There is a reason why they put a 64-bit PCI bus in high-performance servers (and why the Adaptec 2940U2W and 29160 support 64-bit PCI).
(BTW: Sony's DTF-2 tape station claims 24MB/s data transfer rate, and I'm sure you can find plenty of disks that are faster than 20MB/s -- If I read Quantum Atlas 10K III's specs right, it should exceed 50MB/s)
--
Rune
BikeDude
11-09-2000, 02:47 PM
Originally posted by colonel:
Because Win 2000 isn't designed for running games and such....... No matter how many people use it differently, it's still a networking OS designed as a workstation platform to be stable rather than fast, and it is weighed down by safety programs.
That doesn't explain why Windows 2000 scores better in Quake 3 compared to Win98, when both configurations use a NVidia based adapter...
Given good drivers and enough RAM, 2000 will beat 98.
(ATI, 3dfx and Matrox didn't have good drivers, at least not a month or so back according to the benchmark I saw -- so using products from those vendors will effectively tie Win2k's hands)
--
Rune
BikeDude
11-09-2000, 03:04 PM
Originally posted by BikeDude:
(BTW: Sony's DTF-2 tape station claims 24MB/s data transfer rate, and I'm sure you can find plenty of disks that are faster than 20MB/s -- If I read Quantum Atlas 10K III's specs right, it should exceed 50MB/s)
According to http://www.quantum.com/quantum/pc/pr/pr00101601.htm I did read the specs right. 54MB/s sustained.
So, two of those drives and a Sony tape drive, and your PCI bus is fully occupied handling all that SCSI action. Never mind what happens if you pop in one or two 100 Mbit/s network cards.
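BikeDude's budget adds up as follows: 2 x 54 + 24 = 132 MB/s against a nominal 133 MB/s, and one 100 Mbit/s card (12.5 MB/s) pushes it over. A minimal sketch using only the figures quoted in these posts; the dictionary keys and helper name are made up, and the 133 MB/s figure is a theoretical peak, so real headroom is smaller still.

```python
# Nominal 32-bit/33 MHz PCI peak, as quoted in the thread. Real sustained
# throughput is lower once arbitration and protocol overhead are counted.
PCI_PEAK_MB_S = 133

# Sustained rates quoted above (MB/s).
devices = {
    "Quantum Atlas 10K III #1": 54.0,
    "Quantum Atlas 10K III #2": 54.0,
    "Sony DTF-2 tape": 24.0,
}


def headroom(bus_mb_s: float, loads: dict[str, float]) -> float:
    """Bandwidth left on the bus after every device streams at its sustained rate."""
    return bus_mb_s - sum(loads.values())


print(headroom(PCI_PEAK_MB_S, devices))  # 1.0
devices["100 Mbit/s NIC"] = 100 / 8      # 12.5 MB/s
print(headroom(PCI_PEAK_MB_S, devices))  # -11.5, the bus is oversubscribed
```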
--
Rune
techs
11-09-2000, 07:33 PM
Hard drives. When hard drives become hybrids with nonvolatile RAM and disk in one, we will see real increases. Imagine loading your operating system and most common apps into RAM, turning your computer on, and "booting" in 5 seconds, with office apps opening virtually instantaneously.
Arcadian
11-09-2000, 09:25 PM
Originally posted by BikeDude:
First off, is RAID 0 (mirroring) really limited to only two drives? Who says you can't have two drives mirroring the first?
And don't forget, there's RAID 1 and 3 too...
(RAID 1 is IIRC, basically RAID 5 except it doesn't have any parity -- i.e. disk striping without parity in Windows NT lingo)
As for running out of buffers... Well, in theory the drives would get more time filling those buffers (seeing as load is distributed among the drives).
In addition you'd probably want to hook up to a DLT streamer as well. Those suckers want their piece of the bandwidth too.
You'd be lucky if you could service 5 disks and a DLT streamer using the PCI bus. There is a reason why they put a 64-bit PCI bus in high-performance servers (and why the Adaptec 2940U2W and 29160 support 64-bit PCI).
(BTW: Sony's DTF-2 tape station claims 24MB/s data transfer rate, and I'm sure you can find plenty of disks that are faster than 20MB/s -- If I read Quantum Atlas 10K III's specs right, it should exceed 50MB/s)
Well, RAID 0 is striping, and RAID 1 is mirroring, and I believe RAID 5 is striping with parity. But anyways, my point is just what you said. Those that use the equipment you listed are probably trying to put together servers or powerful workstations, NOT PCs. That is why server motherboards come with 64bit PCI, or even better yet, 64bit/66MHz PCI, which has 4 times the bandwidth of standard PCI. I still don't think that PCs are at the point where more I/O bandwidth is necessary. Does anybody here plan to buy Quantum Atlas 10K III drives in their next PC?
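Arcadian's definitions are the standard ones. A minimal sketch of the two non-trivial layouts, assuming a stripe unit of one block and ignoring RAID 5's rotation of the parity block across disks (RAID 1 simply writes every block to both disks, so it needs no code): the mapping shows why striping spreads the load over several drives, and the XOR shows how RAID 5 survives a single-drive failure. The function names are illustrative, not any controller's API.

```python
from functools import reduce


def raid0_location(logical_block: int, num_disks: int) -> tuple[int, int]:
    """RAID 0 striping: blocks go round-robin across disks; returns (disk, block_on_disk)."""
    return logical_block % num_disks, logical_block // num_disks


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_parity(data_blocks: list[bytes]) -> bytes:
    """RAID 5 parity for one stripe is the XOR of that stripe's data blocks."""
    return xor_blocks(data_blocks)


def raid5_rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """XOR-ing the parity with the surviving blocks regenerates the one lost block."""
    return xor_blocks(surviving_blocks + [parity])


stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = raid5_parity(stripe)
print(raid0_location(10, 4))                          # (2, 2)
print(parity.hex())                                   # 152a
print(raid5_rebuild([stripe[0], stripe[2]], parity))  # b'\x04\x08'
```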
Those planning on spending the money on two Quantum Atlas 10K III drives and a Sony tape drive, should spend another couple hundred $$$ and get this board here.
http://developer.intel.com/design/servers/stl2/
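The "4 times the bandwidth" figure for 64-bit/66 MHz PCI follows from peak bandwidth being bus width in bytes times clock rate (one transfer per clock at best). A minimal sketch, assuming PCI's clock is taken as 33.33 MHz (which is where the usual 133 MB/s figure comes from) and MB means 10^6 bytes; the function name is hypothetical.

```python
def bus_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: bytes per transfer times millions of transfers per second."""
    return width_bits / 8 * clock_mhz


PCI_CLOCK_MHZ = 100 / 3  # 33.33 MHz; the 66 MHz variant doubles it

standard = bus_peak_mb_s(32, PCI_CLOCK_MHZ)    # ~133 MB/s
server = bus_peak_mb_s(64, 2 * PCI_CLOCK_MHZ)  # ~533 MB/s
print(round(standard), round(server), round(server / standard))  # 133 533 4
```

Doubling the width and doubling the clock each double the peak, hence the factor of 4.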
colonel
11-10-2000, 01:09 AM
Originally posted by BikeDude:
That doesn't explain why Windows 2000 scores better in Quake 3 compared to Win98, when both configurations use a NVidia based adapter...
Given good drivers and enough RAM, 2000 will beat 98.
(ATI, 3dfx and Matrox didn't have good drivers, at least not a month or so back according to the benchmark I saw -- so using products from those vendors will effectively tie Win2k's hands)
BikeDude, I was agreeing with you.
Moridin
11-10-2000, 08:19 PM
Originally posted by Arcadian:
Well, RAID 0 is striping, and RAID 1 is mirroring, and I believe RAID 5 is striping with parity. But anyways, my point is just what you said. Those that use the equipment you listed are probably trying to put together servers or powerful workstations, NOT PCs. That is why server motherboards come with 64bit PCI, or even better yet, 64bit/66MHz PCI, which has 4 times the bandwidth of standard PCI. I still don't think that PCs are at the point where more I/O bandwidth is necessary. Does anybody here plan to buy Quantum Atlas 10K III drives in their next PC?
Cheap IDE RAID controllers supporting RAID 0 and 1 (and 10) are becoming popular for desktop systems. I agree with you, though, that RAID is really only critical in some server systems. A properly configured desktop system should not need to access the HDD after loading the app.
Just a little information: Oracle does not recommend using RAID other than disk mirroring, since a properly tuned Oracle database will get better performance if you do not use RAID.
kinetic
11-11-2000, 12:05 AM
Operating systems and bloatware....that's the biggest bottleneck
--all the garbage that eats memory and clock cycles......what a shame :(
nukefault
11-13-2000, 09:40 PM
Heh, I have to say the biggest slowdown is definitely, totally, Microsoft. Windows is the most bloated app I've ever used, even worse than Word 6.
Once you go to a halfway decent OS like Linux or Be (or Win98/ME Lite, sorta), the biggest block depends on your activity. If you have 256 megs of PC133 CAS2 memory, then you're limited by HD speed (c'mon, ATA100 is a joke - if you can hold up a 30MB/sec transfer you're doing awesome) and the FSB. A GeForce GTS or Radeon graphics card should be plenty fast for any gaming if they'd get some decent drivers on Linux or Be... Around 1280 you get limited by fillrate, but I'm perfectly happy running Quake III at 1280x1024 at 50fps :)
Anyway, I'm gonna go watch Pitch Black now, hope this helped :)
------------------
"It is better to keep your mouth closed and look stupid, than to open it and remove all doubt."
~Famous Dead Guy
frankmccann
11-15-2000, 04:16 PM
After doing a little more research, I've come to the conclusion that Windows' upper memory management program is really one of the biggest bottlenecks there is. Also, it is bugged big time.
frank
Originally posted by Arcadian:
I wanted to spawn a discussion on system performance. More specifically, what do you think is the biggest bottleneck in a computer system?
These days, we are starting to see diminishing returns for improvements in various hardware. It has gotten to the point where insignificant improvements are taken to be huge leaps in technology. We used to live in a time where the newest video card gave a 20-40% improvement over the competition, whereas today it is more like 5%. CPUs used to give similar increases, and now they, too, are giving < 5% improvements.
So what is the bottleneck? What is preventing our systems from doubling in performance? Is it system memory? Is it the CPU or the video? Is it AGP? Or front side bus speed? Is it the hard disk or removable media? Or is it something else entirely?
Before I give my opinion, I was just curious on what other people thought. So come on... state your opinions. Keep it technical, if you can, and cite examples if possible. Web links are really great, too.
TaxExemPt
11-15-2000, 04:40 PM
Originally posted by Ymaster:
I don't want to see video cards anymore. I want to see GPU sockets! Adding Alphas to our video GPUs like we do to new CPUs... Gives a whole new light to onboard video?
Now that I think about it... why not add on-die cache to the GPUs? Hmmmm
I think the BitBoys were going that route with 9MB of on-chip memory.
gaffo
11-17-2000, 02:14 AM
Originally posted by Arcadian:
If that's true, then how come Windows 2000, which is pure 32-bit and 0% DOS, doesn't go that much faster than Windows 98 or ME?
I think the software tries to scale with the hardware, and development times aren't likely to get any shorter. There is still a lot that can be improved in the hardware, so I believe that is still the bottleneck in performance.
I'm thinking the colonel is right here. I know Linux under KDE is around 30 percent faster than Win98 on my system. And I've heard wondrous legends of speed about BeOS and how it's at least twice as fast as Win2000/NT (even four times leaner/faster WRT video/audio). I know when I used OS/2 (yes, it was better than Win95 and came out a year earlier) on my AMD 486-100, it was around 30 percent faster than Win95 too. OS/2, Linux, and BeOS are 32-bit, not 16-bit like DOS, Win95, Win98, and WinME.
James
11-18-2000, 01:57 PM
Obviously, as stated several times before, switching from mechanical to electronic access times will increase overall performance. (After all, overall performance increases are what it's about.) Also, I wholeheartedly believe in system integration, but not in quite the same way. Multiprocessor designs will always win performance-wise, but to truly reap the benefits a new architecture is in order. Use more than one processor, but use them in conjunction for different tasks: several processors working in sync, with one handling video, one sound, one I/O, etc. By breaking down the workload and designating a processor to each task, performance gains would be phenomenal. By integrating all of these processors into a single package, you could still keep in line with the concept of a CPU (central processing unit). In order to keep things going, all you would need (as far as processor intercommunication) is a sync signal. Obviously, software coding would have to change in order to utilize the new architecture, and that won't happen.
Alas, it should also be obvious that this highly technical type of thinking isn't my cup of tea :).
just my 4 cents (inflation's a mofo)
------------------
Think outside the box or be forced to live in one.