Performance Discussion - The Biggest Bottleneck

10-12-2000, 03:54 PM
Moridin

Quote:

Originally posted by zombor:
ok, this just got way out of my league

Hey, I'm pretty close to that myself. I may be an engineer, but I'm not a chip designer.
10-12-2000, 03:55 PM
chickenboo

Quote:

Originally posted by nkeezer:
As for what's holding up performance on the most general level, I think the answer to that is easy: hard drives.

I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone that can make my floppy drive go faster??!!
10-12-2000, 04:13 PM
Ymaster

Quote:

Originally posted by chickenboo:
I don't know about you, but my hard drive is blazing fast compared to my floppy drive. Isn't there anyone that can make my floppy drive go faster??!!

No! Just use a zipdrive...
10-12-2000, 04:27 PM
Moridin

Quote:

Originally posted by Arcadian:
For an in-order device, Itanium can be much faster than many out-of-order devices. It is the architecture itself that makes in-order operation work. The reason in-order execution hurts performance in x86 is that there are too many dependencies, and even register renaming can't solve them all. In EPIC, these dependencies are resolved at compile time, so that the CPU will not be stalled as often. In-order operation actually raises the performance of the Itanium.

EPIC cannot resolve them all either. Memory dependencies cannot be predicted at compile time and therefore can never be resolved by an in-order machine. As the gap between processor speed and memory speed increases, so does the effect of memory dependencies. A deep OOO design can deal with most of the instruction dependencies and help with memory dependencies as well.

To expand on this a bit, at compile time you don't know whether a given piece of data will be in L1, L2, or main memory. If you try to execute an instruction that operates on that data, an in-order processor, EPIC included, has no choice but to wait until the data is loaded from memory. You can get around this a little by placing the load as early as possible in the instruction stream and working on the data later, but you still have no idea exactly how long you will have to wait before the data is in a register. This is why EPIC needs so many registers.

The situation gets worse when you are accessing a memory location that is determined by a calculation, or even just by the contents of another memory location (pointer chasing). In this case there is little EPIC can do but sit and wait for the data to arrive. Even if it already has all the data in its registers needed to execute an instruction, it cannot do anything if that instruction is not next in line. A single access to main memory can stop execution for tens or hundreds of cycles.

An OOO processor, on the other hand, just keeps looking for instructions that can be executed. If the first cannot be executed, it looks at the next, then the next, and so on until it finds one that can. If the data for an instruction that comes later in the instruction stream arrives before the data for an earlier one, the later instruction can be executed, provided it has no other dependencies.
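
To make the difference concrete, here is a rough toy model in Python. It is only a sketch under simplifying assumptions of my own: one instruction issued per cycle, no branch or resource limits, and made-up names (Instr, run_in_order, run_out_of_order, PROGRAM) and latencies that do not describe any real chip. Each load is assumed to miss the caches and take 100 cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    deps: tuple = ()   # indices of EARLIER instructions whose results are needed
    latency: int = 1   # cycles from issue until the result is available


def run_in_order(program):
    """Issue at most one instruction per cycle, strictly in program order."""
    ready_at = []      # cycle at which each instruction's result is available
    cycle = 0
    for ins in program:
        # Stall until every operand has arrived; nothing behind may overtake.
        start = max([cycle] + [ready_at[d] for d in ins.deps])
        ready_at.append(start + ins.latency)
        cycle = start + 1
    return max(ready_at)


def run_out_of_order(program, window=8):
    """Issue at most one instruction per cycle: the oldest ready one in the window."""
    ready_at = {}
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        for idx in pending[:window]:
            deps = program[idx].deps
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                ready_at[idx] = cycle + program[idx].latency
                pending.remove(idx)
                break  # only one issue slot per cycle in this toy model
        cycle += 1
    return max(ready_at.values())


# Two independent load/use pairs; both loads are assumed to miss to main memory.
PROGRAM = [
    Instr("load A", latency=100),
    Instr("add  A", deps=(0,)),
    Instr("load B", latency=100),
    Instr("add  B", deps=(2,)),
]

in_order_cycles = run_in_order(PROGRAM)
out_of_order_cycles = run_out_of_order(PROGRAM)
```

With these assumed latencies the in-order model finishes after 202 cycles and the out-of-order model after 102, because the second load is issued while the first one is still outstanding. A compiler could hoist "load B" by hand in this simple case, but not when the address of B is only known after A arrives.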

Quote:

Originally posted by Arcadian:
There is nothing impressive about the EV7. Actually EV7 doesn't change the processor architecture at all. It simply allows new levels of SMP parallelism. It's a different protocol that will only raise performance on a system level, not a processor level. If you are talking about IA-64 vs. an EV7 single processor, Itanium will be much faster, let alone McKinley. Only in very large SMP systems will EV7 be faster. At least this is what I hear.

Given its massive memory bandwidth and process shrink to 0.18 micron, the EV7 should double the current EV6 SPEC scores, which already lead the industry by a large margin. Intel has already announced that they will not submit SPEC scores for Itanium. The only other company that has done this recently is Sun with its pathetic USII scores. (Sun did post scores for the USIII, but they look fairly average.)
10-13-2000, 12:52 AM
PDR60

This is an interesting question. On one hand we have hard drives that really haven't even pushed the ATA66 standard for sustained output. I really think that HD technology is topping out. It is a mechanical device that will never approach the speed of an electronic device. You can only spin platters so fast.
Then we have system bus technology. I think Intel, Via and others are at a loss to find an economical way to increase bus speed and maintain factors like memory bandwidth. Look at all the incarnations of the 8xx chipset Intel has tried. Rambus was a flop. Then Via came out with all those K133 series chips; now it's the 694 series. Both have memory bandwidth problems. It's hard to believe that a chipset as old as the BX is still a viable option. It can still compete with the new chipsets in its overclocked 133 state.
Memory is another bottleneck. Maybe not just in the speed category but in the amount that bloated OSes are demanding. 98 running on anything less than 128 MB takes a performance hit. Then there's Win2k! I think that DDR RAM may help on the memory front. However, you will still need adequate amounts to satisfy your OS. I run a dually and am considering going to half a gig of memory.

------------------
BP6 with dual 366@550 and loving it
10-13-2000, 04:24 AM
Phoenix

Quote:

Originally posted by Arcadian:
I think you are misunderstanding some of the specifications. I don't know much more myself, but what you said probably doesn't have the impact that you think. At least, I haven't heard anyone else complain about this. Can anyone shed some light on this topic?

We recently had a small discussion on this subject in one of my lectures. The professor simply said that tests have been done showing that about 30 registers is about the perfect number of registers. 30 was the best amount for non-floating-point calculations, while floating-point calculations only needed about 15 registers before performance was at its max. Once you get above 40 registers, performance started to decrease due to long addresses. The only problem is that our discussion was on MIPS processors, and I'm not sure how it would relate to any x86, or x87 for that matter, processors.
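
One possible reading of "long addresses" is the number of instruction bits needed to name a register. The Python sketch below is just that arithmetic, with function names (register_field_bits, operand_bits) made up for illustration; it assumes a three-operand instruction format and says nothing about what the tests mentioned in the lecture actually measured.

```python
def register_field_bits(num_registers):
    """Bits needed in an instruction to name one of num_registers registers."""
    if num_registers < 2:
        raise ValueError("need at least two registers")
    return (num_registers - 1).bit_length()


def operand_bits(num_registers, operands=3):
    """Bits spent on register names in an instruction with `operands` register fields."""
    return operands * register_field_bits(num_registers)


# Register counts mentioned in this thread, plus 32 (MIPS) and 128 (Itanium).
table = {n: (register_field_bits(n), operand_bits(n)) for n in (8, 16, 32, 40, 128)}
```

Under these assumptions, 32 registers cost 5 bits per field (15 bits for three operands), while going past 32 costs a sixth bit per field, and 128 registers cost 7 bits per field (21 bits for three operands).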
10-13-2000, 08:49 AM
Humus

Quote:

Originally posted by Moridin:

Thanks, I am not familiar with the full IA-32 instruction set. Most of my assembly programming was done in 68000, 6800, 8080 and the 8080 compatible Z80. I still think IA-32 is limited in this regard compared to newer ISAs and that this is a bottleneck in the ISA.

I also understand that the LEA instruction can extend some of these capabilities, but I am not entirely sure what it does.

Yes, LEA extends the capabilities some. LEA stands for Load Effective Address and is used for calculating pointers, but can be used for other stuff too. You use it like this:
LEA REG1, [REG2 + s * REG3 + immediate]

So, it's a four-operand instruction where the last one has to be a constant and s has to be 1, 2, 4 or 8.

So if you want to do a = b + c you can write
LEA EAX, [EBX + ECX]
or for a = b + 2 * c you can write
LEA EAX, [EBX + 2 * ECX]
or for a = b + 2 * c + 7 you can do
LEA EAX, [EBX + 2 * ECX + 7]

The LEA instruction is rather powerful, but is a little limited too. You cannot do a = b - c, only add. The constant can be negative though.
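
For anyone who wants to play with the arithmetic, here is a small Python model of what LEA computes with 32-bit registers. The function name lea and its parameter names are made up for illustration, and the model only covers the base + scale * index + constant form described above.

```python
MASK32 = 0xFFFFFFFF  # results wrap around like a 32-bit register


def lea(base, index=0, scale=1, disp=0):
    """Model of LEA dest, [base + scale * index + disp] for 32-bit operands."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + scale * index + disp) & MASK32


b, c = 10, 3
a1 = lea(b, c)            # a = b + c
a2 = lea(b, c, 2)         # a = b + 2 * c
a3 = lea(b, c, 2, 7)      # a = b + 2 * c + 7
a4 = lea(b, disp=-4)      # a negative constant is allowed
```

Note there is no way to pass a negative scale, which mirrors the "only add" limitation: subtracting one register from another still needs SUB.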
10-13-2000, 11:16 AM
Rick_James9

Here's some info about AMD's new processor "Hammer" and about "Itanium". You can read more about it at: http://www.cpureview.com/art_64bit_a.html

"Key Features of AMD's 64 bit x86 architecture
AMD is stressing compatibility; and with a good reason: it will speed adoption of their upcoming 64 bit processors.

Full backward compatibility with existing 32 bit x86 code base
Same instruction set, registers extended to 64 bits
Full 64 bit flat address model

If you were to ask programmers what the major faults of the existing x86 architecture were, they would give you a short but extremely important list:

Not enough general purpose registers
Stack based floating point processor; with only eight registers
Did I mention not enough registers?

Someone must have been listening. Here are some of the key features of x86-64:

general purpose registers extended to 64 bits
added eight more general purpose 64 bit registers (!!!!!)
added an additional 16 register IEEE standard floating point unit (!!!!!)

They did not stop there. More goodies:

SSE support! with twice the number of registers Intel provides
PC relative addressing for data
same syntax for using 64 bit operations as 16/32 bit operations
prefix byte to allow access to new general purpose registers in 64 bit mode
byte-addressing for the low byte of all 16 general purpose registers

Basically, the changes go a long way to making the instruction set more orthogonal; and the larger register files allow far greater opportunities for code optimization by the compiler writers."

------------------
Tech GOD

[This message has been edited by Rick_James9 (edited October 13, 2000).]
10-13-2000, 12:08 PM
Arcadian

Rick_James9, thanks for replying to this post with such a great topic. I hope other people read far enough into this one to see your link. You might want to repost this as a main topic. But I did want to respond to your post with some insight.

First, the article makes a good point about the compatibility aspect of x86-64 architecture. It makes sure to point out that Itanium's 32bit performance is not strong, and even mentions that Intel does not intend for this processor to run in 32bit, and that the capability is only there for compatibility.

I wanted to mention that this should not be taken as a negative to IA-64 architecture. You see, Intel has made sure to research what OEMs are looking for in systems. What they found out after talking to many different companies is that large servers are usually sold as complete packages that are further tested by the companies that sell them. Take Hewlett-Packard, for example, who I believe is taking an active part in the Itanium launch.

HP will bundle an Itanium system with the OS and all the software that a customer will want. They will make sure that all the software is 64bit for optimal compatibility, and at the same time try to offer the customer as many choices as are available. If there is one application that is 32bit that the customer requests, then Itanium can certainly be bundled with it, and not be severely impacted. However, if there are many 32bit applications that the customer desires, HP will still also have Intel's IA-32 processors, which will continue to progress with new technology, and still be compelling choices. However, Intel believes from what they've heard from their customers, that most users of these advanced systems will be able to use complete 64bit solutions. If they weren't sure about this, it is doubtful that Intel would invest so much into Itanium.

As the Itanium platform matures, there will be more software available, which means more choices to the end users, and a larger user base. It's a fairly solid business model that's hard to see from our point of view, Rick_James9.

AMD's business model is different. Even though there will probably be an x86-64 compatible Linux kernel by the time Sledgehammer launches, there will not be too much software at first. Although this will grow with time, AMD knows this, and is prepared to offer Sledgehammer as a strong 32bit processor. As more software is developed, then Sledgehammer will slowly transition to its more powerful 64bit side.

However, this takes time, and Intel will quickly gain market share with their 64bit solution. They've already been nearly guaranteed this through customer feedback and support. AMD will surely transition their processor to the workstation market, where its strong floating point should be very compelling. But in terms of the enterprise server, they will have a much tougher time penetrating.

Eventually, AMD wants the Hammer series to be their next desktop chip. With strong 32bit performance, and hopes of more x86-64 software being developed, seeing this processor in the desktop market is inevitable. Intel will be hard pressed to compete with their IA-32 line once 64bit software becomes ubiquitous, but this is very far down the road, and I believe they have solutions already planned.

The article also mentions McKinley, the successor to Itanium. Details about this chip are scarce, but it is said that it will perform in many ways better than Itanium. Before details are released, I think it is presumptuous for the article to already claim Sledgehammer's superiority to McKinley, especially since the two are aimed at different markets.

Itanium is already slated to be available in 8, 16, and 32 processor systems from NEC and Unisys, and Sledgehammer will not scale nearly as well. AMD has an emerging 2 processor system, which definitely breaks the surface, but they have a while to go before they are able to implement the 8-way servers that are on their roadmap, and I have yet to hear of anything greater than that.

I believe there is enough market space for both Intel and AMD to be successful in their own way. I hope they are both successful, because they both offer new and interesting technology that will surely put us into the next generation of computing.
10-13-2000, 01:59 PM
Superwormy

Limiting factor is bus speeds and memory speeds, certainly not HD speed. Look at the computer when you're playing games and such: as long as you have 128 megs of RAM, the little hard drive light rarely comes on. On the other hand, look at the huge FPS difference u get if u go from a bus speed of 66 to 133 with the same processor.
10-20-2000, 01:15 AM
ziadmo

Well, my opinion is the following: FSB and hard drives. I agree that memory is an important piece in speeding up the computer's performance and that a lot of RAM is never enough, but memory can easily be adjusted, so we don't need to worry about it. HDs, on the other hand, are something that we cannot just ignore. The HD plays many important roles in improving the speed of the boot process and the loading of programs, and here I am talking about large hard drives with at least 10,000 RPM. The more room u have, the more freely u can move on it. If we install a 2 GB HD in an 800 MHz system with 128 MB of RAM, we will see a bigger difference than from changing any other pieces like memory or video. Also, the FSB is really slow compared to the current speed of processors. A faster FSB means a faster connection between the parts, and a faster connection between the RAM and the processor, which means faster overall performance. FSB and HDs are not as easy to replace as the RAM. I also would like to point out the importance of cache memory (both L1 and L2), as it also helps load programs a lot faster. Imagine we had a cache of three times the processor speed... How fast would the programs load? For gamers I think that video card and RAM are the two major factors.
10-20-2000, 06:46 AM
frankmccann

The biggest bottleneck is Gates software that uses only 640K upper memory.
Busses used to be a problem and they're getting better.
Perhaps they could change RAM designs so that this "640K" barrier can be jumped with hardware rather than software. The differences between hardware and software are blurring more and more.
In five years we'll be laughing about all this.

Quote:

Originally posted by Arcadian:
I wanted to spawn a discussion on system performance. More specifically, what do you think is the biggest bottleneck in a computer system?

These days, we are starting to see diminishing returns for improvements in various hardware. It has gotten to the point where insignificant improvements are taken to be huge leaps in technology. We used to live in a time where the newest video card gave a 20-40% improvement over the competition, whereas today it is more like 5%. CPUs used to give similar increases, and now they, too, are giving < 5% improvements.

So what is the bottleneck? What is preventing our systems from doubling in performance? Is it system memory? Is it the CPU or the video? Is it AGP? Or front side bus speed? Is it the hard disk or removable media? Or is it something else entirely?

Before I give my opinion, I was just curious about what other people thought. So come on... state your opinions. Keep it technical, if you can, and cite examples if possible. Web links are really great, too.
10-20-2000, 01:40 PM
Arcadian

Quote:

Originally posted by frankmccann:
The biggest bottleneck is Gates software that uses only 640K upper memory.
Busses used to be a problem and they're getting better.
Perhaps they could change RAM designs so that this "640K" barrier can be jumped with hardware rather than software. The differences between hardware and software are blurring more and more.
In five years we'll be laughing about all this.

That was a problem of 5 years ago. x86 architecture in general must be designed so that it is compatible with the 640k memory boundary (which is actually a 1MB boundary in hardware). Current hardware, since the 386 in fact, has allowed a mode called Protected Mode, which allows for large memory areas. Windows did not fully take advantage of this until Windows 95 gave what appeared to programs as a flat memory region. Windows NT enhances memory mapping even further. Windows is doing a good job, too, because x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance. For now, the OS is barely a bottleneck as far as memory mapping is concerned.
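
To put numbers on the 640k versus 1MB point, here is a rough Python sketch of how an original 8086 in real mode turns a segment:offset pair into a physical address. The function name real_mode_address is made up for illustration, and the 20-bit wraparound is the 8086 behavior; later chips with the A20 line enabled behave differently at the very top.

```python
ADDRESS_MASK_20BIT = 0xFFFFF  # the 8086 has 20 address lines, so 1MB in total


def real_mode_address(segment, offset):
    """Physical address for a 16-bit segment:offset pair on an 8086."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & ADDRESS_MASK_20BIT


conventional_limit = real_mode_address(0xA000, 0x0000)  # where the 640k area ends
top_of_memory = real_mode_address(0xFFFF, 0x000F)       # last byte below 1MB
same_byte = (real_mode_address(0x1234, 0x0010),
             real_mode_address(0x1235, 0x0000))         # two names for one address
```

In this model segment A000 starts exactly at 640 * 1024, the highest reachable byte is 0xFFFFF, and many different segment:offset pairs name the same byte, which is a big part of why segmented code is more awkward than a flat address space.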
10-20-2000, 05:21 PM
Humus

Quote:

Originally posted by Arcadian:
That was a problem of 5 years ago. x86 architecture in general must be designed so that it is compatible with the 640k memory boundary (which is actually a 1MB boundary in hardware). Current hardware, since the 386 in fact, has allowed a mode called Protected Mode, which allows for large memory areas. Windows did not fully take advantage of this until Windows 95 gave what appeared to programs as a flat memory region. Windows NT enhances memory mapping even further. Windows is doing a good job, too, because x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance. For now, the OS is barely a bottleneck as far as memory mapping is concerned.

Would you mind explaining a little better what you mean by "x86 architecture in general is segmented. When IA-64 becomes popular, you will start seeing flat memory ranges, which will give larger performance."?

In what way is IA-32 not flat? You address memory with a single 32bit address like MOV EAX, [EBX]. With EBX = 0 in this case you address the first byte of your program and with EBX = 0xFFFFFFFF you address the last byte in your virtual memory.

BTW, OS memory management is a very important factor for system performance. Every time you encounter a "new" or "delete" statement in C++, that's a call to the OS. Same for malloc/calloc/realloc/free etc.
Those are not considered to be cheap operations.

[This message has been edited by Humus (edited October 20, 2000).]
10-20-2000, 05:22 PM
TolTas

My vote is for x86 Code.

Unfortunately, until we stop running x86 code, which has an efficiency rating of less than 60% I believe, there is no way to vastly improve computer speed.
------------------
--> £¿Quanza!¿£ <--

[This message has been edited by TolTas (edited October 20, 2000).]
