Thread: Memory

  1. #1
    Expensive Sushi
    Join Date
    Dec 2000
    Posts
    33

    Memory

    Memory: let's talk about it on a fresh page. My view is based initially on SiSoft Sandra benchmarks.

    1. Tests via the ALU show lower results than tests via the FPU for all memory types (for a start I mean only SDRAM). The FPU uses 64-bit data (the 'double' type) or 32-bit data (the 'float' type), while the ALU uses 32-bit (long integer), 16-bit (integer) and 8-bit (short integer) data. The memory results would be the same if the test used only 32-bit binary instructions for the ALU part, regardless of what it used for the FPU part, or if it used only 32-bit unary instructions for both the ALU and FPU parts. That is because SDRAM is 64 bits wide: in the first case an entire block is equally useful to the FPU and the ALU (if the operands of binary instructions are placed contiguously), and in the second case a block is "half-useful", again equally. But the test doesn't actually work this way; it uses different methods. FPU data is mostly larger and less sensitive to erratic accesses. ALU data, especially data used by unary operations, needs more linear access but doesn't always get it.

    Furthermore, SiSoft Sandra uses the STREAM test, which, as you can guess, is optimized for streams. Real memory performance is lower, because real programs are much more erratic and use relatively small 16-bit and 8-bit values for many logical evaluations. Only precise floating-point evaluations run at the speed the test shows.

    2. DDR-RAM uses the same 64-bit data path to move the next 64 bits, but only the next. From the point of view of transmission and connection design it has double the frequency; from the point of view of addressing scheme and performance it has double the bit width. Tests show that FPU bandwidth increases by about half and ALU bandwidth by about a quarter. I have rounded the numbers to show only the tendency. Only binary 64-bit instructions work at the full speed of the memory, and some linearity also helps reach 150% of non-DDR speed. At the same time, contiguous memory addressing, which is rare, is the only thing that increases the utility of the "128-bit" memory.

    Real programs benefit only a little from DDR-RAM as main memory, and I assure you programmers try as hard as they can. But there is probably hidden potential somewhere; I don't know. As I suggested before, DDR-RAM is very effective in terms of real performance per price when used by a GPU.

    3. PC800 systems have triple the memory performance of PC133 systems. A dual-RIMM subsystem has 64-bit addressing (counting the DDR technique) and a 400 MHz FSB. Why shouldn't it get a 200% bonus? And so it does. I will not talk about latency, because there are plenty of enthusiasts for that. In light of this I don't want to quarrel with anyone. Otherwise I, as a reasonable debater, would need details that Intel hasn't published.
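
    A rough sketch of the peak-bandwidth arithmetic behind the "double" and "triple" figures in points 2 and 3. The function name and the nominal clock figures below are assumptions added for illustration, not numbers from Sandra; they describe theoretical peaks only (bus width x clock x transfers per clock x channels).

```python
def peak_bandwidth_mb_s(bus_bits, clock_mhz, transfers_per_clock=1, channels=1):
    """Theoretical peak transfer rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock * channels


# Nominal figures; treat them as assumptions, not measurements.
pc133 = peak_bandwidth_mb_s(64, 133)             # 64-bit SDRAM, one transfer per clock
ddr266 = peak_bandwidth_mb_s(64, 133, 2)         # same bus, two transfers per clock
pc800_dual = peak_bandwidth_mb_s(16, 400, 2, 2)  # two 16-bit RDRAM channels
p4_fsb = peak_bandwidth_mb_s(64, 100, 4)         # 64-bit bus, 100 MHz, four transfers per clock

for name, value in [("PC133", pc133), ("DDR266", ddr266),
                    ("dual PC800", pc800_dual), ("P4 FSB", p4_fsb)]:
    print(f"{name:>10}: {value:6.0f} MB/s  ({value / pc133:.2f}x PC133)")
```

    On these nominal numbers dual PC800 lands at roughly three times PC133 and equals the FSB peak, while DDR lands at twice PC133. Measured results will be lower than any of these peaks.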

    I think RD-RAM is the best choice for use by the CPU, and price doesn't matter if you want ultimate performance.


    P.S. Probably my English is getting better, but I still use Word.

    ------------------
    Don't read style, read meaning.

  2. #2
    Reef Shark Marsolin's Avatar
    Join Date
    Nov 2000
    Location
    Austin, TX, USA
    Posts
    378

    It will be interesting to see how performance compares between a P4 system using the current 850 RDRAM implementation and a DDR chipset in the future. For the first time we'll be able to see a head-to-head comparison on a processor capable of using the full bandwidth of each.

    My feeling right now is that DDR will be slightly better on low bandwidth programs, but that RDRAM's bandwidth lead will rule the more demanding applications. RDRAM really needs to show strong benefits though if it is going to overcome the current negative public opinion.

    It will also be interesting to see how performance on Quake 3 changes. How many fps come from the processor and system bus, and how many can be attributed to memory. I expect RDRAM to maintain the lead in this program, but by how much is the real question.


  3. #3
    Goldfish
    Join Date
    Oct 2000
    Location
    Ohio, USA!
    Posts
    59

    So are you ripping on sandra? I don't completely understand your point.
    People are idiots!

  4. #4
    Hammerhead Shark
    Join Date
    Sep 2000
    Location
    Edinburgh,UK
    Posts
    1,188

    As Sisoftware say: all of these benchmarks are synthetic and may not tally with real world performance.

    ------------------
    Stoo

  5. #5
    Expensive Sushi
    Join Date
    Dec 2000
    Posts
    33

    Originally posted by smktr:
    ----
    So are you ripping on sandra? I don't completely understand your point.

    Sandra tests memory in two ways: via the ALU and via the FPU. Seeing the differing results, I guessed at the causes. I was interested in what people think (while reading the whole article), but I don't mean for Sandra's benchmarks to be the thread's topic.


    Originally posted by stoo:
    ----
    As Sisoftware say: all of these benchmarks are synthetic and may not tally with real world performance.

    Of course, real-world performance depends on the program type. But the test mostly stresses memory, and it uses the same algorithms for all kinds of memory. So we can still see the performance difference, at least under the test's algorithm.

    ------------------
    Don't read style, read meaning.

  6. #6
    Sphyrna Mokarran awa64's Avatar
    Join Date
    Jan 2001
    Location
    Cyberspace
    Posts
    5,163

    I know this might not be of as much help as it could be, but I remember a post by Arcadian with some kind of benchmarks, and an Athlon 1.2GHz Processor with 128MB DDR RAM beat a P4 1.5GHz with 128MB RDRAM. Unfortunately, I don't know specific scores, and I don't know which benchmark, but it was within 10-20 points from first place and third place.

    ------------------
    When it comes to weirdy, paradoxy space stuff, been there done that bought the t-shirt.
    My K5

  7. #7
    White Shark Moderator Moridin's Avatar
    Join Date
    Sep 2000
    Posts
    5,351

    Two things to remember about memory and benchmarks. The first is that processors don't load individual memory locations from main memory; they fill cachelines. When the PIII loads data from main memory it does so in 32 Byte blocks. The Athlon has a 64 Byte cacheline and therefore loads 64 Bytes at a time. The P4 has a 128 Byte cacheline but can fill it as two blocks of 64 Bytes. This is true even if you only want to use 4 or 8 bytes of that data.
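
    To put rough numbers on the cacheline point, here is a minimal sketch that counts how much of the fetched data a program actually uses. The helper names and access patterns are made up for illustration. It assumes every distinct line touched is a miss filled exactly once, that operands are aligned and never straddle a line, and it treats the P4 line as a single 128 Byte fill.

```python
# Cacheline sizes in bytes, as given in the post above.
CACHE_LINE_BYTES = {"PIII": 32, "Athlon": 64, "P4": 128}


def bytes_fetched(addresses, line_bytes):
    """Bytes pulled from main memory if each distinct line is filled once."""
    lines_touched = {addr // line_bytes for addr in addresses}
    return len(lines_touched) * line_bytes


def utilization(addresses, operand_bytes, line_bytes):
    """Fraction of the fetched bytes that the program actually reads."""
    useful = len(set(addresses)) * operand_bytes
    return useful / bytes_fetched(addresses, line_bytes)


sequential = range(0, 4096, 4)        # 1024 four-byte reads, back to back
strided = range(0, 256 * 1024, 256)   # 1024 four-byte reads, 256 bytes apart

for cpu, line in CACHE_LINE_BYTES.items():
    print(cpu, utilization(sequential, 4, line), utilization(strided, 4, line))
```

    Under these assumptions the sequential pattern uses every byte it fetches on all three CPUs, while the strided pattern uses 4 bytes out of each 32, 64 or 128 Byte line, so the longer the line, the more bandwidth goes to data nobody reads.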

    The second thing you should remember is that FPU-intensive apps tend to have more parallelism and fewer dependencies. This means they tend to be more sensitive to bandwidth and less sensitive to latency than integer applications. This is not always true, but it does tend to be true.
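
    A toy model of that bandwidth-versus-latency split, offered as a sketch only. The function names and all numbers (100 ns latency, 1 byte/ns bandwidth, 64 Byte lines) are hypothetical. It assumes a dependent chain pays the full latency for every miss, while independent misses overlap completely and pay the latency once.

```python
def dependent_chain_ns(n_lines, latency_ns):
    """Each miss must finish before the next address is known."""
    return n_lines * latency_ns


def independent_stream_ns(n_lines, line_bytes, latency_ns, bytes_per_ns):
    """Misses overlap: pay the latency once, then the pure transfer time."""
    return latency_ns + n_lines * line_bytes / bytes_per_ns


N, LINE = 1000, 64  # hypothetical workload: 1000 cacheline misses of 64 bytes

base = (dependent_chain_ns(N, 100), independent_stream_ns(N, LINE, 100, 1.0))
double_bw = (dependent_chain_ns(N, 100), independent_stream_ns(N, LINE, 100, 2.0))
half_lat = (dependent_chain_ns(N, 50), independent_stream_ns(N, LINE, 50, 1.0))

print("baseline         ", base)
print("double bandwidth ", double_bw)
print("half latency     ", half_lat)
```

    In this model doubling the bandwidth roughly halves the streaming time and leaves the dependent chain untouched, while halving the latency does the opposite. Real code sits somewhere between the two extremes.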


    ------------------
    Make it idiot proof and someone will make a better idiot.
    "Never attribute to malice that which can be adequately explained by stupidity."

    "Any sufficiently advanced incompetence is indistinguishable from malice."

  8. #8
    White Shark Moderator Moridin's Avatar
    Join Date
    Sep 2000
    Posts
    5,351

    Originally posted by m538:

    Of course, real-world performance depends on the program type. But the test mostly stresses memory, and it uses the same algorithms for all kinds of memory. So we can still see the performance difference, at least under the test's algorithm.


    The way it uses memory may differ from the way real applications do. Any benchmark really only tells you how well a similar application may perform. Even applications that are similar in style, like UT and QIII, may use different algorithms and have different data access patterns, and therefore yield completely different results when used as a benchmark.

    In other words, any benchmark only gives you useful information about similar apps, and synthetic benchmarks may not be similar to any real application.



    ------------------
    Make it idiot proof and someone will make a better idiot.
    "Never attribute to malice that which can be adequately explained by stupidity."

    "Any sufficiently advanced incompetence is indistinguishable from malice."

  9. #9
    Expensive Sushi
    Join Date
    Dec 2000
    Posts
    33

    Cache vs. NUMA

    For the record: while a cache duplicates main memory or another cache, NUMA technology will combine memories of different speeds into one array. That's because a cache is very small compared to main memory, and main memory of a cache's size costs less than NUMA technology integrated into a chipset. But the parts that NUMA will combine are comparable in size.

    For example, main memory may consist of a 64Mb 2Gb/s module, a 128Mb 1Gb/s module and a 256Mb 0.5Gb/s module, where the 64Mb holds the OS and the active task, the 128Mb holds active data and the 256Mb plays the role of the swap file. In the case of a video card, 32Mb may be 32Gb/s embedded memory, 64Mb connected directly and 128Mb used as aperture.
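
    The main-memory example can be sketched as one flat array laid over three tiers. This is only an illustration of the idea, not how any chipset or OS does it: the names are made up, and it assumes "Mb" means megabytes, "Gb/s" means gigabytes per second, and that the fastest tier sits at the lowest offsets.

```python
from bisect import bisect_right
from itertools import accumulate

# (size, speed) per tier, fastest first, taken from the example in the post.
# Assumption: sizes are megabytes and speeds are gigabytes per second.
TIERS = [(64, 2.0), (128, 1.0), (256, 0.5)]
BOUNDS = list(accumulate(size for size, _ in TIERS))  # end offset of each tier


def tier_for_offset(offset_mb):
    """Index of the tier backing a given offset into the combined array."""
    if not 0 <= offset_mb < BOUNDS[-1]:
        raise ValueError("offset outside the combined array")
    return bisect_right(BOUNDS, offset_mb)


def whole_array_speed():
    """Average speed when streaming the whole array once, end to end."""
    total_size = BOUNDS[-1]
    total_time = sum(size / speed for size, speed in TIERS)
    return total_size / total_time


print(BOUNDS)                      # where each tier ends in the flat array
print(tier_for_offset(10))         # falls in the fast 64Mb tier
print(tier_for_offset(200))        # falls in the slow 256Mb tier
print(round(whole_array_speed(), 3))
```

    Streaming the whole array end to end averages out well below the fastest module, which is why the placement in the example (OS and active task in the fast part, swap-like data in the slow part) matters more than the raw total.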

    The one thing I don't know and can't guess: will NUMA functions be supported via the chipset/BIOS or via the OS? Opinions?

    ------------------
    Don't read style, read meaning.
