A Performance Comparison of Contemporary DRAM Architectures


Slide 1

A Performance Comparison of Contemporary DRAM Architectures
Vinodh Cuppu, Bruce Jacob (University of Maryland)
Brian Davis, Trevor Mudge (University of Michigan)

Slide 2

About the Authors: Trevor Mudge
Professor of EE and CS at the University of Michigan
Ph.D.: University of Illinois
Research: computer systems design, parallel processing, computer-aided design, impact of technology on computer design

Slide 3

About the Authors: Brian Davis
Professor of Electrical & Computer Engineering at Michigan Technological University
Ph.D.: University of Michigan, Nov 2000
M.S. in CE: University of Michigan, Nov 1991
Research: new kinds of hardware description languages, particularly to enable more methodical techniques for designing efficient DRAM architectures

Slide 4

About the Authors: Bruce Jacob
Professor of Electrical & Computer Engineering and the Institute for Advanced Computer Studies at the University of Maryland
Ph.D.: University of Michigan, 1997
M.S. in CS & E: University of Michigan, Nov 1995
A.B. in Mathematics, cum laude: Harvard University, 1988
Current research: energy use and voltage scaling in embedded systems

Slide 5

About the Authors: Vinodh Cuppu
Digital IC Logic Designer at Xtremespectrum, Inc.
M.S. in Electrical & Computer Engineering: University of Maryland, Aug 2000
B.E. in Electronics & Communication Engineering: University of Madras, India, May 1997
Research: has published several well-regarded papers on DRAM and continues to model DRAM in various settings, particularly to determine whether it can be used in embedded applications

Slide 6

Abstract
In response to the growing gap between processor speed and main memory access time, many new DRAM architectures have been developed. This paper tests the performance of a representative set of these architectures to see how well each responds to this trend. The architectures tested are: Fast Page Mode, Extended Data Out, Synchronous, Enhanced Synchronous, Synchronous Link, Rambus, and Direct Rambus.

Slides 7–12

Conventional DRAM

Slide 13

Questions
1. What is the impact of improvements in DRAM technology on the memory latency and bandwidth problems?
2. Where is time spent in the primary memory system? What is the performance benefit of exploiting the page mode of contemporary DRAMs?
3. How much locality is there in the address stream that reaches the primary memory system?

Slide 14

Observations
1. There is a one-time tradeoff between cost, bandwidth and latency… multiple DRAMs on the same bus, with bus improvements (|request| ~> |transfer|); anything better requires a faster bus and core.
2. Future bus technologies will expose row access time as the primary performance bottleneck… widening buses presents a clearer view of locality, so row hits are key.
3. Buses … can't halve the latency of a bus half as wide; although the best latencies are seen from buses as wide as the L2 cache, they aren't exactly cost-effective.
4. …critical-word-first does not mix well with burst mode; burst mode is likely to deliver unneeded data when it uses a starting block out of address order.
5. …the refresh mechanism used can significantly alter the average memory access time; it can add wait cycles to row and column access.
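The cost/bandwidth/latency tradeoff, and the point that a wider bus cannot halve latency, can be illustrated with a toy model. This is a minimal sketch under assumed numbers (a fixed 40 ns DRAM access, a 10 ns bus cycle, a 128-byte request); the function `request_latency_ns` is hypothetical and is not the paper's simulator.

```python
import math


def request_latency_ns(request_bytes: int, bus_bytes: int,
                       access_ns: float, cycle_ns: float) -> float:
    """Toy model (assumption): a fixed DRAM access time plus one bus cycle
    for every bus-width chunk of the request."""
    beats = math.ceil(request_bytes / bus_bytes)  # transfers needed on the bus
    return access_ns + beats * cycle_ns


if __name__ == "__main__":
    # Assumed parameters: 128-byte request, 40 ns access, 10 ns bus cycle.
    for width in (8, 16, 32, 64, 128, 256):
        print(f"{width:>3}-byte bus: {request_latency_ns(128, width, 40.0, 10.0):.0f} ns")
```

Under these assumptions, doubling the bus from 8 to 16 bytes cuts latency from 200 ns to 120 ns rather than to 100 ns, because the access time does not shrink, and widening past the 128-byte request size buys nothing; further gains must come from a faster bus or core.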

Slide 15

Architectures: Fast Page Mode
Holds the row open after the first column is sent, in the optimistic expectation that the next access will be to a different column in the same row.
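The open-row bet that Fast Page Mode makes can be sketched by comparing it with closing the row after every access. This is an illustrative model only: the timing parameters `t_row`, `t_col`, and `t_pre` are hypothetical cycle counts, and the closed-row case assumes precharge always finishes before the next request arrives.

```python
def access_latencies(rows, t_row, t_col, t_pre, keep_open):
    """Latency of each access in a stream of row addresses to one bank.

    keep_open=True : leave the row open (FPM's optimistic expectation);
                     a different row must first be precharged.
    keep_open=False: precharge right after each access (pessimistic);
                     assumes the precharge is hidden before the next request.
    """
    open_row = None
    latencies = []
    for row in rows:
        if not keep_open:
            latencies.append(t_row + t_col)          # always a fresh row access
        elif row == open_row:
            latencies.append(t_col)                  # row hit: column access only
        elif open_row is None:
            latencies.append(t_row + t_col)          # bank already precharged
        else:
            latencies.append(t_pre + t_row + t_col)  # row miss: precharge first
        if keep_open:
            open_row = row
    return latencies


# Hypothetical timings in cycles: row access 4, column access 2, precharge 3.
same_row = [5, 5, 5, 9]
print(sum(access_latencies(same_row, 4, 2, 3, keep_open=True)))   # 19
print(sum(access_latencies(same_row, 4, 2, 3, keep_open=False)))  # 24
```

With no row locality (for example rows 1, 2, 3, 4) the same model favors closing the row: 6 + 9 + 9 + 9 = 33 cycles versus 24.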

Slide 16

Architectures: Extended Data Out
An added data latch holds the column data immediately after sensing. This allows another transfer or a refresh to begin as soon as the column access is done.
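One way to see what the EDO output latch buys is a two-stage pipeline model: without the latch (FPM-style) each column access must finish driving its data before the next begins, while with the latch (EDO-style) the next column access overlaps the previous data output. The function and the stage times `t_col` and `t_out` are hypothetical; real parts have more detailed timing constraints.

```python
def column_stream_time(n, t_col, t_out, latched):
    """Total time for n back-to-back column accesses within an open row.

    latched=False: access and data output are strictly sequential.
    latched=True : the output latch lets column access i+1 proceed while
                   data i is still being driven out (two-stage pipeline).
    """
    if n <= 0:
        return 0
    if not latched:
        return n * (t_col + t_out)
    # The first item fills the pipeline; afterwards the slower stage sets the pace.
    return t_col + (n - 1) * max(t_col, t_out) + t_out


print(column_stream_time(4, 2, 2, latched=False))  # 16
print(column_stream_time(4, 2, 2, latched=True))   # 10
```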

Slide 17

Architectures: Synchronous DRAM
Often has a programmable buffer so it can return data over multiple cycles per request, making data available every clock cycle. Transfers are synchronized to the clock, making timing strobes from the memory controller unnecessary.
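Because an SDRAM returns one data word per clock once a burst starts, the arrival cycle of every word follows directly from a few programmed latencies. The sketch below assumes an ACTIVATE at cycle 0, a READ issued `t_rcd` cycles later, and data starting `cas_latency` cycles after the READ; these parameter names and the example values are illustrative assumptions, not figures from the paper.

```python
def burst_beat_cycles(t_rcd, cas_latency, burst_length):
    """Clock cycle on which each data word of an SDRAM burst read appears.

    Assumed command timing: ACTIVATE at cycle 0, READ at cycle t_rcd,
    first data cas_latency cycles after the READ, then one word per clock.
    """
    first = t_rcd + cas_latency
    return [first + i for i in range(burst_length)]


print(burst_beat_cycles(t_rcd=2, cas_latency=2, burst_length=4))  # [4, 5, 6, 7]
```

The clock alone paces the burst: after the first word, each following word arrives one cycle later with no per-word strobe from the controller.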

Slide 18

Architectures: Enhanced SDRAM and Synchronous Link DRAM
Enhanced SDRAM: faster internal timing; SRAM row caches added to allow EDO-like behavior, namely the ability to satisfy requests for the cached row while freeing the bank to do other work.
Synchronous Link DRAM: open architecture, supported by the IEEE; uses a packetized split request/response protocol; most importantly, it can support multiple concurrent transactions (provided they reference unique banks).
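The SLDRAM requirement that concurrent transactions reference unique banks can be illustrated with a simple in-order split-transaction scheduler. It is a sketch under stated assumptions: one command is issued per `t_cmd` cycles, each request occupies its bank for `t_bank` cycles, and contention on the data bus is ignored; the function name is hypothetical.

```python
def last_completion(banks, t_bank, t_cmd=1):
    """Completion time of the last request in an in-order request stream.

    Requests to different banks overlap (split transactions); a request to a
    busy bank waits, and so does every request issued after it (in order).
    """
    bank_free = {}   # bank id -> cycle when that bank becomes idle
    next_issue = 0   # earliest cycle the next command can be issued
    finish = 0
    for bank in banks:
        start = max(next_issue, bank_free.get(bank, 0))
        bank_free[bank] = start + t_bank
        finish = max(finish, bank_free[bank])
        next_issue = start + t_cmd
    return finish


print(last_completion([0, 1, 2, 3], t_bank=4))  # unique banks: 7
print(last_completion([0, 0, 0, 0], t_bank=4))  # same bank: 16
```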

Slide 19

Architectures: Rambus DRAM
Uses a multiplexed address/data bus, which limits communication to once every 4 cycles. Transfers on both the rising and falling clock edges, achieving a theoretical maximum of 600 megabytes per second. Because the banks are internally divided, up to 4 rows can remain open at once.
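The quoted peak rates are simple arithmetic: clock rate × transfers per clock × bus width. The clock rates and widths below are assumptions chosen to reproduce the slide figures (a 300 MHz, one-byte Rambus channel and a 400 MHz Direct Rambus channel with a two-byte data path), not values stated in the paper.

```python
def peak_bandwidth_mb_per_s(clock_mhz, bus_bytes, transfers_per_clock=2):
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes).

    transfers_per_clock=2 models data on both rising and falling edges.
    """
    return clock_mhz * transfers_per_clock * bus_bytes


print(peak_bandwidth_mb_per_s(300, 1))  # assumed Rambus channel: 600
print(peak_bandwidth_mb_per_s(400, 2))  # assumed Direct Rambus data path: 1600
```

Peak figures ignore command overhead and bank conflicts, so sustained bandwidth is lower.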

Slide 20

Architectures: Direct Rambus DRAM
A faster core and transfers on both clock edges yield a theoretical maximum bandwidth of 1.6 gigabytes per second. The device is partitioned into 16 banks using 17 half-row buffers, each shared between a pair of adjacent banks; this limits the number of banks that can process transactions in parallel but also reduces die size. It uses a 3-byte-wide channel instead of Rambus' single-byte channel, sending commands over one byte of width and data over the other two. In particular, Direct Rambus does not multiplex its bus, and its internal structures are arranged so that it can service up to 3 transactions at the same time.
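The cost of sharing 17 half-row buffers among 16 banks is that neighboring banks cannot be open at the same time. The check below assumes bank i uses buffers i and i+1, so a bank conflicts with banks i-1 and i+1; the function name and bank numbering are hypothetical.

```python
NUM_BANKS = 16  # Direct Rambus device: 16 banks, 17 shared half-row buffers


def can_be_open_together(banks):
    """Return True if all listed banks can hold an open row simultaneously.

    Assumption: bank i owns half-row buffers i and i+1, so two banks whose
    numbers differ by exactly 1 share a buffer and cannot both be open.
    """
    if any(not 0 <= b < NUM_BANKS for b in banks):
        raise ValueError("bank number out of range")
    ordered = sorted(banks)
    # Duplicate banks (difference 0) or adjacent banks (difference 1) conflict.
    return all(hi - lo > 1 for lo, hi in zip(ordered, ordered[1:]))


print(can_be_open_together([0, 2, 4]))              # True
print(can_be_open_together([3, 4]))                 # False: both need buffer 4
print(can_be_open_together(list(range(0, 16, 2))))  # True
```

Under this assumption at most 8 of the 16 banks can have open rows at once, for example all even-numbered banks.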

Slide 21

Methodology: Basis
Extensions were written for SimpleScalar, an aggressive out-of-order processor simulator, so that it would model the DRAM architectures described. Much of the memory access time is overlapped with instruction execution in SimpleScalar, so two additional simulations were run, one where bus transmission was instantaneous and another where memory access was instantaneous, and the following formulae were applied to the results:
Tp = time spent processing, Tl = memory latency stalls, To = overlapped memory access time
Tu = execution time with instantaneous bandwidth, Tm = total memory access time
Tb = memory bandwidth stalls, T = total actual execution time
Tl = Tu - Tp
Tb = T - Tu
To = Tp - (T - Tm)
Memory access time can now be divided into separate categories of stalls, along with the amount of time in which bandwidth and latency were overlapped with execution.
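The decomposition can be applied mechanically to the three simulation runs. The sketch below simply encodes the slide's formulae; the example times are made-up numbers used only to check that the pieces add up.

```python
def decompose(T, Tu, Tp, Tm):
    """Split execution time using the slide's formulae.

    T  : total actual execution time (realistic memory system)
    Tu : execution time with instantaneous bus transmission (unlimited bandwidth)
    Tp : execution time with instantaneous memory access (processing only)
    Tm : total memory access time observed in the realistic run
    """
    Tl = Tu - Tp          # memory latency stalls
    Tb = T - Tu           # memory bandwidth stalls
    To = Tp - (T - Tm)    # memory access time overlapped with execution
    return {"Tp": Tp, "Tl": Tl, "Tb": Tb, "To": To}


parts = decompose(T=100, Tu=80, Tp=60, Tm=50)  # hypothetical times
print(parts)  # {'Tp': 60, 'Tl': 20, 'Tb': 20, 'To': 10}
```

By construction Tp + Tl + Tb = T and Tl + Tb + To = Tm, which gives a quick sanity check on any set of measurements.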

Slide 22

Methodology: Simulated architecture
Timing data for the DRAM parts was taken from technical reports. The simulated L2 cache was run at speeds of 100ns, 1ns, scaling the CPU speed to match (CPU speed = 10x L2 speed).
Simulated architecture:
Processor: eight-way superscalar, out-of-order
Caches:
L1: lockup-free, split (64K/64K), 2-way set associative with 64-byte line sizes
L2: unified 1MB, 4-way set associative with a 128-byte line size, write-back, lockup-free, but allows only one outstanding request at a time
This represents a typical workstation of the time (1999).
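The cache parameters fix the address breakdown each level uses. A small helper (hypothetical, assuming power-of-two sizes and byte addressing) derives the set count and the offset and index widths for the simulated L1 and L2.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    Assumes size, associativity, and line size are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    return sets, offset_bits, index_bits


print(cache_geometry(64 * 1024, 2, 64))     # each L1 half: (512, 6, 9)
print(cache_geometry(1024 * 1024, 4, 128))  # unified L2: (2048, 7, 11)
```

Each L2 miss therefore requests a 128-byte line from DRAM; on a 128-bit (16-byte) data bus that is 8 transfers, which is consistent with the factor of 8 quoted for the simulated organization.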

Slide 23

Methodology: Balancing the architectures
Interleaving: Since the request size is 8 times the transfer size in the chosen simulated organization, DRAM access is a pipelined operation. The other DRAMs would gain an unfair advantage over FPM and EDO DRAM, since neither of those is interleaved. The authors therefore modeled interleaved versions that could each fill the memory data bus as fully as possible. These versions are named FPM3 and EDO2. FPM1 is 'pessimistic': it closes the accessed row and precharges immediately. FPM2 is 'optimistic': it keeps the accessed row open and delays precharge.
Bus Structure: SLDRAM, RDRAM, and DRDRAM all use narrower, higher-speed buses, and are simulated serially on a single-width bus. This adds an extra bit of latency, since the simulated memory controller must merge bus packets into properly sized blocks to send over the wider common bus used for the rest of the simulations. To compensate, transfer time over the narrow channel is assumed to be instantaneous.
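Why interleaving matters when one request needs 8 transfers can be sketched with a round-robin bank model. The assumptions are hypothetical: each transfer needs its bank for `t_access` cycles and then the shared data bus for `t_bus` cycles, and a bank cannot start its next access until its data has left on the bus.

```python
def request_time(n_transfers, k_banks, t_access, t_bus):
    """Cycles to complete one request split into n_transfers bus transfers,
    with transfers assigned round-robin to k_banks interleaved banks."""
    bank_free = [0] * k_banks   # cycle when each bank can begin a new access
    bus_free = 0                # cycle when the data bus is next idle
    for j in range(n_transfers):
        bank = j % k_banks
        data_ready = bank_free[bank] + t_access
        start = max(data_ready, bus_free)   # wait for the bus if it is busy
        bus_free = start + t_bus
        bank_free[bank] = bus_free          # bank is held until its data is sent
    return bus_free


for k in (1, 2, 8):
    print(k, request_time(8, k, t_access=4, t_bus=1))  # 1 -> 40, 2 -> 21, 8 -> 12
```

Without interleaving every transfer pays the full access time; with enough banks only the first one does, which is the advantage FPM and EDO would otherwise lack.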

Slide 24

Preliminary Results: Refresh Handling
DRAM refresh can affect performance significantly. All DRAMs except Rambus have a 64ms refresh time; Rambus has a 33ms refresh time and can refresh internal banks individually rather than the whole array at once. This is the reason for observation 5. Since the time-interleaved scheme is so much better, it was used for all of the DRAMs, putting all of the architectures on a more even footing.
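The difference between refreshing the whole array at once and spreading refresh out over time is mostly about worst-case stalls rather than total overhead. The numbers below are assumptions (4096 rows, 100 ns per row refresh, a 64 ms period), and the model spreads refresh row by row rather than using the per-bank scheme described for Rambus.

```python
def refresh_profile(rows, period_ms, t_refresh_ns):
    """Compare refreshing all rows in one burst with spreading the rows evenly
    across the refresh period. Times are in nanoseconds."""
    period_ns = period_ms * 1_000_000
    busy_ns = rows * t_refresh_ns                   # total refresh work per period
    return {
        "overhead_fraction": busy_ns / period_ns,   # same for both schemes
        "burst_worst_stall_ns": busy_ns,            # array unavailable for all of it
        "spread_interval_ns": period_ns / rows,     # one row refreshed this often
        "spread_worst_stall_ns": t_refresh_ns,      # at most one row refresh
    }


print(refresh_profile(rows=4096, period_ms=64, t_refresh_ns=100))
```

With these assumptions both schemes spend 0.64% of the time refreshing, but the burst scheme can stall an access for 409.6 µs, while the spread scheme stalls it for at most 100 ns (one row every 15.625 µs).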

Slide 25

Results: Total Execution Time
Interleaved DRAMs do better (FPM3 & EDO2). The pessimistic FPM1 does better than the optimistic FPM2, since refresh takes slightly longer than row access. Are newer DRAMs having trouble keeping up with CPU speed? Is memory bandwidth really the biggest contributor to DRAM slowdown? Much has been done to increase memory bandwidth, but what about latency?

Slide 26

Results: Performance breakdown
FPM is the slowest. Interleaving helps, as does the pessimistic strategy. EDO uses essentially the same technology as FPM but is faster because of a better architecture. SDRAM is faster still, and ESDRAM is better yet, since it changes timing and adds an SRAM cache to improve concurrency. SLDRAM and Rambus have higher access times than SDRAM and ESDRAM because of bus packing. SLDRAM and RDRAM make twice as many data transfers as DRDRAM, and if "… they had been designed… to put them on an even footing with DRDRAM… their latencies would be 20 to 30% lower." "The parallel-channel results demonstrate the failure of a 100MHz 128-bit bus to keep up with today's fastest parts."

Slide 27

Results: Parallel-channel DRAM and bandwidth
The parallel-channel architectures (SLDRAM, RDRAM and DRDRAM) have a significantly larger proportion of their access time tied up in Bus Transmission Time. Speeding up the bus would make them run faster, and bus speeds have increased fourfold since this paper was written. What is the effect, though? With Bus Transmission Time reduced, latency becomes the largest proport…
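The effect of a faster bus on the access-time breakdown can be estimated with Amdahl-style arithmetic. In this sketch the 40/60 split between latency and bus transmission time is a hypothetical example, and the 4× speedup mirrors the fourfold bus improvement mentioned above.

```python
def latency_share(t_latency, t_bus, bus_speedup):
    """Fraction of access time due to latency after the bus is sped up."""
    t_bus_new = t_bus / bus_speedup
    return t_latency / (t_latency + t_bus_new)


print(f"{latency_share(40, 60, 1):.2f}")  # before: 0.40
print(f"{latency_share(40, 60, 4):.2f}")  # after a 4x faster bus: 0.73
```

Total access time falls from 100 to 55 units, but the unchanged latency now accounts for most of it, so each further bus speedup yields less.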
