Although very early DRAMs included a buffer allowing multiple column accesses to a single row, without requiring a new row access, they used an asynchronous interface, which meant that every column access and transfer involved overhead to synchronize with the controller. In the mid-1990s, designers added a clock signal to the DRAM interface so that the repeated transfers would not bear that overhead, thereby creating synchronous DRAM (SDRAM). In addition to reducing overhead, SDRAMs allowed the addition of a burst transfer mode where multiple transfers can occur without specifying a new column address. Typically, eight or more 16-bit transfers can occur without sending any new addresses by placing the DRAM in burst mode. The inclusion of such burst mode transfers has meant that there is a significant gap between the bandwidth for a stream of random accesses versus access to a block of data.
To get more bandwidth from the memory as DRAM density increased, DRAMs were made wider. Initially, they offered a four-bit transfer mode; in 2017, DDR2, DDR3, and DDR DRAMs had 4-, 8-, or 16-bit buses.
In the early 2000s, a further innovation was introduced: double data rate (DDR), which allows a DRAM to transfer data both on the rising and the falling edge of the memory clock, thereby doubling the peak data rate.
Finally, SDRAMs introduced banks to help with power management, improve access time, and allow interleaved and overlapped accesses to different banks.
2.2 Memory Technology and Optimizations ■ 87
Access to different banks can be overlapped with each other, and each bank has its own row buffer. Creating multiple banks inside a DRAM effectively adds another segment to the address, which now consists of bank number, row address, and column address. When an address is sent that designates a new bank, that bank must be opened, incurring an additional delay. The management of banks and row buffers is completely handled by modern memory control interfaces, so that when a subsequent access specifies the same row for an open bank, the access can happen quickly, sending only the column address.
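The three-part address described above can be sketched in a few lines of Python. The field widths and the ordering of the fields within the address are hypothetical, chosen only for illustration; real memory controllers use device-specific mappings.

```python
from typing import NamedTuple

# Hypothetical field widths, chosen only for illustration.
BANK_BITS, ROW_BITS, COL_BITS = 3, 15, 10


class DramAddress(NamedTuple):
    bank: int
    row: int
    column: int


def split_address(addr: int) -> DramAddress:
    """Split a flat address into (bank, row, column) fields.

    Assumed layout, from most to least significant bits:
    bank | row | column.
    """
    column = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return DramAddress(bank, row, column)


# Build an address for bank 2, row 7, column 5 and split it again.
example = split_address((2 << 25) | (7 << 10) | 5)
```

With this layout, two addresses that differ only in their low 10 bits fall in the same row of the same bank, so the second access needs only a column address.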
To initiate a new access, the DRAM controller sends a bank and row number (called Activate in SDRAMs and formerly called RAS, for row select). That command opens the row and reads the entire row into a buffer. A column address can then be sent, and the SDRAM can transfer one or more data items, depending on whether it is a single item request or a burst request. Before accessing a new row, the bank must be precharged. If the new row is in the same bank, then the precharge delay is seen; however, if the row is in another bank, closing the row and precharging can overlap with accessing the new row. In synchronous DRAMs, each of these command cycles requires an integral number of clock cycles.
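The command sequence above can be illustrated with a toy latency model. This is a minimal sketch, not a description of any real controller: it keeps one open row per bank, ignores overlap between commands, and takes its timings from the DDR4 row of Figure 2.4, with the precharge time assumed to be the difference between the two totals (39 ns and 26 ns). The class and field names are our own.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DramTimingModel:
    """Toy latency model: one row buffer per bank, no command overlap."""
    t_activate_ns: float = 13.0   # RAS time, DDR4 row of Figure 2.4
    t_column_ns: float = 13.0     # CAS time, DDR4 row of Figure 2.4
    t_precharge_ns: float = 13.0  # assumed: 39 ns - 26 ns in Figure 2.4
    open_rows: Dict[int, int] = field(default_factory=dict)  # bank -> open row

    def access(self, bank: int, row: int) -> float:
        """Return the latency in ns of one access and update the row buffers."""
        open_row = self.open_rows.get(bank)
        if open_row == row:
            # Row already in the buffer: only the column address is sent.
            return self.t_column_ns
        latency = self.t_activate_ns + self.t_column_ns
        if open_row is not None:
            # A different row is open in this bank: precharge first.
            latency += self.t_precharge_ns
        self.open_rows[bank] = row
        return latency


model = DramTimingModel()
trace = [(0, 5), (0, 5), (0, 9), (1, 9)]  # (bank, row) pairs
latencies = [model.access(bank, row) for bank, row in trace]
# latencies == [26.0, 13.0, 39.0, 26.0]
```

The third access pays the full precharge penalty because it targets a new row in a bank that already has a row open; the fourth goes to an idle bank and sees only the activate and column delays.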
From 1980 to 1995, DRAMs scaled with Moore’s Law, doubling capacity every 18 months (or a factor of 4 in 3 years). From the mid-1990s to 2010, capacity increased more slowly, with roughly 26 months between doublings. From 2010 to 2016, capacity only doubled! Figure 2.4 shows the capacity and access time for various generations of DDR SDRAMs. From DDR1 to DDR3, access times improved by a factor of about 3, or about 7% per year. DDR4 improves power and bandwidth over DDR3, but has similar access latency.
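To compare these three periods on a common scale, the doubling times can be converted into annual growth rates. The short sketch below assumes steady exponential growth within each period; the function names are ours.

```python
def growth_factor(doubling_months: float, years: float) -> float:
    """Capacity multiplier after `years` at a fixed doubling time."""
    return 2 ** (12 * years / doubling_months)


def annual_growth(doubling_months: float) -> float:
    """Fractional growth per year implied by a fixed doubling time."""
    return growth_factor(doubling_months, 1) - 1


four_x = growth_factor(18, 3)   # doubling every 18 months, over 3 years
rate_1980s = annual_growth(18)  # 1980 to 1995
rate_2000s = annual_growth(26)  # mid-1990s to 2010
rate_2010s = annual_growth(72)  # one doubling from 2010 to 2016
```

Under this assumption, doubling every 18 months corresponds to roughly 59% per year, doubling every 26 months to roughly 38% per year, and a single doubling in six years to roughly 12% per year.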
As Figure 2.4 shows, DDR is a sequence of standards. DDR2 lowers power from DDR1 by dropping the voltage from 2.5 to 1.8 V and offers higher clock rates: 266, 333, and 400 MHz. DDR3 drops voltage to 1.5 V and has a maximum clock speed of 800 MHz. (As we discuss in the next section, GDDR5 is a graphics RAM and is based on DDR3 DRAMs.) DDR4, which shipped in volume in early 2016, but was expected in 2014, drops the voltage to 1–1.2 V and has a maximum expected clock rate of 1600 MHz. DDR5 is unlikely to reach production quantities until 2020 or later.
Production year  Chip size  DRAM type  RAS time (ns)  CAS time (ns)  Best-case total, no precharge (ns)  Total, precharge needed (ns)
2000             256M bit   DDR1       21             21             42                                  63
2002             512M bit   DDR1       15             15             30                                  45
2004             1G bit     DDR2       15             15             30                                  45
2006             2G bit     DDR2       10             10             20                                  30
2010             4G bit     DDR3       13             13             26                                  39
2016             8G bit     DDR4       13             13             26                                  39
Figure 2.4 Capacity and access times for DDR SDRAMs by year of production. Access time is for a random memory word and assumes a new row must be opened. If the row is in a different bank, we assume the bank is precharged; if the row is not open, then a precharge is required, and the access time is longer. As the number of banks has increased, the ability to hide the precharge time has also increased. DDR4 SDRAMs were initially expected in 2014, but did not begin production until early 2016.
88 ■ Chapter Two Memory Hierarchy Design
With the introduction of DDR, memory designers increasingly focused on bandwidth, because improvements in access time were difficult. Wider DRAMs, burst transfers, and double data rate all contributed to rapid increases in memory bandwidth. DRAMs are commonly sold on small boards called dual inline memory modules (DIMMs) that contain 4–16 DRAM chips and that are normally organized to be 8 bytes wide (+ ECC) for desktop and server systems. When DDR SDRAMs are packaged as DIMMs, they are confusingly labeled by the peak DIMM bandwidth. Therefore the DIMM name PC3200 comes from 200 MHz × 2 × 8 bytes, or 3200 MiB/s; it is populated with DDR SDRAM chips. Sustaining the confusion, the chips themselves are labeled with the number of bits per second rather than their clock rate, so a 200 MHz DDR chip is called a DDR400. Figure 2.5 shows the relationships among I/O clock rate, transfers per second per chip, chip bandwidth, chip name, DIMM bandwidth, and DIMM name.
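The naming arithmetic can be checked with a short Python sketch. It assumes the 8-byte-wide DIMM described above and two transfers per clock; the function name is ours, and the sketch reports unrounded bandwidth, whereas some DIMM names in Figure 2.5 use a rounded value.

```python
from typing import Tuple


def ddr_rates(io_clock_mhz: int, dimm_width_bytes: int = 8) -> Tuple[int, int]:
    """Return (M transfers/s per chip, peak DIMM bandwidth).

    Double data rate moves data on both clock edges, so the transfer
    rate is twice the I/O clock; the DIMM bandwidth is that rate times
    the DIMM width in bytes, in the units used by Figure 2.5.
    """
    transfers = 2 * io_clock_mhz
    dimm_bandwidth = transfers * dimm_width_bytes
    return transfers, dimm_bandwidth


transfers, bandwidth = ddr_rates(200)
chip_name = f"DDR{transfers}"   # "DDR400"
dimm_name = f"PC{bandwidth}"    # "PC3200"
```

For a 133 MHz clock the same function gives 266 M transfers/s and a bandwidth figure of 2128, which the DIMM name rounds to PC2100.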
Reducing Power Consumption in SDRAMs
Power consumption in dynamic memory chips consists of both dynamic power used in a read or write and static or standby power; both depend on the operating voltage. In the most advanced DDR4 SDRAMs, the operating voltage has dropped to 1.2 V, significantly reducing power versus DDR2 and DDR3 SDRAMs. The addition of banks also reduced power because only the row in a single bank is read.
Standard  I/O clock rate (MHz)  M transfers/s  DRAM name  MiB/s/DIMM  DIMM name
DDR1      133                   266            DDR266     2128        PC2100
DDR1      150                   300            DDR300     2400        PC2400
DDR1      200                   400            DDR400     3200        PC3200
DDR2      266                   533            DDR2-533   4264        PC4300
DDR2      333                   667            DDR2-667   5336        PC5300
DDR2      400                   800            DDR2-800   6400        PC6400
DDR3      533                   1066           DDR3-1066  8528        PC8500
DDR3      666                   1333           DDR3-1333  10,664      PC10700
DDR3      800                   1600           DDR3-1600  12,800      PC12800
DDR4      1333                  2666           DDR4-2666  21,300      PC21300
Figure 2.5 Clock rates, bandwidth, and names of DDR DRAMs and DIMMs in 2016. Note the numerical relationship between the columns. The third column is twice the second, and the fourth uses the number from the third column in the name of the DRAM chip. The fifth column is eight times the third column, and a rounded version of this number is used in the name of the DIMM. DDR4 saw significant first use in 2016.
In addition to these changes, all recent SDRAMs support a power-down mode, which is entered by telling the DRAM to ignore the clock. Power-down mode disables the SDRAM, except for internal automatic refresh (without which entering power-down mode for longer than the refresh time will cause the contents of memory to be lost). Figure 2.6 shows the power consumption for three situations in a 2 GB DDR3 SDRAM. The exact delay required to return from low power mode depends on the SDRAM, but a typical delay is 200 SDRAM clock cycles.
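To put the 200-cycle figure in perspective, the conversion to time can be sketched as follows. The clock rates are taken from the DDR3 rows of Figure 2.5; treating the "SDRAM clock" as the I/O clock is our assumption, and the helper name is ours.

```python
def cycles_to_ns(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count at a given clock rate (MHz) to nanoseconds."""
    return cycles * 1000.0 / clock_mhz


wake_ddr3_1066 = cycles_to_ns(200, 533)  # about 375 ns at a 533 MHz clock
wake_ddr3_1600 = cycles_to_ns(200, 800)  # 250 ns at an 800 MHz clock
```

Under that assumption, returning from power-down costs several hundred nanoseconds, roughly ten times the 26 ns best-case access time that Figure 2.4 lists for DDR3.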
Graphics Data RAMs
GDRAMs or GSDRAMs (Graphics or Graphics Synchronous DRAMs) are a special class of DRAMs based on SDRAM designs but tailored for handling the higher bandwidth demands of graphics processing units. GDDR5 is based on DDR3 with earlier GDDRs based on DDR2. Because graphics processor units (GPUs; see Chapter 4) require more bandwidth per DRAM chip than CPUs, GDDRs have several important differences:
1. GDDRs have wider interfaces: 32 bits versus 4, 8, or 16 in current designs.
2. GDDRs have a higher maximum clock rate on the data pins. To allow a higher transfer rate without incurring signaling problems, GDRAMs normally connect directly to the GPU and are attached by soldering them to the board, unlike DRAMs, which are normally arranged in an expandable array of DIMMs.
Altogether, these characteristics let GDDRs run at two to five times the bandwidth per DRAM versus DDR3 DRAMs.
[Figure 2.6: bar chart of power in mW (axis from 0 to 600) for low power mode, typical usage, and fully active operation; each bar is divided into background power, activate power, and read, write, terminate power.]
Figure 2.6 Power consumption for a DDR3 SDRAM operating under three conditions: low-power (shutdown) mode, typical system mode (DRAM is active 30% of the time for reads and 15% for writes), and fully active mode, where the DRAM is continuously reading or writing. Reads and writes assume bursts of eight transfers. These data are based on a Micron 1.5V 2GB DDR3-1066, although similar savings occur in DDR4 SDRAMs.