DRAM
DRAM Circuits
Organization
Interfaces
Shih-Lien Lu
IEEE Microarchitecture Conference 2016
Acknowledgement: Dr. Shigeki Tomishima
© 2015 TSMC, Ltd
Agenda
• Introduction (5 minutes)
• DRAM basic and principle (10 minutes)
  - Cell + layout + technology
• Array structure (10 minutes)
  - Bitline + subarrays + banks
  - Circuit elements + timing
• DRAM interface (25 minutes)
  - (LP/G) DDRx
  - WIO + HBM
  - Specialty DRAM
  - Refresh
• Scaling and trend (5 minutes)
• Summary and research direction + Q&A (5 minutes)
Cost of Memory with Time
[Figure: memory price ($/MB, log scale from 1.E-03 to 1.E+09) versus year (1955 to 2020), with technology eras marked: flip-flops, core, ICs on boards, SIMMs, DIMMs]
http://www.jcmit.com/memoryprice.htm
DRAM Scaling - Capacity
http://www.memcon.com/pdfs/proceedings2014/MOB102.pdf
Introduction
• Memory is a critical component
• For more than 4 decades DRAM has been the key technology to implement main memory
  - Many amazing innovations lead to cost-per-bit reduction
    - 143M-fold in 41 years (cost halves every 18 months on average)
  - Standardization of interface
• Recent trends
  - Market segmentation
  - Technology challenges
    - Scaling slowing down?
    - Power?
  - DRAM internal modification?
  - Interface diversification + 3D
  - Reliability enhancement
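The "143M-fold in 41 years" figure and the "every 18 months" halving period can be cross-checked with a few lines of arithmetic. This is only a sketch of that check; the function name is made up for illustration.

```python
import math

def halving_period_months(fold_reduction: float, years: float) -> float:
    """Months per 2x cost reduction, given a total fold reduction over a span of years."""
    halvings = math.log2(fold_reduction)  # how many times the cost was cut in half
    return years * 12.0 / halvings

# Values quoted on the slide: 143M-fold reduction over 41 years.
period = halving_period_months(143e6, 41)
print(f"{period:.1f} months per halving")
```

The result lands close to 18 months, which is consistent with the slide's rounded claim.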
DRAM – Starting Out
• An important invention
  - 1968 patent by R. H. Dennard
    - 1T1C cell
  - 1970 W. Regitz ISSCC paper
    - 3T cell
• Intel 1103 (3T DRAM)
  - Introduced 1970
  - PMOS based
  - 1st commercial 1Kb DRAM
  - Widely used by HP 9800 and PDP-11
[Figure: Intel Basic RAM Dynamic Cell (3T cell) with Data In, Data Out, Read WL and Write WL]
K. Itoh, IEEE SSCS News, Winter 2008
G. Hendrie, “Oral History of Joel Karp,” Computer History Museum 2003
W. Regitz, J. Karp, “A three transistor-cell, 1024-bit, 500 NS MOS RAM,” IEEE JSSC 1970
The Intel Memory Design Handbook, August 1973
DRAM Development Innovations
• Memory Cell
  - Planar Cap. + Planar Tr.
  - Trench Cap. + Planar Tr.
  - Stacked Cap. + Planar Tr.
  - Stacked Cap. + Recessed Tr. (RCAT-SRCAT-URCAT)
• Cell Area
  - 22F², 16F², 8F², 6F², 4F² (Elpida)
• Sub-Array Architecture Technology
  - Open BL, then Folded BL, then Open BL
• Interface Technology
  - Asynchronous
  - Synchronous: DDRx, LPDDRx, GDDRx, HBM, WIO, WIO2 (LRDIMM, FBDIMM, RDIMM)
Many Innovations + Much Hard Work
Cross Section View - Cartoon
• Two areas
  - Memory array area
  - Standard logic area
[Figure: cartoon cross section of the array and peripheral regions. Labels: M3, V2, M2, V1, M1, Cell Plate, Cell Capacitors, CH (W), M0 (W), M0 CH, BL, Poly Gate, WL, STI, Si Surface (Diffusion), Si Sub]
DRAM Cross Section (1)
http://www.maltiel-consulting.com/Hynix-DRAM-31Vs44nm-layout.html
DRAM Cross Section (2)
http://www.eetimes.com/document.asp?doc_id=1281315
DRAM Cross Section (3)
http://www.ma-tek.com/industry_detail.php?cpath=23
Cell Operation
• Destructive read
  - Precharge bitline (BL)
  - Fire wordline (WL)
  - Develop a V diff
    - Vb = (Cb·Vp + Cs·Vs)/(Cs + Cb)
    - ΔVb = Vb − Vp is large enough for sensing
  - Write back
• Direct write
  - BL driven to high or low then forces in cell
• Parameters
  - Vcca = 1.2V
  - Half Vcc = 0.6V
  - Vccp = 3.2V
  - Vbb = -0.6V
  - Vnwl = -0.3V
[Figure: 1T1C cell with typical voltages: BL (1/2 Vcca ~ 0V or Vcca), WL (Vnwl ~ Vccp), CP (1/2 Vcca), P-Well (Vbb)]
• Capacitor
  - Ta2O5
  - 25-30fF
• Access Transistor Characteristic
  - Vt = ~0.9V
  - Id = ~10uA@cell
  - Ioff = ~10fA@cell
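The charge-sharing formula above can be evaluated directly. The sketch below uses the slide's cell capacitance (25 fF) and half-Vcc precharge (0.6 V); the bitline capacitance of 125 fF is an assumed value, not given on the slide, chosen only to illustrate a signal of about 100 mV.

```python
def bitline_voltage(cs: float, cb: float, vs: float, vp: float) -> float:
    """Vb = (Cb*Vp + Cs*Vs) / (Cs + Cb): bitline voltage after charge sharing."""
    return (cb * vp + cs * vs) / (cs + cb)

CS = 25e-15    # cell capacitance, from the slide (25-30 fF)
CB = 125e-15   # bitline capacitance: ASSUMED for illustration, not from the slide
VP = 0.6       # half-Vcc bitline precharge
VCCA = 1.2     # array supply

delta_one = bitline_voltage(CS, CB, VCCA, VP) - VP   # cell stored "1" (Vs = Vcca)
delta_zero = bitline_voltage(CS, CB, 0.0, VP) - VP   # cell stored "0" (Vs = 0 V)

print(f"dVb for stored 1: {delta_one * 1e3:+.0f} mV")
print(f"dVb for stored 0: {delta_zero * 1e3:+.0f} mV")
```

With these numbers the bitline moves by the same magnitude in either direction, which is the point of precharging to half Vcc.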
Cell Operation
• Read
  - Cross-couple latch SA
    - Timed enabling
    - Detect ~100mV diff
    - Vth imbalance sensitive
  - Half-Vcc precharge
    - With equalization
• Write
  - Isolated by CSL
  - Cannot write all cells
  - Takes time to restore
  - Without full restoration cell charge deteriorates
• Refresh
  - Row address refresh
    - Read
    - Write back
  - Many rows each AR
[Figure: a simplified ckt for folded bitline architecture: BL/BL# with cells, EQ to Vcc/2, ISO, SA driven by SAP/SAN, CL_S, LDQ/LDQ#]
Data Write Sequence
1. BL (H)
2. WL (H) – On
3. WL (L) – Off
4. BL (M)
Data Read Sequence
1. BL (M)
2. WL (H) – On
3. BL (M+a) ; a=ΔVBL
4. WL (L) – Off
DRAM Internal Timing
(Micron TN-40-03: DDR4 Networking Design Guide)
[Figure: internal timing waveform marking isolate, equalize and SA firing (WL likely under driven), next to the simplified folded-bitline circuit: BL/BL# with cells, EQ to Vcc/2, ISO, SA driven by SAP/SAN, CL_S, LDQ/LDQ#]
Open Bitline Circuit and Timing
[Figure: open-bitline circuit with sense amp (S. A.) and main amplifier; timing waveform marking SA firing and column select]
M. Inoue et al., “A 16-Mbit DRAM with a Relaxed Sense-Amplifier-Pitch Open-Bit-Line Architecture,” JSSC 1988
Physical Layout
[Figure: cell layouts with wordlines WL0 to WL5 and bitlines BL0 to BL5 for the 6F2 cell (open BL architecture) and the 8F2 cell (folded BL architecture); dimension labels 3F and 2F]
T. Takahashi et al., “A Multigigabit DRAM Technology With 6F2 Open-Bitline Cell, Distributed Overdriven Sensing, and Stacked-Flash Fuse,” JSSC 2001
S. Lu et al., “Improving DRAM Latency with Dynamic Asymmetric Subarray,” IEEE Symp. on Microarchitecture, 2015
DRAM Device Block Diagram
Data IO
Core Array
CLK/CMD Inputs
Address Inputs
Y-Control (Column)
2 Meg x 4 Memory Array with SDR and DDR Interface
SRC: Micron TN4605.pdf
Core Array Example
1Gb DDR3 Internal Organization
• Each bank 128Mb (8x128Mb=1Gb)
  - 64Mb half bank: 16+1 4Mb subarrays
  - Col dec per 64Mb half bank; row clock/addr spine and IO/pads area between the 64Mb blocks
• 4Mb subarray w/16 tiles (tiles 0 to 15)
  - 512b x 16 = 8Kb page
  - 512 x 8k = 4M
• Tile: 512 WLs x 512 BLs = 256Kb
  - 256 SA (256 BLs) on each side, 512 of bit/bit#
  - 256 LW on each side, driven from MWL
  - SA Cont and Global RD at the tile corners
[Figure: floorplan of the 1Gb chip, one bank, one 4Mb subarray and one 256Kb tile]
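The hierarchy on this slide multiplies out from tile to chip. The sketch below redoes that arithmetic using only the counts quoted above (the 16+1 redundant subarray is left out); the variable names are illustrative.

```python
KBIT = 1024
MBIT = 1024 * KBIT
GBIT = 1024 * MBIT

tile_bits = 512 * 512                # 512 WLs x 512 BLs
subarray_bits = 16 * tile_bits       # 16 tiles per subarray
half_bank_bits = 16 * subarray_bits  # 16 logical subarrays per half bank
bank_bits = 2 * half_bank_bits       # two half banks per bank
chip_bits = 8 * bank_bits            # 8 banks per chip
page_bits = 512 * 16                 # one wordline across all 16 tiles

print(tile_bits // KBIT, "Kb tile")
print(subarray_bits // MBIT, "Mb subarray")
print(half_bank_bits // MBIT, "Mb half bank")
print(bank_bits // MBIT, "Mb bank")
print(chip_bits // GBIT, "Gb chip")
print(page_bits // KBIT, "Kb page")
```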
Subarrays – Half Bank with Redundancy
• Subarray 0 to Subarray 15: physically 516 WLs each, with 2 dummy WLs at each edge
• Physically 8704 (8192+512)
• 8256 WLs (physically) - 8192 WLs (logically) = +64 (redundancy)
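The row redundancy numbers follow from the per-subarray wordline count. A minimal sketch of that bookkeeping, using only the figures on the slide (names are illustrative):

```python
SUBARRAYS = 16
PHYSICAL_WLS_PER_SUBARRAY = 516   # from the slide
LOGICAL_WLS_PER_SUBARRAY = 512    # 8192 logical WLs spread over 16 subarrays

physical_wls = SUBARRAYS * PHYSICAL_WLS_PER_SUBARRAY
logical_wls = SUBARRAYS * LOGICAL_WLS_PER_SUBARRAY
spare_wls = physical_wls - logical_wls
spare_per_subarray = PHYSICAL_WLS_PER_SUBARRAY - LOGICAL_WLS_PER_SUBARRAY
row_overhead = spare_wls / logical_wls

print(physical_wls, logical_wls, spare_wls, spare_per_subarray)
print(f"row redundancy overhead: {row_overhead:.2%}")
```

So each subarray carries 4 spare wordlines, and the spare rows cost under 1% of the array.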
Subarrays/Tiles
[Figure: open-bitline subarray stacking. Normal subarrays: subarray i and subarray i+1 share a row of SAs, each SA taking BL from one subarray and BL# from its neighbor, so each SA row is half a row buffer. Edge subarrays: Subarray 0 and Subarray 0' sit at the ends of the stack (Subarray 0 to Subarray 9 shown); labels T and B alternate along the stack.]
Cells to Chip
OPEN BL Architecture
[Figure: data path from cell to pin. Memory array (Subarray 0 on BL, Subarray 1 on BL#, WL0 to WL524) → SA → Local Amp (LA) → Global Amp (GA) → Tx/Rx → DQ of the DRAM chip. RED: true path. BLUE: complement path.]
Even if data is “1” at DQ, the physical charge in the memory cell depends on which subarray holds the cell.
Detailed Signal Path
[Figure: signal path through Subarray0 to Subarray15 in an open-bitline array with half-normal/half-dummy edge subarrays (HVcc). A cell on WLa connects to BL and a cell on WLb connects to BL#. Path: BL/BL# → SA → CS0 → LIO/LIO# → 1st Amp → GIO/GIO# → 2nd Amp → Rd (True) → Tx → DQ; write path: DQ → Rx → Wd (True) → Write Driver. RED: true. BLUE: bar.]
       Cell  BL  /BL  LIO  /LIO  GIO  /GIO  Rd  DQ
WLa    H     H   L    H    L     H    L     H   H
WLb    L     H   L    H    L     H    L     H   H
Data Status at DQ Pins is not always equal to Data Status at Cells.
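The statement that DQ data and cell data can differ can be modeled as a single inversion: a cell attached to the complement bitline (BL#) stores the opposite level of the logical bit. This is a simplified sketch of that idea only; the function name is hypothetical, and real devices add address and data scrambling on top.

```python
def stored_cell_level(dq_bit: int, cell_on_complement_bitline: bool) -> int:
    """Level physically stored in the cell (1 = charged high) for a logical DQ bit.

    Simplified model: cells on BL# hold the inverse of the logical data.
    """
    return dq_bit ^ int(cell_on_complement_bitline)

for dq in (0, 1):
    for on_bar in (False, True):
        side = "BL#" if on_bar else "BL"
        print(f"DQ={dq}, cell on {side}: stored level {stored_cell_level(dq, on_bar)}")
```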
CSL (Column Select Line) Architecture
• CSL is common to all subarrays
• EX: M3 (AL) : ~2700um
  - Ctotal = ~0.6pF + 10% (Cg)
  - Rtotal = ~700 Ohm
• In the case of x8 DDR3
  - 1K Col dec.
  - 1:128 decode
  - 64 bits come out at each 4Mb w/8 burst
[Figure: CSL routed from the column decoder across the 4Mb subarrays of each 64Mb half bank, with the row control/addr spine and IO/pads area between half banks]
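Two quick checks follow from the numbers on this slide: the RC time constant of the CSL wire, and the number of bits selected by a 1:128 column decode. The sketch assumes an 8Kb page (from the core array example) and treats the line either as one lumped RC or as a uniform distributed line, whose Elmore delay is RC/2.

```python
R_TOTAL = 700.0                 # ohms, from the slide
C_TOTAL = 0.6e-12 * 1.10        # farads: ~0.6 pF wire plus 10% gate load

tau_lumped = R_TOTAL * C_TOTAL  # single lumped RC time constant
elmore_distributed = tau_lumped / 2.0  # Elmore delay of a uniform distributed RC line

PAGE_BITS = 8 * 1024            # ASSUMED: 8Kb page, as in the 1Gb DDR3 example
DECODE_RATIO = 128              # 1:128 column decode
bits_per_access = PAGE_BITS // DECODE_RATIO

print(f"lumped RC: {tau_lumped * 1e12:.0f} ps")
print(f"distributed Elmore delay: {elmore_distributed * 1e12:.0f} ps")
print(f"bits per column access: {bits_per_access} (= x8 DQ * burst 8)")
```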
IO Line Architecture
[Figure: IO line routing. 256Kb tiles (512 WLs x 512 BLs) with 256 LWD and 256 SA on each side, SA Cont and Global RD; LIO pairs (4b) run along the SA bands, GIO pairs (4b) run to the global amplifiers (GA) beside the column decoders. 32 SA bands and 33 4Mb sub-arrays.]
LA : Local Amplifier
- Many circuit varieties
- Voltage Sensing
- Write Path has MUX gate
LIO : Local IO Pair
- Half-Vcc Precharge at Std-by
- After Subarray is selected, Vcc Precharge & Pull-down
GIO : Global IO Pair
- Vcc Precharge & Pull-down
GA : Global Amplifier
- Many variations
- Vcc Precharge & Pull-down
- Voltage/Current Sensing
- Write Path has Write Driver
Circuit Elements and Operation
• Decoder
• Wordline driver
• Column muxing
• Local to global bitline
• Where does it make sense to add logic?
DRAM Timing
• tAC: addr decode, signal devel & sensing, data out
• tRC = tRAS + tRP: addr decode, signal devel & sensing & recovery, precharge
[Figure: internal timing from ACT to PRE. ACT: CMD/ADD decoding, redundancy, activate WL, SA fire; BL pair split to certain % marks tRCD, after which READ or WRITE can be issued. WRITE: CSL fire, SA flip by new write, BL “H” to ~98% (tWR). PRE: WL close, BL equalize. Intervals T1, T2 and T3 are marked on the waveform.]
tRAS > tRCD + T1 + tWR – T2 – T3
DRAM Interface Timing Parameters (1)
• tCL (or tCAS - CAS Latency)
  - This is the most important memory timing. CAS stands for Column Address Strobe. If a row has already been selected, it tells us how many clock cycles we'll have to wait for a result after the memory controller sends a column address to the RAM.
• tCCD (Column Address to Column Address Delay)
  - This is the minimum number of cycles between consecutive column accesses (CAS) to the same row.
• tRCD (Row Address (RAS) to Column Address (CAS) Delay)
  - Once the memory controller sends a row address (through RAS), we'll have to wait this many cycles before accessing one of the row's columns. So, if a row hasn't been selected, we'll have to wait tRCD + tCL cycles to get our result from the RAM.
• tRP (Row Precharge Time)
  - If we already have a row selected, we'll have to wait this number of cycles before selecting a different row. This means it will take at least tRP + tRCD + tCL cycles to access the data in a different row.
• tRAS (Row Active Time)
  - This is the minimum number of cycles that a row has to be active to ensure we'll have enough time to access the information that's in it. This usually needs to be greater than or equal to the sum of two previously defined latencies (tRAS >= tCL + tRCD).
• tRC (Random Cycle Time or Row Cycle Time)
  - This is the time in cycles between two accesses to different rows in the same bank. In other words, the time between two successive ACTIVE commands to the same bank. (tRC = tRAS + tRP, which equals tCL + tRCD + tRP when tRAS is at its minimum of tCL + tRCD.)
• tRRD (Row Active to Row Active Delay)
  - The minimum time interval in cycles between two successive ACTIVE commands to different banks is defined by tRRD.
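The definitions above combine into three read latencies, depending on the state of the target bank. The sketch below is a simplified model under stated assumptions: the timing values (11-11-11-28) are example numbers, not taken from the slides, and the conflict case assumes tRAS of the open row has already been satisfied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timing:
    tCL: int    # CAS latency, cycles
    tRCD: int   # RAS-to-CAS delay, cycles
    tRP: int    # row precharge time, cycles
    tRAS: int   # minimum row active time, cycles

def read_latency(t: Timing, row_state: str) -> int:
    """Cycles from first command to data for a read, by row-buffer state."""
    if row_state == "hit":        # target row already open
        return t.tCL
    if row_state == "closed":     # bank precharged, no row open
        return t.tRCD + t.tCL
    if row_state == "conflict":   # a different row is open in the bank
        return t.tRP + t.tRCD + t.tCL
    raise ValueError(f"unknown row state: {row_state}")

def row_cycle(t: Timing) -> int:
    """tRC = tRAS + tRP: minimum spacing of ACTIVE commands to one bank."""
    return t.tRAS + t.tRP

EXAMPLE = Timing(tCL=11, tRCD=11, tRP=11, tRAS=28)  # ASSUMED example values
for state in ("hit", "closed", "conflict"):
    print(state, read_latency(EXAMPLE, state))
print("tRC", row_cycle(EXAMPLE))
```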
DRAM Interface Timing Parameters (2)
• tWR (Write Recovery Time)
  - This is the number of clock cycles taken between writing data and issuing the pre-charge command. tWR is necessary to guarantee that all data in the write buffer can be safely written to the memory core.
• tRD (Read Delay)
  - This is the number of memory clocks from DRAM Chip Select# assert to data ready.
• tWTR (Write to Read command Delay / Write to Read Delay)
  - This specifies the number of clocks between the last valid write operation and the next read command to the same internal bank.
• tRTW (Read to Write Delay)
  - This is the number of cycles needed to be inserted between a read command and a subsequent write command on a different rank for data turn-around.
• tRTP (Read to Precharge Delay)
  - Number of clocks inserted between a read command and a row pre-charge command to the same rank.
• tFAW (Four Activate Window Time)
  - This specifies the time window in which four activates are allowed in the same rank.
• tRFC (Refresh Cycle Time)
  - This is the number of cycles needed to perform a refresh. As soon as the tRFC time elapses, the memory controller can issue four consecutive Activate commands to different banks in the rank.
• tREFI (Refresh Interval Time)
  - It is the window of time for each refresh command so that a DRAM cell does not lose its charge and corrupt its data. Its value depends on capacity and number of rows in a bank and is measured in microseconds (µs).
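tRRD and tFAW together limit how fast a controller can issue ACTIVE commands to one rank. The following sketch shows one way a scheduler might compute the earliest legal time for the next ACT; the function name and the values tRRD = 4 and tFAW = 20 are assumptions for illustration only.

```python
def earliest_activate(prev_acts: list, tRRD: int, tFAW: int) -> int:
    """Earliest cycle at which the next ACT to this rank may be issued."""
    if not prev_acts:
        return 0
    t = prev_acts[-1] + tRRD              # spacing from the previous ACT
    if len(prev_acts) >= 4:
        # At most four ACTs may fall inside any window of tFAW cycles.
        t = max(t, prev_acts[-4] + tFAW)
    return t

T_RRD, T_FAW = 4, 20                      # ASSUMED example values, in cycles
acts = []
for _ in range(6):
    acts.append(earliest_activate(acts, T_RRD, T_FAW))
print(acts)
```

In this example the first four activates are paced by tRRD, and the fifth has to wait for the tFAW window to expire.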
Example of Internal Change: Asymmetry
• Mixed cell design (technology)
  - Differentiation of reads and writes
  - Static vs dynamic
• Hybrid array (circuit/design)
  - “Improving DRAM latency with dynamic asymmetric subarray,” S. L. Lu et al., MICRO 2015: 255-266
DRAM Interface & Bank vs. Rank
Multiple DRAM devices in parallel for a given rank
Number of devices depends on capacity and width
4 Banks
chip
One row spans multiple DRAM chips
Std Non-ECC
DIMM width
is 64b
SRC: David Wang UMD Thesis
DRAM Std Bandwidth Trends
Parameters            DDR    DDR2   DDR3   DDR4        WIO     WIO2    HBM
Key change                   PF 2X  PF 2X  Bank Group  512 IO  PF 4X   1024* IO
Data Rate (GT/s)      0.4    0.8    1.6    3.2         0.2     0.8+    1.0
Bandwidth (GB/s)      0.8    1.6    3.2    6.4         12.8    51.2    128*
Latency (ns)          >30    30     27.5   27.5        >30     >30     30
Random Access (ns)    13.5   6.75   6.25   5.75        4       NA      4
Active Energy (pJ/b)  150    50     48     34          34      ~10     ~5
Bank No.              4      4      8      8           4       8       8/ch
PF: prefetch
• BW growth of SDR, DDR, DDR2, DDR3 comes from prefetch factor and IO frequency; row address access remains relatively the same
• BW increases from DDRx to WIO to HBM due to wider DQs and higher frequency
SRC: Joe Ting, Piecemakers
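The bandwidth figures on this slide are consistent with bandwidth = data rate x interface width / 8. The sketch below reproduces them under two assumptions that the slide does not state outright: a x16 device for the DDRx parts, and 512 IOs for both WIO and WIO2. The 1024 IOs for HBM are quoted on the slide.

```python
def bandwidth_gb_per_s(data_rate_gt_s: float, io_width_bits: int) -> float:
    """Peak bandwidth in GB/s from per-pin data rate (GT/s) and interface width."""
    return data_rate_gt_s * io_width_bits / 8.0

# (name, data rate in GT/s, interface width in bits)
INTERFACES = [
    ("DDR", 0.4, 16),     # x16 width is an ASSUMPTION
    ("DDR2", 0.8, 16),
    ("DDR3", 1.6, 16),
    ("DDR4", 3.2, 16),
    ("WIO", 0.2, 512),
    ("WIO2", 0.8, 512),   # 512 IOs for WIO2 is an ASSUMPTION
    ("HBM", 1.0, 1024),
]

results = {name: bandwidth_gb_per_s(rate, width) for name, rate, width in INTERFACES}
for name, bw in results.items():
    print(f"{name:5s} {bw:6.1f} GB/s")
```

The DDRx gains come from the data rate alone, while WIO and HBM get most of their gain from width.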
Simplified DRAM Device State Diagram
Idle Mode
Primary Command
(Row Operation)
ACT : Row Activation
PRE : Row Precharge
Row Mode
Column Mode
Secondary Command
(Col. Operation)
READ : Data Read
WRITE : Data Write
Row Mode
Micron DDR3 Datasheet
DDR2/3 Command Truth Table
Row
Col WR
Command and Address pins are DDR
?n-Prefetch of DDRx
• DDR2 is 4n-Prefetch: it has a prefetch buffer of depth 4
• DDR3/4 is 8n-Prefetch: it has a prefetch buffer of depth 8
• LPDDR4 is 16n-Prefetch: it has a prefetch buffer of depth 16
SRC: Micron Technical Note TN-46-05
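Prefetch lets the IO data rate grow while the core array keeps roughly the same column rate: core rate = data rate / prefetch depth. The sketch below applies that relation to three common speed grades. The 2n prefetch for first-generation DDR and the choice of speed grades are assumptions added for illustration; they are not listed on this slide.

```python
def core_rate_mhz(data_rate_mt_s: float, prefetch_n: int) -> float:
    """Column access rate of the core array needed to sustain the IO data rate."""
    return data_rate_mt_s / prefetch_n

# (name, data rate in MT/s, prefetch depth)
GENERATIONS = [
    ("DDR-400", 400, 2),     # 2n prefetch for DDR: ASSUMED, not on the slide
    ("DDR2-800", 800, 4),
    ("DDR3-1600", 1600, 8),
]

core_rates = {name: core_rate_mhz(rate, n) for name, rate, n in GENERATIONS}
for name, mhz in core_rates.items():
    print(f"{name:10s} core column rate {mhz:.0f} MHz")
```

All three land on the same core rate, so each generation doubled the pin bandwidth without speeding up the array.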
DDR READ Example
SRC: Micron Technical Note TN-46-05
LPDDR2/3 Command Encoding
LPDDR4 Command Table
LPDDRx/WIOx/GDDRx
• DDRx chips put in DIMMs usually
• LPDDRx direct bond
  - Wider IO
• WIOx
  - Multiple channels
  - For stacking with APU/CPU directly
• GDDRx
  - Higher frequency
  - Direct bond
High Bandwidth Memory
• Stacked DRAM for graphics and HPC
• Spec overview
  - 2 channels per die (each channel is similar to std DDR)
  - 128b data IOs (DDR) per channel
  - 500MHz – 1GHz clock translates to 16–32 GB/s BW per ch
  - Up to 8 dice stack with each die 8Gb (2nd Gen)
• New features
  - Per-bank refresh
  - Temperature compensated self-refresh
  - DBI
  - ECC support (optional)
• JEDEC Std
• Vs. HMC (Micron version of 3D stacked DRAM)
HBM with 4 DRAM dice and 1 Logic Die
SRC: D.U Lee, JSSC Jan. 2015
Specialty DRAM - RLDRAM (vs. LRDIMM)
• RLDRAM (Reduced Latency DRAM)
  - Low tRC at the cost of density
  - SRAM-like interface (no address multiplexing)
  - RLDRAM-II
    - 576Mb (x9, x18, x36)
    - 400-533 MHz clk
    - BL=2
    - tRC = 15ns (6 or 8 cycles)
    - 4GB/s BW max
• LRDIMM (Load-Reduced DIMM)
  - Larger capacity
  - Multiple
  - Buffered cmd/addr and data
SRC: Micron RLDRAM datasheet and Inphi whitepaper
(https://www.inphi.com/products/whitepapers/Inphi_LRDIMM_whitepaper_Final.pdf)
DRAM Cell Retention
• Leakage paths of a DRAM cell
  1) Sub-threshold leakage
     - Process dependent
     - WL (Vneg) vs. Vth setting vs. GIDL
     - BL in precharged state: “Hi” is better
     - BL swinging: then “Hi” and “Lo” equally probable
  2) Drain leakage
     - Junction profile/voltage/GIDL
     - Vneg makes GIDL worse, which leads to “Hi” being worse
     - “Hi” is worse
  3) Cell capacitor wall leakage
     - Metal-Insulator-Metal (MIM) defect
     - “Hi” and “Lo” same
• Other factors affecting cell retention time:
  1) Process defect
  2) Data restoration
  3) Sense amp offset
[Figure: cell schematic marking leakage paths 1) to 3), with BL, WL (Vneg), plate (1/2Vcc) and P-Well (Vbb)]
1. M. A. Pawlak et al., “Enabling 3X nm DRAM: Record low leakage 0.4 nm EOT MIM capacitors with novel stack engineering,” IEDM 2010
2. K. Kim, “A New Investigation of Data Retention Time in Truly Nanoscaled DRAMs,” IEEE T EDL 2009
3. S. Jin et al., “Prediction of Data Retention Time Distribution of DRAM by Physics-Based Statistical Simulation,” IEEE T EDL 2005
2Gb Mobile LPDDR SDR (src : Micron)
Self Refresh Power = Idd2 (DC including Leak) + Refresh Power
• Full Array SR Power = Idd2 (DC + leak) + Full Array Refresh
• 1/16 SR Power = Idd2 (DC + Leak) + 1/16 Array Refresh
Idd2     Ref. Power   Full Array SR   1/16 Array SR
504uA    1146uA       1650uA          575uA
272uA    458uA        730uA           300uA
186uA    224uA        410uA           200uA
LPDDR has TCSR
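The self-refresh model on this slide, a fixed Idd2 term plus a refresh term that scales with the refreshed fraction of the array, can be checked against the quoted currents. This is a sketch of that check; the pairing of the 1/16-array values with each operating point is inferred from the arithmetic rather than stated on the slide.

```python
def self_refresh_current_ua(idd2_ua: float, full_refresh_ua: float, array_fraction: float) -> float:
    """SR current = Idd2 (DC + leakage) + refresh current scaled by the refreshed fraction."""
    return idd2_ua + full_refresh_ua * array_fraction

# (Idd2 in uA, full-array refresh current in uA), from the slide
OPERATING_POINTS = [(504, 1146), (272, 458), (186, 224)]

full_array = [self_refresh_current_ua(i, r, 1.0) for i, r in OPERATING_POINTS]
one_sixteenth = [self_refresh_current_ua(i, r, 1.0 / 16) for i, r in OPERATING_POINTS]

for full, part in zip(full_array, one_sixteenth):
    print(f"full array: {full:.0f} uA, 1/16 array: {part:.0f} uA")
```

Partial-array self refresh removes most of the refresh term, but the Idd2 floor remains.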
DRAM Refresh
• Needs to issue refresh periodically (tREFI) and each time refresh takes tRFC
  - tRFC/tREFI unavailable (ex. 350/7800=4.5%)
• AR vs. SR (SDRAM)
  - AR (auto-refresh) : issue AR command (RAS/CAS/CS all asserted), no address needed (internal counter)
    - Opened rows are precharged before AR issued
    - Per-bank (LPDDRx) vs. all-bank
  - SR (self-refresh) : DRAM enters/exits SR mode
    - All banks pre-charged before entering
    - CKE low and RAS/CAS/CS low and WE high
tRFC (refresh cycle time – depends on DRAM chip density, ~350ns for 8Gb)
tREFI (refresh interval – retention_time/refreshes)
(e.g. 64ms/8192 = 7.8µs)
Read/write access ~50ns
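The refresh overhead quoted above is just tRFC divided by tREFI. A minimal sketch using the slide's numbers (64 ms retention, 8192 refresh commands, tRFC of about 350 ns for 8Gb):

```python
RETENTION_S = 64e-3       # retention window, from the slide
REFRESH_COMMANDS = 8192   # refresh commands per retention window
T_RFC_S = 350e-9          # refresh cycle time for an 8Gb device, from the slide

t_refi_s = RETENTION_S / REFRESH_COMMANDS   # average interval between refreshes
unavailable_fraction = T_RFC_S / t_refi_s   # share of time the rank is busy refreshing

print(f"tREFI = {t_refi_s * 1e6:.2f} us")
print(f"refresh overhead = {unavailable_fraction:.1%}")
```

Because tRFC grows with chip density while tREFI stays fixed, this fraction rises with each capacity step.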
DRAM Scaling
• Challenges
  - Patterning
    - Utilized multiple patterning before logic
    - Manual crafted design
  - Capacitor
    - Honeycomb structure (Samsung)
  - Transistor - DRAM has two parts
    - Peripheral circuits – logic
    - Array
      - RCAT (recessed channel array transistor)
      - Saddle-Fin and buried word-line
      - Vertical gate?
        » Floating body effect (GIDL)
        » Retention degradation (off-leakage)
SRC: Sungjoo Hong, IEDM 2010
Capacitor Scaling
• Capacitance
  - C = εc·ε0·A/t
  - Very thin hi-k dielectric thickness
• Cell capacitor leakage
  - To plate
  - To neighboring cells
    - 6F2 cell -> 2F tight pitch
    - Example of 20nm tech.
      - Pitch is 40nm
        » SN contact to SN contact
      - Diffusion separation
        » by deep trench - 20nm
      - Contact margin to SN diffusion
        » ~10nm
[Figure: storage-node spacing at 20nm technology; dimension labels 20nm, 40nm, 40nm and <2nm]
W. Muller et al., “Challenges for the DRAM Cell Scaling to 40nm”
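The parallel-plate relation above shows why the storage capacitor has to become a tall 3D structure. The sketch below estimates the plate area needed for a 25 fF cell and compares it with a 6F2 footprint at F = 20 nm. The relative permittivity (25) and dielectric thickness (5 nm) are assumed round numbers for illustration, not values from the slides.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

def plate_capacitance(k: float, area_m2: float, thickness_m: float) -> float:
    """C = k * eps0 * A / t for a parallel-plate capacitor."""
    return k * EPS0 * area_m2 / thickness_m

def required_area(c_farads: float, k: float, thickness_m: float) -> float:
    """Plate area needed to reach a target capacitance."""
    return c_farads * thickness_m / (k * EPS0)

C_TARGET = 25e-15    # target cell capacitance (25 fF, from the cell operation slide)
K_ASSUMED = 25.0     # ASSUMED relative permittivity of the dielectric
T_ASSUMED = 5e-9     # ASSUMED dielectric thickness
F = 20e-9            # feature size for the 20nm example

area = required_area(C_TARGET, K_ASSUMED, T_ASSUMED)
footprint = 6 * F * F            # planar footprint of a 6F2 cell
ratio = area / footprint

print(f"required plate area: {area * 1e12:.3f} um^2")
print(f"6F2 footprint:       {footprint * 1e12:.4f} um^2")
print(f"area / footprint:    {ratio:.0f}x")
```

Under these assumptions the capacitor needs a couple of hundred times more surface than the cell occupies, which is what pushes designs toward high-aspect-ratio stacks and thinner, higher-k dielectrics.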
Scaling and Trend
• Challenges
  - Interconnect
    - Capacitance of bitline – Cb/Cs ratio
      - Low-k spacer materials or air-gap?
    - Resistivity of bitline
  - Power/energy (active and stdby)
    - Density leads to over activation
  - Reliability
    - Variable retention
    - Disturb
    - Delay/timing
  - Security
    - Flipping bits
    - Data remanence
Redundancy (1)
• Essential for yield improvement
  - Both row and column redundancy were employed early
[Figure: redundancy schemes compared, with panels labeled simultaneous replacement, individual subarray replacement and flexible intra-subarray replacement redundancy; one scheme is marked conventional]
Masashi Horiguchi, “Redundancy Techniques for High-Density DRAMs,” IEEE Int. Conf. Innovative Systems Silicon, 1997
Redundancy (2) - Row
S. Takase and N. Kushiyama, “A 1.6-GByte/s DRAM with Flexible Mapping Redundancy Technique and Additional Refresh Scheme,” JSSC 1999
On-Die ECC
• First published paper
  - “A 50-ns 16-Mb DRAM with a 10-ns Data Rate and On-Chip ECC” by Howard Kalter et al. from IBM (JSSC 1990)
  - Synergistic fault tolerant approach with row/column redundancy
• LPDDR4 (1st commodity DRAM) adopted on-die ECC
  - “A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM With Integrated ECC Engine for Sub-1 V DRAM Core Operation” by Tae-Young Oh et al. from Samsung (JSSC 2015)
  - SEC (not SECDED) (136, 128) code
    - Overhead is 6.25% in core array
    - With array efficiency ~50% the overhead is ~3%
    - Encoding overhead is 3ns (tWR from 15 to 18ns) vs LPDDR3
    - Decoding overhead is 2.5ns (RL from 15 to 17.5ns) vs LPDDR3
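The (136, 128) code size follows from the Hamming bound for single-error correction: the smallest r with 2^r >= k + r + 1 check bits for k data bits. The sketch below derives the check-bit count and the overheads quoted on the slide; the function name is illustrative, and the die-level figure simply applies the slide's ~50% array efficiency.

```python
def sec_check_bits(data_bits: int) -> int:
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

DATA_BITS = 128
check_bits = sec_check_bits(DATA_BITS)    # SEC only
secded_check_bits = check_bits + 1        # SECDED adds one overall parity bit
codeword_bits = DATA_BITS + check_bits

array_overhead = check_bits / DATA_BITS
ARRAY_EFFICIENCY = 0.5                    # ~50% of die area is cell array, per the slide
die_overhead = array_overhead * ARRAY_EFFICIENCY

print(f"SEC code: ({codeword_bits}, {DATA_BITS}), {check_bits} check bits")
print(f"SECDED would need {secded_check_bits} check bits")
print(f"core array overhead: {array_overhead:.2%}, die overhead: about {die_overhead:.1%}")
```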
LPDDR3 vs LPDDR4
“A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM With Integrated ECC Engine for Sub-1 V DRAM Core Operation” by Tae-Young Oh et. al.
from Samsung (JSSC 2015)
Bank Organization with Integrated ECC
“A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM With Integrated ECC Engine for Sub-1 V DRAM Core Operation” by Tae-Young Oh et. al.
from Samsung (JSSC 2015)
Allows 4X Retention Time – Low Power
“A 3.2 Gbps/pin 8 Gbit 1.0 V LPDDR4 SDRAM With Integrated ECC Engine for Sub-1 V DRAM Core Operation” by Tae-Young Oh et. al.
from Samsung (JSSC 2015)
Summary + Direction
• DRAM has been an amazing memory technology
  - Many innovations at several fronts
  - Cost reduction per bit
  - Scaling to 1- nm
• Standardization is diversifying
• Many requirements
  - BW and latency
  - Cost and capacity
  - Power
• Making memory H. E. A. R.
  - Hierarchical
  - Efficient
  - Asymmetric
  - Resilient
Q&A