# 65-nm Semi-Custom Sub-Threshold Memories

Oskar Andersson, Babak Mohammadi, Yasser Sherazi, and Joachim Rodrigues

Department of Electrical and Information Technology, Lund University, Sweden




- This work was done in cooperation with EPFL, Switzerland:
  - Pascal Meinerzhagen, Andreas Burg







### **Motivation**

ULV/ULP biomedical implants and wireless sensor nodes

Very stringent power budget, but only low speed requirements

Aggressive $V_{DD}$ scaling leads to **subthreshold** (sub-$V_T$) operation



Figure: cardiac pacemaker [J. Rodrigues, Keynote, PATMOS'11]

Memories consume dominant area & power share [ITRS'11]

Leakage-power during standby may dominate overall power budget

Typical memory requirements:
- Robust sub-V<sub>T</sub> operation
- Ultra-low leakage
- Speed is a secondary concern





### Motivation

- Commercial SRAM memory compilers use the 6T bitcell
  - Not reliable in the sub-$V_T$ domain without level shifters
- Full-custom sub-V<sub>T</sub> SRAM designs use 8T, 10T, ... 14T bitcells and R/W assist techniques
  - High design effort, no design automation, high leakage currents

#### Our solution:

- Fully automated standard-cell based memory (SCM) compilation flow
- Single custom-designed standard cell to minimize all major leakage contributors in SCM array <u>and</u> peripherals
- Fill the gap of missing/bad sub-$V_T$ memory compilers
- Ensure high robustness
- Reduce area cost for storage capacities smaller than a few kb

### Outline

1. Architectural choices for sub-$V_T$ operation
2. Custom-designed low-leakage 3-state-enabled latch
3. 4kb low-leakage SCM test chip
4. Silicon measurement results
5. Comparison with prior-art sub-$V_T$ memories
6. Conclusions



### **Best Architectural Choices for Above- and Sub-VT**

#### Write Logic

- Clock-gates (b): smaller and lower power than flip-flops (FF) (a)

#### Read Logic

- Above-VT
  - ✓ Muxes (c): faster, power efficient
- Sub-VT
  - ✓ 3-state (d): less leakage

#### Array of Storage Cells

- Latch arrays are smaller than FF arrays, but have a longer write-address setup time



#### Valid for different technology nodes

Meinerzhagen *et al.*, MWSCAS'10; Meinerzhagen *et al.*, **JETCAS'11** 

Lund University / SOS Workshop / Oskar Andersson / 2012-10-03


### Low-Leakage Latch with Tri-State Output

Static latch: transmission-gate, 3-state buffer, or multiplexer

### **Best practice for low leakage**

1. Lowest number of $V_{DD}$-GND paths
2. Highest resistance

- 3-state
- Stacking & stretching
- 3-state output

Stacking factor: max 2

**Channel length stretching:** 1.5L<sub>min</sub>

All dominant leakage contributors are minimized by designing only a single custom standard cell



### Architecture of low-leakage 4kb SCM

- Write logic uses clock-gates
- 3-state buffers used for the read operation are integrated in the low-leakage latch
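The architecture above can be sketched as a small behavioral model. This is a minimal illustration, not the authors' implementation: the class names `LatchRow` and `StandardCellMemory` are hypothetical, a high-impedance output is modeled as `None`, and the slides do not state the array organization, so rows and word width are left as parameters.

```python
from typing import List, Optional


class LatchRow:
    """One word of latches with a gated clock and a 3-state output (hypothetical model)."""

    def __init__(self, width: int) -> None:
        self.bits: List[int] = [0] * width

    def clock(self, gated_clk: bool, data: List[int]) -> None:
        # The latch is transparent only while its gated clock is high.
        if gated_clk:
            self.bits = list(data)

    def drive(self, output_enable: bool) -> Optional[List[int]]:
        # A disabled 3-state output is high-impedance, modeled here as None.
        return list(self.bits) if output_enable else None


class StandardCellMemory:
    """Latch array with clock-gated write and 3-state read (behavioral sketch)."""

    def __init__(self, rows: int, width: int) -> None:
        self.rows = [LatchRow(width) for _ in range(rows)]

    def write(self, addr: int, data: List[int]) -> None:
        # Clock gating: only the addressed row sees a clock pulse.
        for i, row in enumerate(self.rows):
            row.clock(gated_clk=(i == addr), data=data)

    def read(self, addr: int) -> List[int]:
        # One-hot output enable: exactly one row drives the read bit lines.
        driven = [row.drive(i == addr) for i, row in enumerate(self.rows)]
        active = [word for word in driven if word is not None]
        assert len(active) == 1, "read bit lines must have exactly one driver"
        return active[0]


# Usage with an arbitrary small organization (4 words of 2 bits).
scm = StandardCellMemory(rows=4, width=2)
scm.write(1, [1, 0])
print(scm.read(1))
```

The model captures only the selection logic: rows that are not addressed neither capture data nor drive the shared read bit lines.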



### 4kb SCM Test Chip in LP-HVT 65nm CMOS

Figure: chip microphotograph and zoomed-in layout picture

Area cost of 12.7 $\mu$m<sup>2</sup> per bit (including peripherals)

Scan-chain test interface

Functionality verification: W/R random and checker-board patterns
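The test patterns named above can be generated as follows. This is a hedged sketch: the function names, the row/column organization, and the seed handling are assumptions made for illustration, and the actual scan-chain test setup is not described in the slides.

```python
import random
from typing import List

Pattern = List[List[int]]


def checkerboard(rows: int, width: int, inverted: bool = False) -> Pattern:
    """Alternating 0/1 pattern; neighbouring cells always hold opposite values."""
    offset = 1 if inverted else 0
    return [[(r + c + offset) % 2 for c in range(width)] for r in range(rows)]


def random_pattern(rows: int, width: int, seed: int) -> Pattern:
    """Reproducible random pattern, so that a failing run can be repeated."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(rows)]


def count_bit_errors(written: Pattern, read_back: Pattern) -> int:
    """Number of bit positions where the read-back data differs from what was written."""
    return sum(
        w != r
        for w_row, r_row in zip(written, read_back)
        for w, r in zip(w_row, r_row)
    )


# Usage: a checkerboard compared against its inverse differs in every bit.
cb = checkerboard(rows=2, width=2)
print(cb, count_bit_errors(cb, checkerboard(2, 2, inverted=True)))
```

Writing a pattern, reading it back, and then repeating with the inverted checkerboard exercises both stored values in every cell.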

Oven to control temperature: 27 °C or 37 °C





### **Static Noise Margin**







Static noise margin (SNM) of latch in non-transparent phase

**1k-point Monte Carlo simulations** 

Minimum data-retention voltage is 210mV

# **MEASUREMENTS**





### Silicon Measurements: V<sub>DDmin</sub> for data retention is 220mV

### Silicon Measurements: V<sub>DDmin</sub> for W/R is 420mV

Low-leakage 3-state read logic limits  $V_{\text{DDmin}}$  for R/W

**Measured** minimum $V_{DD}$: read 420mV, write 300mV

Below 420mV, read-failures appear column-wise



### Silicon Measurements: Energy is 14 fJ/bit-access

Measured energy per bit-access performed at maximum speed

Measured energy minimum is 14fJ/bit at 500mV, 110kHz
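As a rough cross-check, the active power per accessed bit at this operating point follows from energy times access rate. This assumes one bit-access per clock cycle at 110kHz, which is an assumption; the slide reports only energy, voltage, and frequency.

$$
P_{\text{bit}} = E_{\text{bit}} \cdot f \approx 14\,\text{fJ} \times 110\,\text{kHz} \approx 1.54\,\text{nW}
$$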





### **Outlook: segmented RBLs for Faster Read**

- Limit the number of bitcells on each read bit line (RBL) segment
- Implement backend of read mux with static CMOS muxes




### Silicon Measurements: Leakage Power is 500fW/bit

At  $V_{\text{DDhold}}$ =220mV, data is correctly held with a **leakage power of 425-500fW per bit** (best and worst out of 4 measured dies)

At 37°C (biomedical implants)

Higher leakage current, and thus a higher operating frequency
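The per-bit figures can be turned into array-level numbers with simple arithmetic. This is a sketch under stated assumptions: the 4kb array is taken to hold 4 × 1024 = 4096 bits, the per-bit figure is taken to include the peripheral share, and the function names are hypothetical.

```python
V_DD_HOLD = 0.220            # V, measured data-retention voltage from the slide
P_LEAK_BEST = 425e-15        # W per bit, best of the 4 measured dies
P_LEAK_WORST = 500e-15       # W per bit, worst of the 4 measured dies
N_BITS = 4 * 1024            # assumption: "4kb" means 4096 bits


def leakage_current_per_bit(p_leak_w: float, v_dd: float) -> float:
    """Leakage current per bit in amperes, from I = P / V."""
    return p_leak_w / v_dd


def array_leakage_power(p_leak_w: float, n_bits: int) -> float:
    """Total standby leakage power of the array in watts."""
    return p_leak_w * n_bits


for label, p in (("best", P_LEAK_BEST), ("worst", P_LEAK_WORST)):
    i_pa = leakage_current_per_bit(p, V_DD_HOLD) * 1e12
    p_nw = array_leakage_power(p, N_BITS) * 1e9
    print(f"{label}: {i_pa:.2f} pA per bit, {p_nw:.2f} nW for the array")
```

Under these assumptions the whole array leaks roughly 1.7 to 2.0 nW while holding data, or about 2 pA per bit.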




### **Comparison with Prior-Art Sub-V<sub>T</sub> Memories**

Benefits of designing 1 custom standard cell

Leakage power reduced by 50% (at reduced area) w.r.t. commercial standard cell latch [Meinerzhagen et al., JETCAS'11]

#### Considered work: Full macro, measured, 65nm node

|                               | [1]        | [2]       | [3]        | [4]                  | This work |
|-------------------------------|------------|-----------|------------|----------------------|-----------|
| V<sub>DDmin</sub> [mV]        | 350        | 250       | 380        | 700                  | 420       |
| V<sub>DDhold</sub> [mV]       | 250        | 250       | 230        | 500                  | 220       |
| E<sub>tot/bit</sub> [fJ/bit]  | 55 (0.35V) | 86 (0.4V) | 54 (0.4V)  | -                    | 14 (0.5V) |
| P<sub>leak/bit</sub> [pW/bit] | -          | 6.1       | 7.6 (0.3V) | 6.0, 1.0<sup>a</sup> | 0.5       |
| Capacity                      | 32 kb      | 64 kb     | 256 kb     | 1 Mb                 | 4 kb      |

<sup>a</sup> Leakage-power of bitcell only.

- [1] **STM**: Clerc et al., ESSCIRC 2012
- [2] **MIT**: Sinangil, Verma, and Chandrakasan, JSSC 2009
- [3] **MIT**: Calhoun and Chandrakasan, JSSC 2007
- [4] **Intel CRL**: Wang et al., JSSC 2008

- Lowest leakage-power/bit ever reported in 65nm CMOS
- Lowest active energy/bit-access ever reported in 65nm CMOS

[D. Sylvester, ISCAS'11] has lower leakage power in 180nm CMOS
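The size of the gap to prior art can be read off the comparison table with a few ratios. This is an illustrative sketch only: the values are copied from the table, the dictionary and function names are hypothetical, and the entries were measured at different supply voltages, so the ratios are indicative rather than like-for-like. Reference [4] is left out of the leakage comparison because one of its two values covers the bitcell only.

```python
# Values copied from the comparison table (65nm, measured full macros).
ENERGY_FJ_PER_BIT = {"[1]": 55.0, "[2]": 86.0, "[3]": 54.0}
LEAKAGE_PW_PER_BIT = {"[2]": 6.1, "[3]": 7.6}
THIS_WORK_ENERGY_FJ = 14.0
THIS_WORK_LEAKAGE_PW = 0.5


def improvement_factors(prior: dict, this_work: float) -> dict:
    """Ratio of each prior-art value to this work's value (larger means bigger gain)."""
    return {ref: value / this_work for ref, value in prior.items()}


energy_gain = improvement_factors(ENERGY_FJ_PER_BIT, THIS_WORK_ENERGY_FJ)
leakage_gain = improvement_factors(LEAKAGE_PW_PER_BIT, THIS_WORK_LEAKAGE_PW)

for ref, gain in energy_gain.items():
    print(f"energy/bit-access vs {ref}: {gain:.1f}x lower")
for ref, gain in leakage_gain.items():
    print(f"leakage power/bit vs {ref}: {gain:.1f}x lower")
```

With the table values, energy per bit-access is roughly 4 to 6 times lower and leakage power per bit roughly 12 to 15 times lower than the listed references.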



### Conclusions

Need for robust sub- $V_{T}$  memories in ULP/ULV systems

Ultra-low leakage, relaxed speed requirement

Fully automated standard-cell based memory compilation flow

- Fills the gap of missing/bad sub-$V_T$ memory compilers
- Robust
- Area-efficient for small storage capacities of several kb

Adding a single custom-designed standard cell to a commercial library

- Attacks all major leakage contributors of SCMs, including peripherals

3-state read logic limits W/R  $V_{\rm DDmin}$  and read-access time, but satisfies ambition of ultra-low leakage power and access energy

Segmented RBLs improve read speed by >10X at low area and leakage overhead

Among all silicon-verified macro memories in 65nm CMOS

Lowest leakage-power/bit, and lowest energy/bit-access

### Where is the border between SRAM and SCM?



# Thank you for your attention!

# Q & A

Acknowledgements:

- STMicroelectronics for manufacturing
- Swedish VINNOVA Industrial Excellence Centre
- Swedish Vetenskapsrådet
- Swiss National Science Foundation

