

#### ADVANCED MEMORY SOLUTIONS FOR EMERGING CIRCUITS AND SYSTEMS

L. Ciampolini<sup>1</sup>, B. Giraud<sup>1</sup>, M. Kooli<sup>1</sup>, A. Makosiej<sup>1</sup>, R. Boumchedda<sup>1,2</sup>, and J.-P. Noel<sup>1</sup>

<sup>1</sup>Univ. Grenoble Alpes, CEA, LETI, MINATEC Campus, Grenoble, France, <sup>2</sup>STMicroelectronics, 850 rue Jean Monnet, 38920 Crolles, France

Acknowledgements to: A. Levisse<sup>3</sup>, N. Gupta<sup>4</sup>, E. Vianello<sup>1</sup>, F. Andrieu<sup>1</sup>, K. C. Akyel<sup>5</sup>, M. Brocard<sup>6</sup>

<sup>3</sup>now at EPFL, <sup>4</sup>now at Minima Processor, <sup>5</sup>now at NanoXplore, <sup>6</sup>now at Dolphin





- Emerging Non-Volatile Memory Landscape
- In-Memory Computing Promises
- 4T SRAM in CoolCube
- E[xpecting]: IMC<sup>3</sup>
- An SRAM-to-CAM Transformer
- Modeling 1 ppm Yield (and \$) Losses in SRAM





EMERGING NON-VOLATILE MEMORY LANDSCAPE



#### **EMERGING NVM LANDSCAPE**

• Which technology to replace Flash memories?

|       | Magneto-resistive RAM                                | Phase-Change RAM               | Resistive RAM               |
|-------|------------------------------------------------------|--------------------------------|-----------------------------|
| Plus  | Endurance                                            | Maturity                       | Density, CMOS compatibility |
| Minus | Costly technology, Density, Read, CMOS compatibility | Consumption, Thermal stability | Maturity, Forming           |

[Figure: crossbar memory element and device cross-sections; visible labels: Top Electrode, Crystalline GST, Active Area, Silver]

- Memory function by resistance switching
- Programmed with <u>Current</u>, Voltage and Time
- BEOL process → no FEOL masks → Low-cost solution



#### **NVM INTERCONNECT VOLTAGE DROP**

- Unavoidable Scaling Effects:
  - Shrink of conductive section, therefore increase of metal line resistance
  - Increase of the metal resistivity (right, data from ITRS)
- Large-bank effect:
  - Series connection of multiple resistances
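
The two effects listed above can be put into rough numbers. The sketch below assumes a rectangular metal line with R = ρL/(WH) and a selected cell reached through identical series segments carrying one programming current. The resistivity, dimensions, segment count and current are illustrative assumptions, not data from the talk.

```python
def line_resistance(rho_ohm_m, length_m, width_m, height_m):
    """Resistance of a rectangular metal line: R = rho * L / (W * H)."""
    return rho_ohm_m * length_m / (width_m * height_m)


def voltage_drop(n_segments, r_segment_ohm, current_a):
    """IR drop seen by a cell reached through n identical series segments."""
    return n_segments * r_segment_ohm * current_a


# Hypothetical numbers, chosen only to illustrate the trend.
rho = 2.0e-8      # ohm*m, assumed effective resistivity
pitch = 100e-9    # m, assumed cell-to-cell segment length
r_wide = line_resistance(rho, pitch, 50e-9, 100e-9)
r_narrow = line_resistance(rho, pitch, 25e-9, 50e-9)  # width and height halved

# Halving both cross-section dimensions multiplies R by about 4 at constant rho.
print(r_narrow / r_wide)
# Drop at the far end of an assumed 1024-cell line with a 50 uA current.
print(voltage_drop(1024, r_narrow, 50e-6))
```

In a real array the resistivity itself also rises as the section shrinks, so the constant-ρ ratio above is a lower bound on the trend.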



Metal resistivity becomes critical

[A. Levisse, LASCAS 2017]



#### **NVM VOLTAGE DROP COMPENSATION**

• Standard approach by trial-and-error





#### **NVM VOLTAGE DROP COMPENSATION**



Efficient Calibration, Compatible with other compensation techniques

## Promises! Promises!



## IN-MEMORY COMPUTING PROMISES



#### **IN-MEMORY COMPUTING**

• The largest part of the power consumption of logic and arithmetic operations in some kinds of ICs is due to memory access





#### **IN-MEMORY COMPUTING**

• In-Memory Computing (IMC) consists of performing computation tasks where the data is stored, *i.e.* in memories, to counter the heavy data traffic between CPU and cache





...tomorrow with IMC!



• This solution makes sense when processing data in the CPU becomes very heavy or inadequate (*data-centric apps, AI...*)



#### **IMC COMPUTATIONAL MEMORIES**

• A Computational SRAM (C-SRAM) executes *in-situ* micro-instructions





#### **IMC : HOW DOES IT WORK?**

Conventional 2R operations of a 10T, three-port (1RW2R) SRAM





#### **MULTI-ROW SELECTION**

• Multi-row selection yields a Boolean function of data
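
A behavioral sketch can make the idea concrete. It assumes a common bitline-computing model: a precharged bitline is discharged by any selected cell storing 0, and the complementary bitline by any selected cell storing 1. The function names and this sensing model are assumptions for illustration; the sensing scheme of the 10T cell in the talk may differ.

```python
def read_bitline(column_bits, selected_rows):
    """Precharged bitline: stays at 1 only if no selected cell pulls it down.

    Assumption: a cell storing 0 discharges the bitline, so selecting
    several rows yields the AND of the stored bits.
    """
    return int(all(column_bits[r] for r in selected_rows))


def read_bitline_bar(column_bits, selected_rows):
    """Complementary bitline: a cell storing 1 discharges it, giving NOR."""
    return int(not any(column_bits[r] for r in selected_rows))


def multi_row_read(array, selected_rows):
    """Apply the model column by column; returns (AND word, NOR word)."""
    n_cols = len(array[0])
    cols = [[row[c] for row in array] for c in range(n_cols)]
    and_word = [read_bitline(col, selected_rows) for col in cols]
    nor_word = [read_bitline_bar(col, selected_rows) for col in cols]
    return and_word, nor_word


array = [
    [1, 0, 1, 0],  # row 0
    [1, 1, 0, 0],  # row 1
]
and_word, nor_word = multi_row_read(array, [0, 1])
print(and_word)  # [1, 0, 0, 0]
print(nor_word)  # [0, 0, 0, 1]
```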





#### **EXPECTED GAIN: EVALUATION MODEL**

• Boolean functions of data are the bricks to obtain additions, subtractions, multiplications
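
As an illustration of how Boolean bricks compose into arithmetic, the sketch below builds a word-wide addition from AND, NOR and a shift only. The 8-bit width and the helper names are assumptions made for the example; this is not the micro-instruction sequence used in the talk.

```python
WIDTH = 8                  # assumed word width
MASK = (1 << WIDTH) - 1


def and_(a, b):
    return a & b


def nor_(a, b):
    return ~(a | b) & MASK


def xor_(a, b):
    # XOR built only from AND and NOR: a ^ b = NOR(AND(a, b), NOR(a, b))
    return nor_(and_(a, b), nor_(a, b))


def add(a, b):
    """Addition modulo 2**WIDTH using only AND, NOR and a left shift."""
    while b:
        carry = and_(a, b)          # positions that generate a carry
        a = xor_(a, b)              # sum without carries
        b = (carry << 1) & MASK     # carries move one position left
    return a


print(add(23, 42))    # 65
print(add(200, 100))  # 44, i.e. 300 modulo 256
```

Subtraction follows from the same bricks through two's complement, and multiplication from repeated shift-and-add.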





#### **EXPECTED GAIN: EVALUATION MODEL**

• The current emulation platform (IMPACT) gives a rough estimate of the benefits of using IMC operations instead of standard ALU operations



#### **EXPECTED GAIN: PRELIMINARY RESULTS**





4T SRAM IN MONOLITHIC 3D COOLCUBE TECHNOLOGY



#### COOLCUBE<sup>™</sup>: LETI'S MONOLITHIC 3D

• Monolithic 3D consists of manufacturing a second layer (Top tier) of active MOSFETs over a first layer (Bottom tier) where active MOSFETs already exist







#### 4T SRAM IN 3D COOLCUBE<sup>™</sup> TECHNOLOGY

• 4T Driver-Less SRAM bitcell:





#### 4T SRAM IN 3D COOLCUBE<sup>™</sup> TECHNOLOGY

**Design split across tiers:** 



- Bitcell area: 0.054 µm<sup>2</sup> (-30% versus SPHD)
- Manufactured on both tiers in former STM 14nm FD-SOI

[*M. Brocard et al.,* S3S, 2016]



#### 4T SRAM IN 3D COOLCUBE™ TECHNOLOGY

• Distribution of the device threshold voltage gap ∆V<sub>th</sub> = (V<sub>th,PMOS</sub> - V<sub>th,NMOS</sub>) from 507 pairs available on a wafer manufactured in 2014 (non-mature process)





#### 4T SRAM IN 3D COOLCUBE™ TECHNOLOGY

• The Functional / Non-Functional regions have been found through SPICE Monte Carlo (MC) simulations around a variable mean device threshold voltage gap $<\Delta V_{th}>$



[B. Giraud et al., IEDM 2017]



#### 4T SRAM IN 3D COOLCUBE™ TECHNOLOGY

- The requirements on the mean device threshold voltage gap $<\Delta V_{th}>$ can be relaxed by using Data-Dependent Back-Biasing (DDBB):
  - PMOS $V_{th}$ is increased statically
  - Top-tier NMOS $V_{th}$ is modified dynamically, depending on the stored value



![](_page_22_Figure_6.jpeg)

[R. Boumchedda et al., TVLSI, 2017]

![](_page_23_Picture_0.jpeg)

#### 4T SRAM IN 3D COOLCUBE<sup>™</sup> TECHNOLOGY

- SPICE MC analysis of the DDBB effect on single-cut functionality:
  - 32 bitcells / column
  - ΔV<sub>th</sub> = 180 mV @ T = -10°C
  - Process = TT

![](_page_23_Figure_6.jpeg)

![](_page_23_Figure_7.jpeg)

![](_page_23_Figure_8.jpeg)

![](_page_23_Figure_9.jpeg)

[Figure legend: PASS (0 fails over 10<sup>6</sup>) / FAIL; panel title: With DDBB]

[R. Boumchedda et al., TVLSI, 2017]

![](_page_24_Picture_0.jpeg)

## E[XPECTING] : IMC<sup>3</sup>

![](_page_25_Picture_0.jpeg)

#### LOOKING TO THE FUTURE: IMC<sup>3</sup>?

• Memory Computing ultimate unification...

![](_page_25_Figure_3.jpeg)

# AN SRAM-TO-CAM TRANSFORMER

![](_page_26_Picture_1.jpeg)

![](_page_27_Picture_0.jpeg)

#### **CAM-SENSITIVE MARKETS**

- Content-Addressable Memories (CAMs) can quickly search for a particular stored key
- A typical application is looking up internet addresses in the huge tables of router devices

![](_page_27_Picture_4.jpeg)

• If by any chance you know of other companies that might be interested in searching data quickly...

![](_page_27_Picture_6.jpeg)

![](_page_28_Picture_0.jpeg)

#### **A SEARCH VIEW OF SRAM**

- Standard memory (SRAM/DRAM) contents are indexed by a numerical key: the memory address (~ row number)
- A memory readout provides the word stored at the corresponding location (one-word hit of the address search)

|                     |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |   |
|---------------------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| row 1               | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| row 2               | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| row 3               | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| row 4               | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| row 5               | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| row 6 (Address = 6) | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| row 7               | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| Data Out            | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |

![](_page_29_Picture_0.jpeg)

#### **CONTENT-ADDRESSABLE MEMORY**


- A CAM allows a one-cycle search of a given search key amongst all stored words
- A memory readout provides the address where the word is stored (one-word hit of the content search)
- If there are multiple hits, one of them may be selected
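
The search behavior described above can be sketched in software. The hardware compares all rows in parallel within one cycle, whereas this model simply loops. The function names and the lowest-address-wins policy for multiple hits are assumptions made for illustration.

```python
def cam_search(stored_words, key):
    """Return the addresses of all rows whose content equals the key."""
    return [addr for addr, word in enumerate(stored_words) if word == key]


def priority_encode(hits):
    """Pick one hit; here the lowest address wins (an assumed policy)."""
    return hits[0] if hits else None


words = [0b1000, 0b0110, 0b1000, 0b0001]
hits = cam_search(words, 0b1000)
print(hits)                                        # [0, 2]
print(priority_encode(hits))                       # 0
print(priority_encode(cam_search(words, 0b1111)))  # None
```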

![](_page_29_Figure_5.jpeg)

![](_page_30_Picture_0.jpeg)

#### **CONTENT-ADDRESSABLE MEMORIES**

- Some key parts can be masked (e.g. search for "Albert \*instein")
- The corresponding key bits are marked "h", meaning that they always hit
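
A masked search can be modeled by letting an "h" position in the key match either stored value. The string encoding of words and keys below is an assumption chosen for readability, not the representation used in the design.

```python
def matches(word, key):
    """Bitwise compare; an 'h' in the key hits whatever the stored bit is."""
    return len(word) == len(key) and all(
        k == "h" or k == w for w, k in zip(word, key)
    )


def masked_search(stored_words, key):
    """Addresses of all stored words that match the (partly masked) key."""
    return [addr for addr, word in enumerate(stored_words) if matches(word, key)]


words = ["1010", "1110", "0010", "1011"]
print(masked_search(words, "1h10"))  # [0, 1]
print(masked_search(words, "hhhh"))  # [0, 1, 2, 3]
```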

![](_page_30_Figure_4.jpeg)

![](_page_31_Picture_0.jpeg)

#### **CONTENT-ADDRESSABLE MEMORIES**

• CAM designs might dissipate enormous amounts of power due to large capacity, hierarchical architecture, high-performance requirements and parallel operations on all rows

![](_page_31_Figure_3.jpeg)

![](_page_32_Picture_0.jpeg)

#### RECONFIGURABLE SRAM/CAM ARCHITECTURE

![](_page_32_Figure_2.jpeg)

- Key features:
  - Use only SRAM cells (large density)
  - Allow R/W SRAM operations on horizontal words
  - Allow CAM operations on vertical words
  - This is obtained by routing 2 WLs per row, and with additional digital/Row Dec circuitry

![](_page_33_Picture_0.jpeg)

#### RECONFIGURABLE SRAM/CAM ARCHITECTURE

![](_page_33_Figure_2.jpeg)

- Key features:
  - Read on ground-line for both CAM and SRAM
  - SA on Vss is single-ended, imbalanced

- CAM/SRAM read: single WL
- CAM write: single-ended SRAM write
- SRAM write: standard, with differential BLs

[N. Gupta et al., ESSCIRC, 2017]

![](_page_34_Picture_0.jpeg)

#### **COMPARISON WITH PREVIOUS WORKS**

|                            | This work                                                      | [1]                          | [2]                         | [3]            | [4]            |
|----------------------------|----------------------------------------------------------------|------------------------------|-----------------------------|----------------|----------------|
| Technology                 | 28nm FDSOI                                                     | 28nm FDSOI                   | 32nm                        | 65nm           | 0.13µm         |
| Transistors/cell           | 6T                                                             | 6T                           | 11T                         | 10T            | 9T+Read        |
| Area/cell [µm<sup>2</sup>] | 0.197 <sup>α</sup>                                             | 0.152                        | -                           | 3.3            | 20             |
| Array Size                 | 128x64                                                         | 64x64                        | (64x64) x 4                 | 128x128        | 128x32         |
| Frequency (VDD)            | 1.56 GHz @ 0.9 V <sup>β</sup><br>8.9 MHz @ 0.38 V <sup>γ</sup> | 370 MHz (1 V)                |                             | 500 MHz (1 V)  | 250 MHz (1 V)  |
| Energy/Search/bit [fJ]     | 0.13 (0.9 V)                                                   | 0.6 (1 V)<br>0.41 (0.75 V)   | 1.07 (1 V)<br>0.3 (0.5 V)   | 0.77 (1.2 V)   | 1.87 (1 V)     |
| Match-line Technique       | 1 single-ended imbalanced SA                                   | 2 single-ended SAs           | Wide AND                    | NOR            | Differential   |
| Memory Modes               | BCAM/SRAM/Pseudo-TCAM                                          | BCAM/TCAM/SRAM               | BCAM                        | BCAM           | BCAM           |

[1] Jeloka, S. et al. VLSI-C 2015, [2] Agarwal, A. et al. ESSCIRC 2011, [3] Do, A. T. et al. ESSCIRC 2013, [4] Wang, C.C. et al. TCAS-II 2010

- α: Area with compact-design rules (with waiver on metal routing)
- β: Measured WLMIN + 300 ps periphery delay (estimated)
- γ: Assuming cycle time is 120% of measured WLMIN

[N. Gupta et al., ESSCIRC, 2017]

# MODELING 1 PPM YIELD (AND \$) LOSSES IN SRAM

![](_page_35_Picture_1.jpeg)

![](_page_35_Figure_2.jpeg)

![](_page_36_Picture_0.jpeg)

#### A GLANCE AT THE AUTOMOTIVE MARKET

• World Motor Vehicle Production per year / by country

![](_page_36_Figure_3.jpeg)

[Image from Wikipedia]

![](_page_37_Picture_0.jpeg)

#### **HOW DEFECT TRACKING LEADS TO \$**

- Impressive how Japan took over the car market in the 80s...
  - Might be related to the **Toyota** Quality Management
- "Six Sigma (6σ) is a set of techniques and tools for process improvement. It was introduced by engineer Bill Smith while working at Motorola in 1986. Jack Welch made it central to his business strategy at General Electric in 1995."
- "A six-sigma process is one in which 99.99966% of all opportunities to produce some feature of a part are statistically expected to be free of defects (3.4 defective features per million opportunities)"
- "... Johnson and Johnson, with \$600 million of reported savings, Texas Instruments, which saved over \$500 million as well as Telefónica de Espana, which reported 30 million euros of revenue in the first 10 months."
- In circuits, SRAM is one of the largest yield detractors. With emerging NVMs, other kinds of memory will take over this role.
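
The 3.4 defects per million quoted above can be checked numerically. It is the one-sided Gaussian tail at 4.5σ, which is 6σ minus the 1.5σ long-term shift conventionally assumed in Six Sigma practice. The helper name below is hypothetical.

```python
import math


def upper_tail(z):
    """P(X > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Six-sigma convention: 6 sigma minus an assumed 1.5 sigma long-term shift.
dpmo = upper_tail(6.0 - 1.5) * 1e6
print(round(dpmo, 1))  # 3.4

# Without the shift, the one-sided 6-sigma tail is about one part per billion.
print(upper_tail(6.0) * 1e9)
```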

![](_page_38_Picture_0.jpeg)

• Four Classic Yield-Vs-Vdd curves

![](_page_38_Figure_3.jpeg)

- Yield losses in SRAM are due to two sides of the same phenomenon:
  - Either content is lost during read (the cell can be written easily): Read-Limited
  - Or new content cannot be written: Write-Limited
- Cell *limitation* changes with temperature
- Yield is monitored during technology development on test vehicles of various capacities (e.g. 6) over the temperature range (e.g. 4 temperatures): ~24 statistics, for both fresh and aged silicon

![](_page_39_Picture_0.jpeg)

- In FD28SOI, Body Bias can be an effective performance booster for digital circuits. SRAM can be excluded from such boosting at the cost of:
  - Increased area due to block isolation
  - No improvements in memory operations when digital is accelerated
- Adding BB: (6 capacities) x (4 temperatures) x (6 BB voltages) = 144 classic Yield-vs-Vdd curves!

![](_page_39_Figure_6.jpeg)

![](_page_40_Picture_0.jpeg)

• Yield-vs-Vdd curves are temperature-, capacity- and Body-Bias-dependent (only T is shown here)

![](_page_40_Figure_3.jpeg)

![](_page_41_Picture_0.jpeg)

- Yieldograms represent yield levels (YL = Yield Loss) for any cut size and show how the bitcell performs at various voltages
- They show how far we are from yield losses
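
The link between cut size and yield loss can be sketched with a simple model. It assumes independent bitcell failures and no redundancy or repair, which is a common first-order assumption and not necessarily the model behind the yieldograms. The function names and the 1 Mb cut are hypothetical.

```python
import math


def yield_loss(p_cell, n_bits):
    """Probability that at least one of n_bits independent cells fails."""
    # -expm1(n * log1p(-p)) equals 1 - (1 - p)**n without round-off loss
    return -math.expm1(n_bits * math.log1p(-p_cell))


def max_cell_fail(target_loss, n_bits):
    """Largest per-cell failure probability that meets a yield-loss target."""
    return -math.expm1(math.log1p(-target_loss) / n_bits)


n = 2**20                   # hypothetical 1 Mb cut
p = max_cell_fail(1e-6, n)  # 1 ppm yield-loss target
print(p)                    # roughly 1e-12 per bitcell
print(yield_loss(p, n))     # back to about 1e-6
```

Under these assumptions, a 1 ppm yield-loss target on a megabit-scale cut translates into a per-bitcell failure probability around 10<sup>-12</sup>, which is why the tails of the distributions matter so much.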

![](_page_41_Figure_4.jpeg)

![](_page_42_Picture_0.jpeg)

• Body-Bias effects on yield in FD28SOI SRAM

![](_page_42_Figure_3.jpeg)

### CONCLUSIONS

![](_page_43_Picture_1.jpeg)

![](_page_43_Figure_2.jpeg)

![](_page_44_Picture_0.jpeg)

#### CONCLUSIONS

- Voltage drop calibration techniques open up options for designing compact and reliable high-density crossbar memories
- Computational memory opens the way to energy-efficient data-centric applications
- High-density 4T SRAM bitcell in 3D CoolCube technology demonstrated on silicon with 30% area gain
- Reconfigurable SRAM/CAM offers high performance in both operation modes with very low CAM search energy/bit
- Effective yield modeling through yieldograms allows monitoring of complex runtime use of SRAMs with dynamic Body-Bias