

# Cellular Electronics – Baseband Processing

Liang Liu

Dept. of Electrical and Information Technology Lund University, Sweden

Liang.liu@eit.lth.se

#### Outline

- The team
- The research
  - DFE: filtering for CA/sign-bit processing
  - Learning the channel: channel estimation for LTE
  - The matrix: matrix decomposition/inversion
  - Recovering the signal: multi-mode MIMO detection
  - Multi-task platform: reconfigurable cell array
  - Going faster than Nyquist: chip measurement
- Conclusion



#### **The Team - Digital ASIC**



Prof. Peter Nilsson



Ph.D. Stud. Oskar Andersson

Ph.D. Stud. Reza Meraji





Ph.D. Stud. Anil Dey

Ph.D. Stud. Babak Mohammadi



Erik Larsson



Ph.D. Stud. Isael Diaz



Ph.D. Stud. Hemanth Prabhu



Assist. Prof. Joachim Rodrigues



Ph.D. Stud. Rakesh Gangarajaiah



Ph.D. Stud. Yasser Sherazi



Post doc Liang Liu



Lic. Erik Hertz

Ph.D. Stud.



Ph.D. Stud. Johan Löfgren



Ph.D. Stud. Chenxin Zhang



#### **Research Motivations & Objective**



Our focus: integrate these demands into efficient hardware







### **DFE: Digital Front-End**





Selective-channelization for LTE-A carrier aggregation (together with IMEC)


### **Channelization for LTE-A Carrier Aggregation**

- LTE-A Carrier Aggregation
  - CA scenarios: intra-band contiguous, intra/inter-band non-contiguous
  - CC bandwidth: 1.4 MHz (6 RB) to 20 MHz (100 RB)



Software Defined Radio as Potential Solution (ADRES)







#### **Filtering Methods**

- Candidate filtering methods: long FIR filter, CIC + FIR filter, FFT
- Power analysis (on ADRES)
  - Clock cycles as metric
  - FFT needs the fewest cycles, thanks to architectural optimization







### **Performance Analysis**

- Channelization schemes
  - *Partial filtering*: only the user assigned bandwidth is extracted
  - Full filtering: the entire transmission bandwidth is extracted
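The difference between the two schemes can be sketched with an FFT-based channelizer. This is only an illustration: the bin ranges, the two test tones, and the helper `fft_channelize` are hypothetical and are not taken from the ADRES implementation.

```python
import numpy as np

def fft_channelize(x, keep_bins):
    """Zero every FFT bin outside keep_bins and return the time-domain result."""
    X = np.fft.fft(x)
    mask = np.zeros(len(x), dtype=bool)
    mask[keep_bins] = True
    return np.fft.ifft(np.where(mask, X, 0))

n = 256
t = np.arange(n)
wanted = np.exp(2j * np.pi * 10 * t / n)   # tone inside the user allocation (bin 10)
other = np.exp(2j * np.pi * 60 * t / n)    # tone elsewhere in the same carrier (bin 60)
x = wanted + other

# Assumed bin ranges, chosen only for this example.
full = fft_channelize(x, np.arange(0, 100))     # full filtering: whole carrier
partial = fft_channelize(x, np.arange(5, 17))   # partial filtering: user allocation only

full_err = np.max(np.abs(full - x))             # full filtering keeps both tones
partial_err = np.max(np.abs(partial - wanted))  # partial filtering keeps only the user tone
```

Partial filtering processes far fewer bins downstream, which is where a complexity saving would come from in such a scheme.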



#### Performance analysis

- Channel models: EPA (2 km/h), EVA (30 km/h), ETU (130 km/h)
- Marginal performance loss due to channel estimation (CE) error
- Complexity savings of up to 70%



#### **Continue DFE in DARE**





Michal Stala, Isael Diaz

- Imperfections with carrier aggregation
- Scalable DFE for both high-end LTE devices and low-end M2M devices
- Together with Ericsson



#### **Channel Estimation**





Improved matching pursuit for LTE channel estimation (results update from LCDWS 2011)

#### Johan Löfgren





### **Improved Matching-Pursuit for LTE CE**

- MP with three modifications:
  - L<sup>1</sup>-norm energy calculation, SNR-dependent stopping scheme, and smartly increased system resolution
- Better performance than frequency-domain MMSE and original MP
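For orientation, a minimal sketch of plain matching pursuit (not the improved variant with the three modifications above) is given below. It assumes a noiseless channel with a few delay taps observed on all pilot subcarriers; the sizes, tap values, and the fixed `stop_energy` threshold are illustrative assumptions.

```python
import numpy as np

def matching_pursuit(y, F, max_taps, stop_energy):
    """Greedy sparse estimate of h from y = F @ h (plain MP, illustrative only)."""
    n = F.shape[0]
    h = np.zeros(F.shape[1], dtype=complex)
    r = y.astype(complex)                      # residual, starts as the observation
    for _ in range(max_taps):
        if np.vdot(r, r).real <= stop_energy:  # stop once the residual is explained
            break
        c = F.conj().T @ r / n                 # correlations; columns have squared norm n
        k = int(np.argmax(np.abs(c)))          # strongest remaining delay tap
        h[k] += c[k]
        r = r - F[:, k] * c[k]                 # remove its contribution
    return h

# Assumed toy setup: 64 pilot subcarriers, 16 candidate delays, 3 active taps.
n_pilots, n_delays = 64, 16
k = np.arange(n_pilots)[:, None]
l = np.arange(n_delays)[None, :]
F = np.exp(-2j * np.pi * k * l / n_pilots)     # partial DFT dictionary
h_true = np.zeros(n_delays, dtype=complex)
h_true[[0, 3, 7]] = [1.0, 0.5j, -0.25]
y = F @ h_true

h_est = matching_pursuit(y, F, max_taps=8, stop_energy=1e-12)
```

In a noisy setting the stopping threshold would have to track the noise level, which is presumably the role of the SNR-dependent stopping scheme listed above.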







### **Hardware Implementation Results**

- ST 65nm CMOS including FFT/IFFT & core estimator
- Better accuracy with comparable hardware cost and higher speed

|                             | This work               | [9]  | [22] |
|-----------------------------|-------------------------|------|------|
| Technology [nm]             | 65                      | 180  | 65   |
| Area [mm<sup>2</sup>]       | 0.13/0.29<sup>(1)</sup> | 1    | 0.1  |
| Norm. Area [mm<sup>2</sup>] | 0.13/0.29<sup>(1)</sup> | 0.13 | 0.1  |
| Frequency [MHz]             | 125                     | 154  | 200  |
| Init. [µs]                  | 7.62                    | 336  | N/A  |
| Update [µs]                 | 0.62                    | 3.62 | N/A  |
| Estimates in 1 ms           | 10.3                    | 1.6  | 8    |




(1) Without/With FFT/IFFT

[9] P. Maechler, et al., "Matching pursuit: evaluation and implementation for LTE channel estimation," *IEEE ISCAS*, May 2010.

[22] M. Simko, et al., "Implementation aspects of channel estimation for 3GPP LTE terminals," *European Wireless Conference*, April 2011.
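The normalized-area row can be reproduced under the assumption that area scales with the square of the feature size. The slide does not state the normalization rule, so this is an inference that happens to match the 0.13 mm² figure for [9].

```python
def normalize_area(area_mm2, tech_nm, ref_tech_nm=65):
    """Scale silicon area to a reference node, assuming quadratic scaling (assumption)."""
    return area_mm2 * (ref_tech_nm / tech_nm) ** 2

# [9]: 1 mm^2 in 180 nm, scaled to 65 nm.
norm_area_ref9 = normalize_area(1.0, 180)
print(f"{norm_area_ref9:.2f} mm^2")
```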

### **Energy-Efficient Channel Pre-Processing**





Link-adaptive QR-decomposition using Householder transformations

#### Rakesh Gangarajaiah





Energy efficient channel preprocessor using partial update scheme


#### Chenxin Zhang, Hemanth Prabhu

#### **Link-Adaptive QR-Decomposition**

- Basic idea: dynamically switch the processor to the most energy-efficient parameter setting for the current channel matrix H and modulation scheme, subject to the BER requirement being met
- Parameters:
  - Newton-Raphson *iteration number*
  - Word-length of the processor
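The effect of both parameters can be sketched with a Newton-Raphson reciprocal, a typical building block for the divisions in a QRD. The operand, the starting value, and the rounding model in `quantize` are assumptions made for illustration; the actual datapath is not described on the slide.

```python
def quantize(x, frac_bits):
    """Round x to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def nr_reciprocal(a, iterations, frac_bits, x0=1.0):
    """Approximate 1/a with the Newton-Raphson update x <- x * (2 - a * x)."""
    x = quantize(x0, frac_bits)
    for _ in range(iterations):
        x = quantize(x * (2.0 - a * x), frac_bits)
    return x

a = 0.75  # assumed operand, pre-scaled into [0.5, 1)
errors = {
    (iters, bits): abs(nr_reciprocal(a, iters, bits) - 1.0 / a)
    for iters in (1, 2, 3, 4)
    for bits in (8, 16)
}
for (iters, bits), err in sorted(errors.items()):
    print(f"iterations={iters} frac_bits={bits} abs_error={err:.2e}")
```

Fewer iterations and shorter words cost less energy but leave a larger error, so a link-adaptive controller would pick the cheapest pair that still meets the BER target.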



#### **Power Reduction Using Partial Channel Update**

- Full channel update (complete QRD)
  - Needed to track channel change
  - Expensive in terms of power

|                   | QRD-1 (TCAS1-11) | MIMOD-1 (ISSCC-09) |
|-------------------|------------------|--------------------|
| Gate Count        | 111K             | 114K               |
| Throughput (SC/s) | 12.5M            | 28.125M            |
| Energy (nJ/SC/s)  | 12.76            | 5.37               |

- Partial channel update (approximated QRD)
  - Only upper triangular **R** is updated as:

$$\boldsymbol{\hat{R}}'_i = \boldsymbol{Q}_{i-t}^H \boldsymbol{H}_i$$

- Dynamically switch between full and partial update according to the time correlation; use full update when the channel is weakly correlated
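A small numerical sketch of the partial update follows. It reuses an old Q on a new channel matrix and measures how much energy leaks below the diagonal of the approximated R. The first-order correlation model, the correlation values, and the `threshold` in `choose_update` are illustrative assumptions, not parameters from the design.

```python
import numpy as np

rng = np.random.default_rng(0)

def cgauss(shape):
    """Unit-variance complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def partial_update(Q_prev, H_new):
    """Approximate R for the new channel using the outdated Q: R' = Q_prev^H H_new."""
    return Q_prev.conj().T @ H_new

def lower_leakage(R):
    """Fraction of the norm of R that sits below the diagonal (0 for an exact QRD)."""
    return np.linalg.norm(np.tril(R, k=-1)) / np.linalg.norm(R)

def choose_update(correlation, threshold=0.95):
    """Hypothetical switching rule: full QRD when the channel has decorrelated."""
    return "partial" if correlation >= threshold else "full"

H_old = cgauss((4, 4))
Q_old, R_old = np.linalg.qr(H_old)      # full update at the earlier time instant

leak = {}
for rho in (0.999, 0.5):                # assumed time-correlation values
    H_new = rho * H_old + np.sqrt(1 - rho**2) * cgauss((4, 4))
    leak[rho] = lower_leakage(partial_update(Q_old, H_new))
    print(rho, choose_update(rho), f"leakage={leak[rho]:.3f}")
```

The leakage grows as the channel decorrelates, which is why the full update remains necessary for weakly correlated channels.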



#### **Performance Evaluation**

- LTE downlink with 4×4 64-QAM MIMO under EVA-70 channel
- Performance-power tradeoff by adjusting the partial update ratio
- Saving 60% power with 1dB performance loss





#### **MIMO Detection**





Multi-mode MIMO signal detection with soft-output

Liang Liu

#### **Multi-Mode Soft-Output MIMO Detection**

Soft-output MIMO detection



Multi-mode MIMO detector




### **MIMO Techniques – Unified Algorithm**

- Algorithms share most of the operations
  - **SM**: FSD tree-search with bit-flipping
  - **SDMA**: FSD tree-search with detection reordering
  - **SD**: Real-valued MAP decoder using bit-flipping

| Stage       | Operation          | SM           | SDMA         | SD |
|-------------|--------------------|--------------|--------------|----|
| Pre-proc.   | H decomp.          | $\checkmark$ | $\checkmark$ |    |
| Pre-proc.   | H permut.          | $\checkmark$ | $\checkmark$ |    |
| Tree search | Node selection     | $\checkmark$ |              |    |
| Tree search | Interf. cancel     | $\checkmark$ | $\checkmark$ |    |
| Tree search | Euclidean distance | $\checkmark$ | $\checkmark$ |    |
| LLR calc.   | Sorter             | $\checkmark$ |              |    |
| LLR calc.   | List LLR calc.     |              |              |    |
| LLR calc.   | Bit-flipping       | $\checkmark$ |              |    |
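The shared tree-search operations (H decomposition, interference cancellation, Euclidean distance) can be seen in a minimal hard-output FSD sketch: the top layer is fully expanded and every lower layer follows a single sliced branch. Detection reordering, early pruning, bit-flipping, and LLR generation are deliberately left out, and the QPSK constellation and the noiseless 4×4 test are assumptions chosen for brevity.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def slice_symbol(z, constellation):
    """Nearest constellation point to z."""
    return constellation[np.argmin(np.abs(constellation - z))]

def fsd_detect(H, y, constellation):
    """Hard-output FSD sketch: full expansion on the top layer, one branch below."""
    n = H.shape[1]
    Q, R = np.linalg.qr(H)                    # H decomposition
    z = Q.conj().T @ y
    best, best_dist = None, np.inf
    for top in constellation:                 # full expansion of the top layer
        s = np.zeros(n, dtype=complex)
        s[n - 1] = top
        for i in range(n - 2, -1, -1):        # single-branch expansion below
            interference = R[i, i + 1:] @ s[i + 1:]
            s[i] = slice_symbol((z[i] - interference) / R[i, i], constellation)
        dist = np.linalg.norm(z - R @ s) ** 2  # Euclidean distance of the candidate
        if dist < best_dist:
            best, best_dist = s, dist
    return best

rng = np.random.default_rng(1)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s_true = QPSK[rng.integers(0, 4, size=4)]
y = H @ s_true                                # noiseless observation for the check
s_hat = fsd_detect(H, y, QPSK)
```

The number of visited candidates is fixed by the constellation size, which is what makes the search friendly to a fixed-throughput hardware pipeline.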


#### **Antenna Configurations – Scalable Architecture**

- Example: SM detector
  - TSB: Activate different stages according to antenna configuration
  - LLCB/BFB: Shut down half of the LLR/BFB calculation units



#### **Results**

- Post-layout results with ST 65nm CMOS technology
- Supports the most MIMO modes
- Consumes the least hardware and energy

|                                  | TVLSI '07            | TVLSI '11              | JSSC '12                                 | ISCAS '10            | This Work                      |
|----------------------------------|----------------------|------------------------|------------------------------------------|----------------------|--------------------------------|
| MIMO Modes                       | SM                   | SM                     | SM                                       | SM                   | SM/SD/SDMA                     |
| Antenna Size                     | 4×4                  | 4×4                    | 4×4                                      | 4×4                  | 4×4                            |
| Modulation                       | 64-QAM               | 64-QAM                 | 64-QAM                                   | 64-QAM               | 64-QAM                         |
| Algorithm                        | Soft-output K-best   | Soft-output best-first | SISO MMSE-PIC                            | Soft-output K-best   | Early-pruned FSD with bit-flip |
| Process Technology               | 0.13 μm              | 65 nm                  | 90 nm                                    | 65 nm                | 65 nm                          |
| Max. Clock Rate                  | 270 MHz              | 333 MHz                | 568 MHz                                  | 833 MHz              | 167 MHz                        |
| Throughput                       | 8.57 Mb/s            | 83.3 Mb/s              | 757 Mb/s                                 | 2 Gb/s               | 1 Gb/s                         |
| Core Area                        | 2.38 mm<sup>2</sup>  | N/A                    | 1.5 mm<sup>2</sup>                       | 0.57 mm<sup>2</sup>  | 0.25 mm<sup>2</sup>            |
| Gate Count                       | 280 kG<sup>a</sup>   | 64 kG<sup>a</sup>      | 410 kG<sup>b</sup>/160 kG<sup>a</sup>    | 298 kG<sup>a</sup>   | 83.7 kG<sup>a</sup>            |
| Hardware Efficiency [kG/(Mb/s)]  | 32.67<sup>a</sup>    | 0.77<sup>a</sup>       | 0.21<sup>a</sup>                         | 0.15<sup>a</sup>     | 0.084<sup>a</sup>              |
| Power Consumption                | 94 mW @ 1.2 V        | 11.5 mW @ 1.0 V        | 189.1 mW @ 1.2 V                         | 280 mW @ 1.3 V       | 59.3 mW @ 1.2 V (SM mode)      |
| Normalized Power Consumption     | N/A                  | 16.6 mW                | 136.6 mW                                 | 238.6 mW             | 59.3 mW                        |
| Normalized Energy Consumption    | N/A                  | 199.2 pJ/bit           | 180.4 pJ/bit                             | 119.3 pJ/bit         | 59.3 pJ/bit                    |
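The derived rows of the table can be recomputed from the raw figures. Hardware efficiency is gate count divided by throughput. The power normalization rule is not stated on the slide; the scaling used below (linear in feature size, quadratic in supply voltage, referenced to 65 nm and 1.2 V) is an assumption that happens to reproduce the tabulated values.

```python
# Raw figures from the table: gate count [kG], throughput [Mb/s], power [mW],
# supply voltage [V], technology [nm].
designs = {
    "TVLSI '11": (64.0, 83.3, 11.5, 1.0, 65),
    "JSSC '12": (160.0, 757.0, 189.1, 1.2, 90),
    "ISCAS '10": (298.0, 2000.0, 280.0, 1.3, 65),
    "This work": (83.7, 1000.0, 59.3, 1.2, 65),
}

def hardware_efficiency(gates_k, tput_mbps):
    """Gate count per unit throughput in kG/(Mb/s); lower is better."""
    return gates_k / tput_mbps

def normalized_power(power_mw, vdd, tech_nm, ref_vdd=1.2, ref_tech_nm=65):
    """Assumed scaling rule: P * (ref_tech / tech) * (ref_vdd / vdd)^2."""
    return power_mw * (ref_tech_nm / tech_nm) * (ref_vdd / vdd) ** 2

def energy_per_bit_pj(power_mw, tput_mbps):
    """mW / (Mb/s) equals nJ/bit, so multiply by 1000 for pJ/bit."""
    return power_mw / tput_mbps * 1000.0

results = {}
for name, (gates, tput, power, vdd, tech) in designs.items():
    p_norm = normalized_power(power, vdd, tech)
    results[name] = (hardware_efficiency(gates, tput), p_norm, energy_per_bit_pj(p_norm, tput))
    print(name, [round(v, 3) for v in results[name]])
```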

## **Reconfigurable Cell Array (RCA)**





Mapping channel estimation, QRD, and MIMO detection in LTE-A on a reconfigurable platform

**Chenxin Zhang** 

### **Algorithms**

- Operations are pushed to the vector level
  - Improve data parallelism and instruction parallelism
  - Easily mapped to vector processor with high hardware utilization
- Algorithms
  - Channel estimation: Robust MMSE with sliding window
  - QRD: Sorted-QRD using modified Gram–Schmidt processing
  - MIMO detection: MMSE with node perturbation

| Mathematical operations |                    | Ch. Est. | Ch. Pre-proc. | Signal Det.  |
|-------------------------|--------------------|----------|---------------|--------------|
| Vector Opt.             | Vector-vector      |          |               |              |
|                         | Scalar-vector      |          |               |              |
|                         | Matrix-vector      |          |               | $\checkmark$ |
|                         | Vector permutation |          | $\checkmark$  |              |
| Scalar Opt.             | SQRT/DIV           |          |               |              |
|                         | Node selection     |          |               |              |
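As an illustration of how such an algorithm breaks down into vector-level operations, here is a sketch of a sorted QRD built on modified Gram-Schmidt. The sorting rule (process the column with the smallest remaining norm first) is the common textbook choice and is assumed here; the mapping onto the cell array is not shown.

```python
import numpy as np

def sorted_qrd_mgs(H):
    """Sorted QRD via modified Gram-Schmidt: H[:, perm] = Q @ R."""
    n = H.shape[1]
    Q = H.astype(complex)
    R = np.zeros((n, n), dtype=complex)
    perm = np.arange(n)
    for i in range(n):
        # Vector permutation: bring the weakest remaining column to position i.
        norms = np.sum(np.abs(Q[:, i:]) ** 2, axis=0)
        k = i + int(np.argmin(norms))
        Q[:, [i, k]] = Q[:, [k, i]]
        R[:, [i, k]] = R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        # Scalar SQRT/DIV: normalize the selected column.
        R[i, i] = np.sqrt(np.sum(np.abs(Q[:, i]) ** 2))
        Q[:, i] /= R[i, i]
        # Vector-vector and scalar-vector operations: orthogonalize the rest.
        for j in range(i + 1, n):
            R[i, j] = np.vdot(Q[:, i], Q[:, j])
            Q[:, j] -= R[i, j] * Q[:, i]
    return Q, R, perm

rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
Q, R, perm = sorted_qrd_mgs(H)
```

Every step inside the loop is a permutation, a dot product, or a scaled vector subtraction, plus one square root and division per column, which matches the operation classes listed in the table.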

#### Platform

- Heterogeneous cell array with vector operation
  - RISC elements (PE0, PE1): task scheduling, cell configuration, and conditional & scalar operations.
  - *Multiple memory banks*: to improve bandwidth and access flexibility
  - Dataflow processor PE2 (DPE): 2D FUs for vector-based operations.





# **Faster-Than-Nyquist Signaling**





Iterative decoder for multi-carrier faster-than-Nyquist system (measurement result update from LCDWS 2011)


#### Deepak Dasalukunte

### **Chip Measurement Results**

| Tech.            | ST 65nm CMOS        |
|------------------|---------------------|
| Die Area         | 0.8 mm<sup>2</sup>  |
| Gate Count       | 250k                |
| Total memory     | 14.68 kB            |
| IO & core supply | 1.8 V & 1.2 V       |
| Throughput       | 1 Mbps @ 8 iter     |
| Power            | 9.6 mW              |
| Energy           | 6 nJ/sym/iter       |



#### **FTN is a Practical Technique**

- Few hardware implementations of FTN decoders exist
- To show how an FTN decoder fits into existing systems, it is compared with a reconfigurable FFT and a Turbo decoder, both in 65nm CMOS

| Functionality | FTN iterative decoder     | 128-2048 point FFT         | 3GPP LTE Turbo Decoder    |
|---------------|---------------------------|----------------------------|---------------------------|
| Publication   | ESSCIRC 2012              | JSSC 2012                  | DATE 2010                 |
| Technology    | 65nm                      | 65nm                       | 65nm                      |
| Core Area     | 0.567 mm<sup>2</sup>      | 1.375 mm<sup>2</sup>       | 2.1 mm<sup>2</sup>        |
| Gate count    | 250k                      | 1100k                      | -                         |
| Total Memory  | 14.68 kB                  | 6.14 kB                    | 54% of area               |
| Power         | 9.6 mW (@ 1.2 V, 100 MHz) | 4.05 mW (@ 0.45 V, 20 MHz) | 300 mW (@ 1.1 V, 300 MHz) |


### **Publications (2011-Present)**

#### Journal

- [1] 'Hardware architecture of IOTA pulse shaping filters for multicarrier systems', IEEE TCAS-I
- [2] 'Area-efficient configurable high-throughput signal detector supporting multiple MIMO modes', IEEE TCAS-I.
- [3] 'Low complexity likelihood information generation for spatial-multiplexing MIMO signal detection', *IEEE TVT*
- [4] 'Multicarrier faster-than-Nyquist transceivers: hardware architecture and performance analysis', IEEE TCAS-I

#### Conference

- [5] 'Mapping channel estimation and MIMO detection in LTE-Advanced on a reconfigurable cell array', IEEE ISCAS
- [6] 'A unified multi-mode MIMO detector with soft-output', IEEE ISCAS
- [7] 'A 0.8 mm<sup>2</sup> 9.6 mW implementation of a multicarrier faster-than-Nyquist signaling iterative decoder in 65nm CMOS', *IEEE ESSCIRC*
- [8] 'Reconfigurable cell array for concurrent support of multiple radio standards by flexible mapping', IEEE ISCAS
- [9] 'Detecting multi-mode MIMO signals: algorithm and architecture design', IEEE ISCAS
- [10] 'Improved matching pursuit algorithm and architecture for LTE channel estimation', IEEE ISCAS
- [11] 'Analysis of a novel low complex SNR estimation technique for OFDM systems', IEEE WCNC
- [12] 'Highly scalable implementation of a robust MMSE channel estimator for OFDM multi-standard environment', *IEEE WSPS*
- [13] 'Low complexity soft-output signal detector for spatial-multiplexing MIMO system', IEEE PIMRC
- [14] 'Unified multi-mode signal detector for LTE-A downlink MIMO system', APSIPA-ASC
- [15] 'Design and implementation of iterative decoder for faster-than-Nyquist signaling multicarrier systems', IEEE ISVLSI
- [16] 'Improved memory architecture for multicarrier faster-than-Nyquist iterative decoder', IEEE ISVLSI
- [17] 'Complexity analysis of IOTA filter architectures in Faster-than-Nyquist multicarrier systems', IEEE NORCHIP

- [18] 'On hardware implementation of radix 3 and radix 5 FFT kernels for LTE systems', IEEE NORCHIP

#### Conclusions

- Support multi-standard, multi-mode, and multi-task
- High-speed, good performance with energy & area-efficiency
- Co-optimize system schedule, algorithm, and hardware
- Link-adaptive signal processing
- Scalable ASICs & reconfigurable cell array
- LTE/LTE-A as driving applications
- Post-layout simulation & chip measurement





