# Neuromorphic Computing with Resistive Synaptic Arrays: Devices, Circuits and Systems

Yu (Kevin) Cao, Shimeng Yu, Jae-sun Seo

School of ECEE, Arizona State University

### Outline

- Learning On-a-chip:
  Synaptic Devices and the Crosspoint Array
- Non-ideal Device Effects on Learning Accuracy
- Peripheral Circuits and Parallel Operation
- A System-level Benchmark Simulator
- Summary and Discussion

#### **From Data to Information**



# **Learning On-a-chip**

- Deep learning in the cloud: expensive computation, huge training data, low energy efficiency, high precision





- Edge computing needs novel hardware / algorithms
  - Local to the sensor, real-time, reliable, low-power
  - On-line, personalized learning with continuous data



#### **Acceleration Need**

- 10<sup>3</sup> – 10<sup>5</sup> speedup required to achieve real-time training of HD images at 30 frames/second

| GPU       | FPGA      | 1024x256<br>Synapses<br>256 Neurons | Beyond<br>CMOS     |
|-----------|-----------|-------------------------------------|--------------------|
| 10 – 30 X | 10 – 50 X | 10 <sup>2</sup> – 10 <sup>3</sup> X | >10 <sup>3</sup> X |

Device beyond CMOS: RRAM to emulate the synapse



### **Resistive Crosspoint Array**

A biomimetic solution: RRAM for synapse, crosspoint for dense interconnection; not necessarily spiking neurons



# **Sparse Coding**



#### Emergence of simple-cell receptive field properties by learning a sparse code for natural images

Bruno A. Olshausen\* & David J. Field

Department of Psychology, Uris Hall, Cornell University, Ithaca, New York 14853, USA

LETTERS TO NATURE

$$\min_{D,Z} \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \parallel D \cdot Z_i - x_i \parallel^2 + \lambda |Z_i|_1 \right)$$

The first term is the reconstruction error; the second term enforces sparseness.

- X: input vector
- Z: feature vector (output)
- **D**: dictionary (weight matrix)

[D. H. O'Connor et al., Neuron 2010; B. A. Olshausen, D. J. Field, Nature 1996]
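The objective above can be evaluated directly for a small example. This is a minimal sketch in plain Python, not the authors' implementation; the function name `sparse_coding_cost`, the toy dictionary, and the value of `lam` are illustrative assumptions.

```python
def sparse_coding_cost(D, Z, X, lam):
    """Average of 0.5*||D z_i - x_i||^2 + lam*|z_i|_1 over n samples.

    D: dictionary as a list of rows (m x k)
    Z: list of n feature vectors (each of length k)
    X: list of n input vectors (each of length m)
    lam: sparseness weight (lambda in the objective)
    """
    total = 0.0
    for z, x in zip(Z, X):
        # Reconstruction D.z for this sample
        recon = [sum(d * zj for d, zj in zip(row, z)) for row in D]
        # Squared reconstruction error and L1 sparseness penalty
        err = sum((r - xi) ** 2 for r, xi in zip(recon, x))
        l1 = sum(abs(zj) for zj in z)
        total += 0.5 * err + lam * l1
    return total / len(X)


# Toy example (hypothetical values): identity dictionary, one sample
D = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0]]
Z = [[1.0, 0.0]]
cost = sparse_coding_cost(D, Z, X, lam=0.1)
```

With a perfect reconstruction the error term vanishes, so only the sparseness penalty on the single active feature remains.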

- High power efficiency
- No backward propagation
- Scalable to multi-layers

### **Analog Memory and Computing**

- All cells are DC connected, no sneak path for read
- The value of Z, X (or r) represented by the number of voltage pulses; D by the RRAM conductance



Input Neuron (X or r)

| Task          | Operation                                  |
|---------------|--------------------------------------------|
| $D \cdot Z$   | $I_{r,i} = \sum_{j} G_{ij} \cdot V_{Z,j}$  |
| $D^T \cdot r$ | $I_{Z,j} = \sum_{i} G_{ij} \cdot V_{r,i}$  |
| $D$ update    | $\Delta G_{ij} = \eta \cdot r_i \cdot Z_j$ |
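The three array operations can be sketched in software as follows. This assumes rows are indexed by i (input neuron side, X or r) and columns by j (output neuron side, Z); the function names and the toy conductance values are illustrative, not taken from the slides.

```python
def forward(G, v_z):
    """D.Z: row currents I_r[i] = sum over j of G[i][j] * V_Z[j]."""
    return [sum(g * v for g, v in zip(row, v_z)) for row in G]


def backward(G, v_r):
    """D^T.r: column currents I_Z[j] = sum over i of G[i][j] * V_r[i]."""
    n_rows, n_cols = len(G), len(G[0])
    return [sum(G[i][j] * v_r[i] for i in range(n_rows)) for j in range(n_cols)]


def update(G, r, z, eta):
    """Outer-product update: returns G with G[i][j] + eta * r[i] * z[j]."""
    return [[G[i][j] + eta * r[i] * z[j] for j in range(len(z))]
            for i in range(len(r))]


# Toy 2x2 conductance matrix (arbitrary units, hypothetical values)
G = [[1.0, 2.0], [3.0, 4.0]]
i_r = forward(G, [1.0, 1.0])     # sums along each row
i_z = backward(G, [1.0, 1.0])    # sums along each column
G_new = update(G, r=[1.0, 0.0], z=[0.0, 1.0], eta=0.5)
```

On the array, each of these is a single parallel step because the currents sum on the shared wires; the loops here only emulate that summation.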

# **Realistic Device Properties**



- Non-zero off-state conductance; limited levels / precision
- Device variations; nonlinearity in weight update
- Experiment with unsupervised sparse coding + MNIST to study their impact on learning accuracy

#### **Non-zero Off-state and Precision**

- Solution: spatial redundancy to solve non-zero off-state
- Fixed-point computing

  - On/off ratio needs to be > 25
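A rough sketch of how a non-zero off-state offsets the column current, and how a redundant reference column can cancel it. The reference-column scheme, the linear weight-to-conductance mapping, and all numeric values are assumptions for illustration; the slides do not specify the exact redundancy scheme.

```python
def weight_to_conductance(w, g_on, on_off_ratio, levels):
    """Map a weight in [0, 1] to one of `levels` evenly spaced conductances."""
    g_off = g_on / on_off_ratio
    q = round(w * (levels - 1)) / (levels - 1)  # fixed-point quantization
    return g_off + q * (g_on - g_off)


def column_current(weights, voltages, g_on, on_off_ratio, levels):
    """Current summed along one column of programmed cells."""
    return sum(weight_to_conductance(w, g_on, on_off_ratio, levels) * v
               for w, v in zip(weights, voltages))


def reference_current(voltages, g_on, on_off_ratio):
    """Current of a redundant column whose cells all stay in the off state."""
    g_off = g_on / on_off_ratio
    return sum(g_off * v for v in voltages)


# Hypothetical values: 5 levels, on/off ratio of 25, unit read voltages
weights = [0.0, 1.0, 0.5]
voltages = [1.0, 1.0, 1.0]
raw = column_current(weights, voltages, g_on=1.0, on_off_ratio=25.0, levels=5)
ref = reference_current(voltages, g_on=1.0, on_off_ratio=25.0)
corrected = raw - ref
ideal = (1.0 - 1.0 / 25.0) * sum(w * v for w, v in zip(weights, voltages))
```

Subtracting the reference current removes the offset contributed by the off-state conductance, leaving a current proportional to the weighted sum.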




#### **Device Variations**

- Weight update variation: device-to-device and cycle-to-cycle
  - Device nonlinearity has moderate impact on the accuracy
- Weight read noise



### **Impact on the Accuracy**



- Impact of weight update variation: moderate
- Impact of weight read noise: significant
- Solution: multiple cells to minimize the variation
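The benefit of using multiple cells per weight can be illustrated with a small Monte Carlo sketch. It assumes independent Gaussian read noise on each cell and simple averaging across cells; the noise level, cell count, and function name are hypothetical.

```python
import random
import statistics


def read_weight(g_target, sigma, n_cells, rng):
    """Average the noisy reads of n_cells devices storing the same weight."""
    reads = [g_target * (1.0 + rng.gauss(0.0, sigma)) for _ in range(n_cells)]
    return sum(reads) / n_cells


rng = random.Random(0)  # fixed seed for repeatability
trials = 2000

# Relative read noise of 10% per cell (hypothetical)
single = [read_weight(1.0, 0.1, 1, rng) for _ in range(trials)]
multi = [read_weight(1.0, 0.1, 9, rng) for _ in range(trials)]

std_single = statistics.pstdev(single)
std_multi = statistics.pstdev(multi)
```

Under these assumptions the spread of the averaged read shrinks roughly as the square root of the number of cells, at the cost of proportionally more array area.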

#### **Interconnect Resistance**

- Wire resistance is in series with RRAM resistance
  - RC delay is not an issue
- Solution: scaling up the wire
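A back-of-the-envelope sketch of the series wire resistance seen by the farthest cell. It is a single-cell estimate that ignores current drawn by other cells on the same wires, and it interprets "scaling up the wire" as widening it so that the per-segment resistance drops; the resistance values are hypothetical.

```python
def farthest_cell_voltage_fraction(n, r_cell, r_segment):
    """Fraction of the applied voltage that drops across the farthest cell.

    n: array dimension (n x n)
    r_cell: RRAM resistance in ohms
    r_segment: wire resistance between adjacent cells in ohms
    The farthest cell sees about n segments on its row wire plus n on its
    column wire, in series with the cell itself.
    """
    r_wire = 2 * n * r_segment
    return r_cell / (r_cell + r_wire)


# Hypothetical values: 256 x 256 array, 100 kOhm cell, 2 Ohm per segment
narrow = farthest_cell_voltage_fraction(256, 100e3, 2.0)
# Doubling the wire width halves the per-segment resistance
wide = farthest_cell_voltage_fraction(256, 100e3, 1.0)
```

The lower the cell resistance relative to the accumulated wire resistance, the larger the voltage lost in the wires, which is why wider wires help.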



#### **Neuron Circuits: Parallel Read**

A current-to-digital converter, operating as the Integrate-and-Fire neuron model
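A behavioral sketch of an integrate-and-fire neuron used as a current-to-digital converter: the column current charges a capacitor, and each threshold crossing emits a spike that is counted. The unit choices (µA, pF, ns, mV) and all parameter values are illustrative assumptions, not the circuit from the slides.

```python
def integrate_and_fire(current_ua, t_window_ns, c_pf, v_th_mv, dt_ns=1.0):
    """Count the spikes fired while integrating a constant input current.

    With current in uA, capacitance in pF and time in ns, the voltage step
    per time step is current * dt / C, expressed in mV.
    """
    v_mv = 0.0
    spikes = 0
    steps = int(round(t_window_ns / dt_ns))
    for _ in range(steps):
        v_mv += current_ua * dt_ns / c_pf  # integrate charge on the capacitor
        if v_mv >= v_th_mv:
            spikes += 1                    # fire
            v_mv -= v_th_mv                # reset by subtracting the threshold
    return spikes


# Hypothetical values: 1 pF capacitor, 512 mV threshold, 10.24 us window
n1 = integrate_and_fire(1.0, 10240.0, 1.0, 512.0)
n2 = integrate_and_fire(2.0, 10240.0, 1.0, 512.0)
```

The spike count is proportional to the input current, so counting spikes over a fixed window digitizes the weighted sum.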



#### **Neuron Circuits: Parallel Write**

Write RRAM through the spiking rate between input (X or r) and output (Z) neurons

$\Delta G_{ij} \propto \text{pulse width} = \text{write time} \cdot \text{firing rate} = \eta \cdot Z \cdot r$

- Z value for the time window to write
- r value for the pulse number (firing rate)
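The write scheme above can be sketched as follows: each column's Z value opens a write window, each row's r value sets a firing rate, and the conductance change of a cell scales with the number of pulses that fall inside the window. The per-pulse conductance step, the maximum rate, and the maximum window are hypothetical parameters, and the device response is assumed linear.

```python
def parallel_write(G, r, z, dg_per_pulse, max_rate_per_ns, max_window_ns):
    """Return G after one parallel write.

    r[i] in [0, 1] scales the firing rate on row i.
    z[j] in [0, 1] scales the write window on column j.
    Expected pulse count at cell (i, j) = rate_i * window_j, so the
    effective learning rate is dg_per_pulse * max_rate * max_window.
    """
    G_new = []
    for i, row in enumerate(G):
        rate = r[i] * max_rate_per_ns
        new_row = []
        for j, g in enumerate(row):
            window = z[j] * max_window_ns
            n_pulses = rate * window
            new_row.append(g + dg_per_pulse * n_pulses)
        G_new.append(new_row)
    return G_new


# Hypothetical values: up to 10 pulses in a 100 ns window
G0 = [[0.0, 0.0], [0.0, 0.0]]
G1 = parallel_write(G0, r=[1.0, 0.5], z=[1.0, 0.0],
                    dg_per_pulse=0.01, max_rate_per_ns=0.1, max_window_ns=100.0)
```

Columns whose Z value is zero never open a write window, so their cells are left unchanged.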



# Parallel Operation: O(1)




# **Array Size**

- Peripheral circuits consume significant area
- Solution: scaling up the array size; non-CMOS neurons



130nm 1T1R array



# **System Simulator for Benchmark**



- Built on the template of CACTI and NVSim
- Metrics include area, latency, leakage power, dynamic power, etc. for a given array size, device type and node

### Example: A 256 x 256 Array

| Architecture<br>(array size = 256<sup>2</sup>) | Area<br>(µm<sup>2</sup>) | Read<br>Latency (ns) | Read<br>Energy (nJ) | Write<br>Latency (ns) | Write<br>Energy (nJ) | Leakage<br>(µW) |
|------------------------------------------------|--------------------------|----------------------|---------------------|-----------------------|----------------------|-----------------|
| SRAM Array<br>(row-by-row)                     | 39638.07                 | 393.38               | 15.14               | 114.55                | 1.9                  | 3247.93         |
| 1T1R Array<br>(row-by-row)                     | 5601.04                  | 75.51                | 1.84                | 10311.42              | 15.22                | 11.17           |
| Cross-point Array<br>(fully parallel)          | 6551.49                  | 70.63                | 1.68                | 160                   | 10.62                | 2.07            |

| Sparse Coding        | SRAM    | 1T1R    | Cross-point | Improvement |
|----------------------|---------|---------|-------------|-------------|
| Update Z (200 Read)  | 78.7 µs | 15.1 µs | 14.1 µs     |             |
| Update D (1 Write)   | 115 ns  | 10.3 µs | 160 ns      |             |
| Time for 1 Iteration | 78.8 µs | 25.4 µs | 14.2 µs     | 5.5×        |
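The iteration times follow from the per-operation latencies of the 256 x 256 example: 200 reads for the Z update plus 1 write for the D update. A short check of that arithmetic, using the tabulated read and write latencies (the variable and function names are illustrative):

```python
# Per-operation latencies in ns, taken from the 256 x 256 array example
read_ns = {"SRAM": 393.38, "1T1R": 75.51, "Cross-point": 70.63}
write_ns = {"SRAM": 114.55, "1T1R": 10311.42, "Cross-point": 160.0}


def iteration_time_us(arch, n_reads=200, n_writes=1):
    """Time for one sparse-coding iteration in microseconds."""
    total_ns = n_reads * read_ns[arch] + n_writes * write_ns[arch]
    return total_ns / 1000.0


times = {arch: iteration_time_us(arch) for arch in read_ns}
speedup = times["SRAM"] / times["Cross-point"]

for arch, t in times.items():
    print(f"{arch}: {t:.2f} us")
print(f"SRAM vs. cross-point: {speedup:.1f}x")
```

The computed cross-point total comes out slightly above the tabulated 14.2 µs, presumably a rounding difference in the source, and the SRAM to cross-point ratio reproduces the 5.5× improvement.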

# **Technology Scaling**



- A large array does not scale well because the wire width must be relaxed
- <u>Solution</u>: partition the large array into multiple small arrays as technology scales
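Partitioning is transparent to the computation: a large weighted sum can be split across small arrays whose partial results are added. A minimal sketch of that tiling, assuming square tiles and digital accumulation of the partial sums; the function names and the tile size are illustrative.

```python
def mvm(G, v):
    """Matrix-vector product computed by one array: out[i] = sum_j G[i][j]*v[j]."""
    return [sum(g * x for g, x in zip(row, v)) for row in G]


def tiled_mvm(G, v, tile):
    """Same product, computed tile by tile on sub-arrays of size tile x tile."""
    n_rows, n_cols = len(G), len(G[0])
    out = [0.0] * n_rows
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            # Each sub-array holds one block of the weight matrix
            sub = [row[c0:c0 + tile] for row in G[r0:r0 + tile]]
            partial = mvm(sub, v[c0:c0 + tile])
            # Partial sums from sub-arrays in the same row block are accumulated
            for k, p in enumerate(partial):
                out[r0 + k] += p
    return out


# Toy 4x4 matrix split into four 2x2 sub-arrays (hypothetical values)
G = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0],
     [13.0, 14.0, 15.0, 16.0]]
v = [1.0, 0.0, 2.0, 1.0]
full = mvm(G, v)
tiled = tiled_mvm(G, v, tile=2)
```

Each small array keeps its wires short, while the accumulation step adds peripheral overhead that a multi-array architecture has to budget for.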

#### **Future Needs**

- <u>Synaptic Device</u>: variation control, read noise reduction, better endurance (habituation), more levels (>4-bit)
- <u>Circuits and Architecture</u>: larger array, peripheral device/circuits, physical design, multi-array architecture
- <u>Neuromorphic Algorithm</u>: brain-inspired algorithm for low precision, compact network, and high energy efficiency



