## Power Comparison of 2D, 3D and 2.5D Interconnect Solutions and Power Optimization of Interposer Interconnects

M Ataul Karim<sup>1</sup>, Paul D. Franzon<sup>2</sup>, Anil Kumar<sup>3</sup> <sup>1,2</sup>North Carolina State University, <sup>3</sup>SEMATECH Inc. <sup>1</sup>makarim@ncsu.edu, <sup>2</sup>paulf@ncsu.edu, <sup>3</sup>anil.kumar@sematech.org

#### Abstract

This paper compares the power efficiency of multiple 2D, 2.5D and 3D interconnect scenarios, specifically DDR3 with PCB, DDR3 with interposers, LPDDR2(3) with POP, Wide IO with through-silicon vias (TSVs) and interposers, and 32 nm technology CMOS drivers with TSVs and on-chip wires. It was found that DDR3 with PCB has the lowest power efficiency (10.96 mW/Gbps) and that custom designed CMOS drivers optimized for 2.5D and 3D give the highest power efficiency (0.23 mW/Gbps). Optimization of a 65 nm Back End of the Line (BEOL) interposer for the Wide IO interface to maximize power efficiency is also presented. Power efficiency for different interposer trace lengths (5mm-40mm) and pitches (4.6µm-11.05µm) was analyzed. It was found that power efficiency degrades linearly (mW/Gbps increases) as the pitch and length of the interposer traces increase, for both the one stack and 4 stack die Wide IO cases.

#### Introduction

Power efficiency has become an important issue because it limits the performance scaling of processors. 2.5D and 3D packaging models have the advantage of higher bandwidth and lower power consumption. However, published data on the specifics are scarce. Because any estimate of interconnect power savings is only approximate, obtaining more accurate comparisons of power efficiency is useful.

In this study, power consumption for read and write operations in different conventional and 3D enabled interconnect scenarios was investigated through detailed modeling and simulation. The scenarios compared are Double Data Rate type three Synchronous Dynamic Random Access Memory (DDR3 SDRAM) with Printed Circuit Board (PCB) and one Dual in-line memory module (DIMM) connection or interposer, Low Power DDR2/3 (LPDDR) with Package on Package (POP), Wide IO with one Through Silicon Via (TSV) stack, Wide IO with a 4 TSV stack and on-chip wire, and a simple CMOS driver with TSV and on-chip wire. In all the standard based scenarios, IBIS models were used for the drivers and receivers in the Spice simulations that produced the power calculations. Since the IBIS models for the drivers do not include the pre-driver, the pre-driver power was estimated as one third of the driver power to account for this. The power calculated here includes read and write operations. Clock alignment and clock and data recovery were not considered in the calculation. Electrostatic discharge (ESD) capacitors were also included. For the PCB and Package on Package scenarios, 500 fF was added to both the drivers and receivers to account for ESD protection. For the 3D cases a 50 fF capacitor was added for each driver and receiver. It was assumed that only secondary ESD protection was needed for the 3D cases.

For the PCB, POP and 3D chip stack interconnect scenarios, the interconnect trace parameters were essentially fixed. For the PCB and POP scenarios, they must be standard compliant transmission lines. For the TSV cases, a specific TSV case was modeled. However, for the interposer scenarios, the interconnect structures are not as fixed. They tend to achieve low values of characteristic impedance and also tend to be very lossy. Thus lines optimized for a specific impedance might not be the most power efficient. The potential tradeoffs in determining the most efficient interconnect scenario for an interposer based on a 65 nm Back End of the Line (BEOL) process are established.

Section 2 presents the details of the different 2D, 2.5D and 3D scenarios and their power efficiencies as obtained through simulation. Section 3 presents the details of 65nm BEOL interposer optimization using Wide IO memory interface to maximize power efficiency.

#### Power Calculation for different scenarios

#### DDR3 on PCB with one DIMM

The first scenario investigated is a DDR3 standard based interface conventionally packaged on a PCB with a Land Grid Array (LGA) CPU package and a single DIMM memory package.



**Figure 1.** (a) CPU and 1 DIMM having 4 DDR3 (b) Schematic diagram.

A 91 mm transmission line [2] ran from the CPU on the PCB to the end of the bus, as shown in Fig. 1. Another 30 mm transmission line ran from the bus to the on die termination (ODT) [2]. The DIMM socket held four 2GB DDR3-1066 devices. In this analysis a 1.6 Gbps/channel data rate was chosen with a 1.5 V supply and a 500 fF ESD capacitor at both the driver and receiver ends. A package resistance of 0.119 $\Omega$, inductance of 1.181 nH and capacitance of 0.41 pF [1] were also included. A 120 $\Omega$ ODT resistor [2] was added at the receiver side (Fig. 1(b)).

The simulation results with and without ESD are shown in Table 1. The pre-driver power is assumed to be one third of the driver power. This was found to be the most power hungry scenario (10.96 mW/Gbps). As shown later, DDR3 with interposer traces improves the power efficiency.

**Table 1.** Power efficiency for DDR3 with PCB.

|             | Tx<br>power<br>(mW) | Termination<br>Resistor power<br>(mW) | Rx<br>Power<br>(mW) | Total<br>Power<br>(mW) | mW<br>/Gbps |
|-------------|---------------------|---------------------------------------|---------------------|------------------------|-------------|
| With ESD    | 13.33               | 0.52                                  | 3.7                 | 17.55                  | 10.96       |
| Without ESD | 13                  | 0.42                                  | 3.7                 | 17.12                  | 10.7        |
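Each efficiency figure in Table 1 is the total link power divided by the per-channel data rate. The following Python sketch reproduces that arithmetic for the "With ESD" row at the 1.6 Gbps/channel rate used in this scenario. The function name is illustrative and does not come from the original study. Note that mW/Gbps is numerically the same as pJ/bit.

```python
def efficiency_mw_per_gbps(power_components_mw, data_rate_gbps):
    """Return (total power in mW, power efficiency in mW/Gbps)."""
    total_mw = sum(power_components_mw)
    return total_mw, total_mw / data_rate_gbps


# Table 1, "With ESD" row: Tx, termination resistor and Rx power in mW.
total_mw, eff = efficiency_mw_per_gbps([13.33, 0.52, 3.7], data_rate_gbps=1.6)
print(f"total = {total_mw:.2f} mW, efficiency = {eff:.2f} mW/Gbps")
```

The quotient is 10.97 when rounded to two decimals; Table 1 lists 10.96, which is consistent with truncation rather than rounding.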

#### LPDDR 2 and LPDDR 3 with Package on Package

This scenario represents a typical mobile memory packaging approach. Low Power DDR is connected in a Package on Package (POP) structure. A two die stacked 2GB LPDDR2-800 or 16 GB LPDDR3-1600 was used here. The supply voltage was 1.2 V, the ESD capacitor was 0.5 pF and the data rate was 800 Mbps/channel for LPDDR2 and 1600 Mbps/channel for LPDDR3.





**Figure 2.** (a) Two die stacked LPDDR [3] (b) LPDDR2/3 with POP schematic diagram (c) RLC model of the POP package.

The RLC model for POP was derived from [4] and is shown in Fig. 2(c). The power numbers for LPDDR2/3 with and without ESD are shown in Table 2. It was found that LPDDR is 2-3 times more power efficient than DDR3 with PCB. It consumes a moderate amount of power compared to the other scenarios. LPDDR3 was also more power efficient and had higher throughput than LPDDR2.

**Table 2.** Power efficiency for LPDDR with POP structure.

|                       | Tx power<br>(mW) | Rx power<br>(mW) | Total power<br>(mW) | mW/Gbps |
|-----------------------|------------------|------------------|---------------------|---------|
| LPDDR2<br>With ESD    | 4.01             | 0.99             | 5.09                | 6.4     |
| LPDDR2<br>Without ESD | 3.6              | 0.99             | 4.59                | 5.73    |
| LPDDR3<br>with ESD    | 5.8              | 1.05             | 6.85                | 4.28    |
| LPDDR3<br>without ESD | 6.37             | 0.69             | 6.83                | 4.27    |

#### Wide IO with TSV (Four die stack without interposer trace)

This is a pure 3D scenario. In this non-interposer scenario three Wide IO dies were vertically stacked on a CPU. The data rate was 400 Mbps/channel and the supply voltage was 1.2 V. Each layer had a receiver and an ESD protection capacitor of 50 fF. This is a face down design in which each layer was connected through a microbump and a TSV without interposer traces, as shown in Fig. 3(a). The interconnect models are shown in Fig. 3(b) - 3(d). Each TSV was assumed to be 50 µm long with a 4.6 µm average diameter. This scenario shows very good power efficiency compared to the previous scenarios because the Wide IO internal driver circuits consume less power.





**Figure 3.** (a) CPU and three Wide IO memories packaging structure (b) Electrical model of microbump [5] (c) Electrical model of TSV (d) Schematic diagram of Wide IO with 4 dies stacked.

The simulation results for this scenario are shown in Table 3. It was found that Wide IO was roughly 17 times more power efficient than DDR3 with PCB (0.65 versus 10.96 mW/Gbps with ESD).

**Table 3.** Power efficiency for Wide IO with TSV in 4 die stack.

|             | Tx power<br>(mW) | Rx power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|-------------|------------------|------------------|------------------------|---------|
| With ESD    | 0.187            | 0.073            | 0.259                  | 0.65    |
| Without ESD | 0.16             | 0.073            | 0.233                  | 0.58    |

#### Wide IO with TSV (Two die stack without interposer trace)

Here only one Wide IO memory was placed on top of a CPU and connected by a TSV and a microbump. It is also a face down example, with every characteristic the same as in the previous scenario except the number of dies stacked, as shown in Fig. 4.



**Figure 4.** CPU and one Wide IO stack.

The simulation results are shown in Table 4. The power savings over the previous scenario were modest, indicating that the power is dominated not by the interconnect parasitics but by the internal circuit power.

**Table 4.** Power efficiency for Wide IO with TSV in 2 die stack.

|                | Tx power<br>(mW) | Rx power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|----------------|------------------|------------------|------------------------|---------|
| With ESD       | 0.15             | 0.073            | 0.22                   | 0.55    |
| Without<br>ESD | 0.12             | 0.073            | 0.194                  | 0.49    |

#### Wide IO with TSV and interposer (3 die stack)

In this scenario three Wide IO memory dies were stacked vertically, placed next to a CPU and connected using a 20 mm long 65 nm BEOL silicon interposer trace, as shown in Fig. 5(a). This is meant to represent a commonly assumed "2.5D" scenario. It approximately models a situation where the memories are connected using a combination of TSVs and on-chip wiring. The TSV and microbump electrical models were the same as in Fig. 3(b) and (c). The Q3D EM field solver was used to find the RLGC values, crosstalk and characteristic impedance. Benzocyclobutene with $\varepsilon$=2.6 was used as the polymer [6]. The 65 nm process has 8 metal layers, and the top four metal layers were used as the interposer traces. The RLGC values found for the dimensions of Fig. 6(a) are shown in Table 5.







**Figure 6.** Interposer structure, (a) Dimensions of interposer traces, (b) capacitances of the metal layers.

**Table 5.** RLC values of the interposer trace with the dimensions used in Fig. 6(a).

| Circuit element        | Value per mm |
|------------------------|--------------|
| R                      | 48.77 Ω/mm   |
| L                      | 0.625 nH/mm  |
| $C_{ground}$           | 58.3 fF/mm   |
| $C_{couple}$           | 5 fF/mm      |
| $C_{total}$ of a trace | 68.3 fF/mm   |



**Figure 7.** Equivalent schematic of 3 wide IO stack next to a CPU on an interposer.

The simulation results using 20 mm long, 11.05µm pitch on-chip traces are shown in Table 6. The 20 mm long interposer wiring leads to significantly more power than the pure TSV scenarios, as that wiring adds significant interconnect capacitance.

**Table 6.** Power efficiency for 3 Wide IO with TSV and 20mm long, 11.05µm pitch on-chip traces.

|             | Tx power<br>(mW) | Rx power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|-------------|------------------|------------------|------------------------|---------|
| With ESD    | 1.07             | 0.07             | 1.14                   | 2.84    |
| Without ESD | 1.03             | 0.07             | 1.09                   | 2.74    |

#### One Wide IO with Interposer traces

In this scenario one Wide IO memory was placed next to a CPU and connected through a microbump and then a 20 mm length of interposer trace, as shown in Fig. 8. An ESD capacitor of 50 fF, a supply voltage of 1.2 V and a data rate of 400 Mbps were used. The schematic diagram is shown in Fig. 9. Table 7 shows the simulation results for an 11.05µm pitch, 20 mm long interposer trace. Again the interposer dominates the power consumption.



**Figure 8.** One Wide IO with interposer traces.



**Figure 9.** Equivalent schematic of one wide IO stack next to a CPU on an interposer.

**Table 7.** Power efficiency for Wide IO with 20mm long, 11.05µm pitch interposer traces.

|             | Tx power<br>(mW) | Rx power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|-------------|------------------|------------------|------------------------|---------|
| With ESD    | 0.8              | 0.07             | 0.87                   | 2.17    |
| Without ESD | 0.78             | 0.07             | 0.86                   | 2.15    |

#### CMOS Driver with TSV (Four die stack without interposer trace)

This scenario is almost the same as Fig. 3(a), except that a new CMOS driver and receiver were used instead of Wide IO. A 32 nm predictive Spice model [7] was used for the CMOS driver and receivers. The simulation results are shown in Table 8. It was found that the custom designed driver (Wn=2 $\mu$m, Wp=6 $\mu$m) consumes the least power of all the scenarios. The internal receivers have the same sizes as the driver, and the pre-driver is five times smaller than the driver.

**Table 8.** Power efficiency for CMOS Driver with 4 dies stacked.

|             | Tx power<br>(mW) | Rx<br>power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|-------------|------------------|---------------------|------------------------|---------|
| With ESD    | 0.363            | 0.004               | 0.367                  | 0.23    |
| Without ESD | 0.223            | 0.004               | 0.227                  | 0.14    |

#### DDR3 with 3 dies stack and 20 mm Interposer trace

This scenario is the same as Fig. 5 except that the Wide IO memory interfaces were replaced by DDR3 memory interfaces. The data rate was 1.6 Gbps and the supply was 1.5 V. The ESD capacitance was 0.5 pF in each layer. No on die termination resistor (R\_ODT) was used here or in any other 3D scenario. Table 9 shows that this scenario consumes less power than DDR3 on PCB but more than its Wide IO counterpart. This illustrates the internal power overhead of the DDR standard over the Wide IO one.

**Table 9.** Power efficiency for DDR3 with 3 dies stack and 5mm Interposer trace.

|             | Tx power<br>(mW) | Rx power<br>(mW) | Total<br>power<br>(mW) | mW/Gbps |
|-------------|------------------|------------------|------------------------|---------|
| With ESD    | 13.24            | 2.25             | 15.49                  | 9.68    |
| Without ESD | 12.88            | 2.28             | 15.17                  | 9.48    |

#### Power efficiency (mW/Gbps) comparison of all the scenarios

The previous simulation results, and some slight variants not discussed in detail, are summarized in the following bar chart (Fig. 10) to compare their power efficiency. DDR3 with PCB consumes the most power, LPDDR is moderately power hungry, and Wide IO and the custom designed CMOS driver and receiver consume the least power. The pure 3D cases consume far less power than the cases with horizontal interconnect wires, including the interposer scenarios. While not standard compliant, the custom CMOS driver case achieves the lowest power, since it can be optimized for this one scenario.
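As a rough cross-check of this ranking, the "With ESD" mW/Gbps values from Tables 1-4 and 6-9 can be sorted and compared against the DDR3 on PCB baseline. The sketch below only re-uses numbers already listed in those tables; the dictionary labels are shorthand introduced here, and the variants that appear only in Fig. 10 are not included.

```python
# "With ESD" power efficiency (mW/Gbps) as listed in the tables of this paper.
WITH_ESD_MW_PER_GBPS = {
    "DDR3 on PCB (Table 1)": 10.96,
    "DDR3, 3 die stack + interposer (Table 9)": 9.68,
    "LPDDR2 with POP (Table 2)": 6.4,
    "LPDDR3 with POP (Table 2)": 4.28,
    "Wide IO, 3 die stack + interposer (Table 6)": 2.84,
    "Wide IO, one die + interposer (Table 7)": 2.17,
    "Wide IO + TSV, 4 die stack (Table 3)": 0.65,
    "Wide IO + TSV, 2 die stack (Table 4)": 0.55,
    "CMOS driver + TSV, 4 die stack (Table 8)": 0.23,
}

baseline = WITH_ESD_MW_PER_GBPS["DDR3 on PCB (Table 1)"]

# Sort from most efficient (lowest mW/Gbps) to least efficient.
ranking = sorted(WITH_ESD_MW_PER_GBPS.items(), key=lambda item: item[1])

for name, value in ranking:
    print(f"{name:45s} {value:6.2f} mW/Gbps  {baseline / value:5.1f}x vs DDR3 on PCB")
```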

#### Optimization of Back End of the Line (BEOL) interposer for Wide IO memory to maximize power efficiency

This section addresses the question of which interposer cross-section leads to the best power efficiency. For this purpose, Wide IO was used with a 65 nm BEOL interposer. The top four metal layers of the 65 nm process were used as representative of the interposer traces. Different pitches and lengths of the interposer trace were analyzed to find the most power efficient interposer dimensions.



**Figure 10.** Power efficiency comparison for all the scenarios.

Unsurprisingly, power efficiency improved as the length got smaller. However, it was at first surprising that the tighter the pitch, the lower the power, despite the high interconnect losses. The reason is that interconnect capacitance dominates, while RC delay has a low impact at the data rate required for Wide IO (400 MT/s). As the interposer trace length increases, the ground capacitance increases, as shown in Table 10. Fig. 11(a) shows that power efficiency degrades (mW/Gbps increases) as the total capacitance increases.
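The claim that RC delay matters little at 400 MT/s can be sanity-checked with the Elmore delay of a distributed RC line, 0.5·R·C, using the per-mm values of the 0.4 µm wide trace (48.77 Ω/mm and 68.3 fF/mm). This is a minimal sketch under simplifying assumptions that are not part of the original simulations: it considers the wire alone and ignores the driver resistance, the load capacitances and the line inductance.

```python
R_OHM_PER_MM = 48.77    # trace resistance per mm (Table 5)
C_FF_PER_MM = 68.3      # C_ground + 2 * C_couple per mm (Table 5)
DATA_RATE_MBPS = 400.0  # Wide IO data rate per channel


def elmore_delay_ns(length_mm):
    """Elmore delay (0.5 * R * C) of a distributed RC wire, in ns."""
    r_total_ohm = R_OHM_PER_MM * length_mm
    c_total_farad = C_FF_PER_MM * length_mm * 1e-15
    return 0.5 * r_total_ohm * c_total_farad * 1e9


bit_period_ns = 1e3 / DATA_RATE_MBPS  # 2.5 ns at 400 Mbps

for length_mm in (5, 10, 15, 20):
    delay_ns = elmore_delay_ns(length_mm)
    print(f"{length_mm:2d} mm: {delay_ns:.3f} ns "
          f"({100 * delay_ns / bit_period_ns:.1f}% of the bit period)")
```

Under these assumptions the 20 mm trace gives roughly 0.67 ns against a 2.5 ns bit period, and the delay falls with the square of the length for shorter traces.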

**Table 10.** Capacitance and power efficiency for different interposer lengths with a pitch of 6.4 µm.

| Trace length (width = 0.4 µm, spacing = 6 µm) | Interconnect capacitance = C_ground + 2·C_coupling (fF) | Driver & receiver cap (pF) | TSV cap (fF) | Microbump cap (# of microbumps = 4) (fF) | Total ESD cap (fF) | Total wire cap (pF) |
|---------------------------------------------------------|--------------------------------------------------------------------|---------------------------------------------|--------------------|------------------------------------------------------------|-----------------------------|------------------------------|
| 20mm                                                    | 1366                                                               | 0.4                                         | 91                 | 21.6                                                       | 100                         | 2.376                        |
| 15mm                                                    | 1024.5                                                             | 0.4                                         | 91                 | 21.6                                                       | 100                         | 2.037                        |
| 10mm                                                    | 683                                                                | 0.4                                         | 91                 | 21.6                                                       | 100                         | 1.693                        |
| 5 mm                                                    | 341.5                                                              | 0.4                                         | 91                 | 21.6                                                       | 100                         | 1.355                        |
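The capacitance budget in Table 10 can be re-added as a quick consistency check. The sketch below sums the trace, driver/receiver, TSV, microbump and ESD contributions for each length. One assumption is made here that the table does not state explicitly: the 0.4 pF driver and receiver entry is counted once for the driver and once for the receiver (0.8 pF in total), since that reading brings the sum to within a few fF of the last column.

```python
TRACE_FF_PER_MM = 68.3          # C_ground + 2 * C_coupling per mm
DRIVER_RECEIVER_FF = 2 * 400.0  # ASSUMPTION: 0.4 pF each for driver and receiver
TSV_FF = 91.0
MICROBUMP_FF = 21.6             # four microbumps
ESD_FF = 100.0

# Last column of Table 10 (pF), keyed by trace length in mm.
TABLE_10_TOTAL_PF = {20: 2.376, 15: 2.037, 10: 1.693, 5: 1.355}


def total_capacitance_pf(length_mm):
    """Sum of all listed capacitance contributions for one trace, in pF."""
    trace_ff = TRACE_FF_PER_MM * length_mm
    total_ff = trace_ff + DRIVER_RECEIVER_FF + TSV_FF + MICROBUMP_FF + ESD_FF
    return total_ff / 1000.0


for length_mm, listed_pf in TABLE_10_TOTAL_PF.items():
    computed_pf = total_capacitance_pf(length_mm)
    print(f"{length_mm:2d} mm: computed {computed_pf:.3f} pF, "
          f"listed {listed_pf:.3f} pF, difference {computed_pf - listed_pf:+.3f} pF")
```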

To find the maximum power efficiency for Wide IO with a 3 die stack and interposer traces (same as Fig. 5), different lengths and pitches were used. The resulting power efficiencies are listed in Table 11.

**Table 11.** Power efficiency for 3 die stack Wide IO.

| Width (µm) | Spacing (µm) | R (Ω/mm) | L (pH/mm) | C (fF/mm) | Zo (Ω) | Length (mm) | mW/Gbps without ESD | mW/Gbps with ESD |
|------------|--------------|----------|-----------|-----------|--------|-------------|---------------------|------------------|
| 6.75       | 4.3          | 3.8      | 275       | 174       | 60     | 20          | 2.74                | 2.84             |
|            |              |          |           |           |        | 15          | 2.41                | 2.54             |
|            |              |          |           |           |        | 10          | 2.0                 | 2.11             |
|            |              |          |           |           |        | 5           | 1.51                | 1.61             |
| 4.93       | 4.75         | 4.7      | 329       | 151       | 75     | 20          | 2.57                | 2.67             |
|            |              |          |           |           |        | 15          | 2.27                | 2.37             |
|            |              |          |           |           |        | 10          | 1.87                | 2.04             |
|            |              |          |           |           |        | 5           | 1.44                | 1.54             |
| 0.4        | 6            | 48.8     | 625       | 68.3      | 297    | 20          | 1.71                | 1.84             |
|            |              |          |           |           |        | 15          | 1.57                | 1.67             |
|            |              |          |           |           |        | 10          | 1.37                | 1.51             |
|            |              |          |           |           |        | 5           | 1.2                 | 1.31             |
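The length dependence in Table 11 can be examined with an ordinary least-squares line through the four lengths of each trace geometry. The sketch below fits the "With ESD" column. The pitch keys are computed here as width plus spacing from the table columns, and the helper function is illustrative rather than part of the original analysis.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


LENGTHS_MM = [20, 15, 10, 5]

# Table 11, "With ESD" column (mW/Gbps), keyed by pitch = width + spacing (µm).
WITH_ESD = {
    11.05: [2.84, 2.54, 2.11, 1.61],
    9.68: [2.67, 2.37, 2.04, 1.54],
    6.4: [1.84, 1.67, 1.51, 1.31],
}

for pitch_um, values in WITH_ESD.items():
    slope, intercept = linear_fit(LENGTHS_MM, values)
    print(f"pitch {pitch_um:5.2f} um: {slope:.4f} mW/Gbps per mm, "
          f"intercept {intercept:.3f} mW/Gbps")
```

The fitted slope is the incremental cost of each additional millimetre of trace, and the intercept approximates the length-independent part of the link power for that geometry.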

For the scenario with one Wide IO memory and a CPU, the analysis results are listed in Table 12. It was found that as the pitch and length increased, the power efficiency got worse.

**Table 12.** Power efficiency for 2 die stack Wide IO.

| Width (µm) | Spacing (µm) | R (Ω/mm) | L (pH/mm) | C (fF/mm) | Zo (Ω) | Length (mm) | mW/Gbps without ESD | mW/Gbps with ESD |
|------------|--------------|----------|-----------|-----------|--------|-------------|---------------------|------------------|
| 6.75       | 4.3          | 3.8      | 275       | 174       | 60     | 20          | 2.15                | 2.17             |
|            |              |          |           |           |        | 15          | 1.85                | 1.87             |
|            |              |          |           |           |        | 10          | 1.41                | 1.47             |
|            |              |          |           |           |        | 5           | 0.91                | 0.97             |
| 4.93       | 4.75         | 4.7      | 329       | 151       | 75     | 20          | 1.85                | 1.97             |
|            |              |          |           |           |        | 15          | 1.64                | 1.68             |
|            |              |          |           |           |        | 10          | 1.28                | 1.32             |
|            |              |          |           |           |        | 5           | 0.85                | 0.9              |
| 0.4        | 6            | 48.8     | 625       | 68.3      | 297    | 20          | 1.11                | 1.16             |
|            |              |          |           |           |        | 15          | 1.0                 | 1.04             |
|            |              |          |           |           |        | 10          | 0.81                | 0.86             |
|            |              |          |           |           |        | 5           | 0.6                 | 0.66             |

As the trace width or length increases, the ground capacitance increases and so does the power consumption. Most of the power was found to be consumed by the dynamic power component $(CV^2f)$, which changes linearly with the capacitance. Power efficiency therefore degrades linearly (mW/Gbps increases) with pitch and length.
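The linear dependence on capacitance follows directly from the $CV^2f$ term, as the following first-order sketch illustrates. It assumes a full-rail swing and a fixed number of rising edges per bit (0.5 for alternating data, 0.25 for random data). These activity values are illustrative assumptions, not parameters from the simulations, and the estimate ignores driver, pre-driver and receiver circuit power, so only the trend is expected to carry over to the simulated numbers.

```python
def dynamic_energy_pj_per_bit(c_total_pf, vdd_v, rising_edges_per_bit=0.5):
    """First-order switching energy per bit.

    Each rising edge draws C * V^2 from the supply; pF * V^2 gives pJ,
    and pJ/bit is numerically equal to mW/Gbps.
    """
    return rising_edges_per_bit * c_total_pf * vdd_v ** 2


def dynamic_power_mw(c_total_pf, vdd_v, data_rate_gbps, rising_edges_per_bit=0.5):
    """Dynamic power in mW: (pJ/bit) * (Gbit/s) = mW."""
    energy_pj = dynamic_energy_pj_per_bit(c_total_pf, vdd_v, rising_edges_per_bit)
    return energy_pj * data_rate_gbps


# Total capacitance per length (pF) from Table 10, Wide IO supply of 1.2 V.
TOTAL_CAP_PF = {20: 2.376, 15: 2.037, 10: 1.693, 5: 1.355}

for length_mm, c_pf in TOTAL_CAP_PF.items():
    energy = dynamic_energy_pj_per_bit(c_pf, vdd_v=1.2)
    power = dynamic_power_mw(c_pf, vdd_v=1.2, data_rate_gbps=0.4)
    print(f"{length_mm:2d} mm: {energy:.3f} pJ/bit, {power:.3f} mW at 400 Mbps")
```

With the 20 mm total of 2.376 pF and a 1.2 V supply, the alternating-data assumption gives about 1.71 pJ/bit. Halving the capacitance halves the estimate, which is the linear behavior described above.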



**Figure 11.** Power efficiency variation with (a) capacitance, (b) pitch and (c) length of the interposer traces.

#### Conclusion

This study gives a clear view of the power efficiency of different 2D, 3D and 2.5D interconnect solutions. It was found that DDR3 with PCB consumed a lot of power, which could be reduced by using DDR3 with an interposer and reduced further by using Wide IO with an interposer. A significant reduction in power consumption is obtained in the pure 3D scenarios compared to the 2.5D interposer scenarios. A custom designed CMOS driver would result in the least power consumption.

For Wide IO with a 65 nm BEOL interposer, traces with smaller pitch have higher power efficiency because of their smaller ground capacitance, for both the single stack and four stack die cases.

#### Acknowledgments

This work was supported by Sematech and managed by SRC.

#### References

1. IBIS model, 2GB DDR3-1066; http://www.elpida.com/en/products/ddr3.html
2. D. B. Lin, M. P. Houng and W. S. Liu, "Enhancement of Signal Integrity for Multi-Module Memory Bus by Particle Swarm Optimization," in Proc. *IEEE 11th Annual Wireless and Microwave Technology Conference (WAMICON)*, 2010, pp. 1-5.
3. J. Sjoberg, S. Alam, D. A. Geiger and D. Shangguan, "Process Development and Reliability Evaluation for Inline Package-on-Package (PoP) Assembly," in Proc. *Electronic Components and Technology Conference*, 2008, pp. 2005-2010.
4. W. Yuan, C. K. Wang, Z. Boyu, N. Suthiwongsunthorn, S. Chungpaiboonpatana, "Electrical Performance Evaluation & Comparison of High-Speed Multiple-Chip 3D Packages," in Proc. *12th Electronics Packaging Technology Conference*, 2010, pp. 114-119.
5. S. R. Vempati, N. Su, C. H. Khong, Y. Y. Lim, K. Vaidyanathan, J. H. Lau, B. P. Liew, K. Y. Au, S. Tanary, A. Fenner, R. Erich, J. Milla, "Development of 3-D Silicon Die Stacked Package Using Flip Chip Technology with Microbump Interconnects," in Proc. *IEEE Electronic Components and Technol. Conf.*, May 26-29, 2009, pp. 980-987.
6. Q. Cui, X. Sun, Y. Zhu, S. Ma, J. Chen, M. Miao, Y. Jin, "Design and optimization of Redistribution Layer (RDL) on TSV interposer for high frequency application," in Proc. *IEEE International Conference on Electronic Packaging Technology & High Density Packaging*, August 8-11, 2011, pp. 1-5.
7. http://ptm.asu.edu/modelcard/LP/32nm\_LP.pm