# Advanced RF/Baseband Interconnects for Future ULSI Communications

### **Mau-Chung Frank Chang**

High Speed Electronics Laboratory, Department of Electrical Engineering, University of California, Los Angeles, CA 90095-1594

### Abstract

Future inter- and intra-ULSI interconnect systems demand extremely high data rates as well as bi-directional multi-I/O concurrent service, re-configurable computing/processing architecture, and total compatibility with mainstream silicon SOC (system-on-chip) and SIP (system-in-package) technologies. In this paper, we review recent advances in CDMA and FDMA interconnect schemes that promise to meet all of the above system requirements. The physical transmission line is no longer limited to a direct-coupled metal wire. Rather, transmission can be accomplished over either wired or wireless media through capacitor couplers that reduce the baseband noise and DC power consumption. These new advances in interconnect schemes would fundamentally alter the paradigm of ULSI data communications and enable the design of next generation computing/processing systems.

#### Introduction

The system performance of modern ULSI is being limited by its interconnect bandwidth in both on-chip and chip-to-chip communications [1-3]. The rapid evolution of ULSI has demanded that the interconnect system be fast and at the same time be flexible, reliable and low cost. Ideally, future interconnect systems must encompass the following important features:

- Ultra-high data rates (e.g. >100Gbps/pin or 20Tbps aggregate [2], defined as the total sum of data rate for each pin on a chip or within a system of chips)
- Concurrent multi-I/O's service for simultaneous and bidirectional communications on a shared transmission medium
- Realtime re-configurability in connectivity and bandwidth for optimized channel efficiency and fault-tolerance

Additionally, the fabrication of future interconnect systems must be compatible with the mainstream SOC and SIP technologies for low cost system production.

Traditional inter-chip and intra-chip communications are based solely on TDMA (time division multiple access). In a TDMA-interconnect (*TDMA-I*) system, each I/O pair communicates over a shared transmission medium by transmitting only during its scheduled time slot in which no other I/O pair may transmit. In essence, time is being divided or allocated to each individual I/O pair so that a given transmission medium may be effectively shared. Furthermore, advanced *TDMA-I* (bus and links) in recent years has exploited multi-level signaling and dispersive signal equalization techniques to achieve multi-Gbps throughput [4-6]. Nevertheless, this type of system is limited to fixed and non-reconfigurable architecture that has high data transmission latency and cannot support bidirectional and simultaneous transmission of multiple I/Os on the same physical channel.

### **Advanced Interconnect Schemes**

To overcome the limitations of traditional TDMA-I, a number of new interconnect schemes have been investigated recently to greatly increase the aggregate data rate and concurrency as well as to reduce latency and power consumption [7-13]. These new schemes permit the use of a combination of other multiple access techniques, namely code division multiple access (CDMA) and frequency division multiple access (FDMA). In CDMA interconnect (CDMA-I), each I/O pair is assigned one or more pseudo-noise (PN) codes with near-ideal correlation property so that any other I/O pair assigned with different PN codes will contribute no interference when they are transmitting concurrently onto the same medium. In contrast, FDMA-I allows sharing of a transmission medium by assigning I/O's to different frequency channels. I/O's assigned with different frequency channels may communicate concurrently with virtually no interference, provided that undesired frequency channels for a given I/O are filtered out properly. FDMA-I and CDMA-I may be combined into a multicarrier CDMA-I, whereby concurrent I/O transmissions are accomplished by properly assigning codes and frequencies to each I/O pair.
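The code-division idea can be illustrated with a minimal sketch. It assumes ideal, chip-synchronous transmission on a noiseless shared line and uses Walsh codes (exactly orthogonal) as a stand-in for the PN codes mentioned above; the function names and bit patterns are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: two I/O pairs share one line using orthogonal codes.
# Assumptions: ideal chip synchronization, no noise, Walsh codes instead of PN codes.

def walsh_codes(order):
    """Return 2**order mutually orthogonal +/-1 codes (Sylvester construction)."""
    codes = [[1]]
    for _ in range(order):
        codes = ([row + row for row in codes]
                 + [row + [-c for c in row] for row in codes])
    return codes


def spread(bits, code):
    """Map bits {0,1} to symbols {-1,+1} and multiply each symbol by the whole code."""
    chips = []
    for bit in bits:
        symbol = 1 if bit else -1
        chips.extend(symbol * c for c in code)
    return chips


def despread(line, code):
    """Correlate the shared-line signal with one code, one bit period at a time."""
    n = len(code)
    bits = []
    for start in range(0, len(line), n):
        corr = sum(x * c for x, c in zip(line[start:start + n], code))
        bits.append(1 if corr > 0 else 0)
    return bits


codes = walsh_codes(2)             # four codes, each four chips long
tx_a = [1, 0, 1, 1]                # hypothetical data for I/O pair A
tx_b = [0, 0, 1, 0]                # hypothetical data for I/O pair B

# Both transmitters drive the medium concurrently; the line carries their sum.
line = [a + b for a, b in zip(spread(tx_a, codes[1]), spread(tx_b, codes[2]))]

rx_a = despread(line, codes[1])    # receiver A correlates with code 1
rx_b = despread(line, codes[2])    # receiver B correlates with code 2
```

Because the two codes have zero cross-correlation, each receiver recovers its own bit stream even though both signals occupy the line at the same time. Reassigning a code to a different receiver changes the connectivity without changing the wiring.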



Fig. 1 Comparison of Interconnect Schemes

Figure 1 compares new schemes based on *CDMA-I* and *FDMA-I* against the traditional *TDMA-I* in terms of two critical features for interconnects used in future ULSI systems: aggregate data rate and re-configurability. Interconnect systems based on the *multi-carrier CDMA-I* achieve the highest aggregate data rate and re-configurability. The high aggregate data rate is a result of the increased bandwidth made available through the use of more than one frequency channel. The high re-configurability arises from the increased number of combined code and frequency channels ($N_c$ and $N_f$, which denote respectively the number of code channels and the number of frequency channels) now available to the system to assign dynamically based on the specific operation requirement. Note that both *FDMA-I* and *CDMA-I* have similar degrees of re-configurability but *FDMA-I* has higher aggregate data rates than *CDMA-I* due to the additional bandwidth made available through multiple frequency channels. In theory, *TDMA-I* has the same degree of re-configurability as either *FDMA-I* or *CDMA-I* in that a *TDMA-I* scheme can re-configure by rescheduling and reassigning time slots for a given set of I/O's depending on the operation needs. In principle, the number of time slots $N_t$ can be made equal to $N_c$ or $N_f$. However, even though an increase in $N_t$ results in a proportional reduction in the average data rate per I/O pin, the burst data rate for each I/O pin still remains the same as the aggregate data rate. The high I/O burst data rate makes implementation difficult due to the need for high-order modulation and time-domain equalizers at high operating speed. For *CDMA-I* and *FDMA-I*, on the other hand, the burst data rate per I/O is inversely proportional to $N_c$ and $N_f$, which simplifies the signal processing required for each I/O. In particular, for *CDMA-I*, a Rake receiver architecture may be employed to compensate for time dispersion in the transmission media, and it requires much less complexity than a time-domain equalizer [14]. To put this into the perspective of practical implementation, wired *TDMA-I* has limited re-configurability due to the difficulty of increasing $N_t$ without excessive complexity and power dissipation in the transceiver system design.
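The difference between average and burst data rate can be made concrete with a small sketch. It is an idealized model under stated assumptions: the aggregate rate is shared equally among all I/O pairs, overheads are ignored, and the CDMA chip rate is not modeled. The function name and the numbers are hypothetical.

```python
# Illustrative model only: per-I/O average and burst data rates for a fixed
# aggregate rate shared equally among n_channels I/O pairs.

def per_io_rates(aggregate_gbps, n_channels, scheme):
    """Return (average, burst) data rate per I/O in Gbps."""
    average = aggregate_gbps / n_channels
    if scheme == "TDMA":
        # An I/O is silent outside its time slot but must run at the full
        # aggregate rate inside it.
        burst = aggregate_gbps
    elif scheme in ("CDMA", "FDMA"):
        # An I/O transmits continuously on its own code or frequency channel,
        # so its burst rate equals its average rate.
        burst = aggregate_gbps / n_channels
    else:
        raise ValueError("unknown scheme: " + scheme)
    return average, burst


# Hypothetical example: 8 Gbps aggregate shared by 4 I/O pairs.
tdma = per_io_rates(8.0, 4, "TDMA")   # (2.0, 8.0)
cdma = per_io_rates(8.0, 4, "CDMA")   # (2.0, 2.0)
```

In this model, increasing the number of TDMA time slots lowers only the average rate, while each transceiver must still handle the full aggregate rate during its slot.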

The re-configurability of wired *TDMA-I* may be improved with the single-carrier RF-interconnect scheme (or *SCRFI*) shown in Fig. 1. The *SCRFI* uses only one frequency channel and achieves throughput similar to that of a wired *TDMA-I* system. However, it achieves higher re-configurability than *TDMA-I*, since the transmission medium is no longer limited to fixed wiring: signals may be broadcast wirelessly through coupling capacitors to communicate with different receivers. Such a scheme not only simplifies the fabrication process by eliminating the vertical metal studs needed in future 3D ICs but also reduces the noise and DC power consumption. Transmission through capacitor coupling can also be applied to *FDMA-I*, *CDMA-I*, and *Multi-Carrier CDMA-I* (or *MCCDMA-I*).

#### Conclusion

In this paper, we will review recent progress in each of the new interconnect schemes described in Fig. 1 and discuss their applications in future ULSI interconnect implementations and architecture designs of next generation computer/processor systems. We will first describe a wired *CDMA-I* that achieves channel re-configurability while providing simultaneous multi-I/O service. We will then discuss the wired *FDMA-I* that is able to achieve multi-band (or multi-mode) channel communications. Subsequently, *SCRFI* specifically designed for 3D IC will be presented. Finally, wireless inter-chip communication based on *MCCDMA-I* will be discussed along with other system applications such as CDMA dynamic random access memory (*CDMA-DRAM*) and re-configurable interconnect for next generation systems (*RINGS*).

### References

- [1] K. C. Saraswat and F. Mohammadi, "Effect of interconnection scaling on time delay of VLSI circuits," *IEEE Trans. on Electron Devices*, vol. ED-29, pp. 645-650, 1982.
- [2] J. D. Meindl, "Integration limits on 21<sup>st</sup> Century Gigascale Integration", IEEE Interconnect Technology Conference - Short Course, San Francisco, CA, May 31, 1998.
- [3] M. T. Bohr and Y.A. El-Mansy, "Technology for advanced high performance microprocessors," *IEEE Trans. on Electron Devices*, vol. ED-45, pp. 620-625, 1998.
- [4] J. L. Zerbe, C.W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W.F. Stonecypher, A. Ho, T.P. Thrush, R.T. Kollipara, M.A. Horowitz, K.S. Donnelly, "Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell", IEEE Journal of Solid-State Circuits, Volume: 38, pp.2121 – 2130, Dec. 2003.
- [5] K. Farzan, D.A. Johns, "A CMOS 10-gb/s power-efficient 4-PAM transmitter", IEEE Journal of Solid-State Circuits, Volume: 39, pp.529 – 532, March 2004.
- [6] Farjad-Rad, R.; Yang, C.-K.K.; Horowitz, M.A.; "A 0.3-µm CMOS 8-Gb/s 4-PAM serial link transceiver", IEEE Journal of Solid-State Circuits, Volume: 35, pp.757 – 764, May 2000.
- [7] M. F. Chang, V.P. Roychowdhury, et al., "RF/Wireless Interconnect for Inter- and Intra-Chip Communications", Proceedings of the IEEE, Vol. 89, No. 4, April 2001.
- [8] M. F. Chang, Hyunchol Shin and Liyang Zhang, "RF-interconnect for future inter- and intra-ULSI communications", IEEE International Electron Devices Meeting (IEDM), Technical Digest, 2001 pp. 23.4.1 -23.4.4, Dec. 2001.
- [9] Hyunchol Shin, Zhiwei Xu and M.F. Chang, "RF-interconnect for multi-Gb/s digital interface based on 10 GHz RF-modulation in 0.18μm CMOS" Microwave Symposium Digest, 2002 IEEE MTT-S International, vol.1, pp.477-480, 2002
- [10] Zhiwei Xu, Hyunchol Shin, Jongsun Kim, M.F. Chang and Charles Chien, "Giga bit/s CDMA-Interconnect Transceiver Chip-Set with Multi-level Signal Data Recovery for Re-Configurable VLSI System", IEEE International Solid-State Circuits Conference Digest of Technical Papers, pp. 322-323, San Francisco USA, Feb. 2003
- [11] Jongsun Kim, Zhiwei Xu and M. F. Chang, "A 2-Gbps/pin Source Synchronous CDMA Bus Interface with Simultaneous Multi-Chip Access and Reconfigurable I/O Capability", Proceedings of the IEEE Custom Integrated Circuits Conference (CICC), pp. 317-320, Sep., 2003, USA.
- [12] Jongsun Kim, Zhiwei Xu and M. F. Chang, "Reconfigurable Memory Bus Systems using Multi-Gbps/pin CDMA I/O Transceivers" Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Vol.2, pp. II-33-36, May, 2003.
- [13] Qun Gu, Z. Xu, J. Kim, J. Ko, M. F. Chang, "Three-Dimensional Circuit Integration Based on Self-Synchronized RF-Interconnect using Capacitive Coupling", 2004 Symposium on VLSI Technology and Circuits (VLSI), Hawaiian, pp. 96-97, USA, June 2004.

## Advanced RF/Baseband Interconnects for Inter- and Intra-ULSI Communications

M. Frank Chang High-Speed Electronics Lab., Electrical Engineering Department University of California, Los Angeles

mfchang@ee.ucla.edu


### **Future Interconnect System Requirements**

- Ultra-high data rates (>100Gbps/pin or 20Tbps aggregate, defined as the total sum of data rate for each pin on a chip or within a system of chips) and low latency
- Concurrent multi-I/O's service for simultaneous/bidirectional communications on a shared transmission medium
- Realtime reconfigurability in connectivity and bandwidth for optimized channel efficiency and fault tolerance
- Compatible with mainstream SOC and SIP technologies for low cost system insertion

### **Advanced Interconnect Schemes**

- CDMA-I
  - Code divided channels (N<sub>c</sub>) based on direct sequence spread spectrum (DSSS)
  - Flexible in reconfiguration by software code assignment
- FDMA-I
  - Frequency divided channels (N<sub>f</sub>)
  - Data rate for individual channel proportional to its bandwidth allocation
- Combined FDMA/CDMA-I (Multi-Carrier CDMA-I)
  - FDMA/CDMA combined multiple access
  - Reduced burst rate per channel leads to low transceiver complexity and low power consumption
  - Synchronous access can be realized by using orthogonal codes to eliminate the cross channel interference
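The frequency-divided channels listed above can be sketched in a few lines. This is an illustrative model under stated assumptions, not the transceiver described in the talk: BPSK data on ideal cosine carriers, an integer number of carrier cycles per bit so the channels are orthogonal over a bit period, perfect carrier synchronization, and no noise. All names and parameter values are hypothetical.

```python
import math

# Illustrative sketch only: two I/O pairs share one line on different carriers.

SAMPLES_PER_BIT = 64


def modulate(bits, cycles_per_bit):
    """BPSK: each bit sets the sign of a cosine carrier for one bit period."""
    out = []
    for bit in bits:
        sign = 1.0 if bit else -1.0
        for n in range(SAMPLES_PER_BIT):
            phase = 2 * math.pi * cycles_per_bit * n / SAMPLES_PER_BIT
            out.append(sign * math.cos(phase))
    return out


def demodulate(line, cycles_per_bit):
    """Mix with the channel's carrier and integrate over each bit period.

    The integration acts as the channel filter: other carriers with a
    different integer number of cycles per bit average to zero.
    """
    bits = []
    for start in range(0, len(line), SAMPLES_PER_BIT):
        acc = 0.0
        for n in range(SAMPLES_PER_BIT):
            phase = 2 * math.pi * cycles_per_bit * n / SAMPLES_PER_BIT
            acc += line[start + n] * math.cos(phase)
        bits.append(1 if acc > 0 else 0)
    return bits


bits_a = [1, 0, 0, 1]              # hypothetical data on frequency channel A
bits_b = [0, 0, 1, 1]              # hypothetical data on frequency channel B
CH_A, CH_B = 4, 7                  # carrier cycles per bit for each channel

# Both modulated signals are summed on the shared transmission medium.
line = [a + b for a, b in zip(modulate(bits_a, CH_A), modulate(bits_b, CH_B))]

out_a = demodulate(line, CH_A)
out_b = demodulate(line, CH_B)
```

Each receiver selects its channel by mixing with its own carrier, so both I/O pairs transmit concurrently. Allocating more carriers, or a wider band, to one I/O pair raises its share of the aggregate data rate.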


### **Outline**

- Future Interconnect System Requirements
- Issues of Traditional TDMA-Interconnect
- New Interconnects System Applications
- Advanced Interconnects
  - CDMA-Interconnect
  - FDMA-Interconnect
  - FDMA/CDMA-Interconnect
- Comparison of Interconnects
- Summary

# Issues of Traditional TDMA Interconnect (*TDMA-I*)

- Space/Time multiplexing only, does not offer concurrent and bi-directional communications for distributed multi-I/Os
- High I/O burst data rate (equals aggregate data rate) demands high-order modulation and time-domain equalization
- Limited reconfigurability because rescheduling or reassigning N<sub>t</sub> (number of time slots) inevitably increases the complexity and power consumption of transceiver system
  - Difficult to re-allocate bandwidth to each I/O pair in realtime
  - Switched by hardware, cannot take full advantage of packet switching
  - Poor security, source-destination pairs physically identifiable on wafer
- Large number of I/Os, inefficient use of wafer real estate due to bonding, soldering and packaging constraints
  - Inadequate line shielding, vulnerable to noise and interference
  - Low dynamic testability for "known-good-die"
### **CDMA-Interconnect (CDMA-I)**









## SS CDMA-I Bus Transceiver and Test module



50 ohm terminated transmission line

- The new chip is being fabricated in a Samsung 0.10-µm DRAM process and packaged in a WBGA

- Test PCB module with four (2X2) chips and termination pins





### **Comparison of DRAM Bus Interface Standard**

| Interface      | Parallel high-speed data channels | Parallel high-speed address channels | Parallel high-speed clock channels | Total high-speed data & address channels | Min. shielding channels for data & address | Data rate (Mbps) | Total data bandwidth |
|----------------|-----------------------------------|--------------------------------------|------------------------------------|------------------------------------------|--------------------------------------------|------------------|----------------------|
| SDRAM (PC-133) | <u>64</u>                         | 12                                   | 1                                  | <u>76</u>                                | -                                          | 133              | <u>1.1 Gbyte/s</u>   |
| Rambus DRAM    | <u>16</u>                         | 8                                    | 2 (differential)                   | 24                                       | 24                                         | 400 * 2          | <u>1.6 Gbyte/s</u>   |
| DDR (SSTL-2)   | <u>64</u>                         | 12                                   | 8 (data strobe)                    | <u>76</u>                                | -                                          | 133 * 2          | 2.1 Gbyte/s          |
| CDMA DRAM      | 4                                 | 2                                    | 2 (differential)                   | <u>6</u>                                 | <u>6</u>                                   | 800 * 4          | <u>1.6 Gbyte/s</u>   |

- RDRAM uses 2 differential (CTM/CTMB, CFM/CFMB) clock lines for source synchronous clocking
- DDR uses 8 data strobe (DQS) lines for source synchronous clocking
- CDMA DRAM can use 2 differential clock lines or CDR for source synchronous clocking
- Both SDRAM and DDR DRAM have some additional command lines which are not listed above
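The total data bandwidth column can be cross-checked from the other columns. The sketch below assumes that the data rate column is the per-channel rate in Mbps (with the multiplier already included), that bandwidth is the number of data channels times that rate, and that 1 Gbyte/s equals 8000 Mbps. The function name is hypothetical.

```python
# Illustrative check only: total data bandwidth = data channels x per-channel rate.
# Assumption: 1 Gbyte/s = 8000 Mbps (decimal units).

def total_bandwidth_gbyte_per_s(data_channels, rate_mbps_per_channel):
    """Convert channel count and per-channel rate (Mbps) to Gbyte/s."""
    return data_channels * rate_mbps_per_channel / 8 / 1000


# (data channels, per-channel data rate in Mbps), taken from the table rows.
interfaces = {
    "SDRAM (PC-133)": (64, 133),
    "Rambus DRAM": (16, 400 * 2),
    "DDR (SSTL-2)": (64, 133 * 2),
    "CDMA DRAM": (4, 800 * 4),
}

bandwidth = {
    name: total_bandwidth_gbyte_per_s(channels, rate)
    for name, (channels, rate) in interfaces.items()
}

for name, value in bandwidth.items():
    print(f"{name}: {value:.1f} Gbyte/s")
```

Under these assumptions the computed values round to the figures in the table, which shows that CDMA DRAM matches the Rambus DRAM bandwidth with a quarter of the data channels.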


# **New Interconnect System Applications (II)**


### **Vertical Interconnect in 3D Integration**


Courtesy: Prof. Jason Woo, UCLA

Issues:

- Traditional 3D integration requires complex etching/alignment/metallization for vias and studs

Solution:

- Use capacitor-coupled RF-Interconnect to avoid complex processing





## **Output/Input Waveforms**





## **Output Signal Eye Diagram**



## **SS RFI Performance**

| Parameter            | Value                     |
|----------------------|---------------------------|
| PRBS Input Signal    | 3 Gbps                    |
| Carrier              | 10 GHz                    |
| Jitter               | 1.28 ps rms               |
| BER                  | 1.2×10<sup>-10</sup>      |
| Coupling Capacitance | 60 fF                     |
| Active Chip Area     | 0.02 mm<sup>2</sup>       |
| Power Consumption    | 4 mW from 1.8 V supply    |
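A few derived figures help put the table in context. The sketch below uses only the table values plus two textbook relations: energy per bit is power divided by bit rate, and the reactance of an ideal capacitor is 1/(2πfC). The derived numbers are back-of-envelope estimates, not measured results, and the variable names are hypothetical.

```python
import math

# Values from the SS RFI performance table.
bit_rate_bps = 3e9          # PRBS input signal, 3 Gbps
carrier_hz = 10e9           # RF carrier, 10 GHz
coupling_cap_f = 60e-15     # coupling capacitance, 60 fF
power_w = 4e-3              # power consumption, 4 mW

# Energy spent per transmitted bit, in picojoules.
energy_per_bit_pj = power_w / bit_rate_bps * 1e12

# Number of carrier cycles available within one bit period.
carrier_cycles_per_bit = carrier_hz / bit_rate_bps

# Magnitude of the coupling capacitor's reactance at the carrier frequency,
# assuming an ideal capacitor with no parasitics.
reactance_ohm = 1 / (2 * math.pi * carrier_hz * coupling_cap_f)

print(f"energy per bit: {energy_per_bit_pj:.2f} pJ")
print(f"carrier cycles per bit: {carrier_cycles_per_bit:.2f}")
print(f"coupling reactance at carrier: {reactance_ohm:.0f} ohm")
```

Because reactance falls as frequency rises, the 10 GHz carrier passes through the small coupling capacitor far more easily than low-frequency baseband content would, which is consistent with using RF modulation for capacitor-coupled links.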




## Summary

- Traditional *TDMA-I* does not meet future SOC/SIP interconnect needs in dynamic channel/bandwidth re-allocation, concurrent simultaneous multi-I/O service and bi-directional communications on shared transmission medium
- New Interconnects based on additional CDMA and FDMA algorithms (*CDMA-I, FDMA-I and Multi-Carrier CDMA-I*) can satisfy the above needs and would alter the paradigm of ULSI data communications and enable reconfigurable architecture designs for future computing/processing systems


