An Analysis of MOS Current Mode Logic for Low Power and High Performance Digital Logic

by Jason Musicer

Research Project

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.

Approval for the Report and Comprehensive Examination:

Committee:

Professor Jan Rabaey, Research Advisor (Date)

Professor Robert Brodersen, Second Reader (Date)

Abstract

In this work, MOS Current Mode Logic (MCML) is analyzed for application to low power, mixed signal environments. A small MCML cell library is developed and optimized for several different performance requirements. The cells are then applied to the generation of ripple adders and pipelined CORDIC structures and compared with equivalent CMOS circuits. MCML CORDICs are designed which can operate from 125MHz to 310MHz with power consumption varying between 4.3mW and 18.6mW. This power consumption is up to 1.5 times lower than that of CMOS CORDICs with equivalent propagation delays. Design was done in a 0.25µm standard CMOS process from ST Microelectronics.

Acknowledgment

Over the year and a half that it has taken to complete this project, many people have touched my life and helped to contribute to this research. Some of them have given me academic advice, some have given me new ideas to try, and others have just been there for moral support. While it is not possible to thank everyone who has contributed to my two great years at UC Berkeley, I will attempt to point out a few people who have especially gone out of their way.

First, I would like to thank my advisor, Jan Rabaey. Without his guidance and advice, this work would never have even begun, never mind ended. His original ideas were the seeds that grew to become this thesis and his ideas and conclusions have shaped it and focused it along the way.
Jan's leadership at the Berkeley Wireless Research Center, along with that of Bob Brodersen, has created an environment of support that has made research easier and much more fun. I cannot thank him enough for all of the support.

The two biggest contributors to this project have been Antonio Lei and Brian Etscheid. In the last several months, Antonio has been a tremendous help and is responsible for all of the layout for the circuits designed. I can't thank him enough for all of his unselfish and dedicated contributions. Brian was my partner at the beginning of this project and has slaved away far too many nights at the BWRC running simulations and contributing ideas. Thank you both.

The researchers at the BWRC have also been a tremendous help throughout this project. Paul Husted, Rhett Davis, Johan Vanderhaegen, David Sobel, and many others have been great sounding boards for ideas and have saved me incredible amounts of time with their vast knowledge. Thank you to everyone at the Center.

Most importantly, a warm thank you to my friends and family whose support and guidance have allowed me to keep my sanity in the world of graduate school.

Table of Contents

[...]
    3.3.1 Gain
    3.3.2 Current Matching Ratio (CMR)
    3.3.3 Voltage Swing Ratio (VSR)
    3.3.4 Signal Slope Ratio (SSR)
    3.3.5 RFN and RFP Voltage Limits
    3.3.6 Area
    3.3.7 Delay, Power, Power-Delay, Energy-Delay
    3.3.8 Power Supply Switching Noise
  3.4 DESIGN PARAMETERS
    3.4.1 VDD
    3.4.2 Voltage Swing (∆V)
    3.4.3 Current (I)
    3.4.4 Differential Pair Transistor Sizes (WA, LA, WB, LB, WC, LC)
    3.4.5 PMOS Load Transistor Sizes (WRFP, LRFP)
    3.4.6 NMOS Current Source Transistor Sizes (WRFN, LRFN)
  3.5 MCML GATE OPTIMIZATION PROCEDURE
  3.6 MCML GATE OPTIMIZATION RESULTS
CHAPTER 4 MCML GATE LAYOUT
  4.1 LOCAL VS. GLOBAL EFFECTS
  4.2 LAYOUT TOPOLOGY
  4.3 TRANSISTOR MATCHING
  4.4 LAYOUT RESULTS
    4.4.1 Parallel vs. Anti-Parallel MCML Layout
    4.4.2 MCML and CMOS Layout vs. Schematic
[...]
    5.2.1 Voltage Swing, ∆V
    5.2.2 Supply Voltage, VDD
  5.3 VOLTAGE SWING CONTROL CIRCUITRY
  5.4 CURRENT CONTROL CIRCUITRY
  5.5 SUPPORT FOR CURRENT VARIABILITY
  5.6 GATE DRIVE STRENGTH SCALING
  5.7 CONVERSION CIRCUITRY
CHAPTER 6 SYSTEM DESIGN EXAMPLE : RIPPLE ADDERS
  6.1 MCML FULL ADDER DESIGN
  6.2 BASIC RIPPLE ADDER DESIGN
  6.3 MODIFIED MCML RIPPLE ADDERS WITH CURRENT RATIO ADJUSTMENT
CHAPTER 7 SYSTEM DESIGN EXAMPLE : CORDIC
  7.1 CORDIC ALGORITHM
  7.2 CORDIC ARCHITECTURE
  7.3 CIRCUIT OPTIMIZATION
  7.4 RESULTS
CHAPTER 8 CONCLUSIONS
  8.1 SUMMARY
  8.2 FUTURE WORK
APPENDIX A DERIVATION OF IDEAL MCML GATE PERFORMANCE
  A.1 GOAL
  A.2 MCML GATE WITH IDEAL LOAD
  A.3 MCML GATE WITH NON-IDEAL LOAD
REFERENCES

Chapter 1 Introduction

1.1 Motivation

The recent advances in VLSI technology have allowed rapid growth in the area of portable electronic devices. Laptop computers, cellular phones, and personal desktop assistants have all become commonplace items in people's lives. One of the primary consumer complaints about these devices is the short battery life and/or the extra weight of the batteries due to the high power consumption of the circuitry. As CMOS process technology scales and demand for more processing power increases, it can be shown that the power consumption of future ICs will increase over time if significant architectural changes are not made [1]. It is therefore critical in future circuits that power be minimized beyond the traditional constraints of packaging cost and heat dissipation.

As device density increases, it is also extremely desirable to integrate analog and digital circuitry onto the same die for many DSP and communications systems. High levels of integration will be required in order to reduce total system area and drive down production costs. This integration has been delayed primarily by the difficulty of designing high precision analog circuitry in the presence of extremely hostile digital switching noise.
These difficulties will also increase as process technology scales due to fundamental challenges in high precision analog design at low supply voltages in digital CMOS technology. Either significant advances in analog design techniques will be required or digital designers will be forced to adapt their design style or process technology.

A digital circuit style that seems to be promising in both reducing power consumption and providing an analog friendly environment is MOS Current Mode Logic (MCML). While bipolar CML, a derivative of emitter coupled logic (ECL), has been used for years in high performance applications, it has become less desirable over time due to its high static power consumption and reliance on bipolar processing. In [2], MCML was analyzed and a 64-bit adaptively pipelined adder was developed and simulated. It was demonstrated in that paper that MCML could dissipate less power than equivalent CMOS circuitry as well as adjust for clock skew and environmental or process variations.

In this project, a much broader analysis of MCML is presented with some theoretical development and application to other circuit blocks. Near-minimum sized transistors are used in this project instead of the significantly larger devices in [2], and power consumption is measured for a wide variety of circuit blocks, performance levels, and design techniques. It will be shown that area efficient MCML can actually consume significantly less power than equivalent CMOS circuitry while maintaining many of the other benefits of traditional CML such as reduction in dI/dt effects, common mode noise immunity, and process and voltage variation immunity. The most important goal of this project is to evaluate the appropriate domains of performance and power requirements in which MCML presents benefits over existing logic styles.

1.2 Thesis Organization

This thesis is organized as follows: Chapter 2 will present the basic principles and guidelines for design with MCML logic.
Basic gates will be described and a simulation framework for evaluating gate level performance will be discussed. Chapter 3 discusses the design methodology and optimization process for MCML gates. Different trends in transistor sizing, supply voltage, current levels, and voltage swings will be discussed and analyzed in detail. Chapter 4 discusses the issues and presents results for the layout of MCML gates and compares them to equivalent CMOS gate layouts. Chapter 5 discusses many of the system level design issues in MCML such as control circuitry, current variability, and conversion circuitry. Chapter 6 applies the results of chapters 2-5 to the design of ripple adders and demonstrates the effects of several system level design decisions. Chapter 7 presents the CORDIC algorithm and describes the circuit implementation of a fully pipelined CORDIC. An equivalent CMOS CORDIC is also designed and analyzed to give a fair basis for comparison. Chapter 8 is the conclusion and gives some overall analysis of the feasibility of MCML use and its potential benefits.

Chapter 2 MCML Gate Design Basics

2.1 Ideal Gate - Operation and Theory

In order to understand the issues in designing with real MCML gates, it is beneficial to first derive some of the properties and equations of a general, ideal MCML gate. This ideal gate is presented in figure 2.1a below and consists of three main parts: the pull up resistors, the pull down network switch, and the current source.

Figure 2.1a : Basic MCML Gate (pull up resistors R, differential outputs Out, pull down network with differential inputs In0 to InN, current source I)

The inputs to the pull down network (PDN) are fully differential. In other words, the true and complement of all logical inputs must be presented to the gate. The PDN can implement any logic function but must have a definite value for all possible input combinations.
In general, the design of the MCML pull down network is similar to other differential logic styles such as differential cascode voltage switch logic (DCVSL) or differential split-level logic (DSL) [3].

Unlike DCVSL or DSL, the pull down network in MCML circuits is regulated by a constant current source. The pull down network steers the current I to one of the pull up resistors based upon the logic function being implemented. The resistor connected to the current source through the PDN will have current I and a voltage drop equal to ∆V = I × R. The other resistor will not have any current flowing through it and its output node will be pulled up to VDD in the DC state. If we look at the differential output voltage, the total voltage swing is set exclusively by the amount of current (I) and the value of the pull up resistance (R). This voltage swing is generally much smaller than VDD, of the order of a few hundred millivolts.

With this simple model in mind, we can derive some basic transient properties for a circuit composed of MCML gates. For a more detailed analysis and proofs of the following equations, please see Appendix A. For simplicity, let's assume that our circuit is a linear chain of N identical gates, all with identical load capacitance C on each output node. The total propagation delay of the chain of gates will be proportional to:

\[ D_{MCML} = N R C = \frac{N \times C \times \Delta V}{I} \]

The power consumption of a digital gate is typically broken down into its static and dynamic components. In the case of MCML, it can be proven (see App. A) that the sum of the static and dynamic components is a constant to first order.
With this assumption, we can write expressions for power, power-delay, and energy-delay [2]:

\[ P_{MCML} = N \times I \times V_{dd} \]

\[ PD_{MCML} = N I V_{dd} \times \frac{N C \Delta V}{I} = N^2 \times C \times \Delta V \times V_{dd} \]

\[ ED_{MCML} = N^2 C \Delta V V_{dd} \times \frac{N C \Delta V}{I} = \frac{N^3 \times C^2 \times V_{dd} \times \Delta V^2}{I} \]

For comparison, the delay, power, power-delay, and energy-delay for static CMOS logic are well known and approximated by [4]:

\[ D_{CMOS} = \frac{N \times C \times V_{dd}}{\frac{k}{2} \times (V_{dd} - V_t)^{\alpha}} \]

\[ P_{CMOS} = N \times C \times V_{dd}^2 \times \frac{1}{D_{CMOS}} \]

\[ PD_{CMOS} = N \times C \times V_{dd}^2 \]

\[ ED_{CMOS} = PD_{CMOS} \times D_{CMOS} = N^2 \times 2 \times \frac{C^2}{k} \times \frac{V_{dd}^3}{(V_{dd} - V_t)^{\alpha}} \]

where k and α are process and transistor size dependent parameters. Note that the above equations assume that the CMOS circuitry is being clocked at a frequency equal to the inverse of the propagation delay.

One interesting property to note is that MCML circuits do not have a theoretical minimum to the energy-delay product whereas the CMOS circuits do [2]. A designer can arbitrarily reduce the ED product by increasing the current for a given C, VDD, and voltage swing. In reality, this is not possible for very large currents because the robustness of the circuitry will deteriorate if no other changes are made.

Possibly the most important conclusion from the above equations comes from the effect of logic depth, N. The performance of MCML gates in relation to CMOS decreases linearly with N. This is due to the fact that MCML consumes static power, even when not switching. It is therefore very important in MCML circuits to maintain a shallow logic depth. In slowly clocked circuits, CMOS will not consume as much power as MCML, but in circuits with high performance requirements, MCML can have significantly better power-delay or energy-delay. Much more analysis will be given later in this report as to the actual crossover points between MCML and CMOS performance.

Another interesting property is that the energy-delay is proportional to the square of the voltage swing for MCML. This fact encourages the use of very low swing circuits.
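As a rough numerical illustration of the ideal MCML expressions (delay N × C × ∆V / I and power N × I × Vdd), the following sketch evaluates the four metrics for a gate chain and shows how energy-delay responds to current and swing. The function name and the operating point (N = 4, C = 10 fF, ∆V = 0.3 V, I = 50 µA, Vdd = 2.5 V) are assumptions chosen only for illustration; they are not design values from this report.

```python
def mcml_metrics(n, c, dv, i, vdd):
    """First-order metrics for a chain of n ideal MCML gates.

    n: logic depth, c: load capacitance per output node (F),
    dv: voltage swing (V), i: tail current per gate (A), vdd: supply (V).
    """
    delay = n * c * dv / i   # D = N*R*C with R = dv / i
    power = n * i * vdd      # P = N*I*Vdd
    pd = power * delay       # equals N^2 * C * dv * Vdd
    ed = pd * delay          # equals N^3 * C^2 * Vdd * dv^2 / I
    return {"delay": delay, "power": power, "pd": pd, "ed": ed}

# Hypothetical operating point, for illustration only.
base = mcml_metrics(n=4, c=10e-15, dv=0.3, i=50e-6, vdd=2.5)
more_current = mcml_metrics(n=4, c=10e-15, dv=0.3, i=100e-6, vdd=2.5)
less_swing = mcml_metrics(n=4, c=10e-15, dv=0.15, i=50e-6, vdd=2.5)

print(more_current["ed"] / base["ed"])  # ED scales as 1/I
print(less_swing["ed"] / base["ed"])    # ED scales as dv^2
print(more_current["pd"] / base["pd"])  # PD does not depend on I
```

In this idealized model, doubling the current halves the energy-delay product and halving the swing cuts it to a quarter, while the power-delay product is unaffected by the current.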
Once again, the limiting factor is the robustness of the circuitry.

For mixed signal environments, the constant current supplied by VDD is extremely desirable. The dI/dt effects are negligible in comparison to CMOS circuits and the current variation is theoretically 0. There will be some current change during switching due to non-idealities, but the change is less than 5% in the circuits simulated. The circuits are also significantly more robust against power supply noise due to their inherent common mode rejection.

2.2 MCML Inverter and Control Circuitry

Now that we have seen and analyzed an ideal MCML gate, let's begin to deal with the non-idealities of CMOS processing. The first real circuit to analyze is the MCML Inverter/Buffer shown in figure 2.2a. Since MCML is a differential logic style, the buffer and inverter functions are identical topologically and only require switching of the output or input sense. The pull down network switch is implemented with a standard nmos differential pair controlled by the single input. The current source is an nmos device with a fixed gate voltage (RFN) working in the saturation region. The load resistors are pmos devices with fixed gate voltages (RFP) and are designed to be operated in the linear region in order to model resistors.

Figure 2.2a : MCML Inverter/Buffer

The goal of the nmos differential pair is to switch the current provided by the current source from one side to the other. Ideally, all current will travel down only one path and the "off" path will have zero current flowing through it under DC conditions. In reality, some current will always flow in the "off" path and cause a reduction in the true signal voltage from VDD. The quality of current switching increases with larger input voltage difference (Vid) or larger W/L of the PDN transistors and decreases as larger currents are used.

The current source for MCML circuits is implemented with a single nmos device.
While several different architectures are known for current sources [6] (e.g. cascoding), a single device implementation was decided upon for area efficiency. It is important to maintain a relatively small transistor so that total cell size is not dominated by the current source. It is also desirable to use a non-minimum length device for this current source in order to achieve higher output impedance and better current matching across gates. More detail will be given in Chapter 5 as to how to set the RFN voltage, but a simple way is to use a current mirror [6].

The load resistances are implemented with single pmos devices. It is desirable to make these devices as close to minimum size as possible, unlike standard analog circuits. Increasing the width of these devices will decrease the linearity and also increase the capacitance. The RFP voltage is controlled by a simple feedback circuit, known as the Variable Swing Controller (VSC) and shown in figure 2.2b, similar to that used in [2].

Figure 2.2b : Variable Swing Controller (VSC)

The VSC adjusts the gate voltage (RFP) of the pmos loads so that the equivalent DC resistance is equal to the desired voltage swing divided by the current. The inputs to the VSC are the RFN voltage (i.e. current level) and the low output voltage, Vlow = VDD - ∆V. The VSC then generates the RFP voltage by using a model of the gate to be controlled. More will be said about the issues of the VSC design in Chapter 5.

2.3 Other MCML Gate Topologies

Now that we have a basic understanding of the MCML inverter and ideal VSC, we can begin to construct a small library of gates. The goal was not to build a complete standard cell library, but rather to develop a small collection of typical gates and functions. The issues of parameter optimization will be discussed in depth in the next chapter while here we present a general framework for implementing logic functions in MCML.
All MCML gates have one current source device and two load devices. Different logic functions are implemented with different pull down networks. The pull down networks are identical to those used in ECL logic and are composed of sets of differential pairs. The implementation of a logic function can be determined immediately from the construction of a Binary Decision Diagram (BDD). BDDs are used extensively in the area of logic synthesis and CAD to visualize boolean optimizations and can also be used in determining MCML gate structure. A general analysis of the formation and optimization of BDDs is beyond the scope of this report, but please refer to [5] for more information. Instead, we will look at a single example. Let's try to implement the following function in MCML:

F = ABC + B'D + ACD + A'BC'

We can begin to factor this expression until we have a completely specified and fully factored equation:

F = A[BC + B'D + CD] + A'[B'D + BC']
F = A[B(C + CD) + B'(D + CD)] + A'[B(C') + B'(D)]
F = A[B(C) + B'(D)] + A'[B(C') + B'(D)]

The BDD for this expression is shown in figure 2.3a and the implementation of the pull down network is shown in figure 2.3b.

Figure 2.3a : Binary Decision Diagram for F

Figure 2.3b : MCML Pull Down Network for F

Since it is desirable to reduce the logic depth of the nmos pull down circuitry to preserve both DC and transient properties, only functions of three levels or less are considered. While general BDD algorithms can achieve optimized trees for any logic function, many of the well known functions can be easily created by hand. Several of these functions are shown in figure 2.3c with their corresponding current sources and pull up devices included.
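Returning to the example function F, the factoring can be checked exhaustively. The sketch below treats each BDD node as a two-way selection, loosely mirroring how a differential pair steers current to one of two branches, and compares the fully factored form F = A[B(C) + B'(D)] + A'[B(C') + B'(D)] against the original sum of products for all 16 input combinations. The helper names (`mux`, `f_original`, `f_bdd`) are illustrative and do not come from this report.

```python
from itertools import product

def f_original(a, b, c, d):
    # F = ABC + B'D + ACD + A'BC'
    return ((a and b and c) or ((not b) and d)
            or (a and c and d) or ((not a) and b and (not c)))

def mux(sel, when_true, when_false):
    # One decision node: pick a branch based on the select input.
    return when_true if sel else when_false

def f_bdd(a, b, c, d):
    # F = A[B(C) + B'(D)] + A'[B(C') + B'(D)], three decision levels deep
    return mux(a, mux(b, c, d), mux(b, not c, d))

# Collect every input combination where the two forms disagree.
mismatches = [bits for bits in product([False, True], repeat=4)
              if bool(f_original(*bits)) != bool(f_bdd(*bits))]
print(mismatches)  # prints [] because the two forms are equivalent
```

The nested `mux` calls are three levels deep, which matches the three-level limit on pull down networks adopted here.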
Figure 2.3c : MCML Gate Examples (AND/NAND/OR/NOR, XOR3, D Latch, 2:1 MUX)

One interesting property to note is the relative homogeneity of gate topologies. If we look at the leftmost gate in figure 2.3c, we can see that the AND, OR, NAND, and NOR functions all have the exact same topology and therefore the same sizing, delay, power, etc. The only difference in implementing these functions is the ordering of the inputs and outputs. This uniformity leads to more predictability in the timing and area of cells and reduces the need for boolean manipulation in order to transform into inverting logic.

The basic storage element used for MCML is the D-Latch shown above. The latch has a simple cross-coupled structure and can be used to form a D Flip Flop with a master-slave approach [3]. The XOR and MUX gates shown above have a fairly compact structure compared to equivalent static CMOS implementations and are expected to perform well. The design of adder cells will be addressed in detail later in chapter 6.

2.4 CMOS Gate Design

In order to give a fair comparison of MCML gates to standard implementations, a set of static CMOS gates was also created. The CMOS versions of each block were optimized for low power. Traditional sizing rules were used in which the pmos devices were made twice as wide as the nmos devices and all series transistors were made wider to achieve the same first order delays. While logic styles such as dynamic logic or pass-transistor logic were generally not used, several of the blocks such as XOR, MUX, and adders were designed with transmission gates for better performance and power efficiency. Since this project does not deal with the effects of long interconnect, small load capacitances were used and hence all gates were minimum sized for low power [4].
Three dynamic latches and three flip-flops were analyzed for CMOS sequential circuits: C2MOS, TSPC, and Doubled C2MOS. A more detailed description of these latches can be found in [3]. All of these dynamic blocks were compared to static CMOS implementations and were found to be significantly better in power and delay. The C2MOS D-FFs were used in the CMOS implementation of the CORDIC to be discussed in chapter 7.

Chapter 3 MCML Gate Optimization

3.1 Optimization Goals and Challenges

The focus of this chapter will be to describe and analyze the performance of MCML gates as a function of numerous design parameters. The goal of this analysis is to allow a designer to quickly optimize transistor level gate designs while exploring different system choices. This optimization is necessary with MCML designs because many of the system level decisions will not exhibit their true performance tradeoffs unless the proper transistor level adjustments are made. A future extension of this work would be to develop an automated design flow for generating optimized MCML gates.

An equivalent optimization problem exists for static CMOS design but is much better understood. The only two real parameters which affect gate performance in CMOS are VDD and transistor sizes. These two parameters are typically chosen independently and general guidelines for transistor sizing are well known. In contrast, MCML optimization has many more degrees of freedom in the parameter selection. Furthermore, the parameters tend to be tightly coupled and do not allow independent selection. The following sections attempt to limit the freedom in parameter selection by showing the trends in performance as different design decisions are made.

The approach taken in this chapter is to take an individual gate and to optimize it under ideal system conditions. The effects of actual system level decisions will be shown in chapter 5.
For example, all of the analysis in this chapter assumes the use of a VSC matched perfectly to the current gate although multiple or unmatched VSCs are commonly used in practice. There will also be complete freedom in the selection of system level parameters such as voltage swings or VDD on a per gate basis. These parameters are typically fixed across different gates in the same system but are allowed to vary between gates for this analysis. The reason for this simplistic analysis is to explore an upper bound on the achievable performance of MCML gates as a baseline against which to judge non-idealities.

3.2 Simulation Methodology

Before beginning the parameter optimization, it is first necessary that a simulation methodology be fixed in order to evaluate performance. The goal of this methodology is to fairly produce standard performance metrics such as delay and power as a function of transistor sizing, voltage swing, VDD, current, etc. while introducing as few simulation artifacts as possible. Most parameter simulations were done at the transistor schematic level although some verification of layout effects was later performed. Designs were entered into Cadence using the ST Microelectronics 0.25µm design library. Schematics were then netlisted and all simulations were performed in HSPICE.

The first modification to standard simulation techniques was to buffer all inputs with inverters. Since many of the measurements being made are sensitive to input signal slopes, inverters were used to more closely model actual waveforms present on a chip. Between two and four inverters were used for buffering and were sized to produce about the same signal slope as the output slope of the gate under test. The inverters were connected to different power supplies in order not to affect any power measurements.

The next modification made was to model realistic output loading conditions.
For individual gate simulations in which we had no knowledge of surrounding circuitry, fanouts of 4 and 3 identical gates were used for CMOS and MCML blocks respectively. More fanout was used for CMOS to try to simulate the reduction in logic depth due to MCML's complementary nature [3]. The amount of capacitance added per fanout was equal to the measured input capacitance of the gate under test plus a fixed amount of wiring capacitance varying between 1fF and 10fF. Actual loading capacitance numbers were measured and used in situations where the circuit topology was known beforehand.

The next simulation decision made was the method for comparing power dissipation of the two logic styles. It is well known that the power consumption of static CMOS is proportional to the clock frequency [4]. A better metric for evaluating the efficiency of a CMOS gate is to measure its energy per switching event or its power-delay product. This metric is independent of clock frequency and is a more fundamental measure of the gate. Unfortunately, not all switching events dissipate the same energy. The metric chosen for our purposes was the average energy per switch over all possible input switching combinations, including no switching. The probabilities of each input switching were taken to be 50% and the energy dissipation was measured for all switching combinations. Note that this average energy metric requires the generation of $2^{2N}$ different input switching combinations, where N is the number of inputs. This is only feasible for very small N and was done for gates with three or fewer inputs. For circuits with more than 3 inputs, random waveforms were generated and applied to measure power dissipation. In contrast, MCML dissipates a nearly constant amount of power, independent of the clock frequency or input switching activity.
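The average-energy metric for CMOS described above can be sketched as an enumeration over every (before, after) pair of input vectors, which gives 2^(2N) combinations for an N-input gate. In the sketch below, the function names and the toy energy model are hypothetical stand-ins: in practice each energy value would come from a circuit simulation. Treating all pairs as equally likely assumes that each input's initial value is equiprobable and that each input switches with probability 50%.

```python
from itertools import product

def average_switch_energy(n_inputs, energy_of_transition):
    """Average energy over all 2**(2*n_inputs) (before, after) input pairs.

    energy_of_transition is a caller-supplied function (hypothetical here);
    pairs with before == after cover the "no switching" case.
    """
    vectors = list(product((0, 1), repeat=n_inputs))
    pairs = [(before, after) for before in vectors for after in vectors]
    total = sum(energy_of_transition(before, after) for before, after in pairs)
    return total / len(pairs), len(pairs)

# Toy stand-in for simulation data: one unit of energy per input bit that toggles.
def toy_energy(before, after):
    return sum(b != a for b, a in zip(before, after))

avg, count = average_switch_energy(3, toy_energy)
print(count)  # 64 combinations for a 3-input gate
print(avg)    # 1.5 units: each of the 3 inputs toggles in half of the pairs
```

The count grows as 4^N, which is why exhaustive enumeration is practical only for very small gates.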
In order to compare the two logic styles, we divided the average energy per switch metric by the propagation delay for CMOS and found the total average power.

The final methodology decision was in the realm of flip flop and latch evaluation. The propagation delay of a flip flop is taken as the sum of the setup time (tsetup) and the clock to output (clk2q) delays. It is well known that these delays are actually dependent upon each other. The technique employed in this project was to sweep the setup time given to the flip flop and to report the propagation delay as the minimum of the sum.

3.3 Constraints and Performance Criteria

Now that the simulation environment is defined, we need to establish some metrics of performance as a basis for our optimizations. These metrics can be broken into two categories: hard constraints and optimization goals. Hard constraints place a limit on some performance metric which must not be violated. Optimization goals do not have any fixed requirements but should be minimized or maximized whenever possible. A summary of these metrics is shown in figure 3.3a:

Hard Constraints                                Optimization Goals
Gain : Av                                       Area
Current matching ratio (CMR) : Iact/Iref        Power
Voltage swing ratio (VSR) : ∆Vout/∆Vin          Power-Delay
Signal slope ratio (SSR)                        Energy-Delay
RFN and RFP voltage limits                      Power supply switching noise

Figure 3.3a : Optimization Metrics

The following sections describe the motivation behind the above optimization criteria.

3.3.1 Gain

In standard CMOS circuits, one of the main measures of robustness to noise is the mid-swing DC voltage gain [3]. Digital logic can only function correctly if there exists a point in the DC transfer curve where the gain is larger than 1. There are two primary reasons for the gain to be made larger than the absolute minimum: regeneration and bistability. Regeneration is the ability for a gate to produce an output voltage closer to the ideal voltage level than its input voltage.
Bistability is a requirement in latches and flip-flops and assures that there are only two stable logic states in the system. Both of these properties are helped by large DC voltage gains.

Standard CMOS circuits naturally have large mid-swing voltage gains. In simple circuits simulated, gains of greater than 60 can be achieved with no additional effort. In contrast, MCML circuits do not naturally have high gains. Large gains can be achieved but at a tremendous cost in area and performance. Therefore, it is critical to design at or near the minimum requirements for voltage gain.

Furthermore, MCML circuits do not suffer from the same noise constraints as CMOS circuits. Most of the noise which adversely affects CMOS circuits becomes common mode noise for MCML and is rejected by the differential logic. MCML circuits also generate significantly less switching noise than CMOS circuits and the environment will therefore be more conducive to low gain operation.

The lower limit on voltage gain for this project was set at 1.4 for nominal conditions. The requirement is really that the gain be greater than 1 for all process, voltage, and matching conditions, but it was felt that a 40% margin would be sufficient for these variations. Later simulations verified that this margin was sufficient under typical variations and mismatch.

3.3.2 Current Matching Ratio (CMR)

This constraint refers to the amount of current flowing through the actual current source in comparison to the reference current source. This ratio is illustrated in figure 3.3b:

[Figure 3.3b : Current Matching Ratio = Iact/Iref. Schematic labels: Iref, RFN, Iact.]

The parameter which is set by the designer is the reference current, Iref, but the actual current flowing through the test gate is Iact. In order to achieve predictability in design, we would like the actual current to be close to our reference current. We allow the actual current to vary by 10% from the reference.
The main parameters which affect this ratio are the output impedance of the current source and the supply voltage.

3.3.3 Voltage Swing Ratio (VSR)

The ideal MCML gate contains a perfect current switch where all of the current flows down one side or the other. In reality, some finite amount of the current flows in the "off" path and the full current does not flow in the "on" path. The result of this non-ideality is a reduction in the output voltage swing. This problem is exacerbated by the fact that the quality of current switching depends directly on the amount of input swing applied. It is theoretically possible to create a chain of gates which have a continuously degrading voltage swing. This does not occur in reality because of the heterogeneity of gates used. Some gates will reduce signal swing while others will tend to regenerate swing. The mixture of gates tends to ensure preserved voltage levels but it is still desirable to place an upper bound on the amount of signal degradation of a single gate. We set this limit by requiring that the output voltage swing be at least 98% of the input voltage swing applied.

3.3.4 Signal Slope Ratio (SSR)

The output transient response of an MCML gate can be viewed as the sum of two events: the pull-up of one side of the gate and the pull-down of the other side. The sum of these two events creates a differential voltage swing which is viewed as the total signal. In an ideal MCML gate, each side's response is a first order system dominated by the same RC time constant (see Appendix A). The sum of these responses is a completely linear transition.

In reality, many nonidealities exist in MCML gates. One of the most significant nonidealities is the nonlinearity of the pmos load resistances. The modified transient response is analyzed in Appendix A. The result of that analysis is that a direct tradeoff exists between a gate's propagation delay (tp) and its 10%-90% rise/fall time (trf).
It is possible to make a circuit with a very fast pull-up and a very slow pull-down response. The overall response of this circuit will look like figure 3.3c below. Since the speed of the next gate will depend not only on the propagation delay but also on the output waveform shape of the previous gate, some control must be used to ensure reasonable rise/fall times. The metric used in this project is to take the ratio of the 10%-90% rise/fall time and the propagation delay: SSR = trf / tp. This metric is kept as low as possible but constrained to an absolute limit of 5.

[Figure 3.3c : Typical MCML Transient Response. Input waveform with swing ∆V; output waveform annotated with tp, trf (80% of ∆V), a first phase with pull-down and pull-up both active, the point where the pull-up completes, and a final pull-down only phase.]

3.3.5 RFN and RFP Voltage Limits

The final hard constraint is the voltage limits on the control signals, RFN and RFP. For simulation and optimization purposes, we tend to use artificially ideal control circuits to generate these two voltages. It is therefore important to monitor the ideal control circuits and to make sure that they are producing voltages which could also be produced by real circuitry. The RFN signal sets the gate voltage on the nmos current source. It therefore needs to be kept a few hundred millivolts from both VDD and from ground if set by current mirrors. The RFP voltage is used to set the pmos load gate voltages and is allowed to be below ground (to be discussed in chapter 5). The constraint on RFP is that the total Vsg of the pmos devices must be kept below the process limit of 2.5V.

3.3.6 Area

Since the goal of this optimization exercise is to be able to quickly evaluate transistor level tradeoffs, it would be nice to have an accurate estimation of cell area. Unfortunately, the layout of MCML cells is somewhat irregular and is difficult to predict.
The approach taken in this project was to constrain the transistor widths and lengths at the schematic level to a maximum and then try to reduce the sizes whenever possible. This approach leads to a library of "minimum sized" cells. When larger driving strengths are required, these conditions are relaxed and all transistors are scaled in proportion to the needed drive strength. While this approach does not take into account routing area, it is the best guess that can be made without doing layout at each step. The sizes of the nmos current sources are constrained to W < 2.0um, L < 0.5um, the nmos pull down transistors to W < 1.5um, L = 0.25um, and the pmos loads to W < 1.5um, L < 1.5um.

3.3.7 Delay, Power, Power-Delay, Energy-Delay

These metrics are the common interpretations and are used throughout this report to evaluate performance or power efficiency.

3.3.8 Power Supply Switching Noise

This metric is used to evaluate the ability of the MCML circuits to coexist with sensitive analog circuitry. The metric used is the percentage change of the supply current from its DC average.

3.4 Design Parameters

We can view our test environment described above as a computation engine which takes the user defined input parameters and the gate topology to be tested and produces a number of performance metrics. The next step in the optimization procedure is to define and classify all of these input parameters to be optimized.
The input parameters are listed in figure 3.4a:

Parameter Name    Description
VDD               Supply Voltage
∆V                Input and Output Voltage Swing
I                 Current desired in current source
WA, LA            Width and Length of first level pull down network nmos devices
WB, LB            Width and Length of second level pull down network nmos devices (if needed)
WC, LC            Width and Length of third level pull down network nmos devices (if needed)
WRFP, LRFP        Width and Length of pmos load devices
WRFN, LRFN        Width and Length of nmos current source
LOADCAP           The output loading capacitance

Figure 3.4a : Input Parameter Description

Please note that the WB, LB, WC, and LC parameters are only needed for two or three level gates. Also note that the total load capacitance (LOADCAP) is the sum of the input gate capacitance of the gate under test and some fixed interconnect capacitance. The reason for this type of loading is to prevent the optimization from unfairly using large devices and creating a large loading on the previous gate.

The following sections are an attempt to show some of the effects on the performance criteria by varying each of the design parameters. As stated earlier, these effects are not independent of each other. This analysis is presented in order to give an intuitive feel for the design optimizations to be illustrated in section 3.5.

3.4.1 VDD

The only true upper bound on the supply voltage is due to the process limits (2.5V) but it is typically desirable to lower VDD in order to reduce the power consumption. The power consumption of the circuitry is linearly proportional to the supply voltage and it should therefore be reduced as much as possible.

The main lower limit on the power supply voltage comes from the nmos current source. Reducing VDD too far hurts the output impedance of the current source and eventually pushes it out of the saturation region. One effect of this degradation is that the current matching ratio (CMR) begins to decrease and the current in the gate is reduced.
Another effect is that the midband voltage gain (Av) is reduced.

To illustrate the effects of supply voltage selection it is necessary to fix all of the other parameters. Figure 3.4b shows the effects of supply voltage selection on Av, CMR, current source output impedance (Ro), and power for an MCML inverter using 10uA of current, 400mV swing, 10fF of load capacitance (LOADCAP) and having the following transistor sizes: WRFN = 2.0um, LRFN = 0.5um, WRFP = 0.5um, LRFP = 0.6um, WA = 1.0um, LA = 0.25um.

[Figure 3.4b : Effects of VDD on MCML Inverter performance. I = 10uA, ∆V = 400mV, WA = 1.0um, LRFP = 0.6um, LOADCAP = 10fF. Panels, each plotted against VDD (Volts): Voltage Gain (Av); Current Matching Ratio (CMR); Current Source Ro (Ohms, log scale); Power (W).]

It is evident from the above figure that Gain, CMR, and Ro all have definite rolloff points. It is therefore desirable to operate at a VDD near this point, which will vary for different current levels and gate topologies.

3.4.2 Voltage Swing (∆V)

As seen in chapter 2, it is extremely desirable to reduce the voltage swing as much as possible in order to reduce the propagation delay of MCML. The lower limit on the voltage swing is determined by the gain and current switching requirements. As the voltage swing is reduced, the mid-transition output voltages become closer to VDD and reduce the output impedance of the pmos loads. The quality of the current switching will also be reduced and the voltage swing ratio (VSR) will suffer. These degradations of the gain and VSR can be fixed by adjusting other parameters such as transistor sizes or VDD. The lower bound on the swing must also take into account possible circuit mismatch effects.
In general, the smallest amount of voltage swing used in this project is 200 mV although lower swings could be used with extremely careful layout and noise management.

The upper bound on the voltage swing comes from the nonlinearity of the pmos loads and the effects on the signal slope ratio (SSR). As the voltage swing is increased, the pmos device on the side which is being pulled down is required to move closer to its Vdsat voltage. This eventually pushes the device into the saturation region, where its resistance becomes extremely nonlinear. This can be countered by increasing the length of the pmos device, at the cost of increased propagation delay. Another upper bound on voltage swing comes from the nmos current source. If the voltage swing is too large, the pull down side will approach ground and force the current source out of saturation. The tradeoff among linearity, gain, and speed can be seen in figure 3.4c:

[Figure 3.4c : Effects of Voltage Swing on MCML Inverter. I = 10uA, WA = 0.5um, VDD = 2V, LRFP = 0.9um, LOADCAP = 10fF. Panels, each plotted against Voltage Swing (V): Voltage Gain (Av); Propagation Delay (tp, ps); Voltage Swing Ratio (VSR = Vout/Vin); Signal Slope Ratio (SSR = trf / tp).]

3.4.3 Current (I)

This is the most general of the design parameters and is varied over a wide range in this analysis. The majority of the next section will be dedicated to showing the trends of MCML gates at different current levels. The lower bound on the current comes from severe SSR and VSR effects. The upper bound is set by the maximum transistor sizes allowed for a "minimum" sized cell. The circuits are tested in this project in the range of 0.5uA to 100uA (for near minimum sized transistors).
3.4.4 Differential Pair Transistor Sizes (WA, LA, WB, LB, WC, LC)

The sizing of transistor lengths and widths is the design parameter with the greatest degree of freedom and affects almost all performance criteria. In order to limit the scope of the discussion, we will first make a few assumptions. First, while each transistor width and length could be independently varied, we assume that all differential pair transistors are matched to be the same size. The second assumption is that the length of all transistors in the pull down network will be kept to the minimum (0.25um) since there is almost no benefit from increasing the length.

In general, increasing the width of the differential pair transistors will increase the voltage gain but it will also increase the input and output capacitance. This leads to a direct tradeoff between performance (both delay and area) and robustness (voltage gain). It is desirable to use the smallest transistors that still achieve enough voltage gain. The relationship between voltage gain and performance is illustrated in figure 3.4d for an MCML inverter. Note that in this simulation, the loading capacitance of the gate has a fanout of 3 where each fanout has a 1fF fixed interconnect capacitance plus the input capacitance of the gate under test.

[Figure 3.4d : Effects of Diff. Pair Transistor Widths (WA) on MCML Inverter. I = 10uA, ∆V = 250mV, VDD = 2V, LRFP = 0.6um, LOADCAP = 3*(1fF + Input Cap). Panels, each plotted against Transistor Width (um): Voltage Gain (Av); Propagation Delay (tp, ps).]

It is evident from figure 3.4d that gain must be kept at a minimum in order to preserve performance. By changing the transistor widths from 0.5um to 1.5um, the input capacitance increases dramatically (~3x) and the propagation delay of the gate driving this gate increases from 155ps to 215ps.
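The loading rule used for figure 3.4d, LOADCAP = 3*(1fF + Input Cap), can be written as a small helper to show how a wider differential pair feeds back into the load seen by the previous gate. This is only a sketch: the function name is hypothetical, and the 1fF and 3fF input capacitances in the example are illustrative values chosen to mirror the roughly 3x increase mentioned above, not measured data.

```python
def load_capacitance(fanout, input_cap_f, wire_cap_per_fanout_f=1e-15):
    """Total load in farads: each fanout adds wiring plus one gate input."""
    return fanout * (wire_cap_per_fanout_f + input_cap_f)


# Illustrative (assumed) input capacitances for a narrow and a wide device.
narrow_load = load_capacitance(3, 1e-15)  # 3 * (1fF + 1fF)
wide_load = load_capacitance(3, 3e-15)    # 3 * (1fF + 3fF)
```

With these assumed numbers the load doubles from about 6fF to about 12fF, since the fixed wiring capacitance does not scale with device width.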
In multi-level gates, the number of differential pairs to be sized increases. The definition of voltage gain also changes in multi-level gates and we define the overall voltage gain as the gain from the worst case input combination. It is very difficult to come up with general design rules for multi-level gates because the effects on gain are extremely inter-dependent among levels. In general, optimized gates tend to have widths increasing slightly from bottom to top.

3.4.5 PMOS Load Transistor Sizes (WRFP, LRFP)

Optimizing the size of the pmos load transistors is one of the most difficult and nonlinear tasks in creating a good MCML gate. Besides the obvious area tradeoffs, the main performance criteria affected by the sizing of these transistors are the voltage gain, signal slope ratio, RFP control voltage limit, the propagation delay, and control voltage mismatch.

The voltage gain is increased by increasing the length of the pmos devices (LRFP). This effect is especially strong when increasing from minimum length and therefore, non-minimum length transistors are used if possible for these devices. Non-minimum length devices also help by reducing the effects of transistor mismatch both between load devices and from the gate to the control circuitry.

The ratio of the width and length (W/L) of the devices also affects several criteria. Increasing the W/L, either by increasing W or decreasing L, decreases the effective resistance of the load devices and therefore improves the propagation delay. If the width is increased, the capacitance is also increased at the output node and the propagation delay may in fact stay the same. The actual effect will be heavily dependent on the amount of load capacitance. Increasing the W/L of the devices also reduces the Vdsat voltage and increases the nonlinearity of the resistance (see Appendix A).
In order to preserve a reasonable signal slope ratio (SSR), it is usually necessary to use small W/L's (<1) and to accept the loss in propagation delay associated with this decision.

Finally, the choice of the width and length is bounded by the minimum RFP voltage. The W/L must be kept large enough so that enough DC resistance can be achieved for a given voltage swing and current without generating Vsg voltages greater than 2.5V.

All of these different design constraints lead to a very complex optimization problem. The optimization methodology will be discussed further in the next section (3.5) but in general, WRFP is kept at the minimum (0.5um) and LRFP varies from minimum (0.25um) for high currents to around 1.5um for very small currents. Some of the effects are illustrated in figure 3.4e below.

3.4.6 NMOS Current Source Transistor Sizes (WRFN, LRFN)

The principal tradeoff in the selection of the current source transistor sizes is between area and robustness. It is desirable to use non-minimum length devices for the current sources both to increase the output impedance and to decrease the mismatch effects. It is also desirable to have a large W/L to decrease the Vdsat voltage and allow for further reduction in VDD. The limit on increasing both the length and width is that the area of the gate begins to grow dramatically. The current source devices used in this project were set to a limit of WRFN = 2.0um and LRFN = 0.5um in order to have reasonable area.
[Figure 3.4e : Effects of PMOS load transistor length (LRFP) on MCML Inverter. I = 10uA, WA = 0.5um, VDD = 2V, ∆V = 400mV, LOADCAP = 10fF, WRFP = 0.5um. Panels, each plotted against Pmos load length, LRFP (um): Voltage Gain (Av); Pmos load Gate-Source Voltage (Vgs, V); Propagation Delay (tp, ps); Signal Slope Ratio (SSR = trf / tp).]

3.5 MCML Gate Optimization Procedure

Now that we have explored the desired performance goals and the input parameter effects on these goals, we can try to formalize a design methodology. As mentioned earlier, the goal of this methodology is to be able to quickly optimize at the transistor level when considering system level design choices. While a general optimization procedure (exhaustive search, simulated annealing, etc.) could be applied to this problem, we instead take the approach of trying to limit the optimization space as much as possible and then using some human intuition.

The first step in the methodology is to define the limits placed on certain input parameters. These limits were discussed in the previous section but are summarized in figure 3.5a:

Parameter Name    Limit
VDD               VDD < 2.5V
∆V                200mV < ∆V < 2.4V
I                 0.5uA < I < 100uA
LA, LB, LC        LA, LB, LC = 0.25um
WA, WB, WC        0.5um < WA, WB, WC < 1.5um
WRFP              WRFP = 0.5um
LRFP              0.25um < LRFP < 1.5um
WRFN              WRFN = 2.0um
LRFN              LRFN = 0.5um

Figure 3.5a : Input Parameter Optimization Limits

The first step in the optimization process is to initialize VDD to 2.5V. The outer sweep variable for this optimization is the current level, I. For a number of discrete values of I ranging from 0.5uA to 100uA, we are trying to find the VDD, ∆V, WA, WB, WC, and LRFP which adhere to all of the hard design constraints and produce the smallest energy-delay product.
The next loop variable in the optimization is the voltage swing (∆V). For each I, we choose and fix a voltage swing (∆V). The third loop variable used is the pmos load length (LRFP). Finally, within each iteration of this loop, we find the best WA, WB, and WC so that the hard constraints explained in section 3.3 are met. If it is possible to meet these hard constraints with this selection of I, ∆V, and LRFP, we then lower VDD until the hard constraints are no longer met. Finally, we simulate the gate with all of these design parameters fixed and record the value of the energy-delay product. Once we sweep through all of the possible ∆V's and LRFP's for a given current, we find the set of parameters which gives the overall minimum for energy-delay. We then move on to another current level and repeat the process. This optimization procedure is illustrated in figure 3.5b:

for 0.5uA < I < 100uA {
    for 200mV < ∆V < 2.4V {
        Initialize VDD = 2.5V
        for 0.25um < LRFP < 2.5um {
            Find smallest WA, WB, WC which satisfy:
                0.5um < WA, WB, WC < 1.5um
                Gain > 1.45, Current Matching Ratio > 90%,
                Voltage Swing Ratio > 98%, Signal Slope Ratio < 5,
                |VDD - RFP| voltage < 2.2V
            If above is possible, find smallest VDD which satisfies:
                Gain > 1.4, Current Matching Ratio > 90%,
                Voltage Swing Ratio > 98%, Signal Slope Ratio < 5
            If above is possible, store parameters and ED product.
        }
        Find LRFP which gives minimum ED for given I, ∆V
    }
    Find ∆V which gives minimum ED for given I
    Store values of I, ∆V, VDD, WA, WB, WC, LRFP
}

Figure 3.5b : MCML Gate Optimization Procedure

While in general the above optimization algorithm can be up to O(n^6), it is usually possible to prune the design space dramatically when using human intuition. Optimization does become exponentially more difficult as the number of levels in the gate increases, and the simulation time grows as well.
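The nested search of figure 3.5b can be sketched in runnable form as follows. This is a simplified illustration rather than the scripts used in this project: it handles a single differential pair width (as in an inverter), uses the hard limits quoted in the text of section 3.3, and relies on a hypothetical `simulate` callback standing in for an HSPICE run. All names here are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class Metrics:
    """Results returned by the hypothetical simulator wrapper."""
    gain: float          # mid-swing DC voltage gain, Av
    cmr: float           # current matching ratio, Iact / Iref
    vsr: float           # voltage swing ratio, dVout / dVin
    ssr: float           # signal slope ratio, trf / tp
    vsg_load: float      # pmos load source-gate voltage, volts
    energy_delay: float  # energy-delay product, any consistent unit


def meets_hard_constraints(m: Metrics) -> bool:
    # Limits as stated in the text of section 3.3.
    return (m.gain > 1.4 and m.cmr > 0.90 and m.vsr > 0.98
            and m.ssr < 5 and m.vsg_load < 2.5)


# simulate(I, dV, VDD, W, LRFP) -> Metrics
Simulator = Callable[[float, float, float, float, float], Metrics]
Best = Tuple[float, float, float, float, float]  # (ED, dV, VDD, W, LRFP)


def optimize_gate(simulate: Simulator,
                  currents: Sequence[float],
                  swings: Sequence[float],
                  load_lengths: Sequence[float],
                  widths: Sequence[float],
                  vdds: Sequence[float]) -> Dict[float, Optional[Best]]:
    """For each current, return the minimum-ED point, or None if infeasible."""
    results: Dict[float, Optional[Best]] = {}
    vdd_start = max(vdds)
    for i in currents:
        best: Optional[Best] = None
        for dv in swings:
            for lrfp in load_lengths:
                # Smallest width that passes all constraints at the highest VDD.
                w_ok = next(
                    (w for w in sorted(widths)
                     if meets_hard_constraints(
                         simulate(i, dv, vdd_start, w, lrfp))),
                    None)
                if w_ok is None:
                    continue
                # Lower VDD step by step until a hard constraint fails.
                vdd_ok = vdd_start
                for vdd in sorted(vdds, reverse=True):
                    if not meets_hard_constraints(
                            simulate(i, dv, vdd, w_ok, lrfp)):
                        break
                    vdd_ok = vdd
                ed = simulate(i, dv, vdd_ok, w_ok, lrfp).energy_delay
                if best is None or ed < best[0]:
                    best = (ed, dv, vdd_ok, w_ok, lrfp)
        results[i] = best
    return results
```

Each candidate costs several simulator calls, so any pruning of the swing and load length lists by intuition translates directly into saved simulation time.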
With some practice and a properly set up simulation environment, a gate can be optimized in less than ten or fifteen minutes for a wide variety of current levels.

3.6 MCML Gate Optimization Results

In order to see the effects of the system level design decisions to be demonstrated in chapter 5, it is useful to have a set of idealized data points as an upper bound on performance. This section will show the theoretical performance limits of individual MCML gates and compare them with equivalent CMOS gates.

As discussed in sections 2.1 and 3.2, the power-delay and energy-delay metrics for MCML gates are directly proportional to the logic depth of the circuitry being used. It is therefore extremely unfair to compare CMOS and MCML gates at their absolute maximum clock frequency (1/tp). Instead, we assume an optimistic yet achievable logic depth of 4 for all gates in this section. The actual performance of the MCML gates can be scaled accordingly from that depth in order to see the actual performance under real circuit conditions.

The final disclaimer before displaying the results is that this optimization does not take any layout effects into account. The load capacitance value uses an estimate of 1fF of wiring capacitance per fanout and a fanout factor of 3 (4 for CMOS) identical gates. The effects of actual circuit layout will be discussed in the next chapter.

The first MCML gate optimized through the above procedure was a simple inverter/buffer. The result parameters are given in figure 3.6a for several different current levels, each optimized for energy-delay product. The plots of delay, energy, and energy-delay vs. current are shown in figure 3.6b. Note that each value of current is fully optimized in all of the other parameters. The effects of fixing parameters across different current levels will be explored in chapter 5.
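As a rough cross-check of how the logic depth assumption enters these metrics, the sketch below uses a first order model: the gate draws a static power of I*VDD for a full cycle of (logic depth * tp), and the energy-delay product is that energy multiplied by the single gate delay. This definition is an assumption made for illustration, not a formula stated in this section, and it ignores the fact that the actual current can be slightly below the reference current. For the 10uA inverter point of figure 3.6a (VDD = 0.90V, tp = 182ps) it gives about 1.19 pJ*ps against the reported 1.15 pJ*ps.

```python
def mcml_energy_delay(current_a, vdd_v, tp_s, logic_depth=4):
    """Return (energy per cycle in J, energy-delay in J*s) for one MCML gate.

    Assumed model: static power I*VDD is burned for logic_depth * tp.
    """
    power_w = current_a * vdd_v
    energy_j = power_w * logic_depth * tp_s
    return energy_j, energy_j * tp_s


energy, ed = mcml_energy_delay(10e-6, 0.90, 182e-12)
# 1 fJ = 1e-15 J and 1 pJ*ps = 1e-24 J*s
print(f"{energy * 1e15:.2f} fJ, {ed * 1e24:.2f} pJ*ps")  # 6.55 fJ, 1.19 pJ*ps
```

Doubling `logic_depth` doubles both returned values in this model, which is the proportionality to logic depth noted above.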
I (uA)   ∆V (mV)   VDD (V)   (W/L)A     (W/L)RFP   (W/L)RFN   tp (ps)   ED (pJ*ps)
0.5      200       0.80      .5/.25     .5/1.5     2.0/.5     2134      7.22
1        200       0.75      .5/.25     .5/1.0     2.0/.5     1075      3.40
3        200       0.80      .5/.25     .5/.6      2.0/.5     391       1.43
6        250       0.85      .5/.25     .5/.6      2.0/.5     247       1.21
10       300       0.90      .5/.25     .5/.6      2.0/.5     182       1.15
30       400       1.10      .85/.25    .5/.25     2.0/.5     98        1.21
60       500       1.25      1.0/.25    .6/.25     2.0/.5     68        1.30
100      600       1.35      1.25/.25   .8/.25     2.0/.4     57        1.57

Figure 3.6a : MCML Inverter Optimization Results

[Figure 3.6b : MCML Inverter Performance for Logic Depth = 4. Panels, each plotted against Current, I (uA, log scale): Propagation Delay (tp, ps); Energy for logic depth = 4 (fJ); Energy-Delay for logic depth = 4 (pJ*ps).]

There are several interesting things to note about the optimization results for the MCML inverter. The first is the deviation from the statement made in section 2.1 that there is no theoretical minimum to the energy-delay product. As we can see from the energy-delay plot in figure 3.6b, a minimum does exist around I = 10uA. The reason that energy-delay cannot be decreased further is that the voltage gain begins to degrade for larger currents. In order to maintain the minimum gain metric, it is necessary to increase ∆V, WA, or LRFP. We cannot actually increase LRFP because the signal slope ratio (SSR) begins to dominate and we must instead increase either WA or ∆V, both of which negatively impact the performance. Therefore, at high current levels, we reach a limit to the speed improvements but not to the energy increase and the energy-delay goes up.

On the low current end, the main factors hurting performance are the limits on minimum swing and minimum transistor widths. While high current levels require higher voltage swings in order to reach the minimum gain metric, low current levels do not.
With low current levels, the gain metric is met easily and the higher than required swing merely hurts the performance. It would also be desirable to decrease the width of the transistors in order to reduce capacitance but the process minimum widths are already reached.

With these results, we can generate some comparisons with CMOS inverters. The CMOS inverter simulated has nmos width = 0.5um and pmos width = 1.0um with a fanout of 4 identical inverters (plus interconnect capacitance). In order to compare the energy efficiency of MCML and CMOS gates, we plot the energy-delay (ED) product versus the delay of the gate. In MCML gates, we vary the delay by changing the current level. In CMOS gates, we vary the delay by changing VDD. Once again, we assume a logic depth of 4 in order to allow a more fair comparison. The results are shown in figure 3.6c:

[Figure 3.6c : MCML and CMOS Inverter Comparison. Panels: Propagation Delay for CMOS Inverter (tp, ps) vs. VDD (V); Energy-Delay (pJ*ps) vs. Delay (ps) for MCML and CMOS Inverters.]

We can see from these graphs a few more interesting trends. For small delays (i.e. high performance), the MCML inverter performs up to two times better than the CMOS inverter and can achieve smaller energy-delay products than are even possible with CMOS circuitry. For large delays (i.e. low performance), the benefits gained by reducing VDD in CMOS circuits far outweigh possible savings in MCML current reduction and CMOS is more energy efficient. This graph shows a crossover point at around 300ps which corresponds to around 5uA of current in MCML or 1.1V in CMOS. If the required delay is shorter than 300ps (i.e. the performance requirement is higher), then MCML is more energy efficient.

It is also interesting to note that the overall performance limit is greater for MCML than for CMOS.
An MCML inverter with 100uA of current achieves a propagation delay of only 57ps while a CMOS inverter running at 2.5V has a tp of 100ps. While the energy efficiency of MCML gates degrades significantly at these high performance levels, it is possible to achieve extremely high performance designs, about twice as fast as CMOS.

The above graphs depend on several very important assumptions. First of all, the logic depth is assumed to be equal to 4 in the above analysis. If the logic depth is greater than 4, the MCML curve will become less energy efficient in comparison to CMOS. Also, we allow arbitrary selection of the design parameters for MCML gates. Many of these parameters will be constrained at the system level and may not be optimized.

We performed a similar analysis on a wide variety of gates: NAND2, NOR2, XOR2, MUX21, NAND3, NOR3, XOR3, MUX41, Full Adder, D Flip-Flop. All of the gates show the same general trends as the inverter comparison and perform better in comparison for high performance, low depth circuits. We will not give the sizing results for all of the gates but it is illustrative to show the results for one of the larger gates, XOR3. These results for a limited set of current points are shown in figure 3.6d:

I (uA)   ∆V (mV)   VDD (V)   (W/L)A     (W/L)B     (W/L)C     (W/L)RFP   (W/L)RFN   tp (ps)   ED (pJ*ps)
1        250       0.75      .5/.25     .5/.25     .5/.25     .5/1.0     2.0/.5     2414      17.0
10       500       1.00      .5/.25     .5/.25     .5/.25     .5/.6      2.0/.5     532       10.5
100      800       1.70      1.40/.25   1.45/.25   1.50/.25   .6/.25     2.0/.4     201       25.6

Figure 3.6d : MCML XOR3 Optimization Results

As stated earlier, the general trends tend to agree with the inverter optimization. In general, larger gates require slightly larger signal swings, larger transistor sizes, and larger VDD in order to achieve the minimum robustness constraints. All of these factors hurt the performance of the gate and we can see that the XOR3 has about ten times the energy-delay of the inverter across all current points.
We can also construct the energy-delay comparison curves against a transmission gate based CMOS XOR3 from the ST Microelectronics standard cell library (figure 3.6e).

[Figure 3.6e : MCML XOR3 Comparisons. Panels: Energy-Delay for MCML XOR3, logic depth = 4 (pJ*ps) vs. Current, I (uA, log scale); Energy-Delay (pJ*ps) vs. Delay (ps) for MCML and CMOS XOR3.]

The shape of both curves is roughly the same as in the inverter case, but the CMOS XOR3 is significantly worse in energy-delay than the MCML XOR3 at all performance levels. Whereas the inverter comparison gave a maximum benefit of MCML of around 2 times, the XOR3 can perform up to 6 times better. The reason for this improvement over the inverter comparison is that the CMOS XOR3 becomes much more complex than a simple inverter while the MCML XOR3 has a very similar structure. This comparison is extremely encouraging for arithmetic circuits since the XOR3 function is used for sum generation in full adders. We can also see similar trends in other complex gates. In general, MCML performs better in comparison to CMOS for circuits composed of more complex gates.

The above sections have demonstrated an intuition and methodology for optimizing MCML gates at the transistor level. We see from the results that MCML is most energy efficient in the range around I = 10uA. We also see that in comparison with CMOS logic, MCML is more energy efficient at higher performance levels. These results are still somewhat idealistic and rely on some impossible system level design choices. Chapter 5 will look at these system level tradeoffs and analyze their degradation of the ideal results presented. Before we look at the system level issues though, we will spend a chapter analyzing the tradeoffs and requirements for MCML gate layout.

Chapter 4 MCML Gate Layout

4.1 Local vs. Global Effects

Now that we have devised a procedure for determining MCML gate parameters, it is important to see the effects of circuit layout on the optimization results. All layout was done in Cadence with the ST Microelectronics 0.25um, 6-layer process. While it is not possible to generate the layout for each data point optimized at the schematic level, we would like to compare enough gates to draw some general conclusions about performance degradation. We would also like to analyze the difference in performance degradation between MCML gates and their CMOS counterparts.

The first distinction to be made in this analysis is between local and global layout effects. Local layout effects in this context are taken to be the difference in performance of a single gate with ideal surroundings when compared to its schematic level description. Examples of local effects are intra-gate device mismatch, capacitance balancing, and layout topology. In general, local wires are short and we can therefore ignore resistive effects. In contrast, global layout issues are defined as the performance degradation due to the connection of multiple gates in a larger block of logic. Examples of global layout effects are clock and control signal routing, wire resistance and buffering, load balancing, power distribution, and device matching to VSC's. This chapter deals exclusively with local layout effects and leaves the discussion of global effects until later in the report.

4.2 Layout Topology

The second main topic of layout methodology concerns the general cell design framework. There are two primary ways of doing layout for cell based design: standard cell format and datapath format [3]. In standard cell designs, all cells typically have the same height but vary in width based upon the complexity of the gate. The inputs and outputs are left floating in a standard cell so that routing tools can send the signals in any direction.
In datapath designs, the width is usually fixed for all cells in order to ensure pitch matching, but the height can vary. The inputs and outputs are aligned to opposite sides of the cell and brought to the edges to enable tiling. Both of these approaches are shown in figure 4.1a:

[Figure 4.1a : Alternative layout structures. Standard Cell Approach: same height, vary in width, I/O randomly scattered. Datapath Approach: same width, vary in height, I/O regularly placed. Labeled lines: Power (VDD), GND, RFN, RFP, Clock, Control, I/O.]

Both of these structures were used for MCML cells depending on the application. For this chapter, the standard cell methodology is the structure of choice because we are discussing the layout of individual gates. In chapter 7, the datapath approach is used to implement the CORDIC algorithm.

4.3 Transistor Matching

One of the key differences between standard CMOS layout and MCML layout is the issue of transistor matching. It is known in CMOS processing that many parameters of the transistor can vary both within a chip and between chips. These differences between transistors can degrade the performance of differential circuits. There are three main places where transistor mismatch can adversely affect MCML gate performance: input differential pair mismatch, pmos load mismatch, and VSC-gate mismatch.

The first two forms of mismatch, input differential pair and pmos load, are very similar in their scope and effects. It can be shown [6] that these mismatch effects can be modeled by an input offset voltage of the differential pair. For example, the combination of threshold voltage, gate oxide thickness, and transistor W/L mismatch may create a 50mV offset voltage at the input of a single gate. Therefore, if 200mV is applied to the input, the actual effective input voltage will be either 150mV or 250mV depending on the polarity. If 150mV is not enough for the gate to operate properly, then the mismatch has effectively destroyed the circuit.
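The offset example above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original design flow. The function names and the 180mV minimum input used in the last line are hypothetical and chosen only for illustration.

```python
def effective_input_range(applied_mv, offset_mv):
    """Return (worst, best) effective differential input in mV.

    Mismatch is modeled as an input-referred offset that either
    subtracts from or adds to the applied input, depending on polarity.
    """
    return applied_mv - offset_mv, applied_mv + offset_mv


def survives_mismatch(applied_mv, offset_mv, min_required_mv):
    """True if the gate still sees enough input in the worst-case polarity."""
    worst, _ = effective_input_range(applied_mv, offset_mv)
    return worst >= min_required_mv


# Values from the text: 200mV applied, 50mV offset.
worst, best = effective_input_range(200, 50)
print(worst, best)  # 150 250

# Hypothetical requirement of 180mV: the worst case (150mV) falls short.
print(survives_mismatch(200, 50, 180))  # False
```

The check uses only the worst-case polarity because the sign of the offset is not known before fabrication.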
The third kind of mismatch, between the VSC and the gate, is slightly different in its effects. If the VSC does not properly model the gate being controlled, the RFP voltage will not set the load resistance correctly. In this case, the gate may not produce the correct output voltage swing and later gates will have degraded performance.

Analog designers are constantly faced with the matching problem, and many techniques are known for improving matching in differential pairs. The most general of these techniques involve aligning both transistors in the same way (same current flow direction) and fingering devices. Unfortunately, these techniques have tremendous consequences in digital logic design. The primary effect of using fingering and alignment is that the area of the cells grows significantly. A secondary effect is that the performance (speed and/or power) will degrade. The area penalty is typically not a problem in analog circuits, while matching is a severe problem. In digital logic, mismatch degrades noise margins but usually does not prevent operation, while area and speed are extremely important. Therefore, using standard matching techniques may not be the appropriate choice for MCML.

Another difficulty in using matching techniques for MCML is that process data on transistor mismatch is hard to obtain. The information provided for analog designers is typically for large, non-minimum length devices, while MCML uses small, minimum length devices. It is nearly impossible to calculate the actual expected mismatch of the MCML circuits without better process information.

With all of this information in mind, we would like to examine the tradeoffs in both area and performance when using and not using matching techniques. We define two types of MCML layouts: parallel and anti-parallel. These topologies are shown in figure 4.3a below. Anti-parallel designs are the worst possible case for matching but are the most area efficient and have the lowest capacitance.
Parallel layout is better for matching but has area and performance penalties. The results of the simulations for both types of gates are shown in the next section. A more detailed analysis of the effects of mismatch, and of how it should affect the gate optimization procedure, is left as a topic for future work.

[Figure 4.3a : Parallel vs. Anti-Parallel Layout of Differential Pairs. Each panel shows drains D1 and D2, gates G1 and G2, and the shared source S12.]

4.4 Layout Results

4.4.1 Parallel vs. Anti-Parallel MCML Layout

The first comparison to be made is the difference in area and performance of the two styles of MCML layout for basic cells. The two gates examined were the inverter and XOR3 whose parameters were derived in chapter 3. Both gates were laid out for three different current levels in both the parallel and anti-parallel styles. The layouts for the I = 10uA versions are shown in figures 4.4a and 4.4b. If we compare the implementations of the parallel and anti-parallel MCML blocks, we observe that the parallel implementations are larger, as expected. Figure 4.4c gives a summary of the key simulated results.

[Figure 4.4a : MCML Inverter/Buffer Layouts (Anti-Parallel, Parallel)]

[Figure 4.4b : MCML XOR3 Layouts (Parallel, Anti-Parallel)]

Gate   I (uA)   Anti-Parallel   Parallel     Anti-Parallel   Parallel     Anti-Parallel   Parallel
                Area (um2)      Area (um2)   Delay (ps)      Delay (ps)   ED (pJ*ps)      ED (pJ*ps)
INV    1        35.9            43.4         1202            1155         4.24            3.92
INV    10       29.0            36.2         198             198          1.36            1.35
INV    100      25.4            37.4         62.5            62.4         1.90            1.90
XOR3   1        89.9            115.0        3013            3442         26.4            34.5
XOR3   10       90.2            115.0        662             753          16.2            21.0
XOR3   100      113.5           139.8        225             252          32.0            40.4

Figure 4.4c : Anti-Parallel vs. Parallel MCML Layout Area and Performance

We can see that the total area difference varies from about 20% to 50% depending on the current point, but the parallel implementations are all significantly larger.
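The quoted area range can be reproduced directly from the area columns of figure 4.4c. The sketch below is illustrative only. The dictionary layout and the function name are assumptions, while the area values are those listed in the figure.

```python
# Areas in um^2 from figure 4.4c, keyed by (gate, current in uA).
ANTI_PARALLEL_AREA = {
    ("INV", 1): 35.9, ("INV", 10): 29.0, ("INV", 100): 25.4,
    ("XOR3", 1): 89.9, ("XOR3", 10): 90.2, ("XOR3", 100): 113.5,
}
PARALLEL_AREA = {
    ("INV", 1): 43.4, ("INV", 10): 36.2, ("INV", 100): 37.4,
    ("XOR3", 1): 115.0, ("XOR3", 10): 115.0, ("XOR3", 100): 139.8,
}


def area_overhead_percent(anti_parallel, parallel):
    """Percent area increase of the parallel layout over the anti-parallel one."""
    return {
        key: 100.0 * (parallel[key] - anti_parallel[key]) / anti_parallel[key]
        for key in anti_parallel
    }


overhead = area_overhead_percent(ANTI_PARALLEL_AREA, PARALLEL_AREA)
for (gate, current_ua), pct in overhead.items():
    print(f"{gate:5s} I = {current_ua:3d}uA  parallel area overhead = {pct:.1f}%")
```

The smallest and largest overheads both occur for the inverter, at the 1uA and 100uA points respectively.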
From a performance perspective, the parallel implementation performs just as well as or even slightly better than the anti-parallel version for the inverter, but is significantly worse (12%-14% in delay) for the larger gate, XOR3. The decision on whether to use the parallel or anti-parallel cell layouts is left as an open question depending on the application and performance requirements. The above experiment indicates the approximate area and delay penalties but does not give a quantitative analysis of the matching benefits of the parallel layout. For the remainder of this chapter, we assume the use of the anti-parallel cell versions for comparison with schematics and with CMOS gates.

4.4.2 MCML and CMOS Layout vs. Schematic

Now that we have seen the tradeoffs associated with MCML gate layout topologies, we would like to evaluate the performance degradation between simulations of schematics and simulations of layout. It is known that the layout simulations will produce slower gates, but the important measurement is the slowdown relative to that of CMOS circuits. With this knowledge, we can more accurately predict the benefits of MCML over CMOS when using schematic level simulation numbers. For this analysis, we use only the anti-parallel implementations of the MCML gates. The results are shown in figures 4.4d and 4.4e:

Gate   I (uA)   Area    Schematic    Layout       %        Schematic    Layout       %
                (um2)   Delay (ps)   Delay (ps)   change   ED (pJ*ps)   ED (pJ*ps)   change
INV    1        35.9    1074         1202         11.2     3.40         4.24         24.7
INV    10       29.0    182          198          10.9     1.14         1.36         19.3
INV    100      25.4    60.1         62.5         4.0      1.76         1.90         8.0
XOR3   1        89.9    2402         3013         25.4     16.9         26.4         56.2
XOR3   10       90.2    532          662          24.4     10.5         16.2         54.3
XOR3   100      113.5   200          225          12.5     25.4         32.0         26.0

Figure 4.4d : MCML Schematic vs. Layout Performance

Gate   VDD (V)   Area    Schematic    Layout       %        Schematic    Layout       %
                 (um2)   Delay (ps)   Delay (ps)   change   ED (pJ*ps)   ED (pJ*ps)   change
INV    1.0       9.6     364          385          5.8      1.32         1.46         10.6
INV    1.5       9.6     161          169          5.0      1.34         1.46         9.0
INV    2.0       9.6     112          118          5.4      1.68         1.86         10.7
INV    2.5       9.6     92           97           5.4      2.25         2.45         8.9
XOR3   1.0       117     1994         2382         19.4     36.9         53.5         45.0
XOR3   1.5       117     826          974          17.9     36.8         52.0         41.3
XOR3   2.0       117     569          647          13.7     49.7         66.6         38.2
XOR3   2.5       117     454          529          16.5     66.9         92.5         38.3

Figure 4.4e : CMOS Schematic vs. Layout Performance

Several important trends can be observed from the above data. The first column to examine is the area numbers for the MCML and CMOS blocks. For different performance levels in MCML, the transistor sizes are changed in order to preserve robustness. This leads to the difference in areas for the different current levels. The CMOS gates operate with the same transistor sizes as the supply voltage is scaled. The CMOS inverter is much smaller than the MCML inverter at any performance level. Since MCML is differential, the only inverters required will be used for buffering, and therefore this large area difference is not very important. The more important area comparison is for the more complex block, the XOR3, where the MCML gate is smaller for all current levels. This fact shows that even a fully differential implementation can be more area efficient than CMOS for certain logic blocks. It is important to note that these two blocks are probably extremes as far as the area comparison is concerned, and that most gates will probably fall somewhere in between, with CMOS being slightly smaller than MCML.

The next important comparison to make is the variation in delay and energy-delay between the schematic and layout simulations. In both MCML and CMOS, the variation is larger for the more complex logic than for the simple inverter. We expect that the inverter is too small a block to be useful for this comparison and we will only consider the XOR3.
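The schematic-to-layout slowdown for the XOR3 can be recomputed from the delay columns of figures 4.4d and 4.4e. The following Python sketch is illustrative. The data structures and function name are assumptions, while the delay values are taken from the two figures.

```python
def percent_change(schematic, layout):
    """Percent increase of the layout value over the schematic value."""
    return 100.0 * (layout - schematic) / schematic


# XOR3 delays in ps as (schematic, layout) pairs.
MCML_XOR3_DELAY = {1: (2402, 3013), 10: (532, 662), 100: (200, 225)}  # keyed by I (uA)
CMOS_XOR3_DELAY = {1.0: (1994, 2382), 1.5: (826, 974),
                   2.0: (569, 647), 2.5: (454, 529)}                  # keyed by VDD (V)

mcml_slowdown = {i: percent_change(*d) for i, d in MCML_XOR3_DELAY.items()}
cmos_slowdown = {v: percent_change(*d) for v, d in CMOS_XOR3_DELAY.items()}

for current_ua, pct in mcml_slowdown.items():
    print(f"MCML XOR3, I = {current_ua}uA: {pct:.1f}% slower in layout")
for vdd, pct in cmos_slowdown.items():
    print(f"CMOS XOR3, VDD = {vdd}V: {pct:.1f}% slower in layout")
```

Comparing the two dictionaries shows the MCML slowdown bracketing the CMOS range: larger at the low current points and smaller at the 100uA point.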
For the MCML blocks, we notice that the variation is much less at the high current level. This is because the transistors are much larger at this point, and therefore the added wiring capacitance is a smaller fraction of the overall capacitance. The total variation between layout and schematic is greater for MCML at the low current points and less at the high current points. Since the variations in either case are not much different from the CMOS variations, we can safely use the analytical techniques of chapter 3 at the schematic level in order to compare CMOS and MCML blocks. This verifies that the results of chapter 3 are not skewed in relation to actual layout simulations.

In this chapter, we have looked at the tradeoffs in MCML cell layout techniques and have seen that MCML cells can have similar area and schematic-layout matching to their CMOS counterparts. We now have enough confidence in the optimization procedure of chapter 3 to begin a discussion of system level design tradeoffs in MCML.

Chapter 5
MCML System Level Design

5.1 MCML System Overview

Chapters 3 and 4 dealt with the design of the individual gates to be used in an MCML logic block. This chapter describes the peripheral circuitry necessary to correctly operate the logic and analyzes some of the tradeoffs in the system level optimizations which can be made. The basic MCML system is shown in figure 5.1a. We can divide the MCML system into five key subsystems: the core, the current control block, the swing control block, the clock distribution network, and the conversion circuitry.

[Figure 5.1a : General MCML System Topology. Blocks: Data Inputs, CMOS-CML Data Conversion, MCML Logic Core and Registers, CML-CMOS Data Conversion, Data Outputs, Current Control (RFN1 ... RFNN), Swing Control (Desired Swing, RFP1 ... RFPN), CLK, CMOS-CML, Clock tree.]

The MCML Core is composed of both combinational and sequential logic.
It can be a pipelined datapath, a finite state machine, or any other digital logic operation required. As seen in chapter 2, the best applications for MCML logic are those which have high performance and shallow logic depth. Therefore, MCML cores typically are deeply pipelined and have high clock frequencies.

The swing control circuitry controls the load resistances of the MCML gates. The primary pieces of this block are the Variable Swing Controllers (VSC's) described in chapter 2. There are many tradeoffs in the design of the VSC and some of those issues will be explored in this chapter.

The responsibility of the current control circuitry is to monitor the current levels used in the individual gates. There may be a single or multiple current levels and these levels may be adaptively controlled in order to fine tune performance. Two models of current control circuitry will be presented later in this chapter.

Finally, the clock circuitry and conversion circuits are crucial in order to interface with a CMOS dominated world. The clock circuitry is composed of multi-level buffers and requires the ability to use gates of multiple drive strengths.

Before we begin examining the peripheral circuitry in detail, we must take one more look at the optimization procedure for individual gates proposed in chapter 3. In order to build a realistic MCML core, it is necessary to relax some optimization parameters in order to have gates which can operate together. This generalization of the optimization procedure is the topic of the next section.

5.2 MCML Gate Parameter Generalization

In chapter 3, an optimization procedure was developed which found the best parameter selection for each individual gate for a variety of current levels. The VDD, ∆V, and all transistor sizes could be chosen completely independently of the selection for any other gate. Unfortunately, some of these parameters need to be shared across different gates if we want to build complex logic.
The amount of generalization required will be determined by the exact logic to be implemented, but this section describes the sensitivity of the gate optimizations when certain parameter constraints are enforced.

5.2.1 Voltage Swing, ∆V

The first obvious generalization to be made is that each gate should have the same voltage swing as every other gate for a given operating point. The easiest and most constraining way to generalize the voltage swing is to make sure that each gate has the same voltage swing for every possible current level. This leads to a family of gates that will work together at any current level. We will see in chapter 6 that it is sometimes desirable to have different gates running with the same voltage swing but with different current levels. We ignore this case for now and assume that all gates in the core will operate at the same current level.

As seen in chapter 3, the optimal point for voltage swing is different for gates of different sizes. The optimal swing tends to be larger for larger gates (e.g. XOR3) and smaller for smaller gates (e.g. INV). In order for all of the gates to work together, we must either use an optimal swing for one of the gates and adjust all of the other gates to this swing, or use a non-optimal swing for all of the gates. Once again, we explore the different voltage swings while re-optimizing all of the other parameters. As our primary example, we will look at the Inverter, XOR2, and XOR3 blocks for a current level of I = 10uA.
The fully optimized statistics created by the optimization procedure in chapter 3 are shown in figure 5.2a:

Gate   ∆V (mV)   VDD (V)   (W/L)A   (W/L)B   (W/L)C   (W/L)RFP   (W/L)RFN   tp (ps)   ED (pJ*ps)
INV    300       0.90      .5/.25   -        -        .5/.6      2.0/.5     182       1.15
XOR2   400       0.95      .5/.25   .5/.25   -        .5/.6      2.0/.5     317       3.58
XOR3   500       1.00      .5/.25   .5/.25   .5/.25   .5/.6      2.0/.5     532       10.5

Figure 5.2a : Fully Optimized MCML Gates for I = 10uA

We now rerun the optimization procedure from chapter 3 except that we fix the voltage swing to a certain level for all of the blocks. The results are shown in figures 5.2b-d:

Gate   ∆V (mV)   VDD (V)   (W/L)A    (W/L)B     (W/L)C    (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from min
INV    300       0.90      .5/.25    -          -         .5/.6      182       1.15         0
XOR2   300       0.95      .6/.25    .9/.25     -         .5/.6      325       3.85         7.5
XOR3   300       1.10      1.1/.25   1.15/.25   1.2/.25   .5/.6      584       14.6         39.0

Figure 5.2b : Optimized MCML Gates for I = 10uA, ∆V = 300mV

Gate   ∆V (mV)   VDD (V)   (W/L)A   (W/L)B   (W/L)C    (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from min
INV    400       0.90      .5/.25   -        -         .5/.4      202       1.41         22.6
XOR2   400       0.95      .5/.25   .5/.25   -         .5/.6      317       3.58         0
XOR3   400       1.10      .5/.25   .5/.25   .75/.25   .5/.6      510       11.1         5.7

Figure 5.2c : Optimized MCML Gates for I = 10uA, ∆V = 400mV

Gate   ∆V (mV)   VDD (V)   (W/L)A   (W/L)B   (W/L)C   (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from min
INV    500       0.90      .5/.25   -        -        .5/.6      241       2.00         74
XOR2   500       0.95      .5/.25   .5/.25   -        .5/.6      357       4.51         26.0
XOR3   500       1.00      .5/.25   .5/.25   .5/.25   .5/.6      532       10.5         0

Figure 5.2d : Optimized MCML Gates for I = 10uA, ∆V = 500mV

The rightmost columns of figures 5.2b-d show the percent deviation from optimal when the three types of gates are used at voltage swings other than their preferred settings. We can see that both large voltage swings for inverters and small voltage swings for XOR3 give large deviations from optimal. Also note that the area of gates using smaller than optimal swings tends to increase. The correct choice for the overall voltage swing depends heavily on the types of circuits being used.
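One way to make the dependence on circuit composition concrete is to weight each gate's energy-delay penalty by how often that gate appears. The Python sketch below does this with the ED values of figures 5.2b-d. The count-weighted average is an assumed figure of merit, not a metric defined in this report, and the gate mixes in the usage lines are hypothetical.

```python
# ED in pJ*ps from figures 5.2b-d, keyed by shared swing (mV) and gate.
ED = {
    300: {"INV": 1.15, "XOR2": 3.85, "XOR3": 14.6},
    400: {"INV": 1.41, "XOR2": 3.58, "XOR3": 11.1},
    500: {"INV": 2.00, "XOR2": 4.51, "XOR3": 10.5},
}


def ed_penalty_percent(ed):
    """Percent ED increase of each gate relative to its best swing."""
    gates = list(next(iter(ed.values())))
    best = {g: min(ed[s][g] for s in ed) for g in gates}
    return {s: {g: 100.0 * (ed[s][g] - best[g]) / best[g] for g in gates}
            for s in ed}


def best_shared_swing(ed, gate_mix):
    """Pick the swing with the lowest mix-weighted ED penalty (assumed metric).

    gate_mix maps each gate to its fraction of the total gate count.
    """
    penalty = ed_penalty_percent(ed)
    cost = {s: sum(gate_mix[g] * penalty[s][g] for g in gate_mix)
            for s in penalty}
    return min(cost, key=cost.get), cost


# Hypothetical mix dominated by two-level gates.
swing, cost = best_shared_swing(ED, {"INV": 0.1, "XOR2": 0.6, "XOR3": 0.3})
print(swing, cost)

# Hypothetical mix made almost entirely of three-level gates.
swing, cost = best_shared_swing(ED, {"INV": 0.0, "XOR2": 0.05, "XOR3": 0.95})
print(swing, cost)
```

Under this assumed metric the larger swing only wins when three-level gates dominate very heavily, because the XOR3 penalty at the middle swing is small compared with the inverter and XOR2 penalties at the largest swing.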
If the majority of the circuitry is composed of two-level gates (including flip-flops), then 400mV should be used. On the other hand, if the circuit uses a majority of 3-level gates (including adders), then the optimization of these gates may outweigh the degradation of the other types of gates. In general, there are not many single level gates due to the complementary logic style, but there may be a large load capacitance which needs buffering, and hence a smaller voltage swing. The same tradeoff exists for all current levels, although for small currents all of the desired voltage swings tend to converge to the minimum allowable swing (200mV) and the choice becomes trivial.

5.2.2 Supply Voltage, VDD

The next generalization to be made is to ensure that all gates operate at the same supply voltage for a given current level. Once again, we will leave the generalization of operating at multiple current levels until later. This procedure is much simpler than the voltage swing optimization because a large barrier to performance exists if we try to lower VDD below its optimal value. Therefore, it makes sense to use the highest VDD required by any of the gates used. In other words, use the VDD specified by the optimal point of the 3-level gates (e.g. XOR3) because these are always the highest for a given current level. This increases the power consumption of the 1 and 2-level gates but is necessary unless multiple supply voltages are used.

5.3 Voltage Swing Control Circuitry

The primary responsibility of the swing control circuitry is to properly adjust the load resistance of the pmos devices by setting the RFP voltage. The main circuit block is the Variable Swing Controller (VSC) shown in chapter 2 and repeated in figure 5.3a. The VSC models a particular gate and uses an operational amplifier and negative feedback loop to set the RFP voltage. This section explores the issues of VSC design and usage.
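The shared supply rule of section 5.2.2 can be written as a short sketch. It uses the per-gate VDD values of figure 5.2c (I = 10uA, ∆V = 400mV) and assumes, to first order, that the static power of an MCML gate is the tail current times VDD, so that at a fixed current the power scales with the supply. Changes in delay are not modeled, and the function names are hypothetical.

```python
# Per-gate optimal VDD in volts from figure 5.2c (I = 10uA, swing = 400mV).
OPTIMAL_VDD = {"INV": 0.90, "XOR2": 0.95, "XOR3": 1.10}


def shared_vdd(optimal_vdd):
    """Use the highest VDD required by any gate in the family."""
    return max(optimal_vdd.values())


def power_increase_percent(optimal_vdd):
    """Percent static power increase per gate at fixed current.

    Assumes P = I * VDD, so the increase equals the relative VDD increase.
    """
    vdd = shared_vdd(optimal_vdd)
    return {g: 100.0 * (vdd - v) / v for g, v in optimal_vdd.items()}


print(shared_vdd(OPTIMAL_VDD))
for gate, pct in power_increase_percent(OPTIMAL_VDD).items():
    print(f"{gate}: {pct:.1f}% more static power at the shared VDD")
```

The gate that sets the shared supply pays no penalty, while the smaller gates absorb the full increase.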
[Figure 5.3a : Variable Swing Controller (VSC). Labeled nodes: VDD, Vlow, op-amp inputs (+, -), RFP, Inputs, RFN.]

The first issue to be discussed is the tradeoff between using a single VSC or multiple VSC's. The optimizations in chapter 3 and earlier in this chapter assume the use of a VSC which is perfectly matched to the gate under test. It is possible to share VSC's across multiple types of gates, but with some loss of swing precision. The tolerance to this loss depends on the operating point and the circuit block under consideration. If small voltage swings are being used, it may be necessary to use multiple VSC's in order to have finer control over the small swing.

The first requirement of using a single VSC for multiple types of gates is that the pmos load transistors must be the same size for all of the gates. This can be accomplished by the same generalization procedure as in the previous section and will result in some loss of performance. Then we must choose which gate to use as the model for the VSC. This analysis is very similar to the analysis done in section 5.2.1, where we could use the optimal swing conditions for any of the three gate types. We use the data points from section 5.2.1 with a swing of 400mV and VDD = 1.1V for our example. The (W/L)RFP is chosen to be .5u/.6u for this experiment. Note also that RFP voltages are allowed to be negative as long as the gate-source voltage of the pmos devices is less than 2.5V.
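The constraint on negative RFP voltages amounts to a one-line check. In the sketch below the pmos load has its source at VDD and its gate at RFP, so the magnitude of its gate-source voltage is VDD - RFP. The 2.5V limit and VDD = 1.1V come from the text, while the function names and the two RFP values tried are hypothetical.

```python
PROCESS_LIMIT_V = 2.5  # maximum gate-source magnitude quoted in the text


def pmos_load_vsg(vdd, rfp):
    """Source-gate voltage of a pmos load with source at VDD and gate at RFP."""
    return vdd - rfp


def rfp_is_allowed(vdd, rfp, limit=PROCESS_LIMIT_V):
    """True if the RFP voltage keeps the pmos load within the process limit."""
    return pmos_load_vsg(vdd, rfp) <= limit


def lowest_allowed_rfp(vdd, limit=PROCESS_LIMIT_V):
    """Most negative RFP voltage that still respects the limit."""
    return vdd - limit


vdd = 1.1
print(rfp_is_allowed(vdd, -0.35))  # hypothetical RFP, slightly below ground
print(rfp_is_allowed(vdd, -1.50))  # hypothetical RFP, too far below ground
print(lowest_allowed_rfp(vdd))
```

Lowering VDD moves the lowest allowed RFP further below ground, which is why negative RFP values appear once the supply is reduced.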
The key results are shown in figures 5.3b-e:

Gate   RFP Voltage (V)   Actual Swing (mV)   Deviation in swing (%)   Midswing Gain   tp (ps)   ED (pJ*ps)   ED % from min
INV    -.349             401                 0.25                     1.80            205       1.81         57.4
XOR2   -.339             399                 -0.25                    1.54            308       4.06         13.4
XOR3   -.317             392                 -2.0                     1.41            510       11.1         5.7

Figure 5.3b : MCML Gates with 3 separate VSC's (ideal)
VDD = 1.1V, ∆V = 400mV, (W/L)RFP = .5u/.6u

Gate   RFP Voltage (V)   Actual Swing (mV)   Deviation in swing (%)   Midswing Gain   tp (ps)   ED (pJ*ps)   ED % from min
INV    -.349             401                 0.25                     1.80            205       1.81         57.4
XOR2   -.349             388                 -3.0                     1.51            302       3.90         8.9
XOR3   -.349             360                 -10.0                    1.33            480       9.84         -6.3

Figure 5.3c : MCML Gates using Inverter based VSC

Gate   RFP Voltage (V)   Actual Swing (mV)   Deviation in swing (%)   Midswing Gain   tp (ps)   ED (pJ*ps)   ED % from min
INV    -.339             412                 3.0                      1.83            209       1.87         62.6
XOR2   -.339             399                 -0.25                    1.54            308       4.06         13.4
XOR3   -.339             370                 -7.5                     1.36            491       10.3         -1.9

Figure 5.3d : MCML Gates using XOR2 based VSC

Gate   RFP Voltage (V)   Actual Swing (mV)   Deviation in swing (%)   Midswing Gain   tp (ps)   ED (pJ*ps)   ED % from min
INV    -.317             440                 10.0                     1.90            217       2.02         75.7
XOR2   -.317             424                 6.0                      1.60            321       4.41         23.2
XOR3   -.317             392                 -2.0                     1.41            510       11.1         5.7

Figure 5.3e : MCML Gates using XOR3 based VSC

The above tables give us several interesting results. First, we can see that using a smaller VSC than ideal leads to a reduced voltage swing but an increase in performance (tp, ED). These results come at a cost: a reduction in robustness (gain, noise margins). We would like to limit the possible operating conditions to those that provide at least the desired swing. If we were to allow reduced swing for the larger gates, we would have to re-simulate all of the gates at this reduced swing level for a worst case analysis. Therefore, if we wish to use a single VSC, we will use the largest one possible and allow for larger voltage swings from the smaller gates. This will reduce our performance (up to about 20%) but will not decrease the robustness of the circuitry.
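The choice of the largest VSC can be expressed as a simple selection rule over the swing columns of figures 5.3b-e. In the Python sketch below, a shared VSC is accepted only if no gate swings less than it does with its own matched VSC. That acceptance rule is an assumed formalization of the robustness argument above, and the names used are hypothetical.

```python
DESIRED_SWING_MV = 400

# Actual swing in mV for each gate, keyed by which gate the VSC models.
# "matched" is the ideal case of figure 5.3b with three separate VSC's.
SWING_MV = {
    "matched": {"INV": 401, "XOR2": 399, "XOR3": 392},
    "INV":     {"INV": 401, "XOR2": 388, "XOR3": 360},
    "XOR2":    {"INV": 412, "XOR2": 399, "XOR3": 370},
    "XOR3":    {"INV": 440, "XOR2": 424, "XOR3": 392},
}


def swing_deviation_percent(actual_mv, desired_mv=DESIRED_SWING_MV):
    """Signed deviation of the actual swing from the desired swing."""
    return 100.0 * (actual_mv - desired_mv) / desired_mv


def robust_vsc_choices(swing):
    """Shared VSC models under which no gate loses swing versus its matched VSC."""
    matched = swing["matched"]
    return [vsc for vsc, gates in swing.items()
            if vsc != "matched"
            and all(gates[g] >= matched[g] for g in matched)]


for vsc, gates in SWING_MV.items():
    deviations = {g: swing_deviation_percent(s) for g, s in gates.items()}
    print(vsc, deviations)

print(robust_vsc_choices(SWING_MV))  # ['XOR3']
```

Only the XOR3 based VSC passes, since the inverter and XOR2 based controllers both leave the XOR3 with less swing than its matched case.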
The total effect of the generalizations made can thus be seen in figure 5.3e. Compared with using different VDD's, different voltage swings, different (W/L)RFP's, and multiple VSC's, we see that the Inverter loses about 76%, the XOR2 loses 23%, and the XOR3 loses 6% of their ideal energy-delay values. There are also other reasons why it is desirable to use multiple VSC's; these reasons will be explored in chapter 6.

Now that we have examined the tradeoff between using a single VSC and using multiple VSC's, we would like to explore the issues of designing the VSC's themselves. The main circuit block design is identical to the design of a standard MCML gate. The only important issue in designing the gate part of the VSC is to make sure that the alignment of the transistors is the same as in the gates to which the RFP voltage is being broadcast. We would like the matching between the VSC and the logic gates to be as close as possible in order to achieve predictable swings.

The main design challenge in the VSC is the operational amplifier. While the topic of op-amp design is beyond the scope of this report, it is well covered in [6]. Instead, we will specify the constraints imposed on the op-amp by the MCML VSC design. The gain of the op-amp determines (partially) the accuracy of the voltage swing. If we assume a worst case output voltage difference, VDD - RFP = 2.5V, and we allow the low voltage point to vary by 10mV, then the gain of the op-amp must be approximately 2.5V / 10mV = 250. Lower gain op-amps can be used, but the actual circuit voltage swing will vary by larger amounts.

The frequency response requirements of the op-amp are fairly lax. Since the RFP signal is really a DC value, the bandwidth of the op-amp can be quite low. The only real requirement is that the op-amp and VSC feedback loop be stable. Since the RFP node typically has a large capacitance, this requirement is easily achieved. The actual capacitance on the RFP node varies based upon the number and type of gates being driven by a single VSC. There may also be a frequency requirement set by the stabilization time during power-on.

One major difficulty worth discussing was encountered in the design of the VSC's. If the VDD of the MCML circuits is lowered below the process maximum (to save power) and the pmos loads are kept at minimum width, then the RFP required by the feedback loop can go below 0V. The Vgs of the pmos transistors is still within the 2.5 Volts required by the process, but the output of the op-amp must now generate a negative voltage. There were several possible solutions to this problem.

The first solution is to increase the width of the pmos load devices and therefore require less Vgs voltage for RFP. There are two major problems with this approach. First, the larger width increases the capacitance at the output node and will slow the circuit down. Even more importantly, the increased width will decrease the Vdsat of the load device and increase the nonlinearity of the resistance. We would like the load devices to operate in the linear region, but low Vdsat values cause many non-idealities and full voltage swing times can increase by an order of magnitude. This difficulty led us to try other solutions to the negative RFP problem.

The second option for generating the negative gate voltage would be to operate the op-amp between Vdd and Vss, where Vss is lower than ground. For example, if the logic Vdd was 1.0V, the op-amp would be run between 1.0V and -1.5V. It would then be possible for the op-amp to generate negative voltages with no difficulty. The problem with this approach is that the bulks of the nmos devices must all be tied together because the process being used is a single n-well process. Either all of the nmos bulks are tied to 0V or they are tied to Vss. In the first case, the op-amp nmos devices will have forward biased junction diodes and will not work properly.
In the second case, all of the logic transistors will have larger threshold voltages and performance will suffer. Neither of these options is particularly desirable, so other solutions were examined.

The next solution would be to use a negative charge pump to offset the op-amp voltage to be below ground. This also poses several analog design challenges, and it was determined that there was not enough time to complete this solution within the scope of this project. The actual solution used was to assume that the process was a dual well process (not uncommon in next generation processes) and to use an ideal op-amp for simulation purposes. While not a currently implementable solution, this is nevertheless the most realistic solution for this project.

5.4 Current Control Circuitry

Now that we have discussed the design of the circuitry that controls the voltage swing for variable current settings, we turn to the current control circuitry. The goal of the current control circuitry is to set the RFN voltage to a desired level. As with the RFP voltage, there can be a single RFN voltage or multiple RFN voltages, although using a single RFN voltage generator is typically adequate. The RFN signal determines the amount of current flowing in the current source and therefore determines the speed and power of the circuit. The simplest way to set this reference voltage is to use a current mirror [6].
Alternatively, an adaptive pipelining system can be used [2], shown in figure 5.4a:

[Figure 5.4a : Adaptively pipelined MCML system. Blocks: Datapath (Logic and Registers) with Inputs and Outputs passing through CMOS-CML and CML-CMOS Converters; Critical Path Model; VSC1, VSC2, and VSC3 generating RFP1, RFP2, and RFP3 from Vlow; DLL with Phase Detector and Control; Clock with CMOS-CML Converter and Buffer; RFN.]

The basic principle behind an adaptive pipeline is to use a Delay Locked Loop (DLL) to measure the delay through a model of the critical path of the circuit. If the critical path delay is greater than the required clock period, then the DLL increases the RFN voltage and thereby increases the current, speed, and power of the circuit. If the delay is less than the required clock period, then RFN is decreased and less current is used. Single or multiple VSC's can be used to maintain a fixed voltage swing as the current varies. Multiple DLL's could also be used if multiple RFN voltages need to be generated.

The goal of the adaptive pipelining is to make the circuit timing insensitive to process, temperature, and voltage variations. For example, if a chip comes back from fabrication and happens to be near the slow process corner, the adaptively pipelined circuit will meet the same timing requirements as a chip near the fast process corner. The difference between the chips will be in the power consumption and not the timing.

In a standard CMOS design methodology, designers must always design for the worst case. This leads to using VDD's which are higher than required for the nominal case and therefore increases power consumption for all designs. With adaptive pipelining, designers can design for the nominal case for delay and, instead, the power will vary. If multiple chips are used on a board, the average power consumption of all the chips should approach the nominal value.
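The feedback behaviour described above can be illustrated with a small behavioural model. The Python sketch below is not the DLL circuit of [2]: it adjusts the gate current directly instead of the RFN voltage, uses a simple step-up/step-down update, and assumes the critical path delay is inversely proportional to current at a fixed swing. The delay constants, clock period, and step size are hypothetical.

```python
def critical_path_delay_ps(current_ua, delay_at_1ua_ps):
    """First-order model: delay scales as 1/I at a fixed voltage swing."""
    return delay_at_1ua_ps / current_ua


def adapt_current(clock_period_ps, delay_at_1ua_ps,
                  current_ua=1.0, step=1.02, iterations=300):
    """Step the current up when the path is too slow, down when it is fast enough."""
    for _ in range(iterations):
        delay = critical_path_delay_ps(current_ua, delay_at_1ua_ps)
        if delay > clock_period_ps:
            current_ua *= step   # too slow: raise current (speed and power go up)
        else:
            current_ua /= step   # fast enough: lower current to save power
    return current_ua


CLOCK_PERIOD_PS = 500.0

# Hypothetical nominal and slow-corner chips with different intrinsic delays.
nominal_current = adapt_current(CLOCK_PERIOD_PS, delay_at_1ua_ps=2400.0)
slow_current = adapt_current(CLOCK_PERIOD_PS, delay_at_1ua_ps=3000.0)

for name, current, k in [("nominal", nominal_current, 2400.0),
                         ("slow", slow_current, 3000.0)]:
    delay = critical_path_delay_ps(current, k)
    print(f"{name}: I = {current:.2f}uA, critical path delay = {delay:.0f}ps")
```

In this model both chips settle within a step of the same clock period, and the slow-corner chip does so by drawing more current, which mirrors the statement that the chips differ in power consumption and not in timing.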
This technique can also improve the yield of circuits and allow for late changes in system clock frequency. It was shown in [2] that a clock skew of 20% could be easily compensated for by using a slightly altered version of adaptive pipelining.

While it was not possible to design the complete adaptive pipeline circuit within the scope of this project, it is easy to make estimates of the effects of using such circuitry. In CMOS, the clock frequency of a circuit will be reported at the worst case process condition, while in MCML the frequency will be the nominal value. Please refer to [2] for more details about the individual circuits used in the adaptive pipeline.

5.5 Support for Current Variability

One of the key aspects of the adaptive pipelining system described above is that the logic gates and flip-flops are able to operate at different current levels in order to adjust the performance. Earlier in this chapter, we generalized the optimization procedure so that an MCML system could use multiple gate types with a single VDD, a single ∆V, and a single VSC. Now we would like to add the option of using multiple current levels while still maintaining acceptable robustness.

The first and simplest model for current variability is to support two modes: on and off. The gates can be completely optimized for a pre-determined current level and are shut off either by turning off the reference current source in the current mirror design or by turning off the clock with the adaptive pipeline. Both of these schemes reduce the gate currents to zero and therefore remove all power consumption. This mode is an ideal "sleep" mode and power consumption should be negligible.

Two main problems exist with the on/off technique. The first problem is that, unlike static CMOS circuits, all of the MCML flip-flops and latches lose their stored values when turned off. This may be acceptable if the circuit block being deactivated is a datapath element and can be flushed upon startup.
An alternative scheme for sensitive data would be to use two separate RFN control circuits: one for logic and one for flip-flops. With this scheme, the logic could be turned off while the flip-flops retain their value (and burn power). A third choice would be to use a hybrid CMOS / MCML system where static CMOS storage is used during sleep mode to store the data, which is reloaded into the MCML flip-flops upon startup.

The second problem with the on/off technique is the ramp-up time of the system. While the actual time is highly dependent on the op-amp specifications and the capacitance of the RFN and RFP nodes, the time could approach thousands of clock cycles. Once this time is known, it is relatively easy to account for, and the system would be turned off only during long stall periods.

Once the on/off mechanism is determined, we can imagine several other possible levels of current variability. A system could need support for a high performance mode and a low performance mode, in which case two discrete current levels need to be supported. A system may require that a continuous range of current levels be supported (i.e. adaptive pipelining). The most complex system would require support for multiple, continuous current ranges.

In all of these systems, the optimizations performed in chapter 3 must be relaxed so that each gate is fully operational at each possible current level. As an example, we will use the case where the circuits are optimized for use at I = 10uA but we would now like to support a range around that current. We will look at 2 different ranges and the requirements on the gate optimizations in order to support them: 9uA-11uA and 5uA-20uA. For both cases, we will use the assumption that I = 10uA is the most common case and optimize for it as much as possible. We reoptimize the Inverter, XOR2, and XOR3 gates using the same robustness constraints as in chapter 3.
For this analysis, we allow the use of 3 separate VSC's and therefore allow different (W/L)RFP's for the different gates. The results are shown in figures 5.5a-c. A dash marks a transistor level which the gate does not have.

Gate   ∆V (mV)   VDD (V)   (W/L)A    (W/L)B    (W/L)C     (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from fixed I
INV    400       1.10      .5/.25    -         -          .5/.4      195       1.64         0
XOR2   400       1.10      .5/.25    .5/.25    -          .5/.6      308       4.06         0
XOR3   400       1.10      .5/.25    .5/.25    .75/.25    .5/.6      510       11.1         0

Figure 5.5a : Optimized MCML Gates for I = 10uA (fixed)

Gate   ∆V (mV)   VDD (V)   (W/L)A    (W/L)B    (W/L)C     (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from fixed I
INV    400       1.20      .5/.25    -         -          .5/.4      193       1.77         7.9
XOR2   400       1.20      .5/.25    .5/.25    -          .5/.6      304       4.36         7.4
XOR3   400       1.20      .55/.25   .60/.25   .75/.25    .5/.6      512       12.3         10.8

Figure 5.5b : Optimized MCML Gates for I = 10uA (9uA-11uA supported)

Gate   ∆V (mV)   VDD (V)   (W/L)A    (W/L)B    (W/L)C     (W/L)RFP   tp (ps)   ED (pJ*ps)   ED % from fixed I
INV    400       1.25      .5/.25    -         -          .5/.6      203       2.03         23.8
XOR2   400       1.25      .7/.25    .8/.25    -          .5/.6      354       6.18         52.2
XOR3   400       1.25      .85/.25   1.0/.25   1.25/.25   .5/.6      666       21.8         96.4

Figure 5.5c : Optimized MCML Gates for I = 10uA (5uA-20uA supported)

We can draw a few conclusions from the above tables. First, the penalty for supporting small current variations (~10%) is relatively small. The main penalty in this case is due to the higher VDD needed to allow the higher current point to operate properly. The penalty for the larger current variation (50%) is much larger. Not only must the VDD be increased in order to support the high current end, but the transistor sizes must be increased in order to increase the gain. The 96.4% penalty in energy-delay is probably not acceptable for the XOR3 gate which supports currents from 5uA to 20uA.

Not only does the performance suffer from using larger current ranges, but there is a definite maximum variation around 75%. This maximum is set by the conflicting requirements of the high current and low current points on the pmos load transistor sizes.
The high current point requires smaller length devices in order to generate an RFP voltage within the limits. The low current point requires a large length in order to control the Vdsat of the device and therefore the Signal Slope Ratio (SSR).

The overall conclusion of this experiment is that while small current variation is relatively harmless, the model described above of having multiple current levels with the same hardware is not feasible with this technology. Adaptive pipelining is expected to produce small current variations (<20%) and is therefore an acceptable technique. The best design technique with MCML is to determine the required performance level and to optimize the gates around that current. If other performance levels are required, either design alternative MCML blocks with different voltage swings and transistor sizes or use a different technology.

5.6 Gate Drive Strength Scaling

The previous sections and chapters have dealt with the optimization necessary to design a set of "minimum" sized MCML gates. If the fanout of the gate is large, however, it is desirable to have gates with larger drive strengths. This is especially crucial in clock circuitry where large capacitive loads need to be driven. The general approach in CMOS circuits is to multiply all transistors by a fixed factor in order to increase the drive strength. This same approach is applied to MCML circuits, but careful analysis must be done in order to ensure that the correct signal swing and robustness properties are maintained.

In order to double the drive strength of an MCML gate, we double all of the transistor widths. If we are using a current mirror for the RFN current control, we keep the mirror device the same size while we double the size of the RFN device. This will generate twice the current of the reference source in the MCML gate. Since all devices' (W/L)'s are doubled and the current is doubled, all of the Vdsats and voltages should remain constant.
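The claim that the operating voltages stay fixed under this scaling can be checked to first order with the long-channel square-law model. That model is an assumption made here for illustration, and k' denotes a generic process transconductance parameter that the text does not define:

```latex
I = \frac{k'}{2}\,\frac{W}{L}\,V_{dsat}^{2}
\quad\Longrightarrow\quad
V_{dsat} = \sqrt{\frac{2I}{k'\,(W/L)}}

I \to N I,\qquad \frac{W}{L} \to N\,\frac{W}{L}
\quad\Longrightarrow\quad
V_{dsat} \to \sqrt{\frac{2 N I}{k'\,N\,(W/L)}} = V_{dsat}
```

Under this idealization the factor N cancels for every device, so the bias points are unchanged. Real short-channel devices deviate from the square law, so this is only a first-order argument.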
We simulated the results of this scaling operation on the Inverter for several different drive strengths. The base sizes and parameters are those used in figure 5.5b above. The results are shown in figure 5.6a:

Scaling Factor, N                  1        2        4        16
Gate Current, I (uA)               9.853    18.94    37.16    146.6
Deviation in I from ideal (%)      -1.5     -5.3     -7.1     -8.4
Actual ∆V (mV)                     401      345      317      298
Change in ∆V (%)                            -14.0    -20.9    -25.7
Gain                               1.62     1.51     1.44     1.40
tp with scaled CL (ps)             193      191      187      184

Figure 5.6a : MCML Inverter Scaling, all device sizes scaled by N

We can see several interesting things from this experiment. First, the current scales fairly well even for large drive strength devices (<9% deviation). Unfortunately, the characteristics associated with the pmos load devices do not scale as well. Both the gain and voltage swing metrics are degraded substantially by scaling all of the transistor sizes equally. Fortunately, we can easily adjust for this by fine tuning the pmos load device sizes for each drive strength. We can also adjust for the current variation by fine tuning the RFN device widths. The result from this fine tuning is shown in figure 5.6b:

Scaling Factor, N                  1        2        4        16
RFP Scaling Factor                 1        1.9      3.9      15.2
RFN Scaling Factor                 1        2.075    4.25     17.2
Gate Current, I (uA)               9.853    19.6     39.4     157.4
Actual ∆V (mV)                     401      448      388      399
Change in ∆V (%)                            11.7     -3.2     -0.5
Gain                               1.62     1.68     1.59     1.62
tp with scaled CL (ps)             193      210      198      202

Figure 5.6b : MCML Inverter Scaling, pmos devices fine tuned

We can see that by making these simple adjustments, the current and voltage swing track appropriately. This same analysis can be done with any gate type, and fine tuning the RFN and RFP transistors will lead to a wide variety of accurate drive strength devices.

5.7 Conversion Circuitry

The final system level issue to be discussed is the conversion of MCML logic to CMOS logic and vice versa. This ability is necessary for systems which use an MCML core surrounded by standard CMOS logic.
The conversion from CMOS to MCML is a trivial operation. Since all MCML gates can operate correctly with larger differential inputs than required, a simple CMOS inverter and MCML inverter can be used to generate the proper MCML signal. The conversion from MCML to CMOS is slightly more complicated but can be performed by using a differential to single ended amplifier [6] and a CMOS inverter. The amplifier only requires enough gain to amplify the voltage swing beyond the switching threshold of the inverter. Even with a minimum input voltage swing of 200mV, the amplifier only needs a gain of 10 for all possible operation points. Since the output load is also small, the speed of the amplifier can be good even for small source currents. The two conversion circuits are shown in figure 5.7a.

[Figure 5.7a : Conversion Circuits. Two schematics: a CMOS-CML converter (CMOS In, CMOS inverter, RFP and RFN bias, differential MCML Out) and a CML-CMOS converter (differential MCML In, RFN bias, CMOS Out).]

Chapter 6 System Design Example : Ripple Adders

6.1 MCML Full Adder design

In order to see the overall effects of the optimizations and guidelines of the previous chapters, we would like to design some more complex blocks of logic in MCML. The first such example will be a case study of ripple adder design. Please refer to [3] for a more detailed analysis of ripple adder architecture.

The building block of the ripple adder is the full adder block. The logic equations for a full adder are well known to be an XOR3 for the sum and a 3 input majority vote for the carry [3]. In nearly all CMOS ripple adders, an optimization is made which pre-computes the propagate and generate signals in order to speed up the carry path [3]. There is a similar possibility for MCML adders, and 2 implementations can be imagined. The two possibilities are shown in Figs. 6.1a and 6.1b below. The CMOS adder used for comparison in this section uses transmission gates and is generally minimum sized and optimized for low power.
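The logical equivalence of the two full adder formulations can be checked at the bit level. The sketch below models only the Boolean functions (XOR3 sum with majority carry, versus propagate/generate); it says nothing about the MCML circuits themselves, and the function names are illustrative.

```python
from itertools import product


def full_adder_1(a, b, cin):
    """Sum = XOR3, carry = 3-input majority (AB + BC + AC)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def full_adder_2(a, b, cin):
    """Propagate/generate form: P = A xor B, G = AB."""
    p = a ^ b
    g = a & b
    return p ^ cin, g | (p & cin)


def ripple_add(x, y, n_bits, full_adder):
    """Add two n_bits-wide unsigned integers; returns (sum, carry_out)."""
    carry = 0
    total = 0
    for bit in range(n_bits):
        s, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        total |= s << bit
    return total, carry


if __name__ == "__main__":
    # Exhaustive check over all 8 input combinations.
    for a, b, cin in product((0, 1), repeat=3):
        assert full_adder_1(a, b, cin) == full_adder_2(a, b, cin)
    print(ripple_add(5, 11, 4, full_adder_1))
```

Both forms compute the same outputs for every input, so the choice between them is purely one of delay, current, and area.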
While the second MCML full adder will have a lower carry delay, it is not clear whether this improvement will compensate for the additional current required by having 4 gates instead of 2. In fact, for small adders (< 16 bits), it is more efficient to use the first full adder implementation, and this is the circuit used for the CORDIC. Both of these architectures will be used and analyzed in the next sections.

[Figure 6.1a : MCML Full Adder #1. Two MCML gates: Sum Circuit (XOR3) and Carry Circuit (AB+BC+AC), each with RFP loads and an RFN current source.]

[Figure 6.1b : MCML Full Adder #2. Four MCML gates: Propagate Circuit (P = A xor B), Generate Circuit (G = AB), Sum Circuit (P xor Cin), and Carry Circuit (G + PCin), each with RFP loads and an RFN current source.]

6.2 Basic Ripple Adder Design

Now that we have defined the two alternative adder structures in MCML, we would like to compare the two architectures against each other and against the equivalent CMOS adder. For this experiment, we designed several different size ripple adders ranging from 4 bits to 32 bits and measured the delay and power consumption for both the MCML and CMOS implementations. The same experiment could be done at any performance level, but we limited the scope to a single current point near the optimal for MCML at I = 10uA for all of the "gates" inside of the full adders. We use the same generalization techniques mentioned in chapter 5 to fix the voltage swing and VDD across gates, and we allow for the use of two VSC's for better swing control.

The first comparison to be made is the relative performance of the two different full adder architectures for varying ripple adder bit length. The key graphs of delay, power-delay, and energy-delay vs. bit length are shown in figure 6.2a. In chapter 2, it was shown that the delay of a chain of MCML gates is proportional to the number of gates, N.
It was also shown that the power-delay is proportional to $N^2$ and the energy-delay is proportional to $N^3$. In order to demonstrate these properties, the quantities in figure 6.2b are normalized by these respective bit length factors.

[Figure 6.2a : MCML Ripple Adder Performance. Three panels plotting delay tp (ns), power-delay (pJ), and energy-delay (pJ*ns) against number of bits N for Full Adder #1 and Full Adder #2.]

[Figure 6.2b : MCML Ripple Adder Performance (Normalized). Three panels plotting normalized tp (ps), normalized power-delay (fJ), and normalized energy-delay (pJ*ps) against number of bits N for Full Adder #1 and Full Adder #2.]

We can see several interesting trends from the above graphs. First of all, the proportionality to logic depth shown in Chapter 2 seems accurate, especially for larger N. The reason for the larger delay at low N is that the slower generate and sum functions become a more important factor in proportion to the faster carry circuitry.

The determination of the better full adder topology depends on the desired goal. In general, the second full adder is faster but consumes about twice the power of the first full adder. Therefore, the power-delay is lower for the first full adder for all N. The energy-delay product however is smaller for the first adder for small N and larger for large N. If a large ripple adder is required and speed is the primary objective, then the second adder topology should be used. If the power efficiency is more important than absolute performance, the first adder topology is better.
The other difference between the adders is that the first adder is significantly more area efficient than the second one. Since high performance designs tend to use non-ripple architectures (bypass, lookahead, etc.), the first topology is more desirable in almost all cases. For the remainder of this report, only the first full adder topology will be analyzed.

Now that the two alternative MCML full adders have been analyzed and compared, we would like to compare the performance of the MCML adders to traditional CMOS ripple adders. The VDD for the CMOS adders was set at 1.9V to most closely model the performance of the MCML adders with I = 10uA. The power measurements for the CMOS adders are found using randomly generated input vectors while the delay is worst case. The comparison results are shown in figure 6.2c:

[Figure 6.2c : Comparison of MCML and CMOS Ripple Adders. Three panels plotting delay tp (ns), power-delay (pJ), and energy-delay (pJ*ns) against number of bits N for MCML Adder #1 and the CMOS Adder.]

The above results demonstrate the effects of logic depth on MCML performance. For small N, the MCML adder is superior in both power-delay and energy-delay, while it is inferior for larger N. For N = 4, the MCML adder has 35% of the power-delay and 48% of the energy-delay of the CMOS adder. For N = 32, it has 225% of both the power-delay and the energy-delay of the CMOS adder. This result verifies the conclusion from chapter 2 that MCML is most applicable to circuits with shallow logic depth.

6.3 Modified MCML Ripple Adders with Current Ratio Adjustment

The previous section validated the theoretical results of chapter 2 that MCML circuits with moderate to high currents and shallow logic depth are more energy-efficient than equivalent CMOS circuits. We now would like to add one more design optimization to the MCML ripple adders which increases energy efficiency.
The optimization to be discussed in this section is that of Current Ratio Adjustment (CRA) and it will first be generalized to non-ripple adder architectures.

The basic principle behind CRA is to optimize energy efficiency by reducing the performance (and therefore power) in non-critical paths of logic circuits. This is accomplished in MCML circuits by using different amounts of current in the gates of different logical paths. As long as the critical path timing is maintained, the extra power reduction will not affect the system performance. In a system with perfect CRA, all logic paths will have exactly the same delay and overall power will be utilized in the most efficient manner possible.

An analogy can be made and applied to static CMOS circuitry. The equivalent operation to CRA in CMOS would be to use a different supply voltage, VDD, for gates in different paths. This is extremely difficult to do in practice due to the difficulty in maintaining multiple supply networks. In MCML, the use of CRA requires minimal change of the core circuitry and only requires the use of multiple VSC's, a practice already supported by previous analysis. In circuits which require tight swing control and therefore already require multiple VSC's, the addition of CRA is completely free of cost. The general idea of CRA is presented in figure 6.3a below.

[Figure 6.3a : Example of Current Ratio Adjustment. A network of seven gates between Input and Output. No CRA: every gate has tp = T and Pwr = P, giving Total Delay = 3T and Total Power = 7P. With CRA: one gate is slowed to tp = 3T with Pwr = P/3, two gates are slowed to tp = 3T/2 with Pwr = 2P/3, and the remaining four gates keep tp = T and Pwr = P, giving Total Delay = 3T and Total Power = 5.67P.]

The application of CRA to an MCML ripple adder is a fairly straightforward procedure. Let Ic be the current in the carry gate and Is be the current in the sum gate. Also let N be the number of bits of the adder.
If we assume a linear relationship between the current and delay, we can easily write that:

$$T_{p,tot} = t_{p,abtoc} + (N - 2) \times t_{p,ctoc} + t_{p,ctos}$$

$$T_{p,tot} = \frac{k_1}{I_c} + (N - 2) \times \frac{k_2}{I_c} + \frac{k_3}{I_s}$$

where $t_{p,abtoc}$ is the input to carry delay, $t_{p,ctoc}$ is the carry to carry delay, and $t_{p,ctos}$ is the carry to sum delay for a full adder. For simplicity, assume $k_1 = k_2 = k_3 = k$. When no CRA optimization is done, then $I_c = I_s = I$ and,

$$T_{p,tot} = \frac{k}{I} \times N \quad \text{(No CRA)}$$

Now, let $I_c = x \times I$ and $I_s = y \times I$. Then we can write that,

$$T_{p,tot} = \frac{k}{I} \times \left[ \frac{N - 1}{x} + \frac{1}{y} \right] \quad \text{(With CRA)}$$

We can also write the expressions for power, power-delay, and energy-delay:

$$P = N \times I \times (x + y) \times V_{DD}$$

$$PD = k \times N \times V_{DD} \times (x + y) \times \left[ \frac{N - 1}{x} + \frac{1}{y} \right]$$

$$ED = \frac{k^2}{I} \times N \times V_{DD} \times (x + y) \times \left[ \frac{N - 1}{x} + \frac{1}{y} \right]^2$$

If we set the delay with CRA equal to the delay without CRA, we can solve for the x and y which give the minimum energy-delay. Figure 6.3b gives the results of this first order optimization:

N     xmin    ymin    xmin/ymin    ED with CRA / ED without CRA
4     1.18    .686    1.72         0.933
8     1.21    .452    2.68         0.830
16    1.18    .304    3.88         0.742
32    1.14    .208    5.48         0.674

Figure 6.3b : Theoretical Results of Current Ratio Adjustment for optimal Energy-Delay

We can see from the above table that the potential savings in energy-delay increase from 7% for N=4 to 33% for N=32. The most important number from the above analysis is the ratio, xmin/ymin. This ratio is the amount of current scaling which should be done between the carry and sum circuits for optimal energy-delay product. For example, for a 32 bit adder, the current in the carry path should be 5.48 times larger than the current in the sum path for optimal energy-delay. In general, this ratio is approximately proportional to $\sqrt{N}$.

We can also examine the sensitivity of the optimization to this ratio by plotting the energy-delay as a function of xmin/ymin in figure 6.3c. From this plot, we can see that the sensitivity around the optimal point is not very great.
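The first order optimization can be reproduced numerically. The sketch below assumes only the model stated above: with the delay held equal to the no-CRA delay, (N-1)/x + 1/y = N, so the energy-delay is proportional to x + y and the ratio to the no-CRA case is (x + y)/2. Minimizing x + y under that constraint with a Lagrange multiplier gives x/y = sqrt(N-1); this closed form is derived here and is not stated in the report, and the function names are illustrative.

```python
import math


def cra_optimum(n_bits):
    """Minimize x + y subject to (N-1)/x + 1/y = N (equal total delay).

    Closed form from a Lagrange multiplier: x / y = sqrt(N - 1).
    """
    root = math.sqrt(n_bits - 1)
    x = (n_bits - 1 + root) / n_bits
    y = (1 + root) / n_bits
    return x, y


def ed_ratio(n_bits, ratio):
    """ED with CRA / ED without CRA for a given x/y, at equal total delay."""
    y = ((n_bits - 1) / ratio + 1) / n_bits  # from the delay constraint
    x = ratio * y
    return (x + y) / 2


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        x, y = cra_optimum(n)
        print(n, round(x, 3), round(y, 3), round(x / y, 2),
              round(ed_ratio(n, x / y), 3))
```

The values produced by this model are close to those in figure 6.3b, with small differences in the individual x and y entries. `ed_ratio` can also be evaluated away from the optimum to explore how flat the minimum is.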
For N = 32, the deviation from optimal is less than 10% for a ratio range of 2.6-12.8 (optimal = 5.48) and less than 5% for a range of 3.3-10.2. Therefore, the actual ratio used only needs to be reasonably close to the optimal value; it does not need to be extremely accurate.

[Figure 6.3c : Normalized Energy-delay vs. Current Ratio. Normalized energy-delay plotted against current ratio = xmin/ymin (1 to 10), with one curve each for N = 4, N = 8, N = 16, and N = 32.]

The last step in the CRA optimization analysis is to compare the theoretical results with actual simulations. The ripple adders from section 6.2 were redesigned using the x and y ratios from figure 6.3b. The gates were reoptimized in order to achieve the necessary constraints from chapter 3. The results from this experiment are presented in figure 6.3d:

[Figure 6.3d : Current Ratio Adjustment Results. Three panels plotting delay tp (ns), power-delay (pJ), and energy-delay (pJ*ns) against number of bits N for Basic MCML, MCML w CRA, and CMOS, with a Theoretical CRA curve in the power-delay and energy-delay panels.]

While the actual implementations of CRA are better in power efficiency than non-CRA circuits, they are less efficient than the theoretical CRA results. The primary reason for this variation is that the performance of MCML circuits does not vary linearly with current for large current deviations. When re-optimization is required, the smaller current gates have their performance reduced by a greater than linear amount and hence the theoretical results are not achieved. Even with this difficulty, Current Ratio Adjustment is a powerful technique that can significantly improve energy efficiency in MCML circuits.

This chapter has compared CMOS and MCML ripple adder performance.
As predicted in chapter 2, the additional logic depth of large ripple structures makes MCML an unattractive choice. On the other hand, MCML adders perform significantly better for smaller numbers of bits. The addition of current ratio adjustment in the adders improves the MCML adder performance even more. While only applied to ripple adders in this chapter, CRA can be applied to any general logic circuit and can increase energy efficiency dramatically.

Chapter 7 System Design Example : CORDIC

7.1 CORDIC Algorithm

In order to test many of the optimizations and analysis developed earlier, we felt it was necessary to design a complex block of logic using MCML. The target block of logic chosen was a pipelined CORDIC. The basic CORDIC algorithm is used for computing angles of vectors and for rotating vectors [7]. The vectoring algorithm iteratively computes the angle of a vector using the following equations:

$$X_i = X_{i-1} + \sigma_{i-1} \times 2^{-i} \times Y_{i-1}$$

$$Y_i = Y_{i-1} - \sigma_{i-1} \times 2^{-i} \times X_{i-1}$$

$$\sigma_i = \mathrm{sign}(Y_i) \in \{1, -1\}$$

After N iterations, where N is the bit width of X and Y, the vector $(X_0, Y_0)$ will be rotated onto the X-axis and the angle of the vector will be equal to:

$$\theta = \frac{\pi}{2} \times \sum_{i=0}^{N-1} \sigma_i 2^{-(i+1)}$$

The algorithm can also be used to rotate a vector by a fixed angle. This is accomplished with the same equations as above, but the sign bits are not calculated at each stage. Instead, the sign bits are stored and used from a previous angle calculation. In hardware, this added functionality translates into a small amount of control circuitry.

One algorithmic difficulty in the vector rotation scheme is that while the vector is rotated by the proper angle, the magnitude of the vector is not preserved. In order to preserve the magnitude after rotation, several magnitude scaling iterations are added to the algorithm in between rotations [7].
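The vectoring recurrences can be exercised in software. The sketch below is the textbook floating-point form and makes assumptions beyond the text: the shift schedule is taken as 2^-i for i = 0, 1, ..., the angle is accumulated with arctan(2^-i) terms rather than the encoding used above, and the magnitude scaling iterations are omitted. The exact indexing and angle encoding of the implementation in this report follow [7] and may differ.

```python
import math


def cordic_vectoring(x0, y0, iterations=16):
    """Textbook CORDIC vectoring for x0 > 0: drive y toward zero.

    Returns (x, y, angle), where angle approximates atan2(y0, x0) and x is
    the input magnitude multiplied by the accumulated CORDIC gain.
    """
    x, y = float(x0), float(y0)
    angle = 0.0
    for i in range(iterations):
        sigma = 1.0 if y >= 0 else -1.0     # sign bit of the current Y
        shift = 2.0 ** -i                   # hardwired right shift in hardware
        x, y = x + sigma * shift * y, y - sigma * shift * x
        angle += sigma * math.atan(shift)   # angle removed by this micro-rotation
    return x, y, angle


if __name__ == "__main__":
    x, y, angle = cordic_vectoring(3.0, 4.0)
    print(x, y, angle, math.atan2(4.0, 3.0))
```

The returned x exceeds the true magnitude of the input vector because each micro-rotation stretches the vector slightly, which is the magnitude error that the scaling iterations are meant to remove.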
These scaling factors are broken down into multiplications by quantities of the form $(1 - 2^{-x})$, which are implemented by a shift and subtract in hardware.

7.2 CORDIC Architecture

In our implementation of the CORDIC, we decided to pipeline every iteration, both rotating and scaling. For 8 bits of precision, there were a total of 14 pipeline stages, 8 for rotation and 6 for scaling. While this pipelining will introduce significant latency, the targeted applications are throughput dominated and should be able to tolerate the extra latency. It is also important to remember that an unpipelined CORDIC will have large logic depth and would not be a good candidate for MCML logic.

Another important feature to note is that additional bits are required in order to maintain precision during rotation. Three extra bits are used to reduce truncation error and a single bit is used to prevent overflow before scaling. The total bit width of the stages is 12 bits for an 8 bit input and output.

A basic rotation stage of the CORDIC is shown in Fig. 7.2a. Note that the outputs from the XNOR and XOR blocks are actually shifted right in relation to the other inputs to the adder. The amount of shifting depends on the rotation number but is hardwired and requires no extra logic. The critical path begins with the Sign_{i-1} bit, goes through a buffer, an XNOR, a 12 bit adder, and a 2 input mux, and ends at the Sign_i output. Scaling iterations are entirely subtractions and have a shorter critical path than all rotation iterations.

[Figure 7.2a : CORDIC Pipeline Stage. Block diagram: X_{i-1} and Y_{i-1} registers feed XNOR and XOR blocks and two adders, which feed the X_i and Y_i registers; Mode_{i-1} and Sign_{i-1} pass through a MUX to produce Mode_i and Sign_i.]

7.3 Circuit Optimization

The gates in each bit pipeline stage are implemented using the Current Ratio Adjustment (CRA) optimization discussed in section 6.3. While a fully detailed optimization was not performed, approximations to optimize energy-efficiency were made and implemented.
All of the gate currents were fixed at some multiple of a base current, I. There are 4 different VSC's used: one for the carry circuits, one for the sum circuits, one for all two level gates (FFs, XNOR, XOR, MUX), and one for the buffers. The carry circuits in the ripple adders were most common in the critical path and were therefore given 2I current. The XNOR, XOR, MUX, and flip-flops all use the same VSC and are given I current. The sum circuits of the adder have very little contribution to the critical path and are therefore given 0.5I current.

Three different values of I were chosen in order to achieve high, medium, and low performance levels: 10uA, 5uA, and 2.5uA. The circuits were completely optimized for these current points (including CRA scaling) using the procedure developed in chapter 3 and modified in chapter 5. For the high performance CORDIC, more current is used throughout to reduce delay. As a result of the increased current, slightly larger transistors, higher VDD, and higher voltage swing are required to maintain the desired DC properties. In the low performance mode, small currents are used and it can therefore utilize reduced voltage swing and VDD. The CMOS equivalent design is not optimized for different performance levels but was instead simulated at 4 different values of VDD.

While the design of a full adaptive pipelining system was beyond the scope of this project, we can estimate the effects of its use. The non-adaptively pipelined results use the worst case clock frequency and the nominal power consumption. The worst case clock frequency assumes the worst case process corner and +/-10% variation in VDD for CMOS. Since the MCML circuits consume a constant amount of current, the variation on VDD will be much smaller and is assumed to be negligible, but the worst case process corner is still used. The adaptively pipelined results use the nominal clock frequency and power consumption.
All clock frequencies used have a 10% margin over the total critical path delay.

7.4 Results

Now that the algorithm, architecture, and circuits of the CORDIC system are defined, we will look at the simulation results. In order to verify the effects of wiring capacitance and other layout effects, the critical path pipeline stage was laid out for both the CMOS and one of the MCML CORDIC data points. We simulated the critical path delay for both the schematic and extracted versions of this single pipeline stage and extrapolated some of the results to the entire CORDIC. All of the estimated values are denoted with an asterisk but are believed to be very close to actual values. A subject for future work would be to complete the layout of the entire CORDIC block for both the CMOS and the 3 different MCML implementations. The simulated and predicted results are summarized in Figs. 7.3a-c:

Nominal VDD (V)                      2.5      2.0      1.5      1.0
Schematic W.C. Clock Freq. (MHz)     235      180      110      35
Extracted W.C. Clock Freq. (MHz)     140      105      65       20
Schematic Power (mW)                 21.3     9.80     3.16     0.42
Extracted Power (mW)                 21.5*    9.94*    3.33*    0.45*
Schematic ED (pJ*ns)                 311      247      212      248
Extracted ED (pJ*ns)                 904      724      647      787

Figure 7.3a : CMOS CORDIC Results

Performance Level                    High     Med.     Low
VDD (V)                              1.1      1.05     1.0
Voltage Swing (V)                    0.4      0.35     0.3
Schematic W.C. Clock Freq. (MHz)     280      195      120
Extracted W.C. Clock Freq. (MHz)     95       60       35
Schematic Power (mW)                 18.6#    9.00#    4.33#
Extracted Power (mW)                 18.6#    9.00#    4.33#
Schematic ED (pJ*ns)                 192      195      242
Extracted ED (pJ*ns)                 1682     1908     2492

Figure 7.3b : MCML CORDIC Results - No Adaptive Pipelining

Performance Level                    High     Med.     Low
VDD (V)                              1.1      1.05     1.0
Voltage Swing (V)                    0.4      0.35     0.3
Schematic Nom. Clock Freq. (MHz)     310      205      125
Extracted Nom. Clock Freq. (MHz)     105      65       40
Schematic Power (mW)                 18.6#    9.00#    4.33#
Extracted Power (mW)                 18.6#    9.00#    4.33#
Schematic ED (pJ*ns)                 155      172      221
Extracted ED (pJ*ns)                 1379     1699     2269

Figure 7.3c : MCML CORDIC Results - With Adaptive Pipelining

* - Result is extrapolated to whole design from single pipeline stage
# - Result does not include power due to peripheral control circuitry (VSC's, DLL's, etc.)

There are several important things to notice about the final data. The first thing to notice is that for the transistor level schematics, the MCML design has superior energy-delay properties over almost the entire range of performance levels. The MCML design can operate at a higher frequency than is possible with CMOS even at 2.5V, and can do so with lower power consumption. The addition of the adaptive pipelining allows operation at nominal instead of worst case clock frequency and increases the benefit of MCML over CMOS.

Unfortunately, the delay scaling is not equivalent for the layout of the CMOS and MCML designs. For the CMOS designs, the extracted layout, which includes wire capacitances, is about 1.7 times slower than the transistor only schematic. This scaling factor increases to between 2.9 and 3.2 times for the MCML design. The effect of these two different scaling factors is that the MCML layout is no longer as efficient in energy-delay as the CMOS design for any design point.

While the actual factors causing this scaling difference are unknown, we speculate that the primary reason is that the MCML design is much more susceptible to the Miller effect on cross-coupling capacitance between parallel wires. In fact, all of the signals in the MCML design are complementary and are run more or less in parallel, so this additional capacitance may be quite large. A topic of future work would be to re-analyze the layout strategies employed and to attempt to decrease this cross-capacitance by increasing wire spacing or using different metal layers or routing topology.
It is my opinion that this scaling factor could be reduced significantly with more attention given to routing.

Besides delay and power, we would also like to consider the area and current switching properties of the two logic styles. Since only a single pipeline stage was laid out for both styles of the CORDIC, we cannot compare total areas, but we can compare the area of these stages. We can also look at the simulated results of the power supply current and report the change due to switching activity. Since the areas are all very similar between design points and the supply switching properties are more important for the high performance designs, we only report those results here. These results are summarized in figure 7.3d:

                                 MCML      CMOS
Area (um^2)                      11,200    9,000
Nominal Supply Current (mA)      1.65      0
Maximum Supply Current (mA)      1.69      22
Minimum Supply Current (mA)      1.61      -4

Figure 7.3d : Area and Supply Current for CORDIC Pipeline Stage

As the table shows, the supply current variation of the MCML design is far smaller than that of the CMOS design. For mixed signal communications systems, this property is extremely important to allow high precision analog circuit operation. The area of the CORDIC pipeline stage is about 25% bigger for the MCML design and may or may not be an acceptable price to pay for low noise computation.

This chapter has applied the analysis and design techniques of the previous chapters to the implementation of the CORDIC algorithm. We have seen that even with the use of adaptive pipelining and current ratio adjustment (CRA), the extracted MCML design is less power efficient than the equivalent CMOS design. The primary reasons for this result are that the layout of the MCML CORDIC was not done with careful consideration of cross-coupling capacitance and that the logic depth of the circuitry is larger than ideal.
In other functions with shallower logic depth and more efficient routing, the MCML designs may in fact be more power efficient than the equivalent CMOS function.

Chapter 8 Conclusions

8.1 Summary

This report has been an in-depth analysis of the benefits and pitfalls of using MOS Current Mode Logic. We have analyzed the transistor level behavior of MCML circuits and compared their properties to those of standard CMOS circuits. In chapter 3, a design algorithm was proposed for MCML gates and many of the design constraints were analyzed and explained. The algorithm was used to design larger pieces of logic, and several global design optimizations such as adaptive pipelining and current ratio adjustment were proposed. We have seen that under ideal circuit conditions, MCML can be much more energy efficient than equivalent CMOS circuits and can also present significantly less noise on the supply network.

8.2 Future Work

While this report attempted to analyze many different facets of MCML operation, several areas still require more research. The first goal of any future work would be to automate the design process for MCML gates using the algorithm developed in chapter 3. This automation would greatly reduce the time needed to implement larger logic functions. Along the same lines, the use of MCML gates should be incorporated into a standard digital logic design flow including synthesis, placement, and routing. The main challenges in this arena would be the characterization of the MCML gates for the synthesis tool and the implementation of differential routing.

The second main area of future work would be the implementation of the control logic presented in chapter 5. While many of the effects of adaptive pipelining are estimated in this report, it would be valuable to actually implement the adaptive pipeline circuitry and measure its area, power consumption, and tracking properties.
Also included in this future work would be the design of the VSCs and closer measurements of opamp requirements.

A third area for future research would be a much more detailed analysis of the effects of noise on circuit performance and design. The MCML gate design procedure should be modified to incorporate maximum noise restrictions, and more modeling of the noise sources and noise immunity in MCML circuits is necessary for truly robust design.

Finally, the design of different large circuit blocks would give a better picture of the true effects of global routing and system issues. Further work is needed to optimize the differential routing problem noticed in the CORDIC design. Other issues such as clock distribution and CMOS interfacing also need to be addressed.

Appendix A Derivation of Ideal MCML Gate Performance

A.1 Goal

The basic goal of the following analysis is to understand the effects of transistor linearity on MCML gate performance and power. The key results will be that a completely ideal MCML gate has constant power consumption during switching and that non-ideal transistor behavior is responsible for finite dI/dt effects. We begin with an analysis of a completely ideal MCML gate.

A.2 MCML Gate with Ideal Load

The first step in the analysis is to derive the transient properties of an MCML circuit with an ideal current source, ideal switch, and ideal load. This device is shown in figure A.2a:

Figure A.2a : Basic MCML Gate (load resistors R, output nodes VL and VR with capacitances C, ideal switch thrown at t = 0, current source I)

We assume in this analysis that the output loads are symmetrical and that the pull down network is an ideal switch which moves from left to right at time t = 0.
Now we model the left and right sides as separate first order RC networks:

[Figure: left side and right side first order RC networks, each with a load R from VDD and a capacitance C, carrying load currents IL and IR; the current source I acts on the right side]

The DC conditions at t = 0- and t = ∞ are as follows:

$$V_L(0^-) = V_{DD} - IR, \qquad V_L(\infty) = V_{DD}$$

$$V_R(0^-) = V_{DD}, \qquad V_R(\infty) = V_{DD} - IR$$

With these models, we can easily solve the first order differential equations and obtain the following transient responses:

$$V_L(t) = V_{DD} - IR\,e^{-t/RC}$$

$$V_R(t) = V_{DD} - IR\left(1 - e^{-t/RC}\right)$$

$$V_{diff}(t) = V_R(t) - V_L(t) = IR\left(2e^{-t/RC} - 1\right)$$

$$I_L(t) = \frac{V_{DD} - V_L(t)}{R} = I\,e^{-t/RC}$$

$$I_R(t) = \frac{V_{DD} - V_R(t)}{R} = I\left(1 - e^{-t/RC}\right)$$

$$I_{VDD} = I_L(t) + I_R(t) = I$$

There are two important things to notice from the above equations. First, the differential output voltage is a simple first order RC response. We can solve this response at the 10%, 50%, and 90% points to predict propagation delay and rise and fall times. Doing this gives us a value for the Signal Slope Ratio, SSR (see chapter 3), that we can use as a basis for comparison. The SSR value for a first order RC system works out to 3.17.

The second important result of the above transient equations is that the current from the voltage supply, VDD, is a constant. Since the current is not a function of the switching activity, switching speed, capacitance, resistance, etc., the power consumption is always the same and hence the sum of the dynamic and static power must be constant. The result is that dI/dt = 0 and no switching noise exists on the power lines.

A.3 MCML Gate with Non-Ideal Load

Now we would like to perform the same analysis as above, except that we allow the load resistances to be non-ideal in order to more closely model the behavior of a pmos transistor. Using a level one model of the transistor, we will assume that the drain current has two modes, linear and saturation, with a definite transition point at Vdsat. We use a voltage controlled current source instead of a simple load resistance as shown in figure A.3a.
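As a short aside on the 3.17 figure quoted in section A.2, the value can be checked by hand. This sketch assumes that SSR is the 10%-90% transition time of the differential output divided by its 50% delay; the formal definition is in chapter 3, so treat that reading as an assumption (it is consistent with the values tabulated later in figure A.3d). The symbols x and t_x are introduced here only for illustration: t_x is the time at which Vdiff has completed a fraction x of its total swing of 2IR.

```latex
% Crossing time for a fraction x of the swing, from V_diff(t) = IR(2e^{-t/RC} - 1)
IR\left(2e^{-t_x/RC} - 1\right) = IR\,(1 - 2x)
\;\;\Longrightarrow\;\;
e^{-t_x/RC} = 1 - x
\;\;\Longrightarrow\;\;
t_x = RC\,\ln\frac{1}{1 - x}

% The three points of interest
t_{10\%} = RC\,\ln\frac{10}{9}, \qquad
t_{50\%} = RC\,\ln 2, \qquad
t_{90\%} = RC\,\ln 10

% Signal Slope Ratio under the assumed definition
\mathrm{SSR} = \frac{t_{90\%} - t_{10\%}}{t_{50\%}}
             = \frac{\ln 10 - \ln\frac{10}{9}}{\ln 2}
             = \frac{\ln 9}{\ln 2} \approx 3.17
```

The ratio is independent of R, C, and I, which is why it serves as a fixed reference value for the non-ideal cases.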
We model the pmos device with an I-V characteristic as shown in figure A.3b. Now we would like to analyze the transient behavior of this gate while still assuming a perfect switch and ideal current sources, capacitors, etc. We begin by looking at the DC properties of the load resistances. When we define a voltage swing, ∆V, and a current level, I, for our MCML gates, we are effectively setting the DC resistance of the load device. This DC resistance is set by adjusting the Vgs of the devices with the VSC.

Figure A.3a : Basic MCML Gate with non-ideal loads (load currents IR = f{VR} and IL = f{VL}, output capacitances C, ideal switch thrown at t = 0, current source I)

Figure A.3b : PMOS load level 1 model (Isd versus Vsd; linear region: Isd = k1 Vsd; saturation region: Isd = Isat + k2 (Vsd - Vdsat); the regions meet at Vsd = Vdsat, Isd = Isat)

At time t = 0-, the switch is assumed to be on the left side. The current flowing in the left load device must be equal to the current in the current source, I, in order for equilibrium to exist. We also know that the Vsd of the pmos device is equal to ∆V. Therefore, VL = VDD - ∆V and IL = I. The right side has no current flowing and therefore VR = VDD and IR = 0. At t = ∞, the opposite is true.

If ∆V < Vdsat, then the transistor is in the linear region for the entire range of switching voltages. This case is identical to having a resistor R1 = 1/k1, and the analysis is the same as in section A.2. We are only interested in the case where ∆V > Vdsat and the device travels through both regions of operation during switching. In this case, the left side will begin in the saturation region until VL = VDD - Vdsat and then continue in the linear region until VL = VDD. The right side begins in the linear region until VR is pulled down to VDD - Vdsat, and then it enters saturation until it reaches its final value of VDD - ∆V. Note that the slopes of the I-V curve in the two regions can also be expressed by the resistances R1 and R2, which are the inverses of k1 and k2 respectively. Since the curve passes through the points (Vdsat, Isat) and (∆V, I), these resistances are R1 = Vdsat/Isat and R2 = (∆V - Vdsat)/(I - Isat).
We can now define four separate phases to analyze and solve the transient response. The following solutions come directly from solving the first order differential equations associated with each region of operation:

Phase #1: Left side, VL starts at VDD - ∆V and ends at VDD - Vdsat at time t1, load in saturation region

$$V_L(t) = V_{DD} - \Delta V + I R_2\left(1 - e^{-t/R_2 C}\right), \qquad t < t_1$$

$$t_1 = R_2 C \,\ln\!\left(\frac{I R_2}{V_{dsat} - \Delta V + I R_2}\right)$$

Phase #2: Left side, VL starts at VDD - Vdsat and approaches VDD, load in linear region

$$V_L(t) = V_{DD} - V_{dsat}\, e^{-(t - t_1)/R_1 C}, \qquad t > t_1$$

Phase #3: Right side, VR starts at VDD and ends at VDD - Vdsat at time t2, load in linear region

$$V_R(t) = V_{DD} - I R_1\left(1 - e^{-t/R_1 C}\right), \qquad t < t_2$$

$$t_2 = R_1 C \,\ln\!\left(\frac{I R_1}{I R_1 - V_{dsat}}\right)$$

Phase #4: Right side, VR starts at VDD - Vdsat and approaches VDD - ∆V, load in saturation region

$$V_R(t) = V_{DD} - \Delta V + \left(\Delta V - V_{dsat}\right) e^{-(t - t_2)/R_2 C}, \qquad t > t_2$$

We now make one assumption and calculate the differential transient response. The necessary assumption is that t2 > t1. This will be true in most cases except when Vdsat is very small (< 0.5 * ∆V). With this assumption, the equation for the differential output voltage can be written:

$$V_{diff}(t) = \Delta V - I R_1\left(1 - e^{-t/R_1 C}\right) - I R_2\left(1 - e^{-t/R_2 C}\right), \qquad t < t_1, t_2$$

$$V_{diff}(t) = V_{dsat}\, e^{-(t - t_1)/R_1 C} - I R_1\left(1 - e^{-t/R_1 C}\right), \qquad t_1 < t < t_2$$

$$V_{diff}(t) = -\Delta V + V_{dsat}\, e^{-(t - t_1)/R_1 C} + \left(\Delta V - V_{dsat}\right) e^{-(t - t_2)/R_2 C}, \qquad t > t_1, t_2$$

In order to see the effects of nonlinearity on the output waveform shape, we can graph the response for a few different values of Vdsat and Isat.
Figure A.3c has I = 10uA, ∆V = 0.4V, and C = 10fF:

Figure A.3c : Transient Response of Non-linear load MCML circuits (Vdiff in V, from -0.4 to 0.4, versus time from 0 to 5000 ps; curves for a perfect linear load and for Vdsat = .25, Isat = 9u; Vdsat = .30, Isat = 9u; Vdsat = .35, Isat = 9u; Vdsat = .30, Isat = 9.9u; Vdsat = .35, Isat = 9.9u)

The important thing to note from the above graph is that as Vdsat decreases or Isat increases, the nonlinearity of the curve becomes more pronounced. For the case when Vdsat = 0.30V and Isat = 9.9uA, the output is at around -.325V at 5ns after the switching event. This slowdown in the voltage fall is due directly to the fact that the right side voltage must continue through a very strong saturation region.

We can also use the previous data points to illustrate the change in the Signal Slope Ratio (SSR) for the different levels of nonlinearity. Figure A.3d examines the change in propagation delay vs. fall time for this level one model. We see that the SSR is a strong function of the I-V curve shape of the pmos load devices. While the actual implications of using large SSR circuits depend on the function being implemented, the propagation delay of circuits driven by a gate with a slow fall time will almost certainly increase.

Vdsat (V)   Isat (uA)   tp (ps)   t10% (ps)   t90% (ps)   SSR
Linear      Linear      277       42          922         3.18
.25         9           265       42          1608        5.91
.30         9           268       42          1162        4.18
.35         9           276       42          948         3.28
.30         9.9         263       42          3623        13.62
.35         9.9         271       42          1102        3.91

Figure A.3d : Signal Slope Ratio (SSR) Effects of non-linear pmos loads

We see from the above table that while the propagation delay stays relatively fixed across the different shapes of the I-V curve, the fall time and hence the SSR can increase dramatically.
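The entries of figure A.3d can be approximated numerically from the four phase equations of this section. The following is a minimal sketch under several assumptions that are not stated explicitly in the text: the function names are hypothetical, the load curve is taken to pass through (Vdsat, Isat) and (∆V, I) so that R1 = Vdsat/Isat and R2 = (∆V - Vdsat)/(I - Isat), and SSR is taken to be (t90% - t10%)/tp, which is the ratio that matches the tabulated columns. The crossing times are found by bisection, since Vdiff falls monotonically from +∆V to -∆V.

```python
import math


def load_resistances(i_tail, dv, vdsat, isat):
    """Slopes of the two-segment load I-V curve, expressed as resistances (ohms)."""
    r1 = vdsat / isat                    # linear region: through (0, 0) and (Vdsat, Isat)
    r2 = (dv - vdsat) / (i_tail - isat)  # saturation: through (Vdsat, Isat) and (dV, I)
    return r1, r2


def vdiff(t, i_tail, dv, c, vdsat=None, isat=None):
    """Differential output VR - VL (volts) at t seconds after the switch is thrown."""
    if vdsat is None:  # perfectly linear load of section A.2, with R = dV / I
        return dv * (2.0 * math.exp(-t / (dv / i_tail * c)) - 1.0)
    r1, r2 = load_resistances(i_tail, dv, vdsat, isat)
    t1 = r2 * c * math.log(i_tail * r2 / (vdsat - dv + i_tail * r2))
    t2 = r1 * c * math.log(i_tail * r1 / (i_tail * r1 - vdsat))
    # Drop across the left load, VDD - VL (phases 1 and 2).
    if t < t1:
        left = dv - i_tail * r2 * (1.0 - math.exp(-t / (r2 * c)))
    else:
        left = vdsat * math.exp(-(t - t1) / (r1 * c))
    # Drop across the right load, VDD - VR (phases 3 and 4).
    if t < t2:
        right = i_tail * r1 * (1.0 - math.exp(-t / (r1 * c)))
    else:
        right = dv - (dv - vdsat) * math.exp(-(t - t2) / (r2 * c))
    return left - right


def crossing_time(level, t_max=50e-9, **params):
    """Time (s) at which the falling Vdiff crosses `level`, found by bisection."""
    lo, hi = 0.0, t_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if vdiff(mid, **params) > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def slope_metrics(**params):
    """Return (tp, t10, t90, SSR), assuming SSR = (t90 - t10) / tp."""
    dv = params["dv"]
    t10 = crossing_time(0.8 * dv, **params)   # 10% of the 2*dV swing completed
    tp = crossing_time(0.0, **params)         # 50% point
    t90 = crossing_time(-0.8 * dv, **params)  # 90% of the swing completed
    return tp, t10, t90, (t90 - t10) / tp


if __name__ == "__main__":
    base = dict(i_tail=10e-6, dv=0.4, c=10e-15)  # values given for figure A.3c
    cases = [(None, None), (0.25, 9e-6), (0.30, 9e-6), (0.35, 9e-6),
             (0.30, 9.9e-6), (0.35, 9.9e-6)]
    for vdsat, isat in cases:
        tp, t10, t90, ssr = slope_metrics(vdsat=vdsat, isat=isat, **base)
        print(vdsat, isat, round(tp * 1e12), round(t10 * 1e12),
              round(t90 * 1e12), round(ssr, 2))
```

Because each side of the gate is evaluated separately and then subtracted, this sketch does not rely on the t2 > t1 assumption used to write the three-region form of Vdiff. Small differences from the tabulated values are to be expected from rounding and from however the original crossing times were extracted.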
This behavior is undesirable for circuit reliability and should be avoided by increasing the Vdsat voltage above the voltage swing, ∆V, by adjusting the pmos device sizes.

The last piece of analysis to perform is to determine the effects of non-linear load devices on the current behavior of the power supply. We can easily compute the current equations by taking the derivative of the voltage responses in the four different phases and summing the results. Once again, we obtain different current responses for the three time regions:

$$I_{VDD} = I + I\, e^{-t/R_2 C} - I\, e^{-t/R_1 C}, \qquad t < t_1, t_2$$

$$I_{VDD} = I + I_{sat}\, e^{-(t - t_1)/R_1 C} - I\, e^{-t/R_1 C}, \qquad t_1 < t < t_2$$

$$I_{VDD} = I + I_{sat}\, e^{-(t - t_1)/R_1 C} - \left(I - I_{sat}\right) e^{-(t - t_2)/R_2 C}, \qquad t > t_1, t_2$$

With these equations, we can plot the supply current for a variety of I-V curve characteristics. The result is shown in figure A.3e:

Figure A.3e : Supply Switching Current for MCML Circuits with Non-linear Loads (VDD current in uA, from 9 to 13.5, versus time from -500 to 4000 ps; curves for Vdsat = .25, Isat = 9u; Vdsat = .30, Isat = 9.9u; Vdsat = .30, Isat = 9u; Vdsat = .35, Isat = 9.9u; Vdsat = .35, Isat = 9u)

We see that as the nonlinearity increases, the deviation from the ideal constant current draw also increases. Therefore, in order to maintain low power supply switching noise, it is crucial that the load devices be operated entirely in the linear region, below the Vdsat voltage.

References

[1] B. Davari, R. H. Dennard, G. G. Shahidi, "CMOS Scaling for High Performance and Low Power - The Next Ten Years," Proceedings of the IEEE, Vol 83, No. 4, April 1995, pp. 595-606.
[2] M. Mizuno, M. Yamashina, K. Furuta, H. Igura, H. Abiko, K. Okabe, A. Ono, H. Yamada, "A GHz MOS, Adaptive Pipeline Technique Using MOS Current-Mode Logic," IEEE Journal of Solid-State Circuits, Vol 31, No. 6, June 1996, pp. 784-791.
[3] Jan Rabaey, "Digital Integrated Circuits: A Design Perspective," Prentice Hall, 1996.
[4] Anantha P. Chandrakasan and Robert W. Brodersen, "Minimizing Power Consumption in Digital CMOS Circuits," Proceedings of the IEEE, Vol 83, No. 4, April 1995, pp. 498-523.
[5] Giovanni De Micheli, "Synthesis and Optimization of Digital Circuits," McGraw Hill, 1994.
[6] Paul R. Gray and Robert G. Meyer, "Analysis and Design of Analog Integrated Circuits," John Wiley and Sons, Inc., 1993.
[7] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Trans. Electron. Comput., Vol EC-8, Sept. 1959, pp. 330-334.
[8] J. S. Walther, "A Unified Algorithm for Elementary Functions," in Proc. AFIPS Spring Joint Comput. Conf., 1971, pp. 379-385.
[9] Dake Liu and Christer Svensson, "Trading Speed for Low Power by Choice of Supply and Threshold Voltages," IEEE Journal of Solid-State Circuits, Vol 28, No. 1, January 1993, pp. 10-17.
