An Analysis of MOS Current Mode Logic for Low Power and High Performance Digital Logic

by Jason Musicer
Research Project
Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley,
in partial satisfaction of the requirements for the degree of Master of Science, Plan II.
Approval for the Report and Comprehensive Examination:
Committee:
Professor Jan Rabaey
Research Advisor
(Date)
Professor Robert Brodersen
Second Reader
(Date)
Abstract
In this work, MOS Current Mode Logic (MCML) is analyzed for application to low power,
mixed signal environments. A small MCML cell library is developed and optimized for several
different performance requirements. The cells are then applied to the generation of ripple adders
and pipelined CORDIC structures and compared with equivalent CMOS circuits. MCML
CORDICs are designed which can operate from 125MHz to 310MHz with power consumption
varying between 4.3mW and 18.6mW. These power results are up to 1.5 times less than CMOS
CORDICs with equivalent propagation delays. Design was done in a 0.25µm standard CMOS
process from ST Microelectronics.
Acknowledgment
Over the year and a half that it has taken to complete this project, many people have touched
my life and helped to contribute to this research. Some of them have given me academic advice,
some have given me new ideas to try, and others have just been there for moral support. While it
is not possible to thank everyone who has contributed to my two great years at UC Berkeley, I
will attempt to point out a few people who have especially gone out of their way.
First, I would like to thank my advisor, Jan Rabaey. Without his guidance and advice, this
work would never have even begun, never mind ended. His original ideas were the seeds that grew
to become this thesis and his ideas and conclusions have shaped it and focused it along the way.
Jan's leadership at the Berkeley Wireless Research Center, along with that of Bob Brodersen, has
created an environment of support that has made research easier and much more fun. I cannot
thank him enough for all of the support.
The two biggest contributors to this project have been Antonio Lei and Brian Etscheid. In
the last several months, Antonio has been a tremendous help and is responsible for all of the
layout for the circuits designed. I can't thank him enough for all of his unselfish and dedicated
contributions. Brian was my partner at the beginning of this project and has slaved away way
too many nights at the BWRC running simulations and contributing ideas. Thank you both.
The researchers at the BWRC have also been a tremendous help throughout this project.
Paul Husted, Rhett Davis, Johan Vanderhaegen, David Sobel, and many others have been great
sounding boards for ideas and have saved me incredible amounts of time with their vast
knowledge. Thank you to everyone at the Center.
Most importantly, a warm thank you to my friends and family whose support and guidance
have allowed me to keep my sanity in the world of graduate school.
Table of Contents
CHAPTER 1 INTRODUCTION
1.1 MOTIVATION
1.2 THESIS ORGANIZATION
CHAPTER 2 MCML GATE DESIGN BASICS
2.1 IDEAL GATE - OPERATION AND THEORY
2.2 MCML INVERTER AND CONTROL CIRCUITRY
2.3 OTHER MCML GATE TOPOLOGIES
2.4 CMOS GATE DESIGN
CHAPTER 3 MCML GATE OPTIMIZATION
3.1 OPTIMIZATION GOALS AND CHALLENGES
3.2 SIMULATION METHODOLOGY
3.3 CONSTRAINTS AND PERFORMANCE CRITERIA
3.3.1 Gain
3.3.2 Current Matching Ratio (CMR)
3.3.3 Voltage Swing Ratio (VSR)
3.3.4 Signal Slope Ratio (SSR)
3.3.5 RFN and RFP Voltage Limits
3.3.6 Area
3.3.7 Delay, Power, Power-Delay, Energy-Delay
3.3.8 Power Supply Switching Noise
3.4 DESIGN PARAMETERS
3.4.1 VDD
3.4.2 Voltage Swing (∆V)
3.4.3 Current (I)
3.4.4 Differential Pair Transistor Sizes (WA, LA, WB, LB, WC, LC)
3.4.5 PMOS Load Transistor Sizes (WRFP, LRFP)
3.4.6 NMOS Current Source Transistor Sizes (WRFN, LRFN)
3.5 MCML GATE OPTIMIZATION PROCEDURE
3.6 MCML GATE OPTIMIZATION RESULTS
CHAPTER 4 MCML GATE LAYOUT
4.1 LOCAL VS. GLOBAL EFFECTS
4.2 LAYOUT TOPOLOGY
4.3 TRANSISTOR MATCHING
4.4 LAYOUT RESULTS
4.4.1 Parallel vs. Anti-Parallel MCML Layout
4.4.2 MCML and CMOS Layout vs. Schematic
CHAPTER 5 MCML SYSTEM LEVEL DESIGN
5.1 MCML SYSTEM OVERVIEW
5.2 MCML GATE PARAMETER GENERALIZATION
5.2.1 Voltage Swing, ∆V
5.2.2 Supply Voltage, VDD
5.3 VOLTAGE SWING CONTROL CIRCUITRY
5.4 CURRENT CONTROL CIRCUITRY
5.5 SUPPORT FOR CURRENT VARIABILITY
5.6 GATE DRIVE STRENGTH SCALING
5.7 CONVERSION CIRCUITRY
CHAPTER 6 SYSTEM DESIGN EXAMPLE : RIPPLE ADDERS
6.1 MCML FULL ADDER DESIGN
6.2 BASIC RIPPLE ADDER DESIGN
6.3 MODIFIED MCML RIPPLE ADDERS WITH CURRENT RATIO ADJUSTMENT
CHAPTER 7 SYSTEM DESIGN EXAMPLE : CORDIC
7.1 CORDIC ALGORITHM
7.2 CORDIC ARCHITECTURE
7.3 CIRCUIT OPTIMIZATION
7.4 RESULTS
CHAPTER 8 CONCLUSIONS
8.1 SUMMARY
8.2 FUTURE WORK
APPENDIX A DERIVATION OF IDEAL MCML GATE PERFORMANCE
A.1 GOAL
A.2 MCML GATE WITH IDEAL LOAD
A.3 MCML GATE WITH NON-IDEAL LOAD
REFERENCES
Chapter 1
Introduction
1.1 Motivation
The recent advances in VLSI technology have allowed rapid growth in the area of portable
electronic devices. Laptop computers, cellular phones, and personal digital assistants have all
become commonplace items in people's lives. One of the primary consumer complaints about these
devices is the short battery life and/or the extra weight of the batteries due to the high power
consumption of the circuitry.
As CMOS process technology scales and demand for more
processing power increases, it can be shown that the power consumption of future IC's will
increase over time if significant architectural changes are not made [1]. It is therefore critical in
future circuits that power be minimized beyond the traditional constraints of packaging cost and
heat dissipation.
As device density increases, it is also extremely desirable to integrate analog and digital
circuitry onto the same die for many DSP and communications systems.
High levels of
integration will be required in order to reduce total system area and drive down production costs.
This integration has been delayed due primarily to the difficulty of designing high precision
analog circuitry in the presence of extremely hostile digital switching noise. These difficulties
will also increase as process technology scales due to fundamental challenges in high precision
analog design at low supply voltages in digital CMOS technology. Either significant advances in
analog design techniques will be required or digital designers will be forced to adapt their design
style or process technology.
A digital circuit style that seems to be promising in both reducing power consumption and
providing an analog friendly environment is MOS Current Mode Logic (MCML). While bipolar
CML, a derivative of emitter coupled logic (ECL), has been used for years in high performance
applications, it has become less desirable over time due to its high static power consumption and
reliance on bipolar processing. In [2], MCML was analyzed and a 64-bit adaptively pipelined
adder was developed and simulated. It was demonstrated in that paper that MCML could
dissipate less power than equivalent CMOS circuitry as well as adjust for clock skew and
environmental or process variations.
In this project, a much broader analysis of MCML is presented with some theoretical
development and application to other circuit blocks. Near-minimum sized transistors are used in
this project instead of the significantly larger devices in [2] and power consumption is measured
for a wide variety of circuit blocks, performance levels, and design techniques. It will be shown
that area efficient MCML can actually consume significantly less power than equivalent CMOS
circuitry while maintaining many of the other benefits of traditional CML such as reduction in
dI/dt effects, common mode noise immunity, and process and voltage variation immunity. The
most important goal of this project is to evaluate the appropriate domains of performance and
power requirements in which MCML presents benefits over current logic styles.
1.2 Thesis Organization
This thesis will be organized as follows: Chapter 2 will present the basic principles and
guidelines for design with MCML logic.
Basic gates will be described and a simulation
framework for evaluating gate level performance will be discussed. Chapter 3 discusses the
design methodology and optimization process for MCML gates. Different trends in transistor
sizing, supply voltage, current levels, and voltage swings will be discussed and analyzed in
detail. Chapter 4 discusses the issues and presents results for the layout of MCML gates and
compares them to equivalent CMOS gate layouts. Chapter 5 discusses many of the system level
design issues in MCML such as control circuitry, current variability, and conversion circuitry.
Chapter 6 applies the results of chapters 2-5 to the design of ripple adders and demonstrates the
effects of several system level design decisions. Chapter 7 presents the CORDIC algorithm and
describes the circuit implementation of a fully pipelined CORDIC. An equivalent CMOS
CORDIC is also designed and analyzed to give a fair basis for comparison. Chapter 8 is the
conclusion and gives some overall analysis of the feasibility of MCML use and its potential
benefits.
Chapter 2
MCML Gate Design Basics
2.1 Ideal Gate - Operation and Theory
In order to understand the issues in designing with real MCML gates, it is beneficial to first
derive some of the properties and equations of a general, ideal MCML gate. This ideal gate is
presented in figure 2.1a below and consists of three main parts: the pull up resistors, the pull
down network switch, and the current source.
Figure 2.1a : Basic MCML Gate
(Schematic labels: pull up resistors R, differential outputs Out, Pull Down Network with differential inputs In0 through InN, current source I.)
The inputs to the pull down network (PDN) are fully differential. In other words, the true
and complement of all logical inputs must be presented to the gate. The PDN can implement
any logic function but must have a definite value for all possible input combinations. In general,
the design of the MCML pull down network is similar to other differential logic styles such as
differential cascode voltage switch logic (DCVSL) or differential split-level logic (DSL) [3].
Unlike DCVSL or DSL, the pull down network in MCML circuits is regulated by a constant
current source. The pull down network steers the current I to one of the pull up resistors based
upon the logic function being implemented. The resistor connected to the current source through
the PDN will have current I and a voltage drop equal to ∆V = I × R . The other resistor will not
have any current flowing through it and its output node will be pulled up to VDD in the DC state.
If we look at the differential output voltage, the total voltage swing is set exclusively by the
amount of current (I) and the value of the pull up resistance (R). This voltage swing is generally
much smaller than VDD, of the order of a few hundred millivolts.
With this simple model in mind, we can derive some basic transient properties for a circuit
composed of MCML gates. For a more detailed analysis and proofs of the following equations,
please see Appendix A. For simplicity, let's assume that our circuit is a linear chain of N
identical gates, all with identical load capacitance C on each output node. The total propagation
delay of the chain of gates will be proportional to:
$$D_{MCML} = N R C = \frac{N \times C \times \Delta V}{I}$$
The power consumption of a digital gate is typically broken down into its static and dynamic
components. In the case of MCML, it can be proven (see App. A) that the sum of the static and
dynamic components is a constant to first order. With this assumption, we can write expressions
for power, power-delay, and energy-delay [2]:
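$$P_{MCML} = N \times I \times V_{dd}$$
$$PD_{MCML} = N I V_{dd} \times \frac{N C \Delta V}{I} = N^2 \times C \times \Delta V \times V_{dd}$$
$$ED_{MCML} = N^2 C \Delta V V_{dd} \times \frac{N C \Delta V}{I} = \frac{N^3 \times C^2 \times V_{dd} \times \Delta V^2}{I}$$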
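The ideal-gate expressions above can be evaluated numerically to get a feel for the trade-offs. The following is a minimal sketch and is not part of the original analysis: the function name and the sample operating point (N, C, ∆V, I, VDD) are illustrative assumptions, not measured values, and the delay is only known up to a proportionality constant.

```python
def mcml_metrics(n, c, dv, i, vdd):
    """First-order metrics for a chain of n identical ideal MCML gates.

    n: logic depth, c: load capacitance per output node (F),
    dv: voltage swing (V), i: tail current per gate (A), vdd: supply (V).
    The delay is taken as exactly N*R*C; the text only claims proportionality.
    """
    delay = n * c * dv / i              # D = N*R*C with R = dV / I
    power = n * i * vdd                 # each gate draws I from VDD continuously
    power_delay = power * delay         # = N^2 * C * dV * Vdd
    energy_delay = power_delay * delay  # = N^3 * C^2 * Vdd * dV^2 / I
    return delay, power, power_delay, energy_delay


# Assumed operating point: 4 gates, 10 fF, 300 mV swing, 50 uA, 1.5 V
d, p, pd, ed = mcml_metrics(n=4, c=10e-15, dv=0.3, i=50e-6, vdd=1.5)
print(f"delay = {d:.3e} s, power = {p:.3e} W, PD = {pd:.3e} J, ED = {ed:.3e} J*s")
```

Doubling the current in this model leaves the power-delay product unchanged and halves the energy-delay product, which is the behavior the closed-form expressions predict.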
For comparison, the delay, power, power-delay, and energy-delay for static CMOS logic are
well known and approximated by [4]:
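$$D_{CMOS} = \frac{N \times C \times V_{dd}}{\frac{k}{2} \times (V_{dd} - V_t)^{\alpha}}$$
$$P_{CMOS} = N \times C \times V_{dd}^2 \times \frac{1}{D_{CMOS}}$$
$$PD_{CMOS} = N \times C \times V_{dd}^2$$
$$ED_{CMOS} = PD_{CMOS} \times D_{CMOS} = N^2 \times 2 \times \frac{C^2}{k} \times \frac{V_{dd}^3}{(V_{dd} - V_t)^{\alpha}}$$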
where k and α are process and transistor size dependent parameters. Note that the above
equations assume that the CMOS circuitry is being clocked at a frequency equal to the inverse of
the propagation delay.
One interesting property to note is that MCML circuits do not have a theoretical minimum to
the energy-delay product whereas the CMOS circuits do [2]. A designer can arbitrarily reduce
the ED product by increasing the current for a given C, VDD, and voltage swing. In reality, this
is not possible for very large currents because the robustness of the circuitry will deteriorate if no
other changes are made.
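The contrast between the two energy-delay behaviors can be illustrated with a short numerical sweep. This is a sketch under stated assumptions: the CMOS energy-delay is computed as the product of the power-delay and delay expressions given above, and the values chosen for k, α, Vt, C, and N are purely illustrative and are not taken from the process used in this work.

```python
def cmos_energy_delay(n, c, vdd, vt, k, alpha):
    """Energy-delay of an n-gate static CMOS chain, computed as PD * D."""
    delay = n * c * vdd / ((k / 2.0) * (vdd - vt) ** alpha)
    power_delay = n * c * vdd ** 2
    return power_delay * delay


def mcml_energy_delay(n, c, vdd, dv, i):
    """Energy-delay of an n-gate ideal MCML chain."""
    return n ** 3 * c ** 2 * vdd * dv ** 2 / i


# Assumed, illustrative parameters (hypothetical, not process data)
N, C, VT, K, ALPHA = 4, 10e-15, 0.5, 1e-4, 1.5

# Sweep VDD from just above Vt up to 3 V and locate the CMOS minimum
sweep = [VT + 0.01 * s for s in range(5, 251)]
best_vdd = min(sweep, key=lambda v: cmos_energy_delay(N, C, v, VT, K, ALPHA))
print(f"CMOS ED is minimized near VDD = {best_vdd:.2f} V")

# In the ideal model, MCML ED keeps falling as the current is raised
for current in (10e-6, 100e-6, 1e-3):
    print(f"I = {current:.0e} A, MCML ED = {mcml_energy_delay(N, C, 1.5, 0.3, current):.3e}")
```

The CMOS curve has an interior minimum because lowering VDD saves energy but slows the gate sharply as VDD approaches Vt, while the ideal MCML expression has no such turning point in I.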
Possibly the most important conclusion from the above equations comes from the effect of
logic depth, N. The performance of MCML gates in relation to CMOS decreases linearly with
N. This is due to the fact that MCML consumes static power, even when not switching. It is
very important therefore in MCML circuits to maintain a shallow logic depth. In slowly clocked
circuits, CMOS will not consume as much power as MCML, but in circuits with high
performance requirements, MCML can have significantly better power-delay or energy-delay.
Much more analysis will be given later in this report as to the actual crossover points between
MCML and CMOS performance.
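The linear dependence on logic depth follows from dividing the two ideal power-delay expressions of this section: if both styles are assumed to see the same C and VDD, the ratio of MCML to CMOS power-delay reduces to N × ∆V / VDD. The sketch below evaluates that first-order ratio. The shared C and VDD are simplifying assumptions made here for illustration, the function names are hypothetical, and real crossover points depend on the non-idealities treated in later chapters.

```python
def pd_ratio_mcml_over_cmos(n, dv, vdd):
    """PD_MCML / PD_CMOS = (N^2 C dV Vdd) / (N C Vdd^2) = N * dV / Vdd."""
    return n * dv / vdd


def max_depth_for_mcml_advantage(dv, vdd):
    """Largest integer logic depth N for which the ideal ratio stays below 1."""
    n = 0
    while pd_ratio_mcml_over_cmos(n + 1, dv, vdd) < 1.0:
        n += 1
    return n


# Assumed example: 300 mV swing on a 2.5 V supply
for depth in (1, 2, 4, 8, 16):
    print(depth, pd_ratio_mcml_over_cmos(depth, dv=0.3, vdd=2.5))
print("ideal break-even depth:", max_depth_for_mcml_advantage(0.3, 2.5))
```

A smaller swing pushes the break-even depth higher, which is consistent with the emphasis on shallow logic and low voltage swings.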
Another interesting property is that the energy-delay is proportional to the square of the
voltage swing for MCML. This fact encourages the use of very low swing circuits. Once again, the
limiting factor is the robustness of the circuitry.
For mixed signal environments, the constant current supplied by VDD is extremely desirable.
The dI/dt effects are negligible in comparison to CMOS circuits and the current variation is
theoretically 0. There will be some current change during switching due to non-idealities, but
the change is less than 5% in circuits simulated. The circuits are also significantly more robust
against power supply noise due to their inherent common mode rejection.
2.2 MCML Inverter and Control Circuitry
Now that we have seen and analyzed an ideal MCML gate, let's begin to deal with the non-idealities of CMOS processing. The first real circuit to analyze is the MCML Inverter/Buffer
shown in figure 2.2a. Since MCML is a differential logic style, the buffer and inverter function
are identical topologically and only require switching of the output or input sense.
The pull down network switch is implemented with a standard nmos differential pair
controlled by the single input. The current source is an nmos device with a fixed gate voltage
(RFN) working in the saturation region. The load resistors are pmos devices with fixed gate
voltages (RFP) and are designed to be operated in the linear region in order to model resistors.
Figure 2.2a : MCML Inverter/Buffer
(Schematic labels: RFP, differential outputs Out, differential input In, RFN.)
The goal of the nmos differential pair is to switch the current provided by the current source
from one side to the other. Ideally, all current will only travel down one path and the "off" path
will have zero current flowing through it under DC conditions. In reality, some current will
always flow in the "off" path and cause a reduction in the true signal voltage from VDD. The
quality of current switching increases with larger input voltage difference (Vid) or larger W/L of
the PDN transistors and decreases as larger currents are used.
The current source for MCML circuits is implemented with a single nmos device. While
several different architectures are known for current sources [6] (e.g. cascoding), a single device
implementation was decided upon for area efficiency. It is important to maintain a relatively
small transistor so that total cell size is not dominated by the current source. It is also desirable
to use a non-minimum length device for this current source in order to achieve higher output
impedance and better current matching across gates. More detail will be given in Chapter 5 as to
how to set the RFN voltage, but a simple way is to use a current mirror [6].
The load resistances are implemented with single pmos devices. It is desirable to make these
devices as close to minimum size as possible, unlike standard analog circuits. Increasing the
width of these devices will decrease the linearity and also increase the capacitance. The RFP
voltage is controlled by a simple feedback circuit known as the Variable Swing Controller (VSC),
shown in figure 2.2b, which is similar to that used in [2].
Figure 2.2b : Variable Swing Controller (VSC)
(Schematic labels: VDD, Vlow, amplifier inputs + and -, RFP, Inputs, RFN.)
The VSC adjusts the gate voltage (RFP) of the pmos loads so that the equivalent DC
resistance is equal to the desired voltage swing divided by the current. The inputs to the VSC are
the RFN voltage (i.e. current level) and the low output voltage, Vlow = VDD - ∆V. The VSC
then generates the RFP voltage by using a model of the gate to be controlled. More will be said
about the issues of the VSC design in Chapter 5.
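The quantities the VSC works with can be made concrete with a small calculation. This is only a sketch of the target values implied by the description above (equivalent load resistance ∆V / I and low output level VDD - ∆V); it does not model the feedback loop itself, and the function name and numbers are assumptions chosen for illustration.

```python
def vsc_targets(vdd, dv, i):
    """Return (equivalent load resistance in ohms, low output voltage in volts).

    vdd: supply voltage (V), dv: desired voltage swing (V),
    i: gate current set by the RFN bias (A).
    """
    r_eq = dv / i      # DC resistance the pmos load must present
    v_low = vdd - dv   # reference level supplied to the controller
    return r_eq, v_low


# Assumed example: 2.5 V supply, 400 mV swing, 50 uA tail current
r_eq, v_low = vsc_targets(vdd=2.5, dv=0.4, i=50e-6)
print(f"R_eq = {r_eq:.0f} ohm, Vlow = {v_low:.2f} V")
```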
2.3 Other MCML Gate Topologies
Now that we have a basic understanding of the MCML inverter and ideal VSC, we can begin
to construct a small library of gates. The goal was not to build a complete standard cell library,
but rather to develop a small collection of typical gates and functions. The issues of parameter
optimization will be discussed in depth in the next chapter while here we present a general
framework for implementing logic functions in MCML.
All MCML gates have one current source device and two load devices. Different logic
functions are implemented with different pull down networks. The pull down networks are
identical to those used in ECL logic and are composed of sets of differential pairs.
The implementation of a logic function can be determined directly by constructing a
Binary Decision Diagram (BDD). BDD's are used extensively in the area of logic synthesis and
CAD to visualize boolean optimizations and can also be used in determining MCML gate
structure. A general analysis of the formation and optimization of BDD's is beyond the scope of
this report, but please refer to [5] for more information. Instead, we will look at a single example.
Let's try to implement the following function in MCML:
F = ABC + B'D + ACD + A'BC'
We can begin to factor this expression until we have a completely specified and fully
factored equation:
F = A[BC+B'D+CD] + A'[B'D+BC']
F = A[B(C+CD) + B'(D+CD)] + A'[B(C') + B'(D)]
F = A[B(C) + B'(D)] + A'[B(C') + B'(D)]
The BDD for this expression is shown in figure 2.3a and the implementation of the pull down
network is shown in figure 2.3b.
Figure 2.3a : Binary Decision Diagram for F
Figure 2.3b : MCML Pull Down Network for F
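The factoring of F can be checked exhaustively, since there are only 16 input combinations. The sketch below compares the original sum-of-products form with the fully factored form, written as nested decisions on A, then B, then C or D. That nesting mirrors the way a BDD is traversed, with each decision corresponding to one level of differential pairs. The function names are illustrative.

```python
from itertools import product


def f_original(a, b, c, d):
    """F = ABC + B'D + ACD + A'BC' as written in the text."""
    return ((a and b and c) or ((not b) and d)
            or (a and c and d) or ((not a) and b and (not c)))


def f_factored(a, b, c, d):
    """Fully factored form: F = A[B(C) + B'(D)] + A'[B(C') + B'(D)]."""
    if a:
        return c if b else d
    return (not c) if b else d


# Compare the two forms on every input combination
for bits in product([False, True], repeat=4):
    assert bool(f_original(*bits)) == bool(f_factored(*bits)), bits
print("factored form matches on all 16 input combinations")
```

Every path through the nested decisions ends in a definite value, which is the requirement stated earlier that the pull down network be completely specified.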
Since it is desirable to reduce the logic depth of the nmos pull down circuitry to preserve
both DC and transient properties, only functions of three levels or less are considered. While
general BDD algorithms can achieve optimized trees for any logic function, many of the well
known functions can be easily created by hand. Several of these functions are shown in figure
2.3c with their corresponding current sources and pull up devices included.
Figure 2.3c : MCML Gate Examples
(Panels: AND/NAND/OR/NOR, XOR3, 2:1 MUX, D Latch.)
One interesting property to note is the relative homogeneity of gate topologies. If we look at
the leftmost gate in figure 2.3c, we can see that the AND, OR, NAND, and NOR functions all
have the exact same topology and therefore the same sizing, delay, power, etc. The only
difference in implementing these functions is the ordering of the inputs and outputs. This
uniformity leads to more predictability in the timing and area of cells and reduces the need for
boolean manipulation in order to transform into inverting logic.
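The claim that one topology yields all four functions can be illustrated with a small behavioral model. The sketch below treats each differential signal as a (true, complement) pair and models inversion as swapping the two wires; the four functions then follow from De Morgan's laws. This is a logic-level illustration only, with hypothetical helper names, and says nothing about the electrical behavior of the gate.

```python
from itertools import product


def diff(x):
    """Encode a boolean as a differential (true, complement) pair."""
    return x, not x


def swap(sig):
    """Exchange the two wires of a differential signal (a free inversion)."""
    t, c = sig
    return c, t


def mcml_and(a, b):
    """Behavioral model of the two-input gate; returns (out, out_bar)."""
    out = a[0] and b[0]
    return out, not out


for x, y in product([False, True], repeat=2):
    a, b = diff(x), diff(y)
    assert mcml_and(a, b)[0] == (x and y)                    # AND: as wired
    assert swap(mcml_and(a, b))[0] == (not (x and y))        # NAND: swap outputs
    assert swap(mcml_and(swap(a), swap(b)))[0] == (x or y)   # OR: swap inputs and outputs
    assert mcml_and(swap(a), swap(b))[0] == (not (x or y))   # NOR: swap inputs only
print("one topology covers AND, NAND, OR and NOR")
```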
The basic storage element used for MCML is the D-Latch shown above. The latch has a
simple cross-coupled structure and can be used to form a D Flip Flop with a master-slave
approach [3]. The XOR and MUX gates shown above have a fairly compact structure compared
to equivalent static CMOS implementations and are expected to perform well. The design of
adder cells will be addressed in detail later in chapter 6.
2.4 CMOS Gate Design
In order to give a fair comparison of MCML gates to standard implementations, a set of static
CMOS gates was also created. The CMOS versions of each block were optimized for low power.
Traditional sizing rules were used in which the pmos devices were made twice as wide as the
nmos devices and all series transistors were made wider to achieve the same first order delays.
While logic styles such as dynamic logic or pass-transistor logic were generally not used, several
of the blocks such as XOR, MUX, and adders were designed with transmission gates for better
performance and power efficiency.
Since this project does not deal with effects of long
interconnect, small load capacitances were used and hence all gates were minimum sized for low
power [4].
Three dynamic latches and three flip-flops were analyzed for CMOS sequential circuits:
C²MOS, TSPC, and Doubled C²MOS. A more detailed description of these latches can be found
in [3]. All of these dynamic blocks were compared to static CMOS implementations and were
found to be significantly better in power and delay. The C²MOS D-FF's were used in the CMOS
implementation of the CORDIC to be discussed in chapter 7.
Chapter 3
MCML Gate Optimization
3.1 Optimization Goals and Challenges
The focus of this chapter will be to describe and analyze the performance of MCML gates as
a function of numerous design parameters. The goal of this analysis is to allow a designer to
quickly optimize transistor level gate designs while exploring different system choices. This
optimization is necessary with MCML designs because many of the system level decisions will
not exhibit their true performance tradeoffs unless the proper transistor level adjustments are
made. A future extension of this work would be to develop an automated design flow for
generating optimized MCML gates.
An equivalent optimization problem exists for static CMOS design but is much better
understood. The only two real parameters which affect gate performance in CMOS are VDD
and transistor sizes. These two parameters are typically chosen independently and general
guidelines for transistor sizing are well known. In contrast, MCML optimization has many more
degrees of freedom in the parameter selection. Furthermore, the parameters tend to be tightly
coupled and do not allow independent selection. The following sections will be an attempt to
limit the freedom in parameter selection by showing the trends in performance as different
design decisions are made.
The approach taken in this chapter is to take an individual gate and to optimize it under ideal
system conditions. The effects of actual system level decisions will be shown in chapter 5. For
example, all of the analysis in this chapter assumes the use of a VSC matched perfectly to the
current gate although multiple or unmatched VSC's are commonly used in practice. There will
also be complete freedom in the selection of system level parameters such as voltage swings or
VDD on a per gate basis. These parameters are typically fixed across different gates in the same
system but are allowed to vary between gates for this analysis. The reason for this simplistic
analysis is to explore an upper bound on the achievable performance of MCML gates as a
baseline to judge non-idealities.
3.2 Simulation Methodology
Before beginning the parameter optimization, it is first necessary that a simulation
methodology be fixed in order to evaluate performance. The goal of this methodology is to
fairly produce standard performance metrics such as delay and power as a function of transistor
sizing, voltage swing, VDD, current, etc. while introducing as few simulation artifacts as
possible.
Most parameter simulations were done at the transistor schematic level although some
verification of layout effects was later performed. Designs were entered into Cadence using the
ST Microelectronics 0.25um design library. Schematics were then netlisted and all simulations
were performed in HSPICE.
The first modification to standard simulation techniques was to buffer all inputs with
inverters. Since many of the measurements being made are sensitive to input signal slopes,
inverters were used to more closely model actual waveforms present on a chip. Between two
and four inverters were used for buffering and were sized to produce about the same signal slope
as the gate under test's output slope. The inverters were connected to different power supplies in
order to not affect any power measurements.
The next modification made was to model realistic output loading conditions. For individual
gate simulations in which we had no knowledge of surrounding circuitry, fanouts of 4 and 3
identical gates were used for CMOS and MCML blocks respectively. More fanout was used for
CMOS to try to simulate the reduction in logic depth due to MCML's complementary nature [3].
The amount of capacitance added per fanout was equal to the measured input capacitance of the
gate under test plus a fixed amount of wiring capacitance varying between 1fF and 10fF. Actual
loading capacitance numbers were measured and used in situations where the circuit topology
was known beforehand.
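The loading rule described above can be sketched as a small helper. This is only an illustration: the function name and the 2fF input capacitance are hypothetical and are not taken from the measured data.

```python
# Sketch of the output loading rule; names and example values are assumptions.
FANOUT_CMOS = 4  # identical gates driven by a CMOS gate under test
FANOUT_MCML = 3  # identical gates driven by an MCML gate under test


def load_capacitance(input_cap, wire_cap, fanout):
    """Return the total load in farads: each fanout adds one gate input plus wiring."""
    return fanout * (input_cap + wire_cap)


# Hypothetical MCML gate with 2fF of input capacitance and 1fF of wiring per fanout.
mcml_load = load_capacitance(2e-15, 1e-15, FANOUT_MCML)
print(f"MCML load: {mcml_load * 1e15:.1f} fF")
```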
The next simulation decision made was the method for comparing power dissipation of the
two logic styles. It is well known that the power consumption of static CMOS is proportional to
the clock frequency [4]. A better metric for evaluating the efficiency of a CMOS gate is to
measure its energy per switching event or its power-delay product. This metric is independent of
clock frequency and is a more fundamental measure of the gate. Unfortunately, not all switching
events dissipate the same energy. The metric chosen for our purposes was the average energy
per switch over all possible input switching combinations, including no switching. The
probabilities of each input switching were taken to be 50% and the energy dissipation was
measured for all switching combinations. Note that this average energy metric measurement
requires the generation of 2^(2N) different input switching combinations, where N is the number of
inputs. This is only feasible for very small N and this was done for 3 input gates or less. For
circuits with more than 3 inputs, random waveforms were generated and applied to measure
power dissipation.
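The averaging over all 2^(2N) input switching combinations can be sketched as follows. In the actual flow each energy value would come from a circuit simulation; here `toy_inverter_energy` is a hypothetical stand-in (an assumption, not measured data) so that the example runs on its own.

```python
from itertools import product


def average_switching_energy(n_inputs, energy_of_transition):
    """Average energy over all 2^(2N) (before, after) input combinations.

    Every pair is weighted equally, which corresponds to each input
    switching with 50% probability; pairs with before == after are the
    "no switching" cases.
    """
    states = list(product((0, 1), repeat=n_inputs))
    transitions = [(before, after) for before in states for after in states]
    assert len(transitions) == 2 ** (2 * n_inputs)
    total = sum(energy_of_transition(b, a) for b, a in transitions)
    return total / len(transitions)


def toy_inverter_energy(before, after, c_load=10e-15, vdd=2.5):
    """Toy model (assumption): the supply delivers C*VDD^2 only when the
    inverter output rises, i.e. when the single input goes from 1 to 0."""
    return c_load * vdd ** 2 if (before, after) == ((1,), (0,)) else 0.0


avg_energy = average_switching_energy(1, toy_inverter_energy)
print(f"average energy per switching event: {avg_energy * 1e15:.3f} fJ")
```

For a 3 input gate the same routine enumerates 64 combinations, which shows why random waveforms become the practical choice for larger gates.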
In contrast, MCML dissipates a nearly constant amount of power, independent of the clock
frequency or input switching activity. In order to compare the two logic styles, we divided the
average energy per switch metric by the propagation delay for CMOS and found the total
average power.
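A minimal sketch of this conversion is given below. The CMOS energy and delay values are hypothetical, and the MCML power is taken as the bias current times the supply voltage, which assumes an ideal current source and ignores any bias circuitry.

```python
def cmos_average_power(avg_energy_per_switch, t_prop):
    """CMOS: average energy per switching event (J) divided by the delay (s)."""
    return avg_energy_per_switch / t_prop


def mcml_power(i_bias, vdd):
    """MCML: constant current drawn from the supply (assumes an ideal source)."""
    return i_bias * vdd


# Hypothetical CMOS gate: 15fJ per switching event and 100ps of delay.
p_cmos = cmos_average_power(15e-15, 100e-12)
# MCML gate biased at 10uA from a 0.9V supply.
p_mcml = mcml_power(10e-6, 0.9)
print(f"CMOS: {p_cmos * 1e6:.1f} uW, MCML: {p_mcml * 1e6:.1f} uW")
```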
The final methodology decision was in the realm of flip flop and latch evaluation. The
propagation delay of a flip flop is taken as the sum of the setup time (tsetup) and the clock to
output (clk2q) delays. It is well known that these delays are actually dependent upon each other.
The technique employed in this project was to sweep the setup time given to the flip flop and to
report the propagation delay as the minimum of the sum.
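The flip flop measurement can be sketched as a minimum over the swept points. The sweep values below are invented for illustration (in picoseconds); in practice each pair would come from a simulation run.

```python
def flip_flop_delay(sweep):
    """Return the minimum of (tsetup + clk2q) over the swept setup times.

    `sweep` is an iterable of (tsetup, clk2q) pairs measured at each
    setup time applied to the flip flop.
    """
    return min(t_setup + clk2q for t_setup, clk2q in sweep)


# Hypothetical sweep: clk2q grows as the setup time given to the flip flop shrinks.
example_sweep = [(200, 150), (150, 160), (100, 190), (60, 260)]
delay_ps = flip_flop_delay(example_sweep)
print(f"flip flop propagation delay: {delay_ps} ps")
```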
3.3 Constraints and Performance Criteria
Now that the simulation environment is defined, we need to establish some metrics of
performance as a basis for our optimizations. These metrics can be broken into two categories:
hard constraints and optimization goals. Hard constraints place a limit on some performance
metric which must not be violated. Optimization goals do not have any fixed requirements but
should be minimized or maximized whenever possible. A summary of these metrics is shown in
figure 3.3a:
Hard Constraints                               Optimization Goals
Gain : Av                                      Area
Current matching ratio (CMR) : Iact/Iref       Power
Voltage swing ratio (VSR) : ∆Vout/∆Vin         Power-Delay
Signal slope ratio (SSR)                       Energy-Delay
RFN and RFP voltage limits                     Power supply switching noise
Figure 3.3a : Optimization Metrics
The following sections describe the motivation behind the above optimization criteria.
3.3.1 Gain
In standard CMOS circuits, one of the main qualities of robustness to noise is the mid-swing
DC voltage gain [3]. Digital logic can only function correctly if there exists a point in the DC
transfer curve where the gain is larger than 1. There are two primary reasons for the gain to be
made larger than the absolute minimum: regeneration and bi-stability. Regeneration is the ability
for a gate to produce an output voltage closer to the ideal voltage level than its input voltage.
Bi-stability is a requirement in latches and flip-flops and assures that there are only two stable logic
states in the system. Both of these metrics are helped by large DC voltage gains.
Standard CMOS circuits naturally have large mid-swing voltage gains. In the simple circuits
simulated, gains of greater than 60 can be achieved with no additional effort. In contrast,
MCML circuits do not naturally have high gains. Large gains can be achieved but at a
tremendous cost in area and performance. Therefore, it is critical to design at or near the
minimum requirements for voltage gain.
Furthermore, MCML circuits do not suffer from the same noise constraints as CMOS
circuits. Most of the noise which adversely affects CMOS circuits becomes common mode noise
for MCML and is rejected by the differential logic. MCML circuits also generate significantly
less switching noise than CMOS circuits and the environment will therefore be more conducive
to low gain operation.
The lower limit on voltage gain for this project was set at 1.4 for nominal conditions. The
requirement is really that the gain be greater than 1 for all process, voltage, and matching
conditions, but it was felt that a 40% margin would be sufficient for these variations. Later
simulations verified that this margin was sufficient under typical variations and mismatch.
3.3.2 Current Matching Ratio (CMR)
This constraint refers to the amount of current flowing through the actual current source
in comparison to the reference current source. This ratio is illustrated in figure 3.3b:
[Schematic labels: Iref, RFN, Iact]
Figure 3.3b : Current Matching Ratio = Iact/Iref
The parameter which is set by the designer is the reference current, Iref, but the actual current
flowing through the test gate is Iact. In order to achieve predictability in design, we would like
the actual current to be close to our reference current. We allow the actual current to vary by
10% from the reference. The main parameters which affect this ratio are the output impedance
of the current source and the supply voltage.
3.3.3 Voltage Swing Ratio (VSR)
The ideal MCML gate contains a perfect current switch where all of the current flows down
one side or the other. In reality, some finite amount of the current flows in the "off" path and the
full current does not flow in the "on" path. The result of this non-ideality is a reduction in the
output voltage swing.
This problem is exacerbated by the fact that the quality of current
switching is directly proportional to the amount of input swing applied. It is theoretically possible to
create a chain of gates which have a continuously degrading voltage swing. This does not occur
in reality because of the heterogeneity of gates used. Some gates will reduce signal swing while
others will tend to regenerate swing. The mixture of gates tends to ensure preserved voltage
levels but it is still desirable to place an upper bound on the amount of signal degradation of a
single gate. We set this limit by constraining that the output voltage swing must be at least 98%
of the input voltage applied.
3.3.4 Signal Slope Ratio (SSR)
The output transient response of an MCML gate can be viewed as the sum of two events: the
pull-up of one side of the gate and the pull-down of the other side. The sum of these two events
creates a differential voltage swing which is viewed as the total signal. In an ideal MCML gate,
each side's response is a first order system dominated by the same RC time constant (see
Appendix A). The sum of these responses is a completely linear transition.
In reality, many nonidealities exist in MCML gates. One of the most significant nonidealities
is the nonlinearity of the pmos load resistances. The modified transient response is analyzed in
Appendix A.
The result of that analysis is that a direct tradeoff exists between a gate's
propagation delay (tp) and its 10%-90% rise/fall time (trf). It is possible to make a circuit with a
very fast pull-up and a very slow pull-down response. The overall response of this circuit will
look like figure 3.3c below.
Since the speed of the next gate will depend not only on the propagation delay but also on the
output waveform shape of the previous gate, some control must be used to ensure reasonable
rise/fall times. The metric used in this project is to take the ratio of the 10%-90% rise/fall time
and the propagation delay: SSR = trf / tp. This metric is kept as low as possible but constrained
to an absolute limit of 5.
[Figure: input waveform (swing ∆V) and output waveform versus time, annotated with tp, trf, and the output phases "Pull-down and Pull-up active", "Pull-up completes", and "Pull-down only"]
Figure 3.3c : Typical MCML Transient Response
3.3.5 RFN and RFP Voltage Limits
The final hard constraint is the voltage limits on the control signals, RFN and RFP. For
simulation and optimization purposes, we tend to use artificially ideal control circuits to generate
these two voltages. It is therefore important to monitor the ideal control circuits and to make
sure that they are producing voltages which could also be produced by real circuitry.
The RFN signal sets the gate voltage on the nmos current source. It therefore needs to be
kept a few hundred millivolts from both VDD and from ground if set by current mirrors. The
RFP voltage is used to set the pmos load gate voltages and is allowed to be below ground (to be
discussed in chapter 5). The constraint on RFP is that the total Vsg of the pmos devices must be
kept below the process limit of 2.5V.
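The hard constraints of sections 3.3.1 through 3.3.5 can be collected into one check, sketched below. The class and function names are hypothetical, and the metric values would normally be extracted from simulation. The RFN limit is left out because the text only gives it as "a few hundred millivolts" rather than as a fixed number.

```python
from dataclasses import dataclass


@dataclass
class GateMetrics:
    gain: float      # mid-swing DC voltage gain, Av
    i_act: float     # actual current in the gate's current source (A)
    i_ref: float     # reference current (A)
    dv_out: float    # output voltage swing (V)
    dv_in: float     # input voltage swing (V)
    t_rf: float      # 10%-90% rise/fall time (s)
    t_p: float       # propagation delay (s)
    pmos_vsg: float  # source-gate voltage of the pmos loads (V)


def violated_constraints(m):
    """Return the names of the hard constraints that the metrics violate."""
    checks = {
        "gain": m.gain >= 1.4,                        # section 3.3.1
        "CMR": abs(m.i_act / m.i_ref - 1.0) <= 0.10,  # section 3.3.2
        "VSR": m.dv_out / m.dv_in >= 0.98,            # section 3.3.3
        "SSR": m.t_rf / m.t_p <= 5.0,                 # section 3.3.4
        "RFP": m.pmos_vsg <= 2.5,                     # section 3.3.5
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical metric sets, for illustration only.
passing = GateMetrics(1.5, 9.5e-6, 10e-6, 0.396, 0.4, 400e-12, 100e-12, 2.0)
failing = GateMetrics(1.2, 9.5e-6, 10e-6, 0.396, 0.4, 600e-12, 100e-12, 2.0)
print(violated_constraints(passing), violated_constraints(failing))
```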
3.3.6 Area
Since the goal of this optimization exercise is to be able to quickly evaluate transistor level
tradeoffs, it would be nice to have an accurate estimation of cell area. Unfortunately, the layout
of MCML cells is somewhat irregular and is difficult to predict. The approach taken in this
project was to constrain the transistor widths and lengths at the schematic level to a maximum
and then try to reduce the sizes whenever possible.
This approach leads to a library of
"minimum sized" cells. When larger driving strengths are required, these conditions are relaxed
and all transistors are scaled in proportion to the needed drive strength. While this approach
does not take into account routing area, it is the best guess that can be made without doing layout
at each step. The sizes of the nmos current sources are constrained to W < 2.0u, L < 0.5u, the
nmos pull down transistors to W < 1.5u, L = 0.25u, and the pmos loads to W < 1.5u, L < 1.5u.
3.3.7 Delay, Power, Power-Delay, Energy-Delay
These metrics are the common interpretations and are used throughout this report to evaluate
performance or power efficiency.
3.3.8 Power Supply Switching Noise
This metric is used to evaluate the ability of the MCML circuits to coexist with sensitive
analog circuitry. The metric used is the percentage change of the supply current from its DC
average.
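One possible way to compute this metric from a sampled supply current waveform is sketched below. Using the peak deviation from the average is an assumption; the text does not state whether peak or some other measure of the change was used, and the sample values are invented.

```python
def supply_noise_percent(samples):
    """Peak deviation of the supply current from its average, in percent."""
    average = sum(samples) / len(samples)
    peak_deviation = max(abs(s - average) for s in samples)
    return 100.0 * peak_deviation / average


# Hypothetical supply current samples in uA.
noise = supply_noise_percent([10.0, 10.2, 9.9, 9.9])
print(f"supply current variation: {noise:.1f}%")
```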
3.4 Design Parameters
We can view our test environment described above as a computation engine which takes the
user defined input parameters and the gate topology to be tested and produces a number of
performance metrics. The next step in the optimization procedure is to define and classify all of
these input parameters to be optimized. The input parameters are listed in figure 3.4a:
Parameter Name    Description
VDD               Supply Voltage
∆V                Input and Output Voltage Swing
I                 Current desired in current source
WA, LA            Width and Length of first level pull down network nmos devices
WB, LB            Width and Length of second level pull down network nmos devices (if needed)
WC, LC            Width and Length of third level pull down network nmos devices (if needed)
WRFP, LRFP        Width and Length of pmos load devices
WRFN, LRFN        Width and Length of nmos current source
LOADCAP           The output loading capacitance
Figure 3.4a : Input Parameter Description
Please note that the WB, LB, WC, and LC parameters are only needed for two or three level
gates. Also note that the total load capacitance (LOADCAP) is the sum of the input gate
capacitance of the gate under test and some fixed interconnect capacitance. The reason for this
type of loading is to prevent the optimization from unfairly using large devices and creating a
large loading on the previous gate.
The following sections are an attempt to show some of the effects on the performance criteria
by varying each of the design parameters. As stated earlier, these effects are not independent of
each other.
This analysis is presented in order to give an intuitive feel for the design
optimizations to be illustrated in section 3.5.
3.4.1 VDD
The only true upper bound on the supply voltage is due to the process limits (2.5V) but it is
typically desirable to lower VDD in order to reduce the power consumption.
The power
consumption of the circuitry is linearly proportional to the supply voltage and it should therefore
be reduced as much as possible.
The main lower limit on the power supply voltage comes from the nmos current source.
Reducing VDD too far hurts the output impedance of the current source and eventually pushes it
out of the saturation region. One effect of this degradation is that the current matching ratio
(CMR) begins to decrease and the current in the gate is reduced. Another effect is that the midband voltage gain (Av) is reduced.
To illustrate the effects of supply voltage selection it is necessary to fix all of the other
parameters. Figure 3.4b shows the effects of supply voltage selection on Av, CMR, current
source output impedance (Ro), and power for an MCML inverter using 10uA of current, 400mV
swing, 10fF of load capacitance (LOADCAP) and having the following transistor sizes : WRFN
= 2.0um, LRFN = 0.5um, WRFP = 0.5u, LRFP = 0.6u, WA = 1.0u, LA = 0.25u.
[Figure: four panels plotted against VDD (Volts): Gain vs. VDD (Voltage Gain, Av), CMR vs. VDD (Current Matching Ratio), Current Source Ro vs. VDD (Ro in Ohms), and Power vs. VDD (Power in W)]
Figure 3.4b : Effects of VDD on MCML Inverter performance
I = 10uA, ∆V = 400mV, WA = 1.0um, LRFP = 0.6um, LOADCAP = 10fF
It is evident from the above figure that Gain, CMR, and Ro all have definite rolloff points. It
is therefore desirable to operate at VDD near this point which will vary for different current
levels and gate topologies.
3.4.2 Voltage Swing (∆V)
As seen in chapter 2, it is extremely desirable to reduce the voltage swing as much as
possible in order to reduce the propagation delay of MCML. The lower limit on the voltage
swing is determined by the gain and current switching requirements. As the voltage swing is
reduced, the mid-transition output voltages become closer to VDD and reduce the output
impedance of the pmos loads. The quality of the current switching will also be reduced and the
voltage swing ratio (VSR) will suffer. These degradations of the gain and VSR can be fixed by
adjusting other parameters such as transistor sizes or VDD.
The lower bound on the swing must also take into account possible circuit mismatch effects.
In general, the smallest amount of voltage swing used in this project is 200 mV although lower
swings could be used with extremely careful layout and noise management.
The upper bound on the voltage swing comes from the nonlinearity of the pmos loads and the
effects on the signal slope ratio (SSR). As the voltage swing is increased, the pmos device on the
side which is being pulled down is required to move closer to its Vdsat voltage. Eventually the
device enters the saturation region and becomes extremely nonlinear. This can be countered by
increasing the length of the pmos device, at the cost of increased propagation delay.
Another upper bound on voltage swing comes from the nmos current source. If the voltage
swing is too large, the pull down side will approach ground and force the current source out of
saturation. The tradeoff among linearity, gain, and speed can be seen in figure 3.4c:
[Figure: four panels plotted against Voltage Swing (V): Voltage Gain (Av), Propagation Delay (tp) in ps, Voltage Swing Ratio (VSR = Vout/Vin), and Signal Slope Ratio (SSR = trf / tp)]
Figure 3.4c : Effects of Voltage Swing on MCML Inverter
I = 10uA, WA = 0.5um, VDD = 2V, LRFP = 0.9um, LOADCAP = 10fF
3.4.3 Current (I)
This is the most general of the design parameters and is varied over a wide range in this
analysis. The majority of the next section will be dedicated to showing the trends of MCML
gates at different current levels. The lower bound on the current comes from severe SSR
and VSR degradation. The upper bound is set by the maximum transistor sizes allowed for a
"minimum" sized cell. The circuits are tested in this project in the range of 0.5uA to 100uA (for
near minimum sized transistors).
3.4.4 Differential Pair Transistor Sizes (WA, LA, WB, LB, WC, LC)
The sizing of transistor lengths and widths is the design parameter with the greatest degree of
freedom and affects almost all performance criteria. In order to limit the scope of the discussion,
we will first make a few assumptions. First, while each transistor width and length could be
independently varied, we assume that all differential pair transistors are matched to be the same
size. The second assumption is that the length of all transistors in the pull down network will be
kept to the minimum (0.25u) since there is almost no benefit from increasing the length.
In general, increasing the width of the differential pair transistors will increase the voltage
gain but it will also increase the input and output capacitance. This leads to a direct tradeoff
between performance (both delay and area) and robustness (voltage gain). It is desirable to use
the smallest transistors possible in order to achieve enough voltage gain. The relationship
between voltage gain and performance is illustrated in figure 3.4d for an MCML inverter. Note
that in this simulation, the loading capacitance of the gate has a fanout of 3 where each fanout
has a 1fF fixed interconnect capacitance plus the input capacitance of the gate under test.
[Figure: two panels plotted against Transistor Width (um): Voltage Gain (Av) and Propagation Delay (tp) in ps]
Figure 3.4d : Effects of Diff. Pair Transistor Widths (WA) on MCML Inverter
I = 10uA, ∆V = 250mV, VDD = 2V, LRFP = 0.6um, LOADCAP = 3*(1fF + Input Cap)
It is evident from figure 3.4d that gain must be kept at a minimum in order to preserve
performance. By changing the transistor widths from 0.5um to 1.5um, the input capacitance
increases dramatically (~3x) and the propagation delay of the gate driving this gate increases from
155ps to 215ps.
In multi-level gates, the number of differential pairs to be sized increases. The definition of
voltage gain also changes in multi-level gates and we define the overall voltage gain as the gain
from the worst case input combination. It is very difficult to come up with general design rules
for multi-level gates due to the fact that the effects on gain are extremely inter-dependent among
levels. In general, optimized gates tend to have widths increasing slightly from bottom to top.
3.4.5 PMOS Load Transistor Sizes (WRFP, LRFP)
Optimizing the size of the pmos load transistors is one of the most difficult and nonlinear
tasks in creating a good MCML gate. Besides the obvious area tradeoffs, the main performance
criteria affected by the sizing of these transistors are the voltage gain, signal slope ratio, RFP
control voltage limit, the propagation delay, and control voltage mismatch.
The voltage gain is increased by increasing the length of the pmos devices (LRFP). This
effect is especially strong when increasing from minimum length and therefore, non-minimum
length transistors are used if possible for these devices. Non-minimum length devices also help
by reducing the effects of transistor mismatch both between load devices and from the gate to the
control circuitry.
The ratio of the width and length (W/L) of the devices also affects several criteria.
Increasing the W/L, either by increasing W or decreasing L, decreases the effective resistance of
the load devices and therefore improves the propagation delay. If the width is increased, the
capacitance is also increased at the output node and the propagation delay may in fact stay the
same. The actual effect will be heavily dependent on the amount of load capacitance.
Increasing the W/L of the devices also reduces the Vdsat voltage and increases the nonlinearity
of the resistance (see Appendix A). In order to preserve a reasonable signal slope ratio
(SSR), it is usually necessary to use small W/L's (<1) and to accept the loss in propagation delay
associated with this decision. Finally, the choice of the width and length is bounded by the
minimum RFP voltage. The W/L must be kept large enough so that enough DC resistance can
be achieved for a given voltage swing and current without generating Vsg voltages greater than
2.5V.
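As a rough worked example, the DC resistance that the loads must provide follows from the chosen swing and current. The sketch below assumes ideal current steering, so that the full current I flows through one load and drops ∆V across it; the function name is hypothetical.

```python
def required_load_resistance(delta_v, i_bias):
    """DC load resistance (Ohms) needed to develop delta_v (V) at i_bias (A).

    Assumes ideal current steering: all of the current flows in one branch.
    """
    return delta_v / i_bias


# 400mV swing at 10uA, the operating point used in figure 3.4e.
r_load = required_load_resistance(0.4, 10e-6)
print(f"required load resistance: {r_load / 1e3:.0f} kOhm")
```

Smaller currents at the same swing need proportionally larger resistances, which is consistent with the longer LRFP values used at low current levels.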
All of these different design constraints lead to a very complex optimization problem. The
optimization methodology will be discussed further in the next section (3.5) but in general,
WRFP is kept at the minimum (0.5u) and LRFP varies from the minimum (0.25u) for high currents
to around 1.5u for very small currents. Some of the effects are illustrated in figure 3.4e below.
3.4.6 NMOS Current Source Transistor Sizes (WRFN, LRFN)
The principal tradeoff in the selection of the current source transistor sizes is between area
and robustness. It is desirable to use non-minimum length devices for the current sources both to
increase the output impedance and to decrease the mismatch effects. It is also desirable to have a
large W/L to decrease the Vdsat voltage and allow for further reduction in VDD. The limit on
increasing both the length and width is that the area of the gate begins to grow dramatically. The
current source devices used in this project were set to a limit of WRFN = 2.0u and LRFN = 0.5u
in order to have reasonable area.
[Figure: four panels plotted against Pmos load length, LRFP (um): Voltage Gain (Av), Pmos load Gate-Source Voltage (Vgs) in V, Propagation Delay (tp) in ps, and Signal Slope Ratio (SSR = trf / tp)]
Figure 3.4e : Effects of PMOS load transistor length (LRFP) on MCML Inverter
I = 10uA, WA = 0.5um, VDD = 2V, ∆V = 400mV, LOADCAP = 10fF, WRFP = 0.5um
3.5 MCML Gate Optimization Procedure
Now that we have explored the desired performance goals and the input parameter effects on
these goals, we can try to formalize a design methodology. As mentioned earlier, the goal of this
methodology is to be able to quickly optimize at the transistor level when considering system
level design choices.
While a general random optimization procedure (exhaustive search,
simulated annealing, etc.) could be applied to this problem, we instead take the approach of
trying to limit the optimization space as much as possible and then using some human intuition.
The first step in the methodology is to define the limits placed on certain input parameters.
These limits were discussed in the previous section but are summarized in figure 3.5a:
Parameter Name    Limit
VDD               VDD < 2.5V
∆V                200mV < ∆V < 2.4V
I                 0.5uA < I < 100uA
LA, LB, LC        LA, LB, LC = 0.25um
WA, WB, WC        0.5um < WA, WB, WC < 1.5um
WRFP              WRFP = 0.5um
LRFP              0.25um < LRFP < 1.5um
WRFN              WRFN = 2.0um
LRFN              LRFN = 0.5um
Figure 3.5a : Input Parameter Optimization Limits
The first step in the optimization process is to initialize VDD to 2.5V. The outer sweep
variable for this optimization is the current level, I. For a number of discrete values of I ranging
from 0.5uA to 100uA, we are trying to find the VDD, ∆V, WA, WB, WC, and LRFP which
adhere to all of the hard design constraints and produce the smallest energy delay product. The
next loop variable in the optimization is the voltage swing (∆V). For each I, we choose and fix a
voltage swing (∆V). The third loop variable used is the pmos load length (LRFP). Finally,
within each iteration of this loop, we find the best WA, WB, and WC so that the hard constraints
explained in section 3.3 are met. If it is possible to meet these hard constraints with this
selection of I, ∆V, and LRFP, we then lower VDD until the hard constraints are no longer met.
Finally, we simulate the gate with all of these design parameters fixed and record the value of
energy delay product. Once we sweep through all of the possible ∆V's and LRFP's for a given
current, we find the set of parameters which gives the overall minimum for energy-delay. We
then move on to another current level and repeat the process. This optimization procedure is
illustrated in figure 3.5b:
for 0.5uA < I < 100uA {
    for 200mV < ∆V < 2.4V {
        Initialize VDD = 2.5V
        for 0.25um < LRFP < 2.5um {
            Find smallest WA, WB, WC which satisfy:
                0.5um < WA, WB, WC < 1.5um
                Gain > 1.45,
                Current Matching Ratio > 90%,
                Voltage Swing Ratio > 98%,
                Signal Slope Ratio < 5,
                |VDD - RFP| voltage < 2.2V
            If above is possible, find smallest VDD which satisfies:
                Gain > 1.4,
                Current Matching Ratio > 90%,
                Voltage Swing Ratio > 98%,
                Signal Slope Ratio < 5
            If above is possible, store parameters and ED product.
        }
        Find LRFP which gives minimum ED for given I, ∆V
    }
    Find ∆V which gives minimum ED for given I
    Store values of I, ∆V, VDD, WA, WB, WC, LRFP
}
Figure 3.5b : MCML Gate Optimization Procedure
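The loop structure of figure 3.5b can be sketched in Python as shown below. This is not the flow used in this project: `toy_simulate` is a made-up closed-form model with no physical meaning, standing in for a circuit simulation so that the example runs, and only a single width (WA, as for an inverter) is searched. The thresholds follow the figure.

```python
def sizing_ok(m):
    """Hard constraints for the width search (thresholds from figure 3.5b)."""
    return (m["gain"] > 1.45 and m["cmr"] > 0.90 and m["vsr"] > 0.98
            and m["ssr"] < 5.0 and m["rfp_drop"] < 2.2)


def vdd_ok(m):
    """Hard constraints re-checked while VDD is lowered."""
    return (m["gain"] > 1.4 and m["cmr"] > 0.90 and m["vsr"] > 0.98
            and m["ssr"] < 5.0)


def optimize_gate(simulate, currents, swings, lengths, widths, vdd_steps):
    """Return {I: parameter set with the minimum energy-delay} per current."""
    results = {}
    for i in currents:
        best = None
        for dv in swings:
            for lrfp in lengths:
                vdd = max(vdd_steps)  # initialize VDD to its upper limit
                # Smallest width that meets the constraints at the initial VDD.
                wa = next((w for w in sorted(widths)
                           if sizing_ok(simulate(i, dv, lrfp, w, vdd))), None)
                if wa is None:
                    continue  # this (I, dV, LRFP) point is infeasible
                # Lower VDD step by step until a constraint fails.
                for v in sorted(vdd_steps, reverse=True):
                    if not vdd_ok(simulate(i, dv, lrfp, wa, v)):
                        break
                    vdd = v
                ed = simulate(i, dv, lrfp, wa, vdd)["ed"]
                if best is None or ed < best["ed"]:
                    best = {"dv": dv, "vdd": vdd, "wa": wa,
                            "lrfp": lrfp, "ed": ed}
        if best is not None:
            results[i] = best
    return results


def toy_simulate(i_ua, dv, lrfp_um, wa_um, vdd):
    """Made-up model (assumption) standing in for a circuit simulation."""
    tp = (1.0 + wa_um + lrfp_um) * dv / i_ua  # arbitrary units
    return {
        "gain": (1.0 + 0.4 * wa_um + 0.3 * lrfp_um) * min(1.0, vdd),
        "cmr": min(1.0, 0.5 + 0.5 * vdd),
        "vsr": 0.99,
        "ssr": 2.0 + 2.0 * lrfp_um + dv,
        "rfp_drop": 0.8 * vdd,
        "ed": 4 * i_ua * vdd * tp * tp,
    }


best = optimize_gate(toy_simulate, currents=[1.0, 10.0], swings=[0.2, 0.4],
                     lengths=[0.25, 0.6, 1.5], widths=[0.5, 1.0, 1.5],
                     vdd_steps=[2.5, 2.0, 1.5, 1.0, 0.75])
for current, params in best.items():
    print(current, params)
```

Infeasible points are skipped as soon as the width search fails, which is one simple form of the design space pruning mentioned in the text.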
While in general the above optimization algorithm can be up to O(n^6), it is usually possible to
prune the design space dramatically when using human intuition. Optimization does become
exponentially more difficult as the number of levels in the gate increases, and simulation time
increases as well. With some practice and a properly set up simulation environment, a
gate can be optimized in less than ten or fifteen minutes for a wide variety of current levels.
3.6 MCML Gate Optimization Results
In order to see the effects of the system level design decisions to be demonstrated in chapter 5,
it is useful to have a set of idealized data points as an upper bound on performance. This section
will show the theoretical performance limits of individual MCML gates and compare them with
equivalent CMOS gates.
As discussed in sections 2.1 and 3.2, the power-delay and energy-delay metrics for MCML
gates are directly proportional to the logic depth of the circuitry being used. It is therefore
extremely unfair to compare CMOS and MCML gates at their absolute maximum clock
frequency (1/tp). Instead, we assume an optimistic yet achievable logic depth of 4 for all gates in
this section. The actual performance of the MCML gates can be scaled accordingly from that
depth in order to see the actual performance under real circuit conditions.
The final disclaimer before displaying the results is that this optimization does not take any
layout effects into account. The load capacitance value uses an estimate of 1fF of wiring
capacitance per fanout and a fanout factor of 3 (4 for CMOS) identical gates. The effects of
actual circuit layout will be discussed in the next chapter.
The first MCML gate optimized through the above procedure was a simple inverter/buffer.
The result parameters are given in figure 3.6a for several different current levels, each optimized
for energy-delay product. The plots of delay, energy, and energy-delay vs. current are shown in
figure 3.6b. Note that each value of current is fully optimized in all of the other parameters. The
effects of fixing parameters across different current levels will be explored in chapter 5.
I (uA)  ∆V (mV)  VDD (V)  (W/L)A    (W/L)RFP  (W/L)RFN  tp (ps)  ED (pJ*ps)
0.5     200      0.80     .5/.25    .5/1.5    2.0/.5    2134     7.22
1       200      0.75     .5/.25    .5/1.0    2.0/.5    1075     3.40
3       200      0.80     .5/.25    .5/.6     2.0/.5    391      1.43
6       250      0.85     .5/.25    .5/.6     2.0/.5    247      1.21
10      300      0.90     .5/.25    .5/.6     2.0/.5    182      1.15
30      400      1.10     .85/.25   .5/.25    2.0/.5    98       1.21
60      500      1.25     1.0/.25   .6/.25    2.0/.5    68       1.30
100     600      1.35     1.25/.25  .8/.25    2.0/.4    57       1.57
Figure 3.6a : MCML Inverter Optimization Results
[Figure: three panels plotted against Current, I (uA), on a logarithmic axis: Propagation Delay (tp) in ps, Energy for logic depth = 4 in fJ, and Energy-Delay for logic depth = 4 in pJ*ps]
Figure 3.6b : MCML Inverter Performance for Logic Depth = 4
There are several interesting things to note about the optimization results for the MCML
inverter. The first is the deviation from the statement made in section 2.1 that there is no
theoretical minimum to the energy delay product. As we can see from the energy-delay plot in
figure 3.6b, a minimum does exist around I = 10uA. The reason that energy-delay cannot be decreased further
is that the voltage gain begins to degrade for larger currents. In order to maintain the minimum
gain metric, it is necessary to either increase ∆V, WA, or LRFP. We cannot actually increase LRFP
because the signal slope ratio (SSR) begins to dominate and we must instead increase either WA
or ∆V, both of which negatively impact the performance. Therefore, at high current levels, we
reach a limit to the speed improvements but not to the energy increase and therefore, the energy-delay
goes up.
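The location of this minimum can be roughly checked from the I, VDD, and tp values of figure 3.6a. The sketch below assumes that the energy for a logic depth of 4 is the static power I*VDD drawn over 4 propagation delays; this is one plausible reading of the metric, not a formula stated in this section. Because it uses the reference current rather than the actual current, the estimates come out somewhat above the reported ED values.

```python
# (I in uA, VDD in V, tp in ps) taken from figure 3.6a.
ROWS = [(0.5, 0.80, 2134), (1, 0.75, 1075), (3, 0.80, 391), (6, 0.85, 247),
        (10, 0.90, 182), (30, 1.10, 98), (60, 1.25, 68), (100, 1.35, 57)]


def energy_delay_pj_ps(i_ua, vdd, tp_ps, depth=4):
    """Estimated energy-delay product in pJ*ps (assumed formula, see above)."""
    power_w = i_ua * 1e-6 * vdd
    energy_pj = power_w * depth * tp_ps  # W * ps = pJ
    return energy_pj * tp_ps


estimates = {i: energy_delay_pj_ps(i, vdd, tp) for i, vdd, tp in ROWS}
best_current = min(estimates, key=estimates.get)
for i, ed in estimates.items():
    print(f"I = {i:>5} uA  ED ~ {ed:.2f} pJ*ps")
print(f"minimum estimated ED at I = {best_current} uA")
```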
On the low current end, the main factors hurting performance are the limits on minimum
swing and minimum transistor widths. While high current levels require higher voltage swings
in order to reach the minimum gain metric, low current levels do not. With low current levels,
the gain metric is met easily and the higher than required swing merely hurts the performance. It
would also be desirable to decrease the width of the transistors in order to reduce capacitance but
the process minimum widths are already achieved.
With these results, we can generate some comparisons with CMOS inverters. The CMOS
inverter simulated has nmos width = 0.5um and pmos width = 1.0um with a fanout of 4 identical
inverters (plus interconnect capacitance). In order to compare the energy efficiency of MCML
and CMOS gates, we plot the energy delay (ED) product versus the delay of the gate. In MCML
gates, we vary the delay by changing the current level. In CMOS gates, we vary the delay by
changing VDD. Once again, we assume a logic depth of 4 in order to allow a more fair
comparison. The results are shown in figure 3.6c:
[Figure: two panels: Propagation Delay for CMOS Inverter (tp), Delay (ps) versus VDD (V); and Energy-Delay vs. Delay for MCML and CMOS Inverters, Energy-delay (pJ*ps) versus Delay (ps), with one curve each for CMOS and MCML]
Figure 3.6c : MCML and CMOS Inverter Comparison
We can see from these graphs a few more interesting trends. For small delays (i.e. high
performance), the MCML inverter performs up to two times better than the CMOS inverter and
can achieve smaller energy-delay products than even possible with CMOS circuitry. For large
delays (i.e. low performance), the benefits gained by reducing VDD in CMOS circuits far
34
outweigh possible savings in MCML current reduction and CMOS is more energy efficient.
This graph shows a crossover point at around 300ps which corresponds to around 5uA of current
in MCML or 1.1V in CMOS. If the required delay is shorter than 300ps (i.e. higher performance), then MCML
is more energy efficient.
It is also interesting to note that the overall performance limit is greater for MCML than for
CMOS. An MCML inverter with 100uA of current achieves a propagation delay of only 57ps
while a CMOS inverter running at 2.5V has a tp of 100ps. While the energy efficiency of
MCML gates degrades significantly at these high performance levels, it is possible to achieve
extremely high performance designs, about twice as fast as CMOS.
The above graphs depend on several very important assumptions. First of all, the logic depth
is assumed to be equal to 4 in the above analysis. If the logic depth is greater than 4, the MCML
curve will become less energy efficient in comparison to CMOS. Also, we allow arbitrary
selection of the design parameters for MCML gates.
Many of these parameters will be
constrained at the system level and may not be optimized.
We performed a similar analysis on a wide variety of gates: NAND2, NOR2, XOR2,
MUX21, NAND3, NOR3, XOR3, MUX41, Full Adder, D Flip-Flop.
All of the gates show the
same general trends as the inverter comparison and perform better in comparison for high
performance, low depth circuits. We will not give the sizing results for all of the gates but it is
illustrative to show the results for one of the larger gates, XOR3. These results for a limited set
of current points are shown in figure 3.6d:
I (uA)  ∆V (mV)  VDD (V)  (W/L)A    (W/L)B    (W/L)C    (W/L)RFP  (W/L)RFN  tp (ps)  ED (pJ*ps)
1       250      0.75     .5/.25    .5/.25    .5/.25    .5/1.0    2.0/.5    2414     17.0
10      500      1.00     .5/.25    .5/.25    .5/.25    .5/.6     2.0/.5    532      10.5
100     800      1.70     1.40/.25  1.45/.25  1.50/.25  .6/.25    2.0/.4    201      25.6
Figure 3.6d : MCML XOR3 Optimization Results
As stated earlier, the general trends tend to agree with the inverter optimization. In general,
larger gates require slightly larger signal swings, larger transistor sizes, and larger VDD in order
to achieve the minimum robustness constraints. All of these factors hurt the performance of the
gate and we can see that the XOR3 has about ten times the energy-delay of the inverter across all
current points.
We can also construct the energy-delay comparison curves against a transmission gate based
CMOS XOR3 from the ST Microelectronics standard cell library (figure 3.6e).
[Figure 3.6e plots. Left panel: Energy-Delay for MCML XOR3 - logic depth = 4, Energy-Delay (pJ*ps) vs. Current, I (uA), on a logarithmic current axis. Right panel: Energy-Delay vs. Delay for MCML and CMOS XOR3, Energy-Delay (pJ*ps) vs. Delay (ps), with one curve for MCML XOR3 and one for CMOS XOR3.]
Figure 3.6e: MCML XOR3 Comparisons
The shape of both curves is roughly the same as in the inverter case, but the CMOS XOR3
is significantly worse in energy-delay than the MCML XOR3 at all performance levels.
Whereas the inverter comparison gave a maximum benefit of MCML of around 2 times, the
XOR3 can perform up to 6 times better. The reason for this improvement over the inverter
comparison is that the CMOS XOR3 becomes much more complex than a simple inverter while
the MCML XOR3 has a very similar structure. This comparison is extremely encouraging for
arithmetic circuits since the XOR3 function is used for sum generation in full adders. We can
also see similar trends in other complex gates. In general, MCML performs better in comparison
to CMOS for circuits composed of more complex gates.
The above sections have demonstrated an intuition and methodology for optimizing MCML
gates at the transistor level. We see from the results that MCML is most energy efficient in the
range around I = 10uA. We also see that in comparison with CMOS logic, MCML is more
energy efficient at higher performance levels. These results are still somewhat idealistic and rely
on some impossible system level design choices. Chapter 5 will look at these system level
tradeoffs and analyze their degradation of the ideal results presented. Before we look at the
system level issues though, we will devote a chapter to analyzing the tradeoffs and requirements for
MCML gate layout.
Chapter 4
MCML Gate Layout
4.1 Local vs. Global Effects
Now that we have devised a procedure for determining MCML gate parameters, it is
important to see the effects of circuit layout on the optimization results. All layout was done in
Cadence with the ST-Microelectronics, 0.25um, 6-layer process. While it is not possible to
generate the layout for each data point optimized at the schematic level, we would like to
compare enough gates to draw some general conclusions about performance degradation. We
would also like to analyze the difference in performance degradation between MCML gates and
their CMOS counterparts.
The first distinction to be made in this analysis is between local and global layout effects.
Local layout effects in this context are taken to be the difference in performance of a single gate
with ideal surroundings when compared to its schematic level description. Examples of local
effects are intra-gate device mismatch, capacitance balancing, and layout topology. In general,
local wires are short and we can therefore ignore resistive effects.
In contrast, global layout issues are defined as the performance degradation due to the
connection of multiple gates in a larger block of logic. Examples of global layout effects are
clock and control signal routing, wire resistance and buffering, load balancing, power
distribution, and device matching to VSC's. This chapter deals exclusively with local layout
effects and leaves the discussion of global effects until later in the report.
4.2 Layout Topology
The second main topic of layout methodology concerns the general cell design framework.
There are two primary ways of doing layout for cell based design: standard cell format and
datapath format [3]. In standard cell designs, all cells typically have the same height but vary in
width based upon the complexity of the gate. The inputs and outputs are left floating in a
standard cell so that routing tools can send the signals in any direction. In datapath designs, the
width is usually fixed for all cells in order to ensure pitch matching but the height can vary. The
inputs and outputs are aligned to opposite sides of the cell and brought to the edges to enable
tiling. Both of these approaches are shown in figure 4.1a:
[Figure 4.1a diagram. Standard Cell Approach: cells have the same height and vary in width, with I/O randomly scattered. Datapath Approach: cells have the same width and vary in height, with I/O regularly placed. Labels shown: Power, GND, VDD, RFN, RFP, I/O, Clock, Control.]
Figure 4.1a : Alternative layout structures
Both of these structures were used for MCML cells depending on the application. For this
chapter, the standard cell methodology is the structure of choice because we are discussing the
layout of individual gates.
In chapter 7, the datapath approach is used to implement the
CORDIC algorithm.
4.3 Transistor Matching
One of the key differences between standard CMOS layout and MCML layout is the issue of
transistor matching. It is known in CMOS processing that many parameters of the transistor can
vary both within a chip and between chips. These differences between transistors can degrade
performance of differential circuits. There are three main places where transistor mismatch can
adversely affect MCML gate performance: input differential pair mismatch, pmos load
mismatch, and VSC-gate mismatch.
The first two forms of mismatch, input diff. pair and pmos load, are very similar in their
scope and effects. It can be shown [6] that these mismatch effects can be modeled by an input
offset voltage of the differential pair. For example, the combination of the threshold voltage,
gate oxide thickness, and transistor W/L mismatch may create a 50mV offset voltage for the
input of a single gate. Therefore, if 200mV is applied to the input, the actual effective input
voltage will be either 150mV or 250mV depending on the polarity. If 150mV is not enough for
the gate to operate properly, then the mismatch has effectively destroyed the circuit.
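The offset arithmetic above can be sketched in a few lines of Python. This is only an illustration of the worked example in the text (200mV applied, 50mV offset); the function names and the minimum-input threshold `v_min_mv` are hypothetical and are not taken from the report.

```python
def effective_input_range(v_in_mv, v_offset_mv):
    """Return (worst, best) effective differential input in mV.

    The offset either subtracts from or adds to the applied input,
    depending on its polarity.
    """
    return v_in_mv - v_offset_mv, v_in_mv + v_offset_mv


def survives_mismatch(v_in_mv, v_offset_mv, v_min_mv):
    """True if the worst-case effective input still meets v_min_mv.

    v_min_mv is a hypothetical threshold: the smallest input at which
    the gate is assumed to operate properly.
    """
    worst, _ = effective_input_range(v_in_mv, v_offset_mv)
    return worst >= v_min_mv


worst, best = effective_input_range(200, 50)
print(f"effective input between {worst}mV and {best}mV")
```

With a 200mV input and a 50mV offset, the gate only keeps working if it tolerates an input of 150mV.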
The third kind of mismatch, between the VSC and the gate, is slightly different in its effects.
If the VSC does not properly model the gate being controlled, the RFP voltage will not set the
load resistance correctly. In this case, the gate may not produce the correct output voltage swing
and later gates will have degraded performance.
Analog designers are constantly faced with the matching problem. Many techniques are
known for increasing matching in differential pairs. The most general of these techniques
involve aligning both transistors in the same way (same current flow direction) and fingering
devices. Unfortunately, these techniques have tremendous consequences in digital logic design.
The primary effect of using fingering and alignment is that the area of the cells grows
significantly. A secondary effect is that the performance (speed and/or power) will degrade.
The area penalty is typically not a problem in analog circuits while matching is a severe problem.
In digital logic, mismatch degrades noise margins but usually does not prevent operation while
area and speed are extremely important. Therefore, using standard matching techniques may not
be the appropriate choice for MCML.
Another difficulty in using matching techniques for MCML is that process data on transistor
mismatch is hard to obtain. The information provided for analog designers is typically for large,
non-minimum length devices while MCML uses small, minimum length devices. It is nearly
impossible to calculate the actual expected mismatch of the MCML circuits without better
process information.
With all of this information in mind, we would like to examine the tradeoffs in both area and
performance when using and not using matching techniques. We define two types of MCML
layouts: parallel and anti-parallel. These topologies are shown in figure 4.3a below. Anti-parallel designs are the worst possible case for matching but are the most area efficient and have
lowest capacitance. Parallel layout is better for matching but has area and performance penalties.
The results of the simulations for both types of gates are shown in the next section. It is left as a
topic for future work to perform a more detailed analysis as to the effects of mismatch and how it
should affect the gate optimization procedure.
[Figure 4.3a diagram: anti-parallel and parallel layouts of a differential pair, each with drains D1 and D2, gates G1 and G2, and shared source S12.]
Figure 4.3a : Parallel vs. Anti-Parallel Layout of Differential Pairs
4.4 Layout Results
4.4.1 Parallel vs. Anti-Parallel MCML Layout
The first comparison to be made is the difference in area and performance of the two styles
of MCML layout for basic cells. The two gates examined were the inverter and XOR3 whose
parameters were derived in chapter 3. Both gates were laid out for three different current levels
in both the parallel and anti-parallel styles. The layouts for the I = 10uA versions are shown in
figures 4.4a and 4.4b.
If we compare the implementations of the parallel and anti-parallel MCML blocks, we
observe that the parallel implementations are larger, as expected. Figure 4.4c gives a summary
of the key simulated results.
Figure 4.4a : MCML Inverter/Buffer Layouts (anti-parallel and parallel versions)
Figure 4.4b : MCML XOR3 Layouts (parallel and anti-parallel versions)
Gate  I (uA)  Anti-Parallel  Parallel    Anti-Parallel  Parallel    Anti-Parallel  Parallel
              Area (um2)     Area (um2)  Delay (ps)     Delay (ps)  ED (pJ*ps)     ED (pJ*ps)
INV   1       35.9           43.4        1202           1155        4.24           3.92
INV   10      29.0           36.2        198            198         1.36           1.35
INV   100     25.4           37.4        62.5           62.4        1.90           1.90
XOR3  1       89.9           115.0       3013           3442        26.4           34.5
XOR3  10      90.2           115.0       662            753         16.2           21.0
XOR3  100     113.5          139.8       225            252         32.0           40.4
Figure 4.4c : Anti-Parallel vs. Parallel MCML Layout Area and Performance
We can see that the total area difference varies from about 20% to 50% depending on the
current point but the parallel implementations are all significantly larger. From a performance
perspective, the parallel implementation performs just as well or even slightly better than the
anti-parallel version for the inverter but is significantly worse (12%-14% in delay) for the larger
gate, XOR3.
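The percentages quoted above can be reproduced from figure 4.4c. The following Python sketch simply transcribes the area and delay columns of that table and computes the parallel-layout penalty relative to the anti-parallel layout; the variable and function names are illustrative only.

```python
# Values transcribed from figure 4.4c.
# Key: (gate, I in uA); value: (anti-parallel, parallel).
AREA_UM2 = {
    ("INV", 1): (35.9, 43.4),
    ("INV", 10): (29.0, 36.2),
    ("INV", 100): (25.4, 37.4),
    ("XOR3", 1): (89.9, 115.0),
    ("XOR3", 10): (90.2, 115.0),
    ("XOR3", 100): (113.5, 139.8),
}
DELAY_PS = {
    ("INV", 1): (1202, 1155),
    ("INV", 10): (198, 198),
    ("INV", 100): (62.5, 62.4),
    ("XOR3", 1): (3013, 3442),
    ("XOR3", 10): (662, 753),
    ("XOR3", 100): (225, 252),
}


def penalty_pct(anti_parallel, parallel):
    """Percent increase of the parallel layout over the anti-parallel one."""
    return 100.0 * (parallel - anti_parallel) / anti_parallel


area_penalty = {k: penalty_pct(*v) for k, v in AREA_UM2.items()}
delay_penalty = {k: penalty_pct(*v) for k, v in DELAY_PS.items()}

for gate, current in AREA_UM2:
    key = (gate, current)
    print(f"{gate:5s} I={current:>3d}uA  "
          f"area {area_penalty[key]:+.1f}%  delay {delay_penalty[key]:+.1f}%")
```

A negative delay penalty means the parallel layout was slightly faster, which is the case for the inverter at 1uA and 100uA.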
The decision on whether to use the parallel or anti-parallel cell layouts is left as an open
question depending on the application and performance requirements. The above experiment
indicates the approximate area and delay penalties but does not give a quantitative analysis of the
matching benefits of the parallel layout. For the remainder of this chapter, we assume the use of
the anti-parallel cell versions for comparison with schematics and with CMOS gates.
4.4.2 MCML and CMOS Layout vs. Schematic
Now that we have seen the tradeoffs associated with MCML gate layout topologies, we
would like to evaluate the performance degradation between simulations of schematics to
simulations of layout. It is known that the layout simulations will produce slower gates but the
important measurement is to determine the relative slowdown in comparison with the slowdown
of CMOS circuits. With this knowledge, we can more accurately predict the benefits of MCML
over CMOS when using schematic level simulation numbers.
For this analysis, we use only the anti-parallel implementations of the MCML gates. The
results are shown in figures 4.4d and 4.4e:
Gate  I (uA)  Area (um2)  Schematic   Layout      % change  Schematic   Layout      % change
                          Delay (ps)  Delay (ps)            ED (pJ*ps)  ED (pJ*ps)
INV   1       35.9        1074        1202        11.2      3.40        4.24        24.7
INV   10      29.0        182         198         10.9      1.14        1.36        19.3
INV   100     25.4        60.1        62.5        4.0       1.76        1.90        8.0
XOR3  1       89.9        2402        3013        25.4      16.9        26.4        56.2
XOR3  10      90.2        532         662         24.4      10.5        16.2        54.3
XOR3  100     113.5       200         225         12.5      25.4        32.0        26.0
Figure 4.4d: MCML Schematic vs. Layout Performance
Gate  VDD (V)  Area (um2)  Schematic   Layout      % change  Schematic   Layout      % change
                           Delay (ps)  Delay (ps)            ED (pJ*ps)  ED (pJ*ps)
INV   1.0      9.6         364         385         5.8       1.32        1.46        10.6
INV   1.5      9.6         161         169         5.0       1.34        1.46        9.0
INV   2.0      9.6         112         118         5.4       1.68        1.86        10.7
INV   2.5      9.6         92          97          5.4       2.25        2.45        8.9
XOR3  1.0      117         1994        2382        19.4      36.9        53.5        45.0
XOR3  1.5      117         826         974         17.9      36.8        52.0        41.3
XOR3  2.0      117         569         647         13.7      49.7        66.6        38.2
XOR3  2.5      117         454         529         16.5      66.9        92.5        38.3
Figure 4.4e : CMOS Schematic vs. Layout Performance
Several important trends can be observed from the above data. The first column to examine
is the area numbers for the MCML and CMOS blocks. For different performance levels in
MCML, the transistor sizes are changed in order to preserve robustness. This leads to the
difference in areas for the different current levels. The CMOS gates operate with the same
transistor sizes as the supply voltage is scaled.
As you can see, the CMOS inverter is much smaller than the MCML inverter at any
performance level. Since MCML is differential, the only inverters required will be used for
buffering and therefore this large area difference is not very important. The more important area
comparison is for the more complex block, the XOR3, where the MCML gate is smaller for all
current levels. This fact shows that even a fully differential implementation can be more area
efficient than CMOS for certain logic blocks. It is important to note that these two blocks are
probably extremes as far as the area comparison and that most gates will probably fall
somewhere in between with CMOS being slightly smaller than MCML.
The next important comparison to make is the variation in delay and energy-delay between
the schematic and layout simulations. In both MCML and CMOS, the variation is larger for the
more complex logic than for the simple inverter. We expect that the inverter is too small of a
block to be useful for this comparison and we will only consider the XOR3.
For the MCML blocks, we notice that the variation is much less at the high current level.
This is due to the fact that the transistors are much larger for this point and therefore the change
due to wiring capacitance is not as large of a factor in the overall capacitance.
The total variation between layout and schematic is greater for MCML at the low current
points and less at the high current points. Since the variations in either case are not much
different than the CMOS variations, we can safely use the analytical techniques of chapter 3 at
the schematic level in order to compare CMOS and MCML blocks. This verifies that the results
of chapter 3 are not skewed in relation to actual layout simulations.
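As a rough check of this comparison, the XOR3 schematic and layout delays from figures 4.4d and 4.4e can be put side by side. The Python sketch below only recomputes the delay slowdown from the transcribed table values; the names are illustrative, and the energy-delay columns are not recomputed here.

```python
# XOR3 delays in ps as (schematic, layout), transcribed from the tables.
MCML_XOR3_DELAY_PS = {1: (2402, 3013), 10: (532, 662), 100: (200, 225)}  # keyed by I (uA)
CMOS_XOR3_DELAY_PS = {
    1.0: (1994, 2382),
    1.5: (826, 974),
    2.0: (569, 647),
    2.5: (454, 529),
}  # keyed by VDD (V)


def slowdown_pct(schematic, layout):
    """Percent increase in delay from schematic to layout simulation."""
    return 100.0 * (layout - schematic) / schematic


mcml = [slowdown_pct(*v) for v in MCML_XOR3_DELAY_PS.values()]
cmos = [slowdown_pct(*v) for v in CMOS_XOR3_DELAY_PS.values()]

print(f"MCML XOR3 slowdown: {min(mcml):.1f}% to {max(mcml):.1f}%")
print(f"CMOS XOR3 slowdown: {min(cmos):.1f}% to {max(cmos):.1f}%")
```

The two ranges overlap, with MCML somewhat worse at the low current points and somewhat better at the high current point.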
In this chapter, we have looked at the tradeoffs in MCML cell layout techniques and have
seen that MCML cells can have similar area and schematic-layout matching as their CMOS
counterparts. We now have enough confidence in our optimization procedure of chapter 3 in
order to begin a discussion of system level design tradeoffs in MCML.
Chapter 5
MCML System Level Design
5.1 MCML System Overview
Chapters 3 and 4 dealt with the design of the individual gates to be used in an MCML logic
block. This chapter describes the peripheral circuitry necessary to correctly operate the logic and
analyzes some of the tradeoffs in the system level optimizations which can be made. The basic
MCML system is shown in figure 5.1a.
We can divide the MCML system into five key
subsystems: the core, the current control block, the swing control block, the clock distribution
network, and the conversion circuitry.
[Figure 5.1a block diagram. Blocks and signals shown: Data Inputs, CMOS-CML Data Conversion, Current Control (RFN1 ... RFNN), Swing Control (Desired Swing, RFP1 ... RFPN), MCML Core (Logic and Registers), CLK, CMOS-CML Clock tree, CML-CMOS Data Conversion, Data Outputs.]
Figure 5.1a : General MCML System Topology
The MCML Core is composed of both combinational and sequential logic. It can be a
pipelined datapath, a finite state machine, or any other digital logic operation required. As seen
in chapter 2, the best applications for MCML logic are those which have high performance and
shallow logic depth. Therefore, MCML cores typically are deeply pipelined and have high clock
frequencies.
The swing control circuitry controls the load resistances of the MCML gates. The primary
pieces of this block are the Variable Swing Controllers (VSC's) described in chapter 2. There are
many tradeoffs in the design of the VSC and some of those issues will be explored in this
chapter.
The responsibility of the current control circuitry is to monitor the current levels used in the
individual gates. There may be a single or multiple current levels and these levels may be
adaptively controlled in order to fine tune performance. Two models of current control circuitry
will be presented later in this chapter.
Finally, the clock circuitry and conversion circuits are crucial in order to interface with a
CMOS dominated world. The clock circuitry is composed of multi-level buffers and requires the
ability to use multiple drive strength gates.
Before we begin examining the peripheral circuitry in detail, we must take one more look at
the optimization procedure for individual gates proposed in chapter 3. In order to build a
realistic MCML core, it is necessary to relax some optimization parameters in order to have gates
which can operate together. This generalization of the optimization procedure is the topic of the
next section.
5.2 MCML Gate Parameter Generalization
In chapter 3, an optimization procedure was developed which found the best parameter
selection for each individual gate for a variety of current levels. The VDD, ∆V, and all transistor
sizes could be chosen completely independent of the selection for any other gate. Unfortunately,
some of these parameters need to be shared across different gates if we want to build complex
logic. The amount of generalization required will be determined by the exact logic to be
implemented, but this section describes the sensitivity of the gate optimizations when certain
parameter constraints are enforced.
5.2.1 Voltage Swing, ∆V
The first obvious generalization to be made is that each gate should have the same voltage
swing as every other gate for a given operating point. The easiest and most constraining way to
generalize the voltage swing is to make sure that each gate has the same voltage swing for every
possible current level. This leads to a family of gates that will work together at any current level.
We will see in chapter 6 that it is sometimes desirable to have different gates running with the
same voltage swing but with different current levels. We ignore this case for now and assume
that all gates in the core will operate at the same current level.
As seen in chapter 3, the optimal point for voltage swing is different for different size gates.
The optimal swing tends to be larger for larger gates (e.g. XOR3) and smaller for smaller gates
(e.g. INV). In order for all of the gates to work together, we must either use an optimal swing for
one of the gates and adjust all of the other gates to this swing or use a non-optimal swing for all
of the gates. Once again, we explore the different voltage swings while re-optimizing all of the
other parameters.
As our primary example, we will look at the Inverter, XOR2, and XOR3 blocks for a current
level of I=10uA. The fully optimized statistics created by the optimization procedure in chapter
3 are shown in figure 5.2a:
Gate  ∆V (mV)  VDD (V)  (W/L)A  (W/L)B  (W/L)C  (W/L)RFP  (W/L)RFN  tp (ps)  ED (pJ*ps)
INV   300      0.90     .5/.25  -       -       .5/.6     2.0/.5    182      1.15
XOR2  400      0.95     .5/.25  .5/.25  -       .5/.6     2.0/.5    317      3.58
XOR3  500      1.00     .5/.25  .5/.25  .5/.25  .5/.6     2.0/.5    532      10.5
Figure 5.2a : Fully Optimized MCML Gates for I = 10uA
We now rerun the optimization procedure from chapter 3 except that we fix the voltage
swing to a certain level for all of the blocks. The results are shown in figures 5.2b-d:
Gate  ∆V (mV)  VDD (V)  (W/L)A   (W/L)B    (W/L)C   (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from min
INV   300      0.90     .5/.25   -         -        .5/.6     182      1.15        0
XOR2  300      0.95     .6/.25   .9/.25    -        .5/.6     325      3.85        7.5
XOR3  300      1.10     1.1/.25  1.15/.25  1.2/.25  .5/.6     584      14.6        39.0
Figure 5.2b : Optimized MCML Gates for I = 10uA, ∆V = 300mV
Gate  ∆V (mV)  VDD (V)  (W/L)A  (W/L)B  (W/L)C   (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from min
INV   400      0.90     .5/.25  -       -        .5/.4     202      1.41        22.6
XOR2  400      0.95     .5/.25  .5/.25  -        .5/.6     317      3.58        0
XOR3  400      1.10     .5/.25  .5/.25  .75/.25  .5/.6     510      11.1        5.7
Figure 5.2c : Optimized MCML Gates for I = 10uA, ∆V = 400mV
Gate  ∆V (mV)  VDD (V)  (W/L)A  (W/L)B  (W/L)C  (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from min
INV   500      0.90     .5/.25  -       -       .5/.6     241      2.00        74
XOR2  500      0.95     .5/.25  .5/.25  -       .5/.6     357      4.51        26.0
XOR3  500      1.00     .5/.25  .5/.25  .5/.25  .5/.6     532      10.5        0
Figure 5.2d : Optimized MCML Gates for I = 10uA, ∆V = 500mV
The rightmost columns of figures 5.2b-d show the percent deviation from optimal when the
three types of gates are used at voltage swings other than their preferred settings. We can see
that both large voltage swings for inverters and small voltage swings for XOR3 give large
deviations from optimal. Also note that the area of gates using smaller than optimal swings
tends to increase.
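The "ED % from min" columns follow directly from the energy-delay values in figures 5.2a-d. The Python sketch below recomputes them and, purely as an illustration, ranks the three candidate swings by their worst-case penalty. That ranking criterion is an assumption made for this example and is not the selection rule used in the report, which depends on the mix of gates in the circuit.

```python
# Energy-delay (pJ*ps) of each gate at its own optimal swing (figure 5.2a).
ED_OPT = {"INV": 1.15, "XOR2": 3.58, "XOR3": 10.5}

# Energy-delay (pJ*ps) when all gates share one swing (figures 5.2b-d).
ED_AT_SWING_MV = {
    300: {"INV": 1.15, "XOR2": 3.85, "XOR3": 14.6},
    400: {"INV": 1.41, "XOR2": 3.58, "XOR3": 11.1},
    500: {"INV": 2.00, "XOR2": 4.51, "XOR3": 10.5},
}


def deviation_pct(ed, ed_opt):
    """Percent increase of ed over the fully optimized value ed_opt."""
    return 100.0 * (ed - ed_opt) / ed_opt


def worst_case_pct(swing_mv):
    """Largest per-gate penalty at a shared swing (illustrative metric)."""
    return max(deviation_pct(ed, ED_OPT[gate])
               for gate, ed in ED_AT_SWING_MV[swing_mv].items())


for swing in sorted(ED_AT_SWING_MV):
    row = {g: round(deviation_pct(ed, ED_OPT[g]), 1)
           for g, ed in ED_AT_SWING_MV[swing].items()}
    print(f"{swing}mV: {row}  worst case {worst_case_pct(swing):.1f}%")

best_swing = min(ED_AT_SWING_MV, key=worst_case_pct)
```

Under this worst-case metric the 400mV swing comes out ahead, since neither the inverter nor the XOR3 is pushed far from its optimum.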
The correct choice for the overall voltage swing depends heavily on the types of circuits
being used. If the majority of the circuitry is composed of two-level gates (including flip-flops),
then 400mV should be used. On the other hand, if the circuit uses a majority of 3-level gates
(including adders), then the optimization of these gates may outweigh the degradation of the
other types of gates. In general, there are not many single level gates due to the complementary
logic style but there may be a large load capacitance which needs buffering and hence a smaller
voltage swing. The same tradeoff will exist for all current levels although for small currents, all
of the desired voltage swings tend to converge to the minimum allowable swing (200mV) and
the choice becomes trivial.
5.2.2 Supply Voltage, VDD
The next generalization to be made is to ensure that all gates operate at the same supply
voltage for a given current level. Once again, we will leave the generalization of operating at
multiple current levels until later. This procedure is much simpler than the voltage swing
optimization due to the fact that a large barrier to performance exists if we try to lower VDD
from its optimal. Therefore, it makes sense to just use the highest VDD required by any of the
gates used. In other words, use the VDD specified by the optimal point of the 3-level gate (e.g.
XOR3) because this is always the highest for a given current level. This increases the power
consumption of the 1 and 2-level gates but is necessary unless multiple supply voltages are used.
5.3 Voltage Swing Control Circuitry
The primary responsibility of the swing control circuitry is to properly adjust the load
resistance of the pmos devices by setting the RFP voltage. The main circuit block is the Variable
Swing Controller (VSC) shown in chapter 2 and repeated in figure 5.3a. The VSC models a
particular gate and uses an operational amplifier and negative feedback loop to set the RFP
voltage. This section explores the issues of VSC design and usage.
[Figure 5.3a schematic. Labels shown: VDD, Vlow, op-amp inputs (+, -), RFP, Inputs, RFN.]
Figure 5.3a : Variable Swing Controller (VSC)
The first issue to be discussed is the tradeoff between using a single or multiple VSC's. The
optimizations in chapter 3 and earlier in this chapter assume the use of a VSC which is perfectly
matched to the gate under test. It is possible to share VSC's across multiple types of gates but
with some loss of swing precision. The amount of tolerance to this loss is dependent on the
operating point and circuit block under consideration. If small voltage swings are being used, it
may be necessary to use multiple VSC's in order to have finer control over the small swing.
The first requirement of using a single VSC for multiple types of gates is that the pmos load
transistors must be the same size for all of the gates. This can be accomplished by the same
generalization procedure as in the previous section and will result in some loss of performance.
Then we must choose which gate to use as the model for the VSC. This analysis is very similar to
the analysis done in section 5.2.1 where we could use the optimal swing conditions for any of the
three gate types. We use the data points from section 5.2.1 with a swing of 400mV and VDD =
1.1V for our example. The (W/L)RFP is chosen to be .5u/.6u for this experiment. Note also that
RFP voltages are allowed to be negative as long as the gate-source voltage of the pmos devices is
less than 2.5V. The key results are shown in figures 5.3b-e:
Gate  RFP Voltage  Actual      Deviation in  Midswing  tp (ps)  ED (pJ*ps)  ED % from min
      (V)          Swing (mV)  swing (%)     Gain
INV   -.349        401         0.25          1.80      205      1.81        57.4
XOR2  -.339        399         -0.25         1.54      308      4.06        13.4
XOR3  -.317        392         -2.0          1.41      510      11.1        5.7
Figure 5.3b : MCML Gates with 3 separate VSC's (ideal)
VDD = 1.1V, ∆V = 400mV, (W/L)RFP = .5u/.6u
Gate  RFP Voltage  Actual      Deviation in  Midswing  tp (ps)  ED (pJ*ps)  ED % from min
      (V)          Swing (mV)  swing (%)     Gain
INV   -.349        401         0.25          1.80      205      1.81        57.4
XOR2  -.349        388         -3.0          1.51      302      3.90        8.9
XOR3  -.349        360         -10.0         1.33      480      9.84        -6.3
Figure 5.3c : MCML Gates using Inverter based VSC
Gate  RFP Voltage  Actual      Deviation in  Midswing  tp (ps)  ED (pJ*ps)  ED % from min
      (V)          Swing (mV)  swing (%)     Gain
INV   -.339        412         3.0           1.83      209      1.87        62.6
XOR2  -.339        399         -0.25         1.54      308      4.06        13.4
XOR3  -.339        370         -7.5          1.36      491      10.3        -1.9
Figure 5.3d : MCML Gates using XOR2 based VSC
Gate  RFP Voltage  Actual      Deviation in  Midswing  tp (ps)  ED (pJ*ps)  ED % from min
      (V)          Swing (mV)  swing (%)     Gain
INV   -.317        440         10.0          1.90      217      2.02        75.7
XOR2  -.317        424         6.0           1.60      321      4.41        23.2
XOR3  -.317        392         -2.0          1.41      510      11.1        5.7
Figure 5.3e : MCML Gates using XOR3 based VSC
The above tables give us several interesting results. First, we can see that using a smaller
VSC than ideal leads to a reduced voltage swing but an increase in performance (tp, ED). These
results come at a cost - the reduction of the robustness (Gain, Noise Margins). We would like to
limit the possible operating conditions to those that provide at least the desired swing. If we
were to allow reduced swing for the larger gates, we would have to re-simulate all of the gates at
this reduced swing level for a worst case analysis. Therefore, if we wish to use a single VSC, we
will use the largest one possible and allow for larger voltage swings from the smaller gates. This
will reduce our performance (up to about 20%) but will not decrease the robustness of the
circuitry.
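The selection argument above can be checked against the swing columns of figures 5.3b-e. The Python sketch below treats the swing obtained with a dedicated VSC (figure 5.3b) as the baseline and asks which shared VSC leaves no gate below its baseline. Taking the dedicated-VSC swing as the "desired" level is an assumption made for this example; the names are illustrative.

```python
# Actual swing (mV) of each gate with its own dedicated VSC (figure 5.3b).
BASELINE_SWING_MV = {"INV": 401, "XOR2": 399, "XOR3": 392}

# Actual swing (mV) of each gate when one VSC, modeled on the key gate,
# is shared by all three gate types (figures 5.3c-e).
SHARED_SWING_MV = {
    "INV": {"INV": 401, "XOR2": 388, "XOR3": 360},
    "XOR2": {"INV": 412, "XOR2": 399, "XOR3": 370},
    "XOR3": {"INV": 440, "XOR2": 424, "XOR3": 392},
}


def keeps_swing(vsc_model):
    """True if no gate swings less than it does with a dedicated VSC."""
    swings = SHARED_SWING_MV[vsc_model]
    return all(swings[g] >= BASELINE_SWING_MV[g] for g in BASELINE_SWING_MV)


safe_models = [m for m in SHARED_SWING_MV if keeps_swing(m)]
print("VSC models that preserve swing:", safe_models)
```

Only the VSC modeled on the largest gate passes this check, at the cost of larger than necessary swings on the smaller gates.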
The total effect of the generalizations made can thus be seen in figure 5.3e. Compared with
using different VDD's, different voltage swings, and different (W/L)RFP's, and multiple VSC's,
we see that the Inverter loses about 76%, the XOR2 loses 23% and the XOR3 loses 6% of their
ideal energy-delay values. There are also other reasons why it is desirable to use multiple VSC's;
these reasons will be explored in chapter 6.
Now that we have examined the tradeoff between using a single VSC and using multiple
VSC's, we would like to explore the issues of designing the VSC's themselves. The main circuit
block design is identical to the design of a standard MCML gate. The only important issue in
designing the gate part of the VSC is to make sure that the alignment of the transistors is the
same as the gates to which the RFP voltage is being broadcast. We would like the matching
between the VSC and the logic gates to be as close as possible in order to achieve predictable
swings.
The main design challenge in the VSC is the operational amplifier. While the topic of op-amp design is beyond the scope of this report, it is well covered in [6]. Instead, we will specify
the constraints imposed on the op-amp by the MCML VSC design. The gain of the op-amp
determines (partially) the accuracy of the voltage swing. If we assume a worst case output
voltage difference, VDD - RFP = 2.5V, and we allow the low voltage point to vary by 10mV,
then the gain of the op-amp must be approximately equal to 250. Lower gain op-amps can be
used but the actual circuit voltage swing will vary by larger amounts.
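The gain figure above follows from dividing the worst case output span by the allowed error. The Python sketch below reproduces that arithmetic under the first-order assumption that the swing error is approximately the op-amp output span divided by its gain; the function names are illustrative and this is not a full feedback-loop analysis.

```python
def required_gain(output_span_mv, allowed_error_mv):
    """Approximate op-amp gain needed to hold the low level within the error.

    First-order assumption: error ~= output span / gain.
    """
    return output_span_mv / allowed_error_mv


def swing_error_mv(output_span_mv, gain):
    """Approximate error in the low voltage level for a given op-amp gain."""
    return output_span_mv / gain


# Worst case from the text: VDD - RFP = 2.5V, low level allowed to vary by 10mV.
print(required_gain(2500, 10))

# A lower-gain op-amp (gain of 100, chosen only as an example) gives a larger error.
print(swing_error_mv(2500, 100))
```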
The frequency response requirements of the op-amp are fairly lax. Since the RFP signal is
really a DC value, the bandwidth of the op-amp can be quite low. The only real requirement is
that the op-amp and VSC feedback loop is stable. Since the RFP node typically has a large
capacitance, this requirement is easily achieved. The actual capacitance on the RFP node varies
based upon the number and type of gates being driven by a single VSC. There may also be a
frequency requirement set by the stabilization time during power-on.
One major difficulty worth discussion was encountered in the design of the VSCs. If the
VDD of the MCML circuits is lowered below process maximum (to save power) and the pmos
loads are kept to be minimum width, then the RFP required by the feedback loop can go below
0V. The Vgs of the pmos transistors is still within the 2.5 Volts required by the process but the
output of the op-amp must now generate a negative voltage. There were several possible
solutions to this problem.
The first solution is to increase the width of the pmos load devices and therefore require less
Vgs voltage for RFP. There are two major problems with this approach. First, the larger width
increases the capacitance at the output node and will slow the circuit down.
Even more
importantly, the increased width will decrease the Vdsat of the load device and increase the nonlinearity of the resistance. We would like the load devices to operate in the linear mode but low
Vdsat values cause many non-idealities and full voltage swing times can increase by an order of
magnitude. This difficulty led us to try other solutions to the negative RFP problem.
The second option for generating the negative gate voltage would be to operate the op-amp
between Vdd and Vss where Vss is lower than ground. For example, if the logic Vdd was 1.0V, the
op-amp would be run between 1.0V and -1.5V. It would then be possible for the op-amp to
generate negative voltages with no difficulty.
The problem with this approach is that the bulk of the nmos devices must all be tied together
because the process being used is a single n-well process. Either all of the nmos bulks are tied to
0 or they are tied to Vss. In the first case, the op-amp nmos devices will have positively biased
junction diodes and will not work properly. In the second case, all of the logic transistors will
have larger threshold voltages and performance will suffer.
Neither of these options is
particularly desirable, so other solutions were examined.
The next solution would be to use a negative charge pump to offset the op-amp voltage to be
below ground. This also poses several analog design challenges and it was determined that there
was not enough time to complete this solution within the scope of this project. The actual
solution used was to assume that the process was actually a dual well process (not that
uncommon with next generation processes) and to use an ideal op-amp for simulation purposes.
While not a currently implementable solution, this is nevertheless the most realistic solution for
this project.
5.4 Current Control Circuitry
Now that we have discussed the design of the circuitry to control the voltage swing for
variable current settings, let's discuss the current control circuitry. The goal of the current
control circuitry is to set the RFN voltage to a desired level. As with the RFP voltage, there can
be a single RFN voltage or multiple RFN voltages although using a single RFN voltage
generator is typically adequate.
The RFN signal determines the amount of current flowing in the current source and therefore
determines the speed and power of the circuit. The simplest way to set this reference voltage is
to use a current mirror [6]. Alternatively, an adaptive pipelining system can be used [2], shown
in figure 5.4a:
[Figure: block diagram with the following labeled blocks: Datapath (Logic and Registers) with CMOS-CML and CML-CMOS converters at the Inputs and Outputs; VSC1, VSC2, and VSC3 generating RFP1, RFP2, and RFP3; a DLL containing the Critical Path Model, Phase Detector, and Control, with signals Vlow and RFN; Clock with CMOS-CML Converter and Buffer.]
Figure 5.4a : Adaptively pipelined MCML system
The basic principle behind an adaptive pipeline is to use a Delay Locked Loop (DLL) to
measure the delay through a model of the critical path of the circuit. If the critical path delay is
greater than the required clock period, then the DLL increases the RFN voltage and thereby
increases the current, speed and power of the circuit. If the delay is less than the required clock
period, then RFN is decreased and less current is used. Single or multiple VSC's can be used to
maintain a fixed voltage swing as the current varies. Multiple DLL's could also be used if there
are requirements for multiple RFN voltages to be generated.
The goal of the adaptive pipelining is to make the circuit timing insensitive to process,
temperature, and voltage variations. For example, if a chip comes back from fabrication and
happens to be near the slow process corner, the adaptively pipelined circuit will meet the same
timing requirements as the chip near the fast process corner. The difference between the chips
will be in the power consumption and not the timing.
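The feedback behavior described above can be illustrated with a small numerical sketch. It is not the DLL circuit of [2]: it assumes, hypothetically, that the critical path delay equals k_delay / I (delay inversely proportional to current), it lumps the RFN voltage into the gate current I directly, and the values of k_delay, the clock period, and the loop gain are made up for illustration.

```python
def settle_current(k_delay, t_clock, i_start=10e-6, gain=0.2, steps=200):
    """Iterate a simple first-order loop until the modeled delay matches t_clock.

    k_delay : assumed process-dependent constant, with delay = k_delay / current
    t_clock : required clock period in seconds
    Returns the settled current and the resulting delay.
    """
    current = i_start
    for _ in range(steps):
        delay = k_delay / current
        # Delay too long -> raise the current (RFN up); too short -> lower it.
        error = (delay - t_clock) / t_clock
        current *= 1.0 + gain * error
    return current, k_delay / current


# Hypothetical nominal chip and a chip that is 20% slower at the same current.
i_nom, d_nom = settle_current(k_delay=2.0e-14, t_clock=2.0e-9)
i_slow, d_slow = settle_current(k_delay=2.4e-14, t_clock=2.0e-9)
print(f"nominal: I = {i_nom * 1e6:.2f} uA, delay = {d_nom * 1e9:.3f} ns")
print(f"slow:    I = {i_slow * 1e6:.2f} uA, delay = {d_slow * 1e9:.3f} ns")
```

In this sketch the slower chip settles at the same delay as the nominal chip but draws 20% more current, which is the trade of power for timing described in the text.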
In a standard CMOS design methodology, designers must always design for the worst case.
This leads to using VDDs which are higher than required for the nominal case and therefore
increases power consumption for all designs. With adaptive pipelining, designers can design for
the nominal case for delay and instead, the power will vary. If multiple chips are used on a
board, the average power consumption of all the chips should approach the nominal value. This
technique can also improve the yield of circuits and allow for late changes in system clock
frequency. It was shown in [2] that clock skew of 20% could be easily compensated for by using
a slightly altered version of adaptive pipelining.
While it was not possible to design the complete adaptive pipeline circuit within the scope of
this project, it is easy to make estimates of the effects of using such circuitry. In CMOS, the
clock frequency of a circuit will be reported as the worst case process condition while in MCML,
the frequency will be the nominal value. Please refer to [2] for more details about the individual
circuits used in the adaptive pipeline.
5.5 Support for Current Variability
One of the key aspects to the adaptive pipelining system described above is that the logic
gates and flip-flops are able to operate at different current levels in order to adjust the
performance. Earlier in this chapter, we generalized the optimization procedure so that an
MCML system could use multiple gate types with a single VDD, ∆V, and a single VSC. Now
we would like to add the option of using multiple current levels while still maintaining
acceptable robustness.
The first and simplest model for current variability is to support two modes: on and off.
The gates can be completely optimized for a pre-determined current level and are shut off by
either turning off the reference current source in the current mirror design or by turning off the
clock with the adaptive pipeline. Both of these schemes reduce the gate currents to zero and
therefore remove all power consumption. This mode is an ideal "sleep" mode and power
consumption should be negligible.
Two main problems exist with the on/off technique. The first problem is that unlike static
CMOS circuits, all of the MCML flip-flops and latches lose their storage when turned off. This
may be acceptable if the circuit block being deactivated is a datapath element and can be flushed
upon startup. An alternative scheme for sensitive data would be to use two separate RFN control
circuits: one for logic and one for flip-flops. With this scheme, the logic could be turned off
while the flip-flops retain their value (and burn power). A third choice would be to use a hybrid
CMOS / MCML system where static CMOS storage is used during sleep mode to store the data
and is reloaded back into the MCML flip flops upon start up.
The second problem with the on/off technique is the ramp-up time of the system. While the
actual time is highly dependent on the op-amp specifications and the capacitance of the RFN and
RFP node, the time could approach thousands of clock cycles. Once this time is known, it is
relatively easy to account for, and the system would be turned off only during long stall periods.
Once the on/off mechanism is determined, we can imagine several other possible levels of
current variability. A possible system could need support for a high performance mode and a
low performance mode in which two discrete current levels need to be supported. A system may
require that a continuous range of current levels be supported (i.e. adaptive pipelining). The
most complex system would require support for multiple, continuous current ranges.
In all of these systems, the optimizations performed in chapter 3 must be relaxed so that each
gate is fully operational at each possible current level. As an example, we will use the case
where the circuits are optimized for use at I = 10uA but we would now like to support a range
around that current. We will look at 2 different ranges and the requirements on the gate
optimizations in order to support them: 9uA-11uA and 5uA-20uA. For both cases, we will use
the assumption that I=10uA is the most common case and optimize for it as much as possible.
We reoptimize the Inverter, XOR2, and XOR3 gates using the same robustness constraints as in
chapter 3. For this analysis, we allow the use of 3 separate VSC's and therefore allow different
(W/L)RFP's for the different gates. The results are shown in figures 5.5a-c:
Gate   ∆V (mV)  VDD (V)  (W/L)A    (W/L)B    (W/L)C     (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from fixed I
INV    400      1.10     .5/.25    -         -          .5/.4     195      1.64        0
XOR2   400      1.10     .5/.25    .5/.25    -          .5/.6     308      4.06        0
XOR3   400      1.10     .5/.25    .5/.25    .75/.25    .5/.6     510      11.1        0
Figure 5.5a : Optimized MCML Gates for I = 10uA (fixed)
Gate   ∆V (mV)  VDD (V)  (W/L)A    (W/L)B    (W/L)C     (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from fixed I
INV    400      1.20     .5/.25    -         -          .5/.4     193      1.77        7.9
XOR2   400      1.20     .5/.25    .5/.25    -          .5/.6     304      4.36        7.4
XOR3   400      1.20     .55/.25   .60/.25   .75/.25    .5/.6     512      12.3        10.8
Figure 5.5b : Optimized MCML Gates for I = 10uA (9uA-11uA supported)
Gate   ∆V (mV)  VDD (V)  (W/L)A    (W/L)B    (W/L)C     (W/L)RFP  tp (ps)  ED (pJ*ps)  ED % from fixed I
INV    400      1.25     .5/.25    -         -          .5/.6     203      2.03        23.8
XOR2   400      1.25     .7/.25    .8/.25    -          .5/.6     354      6.18        52.2
XOR3   400      1.25     .85/.25   1.0/.25   1.25/.25   .5/.6     666      21.8        96.4
Figure 5.5c : Optimized MCML Gates for I = 10uA (5uA-20uA supported)
We can draw a few conclusions from the above tables. First, the penalty for supporting small
current variations (~10%) is relatively small. The main penalty in this case is due to the higher
VDD needed to allow the higher current point to operate properly. The penalty for the larger
current variation (50%) is much larger. Not only must the VDD be increased in order to support
the high current end, but the transistor sizes must be increased in order to increase the gain. The
96.4% penalty in energy-delay is probably not acceptable for the XOR3 gate which supports
currents from 5uA to 20uA.
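As a quick check on the tables, the "ED % from fixed I" column can be recomputed from the energy-delay values alone. The sketch below copies the ED numbers from figures 5.5a-c; the dictionary and function names are ours and are only illustrative.

```python
# Energy-delay products (pJ*ps) copied from figures 5.5a-c.
ED_FIXED = {"INV": 1.64, "XOR2": 4.06, "XOR3": 11.1}   # I = 10uA fixed
ED_NARROW = {"INV": 1.77, "XOR2": 4.36, "XOR3": 12.3}  # 9uA-11uA supported
ED_WIDE = {"INV": 2.03, "XOR2": 6.18, "XOR3": 21.8}    # 5uA-20uA supported


def ed_penalty_percent(ed, ed_fixed):
    """Percentage increase in energy-delay relative to the fixed-current design."""
    return 100.0 * (ed - ed_fixed) / ed_fixed


for gate, ed_fixed in ED_FIXED.items():
    narrow = ed_penalty_percent(ED_NARROW[gate], ed_fixed)
    wide = ed_penalty_percent(ED_WIDE[gate], ed_fixed)
    print(f"{gate}: {narrow:.1f}% (9uA-11uA), {wide:.1f}% (5uA-20uA)")
```

The recomputed penalties agree with the tabulated column to within rounding.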
Not only does the performance suffer from using larger current ranges, but there is a definite
maximum variation around 75%. This maximum is set by the conflicting requirements of the
high current and low current points on the pmos load transistor sizes. The high current point
requires smaller length devices in order to generate an RFP voltage within the limits. The low
current point requires a large length in order to control the Vdsat of the device and therefore the
Signal Swing Ratio (SSR).
The overall conclusion of this experiment is that while small current variation is relatively
harmless, the model described above of having multiple current levels with the same hardware is
not feasible with this technology. Adaptive pipelining is expected to produce small current
variations (<20%) and is therefore an acceptable technique. The best design technique with
MCML is to determine the required performance level and to optimize the gates around that
current. If other performance levels are required, either design alternative MCML blocks with
different voltage swings and transistor sizes or use a different technology.
5.6 Gate Drive Strength Scaling
The previous sections and chapters have dealt with the optimization necessary to design a set
of "minimum" sized MCML gates. If the fanout of the gate is large however, it is desirable to
have gates with larger drive strengths. This is especially crucial in clock circuitry where large
capacitive loads need to be driven. The general approach in CMOS circuits is to multiply all
transistors by a fixed factor in order to increase the drive strength. This same approach is applied
to MCML circuits but careful analysis must be done in order to assure that the correct signal
swing and robustness properties are maintained.
In order to double the drive strength of an MCML gate, we double all of the transistor
widths. If we are using a current mirror for the RFN current control, we keep the mirror device
the same size while we double the size of the RFN device. This will generate twice the current
of the reference source in the MCML gate. Since all devices' (W/L)'s are doubled and the
current is doubled, all of the Vdsats and voltages should remain constant. We simulated the
results of this scaling operation on the Inverter for several different drive strengths. The base
sizes and parameters are those used in figure 5.5b above. The results are shown in figure 5.6a:
Scaling     Gate Current,  Deviation in I   Actual ∆V  Change in ∆V  Gain  tp with scaled
Factor, N   I (uA)         from ideal (%)   (mV)       (%)                 CL (ps)
1           9.853          -1.5             401        -             1.62  193
2           18.94          -5.3             345        -14.0         1.51  191
4           37.16          -7.1             317        -20.9         1.44  187
16          146.6          -8.4             298        -25.7         1.40  184
Figure 5.6a : MCML Inverter Scaling, all device sizes scaled by N
We can see several interesting things from this experiment. First, the current scales fairly
well for even large drive strength devices (<9% deviation). Unfortunately, the characteristics
associated with the pmos load devices do not scale as well. Both the gain and voltage swing
metrics are degraded substantially by scaling all of the transistor sizes equally. Fortunately, we
can easily adjust for this by fine tuning the pmos load device sizes for each drive strength. We
can also adjust for the current variation by fine tuning the RFN device widths. The result from
this fine tuning is shown in figure 5.6b:
Scaling     RFP Scaling  RFN Scaling  Gate Current,  Actual ∆V  Change in ∆V  Gain  tp with scaled
Factor, N   Factor       Factor       I (uA)         (mV)       (%)                 CL (ps)
1           1            1            9.853          401        -             1.62  193
2           1.9          2.075        19.6           448        11.7          1.68  210
4           3.9          4.25         39.4           388        -3.2          1.59  198
16          15.2         17.2         157.4          399        -0.5          1.62  202
Figure 5.6b : MCML Inverter Scaling, pmos devices fine tuned
We can see that by making these simple adjustments, the current and voltage swing track
appropriately. This same analysis can be done with any gate type and fine tuning the RFN and
RFP transistors will lead to a wide variety of accurate drive strength devices.
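The deviation columns of figure 5.6a follow directly from the measured currents and swings, and the same arithmetic can be applied to any other drive strength. The sketch below assumes an ideal unit current of 10uA (consistent with the -1.5% entry for N = 1) and takes the N = 1 swing of 401mV as the reference; the function names are ours.

```python
BASE_CURRENT_UA = 10.0   # assumed ideal current of the unscaled gate
BASE_SWING_MV = 401.0    # swing of the N = 1 gate, used as the reference

# (N, measured current in uA, measured swing in mV) copied from figure 5.6a.
UNIFORM_SCALING = [(1, 9.853, 401), (2, 18.94, 345), (4, 37.16, 317), (16, 146.6, 298)]


def current_deviation_percent(n, current_ua):
    """Deviation of the measured current from the ideal N * 10uA."""
    ideal = n * BASE_CURRENT_UA
    return 100.0 * (current_ua - ideal) / ideal


def swing_change_percent(swing_mv):
    """Change in voltage swing relative to the N = 1 gate."""
    return 100.0 * (swing_mv - BASE_SWING_MV) / BASE_SWING_MV


for n, current_ua, swing_mv in UNIFORM_SCALING:
    print(n, round(current_deviation_percent(n, current_ua), 1),
          round(swing_change_percent(swing_mv), 1))
```

Applying current_deviation_percent to the fine tuned currents of figure 5.6b (19.6, 39.4, and 157.4uA) gives deviations of about 2% or less, which is one way to quantify how well the tuned gates track.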
5.7 Conversion Circuitry
The final system level issue to be discussed is the conversion of MCML logic to CMOS logic
and vice versa. This ability is necessary for systems which use an MCML core surrounded by
standard CMOS logic.
The conversion from CMOS to MCML is a trivial operation. Since all MCML gates can
operate correctly with larger differential inputs than required, a simple CMOS inverter and
MCML inverter can be used to generate the proper MCML signal.
The conversion from MCML to CMOS is slightly more complicated but can be performed by
using a differential to single ended amplifier [6] and a CMOS inverter. The amplifier only
requires enough gain to amplify the voltage swing beyond the switching threshold of the
inverter. Even with a minimum input voltage swing of 200mV, the amplifier only needs a gain
of 10 for all possible operation points. Since the output load is also small, the speed of the
amplifier can be good for even small source currents. The two conversion circuits are shown in
figure 5.7a.
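[Figure: schematics of the two converters. CMOS - CML: CMOS In, a CMOS Inverter, and an MCML stage (RFP, RFN) producing MCML Out. CML - CMOS: differential MCML In (+/-) and an RFN-biased amplifier producing CMOS Out.]
Figure 5.7a : Conversion Circuits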
Chapter 6
System Design Example : Ripple Adders
6.1 MCML Full Adder design
In order to see the overall effects of the optimizations and guidelines of the previous
chapters, we would like to design some more complex blocks of logic in MCML. The first such
example will be a case study of ripple adder design. Please refer to [3] for a more detailed
analysis of ripple adder architecture.
The building block of the ripple adder is the full adder block. The logic equations for a full
adder are well known to be an XOR3 for the sum and a 3 input majority vote for the carry [3]. In
nearly all CMOS ripple adders, an optimization is made which pre-computes the propagate and
generate signals in order to speed up the carry path [3]. There is a similar possibility for MCML
adders and 2 implementations can be imagined. The two possibilities are shown in Figs. 6.1a
and 6.1b below. The CMOS adder used for comparison in this section uses transmission gates
and is generally minimum sized and optimized for low power.
While the second MCML full adder will have a lower carry delay, it is not clear whether this
improvement will compensate for the additional current required due to there now being 4 gates
instead of 2. In fact, for small adders (< 16 bits), it is more efficient to use the first full adder
implementation and this is the circuit used for the CORDIC. Both of these architectures will be
used and analyzed in the next sections.
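The logical equivalence of the two full adder formulations can be checked exhaustively. The sketch below models only the Boolean behavior (XOR3 sum and majority carry versus propagate/generate), not the MCML circuits; the function names are ours.

```python
from itertools import product


def full_adder_1(a, b, cin):
    """Sum as XOR3, carry as a 3 input majority vote (AB + BC + AC)."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    return s, cout


def full_adder_2(a, b, cin):
    """Pre-computed propagate (P = A xor B) and generate (G = AB) signals."""
    p = a ^ b
    g = a & b
    return p ^ cin, g | (p & cin)


def ripple_add(x, y, n_bits, full_adder=full_adder_1):
    """Add two n_bits-wide integers bit by bit; the final carry becomes bit n_bits."""
    carry, total = 0, 0
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << n_bits)


# Both formulations agree on all eight input combinations.
for a, b, cin in product((0, 1), repeat=3):
    assert full_adder_1(a, b, cin) == full_adder_2(a, b, cin)
```

Since the two adders compute the same function, the choice between them is purely one of delay, current, and area, as discussed in the text.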
[Figure: transistor-level schematics of the Sum Circuit (XOR3) and the Carry Circuit (AB+BC+AC), each with RFP-biased loads, differential A, B, and Cin inputs, and an RFN-biased current source.]
Figure 6.1a : MCML Full Adder #1
[Figure: transistor-level schematics of the Propagate Circuit (P = A xor B), the Generate Circuit (G = AB), the Sum Circuit (P xor Cin), and the Carry Circuit (G + PCin), each with RFP-biased loads and an RFN-biased current source.]
Figure 6.1b : MCML Full Adder #2
6.2 Basic Ripple Adder Design
Now that we have defined the two alternative adder structures in MCML, we would like to
compare the two architectures against each other and against the equivalent CMOS adder. For
this experiment, we designed several different size ripple adders ranging from 4 bits to 32 bits
and measured the delay and power consumption for both the MCML and CMOS
implementations. The same experiment could be done at any performance level but we limited
the scope to a single current point near the optimal for MCML at I = 10uA for all of the "gates"
inside of the full adders. We use the same generalization techniques mentioned in chapter 5 to fix
the voltage swing and VDD across gates and we allow for the use of two VSC's for better swing
control.
The first comparison to be made is the relative performance of the two different full adder
architectures for varying ripple adder bit length. The key graphs of delay, power-delay, and
energy-delay vs. bit length are shown in figure 6.2a. In chapter 2, it was shown that the delay of
a chain of MCML gates is proportional to the number of gates, N. It was also shown that the
power-delay is proportional to N^2 and the energy-delay is proportional to N^3. In order to
demonstrate these properties, the quantities in figure 6.2b are normalized by these respective bit
length factors.
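The proportionalities quoted from chapter 2 can be reproduced with a first order model of a gate chain. The sketch below assumes that every gate has the same delay and draws the same static current, so that delay and power each grow linearly with N; the per-gate values (200ps, 10uA, 1.2V) are illustrative placeholders, not simulation results.

```python
def chain_metrics(n_gates, tp_gate, i_gate, vdd):
    """Return (delay, power-delay, energy-delay) for a chain of identical gates."""
    delay = n_gates * tp_gate            # gates switch one after another
    power = n_gates * i_gate * vdd       # every gate draws its static current
    power_delay = power * delay          # grows as N^2
    energy_delay = power_delay * delay   # grows as N^3
    return delay, power_delay, energy_delay


def normalized(n_gates, tp_gate=200e-12, i_gate=10e-6, vdd=1.2):
    """Divide the three metrics by N, N^2, and N^3 respectively."""
    d, pd, ed = chain_metrics(n_gates, tp_gate, i_gate, vdd)
    return d / n_gates, pd / n_gates**2, ed / n_gates**3


for n in (4, 8, 16, 32):
    print(n, normalized(n))
```

In this idealized model the normalized quantities are independent of N, so any variation with N in the normalized simulation results reflects effects the model leaves out.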
[Figure: three plots for Full Adder #1 and Full Adder #2 versus number of bits N: Delay, tp (ns); Power-Delay (pJ); Energy-Delay (pJ*ns).]
Figure 6.2a : MCML Ripple Adder Performance
[Figure: three plots for Full Adder #1 and Full Adder #2 versus number of bits N: Normalized Delay, tp (ps); Normalized Power-Delay (fJ); Normalized Energy-Delay (pJ*ps).]
Figure 6.2b : MCML Ripple Adder Performance (Normalized)
We can see several interesting trends from the above graphs. First of all, the proportionality
to logic depth shown in Chapter 2 seems accurate, especially for larger N. The reason for the
larger delay at low N is that the slower generate and sum functions become a more important
factor in proportion to the faster carry circuitry.
The determination of the better full adder topology depends on the desired goal. In general,
the second full adder is faster but consumes about twice the power of the first full adder.
Therefore, the power-delay is lower for the first full adder for all N. The energy-delay product
however is smaller for the first adder for small N and larger for large N. If a large ripple adder is
required and speed is the primary objective, then the second adder topology should be used. If
the power efficiency is more important than absolute performance, the first adder topology is
better. The other difference between the adders is that the first adder is significantly more area
efficient than the second one.
Since high performance designs tend to use non-ripple
architectures (bypass, lookahead, etc.), the first topology is more desirable in almost all cases.
For the remainder of this report, only the first full adder topology will be analyzed.
Now that the two alternative MCML full adders have been analyzed and compared, we
would like to compare the performance of the MCML adders to traditional CMOS ripple adders.
The VDD for the CMOS adders was set at 1.9V to most closely model the performance of the
MCML adders with I=10uA. The power measurements for the CMOS adders are found using
randomly generated input vectors while the delay is worst case. The comparison results are
shown in figure 6.2c:
[Figure: three plots comparing MCML Adder #1 and the CMOS Adder versus number of bits N: Delay, tp (ns); Power-Delay (pJ); Energy-Delay (pJ*ns).]
Figure 6.2c : Comparison of MCML and CMOS Ripple Adders
The above results demonstrate the effects of logic depth on MCML performance. For small
N, the MCML Adder is superior in both power-delay and energy-delay while it is inferior for
larger N. For N = 4, the MCML adder has 35% of the power-delay and 48% of the energy-delay
of the CMOS adder. For N = 32, it has 225% of both the power-delay and the energy-delay of the
CMOS adder. This result verifies the conclusion from chapter 2 that MCML is most applicable
to circuits with shallow logic depth.
6.3 Modified MCML Ripple Adders with Current Ratio Adjustment
The previous section validated the theoretical results of chapter 2 that MCML circuits with
moderate to high currents and shallow logic depth are more energy-efficient than equivalent
CMOS circuits. We now would like to add one more design optimization to the MCML ripple
adders which increases energy efficiency. The optimization to be discussed in this section is that
of Current Ratio Adjustment (CRA) and it will first be generalized to non-ripple adder
architectures.
The basic principle behind CRA is to optimize energy efficiency by reducing the
performance (and therefore power) in non-critical paths of logic circuits. This is accomplished
in MCML circuits by using different amounts of current in the gates of different logical paths.
As long as the critical path timing is maintained, the extra power reduction will not affect the
system performance. In a system with perfect CRA, all logic paths will have exactly the same
delay and overall power will be utilized in the most efficient manner possible.
An analogy can be made and applied to static CMOS circuitry. The equivalent operation to
CRA in CMOS would be to use a different supply voltage, VDD, for gates in different paths.
This is extremely difficult to do in practice due to the difficulty in maintaining multiple supply
networks. In MCML, the use of CRA requires minimal change of the core circuitry and only
requires the use of multiple VSC's - a practice already supported by previous analysis. In circuits
which require tight swing control and therefore already require multiple VSC's, the addition of
CRA is completely free of cost. The general idea of CRA is presented in figure 6.3a below.
[Figure: a seven-gate logic network between Input and Output, shown twice. No CRA: every gate has tp = T and Pwr = P, giving Total Delay = 3T and Total Power = 7P. With CRA: one gate has tp = 3T and Pwr = P/3, two gates have tp = 3T/2 and Pwr = 2P/3, and four gates have tp = T and Pwr = P, giving Total Delay = 3T and Total Power = 5.67P.]
Figure 6.3a : Example of Current Ratio Adjustment
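The totals in figure 6.3a can be reproduced with exact fractions. The sketch below assumes, as the figure's labels imply, that a gate slowed to m times the delay T uses 1/m of the power P; the wiring of the seven gates is not needed for the power sum, so only the per-gate delays are listed.

```python
from fractions import Fraction

# Per-gate delays in units of T, read from figure 6.3a.
NO_CRA = [Fraction(1)] * 7
WITH_CRA = [Fraction(3), Fraction(3, 2), Fraction(1), Fraction(3, 2),
            Fraction(1), Fraction(1), Fraction(1)]


def total_power(delays):
    """Total power in units of P, assuming gate power scales as 1 / delay."""
    return sum(Fraction(1) / d for d in delays)


print(total_power(NO_CRA))     # 7
print(total_power(WITH_CRA))   # 17/3
```

The value 17/3 is the 5.67P quoted in the figure, a saving of about 19% at the same total delay.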
The application of CRA to an MCML ripple adder is a fairly straightforward procedure. Let
Ic be the current in the carry gate and Is be the current in the sum gate. Also let N be the number
of bits of the adder. If we assume a linear relationship between the current and delay, we can
easily write that:
\[ T_{p,tot} = t_{p,abtoc} + (N-2) \times t_{p,ctoc} + t_{p,ctos} \]
\[ T_{p,tot} = \frac{k_1}{I_c} + (N-2) \times \frac{k_2}{I_c} + \frac{k_3}{I_s} \]
where $t_{p,abtoc}$ is the input to carry delay, $t_{p,ctoc}$ is the carry to carry delay, and $t_{p,ctos}$ is the carry to
sum delay for a full adder. For simplicity, assume k1=k2=k3=k. When no CRA optimization is
done, then Ic = Is = I and,
\[ T_{p,tot} = \frac{k}{I} \times N \quad \text{(No CRA)} \]
Now, let $I_c = x \times I$ and $I_s = y \times I$. Then we can write that,
\[ T_{p,tot} = \frac{k}{I} \times \left[ \frac{N-1}{x} + \frac{1}{y} \right] \quad \text{(With CRA)} \]
We can also write the expressions for power, power-delay, and energy-delay:
\[ P = N \times I \times (x + y) \times V_{DD} \]
\[ PD = k \times N \times V_{DD} \times (x + y) \times \left[ \frac{N-1}{x} + \frac{1}{y} \right] \]
\[ ED = \frac{k^2}{I} \times N \times V_{DD} \times (x + y) \times \left[ \frac{N-1}{x} + \frac{1}{y} \right]^2 \]
If we set the delay with CRA equal to the delay without CRA, we can solve for x and y which
gives the minimum energy-delay. Figure 6.3b gives the results of this first order optimization:
N    xmin   ymin   xmin/ymin   ED with CRA / ED without CRA
4    1.18   .686   1.72        0.933
8    1.21   .452   2.68        0.830
16   1.18   .304   3.88        0.742
32   1.14   .208   5.48        0.674
Figure 6.3b : Theoretical Results of Current Ratio Adjustment for optimal Energy-Delay
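The first order optimization can also be worked in closed form, which gives a cross-check on the table. Holding the delay equal to the no-CRA case requires (N-1)/x + 1/y = N, and the energy-delay ratio then reduces to (x + y)/2; minimizing x + y under that constraint (for example with a Lagrange multiplier) gives x/y = sqrt(N-1). The sketch below is our own derivation under the same linear delay model, not necessarily the procedure used to generate figure 6.3b, and its results differ from the table in the third significant digit for some entries.

```python
import math


def cra_optimum(n_bits):
    """Closed-form optimum of the first order CRA model.

    Returns (x, y, ed_ratio), where x and y scale the carry and sum currents
    and ed_ratio is ED with CRA divided by ED without CRA.
    """
    s = math.sqrt(n_bits - 1)        # optimal ratio x / y
    x = (n_bits - 1 + s) / n_bits    # from (N-1)/x + 1/y = N with y = x / s
    y = (1 + s) / n_bits
    return x, y, (x + y) / 2


for n in (4, 8, 16, 32):
    x, y, ed_ratio = cra_optimum(n)
    print(f"N={n:2d}  x={x:.3f}  y={y:.3f}  x/y={x / y:.2f}  ED ratio={ed_ratio:.3f}")
```

For the four adder sizes this reproduces the tabulated energy-delay ratios to three decimal places.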
We can see from the above table that the potential savings in energy-delay increase from 7%
for N=4 to 33% for N=32. The most important number from the above analysis is the ratio,
xmin/ymin. This ratio is the amount of current scaling which should be done between the carry and
sum circuits for optimal energy-delay product. For example, for a 32 bit adder, the current in the
carry path should be 5.48 times larger than the current in the sum path for optimal energy-delay.
In general, this ratio is approximately proportional to the square root of N.
We can also examine the sensitivity of the optimization to this ratio by plotting the energy-delay
as a function of xmin/ymin in figure 6.3c. From this plot, we can see that the sensitivity
around the optimal point is not very great. For N = 32, the deviation from optimal is less than
10% for a ratio range of 2.6-12.8 (optimal = 5.48) and less than 5% for a range of 3.3-10.2.
Therefore, the actual ratio used only needs to be somewhat close to the optimal but it is not
necessary to be extremely accurate.
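The flatness of the optimum can be checked numerically under the same first order model. For a current ratio r = x/y and equal total delay, the constraint (N-1)/x + 1/y = N gives a normalized energy-delay of (r + 1)(N - 1 + r) / (2 r N). The sketch below is our own restatement of that model with the optimum taken at r = sqrt(N-1); the function names are ours.

```python
import math


def normalized_ed(n_bits, ratio):
    """ED with CRA / ED without CRA at equal delay, for current ratio x / y."""
    return (ratio + 1) * (n_bits - 1 + ratio) / (2 * ratio * n_bits)


def penalty(n_bits, ratio):
    """Fractional increase in energy-delay relative to the optimal ratio."""
    best = normalized_ed(n_bits, math.sqrt(n_bits - 1))
    return normalized_ed(n_bits, ratio) / best - 1.0


for r in (2.6, 3.3, 5.48, 10.2, 12.8):
    print(f"ratio {r:5.2f}: {100 * penalty(32, r):.1f}% above optimal")
```

For N = 32 the model stays within 10% of optimal at both ends of the 2.6-12.8 range and within 5% at both ends of the 3.3-10.2 range, consistent with the figures quoted above.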
[Figure: Normalized Energy-Delay (0.6 to 1.3) versus Current Ratio = xmin/ymin (1 to 10), with curves for N = 4, N = 8, N = 16, and N = 32.]
Figure 6.3c : Normalized Energy-delay vs. Current Ratio
The last step in the CRA optimization analysis is to compare the theoretical results with
actual simulations. The ripple adders from section 6.2 were redesigned using the x and y ratios
from figure 6.3b. The gates were reoptimized in order to achieve the necessary constraints from
chapter 3. The results from this experiment are presented in figure 6.3d:
[Figure: three plots versus Number of Bits, N: Delay, tp (ns); Power-Delay (pJ); Energy-Delay (pJ*ns). Curves: Basic MCML, MCML w CRA, CMOS, and Theoretical CRA.]
Figure 6.3d : Current Ratio Adjustment Results
While the actual implementations of CRA are better in power efficiency than non-CRA
circuits, they are less efficient than the theoretical CRA results. The primary reason for this
variation is that the performance of MCML circuits does not vary linearly with current for large
current deviations.
When re-optimization is required, the smaller current gates have their
performance reduced in a greater than linear amount and hence, the theoretical results are not
achieved. Even with this difficulty, Current Ratio Adjustment is a powerful technique that can
significantly improve energy efficiency in MCML circuits.
This chapter has compared CMOS and MCML ripple adder performance. As predicted in
chapter 2, the additional logic depth of large ripple structures makes MCML an unattractive
choice. On the other hand, MCML adders perform significantly better for smaller numbers of
bits. The addition of Current Ratio Adjustment in the adders improves the adder performance of MCML
even more. While only applied to ripple adders in this chapter, CRA can be applied to any
general logic circuit and can increase energy efficiency dramatically.
Chapter 7
System Design Example : CORDIC
7.1 CORDIC Algorithm
In order to test many of the optimizations and analysis developed earlier, we felt it was
necessary to design a complex block of logic using MCML. The target block of logic chosen
was a pipelined CORDIC. The basic CORDIC algorithm is used for computing angles of vectors
and for rotating vectors [7]. The vectoring algorithm iteratively computes the angle of a vector
using the following equations:
\[ X_i = X_{i-1} + \sigma_{i-1} \times 2^{-i} \times Y_{i-1} \]
\[ Y_i = Y_{i-1} - \sigma_{i-1} \times 2^{-i} \times X_{i-1} \]
\[ \sigma_i = \mathrm{sign}(Y_i) \in (1, -1) \]
After N iterations, where N is the bit width of X and Y, the vector (X0,Y0) will be rotated
onto the X-axis and the angle of the vector will be equal to:
\[ \theta = \frac{\pi}{2} \times \sum_{i=0}^{N-1} \sigma_i 2^{-(i+1)} \]
The algorithm can also be used to rotate a vector by a fixed angle. This is accomplished with
the same equations as above but the sign bits are not calculated at each stage. Instead, the sign
bits are stored and used from a previous angle calculation. In hardware, this added functionality
translates into a small amount of control circuitry.
One algorithmic difficulty in the vector rotation scheme is that while the vector is rotated by
the proper angle, the magnitude of the vector is not preserved.
In order to preserve the
magnitude after rotation, several magnitude scaling iterations are added to the algorithm in
between rotations [7]. These scaling factors are broken down into multiplication by quantities of
the form (1 - 2^-x) which are implemented by a shift and subtract in hardware.
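The vectoring iteration and the magnitude growth it causes can be seen in a short floating point sketch. This is the common textbook formulation, which starts the shifts at 2^0 and accumulates arctangent values for the angle; it is an illustrative assumption on our part and differs in indexing and angle bookkeeping from the equations above and from the hardware described in this chapter.

```python
import math


def cordic_vectoring(x, y, iterations=16):
    """Rotate (x, y) with x > 0 onto the X-axis using shift-and-add style steps.

    Returns the final x, the residual y, and the accumulated angle in radians.
    The final x is the original magnitude multiplied by the CORDIC gain.
    """
    angle = 0.0
    for i in range(iterations):
        sigma = 1 if y >= 0 else -1          # rotate toward the X-axis
        shift = 2.0 ** -i                    # a right shift by i in hardware
        x, y = x + sigma * y * shift, y - sigma * x * shift
        angle += sigma * math.atan(shift)
    return x, y, angle


x_out, y_out, theta = cordic_vectoring(1.0, 1.0)
print(f"angle = {theta:.4f} rad, residual y = {y_out:.1e}")
print(f"gain  = {x_out / math.hypot(1.0, 1.0):.4f}")
```

In this formulation the magnitude grows by a factor of roughly 1.647 regardless of the input vector, which is why separate scaling iterations are needed to restore it.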
7.2 CORDIC Architecture
In our implementation of the CORDIC, we decided to pipeline every iteration, both rotating
and scaling. For 8 bits of precision, there were a total of 14 pipeline stages, 8 for rotation and 6
for scaling. While this pipelining will introduce significant latency, the targeted applications are
throughput dominated and should be able to tolerate the extra latency. It is also important to
remember that an unpipelined CORDIC will have large logic depth and would not be a good
candidate for MCML logic.
Another important feature to note is that additional bits are required in order to maintain
precision during rotation. Three extra bits are used to reduce truncation error and a single bit is
used to prevent overflow before scaling. The total bit width of the stages is 12 bits for an 8 bit
input and output.
A basic rotation stage of the CORDIC is shown in Fig. 7.2a. Note that the outputs from the
XNOR and XOR blocks are actually shifted right in relation to the other inputs to the adder. The
amount of shifting depends on the rotation number but is hardwired and requires no extra logic.
The critical path begins with the Sign_{i-1} bit, goes through a buffer, an xnor, a 12 bit adder, a 2
input mux, and ends at the Sign_i output. Scaling iterations are entirely subtractions and have a shorter
critical path than all rotation iterations.
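[Figure: block diagram of one pipeline stage. Inputs Mode(i-1), Sign(i-1), the Xi-1 Register, and the Yi-1 Register; XNOR and XOR blocks feeding two adders (+); a MUX; outputs Modei, Signi, the Xi Register, and the Yi Register.]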
Figure 7.2a : CORDIC Pipeline Stage
7.3 Circuit Optimization
The gates in each bit pipeline stage are implemented using the Current Ratio Adjustment
(CRA) optimization discussed in section 6.3. While a fully detailed optimization was
not performed, approximations to optimize energy-efficiency were made and implemented. All
of the gate currents were fixed at some multiple of a base current, I. There are 4 different VSC's
used: one for the carry circuits, one for the sum circuits, one for all two level gates (FFs, XNOR,
XOR, MUX), and one for the buffers. The carry circuits in the ripple adders were most common
in the critical path and were therefore given 2I current. The XNOR, XOR, MUX, and flip flops
all use the same VSC's and are given I current. The sum circuits of the adder have very little
contribution to the critical path and are therefore given 0.5I current.
Three different values of I were chosen in order to achieve high, medium, and low
performance levels: 10uA, 5uA, and 2.5uA. The circuits were completely optimized for these
current points (including CRA scaling) using the procedure developed in chapter 3 and modified
in chapter 5. For the high performance CORDIC, more current is used throughout to reduce
delay. As a result of the increased current, slightly larger transistors, higher VDD, and higher
voltage swing are required to maintain the desired DC properties. In the low performance mode,
small currents are used, so the design can use a reduced voltage swing and VDD. The CMOS
equivalent design was not optimized for different performance levels; instead, it was simulated at
4 different values of VDD.
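The current assignment described above can be tabulated with a few lines of Python. This is only an illustrative sketch: the multipliers and base currents are the ones quoted in the text, the dictionary and function names are hypothetical, and the buffers are left out because their multiplier is not stated here.

```python
# Tail current multipliers relative to the base current I, as quoted in the text.
# Buffers are omitted on purpose: their multiplier is not given in this section.
CURRENT_MULTIPLIER = {
    "carry": 2.0,      # carry circuits, most common in the critical path
    "two_level": 1.0,  # XNOR, XOR, MUX and flip flops
    "sum": 0.5,        # sum circuits, little contribution to the critical path
}

# Base current I in uA for the three performance levels.
BASE_CURRENT_UA = {"high": 10.0, "medium": 5.0, "low": 2.5}


def gate_currents_ua(level):
    """Return the tail current (uA) of each gate class for one performance level."""
    base = BASE_CURRENT_UA[level]
    return {gate: mult * base for gate, mult in CURRENT_MULTIPLIER.items()}


for level in BASE_CURRENT_UA:
    print(level, gate_currents_ua(level))
```

For example, the carry circuits of the high performance design draw 20uA each under this assignment, while the sum circuits of the low performance design draw 1.25uA.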
While the design of a full adaptive pipelining system was beyond the scope of this project,
we can estimate the effects of its use. The non-adaptively pipelined results use the worst case
clock frequency and the nominal power consumption. The worst case clock frequency assumes
worst case process corner and +/-10% variation in VDD for CMOS. Since the MCML circuits
consume a constant amount of current, the variation on VDD will be much smaller and is
assumed to be negligible but the worst case process corner is still used. The adaptively pipelined
results use the nominal clock frequency and power consumption. All clock frequencies used
have a 10% margin over the total critical path delay.
7.4 Results
Now that the algorithm, architecture, and circuits of the CORDIC system are defined, we will
look at the simulation results. In order to verify the effects of wiring capacitance and other
layout effects, the critical path pipeline stage was laid out for both the CMOS and one of the
MCML CORDIC data points. We simulated the critical path delay for both the schematic and
extracted versions of this single pipeline stage and extrapolated some of the results to the entire
CORDIC. All of the estimated values are denoted with an asterisk but are believed to be very
close to actual values. A subject for future work would be to complete the layout of the entire
CORDIC block for both the CMOS and 3 different MCML implementations. The simulated and
predicted results are summarized in Figs. 7.3a-c:
Nominal VDD (V)                     2.5     2.0     1.5     1.0
Schematic W.C. Clock Freq. (MHz)    235     180     110     35
Extracted W.C. Clock Freq. (MHz)    140     105     65      20
Schematic Power (mW)                21.3    9.80    3.16    0.42
Extracted Power (mW)                21.5*   9.94*   3.33*   0.45*
Schematic ED (pJ*ns)                311     247     212     248
Extracted ED (pJ*ns)                904     724     647     787
Figure 7.3a : CMOS CORDIC Results
Performance Level                   High    Med.    Low
VDD (V)                             1.1     1.05    1.0
Voltage Swing (V)                   0.4     0.35    0.3
Schematic W.C. Clock Freq. (MHz)    280     195     120
Extracted W.C. Clock Freq. (MHz)    95      60      35
Schematic Power (mW)                18.6#   9.00#   4.33#
Extracted Power (mW)                18.6#   9.00#   4.33#
Schematic ED (pJ*ns)                192     195     242
Extracted ED (pJ*ns)                1682    1908    2492
Figure 7.3b : MCML CORDIC Results - No Adaptive Pipelining
Performance Level                   High    Med.    Low
VDD (V)                             1.1     1.05    1.0
Voltage Swing (V)                   0.4     0.35    0.3
Schematic Nom. Clock Freq. (MHz)    310     205     125
Extracted Nom. Clock Freq. (MHz)    105     65      40
Schematic Power (mW)                18.6#   9.00#   4.33#
Extracted Power (mW)                18.6#   9.00#   4.33#
Schematic ED (pJ*ns)                155     172     221
Extracted ED (pJ*ns)                1379    1699    2269
Figure 7.3c : MCML CORDIC Results - With Adaptive Pipelining
* - Result is extrapolated to whole design from single pipeline stage
# - Result does not include power due to peripheral control circuitry (VSC's, DLL's, etc.)
There are several important things to notice about the final data. The first thing to notice is
that for the transistor level schematics, the MCML design has superior energy-delay properties
over almost the entire range of performance levels. The MCML design can operate at a higher
frequency than is possible even with CMOS at 2.5V and can do so with lower power
consumption. The addition of adaptive pipelining allows operation at nominal instead of
worst case clock frequency and increases the benefit of MCML over CMOS.
Unfortunately, the delay scaling is not equivalent for the layout of the CMOS and MCML
designs. For the CMOS designs, the extracted layout which includes wire capacitances is about
1.7 times slower than the transistor only schematic. This scaling factor increases to between 2.9
and 3.2 times for the MCML design. The effect of these two different scaling factors is that the
MCML layout is no longer as efficient in energy-delay as the CMOS design for any design point.
While the actual factors causing this scaling difference are unknown, we speculate that the
primary reason is that the MCML design is much more susceptible to Miller effect, cross-coupling
capacitance between parallel wires. In fact, all of the signals in the MCML design are
complementary and are run more or less in parallel so that this additional capacitance may be
quite large. A topic of future work would be to re-analyze the layout strategies employed and to
attempt to decrease this cross-capacitance by increasing wire spacing or using different metal
layers or routing topology. It is my opinion that this scaling factor could be reduced significantly
with more attention given to routing.
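The scaling factors quoted above can be recomputed from the clock frequencies in Figs. 7.3a-c. The short sketch below does only that arithmetic; the variable and function names are illustrative, and the numbers are copied from the tables.

```python
# Clock frequencies in MHz as (schematic, extracted), copied from Figs. 7.3a-c.
CMOS_WORST_CASE = {"2.5V": (235, 140), "2.0V": (180, 105),
                   "1.5V": (110, 65), "1.0V": (35, 20)}
MCML_WORST_CASE = {"High": (280, 95), "Med.": (195, 60), "Low": (120, 35)}
MCML_NOMINAL = {"High": (310, 105), "Med.": (205, 65), "Low": (125, 40)}


def slowdown_factors(table):
    """Ratio of schematic to extracted clock frequency for every design point."""
    return {point: schematic / extracted
            for point, (schematic, extracted) in table.items()}


for name, table in [("CMOS, worst case", CMOS_WORST_CASE),
                    ("MCML, worst case", MCML_WORST_CASE),
                    ("MCML, nominal", MCML_NOMINAL)]:
    rounded = {point: round(f, 2) for point, f in slowdown_factors(table).items()}
    print(name, rounded)
```

With these inputs the CMOS ratios all come out close to 1.7 and the nominal MCML ratios fall between 2.9 and 3.2. The worst case MCML ratios for the medium and low performance points come out somewhat higher.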
Besides delay and power, we would also like to consider the area and current switching
properties of the two logic styles. Since only a single pipeline stage was laid out for both styles
of the CORDIC, we cannot compare total areas but we can compare the area of these stages. We
can also look at the simulated results of the power supply current and report the change due to
switching activity. Since the areas are all very similar between design points and the supply
switching properties are more important for the high performance designs, we only report those
results here. These results are summarized in Fig. 7.3d:
                                MCML      CMOS
Area (um^2)                     11,200    9,000
Nominal Supply Current (mA)     1.65      0
Maximum Supply Current (mA)     1.69      22
Minimum Supply Current (mA)     1.61      -4
Figure 7.3d : Area and Supply Current for CORDIC Pipeline Stage
As Fig. 7.3d shows, the supply current variation of the MCML design is far smaller than that of
the CMOS design. For mixed signal communications systems, this property is extremely important to allow
high precision analog circuit operation. The area of the CORDIC pipeline stage is about 25%
bigger for the MCML design and may or may not be an acceptable price to pay for low noise
computation.
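To put numbers on this comparison, the sketch below computes the peak-to-peak supply current and the area overhead directly from the values in Fig. 7.3d. The dictionary layout and function name are illustrative assumptions; only the figures themselves come from the table.

```python
# Values copied from Fig. 7.3d (single CORDIC pipeline stage, high performance).
STAGE = {
    "MCML": {"area_um2": 11200, "i_max_ma": 1.69, "i_min_ma": 1.61},
    "CMOS": {"area_um2": 9000, "i_max_ma": 22.0, "i_min_ma": -4.0},
}


def current_swing_ma(entry):
    """Peak-to-peak supply current variation in mA."""
    return entry["i_max_ma"] - entry["i_min_ma"]


mcml_swing = current_swing_ma(STAGE["MCML"])
cmos_swing = current_swing_ma(STAGE["CMOS"])
area_overhead = STAGE["MCML"]["area_um2"] / STAGE["CMOS"]["area_um2"] - 1.0

print(f"MCML supply swing: {mcml_swing:.2f} mA")
print(f"CMOS supply swing: {cmos_swing:.2f} mA")
print(f"MCML area overhead: {area_overhead:.1%}")
```

The MCML stage varies by 0.08mA peak to peak against 26mA for CMOS, while the area overhead evaluates to roughly 24%, consistent with the "about 25%" quoted above.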
This chapter has applied the analysis and design techniques of the previous chapters to the
implementation of the CORDIC algorithm. We have seen that even with the use of adaptive
pipelining and current ratio adjustment (CRA), the extracted MCML design is less power
efficient than the equivalent CMOS design. The primary reasons for this result are that the
layout of the MCML CORDIC was not done with careful consideration of cross-coupling
capacitance and that the logic depth of the circuitry is larger than ideal. In other functions with
shallower logic depth and more efficient routing, the MCML designs may in fact be more power
efficient than the equivalent CMOS function.
Chapter 8
Conclusions
8.1 Summary
This report has been an in-depth analysis of the benefits and pitfalls of using MOS Current
Mode Logic. We have analyzed the transistor level behavior of MCML circuits and compared
their properties to those of standard CMOS circuits. In chapter 3, a design algorithm was
proposed for MCML gates and many of the design constraints were analyzed and explained. The
algorithm was used to design larger pieces of logic and several global design optimizations such
as adaptive pipelining and current ratio adjustment were proposed. We have seen that under
ideal circuit conditions, MCML can be much more energy efficient than equivalent CMOS
circuits and also present significantly less noise on the supply network.
8.2 Future Work
While this report attempted to analyze many different facets of MCML operation, there were
several areas which still require more research. The first goal of any future work would be to
automate the design process for MCML gates using the algorithm developed in chapter 3. This
automation would greatly reduce the time of implementing larger logic functions. Along the
same lines, use of MCML gates should be incorporated into a standard digital logic design flow
including synthesis, placement, and routing. The main challenges in this arena would be the
characterization of the MCML gates for the synthesis tool and the implementation of differential
routing.
The second main area of future work would be in the implementation of the control logic
presented in chapter 5. While many of the effects of adaptive pipelining are estimated in this
report, it would be nice to actually implement the adaptive pipeline circuitry and measure its
area, power consumption, and tracking properties. Also included in this future work would be
the design of the VSCs and closer measurements of opamp requirements.
A third area for future research would be a much more detailed analysis of the effects of
noise on circuit performance and design. The MCML gate design procedure should be modified
to incorporate maximum noise restrictions and more modeling of the noise sources and immunity
in MCML circuits is necessary for truly robust design.
Finally, the design of different large circuit blocks would give a better picture as to the true
effects of global routing and system issues. Further work is needed to optimize the differential
routing problem noticed in the CORDIC design. Other issues such as clock distribution and
CMOS interfacing need to also be addressed.
Appendix A
Derivation of Ideal MCML Gate Performance
A.1 Goal
The basic goal of the following analysis is to understand the effects of transistor linearity on
MCML gate performance and power. The key results will be that a completely ideal MCML
gate has constant power consumption during switching and that non-ideal transistor behavior is
responsible for finite dI/dt effects. We begin with an analysis of a completely ideal MCML gate.
A.2 MCML Gate with Ideal Load
The first step in the analysis is to derive the transient properties of a MCML circuit with ideal
current source, ideal switch, and ideal load. This device is shown in figure A.2a:
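[Circuit diagram: two load resistors R, outputs VL and VR each loaded by a capacitor C, an ideal switch that moves at t=0, and a current source I]
Figure A.2a : Basic MCML Gate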
We assume in this analysis that the output loads are symmetrical and the pull down network
is an ideal switch which moves from left to right at time t = 0. Now we model the left and right
sides as separate first order RC networks:
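[Circuit diagrams: first order RC models of the left side (R, C, VL, load current IL) and the right side (R, C, VR, load current IR, current source I), each supplied from VDD]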
The DC conditions at t = 0- and t = ∞ are as follows:
DC Conditions
\[ V_L(0^-) = V_{DD} - IR, \qquad V_L(\infty) = V_{DD} \]
\[ V_R(0^-) = V_{DD}, \qquad V_R(\infty) = V_{DD} - IR \]
With these models, we can easily solve the first order differential equations and achieve the
following transient responses:
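\[ V_L(t) = V_{DD} - IR\,e^{-t/RC} \]
\[ V_R(t) = V_{DD} - IR\left(1 - e^{-t/RC}\right) \]
\[ V_{diff}(t) = V_R(t) - V_L(t) = IR\left(2e^{-t/RC} - 1\right) \]
\[ I_L(t) = \frac{V_{DD} - V_L(t)}{R} = I\,e^{-t/RC} \]
\[ I_R(t) = \frac{V_{DD} - V_R(t)}{R} = I\left(1 - e^{-t/RC}\right) \]
\[ I_{VDD} = I_L(t) + I_R(t) = I \]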
There are two important things to notice from the above equations. First, the differential
output voltage is a simple first order RC response. We can solve this response at the 10%, 50%,
and 90% points to predict propagation delay and rise and fall times. Doing this gives us a value
for the Signal Slope Ratio, SSR, (see chapter 3) that we can use to compare against. The SSR
value from a first order RC system works out to equal 3.17.
The second important result of the above transient equations is that the current from the
voltage supply, VDD, is a constant. Since the current is not a function of the switching activity,
switching speed, capacitance, resistance, etc., the amount of power consumption is always the
same and hence the sum of the dynamic and static power must be constant. The result is that
dI/dt = 0 and no switching noise exists on the power lines.
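Both results can be checked numerically. The following is a minimal sketch under two assumptions: it uses I = 10uA, ∆V = IR = 0.4V and C = 10fF (the values quoted later for Fig. A.3c), and it takes SSR = (t90% - t10%)/tp, which is the ratio that reproduces the SSR column of Fig. A.3d; chapter 3 holds the formal definition. The function names are illustrative.

```python
import math

I_TAIL = 10e-6             # current source I (A)
DELTA_V = 0.4              # voltage swing, equal to I*R (V)
CAP = 10e-15               # load capacitance C (F)
R_LOAD = DELTA_V / I_TAIL  # ideal load resistance R (ohm)
TAU = R_LOAD * CAP         # time constant RC (s)


def crossing_time(fraction):
    """Time at which Vdiff has completed `fraction` of its +IR to -IR transition.

    Vdiff(t) = IR*(2*exp(-t/RC) - 1), so the completed fraction is 1 - exp(-t/RC).
    """
    return -TAU * math.log(1.0 - fraction)


def supply_current(t):
    """Sum of the left and right load currents at time t >= 0."""
    i_left = I_TAIL * math.exp(-t / TAU)
    i_right = I_TAIL * (1.0 - math.exp(-t / TAU))
    return i_left + i_right


t10, tp, t90 = (crossing_time(f) for f in (0.1, 0.5, 0.9))
ssr = (t90 - t10) / tp  # assumed definition of the Signal Slope Ratio

print(f"t10 = {t10 * 1e12:.0f} ps, tp = {tp * 1e12:.0f} ps, t90 = {t90 * 1e12:.0f} ps")
print(f"SSR = {ssr:.2f}")
print(f"supply current at tp = {supply_current(tp) * 1e6:.3f} uA")
```

Under this definition the SSR reduces to ln(9)/ln(2), independent of R and C, and the supply current evaluates to I at every instant.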
A.3 MCML Gate with Non-Ideal Load
Now we would like to perform the same analysis as above except that we allow load
resistances to be non-ideal in order to more closely model the performance of a pmos transistor.
Using a level one model of the transistor, we will assume that the drain current has two modes,
linear and saturation, with a definite transition point at Vdsat. We use a voltage controlled current
source instead of a simple load resistance as shown in figure A.3a. We model the pmos device
with an I-V characteristic as shown in figure A.3b.
Now we would like to analyze the transient behavior of this gate while still assuming a
perfect switch and ideal current sources, capacitors, etc.
We begin by looking at the DC
properties of the load resistances. When we define a voltage swing, ∆V, and a current level, I,
for our MCML gates, we are effectively setting the DC resistance of the load device. This DC
resistance is set by adjusting the Vgs of the devices with the VSC.
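[Circuit diagram: MCML gate with an ideal switch that moves at t=0, current source I, load capacitors C, and voltage controlled load currents IL = f{VL} and IR = f{VR} on the outputs VL and VR]
Figure A.3a : Basic MCML Gate with non-ideal loads
[I-V plot of Isd versus Vsd with a breakpoint at (Vdsat, Isat). Linear region: $I_{sd} = k_1 V_{sd}$. Saturation region: $I_{sd} = I_{sat} + k_2 (V_{sd} - V_{dsat})$.]
Figure A.3b : PMOS load level 1 model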
At time t = 0-, the switch is assumed to be on the left side. The current flowing in the left
load device must be equal to the current in the current source, I, in order for equilibrium to exist.
We also know that the Vsd of the pmos device is equal to ∆V. Therefore, VL = VDD - ∆V and IL =
I. The right side has no current flowing and therefore, VR = VDD and IR = 0. At t = ∞, the
opposite is true.
If ∆V < Vdsat, then the transistor is in the linear region for the entire range of switching
voltages. This case is identical to having a resistor R1 = 1/k1 and the analysis is the same as in
section A.2. We are only interested in the case where ∆V > Vdsat and the device travels through
both regions of operation during switching. In this case, the left side will begin in the saturation
region until VL = VDD - Vdsat and then continue in the linear region until VL = VDD. The right
side begins in the linear region until VR is pulled down to VDD - Vdsat and then it enters saturation
until it reaches its final value of VDD - ∆V. Note that the slopes of the I-V curve in the two
regions can also be expressed by the resistances, R1 and R2 which are the inverses of k1 and k2
respectively.
We can now define four separate phases to analyze and solve the transient response. The
following solutions come directly from solving the first order differential equations associated
with each region of operation:
Phase #1: Left side, VL starts at VDD - ∆V and ends at VDD - Vdsat at time = t1, load in sat region
\[ V_L(t) = V_{DD} - \Delta V + I R_2 \left(1 - e^{-t/(R_2 C)}\right), \qquad t < t_1 \]
\[ t_1 = R_2 C \times \ln\left(\frac{I R_2}{V_{dsat} - \Delta V + I R_2}\right) \]
Phase #2: Left side, VL starts at VDD - Vdsat and approaches VDD, load in linear region
\[ V_L(t) = V_{DD} - V_{dsat} \times e^{-(t - t_1)/(R_1 C)}, \qquad t > t_1 \]
Phase #3: Right side, VR starts at VDD and ends at VDD - Vdsat at time = t2, load in linear region
\[ V_R(t) = V_{DD} - I R_1 \left(1 - e^{-t/(R_1 C)}\right), \qquad t < t_2 \]
\[ t_2 = R_1 C \times \ln\left(\frac{I R_1}{I R_1 - V_{dsat}}\right) \]
Phase #4: Right side, VR starts at VDD - Vdsat and approaches VDD - ∆V, load in sat region
\[ V_R(t) = V_{DD} - \Delta V + (\Delta V - V_{dsat}) \times e^{-(t - t_2)/(R_2 C)}, \qquad t > t_2 \]
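As a quick numerical illustration of the phase boundaries, the sketch below evaluates t1 and t2 for the five load characteristics plotted in this appendix, with I = 10uA, ∆V = 0.4V and C = 10fF. It assumes R1 = Vdsat/Isat and R2 = (∆V - Vdsat)/(I - Isat); these follow from the level one model if the two I-V segments meet at (Vdsat, Isat) and the load carries I at Vsd = ∆V. The function names are hypothetical.

```python
import math

I_TAIL = 10e-6   # current source I (A)
DELTA_V = 0.4    # voltage swing (V)
CAP = 10e-15     # load capacitance C (F)


def load_resistances(vdsat, isat):
    """Segment resistances R1 = 1/k1 and R2 = 1/k2 of the level one load model."""
    r1 = vdsat / isat                         # linear segment ends at (Vdsat, Isat)
    r2 = (DELTA_V - vdsat) / (I_TAIL - isat)  # saturation segment reaches I at delta V
    return r1, r2


def breakpoints(vdsat, isat):
    """Times t1 and t2 at which the left and right loads change region."""
    r1, r2 = load_resistances(vdsat, isat)
    t1 = r2 * CAP * math.log(I_TAIL * r2 / (vdsat - DELTA_V + I_TAIL * r2))
    t2 = r1 * CAP * math.log(I_TAIL * r1 / (I_TAIL * r1 - vdsat))
    return t1, t2


# (Vdsat in V, Isat in A) pairs used for the plotted curves.
CASES = [(0.35, 9e-6), (0.30, 9e-6), (0.25, 9e-6), (0.35, 9.9e-6), (0.30, 9.9e-6)]

for vdsat, isat in CASES:
    t1, t2 = breakpoints(vdsat, isat)
    print(f"Vdsat={vdsat:.2f} V, Isat={isat * 1e6:.1f} uA: "
          f"t1={t1 * 1e12:.0f} ps, t2={t2 * 1e12:.0f} ps")
```

For all five cases the left load leaves saturation well before the right load enters it, so t2 > t1 holds.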
We now make one assumption and calculate the differential transient response. The
necessary assumption is that t2 > t1. This will be true in most cases except when Vdsat is very
small (< 0.5 * ∆V). With this assumption, the equation for the differential output voltage can be
written:
\[ V_{diff}(t) = \Delta V - I R_1 \left(1 - e^{-t/(R_1 C)}\right) - I R_2 \left(1 - e^{-t/(R_2 C)}\right), \qquad t < t_1, t_2 \]
\[ V_{diff}(t) = V_{dsat} \times e^{-(t - t_1)/(R_1 C)} - I R_1 \left(1 - e^{-t/(R_1 C)}\right), \qquad t_1 < t < t_2 \]
\[ V_{diff}(t) = -\Delta V + V_{dsat} \times e^{-(t - t_1)/(R_1 C)} + (\Delta V - V_{dsat}) \times e^{-(t - t_2)/(R_2 C)}, \qquad t > t_1, t_2 \]
In order to see the effects of nonlinearity on the output waveform shape, we can graph the
response for a few different values of Vdsat and Isat. Figure A.3c has I = 10uA, ∆V = 0.4V, and C
= 10fF:
[Plot: Differential Transient Response of Non-linear load MCML circuit. Vdiff (V), from -0.4 to 0.4, versus Time (ps), from 0 to 5000. Curves: Perfect Linear; Vdsat=.35, Isat=9u; Vdsat=.30, Isat=9u; Vdsat=.25, Isat=9u; Vdsat=.35, Isat=9.9u; Vdsat=.30, Isat=9.9u]
Figure A.3c : Transient Response of Non-linear load MCML circuits
The important thing to note from the above graph is that as either Vdsat decreases or
Isat increases, the nonlinearity of the curve becomes more pronounced. For the case when Vdsat
= 0.30V and Isat = 9.9uA, the output is at around -.325V at 5ns after the switching event. This
slowdown in the voltage fall is due directly to the fact that the right side voltage must continue
through a very strong saturation region.
We can also use the previous data points to illustrate the change in the Signal Slope Ratio
(SSR) for the different levels of nonlinearity. Figure A.3d examines the change in propagation
delay vs. fall time for this level one model. We see that the SSR is a strong function of the I-V
curve shape of the pmos load devices. While the actual implications of using large SSR circuits
depend on the function being implemented, the propagation delay of circuits being driven
by a gate with slow fall time will almost certainly increase.
Vdsat (V)   Isat (uA)   tp (ps)   t10% (ps)   t90% (ps)   SSR
Linear      Linear      277       42          922         3.18
.25         9           265       42          1608        5.91
.30         9           268       42          1162        4.18
.35         9           276       42          948         3.28
.30         9.9         263       42          3623        13.62
.35         9.9         271       42          1102        3.91
Figure A.3d : Signal Slope Ratio (SSR) Effects of non-linear pmos loads
We see from the above table that while the propagation delay stays relatively fixed across the
different shapes of the I-V curve, the fall time and hence the SSR can increase dramatically.
This behavior is undesirable for circuit reliability and should be avoided by adjusting the pmos
device sizes so that Vdsat is raised above the voltage swing, ∆V.
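The table entries can be approximately reproduced from the four phase equations. The sketch below is a minimal check, not the script used for this report. It assumes I = 10uA, ∆V = 0.4V and C = 10fF, takes R1 = Vdsat/Isat and R2 = (∆V - Vdsat)/(I - Isat) from the level one model, and uses SSR = (t90% - t10%)/tp, the ratio that matches the SSR column. The function names and the bisection solver are illustrative choices.

```python
import math

I_TAIL, DELTA_V, CAP = 10e-6, 0.4, 10e-15  # I (A), voltage swing (V), C (F)


def make_vdiff(vdsat, isat):
    """Build Vdiff(t) = VR(t) - VL(t) from the four phase equations."""
    r1 = vdsat / isat
    r2 = (DELTA_V - vdsat) / (I_TAIL - isat)
    tau1, tau2 = r1 * CAP, r2 * CAP
    t1 = tau2 * math.log(I_TAIL * r2 / (vdsat - DELTA_V + I_TAIL * r2))
    t2 = tau1 * math.log(I_TAIL * r1 / (I_TAIL * r1 - vdsat))

    def v_left(t):   # VL - VDD, phases 1 and 2
        if t < t1:
            return -DELTA_V + I_TAIL * r2 * (1.0 - math.exp(-t / tau2))
        return -vdsat * math.exp(-(t - t1) / tau1)

    def v_right(t):  # VR - VDD, phases 3 and 4
        if t < t2:
            return -I_TAIL * r1 * (1.0 - math.exp(-t / tau1))
        return -DELTA_V + (DELTA_V - vdsat) * math.exp(-(t - t2) / tau2)

    return lambda t: v_right(t) - v_left(t)


def crossing(vdiff, level, t_hi=50e-9):
    """Bisection for the time at which the falling Vdiff(t) crosses `level`."""
    t_lo = 0.0
    for _ in range(200):
        mid = 0.5 * (t_lo + t_hi)
        if vdiff(mid) > level:
            t_lo = mid
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)


def slope_metrics(vdsat, isat):
    """Return (tp, t10, t90, SSR) for one load characteristic."""
    vdiff = make_vdiff(vdsat, isat)
    t10 = crossing(vdiff, 0.8 * DELTA_V)    # 10% of the +dV to -dV transition
    tp = crossing(vdiff, 0.0)               # 50% point
    t90 = crossing(vdiff, -0.8 * DELTA_V)   # 90% point
    return tp, t10, t90, (t90 - t10) / tp


for vdsat, isat in [(0.25, 9e-6), (0.30, 9e-6), (0.35, 9e-6),
                    (0.30, 9.9e-6), (0.35, 9.9e-6)]:
    tp, t10, t90, ssr = slope_metrics(vdsat, isat)
    print(f"Vdsat={vdsat:.2f} Isat={isat * 1e6:.1f}u: tp={tp * 1e12:.0f} ps, "
          f"t10={t10 * 1e12:.0f} ps, t90={t90 * 1e12:.0f} ps, SSR={ssr:.2f}")
```

Under these assumptions the computed values agree with Fig. A.3d to within a few picoseconds, and the Vdsat = 0.30V, Isat = 9.9uA curve sits near -0.33V at 5ns. Evaluating the left and right voltages separately means the sketch does not rely on t2 > t1.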
The last piece of analysis to perform is to determine the effects of non-linear load devices on
the current behavior of the power supply. We can easily compute the current equations by taking
the derivative of the voltage responses in the four different phases and summing the results.
Once again, we achieve different current responses for the three time regions:
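\[ I_{VDD} = I + I \times e^{-t/(R_2 C)} - I \times e^{-t/(R_1 C)}, \qquad t < t_1, t_2 \]
\[ I_{VDD} = I + I_{sat} \times e^{-(t - t_1)/(R_1 C)} - I \times e^{-t/(R_1 C)}, \qquad t_1 < t < t_2 \]
\[ I_{VDD} = I + I_{sat} \times e^{-(t - t_1)/(R_1 C)} - (I - I_{sat}) \times e^{-(t - t_2)/(R_2 C)}, \qquad t > t_1, t_2 \]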
With these equations, we can plot the supply current for a variety of I-V curve
characteristics. The result is shown in figure A.3e:
[Plot: Supply Switching Current for MCML Circuits with Non-linear loads. VDD Current (uA), from 9 to 13.5, versus Time (ps), from -500 to 4000. Curves: Vdsat=.25, Isat=9u; Vdsat=.30, Isat=9.9u; Vdsat=.30, Isat=9u; Vdsat=.35, Isat=9.9u; Vdsat=.35, Isat=9u]
Figure A.3e : Supply Switching Current for MCML Circuits with Non-linear Loads
We see that as the nonlinearity increases, the deviation from the ideal constant current draw also
increases. Therefore, in order to maintain low power supply switching noise, it is crucial that the
load devices be operated entirely in regions below the Vdsat voltage.
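The size of this deviation can be estimated from the load currents in each phase. The sketch below is an illustration under the same assumptions as the rest of this appendix (I = 10uA, ∆V = 0.4V, C = 10fF, R1 = Vdsat/Isat, R2 = (∆V - Vdsat)/(I - Isat)); the function names and the 1ps sampling grid are arbitrary choices.

```python
import math

I_TAIL, DELTA_V, CAP = 10e-6, 0.4, 10e-15  # I (A), voltage swing (V), C (F)


def supply_current(t, vdsat, isat):
    """Supply current IL(t) + IR(t) for t >= 0 with level one pmos loads."""
    r1 = vdsat / isat
    r2 = (DELTA_V - vdsat) / (I_TAIL - isat)
    tau1, tau2 = r1 * CAP, r2 * CAP
    t1 = tau2 * math.log(I_TAIL * r2 / (vdsat - DELTA_V + I_TAIL * r2))
    t2 = tau1 * math.log(I_TAIL * r1 / (I_TAIL * r1 - vdsat))

    # Left load: saturation until t1, then linear while VL rises to VDD.
    if t < t1:
        i_left = I_TAIL * math.exp(-t / tau2)
    else:
        i_left = isat * math.exp(-(t - t1) / tau1)

    # Right load: linear until t2, then saturation while VR settles.
    if t < t2:
        i_right = I_TAIL * (1.0 - math.exp(-t / tau1))
    else:
        i_right = I_TAIL - (I_TAIL - isat) * math.exp(-(t - t2) / tau2)

    return i_left + i_right


def peak_supply_current(vdsat, isat, t_end=5e-9, step=1e-12):
    """Largest sampled supply current between 0 and t_end."""
    samples = int(t_end / step) + 1
    return max(supply_current(k * step, vdsat, isat) for k in range(samples))


for vdsat, isat in [(0.25, 9e-6), (0.30, 9e-6), (0.35, 9e-6),
                    (0.30, 9.9e-6), (0.35, 9.9e-6)]:
    peak = peak_supply_current(vdsat, isat)
    print(f"Vdsat={vdsat:.2f} Isat={isat * 1e6:.1f}u: peak = {peak * 1e6:.2f} uA")
```

For Vdsat = 0.25V and Isat = 9uA the sampled peak is about 13.3uA against a steady state draw of 10uA, and the peak shrinks as Vdsat approaches ∆V.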
References
[1] B. Davari, R. H. Dennard, G. G. Shahidi, "CMOS Scaling for High Performance and Low
Power - The Next Ten Years," Proceedings of the IEEE, Vol 83, No. 4, April 1995, p595606.
[2] M. Mizuno, M. Yamashina, K. Furuta, H. Igura, H. Abiko, K. Okabe, A. Ono, H. Yamada,
“A GHz MOS, Adaptive Pipeline Technique Using MOS Current-Mode Logic,” IEEE
Journal of Solid-State Circuits, Vol 31, No. 6, June 1996, p784-791.
[3] Jan Rabaey, “Digital Integrated Circuits: A Design Perspective,” Prentice Hall, 1996.
[4] Anantha P. Chandrakasan and Robert W. Broderson, “Minimizing Power Consumption in
Digital CMOS Circuits,” Proceedings of the IEEE, Vol 83, No. 4, April 1995, p498-523.
[5] Giovanni De Micheli, "Synthesis and Optimization of Digital Circuits," McGraw Hill, 1994.
[6] Paul R. Gray and Robert G. Meyer, "Analysis and Design of Analog Integrated Circuits,"
John Wiley and Sons, Inc., 1993.
[7] J.E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron.
Comput., vol. EC-8, p. 330-334, Sept. 1959
[8] J. S. Walther, "A unified algorithm for elementary functions," in Proc. AFIPS Spring Joint
Comput. Conf., 1971, p. 379-385.
[9] Dake Liu and Christer Svennson, “Trading Speed for Low Power by Choice of Supply and
Threshold Voltages,” IEEE Journal of Solid-State Circuits, Vol 28, No. 1, January 1993,
p10-17.