Complexity Reduction in the CORDIC Algorithm by using MUXes Yuhang Sun

Master’s Thesis
Department of Electrical and Information Technology,
Faculty of Engineering, LTH, Lund University, August 2015.
SE-221 00 Lund, Sweden
Abstract
Nowadays, the CORDIC algorithm plays an important role in computing
non-linear functions in hardware. In this thesis, a novel methodology is
described to reduce the complexity of an unrolled CORDIC architecture,
which gives higher speed, smaller area, and lower power consumption. The
idea is to replace adder stages with MUXes. Five different unrolled
CORDIC architectures have been implemented in ASIC using a 65nm
CMOS technology with Low Power High $V_T$ transistors. The area,
computational speed, accuracy, error behavior, and power consumption
have been analyzed. The design aim is to reduce the power consumption,
which is becoming more and more important relative to the area. As a
result, the area and power consumption are 7.9% and 27.2% lower,
respectively, and the speed is 22.9% higher compared to the original
unrolled CORDIC architecture.
Keywords: CORDIC, power consumption.
Acknowledgments
This Master’s thesis would not exist without the support and guidance
of many people: my friends, my parents, and especially my supervisors.
I would like to express my deepest gratitude to my supervisors,
Professor Peter Nilsson, Erik Hertz, and Rakesh Gangarajaiah, who guided
me and helped me in this thesis work.
Finally, thanks to my relatives, who kept encouraging me, and to my
parents, who made it possible for me to study in Sweden.
Yuhang Sun
Contents
Abstract
Acknowledgments
List of figures
List of tables
List of acronyms
1. Introduction
1.1. Thesis outlines
2. Theory
2.1. The CORDIC algorithm
2.2. Two starting angles
2.3. Four starting angles
3. Software simulation
3.1. Outputs simulation
3.2. Error behavior
3.3. Accuracy
4. Hardware implementation
4.1. Design flow
4.2. Hardware architectures and simulation results
4.2.1. The original unrolled CORDIC architecture
4.2.2. First stage removed architecture
4.2.3. Two stages eliminated architecture
4.2.4. Three stages eliminated architecture
4.2.5. Final architecture
5. Results
5.1. Test setup
5.2. The area
5.3. Timing information
5.4. Power consumption
5.4.1. Power analysis
6. Conclusions
7. Future work
References
Appendix A
List of figures
Fig. 1. Vector rotations diagram (five stages)
Fig. 2. The last vector ends up on the unit circle (five stages)
Fig. 3. Two starting angles (three stages)
Fig. 4. Four starting angles (two stages)
Fig. 5. MATLAB simulation of CORDIC design
Fig. 6. The cosine error after truncation
Fig. 7. The cosine error distribution
Fig. 8. The absolute cosine error in dB
Fig. 9. Degrees and the number of error bits
Fig. 10. The design flow
Fig. 11. An unrolled 5-stage CORDIC architecture
Fig. 12. First stage removed architecture
Fig. 13. Two stages are removed in the architecture
Fig. 14. Three stages are removed architecture
Fig. 15. Final architecture
Fig. 16. The Sgn detector
Fig. 17. The design test setup
Fig. 18. Area at different frequencies
Fig. 19. Power at various frequencies
Fig. 20. Magnified diagram for Fig. 19
List of tables
TABLE I. Power at 10MHz
TABLE II. Power at maximum frequency
TABLE III. Timing at maximum speed constraints
TABLE IV. Timing at minimum area constraints
TABLE V. Area at minimum area constraints
TABLE VI. Area at maximum speed constraints
TABLE VII. Power at 10MHz
TABLE VIII. Power at maximum frequency
TABLE IX. Timing at maximum speed constraints
TABLE X. Timing at minimum area constraints
TABLE XI. Area at minimum area constraints
TABLE XII. Area at maximum speed constraints
TABLE XIII. Power at 10MHz
TABLE XIV. Power at maximum frequency
TABLE XV. Timing at maximum speed constraints
TABLE XVI. Timing at minimum area constraints
TABLE XVII. Area at minimum area constraints
TABLE XVIII. Area at maximum speed constraints
TABLE XIX. Power at 10MHz
TABLE XX. Power at maximum frequency
TABLE XXI. Timing at maximum speed constraints
TABLE XXII. Timing at minimum area constraints
TABLE XXIII. Area at minimum area constraints
TABLE XXIV. Area at maximum speed constraints
TABLE XXV. Power at 10MHz
TABLE XXVI. Power at maximum frequency
TABLE XXVII. Timing at maximum speed constraints
TABLE XXVIII. Timing at minimum area constraints
TABLE XXIX. Area at minimum area constraints
TABLE XXX. Area at maximum speed constraints
TABLE XXXI. Area at minimum area constraints
TABLE XXXII. Area at maximum speed constraints
TABLE XXXIII. Timing at maximum speed constraints
TABLE XXXIV. Timing at minimum area constraints
TABLE XXXV. Power at maximum speed constraints
TABLE XXXVI. Power at minimum area constraints
TABLE XXXVII. Coefficient angles
List of acronyms
CORDIC   COordinate Rotation DIgital Computer
LUT      Look Up Table
ASIC     Application Specific Integrated Circuit
VHDL     Very High speed integrated circuit Hardware Description Language
CMOS     Complementary Metal Oxide Semiconductor
VCD      Value Change Dump
VLSI     Very Large Scale Integrated circuit
1. Introduction
Non-linear functions play an important role in many hardware designs.
The area, computational speed, accuracy, error behavior, and power
consumption are the important factors that need to be considered in a
hardware design. Several algorithms can be chosen for the
implementation of non-linear functions.
A Look-Up Table (LUT) is a simple and direct method to compute
non-linear functions. However, a look-up table is suitable only if the
precision is low or the area is not a concern. The size of the table grows
exponentially with the precision, which makes this method unsuitable for
hardware design when high precision is required.
Another method, Parabolic Synthesis [1], computes unary functions by
means of parabolic functions. It is a novel methodology developed for
high-speed computation, and its advantage is that the delay is short.
Polynomial approximations [2] are implemented with multipliers and
adders, using an iterative algorithm. The choice of polynomial determines
how closely the polynomial curve can follow the target function, and this
is where the error is generated. Very high order polynomials are needed if
high accuracy is required. To meet the requirement, the polynomial
approximations can be implemented with least squares approximations,
which minimize the average error, or least maximum approximations, which
minimize the worst-case error.
This thesis work focuses on the CORDIC algorithm.
The COordinate Rotation DIgital Computer (CORDIC) algorithm [3],
also known as the digit-by-digit method or Volder’s algorithm, was first
described by Jack E. Volder in 1959. It is an efficient algorithm to compute
non-linear functions, especially trigonometric functions, in hardware.
Compared to the other methods described above, the CORDIC needs no
multipliers, which means that it is based only on additions, subtractions,
and bit shifts. Traditionally, iterative CORDICs [4] have been widely used
since their area cost is low.
Nowadays, power consumption is more important than area [5] in
hardware design, which makes unrolled CORDICs feasible to use.
Meanwhile, it is necessary to reduce the complexity, since this also
reduces the power consumption, which is the aim of this thesis.
The implementation includes five different unrolled CORDIC
architectures in an Application Specific Integrated Circuit (ASIC). One is
the original CORDIC architecture and the other four are CORDIC
architectures with complexity reduced to various degrees. The designs are
written in Very High Speed Integrated Circuit Hardware Description
Language (VHDL) and implemented in a 65nm CMOS technology with Low
Power High $V_T$ transistors. The supply voltage is VDD = 1.2 V and the
temperature is 25 °C.
As a result, area, timing, and power consumption at different operating
frequencies are reported for the five different unrolled CORDIC
architectures.
1.1. Thesis outlines
The remaining sections are outlined below:
Section 2 introduces the CORDIC algorithm and methods to reduce the
complexity.
Section 3 describes the software simulation, accuracy, and error
behavior of the CORDIC algorithm.
Section 4 describes the hardware implementations of the five different
unrolled CORDIC architectures.
Section 5 lists and analyzes the results of section 4, consisting of area,
timing, and power consumption.
Section 6 concludes this thesis work.
Section 7 discusses possible future work following this thesis.
2. Theory
In this section, the CORDIC algorithm and the methods to reduce its
complexity are introduced. Using two starting angles and four starting
angles together with MUXes removes one adder stage and two adder
stages, respectively, in the unrolled CORDIC architectures.
2.1. The CORDIC algorithm
The CORDIC algorithm, which is based on vector rotations, is used to
approximate a non-linear function. In this thesis, the sine and cosine
functions are the functions that are implemented with the CORDIC
algorithm. An iterative CORDIC architecture consists of several adder
stages. As the number of stages increases, the error between the
approximation and the original function gets smaller, which improves the
accuracy.
In Fig. 1, a 30 degree angle is taken as the input angle. There are five
vector rotations, which means there are five stages. The initial vector, at
0 degrees, is rotated by 45 degrees, which is detected to be larger than
the input angle. In the next stage, the vector is therefore rotated by -27
degrees, and so on, to approximate the input angle. A positive angle means
that the rotation is counter-clockwise, and a negative angle means that the
rotation is clockwise. After five vector rotations, the angle of the last
vector approximates the input angle.
To improve the accuracy, more stages can be used, which brings the
approximation arbitrarily close to the input angle. The angles 45 and 27
degrees are coefficient angles, which are listed in Table XXXVII in
Appendix A.
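The rotation sequence described above can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the VHDL design of the thesis; the function name `cordic_rotate` and the returned `trace` list are assumptions made for this example.

```python
import math

def cordic_rotate(angle_deg, stages=5):
    """Approach angle_deg with `stages` micro-rotations (illustrative sketch)."""
    # Coefficient angles arctan(1 / 2**k) in degrees: 45, 26.57, 14.04, ...
    coeff = [math.degrees(math.atan(2.0 ** -k)) for k in range(stages)]
    # Start on the x axis with a length that makes the last vector unit length.
    x = 1.0
    for c in coeff:
        x *= math.cos(math.radians(c))
    y = 0.0
    current = 0.0   # angle of the vector before the first rotation
    trace = []      # angle of the vector after each rotation
    for k in range(stages):
        # Counter-clockwise if the vector is at or below the input angle,
        # clockwise otherwise.
        d = 1.0 if current <= angle_deg else -1.0
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        current += d * coeff[k]
        trace.append(current)
    return x, y, trace

x, y, trace = cordic_rotate(30.0)
print([round(a, 2) for a in trace])   # angle of the vector after each stage
print(round(x, 4), round(y, 4))       # approximated cosine and sine
```

For the 30 degree input of Fig. 1, the first rotation goes to 45 degrees and the second one turns back by about 27 degrees, as in the description above.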
Fig. 1. Vector rotations diagram (five stages)
The coefficient angles are given by (1), where $\mathrm{angle}(i)$ is
the coefficient angle in degrees and $i$ is the stage number. The first five
coefficient angles are given in (2).

$$\mathrm{angle}(i) = \arctan\left(1/2^{i-1}\right) \tag{1}$$

$$\begin{cases}
\mathrm{angle}(1) = \arctan\left(1/2^{1-1}\right) = 45 \\
\mathrm{angle}(2) = \arctan\left(1/2^{2-1}\right) = 26.56 \\
\mathrm{angle}(3) = \arctan\left(1/2^{3-1}\right) = 14.04 \\
\mathrm{angle}(4) = \arctan\left(1/2^{4-1}\right) = 7.13 \\
\mathrm{angle}(5) = \arctan\left(1/2^{5-1}\right) = 3.58
\end{cases} \tag{2}$$
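As a quick check of (1), the coefficient angles can be computed directly. The helper name `coefficient_angle` is an assumption for this sketch. Python's `round` rounds to nearest, so the second angle prints as 26.57, whereas (2) lists it as 26.56, which appears to be truncated.

```python
import math

def coefficient_angle(i):
    """Coefficient angle of stage i (1-based) in degrees, following (1)."""
    return math.degrees(math.atan(1.0 / 2 ** (i - 1)))

for i in range(1, 6):
    print(i, round(coefficient_angle(i), 2))
```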
The output is the coordinates of the last vector, $(x, y)$. In this thesis,
the CORDIC algorithm is used to compute the sine and cosine functions. The
approximated sine and cosine values of the input angle are shown in (3) and
(4).

$$\sin a = \frac{y}{r} \tag{3}$$

$$\cos a = \frac{x}{r} \tag{4}$$

Here $a$ is the input angle and $r$ is the length of the last vector. Note
that if $r = 1$, the sine and cosine values are exactly the coordinates,
$x = \cos a$ and $y = \sin a$, which means that the last vector should
end up on the unit circle, as shown in Fig. 2.
Fig. 2. The last vector ends up on the unit circle (five stages)
To make the last vector end up on the unit circle, the length of the first
vector, $r(1)$, must be known. There are many ways to calculate $r(1)$. In
this thesis, (5) is used to get $r(1)$. With $r(6) = 1$ and $v(5) = 3.58^\circ$,
given in (2), the result is $r(1) = 0.6088$, so the first vector coordinates
are (0.6088, 0). In this case, the outputs correspond to the cosine and
sine values of the input angle. The hardware architecture is shown in
Fig. 11 in section 4.2.1.

$$r(i-1) = r(i) \times \cos\big(v(i-1)\big) \tag{5}$$
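The backward recursion in (5) can be written as a short loop. The sketch below assumes that $v(i)$ is the coefficient angle of stage $i$ from (1) and that the last vector has length 1; the function name `start_length` and its parameter are illustrative assumptions.

```python
import math

def start_length(n_rotations):
    """Apply r(i-1) = r(i) * cos(v(i-1)) backwards from a final length of 1."""
    r = 1.0
    for i in range(n_rotations, 0, -1):
        v = math.atan(1.0 / 2 ** (i - 1))   # coefficient angle of stage i, radians
        r *= math.cos(v)
    return r

for n in (4, 5, 18):
    print(n, round(start_length(n), 4))
```

With four rotations (45 degrees down to 7.13 degrees) this sketch gives about 0.6088, the value quoted in the text. Each additional rotation lowers the result slightly.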
2.2. Two starting angles
The idea of this project is to use more than one starting angle. This
reduces the number of stages, at the cost of introducing MUXes into the
design, which is discussed in section 4. In Fig. 3, 30 degrees and 60
degrees are used as two example input angles.
Fig. 3. Two starting angles (three stages)
In Fig. 3, the input angle is compared with the 45 degree angle, which is
implemented with two MUXes in hardware. The architecture is shown in
Fig. 13 in section 4.2.3. The MUXes in hardware are used to select the
starting vector coordinates.
When the input is a 60 degree angle, which is larger than 45 degrees,
the vector rotates in the red region. The starting vector coordinates are
$(x_1, y_1)$. After three rotations, the output is $(x_3, y_3)$.
When the input is a 30 degree angle, which is smaller than 45 degrees,
the vector rotates in the blue region. The starting vector coordinates are
$(x_1', y_1')$. After three rotations, the output is $(x_3', y_3')$.
When the input angle is exactly 45 degrees, either the red or the blue
rotations can be used.

$$v(1) = \begin{cases}
\arctan(1) - \arctan(1/2) = 18.43 \\
\arctan(1) + \arctan(1/2) = 71.56
\end{cases} \tag{6}$$
Equation (6) indicates that the starting vector angle is 18.4349
degrees for input angles below 45 degrees and 71.5651 degrees for input
angles above 45 degrees. Fig. 11 in section 4.2.1 shows the original
unrolled CORDIC architecture, from which the starting vector coordinates
for Fig. 3 can be obtained.
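Starting from the left of the architecture in Fig. 11, the first stage is
given by (7), where $x_1$ is the $r(1)$ of section 2.1.

$$\begin{cases}
y_2 = x_1 \\
x_2 = x_1
\end{cases} \tag{7}$$

$$\begin{cases}
y_3 = y_2 + \dfrac{x_2}{2} = \dfrac{3x_1}{2} = 0.9132 \\
x_3 = x_2 - \dfrac{y_2}{2} = \dfrac{x_1}{2} = 0.3044
\end{cases} \tag{8}$$

$$\begin{cases}
y_3 = y_2 - \dfrac{x_2}{2} = \dfrac{x_1}{2} = 0.3044 \\
x_3 = x_2 + \dfrac{y_2}{2} = \dfrac{3x_1}{2} = 0.9132
\end{cases} \tag{9}$$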
In the second stage, the starting vector coordinates can be obtained.
When the input is larger than 45 degrees, the coordinates shown in (8) are
used, and when it is smaller than 45 degrees, the coordinates shown in (9)
are used. As a result, the architecture in Fig. 13 has two stages fewer than
the one in Fig. 11.
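The selection between (8) and (9) can be sketched as follows. This is a behavioral illustration only: the function name `start_vector_two_angles` is an assumption, and the Boolean `above_45` merely plays the role that the first sign bit plays for the two MUXes in hardware.

```python
import math

X1 = 0.6088   # length of the first vector, r(1), quoted in section 2.1

def start_vector_two_angles(angle_deg, x1=X1):
    """Return (x3, y3) from (8) or (9), chosen by the 45 degree comparison."""
    above_45 = angle_deg > 45.0      # stands in for the first sign bit
    if above_45:
        return x1 / 2, 3 * x1 / 2    # coordinates of (8)
    return 3 * x1 / 2, x1 / 2        # coordinates of (9)

for a in (30.0, 60.0):
    x3, y3 = start_vector_two_angles(a)
    # The angle of the selected start vector should match (6).
    print(a, round(x3, 4), round(y3, 4), round(math.degrees(math.atan2(y3, x3)), 2))
```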
2.3. Four starting angles
Fig. 4 shows the algorithm with four starting angles. 4 degrees, 30
degrees, 60 degrees, and 86 degrees are used as four example input angles.
The rotations depend on comparisons between the input angle and three
different angles, 18.43 degrees and 71.56 degrees from (6) and 45 degrees,
which are implemented with six MUXes in hardware. The architecture is
shown in Fig. 14 in section 4.2.4.
Fig. 4. Four starting angles (two stages)

$$v(1) = \begin{cases}
\arctan(1) - \arctan(1/2) - \arctan(1/4) = 4.40 \\
\arctan(1) - \arctan(1/2) + \arctan(1/4) = 32.47 \\
\arctan(1) + \arctan(1/2) - \arctan(1/4) = 57.53 \\
\arctan(1) + \arctan(1/2) + \arctan(1/4) = 85.60
\end{cases} \tag{10}$$
Equation (10) gives the starting vector angles for input angles in four
different regions. When the input angle is between 0 degrees and 18.43
degrees, the starting vector angle is 4.40 degrees. When the input angle is
between 18.43 degrees and 45 degrees, the starting vector angle is 32.47
degrees. When the input angle is between 45 degrees and 71.56 degrees, the
starting vector angle is 57.53 degrees. When the input angle is between
71.56 degrees and 90 degrees, the starting vector angle is 85.60 degrees.
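With the same methodology as described in section 2.2, the third stage in
Fig. 11 is given by (11), (12), (13), and (14).

$$\begin{cases}
y_4 = y_3 - \dfrac{x_3}{4} = \dfrac{x_1}{2} - \dfrac{3x_1}{8} = \dfrac{x_1}{8} = 0.0761 \\
x_4 = x_3 + \dfrac{y_3}{4} = \dfrac{3x_1}{2} + \dfrac{x_1}{8} = \dfrac{13x_1}{8} = 0.9893
\end{cases} \tag{11}$$

$$\begin{cases}
y_4 = y_3 + \dfrac{x_3}{4} = \dfrac{x_1}{2} + \dfrac{3x_1}{8} = \dfrac{7x_1}{8} = 0.5327 \\
x_4 = x_3 - \dfrac{y_3}{4} = \dfrac{3x_1}{2} - \dfrac{x_1}{8} = \dfrac{11x_1}{8} = 0.8371
\end{cases} \tag{12}$$

$$\begin{cases}
y_4 = y_3 - \dfrac{x_3}{4} = \dfrac{3x_1}{2} - \dfrac{x_1}{8} = \dfrac{11x_1}{8} = 0.8371 \\
x_4 = x_3 + \dfrac{y_3}{4} = \dfrac{x_1}{2} + \dfrac{3x_1}{8} = \dfrac{7x_1}{8} = 0.5327
\end{cases} \tag{13}$$

$$\begin{cases}
y_4 = y_3 + \dfrac{x_3}{4} = \dfrac{3x_1}{2} + \dfrac{x_1}{8} = \dfrac{13x_1}{8} = 0.9893 \\
x_4 = x_3 - \dfrac{y_3}{4} = \dfrac{x_1}{2} - \dfrac{3x_1}{8} = \dfrac{x_1}{8} = 0.0761
\end{cases} \tag{14}$$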
When the input angle is between 0 degrees and 18.43 degrees, the
coordinates shown in (11) are used. When the input angle is between 18.43
degrees and 45 degrees, the coordinates shown in (12) are used. When the
input angle is between 45 degrees and 71.56 degrees, the coordinates shown
in (13) are used. When the input angle is between 71.56 degrees and 90
degrees, the coordinates shown in (14) are used. As a result, the
architecture in Fig. 14 has three stages fewer than the one in Fig. 11.
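The four-region selection can be sketched in the same behavioral style. The names `start_vector_four_angles`, `sgn1`, and `sgn2` are assumptions for this illustration; the two comparisons stand in for the first two sign bits, and the table holds the multiples of $x_1/8$ from (11) to (14).

```python
import math

X1 = 0.6088   # length of the first vector, r(1), quoted in section 2.1

# (sgn1, sgn2) -> (x4, y4) expressed in eighths of x1, from (11) to (14)
START_TABLE = {
    (False, False): (13, 1),   # (11): 0 to 18.43 degrees
    (False, True): (11, 7),    # (12): 18.43 to 45 degrees
    (True, False): (7, 11),    # (13): 45 to 71.56 degrees
    (True, True): (1, 13),     # (14): 71.56 to 90 degrees
}

def start_vector_four_angles(angle_deg, x1=X1):
    """Return (x4, y4) selected by two successive comparisons."""
    sgn1 = angle_deg >= 45.0                    # comparison with 45 degrees
    threshold = 71.5651 if sgn1 else 18.4349    # 45 +/- 26.5651 degrees
    sgn2 = angle_deg >= threshold
    kx, ky = START_TABLE[(sgn1, sgn2)]
    return kx * x1 / 8, ky * x1 / 8

for a in (4.0, 30.0, 60.0, 86.0):
    x4, y4 = start_vector_four_angles(a)
    print(a, round(x4, 4), round(y4, 4), round(math.degrees(math.atan2(y4, x4)), 2))
```

The printed angles of the selected start vectors can be compared with the four values in (10).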
3. Software simulation
In this section, the outputs of the designs are simulated in
MATLAB. The error behavior and the accuracy are also tested in this
section.
3.1. Outputs simulation
In this section, the original CORDIC architecture, shown in Fig. 11, is
simulated in MATLAB. The outputs of the sine and cosine functions are
shown in Fig. 5. To improve the accuracy, 18 stages are used. Note that all
architectures give the same simulation result.
[Plot: outputs (-0.2 to 1.2) versus input angle in degrees (0 to 90); curves: approximated cosine outputs, approximated sine outputs, theoretical sine outputs, theoretical cosine outputs]
Fig. 5. MATLAB simulation of CORDIC design
Fig. 5 shows the approximated cosine outputs, theoretical cosine
outputs, approximated sine outputs, and theoretical sine outputs for
various input angles. The green line matches the black line, and the blue
line matches the red line. That is, the simulated CORDIC functions
approximate the theoretical (floating point) functions, which is acceptable.
3.2. Error behavior
The errors are tested with all 32768 possible input angles, where
$32768 = 2^{15}$, which corresponds to 15-bit inputs in hardware. Fig. 6
shows the error of the CORDIC cosine function after truncation to 15 bits.
[Plot: error (-2.5 to 0.5, scale x 10^-5) versus input angle in degrees (0 to 90)]
Fig. 6. The cosine error after truncation
Fig. 6 indicates that, as the input angle increases, the error increases
from $-2.4 \times 10^{-5}$ to $0.4 \times 10^{-5}$. The distribution of the
cosine error is shown as a histogram of the error function in Fig. 7.
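An error sweep of this kind can be sketched as below. This is not the MATLAB model of the thesis: the mapping of the 15-bit input codes onto 0 to 90 degrees, the floating-point internal arithmetic, and the truncation of only the final output to 15 fractional bits are all assumptions, so the numbers it produces need not match Fig. 6.

```python
import math

STAGES = 18
COEFF = [math.atan(2.0 ** -k) for k in range(STAGES)]   # radians
GAIN = math.prod(math.cos(a) for a in COEFF)            # start length

def cordic_cos(theta):
    """Cosine from an 18-stage floating-point CORDIC rotation."""
    x, y, z = GAIN, 0.0, theta
    for k in range(STAGES):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -k, y + d * x * 2.0 ** -k
        z -= d * COEFF[k]
    return x

def truncate(value, frac_bits=15):
    """Drop everything below 2**-frac_bits (rounding toward minus infinity)."""
    scale = 1 << frac_bits
    return math.floor(value * scale) / scale

def cosine_errors(input_bits=15):
    n = 1 << input_bits
    errors = []
    for code in range(n):
        theta = (code / n) * (math.pi / 2)   # assumed mapping of codes to angles
        errors.append(truncate(cordic_cos(theta)) - math.cos(theta))
    return errors

errors = cosine_errors()
print(len(errors), min(errors), max(errors))
```

Because the output is truncated downwards, most of the errors produced by this sketch are negative, which is qualitatively the behavior described for Fig. 6.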
[Histogram: number of errors (0 to 4500) versus error (-2.5 to 0.5, scale x 10^-5)]
Fig. 7. The cosine error distribution
Fig. 7 indicates that the peak of the cosine error distribution is located
at $-2.4 \times 10^{-5}$. Most of the errors are located away from zero,
which is a drawback. The absolute cosine error in dB is shown in Fig. 8.
[Plot: the absolute cosine error in dB (-200 to -80) versus input angle in degrees (0 to 90)]
Fig. 8. The absolute cosine error in dB
Since binary numbers and decibels (dB) match very well, displaying the
errors in dB simplifies their interpretation. The error in dB is given by
(15), where $x$ is the error.

$$x_{dB} = 20\log_{10}|x| \tag{15}$$
3.3. Accuracy
In binary, one bit can represent two values, 0 and 1. That is,
$20\log_{10}(2) \approx 6$ dB corresponds to one binary bit of resolution.
The logarithmic error divided by $-20\log_{10}(2)$ gives the number of
bits, as shown in (16).

$$n = -x_{dB} / 20\log_{10}(2) \tag{16}$$
Fig. 9 shows the number of error bits for all possible input angles. The
peak error corresponds to 15.33 bits, which gives an accuracy of 15.33 bits.
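Equations (15) and (16) translate directly into code. The function names `error_db` and `error_bits` are assumptions for this sketch, and the sample error magnitude $2.4 \times 10^{-5}$ is taken from the description of Fig. 6.

```python
import math

def error_db(x):
    """Error in dB according to (15). Raises ValueError for x == 0."""
    return 20.0 * math.log10(abs(x))

def error_bits(x):
    """Number of bits according to (16)."""
    return -error_db(x) / (20.0 * math.log10(2.0))

print(round(error_db(2.4e-5), 1), round(error_bits(2.4e-5), 2))
print(error_bits(2.0 ** -15))   # an error of exactly one LSB at 15 bits
```

An error magnitude of $2.4 \times 10^{-5}$ comes out slightly above 15.3 bits in this sketch, which is close to the 15.33 bits read from Fig. 9.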
[Plot: number of bits (14 to 34) versus input angle in degrees (0 to 90); curves: sine, cosine]
Fig. 9. Degrees and the number of error bits
4. Hardware implementation
In this section, the design flow for the thesis is presented and the five
different unrolled CORDIC architectures are implemented in hardware.
IO65LPHVT_SF_1V8_50A_7M4X0Y2Z_nom_1.00V_1.80V_25C.db and
CORE65LPHVT_nom_1.20V_25C.db are the libraries used in Design
Compiler and PrimeTime. In other words, a low power high VT (LPHVT)
technology is used at a supply voltage of 1.2V.
4.1. Design flow
[Flow chart, step (tool): design requirements; MATLAB simulation (MATLAB); RTL simulation (ModelSim); verification (MATLAB); synthesis (Design Compiler); post-synthesis verification (ModelSim); power analysis (PrimeTime tool)]
Fig. 10. The design flow
Fig. 10 shows the design flow for the thesis. The hardware
implementation of the CORDIC design starts with the MATLAB
simulation. Once the MATLAB model satisfies the requirements of the
design, the design is coded in VHDL. The VHDL code is simulated in
ModelSim, and the result is compared to the MATLAB model. A netlist
and a Value Change Dump (VCD) file are generated by Design Compiler and
ModelSim, respectively, and are used for power analysis with the PrimeTime
tool.
4.2. Hardware architectures and simulation results
In this section, five different CORDIC architectures are implemented in
hardware. One is the original CORDIC architecture. The other four
architectures reduce the complexity to different degrees.
4.2.1. The original unrolled CORDIC architecture
The original CORDIC architecture, which corresponds to the rotations
in Fig. 2, is shown in Fig. 11. Only adders and inverters are used in the
architecture. Multiplications by factors such as 1/2, 1/4, and 1/8 are
realized as hard-wired shifts on the ASIC. Note that the figure only
shows the first 5 stages of the 18-stage CORDIC that is used for the
simulations.
[Block diagram: an upper row of ADD/SUB units takes the input angle α and the coefficient angles 45, 27, 14, 7 and produces the sgn bits; two further rows of ADD/SUB units with wired shifts 1, 1/2, 1/4, 1/8, 1/16 compute y(1) to y(5) and x(1) to x(5); outputs sin(α) and cos(α)]
Fig. 11. An unrolled 5-stage CORDIC architecture
In Fig. 11, $x(1) = 0.6088$ and $y(1) = 0$ are the first vector coordinates
and $a$ is the input angle. At the top, 45 degrees, 26.5651 degrees, 14.0362
degrees, and 7.1250 degrees are the coefficient angles obtained from (1).
There are 19 coefficient angles in the design. A total of 19 sign bits (sgn)
result from the comparisons between the input angle and the coefficient
angles, which are carried out in the upper adder row. The middle and the
lower adder rows compute the approximations from left to right, where the
output is generated. The sign bits determine whether each adder in the
middle and the lower rows adds or subtracts.
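The interplay of the three adder rows can be illustrated with an integer shift-and-add model. This is a sketch under stated assumptions, not the VHDL of the thesis: the 20-bit fractional word format, the representation of angles in scaled degrees, and the name `unrolled_cordic` are all chosen for illustration. The right shift `>> k` stands in for the hard-wired shifts.

```python
import math

FRAC_BITS = 20            # assumed internal precision, not taken from the thesis
STAGES = 18
ONE = 1 << FRAC_BITS

# Coefficient angles from (1) in degrees, stored as scaled integers.
COEFF = [round(math.degrees(math.atan(2.0 ** -k)) * ONE) for k in range(STAGES)]
# Start length chosen so that the last vector ends up on the unit circle.
X_START = round(math.prod(math.cos(math.atan(2.0 ** -k)) for k in range(STAGES)) * ONE)

def unrolled_cordic(angle_deg):
    """Return (cos, sin) using only additions, subtractions, and shifts."""
    z = round(angle_deg * ONE)     # remaining angle, upper adder row
    x, y = X_START, 0              # the two coordinate rows
    for k in range(STAGES):
        sgn = z >= 0               # sign bit produced by the upper row
        if sgn:
            x, y, z = x - (y >> k), y + (x >> k), z - COEFF[k]
        else:
            x, y, z = x + (y >> k), y - (x >> k), z + COEFF[k]
    return x / ONE, y / ONE

for a in (0.0, 30.0, 60.0, 90.0):
    c, s = unrolled_cordic(a)
    print(a, round(c, 4), round(s, 4))
```

In an unrolled architecture each loop iteration corresponds to one column of adders, so the loop here only models the data flow from left to right.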
The power consumption at 10MHz and at the maximum frequency of
76.7MHz is shown in TABLE I and TABLE II, where the supply
voltage is 1.2V.
TABLE I. POWER AT 10MHZ
Power consumption      Frequency   Unit   LPHVT
Net switching power    10MHz       nW     129100
Cell internal power    10MHz       nW     110600
Cell leakage power     10MHz       nW     131.5
Total power            10MHz       nW     239900

TABLE II. POWER AT MAXIMUM FREQUENCY
Power consumption      Frequency   Unit   LPHVT
Net switching power    76.7MHz     nW     1714000
Cell internal power    76.7MHz     nW     1137000
Cell leakage power     76.7MHz     nW     291.8
Total power            76.7MHz     nW     2852000
The results of synthesis are shown in TABLE III, TABLE IV, TABLE
V, and TABLE VI, which report the highest speed, the minimum area, and
the maximum area obtained. The timing at minimum area constraints,
shown in TABLE IV, is tested at a frequency of 10MHz.
TABLE III. TIMING AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    76.7
Time        ns     13.03

TABLE IV. TIMING AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    31
Time        ns     32.22

TABLE V. AREA AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    35482

TABLE VI. AREA AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    100989
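The Speed and Time rows of the timing tables appear to be related by the usual reciprocal relation between delay and frequency. The short check below uses the delays from TABLE III and TABLE IV; the helper name `max_frequency_mhz` is an assumption for this sketch.

```python
def max_frequency_mhz(delay_ns):
    """Frequency in MHz whose period equals the given delay in ns."""
    return 1000.0 / delay_ns

for label, delay_ns in (("maximum speed", 13.03), ("minimum area", 32.22)):
    print(label, round(max_frequency_mhz(delay_ns), 1), "MHz")
```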
4.2.2. First stage removed architecture
A CORDIC architecture in which the first stage is removed is shown in
Fig. 12. This architecture has 2 adders fewer than the original one, i.e. 2
times 19 adder cells fewer out of a total of 19 times 19 adder cells in the
original design. The first vector coordinates are $x(2) = 0.6088$ and
$y(2) = 0.6088$.
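One way to see why the first stage can be dropped is that, for any input angle between 0 and 90 degrees, the first rotation always goes in the positive direction, so its output is the same constant vector. The sketch below illustrates this; the function name `first_stage` and the sign convention are assumptions for this example.

```python
X1 = 0.6088   # first-vector length quoted in section 2.1

def first_stage(x1, angle_deg):
    """First stage of the original architecture applied to (x1, 0)."""
    d = 1.0 if angle_deg >= 0.0 else -1.0    # sign of the first comparison
    x2 = x1 - d * 0.0                        # x(2) = x(1) -/+ y(1), with y(1) = 0
    y2 = 0.0 + d * x1                        # y(2) = y(1) +/- x(1)
    remaining = angle_deg - d * 45.0         # angle left for the later stages
    return x2, y2, remaining

for a in (0.0, 30.0, 90.0):
    print(a, first_stage(X1, a))
```

For every non-negative input the result is (0.6088, 0.6088), which matches the first vector coordinates given above and (7).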
[Block diagram: as Fig. 11 but without the first coordinate stage; the coordinate rows start at y(2) and x(2) and use wired shifts 1/2, 1/4, 1/8, 1/16; outputs sin(α) and cos(α)]
Fig. 12. First stage removed architecture
The power consumption at 10MHz and at the maximum frequency of
76.7MHz is shown in TABLE VII and TABLE VIII, where the
supply voltage is 1.2V.
TABLE VII. POWER AT 10MHZ
Power consumption      Frequency   Unit   LPHVT
Net switching power    10MHz       nW     121700
Cell internal power    10MHz       nW     104500
Cell leakage power     10MHz       nW     131.5
Total power            10MHz       nW     226300

TABLE VIII. POWER AT MAXIMUM FREQUENCY
Power consumption      Frequency   Unit   LPHVT
Net switching power    76.7MHz     nW     1692000
Cell internal power    76.7MHz     nW     1109000
Cell leakage power     76.7MHz     nW     296.2
Total power            76.7MHz     nW     2802000
The results of synthesis are shown in TABLE IX, TABLE X, TABLE
XI, and TABLE XII, which report the highest speed, the minimum area,
and the maximum area obtained.
TABLE IX. TIMING AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    76.7
Time        ns     13.03

TABLE X. TIMING AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    31
Time        ns     32.21

TABLE XI. AREA AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    35394

TABLE XII. AREA AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    100484
4.2.3. Two stages eliminated architecture
A CORDIC architecture in which two stages are removed is shown in Fig.
13, which corresponds to the rotations in Fig. 3. This architecture has four
adders fewer than the original one. Two of these adders are replaced by two
MUXes, which are controlled by the first sgn bit in the upper adder row.
The two hard-wired shifts (1/2) are also removed compared to the
original architecture. As described in section 2.2, the first vector
coordinates $(x(3), y(3))$ are given in (8) and (9).
[Block diagram: upper row of ADD/SUB units with the input angle α and the coefficient angles 45, 27, 14, 7 producing the sgn bits; two MUXes controlled by the first sgn bit select between y(3)1 and y(3)2 and between x(3)1 and x(3)2; the remaining ADD/SUB stages use wired shifts 1/4, 1/8, 1/16; outputs sin(α) and cos(α)]
Fig. 13. Two stages are removed in the architecture
The power consumption at 10MHz and at the maximum frequency of
83.1MHz is shown in TABLE XIII and TABLE XIV, where the
supply voltage is 1.2V.
TABLE XIII. POWER AT 10MHZ
Power consumption      Frequency   Unit   LPHVT
Net switching power    10MHz       nW     115700
Cell internal power    10MHz       nW     96490
Cell leakage power     10MHz       nW     126.2
Total power            10MHz       nW     212400

TABLE XIV. POWER AT MAXIMUM FREQUENCY
Power consumption      Frequency   Unit   LPHVT
Net switching power    83.1MHz     nW     1615000
Cell internal power    83.1MHz     nW     1071000
Cell leakage power     83.1MHz     nW     319.2
Total power            83.1MHz     nW     2687000
The results of the synthesis are shown in TABLE XV, TABLE XVI,
TABLE XVII, and TABLE XVIII, which report the highest speed and the
minimum area obtained.
TABLE XV. TIMING AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    83.1
Time        ns     12.03

TABLE XVI. TIMING AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Speed       MHz    31.5
Time        ns     31.78

TABLE XVII. AREA AT MINIMUM AREA CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    33660

TABLE XVIII. AREA AT MAXIMUM SPEED CONSTRAINTS
Parameter   Unit   LPHVT
Voltage     V      1.2
Area        µm²    109840
4.2.4. Three stages eliminated architecture
A CORDIC architecture where three stages are removed is shown in
Fig. 14, which corresponds to the rotations in Fig. 4. This architecture has six
fewer adders than the original one. Four of these adders are replaced by six
MUXes, which are controlled by the first two sgn bits in the upper adder row.
As described in section 2.3, the first vector coordinates (x(4), y(4)) are
given in (11), (12), (13), and (14).
[Figure: unrolled CORDIC block diagram. Angle path with input α and coefficient angles 45, 27, 14, 7; MUXes controlled by two sgn bits selecting among x(3)1 to x(3)4 and y(3)1 to y(3)4; ADD/SUB stages with 1/8 and 1/16 shifts producing sin(α) and cos(α).]
Fig. 14. Three stages are removed architecture
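The idea of replacing the first three x/y adder stages by a selection among precomputed constants can be sketched in floating point as follows. This is a behavioural illustration only, not the fixed-point hardware of the thesis. The iteration count of 19 (taken from the number of coefficient angles in Appendix A), the gain-compensated start vector, and all function names are assumptions.

```python
import math

N_ITER = 19  # assumed, matching the 19 coefficient angles in Appendix A
ANGLES = [math.degrees(math.atan(2.0 ** -i)) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(N_ITER))


def step(x, y, d, i):
    """One micro-rotation by d * atan(2^-i)."""
    return x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i


def sgn(z):
    return 1 if z >= 0 else -1


def cordic_original(alpha_deg):
    """Plain unrolled CORDIC in rotation mode; returns (cos, sin)."""
    x, y, z = 1.0 / GAIN, 0.0, alpha_deg
    for i in range(N_ITER):
        d = sgn(z)
        x, y = step(x, y, d, i)
        z -= d * ANGLES[i]
    return x, y


def _precompute(d1, d2):
    """Vector after three rotations with directions (+1, d1, d2)."""
    x, y = 1.0 / GAIN, 0.0
    for i, d in enumerate((1, d1, d2)):
        x, y = step(x, y, d, i)
    return x, y


# Four constant candidates for the vector after three rotations,
# indexed by the two sgn values from the angle path.
MUX_TABLE = {(d1, d2): _precompute(d1, d2)
             for d1 in (1, -1) for d2 in (1, -1)}


def cordic_three_removed(alpha_deg):
    """Same result, with the first three x/y stages replaced by a lookup."""
    z = alpha_deg - ANGLES[0]  # first rotation is always positive
    d1 = sgn(z)
    z -= d1 * ANGLES[1]
    d2 = sgn(z)
    z -= d2 * ANGLES[2]
    x, y = MUX_TABLE[(d1, d2)]  # the MUX selection
    for i in range(3, N_ITER):
        d = sgn(z)
        x, y = step(x, y, d, i)
        z -= d * ANGLES[i]
    return x, y
```

For input angles between 0 and 90 degrees, both functions perform the same arithmetic on the selected path, so they return the same cosine and sine values; only the number of x/y additions differs.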
The power consumption at 10MHz and at the maximum frequency, which is
90.7MHz, is shown in TABLE XIX and TABLE XX, respectively, where the
supply voltage is 1.2V.
TABLE XIX. POWER AT 10MHZ
Power consumption      Frequency   Unit   LPHVT
Net switching power    10MHz       nW     97410
Cell internal power    10MHz       nW     88820
Cell leakage power     10MHz       nW     123.5
Total power            10MHz       nW     186400
TABLE XX. POWER AT MAXIMUM FREQUENCY
Power consumption      Frequency   Unit   LPHVT
Net switching power    90.7MHz     nW     1396000
Cell internal power    90.7MHz     nW     922900
Cell leakage power     90.7MHz     nW     308.8
Total power            90.7MHz     nW     2319000
The results of the synthesis are shown in TABLE XXI, TABLE XXII,
TABLE XXIII, and TABLE XXIV, which give the timing and area under the
maximum speed and minimum area constraints.
TABLE XXI. TIMING AT MAXIMUM SPEED CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Speed      MHz    90.7
Time       ns     11.03

TABLE XXII. TIMING AT MINIMUM AREA CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Speed      MHz    32.3
Time       ns     30.93

TABLE XXIII. AREA AT MINIMUM AREA CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Area       um2    33407

TABLE XXIV. AREA AT MAXIMUM SPEED CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Area       um2    119512
4.2.5. Final architecture
A CORDIC architecture where three stages and one upper adder are
removed is shown in Fig. 15. Compared to the architecture in Fig. 14, the
first adder of the coefficient angles is replaced by a MUX and a Sgn
detector. This means that seven MUXes and a Sgn detector are introduced
into the architecture. The first vector coordinates (x(4), y(4)) are again
given in (11), (12), (13), and (14).
[Figure: unrolled CORDIC block diagram. A MUX controlled by the Sgn detector selects between the constants 45+27 and 45-27 for the first ADD/SUB of the angle path, followed by coefficient angles 14 and 7; MUXes select among x(3)1 to x(3)4 and y(3)1 to y(3)4; ADD/SUB stages with 1/8 and 1/16 shifts produce sin(α) and cos(α).]
Fig. 15. Final architecture
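The replacement of the first coefficient adder by a MUX can be sketched on integer angle words. The sketch assumes the 23-bit format of Appendix A with 15 fractional bits (inferred from the binary value of 45 degrees), the usual rotation-mode sign convention, and an idealised comparison with 45 degrees in place of the 6-bit Sgn detector. The function names are hypothetical.

```python
FRAC_BITS = 15                    # assumed from the binary angles in Appendix A
A1 = 0b00101101000000000000000    # 45 degrees (α1 in Appendix A)
A2 = 0b00011010100100001010011    # 26.565... degrees (α2 in Appendix A)


def residual_two_adders(alpha):
    """Original angle path: two cascaded adders."""
    z1 = alpha - A1               # first rotation is always positive
    return z1 - A2 if z1 >= 0 else z1 + A2


def residual_mux(alpha):
    """Final architecture idea: a MUX picks a constant, one adder remains."""
    sgn = 1 if alpha >= A1 else 0  # idealised detector, not the 6-bit logic
    const = A1 + A2 if sgn else A1 - A2
    return alpha - const
```

Both functions return the same residual angle for every input word, which is why one adder in the upper row can be traded for a MUX and a detector.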
The Sgn detector detects whether the input angle is larger than 45
degrees. If it is, the Sgn detector's output is 1; otherwise the output
is 0. The Sgn detector uses only the upper 6 bits of the input angle and is
built from inverters and NAND gates in hardware. The architecture of the
Sgn detector is shown in Fig. 16.
[Figure: gate-level diagram with inputs αbit21, αbit20, αbit19, αbit18, αbit17, and αbit16.]
Fig. 16. The Sgn detector
The logic function of the Sgn detector is shown in (17).
$$\mathit{Sgn} = a_{bit21} + a_{bit20}\,a_{bit19} + a_{bit20}\,a_{bit18}\,a_{bit17}\,a_{bit16} \tag{17}$$
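The logic function in (17) can be tried out in software with a minimal sketch. It assumes that the angle word uses the 23-bit format of Appendix A with 15 fractional bits, so that bit 15 weighs 1 degree and bit 21 weighs 64 degrees. The helper `encode` and the function name `sgn_detector` are hypothetical.

```python
FRAC_BITS = 15  # assumed from the binary value of 45 degrees in Appendix A


def encode(angle_deg):
    """Convert an angle in degrees to the assumed fixed-point word."""
    return int(round(angle_deg * 2 ** FRAC_BITS))


def sgn_detector(alpha_word):
    """Evaluate (17) on the upper six bits (21 down to 16) of the word."""
    def bit(n):
        return (alpha_word >> n) & 1

    return (bit(21)
            | (bit(20) & bit(19))
            | (bit(20) & bit(18) & bit(17) & bit(16)))
```

Under the assumed format, bit 16 weighs 2 degrees, so the expression evaluates to 1 for angles from 46 degrees upwards and to 0 below that.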
The power consumption at 10MHz and at the maximum frequency, which is
99.6MHz, is shown in TABLE XXV and TABLE XXVI, respectively, where
the supply voltage is 1.2V.
TABLE XXV. POWER AT 10MHZ
Power consumption      Frequency   Unit   LPHVT
Net switching power    10MHz       nW     91220
Cell internal power    10MHz       nW     83460
Cell leakage power     10MHz       nW     123.5
Total power            10MHz       nW     174800

TABLE XXVI. POWER AT MAXIMUM FREQUENCY
Power consumption      Frequency   Unit   LPHVT
Net switching power    99.6MHz     nW     1211000
Cell internal power    99.6MHz     nW     932500
Cell leakage power     99.6MHz     nW     316.1
Total power            99.6MHz     nW     2044000
The results of the synthesis are shown in TABLE XXVII, TABLE XXVIII,
TABLE XXIX, and TABLE XXX, which give the timing and area under the
maximum speed and minimum area constraints.
TABLE XXVII. TIMING AT MAXIMUM SPEED CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Speed      MHz    99.6
Time       ns     10.04

TABLE XXVIII. TIMING AT MINIMUM AREA CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Speed      MHz    32.3
Time       ns     30.95

TABLE XXIX. AREA AT MINIMUM AREA CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Area       um2    33394

TABLE XXX. AREA AT MAXIMUM SPEED CONSTRAINTS
           Unit   LPHVT
Voltage    V      1.2
Area       um2    135193
5. Results
In this section, a test setup for the designs is introduced. The area,
timing information, and power consumption are also analyzed.
5.1. Test setup
Fig. 17 shows the test setup of the designs. The input to the
testbench is generated by software simulation and written to a text
file. The output from the top design is also verified by the
software.
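[Figure: block diagram. The software simulation provides the input to the testbench and receives the output; the testbench drives the top design with clk, rst, and enable.]
Fig. 17. The design test setup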
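A software side of such a setup could look like the following sketch. The stimulus format (one 23-bit binary word per line, 15 fractional bits), the even spacing of the test angles, and both function names are assumptions for illustration; the thesis does not specify them.

```python
import math

FRAC_BITS = 15   # assumed fixed-point format, as in Appendix A
WORD_BITS = 23


def stimulus_lines(n_vectors):
    """Return n_vectors angle words, evenly spaced over 0 to 90 degrees,
    each formatted as a binary string for a testbench input file."""
    lines = []
    for k in range(n_vectors):
        deg = 90.0 * k / (n_vectors - 1)
        word = round(deg * 2 ** FRAC_BITS)
        lines.append(format(word, "0{}b".format(WORD_BITS)))
    return lines


def max_error(lines, outputs):
    """Largest absolute deviation of the design's (sin, cos) outputs
    from the math library reference, given the stimulus lines."""
    worst = 0.0
    for line, (s, c) in zip(lines, outputs):
        rad = math.radians(int(line, 2) / 2 ** FRAC_BITS)
        worst = max(worst, abs(s - math.sin(rad)), abs(c - math.cos(rad)))
    return worst
```

The returned lines can be written to a text file that the testbench reads, and the values read back from the simulation can be passed to `max_error` for verification.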
5.2. The area
The minimum area is estimated with the set_max_area 0 command in Design
Compiler. The areas of the five CORDIC architectures under the minimum
area constraint are shown in Fig. 18.
[Figure: area (um2) versus frequency (MHz) for the original, first stage eliminated, two stages eliminated, three stages eliminated, and final architectures.]
Fig. 18. Area at different frequencies
The minimum areas are listed in TABLE XXXI. The final
architecture has the lowest area, 33200 um2.
TABLE XXXI. AREA AT MINIMUM AREA CONSTRAINTS
Architectures                           Minimum area (um2)
The original architecture               36067
First stage eliminated architecture     35966
Two stages eliminated architecture      33999
Three stages eliminated architecture    33389
Final architecture                      33200
TABLE XXXI indicates that the minimum area of the designs decreases as
the complexity is reduced. The area of the final architecture is 7.9%
lower than that of the original architecture. The areas at maximum speed
constraints for the five architectures are shown in TABLE XXXII.
TABLE XXXII. AREA AT MAXIMUM SPEED CONSTRAINTS
Architectures                           Area (um2)
The original architecture               100989
First stage eliminated architecture     100484
Two stages eliminated architecture      109840
Three stages eliminated architecture    119512
Final architecture                      135193
5.3. Timing information
The maximum speed is estimated without any area constraint. The highest
clock frequency that can be specified with a slack of zero corresponds to
the maximum speed. The timing information gives the bottleneck at the
highest frequency, shown in TABLE XXXIII.
TABLE XXXIII. TIMING AT MAXIMUM SPEED CONSTRAINTS
Architectures                           Critical path (ns)   Frequency (MHz)
The original architecture               13.03                76.7
First stage eliminated architecture     13.03                76.7
Two stages eliminated architecture      12.03                83.1
Three stages eliminated architecture    11.03                90.7
Final architecture                      10.04                99.6
TABLE XXXIII indicates that the maximum frequency of the designs
increases as the complexity is reduced. The final architecture, which is
the most optimized one, is the fastest, with a critical path of 10.04ns,
corresponding to a frequency of $9.96 \times 10^{7}$ Hz. The speed is
22.9% higher, measured as the reduction in critical path relative to the
original architecture. The timing at minimum area constraints is shown in
TABLE XXXIV.
TABLE XXXIV. TIMING AT MINIMUM AREA CONSTRAINTS
Architectures                           Time (ns)   Speed (MHz)
The original architecture               32.22       31
First stage eliminated architecture     32.21       31
Two stages eliminated architecture      31.78       31.5
Three stages eliminated architecture    30.93       32.3
Final architecture                      30.95       32.3
5.4. Power consumption
5.4.1. Power analysis
The power of CMOS transistors consists of dynamic power and static
power, as shown in (18) [6].
$$P_{tot} = P_{dynamic} + P_{static} \tag{18}$$
The dynamic power consists of the switching power and the internal
power, as shown in (19).
$$P_{dynamic} = P_{switching} + P_{internal} \tag{19}$$
The dynamic power can be written as in (20).
$$P_{dynamic} = a C V^{2} f \tag{20}$$
Here the factor $a$ is the switching activity, $C$ is the node capacitance, $f$
is the clock frequency, and $V$ is the supply voltage. The static power comes
mainly from the subthreshold leakage current.
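The relations (18) to (20) can be turned into a small numerical sketch. It assumes that the switching activity, capacitance, and supply voltage stay fixed, so that the dynamic power scales linearly with the clock frequency, and that the static power does not depend on the frequency. The reference values are those reported for the final architecture at 10MHz in TABLE XXXVI; the function names are hypothetical.

```python
def dynamic_power(f_hz, p_dyn_ref_w, f_ref_hz):
    """Eq. (20) with a, C and V held fixed: proportional to f."""
    return p_dyn_ref_w * f_hz / f_ref_hz


def total_power(f_hz, p_dyn_ref_w, f_ref_hz, p_static_w):
    """Eq. (18): dynamic plus static power (static assumed constant)."""
    return dynamic_power(f_hz, p_dyn_ref_w, f_ref_hz) + p_static_w


def crossover_frequency(p_dyn_ref_w, f_ref_hz, p_static_w):
    """Frequency at which the dynamic power equals the static power."""
    return f_ref_hz * p_static_w / p_dyn_ref_w


# Final architecture at 10 MHz (TABLE XXXVI), converted to watts.
P_SWITCHING = 0.09122e-3
P_INTERNAL = 0.08346e-3
P_LEAKAGE = 123.5e-9
F_REF = 10e6

p_dyn = P_SWITCHING + P_INTERNAL          # eq. (19)
p_tot = total_power(F_REF, p_dyn, F_REF, P_LEAKAGE)
f_cross = crossover_frequency(p_dyn, F_REF, P_LEAKAGE)
```

Under these assumptions the total at 10MHz reproduces the tabulated 0.1748 mW, and the dynamic and static contributions become equal at a clock frequency of roughly 7 kHz.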
TABLE XXXV and TABLE XXXVI show the power consumption at
the maximum speed constraints and at the minimum area constraints,
respectively. The power at minimum area constraints is evaluated at a
frequency of 10MHz.
TABLE XXXV. POWER AT MAXIMUM SPEED CONSTRAINTS
Architectures                          Frequency   Switching    Internal     Leakage   Total
                                       (MHz)       power (mW)   power (mW)   (nW)      power (mW)
The original architecture              76.7        1.714        1.137        291.8     2.852
First stage eliminated architecture    76.7        1.692        1.109        296.2     2.802
Two stages eliminated architecture     83.1        1.615        1.071        319.2     2.687
Three stages eliminated architecture   90.7        1.396        0.9229       308.8     2.319
Final architecture                     99.6        1.211        0.8325       316.1     2.044
TABLE XXXVI. POWER AT MINIMUM AREA CONSTRAINTS
Architectures                          Frequency   Switching    Internal     Leakage   Total
                                       (MHz)       power (mW)   power (mW)   (nW)      power (mW)
The original architecture              10          0.1291       0.1106       131.5     0.2399
First stage eliminated architecture    10          0.1217       0.1045       131.5     0.2263
Two stages eliminated architecture     10          0.1157       0.0964       126.2     0.2124
Three stages eliminated architecture   10          0.09741      0.0882       123.5     0.1864
Final architecture                     10          0.09122      0.08346      123.5     0.1748
The power consumption for the five different CORDIC architectures at
various frequencies is shown in Fig. 19.
[Figure: power (nW) versus frequency (Hz) on logarithmic axes for the original, first stage eliminated, two stages eliminated, three stages eliminated, and final architectures.]
Fig. 19. Power at various frequencies
[Figure: the same curves, magnified around frequencies of 10^6.63 to 10^6.73 Hz.]
Fig. 20. Magnified diagram for Fig. 19
Fig. 19 shows the power consumption at different frequencies for the
five architectures described in section 4. The five curves lie close together,
but Fig. 20, the magnified diagram for Fig. 19, indicates that the final
architecture consumes the least dynamic power. It also shows that the power
consumption decreases as adder stages are eliminated. There is a knee in
the figure at low frequencies. This is because at low frequencies the static
power is much larger than the dynamic power, and the static power changes
only a little with the frequency.
As a result, the area is 7.9% lower in the final architecture. A substantial
improvement can be seen in the power consumption, which is 27.2% lower
in the final architecture than in the original one, and in the speed, which
is 22.9% higher.
The area is much higher at maximum speed, because many more logic
gates are used to achieve the higher speed.
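The quoted percentages can be checked against the tabulated values with a short calculation. The inputs are taken from TABLE XXXI, TABLE XXXIII, and TABLE XXXVI; the helper name `pct_change` is hypothetical, and the choice of which table each percentage is based on is an assumption.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Original architecture versus final architecture.
area_change = pct_change(36067, 33200)      # minimum area, um2
path_change = pct_change(13.03, 10.04)      # critical path, ns
freq_change = pct_change(76.7, 99.6)        # maximum frequency, MHz
power_change = pct_change(0.2399, 0.1748)   # total power at 10 MHz, mW
```

The area change rounds to -7.9% and the critical path change to -22.9%, which suggests that the speed figure refers to the critical path; the corresponding increase in maximum frequency is just under 30%. The printed power totals at 10MHz give a reduction of about 27.1%, close to the quoted 27.2%.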
6. Conclusions
In this thesis, MUXes are used in hardware to reduce the complexity.
Five different CORDIC architectures are implemented by successively
eliminating stages. The area, computational speed, accuracy, error behavior,
and power consumption have been analyzed through software simulation and
hardware implementation. The speed, minimum area, and power
consumption are improved to different degrees. As a result, the area and
power consumption are 7.9% and 27.2% lower, respectively, and the
speed is 22.9% higher compared to the original unrolled CORDIC
architecture. It is also shown that in unrolled CORDIC architectures, the
power consumption can be reduced by reducing the
complexity, which meets the aim of this thesis.
7. Future work
In this thesis, at most three stages are eliminated. More stages
could be eliminated to reduce the power consumption further.
More than one Sgn detector could be introduced to reduce the number of
coefficient adders.
The number of iterations could be decreased to some extent, which would
also reduce the power consumption.
References
[1] Erik Hertz and Peter Nilsson, “Parabolic Synthesis Methodology
Implemented on the Sine Function”, in Proceedings of the 2009
International Symposium on Circuits and Systems (ISCAS’09), Taipei,
Taiwan, May 24-27, 2009.
[2] Gordon K. Smyth, “Polynomial Approximation”, in Encyclopedia of
Biostatistics, Peter Armitage and Theodore Colton (eds.), John Wiley &
Sons, Ltd, Chichester, ISBN 0471 975761, 1998.
[3] J. E. Volder, “The CORDIC Trigonometric Computing Technique”,
IRE Transactions on Electronic Computers, vol. EC-8, no. 3, 1959, pp.
330–334.
[4] Y. H. Hu, “CORDIC-based VLSI architectures for digital signal
processing”, IEEE Signal Processing Magazine, pp. 16-35, ISSN:
1053-5888, July 1992.
[5] Peter Nilsson, “Complexity Reductions in Unrolled CORDIC
Architectures”, in Proceedings of the IEEE 14th International
Conference on Electronics, Circuits and Systems (ICECS 2009), pp.
868-871, Hammamet, Tunisia, December 13-16, 2009.
[6] http://en.wikipedia.org/wiki/CMOS#Power:_switching_and_leakage
Appendix A
TABLE XXXVII. COEFFICIENT ANGLES
Angle   Decimal angle          Binary angle
α1      45                     00101101000000000000000
α2      26.565032958984375     00011010100100001010011
α3      14.036224365234375     00001110000010010100011
α4      7.125000000000000      00000111001000000000000
α5      3.576324462890625      00000011100100111000101
α6      1.789886474609375      00000001110010100011011
α7      0.895172119140625      00000000111001010010101
α8      0.447601318359375      00000000011100101001011
α9      0.223785400390625      00000000001110010100101
α10     0.111877441406250      00000000000111001010010
α11     0.055938720703125      00000000000011100101001
α12     0.027954101562500      00000000000001110010100
α13     0.013977050781250      00000000000000111001010
α14     0.006988525390625      00000000000000011100101
α15     0.003479003906250      00000000000000001110010
α16     0.001739501953125      00000000000000000111001
α17     0.000854492187500      00000000000000000011100
α18     0.000427246093750      00000000000000000001110
α19     0.000213623046875      00000000000000000000111
Series of Master’s theses
Department of Electrical and Information Technology
LU/LTH-EIT 2015-460
http://www.eit.lth.se