Master's Thesis

Parabolic Synthesis and Non-Linear Interpolation

by Adeel Muhammad Hashmi

Department of Electrical and Information Technology
Faculty of Engineering, LTH, Lund University
SE-221 00 Lund, Sweden
January 2015
Abstract
Computation and implementation of unary functions such as the
trigonometric, logarithmic and exponential functions are of vital
importance in modern applications, e.g., digital signal processing,
computer graphics, wireless systems and virtual reality simulations.
Over the past few years many software solutions have been used that
provide extreme precision but take too much computation time for
real-time applications. Compared to software routines, a hardware
implementation of a unary function is the better solution for real-time
applications where fast and numerically intensive computation is
required.

This thesis work presents an approximation of the trigonometric
functions sine and cosine using Parabolic Synthesis combined with
Non-Linear Interpolation. The architecture for the approximation is
designed and implemented in the stm65 CMOS technology. A high degree of
parallelism in the design makes it faster than other methodologies for
calculating unary functions. The same architecture can be used to
implement various other unary functions, such as the logarithmic and
exponential functions, without hardware changes.

The design is compared, with respect to power consumption, area and
maximum speed, with existing methodologies such as CORDIC, Parabolic
Synthesis, and Parabolic Synthesis with Linear Interpolation. The
proposed architecture is found to have better performance in terms of
chip area, speed and power consumption.
Acknowledgments
I would like to begin by expressing my sincere thanks and gratitude
to my supervisors, Prof. Peter Nilsson and Erik Hertz, for providing
me with the opportunity to experience this research-oriented MS
thesis, entitled "Parabolic Synthesis & Non-Linear Interpolation", in
the field of digital ASIC design. I appreciate their explicit
guidance, prolific command, and remarkable knowledge of efficient
algorithms for the computation and implementation of unary functions
using innovative techniques. Without their continuous help and
guidance this thesis work would not have been possible.
Next, I would like to cordially thank Pia Bruhn, Program Coordinator
at the EIT Department, Lund University, for being a guardian and
helping me throughout my studies. She was marvelous at tackling the
problems and difficulties that a newcomer faces when arriving in a
new country, and she guided and helped me with administrative issues
in the best possible manner whenever I requested assistance.
My thanks and warm regards also go to Anna Carlqvist and Helene Von
Wachenfelt, the International Master's Coordinators, for their help
and guidance with study administration and residence-permit issues.
Then I wish to continue by thanking Dr. S.M. Yasser Sherazi and
Dr. Taimoor Abbas for their sincere guidance on how to tackle
problems step by step and move forward towards progressive
development. These two have also been a great resource for technical
discussions and knowledge sharing.
Now I turn my attention to expressing my gratitude towards my friends
and colleagues here at Lund University, without whom the time spent
in Sweden would not have been as joyful. I wholeheartedly appreciate
my friends Waqas Shafiq and Karrar Rizvi for helping me build basic
understanding, developing my competencies and skill set, and giving
me the push to complete this thesis work as part of this Master's
degree. I would like to thank Shabraiz Muhammad for proofreading my
thesis and guiding me in technical report writing. I would gladly
express my gratitude towards my colleagues Shoaib, Adnan, Naveed,
Azhar, Farhan, Sardar Sulaman, Aadil and Rizwan for the group
activities, barbecues and evening gatherings. I would also like to
thank Jovita, Minna, Erica and Justyna for their love, care and
affection during my stay in Sweden.
Last, and most important of all, I would like to thank my family,
especially my mother, for all her love, care, affection, support and
prayers.
Adeel Muhammad Hashmi
Table of Contents

Abstract
Acknowledgments
1 Introduction
2 Parabolic Synthesis and Non-Linear Interpolation
  2.1 Parabolic Synthesis
    2.1.1 Normalization
    2.1.2 First sub-function
    2.1.3 Second sub-function
    2.1.4 Sub-functions for n > 2
  2.2 Interpolation
    2.2.1 Linear Interpolation
    2.2.2 Non-linear Interpolation
  2.3 Parabolic Synthesis Combined with Interpolation
3 Hardware Architecture
  3.1 Preprocessing
  3.2 Processing
    3.2.1 Parabolic Synthesis
    3.2.2 Parabolic Synthesis with Non-Linear Interpolation
  3.3 Post processing
4 Error Evaluation
  4.1 Error Metrics
    4.1.1 Maximum Absolute Error
    4.1.2 Mean Error
    4.1.3 Standard Deviation
    4.1.4 Median Error
    4.1.5 Root-Mean-Square
  4.2 Error Distribution
5 Architecture and Coefficients Approximation
  5.1 Architecture
    5.1.1 Preprocessing
    5.1.2 Processing
    5.1.3 Post Processing
  5.2 Coefficients Approximation
    5.2.1 Linear part
    5.2.2 Non-linear part
6 Hardware Design
  6.1 Preprocessing
  6.2 Processing
  6.3 Post Processing
  6.4 Final Architecture
  6.5 Word Lengths
7 Implementation and Error Behavior
  7.1 Optimization
  7.2 Truncation
  7.3 Error Behavior
8 Results
  8.1 Synthesis
    8.1.1 Area Results
    8.1.2 Timing/Speed Results
    8.1.3 Power Results
  8.2 Existing Algorithms
9 Conclusions
10 Future Work
References
List of Figures
List of Tables
List of Acronyms
Chapter 1
1 Introduction
With the advent of chip technology, the size of technical equipment and
electronics hardware has shrunk significantly; this is widely perceived
as the future of next-generation technologies. In earlier days,
cannon-sized devices were used for complex computations and
calculations. Nowadays, tiny digital circuits and devices can perform
similar tasks while utilizing limited resources, namely memory,
execution time and power.

Over the past few years there has been an increasing demand for
ultra-lightweight, low-power and highly efficient devices. The general
public is unaware of the challenges researchers face in attaining these
objectives. Researchers work to devise methods that let equipment
provide optimum performance with effective utilization of the
aforementioned limited resources. This Master's thesis comprises a
study and comparative analysis of Parabolic Synthesis combined with
Non-Linear Interpolation. It also shows how this next-generation
computational methodology can be fruitful when its architectures are
implemented in real-time systems.
Computation and implementation of unary functions such as the
trigonometric, logarithmic and exponential functions are of vital
importance in modern applications, e.g., Digital Signal Processing
(DSP), computer graphics (2D/3D), wireless systems and virtual reality
simulations. Over the past few years many software solutions have been
used that provide extreme precision but take too much computation time
for real-time applications. Compared to software routines, a hardware
implementation of a unary function is the better solution for real-time
applications where fast and numerically intensive solutions are
required.
There are different methods for the hardware implementation of unary
functions. The simplest method uses a look-up table [1] [2]. It is an
efficient method for low-precision computations where the input
word-length is between 12 and 16 bits, which corresponds to a table
size of 4096 to 65536 words.

size = 2^n words    (1.1)

where n is the input word-length.

It can be seen from (1.1) that the table size grows exponentially with
the input word-length. For high-precision applications the execution
time therefore becomes large and unacceptable in certain cases.
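The exponential growth in (1.1) can be sketched with a few lines of Python (an illustrative sketch, not part of the thesis; the function name is ours):

```python
# Illustrative sketch of (1.1): a direct look-up table stores one word
# per possible input value, so an n-bit input needs 2**n table words.
def lut_size(n_bits):
    """Number of table words needed for an n-bit input word-length."""
    return 2 ** n_bits

for n in (12, 16, 24):
    print(n, lut_size(n))  # 12 -> 4096, 16 -> 65536, 24 -> 16777216
```

Already at a 24-bit input the table needs 16 777 216 words, which is why table-only methods are restricted to low-precision computations.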
With the evolution of industrial sectors like DSP, robotics and
communication systems, there has been an increasing demand for
high-speed hardware implementations. A variety of solutions have been
proposed, ranging from algorithms that utilize look-up tables for
low-precision computations [9] to other hardware approaches such as
CORDIC [9] and polynomial-based approximations, e.g., Taylor series
implementations [9] [14].
Polynomial-based approximation is another method used for computing
unary functions. It has the advantage of being table-less, but it
introduces considerable computational complexity, since it is performed
with multipliers and adders. The computational complexity of this
method can be reduced by combining it with look-up-table methods; the
Taylor polynomial is an example of such a scheme [3]. Designing an
efficient approximation for the function to be approximated is the key
in polynomial-based approximations [4].
The COordinate Rotation DIgital Computer (CORDIC) is a widely used
algorithm for the hardware implementation of basic elementary functions
such as the logarithmic, trigonometric and exponential functions. It
was proposed by Jack E. Volder in 1959 to provide a real-time digital
solution for navigational computations [5] [6]. It is an iterative
method that requires simple shift-and-add operations together with a
small look-up table [7]. It is therefore used in designs where aspects
like critical speed, low area and low power consumption are of vital
importance. Being iterative, it produces one extra bit of accuracy in
each rotation [8]. For higher-accuracy applications, the CORDIC method
requires more iterations to get better resolution. That increases the
execution time of the operation, making it unsuitable for very
high-speed applications.
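The iterative behavior described above can be sketched in Python (a floating-point model for illustration only; a hardware CORDIC uses fixed-point shift-and-add arithmetic, and all names here are ours):

```python
import math

def cordic_sin_cos(theta, iterations=32):
    """Rotation-mode CORDIC: each iteration contributes roughly one extra
    bit of accuracy. theta must lie within the CORDIC convergence range
    (about +/- 1.74 rad)."""
    # Precompute the micro-rotation angles atan(2^-i) and the gain K.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    k = 1.0
    for i in range(iterations):
        k *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta          # start from the gain-corrected unit vector
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x                      # (sin(theta), cos(theta))
```

Each of the 32 iterations performs only shifts and adds in hardware, which illustrates both the low-area appeal of CORDIC and the linear growth of latency with the required accuracy.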
A new methodology, Parabolic Synthesis, has recently been proposed by
Erik Hertz and Peter Nilsson to realize unary functions such as
trigonometric functions, logarithms, division and square root in
hardware [9] [10]. The parallel architecture of this method increases
performance and reduces the power, area and speed limitations compared
to previously mentioned algorithms, including CORDIC. The main feature
of the parabolic architecture is that it can be used for the
realization of different unary functions: only the coefficients need to
be changed, while the hardware remains fixed. Thus the design remains
the same and can be used directly, without changes, for other
applications [8].
In this thesis, a methodology is presented that combines parabolic
synthesis with non-linear interpolation for the realization of the
trigonometric functions sine and cosine. The parabolic methodology is a
synthesis of second-order functions, whose accuracy depends on the
number of second-order functions [7]. In the combined methodology, the
accuracy instead depends on the number of intervals in the non-linear
interpolation. Furthermore, the behavior and optimization of the
coefficients for the implementation of the sine and cosine functions
is discussed.
The proposed architecture is designed using two stages of parabolic
synthesis [11] where the second stage is implemented as a non-linear
interpolation in the stm65 CMOS technology. The design is simulated and
compared for accuracy, power consumption and performance. The core
area is also estimated. Synthesized VHDL is used in the project. Low
Power High VT and Low Power Low VT transistors are used, in separate
designs. Three different supply voltages, VDD = {1.00, 1.10, 1.20} volts
are used. The power and energy consumption, both static and dynamic, are
estimated. The design is compared, with respect to power consumption,
area and maximum speed, with the existing methodologies like CORDIC,
Parabolic Synthesis, and Parabolic Synthesis with Linear Interpolation.
Chapter 2
2 Parabolic Synthesis and Non-Linear
Interpolation
2.1 Parabolic Synthesis
The Parabolic Synthesis methodology is a hardware approach proposed by
Erik Hertz and Peter Nilsson to develop functions that perform
approximation in hardware [9]. The implementation uses a parallel
architecture, which reduces the execution time of complex computational
problems. The methodology deals with the approximation of unary
functions in hardware.

This methodology is based on second-order parabolic functions, called
sub-functions, sn(x) [7]. These sub-functions are multiplied together
to form the original function, forg(x), as shown in (2.1) [14]. The
original function is the product of all sub-functions when the number
of sub-functions approaches infinity. The function must be limited to
the range 0 <= x < 1 and 0 <= forg(x) < 1.

forg(x) = s1(x) * s2(x) * s3(x) * ...    (2.1)
In order to gradually develop sub-functions, a first help function is
determined. The first help function is the ratio between the original
function and the first sub-function:

f1(x) = forg(x) / s1(x)    (2.2)

The help functions can be generalized as:

fn(x) = f(n-1)(x) / sn(x)    (2.3)
These help functions are in turn used to compute the values of the
sub-functions by performing normalization. The sub-functions are
constructed as second-order polynomials depicting the parabolic
functions [14].
2.1.1 Normalization
First, the function to be approximated has to be normalized according
to the parabolic synthesis methodology. Normalization limits the
function to a numerical range that facilitates the hardware
implementation. It must satisfy that the function is limited to the
range 0 <= x < 1 and 0 <= y < 1: the starting coordinate should be
(0,0) and the ending coordinate should be less than (1,1) [14].
2.1.2 First sub-function
In order to develop the first sub-function, s1(x), the original
function, forg(x), should cross two points, i.e., (0,0) and (1,1), as
shown in Fig. 2.1.

Figure 2.1: Comparison of the original function, forg(x), with the
straight line x = y.

The first sub-function, s1(x), is a second-order parabolic function, as
defined by (2.4).

s1(x) = l1 + k1*x + c1*(x - x^2)    (2.4)

The starting point, l1, of the first sub-function, s1(x), is zero,
since the function crosses (0,0). As the function lies between the
points (0,0) and (1,1), the slope, k1, is 1 [7] [9] [16]. Therefore,
the first sub-function can be simplified as shown in (2.5).

s1(x) = x + c1*(x - x^2)    (2.5)

The coefficient c1 is computed according to (2.6).

c1 = f'org(0) - 1    (2.6)
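As a small numerical sketch of the first sub-function (assuming, as for the sine implementation in this thesis, the normalized function forg(x) = sin(πx/2), and taking c1 from the derivative at zero per (2.6); the Python names are ours):

```python
import math

# Assumed normalized original function: forg(x) = sin(pi/2 * x),
# so forg(0) = 0 and forg(1) = 1 as required by the normalization.
def f_org(x):
    return math.sin(math.pi / 2 * x)

# (2.6): c1 = f'org(0) - 1; for the normalized sine, f'org(0) = pi/2.
c1 = math.pi / 2 - 1

# (2.5): first sub-function s1(x) = x + c1*(x - x^2).
def s1(x):
    return x + c1 * (x - x * x)

max_err = max(abs(s1(i / 1000) - f_org(i / 1000)) for i in range(1001))
print(max_err)  # roughly 7e-2: one sub-function alone is only a coarse fit
```

This also shows why further stages are needed: the single parabola captures the overall shape of the sine but leaves an error of several percent.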
2.1.3 Second sub-function
In order to make the total error smaller, the second sub-function,
s2(x), is developed to approximate the first help function, f1(x). A
strictly convex or concave first help function, f1(x), can be developed
from the original function, forg(x), using (2.2) [16].

The second sub-function, s2(x), can be defined as shown in (2.7).

s2(x) = l2 + k2*x + c2*(x - x^2)    (2.7)
Figure 2.2: A strictly convex first help function, f1(x), ranging from
1 to about 1.1.
As can be seen in Fig. 2.2, the help function starts at the point (0,1)
and finishes at (1,1), so the starting point, l2, of the second
sub-function is 1 and the slope, k2, of the function is 0. Therefore,
the equation for the second sub-function can be reduced as shown in
(2.8).

s2(x) = 1 + c2*(x - x^2)    (2.8)
Figure 2.3: Comparison of the first help function, f1(x), with the
second sub-function, s2(x).
In order to develop and verify the second sub-function, it must cross
the starting point, the middle point and the end point of the help
function, as shown in Fig. 2.3.
2.1.4 Sub-functions for n > 2
In order to develop further sub-functions, sn(x) for n > 2, the same
methodology is applied as given in (2.2) and (2.3). However, the help
functions will not be strictly convex or concave in the range 0 to 1.
For example, the function f2(x), shown in Fig. 2.4, is a pair of convex
and concave functions. The first function is in the range
0 <= x < 0.5 and the second function is in the range 0.5 <= x < 1.
Therefore, the second help function can be expressed as (2.9).

f2(x) = f1(x) / s2(x)    (2.9)
Figure 2.4: Second help function, f2(x), a pair of opposite convex and
concave functions.
The approximation of a function which is composed of two parabolic
curves can be performed by normalizing each curve in the interval 0 to
1 on the x-axis. In order to map the input x to the normalized
parabolic curve, x can be replaced with x' as shown in (2.10).

x' = 2x,       0 <= x < 0.5
x' = 2x - 1,   0.5 <= x < 1    (2.10)

The approximation of each parabolic curve is performed as described in
Section 2.1.3. In order to approximate the third sub-function,
s3,1(x') is calculated when 0 <= x < 0.5 and s3,2(x') is calculated
when 0.5 <= x < 1, as given in (2.11).

s3(x) = { s3,1(x'),  0 <= x < 0.5
        { s3,2(x'),  0.5 <= x < 1    (2.11)
A larger n results in a higher number of convex and concave functions.
The methodology can be generalized to calculate the nth help function
as shown in (2.12).

fn(x) = f(n-1)(x) / sn(x)    (2.12)

Using these partial help functions, the corresponding sub-functions are
developed. The sub-function is also divided into partial sub-functions,
as given in (2.13).

sn(x) = sn,m(xn),   m = 1, 2, ..., 2^(n-2)    (2.13)

In the same way, the input x is substituted by xn to map the input to
the normalized parabolic curve.

xn = 2^(n-2) * x - (m - 1)    (2.14)

Similar to the second sub-function given in (2.8), the start value of
each partial help function is 1 and the end value of each partial help
function interval is also 1. Therefore, the gradient, kn,m, of each
sub-function is 0. This allows the sub-function to be reduced as shown
in (2.15).

sn,m(xn) = 1 + cn,m * (xn - xn^2)    (2.15)

The coefficients, cn,m, are calculated in such a way that the quotient
between the help function, f(n-1),m(xn), and the partial sub-function,
sn,m(xn), is equal to 1 when xn is equal to 0.5.

cn,m = 4 * (f(n-1),m(0.5) - 1)    (2.16)
2.2 Interpolation
Interpolation is a method of finding new data points from a set of
known data points.
2.2.1 Linear Interpolation
Linear interpolation is the simplest method of interpolation. It uses
two data points to construct the value of new data points. The
classical linear interpolation between two data points is shown in
(2.17).

y(x) = y(x_start) + (x - x_start) * (y(x_end) - y(x_start)) / (x_end - x_start)    (2.17)

In (2.17), x_start is the starting and x_end is the ending breakpoint
of each interval; y(x_start) and y(x_end) are the respective y-values
at these breakpoints [14]. Linear interpolation using two intervals is
shown in Fig. 2.5. The first interval is 0 <= x < 0.5 and the second
interval is 0.5 <= x < 1. Equation (2.18) shows the corresponding
breakpoint values.

x_start,1 = 0, x_end,1 = 0.5, x_start,2 = 0.5, x_end,2 = 1    (2.18)

More intervals can be used for better accuracy, e.g., four intervals,
which give the breakpoint values shown in (2.19). For the sake of the
hardware architecture, the number of intervals is always a power of 2
[14].

x_start,i = (i - 1)/4, x_end,i = i/4, i = 1, ..., 4    (2.19)

For more intervals, equation (2.17) can be modified as shown in (2.20).
Figure 2.5: Linear interpolation of a normalized function, showing the
original function and the interpolated function over the first and
second intervals.
y(x) = y(x_start,i) + (x - x_start,i) * (y(x_end,i) - y(x_start,i)) / (x_end,i - x_start,i)    (2.20)

where i = 1, ..., I and I is the number of intervals. For example, for
I = 2 we get (2.21)

y(x) = y(x_start,i) + (x - x_start,i) * (y(x_end,i) - y(x_start,i)) / 0.5    (2.21)

or (2.22)

y(x) = y(x_start,i) + 2 * (x - x_start,i) * (y(x_end,i) - y(x_start,i))    (2.22)

A good property of (2.22) is that the denominator is always 1, as
division is not desirable in a hardware design; the multiplication by 2
is a simple left shift. It is appreciated for other hardware reasons as
well. For I = 2^w intervals, the linear interpolation is shown in
(2.23) [14].

y(x) = y(x_start,i) + 2^w * (x - x_start,i) * (y(x_end,i) - y(x_start,i))    (2.23)
2.2.2 Non-linear Interpolation
This thesis work is about parabolic synthesis and non-linear
interpolation. The non-linear interpolation follows the same idea as
the linear interpolation, with the difference that the approximations
in the intervals are parabolic functions [14].

s2(x) = l2,i + k2,i * xw + c2,i * (xw - xw^2)    (2.24)

The second stage (2.24) is a non-linear interpolation of the first help
function, where the index i stands for the interval in which the
interpolation is performed. The number of intervals is a power of 2,
i.e., 1, 2, 4, 8, 16 and so on. For instance, in the case of 2, there
will be two intervals, the first 0 <= x < 0.5 and the second
0.5 <= x < 1, in a normalized space.
The index w, in (2.24), shows that the xw term in the interpolation
stage depends on the number of intervals used in the interpolation. The
term is affected in such a way that when two intervals (i = 2) are used
in the interpolation, the most significant bit of x is thrown away.
When four intervals (i = 4) are used in the interpolation, the two most
significant bits of x are thrown away. The second sub-function can be
divided into two parts, a linear part shown in (2.25) and a non-linear
part shown in (2.26) [14].

l2,i + k2,i * xw    (2.25)

c2,i * (xw - xw^2)    (2.26)
In (2.25) there are two coefficients for the interpolation in each
interval, a starting point, l2,i, and a gradient, k2,i. The starting
point of an interval for the interpolation is calculated by inserting
the x-value of the starting point of the interval, x_start,i, into the
first help function, f1(x) [14].

l2,i = f1(x_start,i)    (2.27)

The second coefficient, k2,i, is the gradient of the interval in which
the interpolation is performed. The gradient is calculated by
subtracting the start point value, f1(x_start,i), from the end point
value, f1(x_end,i), of the interval [14].

k2,i = f1(x_end,i) - f1(x_start,i)    (2.28)

As mentioned before, the intervals are normalized, so no denominator is
needed.
In (2.26), c2,i is calculated in advance so that the second
sub-function, s2,i(xw), for the corresponding interval cuts the first
help function, f1(x), in the middle of interval i. It therefore
satisfies the middle point, f1(x_middle,i), for xw = 0.5, as shown in
(2.29) [14].

c2,i = 4 * (f1(x_middle,i) - l2,i - 0.5 * k2,i)    (2.29)

In (2.30) we have a simplification of (2.24). This simplification saves
an adder in the hardware implementation.

s2(x) = l2,i + j2,i * xw - c2,i * xw^2    (2.30)

where

j2,i = k2,i + c2,i    (2.31)
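The per-interval coefficient computation of (2.27)–(2.31) can be sketched numerically (assuming the sine help function from Section 2.1; all names are ours):

```python
import math

c1 = math.pi / 2 - 1                       # assumed c1 for forg(x) = sin(pi/2 * x)

def f1(x):
    """First help function (2.2) for the normalized sine; f1(0) = 1 by limit."""
    return math.sin(math.pi / 2 * x) / (x + c1 * (x - x * x)) if x > 0 else 1.0

def coefficients(w):
    """Coefficients (l, j, c) per interval for 2**w intervals,
    following (2.27), (2.28), (2.29) and (2.31)."""
    n = 1 << w
    table = []
    for i in range(n):
        x0, x1, xm = i / n, (i + 1) / n, (i + 0.5) / n
        l = f1(x0)                         # (2.27) starting point
        k = f1(x1) - f1(x0)                # (2.28) gradient, normalized interval
        c = 4 * (f1(xm) - l - 0.5 * k)     # (2.29) cuts f1 at the interval middle
        j = k + c                          # (2.31) merged coefficient, saves an adder
        table.append((l, j, c))
    return table

def s2(x, w, table):
    """Second sub-function evaluated by non-linear interpolation (2.30)."""
    n = 1 << w
    i = min(int(x * n), n - 1)             # integer part addresses the tables
    xw = x * n - i                         # fractional part, normalized interval
    l, j, c = table[i]
    return l + j * xw - c * xw * xw

table = coefficients(2)
err = max(abs(s2(i / 1000, 2, table) - f1(i / 1000)) for i in range(1001))
print(err)
```

Even with only four intervals the parabolic pieces track the help function closely, since each piece matches it at the start, middle and end of its interval.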
2.3 Parabolic Synthesis Combined with Interpolation
The drawback of parabolic synthesis is that to increase the accuracy of
the approximated function, the number of sub-functions must be
increased, which in turn increases the complexity of the hardware. In
this thesis work, parabolic synthesis is combined with non-linear
interpolation. In this case, only two sub-functions are required to get
the same accuracy as in parabolic synthesis. Equation (2.1) can thus be
reduced to (2.32).

forg(x) = s1(x) * s2(x)    (2.32)
This decreases the hardware significantly. Another benefit of combining
parabolic synthesis with non-linear interpolation is that this approach
makes it easy to adjust the error behavior of the approximation [7].
The first sub-function, s1(x), is used to calculate the initial value
of the approximation, and the second sub-function, s2(x), is used to
get the desired accuracy, depending on the number of intervals used in
the interpolation.

The approximation of the function can be implemented in two stages. The
first stage is implemented according to the first sub-function, as
shown in (2.5). The second stage can be implemented using non-linear
interpolation, as shown in (2.24). The first sub-function, s1(x), is
constructed with parabolic synthesis as described in Section 2.1.2, and
the second sub-function, s2(x), is constructed as a non-linear
interpolation as described in Section 2.2.2. The original function
(2.32) then becomes (2.33).

forg(x) = s1(x) * s2,i(xw)    (2.33)
In (2.33), the index 2,i represents the interval in which the
interpolation is performed. The interval count is a power of 2, which
results in a number of intervals equal to 1, 2, 4, 8, and so on. The
index w shows that the x term in the interpolation stage depends on the
number of intervals. The x term is modified in such a way that when
four intervals are used in the interpolation, the two most significant
bits are thrown away from the x term, i.e., 2 left shifts in the
hardware. The truncation in (2.34) is performed in order to normalize
the interval for the second sub-function.

xw = 2^w * x - floor(2^w * x)    (2.34)

The removed integer part is used to decode in which interval of the
second sub-function the interpolation is performed. This integer part
is used as an address to fetch the corresponding coefficients for the
specific interval in the hardware.

The second sub-function is divided into partial sub-functions, as shown
in (2.35).

s2(x) = s2,i(xw),   i = 1, 2, ..., 2^w    (2.35)

As can be seen, x is changed to xw, which means that the partial
sub-functions of the second sub-function, s2,i(xw), have equal range.
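Putting the two stages together, (2.33) can be modeled end to end (a floating-point sketch under the same normalized-sine assumption as above; a hardware implementation would of course use fixed-point word lengths, and all names are ours):

```python
import math

c1 = math.pi / 2 - 1                               # assumed c1 for sin(pi/2 * x)

def f_org(x):
    return math.sin(math.pi / 2 * x)

def s1(x):
    return x + c1 * (x - x * x)                    # first stage, (2.5)

def f1(x):
    return f_org(x) / s1(x) if x > 0 else 1.0      # first help function, (2.2)

def build_tables(w):
    """Look-up tables (l, j, c) per interval for 2**w intervals."""
    n = 1 << w
    tabs = []
    for i in range(n):
        l = f1(i / n)                              # (2.27)
        k = f1((i + 1) / n) - l                    # (2.28)
        c = 4 * (f1((i + 0.5) / n) - l - 0.5 * k)  # (2.29)
        tabs.append((l, k + c, c))                 # j = k + c, (2.31)
    return tabs

def approx(x, w, tabs):
    """Two-stage approximation (2.33): y = s1(x) * s2,i(xw)."""
    n = 1 << w
    i = min(int(x * n), n - 1)                     # integer part selects interval
    xw = x * n - i                                 # truncation (2.34)
    l, j, c = tabs[i]
    return s1(x) * (l + j * xw - c * xw * xw)      # second stage, (2.30)

errs = []
for w in (1, 2, 3, 4):
    tabs = build_tables(w)
    errs.append(max(abs(approx(i / 2000, w, tabs) - f_org(i / 2000))
                    for i in range(2001)))
print(errs)  # each doubling of the interval count shrinks the maximum error
```

The monotonically shrinking error list illustrates the key claim of this chapter: accuracy is set by the interval count of the interpolation, with the hardware of the two stages left unchanged.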
Chapter 3
3 Hardware Architecture
The hardware architecture of the methodology can be divided into three
parts, i.e., preprocessing, processing, and post processing. This
partitioning was introduced by P.T.P. Tang [1]. The preprocessing and
post processing parts are transformation stages, while in the
processing part the original function, forg(x), is calculated [16].
Figure 3.1: Three-stage architecture: v -> Preprocessing -> x ->
Processing -> y -> Postprocessing -> z.
3.1 Preprocessing
In the preprocessing part, the input signal v is normalized to prepare
it for the processing part. For example, for sin(v), an input v lying
in the interval 0 to pi/2 is normalized and converted into an output x
lying in the interval 0 to 1. This is performed by multiplying v with
2/pi [16].
3.2 Processing
In the processing part, the original function, forg(x), is
approximated, resulting in an output y. In this section, the processing
part for parabolic synthesis is discussed first, followed by parabolic
synthesis with non-linear interpolation.
3.2.1 Parabolic Synthesis
Fig. 3.2 shows the basic architecture of the loop-unrolled parabolic
synthesis with four sub-functions. This architecture has the advantage
of fast computation at the cost of large chip area [7].
Figure 3.2: Basic hardware for the loop-unrolled architecture, with the
four sub-functions s1(x)-s4(x) evaluated in parallel and multiplied to
form y.
The detailed hardware architecture of the loop-unrolled parabolic
synthesis with four sub-functions is given in Fig. 3.3.
Figure 3.3: Detailed hardware architecture for the 4-sub-function
parabolic synthesis.
In this architecture, the (x - x^2) part is the same for both the first
sub-function and the second sub-function. The output of this part is
multiplied with c1 for the first sub-function, s1(x), and with c2 for
the second sub-function, s2(x) [7]. In the first sub-function, s1(x),
after the multiplication with c1, the x-value is added to the product.
In the second sub-function, s2(x), after the multiplication with c2, a
1 is added instead. A special squaring unit is designed to calculate
the partial products of x3^2 and x4^2. By designing this squaring unit,
the latency and chip area can be significantly reduced in comparison to
using separate multipliers for each product. The index i in Fig. 3.3
consists of the most significant bits, which determine the c3,i
coefficient for the interval. Similarly, the index h in the fourth
sub-function consists of the two most significant bits of x and serves
as an address for the c4,h coefficients in the four intervals. The
first and second sub-functions are multiplied in parallel with the
third and fourth sub-functions. The results of these two
multiplications are multiplied with each other to compute the value of
y [7].
3.2.2 Parabolic Synthesis with Non-Linear Interpolation
The processing part of parabolic synthesis combined with non-linear
interpolation can be graphically visualized in Fig. 3.4. This architecture is
designed to calculate a single function.
Figure 3.4: Architecture of parabolic synthesis with non-linear
interpolation.
The result of (x - x^2) is multiplied with c1 in the first
sub-function, s1(x), and the result is added to x. As mentioned before,
the second sub-function is implemented as a non-linear interpolation
and it consists of three look-up tables, i.e., l2,i, j2,i and c2,i, for
each interval i. The coefficient j2,i is multiplied with xw, which is
the normalized value for the corresponding interval. The result of this
multiplication is added to l2,i. The partial product of xw, i.e., xw^2,
is multiplied with c2,i. The result of this multiplication is
subtracted from the result of the addition [14]. The results of both
sub-functions are multiplied with each other to compute the value of y.

The design contains four adders, four multipliers and one squarer
block. Instead of using a multiplier, a squarer block is specially
designed to produce all the partial products needed to compute x^2 and
xw^2 [14]. A simplified version of a 6-bit squarer block can be seen in
Fig. 3.5.
Figure 3.5: Specially designed 6-bit squarer (partial-product array of
the bit products xj*xk).
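The saving that motivates the dedicated squarer of Fig. 3.5 can be illustrated with a bit-level model (our own sketch, not the thesis design): because the symmetric terms xj*xk + xk*xj fold into a single term shifted left by one, and the diagonal terms satisfy xj*xj = xj, a 6-bit squarer needs only 21 partial products instead of the 36 of a general 6x6 multiplier.

```python
def square_via_partial_products(x, bits=6):
    """Compute x*x by summing the partial products of a dedicated squarer.
    Diagonal terms contribute b_j * 2^(2j) (since b_j*b_j = b_j) and each
    folded symmetric pair contributes (b_j & b_k) * 2^(j+k+1)."""
    b = [(x >> i) & 1 for i in range(bits)]
    acc = 0
    for j in range(bits):
        acc += b[j] << (2 * j)                   # diagonal: b_j * b_j = b_j
        for k in range(j + 1, bits):
            acc += (b[j] & b[k]) << (j + k + 1)  # folded symmetric pair
    return acc

print(all(square_via_partial_products(x) == x * x for x in range(64)))
```

The reduced partial-product count is what shrinks both the adder tree depth (latency) and the chip area compared with a general multiplier.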
3.3 Post processing
The post processing stage transforms the output y of the processing
stage into the desired output format z in order to complete the
approximation.
Chapter 4
4 Error Evaluation
The performance of any algorithm is characterized by its error
behavior. Since parabolic synthesis is an approximation-based method,
its error behavior is of vital importance. An example of the error
behavior for the sine function using the Parabolic Synthesis
methodology is shown in Fig. 4.1.
Figure 4.1: Error behavior for Parabolic Synthesis of the sine
function. The error oscillates around zero with a magnitude on the
order of 10^-5.
There are five different metrics that can be used to characterize the
error behavior [13] [14]. These metrics are as follows.

 - Maximum Absolute Error
 - Mean Error
 - Standard Deviation
 - Root-Mean-Square
 - Median Error
4.1 Error Metrics
A brief description of the error metrics is given below. For a detailed
study, readers are referred to [12] [13].
4.1.1 Maximum Absolute Error
The difference between the approximated value and the actual value
called the absolute error
. Absolute error is shown in (4.1).
is
(4.1)
It is the maximum value that is calculated in the interval where the error is
investigated [14].
4.1.2 Mean Error
For numbers of separate values in a specific sequence of errors, the mean
error can be seen in (4.2).
(4.2)
In other words, it is the average of the absolute error of a sequence of
numbers [14].
4.1.3 Standard Deviation
The standard deviation is used to calculate the amount of change in a value
from its expected value. The difference between standard deviation and
average deviation is that the average value is calculated with power instead
26
of amplitude. In order to calculate the standard deviation, the deviations are
squared before averaging. It is defined in (4.3) [14].
(4.3)
4.1.4 Median Error
The median error
is used to calculate the middle value for a given
sequence of errors. If the sequence contains odd number of samples the
median error is the middle sample and if the sequence contains even
number or samples, median error is the mean of the two middle samples.
For example, for a sequence
, the median error can be
calculate as (4.4) and (4.5) [14].
If
is odd
If
(4.4)
is even
(4.5)
4.1.5 Root-Mean-Square
In order to calculate the deviation of a sinusoidal signal, Root-Mean-Square
(RMS) value is used. This error metric is widely employed in electronics
where both AC and DC values of a signal need to be measured. It is the
square root of the average of squared difference between the approximated
value and the actual value [14].
(4.6)
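The five metrics above can be collected in a short Python sketch (our own helper, for illustration):

```python
import math

def error_metrics(errors):
    """The five error metrics of Section 4.1 for a sequence of errors."""
    n = len(errors)
    mean = sum(errors) / n                                 # (4.2)
    metrics = {
        "max_abs": max(abs(e) for e in errors),            # (4.1)
        "mean": mean,
        "std": math.sqrt(sum((e - mean) ** 2 for e in errors) / n),  # (4.3)
        "rms": math.sqrt(sum(e * e for e in errors) / n),  # (4.6)
    }
    s = sorted(errors)
    metrics["median"] = (s[n // 2] if n % 2               # (4.4)
                         else (s[n // 2 - 1] + s[n // 2]) / 2)  # (4.5)
    return metrics

m = error_metrics([-2.0, -1.0, 0.0, 1.0, 2.0])
print(m)
```

For this symmetric example sequence, the standard deviation equals the RMS value, which is exactly the evenness criterion used in Section 4.2.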
4.2 Error Distribution
There are two development strategies that can be employed when
developing an approximation: least-square approximation and
least-maximum approximation. Least-square approximation minimizes the
average error, while least-maximum approximation minimizes the maximum
error. Least-square approximations are suitable when the approximated
function is to be used in a series of computations. It is also
important to investigate the error distribution, so that the error of
the approximation is not of unilateral polarity [13] [14].

To evaluate the evenness of the error distribution, the standard
deviation is compared with the RMS value. The error distribution is
even if both values are equal. The error behavior of the sine function
in Fig. 4.2 provides a good example of the error-behavior methodologies
explained in this chapter. The error distribution shows that the
approximated value oscillates around the original function and is
evenly distributed around zero [12] [13]. A diagram visualizing the
error distribution is shown in Fig. 4.2.
Figure 4.2: The distribution of error between original function and the
approximation
CHAPTER 5
5 Architecture and Coefficients Approximation
The objective of this thesis work is to design and implement an approximation of the sine and cosine functions in all four quadrants, i.e., 360˚. The approximation is implemented in two stages: the first stage uses the parabolic synthesis methodology and the second stage uses non-linear interpolation, as described in section 3.2. This chapter discusses the hardware architecture that implements the sine and cosine functions using parabolic synthesis combined with non-linear interpolation. A methodology is also described for calculating the coefficients of the second stage of the approximation, i.e., the non-linear interpolation.
5.1 Architecture
As described in chapter 3, the hardware architecture of the methodology consists of three parts, i.e., preprocessing, processing, and post processing. The architecture computes the sine and cosine functions based on the input signal v and produces the outputs z_sine and z_cosine. The block diagram of the architecture is given in Fig. 5.1.
Figure 5.1: Block diagram of the architecture
As shown in Fig. 5.1, the normalized input signal v, representing an angle in the interval 0 to 2π, is converted into the input x. The two most significant bits, θ1θ0, of the input signal v are taken away and used as control signals for the output multiplexers and the two's complement conversions. The remaining bits are used as the input signal, x, for the approximation (processing) block. The processing block performs the approximations and multiplications for the sub-functions of the sine and cosine approximations. The approximated outputs, ysin and ycos, from the processing block go to the output multiplexers, where the new values, ysin' and ycos', are chosen depending on the input quadrant. The signs of the new values, ysin' and ycos', are changed in the output conversion blocks, using θ1θ0 as enable signals, to produce the outputs zsin and zcos.
5.1.1 Preprocessing
A normalized input to the system, v, is expressed in 15 bits, which means that the input signal takes values from 0 to 2^15 − 1. The maximum input to the system is '111111111111111' in binary, which corresponds to a normalized angle of 3.99999 in decimal. The function of the preprocessing block is therefore to remove the two MSBs (the integer part) and send the remaining bits as the x value to the processing part.
Figure 5.2: Preprocessing block
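The preprocessing step can be sketched in software. This is a minimal model, assuming the 15-bit input word described above with the two MSBs as the quadrant bits; the function name is ours.

```python
def preprocess(v):
    """Split the 15-bit input word v into the quadrant bits and the x value.

    The two MSBs (theta1 theta0) identify the quadrant; the remaining
    13 bits form the input x to the processing block.
    """
    assert 0 <= v < 2 ** 15
    theta = v >> 13            # two MSBs: quadrant index 0..3
    x = v & ((1 << 13) - 1)    # remaining bits: fractional part
    return theta, x

theta, x = preprocess(0b111111111111111)  # the maximum input value
```

In hardware this split is pure wire routing, which is why the preprocessing block contains no logic.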
5.1.2 Processing
In the processing part, the original function, f_org(x), is approximated, which results in the outputs ysin and ycos. In this architecture, only two sub-functions are required to reach the same accuracy as in parabolic synthesis. Therefore equation (2.1) can be reduced to (5.1).

f_{org}(x) = s_1(x) \cdot s_2(x)    (5.1)

The approximations of the sine and cosine functions are given in (5.2) and (5.3). The angle is the normalized fractional part of v. It can be seen that only the first sub-function, s_1(x), differs between the sine and cosine functions [14].
(5.2)
(5.3)
The original function, f_org(x), for sine and cosine becomes as shown in (5.4) and (5.5), respectively.

f_{org,sin}(x) = \sin\left(\frac{\pi}{2}x\right)    (5.4)

f_{org,cos}(x) = \sin\left(\frac{\pi}{2}(1 - x)\right) = \cos\left(\frac{\pi}{2}x\right)    (5.5)
It can be seen that the sub-functions for sine and cosine are nearly identical: the first sub-function for cosine only requires one extra subtraction, and the second sub-functions have the same structure and differ only in their sets of coefficients. Therefore both sub-functions can be computed in parallel. Multiplying each first sub-function with its corresponding second sub-function produces the results for the sine and cosine functions simultaneously. In this way, the hardware for a multiplier, an adder, and an additional special squarer can be saved.
5.1.3 Post Processing
In the post processing block, the output from the processing block is converted in order to obtain the desired results. The outputs of the processing block, ysin and ycos, are the approximated results in the range 0 to 1 for an input x. However, the actual quadrant of any output is unknown, since the computations are performed in the first quadrant. The outputs, ysin and ycos, have to be transformed back to their actual values in their respective quadrants, which are determined using the θ1θ0 bits that come from the preprocessing block.
In order to move the output of the processing block to its corresponding quadrant, for both sine and cosine, output multiplexers are used that determine the new values, ysin' and ycos', based on the input quadrant. The input quadrant is determined using θ1θ0 as the select signal for the multiplexers. The signs of the new values, ysin' and ycos', need to be changed as well. The sine function is positive in the first and second quadrants, so no conversion is needed there. However, it is negative in the third and fourth quadrants, so the sign needs to be changed. This is achieved by a two's complement conversion at the final stage, where θ1 is used as an enable signal for the conversion.
Similarly, the cosine function is positive in the first and fourth quadrants and negative in the second and third quadrants, so the sign of the ycos' value needs to be changed for the second and third quadrants. This conversion is performed by using θ1 XOR θ0 as a control signal for the sign conversion in the respective quadrants. Table I shows when the outputs for sine and cosine need to be transformed, depending on the integer part, θ1θ0, coming from the preprocessing stage [14].
TABLE I: OUTPUT TRANSFORMS

            Quadrant 1   Quadrant 2   Quadrant 3   Quadrant 4
Sine            +            +            -            -
Cosine          +            -            -            +
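The quadrant transforms can be modeled behaviorally. The sketch below is our reading of the post-processing logic: the multiplexers swap the first-quadrant sine and cosine results when θ0 = 1 (the text does not state the selection explicitly), while the signs follow Table I.

```python
import math

def quadrant_transform(y_sin, y_cos, theta):
    """Behavioral model of the output transforms in Table I.

    theta is the two-bit quadrant index (theta1 theta0). The output
    multiplexers swap the first-quadrant results when theta0 = 1; the sign
    follows Table I: sine is negated when theta1 = 1, cosine when
    theta1 XOR theta0 = 1.
    """
    theta1, theta0 = (theta >> 1) & 1, theta & 1
    s, c = (y_cos, y_sin) if theta0 else (y_sin, y_cos)
    if theta1:
        s = -s
    if theta1 ^ theta0:
        c = -c
    return s, c

# Example: an angle in the third quadrant (theta = 2) with fractional part x.
x = 0.3
z_sin, z_cos = quadrant_transform(math.sin(math.pi / 2 * x),
                                  math.cos(math.pi / 2 * x), 2)
```

The model reproduces sin((θ + x)·π/2) and cos((θ + x)·π/2) from first-quadrant values alone, which is exactly the reuse that makes the single-quadrant datapath sufficient.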
The architecture of the two's complement conversion for the sine function is shown in Fig. 5.3. Half adders (HAs) and XOR gates are used in the architecture. In order to calculate the z value, the control signal θ1 (for sine) or θ1 XOR θ0 (for cosine) is used to enable the conversion [14].
Figure 5.3: Two’s complement architecture for sine function
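The XOR/half-adder structure implements a conditional negation: every bit is XOR-ed with the control signal, and the control signal is then added as a carry-in, which realizes −y = ~y + 1 in two's complement. A minimal sketch (the 18-bit width is illustrative):

```python
WIDTH = 18  # illustrative word length for the output signals

def conditional_negate(y, enable, width=WIDTH):
    """Conditional two's complement: negate y when enable is 1, else pass through.

    XOR-ing every bit with `enable` and adding `enable` through the
    half-adder chain implements -y = ~y + 1, matching the XOR/HA
    structure of Fig. 5.3.
    """
    mask = (1 << width) - 1
    inverted = y ^ (mask if enable else 0)  # XOR stage
    return (inverted + enable) & mask       # half-adder chain adds the carry-in

neg = conditional_negate(5, 1)  # two's complement representation of -5 in 18 bits
```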
5.2 Coefficients Approximation
In order to implement the approximation of the trigonometric functions sine and cosine, the first help function must be developed. The first help function, f_1(x), is the function from which the non-linear interpolation is developed. The first help function for the sine function is developed according to (5.6).

f_1(x) = \frac{f_{org}(x)}{s_1(x)}    (5.6)
Figure 5.4: First help function, f_1(x)
5.2.1 Linear part
The linear part of the interpolation consists of two coefficients for each interval: a starting point, j_{2,i}, and a gradient, l_{2,i}. In (2.24), j_{2,i} is the starting point of an interpolation interval, which is computed by inserting the x value of the starting point of the interval, x_{i,start}, into the first help function f_1(x) [14].

j_{2,i} = f_1(x_{i,start})    (5.7)

In (2.24), l_{2,i} is the gradient of an interpolation interval. The gradient of an interval is computed as the value of the function f_1(x) at the end point of the interval minus its value at the start point. Since the interval is normalized to one, no denominator is needed, as shown in (5.8) [14].

l_{2,i} = f_1(x_{i,end}) - f_1(x_{i,start})    (5.8)

The coefficients for the linear part of the interpolation are calculated according to (5.7) and (5.8) for four intervals, i.e., i = 1, ..., 4. The result of the linear interpolation is shown in Fig. 5.5.
Figure 5.5: First help function and the linear interpolation of the first help
function.
5.2.2 Non-linear part
In (2.24), c_{2,i} is pre-computed so that the sub-function for interval i cuts the function f_1(x) in the middle of the interval, at x_w = 0.5, as shown in (5.9).

c_{2,i} = 4\left(f_1(x_{i,mid}) - j_{2,i} - \frac{l_{2,i}}{2}\right)    (5.9)

Subtracting the linear interpolation of the first help function from the first help function itself generates a parabolic-looking function in each interval, as shown in Fig. 5.6. The coefficients for the non-linear part of the interpolation are calculated according to (5.9).
Figure 5.6: The difference between the first help function and the linear interpolation of the first help function, with the parabolic curves of the four intervals (c_{2,1} to c_{2,4})
The peak value of each curve in Fig. 5.6 corresponds to the c_{2,i} coefficient of that interval. The coefficients j_{2,i}, l_{2,i}, and c_{2,i} are calculated using equations (5.7), (5.8), and (5.9). Similarly, the coefficients for the cosine function can be calculated in Matlab [14]. The approximated coefficient values for the sine and cosine functions are listed in Table II and Table III, respectively.
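The coefficient calculation of (5.7)–(5.9) can be sketched as below, assuming the second sub-function has the form j + l·x_w + c·(x_w − x_w²) per interval; the stand-in help function is ours, for illustration only.

```python
import math

def interpolation_coefficients(f1, intervals=4):
    """Compute j (start), l (gradient) and c (parabolic height) per interval.

    j_i and l_i follow (5.7) and (5.8); c_i is chosen so the interpolation
    cuts f1 at the interval midpoint: the term c*(xw - xw^2) equals c/4 at
    xw = 0.5, hence c_i = 4*(f1(mid) - j_i - l_i/2), as in (5.9).
    """
    j, l, c = [], [], []
    for i in range(intervals):
        start, mid, end = i / intervals, (i + 0.5) / intervals, (i + 1) / intervals
        j.append(f1(start))
        l.append(f1(end) - f1(start))
        c.append(4 * (f1(mid) - j[i] - l[i] / 2))
    return j, l, c

# Illustrative stand-in for the first help function (not the thesis' f1).
f1 = lambda x: 1 + 0.1 * math.sin(math.pi * x)
j, l, c = interpolation_coefficients(f1)
```

By construction, j_i + l_i/2 + c_i/4 equals f1 at each interval midpoint, which is the "cuts the function in the middle of the interval" condition.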
TABLE II: COEFFICIENT VALUES FOR SINE FUNCTION

Coefficients   Values in decimal
c_{2,i}        0.0199535148492645, 0.0267498465570153, 0.0237052129326976, 0.0289774221630130
j_{2,i}        1.000000000000000, 1.100234394764010, 1.071895094550420, 1.078018196966255
l_{2,i}        0.287477578201686, -0.088843391191025, 0.113380000854360, -0.312024187865020
TABLE III: COEFFICIENT VALUES FOR COSINE FUNCTION

Coefficients   Values in decimal
c_{2,i}        0.0289774221630130, 0.0237052129326976, 0.0267498465570153, 0.0199535148492645
j_{2,i}        1.000000000000000, 1.100234394764010, 1.078018196966255, 1.071895094550420
l_{2,i}        0.312024187865020, -0.113380000854360, 0.088843391191025, -0.287477578201686
These coefficients are used to approximate the sine and cosine functions. The design and hardware implementation of these functions, using the coefficient values given in Table II and Table III, is explained in chapter 6.
CHAPTER 6
6 Hardware Design
In this thesis work, Parabolic Synthesis is combined with non-linear interpolation to implement the approximation of the sine and cosine functions. The design is implemented using two stages, i.e., parabolic synthesis and non-linear interpolation, as discussed in chapter 5. In this chapter, the hardware structure of the combined methodology is discussed. Two sub-functions are used to obtain the same accuracy as in parabolic synthesis. Therefore the equation for the original function, f_org(x), can be written as

f_{org}(x) = s_1(x) \cdot s_2(x)    (6.1)

The first sub-function, s_1(x), is constructed using parabolic synthesis as described in section 2.1.2, and the second sub-function, s_2(x), is constructed as a non-linear interpolation as described in section 2.3.
The hardware design is divided into three parts, i.e., preprocessing, processing, and post processing. In the preprocessing part, the two most significant bits are removed from the input signal. The approximation of the original function, f_org(x), is performed in the processing part, where the parabolic synthesis is combined with the non-linear interpolation [14]. In the post processing part, the output from the processing block is converted back to its original value.
6.1 Preprocessing
The parabolic synthesis uses the already normalized input v. As described in section 5.1.1, the input transformation is performed in the preprocessing block, where the integer part, i.e., the two most significant bits θ1θ0, is taken away. θ1θ0 is used as the select signal for the output multiplexers, choosing the output corresponding to the input quadrant. This integer part, θ1θ0, is also used as an enable signal for the two's complement transformation in the post processing stage.
Figure 6.1: Preprocessing block
6.2 Processing
When approximating the sine and cosine functions, only the approximation of the first quadrant needs to be computed. In order to cover the whole unit circle, the first-quadrant function can be reused with some additional hardware. The first help function, f_1(x), for sine and cosine is shown in equations (6.2) and (6.3) [14].
(6.2)
(6.3)
It can be seen that both sub-functions can be computed in parallel to produce the results for the sine and cosine functions simultaneously. It should be noted that the c_1 coefficient is the same for both sine and cosine. The architecture for calculating the first sub-function for the sine and cosine functions, based on the parabolic synthesis methodology, is shown in Fig. 6.2.
Figure 6.2: First sub-function architecture for sine and cosine
The calculation of the first coefficient, c_1, for the sine function is shown in (6.4) [14].
(6.4)
The c_1 multiplication in Fig. 6.2 uses a fixed coefficient, so it can be replaced with simple shift and add operations. In the same way, the addition of "1" is simply a matter of routing a wire in hardware [14].
Since the number of intervals is a power of two, the leftmost fractional bits can be used as address bits into a look-up table for the coefficient selection. In hardware, separating these bits is simply a question of routing wires. For example, with two intervals only one bit is separated: if the fractional MSB is "0", the left interval is addressed, and if it is "1", the right interval is used. For four intervals, we get the four addresses "00", "01", "10", and "11"; e.g., an x value whose two fractional MSBs are "10" addresses the third interval [14].
The remaining bits are used as a new "x-value", x_w, where t denotes the two MSBs of x that are removed. The t bits are used as address bits for the coefficient tables, i.e., the j_{2,i}, l_{2,i}, and c_{2,i} tables.
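The interval addressing can be sketched as a bit split; this is an illustrative model, assuming 13 fractional bits as produced by the preprocessing stage, with names of our choosing.

```python
X_BITS = 13  # fractional bits in x, as delivered by the preprocessing stage

def split_interval(x, addr_bits=2, x_bits=X_BITS):
    """Split x into a coefficient-table address t and a local x-value xw.

    The addr_bits MSBs of the fraction select one of 2**addr_bits intervals
    ("00".."11" for four intervals); the remaining bits, reinterpreted as a
    fraction, form xw. In hardware this is pure wire routing.
    """
    t = x >> (x_bits - addr_bits)                 # table address, 0..3
    xw = x & ((1 << (x_bits - addr_bits)) - 1)    # remaining bits
    return t, xw

# An x whose two MSB fraction bits are "10" addresses the third interval (t = 2).
t, xw = split_interval(0b1010000000000)
```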
The second help function, s_2(x), is structurally the same for both sine and cosine and is shown in (6.4) and (6.5).
(6.4)
(6.5)
The term x_w^2 is the square of the partial product, which comes from the special squarer designed in this project to produce the outputs x^2 and x_w^2 simultaneously. Similar to the first sub-function, the hardware of the second sub-function can be shared between sine and cosine, so that the area of an adder is saved. Fig. 6.3 shows the second sub-function in the improved architecture, based on non-linear interpolation [14].
Figure 6.3: Hardware design for combined second sub-function
Finally, the outputs from first sub-function block and second sub-function
block are multiplied together to calculate the output of processing block for
both sine and cosine simultaneously.
Figure 6.4: Multiplication of outputs from first and second sub-function blocks
6.3 Post Processing
As explained in section 5.1.3, all calculations are performed in the first quadrant. The outputs therefore need to be transformed back to their actual quadrants. This is achieved by transforming the outputs sine_q1 and cosine_q1 from the processing block to their original quadrants, using a multiplexer with θ1θ0 as the select signal, as shown in Fig. 6.5. The sign of the output from these multiplexers is changed by performing a two's complement conversion. For the cosine output, θ1 XOR θ0 is used as an enable signal, ensuring that the cosine output is positive in the first and fourth quadrants and negative in the second and third quadrants. Similarly, θ1 is used as an enable signal for the two's complement conversion of the sine signal, ensuring that the sine output is positive in the first and second quadrants and negative in the third and fourth quadrants.
Figure 6.5: Post processing architecture for all four quadrants
6.4 Final Architecture
In order to compute the sine and cosine approximations, the architecture in
Fig. 6.6 is used in the thesis work [14].
Figure 6.6: The final architecture
The architecture consists of multipliers, one special squarer block, adders, two two's complement converters, and two multiplexers. The input, x, from the preprocessing block goes to the first and second sub-function blocks, i.e., the processing part. The output of the first sub-function, for both sine and cosine, is multiplied with the corresponding output from the second sub-function block. These multiplications produce the intermediate results, sine_q1 and cosine_q1, of the processing block. These intermediate values are converted into the desired results, depending on the quadrant transformation in the preprocessing stage. Two multiplexers therefore convert them into their respective quadrants, and a two's complement conversion is performed in the post processing stage to change their signs and obtain the final results.
The critical path of the design is given in Fig. 6.7.
Figure 6.7: Critical path of the design
The critical path of the hardware goes through:
- One squarer
- Two multipliers
- Two adders
- One multiplexer
- One two's complement converter
6.5 Word Lengths
The input word length of the hardware design is 15 bits. As shown in (6.6), all possible input values should be tested at the end.
(6.6)
These integer values do not exceed 15 bits, so they do not need to be truncated. However, the values need to be scaled down to the 0 to 90 degree range, as shown in (6.7) [14].
(6.7)
Since 90 degrees itself is not allowed, the maximum input value is shown in (6.8).
(6.8)
All operations in the VHDL implementation are performed on signed numbers, which adds an extra bit to all the signals going to the adders. Fig. 6.8 shows the internal word lengths of all the signals in the hardware design.
Figure 6.8: Internal word lengths of the design
The word lengths of the coefficients in Table II and Table III can also be optimized. The j_{2,i} coefficients are greater than 1, so 16 bits would be needed to express them in binary, plus an extra bit for the sign. However, when these numbers are truncated and converted to binary, there are many zeroes in the LSBs. These zeroes can be ignored in hardware, which leaves a 12-bit representation for the j_{2,i} coefficients. A 15-bit signed representation is used for the l_{2,i} coefficients, and the c_{2,i} coefficients are expressed as 11-bit signed numbers. This greatly helps to reduce the area of the multipliers and adders. The optimized and truncated coefficient values for the sine and cosine functions are given in Table IV and Table V, respectively.
TABLE IV: TRUNCATED COEFFICIENT VALUES FOR SINE FUNCTION

Coefficients   Values in decimal
c_1            0.570556640625
c_{2,i}        0.01953125, 0.0263671875, 0.0234375, 0.02880859375
j_{2,i}        1.0000000000, 1.1009765625, 1.07177734375, 1.07763671875
l_{2,i}        0.287506103515625, -0.088836669921875, 0.1134033203125, -0.312042236328125
TABLE V: TRUNCATED COEFFICIENT VALUES FOR COSINE FUNCTION

Coefficients   Values in decimal
c_1            0.570556640625
c_{2,i}        0.02880859375, 0.0234375, 0.0263671875, 0.01953125
j_{2,i}        1.00000000000000, 1.10020446777344, 1.0780029296875, 1.07186889648438
l_{2,i}        0.312042236328125, -0.1134033203125, 0.088836669921875, -0.287506103515625
CHAPTER 7
7 Implementation and Error Behavior
Based on the methodology described in chapters 2, 3, 5, and 6, a reference model of the approximation is implemented in MATLAB, and the design is implemented in hardware using VHDL. In this way, the functional behavior of the design is verified. The coefficients in Table IV and Table V are used in the design. Fig. 7.1 shows the approximated sine and cosine functions and their error behavior in decibels.
Figure 7.1: Approximation of the sine and cosine functions and the error behavior in dB (maximum error: sine ≈ -91 dB, cosine ≈ -90 dB)
7.1 Optimization
In order to increase the accuracy of the approximation, the coefficients j_{2,i}, l_{2,i}, and c_{2,i} in the second sub-function, s_2(x), need to be optimized. The optimization helps to characterize the behavior of the error and must be performed in parallel with the truncation and the evaluation of word lengths. For a better understanding, truncation effects are not taken into consideration in this section.
The second sub-function is given in (7.1).

s_2(x) = j_{2,i} + l_{2,i}\,x_w + c_{2,i}\left(x_w - x_w^2\right)    (7.1)

The optimization strategy can be applied to all 12 coefficients of the second sub-function, s_2(x), using four intervals. Since the coefficients c_{2,1} through c_{2,4} adjust the height of the parabolic part of the second sub-function, the optimization is primarily performed on these coefficients.
Figure 7.2: The absolute accuracy in bits of the approximation, before and after optimization
As can be seen in Fig. 7.2, there is a reduction of half a bit for the largest error in the interval. However, the improvement in terms of the largest approximation error is negligible. During the hardware design, the optimization is performed on bit level [7].
7.2 Truncation
All coefficients and signals in the MATLAB reference model need to be truncated, since the model is implemented exactly like the hardware architecture implemented in VHDL. The word lengths of the coefficients can be optimized in such a way that the system does not lose precision. All signals need to be truncated in such a way that the MATLAB model is an exact mirror of the ASIC implementation; every intermediate result of a calculation must be truncated to its hardware word length before it is used in the next operation.
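The rule of truncating every intermediate result can be sketched with a small helper. This is a minimal model for non-negative fractional signals; the word lengths in the example are illustrative, not taken from the design.

```python
import math

def truncate(value, frac_bits):
    """Truncate a value to frac_bits fractional bits, as the hardware does.

    Flooring the scaled value models dropping the LSBs of a non-negative
    fixed-point signal.
    """
    return math.floor(value * (1 << frac_bits)) / (1 << frac_bits)

# Each intermediate result is truncated to its word length before the next
# operation, so the software model stays bit-exact with the hardware.
a = truncate(0.7071067811865476, 14)  # word lengths here are illustrative
b = truncate(1.1009765625, 12)
product = truncate(a * b, 17)
```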
7.3 Error Behavior
In order to provide greater resolution and a better understanding of the results, a logarithmic scale is used. The logarithmic unit is the decibel (dB), and binary resolution relates to it as shown in (7.2).

20 \log_{10}\left(2^{-1}\right) \approx -6.02 \text{ dB}    (7.2)

This shows that 6 dB corresponds to 1 bit of resolution. For example, an error of 0.001 corresponds to 20·log10(0.001) = 20·(−3) = −60 dB. Transformed into bits, this gives an error of 60/6 = 10 bits or less [14].
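The dB-to-bits conversion used above can be written as two small helpers; a minimal sketch:

```python
import math

def error_to_db(err):
    """Express an absolute error as a level in decibels."""
    return 20 * math.log10(err)

def db_to_bits(db):
    """Convert a dB level to bits of resolution (one bit is about 6.02 dB)."""
    return -db / (20 * math.log10(2))

db = error_to_db(0.001)   # 20*log10(0.001) = -60 dB
bits = db_to_bits(db)     # roughly 10 bits of resolution
```

Using the exact factor 20·log10(2) ≈ 6.02 dB per bit, rather than 6, gives the fractional bit counts quoted in Table VI.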
The error behavior of the Parabolic Synthesis combined with Non-Linear
Interpolation can be seen in Fig. 7.3. The error is calculated by subtracting
the sine function approximation from the original sine function after the
truncation.
Figure 7.3: Error behavior of the sine function after truncation (maximum error ≈ -89.08 dB)
It should be noted that the approximated value oscillates around the original
function in the desired manner and it confirms that the error is evenly
distributed around zero.
TABLE VI: THE ERROR METRICS FOR THE TRUNCATED AND OPTIMIZED IMPLEMENTATION

Error Metric             Value                   Bits
Maximum Absolute Error   0.00003600399789538     14.84
Mean Error               -0.0000006352127232     20.65
Median                   -0.0000018745591403
Standard Deviation       0.00001891092717920
Root Mean Square         0.00001891104738872
Table VI shows that the resolution of the algorithm is almost 14.84 bits, which is very close to the resolution required for this thesis work, and the mean error is very small. It should also be noted that the standard deviation and the root mean square values are almost identical, which indicates that the approximation error is evenly distributed around zero.
CHAPTER 8
8 Results
The approximation based on Parabolic Synthesis and Non-Linear Interpolation is implemented in the stm65 CMOS technology. The design is simulated and compared with respect to speed, area, and power consumption. The design is implemented in VHDL, and the synthesized code is simulated with different standard libraries in Design Vision. Low Power High Threshold Voltage (LPHVT) and Low Power Low Threshold Voltage (LPLVT) transistors are used with different supply voltages, V_DD = 1.00, 1.10, and 1.20 volts. This chapter describes the speed, area, and power consumption of the system and the comparison with other methodologies.
8.1 Synthesis
The synthesis is performed in the Design Vision tool by Synopsys. During the synthesis, a gate-level netlist is generated from the VHDL design using the STMicroelectronics 65 nm technology. This netlist is analyzed for speed, area, and power consumption. The results for the different parameters are described below.
8.1.1 Area Results
The minimum area of the design is estimated by setting the area design constraint to zero in Design Vision. The minimum area results of the design for the different Low Power High Threshold Voltage (LPHVT) and Low Power Low Threshold Voltage (LPLVT) libraries, at supply voltages V_DD = 1.00, 1.10, and 1.20 volts, are given in Table VII.
TABLE VII: MINIMUM AREA RESULTS FOR LPHVT AND LPLVT

Library   Voltage (V)   Area (μm²)
LPHVT     1.00          15953
LPHVT     1.10          15974
LPHVT     1.20          15966
LPLVT     1.00          16132
LPLVT     1.10          16592
LPLVT     1.20          17056
Figure 8.1: Minimum area results in a bar graph
The area for the different sub-functions and the output multiplier can be seen in Table VIII. This synthesis is performed with the Low Power High Threshold Voltage (LPHVT) library at a supply voltage of V_DD = 1.2 volts.
TABLE VIII: THE AREA RESULTS FOR INDIVIDUAL BLOCKS IN DESIGN FOR LPHVT @ 1.2 VOLTS

Block                 Area (μm²)   Percentage (%)
First Sub-function    2433         15.23
Second Sub-function   8448         52.91
Output Multipliers    3992         25.00
Output Conversions    575          3.60
In/out Registers      518          3.24
Total                 15966        100
For a better understanding, the individual blocks are identified in Fig. 8.2.
Figure 8.2: Different modules in the design
For the sake of comparison with previous work, the area needed to calculate a single function, e.g., sine, can be approximated from Table VIII. A rough calculation is given in Table IX.
TABLE IX: APPROXIMATED AREA FOR SINGLE FUNCTION

Module                Approximated area (μm²)
First Sub-function    2400
Second Sub-function   4224
Output Multipliers    2000
Two's Conversions     300
In/out DFFs           300
Total                 9224
Figure 8.3: Approximated area for one function
8.1.2 Timing/Speed Results
The maximum speed of the design is determined by setting the timing constraint in Design Vision to 1 ns, which yields a negative slack, x, for the critical path. The value x is added to 1 ns and the simulation is repeated until the slack is zero.
TABLE X: SPEED RESULTS FOR LPHVT AND LPLVT AT NORMAL CONSTRAINTS

Library   Voltage (V)   Speed (MHz)   Time (ns)
LPHVT     1.00          30.91         32.35
LPHVT     1.10          42.51         23.52
LPHVT     1.20          52.96         18.88
LPLVT     1.00          73.52         13.60
LPLVT     1.10          86.50         11.56
LPLVT     1.20          100.20        9.98
Figure 8.4: Frequency results for LPHVT and LPLVT
It can be seen that the LPLVT transistors are considerably faster than the LPHVT transistors. The maximum frequency using LPHVT transistors is 135.5 MHz at a supply voltage of 1.20 volts, whereas for LPLVT transistors it is 265.25 MHz at the same voltage.
8.1.3 Power Results
The power dissipation in a CMOS circuit consists of two sources, as given by (8.1).

P_{total} = P_{dynamic} + P_{static}    (8.1)

The dynamic power is the sum of the switching power and the internal power. It depends on the charging and discharging of the capacitances, the switching activity, the supply voltage, and the operating frequency, as given in (8.2).

P_{dynamic} = \alpha \cdot C \cdot V^2 \cdot f    (8.2)
where
α = switching activity
C = capacitance
V = supply voltage
f = clock frequency
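The dynamic power relation can be evaluated directly; the numbers below are purely illustrative and not taken from the thesis.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """Dynamic power according to (8.2): P = alpha * C * V^2 * f."""
    return alpha * capacitance * voltage ** 2 * frequency

# Illustrative numbers only (not from the thesis): 10% switching activity,
# 50 pF switched capacitance, 1.2 V supply, 10 MHz clock.
p_watts = dynamic_power(0.1, 50e-12, 1.2, 10e6)
```

The quadratic dependence on V explains why the total power in Tables XI and XII grows markedly with the supply voltage.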
The power consumption of the Parabolic Synthesis with Non-Linear Interpolation design is simulated using the PrimeTime tool at a frequency of 10 MHz at the supply voltages mentioned above. In order to analyze the power consumption of the design, a Value Change Dump (VCD) file is generated in ModelSim using the netlist file produced during the synthesis process. The power results for both LPHVT and LPLVT are given below.
TABLE XI: POWER ANALYSIS USING LPHVT LIBRARIES AT DIFFERENT VOLTAGES

LPHVT                        1.00 V   1.10 V   1.20 V
Net Switching Power (μW)     11.26    13.87    16.65
Cell Internal Power (μW)     12.04    14.74    17.63
Cell Leakage Power (nW)      23.35    33.5     48.96
Total Power (μW)             23.32    28.64    34.33
TABLE XII: POWER ANALYSIS USING LPLVT LIBRARIES AT DIFFERENT VOLTAGES

LPLVT                        1.00 V   1.10 V   1.20 V
Net Switching Power (μW)     12.16    15.16    18.32
Cell Internal Power (μW)     13.11    17.05    21.95
Cell Leakage Power (μW)      4.20     6.54     9.83
Total Power (μW)             29.47    38.75    50.10
Figure 8.5: Total power comparison for LPHVT and LPLVT at the frequency
10MHz
The cell leakage power increases with increasing supply voltage. It should be noted that the static power dissipation (cell leakage power) is considerably higher for the LPLVT transistors than for the LPHVT transistors.
8.2 Existing Algorithms
In this section, the area, speed, and power results of the Parabolic Synthesis combined with Non-Linear Interpolation are compared with other algorithms, such as the CORDIC, and with previous thesis work on the Parabolic Synthesis methodology, namely "Sine Function Approximation using Parabolic Synthesis and Linear Interpolation" [15] and "Hardware Implementation of Logarithm Function using Improved Parabolic Synthesis" [16]. However, a precise comparison is not possible, since the above mentioned algorithms were implemented for different functions, e.g., sine or logarithm, and the operating frequency used to determine the power dissipation is not always clearly stated.
The implementation results compared in this section are taken from the sine function implementation by Madhubabu Nimmagadda and Surendra Reddy Utukuru [15], the Improved Parabolic Synthesis by Jingou Lai [16], and the logarithmic and exponential function implementation by Peyman Pouyan [8]. As mentioned before, the Parabolic Synthesis methodology can be used to implement different unary functions with the same architecture and a different set of coefficients, so it is possible to compare the results of the different implementations. The CORDIC algorithm, in contrast, uses simple hardware, based on shift and add operations and a look-up table (LUT), to implement trigonometric and logarithmic functions. In order to reach a precision of 15 bits, more than 15 iterations are required, which increases its computation time considerably. Almost the same resolution is achieved in this thesis work by combining Parabolic Synthesis with Non-Linear Interpolation.
The chip area results for the different methodologies are given in Table XIII.
TABLE XIII: AREA ANALYSIS OF ASIC IMPLEMENTATION FOR DIFFERENT METHODOLOGIES (LPHVT @ 1.20V)

Methodology                                         Area (μm²)
CORDIC                                              19048 ¹
Parabolic Synthesis                                 25249 ¹
Parabolic Synthesis with Linear Interpolation       11397 ²
Parabolic Synthesis with Non-Linear Interpolation   15982
Improved Parabolic Synthesis                        5894

¹ The results are with pads
² The analysis is done at 1.25 volts
Figure 8.6: ASIC synthesis analysis for area
Parabolic Synthesis combined with Non-Linear Interpolation occupies less
area than the CORDIC, and it can be used to implement different unary
functions such as the logarithmic, exponential, division and square-root
functions. Only the set of coefficients in the look-up table (LUT) needs to
be changed to implement a different unary function, with the main
architecture unchanged [8]. The CORDIC algorithm, on the other hand,
needs a different architecture and extra iterations in order to implement the
logarithmic function.
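The coefficient-swap idea can be sketched as follows: one shared second-degree datapath, with the coefficient ROM as the only function-specific part. For simplicity the coefficients here are fitted through three points per interval; the real design derives them with the Parabolic Synthesis methodology, so the actual values differ, and the function and variable names are illustrative.

```python
import math

def quad_coeffs(f, a, b):
    """Exact quadratic through the endpoints and midpoint of [a, b];
    a stand-in for the coefficients a Parabolic Synthesis derivation
    would place in the LUT (illustrative only)."""
    m = (a + b) / 2.0
    fa, fm, fb = f(a), f(m), f(b)
    c2 = (fa - 2.0 * fm + fb) / (2.0 * ((b - a) / 2.0) ** 2)
    c1 = (fb - fa) / (b - a) - c2 * (a + b)
    c0 = fa - c1 * a - c2 * a * a
    return (c0, c1, c2)

def make_rom(f, intervals=4):
    """Build the per-function coefficient ROM over [0, 1]."""
    return [quad_coeffs(f, i / intervals, (i + 1) / intervals)
            for i in range(intervals)]

def evaluate(rom, x):
    """The shared datapath: interval select (the MSBs in hardware),
    one ROM read, then a multiply-accumulate chain."""
    i = min(int(x * len(rom)), len(rom) - 1)
    c0, c1, c2 = rom[i]
    return c0 + c1 * x + c2 * x * x

# Same datapath, different ROM contents per unary function:
sine_rom = make_rom(lambda x: math.sin(x * math.pi / 2))
log_rom  = make_rom(lambda x: math.log2(1.0 + x))
```

`evaluate(sine_rom, x)` and `evaluate(log_rom, x)` run through identical hardware; only the ROM contents change, which is the property that makes the cross-function comparison in this section reasonable.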
TABLE XIV: FREQUENCY FOR THE ASIC IMPLEMENTATION OF
DIFFERENT METHODOLOGIES (LPHVT @ 1.20V)

Methodology                                          Frequency (MHz)   Critical Path Delay (ns)
CORDIC                                               11.5              86.72
Parabolic Synthesis                                  47.5              21.47
Parabolic Synthesis with Linear Interpolation        58.82 ³           18.18 ³
Parabolic Synthesis with Non-Linear Interpolation    53.99             18.52
Improved Parabolic Synthesis                         83.33             12.00

³ The analysis is done at 1.15 volts.
Figure 8.7: Frequency for the ASIC implementation of different methodologies
The ASIC implementation shows that Parabolic Synthesis combined with
non-linear interpolation is 4.6 times faster than the CORDIC and 1.16
times faster than the non-pipelined Parabolic Synthesis. It should be noted
that this design uses an extra multiplexer and a two's complement
conversion at the output for the output transforms, which lengthens the
critical path. These results are compared for the LPHVT implementation at
a supply voltage of 1.20 volts.
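The frequency column in Table XIV is simply the reciprocal of the critical path delay; a one-line helper makes the relation explicit (the function name is illustrative):

```python
def max_frequency_mhz(critical_path_ns):
    """Maximum clock frequency (MHz) implied by a critical path
    delay in nanoseconds: f_max = 1 / t_cp."""
    return 1000.0 / critical_path_ns

# An 18.52 ns critical path gives about 54 MHz, matching Table XIV.
print(max_frequency_mhz(18.52))
```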
Table XV shows the estimated power consumption for the different
methodologies with an LPHVT implementation at a frequency of 10 MHz,
obtained using the PrimeTime tool.

TABLE XV: POWER ANALYSIS FOR DIFFERENT
METHODOLOGIES (LPHVT @ 1.20V)

Methodology                                          Power @ 10 MHz (μW)
CORDIC                                               99.79
Parabolic Synthesis                                  98.08
Parabolic Synthesis with Linear Interpolation        39.4 (@ 1.25V)
Parabolic Synthesis with Non-Linear Interpolation    34.33
Improved Parabolic Synthesis                         17.38
Figure 8.8: Power dissipation analysis for different algorithms
Since the design occupies less area and has a lower switching activity,
Parabolic Synthesis combined with Non-Linear Interpolation uses less
power than the CORDIC and Parabolic Synthesis. A fully fair comparison
of the power results is not possible, since the other algorithms were
implemented to approximate only one function, whereas this design
implements two trigonometric functions, i.e., sine and cosine, at the same
time.
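A first-order dynamic power model, P ≈ αCV²f, also shows why the figure reported at 1.25 V is not directly comparable with the 1.20 V results. The rescaling below is a rough illustration only; it ignores leakage and any timing effects of the lower supply.

```python
def scale_power(p_uw, v_from, v_to):
    """First-order dynamic power rescaling: P scales with V^2 at a
    fixed frequency and switching activity (leakage ignored)."""
    return p_uw * (v_to / v_from) ** 2

# The linear-interpolation design was reported at 1.25 V; a rough
# rescale to the 1.20 V operating point of the other designs:
print(round(scale_power(39.4, 1.25, 1.20), 1))  # → 36.3 (μW)
```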
CHAPTER 9

Conclusions
The approximation of the trigonometric functions, i.e., sine and cosine,
was designed and implemented in the same architecture using Parabolic
Synthesis combined with Non-Linear Interpolation. Four intervals were
used in the interpolation part, which leads to a set of 12 coefficients for
the two functions, sine and cosine.
The truncation changes the error behavior. To compensate for this, the
coefficients need to be optimized manually, changing one set of
coefficients at a time.
The architecture was carefully designed to have a high degree of
parallelism; it therefore has a short critical path and a fast computation
speed. The design is suitable for high-speed applications, since it is much
faster than the CORDIC and the other implementations at the same
resolution.
Certain simplifications were made in the architecture, including a
specially designed squarer that computes the squared terms for both the
sine and the cosine approximation with the same hardware, which makes
the area smaller compared to Parabolic Synthesis with Linear
Interpolation.
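As a rough illustration of why a dedicated squarer is cheaper than a general multiplier (not the exact circuit of this design), a squarer can exploit the symmetry of its partial products: each off-diagonal product appears twice in x², so it is generated once and shifted left by one instead.

```python
def square_6bit(x):
    """Unsigned 6-bit squarer from AND-gate partial products.
    x^2 = sum_i b_i * 2^(2i)  +  2 * sum_{i<j} b_i b_j * 2^(i+j),
    so the off-diagonal terms are formed once and shifted,
    roughly halving the partial-product array."""
    assert 0 <= x < 64
    bits = [(x >> i) & 1 for i in range(6)]
    acc = 0
    for i in range(6):
        acc += bits[i] << (2 * i)  # diagonal terms: b_i^2 == b_i
        for j in range(i + 1, 6):
            # doubled cross terms, the shift by (i+j+1) absorbs the 2
            acc += (bits[i] & bits[j]) << (i + j + 1)
    return acc
```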
The resolution of the approximation is almost 15 bits, which meets the
required resolution for this thesis work. The error behavior indicates that
the approximation error is evenly distributed around zero.
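The resolution figure can be read as the absolute accuracy in bits implied by the worst-case error, in the style of the metric plotted in Figure 7.2 (the function name here is illustrative):

```python
import math

def accuracy_in_bits(errors):
    """Number of correct fractional bits implied by the worst-case
    absolute error: 2^-15 maps to 15 bits of accuracy."""
    return -math.log2(max(abs(e) for e in errors))
```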
CHAPTER 10

Future Work
The error behavior can be improved by using more intervals in the second
sub-function. This will increase the look-up table (LUT) size, but a
trade-off can be made between area and accuracy.
The architecture of the approximation can be made faster by introducing
pipelining at different stages.
The same architecture can be used to calculate different unary functions,
including various trigonometric, logarithmic, exponential, division and
square-root functions, by only changing the set of coefficients used.
References
[1] P. T. P. Tang, “Table-lookup algorithms for elementary functions and
their error analysis,” in Proc. of the 10th IEEE Symposium on Computer
Arithmetic ISBN: 0-8186-9151-4, pp. 232 - 236, Grenoble, France, June
1991.
[2] J.-M. Muller, "Elementary Functions: Algorithms and Implementation,"
second edition, Birkhauser Boston, ISBN 0-8176-4372-9, c/o Springer
Science+Business Media Inc., 233 Spring Street, New York, NY 10013,
USA.
[3] Ateeq Ur Rahman Shaik, "Hardware Implementation of the exponential
function using Taylor series and Linear Interpolation", Master's Thesis,
Lund University, April 2014.
[4] Erik Hertz, “Parabolic Synthesis”, Thesis for the degree of Licentiate in
Engineering, Lund University, 2011.
[5] Ray Andraka, “A survey of CORDIC algorithms in FPGA based
computers”, Andraka Consulting Group, Inc. North Kingstown, USA.
[6] Muhammad Waqas Shafiq and Nauman Hafeez, “Design of FFTs using
CORDIC and Parabolic Synthesis as an alternative to Twiddle Factor
Rotations”, Lund University, Master Thesis, 2011.
[7] Erik Hertz , Bertil Svensson, and Peter Nilsson, “Combining the
Parabolic Synthesis Methodology with Second-Degree Interpolation”.
Centre for Research on Embedded Systems, Halmstad University,
Halmstad, Sweden, Electrical and Information Technology Department,
Lund University, Lund, Sweden.
[8] Peyman Pouyan, Erik Hertz, and Peter Nilsson, "A VLSI
Implementation of Logarithmic and Exponential Functions Using a Novel
Parabolic Synthesis Methodology Compared to the CORDIC Algorithm",
20th European Conference on Circuit Theory and Design (ECCTD), 2011.
[9] E. Hertz and P. Nilsson, “A Methodology for Parabolic Synthesis,” a
book chapter in VLSI, In Tech, ISBN 978-3- 902613-50-9, pp. 199-220,
Vienna, Austria, September 2009.
[10] E. Hertz and P. Nilsson, "Parabolic Synthesis Methodology
Implemented on the Sine Function," in Proc. of the 2009 International
Symposium on Circuits and Systems (ISCAS'09), Taipei, Taiwan, May
24-27, 2009.
[11] Parabolic Synthesis. http://www.intechopen.com/articles/show/title/a-methodology-for-parabolic-synthesis
[12] J.-M. Muller, Elementary Functions: Algorithms and Implementation,
second ed., Birkhauser Boston, ISBN 0-8176-4372-9, c/o Springer
Science+Business Media Inc., 233 Spring Street, New York, NY 10013,
USA, 2006.
[13] A. A. Giunta and L. T. Watson, "A Comparison of Approximation
Modeling Techniques," American Institute of Aeronautics and
Astronautics, AIAA-98-4758, Blacksburg, USA, September 1998.
[14] Personal discussion and helping material provided by Peter Nilsson
and Erik Hertz.
[15] Madhubabu Nimmagadda and Surendra Reddy Utukuru, "Sine
Function Approximation using Parabolic Synthesis and Linear
Interpolation", Master's Thesis, Lund University, 2011.
[16] Jingou Lai, "Hardware Implementation of the Logarithm Function
using Improved Parabolic Synthesis", Master's Thesis, Lund University,
2013.
List of Figures

Figure 2.1: Comparison of the original function with the straight line x = y ... 7
Figure 2.2: A strictly convex first help function ... 8
Figure 2.3: Comparison of the first help function with the second sub-function ... 9
Figure 2.4: Second help function, a pair of opposite convex and concave functions ... 10
Figure 2.5: Linear interpolation of a normalized function ... 13
Figure 3.1: Three stage architecture ... 19
Figure 3.2: Basic hardware for the loop unrolled architecture ... 20
Figure 3.3: Detailed hardware architecture for 4 sub-function parabolic synthesis ... 21
Figure 3.4: Architecture of parabolic synthesis with non-linear interpolation ... 22
Figure 3.5: Specially designed 6-bit squarer ... 23
Figure 4.1: Error behavior for Parabolic Synthesis ... 25
Figure 4.2: The distribution of the error between the original function and the approximation ... 28
Figure 5.1: Block diagram of the architecture ... 30
Figure 5.2: Pre-processing block ... 31
Figure 5.3: Two's complement architecture for the sine function ... 34
Figure 5.4: First help function ... 35
Figure 5.5: First help function and the linear interpolation of the first help function ... 36
Figure 5.6: Approximation of the difference of the first help function subtracted with the linear interpolation of the first sub-function ... 37
Figure 6.1: Pre-processing block ... 40
Figure 6.2: First sub-function architecture for sine and cosine ... 41
Figure 6.3: Hardware design for the combined second sub-function ... 43
Figure 6.4: Multiplication of the outputs from the first and second sub-function blocks ... 43
Figure 6.5: Post-processing architecture for all four quadrants ... 44
Figure 6.6: The final architecture ... 45
Figure 6.7: Critical path of the design ... 46
Figure 6.8: Internal word lengths of the design ... 48
Figure 7.1: Approximation of the sine and cosine functions and the error ... 51
Figure 7.2: The absolute accuracy in bits of the approximation, before and after optimization ... 52
Figure 7.3: Error behavior of the sine function after truncation ... 54
Figure 8.1: Minimum area results in a bar graph ... 58
Figure 8.2: Different modules in the design ... 59
Figure 8.3: Approximated area for one function ... 60
Figure 8.4: Frequency results for LPHVT and LPLVT ... 61
Figure 8.5: Total power comparison for LPHVT and LPLVT at the frequency 10 MHz ... 63
Figure 8.6: ASIC synthesis analysis for area ... 65
Figure 8.7: Frequency for the ASIC implementation of different methodologies ... 66
Figure 8.8: Power dissipation analysis for different algorithms ... 67
List of Tables

Table II: Output Transforms ... 33
Table III: Coefficient Values for Sine Function ... 38
Table IV: Coefficient Values for Cosine Function ... 38
Table V: Truncated Coefficient Values for Sine Function ... 49
Table VI: Truncated Coefficient Values for Cosine Function ... 49
Table VII: The Error Metrics for the Truncated and Optimized Implementation ... 55
Table VIII: Minimum Area Results for LPHVT and LPLVT ... 58
Table IX: The Area Results for Individual Blocks in the Design for LPHVT @ 1.2 Volts ... 59
Table X: Approximated Area for a Single Function ... 60
Table XI: Speed Results for LPHVT and LPLVT at Normal Constraints ... 61
Table XII: Power Analysis Using LPHVT Libraries at Different Voltages ... 62
Table XIII: Power Analysis Using LPLVT Libraries at Different Voltages ... 63
Table XIV: Area Analysis of ASIC Implementation for Different Methodologies (LPHVT @ 1.20V) ... 65
Table XV: Frequency for the ASIC Implementation of Different Methodologies (LPHVT @ 1.20V) ... 66
Table XVI: Power Analysis for Different Methodologies (LPHVT @ 1.20V) ... 67
List of Acronyms

CMOS     Complementary Metal Oxide Semiconductor
DSP      Digital Signal Processing
CORDIC   COordinate Rotation DIgital Computer
VHDL     VHSIC Hardware Description Language
VHSIC    Very High Speed Integrated Circuit
RMS      Root Mean Square
MUX      Multiplexer
AC       Alternating Current
DC       Direct Current
MSB      Most Significant Bit
LPLVT    Low Power Low Threshold Voltage
LPHVT    Low Power High Threshold Voltage
VCD      Value Change Dump
LUT      Look-Up Table
Series of Master’s theses
Department of Electrical and Information Technology
LU/LTH-EIT 2015-422
http://www.eit.lth.se