Institutionen för systemteknik Department of Electrical Engineering Direct Digital Frequency Synthesis in

Institutionen för systemteknik Department of Electrical Engineering Direct Digital Frequency Synthesis in
Institutionen för systemteknik
Department of Electrical Engineering
Examensarbete
Direct Digital Frequency Synthesis in
Field-Programmable Gate Arrays
Examensarbete utfört i Elektroniska System
vid Tekniska högskolan i Linköping
av
Petter Källström
LiTH-ISY-EX--10/4403--SE
Linköping 2010
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden
Linköpings tekniska högskola
Linköpings universitet
581 83 Linköping
Direct Digital Frequency Synthesis in
Field-Programmable Gate Arrays
Examensarbete utfört i Elektroniska System
vid Tekniska högskolan i Linköping
av
Petter Källström
LiTH-ISY-EX--10/4403--SE
Handledare:
Oscar Gustafsson
ISY, Linköpings universitet
Examinator:
Oscar Gustafsson
ISY, Linköpings universitet
Linköping, 19 April, 2010
Avdelning, Institution
Division, Department
Datum
Date
Electronic Systems
Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping, Sweden
Språk
Language
Rapporttyp
Report category
ISBN
Svenska/Swedish
Licentiatavhandling
ISRN
Engelska/English
⊠
Examensarbete
⊠
C-uppsats
D-uppsats
Övrig rapport
2010-04-19
—
LiTH-ISY-EX--10/4403--SE
Serietitel och serienummer ISSN
Title of series, numbering
—
URL för elektronisk version
http://www.es.isy.liu.se
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-56550
Titel
Title
Digital Frekvenssyntes för FPGAer
Direct Digital Frequency Synthesis in Field-Programmable Gate Arrays
Författare Petter Källström
Author
Sammanfattning
Abstract
This thesis is about creation of a Matlab program that suggests and automatically
generates a Phase to Sine Amplitude Converter (PSAC) in the hardware language
VHDL, suitable for Direct Digital Frequency Synthesis (DDFS). Main hardware
target is Field Programmable Gate Arrays (FPGAs).
Focus in this report is how an FPGA works, different methods for sine amplitude generation and their signal qualities vs the hardware resources they use.
Nyckelord
Keywords
PSAC, DDFS, FPGA, Matlab, Frequency Synthesis
Abstract
This thesis is about creation of a Matlab program that suggests and automatically generates a Phase to Sine
Amplitude Converter (PSAC) in the hardware language VHDL, suitable for Direct Digital Frequency Synthesis
(DDFS). Main hardware target is Field Programmable Gate Arrays (FPGAs).
Focus in this report is how an FPGA works, different methods for sine amplitude generation and their signal
qualities vs the hardware resources they use.
Sammanfattning
Detta exjobb handlar om att skapa ett Matlab-program som föreslår och implementerar en sinusgenerator i
hårdvaruspråket VHDL, avsedd för digital frekvenssyntes (DDFS). Ämnad hårdvara för implementeringen är
en fältprogrammerbar grindmatris (FPGA).
Fokus i denna rapport ligger på hur en FPGA är uppbyggd, olika metoder för sinusgenerering och vilka
kvaliteter på sinusvågen de ger och vilka resurser i hårdvaran de använder.
v
Acknowledgments
I would like to thank my supervisor and examiner Oscar Gustafsson, and Daniel Källming for a good opponent.
I would also want to thank Kent Palmkvist and a few others for technical support and advices during the thesis.
The entire ES corridor, Mikael Karlsson, Emanuel Eliasson, Kaveh Azizi and the Bertramm group have also
been very supportive and have kept me in a good mood - thank you all.
vii
Contents
1 Introduction
1.1 Background . . . . . . . . . . . . . .
1.1.1 DDFS . . . . . . . . . . . . .
1.2 Purpose . . . . . . . . . . . . . . . .
1.2.1 Quality vs Resource Problem
1.3 This Document . . . . . . . . . . . .
1.4 Project Approach and Overview . .
1.4.1 Implementation Language . .
1.4.2 Finding Existing Methods . .
1.4.3 Target Architectures . . . . .
1.4.4 Modeling and Analysis . . . .
1.4.5 VHDL Implementation . . .
1.4.6 Suggester . . . . . . . . . . .
1.5 Limitations . . . . . . . . . . . . . .
1.6 Notations and Abbreviations . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
1
2
2
2
2
2
2
3
3
3
3
3
3
2 Methods
2.1 Symmetry Using Range Divider . . . . . . .
2.2 ROM/LUT . . . . . . . . . . . . . . . . . .
2.3 Decomposition . . . . . . . . . . . . . . . .
2.3.1 Polynomial Interpolation Alternative
2.3.2 Phase Truncation Alternative . . . .
2.3.3 Hutchinson’s Approach . . . . . . .
2.3.4 Sunderland’s Approach . . . . . . .
2.3.5 Nicolas’ Approach . . . . . . . . . .
2.3.6 Curticăpean’s Approach . . . . . . .
2.4 CORDIC . . . . . . . . . . . . . . . . . . .
2.4.1 Janiszewskis Hybrid . . . . . . . . .
2.5 Sine Compression . . . . . . . . . . . . . . .
2.5.1 Very Coarse Approximations . . . .
2.6 Complex Rotation . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
6
7
7
9
9
10
11
11
11
12
12
12
14
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Target Architectures
15
3.1 Common FPGA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Altera . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Xilinx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 Modeling
4.1 Quality Units . . . . . . . . . . . . . .
4.2 Frequency Control Word Effects . . .
4.3 Rounding Noise Analysis . . . . . . . .
4.3.1 Methods . . . . . . . . . . . . .
4.4 Algorithm Verification . . . . . . . . .
4.4.1 ROM/Polynomial . . . . . . .
4.4.2 Other Decomposition Solutions
4.4.3 CORDIC . . . . . . . . . . . .
4.4.4 Sine Compression . . . . . . . .
4.4.5 Method Codes . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
ix
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
19
19
20
20
20
21
21
24
24
24
24
x
Contents
4.5
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
25
26
27
28
28
5 Implementations
5.1 ROM . . . . . . . . . . . . . . . . . . . . . .
5.1.1 The Function create_rom . . . . . .
5.1.2 The VHDL Implementation . . . . .
5.2 SURD Implementation . . . . . . . . . . . .
5.3 Polynomials . . . . . . . . . . . . . . . . . .
5.3.1 The Function psac_polynomial_rom
5.3.2 The Function psac_polynomial . . .
5.3.3 The Function create_polynomial . .
5.3.4 The VHDL Implementation . . . . .
5.4 Sunderland’s . . . . . . . . . . . . . . . . .
5.4.1 The Function psac_sunderland_rom
5.4.2 The Function psac_sunderland . . .
5.4.3 The Function create_sunderland . .
5.4.4 The VHDL Implementation . . . . .
5.5 Test Bench . . . . . . . . . . . . . . . . . .
5.5.1 The Function create_testbench . . .
5.5.2 The VHDL Solution . . . . . . . . .
5.6 Automatic Generation/Verification . . . . .
5.6.1 The Function test_psac . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
31
31
31
31
32
32
32
32
33
33
33
34
34
34
34
35
35
35
35
35
4.6
Truncation Noise Analysis . . . . . . .
4.5.1 Polynomial . . . . . . . . . . .
4.5.2 Other Decomposition Solutions
4.5.3 Sine Compression . . . . . . . .
Conclusion . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
6 Suggester
37
6.1 The Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2 Cost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
7 Result
39
8 Conclusions And Possible Improvements
41
8.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
8.2 Suggested Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Bibliography
43
A What is...?
45
B Quality and Resource Tables
B.1 ROM/Polynomial . . . . . . . . . . . .
B.2 Other Decompositions . . . . . . . . .
B.2.1 The F and Method Groupings
B.2.2 The Quality Groupings . . . .
B.3 Sine Compression . . . . . . . . . . . .
C Polynomial VHDL Example
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
47
47
51
51
51
53
58
Chapter 1
Introduction
This chapter will discuss the project and this report, and introduce some terms that can be good to know in
the thesis.
1.1
Background
Digital electronics become more and more widely used compared to analogue electronics. Many tasks that have
earlier been implemented with analogue circuits are today more suitable - one way or another - to be replaced
by digital technology.
One area that grows fast is for instance wireless communication, where information is sent as radio waves.
This requires that a sine wave is generated that can carry the information. This sine is comparably simple
to generate in analogue electronics, and rather complicated to calculate for digital circuits, why the analogue
method still is in use. The analogue method has two drawbacks; it is hard to control the frequency exactly, and
the generated signal may be hard to manipulate.
You can use a microprocessor to calculate the sine, but this report will focus on the ASIC1 implementation
- that is how to program logics that calculate the sine, rather than how to program the instructions that is
executed by the microprocessor in order to calculate the sine.
The main target for the ASIC is to use an FPGA - Field Programmable Gate Array, that is an electrical
chip with such programmable logic. This will be further described in chapter 3, “Target Architecture”.
1.1.1
DDFS
The term DDFS stands for Direct Digital Frequency Synthesis, and in this context means a way to produce
a sine wave with a given frequency. The method also use a clock signal that defines the time between one
calculation and the next. The clock signals typically flips between ’0’ and ’1’ at a frequency of fclk , for instance
300 MHz.
The DDFS usually contains a counter that counts from 0 to something big, and then restarts, and counts
up with a number – the Frequency’s Control Word (FCW). The content of the counter is treated as a phase
(angle, here called xN ), and is then sent to a Phase to Sine Amplitude Converter (PSAC) that calculates a sine
value, y, for the phase. See figure 1.1 for an illustration.
This sine value will then over time have the shape of a sine wave with the exact frequency that was given
to the DDFS.
Figure 1.1. Illustrations of the basic DDFS structure
1 Application
Specific Integrated Circuit, a method of customizing the hardware to a specific need
1
2
Introduction
1.2
Purpose
This thesis aims to develop a method for automatic generation of the PSACs for FPGAs, for different PSAC
methods, including a way of suggesting a suitable PSAC method for different types of FPGAs and for different
requirements. The purpose of this is to simplify the creation of the PSACs for FPGA developers.
1.2.1
Quality vs Resource Problem
One of the main problems with a manual implementation is that the user may have different requirements on
the signal. It can for instance be suitable to lower the quality on the signal in order to save some resources.
Problems like this can be time wasting to solve manually, and this project includes a solution for that (see
chapter 6, “Suggester”).
1.3
This Document
This documents is mainly intended for those with a basic knowledge in digital technology and a basic knowledge
in math. For those who are not familiar with all terms and expressions, appendix A, “What is...?” contains a
list of abbreviations/concepts and a short explanation.
1.4
Project Approach and Overview
Before the project was started there were some things to decide. First of all which language the project should
be implemented in, but also how to split up the project in sub tasks.
This section introduce each of those sub tasks, which more or less corresponds to the chapters in this report.
1.4.1
Implementation Language
The first thing to do was to decide in which programming language the system should be built. There were
three main alternatives: Matlab from The Mathwork Inc, Microsoft Excel and any high level language, like
C++ or Java.
Language
Matlab
Excel
C++
Benefit
Very good support for mathematic analysis, convenient file
I/O, widely used for similar
tasks.
Easy to create good graphical user interface, and to store
much data.
Widely known.
Drawbacks
Everyone doesn’t have Matlab.
Hard to do an FFT in Visual
Basic.
Not very suitable for this kind
of calculations
Table 1.1. Benefits and drawbacks for different implementation languages
The choice fell on Matlab, mostly because of the mathematical intensity of the program.
1.4.2
Finding Existing Methods
The first thing to do was to elaborate what else had already been done on the topic. The most interesting and
suitable methods were then chosen for futher analysis.
In this task it is good to have read up on the FPGA architectures, to easier know which methods are
interesting, and which one can be omitted at once. However, when studying the FPGAs it is good to know the
methods, to know what to look for. Therefore this task is put before the FPGA architecture study.
Many methods have one or more parameters, the most obvious one is the decomposition of the phase bits,
which can be done in N +1 ways (if the phase is N bits wide). In this report, and in the Matlab implementation,
the word configuration is used to describe a specific method and its different parameters. More about this in
chapter 2, “Methods”.
1.5 Limitations
1.4.3
3
Target Architectures
The main target is FPGAs, and the suggest functionality2 should have a good knowledge about the different
FPGA types and architectures, why a study of the most common FPGA types is required.
1.4.4
Modeling and Analysis
The chosen methods are verified and analyzed in Matlab. The worst methods are discarded from the project.
In order to analyze the resources needed for the methods, this task should be performed after the FPGA
architecture study.
1.4.5
VHDL Implementation
The main purpose with the project is to create a PSAC. In practice that means that one or more files with
VHDL-code3 are generated.
There are several other hardware describing languages as well, but the requirements specify a VHDL generator.
The Matlab implementation should be able to generate VHDL implementations for the given methods, in
all possible configurations, plus testbenches that verify the functionality.
This task must of course be performed after the method modeling. This task is covered in chapter 5.
1.4.6
Suggester
One of the more tricky parts of the project is to write an algorithm that checks for the best implementation
according to given preferences/limitations and a given FPGA architecture.
This task must also be performed after the method modeling. In order to get a better model of the resources
that are used, this task is placed after the VHDL implementation.
1.5
Limitations
The main limitation of this master thesis is the time of 800 hour, including time to present and defend the work
and the time to hold the place of an opponent. Within this time limit the thesis aims to be as good as possible,
according to some prioritation.
The main effects are a reduced subset of algorithms/methods, a limited extension of configurations within
the methods, and a restriction to only test the code for Matlab in Unix, in difference to Windows and/or
Octave4. It also affects the suggester function, both in efficiency and in complexity. More about the limitations
in the respective chapters.
1.6
Notations and Abbreviations
This is only a short list of the most important abbreviations. See Appendix A - “What is...?” for a complete
list of abbreviations, notations and descriptions.
dB - decibel, a logarithmic scale for comparing relative difference.
dBc - dB relative to the carrier, measured as the difference in power.
FCW - Frequency Control Word
LUT - Look Up Table, or function generator.
ModelSim - A program from Mentor Graphics aimed to compile and simulate for instance VHDL code.
PSAC - Phase to Sine Amplitude Converter, a sine function.
SFDR - Spurious Free Dynamic Range, a way to measure signal purity.
SINAD - Signal to Noise And Distortion ratio, a way to measure signal purity.
2 Suggests
a suitable configuration - see 1.4.6
VHDL code describes how the FPGA should be programmed to realize the intended behaviors
4 Octave is an open source variant of Matlab
3 the
4
Introduction
SNR - Signal to Noise Ratio, a way to measure signal purity.
SURD - Symmetry Using Range Divider. A phrase within this thesis for a way to reduce the input range to
the PSAC.
VHDL - A Hardware Description Language, describes how the information flows and is treated in the FPGA.
Chapter 2
Methods
There are of course a number of ways to compute a sine wave.
The authors Langlois et al. has summarized a large number of DDFS techniques[1], which is the main source
of methods in this thesis.
Some notations:
N - Number of bits in phase. The two MSBs are not used more than in the SURD.
C - Coarse, some of the most significant bits from the phase (except the two used by the SURD).
F - Fine, some of the least significant bits from the phase.
D - Some bits between C and F in Sunderland’s method. D is just a suitable letter between C and F .
W - Number of bits in amplitude. The MSB is not used more than in the SURD.
K - Number of coefficients in polynomial solutions.
SURD - Symmetry Using Range Divider. A method that uses the symmetry of the sine to reduce the phase
with two bits.
x - The first 90◦ of the phase.
xN,Q,C,D,F - The phase containing the N , C, D resp. F bits. xQ is the two bits used by SURD.
Those notations will be futher described in their respective sections below.
2.1
Symmetry Using Range Divider
The Symmetry Using Range Divider (SURD) decreases input phase from range 0–360◦ to either 0–90◦, for
algorithms returning only sine, or 0–45◦ for quadrature algorithms, returning both sine and cosine, by using the
symmetry in the sine wave. This method is used as a wrapper function around other algorithms, minimizing
the input range to the functions and compensating the output result from the function. All discussions and
illustrations here will assume the 0–90◦ version, for simplicity reasons. See figure 2.1 for the phase decomposition,
or figure 2.2 for the phase and amplitude effects. Equation 2.1 illustrate the math behind the SURD.

sin(90◦ · x),
xQ = 0 (0◦ )



◦
sin(90 · (1 − x)),
xQ = 1 (90◦ )
sin(90◦ · xN ) = sin(90◦ · (xQ + x)) =
(2.1)
◦
− sin(90 · x),
xQ = 2 (180◦ )



◦
◦
− sin(90 · (1 − x)), xQ = 3 (270 )
The SURD is divided into the two parts PreSURD and PostSURD, where PreSURD handles the phases
phase inversion (1 − x), and PostSURD inverts the result, if needed.
The values xN are in equation 2.1 meant to be a number 0 to 4 (0 ≤ xN < 4), which implies that xQ (that
are the 2 MSBs of xN ) are an integer with value 0, 1, 2 or 3. x is then the fraction part, 0 ≤ x < 1.
Note that xN is just a series of ones and zeros, and that this representation uses a decimal point between
the xQ and x fields. The sine approximation function may set the decimal point somewhere else for a suitable
representation.
Because of the great benefits with SURD (see table 2.1), it is used in all implementations.
5
6
Methods
Figure 2.1. SURD decomposition
Figure 2.2. The SURD signal effects.
Benefits
Drawbacks
Other
properties
The other algorithms can be designed with a very much smaller
memories.
In those cases there is a ROM dedicated for this task and it is
big enough to fit the entire 0–360◦, than that may be slightly
faster.
This method does not affect the precision of the result, but will
increase the tCO and tSU with the time of one adder resp. one
inverter, and will use slightly more logic resources. Depending
on which other algorithm that is used it will however save loads
of ROMS and/or FAs.
Table 2.1. Some properties for SURD
2.2
ROM/LUT
The ROM/LUT method uses a big look up table, or better known as ROM, to store the entire function. See
table 2.2.
sin(x) = R[x],
where R is the ROM and x is used as the address.
This method has high priority according to the requirements. Langlois[1] mention it.
(2.2)
2.3 Decomposition
7
Benefits
Drawbacks
Other
properties
Works fast, very simple.
Grows exponentially with input width.
Exact result (as exact as possible with actual word width).
Suitable for simulator and FPGAs with big ROMs and/or high
performance requirements.
Table 2.2. Some properties for ROM implementation
2.3
Decomposition
Another method than the ROM solution may be “decomposition solutions” (or “bipartite solutions”), discussed
as follows.
Split the N − 2 input bits into C (Coarse) MSB and F (Fine) LSB, where N = 2 + C + F (the 2 MSBs are
reserved for the SURD). Let x = xC + xF be the values stored by the C and F bits, as illustrated in figure 2.3.
Or, in the Sunderland’s approach case (see section 2.3.4 below), let N be 2 + C + D + F and x = xC + xD + xF
in a similar way. See figure 2.3 and 2.4.
Figure 2.3. Decomposition of quartile phase into two fields
Figure 2.4. Decomposition of quartile phase into three fields
For example, if N = 11, C = 4, F = 5 and xN = 11000011111, then xQ = 11, xC = 0000 and xF = 11111.
2.3.1
Polynomial Interpolation Alternative
Split the phase into 2C parts, and calculate each part as a polynomial, according to figure 2.5. Because you
will use all coefficients for a certain polynomial in the same time (± a few clock cycles), you can read out all
the coefficients for that polynomial in the same time. Therefore, you can use a ROM with 2C lines and store
the K coefficients side by side in it, one polynomial per ROM line. This way you only need one ROM (which
may however require several ROMs if the total required number of bits does not fit into one ROM).
When using K coefficients you get a (K − 1):th grade polynomials. The “x” in the polynomials are the F
fraction bits, xF , according to equations 2.3.
sin(xC
sin(xC
sin(xC
sin(xC
+ xF ) ≈ R1 [xC ],
+ xF ) ≈ R1 [xC ] · xF + R2 [xC ],
+ xF ) ≈ (R1 [xC ] · xF + R2 [xC ]) · xF + R3 [xC ],
+ xF ) ≈ (...(R1 [xC ] · xF + R2 [xC ])... + RK−1 [xC ]) · xF + RK [xC ],
K
K
K
K
=1
=2
=3
≥ 4,
(2.3)
where Rk [xC ] is coefficient k for polynomial xC .
The coefficients on a line is used for a (K − 1):th grade polynomial applied on the F truncated bits. This
allows C to be small if K is fairly big without loosing too much quality. In table 2.3 the quality terms SINAD
and SFDR are used, those will be described in section 4.1, Modelling - Quality Units.
Worth mentioning is that a high K requires complex calculations, which will cause a high latency and/or a
huge restriction of the clock frequency.
The last coefficient in the ROM (RK ) will always be W − 1 bits wide, because it must store the “offset”
position for that polynomial, which will be within the range [0, 2W −1 ). The previous coefficients will shrink
with roughly C bits per coefficient.
8
Methods
Figure 2.5. Polynomial decomposition illustration
K
Big
C
Big
Benefit
Exact value (except noise in
W’s LSBs).
Big
Small
Few ROMs are needed.
Small
Big
Small
Small
Few and small mults are required.
Resources effective, easy to
calculate.
Drawbacks
Much memory and many
(rather small) multipliers
needed.
Many and big multipliers are
required. Low SFDR.
Much memory is required.
Low SINAD.
Rough approximation. Very
low SFDR and SINAD.
Table 2.3. Some properties for polynomial solution
One of the big implementation problems with polynomials is to create the coefficients, and which approach
to use. In table 2.4 some methods are mentioned, which will be analyzed futher in chapter 4.
Least Square
Chebyshev
Interpolation
TaylorLeft
TaylorMid
Truncation
ROM
Minimize average error ⇒ maximize SFDR (in general)
A kind of interpolation where the interpolation points have been chosen to minimize the
maximum error ⇒ maximize ENOB ⇒ maximize SINAD (in general).
Like Chebyshev interpolation, but “stretch out” the points so there are one point in each
edge of the ranges.
Taylor series from a point in the left edge of the F interval.
Taylor series from a point in the middle of the F interval.
Just pick the left most value in each range. Requires that K=1.
The ROM solution (section 2.2) is a truncation special case where F=0.
Table 2.4. Some methods for polynomial coefficients
An explanation ◦to the ◦Chebyshev points: They are placed within each polynomial ranges in the same way
), i = 0..K − 1 are placed in the range (−1, 1).
as the values cos( 90 +i·180
K
2.3 Decomposition
2.3.2
9
Phase Truncation Alternative
Truncate the input to the C MSB in order to save some ROM. This is a special case of polynomial interpolation
where K = 1.
This method can be seen either as a way of reducing the ROM size as mentioned (by reducing the phase bus
to the memory), or it can be seen as taking a ROM solution and increase the phase accumulator width with F
bits in order to increase the frequency precision, but without giving the extra bits to the ROM – see table 2.5.
sin(xC + xF ) ≈ sin(xC ) = R[xC ]
View
Benefits
Drawbacks
Truncated phase bus
Smaller ROM.
Much more noise.
(2.4)
Increased phase accumulator
Higher frequency resolution.
More noise.
Table 2.5. Some properties for phase truncation
2.3.3
Hutchinson’s Approach
Hutchinson et al.[2] suggested a trigonometric approximation. This approach[1] uses the approximation sin(x) =
sin(xC + xF ) = sin(xC ) cos(xF ) + sin(xF ) cos(xC ) ≈ sin(xC ) + sin(xF ) cos(xC ) which is without multiplication,
since sin(xF ) cos(xC ) is precalculated and stored in a separate ROM - a ROM that will be as high as the pure
ROM solution, but around W − F bits wide instead of W − 1 bits. In figure 2.6 the 3 graphs illustrate a solution
where C is 5, 3 and 1. The einf and e2 values are the maximum and average errors for the values, the terms
will be better described in section 4.1. See also table 2.6.
sin(x) ≈ R1 [xC ] + R2 [x]
(2.5)
Hutchinson method, N=11, W=11
C=5, F=4 => rom=4 kbits
einfmsb=0.00185, e2msb=0.000541
C=3, F=6 => rom=5 kbits
einfmsb=0.0181, e2msb=0.00575
1.2
1.2
1
1
0.8
0.8
0.6
0.6
C=1, F=8 => rom=6 kbits
einfmsb=0.206, e2msb=0.0663
1.4
1.2
1
0.8
0.6
0.4
0.4
0.4
0.2
0.2
0.2
0
0
0
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
Figure 2.6. Hutchinsons implementation examples
The method can be improved by setting the content of ROM 2 to a “correction” to ROM 1. R2 [x] =
sin(x) − R1 [xC ], rather than Hutchinsons original assignment R2 [x] = sin(xF ) cos(xC ). This improvement does
not affect the implementation cost of the algorithm, but removes the algorithmic error. This variant is in this
thesis called Hutchinson’s 2, and it will be futher analyzed in chapter 4, Modelling, page 19.
10
Methods
Benefit
Drawbacks
Very simple and quick.
Requires very much memory, and is rather inexact.
Table 2.6. Some properties for Hutchinsons approach
2.3.4
Sunderland’s Approach
An extended version of Hutchinson’s method was suggested by Sunderland et al.[3], where they divide the N − 2
input bits into 3 fields:
sin(x) = sin(xC + xD + xF )
= sin(xC + xD ) · cos(xF ) + cos(xC + xD ) · sin(xF )
= sin(xC + xD ) · cos(xF ) + cos(xC ) · cos(xD ) · sin(xF ) − sin(xC ) · sin(xD ) · sin(xF )
≈ sin(xC + xD ) + cos(xC ) · sin(xF )
(2.6)
and so
sin(x) = R1 [xC+D ] + R2 [xC+F ]
R1 [xC+D ] = sin(xC + xD )
R2 [xC+F ] = cos(xC ) · sin(xF )
(2.7)
where xC+D is the xC and xD bits concatenated together, and corresponding for the xC+F bits.
Sunderland method, N=11, W=11
C=1, F=4 => rom=512 bits
einf
=0.0308, e2
=0.00781
msb
C=0, F=6 => rom=592 bits
einf
=0.173, e2
=0.0511
msb
msb
1.2
1.2
1
1
0.8
0.8
C=0, F=8 => rom=3 kbits
einf
=0.413, e2
=0.153
msb
msb
msb
1.6
1.4
1.2
1
0.6
0.6
0.4
0.4
0.2
0.2
0.8
0.6
0.4
0.2
0
0
0
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
Figure 2.7. Sunderlands implementation examples
Benefit
Drawbacks
Other
properties
Very simple to implement in VHDL.
Rather bad precision.
Due to the triple decomposition of the phase, this method have
very many configurations when N is big.
Table 2.7. Some properties for Sunderlands method
1.5
2.4 CORDIC
2.3.5
11
Nicolas’ Approach
Nicholas et. al. [4] suggested that the ROM contents in Sunderland’s approach should be changed and optimized
according to some goal (e.g. high SFDR). Due to time limitation this optimization will not be investigated.
2.3.6
Curticăpean’s Approach
Curticăpean et al.[5] suggested an improvement to Hutchinson’s approach, that stores sin(xF ) and cos(xC ) in
one ROM each and multiplies them together. See table 2.8 and figure 2.8.
sin(x) ≈ R1 [xC ] + R2 [xC ] · R3 [xF ]
(2.8)
where R1 and R2 can be stored side by side in one Rom (because both are addressed with xC ).
Curticapean method, N=11, W=11
C/F=5/4 => rom=736 bits
einf
=0.00145, e2
=0.000511
msb
C/F=3/6 => rom=672 bits
einf
=0.0186, e2
=0.00577
msb
msb
1.2
1.2
1
1
0.8
0.8
0.6
0.6
C/F=1/8 => rom=3 kbits
einf
=0.206, e2
=0.0664
msb
msb
msb
1.4
1.2
1
0.8
0.6
0.4
0.4
0.4
0.2
0.2
0.2
0
0
0
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
Figure 2.8. Curticăpeans implementation examples
Benefit
Drawbacks
Saves some ROM.
Cost one multiplication.
Table 2.8. Some properties for Curticăpeans method, relative Hutchinson’s
2.4
CORDIC
The CORDIC algorithm is a very resource efficient and “exact” method for sine calculation, it requires no
multiplication and very little ROM. It does, however, require N − 2 comparison with corresponding additions/subtractions, that must be executed after each others, which gives either a slow clock or a very large latency.
CORDIC stands for Coordinate Rotation Digital Computer, and is a set of algorithms based on the idea
to process the input with smaller and smaller steps toward the zero, and at the same time process the output
with some corresponding operations. In each step there is a binary decision, like “increase or decrease” which
affect the following operations on the (modified) input and output values. After a predefined number of steps,
the output is ready. The CORDIC sine method is carefully described on for instance Wikipedia[6].
This method has medium priority according to specification.
12
Methods
Benefit
Drawbacks
No multiplication, very limited ROM.
Takes long time.
Table 2.9. Some properties for CORDIC
2.4.1
Janiszewskis Hybrid
Look up the first C bits in a LUT, and feed that to step K..W of the CORDIC. Due to the drawback in table
2.10, this method will not be implemented.
Benefit
Drawback
Faster than CORDIC, smaller than LUT
Still not as fast as for instance polynomial solution.
Table 2.10. Some properties for Janiszewskis hybrid
2.5
Sine Compression
Calculate a rough estimation to the sine somehow, and include a ROM that contains the errors. This ROM will
be as high as the pure ROM solution, but much thinner.
As estimation it is convenient to use any of the methods discussed in Decomposition above, or a “very coarse
approximation” (see below). Some drawbacks and benefits are shown in table 2.11.
Benefit
Drawbacks
You get an “exact” solution (errors ≤ 12 LSB)
The extra ROM needs to be 2N −2 rows high.
Table 2.11. Some properties for sine compression
This compressed ROM may then and now be called a “correction ROM” to the approximating function.
2.5.1
Very Coarse Approximations
Some extremely simple approximations is worth mentioning. All approximating sin(90◦ · x), where 0 ≤ x < 1.
The cost of using those approximation is the cost of the approximations themselves plus one adder, where
you add the approximated value to the compressed ROM output. The cost of one register for storing the
approximation may be needed. See figure 2.9 for illustrations.
Due to time limitations, those methods will not be implemented.
Identity Approximation
The simplest one.
sin(x) ≈ x.
(2.9)
This can be shown to save 2 bits of ROM width, which holds even if x is truncated to 5 bits.
Langlois Approximation
Named after Langlois and Al-Khalili[7].
sin(x) ≈







3
2 x,
5
32 +
1+x
2 ,
5
16
< 43
0≤x<
x,
5
16 ≤ x
3
4 ≤ x<
(2.10)
1
This will save 4 bits of ROM width. All segments in this solution uses at most one adder and no other logic.
2.5 Sine Compression
13
Sodogar Approximation
Named after Sodogar and Lahiji[8].
sin(x) ≈ x(2 − x),
0 ≤ x < 2.
(2.11)
The subtraction is done using a simple inverse of all bits in x since 0 ≤ x < 2 causes 2 − x to be equal −x, and
the negation of x can be approximated to bitwise inverse in this case. The formula is valid for 0 ≤ x < 2, but
the interesting part is 0 ≤ x < 1 due to the SURD.
1
The error here is < 16
, which means we save 4 bits of ROM width.
LUT Approximation
As will be described in chapter 3, “Target”, the FPGAs are to a high degree built up of very small memories,
so called LUTs. These have typically 4 or 6 address bits, say L input bits, and just one output bit. If you
feed the L MSBs of xC to some LUTs, the LUT outputs can act as an approximation, and will save L − 1
bits from the ROM, assuming at least L LUTs are used. Except the Identity matrix, that do not require any
LUTs at all, the other “very coarse approximation” methods will require at least as many LUTs as precision
uses for the approximation. Because of that, this method will not be more expensive than for instance Langlois
approximation.
If you take only the L − 1 MSB of xC , you can use the output from the compressed ROM as the last bit, and
combine it with the adder functionality (if the used FPGA architecture allows it). In this way the compression
may be around L − 2 bits to the same cost as the identity approximation. The exact value will however not be
investigated due to time limitations.
In the illustration, L is set to 3 for illustration reasons, and the worst error is ≈ 0.2, just below position
x = 0.125. This saves 2 bits of ROM width. The figure does not illustrate effects from a limited number of
LUTs used.
Langlois
1
0.8
0.8
0.6
0.6
amplitude
amplitude
Identity
1
0.4
0.2
0
0.4
0.2
0
0.2
0.4
0.6
0.8
0
1
0
0.2
0.4
x
0.8
1
0.6
0.8
1
LUTs
1
0.8
0.8
0.6
0.6
amplitude
amplitude
Sodogar
1
0.4
0.2
0
0.6
x
0.4
0.2
0
0.2
0.4
0.6
x
0.8
1
0
0
0.2
0.4
x
Figure 2.9. Some “Very Coarse Approximations”
Usability
The four approximations above can be used not only to compress the pure ROM solutions; they can also be
used to compress any coefficients that store a sine table, e.g. the last field in the polynomial solutions, or the
first ROM in Sunderlands method.
14
2.6
Methods
Complex Rotation
Input the frequency rather than the angle. Have a complex vector v, that is multiplied with the complex
constant eangle·i in each step.
Benefit
Drawbacks
Pretty simple, low power, since the sine and cosine is to be calculated once per frequency change rather than once per sample.
Low precision.
Table 2.12. Some properties for complex rotation
This method has very low priority, according to thesis specification.
In applications where the power is critical and resources are cheep this is interesting. If you use a PSAC to
calculate the complex value ef req·i every time the frequency is changed, and then and now between that do a
correction of the actual vector v, the method can save some power because the complex multiplication in some
cases may be more energy efficient than the PSAC calculation.
Chapter 3
Target Architectures
The main implementation target device is FPGAs, Field Programmable Gate Arrays. The FPGA is a chip that
is configurable to behave in almost any way the user want it.
There are two main competing FPGA vendors, Altera and Xilinx, with some FPGA families and generations
each. There are other vendors as well, but this thesis will only cover those two.
To simplify the handling of the different families/generations of FPGAs in this thesis, they have been assigned
abbreviations, or codes. See tables 3.2 and 3.4 for the codes.
3.1
Common FPGA Architecture
The FPGA is normally used for digital signal processing, “glue logic”, and other types of digital tasks. Therefore
it is both generalized and specialized in the same time. Typical FPGA components are:
Logic: FPGAs are mainly built up by many small LUTs (look-up tables), FAs (Full Adders), and DFFs (D-flipflops), and small muxes. Many LUTs can also work as memories. Xilinx group the logic into slices/CLBs,
Altera group it into LE/ALM/LABs. See figure 3.1 for a very simplified example of how the structure
can be organized. A LUT with e.g. 6 inputs (6-LUT) can implement any combinatorial function of those
inputs.
Memories: There use to be synchronous memory blocks, often configurable as RAM, ROM, shift register or
FIFO buffer, and in several different heights and widths. For example the size 4 kBits = 212 bits can be
shaped as 8 bits wide and 512=29 rows high, or 32 bits wide and 128=27 rows high. The address widths
are 9 and 7 bits in those cases, respectively.
Memories can have single port (SP) or dual port (DP) features, meaning you can access the memory
content from one and two sides, respectively. This thesis will only use ROMs with SP configurations, and
hence the rest of the memory configurations are not listed here. For futher improvements the DP may be
interesting, as noted below.
In most cases there are optional bits reserved for parities, usually 1 parity bit for each byte.
The LUTs are very small ROMs. Many of the LABs/CLBs can configure the LUTs as e.g. RAM, but in
this thesis that is extraneous. The ROM function is implicit, and omitted in the ROM list.
Multipliers: Most FPGAs are equipped with dedicated binary multipliers. Altera uses 18 × 18 bits signed or
unsigned, Xilinx uses 25 × 18 signed in their latest FPGAs.
FPGAs typically contains a lots of other features as well, but nothing interesting in this thesis.
3.2
Altera
Altera has a number of different FPGA families[9]. Common for all is:
Logic: The logic is based on LABs (Logic Array Blocks), consisting of either 8-10 ALMs (Adaptive Logic
Module) or 10-16 LE (Logical Element).
• The LE has typically a 4-LUT, a FA and a DFF. In some devices the entire FA is implemented into
the LUT.
15
16
Target Architectures
(a) LE/LC
(b) LAB/CLB
Figure 3.1. Simplified example of a general FPGA logic architecture
• The ALM has typically four 4-LUTs, two FAs and two DFF, but can be configured as two 5-LUTs
with two DFF, or one 6-LUT with a DFF.
ROMs: There are a few different ROMs. See table 3.1.
Multipliers: They have 18x18-multipliers, most of them configurable as 2 · (9 × 9) or 1 · (18 × 18). Both signed
and unsigned (and combined) values are accepted.
Name
Size
Modes
Address widths
M512
M4K
M9K
M144K
512 bits + 1 parity/byte
4 kBits + 1 parity/byte
8 kBits + 1 parity/byte
128 kBits + 1 parity/byte
SP
SP/DP
SP/DP
SP/DP
5 – 9 (18–1 bits data wide)
7 – 12 (36–1 bits data wide)
8 – 13 (36–1 bits data wide)
11 – 14 (72–8 bits data wide)
Table 3.1. ROMs in Altera’s FPGAs
A summary of the Altera FPGAs and their relevant resources can be seen in table 3.2.
Device
Code
ROM
LABs
Multipliers
Cyclone 1 [10]
Cyclone 2 [11]
Cyclone 3 [12], Cyclone 4 [13]
Arria 1 [14]
Arria 2 [15]
Stratix 1 [16], Stratix 1 GX [17]
Stratix 2 [18] Stratix 2 GX [19]
Stratix 3 [20], Stratix 4 [21]
cy1
cy2
cy3, cy4
ar1
ar2
sx1
sx2
sx3, sx4
M4K
M4K
M9K
M512, M4K
M9K
M512, M4K
M512, M4K
M9K, M144K
10 LEs
16 LEs
16 LEs
8 ALMs
10 ALMs
10 LEs
8 ALMs
10 ALMs
no
18x18
18x18
18x18
18x18
18x18
18x18
18x18
Table 3.2. Alteras FPGA Families
3.3
Xilinx
Xilinx has two different families with different generations. Common for all:
Logic: The logic is based on CLBs (Configurable Logic Blocks), which have two or four Slices. Two notations
within this thesis:
• The slice2 has 2·(4-LUT + FA + DFF), but can act as a 5-LUT + 1 DFF. [22]
• The slice4 has 4·(6-LUT + FA + 2 DFFs), but can act as an 8-LUT + 1 DFF. [23]
Xilinx often implements the FAs into the LUTs, except the carry logic, to optimize speed and complexity.
Memories: There are two types of memories: Block SelectRAM = 4-32 kBits, in this thesis abbreviated
’bsRam’, or Slices configured as 16 or 64 bits. Many bsRams do not have any preload functionality and
will not be listed as ROM. The slices ROM functionality is simple LUT usage, which are implicit and will
not be listed in table 3.3.
3.3 Xilinx
17
Multipliers: Xilinx’ multipliers are typically 18x18 bits signed. Unlike Alteras FPGAs you cannot configure
them to be unsigned. To do an unsigned multiplication, which this thesis deals with, you must sacrifice
the sign bit, so that typically 17x17 bits are used.
Name
Size
Modes
Address Widths
bsRam16
bsRam32
16 kbits + 1 parity/byte
32 kbits + 1 parity/byte
SP/DP
SP/DP
10–14 (18–1 bits data width)
10–15 (36–1 bits data width)
Table 3.3. ROMs in Xilinx’ FPGAs
The resource types for the different generations are summarized in table 3.4.
Device
Code
ROM
CLBs
Multipliers
Spartan-3 [24]
Spartan-6 [25]
Virtex [28, 29, 30]
Virtex-II [31, 32]
Virtex-4 [33, 34]
Virtex-5 [35], Virtex-6 [36]
sp3
sp6
vx1
vx2
vx4
vx5,vx6
bsRam16
bsRam16 [26]
no
no
bsRam16
bsRam32 [37]
4
2
2
4
4
2
18x18 signed
18x18 signed [27]
8x8 or 16x16
18x18
18x18 signed
25x18 signed [38]
slices2
slices4
slices2
slices2
slices2
slices4
Table 3.4. Xilinx’ FPGA Families
Chapter 4
Modeling
As decided in the introduction chapter, section 1.4.1, the main language will be Matlab. Therefore all models
will be built in Matlab.
The models have two purposes: Verify (and understand) the algorithm, and to analyze the signal quality.
The quality term refers to how good the signal is compared to how much noise it contains.
In the digital environment in an FPGA there are two types of noises: Truncation and rounding noise.
Truncation noise origins from approximation errors1, e.g. introduced error when doing piecewise linear approximation. Rounding noise is introduced because of finite word length in operations and result, e.g. 11 bits
precision in result, or several mult/adds where each operation introduce a rounding error.
4.1
Quality Units
The quality of a signal uses to be measured in SFDR2 and SNR3 . The SFDR measures how much “louder” the
carrier is than the loudest noise tone. The SNR compares the carrier to the sum of the noise, after the harmonics
to the signal has been removed, this is meant to measure the noise that does not belong to the carrier.
Because the base frequency is f2clk
N (where fclk is the clock frequency), and all occurring tones, carrier as well
as noise, have frequencies F CW · f2clk
N , all tones are harmonics to the base frequency. Therefore there is no ’SNR
noise’. However, the rounding noise is “white” over these discreet frequencies, and can be seen as not belonging
to the carrier signal. Therefore the SNR will in this thesis be counted as only the Carrier-to-rounding-noise
error.
To count the harmonics as noise, you use the measurement SINAD 4 , which counts everything that is not
the carrier as noise.
The SFDR compares the carrier and the loudest noise tone. That tone is usually generated by a truncation
error. The truncation errors is only limited by the approximating algorithm, and they occurres often “in groups”
- that is, if the approximation has an error in one point, it is likely to have similar errors in the closest points.
Other quality methods are the ENOB5 , average error and maximum error, according to the following definitions:
EN OB
e2
einf
=
SIN AD−1.76
6.02
p
mean(e2i ),
=
= max(|ei |)
ei = error in point no i.
(4.1)
i = 0, 1, ..., 2N − 1
e2 is the average error (or more accurately the RMS6 error), and einf is the maximum error. The notations
1/α
are derived from the mathematical terms e2 and e∞ , where eα = (mean(eα
i ))
1 error
= difference between calculated and exact sine value
Free Dynamic Range
3 Signal to Noise Ratio
4 Signal to noise and distortion ratio
5 Effective Number Of Bits
6 Root Mean Square
2 Spurious
19
20
Modeling
4.2
Frequency Control Word Effects
ENOB
SINAD [dB]
SFDR [dB]
The DDFS’ generated signal frequency is controlled through the Frequency Control Word (FCW), according to
equation 4.2.
N
F CW
(4.2)
f (F CW ) = fclk ·
, 0 ≤ FCW < 22
2N
Because of frequency mirror effects the harmonic
SFDR
tones will be mirrored back again when FCW is big, and
N=8, W=11
odd FCW
90
even FCW
due to number theoretical effects, it will never be added
85
upon any other frequency as long as FCW is odd. Therefore the amplitude of the noise is not affected by (odd)
80
FCW, and neither are the amplitude of the carrier signal.
75
0
20
40
60
80
100
120
140
In figure 4.1 you can see how the qualities are constant
SINAD
90
when the FCW is odd, but how they are affected when
FCW is even. One reason for the even FCW behavior is
80
that half of the phases are used twice, and the rest is not
70
used at all. Some rounding errors will therefore not affect
60
the result. Which phases that are used or not depends on
0
20
40
60
80
100
120
140
ENOB
which phase you “start” on, with many possible quality
14
values for a given signal at a given (even) FCW.
13
Because of this a normal approach is to meassure only
one odd FCW, and the analysis of a signal is therefore
12
drastically eased. The analysis in this thesis will use that
11
0
20
40
60
80
100
120
140
technique.
FCW
einf and e2 are dependent of the output values for
each phase, independent of the order, and are thus FCW Figure 4.1. FCW effects on the SFDR, SINAD and
independent (as long as FCW is odd).
ENOB
4.3
Rounding Noise Analysis
In this chapter rounding noise from operations will be discussed under the noise sections belonging to each
method, and this section will only deal with method independent noise.
The rounding noise can be seen as white if W is fairly big, typically at least N − 2, and is therefore not
sensitive to FCW. If W is less than N − 2, then some harmonics can occur, but those will never be bigger than
the base tone divided by 2W . In cases of decomposition, the limit may be C − 2 rather than N − 2, or some
other limit.
4.3.1
Methods
Three rounding methods are covered, illustrated by an example where W = 11, which gives an integer range
between -1023 and 1023 for the result after rounding. E.g. y(phase) ≈ 1023 · sin0 (phase), where y is the output
value from the DDFS, phase is an angle (xQ in the DDFS), and sin0 is an approximation algorithm (the PSAC).
Method 1 describes the maximum amplitude solution.
y0 (phase) = round(1023.5 · sin0 (phase))
(4.3)
where round(...) round toward closest integer.
Method 2 gives possibly less rounding errors when the phase is very close to ± 90◦ , but over all very similar
noise analyzing results as Method 1.
y0 (phase) = round(1023 · sin0 (phase))
(4.4)
Method AWGN is a method to add some White Gaussian Noise with amplitude 12 ·LSB.
y0 (phase) = round(1023 · sin0 (phase) + rnd − 0.5)
(4.5)
where rnd is a (new) random value between 0 and 1 for each phase.
An example can illustrate the direct effect of the AWGN: If the amplitude y0 for a phase should have been
511.75 before rounding, then it may be 511, but with 75 % probability it will be 512.
The effect on the signal, compared to method 2, is that the SFDR is increased because the probability
rounding tends to counteract the continuous aviations in the rounding errors that can occur when W < N − 2.
4.4 Algorithm Verification
21
The SINAD is decreased, because we add some noise. ENOB and einf is 0.5 bits worse, because
it can be
√
rounded away up to 1 LSB. e2 is increased with around 41 % (that means multiplied with ≈ 2).
Due to all the negative aspects of method AWGN, and the notice that the only profit of it is not likely to
happen often, that method is discarded. And because of this, the exact effect from decomposition will not be
investigated.
The difference between method 1 and 2 is so small so only method 1 will be used (a bigger amplitude is
always preferred).
ENOB, N=12
ENOB, N=16
0.2
0.4
0.2
ENOB − W [bits]
ENOB − W [bits]
0
method 1
method 2
method AWGN
−0.2
−0.4
0
−0.2
−0.4
−0.6
−0.6
−0.8
−0.8
2
4
6
8
10
12
14
16
18
−1
20
2
4
6
8
10
W
einf (o) and e2 (x), N=12
18
20
14
16
18
20
14
16
18
20
1
0.7
0.6
0.8
Error [LSB]
Error [LSB]
16
0.9
einf, 1
e2, 1
einf 2
e2 2
einf AWGN
e2 AWGN
0.8
0.5
0.4
0.7
0.6
0.5
0.4
0.3
0.3
2
4
6
8
10
12
14
16
18
0.2
20
2
4
6
8
10
W
12
W
sfdr (x) and sinad (o), N=12
sfdr (x) and sinad (o), N=16
12
16
sinad, 1
sfdr, 1
sinad 2
sfdr 2
sinad AWGN
sfdr AWGN
10
9
8
14
12
[dB] / W
11
[dB] / W
14
einf (o) and e2 (x), N=16
1
0.9
0.2
12
W
7
10
8
6
6
5
4
2
4
6
8
10
12
14
16
W
18
20
4
2
4
6
8
10
12
W
Figure 4.2. The ROM signal quality for the different rounding methods
Figure 4.2 illustrates the methods for N =12 and 16. You can see how Method AWGN (dotted lines) boosts
the SFDR (cross marker in lower graphs) when W is small, but make all other quality units worse.
4.4
Algorithm Verification
This section discusses how the algorithms from chapter 2 works, and briefly what signal qualities that can be
expected from the different configurations.
As defined chapter 2, the phase = 90◦ · (xQ + x), where xQ = 0, 1, 2 or 3, and 0 ≤ x < 1. Due to the SURD7
implementation, this chapter will only illustrate the first quadrant, which means xQ = 0.
For decomposition methods, that split the N − 2 bits in x into C + F or C + D + F bits, the (virtual) decimal
point in x will be moved from the left to the right of xC , making xC to an integer, 0 ≤ xC < 2C , indexing all
sub ranges in the first quartile, and xF (or xD + xF ) to the new fractional part.
4.4.1
ROM/Polynomial
In this chapter x = xC + xF , where xC is the integer part 0...(2C − 1), and xF is the fractional part 0 ≤ xF < 1
with F bits precision, which will be used as the ’x’ in the polynomials. Remember that there will be one
7 Symmetry
Using Range Divider
22
Modeling
polynomial for each xC , whose coefficients are stored in one (or several) ROMs/LUTs addressed with the xC .
The number of coefficients to the polynomials (per range) is denoted K, which gives K = polynomial order
+ 1. The implementation uses Horners scheme in order to save some multiplication, it calculates e.g. the
polynomial P (xF ) = a0 + a1 xF + a2 x2F + a3 x3F as ((a3 xF + a2 )xF + a1 )xF + a0 , if K = 4.
Coefficient Evaluation Methods
There are a number of different methods for calculating the content of the ROM storing the polynomial coefficients, which affects the qualities of the result.
This thesis will investigate five methods, each of them is well known or inherits from a well known approximation method.
The graphs in figure 4.3 (on page 23) shows only the first quadrant, because the rest is just mirrored versions
of this. For illustration reasons C is set to 1 in all examples, which gives 2 polynomials in the shown quadrant.
Furthermore N=10, which gives xF = 7 bits, or a resolution of 2−7 = 1/128, or 128 points per polynomial. The
average and maximum errors are mentioned in each graph, as well as the number of ROM bits used to store
the actual solution. This is further discussed in the analysis section.
• The TaylorLeft approach approximates the sine with a Taylor polynomial around sin(xC ) in the range
[xC , xC + 1). This gives good approximation for small xF , but bad when xF is close to 1. In the analysis
section there will be shown that this method is the worst method in all categories.
• The TaylorMid approach works like the TaylorLeft, but approximates the sine around xC + 0.5, which
increases the quality significantly compared to TaylorLeft.
• The Chebyshev approach equals the sine in K equality points distributed according to the Chebyshev
zeros, to minimize the maximum error. The Chebyshev zeros in the range (-1, 1) are the solution to
π
·{1, 3, . . ., 2K − 1}). Because 0 ≤ x < 1, rather than -1 to
0 = cos(K· arccos(x)), which gives x = cos( 2K
1, the Chebyshev points are just rescaled from (-1, 1) to (0, 1).
An exception is made for K = 1 (when the sine is nothing but a constant within each range), which should
have given a Chebyshev point in xF = 0.5, but instead takes the mean of the max and min sine values in
the range. This way the approximation still minimize the maximum error for K=1.
• The Interpolation approach works like the Chebyshev, but the equality points are “stretched out” so the
left- and rightmost points are placed in xF = 0 and xF = 1 − 2−F to ensure the result will be continuous
between the integer ranges. When K = 1 the xF = 0.5 point is used as equality point.
• The LeastSquare approach is a least square polynomial approximation, that minimizes the sum of the
squares of the errors. When the Chebyshev approach tries to minimizes the maximum error einf , this
one minimizes the RMS error8 e2.
In figure 4.3 the different methods are illustrated. Some clarification about the graphs: The three plots
for each method differs in the K value only, which is 1, 2 and 3, from left to right. The thin line is the real
sine. The upper thick line of dots is the approximated sine, and the lower thick line of dots is the error = the
approximated minus the real sine.
The two Taylor methods are mostly mathematical approximations that are “good” when xF is “close to”
0 resp. 0.5. xF is however uniformly distributed over [0, 1), why those methods (especially TaylorLeft) are
supposed to be worse in all quality measurements than the other three, who are better optimized for the entire
range [0, 1).
For these five methods there are however a security scaling, that is not plotted in figure 4.3. This scales
down the coefficients if the sine exceeds 1 in any point, which saves the final hardware implementation the need
of an overflow detection.
Some comments about different variants:
• The pure ROM/LUT solution is any of the methods with K = 1 and F = 0.
• The truncation solution is the TaylorLeft with K = 1 and F > 0 (see fig 4.3a, left graph).
• The linear interpolation is the Interpolation with K = 2 and F > 0 (see fig 4.3d, middle graph).
8 root-mean-square
error = square root of the mean of the |errors|2
4.4 Algorithm Verification
23
Polynomial TaylorLeft method, N=10, W=16, C=1
K=2 => rom=60 bits
K=3 => rom=86 bits
K=1 => rom=30 bits
einf=0.259, e2=0.0832
einf=0.0765, e2=0.024
einf=0.702, e2=0.334
1.4
1.4
1
0.8
Polynomial TaylorMid method, N=10, W=16, C=1
K=2 => rom=60 bits
K=3 => rom=88 bits
K=1 => rom=30 bits
einf=0.0719, e2=0.0243
einf=0.015, e2=0.00527
einf=0.382, e2=0.16
1.4
1.2
1
1.2
1.2
0.8
1.2
1
1
1
0.6
1
0.8
0.2
0.8
0.8
0.4
0.8
0.6
0
0.6
0.6
0.2
0.6
0.4
0.4
0.4
0
0.4
0.2
0.2
0.2
−0.2
0.2
0
0.6
0.4
−0.2
−0.4
−0.6
−0.8
0
0.5
1
1.5
0
0
0.5
1
0
1.5
0
0.5
1
−0.4
1.5
0
0.5
1
1.5
(a) TaylorLeft
0
0.5
1
1.5
−0.2
1
1
1
0.8
1
1
0.6
0.8
0.8
0.6
0.8
0.8
0.4
0.6
0.6
0.4
0.6
0.6
0.2
0.4
0.4
0.2
0.4
0.4
0
0.2
0.2
0
0.2
0.2
−0.2
0
0
−0.2
0
0
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
−0.4
1.5
0
0.5
1
1.5
(c) Chebyshev
0
0.5
1
1
0.6
0.8
0.8
0.4
0.6
0.6
0.2
0.4
0.4
0
0.2
0.2
−0.2
0
0
0
0.5
1
1.5
(d) Interpolation
0.8
−0.4
−0.2
Polynomial LeastSquare method, N=10, W=16, C=1
K=2 => rom=60 bits
K=3 => rom=88 bits
K=1 => rom=30 bits
einf=0.0471, e2=0.0161
einf=0.00397, e2=0.00113
einf=0.37, e2=0.159
1.2
1.2
1
0.5
1
1.5
Polynomial Interpolete method, N=10, W=16, C=1
K=2 => rom=60 bits
K=3 => rom=88 bits
K=1 => rom=30 bits
einf=0.0693, e2=0.0385
einf=0.00355, e2=0.0019
einf=0.38, e2=0.16
1.2
1.2
0.8
−0.4
0
(b) TaylorMid
Polynomial Chebyshev method, N=10, W=16, C=1
K=2 => rom=60 bits
K=3 => rom=88 bits
K=1 => rom=30 bits
einf=0.0494, e2=0.025
einf=0.00293, e2=0.0015
einf=0.353, e2=0.162
1.2
1.2
1
0
1
1.5
−0.2
0
0.5
1
1.5
−0.2
0
0.5
1
1.5
(e) Least Square
Figure 4.3. Illustrations of the different polynomial coefficients methods
−0.2
0
0.5
1
1.5
24
4.4.2
Modeling
Other Decomposition Solutions
After polynomial solution there are three mentioned methods left from the decomposition classification: Hutchinson’s, Sunderland’s and Curticăpean’s. The Hutchinson’s approach can be modified by changing the value in
on of the ROMs, which gives a fourth approach.
Hutchinson’s approach: sin(x) ≈ sin(xC ) + sin(xF ) · cos(xC ), used without multiplication, since sin(xF ) ·
cos(xC ) is precalculated and stored in a ROM (same height but thinner than storing entire sin(x)). This
approximation requires that 2C > W + 1 to “hide” the truncation errors in the rounding noise. See figure 2.6
(pg 9) for an illustration.
The alternative to Hutchinson’s approach, in this project called “Hutchinson’s 2”, is to store sin(x) −
sin(xC ) rather than sin(xF ) · cos(xC ), which will give no truncation error at all, and in general no higher
resource usage then Hutchinson’s original approach, why the original will be discarded. This approach will be
further investigated under the subject “Sine compression”.
Sunderland’s approach: sin(x) ≈ sin(xC +xD )+cos(xC )·sin(xF ). Store sin(xC +xD ) and cos(xC )·sin(xF )
in one ROM each, and add together. This reduces the size of the ROM quite markedly when comparing with
Hutchinson’s. The approximation requires that 2C + D > W to hide the truncation errors. Illustrated in figure
2.7.
Curticăpean’s approach: sin(x) ≈ sin(xC ) + sin(xF ) · cos(xC ), used with multiplication, since sin(xF )
and cos(xC ) are stored in one ROM each. This method gives the same truncation but slightly more rounding
noise than Hutchinson’s, but on the other hand reduces the ROM size markedly. This also needs 2C > W + 1
to hide the truncation error. Illustrated in figure 2.8.
However; if 3C > W + 2 then sin(xF ) = xF (but scaled due to the adjusted phase unit), why this method
turns to polynomial. Therefore, assume 3C ≤ W + 2 and 2C ≤ W + 1 in the analysis.
4.4.3
CORDIC
The CORDIC algorithm is a classical way to calculate the sine.
Advantage: Uses no multiplication. Only as many ROM words are needed as there are bits in the output.
It is quadrature (generates both sine and cosine).
Disadvantage: Requires very much logic/low clock frequency, very low throughput, or very high latency,
depending on how it is pipelined. Of course you can have a trade off between those.
Due to time limitation and low priority, this method will not be implemented.
4.4.4
Sine Compression
This method is very general. It can be combined with any of the methods mentioned earlier. It stores all
truncation errors (but negated), and those are added to the approximated signal in the PSAC. This gives the
effect that all truncation errors disappear, and only rounding noise is left. The cost is a large ROM with high
= 2N −2 rows, and as wide as it takes to store the truncation errors. Therefore there is only a need to look at
the maximum error factor (einf ) at the used method. For the polynomial case this means that the Chebyshev
method will be used in most cases. Experiments show that Chebyshev and interpolation polynomials and the
Sunderland’s approach are the best methods.
The choice of approximation method and its parameters is called the configuration of the sine compression
in this thesis.
Table 4.1 shows the resources needed for some configurations. The last field of the configuration is the
method code (see table 4.3). The columns ROM (appr), ROM (corr) and ROM (tot) are the ROM sizes used by
the approximating method, the Correction ROM, and the sum of them respectively. The column mults shows
the number of multipliers that is used.
When letting the sine compression function choose a configuration from given N and W only (without
specifying the rest of the configuration), it minimizes the number of ROM bits used for methods using zero
to three multipliers. See table 4.2 for some examples. Polynomial configurations will normally use two ROMS
while Sunderland’s will use three.
See the Appendix B for more tables showing resource usages.
4.4.5
Method Codes
The final product will use a number of codes for the methods and their sub groups, as shown in table 4.3.
For instance cpi is a polynomial method with the interpolation coefficients approach, and a correction ROM
on that.
4.5 Truncation Noise Analysis
25
Configuration
N=10, W=11, F=1, K=1, pls
N=10, W=11, F=2, K=1, pls
N=10, W=11, F=4, K=1, pls
N=10, W=11, F=2, K=2, pls
N=10, W=11, F=4, K=2, pls
N=10, W=11, F=2, K=3, pls
N=10, W=11, F=4, K=3, pls
N= 9, W=10, F=4, K=2, pc
N= 9, W=12, F=4, K=2, pc
N=11, W=10, F=4, K=2, pc
N=11, W=12, F=4, K=2, pc
N=11, W=10, F=6, K=2, pc
N=11, W=12, F=6, K=2, pc
N=10, W=11, F=3, C=4, s
N=10, W=11, F=3, C=3, s
N=10, W=11, F=4, C=3, s
ROM (appr)
ROM (corr)
ROM (tot)
mults
1.25 kbits
640 bits
160 bits
960 bits
272 bits
960 bits
320 bits
128 bits
160 bits
448 bits
576 bits
128 bits
160 bits
1.06 kbits
704 bits
1.03 kbits
1 kbits
1.25 kbits
1.75 kbits
512 bits
768 bits
768 bits
768 bits
384 bits
512 bits
1.5 kbits
1 kbits
1.5 kbits
2 kbits
1 kbits
1.25 kbits
1.25 kbits
2.25 kbits
1.88 kbits
1.91 kbits
1.44 kbits
1.02 kbits
1.69 kbits
1.06 kbits
512 bits
672 bits
1.94 kbits
1.56 kbits
1.62 kbits
2.16 kbits
2.06 kbits
1.94 kbits
2.28 kbits
0
0
0
1
1
1
2
1
1
1
1
1
1
0
0
0
Table 4.1. Some examples of sine_compression resources
N
10
W
10
10
16
10
20
16
10
16
16
16
20
20
10
20
16
20
20
0 mults
F=2, C=3, s
ROMs = 1.44 kbits
F=3, K=1, pc
ROMs = 3.22 kbits
F=4, K=1, pc
ROMs = 4.3 kbits
F=6, C=4, s
ROMs = 36.2 kbits
F=4, C=5, s
ROMs = 66 kbits
F=3, C=8, s
ROMs = 104 kbits
F=9, K=1, pc
ROMs = 517 kbits
F=6, C=6, s
ROMs = 588 kbits
F=5, C=9, s
ROMs = 776 kbits
1 mults
F=4, K=2, pc
ROMs = 752 bits
F=3, K=2, pc
ROMs = 1.81 kbits
F=4, K=2, pc
ROMs = 3.05 kbits
F=10, K=2, pc
ROMs = 32.2 kbits
F=5, K=2, pc
ROMs = 43 kbits
F=6, K=2, pc
ROMs = 55.8 kbits
F=14, K=2, pc
ROMs = 512 kbits
F=9, K=2, pc
ROMs = 523 kbits
F=7, K=2, pc
ROMs = 568 kbits
2 mults
F=6, K=3, pc
ROMs = 604 bits
F=5, K=3, pc
ROMs = 1.05 kbits
F=4, K=3, pc
ROMs = 1.48 kbits
F=12, K=3,pls
ROMs = 32.1 kbits
F=11, K=3, pc
ROMs = 48.3 kbits
F=9, K=3, pc
ROMs = 49.4 kbits
F=16, K=3, pls
ROMs = 512 kbits
F=15, K=3, pc
ROMs = 768 kbits
F=13, K=3, pc
ROMs = 769 kbits
3 mults
F=7, K=4, pc
ROMs = 574 bits
F=5, K=4, pc
ROMs = 864 bits
F=5, K=4, pls
ROMs = 992 bits
F=14, K=4, pc
ROMs = 48 kbits
F=12, K=4, pc
ROMs = 48.2 kbits
F=11, K=4, pc
ROMs = 48.5 kbits
F=18, K=4, pc
ROMs = 768 kbits
F=16, K=4, pc
ROMs = 768 kbits
F=15, K=4, pc
ROMs = 768 kbits
Table 4.2. Sine compression optimized for some N and W
Code Method
VHDL
p
Any kind of polynomial
ptl
Polynomial with Taylor Left coefficients
ptm
Polynomial with Taylor Mid coefficients
pc
Polynomial with Chebyshev coefficients
X
pi
Polynomial with Interpolation coefficients
X
pls
Polynomial with Least Square coefficients
X
s
Sunderland
X
c#
Sine Compression. # = any of the methods above X
Note: The methods marked with a “X” in the column “VHDL” are
implemented as VHDL generators.
Table 4.3. The method codes for modelled methods
All methods mentioned in chapter 2, Methods, are not listed here. Only those that has been modelled are
listed.
4.5
Truncation Noise Analysis
Most of the truncation (algorithmic) errors will appear as an overtone to the base frequency, due to the systematic pattern in the error – if a point is too low then its neighbors are probably also too low, which typically gives
26
Modeling
a harmonic. However, errors that occur in only one or perhaps two neighboring phases will not be identified as
a real overtone, because it will occur only once (or twice) every 2N times, why those will be thought of as white
noise.
4.5.1
Polynomial
Figure 4.4 illustrates the quality measurements when modeling the methods for C = 4 and 10, and with rather
big F and W to filter out all but the truncation effects. This gives a huge error for small K, and a big
improvement when K increases. Note the rather strange scales in the figures, especially the error graphs.
Some test where different C, F and K has been tested has resulted in the conclusion as follows (See appendix
B for tables with results).
One limitation in all coefficient assignments is the double floating point precision used by Matlab, causing
an upper limit of about 50 bits. This affects the LeastSquare method most and is clearly visible in figure 4.4(b),
the ’⊲’ marker where K = 4.
• The TaylorLeft approach (’+’ marker in the figures) gives of course very bad qualities. According to the
tests the TaylorLeft has the worst qualities of the five methods. In figure 4.4 this is clearly visible, low
ENOB/SFDR/SINAD and high error. This method is therefore discarded.
• The TaylorMid approach (’×’ marker in the figures) are (as expected) better than TaylorLeft, but not
as good as the three later methods (at least for K > 2). This method is also discarded.
• The Chebyshev approach (’⊳’ marker in the figures) have a rather low maximum error (continuous line
to the right in fig 4.4a and b). The Chebyshev approach to minimize the max error works as best when
C and/or K is big.
• The Interpolation approach (’⋄’ marker in the figures) works like the Chebyshev, but distributes the
equality points different.
• The LeastSquare approach (’⊲’ marker in the figures) minimizes the e2 factor.
Some examples of frequency spectrum’s derived from truncation errors are shown in figure 4.5. The graphs
shows the configuration ’method=pi, C=4’, for N = 14, W is set to “big” to remove the rounding errors. K is
set to 1, 2 and 4 in the graphs. FCW is set to 1489 in this case. All three plots have frequency components
around -350 dB and below, but those are derived from rounding errors rather than truncation errors, and have
therefore been removed in the graphs. The main noise frequency is harmonics number {1, 2, 3, ...} · 22+C ± 1 =
{63, 65, 127, 129, 191, 193, ...} to the main tone, with decreasing amplitude.
Figure 4.6 show the SFDR, SINAD and ROM result for different C values, N = 16, W = 20, K = {1, 2, 3},
and Chebyshev coefficients. You can clearly see the W effects on the SFDR. The upper graph to the right shows
number of kBits ROM used, and the lower graph to the right the same thing, but rescaled y-axis.
C=4
SFDR (line)
SINAD (dots)
ENOB
6.5
tl
tm
i
c
ls
6
einf (line)
e2(dots)
1
40
C=10
10
SFDR (line)
SINAD (dots)
ENOB
12.5
tl
tm
i
c
ls
38
12
10
74
36
72
11⋅K
10
Truncation error ⋅ 2
32
5
0
11.5
70
dB/K
dB/K
bits/K
34
10
bits/K
Truncation error ⋅ 2
5⋅K
0
5.5
68
11
−1
10
−1
10
30
66
4.5
10.5
28
4
einf (line)
e2(dots)
1
76
1
2
3
K
4
26
64
−2
1
2
3
K
4
10
1
2
3
4
K
(a) C=4
10
1
2
3
4
62
−2
1
2
K
3
K
(b) C=10
Figure 4.4. Example of different polynomial qualities
See Appendix B, Analysis, for tables with values.
4
10
1
2
3
K
4
4.5 Truncation Noise Analysis
27
method=pi, N=14, C=4, K=1
0
method=pi, N=14, C=4, K=2
method=pi, N=14, C=4, K=4
0
N
0
f = FCW = 1489≈2 /11
−20
harmonic
127
−30
−60
harmonic
129
harmonic
191
−40
−50
−40
harmonic
65
−100
−80
dBc
dBc
−20
harmonic
63
dBc
−10
−100
−150
−120
−50
−200
−140
−60
−160
−70
−80
−250
−180
0
2000
4000
6000
Freq
8000
10000
−200
0
2000
4000
6000
Freq
8000
−300
10000
0
2000
4000
6000
Freq
8000
10000
Figure 4.5. Examples of truncation derived frequency spectrum’s for polynomial interpolation, where K=1, 2 and 4
SFDR and SINAD for Chebyshev polynomials. N=16, W=20
150
Total ROM kBits used.
120
K=1
K=2
K=3
100
kBits
80
60
40
100
20
0
2
4
6
8
C
first 14 kBit ROM
10
12
0
2
4
10
12
dB
0
14
12
50
kBits
10
K = 1: SFDR
K = 1:SINAD
K = 2: SFDR
K = 2:SINAD
K = 3: SFDR
K = 3:SINAD
0
8
6
4
2
0
0
2
4
6
C
8
10
12
6
C
8
Figure 4.6. SFDR/SINAD and ROM usage for Chebyshev polynomials with different C
4.5.2
Other Decomposition Solutions
The four solutions Hutchinson’s, Hutchinson’s 2, Sunderland’s and Curticăpean’s differs in results.
• The Hutchinson’s has no advantages at all over Hutchinson’s 2, and is therefore discarded.
• The Hutchinson’s 2 is in fact a sine compression method, because Hutchinson’s first ROM is the TaylorLeft polynomial with K=1, and the second ROM is a correction to that. The analysis of the sine
compression shows that the Taylor Left is not efficient as approximation method, why Hutchinson’s 2 are
also discarded.
• The Curticăpean’s is a TaylorLeft polynomial if 3C ≥ W + 2, why this analysis will assume that
3C ≤ W + 1 (then the truncation errors are not hidden in the rounding noise). Within this range the
method may be is slightly better than the TaylorLeft polynomial. However, according to equations 4.6
and 4.7 (where dy denotes the errors), the TaylorMid should always have around 4 times better (smaller)
maximum truncation error than Curticăpean.
28
Modeling
Curticăpean:
y1 = sin(xC ) + cos(xC ) · sin(xF ).
|dy1 | = sin(xC ) · (1 − cos(xF )) ≈ sin(xC ) ·
2
|dy1 | .
x2F
x2
π/2
≤ F ; 0 ≤ xF < C
2
2
2
(4.6)
π
8 · 22·C
Polynomial/TaylorMid:
y2 = sin(xC ) + xF · cos(xC )
π/2
x2
π/2
sin(xC )
≤ xF <
≤ F ;−
C
2
2
2·2
2 · 2C
2
2
π /4
π
|dy2 | . 0.5 2C =
2
32 · 22C
|dy2 | ≈ x2F
(4.7)
A similar calculation can be done to show the factor 4 for average truncation error. The ENOB, SINAD
and SFDR have their sources in the errors, why they should follow and be better for the TaylorMid.
Therefore Curticăpean’s method is discarded.
• The Sunderland’s method is quite good for not using any multiplication. Due to the extra parameter
(D = N − 2 − D − F ) the method is flexible, and if F and C are set right there is a good balance between
the number of ROM bits and the quality.
In figure 4.7 the SFDR, SINAD and ROM usage are plotted when N = 16, W = 27 and C is swept from
0 to 12. Three different alternatives for D and F are shown. If D = 0 the Sunderland turns out to be the
discarded Hutchinson’s approach, why D = 1 is one of the alternatives. If F = 0, we get a pure ROM
,
solution, why F = 1 is another alternative. The third alternative is the “middle”, where D = F = N −2−C
2
or D = F + 1 when N − C − 2 is odd. W = 27 is the least output width that will not affect the SINAD. A
lower W will save a lot of memory, but reduce the signal quality in cases where C is close to N-2. Notable
is how the D = F alternative saves ROM compared to the other alternatives.
In figure 4.8 some examples of frequency spectrum derived from truncation errors are shown. The graphs
shows the configuration N = 10, method=Sunderland’s, the D and F are set according to the ROM saving
alternative in figure 4.7 (D = F ). Just like figure 4.5, W is set to “big” to separate and then remove the
rounding errors.
4.5.3
Sine Compression
According to statistical testings, the mostly used methods are Chebyshev and Sunderland’s, and a few times
also Least Square.
See Appendix B, Analysis, for tables showing more results from Sine Compression.
4.6
Conclusion
The following methods are generated:
1. Polynomial, K=1..4
(a) Interpolation coefficients.
(b) Least Square coefficients.
(c) Chebyshev coefficients.
2. Sunderland’s method.
3. Sine compression – this will however be implemented as an add-on to the other methods.
The rest of the methods are discarded.
4.6 Conclusion
29
SFDR and SINAD. N=16, W=27
Total ROM bits used. N=16, W=27
180
350
D=1
D=F
F=1
160
300
140
250
120
200
dB
kbit
100
80
150
60
D = 1: SFDR
D = 1:SINAD
D = F: SFDR
D = F:SINAD
F = 1: SFDR
F = 1:SINAD
40
20
0
0
2
4
6
C
8
10
100
50
0
12
0
2
4
6
C
8
10
12
Figure 4.7. SFDR/SINAD and ROM usage for Sunderland’s with different C, D and F distributions
N=10, method=s, C=2, D=F=3
N=10, method=s, C=4, D=F=2
0
N=10, method=s, C=6, D=F=1
0
−20
f = FCW = 93 ≈ 2N / 11
−20
0
−20
f = FCW = 93
−40
f = FCW = 93
−40
−40
−60
−60
−80
dBc
dBc
dBc
−60
−80
−100
−80
−100
−120
−100
−120
−140
−120
−140
−140
−160
0
200
400
Freq
600
−180
0
200
400
Freq
600
−160
0
200
400
600
Freq
Figure 4.8. Example of truncation derived frequency spectrum for Sunderland’s, where C=2, 4 and 6
Chapter 5
Implementations
This chapter will describe the structure of the VHDL modules that are generated. It will also briefly describe
the matlab files that generate them.
5.1
ROM
An essential part of the PSAC implementations is the ROMs. In the VHDL solutions those are implemented
as arrays of constants, leaving to the compiler to select which ROM(s) to use.
5.1.1
The Function create_rom
To simplify the generation of ROMs to the VHDL, there is a matlab file for this, called create_rom.
Here are some key properties of create_rom:
• It generates VHDL code for a ROM.
• The result can be put to screen, to an own file or to an existing file pointer.
• The module can split output into several fields (used by the polynomial).
• The module can have synchronous or asynchronous output.
• The module can use std_logic_vector or unsigned as interface vector types.
• It can insert attributes into the entity.
5.1.2
The VHDL Implementation
The main structure of the VHDL file is a bit matrix with ROM data, where the rows are indexed by the address.
The result is optionally synchronized in a process, and if many fields are used, the data is also split into those.
Timing Problem
One possible problem with this implementation is that many of the ROMs in the FPGAs have synchronous
address inputs. This gives a rather small tSU 1 , which is good, but it also gives a long tCO 2 , which may lower
the maximum clock frequency.
In general, the ROMs will be placed in dedicated ROM blocks, but if they are implemented as LUTs (typically
when there are just a few address bits), then the synchronization will be placed at the output rather than the
input. This way there will be a bigger tSU , and a very low tCO .
1t
SU
2t
CO
= SetUp time, minimum required time with “stable inputs before clock flank”
= Guaranteed maximum time from “clock flank to stable output”
31
32
Implementations
5.2
SURD Implementation
The SURD (Symmetry Using Range Divider) reduces the input range from [0, 360◦ ) to [0, 90◦ ). In the implementation this is done in two steps: PreSURD and PostSURD. The PreSURD modifies the x input, so the
resulting range will be [0, 90◦ ) or [180, 270◦) (first and third quadrant). These ranges are treated exactly equally,
and the psac algorithm calculates the first quadrant. The PostSURD handles the difference between [0, 90◦ )
and [180, 270◦).
Figure 5.1. An RTL schematic of SURD
Figure 5.1 illustrate the SURD function. The “MSB2” signal in the schematic is the second most significant
bit of xN . The mux (1) inverts the phase (if necessary) and the mux (3) inverts the result if necessary. Compare
with figure 2.2 on page 6. The DFFs (2) delays the signal a number of clock cycles – 1 for Sunderland’s and K
for polynomial – compare to the RTL schematic examples for each method.
The PreSURD will increase the PSACs tSU with roughly the time for one LUT operation. This may be a
problem if the ROMs or other connected components have big tSU or if the setup time is critical.
The PostSURD will increase the tCO with roughly the time for one W − 1 bits addition. This may be a
problem for Sunderland’s, that already have a big tCO . Polynomials have registered outputs (if no correction
ROM is used) and are therefor less critical.
5.3
Polynomials
The polynomial solution uses - as mentioned earlier - Horner’s scheme for calculations, so that yi = ((di xF +
ci )xF +bi )xF +ai when K = 4 and i = xC , for the calculations. This will minimize the number of multiplications
used.
Due to trigonometric effects, ai and bi are ≥ 0, while ci and di are ≤ 0. The VHDL implementation uses
however the unsigned data type in the calculations. To solve this the c and d coefficients are negated (if K ≥ 3),
resulting in the following formula:
yi = ((di xF + ci )xF · (−1) + bi )xF + ai
= ai + xF (bi − xF (ci + xF di ))
(5.1)
The first line shows the implemented order of calculation, the second line shows a simplified variant.
If K = 1 then coefficient ai is used and the rest are zero. If K = 2 then bi is also used, and so on.
5.3.1
The Function psac_polynomial_rom
The coefficients ai , bi and so on are created in the function psac_polynomial_rom, according to the different
polynomial configurations - the main parameters are N , W , F , K and method. This function is used by the
psac_polynomial and create_polynomial functions.
Beside the coefficient vector, this function calculates how many bits wide the coefficient fields should be, how
big the partial sums are, and also detect and correct possible overflows (where the result will not fit into the
W bits). In order to calculate this, the resulting sine approximation must be calculated for the first quadrant.
That sine is returned as a byproduct.
5.3.2
The Function psac_polynomial
The modelling of the polynomial is done in the function psac_polynomial, which allows the user to select
the different configurations. Due to the sine approximation byproduct from the psac_polynomial_rom, this
method does nothing but calculate the SURD.
This function returns, in addition to the approximated sine vector, a list of the sizes of the multipliers and
ROMs that are used.
5.4 Sunderland’s
5.3.3
33
The Function create_polynomial
The VHDL implementation is created in the function create_polynomial. This takes for instance a psac
configuration and some output settings arguments, and write the VHDL code.
One feature is to choose whether to express small multiplications as multiplications or as shift-add operations.
If the later is chosen, the multiplication operator will be overridden/redefined so it compares the arguments size
with a given limit, and either use the built in multiplier or the shift-add structure. This feature is usually not
necessary, because the compilers may do this themselves when needed.
The create_polynomial function can also add a correction ROM to the solution, to implement the sine
compression.
The ROM(s) that are used are put as a private module in the same VHDL file as the PSAC.
This function returns the expected result and which latency that was used (with high C and low W, the
need of high grade is reduced, and extra grades only results in coefficients = 0).
5.3.4
The VHDL Implementation
The solution calculates one sample per clock cycle, and is pipelined so each multiplication has one pipeline stage
each, according to figure 5.2, which illustrates the solution when K = 3. The ROM is synchronous, why that
will add another pipeline stage. This way, the polynomial solution has a latency of K clock cycles. Because
all coefficients are stored on the same “rows” in the ROM, all those are fetched at the same time. The ROM
values must be shifted through the pipeline stages until they are used.
In the case a correction ROM is used, the correction is added at the end, just before the PostSURD operation,
without using any extra pipeline stage (gray path in figure 5.2).
Figure 5.2. Example of an RTL schematic for a polynomial where K=3
Timings
As mentioned in the ROM section above, the ROMs will in most cases have synchronous inputs, and therefore
not synchronous outputs (as plotted in fig. 5.2). This will be a big drawback for the polynomial timings,
because the data is asynchronously fed to the multipliers that delay it even more. Finally the multiplier output
is added to another coefficient before it is synchronized again. In total this gives very long delays, which will
result in a big limitation of the maximum clock frequency.
Because all PSACs inputs are fed directly to the input to registers (except the PreSURD and the possibly
non-asynchronous input to the ROMs), the tSU is not more critical than discussed earlier.
The tCO is dependent on PostSURD and on an eventual correction addition, and should not be worse than
one addition.
5.4
Sunderland’s
Sunderland’s solution is very simple from an implementation point of view. It is easy to calculate the ROM
contents, as long as you use the original Sunderland method. Nicholas et. al.[4] suggested other and harder ways
of compute them, that gives better signal quality. The ROM contents are all positive, which is very convenient
in the VHDL implementation.
As mentioned in earlier chapters, the Sunderland method splits the input into three parts (in addition to
the initial xQ for the SURD), that is xC , xD and xF with widths C, D and F respectively. In difference to the
polynomial solution, where xC was the integer part and xF the fraction part of the angle, all three input parts
are treated as integers in Sunderland.
34
Implementations
Two more variables are introduced; xCD and xCF , which is the bit field xC concatenated with xD and xF
respectively. These two variables are used to index the ROMs, to get the values resCD and resCF , which are
added together.
5.4.1
The Function psac_sunderland_rom
The values resCD and resCF are created in the function psac_sunderland_rom, according to the parameters
N , W , C and F (the D field is calculated from N, C and F). This method is used by the psac_sunderland and
create_sunderland functions.
Beside the coefficient vector, this function calculates how many bits wide the ROMs should be, and also
detects and corrects possible overflows (where the result will not fit into the W resulting bits). In order to
calculate this, the resulting sine approximation must be calculated for the first quadrant. Therefore this result
is returned as a byproduct.
5.4.2
The Function psac_sunderland
The modelling of the Sunderland method is done in the function psac_sunderland, which allows the user to
select the different configurations. Due to the sine approximation byproduct from the psac_sunderland_rom,
this method does nothing but calculate the SURD.
This function returns, in addition to the approximated signal, a list of the ROMs sizes that are used.
5.4.3
The Function create_sunderland
The VHDL implementation is created in the function create_sunderland. This takes for instance a psac
configuration and some output settings arguments, and writes the VHDL code.
The create_sunderland function can also add a correction ROM to the solution, to implement the sine
compression.
The ROMs that are used are put as private modules in the same VHDL file as the PSAC.
5.4.4
The VHDL Implementation
The Sunderland’s has only one clock cycle latency (the registers are located in the synchronous ROMs). In
difference to the polynomial implementation, this method uses two ROMs for the approximation, named romCD
and romCF. If there is a correction ROM that is used in the same way. The ROM values are added together and
fed to the postSURD part.
Figure 5.3 shows the Sunderland schematic, and just like in the polynomial figure, the ROM has registered
address inputs. A correction ROM is added in gray in the figure.
Figure 5.3. Sunderland RTL schematic
Timings
Because there are only one pipeline stage in Sunderland, there are no register-to-register path, and thus no strict
upper bound on the frequency. The inputs from the PreSURD is fed to the ROMs (which will most probably
be ROM blocks rather than LUTs), why the tSU is rather short.
The output from the ROMs are delayed by the ROMs, then the two ROM outputs are added together, plus
the possible correction value, and the result is finally fed to the PostSURD. This gives a very long tCO , roughly
tCO for the ROMs plus two adder delays.
5.5 Test Bench
5.5
35
Test Bench
In order to test the solutions there is a test bench generator. The main feature is that the testbench creates a
Matlab function file with the test result, or a function file that reads the data from a data file, that is produced
as well. The data output file (within the function file or as its own data file) has no correction for the skew
in the latency. The function file does however adjust for this, and returns the adjusted simulation result. The
function file has also the ability to plot the result and the noise in the signal.
5.5.1
The Function create_testbench
The function create_testbench creates the testbench in the desired way. The user gives the N and W parameters, the latency and the entity name of the psac. The user can also choose output destination (folder/file/file
pointer). Other possible settings are incrementation step (the FCW) and how the matlab function file/data file
should be generated.
5.5.2
The VHDL Solution
The target for the testbench VHDL implementation is a simulator, why there is no problem with timings and
so on.
The built in clock generator in the test bench runs the simulation clock in 500 MHz3 . An automated test
system (see section test_psac below) will run the simulation for 1 (simulated) second, which is far too much
(as long as N ≤ 28). Because of this, the test bench stops the clock generator when it is done.
The VHDL process that handles the phase accumulator (xN ), also reads the result and write the phase and
result to the result file.
5.6
Automatic Generation/Verification
In order to simplify the process, there is a create-simulate-verify-analyze function, that takes a psac configuration
and some other settings, and generates/verifies the psac.
5.6.1
The Function test_psac
This function is named test_psac, and can be seen as the main method for producing the PSACs.
The following actions are typically taken when running test_psac:
1. Generate the psac.
2. Generate the testbench.
3. Generate the simulation do file (it is like an instruction file for the simulator).
4. Create a simulation work Library, if that does not already exist.
5. Start ModelSim, and tell it to execute the do file:
• Compile the psac and testbench.
• Simulate the testbench for one second (this will generate a matlab and a result file).
• Exit ModelSim.
6. Execute the produced matlab file, that reads the result data file, corrects the latency skew and plots the
result.
7. Verify the result by comparing it to the expected values. Add some signals to the plot.
Most of the parts in the list above can be deactivated or controlled from the arguments.
test_psac can also produce a status to a given file pointer, which is suitable when running scripted tests.
test_psac returns a matrix with the tested phases and their results from the simulation. It also returns the
carrier.
The output from test_psac is (may vary if some parts are disabled):
3 The
time in the simulation differs from the real time
36
Implementations
• psac.vhdl, the psac.
• test_psac.vhdl, the testbench.
• run_psac.do, the ModelSim do-file.
• test_psac_res.m, the resulting function file.
• test_psac_res.txt, the test result data file.
• work, the VHDL library folder
Note: The name “psac” in this list can be changed to anything else.
A part from the test_psac documentation describes it’s arguments:
[vector,carrier] = test_psac(N, W, config, ename, path, flag1, flag2, ...)
[vector,carrier] = test_psac(N, config, ename, path, flag1, flag2, ...)
[vector,carrier] = test_psac(config, ename, path, flag1, flag2, ...)
Generate and test VHDL code for a psac polynomial implementation of the PSAC
N = optional phase bit width (0:2^N = 0:360 degree).
W = optional output bit width (signed)
config = psac config and its parameters: ’method=?? F=?? C=?? ...’
* method: any of
* * ’ptl’ - Using Taylor coefficients from the left of the range
* * ’ptm’ - Using Taylor coefficients from the middle of the range
* * ’pi’ - Interpolation
* * ’pc’ - Minimize maximal error with Chebyshev
* * ’pls’ - Least Square Method
* * ’s’
- Sunderlands
* * ’c*’ - Sine compression with *=any of above methods.
* F: number of fine bits = 0..N-2.
* C: number of coarse bits = 0..N-2.
* K: number of coefficients for polynomial solution = grade + 1.
* W: Will overwrite the argument W, if both are given
* N: Will overwrite the argument N, if both are given
ename = psac entity name.
path = optional path where solution will be placed as string.
flags = optional flags as strings:
* ’leavePsac’ - do not generate a new Psac
* ’leaveTB’
- do not generate a new testbench
* ’leavePsacTB’ - neither generate a new Psac nor testbench
* ’leaveSim’ - Do not simulate or produce a do-file
* ’leaveRes’ - Do not try to run the result file
* ’quiet’
- Do not write status
* ’noPlot’
- Do not plot anything
* ’TBinc=<N>’ - increase phase in testbench with <N> rather than 1 each cycle.
* ’logfp=<fp>’- <fp> is a file pointer to a status log file.
* ’logmsg=<msg>’ - <msg> is a message to put into the status log file
* ’minMultW=<N> - all multipliers with LESS THAN <N> bits in one of the arguments
will be replaced by additions. Standard = 1 = disabled
Return values:
* vector(:, 1) = testbenchs input to PSAC.
* vector(:, 2) = output from the PSAC.
* carrier = main tone, all other frequencies ignored.
Chapter 6
Suggester
This thesis has two main tasks. The first task is to generate PSAC implementations according to the users
specification. The second task is to suggest a suitable specification, according to the users preferences and the
selected FPGA. The second task is done in the function psac_suggest.
The function takes some preferences from user, and suggests one or more implementation configurations in
a format that test_psac reads.
test_psac takes as argument:
• N - The N parameter as an integer.
• W - The W parameter as either an integer, a vector of possible values, or a limit/cost.
• Some other properties that describe e.g. required SFDR, device or anything else.
6.1
The Properties
The properties are given as a property name followed by its value as the next argument. The property names
are case insensitive. See table 6.1 for a list of available properties.
Example: “psac_suggest(16, 10, ’sfdr’, 100)” requires a psac with 16 bits phase, (at least) 10 bits
output and SFDR ≥ 100 dB. This will probably result in a solution with W ≥ 13, because it is not possible to
get 100 dB SFDR with 10 bits W (see figure 4.2 on page 21).
Name
Value
’Device’
dev
Units
Default
’SFDR’
lim/w
dBc
0/1
’SINAD’
lim/w
dBc
0/1
’ENOB’
lim/w
#bits
3/1
’e2’
lim/w
LSB
/1
’einf’
lim/w
LSB
/1
’me2’, ’meinf’
’ROMs’
lim/w
MSB
#ROMs
/1
’RomUnit’
’Mults’
x
lim/w
kbits
Mults
1
/1
’Multsize’
bits
18x18
’Lat’
width x
height
lim/w
clock
cycles
/0
’ListSize’
’disp/tot’
0/256
Meaning
Notes
Description
Optimize solution for Xilinx or Alteras FPGA structure.
dev = device family codes, e.g. “cy2”, see tables 3.2 and 3.4.
SF DR ≥ lim
1
Set a limit/weight on the SFDR.
Cost = −w · SF DR
SIN AD ≥ lim
1
Set a limit/weight on the SINAD.
Cost = −w · SIN AD
EN OB ≥ lim
1
Set a limit/weight on the ENOB.
Cost = −w · EN OB
log2 (e2) ≤ lim
1
Set a limit/weight on the RMS-error.
Cost = w · log2 (e2)
log2 (einf ) ≤ lim
1, 2
Set a limit/weight on the max-error.
Cost = w · log2 (einf )
Unit = LSB.
Same as e2 and einf, but errors are measured in MSB.
#ROM s ≤ lim
1
Set a limit/weight on the number of
Cost = w · #ROM s
ROMs to be used.
3
Set the ROM block size to x kbits.
#M ults ≤ lim
1
Set a limit/weight on the mults to be
Cost = w · #M ults.
used. Unit = number of multipliers.
Set the multipliers width and height in
number of bits.
Latency ≤ lim
1
Set a limit/weight on the latency in the
Cost = w · Latency
PSAC.
The size of the list to display/store. The more to store (tot), the
likelier to find real optimum but the longer time it takes.
If disp > 0 the disp best entries in the list is written to the screen.
Table 6.1. Available properties for the psac_suggest function
37
38
Suggester
Notations from the column notes:
1: "lim/w" means an optional limit followed by an optional weight, according to any of these four syntaxes:
{lim, ’lim’, ’/w’, ’lim/w’}, where lim and w are replaced by values.
2: Hint: Set einf = 0.5 to get ’exact’ output quality.
3: The ROM unit size can have one of the three syntaxes: {x, ’x’, ’px’}, where the “p” adds a parity bit per
byte. E.g. ’4’ defines 4096 bits large ROMs, while ’p4’ defines 4096+512 = 4608 bits.
6.2
Cost Model
To decide how good or bad an implementation is, all quality and resource metrics have their own costs. In order
to compare quality and resource factors there is a need for a common cost unit. By natural reasons it should be
a resource measurement unit. A natural choice is the LAB (for Altera) or CLB (for Xilinx), but those are quite
different, and it is better to have one common unit for both vendors. Some more or less common measurement
units are discussed in table 6.2.
Unit
Notes
Prospectives
Drawbacks
LAB/CLB
Renamed to something else,
e.g. “Block”.
Lookup tables.
This is a very common unit.
Differs very much between the FPGAs.
Also a common measurement unit.
FAs
Full adder, or corresponding
logic.
DFFs
Use to occur one per FAs.
All FPGAs have Full adders or corresponding. Easy to decide how many
FAs that are used by the algorithms.
Same as FAs.
There are many different LUT solutions, and sometimes several LUTs in
different size per DFF.
Full adders are not a very common
measurement unit for FPGA resources.
LUTs
Same as FAs.
Table 6.2. Different cost units and their prospectives and drawbacks
Because of the drawbacks with the first two units, they are discarded. The alternatives “FAs” and “DFFs”
are rather equal, and both represent a cell with one or more LUTs, a FA (or corresponding) and a DFF. This
structure is found in all FPGAs, and very convenient. The name FA is used as a unit.
6.3
Algorithm
The method in psac_suggest is a kind of approximated brute force. There exists an approximation function,
estimator, that quickly gives an estimation to the different qualities and resources for a given configuration.
This function is called for all possible configurations with the required N . The exact steps are as follows:
1. Run a sine compression estimation for different W , to find the smallest possible W that can fulfill the
signal quality requirement.
2. Run an estimation for each possible configuration with the required N and the found W (also run for
some few bits bigger W s). Store the 256 best estimations in a list (this number can be changed by the
’ListSize’ property).
3. Run a real calculation for each configuration in the saved list.
4. Print the disp best solutions, if disp was set > 0 by the ’ListSize’ property.
5. Return the best configuration as a configuration string (in the format that test_psac want it).
This method is far from optimized, due to time limitations, and it is rather slow for the default list size
256. The algorithm is very sensitive to big N . The estimations part grows quadratic with N , due to the two
parameters in the Sunderland method. The real calculation part grows exponentially, due to the number of
elements in the result.
Chapter 7
Result
This chapter will discuss the result of the used algorithms, by comparing the SFDR with different hardware
costs. The target is an unspecified FPGA with 1 kBits big ROMs and 18x18 unsigned multipliers.
Figure 7.1 shows some relations between SFDR and ROMs (1 kBit without parity). In those graphs the
psac_suggest function has been run with different requirements on least SFDR, and tried to minimize the
ROM usage. Used methods are Sunderland’s and polynomials without correction ROM. N is set to 10 in (a),
and 16 in (b) and (c). W and other parameters are optimized by psac_suggest. (c) is the same as (b), but
showing only the first 20 ROMs. The dotted lines are Sunderland’s. The markers ’×’, ’∗’, ’+’ and ’◦’ stands
for K=1, 2, 3 and 4 respectively. Where the polynomials are missing for a certain K, one or more coefficients
in the polynomials has been zero, and the grade has been decreased.
Optimizing when N=10
8
6
5
4
3
All ROMs less than 400 bits
are assumed to be
implemented in LUTs.
2
1
0
20
400
300
200
100
0
40
60
80
100
120
SFDR (dB)
140
Sunderland
Polynomial, K=1
Polynomial, K=2
Polynomial, K=3
Polynomial, K=4
500
number of ROMs (1 kBit)
number of ROMs (1 kBit)
7
Optimizing when N=16
600
Due to the low N, the easiest way to get a high
SFDR is to do a pure ROM table with 256 rows.
Sunderland will do this by setting C=F=0.
Polynomial will do this by setting C=0.
Sunderland
Polynomial, K=1
Polynomial, K=2
Polynomial, K=3
Polynomial, K=4
160
180
200
220
0
50
100
150
200
250
SFDR (dB)
(a) N=10
(b) N=16
Optimizing when N=16
number of ROMs (1 kBit)
20
Sunderland
Polynomial, K=1
Polynomial, K=2
Polynomial, K=3
Polynomial, K=4
15
10
5
0
All ROMs less than 400 bits
are assumed to be
implemented in LUTs.
0
50
100
150
200
250
SFDR (dB)
(c) N=16, showing 0-20 ROMs only
Figure 7.1. SFDR vs ROMs without correction ROM
Figure 7.2 shows the same thing as figure 7.1(a) and (b), but where the methods are extended with a
correction ROM, resulting in sine compression methods.
The effect of the C parameter to the quality are discussed in section 4.5 on page 25, and illustrated in figure
4.6 (pg 27) and figure 4.7 (pg 29).
39
40
Result
Optimizing when N=10. Including a correction rom
8
Sunderland
Polynomial, K=1
Polynomial, K=2
Polynomial, K=3
Polynomial, K=4
Number of ROMs (1 kBit)
7
6
5
4
3
2
1
40
60
80
100
120
140
SFDR (dB)
160
180
200
220
(a) N=10
Optimizing when N=16. Including correction rom
400
Sunderland
Polynomial, K=1
Polynomial, K=2
Polynomial, K=3
Polynomial, K=4
Number of ROMs (1 kBit)
350
300
250
200
150
100
50
0
0
50
100
150
200
SFDR (dB)
(b) N=16
Figure 7.2. SFDR vs ROMs for sine compression
250
Chapter 8
Conclusions And Possible
Improvements
There are few conclusions and many possible improvements left after this thesis.
8.1
Conclusions
The project has been running in a number of phases. Different conclusions has been made during the different
phases.
• During the method phase the main conclusion was that many people have invented many different
algorithms, and that many of them can be expressed as different special cases of the polynomial implementation.
• During the target phase, the main task was to find out what resources are available for the different
implementations. One non-surprising conclusion was that FPGAs are well suited to implement a digital
DDFS.
• In the modelling phase, some conclusions about different algorithms where made.
AWGN: May be an interesting rounding method, and rather easy to implement, but doubles the number
of possible configurations, which add complexity to the program.
Polynomial: The polynomial solution is an efficient method that offers a spectrum of possible configurations.
Taylor: The TaylorLeft and TaylorMid coefficients assignments to the polynomial are not very efficient
in this application.
Hutchinson’s: This psac algorithm is not worth implementing. Its optimized variant, Hutchinson 2, is
a zero-grade polynomial with a correction ROM.
Curticăpean’s: This method uses more resources than a good polynomial.
Sunderland’s: This method is superior to the polynomial in some cases.
• The VHDL implementation and Suggester construction phases were mainly just construction phases,
without any bigger conclusions.
One generally important conclusion is that this project requires far more than one 800 hours thesis to be really
good.
8.2
Suggested Improvements
There are many possible improvements that can be done, some examples:
• The psac_suggesters optimization algorithm.
• Implement methods for quadrature algorithms (returning both sine and cosine).
• Better cost model for the suggester.
41
42
Conclusions And Possible Improvements
• Support for the Very Coarse Approximations (see section 2.5.1, page 12) to the existing PSACs.
• Frequency handling in the suggester.
• Find a better coefficient assigning method for Sunderlands, according to e.g. Nicholas et al. See section
2.3.5.
• Implement support for different pipeline levels (especially a register level after ROMs).
• Implement support for DP ROMs (Dual Port, see section 3.1), to minimize the amount of pipelined data
in the polynomials.
• Adjust all coefficients to the ’middle’ points in the polynomials, and calculate using xF 2 = [-0.5, 0.5) =
xF − 0.5, this should lower the rounding noise for higher grade polynomials. It should also give Xilinx’
multipliers one more bit to work with (they operate with e.g. 18 bits signed or 17 bits unsigned).
• Support for bigger N , during which Matlab will never store the entire sine vector in any specific moment.
During the development there was huge memory problems when N ≥ 24
• Model/implement more methods. e.g. a new coefficient assignment method to the Curticăpean algorithm
could make that one better.
Bibliography
[1] J.M.P. Langlois and D. Al-Khalili.
Phase to sinusoid amplitude conversion techniques for direct digital frequency synthesis. IEE Proc.–Circuits Devices Syst., 151(6), December 2004. URL:
http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1387797&[email protected]
[2] B.H. Hutchinson Jr. Contemporary frequency synthesis techniques. In J. (Ed.) Gorski-Pcpicl, editor,
Frequency synthesis: techniques and applications, pages 25–45. IEEE Press, 1975.
[3] D.A. Sunderland, D.A. Strauch, S.S. Wharfield, H.T. Peterson, and Cole C.R. CMOS/SOS frequency
synthesizer LSI circuit for spread spectrum communications. IEEE J. Solid-State Circuits, 19, 1984. pp.
497-505.
[4] H.T. Nicholas, Samueli H., and B Kim. The optimization of direct digital frequency synthesizer performance
in the presence of finite word length effects. Annual Frequency Control Symposium, 1988. pp. 357-363.
[5] F. Curticăpean, K.I. Palomäki, and J Niittylahti. The optimization of direct digital frequency synthesiser
with high memory compression ratio. Electron. Lett, 2001.
[6] Wikipedia’s article about cordic from 25 mars 2010.
http://en.wikipedia.org/w/index.php?title=CORDIC&oldid=351988079.
[7] J.M.P. Langlois and D. Al-Khalili.
ROM size reduction with low processing
cost for direct digital frequency synthesis.
In Proc. IEEE Pacific Rim Conferance. on Communication,
Computers and Signal Processing,
August 2001.
URL:
http://ieeexplore.ieee.org/search/srchabstract.jsp?arnumber=1387797&[email protected]
[8] A.M. Sodagar and G.R. Lahiji. Mapping from phase to sine-amplitude in direct digital frequency synthesizers usig parabolic approximation. IEEE Transaction on Circuits and Systems-II, Analog Digit Signal
Process., 47, 2000.
[9] Alteras webbsite. http://www.altera.com/products/devices/dev-index.jsp.
[10] Cyclone Architecture. http://www.altera.com/literature/hb/cyc/cyc_c51002.pdf.
[11] Cyclone II: Architecture. http://www.altera.com/literature/hb/cyc2/cyc2_cii51002.pdf.
[12] Cyclone III: Device Core. http://www.altera.com/literature/hb/cyc3/cyc3_ciii5v1_01.pdf.
[13] Cyclone IV: Device Core. http://www.altera.com/literature/hb/cyclone-iv/cyiv-5v1-01.pdf.
[14] Arria GX Architecture. http://www.altera.com/literature/hb/agx/agx_51002.pdf.
[15] Arria II GX: Device Core. http://www.altera.com/literature/hb/arria-ii-gx/aiigx_5v1_01.pdf.
[16] Stratix Architecture. http://www.altera.com/literature/hb/stx/ch_2_vol_1.pdf.
[17] Stratix GX Architecture. http://www.altera.com/literature/hb/sgx/sgx_sgx51004.pdf.
[18] Stratix II Architecture. http://www.altera.com/literature/hb/stx2/stx2_sii51002.pdf.
[19] Stratix II GX Architecture. http://www.altera.com/literature/hb/stx2gx/stxiigx_sii51003.pdf.
[20] Stratix III Device Core. http://www.altera.com/literature/hb/stx3/stx3_siii5v1_01.pdf.
[21] Stratix IV Device Core. http://www.altera.com/literature/hb/stratix-iv/stx4_5v1_01.pdf.
[22] Spartan-6 CLB User Guide. http://www.xilinx.com/support/documentation/user_guides/ug384.pdf.
43
44
Bibliography
[23] Virtex-6 CLB User Guide. http://www.xilinx.com/support/documentation/user_guides/ug364.pdf.
[24] Spartan-3 User Guide. http://www.xilinx.com/support/documentation/user_guides/ug331.pdf.
[25] Spartan-6 Overview. http://www.xilinx.com/support/documentation/data_sheets/ds160.pdf.
[26] Spartan-6 BlockRAM User Guide. http://www.xilinx.com/support/documentation/user_guides/ug383.pdf.
[27] Spartan-6 DSP Slice User Guide. http://www.xilinx.com/support/documentation/user_guides/ug389.pdf.
[28] Virtex Data Sheet. http://www.xilinx.com/support/documentation/data_sheets/ds003.pdf.
[29] Virtex-E Data Sheet. http://www.xilinx.com/support/documentation/data_sheets/ds022.pdf.
[30] Virtex-E ExtMem Data Sheet. http://www.xilinx.com/support/documentation/data_sheets/ds025.pdf.
[31] Virtex-II Data Sheet. http://www.xilinx.com/support/documentation/data_sheets/ds031.pdf.
[32] Virtex-II Pro Data Sheet. http://www.xilinx.com/support/documentation/data_sheets/ds083.pdf.
[33] Virtex-4 User Guide. http://www.xilinx.com/support/documentation/user_guides/ug070.pdf.
[34] Virtex-4 Overview. http://www.xilinx.com/support/documentation/data_sheets/ds112.pdf.
[35] Virtex-5 User Guide. http://www.xilinx.com/support/documentation/user_guides/ug190.pdf.
[36] Virtex-6 Overview. http://www.xilinx.com/support/documentation/data_sheets/ds150.pdf.
[37] Virtex-6 Memory User Guide. http://www.xilinx.com/support/documentation/user_guides/ug363.pdf.
[38] Virtex-6 DSP Slice User Guide. http://www.xilinx.com/support/documentation/user_guides/ug369.pdf.
Appendix A
What is...?
An appendix with descriptions of some of the used concepts.
ALM – Adaptive Logic Module. Structural unit in Alteras’ FPGAs. Contains one or more LUT, DFF and
some other logics.
R – FPGA vendor. Have the three main FPGA families Cyclone, Arria and Stratix. Main competitor
Altera
to Xilinx.
AWGN – Adding White Gaussian Noise, a White Gaussian Noise have the same power in all frequencies.
Bit – The smallest piece of information in digital systems. May be ’1’ or ’0’.
Carry – A time critical signal in an adder. Compare with adding a digit to the value 99995. If the digit is ≥
5 then the entire chain of digits will change, the carry is then the “memory digit” that goes through all
the digits.
Carrier – The main tone of a signal. In this thesis it is the intended sine wave without any approximation or
rounding errors.
CLB – Configurable Logic Block, a structural unit in Xilinx’ FPGAs. May contain one or several Slices.
Combinatorial – A combinatorial digital function calculate a result from it’s current inputs. It is independent
of earlier inputs.
CPLD – Complex Programmable Logic Device.
dB – decibel, a logarithmic scale for comparing relative difference in power between signals.
dBc – dB relative to the carrier. For example a harmonic may have a dBc = -20, why it have a power of
10−20/10 relative the base tone.
DDFS – Direct Digital Frequency Synthesizer. A logical module that takes a frequency (in any unit) as
argument, and produce a pure sine wave with that frequency. A DDFS contains nothing but a phase
counter and a PSAC.
DFF – D-type FlipFlop, a small unit that delays a digital signal one clock cycle.
e2 – ||error||2 . The average (RMS) error of a signal.
einf – ||error||∞ . The maximum error of a signal.
ENOB – Effective Number Of Bits - approximately “number of correct bits” for a sine wave.
Error – The difference between the exact desired signal and its rounded and approximated value.
FA – Full Adder, a basic digital component.
FCW – Frequency Control Word, a number that is added to the phase accumulator in each clock cycle.
FPGA – Field Programmable Gate Array. An electronic chip that contains a lot of digital logic. The logic is
programmable to a very high degree, why the FPGA can be programmed to behave in a very complex
way, as long as the required behavior is digital. One type of use can be to code a signal from pure digital
into a sine modulated signal. In that case an important component would be the DDFS.
45
46
What is...?
Harmonic – A tone is harmonic to another if its frequency is an integer multiple of the other tone’s frequency.
The tone 440 Hz have the harmonics 2*440, 3*440, 4*440, ... Hz.
LAB – Logic Array Block, a structural unit in Alteras’ FPGAs. May contain one or several ALMs or LEs.
LC – Logic Cell. Structural unit in Xilinx’ FPGAs. Contains typically a LUT, a DFF and some carry logic.
LE – Logic Element. Structural unit in Alteras’ FPGAs. Contains typically a LUT, a DFF and some carry
logic.
LSB – Least Significant Bit, the rightmost digit in a binary number (compare the ’4’ in 1024).
LUT – Look Up Table, or function generator. Takes a few signals and produces a required result as a function
of those.
R – A program from The Mathsoft Inc. for mathematic calculations.
Matlab
Mirror effect - The mirror effect is an effect when you sample a signal into finitely many values. If the signal
frequency goes higher than half of the sample frequency, the sampled signals frequency will “bounce” in
the (sample frequency)/2.
R – A program from Mentor Graphics aimed to compile and simulate for instance VHDL code.
ModelSim
MSB – Most significant bit (or bits), the leftmost digit in a binary number (compare the 1 in 1024).
Multiplier – A component in an FPGA that performs a multiplication.
PSAC – Phase to Sine Amplitude Converter. A logical module that takes a phase and responds with its
corresponding sine amplitude.
Quadrant – A quarter of a rotation. The four quadrants represents 0-90◦ , 90-180◦, 180-270◦and 270-360◦respectively.
RMS – Root Mean Square. The RMS of a set of values is the square root of the average of the squares of the
values.
ROM – Read Only Memory, use to occur in a rather large amount in FPGAs, usually 512 bits to 8 kbits large.
RTL – Register Transfer Level, a low abstraction level view of a digital system/module/function.
SFDR – Spurious Free Dynamic Range, a way to measure signal purity.
SINAD – SIgnal to Noise And Distortion ratio, a way to measure signal purity.
Slice – Structural unit in Xilinx’ FPGAs. Contains one or more LC (logic cell), and usually some other logic.
SNDR – Signal to Noise and Distortion Ratio, see SINAD.
SNR – Signal to Noise Ratio, a way to measure signal purity.
SURD – Symmetry Using Range Divider or Division. A notation in this thesis for a way to reduce the input
range to the PSAC.
VHDL – VHSIC Hardware Description Language (VHSIC: Very High Speed Integrated Circuit). A language
to describe how to program e.g. an FPGA.
R – FPGA vendor. Have the two main FPGA families Spartan and Virtex. Main competitor to Altera.
Xilinx
Appendix B
Quality and Resource Tables
Tables from analysis. None of these aims to test all possible solutions, but to spread a number of tests within
the range of possible solutions.
B.1
ROM/Polynomial
These tables means to compare the polynomial methods for some different Ks and course/fine ratios.
For all tables are both accumulators and results 16 bits wide.
47
48
Quality and Resource Tables
Average error: e2
C=1, F=13, W=16
2
3
4
K =
1
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
0.337
0.16
0.16
0.163
0.159
0.103
0.0358
0.0393
0.0373
0.0363
0.0226
0.00314
0.00195
0.00137
0.00142
C=6, F=8, W=16
2
3
K=
1
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
0.01
0.00501
0.00501
0.00501
0.00501
K=
1
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
0.000283
0.000159
0.000156
0.000159
0.000156
0.000166
3.61e-05
3.72e-05
4.44e-05
3.42e-05
1.62e-05
1.66e-05
1.67e-05
1.67e-05
1.67e-05
C=11, F=3, W=16
2
3
9.79e-06
9.81e-06
9.8e-06
9.81e-06
9.81e-06
5
0.00301
0.000237
8.38e-05
6.23e-05
5.55e-05
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
0.000615
2.29e-05
5.22e-06
7.96e-06
1.57e-05
4
5
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
1.62e-05
4
5
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
9.79e-06
Max error: einf
K =
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
K =
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
1
0.707
0.383
0.383
0.354
0.373
C=1, F=13, W=16
2
3
0.161
0.0449
0.0642
0.00961
0.0704
0.00364
0.0671
0.00266
0.0653
0.00375
1
0.0244
0.0123
0.0122
0.0123
0.0122
C=6, F=8, W=16
2
3
0.000327
4.56e-05
8.35e-05
4.93e-05
8.58e-05
4.93e-05
0.000102
4.93e-05
7.69e-05
4.93e-05
1
0.000686
0.000398
0.000351
0.000398
0.000351
C=11, F=3, W=16
2
3
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
4
0.0127
0.00096
0.000159
0.000127
0.000228
4
4.56e-05
4.56e-05
4.56e-05
4.56e-05
4.56e-05
4
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
5
0.00244
9.52e-05
1.53e-05
1.52e-05
3.42e-05
5
4.56e-05
4.56e-05
4.56e-05
4.56e-05
4.56e-05
5
2.79e-05
2.79e-05
2.79e-05
2.79e-05
2.79e-05
B.1 ROM/Polynomial
49
Effective Number of Bits: ENOB
K =
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=1, F=13, W=16
1
2
3
0.771
3.63
5.04
1.84
5.17
7.78
1.84
5.06
8.24
1.84
5.16
8.89
1.84
5.18
9.09
4
7.97
11.6
12.8
13.2
13.4
5
10.2
14.8
16.8
17.3
16.3
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=6,
1
6.12
6.86
6.86
6.86
6.86
4
15.7
15.7
15.7
15.7
15.7
5
15.7
15.7
15.7
15.7
15.7
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=11,
1
11.2
11.9
11.9
11.9
11.9
4
15.9
15.9
15.9
15.9
15.9
5
15.9
15.9
15.9
15.9
15.9
F=8, W=16
2
3
13.2
15.7
15
15.7
14.9
15.7
14.9
15.7
15
15.7
F=3, W=16
2
3
15.9
15.9
15.9
15.9
15.9
15.9
15.9
15.9
15.9
15.9
Signal to Noise and Distorsion Ratio: SINAD
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=1,
1
6.31
12.8
12.8
12.8
12.8
F=13, W=16
2
3
23.5
32
32.8
48.5
32.1
51.3
32.7
55.2
32.8
56.4
4
49.7
71.4
78.6
81.3
82.4
5
63
90.9
103
106
99.9
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=6,
1
38.5
43
43
43
43
F=8, W=16
2
3
81
96.1
91.8
96
91.6
96
91.3
96
91.9
96
4
96.1
96.1
96.1
96.1
96.1
5
96.1
96.1
96.1
96.1
96.1
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=11,
1
69.4
73
73.2
73
73.2
4
97.2
97.2
97.2
97.2
97.2
5
97.2
97.2
97.2
97.2
97.2
F=3, W=16
2
3
97.2
97.2
97.2
97.2
97.2
97.2
97.2
97.2
97.2
97.2
50
Quality and Resource Tables
Spurious Free Dynamic Range: SFDR
K =
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
K=
TaylorLeft
TaylorMid
Interpole
Chebyshev
LeastSqr
C=1,
1
9.54
16.9
16.9
16.9
16.9
F=13, W=16
2
3
27.3
35.3
35.3
55.4
33.8
53.8
35.8
58.8
35.8
61.7
4
54.6
75.2
81.6
85.8
87.2
5
66.3
95.9
106
110
103
C=6, F=8, W=16
1
2
3
42.1
87
108
48.1
96.1
108
48.1
96.1
109
48.1
96.1
109
48.1
96.1
109
4
108
108
108
108
108
5
108
108
108
108
108
4
117
117
117
117
117
5
117
117
117
117
117
C=11,
1
73.4
76.7
76.7
76.7
76.7
F=3, W=16
2
3
117
117
117
117
117
117
117
117
117
117
Polynomial resources
K =
TaylorLeft
1
R: 15
2
R: [15 15], Σ = 30
M: 15
TaylorMid
R: 15
R: [15 15], Σ = 30
M: 15
Interpole
R: 15
R: [15 15], Σ = 30
M: 15
Chebyshev
R: 15
R: [15 15], Σ = 30
M: 15
LeastSqr
R: 15
R: [15 15], Σ = 30
M: 15
Some notes: Rom: 2 × . . . , Mult: 13 × . . .
C=1, F=13, W=16
3
4
R: [13 15 15], Σ = 43
R: [12 13 15 15], Σ =
M: [13 15]
M: [12 14 15]
R: [14 15 16], Σ = 45
R: [12 13 15 16], Σ =
M: [14 15]
M: [12 14 15]
R: [14 15 15], Σ = 44
R: [12 13 15 15], Σ =
M: [14 15]
M: [12 14 15]
R: [14 15 16], Σ = 45
R: [12 13 15 16], Σ =
M: [14 15]
M: [12 14 15]
R: [14 15 16], Σ = 45
R: [12 13 15 16], Σ =
M: [14 15]
M: [12 14 15]
C=6, F=8, W=16
2
R: [10 15], Σ = 25
M: 10
TaylorMid
R: 15
R: [10 15], Σ = 25
M: 10
Interpole
R: 15
R: [10 15], Σ = 25
M: 10
Chebyshev
R: 15
R: [10 15], Σ = 25
M: 10
LeastSqr
R: 15
R: [10 15], Σ = 25
M: 10
Some notes: Rom: 64 × . . . , Mult: 8 × . . .
K =
TaylorLeft
1
R: 15
3-5
R: [4 10 15],
M: [4 10]
R: [4 10 15],
M: [4 10]
R: [4 10 15],
M: [4 10]
R: [4 10 15],
M: [4 10]
R: [4 10 15],
M: [4 10]
C=11, F=3, W=16
1
2-5
R: 15
R: [5 15], Σ
20
M: 5
TaylorMid
R: 15
R: [5 15], Σ
20
M: 5
Interpole
R: 15
R: [5 15], Σ
20
M: 5
Chebyshev
R: 15
R: [5 15], Σ
20
M: 5
LeastSqr
R: 15
R: [5 15], Σ
20
M: 5
Some notes: Rom: 2k × . . . , Mult:
K =
TaylorLeft
=
=
=
=
=
3 × ...
55
56
55
56
56
Σ = 29
Σ = 29
Σ = 29
Σ = 29
Σ = 29
5
R: [9 12 13 15 15],
M: [9 12 14 15]
R: [9 12 14 15 15],
M: [9 12 15 15]
R: [9 12 14 15 15],
M: [9 12 15 15]
R: [9 12 14 15 15],
M: [9 12 15 15]
R: [9 12 14 15 15],
M: [9 12 15 15]
Σ = 64
Σ = 65
Σ = 65
Σ = 65
Σ = 65
B.2 Other Decompositions
B.2
51
Other Decompositions
The Hutchinson’s, Sunderland’s and Curticăpean’s methods are handled here.
For simplicity reasons, N = W = 16 bits in all these cases.
The data is presented in two ways; first grouped by F and method, and then by quality types.
B.2.1
The F and Method Groupings
These groupings are suitable for analyzing each method.
F=3
Hutchinson’s
C =
11
e2
1.19e-05
einf
3e-05
ENOB
15.6
Sinad
95.5
SFDR
116
C=
e2
einf
ENOB
Sinad
SFDR
0
1.93e-4
6.81e-4
12.3
75.4
83
1
1.19e-4
4.98e-4
12.8
78.9
87.6
2
6.25e-5
2.53e-4
13.7
84.4
92
Hutchinson’s 2
C =
11
e2
1.19e-05
einf
3.04e-05
ENOB
15.6
Sinad
95.5
SFDR
116
3
3.26e-5
1.32e-4
14.6
89.4
98.4
Sunderland’s
4
5
1.94e-5
1.4e-5
7.57e-5
5.14e-5
15.2
15.5
93
94.7
102
108
Curticăpean’s
C =
11
e2
1.04e-05
einf
2.82e-05
ENOB
15.8
Sinad
96.6
SFDR
106
6
1.25e-5
4.18e-5
15.5
95.3
114
7
1.21e-5
3.47e-5
15.6
95.4
115
8
1.19e-5
3.2e-5
15.6
95.5
116
F=8
Hutchinson’s
C=
6
e2
9.64e-05
einf
0.000321
ENOB
13.2
Sinad
80.9
SFDR
87.1
C =
e2
einf
ENOB
Sinad
SFDR
0
0.0067
0.0242
7.25
45.3
51.6
1
0.00412
0.017
7.82
48.7
56.2
Hutchinson’s 2
C=
6
e2
1.26e-05
einf
3.03e-05
ENOB
15.5
Sinad
95.1
SFDR
109
Sunderland’s
2
3
0.00216
0.00109
0.00908
0.00447
8.71
9.67
54.1
59.9
64.3
71
Curticăpean’s
C =
6
e2
9.62e-05
einf
0.00031
ENOB
13.2
Sinad
81
SFDR
87.1
4
0.000532
0.00212
10.7
65.8
75.3
5
0.000249
0.000919
11.7
72.2
78.3
F=13
Hutchinson’s
C =
1
e2
0.0665
einf
0.207
ENOB
3.77
Sinad
24.4
SFDR
26.1
Hutchinson’s 2
C =
1
e2
9.17e-06
einf
1.9e-05
ENOB
16
Sinad
98
SFDR
119
Curticăpean’s
C =
1
e2
0.0665
einf
0.207
ENOB
3.77
Sinad
24.4
SFDR
26.1
Sunderland’s
C =
0
1
e2
0.153
0.0665
einf
0.414
0.207
ENOB
2.87
3.77
Sinad
18.9
24.4
SFDR
19.9
26.1
B.2.2
The Quality Groupings
These groupings are suitable when comparing the different methods.
6
9.64e-05
0.000321
13.2
80.9
87.1
9
1.19e-5
3.12e-5
15.6
95.5
116
10,11
1.19e-5
3e-5
15.6
95.5
116
52
Quality and Resource Tables
e2
F =
Hutchinson’s
Hutchinson’s 2
Curticapean’s
Sunderland’s, C=2
Sunderland’s, C=4
Sunderland’s, C=6
Sunderland’s, C=8
Sunderland’s, C=10
Sunderland’s, C=12
Sunderland’s, C=14
Sunderland’s, C=16
0
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
8.79e-06
2
1.13e-05
1.13e-05
1.03e-05
7.97e-05
3.06e-05
1.33e-05
1.14e-05
1.13e-05
1.13e-05
1.13e-05
4
1.21e-05
1.21e-05
1.04e-05
0.000403
0.000131
3.58e-05
1.49e-05
1.23e-05
1.21e-05
F =
Hutchinson’s
Hutchinson’s 2
Curticapean’s
Sunderland’s, C=2
Sunderland’s, C=4
Sunderland’s, C=6
Sunderland’s, C=8
Sunderland’s, C=10
Sunderland’s, C=12
Sunderland’s, C=14
Sunderland’s, C=16
0
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
1.53e-05
2
3e-05
3e-05
2.82e-05
0.000287
0.000134
5.21e-05
3.5e-05
3.09e-05
3e-05
3e-05
4
3.06e-05
3.04e-05
2.85e-05
0.00144
0.000553
0.000157
5.65e-05
3.62e-05
3.06e-05
F =
Hutchinson’s
Hutchinson’s 2
Curticapean’s
Sunderland’s, C=2
Sunderland’s, C=4
Sunderland’s, C=6
Sunderland’s, C=8
Sunderland’s, C=10
Sunderland’s, C=12
Sunderland’s, C=14
Sunderland’s, C=16
0
16
16
16
16
16
16
16
16
16
16
16
F =
Hutchinson’s
Hutchinson’s 2
Curticapean’s
Sunderland’s, C=2
Sunderland’s, C=4
Sunderland’s, C=6
Sunderland’s, C=8
Sunderland’s, C=10
Sunderland’s, C=12
Sunderland’s, C=14
Sunderland’s, C=16
0
98.1
98.1
98.1
98.1
98.1
98.1
98.1
98.1
98.1
98.1
98.1
2
95.9
95.9
97.1
82.4
89.3
95.1
95.9
95.9
95.9
95.9
F =
Hutchinson’s
Hutchinson’s 2
Curticapean’s
Sunderland’s, C=2
Sunderland’s, C=4
Sunderland’s, C=6
Sunderland’s, C=8
Sunderland’s, C=10
Sunderland’s, C=12
Sunderland’s, C=14
Sunderland’s, C=16
0
128
128
128
128
128
128
128
128
128
128
128
2
114
114
109
88.4
97.4
111
112
114
114
114
6
1.41e-05
1.23e-05
1.3e-05
0.00167
0.000541
0.000139
3.64e-05
1.41e-05
8
9.64e-05
1.26e-05
9.62e-05
0.0067
0.00216
0.000532
9.64e-05
10
0.00147
1.32e-05
0.00147
0.0264
0.00822
0.00147
12
0.0209
1.15e-05
0.0209
0.0964
0.0209
14
8.79e-06
8.79e-06
1.39e-05
8.79e-06
8
0.000321
3.03e-05
0.00031
0.0242
0.00908
0.00212
0.000321
10
0.00481
3.02e-05
0.00479
0.0931
0.0327
0.00481
12
0.0703
2.81e-05
0.0703
0.306
0.0703
14
1.53e-05
1.53e-05
3.05e-05
1.53e-05
einf
6
4.49e-05
3.02e-05
3.5e-05
0.00603
0.0023
0.000567
0.000156
4.49e-05
ENOB
2
15.7
15.7
15.9
13.4
14.6
15.5
15.6
15.7
15.7
15.7
4
15.6
15.6
15.8
11.3
12.7
14.5
15.4
15.5
15.6
6
15.5
15.5
15.7
9.24
10.7
12.6
14.5
15.5
8
13.2
15.5
13.2
7.25
8.71
10.7
13.2
10
9.24
15.4
9.24
5.3
6.76
9.24
12
5.47
15.6
5.47
3.52
5.47
14
16
16
16
16
SINAD
4
95.3
95.3
96.6
69.4
78.3
88.9
94.4
95.3
95.3
6
94.8
95.2
95.9
57.3
66.2
77.7
88.7
94.8
8
80.9
95.1
81
45.3
54.1
65.8
80.9
10
57.3
94.6
57.3
33.6
42.4
57.3
12
34.6
95.8
34.6
22.9
34.6
14
98.1
98.1
98.1
98.1
SFDR
4
118
118
107
76.4
89.1
101
112
117
118
6
112
113
105
63.9
76.6
89.7
99.4
112
8
87.1
109
87.1
51.6
64.3
75.3
87.1
10
63.5
104
63.5
39.1
51.7
63.5
12
40.4
102
40.4
26.2
40.4
14
128
128
128
128
B.3 Sine Compression
53
Resources
F =
0
Hutchinson’s R: 16kx15
Hutchinson’s R: 16kx15
2
Curticapean’s R: 16kx31
M: 0x16
Sunderland’s,
C=2
Sunderland’s,
C=4
Sunderland’s,
C=6
Sunderland’s,
C=8
Sunderland’s,
C=10
Sunderland’s,
C=12
Sunderland’s,
C=14
Sunderland’s,
C=16
B.3
R: 16kx15
R: 16kx15
R: 16kx15
R: 16kx15
R: 16kx15
R: 16kx15
R: 16kx15
2
R: 4kx15 +
16kx4
R: 4kx15 +
16kx4
R: 4kx31 +
4x4
M: 4x16
R: 4kx15 +
4x4
R: 4kx15 +
16x4
R: 4kx15 +
64x4
R: 4kx15 +
256x4
R: 4kx15 +
1kx4
R: 4kx15 +
4kx4
R: 4kx15 +
16kx4
4
R: 1kx15 +
16kx6
R: 1kx15 +
16kx6
R: 1kx31 +
16x6
M: 6x16
R: 1kx15 +
16x6
R: 1kx15 +
64x6
R: 1kx15 +
256x6
R: 1kx15 +
1kx6
R: 1kx15 +
4kx6
R: 1kx15 +
16kx6
6
R: 256x15
+ 16kx8
R: 256x15
+ 16kx8
R: 256x31
+ 64x8
M: 8x16
R: 256x15
+ 64x8
R: 256x15
+ 256x8
R: 256x15
+ 1kx8
R: 256x15
+ 4kx8
R: 256x15
+ 16kx8
8
R: 64x15 +
16kx10
R: 64x15 +
16kx10
R: 64x31 +
256x10
M: 10x16
R: 64x15 +
256x10
R: 64x15 +
1kx10
R: 64x15 +
4kx10
R: 64x15 +
16kx10
10
R: 16x15 +
16kx12
R: 16x15 +
16kx12
R: 16x31
+1kx12
M: 12x16
R: 16x15 +
1kx12
R: 16x15 +
4kx12
R: 16x15 +
16kx12
12
R: 4x15 +
16kx14
R: 4x15 +
16kx14
R: 4x31 +
4kx14
M: 14x16
R: 4x15 +
4kx14
R: 4x15 +
16kx14
14
R: 16kx15
R: 16kx15
R: 16kx15
+ 1x16
M: 15x16
R: 16kx15
R: 16kx15
Sine Compression
The sine compression function can take an entire configuration and create a sine compression, or if you leave parts of the configuration it will generate the cheapest solution (with respect to number of ROM bits). The Hutchinson’s method is worse than the
Hutchinson’s 2, and Hutchinson’s 2 is already a sine compression (with ptl as method), why those two are not listed. The m=...
codes are the method codes as defined in table 4.3 on page 25, with the exception that c is the Curticăpean’s method. The three
tables are:
1. Full control: Testing the resources for some given inputs. No optimization. The entire configuration is handled to the
PSAC creator. Only a very few values are tested for each method in the table.
2. Method given: Testing the best result for every combination of N={8,12,16,20,24}, W={10,16,20,32} and all algorithms
(all coefficient assignment methods are tested for N=20, W=20). Other configurations are optimized by the PSAC creator.
There are one table for each tested N, 5 tables in total.
3. Full optimization: All combinations of N={8,12,16,20,24} and W={10,16,20,32} are tested, entire configuration is optimized.
54
Quality and Resource Tables
Full control psac_compression resources
Config
rom (appr)
N=16, W=16, F=8, K=1, m=ptl
960 bits
N=16, W=16, F=8, K=2, m=ptl
1.56 kbits
N=16, W=16, F=8, K=3, m=ptl
1.81 kbits
N=20, W=16, F=12, K=2, m=ptl
1.56 kbits
N=20, W=16, F=8, K=2, m=ptl
21 kbits
N=16, W=20, F=8, K=2, m=ptl
2.06 kbits
N=16, W=16, F=8, K=2, m=ptm
1.56 kbits
N=20, W=16, F=12, K=2, m=ptm
1.56 kbits
N=20, W=16, F=8, K=2, m=ptm
21 kbits
N=16, W=20, F=8, K=2, m=ptm
2.06 kbits
N=16, W=16, F=8, K=2, m=pc
1.56 kbits
N=20, W=16, F=12, K=2, m=pc
1.56 kbits
N=20, W=16, F=8, K=2, m=pc
21 kbits
N=16, W=20, F=8, K=2, m=pc
2.06 kbits
N=16, W=16, F=8, K=2, m=pi
1.56 kbits
N=20, W=16, F=12, K=2, m=pi
1.56 kbits
N=20, W=16, F=8, K=2, m=pi
21 kbits
N=16, W=20, F=8, K=2, m=pi
2.06 kbits
N=16, W=16, F=8, K=2, m=pls
1.56 kbits
N=20, W=16, F=12, K=2, m=pls
1.56 kbits
N=20, W=16, F=8, K=2, m=pls
21 kbits
N=16, W=20, F=8, K=2, m=pls
2.06 kbits
N=16, W=16, F=8, C=6, m=c
4.44 kbits
N=20, W=16, F=12, C=6, m=c
41.9 kbits
N=20, W=16, F=8, C=10, m=c
32.5 kbits
N=16, W=20, F=8, C=6, m=c
5.94 kbits
N=16, W=16, F=6, C=3, m=s
7.75 kbits
N=20, W=16, F=6, C=7, m=s
92 kbits
N=20, W=16, F=6, C=3, m=s
62 kbits
N=20, W=16, F=10, C=3, m=s
67.8 kbits
N=16, W=20, F=6, C=3, m=s
10.8 kbits
Some notes:
rom (appr): Rom sizes for approximating psac.
rom (corr): Correction rom size
m=c is methods Curticăpean’s
rom (corr)
rom (tot)
used multiplications
176 kbits
80 kbits
48 kbits
1.25 Mbits
512 kbits
144 kbits
64 kbits
1 Mbits
512 kbits
112 kbits
48 kbits
768 kbits
512 kbits
96 kbits
48 kbits
768 kbits
512 kbits
112 kbits
48 kbits
768 kbits
512 kbits
96 kbits
80 kbits
1.25 Mbits
768 kbits
144 kbits
112 kbits
512 kbits
768 kbits
1.75 Mbits
176 kbits
177 kbits
81.6 kbits
49.8 kbits
1.25 Mbits
533 kbits
146 kbits
65.6 kbits
1 Mbits
533 kbits
114 kbits
49.6 kbits
770 kbits
533 kbits
98.1 kbits
49.6 kbits
770 kbits
533 kbits
114 kbits
49.6 kbits
770 kbits
533 kbits
98.1 kbits
84.4 kbits
1.29 Mbits
801 kbits
150 kbits
120 kbits
604 kbits
830 kbits
1.82 Mbits
187 kbits
0
1
2
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
Optimized psac_compression for N = 8 and some W and methods
N,W,method
0 mults
F=2, C=4
roms = 560 bits
W=10, m=c
F=0, C=10
roms = 1.44 kbits
W=10, m=pls
F=3, K=1
roms = 520 bits
W=16, m=s
F=2, C=3
roms = 1.08 kbits
W=16, m=c
F=0, C=10
roms = 2.56 kbits
W=16, m=pls
F=3, K=1
roms = 952 bits
W=20, m=s
F=3, C=2
roms = 1.41 kbits
W=20, m=c
F=0, C=10
roms = 3.31 kbits
W=20, m=pls
F=1, K=2
roms = 1.19 kbits
W=32, m=s
F=3, C=2
roms = 2.34 kbits
W=32, m=c
F=0, C=10
roms = 5.56 kbits
W=32, m=pls
F=1, K=2
roms = 1.94 kbits
Some notes: m=c is methods Curticăpean’s
1 mults
2 mults
3 mults
F=3, C=10
roms = 528 bits
F=3, K=2
roms = 320 bits
F=4, K=3
roms = 220 bits
F=5, K=4
roms = 190 bits
F=3, C=10
roms = 1.03 kbits
F=3, K=2
roms = 736 bits
F=4, K=3
roms = 484 bits
F=4, K=4
roms = 328 bits
F=3, C=10
roms = 1.38 kbits
F=3, K=2
roms = 1.03 kbits
F=3, K=3
roms = 720 bits
F=4, K=4
roms = 520 bits
F=3, C=10
roms = 2.41 kbits
F=5, K=2
roms = 1.87 kbits
F=4, K=3
roms = 1.66 kbits
F=4, K=4
roms = 1.38 kbits
W=10, m=s
B.3 Sine Compression
55
Optimized psac_compression for N = 12 and some W and methods
N/W
0 mults
F=3, C=6
roms = 3.5 kbits
W=10, m=c
F=1, C=10
roms = 11.5 kbits
W=10, m=pls
F=3, K=1
roms = 4.12 kbits
W=16, m=s
F=3, C=6
roms = 10 kbits
W=16, m=c
F=0, C=10
roms = 37 kbits
W=16, m=pls
F=3, K=1
roms = 10.9 kbits
W=20, m=s
F=3, C=6
roms = 15 kbits
W=20, m=c
F=0, C=10
roms = 49 kbits
W=20, m=pls
F=4, K=1
roms = 15.2 kbits
W=32, m=s
F=3, C=3
roms = 29.3 kbits
W=32, m=c
F=0, C=10
roms = 85 kbits
W=32, m=pls
F=4, K=1
roms = 27.9 kbits
Some notes: m=c is methods Curticăpean’s
1 mults
2 mults
3 mults
F=6, C=10
roms = 3.67 kbits
F=6, K=2
roms = 2.23 kbits
F=8, K=3
roms = 2.09 kbits
F=8, K=4
roms = 2.1 kbits
F=4, C=10
roms = 8.09 kbits
F=4, K=2
roms = 4.56 kbits
F=7, K=3
roms = 3.3 kbits
F=8, K=4
roms = 3.2 kbits
F=4, C=10
roms = 12.7 kbits
F=3, K=2
roms = 9 kbits
F=5, K=3
roms = 4.38 kbits
F=7, K=4
roms = 3.47 kbits
F=5, C=10
roms = 25.8 kbits
F=4, K=2
roms = 22.6 kbits
F=4, K=3
roms = 13.8 kbits
F=5, K=4
roms = 7 kbits
W=10, m=s
Optimized psac_compression for N = 16 and some W and methods
N/W
0 mults
F=6, C=5
roms = 35.2 kbits
W=10, m=c
F=4, C=10
roms = 51 kbits
W=10, m=pls
F=5, K=1
roms = 36.5 kbits
W=16, m=s
F=4, C=7
roms = 66 kbits
W=16, m=c
F=0, C=10
roms = 544 kbits
W=16, m=pls
F=3, K=1
roms = 110 kbits
W=20, m=s
F=4, C=8
roms = 125 kbits
W=20, m=c
F=0, C=10
roms = 720 kbits
W=20, m=pls
F=4, K=1
roms = 179 kbits
W=32, m=s
F=4, C=8
roms = 341 kbits
W=32, m=c
F=0, C=10
roms = 1.27 Mbits
W=32, m=pls
F=4, K=1
roms = 383 kbits
Some notes: m=c is methods Curticăpean’s
1 mults
2 mults
3 mults
F=8, C=10
roms = 34.2 kbits
F=10, K=2
roms = 32.2 kbits
F=13, K=3
roms = 48.1 kbits
F=14, K=4
roms = 48 kbits
F=6, C=10
roms = 56.2 kbits
F=5, K=2
roms = 43 kbits
F=11, K=3
roms = 48.3 kbits
F=12, K=4
roms = 48.2 kbits
F=6, C=10
roms = 107 kbits
F=5, K=2
roms = 63 kbits
F=9, K=3
roms = 49.4 kbits
F=11, K=4
roms = 48.5 kbits
F=6, C=10
roms = 305 kbits
F=4, K=2
roms = 229 kbits
F=6, K=3
roms = 81.8 kbits
F=8, K=4
roms = 53.6 kbits
W=10, m=s
56
Quality and Resource Tables
Optimized psac_compression for N = 20, some W and all methods
N/W
0 mults
F=9, C=3
roms = 519 kbits
W=10, m=c
F=8, C=10
roms = 531 kbits
W=10, m=pls
F=9, K=1
roms = 517 kbits
W=16, m=s
F=6, C=7
roms = 580 kbits
W=16, m=c
F=3, C=10
roms = 1.47 Mbits
W=16, m=pls
F=3, K=1
roms = 992 kbits
W=20, m=s
F=5, C=9
roms = 948 kbits
W=20, m=c
F=0, C=10
roms = 10.5 Mbits
W=20, m=ptl
F=4, K=1
roms = 2.05 Mbits
W=20, m=ptm
F=4, K=1
roms = 1.8 Mbits
W=20, m=pi
F=4, K=1
roms = 1.8 Mbits
W=20, m=pls
F=4, K=1
roms = 1.8 Mbits
W=20, m=pc
F=4, K=1
roms = 1.8 Mbits
W=32, m=s
F=5, C=9
roms = 3.82 Mbits
W=32, m=c
F=0, C=10
roms = 19.2 Mbits
W=32, m=pls
F=4, K=1
roms = 4.98 Mbits
Some notes: m=c is methods Curticăpean’s
1 mults
2 mults
3 mults
F=10, C=10
roms = 784 kbits
F=9, K=2
roms = 523 kbits
F=14, K=3
roms = 769 kbits
F=14, K=4
roms = 769 kbits
F=8, C=10
roms = 810
F=6, K=2
roms = 620
F=6, K=2
roms = 620
F=7, K=2
roms = 568
F=7, K=2
roms = 568
F=7, K=2
roms = 568
F=12, K=3
roms = 771
F=13, K=3
roms = 769
F=13, K=3
roms = 769
F=13, K=3
roms = 769
F=13, K=3
roms = 769
F=13, K=4
roms = 770
F=15, K=4
roms = 768
F=15, K=4
roms = 768
F=14, K=4
roms = 769
F=15, K=4
roms = 768
W=10, m=s
F=10, C=10
roms = 519 kbits
F=14, K=2
roms = 512 kbits
kbits
kbits
kbits
kbits
kbits
kbits
F=8, C=10
roms = 3.57 Mbits
F=4, K=2
roms = 1.52 Mbits
kbits
kbits
kbits
kbits
kbits
F=9, K=3
roms = 802 kbits
kbits
kbits
kbits
kbits
kbits
F=12, K=4
roms = 774 kbits
B.3 Sine Compression
57
Optimized psac_compression for N = 24 and some W and methods
N/W
0 mults
F=12, C=2
roms = 8.01 Mbits
W=10, m=c
F=12, C=10
roms = 8.02 Mbits
W=10, m=pls
F=13, K=1
roms = 8 Mbits
W=16, m=s
F=9, C=6
roms = 8.14 Mbits
W=16, m=c
F=6, C=10
roms = 9.94 Mbits
W=16, m=pls
F=7, K=1
roms = 8.47 Mbits
W=20, m=s
F=8, C=10
roms = 8.67 Mbits
W=20, m=c
F=3, C=10
roms = 27.5 Mbits
W=20, m=pls
F=4, K=1
roms = 16.8 Mbits
W=32, m=s
F=6, C=10
roms = 42.2 Mbits
W=32, m=c
F=0, C=10
roms = 292 Mbits
W=32, m=pls
F=4, K=1
roms = 63.8 Mbits
Some notes: m=c is methods Curticăpean’s
1 mults
2 mults
3 mults
W=10, m=s
F=13, C=10
roms = 8.02 Mbits
F=14, K=2
roms = 8 Mbits
F=12, C=10
roms = 8.05 Mbits
F=13, K=2
roms = 8.01 Mbits
F=12, C=10
roms = 12.1 Mbits
F=11, K=2
roms = 8.05 Mbits
F=14, K=3
roms = 12 Mbits
F=10, C=10
roms = 40.3 Mbits
F=7, K=2
roms = 13.5 Mbits
F=13, K=3
roms = 12 Mbits
F=14, K=4
roms = 12 Mbits
Optimized psac_compression for some N and W
N/W
0 mults
1 mults
2 mults
3 mults
N= 8, W=10
F=3, K=1, m=pc
roms = 520 bits
F=3, K=1, m=pc
roms = 952 bits
F=1, K=2, m=pls
roms = 1.19 kbits
F=1, K=2, m=pls
roms = 1.94 kbits
F=3, C=6, m=s
roms = 3.5 kbits
F=3, C=6, m=s
roms = 10 kbits
F=3, C=6, m=s
roms = 15 kbits
F=4, K=1, m=pc
roms = 27.9 kbits
F=6, C=5, m=s
roms = 35.2 kbits
F=4, C=7, m=s
roms = 66 kbits
F=4, C=8, m=s
roms = 125 kbits
F=4, C=8, m=s
roms = 341 kbits
F=9, K=1, m=pc
roms = 517 kbits
F=6, C=7, m=s
roms = 580 kbits
F=5, C=9, m=s
roms = 948 kbits
F=5, C=9, m=s
roms = 3.82 Mbits
F=13, K=1, m=pc
roms = 8 Mbits
F=9, C=6, m=s
roms = 8.14 Mbits
F=8, C=10, m=s
roms = 8.67 Mbits
F=6, C=10, m=s
roms = 42.2 Mbits
F=3, K=2, m=pc
roms = 320 bits
F=3, K=2, m=pc
roms = 736 bits
F=4, K=2, m=pc
roms = 1.02 kbits
F=4, K=2, m=pc
roms = 1.86 kbits
F=6, K=2, m=pc
roms = 2.23 kbits
F=4, K=2, m=pc
roms = 4.56 kbits
F=3, K=2, m=pc
roms = 8 kbits
F=4, K=2, m=pc
roms = 21.6 kbits
F=10, K=2, m=pc
roms = 32.2 kbits
F=5, K=2, m=pc
roms = 43 kbits
F=6, K=2, m=pc
roms = 55.8 kbits
F=4, K=2, m=pc
roms = 213 kbits
F=14, K=2, m=pc
roms = 512 kbits
F=9, K=2, m=pc
roms = 523 kbits
F=7, K=2, m=pc
roms = 568 kbits
F=5, K=2, m=pc
roms = 1.39 Mbits
F=16, K=2, m=pc
roms = 8 Mbits
F=13, K=2, m=pc
roms = 8.01 Mbits
F=11, K=2, m=pc
roms = 8.05 Mbits
F=8, K=2, m=pc
roms = 12.8 Mbits
F=4, K=3, m=pc
roms = 220 bits
F=4, K=3, m=pc
roms = 484 bits
F=3, K=3, m=pls
roms = 720 bits
F=4, K=3, m=pc
roms = 1.66 kbits
F=8, K=3, m=pls
roms = 2.09 kbits
F=7, K=3, m=pc
roms = 3.3 kbits
F=5, K=3, m=pc
roms = 4.38 kbits
F=4, K=3, m=pc
roms = 13.8 kbits
F=13, K=3, m=pc
roms = 48.1 kbits
F=11, K=3, m=pc
roms = 48.3 kbits
F=9, K=3, m=pc
roms = 49.4 kbits
F=6, K=3, m=pc
roms = 81.8 kbits
F=16, K=3, m=pc
roms = 768 kbits
F=15, K=3, m=pc
roms = 768 kbits
F=13, K=3, m=pc
roms = 769 kbits
F=9, K=3, m=pc
roms = 802 kbits
F=5, K=4, m=pc
roms = 190 bits
F=4, K=4, m=pls
roms = 328 bits
F=4, K=4, m=pc
roms = 520 bits
F=4, K=4, m=pc
roms = 1.38 kbits
F=8, K=4, m=pc
roms = 2.1 kbits
F=8, K=4, m=pc
roms = 3.2 kbits
F=7, K=4, m=pc
roms = 3.47 kbits
F=5, K=4, m=pc
roms = 7 kbits
F=14, K=4, m=pc
roms = 48 kbits
F=12, K=4, m=pc
roms = 48.2 kbits
F=11, K=4, m=pc
roms = 48.5 kbits
F=8, K=4, m=pc
roms = 53.6 kbits
F=16, K=4, m=pc
roms = 768 kbits
F=16, K=4, m=pc
roms = 768 kbits
F=15, K=4, m=pc
roms = 768 kbits
F=12, K=4, m=pc
roms = 774 kbits
F=16, K=3, m=pc
roms = 12 Mbits
F=16, K=3, m=pc
roms = 12 Mbits
F=13, K=3, m=pc
roms = 12 Mbits
F=16, K=4, m=pc
roms = 12 Mbits
N= 8, W=16
N= 8, W=20
N= 8, W=32
N=12, W=10
N=12, W=16
N=12, W=20
N=12, W=32
N=16, W=10
N=16, W=16
N=16, W=20
N=16, W=32
N=20, W=10
N=20, W=16
N=20, W=20
N=20, W=32
N=24, W=10
N=24, W=16
N=24, W=20
N=24, W=32
Appendix C
Polynomial VHDL Example
This is an example of the VHDL polynomial code that is generated. To keep memory size down, the C parameter has been set
rather small (3).
The following configuration was used: ’N=20, W=16, method=pc, F=15, K=4’.
-- A PSAC sine implementation, using 2^3 polynomial of grade 3.
-- Each polynomial has 2^15 points
-- Input width: 20
-- Output width: 16
-- Used ROM: 8 x 44 bits
-- Used Mults: 3
-- File autogenerated from Matlab
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity my_psac_example is
port(
clk: in std_logic;
x : in UNSIGNED(19 downto 0);
a : out SIGNED(15 downto 0) := (others=>’0’));
end my_psac_example;
architecture my_psac_example_architecture of my_psac_example is
constant N : integer := 20;
-- number of phase bits.
constant W : integer := 16;
-- number of result width.
constant F : integer := 15;
-- number of fine bits.
constant C : integer := N-2-F; -- 3 coarse bits.
constant R1 : integer := 6;
-- ROM1 width
constant R2 : integer := 10;
-- ROM2 width
constant R3 : integer := 13;
-- ROM3 width
constant R4 : integer := 15;
-- ROM4 width
signal inv_res : std_logic_vector(0 to 4) := (others => ’0’); -- if res should be negated.
signal x2
: UNSIGNED(N-3 downto 0) := (others=>’0’);
-- x(N-3 downto 0) xor x(MSB-1)
signal xC
: UNSIGNED(C-1 downto 0) := (others=>’0’);
-- x(N-3 downto F) xor x(MSB-1)
signal xF
: UNSIGNED(F-1 downto 0) := (others => ’0’); -- x(F-1 downto 0) xor x(MSB-1)
signal ResQ1 : UNSIGNED(W-2 downto 0) := (others => ’0’); -- Result for quadrant 1
-- signals for iteration 1
signal xF_1
: UNSIGNED(F-1 downto 0) := (others => ’0’); -- 14..0
signal Res_1 : UNSIGNED(R1-1 downto 0) := (others => ’0’); -- 5..0
-- signals for iteration 2
signal xF_2
: UNSIGNED(F-1 downto 0) := (others => ’0’); -- 14..0
signal data2_1: UNSIGNED(R2-1 downto 0) := (others => ’0’); -- 9..0
signal Res_2 : UNSIGNED(R2-1 downto 0) := (others => ’0’); -- 9..0
-- signals for iteration 3
signal xF_3
: UNSIGNED(F-1 downto 0) := (others => ’0’); -- 14..0
signal data3_1: UNSIGNED(R3-1 downto 0) := (others => ’0’); -- 12..0
signal data3_2: UNSIGNED(R3-1 downto 0) := (others => ’0’); -- 12..0
signal Res_3 : UNSIGNED(R3-1 downto 0) := (others => ’0’); -- 12..0
-- signals for iteration 4
signal data4_1: UNSIGNED(R4-1 downto 0) := (others => ’0’); -- 14..0
signal data4_2: UNSIGNED(R4-1 downto 0) := (others => ’0’); -- 14..0
signal data4_3: UNSIGNED(R4-1 downto 0) := (others => ’0’); -- 14..0
signal Res_4 : UNSIGNED(R4-1 downto 0) := (others => ’0’); -- 14..0
-- *_K => signals is delayed K clock cycles from it’s x input.
-- dataM = the synchronous ROM output.
-- Res = (result of iter M=K) = dataM +- xF*(data(M-1) -+ xF*(...)).
58
59
component POLY_ROM
port( clk : in
addr : in
data1: out
data2: out
data3: out
data4: out
end component;
is
std_logic; --synchronous ROM
unsigned(C-1 downto 0); -- 2..0
unsigned(R1-1 downto 0); -- 5..0
unsigned(R2-1 downto 0); -- 9..0
unsigned(R3-1 downto 0); -- 12..0
unsigned(R4-1 downto 0)); -- 14..0
-- "+"/"-" for adding/subtracting a carry to/from a vector
function "-"(l : unsigned; r : std_logic) return unsigned is
begin
return unsigned(ieee.std_logic_unsigned."-"(std_logic_vector(l), r));
end "-";
function "+"(l : unsigned; r : std_logic) return unsigned is
begin
return unsigned(ieee.std_logic_unsigned."+"(std_logic_vector(l), r));
end "+";
begin
-- Initial phase adjustment to quadrant:
x2 <= x(N-3 downto 0) when x(N-2) = ’0’ else
not x(N-3 downto 0);
xC <= x2(N-3 downto F); -- 17..15
xF <= x2(F-1 downto 0); -- 14..0
inv_res(0) <= x(N-1);
rom : component POLY_ROM
port map( clk => clk, -- output is delayed one cycle
addr => xC,
data1 => Res_1,
data2 => data2_1,
data3 => data3_1,
data4 => data4_1);
process(clk)
variable tmp : UNSIGNED(W + F - 1 downto 0); -- 30..0
begin
if rising_edge(clk) then
------ Pipeline stage 1 ------xF_1 <= xF;
------ Pipeline stage 2 ------xF_2 <= xF_1;
tmp := (others => ’0’);
tmp(R1+F-1 downto 0) := Res_1 * xF_1; -- tmp(20..0)
Res_2 <= data2_1 + tmp(R2+F-1 downto F) + tmp(F-1);
data3_2 <= data3_1;
data4_2 <= data4_1;
------ Pipeline stage 3 ------xF_3 <= xF_2;
tmp := (others => ’0’);
tmp(R2+F-1 downto 0) := Res_2 * xF_2; -- tmp(24..0)
Res_3 <= data3_2 - tmp(R3+F-1 downto F) - tmp(F-1);
data4_3 <= data4_2;
------ Pipeline stage 4 ------tmp := (others => ’0’);
tmp(R3+F-1 downto 0) := Res_3 * xF_3; -- tmp(27..0)
Res_4 <= data4_3 + tmp(R4+F-1 downto F) + tmp(F-1);
inv_res(1 to inv_res’high) <= inv_res(0 to inv_res’high-1);
end if;
end process;
ResQ1 <= Res_4;
-- final result adjustment to quadrant:
a <= signed(’0’ & ResQ1) when inv_res(inv_res’high) = ’0’ else
-signed(’0’ & ResQ1);
end my_psac_example_architecture;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity POLY_ROM is
60
port (
clk : in
addr : in
data1: out
data2: out
data3: out
data4: out
end POLY_ROM;
Polynomial VHDL Example
std_logic; --synchronous ROM
UNSIGNED(2 downto 0);
UNSIGNED(5 downto 0) := (others => ’0’);
UNSIGNED(9 downto 0) := (others => ’0’);
UNSIGNED(12 downto 0) := (others => ’0’);
UNSIGNED(14 downto 0) := (others => ’0’));
architecture POLY_ROM_architecture of POLY_ROM is
type romt is array(0 to 7) of UNSIGNED(43 downto 0);
signal rom_data : romt;
signal tmp : UNSIGNED(43 downto 0);
begin
-- Wait with ROM data until all other is done
process(clk) begin
if clk’event and clk = ’1’ then
data1 <= tmp(43 downto 38);
data2 <= tmp(37 downto 28);
data3 <= tmp(27 downto 15);
data4 <= tmp(14 downto 0);
end if;
end process;
tmp <= rom_data(to_integer(addr));
-- Here comes the ROM data
rom_data <= (b"101001_0000000000_1100100100010_000000000000000",
b"101000_0001111100_1100010100110_001100011111001",
b"100100_0011110011_1011100111000_011000011111011",
b"100000_0101100000_1010011100110_100011100011100",
b"011010_0111000001_1000111000110_101101010000010",
b"010011_1000001111_0110111110111_110101001101101",
b"001100_1001001010_0100110011111_111011001000001",
b"000100_1001101110_0010011101000_111110110001001");
end POLY_ROM_architecture;
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Related manuals

Download PDF

advertisement