AN ANALYSIS OF FREQUENCY RECOGNITION
ALGORITHMS AND IMPLEMENTATION IN REALTIME
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF
Master of Technology
In
Telematics and Signal Processing
By
CHINTHA.VAMSHI
Roll no: 20607005
Under the Guidance of
Prof. G.S. RATH
Department of Electronics and Communication Engineering
National Institute of Technology
Rourkela
2007-2008
National Institute Of Technology
Rourkela
CERTIFICATE
This is to certify that the thesis entitled, “An Analysis of Frequency Recognition
Algorithms and Implementation in Real-Time” submitted by Ch.Vamshi in partial
fulfillment of the requirements for the award of Master of Technology Degree in Electronics
& communication Engineering with specialization in “Telematics and Signal Processing”
at the National Institute of Technology, Rourkela (Deemed University) is an authentic work
carried out by him under my supervision and guidance.
To the best of my knowledge, the matter embodied in the thesis has not been submitted to any
other University / Institute for the award of any Degree or Diploma.
Date: -05-2008.
Prof. G.S. Rath
Dept. of Electronics & Communication Engg.
National Institute of Technology
Rourkela-769008
ACKNOWLEDGEMENTS
This project is by far the most significant accomplishment in my life and it would have been
impossible without the people who supported me and believed in me.
I would like to extend my gratitude and my sincere thanks to my honorable, esteemed
supervisor Prof. K.K.Mahapatra, Department of Electronics and Communication
Engineering. He is not only a great lecturer with deep vision but also, and most importantly, a
kind person. I sincerely thank him for his exemplary guidance and encouragement. His trust and
support inspired me in the most important moments of making the right decisions and I am glad
to have worked with him.
I want to thank all my teachers, Prof. G.S. Rath, Prof. G.Panda, Prof. S.Mehar and
Prof. S.K. Patra, for providing a solid background for my studies and research thereafter.
They have been great sources of inspiration to me and I thank them from the bottom of my
heart.
I would like to thank all my friends and especially my classmates for all the thoughtful
and mind stimulating discussions we had, which prompted us to think beyond the obvious.
I’ve enjoyed their companionship so much during my stay at NIT, Rourkela.
I would like to thank all those who made my stay in Rourkela an unforgettable and
rewarding experience.
Last but not least I would like to thank my parents, who taught me the value of hard
work by their own example. They rendered me enormous support during the whole tenure of
my stay in NIT Rourkela.
CHINTHA VAMSHI
CONTENTS:
Abstract
List of Figures
List of Tables
1. INTRODUCTION
1.1 Background
1.1.1 Dual-Tone Multi-Frequency (DTMF) Systems
1.1.2 Musical Instrument Digital Interface (MIDI) in Musical Systems
1.2 The Generic Musical Instrument System (GMIS)
2. Fourier transform-based Frequency Recognition Algorithms
2.1 The Fourier Series
2.2 The Fourier Transform
2.3 The Discrete Fourier Transform (DFT)
2.4 The Fast Fourier Transform (FFT)
2.4.1 Decimation of the DFT in Time (DIT)
2.4.2 Bit Reversal
2.4.3 The Butterfly Network
2.5 The Non-Uniform Discrete Fourier Transform (NDFT)
2.6 The Goertzel Algorithm
3. Implementation
3.1 The Digital Signal Processor (TI TMS320C6713 DSP)
3.2 The TI TMS320C6713 DSP Board
3.2.1 Chip Support Library (CSL)
3.2.2 The Code Composer Studio (CCS)
3.3 Usage of the Timer
4. Implementation of the Frequency Recognition Algorithms
4.1 Methodology
4.2 Principle of the Frequency Recognition Algorithms
4.2.1 The Discrete Fourier Transform (DFT)
4.2.1.1 Simulation in Matlab
4.2.1.2 Implementation of the DFT in C
4.2.2 The Fast Fourier Transform (FFT)
4.2.2.1 Simulation in Matlab
4.2.2.2 Implementation of the FFT in C
4.2.3 The Non-Uniform Discrete Fourier Transform (NDFT)
4.2.4 The Goertzel Algorithm
4.3 Comparison of schedulable Buffer Sizes
5. Measurements
5.1 Analysis of Input Signals
5.1.1 Sinusoidal Inputs
5.1.2 Inputs of Musical Instruments
5.2 Prediction of the Algorithms’ Frequency Recognition Capability
5.3 Evaluation Criteria
5.3.1 Sampling Rate
5.3.2 Spectral Resolution
5.3.4 Time to settle
5.3.5 Computational Costs of Investigated Algorithms
5.4 Performance Metrics
5.4.1 Latency
5.4.2 Speedup
5.4.3 Accuracy
6. Results
6.1 Frequency Recognition Algorithms analyzing simple Sinusoids
6.1.1 Latency
6.1.2 Speedup
6.1.3 Accuracy
6.1.4 Comparison of the Investigated Metrics
6.2 Frequency Recognition Algorithms analyzing Musical Notes
6.2.1 Note C6 (1046.5Hz) for Piano, Violin, Flute and Trumpet
6.2.2 Note C5 (523.25Hz) for Piano, Violin, Flute and Trumpet
6.2.3 Note C4 (261.63Hz) for Piano, Violin, Flute and Trumpet
6.2.4 Comparison of Frequency Recognition Capability for Musical Notes
7. Conclusions and Future Work
REFERENCES
ABSTRACT
Frequency recognition is an important task in many engineering fields, such as audio signal
processing and telecommunications engineering. There are numerous applications where
frequency recognition is absolutely necessary, such as Dual-Tone Multi-Frequency (DTMF)
detection or the recognition of the carrier frequency of a Global Positioning System (GPS)
signal. Furthermore, frequency recognition has entered many other engineering disciplines
such as sonar and radar technology, spectral analysis of astronomic data, seismography,
acoustics and consumer electronics.
Listening to electronic music and playing electronic musical instruments is becoming more
and more popular, not only among young musicians. This dissertation details background
information and a preliminary analysis of a musical system, the Generic Musical Instrument
System (GMIS), which allows composers to experiment with electronic instruments without
actually learning how to play them.
This dissertation gives background information about frequency recognition algorithms
implemented in real time. It analyses state-of-the-art techniques, such as Dual-Tone
Multi-Frequency (DTMF) implementations and MIDI-based musical systems, in order to work out
their similarities. The key idea is to adapt the well-proven frequency recognition algorithms of
DTMF systems, which are successfully and widely used in telephony. The investigations
show to what extent these principles and algorithms can be applied to a musical system like
the GMIS.
This dissertation presents results of investigations into frequency recognition algorithms
implemented on a Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP)
core, in order to estimate the frequency of an audio signal in real time. The algorithms are
evaluated using selected criteria in terms of speed and accuracy, based on more than 9600
individual measurements. The evaluations are made with simple sinusoids and with musical
notes played by instruments as input signals, which allows a sound decision as to which of
these frequency recognition algorithms is appropriate for audio signal processing and for the
constraints of the GMIS in real time.
List of Figures:
Figure 1: DTMF frequencies according to ITU-T Q.24
Figure 2: The Generic Musical Instrument System (GMIS)
Figure 3: A basic butterfly for a 2-point DFT
Figure 4: Bit reversal of the input data
Figure 5: Filter structure of the Goertzel algorithm
Figure 6: Memory Map for the TMS320C6713 DSP Starter KIT
Figure 7: Pseudo code for applying time measurements
Figure 8: Frequency recognition in principle
Figure 9: Arbitrarily chosen frequencies for the NDFT
Figure 10: Latency vs. Buffer size for a 27.5Hz input
Figure 11: Latency vs. Buffer size for a 440Hz input
Figure 12: Latency vs. Buffer size for a 440Hz input
Figure 13: Speedup vs. Buffer size for a 27.5Hz input
Figure 14: Speedup vs. Buffer size for a 440Hz input
Figure 15: Speedup vs. Buffer size for a 4186.01Hz input
Figure 16: Estimated frequency vs. Buffer size for a 27.5Hz input
Figure 17: Absolute deviation vs. Buffer size for a 27.5Hz input
Figure 18: Estimated frequency vs. Buffer size for a 440Hz input
Figure 19: Absolute deviation vs. Buffer size for a 440Hz input
Figure 20: Estimated frequency vs. Buffer size for a 4186.01Hz input
Figure 21: Absolute deviation vs. Buffer size for a 4186.01Hz input
Figure 22: Estimated frequency vs. time for a 1046.5Hz input (piano)
Figure 23: Estimated frequency vs. time for a 1046.5Hz input (violin)
Figure 24: Estimated frequency vs. time for a 1046.5Hz input (trumpet)
Figure 25: Estimated frequency vs. time for a 1046.5Hz input (flute)
Figure 26: Estimated frequency vs. time for a 523.25Hz input (piano)
Figure 27: Estimated frequency vs. time for a 523.25Hz input (violin)
Figure 28: Estimated frequency vs. time for a 523.25Hz input (flute)
Figure 29: Estimated frequency vs. time for a 523.25Hz input (trumpet)
Figure 30: Estimated frequency vs. time for a 261.63Hz input (violin)
Figure 31: Estimated frequency vs. time for a 261.63Hz input (piano)
Figure 32: Estimated frequency vs. time for a 261.63Hz input (flute)
Figure 33: Estimated frequency vs. time for a 261.63Hz input (trumpet)
List of Tables:
Table 1: The FFT algorithm (Decimation in Time)
Table 2: Filter coefficients for the Goertzel algorithm
Table 3: Frequencies and filter coefficients for 88 MIDI notes
Table 4: Frequencies and filter coefficients for 88 MIDI notes
Table 5: Probable frequency recognition capability of instruments
Table 6: Fourier transform-based algorithmic complexity
1. INTRODUCTION
Frequency recognition is an important task, not only in scientific disciplines such
as astronomy, physics and engineering but also in everyday applications such as telephony,
medical devices and consumer electronics. It is used, for example, in spectrum analyzers
and in seismographs that analyze earthquakes, and so helps to make life both more
convenient and more secure.
In consumer electronics, musical systems are widely distributed and
have a remarkable market potential. Playing and listening to electronic music is
becoming more and more important as a leisure activity, for young and old alike.
The design of a Generic Musical Instrument System (GMIS) therefore gives
musicians the chance to experiment with the sounds of other musical instruments
without actually having to learn to play them.
One very widespread application of frequency recognition is found in telephony
and is called Dual-Tone Multi-Frequency (DTMF). Its well-proven algorithms
could be adapted to musical systems, which show similarities
to DTMF systems. These comparable properties of DTMF systems have to be
analyzed, and their usability has to be evaluated with respect to musical systems
such as the GMIS.
Therefore, four different Fourier transform-based frequency recognition
algorithms are subject to an analysis: the Discrete Fourier Transform (DFT), taken
as a baseline to evaluate all algorithms; the Fast Fourier Transform (FFT), which
is the workhorse in many engineering applications; and the Goertzel algorithm and
the Non-Uniform Discrete Fourier Transform (NDFT), which are both successfully
used in DTMF systems.
These frequency recognition algorithms are implemented on a Texas Instruments
(TI) TMS320C6713 digital signal processor (DSP) core in order to estimate the
frequency of the audio signal in real time. They are evaluated by selected
criteria in terms of speed and accuracy, using both simple sinusoids and
musical notes played by instruments as input
signals. This analysis allows a sound conclusion to be drawn regarding the
application of the frequency recognition algorithms in musical systems such as
the GMIS.
1.1 Background
1.1.1 Dual-Tone Multi-Frequency (DTMF) Systems:
Dual-Tone Multi-Frequency (DTMF) is used for remote mono-directional user-to-machine
communication in telephony, for service selection in Intelligent Networks
and for interactive phone services such as telephone banking to obtain a desired
service. The addressed machine is controlled by a unique mixture of two
standardized sinusoids.
f in Hz                    High group
Low group     1209        1336        1477        1633
697           1           2 (abc)     3 (def)     A
770           4 (ghi)     5 (jkl)     6 (mno)     B
852           7 (pqrs)    8 (tuv)     9 (wxyz)    C
941           *           0           #           D
Figure 1: DTMF frequencies according to ITU-T Q.24
For each column and row, one sinusoid of a standardized frequency is allocated.
When a button is pressed, the mixture of two of these frequencies is sent to the
exchange. Therefore, in order for the exchange to determine the key being
pressed, accurate frequency recognition is required to separate the two tones.
The accuracy of this recognition has to be very high in order to comply with the ITU-T
specifications.
DTMF systems are well-proven and have several advantages which make them
convenient for signal processing and frequency recognition. For instance, DTMF
systems operate at a low sampling frequency of $f_s = 8000$ Hz. Consequently,
compared to systems with higher sampling rates, fewer samples per second of
signal have to be processed.
Another feature of DTMF systems that simplifies the processing is the use of
pure sinusoidal signals only. Because these signals are meant to control
machines, they have to be unique.
The frequency recognition algorithms used to detect these signals therefore only
have to perform a simple peak detection after having estimated the spectrum and
do not have to take the shape of the spectrum into account.
One very important requirement for the design of frequency recognition
algorithms in DTMF systems is that the tolerance of the frequency
recognition is only 1.5%. This might not sound like much, but for the
lowest frequency of 697Hz the absolute tolerance is 10.46Hz. As will be seen,
this is a relatively large tolerance compared to other systems, such as the musical
systems described in the following section.
Summing up, DTMF systems benefit from the following characteristics:
* Low sampling rate of $f_s = 8$ kHz.
* 8 standardized simple sinusoidal signals, known in advance.
* Limited bandwidth: 697Hz - 1633Hz.
* Minimum absolute tolerance: 10.46Hz (1.5%).
These advantages are taken into account for DTMF detectors and are used to
reduce computational requirements and increase accuracy when developing
frequency recognition algorithms for the purpose of DTMF detection.
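
The keypad layout of Figure 1 and the 1.5% tolerance can be illustrated with a short sketch. It is not part of the DSP implementation; the names `dtmf_pair` and `absolute_tolerance` are hypothetical helpers introduced here for illustration only.

```python
# Illustrative sketch only: keypad layout taken from Figure 1.
# Rows use the low frequency group, columns the high frequency group.
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key):
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) != 1:
        raise ValueError("expected a single keypad symbol")
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col >= 0:
            return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError("not a DTMF symbol: %r" % key)


def absolute_tolerance(freq_hz, relative=0.015):
    """Absolute tolerance in Hz for a relative tolerance (1.5% by default)."""
    return freq_hz * relative


if __name__ == "__main__":
    for key in "159#":
        print(key, dtmf_pair(key))
    print(absolute_tolerance(LOW_GROUP_HZ[0]))
```

For the lowest DTMF frequency of 697Hz, `absolute_tolerance` evaluates 0.015 x 697Hz, which is the value quoted above as 10.46Hz after rounding.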
1.1.2 Musical Instrument Digital Interface (MIDI) in Musical Systems:
The Musical Instrument Digital Interface (MIDI) version 1.0 was defined by a
consortium of musical instrument manufacturers, The International MIDI
Association, in 1983. The main purpose was to set up a standard interface in
order to make electronic musical instruments of different manufacturers
compatible with each other. The ability of these musical devices to communicate
increased sales and, what is more, the standard also caused a
boom in the composition, development and recording of electronic music among
musicians who cannot afford a professional recording studio.
MIDI notes and their corresponding frequencies are the basis for the
following investigations. An electronic keyboard has 88 keys, from MIDI
note #21 to MIDI note #108, and has its origin in the theory of the equal
tempered piano. Based on note A4 (440Hz), each neighboring note is one twelfth
of an octave away. This distance is also called a tempered semitone.
From one octave to the next, the notes’ frequencies double. Taking these principles into
account, the corresponding frequency of each MIDI note is
$$f(n_{MIDI}) = 440 \cdot 2^{\frac{n_{MIDI} - 69}{12}}\ \text{Hz} \qquad \text{for } 21 \le n_{MIDI} \le 108 \qquad (1)$$
where $n_{MIDI}$ is the MIDI note number.
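
Equation (1) can be checked numerically with a minimal sketch. The function name `midi_to_frequency` is an assumption made for this illustration and does not appear in the thesis.

```python
def midi_to_frequency(n_midi):
    """Frequency in Hz of a MIDI note on an 88-key keyboard, after equation (1)."""
    if not 21 <= n_midi <= 108:
        raise ValueError("equation (1) is stated for 21 <= n_midi <= 108")
    return 440.0 * 2.0 ** ((n_midi - 69) / 12.0)


if __name__ == "__main__":
    # Lowest key, A4 reference and highest key of the keyboard.
    for note in (21, 69, 108):
        print(note, round(midi_to_frequency(note), 2))
```

Notes 21 and 108 give the band limits of 27.5Hz and 4186.01Hz used in the following discussion, and note 69 gives the reference A4 at 440Hz.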
In terms of digital signal processing, the advantages of the Musical Instrument
Digital Interface (MIDI) as used in musical systems (in particular in a keyboard) are
worked out in this section.
One major advantage is the limited bandwidth, with a minimum frequency of
27.5Hz and a maximum frequency of 4186.01Hz. Frequency recognition
algorithms have to be designed with respect to this bandwidth, and this limitation
reduces their complexity immensely. Also, since all 88 frequencies of interest are
standardized, the algorithms can be developed by referring to these expected
frequencies.
There are some considerable disadvantages of musical systems, however, which
have to be faced when developing frequency recognition algorithms. Mostly,
musical systems use a high sampling frequency of $f_s = 44100$ Hz in order to satisfy the
Nyquist-Shannon sampling theorem. The theorem states that the sampling rate
$f_s$ has to be greater than twice the highest frequency in the signal in order to
be able to reconstruct this signal correctly. If this condition is not fulfilled, all
frequencies above half the sampling rate, i.e. the Nyquist frequency, will
appear as lower frequencies in the reconstructed signal, which is
called “aliasing”. Since the audible range of humans lies between 0Hz and 20000Hz
and is therefore below the Nyquist frequency of $f_s / 2 = 22050$ Hz, the
Nyquist-Shannon sampling theorem is fulfilled with a sampling rate of $f_s = 44100$ Hz.
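
The aliasing effect can be made concrete with a small sketch. It assumes an ideal sampler without an anti-aliasing filter and a real-valued single tone; `alias_frequency` is a hypothetical helper name.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency in Hz of a real tone f_hz after ideal sampling at fs_hz."""
    folded = f_hz % fs_hz              # the sampled spectrum repeats every fs_hz
    if folded > fs_hz / 2:             # above the Nyquist frequency: mirror back
        folded = fs_hz - folded
    return folded


if __name__ == "__main__":
    print(alias_frequency(20000, 44100))  # below 22050 Hz, not aliased
    print(alias_frequency(30000, 44100))  # above 22050 Hz, folded down
```

A 20000Hz tone sampled at 44100Hz keeps its frequency, whereas a 30000Hz tone would fold down to 44100Hz - 30000Hz = 14100Hz.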
To recapitulate, the properties of musical systems as analyzed above are listed
as follows:
* Sampling rate of $f_s = 44.1$ kHz.
* 88 standardized MIDI frequencies.
* Limited bandwidth: 27.5Hz - 4186.01Hz.
* Minimum absolute tolerance: 0.82Hz (2.81%).
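
The text does not state how the minimum tolerance is obtained. One reading that reproduces both listed values, offered here as an assumption rather than as the author's derivation, is half the distance between the two lowest MIDI notes, expressed relative to the upper of the two:

```python
def midi_to_frequency(n_midi):
    """Equal-tempered frequency in Hz of a MIDI note, after equation (1)."""
    return 440.0 * 2.0 ** ((n_midi - 69) / 12.0)


def half_gap_hz(n_midi):
    """Half the distance in Hz between a note and its upper neighbour."""
    return (midi_to_frequency(n_midi + 1) - midi_to_frequency(n_midi)) / 2.0


# Assumption: the tolerance is taken at the lowest key (MIDI note 21),
# where neighbouring notes are closest together in absolute terms.
lowest_note = 21
tolerance_hz = half_gap_hz(lowest_note)
tolerance_percent = 100.0 * tolerance_hz / midi_to_frequency(lowest_note + 1)

if __name__ == "__main__":
    print(round(tolerance_hz, 2), round(tolerance_percent, 2))
```

Under this reading, the musical system tolerates roughly one twelfth of the absolute deviation allowed in the DTMF case (0.82Hz against 10.46Hz), which is why the DTMF tolerance was described as relatively large.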
1.2 The Generic Musical Instrument System (GMIS)
Among young musicians, playing electronic instruments has become more and
more popular. At the same time, the attraction towards learning classical
instruments (for example flute, saxophone, etc) has decreased despite their
importance in music composition. The design of a Generic Musical Instrument
System (GMIS), therefore, allows musicians the opportunity to experiment with
other musical instrument sounds without actually having to learn how to play
them.
The GMIS is a system which can make any instrument sound like any other
instrument. In order to attain maximum benefits, the system should operate in
real time. The advantage of real time behavior is the musician’s chance to listen
to the result immediately and actually compose by ear.
[Figure 2: block diagram. Signal path: x(t), Low Pass Filter, Sampling (Sample & Hold), ADC,
x(n), Digital Signal Processing, y(n), DAC, Low Pass Filter, y(t). The spectrum X(n) is passed
to the Frequency to MIDI Converter, which outputs MIDI − f(X(n)).]
Figure 2: The Generic Musical Instrument System (GMIS)
The GMIS consists of a common Digital Signal Processor (DSP) system which is
used to recognize the frequencies of the incoming audio data in real time,
as shown in figure 2. The time- and amplitude-continuous audio data passes
an anti-aliasing, band-limiting low pass filter before a sample and hold unit
samples the signal in time. The amplitude is still continuous, and its infinitely many
possible values have to be quantized by the analogue to digital converter (ADC). The time-
and value-discrete data can now be processed by the digital signal processor
(DSP). After the processing, the digital to analogue converter (DAC) converts the
digital data into an analogue form. Before the audio data is output, another low
pass filter smooths the signal by removing the high frequency components
which are an undesired by-product of the conversion process.
This dissertation analyses the frequency recognition part only and does not carry
out further investigations on the frequency to MIDI conversion.
2. Fourier transform-based Frequency Recognition Algorithms
In 1807, Jean Baptiste Joseph Fourier (1768 - 1830) developed the theory of
the Fourier series, but it was rejected by his supervisors Lagrange, Laplace and
Legendre. Finally, in 1822, he published his work in his book “Théorie analytique
de la chaleur”. More or less as a side product of this work, he derived the
so-called Fourier Series, with which he showed that any periodic signal can be
composed of an infinite number of sinusoids and a constant. His theories have revolutionized
science and are indispensable in many technical applications.
In this section, after a brief definition of the Fourier Series and the Fourier
Transform, four Fourier transform-based algorithms are introduced.
These algorithms are the Discrete Fourier Transform (DFT) and its faster
implementation, the Fast Fourier Transform (FFT), the Goertzel algorithm, and the
Non-Uniform Discrete Fourier Transform (NDFT).
2.1 The Fourier Series:
With the Fourier Series, it is possible to create any periodic signal, x(t ) , in the
time domain from the sum of an infinite number of sinusoids, i.e. sine and cosine
functions, which are integer multiples of the fundamental frequency, f 0 . The
Fourier Series is defined as:
$$x(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(n \omega_0 t) + b_n \sin(n \omega_0 t) \right), \qquad n = 1, 2, \ldots, \infty \qquad (2)$$
where $a_0 / 2$ is the direct current component, $n$ is the index
of the sinusoidal component, $a_n$ and $b_n$ are the amplitudes of the nth cosine and
nth sine function respectively, $t$ is the time variable and $\omega_0$ is the
angular frequency, with $\omega_0 = 2 \pi f_0$, where $f_0$ is the fundamental frequency.
The coefficients are obtained from the signal as
$$a_n = \frac{2}{T_0} \int_{-T_0/2}^{+T_0/2} x(t) \cos(n \omega_0 t)\, dt \qquad (3)$$
$$b_n = \frac{2}{T_0} \int_{-T_0/2}^{+T_0/2} x(t) \sin(n \omega_0 t)\, dt \qquad (4)$$
where $T_0 = 1 / f_0$ is the period of the fundamental frequency, $f_0$, of the signal, with
$\omega_0 = 2 \pi f_0$.
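
As a numerical illustration of equations (3) and (4), the following sketch approximates the integrals with a midpoint rule for a unit square wave of period 1 s. The test signal, the step count and the function names are assumptions chosen for this example only.

```python
import math


def fourier_coefficients(x, period, n, steps=20000):
    """Approximate a_n and b_n of equations (3) and (4) with the midpoint rule."""
    w0 = 2.0 * math.pi / period        # angular frequency of the fundamental
    dt = period / steps
    a_n = 0.0
    b_n = 0.0
    for i in range(steps):
        t = -period / 2.0 + (i + 0.5) * dt   # midpoint of the i-th interval
        value = x(t)
        a_n += value * math.cos(n * w0 * t) * dt
        b_n += value * math.sin(n * w0 * t) * dt
    return 2.0 / period * a_n, 2.0 / period * b_n


def square_wave(t):
    """Odd square wave of period 1 s and amplitude 1 (illustrative test signal)."""
    return 1.0 if math.sin(2.0 * math.pi * t) >= 0.0 else -1.0


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, fourier_coefficients(square_wave, 1.0, n))
```

For this odd test signal all cosine amplitudes vanish, and the first sine amplitude approaches the known value 4/π, so the square wave is built from sine terms only.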
2.2 The Fourier Transform:
The Fourier Series is a special case of the Fourier integral and is valid for
periodic signals only. For non-periodic signals, the Fourier Series can no longer be
applied and the Fourier Transform has to be used instead. It is
defined as:
$$X(f) = \int_{-\infty}^{+\infty} x(t)\, e^{-j \omega_0 t}\, dt \qquad (5)$$
where t and f stand for the time and frequency domain respectively; x(t)
represents the continuous time signal and X(f) its spectrum in the frequency
domain. The imaginary unit is $j = \sqrt{-1}$, and $\omega_0$ is the angular
frequency, which is defined for the Fourier Transform as:
$$\omega_0 = 2 \pi f \qquad (6)$$
The Fourier Transform can finally be written as:
$$X(f) = \int_{-\infty}^{+\infty} x(t)\, e^{-j 2 \pi f t}\, dt \qquad (7)$$
Since the Fourier Transform is defined for infinite continuous signals, whereas a
numerical implementation is only possible with finite discrete signals due to
limited memory and computation time, a discrete form, the Discrete Fourier Transform (DFT),
is derived from it.
2.3 The Discrete Fourier Transform (DFT):
The easiest and most direct way to obtain the discrete spectrum of a signal is the
Discrete Fourier Transform (DFT). It is also the basis for the four Fourier
transform-based frequency recognition algorithms described in section 5.4.2.2.
Because the DFT is the slowest algorithm, it is also taken as a baseline to
evaluate all the other investigated algorithms.
The DFT follows from equation (7) by discretizing time and frequency:
$$t \rightarrow n T_s, \qquad n = 0, 1, \ldots, N-1 \qquad (8)$$
$$f \rightarrow k \frac{f_s}{N}, \qquad k = 0, 1, \ldots, N-1 \qquad (9)$$
where n represents the sample index of the discrete time domain signal values and
k the discrete spectral index. The sampling frequency $f_s$ and the corresponding
sampling period $T_s$ are linked via the relation
$$T_s = \frac{1}{f_s} \qquad (10)$$
With these substitutions the product $f \cdot t$ becomes $k n / N$, and the DFT is
conveniently formulated as
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2 \pi}{N} k n}, \qquad k = 0, 1, \ldots, N-1 \qquad (11)$$
Equation (11) is the definition of the Discrete Fourier Transform (DFT), with x[n]
as the sampled time signal and X[k] as the representative of the discrete frequency
spectrum.
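
A direct evaluation of equation (11), followed by a simple peak detection, can be sketched as below. This is an illustrative sketch and not the C implementation discussed later in the thesis; the function names and the choice to search only the bins below the Nyquist frequency are assumptions.

```python
import cmath
import math


def dft(x):
    """Direct DFT after equation (11); needs on the order of N^2 operations."""
    n_samples = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def estimate_frequency(x, fs):
    """Frequency in Hz of the strongest bin below fs/2, using f = k * fs / N."""
    spectrum = dft(x)
    # Skip the DC bin and the mirrored upper half of the spectrum.
    k_peak = max(range(1, len(x) // 2), key=lambda k: abs(spectrum[k]))
    return k_peak * fs / len(x)


if __name__ == "__main__":
    fs = 8000.0
    tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(200)]
    print(estimate_frequency(tone, fs))
```

With $f_s = 8000$ Hz and N = 200 the bins are 40Hz apart according to equation (9), so a 440Hz tone falls exactly on bin 11. Tones between two bins can only be located to within this spacing.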
2.4 The Fast Fourier Transform (FFT):
The Fast Fourier Transform (FFT) is not an independent time to frequency
domain transform but an efficient recursive algorithm for calculating the DFT. It
was developed by Cooley and Tukey in 1965. It has revolutionized digital signal
processing and is the basis for many real time applications whenever a transform
of signals from the time to the frequency domain is required.
2.4.1 Decimation of the DFT in Time (DIT):
The FFT gains its speed by avoiding redundant calculations of the DFT coefficients,
also called twiddle factors. A prerequisite is that the number of
samples, i.e. the buffer size N, is a power of two. If this condition is not met,
zero-padding (in other words, adding zeros at the end of the input buffer) has to
be applied. Then, according to the principle of divide and conquer, the input
samples have to be re-ordered, which takes N steps, until N/2 2-point DFTs can be calculated.
This process is also known as bit reversal. Starting with the calculation of the
2-point DFTs, the results are combined in $\log_2(N)$ stages, first into N/4 4-point DFTs,
then into N/8 8-point DFTs, and so on until the single (N/N) N-point DFT is reached. This
computational part is also known as butterfly computation.
The whole algorithm as described above is summed up in table 1.
Table 1: The FFT algorithm (Decimation in Time)
Step | Action | Comment
0 | Zero padding (add zeros to the buffer) if the number of samples N is not a power of two. | The FFT requires $N = 2^m$ input samples (with m as an integer number).
1 | Bit-reverse the input samples until N/2 input sample pairs for N/2 2-point DFTs are reached. | This stage requires N steps.
2 | Apply butterfly computations recursively. | This stage requires $\log_2(N)$ steps.
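
Step 0 of table 1 can be sketched as follows. The helper name `zero_pad_to_power_of_two` is hypothetical and the sketch only illustrates the padding rule.

```python
def zero_pad_to_power_of_two(samples):
    """Append zeros until the buffer length is a power of two (step 0 of table 1)."""
    if not samples:
        raise ValueError("the input buffer must not be empty")
    target = 1
    while target < len(samples):       # smallest power of two >= len(samples)
        target *= 2
    return list(samples) + [0.0] * (target - len(samples))


if __name__ == "__main__":
    print(zero_pad_to_power_of_two([1.0, 2.0, 3.0]))
```

A buffer whose length already is a power of two is returned unchanged in length. The padding raises N and with it the number of spectral bins, but it adds no new signal information.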
Following the principle of divide and conquer, the N-point DFT can be split up into two
half-size (N/2-point) DFTs, the even and the odd part, as shown in equations (12) and (13)
respectively.
$$X[k] = X[k]_{even} + X[k]_{odd}, \qquad k = 0, 1, \ldots, N-1 \qquad (12)$$
$$X[k] = \sum_{n=0}^{N/2-1} x[2n]\, W_{N/2}^{n k} + W_N^k \sum_{n=0}^{N/2-1} x[2n+1]\, W_{N/2}^{n k}, \qquad k = 0, 1, \ldots, N-1 \qquad (13)$$
Finally, it can be concluded that
$$X[k] = X_{11}[k] + W_N^k X_{12}[k], \qquad k = 0, 1, \ldots, N-1 \qquad (14)$$
where $X_{11}[k]$ and $X_{12}[k]$ denote the N/2-point DFTs of the even-indexed and the
odd-indexed samples.
This divide and conquer process continues until a basis of N/2 2-point DFTs is
reached. The principle always remains the same: two DFTs of equal length have
to be calculated, and the DFT with the odd indices has
to be multiplied by the twiddle factor $W_N^k = e^{-j \frac{2 \pi}{N} k}$. The advantage of
reusing the twiddle factors is that only $\log_2(N)$ butterfly stages of N operations each
are needed, i.e. on the order of $N \log_2(N)$ operations, instead of the $N^2$
operations of a common DFT.
2.4.2 Bit Reversal:
As worked out in the previous section, the key to the success of the FFT is the use of the
butterfly network, which takes $\log_2(N)$ stages. A prerequisite for using the butterfly
network is the re-ordering of the input data in the time domain. This preparation of the
data can be achieved by a bit reversal of the samples’ physical addresses as shown in
figure 4. This step of the algorithm takes N steps.
2.4.3 The Butterfly Network:
The bit reversal prepares the input data for the butterfly network, which
is a recursive application of small DFTs. Starting with N/2 2-point DFTs, which are
the basis for N/4 4-point DFTs, the recursive computation is applied until
the single (N/N) N-point DFT is reached. The whole process of applying the butterfly
network takes $\log_2(N)$ stages.
In figure 3, a basic butterfly for a 2-point DFT is shown. Taking the bit-reversed
input data as a basis, the input sample with the odd index is multiplied by the
twiddle factor, and for the odd-indexed spectral component this product is additionally
multiplied by -1.
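[Figure 3: signal-flow diagram. The bit-reversed input buffer x[0], x[1] is mapped to the
spectral output buffer X[0], X[1]; x[1] is weighted by the twiddle factor $W_2^k$, and the
branch into X[1] is additionally multiplied by −1.]
Figure 3: A basic butterfly for a 2-point DFT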
Original input    Original address      Bit-reversed address    Bit-reversed
buffer            of buffer elements    of buffer elements      input buffer
x[0]              000                   000                     x[0]
x[1]              001                   100                     x[4]
x[2]              010                   010                     x[2]
x[3]              011                   110                     x[6]
x[4]              100                   001                     x[1]
x[5]              101                   101                     x[5]
x[6]              110                   011                     x[3]
x[7]              111                   111                     x[7]
Figure 4: Bit reversal of the input data
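
The bit reversal of figure 4 and the butterfly stages of section 2.4.3 can be combined into a compact radix-2 decimation-in-time FFT. The sketch below is one common iterative formulation, written for illustration; it is not taken from the thesis' C implementation, and it assumes that the buffer length already is a power of two.

```python
import cmath
import math


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`, e.g. 001 -> 100 for bits = 3."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft(x):
    """Iterative radix-2 decimation-in-time FFT of a power-of-two length buffer."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("buffer size must be a power of two (zero-pad first)")
    bits = n.bit_length() - 1
    # Step 1: re-order the input by bit-reversed addresses.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    # Step 2: log2(N) butterfly stages building 2-, 4-, ..., N-point DFTs.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * math.pi / size)   # twiddle factor W_size^1
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                even = data[start + k]
                odd = w * data[start + k + half]
                data[start + k] = even + odd          # upper butterfly output
                data[start + k + half] = even - odd   # lower output: times -1
                w *= w_step
        size *= 2
    return data


if __name__ == "__main__":
    print([bit_reverse(i, 3) for i in range(8)])
    print(fft([1.0, 1.0, 1.0, 1.0]))
```

For N = 8, `bit_reverse` reproduces the address mapping of figure 4, and the innermost loop is the butterfly of figure 3 generalized to larger stages.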
2.5 The Non-Uniform Discrete Fourier Transform (NDFT):
The disadvantage of the DFT and the FFT is their evenly spaced frequency
grid: the whole frequency spectrum up to the sampling rate $f_s$ is always
transformed, with a spacing that is fixed by $f_s$ and N. With the Non-Uniform
Discrete Fourier Transform (NDFT) it is possible to analyze arbitrary frequency ranges
with irregular intervals. Therefore, an increase in accuracy is possible by applying a
well-conditioned frequency vector.
Considering equation (11) of the DFT and taking the equidistantly sampled
frequency domain of equation (9) into account with
$$\frac{f_k}{f_s} = \frac{k}{N}, \qquad k = 0, 1, \ldots, N-1 \qquad (14)$$
where $f_s$ is the sampling frequency, N the number of samples and
$f(k) = f_k = [f_0, f_1, \ldots, f_{N-1}]$ the arbitrarily chosen frequency at k, the DFT can be
rewritten as an NDFT:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2 \pi}{f_s} f_k n}, \qquad k = 0, 1, \ldots, N-1 \qquad (15)$$
The NDFT still has a complexity of $N^2$, but due to the vector of arbitrarily chosen
frequencies it is more accurate within the desired range. Everything outside that
range is resolved less accurately, but as this range is not required, it can be
ignored. The vector itself holds the relevant frequencies of interest, with
well-conditioned, arbitrarily chosen frequencies in between these frequency points.
The NDFT’s results are interpreted by assigning the discrete, non-equidistant
spectral index value k directly to the arbitrarily chosen frequency
vector $f_k$ (16) after having estimated the spectrum.
$$f(k) = f_k = [f_0, f_1, \ldots, f_{N-1}], \qquad k = 0, 1, \ldots, N-1 \qquad (16)$$
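
Equation (15) can be evaluated directly for any list of frequencies. The sketch below is illustrative only: the three candidate frequencies (equal-tempered notes around A4), the buffer size and the function names are assumptions, and the frequency vector is shorter than N for brevity.

```python
import cmath
import math


def ndft(x, frequencies_hz, fs):
    """Evaluate equation (15) at each frequency f_k of an arbitrary vector."""
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * f_k * n / fs) for n in range(len(x)))
        for f_k in frequencies_hz
    ]


def strongest_frequency(x, frequencies_hz, fs):
    """Return the entry of the frequency vector with the largest magnitude."""
    spectrum = ndft(x, frequencies_hz, fs)
    k_peak = max(range(len(frequencies_hz)), key=lambda k: abs(spectrum[k]))
    return frequencies_hz[k_peak]      # index k maps directly to f_k, see (16)


if __name__ == "__main__":
    fs = 44100.0
    tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(1024)]
    print(strongest_frequency(tone, [415.30, 440.00, 466.16], fs))
```

Because the frequency vector is chosen freely, the analysis points can be placed exactly on the expected note frequencies, which a uniform grid with spacing $f_s / N$ generally cannot do.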
2.6 The Goertzel Algorithm:
Another efficient derivative of the DFT is the Goertzel algorithm, which found its
earliest formulation in 1958. It is widely used in DTMF
applications. Unlike the DFT and the FFT, the Goertzel algorithm does not evaluate
the whole frequency spectrum but only selected frequencies. This advantage can be used in
the adaptation of the Goertzel algorithm for musical systems, since both the frequency
range (27.5Hz to 4186.01Hz) and the number of expected frequencies
of interest (88) are known. Therefore, it is only necessary to perform an analysis over
this range, and hence fewer points are required for the computation.
The Goertzel algorithm is a filter bank consisting of recursive second order
Infinite Impulse Response (IIR) filters. An example of these filters is depicted in
figure 5. Its system function can be deduced from its structure and is stated in
equation (17).
$$H(z) = \frac{b_0 z^{-0} + b_1 z^{-1}}{1 + a_1 z^{-1} + a_2 z^{-2}} \qquad (17)$$
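[Figure 5: signal-flow diagram of one second order IIR filter. The input x[n] and the
recursive section (two delay elements $z^{-1}$ fed back via $a_1$ and $a_2$) form the
intermediate signal $q_k[n]$; the feed forward section ($b_0$, $b_1$) forms the output $y_k[n]$.]
Figure 5: Filter structure of the Goertzel algorithm
Table 2: Filter coefficients for the Goertzel algorithm
Feed forward section | Recursive section
$b_0 = 1$ | $a_1 = -2 \cos(\frac{2 \pi}{N} k)$
$b_1 = -e^{-j \frac{2 \pi}{N} k}$ | $a_2 = 1$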
Inserting these coefficients into the system function and substituting
$k = \frac{f_k}{f_s} N$, so that $\omega_k = \frac{2 \pi}{N} k = 2 \pi \frac{f_k}{f_s}$,
the system function can be rewritten as:
$$H(k, z) = \frac{1 - e^{-j \omega_k} z^{-1}}{1 - 2 \cos(\omega_k) z^{-1} + z^{-2}}, \qquad k = 0, 1, \ldots, 88 \qquad (18)$$
The recursively calculated output signal finally is:
$$y_k[n] = a_k q_k[n-1] - q_k[n-2] + x[n] - e^{-j \omega_k} q_k[n-1], \qquad k = 0, 1, \ldots, 88 \qquad (19)$$
with the feedback coefficient $a_k = 2 \cos(\omega_k)$.
3. Implementation
3.1 The Digital Signal Processor (TI TMS320C6713 DSP):
The Texas Instruments (TI) TMS320C6713 digital signal processor (DSP) is a 32-bit
floating-point DSP of the C6000 series. It runs at a clock frequency of 225MHz
and is optimized for audio applications. The DSP core contains two exclusively
fixed-point Arithmetic-Logic Units (ALUs), four ALUs for both fixed- and
floating-point operations, and two multipliers for both fixed- and floating-point
operations. It can operate in both single and double precision.
As in many commercially used processors, the internal memory of the TMS320C6713
consists of an 8KB first-level cache and a 256KB second-level cache. The reason
for this unbalanced distribution of internal memory is that the level 1 cache,
which contains both program and data cache, is fast on the one hand but more
expensive than the level 2 cache on the other hand. Additionally, the
TMS320C6713 has direct access to 16MB of external Synchronous Dynamic Random
Access Memory (SDRAM).
Address       C67x Family Memory Type              6713 DSK
0x00000000    Internal Memory                      Internal Memory
0x00030000    Reserved space or Peripheral Regs    Reserved or Peripheral
0x80000000    EMIF CE0                             SDRAM
0x90000000    EMIF CE1                             Flash
0x90080000                                         CPLD
0xA0000000    EMIF CE2                             Daughter card
0xB0000000    EMIF CE3                             Daughter card
Figure 6: Memory Map for the TMS320C6713 DSP Starter KIT
The user of the processor can choose between two byte orders, little endian and
big endian. In the little endian mode (the default mode), the least
significant byte is stored at the lowest address, literally, little end first. In
contrast, in the big endian mode, the most significant byte is stored at the
lowest address, meaning big end first.
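The difference between the two byte orders can be made visible with a few lines of C. The following sketch is an illustration only; the helper names are assumptions and are not part of any TI library. It inspects the byte that is stored at the lowest address of a 32-bit word.

```c
#include <stdint.h>

/* Byte stored at the lowest address of a 32-bit word.
 * Hypothetical helper, named for illustration only. */
unsigned char lowest_address_byte(uint32_t word)
{
    return *(const unsigned char *)&word;
}

/* Returns 1 on a little endian target and 0 on a big endian target. */
int is_little_endian(void)
{
    /* 0x0D is the least significant byte of the test word. */
    return lowest_address_byte(0x0A0B0C0Du) == 0x0D;
}
```

On a little endian target the lowest address holds 0x0D (little end first), on a big endian target it holds 0x0A (big end first).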
The TMS320C6713 also makes use of an Enhanced Direct Memory Access
(EDMA) controller. This technique enables the acquisition of audio data directly
from the stereo audio codec (AIC23). The EDMA can also be combined with the
two Multichannel Buffered Serial Ports (McBSP).
Finally, the TMS320C6713 is provided with a C/C++ compiler optimized for the
processor's architecture. In particular, the Multiply-Accumulate (MAC) operation
is supported, and the use of this multiply-and-add combination (also known as
sum of products) is recommended.
3.2 The TI TMS320C6713 DSP Board
The target hardware used for the analysis of frequency recognition algorithms is
the TMS320C6713 DSP Starter Kit, developed by Spectrum Digital and Texas
Instruments, which is referred to as the DSP board in the following. A Personal
Computer (PC), which nowadays has enough computational power to compete with a
DSP core, was not chosen for the analysis of frequency recognition algorithms
because the aim was to target an embedded solution. The Signal Processing
Laboratory was provided with two of these DSP boards by Texas Instruments. Due
to financial restrictions, one further requirement of the project was that the
frequency recognition algorithms be implemented on this board.
3.2.1 Chip Support Library (CSL):
To handle interrupts, to schedule tasks and their priorities, and to manage
memory, Texas Instruments has developed the DSP/BIOS real-time operating
system. It operates independently of the application. All the settings for these
tasks can be made in a graphical configuration manager. When compiling, the
settings are then applied by the Chip Support Library (CSL). The software
interrupt (processBufferSwi) is defined as the software interrupt service routine
(ISR) which contains the frequency recognition algorithm.
3.2.2 The Code Composer Studio (CCS):
The Code Composer Studio (CCS) is the front end of the TMS320C6713
DSP Starter Kit and has a number of advantageous properties.
The most obvious feature is the data visualization. The CCS makes it possible to
observe the data currently present in the internal buffers. In the
animation mode, even the change over time is recognizable, i.e. a change in
frequency changes the diagrams, too.
Another very important feature is the simple import of data into and export of
data from these internal buffers. The export is very useful for debugging
purposes, i.e. for verifying results and frequency recognition algorithms either in
a simulation environment like MatLab or with an alternative ANSI C compiler.
This approach has been used in the course of the project and has proved
to be the most effective way.
Furthermore, the CCS has the convenient feature of a simplified file I/O which
makes it possible to trace results by using injections. Injections are soft
breakpoints which halt the CPU for a short moment to perform a file I/O. Like
break and animation points, they can only be used in a debug/release environment.
3.3 Usage of the Timer:
In order to use the 32-bit timer, two steps must be completed. First, the timer
has to be configured, and second, a calibration has to be done which measures
the overhead of starting and stopping the timer itself.
There are three timer registers to be initialized: the control, the period and the
counter register. The timer control register is used to determine the timer's mode.
The timer period register stores the maximum value the timer counts to. Once
this value is reached, a timer overflow occurs and the time measurements
become corrupt. The maximum value a timer can count to is
0xFFFFFFFF. To circumvent a timer overflow, either the overflows have to be
counted or the number is downscaled through a division by a constant, e.g. 1000.
This constant has to be borne in mind for the correction of the exported data. The
timer counter register stores the current value of the timer. To initialize the
timer, the constant 0x00000000 is used.
The second step, the calibration of the timer in conjunction with the time
measurement, is illustrated by the pseudo code in figure 7.
Open timer;
Configure timer;
Set timer to zero;
Calibrate timer;
Start timer;
Stop timer;
Calibration cycles = Stop – Start;
Start timer;
Perform algorithm;
Stop timer;
Elapsed cycles = Stop – Start – Calibration cycles;
Close timer;
Figure 7: Pseudo code for applying time measurements
After the timer has been opened and configured as described above, it
needs to be calibrated because measuring the time itself takes some
time, too. The calibration value is the difference between two successive timer
events, i.e. the starting and the stopping of the timer, measured in cycles.
The actual measurement of the time works as follows. The timer is started, and
then the algorithm is performed. After the calculation has finished, the timer
is stopped again, and the elapsed cycles are calculated as the stop time minus
the start time minus the calibration cycles.
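The calibrated measurement can be sketched in portable C as follows. The sketch does not use the timer functions of the Chip Support Library; instead, the counter is read through a function pointer, and a simulated counter (sim_read_counter, sim_algorithm) stands in for the hardware timer. All names are assumptions made for illustration.

```c
#include <stdint.h>

typedef uint32_t (*counter_fn)(void);  /* reads the current timer count */
typedef void (*work_fn)(void);         /* algorithm to be measured */

/* Elapsed cycles of `algorithm`, corrected by the calibration cycles.
 * Unsigned 32-bit arithmetic keeps the differences correct across a
 * single counter wrap-around. */
uint32_t elapsed_cycles(counter_fn read_counter, work_fn algorithm)
{
    uint32_t start, stop, calibration;

    /* Calibration: cost of starting and stopping the measurement itself. */
    start = read_counter();
    stop = read_counter();
    calibration = stop - start;

    /* Actual measurement. */
    start = read_counter();
    algorithm();
    stop = read_counter();

    return stop - start - calibration;
}

/* Simulated 32-bit counter, used here in place of the hardware timer:
 * every read costs 7 cycles and the algorithm costs 1000 cycles. */
static uint32_t sim_counter = 0;

uint32_t sim_read_counter(void)
{
    sim_counter += 7;
    return sim_counter;
}

void sim_algorithm(void)
{
    sim_counter += 1000;
}
```

With the simulated counter, elapsed_cycles(sim_read_counter, sim_algorithm) returns the 1000 cycles of the algorithm, because the 7 cycles of reading overhead are removed by the calibration.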
4. Implementation of the Frequency Recognition Algorithms
4.1 Methodology:
For the implementation of the algorithms, the real time characteristic is of less
importance in the first place. The main purpose is to get them working properly
and to verify their results in terms of correctness. In order to achieve this, several
tools are very useful.
As a first approach, MatLab and Simulink are used for a first implementation
of the algorithms. Their ease of use and their ability to display
results graphically very quickly help to prove the correctness
of the algorithms. The second step is a direct implementation of the algorithms in
ANSI C using Microsoft Visual Studio C++ 6.0. Since ANSI C is the programming
language which is used for developing applications for the Texas Instruments (TI)
TMS320C6713 Digital Signal Processor (DSP) core in the Code Composer
Studio (CCS), a fast implementation close to the final application is possible. This
step also allows simultaneous debugging by taking a working algorithm as a
reference, which is a very effective way to verify intermediate steps. Since the
Code Composer Studio offers the opportunity to export complete buffer contents
into text files, MatLab is used for verifying and displaying the results.
As input sources, several options can be considered. Clearly defined, unique
sinusoids generated by WaveLab and played by the soundcard are valid input
signals. Signals from a signal generator are preferable, though, because they are
more reliable signal sources. At a later stage, instrument samples playing musical
notes are taken as input sources, too.
To summarize, for the implementation of the algorithms on the target hardware,
MatLab and both developer studios (Microsoft Visual Studio C++ 6.0 and Code
Composer Studio) go hand in hand and complement each other. This methodology
accelerates the development process considerably and, what is also very
important, it helps to verify the results to make sure that they are correct.
4.2 Principle of the Frequency Recognition Algorithms
The process which all frequency recognition algorithms undergo can be
described as follows. First, the sampled and quantized input signal x(n) is
transformed into the frequency domain in order to obtain the real part ℜ{X(k)}
and the imaginary part ℑ{X(k)} of its spectrum. The calculation of the magnitude
Mag(X(k)) is the starting point for a peak detection mechanism, which finds the
spectral component of the signal with the maximum power. The index k_max of this
spectral component is finally evaluated with respect to the number of samples N
and the sampling rate f_s to obtain the frequency k_max · f_s / N. Since the
frequency is now given as a numerical value, it is ideal for a
post-processing MIDI conversion.
4.2.1 The Discrete Fourier Transform (DFT):
4.2.1.1 Simulation in Matlab:
The Discrete Fourier Transform is defined as:
$$ X[k] = \sum_{n=0}^{N-1} x[n] \, e^{-j \frac{2\pi}{N} k n}, \qquad k = 0, 1, \ldots, N-1 \tag{20} $$
and is directly implemented in MatLab (appendix D). The result is shown in figure 8,
which depicts the original input signal and its spectrum together with the
spectrum's real and imaginary parts for a 1 kHz sinusoidal input at a sampling
rate of f_s = 44100 Hz. The signal is deliberately sampled with maximal
non-coherence, i.e. with a leakage factor of 0.5, as explained earlier, in order
to obtain a maximum leakage effect. It is noticeable that even under this
worst-case condition the peak detection mechanism succeeds in finding the
spectral component with the maximum energy, which supports the decision not to
apply extra windowing.
[Figure 8 panels: input signal x(t) in V over t in s; magnitude |Mag| in dB over
f in Hz; real part of the spectrum in dB over f in Hz; imaginary part Im{X(f)}
in dB over f in Hz]
Figure 8: Frequency recognition in principle
4.2.1.2 Implementation of the DFT in C:
A direct implementation of the DFT in C, as in the MatLab simulation
in the previous section, is possible, although with slight differences. MatLab
supports complex numbers natively, but since this convenience
is not available in C, the twiddle factors have to be rewritten
using the Euler equation:
$$ W_N^k = e^{-j \frac{2\pi}{N} k} = \cos\left(\frac{2\pi}{N} k\right) - j \sin\left(\frac{2\pi}{N} k\right) \tag{21} $$
This preliminary study is very important for the implementation of the DFT on the
TMS320C6713 because the real and the imaginary part of the spectrum can be
treated separately. The MAC (Multiply-Accumulate) operation can then be
applied as explained. Since the architecture and the compilers of the C6000
series are specialized for this sum-of-products operation, an implementation
of the DFT in the way described above is promising in terms of
calculation speed.
What is more, as a consequence of the Nyquist-Shannon theorem, only half the
calculated spectrum is relevant for an evaluation, more precisely the frequency
range from 0Hz to half the sampling rate f_s / 2. Therefore, only a scan of half
the spectrum is necessary, which saves computation
time.
4.2.2 The Fast Fourier Transform (FFT):
4.2.2.1 Simulation in MatLab:
Since the Fast Fourier Transform (FFT) is a very efficient algorithm for computing
the Discrete Fourier Transform (DFT) that exploits the redundancy in the
calculation of the twiddle factors, implementing this algorithm requires previous
knowledge and understanding of the DFT.
The simulation of the FFT reads wave files which contain the signals to be
analyzed, outputs their spectrum graphically and stores the input signal
numerically in a user-defined output file. The intention is to take this numerical
data as a known reference for the implementation on the TI TMS320C6713.
A variant of this simulation has also been used in which, instead of
wave files, actual buffer contents of the TI TMS320C6713 were taken as the
input for the MatLab file. This method ensures a direct comparison of the
spectra estimated both in MatLab and with the TI TMS320C6713 and is
very helpful for the development of the algorithm.
4.2.2.2 Implementation of the FFT in C:
Since the FFTW could not be applied successfully, the
assembly-coded library has to be used for implementing the FFT. For this,
previous knowledge of the FFT as such is a prerequisite.
For a buffer size of N ≥ 32, where N has to be a power of two, an N-point FFT
can be computed. The API function DSPF_sp_cfftr2_dit() expects a pointer to an
input buffer, a pointer to an array holding the pre-calculated twiddle factors and
the number of elements N as an integer as input parameters.
Since the input data has to be complex, i.e. consisting of interleaved real and
imaginary parts, the buffer containing this data has to have a length of 2 · N.
After the execution of this function, this buffer holds the complex result. The
number of twiddle factors is N/2 as a result of exploiting their
redundancy. As required by the API, the twiddle factors have to be
reordered by bit reversal. Finally, after the calculation of the FFT, the output
data has to be reordered by bit reversal. Both the twiddle factor pre-calculation
and the bit reversal can be found in the TI FFT support files.
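The two data-handling steps around the FFT call can be sketched in portable C. The sketch does not call DSPF_sp_cfftr2_dit() and does not reproduce the TI FFT support files; the helper names pack_complex and bit_reverse are assumptions made for illustration. The first helper builds the interleaved complex input buffer of length 2 · N, the second one computes the bit-reversed position of an index.

```c
/* Copies n real samples into an interleaved complex buffer of length 2*n:
 * cplx[2*i] holds the real part, cplx[2*i + 1] the (zero) imaginary part. */
void pack_complex(const float *real_in, float *cplx, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        cplx[2 * i] = real_in[i];
        cplx[2 * i + 1] = 0.0f;
    }
}

/* Bit-reversed value of the index i for an n-point transform,
 * where n has to be a power of two. */
unsigned bit_reverse(unsigned i, unsigned n)
{
    unsigned r = 0;
    for (n >>= 1; n > 0; n >>= 1) {
        r = (r << 1) | (i & 1u);  /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point transform, for instance, index 1 (binary 001) is mapped to index 4 (binary 100), and index 3 (binary 011) is mapped to index 6 (binary 110).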
Since there was no need to implement the FFT explicitly in a development
environment other than the Code Composer Studio (CCS), the DFT has been
taken as a reference for the development of the FFT. The approach is to export
the input buffer of the algorithm from the CCS and to re-import it into the
alternative development environment. Debugging close to real conditions
is then possible.
4.2.3 The Non-Uniform Discrete Fourier Transform (NDFT):
The structure of the Non-Uniform Discrete Fourier Transform is similar to the DFT,
and its complexity is N^2 as well. The only difference is that the NDFT does not
refer to an equidistantly sampled frequency range indexed with k but to an
arbitrary frequency vector f_k as described. It is expected that accessing
this vector, which is stored in an array, causes an additional delay that makes
the NDFT a bit slower than the DFT.
The main issue of the NDFT and the key to its advantage over the
DFT is the arbitrarily chosen frequency vector, whose accuracy is increased if it
is well-conditioned. For frequencies in the lower range (i.e. 27.5Hz, 29.14Hz, …), a
finer fragmentation of the frequency vector is required because the spacing
between two neighbouring notes is smaller than for adjoining frequencies in the
higher range (i.e. … 3951.07Hz, 4186.01Hz).
The main problem is that there are 88 MIDI notes, a number which differs
from the buffer size N in every case. If the buffer size N were equal to 88, each
MIDI note would correspond to a buffer index, but this is, as mentioned before,
not possible. For a buffer size N less than 88, the accuracy would not be
sufficient because some of the MIDI notes are simply not
assigned to a buffer index. The most interesting case is a buffer size
greater than the number of MIDI notes.
The spacings between the frequency points between two MIDI notes are
equidistant, but they differ from one MIDI note pair to the next because
the MIDI notes themselves are not equidistantly spaced.
Taking equation (1),
$$ f(n_{MIDI}) = 440 \cdot 2^{\frac{n_{MIDI} - 69}{12}} \ \mathrm{Hz}, $$
as a basis, the spacings are calculated in dependence on two neighbouring MIDI
notes and the buffer size N:
$$ \Delta_{n_{MIDI}}(N) = \frac{f(n_{MIDI} + 1) - f(n_{MIDI})}{N / 88} \qquad \text{for } 21 \le n_{MIDI} \le 108 \tag{22} $$
Figure 9: Arbitrarily chosen frequencies for the NDFT
The final structure of the vector holding the arbitrarily chosen frequencies and the
corresponding MIDI notes is shown in figure 9 in principle. Since this vector is
dependent on the buffer size N, it has to be recalculated for each buffer size N.
4.2.4 The Goertzel Algorithm:
Before coding the Goertzel algorithm, it is quite useful to have a closer look at
the Goertzel IIR filter coefficients as they are listed in table 3. Since the filter
coefficients b0 = 1 and a2 = 1 remain constant throughout the whole calculation
and hence do not have to be determined at each stage, only the filter
coefficients b1 and a1 have to be calculated, once for each Goertzel filter. It is
useful to separate the complex filter coefficient b1 into a real and an imaginary
part using the Euler equation as shown in equations (23) and (24).
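$$ b_1 = -e^{-j \frac{2\pi}{N} k} = -e^{-j 2\pi \frac{f_k}{f_s}}, \qquad k = 0, 1, \ldots, 88 \tag{23} $$
$$ b_1 = -\cos\left(2\pi \frac{f_k}{f_s}\right) + j \sin\left(2\pi \frac{f_k}{f_s}\right) = \Re(b_1) + j\,\Im(b_1), \qquad k = 0, 1, \ldots, 88 \tag{24} $$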
For the sake of completeness, filter coefficient a1 of the recursive section is given
as
$$ a_1 = 2\cos\left(\frac{2\pi}{N} k\right) \tag{25} $$
This consideration results in a set of 3 · 88 = 264 pre-calculated filter coefficients
to be set up in a look-up table as shown in principle in table 3.
Table 3: Frequencies and filter coefficients for 88 MIDI notes
For the buffer size N, the signal is filtered with respect to the pre-calculated
filter coefficients. Since only the real and imaginary parts and their magnitude
are of interest, these intermediate values are calculated after the data has been
filtered. Unlike DTMF systems, which use 8 frequencies of interest,
the maximum number of frequencies to be regarded is 88. To
save computation time, the pre-calculated filter coefficients are read from a
two-dimensional array goertzel_coeff in each loop pass. It has been shown that
pre-calculating the IIR filter coefficients results in a
considerable speedup by a factor of 102.
The actual Goertzel value is obtained by applying the Goertzel IIR filter
with these filter coefficients. Again, as considered in section 3.4,
the maximum power is estimated on the fly: the maximum is
updated whenever it is exceeded, instead of a downstream
scan of a whole temporary buffer containing all successively calculated
Goertzel values.
4.3 Comparison of schedulable Buffer Sizes:
A noteworthy fact is the actual memory requirement for a buffer size N for each
of the algorithms. This also demonstrates the reason why a maximum buffer size
of just 8192 can be applied.
To obtain N data elements, the ping and the pong buffer have to be set up,
each of a size of 2 · N. Because the codec only provides a stereo input, twice the
number of samples has to be scheduled. In total, for the DFT and the Goertzel
algorithm, the actual buffer requirements are 4 · N.
In the case of the FFT in particular, an additional buffer is needed to generate
the input buffer for the API function. This buffer has a size
of 2 · N because the input data has to be complex with interleaved real and
imaginary parts. The twiddle factors need an extra buffer of size N / 2. In total,
5.5 times more data elements have to be provided than originally scheduled for a
problem size N.
Like the DFT, the NDFT has basic memory requirements of
4 · N. Additionally, N data elements are needed for the pre-calculated,
arbitrarily chosen frequency vector.
Table 11 lists the actual buffer size requirements for a problem size N and
states which factor has to be taken into consideration for an implementation of a
problem size N for each algorithm.
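The buffer requirements described above can be summarized in a short C sketch. The type and function names are assumptions made for illustration, and the sketch reads the figure for the FFT as a total of 4 · N + 2 · N + N / 2 = 6.5 · N data elements, i.e. 5.5 · N elements more than the problem size N itself.

```c
#include <stddef.h>

typedef enum { ALG_DFT, ALG_FFT, ALG_NDFT, ALG_GOERTZEL } algorithm_t;

/* Number of data elements that have to be provided for a problem size n. */
size_t buffer_elements(algorithm_t alg, size_t n)
{
    size_t base = 4 * n;  /* ping and pong buffer, each 2*n (stereo input) */

    switch (alg) {
    case ALG_FFT:
        return base + 2 * n + n / 2;  /* complex input buffer + twiddles */
    case ALG_NDFT:
        return base + n;              /* frequency vector */
    default:
        return base;                  /* DFT and Goertzel algorithm */
    }
}
```

For the maximum buffer size N = 8192, this gives 32768 elements for the DFT and the Goertzel algorithm, 40960 elements for the NDFT and 53248 elements for the FFT.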
5. Measurements
In order to obtain distinct signals consisting of single sinusoids only, a common
signal generator is used. For monitoring purposes, its signals are simultaneously
displayed on an oscilloscope. Once the signal has been fed into the line input of
the Texas Instruments (TI) TMS320C6713 Digital Signal Processor (DSP) board,
it is subject to sampling and quantization before it can be analyzed by one of the
frequency recognition algorithms under investigation. The development software,
namely Texas Instruments' Code Composer Studio, is installed on the PC and is
used both for coding the algorithms and for storing the measurement results in a
file.
5.1 Analysis of Input Signals:
Measurements without any preceding estimation or even without a basic analysis
are useless. In other words, starting a measurement without any kind of
expectation is neither engineer-like nor scientific. Therefore, this section
analyses the input signals' spectra versus time, i.e. the spectral behavior during
the duration of the samples. It does not take the spectral resolution into
account. It makes assumptions about the algorithms' frequency recognition
capability based exclusively on an analysis of the input signals' spectra and is
therefore a hypothesis. This makes a necessary preliminary evaluation of the
frequency recognition algorithms possible.
It is presumed that the signals of the first category, which consist of a simple
sinusoid, are stable over their whole duration, meaning that a simple maximum
power search over the spectrum will yield the fundamental frequency. That is
why, for this type of signal, frequency recognition is expected to work without
complications.
For the second category, signals containing complex waveforms, samples of
notes played by musical instruments are used as an input for the frequency
recognition algorithms. The expectation is that not every sample holds the
maximum power at the fundamental frequency because the signal can no longer
be considered to be a simple sinusoid. Moreover, for some of the notes, some of
the harmonics actually carry more power. Additionally, this behaviour changes
over time for some of the regarded notes, which might make frequency
recognition more difficult than with simple sinusoids. These assumptions
have to be confirmed by the following spectral examination and verified by
the measurements themselves.
5.1.1 Sinusoidal Inputs:
The bases for the evaluation of the frequency recognition algorithms are the
metrics latency, speedup and accuracy as discussed earlier. Latency and
speedup can be deduced from the number of cycles an algorithm needs to
estimate a frequency, and the accuracy depends directly on the algorithm's
estimation capability, from which the absolute deviation can be calculated.
To find the best frequency recognition algorithm, at least 10 test series for
each buffer size in the range 32 ≤ N ≤ 8192 have been applied to each algorithm.
The reasons for these boundaries are as follows. The lower bound 32 ≤ N is a
limitation of the implementation of the FFT API routine (DSPF_sp_cfftr2_dit) from
Texas Instruments (TI), whereas the upper bound is due to internal memory
constraints. Furthermore, for a buffer size of N > 8192, there would be
unreasonably high computational costs for the calculation of direct frequency
recognition algorithms such as the DFT and the NDFT, whose costs are N^2.
In conclusion, the common boundaries for the buffer size, 32 ≤ N ≤ 8192, have
been chosen for all the measurements to be able to compare all the algorithms
and should be sufficient to fulfill the task.
For each algorithm test series, three cases were investigated: the note A4 with
440Hz (being the note in the centre of the MIDI scale), and two extreme cases:
the highest frequency with 4186.01Hz (note C8) and the lowest one of 27.5Hz
(note A0) of the target frequency range. The reason for these choices is to
determine the algorithms' capability to handle the highest
frequency, 4186.01Hz, and the highest spectral resolution demand at 27.5Hz.
However, there are a number of harmonics along with the fundamental frequency
which should not occur when the input from the function
generator is considered. The reason for this is that the signal generator does not
produce a pure sinusoid but has a small amount of harmonic
distortion, which should be ignored in this case. Additionally, there are
reflections due to a mismatched connection from the signal generator to the audio
jack input of the DSP board. Here, however, the function generator is not
considered; the signal is taken directly from the system. Since the maximum power
can be found at 440Hz, 4186.01Hz and 27.5Hz respectively, these undesired, less
powerful frequencies do not carry weight in terms of frequency recognition. This
is a good prerequisite for a solid frequency recognition capability regardless of
the spectral resolution in terms of accuracy and a reasonable buffer size N.
5.1.2 Inputs of Musical Instruments:
For signals containing simple sinusoids generated by a signal generator, it is
anticipated that the frequency recognition algorithms should work perfectly if
maximum power detection is used. Whether they can be applied to signals
containing complex waveforms is discussed in this section.
To test the frequency detection capabilities of each algorithm, four different
instruments (piano, violin, trumpet and flute) playing three different notes have
been taken as an input. Because the range of each instrument is limited, not
every note of the frequency range of interest can be subject to an analysis and a
subsequent measurement, so a representative selection has to be made.
Thus, all the selected notes are played by each instrument to ensure
comparability.
When applying instrumental inputs, only the frequency recognition capability is
initially of interest. The buffer size N depends on the accuracy demanded
for the notes C6, C5 and C4 respectively. Equation (26) describes the relationship
between the buffer size N, the sampling rate f_s and the demanded accuracy, which
depends on the MIDI note.
$$ N = \frac{f_s}{\mathrm{accuracy}(n_{MIDI})} \tag{26} $$
Table 13 summarizes the constraints of the notes which are subject to
further measurements. They are valid for piano, violin, trumpet and flute alike.
Table 4: Frequencies and filter coefficients for 88 MIDI notes
5.2 Prediction of the Algorithms’ Frequency Recognition Capability:
With simple sinusoids (440Hz, 4186.01Hz and 27.5Hz), all frequency
recognition algorithms have an unambiguous input because the spectral
distribution of the signal's power remains constant throughout the whole time
window and the spectrum's maximum peak can be found at the fundamental
frequency. That is the reason why the chances of failure are very low even when
using only a simple power detection technique. Measurements on the algorithms
with these sinusoidal inputs and the subsequent analysis with the evaluation
criteria will show whether this prediction is right.
For instruments playing musical notes, it is important to find out whether the
maximum power can actually be found at the fundamental frequency or at one
of its harmonics. In the latter case, investigations have shown that the lower the
frequency is, the more harmonics exist. For all analyzed musical notes
(1046.5Hz, 523.25Hz and 261.63Hz) played by string-based instruments (piano and
violin), the spectral behavior over time is constant and therefore ideal for
frequency recognition. It can also be seen that the relative power distribution
of the spectrum of these string-based instruments remains stable over time, i.e.
the maximum power can always be found at the fundamental frequency of the note
being played. However, the spectrum over time of woodwind and brass
instruments like the flute and the trumpet respectively is not well behaved for
each of the analyzed notes. The trumpet does not have the maximum power at
the fundamental frequency for MIDI note C5. Furthermore, for MIDI note C4, both
the trumpet and the flute will fail according to the preliminary spectral analysis
because in neither case is the maximum power held by the fundamental frequency.
Table 5 lists the probable frequency recognition capabilities of the investigated
algorithms with respect to the analyzed instrumental notes.
Table 5: Probable frequency recognition capability of instruments
5.3 Evaluation Criteria
In order to find the optimum frequency recognition algorithm, fixed quantities
and performance metrics have to be defined beforehand. The purpose is
a reduction of the number of variables to a reasonable minimum which is then
subject to a further analysis. Finally, this preliminary investigation leads to the
conclusion that the evaluation criteria, in particular the performance metrics,
depend on the buffer size N only.
5.3.1 Sampling Rate:
The sampling rate is fixed to f_s = 44100 Hz in order to take high-frequency
harmonics into account, since the analysis of frequency recognition algorithms is
subject to a further extension to the whole audible range of 0Hz – 22000Hz.
The sampling rate has to be greater than twice the maximum frequency
that can appear in the expected signal in order to avoid aliasing according to the
Nyquist-Shannon theorem as explained in section 5.4.1.1. Consequently, the
constraint of a sampling rate of f_s = 44100 Hz is mandatory for the
Generic Musical Instrument System (GMIS), despite the large buffer size
N and the resulting increase in complexity, because the higher the sampling
rate, the more sampling points are acquired.
5.3.2 Spectral Resolution:
A fine spectral resolution R is a prerequisite for a solid accuracy and is given by
the ratio of the sampling rate f_s and the actual buffer size N:
$$ R(f_s, N) = \frac{f_s}{N} \tag{27} $$
In general, the larger the buffer size N, the finer the spectral
resolution R at a given sampling rate f_s. Because an increase of the
buffer size N simultaneously causes an increase of computational costs and
memory requirements, a trade-off has to be found between the spectral
resolution R and the buffer size N.
When analyzing musical notes, the finest spectral resolution which has to be
provided is R_max = 0.82 Hz in order to be able to recognize all frequencies of the
whole target frequency range. The reason for this fine resolution is that it
equals half of the smallest spacing between adjacent MIDI notes, i.e. between
MIDI note #21 and #22 with 27.5Hz and
29.14Hz respectively.
5.3.4 Time to settle:
The system's time to settle t_settle is the time which is needed to make sure that
the buffer completely holds the signal's samples. This quantity is relevant and
subject to further investigations for unbuffered systems only; because internal
buffers are applied, t_settle is simply the time required for filling up a buffer
of size N. For a sampling frequency of f_s = 44100 Hz and a
buffer of size N = 44100, it takes one second to acquire all 44100 samples.
Therefore, the time to settle t_settle is defined as
$$ t_{settle} = \frac{N}{f_s} \tag{28} $$
As the sampling rate f_s is fixed to 44100Hz and the buffer size N does not
change during the operating mode, the time to settle t_settle is constant for each
buffer size N, too.
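As a worked example of equation (28), the time to settle can be evaluated for the smallest and the largest buffer size used in the measurements (N = 32 and N = 8192) at f_s = 44100 Hz:

```latex
t_{settle}(32)   = \frac{32}{44100\,\mathrm{Hz}}   \approx 0.73\,\mathrm{ms}
\qquad
t_{settle}(8192) = \frac{8192}{44100\,\mathrm{Hz}} \approx 185.8\,\mathrm{ms}
```

The time to settle therefore grows linearly with the buffer size N.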
5.3.5 Computational Costs of Investigated Algorithms:
The complexities of the algorithms under investigation are listed for the buffer
size N.
Table 6: Fourier transform-based algorithmic complexity
Algorithm                                          Complexity
Discrete Fourier Transform (DFT)                   N^2
Fast Fourier Transform (FFT)                       N log2(N)
Goertzel algorithm                                 88 · N
Non-Uniform Discrete Fourier Transform (NDFT)      N^2
The comparison of the complexities of the Goertzel algorithm and the FFT
shows that only at a buffer size of N > 2^88 will the FFT's algorithmic complexity
be greater than that of the Goertzel algorithm, since N log2(N) > 88·N requires
log2(N) > 88. A buffer of this size is unreasonably large and furthermore not
feasible without massive access to external memory. This, however, would mean
additional latencies and would be a different kind of problem which cannot be
taken into consideration at this stage.
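The operation counts of Table 6 can be compared directly. The sketch below is
only an illustration: it counts abstract operations rather than DSP cycles, and it
assumes that the factor 88 corresponds to one evaluation per note of the target
range, which the table itself does not state.

```python
import math

NOTE_BINS = 88  # assumed meaning of the factor 88 in the Goertzel cost

def op_counts(n):
    """Theoretical operation counts from Table 6 for buffer size n."""
    return {
        "DFT": n * n,
        "NDFT": n * n,
        "FFT": n * math.log2(n),
        "Goertzel": NOTE_BINS * n,
    }

for n in (32, 256, 1024, 8192):
    print(n, {name: int(cost) for name, cost in op_counts(n).items()})

# N*log2(N) exceeds 88*N only when log2(N) > 88, i.e. beyond N = 2**88.
fft_goertzel_crossover = 2 ** NOTE_BINS
# N*N exceeds 88*N as soon as N > 88.
dft_goertzel_crossover = NOTE_BINS
```

For every buffer size in the measured range the FFT count stays below the
Goertzel count, while the Goertzel count drops below the DFT count once N
exceeds 88.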
Theoretically, the number of cycles corresponds directly to the algorithmic
complexity, but there is an additional amount of cycles for pre and post
processing of audio data to be regarded, for instance the generation of
interleaved complex data, the calculation of magnitude, the spectrum’s peak
detection and the final evaluation of the most powerful spectral frequency
component given as an index. Thus, the overall number of cycles has to be
measured and taken into account when analyzing the metrics.
5.4 Performance Metrics
For the evaluation of a system's real time characteristics, it is necessary to take
the algorithmic properties into account, i.e. their latency, their speedup and the
algorithms' accuracy. These metrics are discussed in this section.
Because the computational cost of each algorithm is fixed, the number of cycles
needed for its calculation is expected to follow the same law. Consequently, the
tendencies of speedup and latency should remain the same for each
measurement, independent of the input. However, cycle counts deduced from
the computational cost are theoretical and have to be confirmed experimentally.
Since pre and post processing of the data is also not included in the theoretical
considerations, these metrics have to be measured.
5.4.1 Latency:
The latency is the time a signal needs to get from its source to its destination
after processing. In the case of frequency recognition as applied in this project,
latency is the difference in time between playing the note and detecting its
frequency or in other words how long an algorithm needs to output the numerical
value of the estimated frequency. Therefore, for the number of cycles an
algorithm takes, the latency is defined as
$$ lt = \frac{cycles}{f_{CPU}} \tag{29} $$
with a given central processing unit (CPU) frequency of the TI TMS320C6713 of
f_CPU = 225MHz. The average latency lt_average is the average temporal
resolution of a human ear, i.e. the time that can pass before the listener notices
a delay between playing a note and hearing the actual sound. It has been shown
that on average the latency of the human ear is lt_average = 50ms. Consequently,
in order to be taken seriously into consideration for frequency recognition in real
time, the algorithms have to terminate their calculations within this time limit,
preferably in less than that because frequency recognition will probably not be
the only task for the GMIS.
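Equation (29) also gives the cycle budget implied by the 50ms limit. The sketch
below is a minimal illustration; the function names are hypothetical and the
cycle counts in the loop are made-up examples, not measured values.

```python
F_CPU = 225e6          # Hz, TMS320C6713 clock frequency given in the text
LATENCY_LIMIT = 0.050  # s, average latency of the human ear given in the text

def latency(cycles, f_cpu=F_CPU):
    """Equation (29): latency in seconds for a given number of cycles."""
    return cycles / f_cpu

def cycle_budget(limit=LATENCY_LIMIT, f_cpu=F_CPU):
    """Largest number of cycles that still fits into the latency limit."""
    return round(limit * f_cpu)

def meets_real_time(cycles):
    """True if the computation finishes within the latency limit."""
    return latency(cycles) <= LATENCY_LIMIT

print(cycle_budget())
# Hypothetical cycle counts, for illustration only.
for cycles in (2_000_000, 20_000_000):
    print(cycles, round(latency(cycles) * 1000, 2), "ms", meets_real_time(cycles))
```

At 225MHz the 50ms limit corresponds to a budget of 11.25 million cycles, which
has to cover pre and post processing as well as the transform itself.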
5.4.2 Speedup:
An important metric for the evaluation of the algorithms is the speedup
according to Amdahl's Law. The speedup is the ratio of the number of cycles of
an algorithm before and after its improvement and is described as
$$ sp = \frac{cycles_{DFT}}{cycles} \tag{30} $$
The speedup has to be greater than 1 if the algorithm is to be considered
superior to any of the other investigated algorithms. Due to the fact that the
Discrete Fourier Transform (DFT) is the slowest Fourier transform-based
algorithm, it is taken as a baseline for the speedup for all the other algorithms.
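Equation (30) can be illustrated with the theoretical operation counts standing in
for cycle counts. This is only a sketch under that assumption: measured cycles
include pre and post processing and will differ, and the function names and the
default of 88 evaluated frequencies are illustrative choices.

```python
import math

def speedup(cycles_dft, cycles):
    """Equation (30): cycle count of the DFT baseline over that of the algorithm."""
    return cycles_dft / cycles

def theoretical_speedups(n, bins=88):
    """Speedups predicted by the operation counts alone (no measured cycles)."""
    dft = n * n
    return {
        "FFT": speedup(dft, n * math.log2(n)),
        "Goertzel": speedup(dft, bins * n),
        "NDFT": speedup(dft, n * n),
    }

for n in (256, 1024, 8192):
    print(n, {name: round(sp, 1) for name, sp in theoretical_speedups(n).items()})
```

The operation counts predict a speedup of exactly 1 for the NDFT; any measured
deviation from 1 therefore reflects implementation overhead rather than
algorithmic complexity.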
5.4.3 Accuracy:
A measure for the accuracy is the absolute deviation. The absolute deviation of a
value in a set of values is the absolute difference between this value and a
nominal. This nominal can either be a mean of the set of these values or a
threshold value, in this case the expected frequency f_ep. The value from which
the absolute difference is taken is the estimated frequency f_es. Being a measure
for accuracy, absolute deviation is defined as
$$ ad = |f_{ep} - f_{es}| \tag{31} $$
Because the tolerance is half the difference between two adjacent MIDI notes,
in the worst case between MIDI note #21 and #22, the minimum acceptable error
equals the maximum spectral resolution required for accurate frequency
recognition, i.e. ad_min = Rmax = 0.82 Hz.
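The accuracy criterion of equation (31) amounts to a simple comparison. The
sketch below uses hypothetical estimates for a 27.5Hz input; the function names
and the sample values are assumptions made for illustration only.

```python
AD_MIN = 0.82  # Hz, tolerance at the lowest note pair quoted in the text

def absolute_deviation(f_expected, f_estimated):
    """Equation (31): ad = |f_ep - f_es|, in Hz."""
    return abs(f_expected - f_estimated)

def within_tolerance(f_expected, f_estimated, tolerance=AD_MIN):
    """True if the estimate deviates by no more than the tolerance."""
    return absolute_deviation(f_expected, f_estimated) <= tolerance

# Hypothetical estimates for a 27.5 Hz input (illustrative values only).
for f_es in (27.9, 26.0, 32.3):
    print(f_es, round(absolute_deviation(27.5, f_es), 2), within_tolerance(27.5, f_es))
```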
6. Results
The first category, input signals consisting of a simple sinusoid, is used first to
measure latency and speedup in order to confirm the theoretical complexity
discussed earlier, and second to show the functionality of the investigated
frequency recognition algorithms, i.e. to demonstrate their frequency recognition
capability and to investigate their accuracy as derived earlier.
The second category, instrumental inputs playing musical notes, is used to judge
the algorithms' frequency recognition capability for input signals containing
complex waveforms such as musical notes. This will demonstrate that a simple
frequency recognition algorithm with simple peak power detection, as performed
in the first set of measurements, will not suffice for a musical system as
proposed with the GMIS.
6.1 Frequency Recognition Algorithms analyzing simple Sinusoids
6.1.1 Latency:
The behavior of the algorithms' latency depends on their individual complexity as
described earlier. For the regarded frequencies 27.5Hz, 440Hz and 4186.01Hz,
the latency is the same, as can be seen in figures 10 - 12, which confirms the
preliminary considerations. Therefore, only the latency's tendencies for each
algorithm are analyzed, not each frequency in particular.
Figures 10 - 12 show the four algorithms' latency versus the buffer size N for
inputs containing simple sinusoids of 27.5Hz, 440Hz and 4186.01Hz
respectively. The algorithms with the maximum latency for the whole range of N,
with 32 ≤ N ≤ 8192, are the DFT and the NDFT, with a narrow difference to the
advantage of the DFT as expected. This slight difference is due to the fact that
the NDFT has to read the vector with the arbitrarily chosen frequencies whereas
the DFT simply uses the equidistant spectral index k for the computation of the
spectrum. Because of their equal structure, the tendency of their latencies is
equal, too. The Goertzel algorithm has a smaller latency than the DFT and NDFT
but is, as expected, still slower than the FFT.
[Plots: latency (logarithmic scale) vs. buffersize (N); curves: NDFT, FFT, DFT, Goertzel]
Figure 10: Latency vs. Buffer size for a 27.5Hz input
Figure 11: Latency vs. Buffer size for a 440Hz input
Figure 12: Latency vs. Buffer size for a 4186.01Hz input
6.1.2 Speedup:
Figures 13 - 15 show the speedup with respect to the buffer size N for 27.5Hz,
440Hz and 4186.01Hz. Again as expected, since the speedup is also directly
linked with the number of cycles, the three graphs are nearly identical for each
input frequency.
Taking the DFT as a baseline, only the NDFT is slightly slower, which is due to
implementation reasons, i.e. reading the vector of the arbitrarily chosen
frequencies takes more time than simply referring to a loop index.
The Goertzel algorithm with a complexity of 88·N is dramatically faster than the
DFT and the NDFT, whose computational costs are N^2, but is still slower
than the FFT with a complexity of N log2(N), again as expected.
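A cost proportional to 88·N is what one obtains when a single-frequency Goertzel
recursion of length N is run once per candidate note. The sketch below shows
this idea in plain Python. It is not the DSP implementation measured here, and
the note table (MIDI notes 21 to 108 in equal temperament, with note 69 at
440 Hz) and the function names are assumptions.

```python
import math

def goertzel_power(samples, target_hz, fs):
    """Squared spectral magnitude at target_hz via the Goertzel recursion."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / fs)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def midi_to_hz(note):
    """Equal-tempered note frequency (assumes note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def detect_note(samples, fs, notes=range(21, 109)):
    """Run one Goertzel pass per candidate note and return the strongest one."""
    return max(notes, key=lambda n: goertzel_power(samples, midi_to_hz(n), fs))

fs = 44100
n = 4096
tone = [math.sin(2.0 * math.pi * 440.0 * i / fs) for i in range(n)]
print(detect_note(tone, fs))
```

Because each pass evaluates exactly one frequency, the candidate frequencies can
sit directly on the note frequencies instead of on the equidistant grid of an FFT.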
[Plots: speedup (logarithmic scale) vs. buffersize (N); curves: NDFT, FFT, DFT, Goertzel]
Figure 13: Speedup vs. Buffer size for a 27.5Hz input
Figure 14: Speedup vs. Buffer size for a 440Hz input
Figure 15: Speedup vs. Buffer size for a 4186.01Hz input
6.1.3 Accuracy:
Why can the fastest algorithm, the FFT, not simply be taken for frequency
recognition? The less time an algorithm needs to fulfill its purpose the better,
but it still has to be accurate enough to meet the requirements. Thus, the
accuracy of each algorithm's frequency recognition capability has to be
analyzed, too, because a fast but inaccurate algorithm is of no use. As defined,
a measure for accuracy is the absolute deviation of the estimated frequency from
the expected frequency, i.e. the absolute difference between these two
frequencies.
[Plots: estimated frequency (Hz) and absolute deviation (Hz), both on a logarithmic scale, vs. buffersize (N); curves: NDFT, FFT, DFT, Goertzel]
Figure 16: Estimated frequency vs. Buffer size for a 27.5Hz input
Figure 17: Absolute deviation vs. Buffer size for a 27.5Hz input
Figure 18: Estimated frequency vs. Buffer size for a 440Hz input
Figure 19: Absolute deviation vs. Buffer size for a 440Hz input
Figure 20: Estimated frequency vs. Buffer size for a 4186.01Hz input
Figure 21: Absolute deviation vs. Buffer size for a 4186.01Hz input
6.1.4 Comparison of the Investigated Metrics:
From the results presented, it can be seen that the DFT and the NDFT yield the
highest latency due to their equal structure. The NDFT is slightly slower; its
speedup with respect to the DFT is 0.933 on average, which is due to the
additional reading of the vector with the arbitrarily chosen frequencies. The
Goertzel algorithm is considerably faster than the DFT and NDFT but definitely
slower than the FFT. Its tendency of latency and speedup is due to its algorithmic
complexity of 88·N. The fastest frequency recognition algorithm is clearly the FFT
with a complexity of N log2(N).
The DFT is far too slow and inaccurate and is therefore, as expected, not to be
taken into account for frequency recognition at all. The NDFT has an adequate
accuracy at a buffer size of N = 4096, but is too slow for the average latency of
the human ear. The most interesting case is the frequency input of 27.5Hz,
where an accuracy of 0.82Hz is demanded. It has been clearly shown that the
FFT, the DFT and the Goertzel algorithm fail in terms of accuracy. For these
algorithms, a buffer size of N > 8192 is necessary to meet the constraints. The
NDFT, however, is accurate enough for the demanded spectral resolution of
0.82Hz but is far too slow and beyond the maximum allowed latency of 50ms.
6.2 Frequency Recognition Algorithms analyzing Musical Notes
The question is whether the frequency recognition algorithms are able to cope
with instrumental inputs. It has been shown analytically that the spectrum of a
single note can concentrate its maximum power on one of its harmonics rather
than on the fundamental frequency, depending on the instrument's character.
6.2.1 Note C6 (1046.5Hz) for Piano, Violin, Flute and Trumpet:
Figures 22 and 23 depict the estimated frequencies over time for a 1046.5Hz
input played by the two string based instruments piano and violin respectively.
Since the estimated frequency is within the boundaries throughout the whole
sample's duration, all investigated frequency recognition algorithms meet the
restrictions for all applied instrumental inputs.
[Plots: estimated frequency (Hz, logarithmic scale) vs. time t(s); legend: NDFT, DFT, FFT]
Figure 22: Estimated frequency vs. time for a 1046.5Hz input (piano)
Figure 23: Estimated frequency vs. time for a 1046.5Hz input (violin)
The Goertzel algorithm is the most accurate algorithm and estimates 1046.5Hz
constantly during the whole sample's duration. The FFT is the next most
accurate frequency recognition algorithm, with an estimated frequency of
1033.99Hz at every moment. The DFT and the NDFT meet the constraints in
terms of accuracy, too.
[Plots: estimated frequency (Hz, logarithmic scale) vs. time t(s); legend: NDFT, DFT, FFT]
Figure 24: Estimated frequency vs. time for a 1046.5Hz input (trumpet)
Figure 25: Estimated frequency vs. time for a 1046.5Hz input (flute)
Both the DFT and the NDFT show a behaviour similar to that for the string based
instruments. Both algorithms are accurate enough to meet the restrictions but
they fluctuate throughout the whole sample's duration, with the DFT appearing
more stable than the NDFT. Again, the FFT supplies accurate frequency
estimation within the limits, but is subject to variation for a violin. The only
algorithm which is unreliable in terms of frequency recognition for woodwind and
brass instruments is the Goertzel algorithm.
6.2.2 Note C5 (523.25Hz) for Piano, Violin, Flute and Trumpet:
For a string based instrumental input of 523.25Hz all algorithms succeed in
estimating the played note, as shown in figures 26 and 27. Again, the Goertzel
algorithm is the most accurate and hits exactly the expected frequency of
523.25Hz. As in the previous section, the FFT is the next most accurate
frequency recognition algorithm; it constantly estimates 517.0Hz over the whole
sample's duration and hence meets the given constraints. As expected, the DFT
and the NDFT fluctuate but still stay between the lower and upper bounds of
508.57Hz and 537.93Hz respectively.
[Plots: estimated frequency (Hz, logarithmic scale) vs. time; legend: NDFT, DFT, FFT]
Figure 26: Estimated frequency vs. time for a 523.25Hz input (piano)
Figure 27: Estimated frequency vs. time for a 523.25Hz input (violin)
Regarding the measurement results for the flute and the trumpet, a more specific
evaluation is required. Figure 28 shows the estimation capabilities of the four
algorithms for a flute playing MIDI note C5. The DFT, the FFT and the NDFT
vacillate but stay within the desired absolute tolerance of 14.68Hz, meaning that
the estimated frequency does not cross the lower and upper limits of 508.57Hz
and 537.93Hz respectively at any time. Specifically, the FFT is the least
fluctuating algorithm. In contrast to these three algorithms, the Goertzel
algorithm fails in terms of accuracy.
For the trumpet (Figure 29), the estimated frequency of all algorithms wobbles
between two harmonics. Consequently, for a trumpet playing MIDI note C5 it is
impossible to estimate the expected frequency of 523.25Hz regardless of which
algorithm is applied.
[Plots: estimated frequency (Hz, logarithmic scale) vs. time t(s); legend: NDFT, DFT, FFT]
Figure 28: Estimated frequency vs. time for a 523.25Hz input (flute)
Figure 29: Estimated frequency vs. time for a 523.25Hz input (trumpet)
6.2.3 Note C4 (261.63Hz) for Piano, Violin, Flute and Trumpet:
For a string based input of a piano and a violin playing note C4 (261.63Hz), the
DFT, the FFT and the NDFT meet the limits given by the absolute tolerance of
7.34Hz. The DFT varies within the compulsory boundaries of 254.29Hz and
268.97Hz, and so does the NDFT, but more modestly than the DFT.
The FFT does not fluctuate at all and stays constant at an estimated frequency
of 258.5Hz. However, the Goertzel algorithm does not fulfill the required
constraints at every moment and is therefore not reliable for recognizing the
input's frequency for these two instruments.
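The text does not state how the absolute tolerances are derived. The quoted
values of 14.68Hz for C5 and 7.34Hz for C4 are consistent with half the distance
to the next lower semitone in equal temperament, and the sketch below
reproduces them under that assumption. The MIDI note numbers (C4 = 60,
C5 = 72, C6 = 84) and the function names are also assumptions.

```python
def midi_to_hz(note):
    """Equal-tempered note frequency (assumes note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def tolerance_band(note):
    """Assumed rule: tolerance = half the gap to the next lower semitone."""
    f = midi_to_hz(note)
    tol = (f - midi_to_hz(note - 1)) / 2.0
    return f - tol, f + tol, tol

for name, note in (("C4", 60), ("C5", 72), ("C6", 84)):
    lower, upper, tol = tolerance_band(note)
    print(name, round(tol, 2), round(lower, 2), round(upper, 2))
```

The resulting bounds agree with the limits quoted for C4 and C5 to within
rounding of the last digit.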
[Plots: estimated frequency (Hz, logarithmic scale) vs. time; legend: NDFT, DFT, FFT]
Figure 30: Estimated frequency vs. time for a 261.63Hz input (violin)
Figure 31: Estimated frequency vs. time for a 261.63Hz input (piano)
All the algorithms estimate the two overtones at 1046.5Hz and 1569.98Hz
because of the reasons stated above. Hence, to anticipate at this stage, a
simple application of the frequency recognition algorithms is not sufficient when
playing note C4 with a trumpet.
[Plots: estimated frequency (Hz, logarithmic scale) vs. time; legend: NDFT, DFT, FFT]
Figure 32: Estimated frequency vs. time for a 261.63Hz input (flute)
Figure 33: Estimated frequency vs. time for a 261.63Hz input (trumpet)
6.2.4 Comparison of Frequency Recognition Capability for Musical Notes:
The DFT, the FFT and the NDFT are able to estimate the expected frequency
within the given constraints for note C6 for each instrument, for note C5 for all
instruments except the trumpet, and for C4 for string based instruments only, i.e.
the piano and the violin.
Also, as expected, all algorithms fail to estimate the fundamental frequency for
the trumpet playing note C5 and for both flute and trumpet producing note C4,
because the maximum power of the notes played by these instruments lies at
their harmonics as shown. In particular, note C5 (523.25Hz) played by a trumpet
holds its maximum power on its second harmonic at 1046.5Hz, as analytically
shown and experimentally verified. For note C4 (261.63Hz) played by a flute, the
maximum power can be found at the second harmonic at 523.25Hz, which is
why the peak detection mechanism of all algorithms estimates this frequency.
For a trumpet playing note C4, it has been theoretically and experimentally
shown that it is even more difficult to find the correct expected frequency
because the maximum power is present at both the fourth and sixth harmonic at
1046.5Hz and 1569.98Hz respectively.
The exception is the Goertzel algorithm. On the one hand, it identifies the
correct frequency of the piano and violin for the notes C6 and C5; on the other
hand, it does not confirm the expected results for the flute and trumpet playing
note C6, the flute producing C5, and the piano and the violin playing note C4.
Regarding the algorithms' frequency recognition capability only, it has been
shown in section 6.1 that frequency recognition algorithms estimate the
frequencies of signals containing a simple sinusoid in most of the measured
cases without complications, although the frequency recognition capability of
each algorithm varies according to its individual nature. However, when
analyzing instruments playing musical notes, it has to be taken into account that
the lower the frequency is, the higher is the number of harmonics. As long as the
maximum power is at the fundamental frequency, the peak detection algorithm
estimates the correct frequency, as is the case with the string based instruments
piano and violin. If the maximum spectral power is on one of the harmonics, as
has been shown for note C5 played by the trumpet and for note C4 produced by
trumpet and flute, a valid frequency estimation as implemented with the four
algorithms is not possible. Consequently, it can be clearly stated that an
exclusive application of frequency recognition algorithms is insufficient for
definite frequency recognition of musical notes.
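The failure mode described above can be reproduced with a synthetic signal. The
sketch below is an illustration only: the tone is artificial rather than a recorded
instrument, the naive DFT stands in for any of the four algorithms, and the
sampling rate, buffer size, amplitudes and fundamental are hypothetical values
chosen so that both partials fall exactly on a spectral bin.

```python
import cmath
import math

def dft_peak_hz(samples, fs):
    """Frequency of the strongest bin of a naive DFT (bins 1 .. N/2 - 1)."""
    n = len(samples)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n

fs, n = 8000, 400  # 20 Hz per bin
f0 = 260.0         # hypothetical fundamental, placed on a bin
# Synthetic tone whose second harmonic is stronger than its fundamental.
tone = [0.4 * math.sin(2 * math.pi * f0 * i / fs)
        + 1.0 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(n)]
print(dft_peak_hz(tone, fs))
```

Simple peak detection reports the second harmonic at 520 Hz instead of the
260 Hz fundamental, which mirrors the behaviour observed for the flute and the
trumpet.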
7. Conclusions and Future Work
This dissertation details the theory behind a novel Generic Musical Instrument
System (GMIS) and provides an analysis of the possibilities to adapt the
advantageous techniques of Dual-Tone Multi-Frequency (DTMF) systems to
musical systems such as the GMIS. Four different Fourier transform-based
frequency recognition algorithms have been analyzed: the Discrete Fourier
Transform (DFT) as a baseline for the evaluation of all algorithms, the Fast
Fourier Transform (FFT), which is used in most engineering applications,
and the Goertzel algorithm and the Non-Uniform Discrete Fourier Transform
(NDFT), which are both successfully applied to DTMF systems.
For input signals containing simple sinusoids, the DFT and the NDFT are far too
slow due to their complexity of N^2. In contrast to them, the FFT is, as expected,
clearly the fastest Fourier transform-based algorithm, followed by the Goertzel
algorithm. Both algorithms are fast enough to undercut the average latency of a
human ear of 50ms.
Specifically, the DFT, the FFT and the NDFT estimate the correct frequencies for
all instruments playing note C6 and C5 except for a trumpet playing note C5. For
note C4, only the string-based instruments succeed. These successful cases are
the ones in which the maximum spectral power is held by the fundamental
frequency. It has been worked out that the Goertzel algorithm estimates the
correct frequency of note C6 played by the piano and the violin, but it fails for all
other analyzed musical notes and does not estimate the frequencies as
predicted. That is why this algorithm cannot be considered as a candidate for
frequency recognition of complex waveforms as required for musical systems
such as the GMIS.
It has been shown that the frequency recognition capability of the investigated
Fourier transform-based algorithms is not satisfactory according to the
constraints of the GMIS. Thus, frequency recognition algorithms using spectral
power estimation could be the subject of further research. Two representatives
of this algorithm category are the Normalized Direct Frequency Estimation
Technique (NDFET), based on the Least Mean Square (LMS) algorithm, and the
Multiple Signal Classification (MUSIC) algorithm, which is a power spectral
estimation method.
REFERENCES:
[1]. MIDI detailed specification, The International MIDI Association,
http://www.midi.org
[2]. Sonali Bagchi, Member, IEEE, and Sanjit K. Mitra, Fellow, IEEE, “The Non-uniform
Discrete Fourier Transform and Its Applications in Filter Design”, IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL
SIGNAL PROCESSING, VOL. 43, NO. 6, JUNE 1996.
[3]. M. Bartels and S. Linfoot, “An Analysis of Real time Implementation Of Fourier
Transform-based Frequency Recognition Algorithms”, Academic Paper, Published at the
8th IEEE ISCE, September 2004.
[4]. B. Widrow, S. D. Stearns, “Adaptive Signal Processing”, Prentice-Hall, 1985.
[5]. International Standards Organization (ISO), “Acoustics - Standard tuning frequency
(Standard musical pitch)”, ISO 16:1975, 1975.
[6]. Philip McLeod, Geoff Wyvill, Department of Computer Science, University of Otago,
New Zealand, “Visualization of Musical Pitch”, Proceedings of the Computer Graphics
International (CGI’03), 2003 IEEE.
[7]. H. Nyquist, “Certain topics in telegraph transmission theory”, Trans. AIEE,
vol. 47, pp. 617-644, Apr. 1928.
[8]. E. O. Brigham, “The Fast Fourier Transform and its applications”, Prentice
Hall, 1988.
[9]. Texas Instruments, “Datasheet, TMS320C6713, TMS320C6713B Floating-Point
DSPs” (tms320c6713),
http://focus.ti.com/docs/prod/folders/print/tms320c6713.html
[10]. C. Roads, “The Computer Music Tutorial”, 2nd ed., Massachusetts Institute
of Technology, 1996.
[11]. http://www.ti.com/sc/docs/psheets/man_dsp.htm
[12]. http://www.cs.hmc.edu/~kperdue/MusicalDSP.html
[13]. http://www.ee.columbia.edu/~dpwe/sounds/instruments/