IJARCCE - International Journal of Advanced Research in Computer and Communication Engineering, Vol. 5, Issue 6, June 2016, DOI 10.17148/IJARCCE.2016.56155

Implementation of CELP and MELP Speech Coding Techniques
Rhutuja Jage (1), Savitha Upadhya (2)
(1) M.E. Student, Dept. of E&TC, Fr. C. Rodrigues Institute of Technology, Vashi, Navi Mumbai, Maharashtra, India
(2) Asst. Professor, Dept. of E&TC, Fr. C. Rodrigues Institute of Technology, Vashi, Navi Mumbai, Maharashtra, India
Abstract: Speech is one of the natural ways of communication among humans. There is an ever-growing demand for speech communication, because speech carries information such as speaker identity, emotional state and prosodic nuance, which adds naturalness to communication. With rapid growth and an increasing number of applications, there is a need for data compression techniques that reduce communication cost by using the available bandwidth and storage space effectively. Speech coding techniques help to achieve a lower bit rate while maintaining the original speech quality. This paper implements the CELP (Code Excited Linear Prediction) and MELP (Mixed Excitation Linear Prediction) speech coding techniques using MATLAB R2009a. These techniques are analyzed on the basis of subjective and objective tests, namely Mean Square Error (MSE), Mean Opinion Score (MOS) and Signal-to-Noise Ratio (SNR). The analysis of the CELP coding technique shows that it is an improvement over the Linear Predictive Coder (LPC); it is an efficient coding technique with bit rates in the range of 9.6 kbps to 16 kbps. The analysis of the MELP coding technique shows that this coder removes the voicing errors of the two-state excitation model of LPC; it is a low bit rate coder operating at 2.4 kbps and is mainly used in military and Federal Standards. The comparative results obtained for the CELP and MELP coding techniques give a clear idea of which is the more efficient coding technique.
Keywords: Speech coding, Linear Prediction (LP), CELP (Code Excited Linear Prediction), MELP (Mixed Excitation
Linear Prediction)
I. INTRODUCTION
Speech is a special type of non-stationary signal which is hard to analyze and model. Factors like intelligibility, coherence and other characteristics play a vital role in the analysis of speech signals. In communication, the number of discrete values required to describe one second of speech corresponds to 8000 samples. Therefore, speech signals are compressed before being transmitted, since bandwidth is the parameter which affects the cost of processing. Speech coding is concerned with obtaining a compact digital representation of speech signals for efficient transmission and storage over band-limited wired or wireless channels. Using speech coding, a telephone company can carry more voice calls on a single fiber link or cable. In mobile and cellular communications, where the data rate for a particular user is limited, speech coders allow more services to be accommodated. Speech coding is also useful for Voice over IP, video conferencing and multimedia applications, since it reduces the bandwidth required over the internet and for the tetherless transmission of information. A few speech applications additionally require minimum coding delay, since long coding delays hinder the flow of conversation [1]. Coding techniques can be classified based on bit rate, bandwidth and the speech coders used. In this paper the parametric speech coding technique, MELP, and the hybrid speech coding technique, CELP, are simulated using MATLAB. The paper also determines the subjective and objective parameters of both coding techniques and compares them to identify the more efficient one.
II. RELATED WORK
The literature available on the speech coding techniques used in communications is vast and continuously growing. In this paper, many technical papers from authentic publications such as IEEE, Springer, Elsevier and other journals are referred to and used as references while implementing the Code Excited Linear Prediction (CELP) and Mixed Excitation Linear Prediction (MELP) speech coding techniques. This section describes several of the standard technical papers available and explains the different methods involved in coding speech signals; the various approaches of the research work on speech coding are illustrated briefly. The paper [2] gives an overview of methodologies for speech coding with emphasis on popular methods that have become part of many communication standards; it mainly presents a historical perspective, a brief discussion of the human speech production mechanism, speech coding methods and performance measures. A novel approach to speech coding using a hybrid architecture is presented in the paper [3]. Advantages of parametric and perceptual coding methods are utilized to create a speech coding algorithm assuring better signal quality than a parametric CELP codec. It mainly discusses two approaches: one is based on the selection of the voiced signal components, which are encoded using a parametric algorithm, and of the unvoiced components; the second approach uses perceptual encoding of the residual signal in a CELP codec.
The Code Excited Linear Predictive (CELP) coding technique falls under the category of hybrid coding. It is a low bit rate coding technique whose prime use is for communication purposes. The literature on CELP is described next. The paper [4] discusses a post-processing technique which improves the coding quality of CELP under background noise without any modification of the codec structure; it performs smoothing of the decoded spectral parameters and of the excitation signal energy. It adaptively smooths both the spectral envelope and the energy of the estimated excitation signal, so the proposed post-processing is performed separately from the decoder. The paper [5] describes extensions of the 4 kbps hybrid MELP/CELP coder, up to 6.4 kbps and down to 2.4 kbps. These coders form a close family and share most of the encoder analysis, quantization tables and decoder synthesis. Their coding structure leads to coders that perform better at a given bit rate than MELP or CELP alone and better than equivalent higher bit-rate ITU standards. The paper [6] presents the idea of a CELP coder with a stochastic multi-pulse (STMP) codebook and training procedures for the codebook excitation signal. The Linear Predictive Coding (LPC) residual exhibits a certain structure due to non-linearities in the glottal excitation; this structure can be exploited by refinement of the STMP excitation signal, as a training procedure for the codebook. In paper [7], a 16 kbps CELP coder with a complexity as low as 3 MIPS is presented. The main thrust is to reduce the complexity as much as possible while maintaining toll quality. This Low Complexity CELP (LC-CELP) coder has features like fast LPC quantization, 3-tap pitch prediction with an efficient open-loop pitch search and predictor tap quantization, backward adaptive excitation gain, and a trained excitation codebook with a small vector dimension and a small codebook size. The remaining part of this section describes the current knowledge as well as the theoretical and methodological contributions to the MELP coder and summarizes the results reported in the corresponding works. The papers referred to for studying the MELP speech coder are as follows. In paper [8] the author describes a MELP 2400 bps vocoder implementation and its evaluation on the basis of the DAM (Diagnostic Acceptability Measure). The autocorrelation technique is used for the determination of the LPC coefficients, and adaptive spectral enhancement allows the vocoder to better match the voiced speech waveforms. It shows that the additional Fourier series information improves the quality of the coded speech. The author of paper [9] explains a 600 bps vocoder, which provides a significant increase in secure voice availability compared to the 2400 bps vocoder. The 600 bps vocoder takes advantage of the inherent interframe redundancy of the MELP parameters. This paper also evaluates the coder on the basis of subjective tests like the DRT (Diagnostic Rhyme Test) and DAM. The paper [10] presents a system to encode speech with high quality using the MELP coding technique, which is effective at bit rates of 1.6-2.4 kbps, and shows that the MELP model produces significantly higher speech quality at bit rates above 2.4 kbps. From the extensive speech quality study in that paper it is clear that, for bit rates above 2.4 kbps, a high transmission rate for the voicing strengths and an accurate encoding of the LP parameters are perceptually important.
III. CELP AND MELP SPEECH CODING TECHNIQUES
A. CELP speech coding technique
CELP is an efficient closed-loop analysis-by-synthesis hybrid coding technique for narrowband and medium-band speech coding. Here the excitation waveform is obtained by optimizing the position and amplitude of a fixed number of pulses so as to minimize an objective measure of performance. It is employed to accomplish the best possible speech quality at low computational complexity [11]. Most popular coding systems in the 4-8 kbps bit rate range use CELP, and the technique is widely used for toll quality speech at 16 kbps. The basic principle exploited by speech coders is that speech signals are highly correlated waveforms.
1. CELP system block diagram
The CELP analysis-by-synthesis system is shown in Fig.1. Both encoding and decoding of speech take place at the encoder, and the parameters which minimize the energy of the error signal are found at the encoder. LP analysis is used to find the vocal system impulse response in each frame. The error signal is perceptually weighted to emphasize important frequencies, and it is minimized by optimizing the excitation signal. The excitation signal is updated over four blocks within the frame; the implemented CELP coder has a frame duration of 20 ms and a block duration of 5 ms for finding the excitation. The encoder needs information about the linear prediction coefficients (a), the gain (G), the pitch filter coefficient (b), the pitch delay (P) and the codebook index (k). After these parameters are calculated they are sent to the decoder. The linear prediction analysis estimates an all-pole filter in each frame, which is used to generate the spectral envelope of the speech signal. The filter has 10 LP coefficients and is obtained using the Levinson-Durbin algorithm, which reduces the computational complexity. The output of the LP analysis is an error signal which is passed through the perceptual error weighting filter to control the noise level. The Gaussian codebook in the implemented system contains a number of Gaussian sequences which are used as excitation signals for the filter; a Gaussian codebook with 512 sequences yields good quality speech with a 9-bit code index. The pitch filter is used as a long-delay correlation filter to generate pitch periodicity in voiced speech. For energy minimization between the original speech signal and the synthetic speech, the parameters G, k, b and P are determined over a particular frame [12].
Fig.1 Block diagram of CELP coder [12]
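To make the codebook search concrete, the following MATLAB sketch illustrates the Gaussian codebook search for one 5 ms excitation block. It is a simplified illustration rather than the implemented coder: the function name celp_search, the omission of the pitch (adaptive) contribution and of filter memory, and the closed-form per-codevector gain are assumptions made here; the weighting filter is taken as W(z) = A(z)/A(z/c).

    function [k_best, G_best] = celp_search(a, c, cb, s_block)
        % a       : LP coefficients [1 a1 ... a10] of the current frame (row vector)
        % c       : perceptual weighting constant (e.g. 0.5 ... 0.9)
        % cb      : Gaussian codebook, one 40-sample codevector per column (512 columns)
        % s_block : current 5 ms speech block (40 samples at 8 kHz, column vector)
        aw = a .* (c .^ (0:numel(a)-1));       % bandwidth-expanded polynomial A(z/c)
        tw = filter(a, aw, s_block);           % perceptually weighted target, W(z) = A(z)/A(z/c)
        best = inf; k_best = 1; G_best = 0;
        for k = 1:size(cb, 2)
            yk = filter(1, aw, cb(:, k));      % codevector through the weighted synthesis filter 1/A(z/c)
            G  = (yk' * tw) / (yk' * yk);      % closed-form optimal gain for this codevector
            e  = tw - G * yk;                  % perceptually weighted error
            if e' * e < best
                best = e' * e; k_best = k; G_best = G;
            end
        end
    end

In the full coder the pitch parameters b and P would be determined before this fixed-codebook search, and the filter states would be carried over from block to block.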
2. Post filtering in CELP
The perceptual weighting filter is used inside the search loop to find the best excitation in the codebook. Some distortion remains in the reconstructed speech; this roughness, or coding error, is a function of frequency and is too high in the regions between formants and between pitch harmonics. Therefore several coders employ a post filter that operates on the reconstructed speech to de-emphasize the coding error between formants and between pitch harmonics. This process is known as "post-processing" [13].
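A common way to realise such a post filter is the short-term form Hp(z) = A(z/beta)/A(z/alpha). The MATLAB sketch below illustrates this idea under assumed constants (alpha = 0.8, beta = 0.5) and a simple energy renormalisation; it is an illustrative fragment, not taken from the implemented coder or from [13].

    function y = postfilter(x, a, alpha, beta)
        % x : reconstructed speech frame (column vector)
        % a : LP coefficients [1 a1 ... a10] of the frame (row vector)
        num = a .* (beta  .^ (0:numel(a)-1));           % A(z/beta)
        den = a .* (alpha .^ (0:numel(a)-1));           % A(z/alpha); alpha > beta deepens the spectral valleys
        y   = filter(num, den, x);                      % attenuate noise between formants and harmonics
        y   = y * sqrt((x' * x) / max(y' * y, eps));    % restore the original frame energy
    end
    % example call (assumed values): y = postfilter(x_hat, a, 0.8, 0.5);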
B. MELP speech coding technique
The Mixed Excitation Linear Prediction (MELP) algorithm is a linear prediction based parametric speech coder that was chosen as the new 2.4 kbps US Federal Standard (FS). Even though the MELP technique is quite good, there are still some perceivable distortions, particularly around non-stationary speech segments and for some low-pitch male speakers. In MELP speech coding the input speech signal parameters are estimated first and are then used to synthesize the speech signal at the output. In these coders the samples of the input speech signal are buffered into frames and given to a linear prediction filter, so that each frame can be represented by filter coefficients and a scale factor. MELP utilizes parameters like mixed excitation, aperiodic pulses, adaptive spectral enhancement, pulse dispersion filtering and Fourier magnitude modelling to capture the signal dynamics [14].
1. Block diagram of MELP speech coder
A block diagram of the speech production model for the MELP coder is shown in Fig.2; it is an endeavour to improve the LPC model. Implementing a MELP coder mainly involves four steps: analysis, encoding, decoding and synthesis. The MELP coder divides the speech signal into three classes: voiced, unvoiced and jittery voiced. From the input speech signal the shape of the excitation pulse is extracted for the periodic excitation and is transmitted as information on the frame. The pulse shape contains significant information, which is captured by the MELP coder through the Fourier magnitudes of the prediction error. These quantities are needed to generate the impulse response of the pulse generation filter, which is responsible for the synthesis of the periodic excitation. The periodic excitation and the noise excitation are filtered using the pulse shaping filter and the noise shaping filter, respectively, and the outputs are then added together to form the total excitation, called the mixed excitation [15].
Fig.2 MELP model of speech production [15]
2. Shaping filters in MELP coding technique
The MELP speech production model makes use of two different shaping filters which combine pulse excitation with noise excitation to form the mixed excitation signal. The time-varying voicing strengths define a pair of time-varying filters, and these filters decide the amount of pulse and the amount of noise in the excitation in different frequency bands. The shaping filter mechanism controls the frequency response so as to achieve the voicing strengths. The two filters work in a complementary fashion: when the gain of one filter is high, the gain of the other filter is proportionately lower, so that the total gain of the two filters remains constant at all times [15].
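This complementary-gain behaviour can be sketched in MATLAB as follows. The fragment is illustrative, not the implemented coder: it assumes a precomputed FIR filter bank bank (one band per column, with the columns summing to a unit impulse) and a vector vs of per-band voicing strengths in [0, 1]; the function name is also an assumption.

    function ex = mixed_excitation(pitch, nsamp, bank, vs)
        % pitch : pitch period in samples, nsamp : number of excitation samples
        pulses = zeros(nsamp, 1);
        pulses(1:round(pitch):nsamp) = 1;       % impulse train at the pitch period
        noise  = randn(nsamp, 1);               % white noise excitation
        hp = bank * vs(:);                      % pulse shaping filter: strong in the voiced bands
        hn = bank * (1 - vs(:));                % noise shaping filter: complementary band gains
        % hp + hn equals the sum of the bank columns (a unit impulse), so the
        % combined gain of the two filters stays constant at all frequencies.
        ex = filter(hp, 1, pulses) + filter(hn, 1, noise);
    end

The standard MELP coder uses five fixed bandpass filters with per-frame voicing strength estimates; this sketch only captures the complementary-gain principle described above.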
IV. IMPLEMENTATION OF CELP AND MELP SPEECH CODING TECHNIQUES
The coding techniques implemented here are CELP and MELP. Fig.5 and Fig.6 are the flow charts of the CELP and MELP techniques; they explain the workflow and processing of these techniques and help in analysing and designing them. The recording of the speech signal is done with the help of the Praat software. The simulation is done for various speech samples, and their subjective and objective parameters are calculated while analysing these techniques.
A. Introduction to PRAAT software
Praat can read sounds recorded with the program as well as pre-recorded audio files. Here the different speech signals are recorded using version 5.4.21 of the Praat software. The sampling frequency Fs of the speech signal is set to 8000 Hz. Fig.3 shows the Praat window and Fig.4 shows a speech signal recorded using Praat at a sampling frequency of 8000 Hz.
Fig.3 Praat window
Fig.4 Speech signal recorded using Praat software
B. Flowchart of CELP speech coding technique
Fig.5 below is the flowchart of the Code Excited Linear Prediction speech coding technique. The algorithm begins with the recording of a speech file in the Praat software at a sampling frequency of 8000 Hz. The recorded speech signal is loaded into MATLAB using the "wavread" command. After loading the speech signal, parameters like the frame length (L), the order of LP analysis (M), the constant parameter for the perceptual weighting filter (c) and the Pidx range are assigned values. After this, the analysis-by-synthesis procedure begins and the Gaussian codebook is created. This codebook is searched to obtain the code vectors, and the 9.6 kbps and 16 kbps speech coders are then invoked using a function in MATLAB. The Levinson-Durbin algorithm is used for the synthesis part. At the end of the algorithm, graphs are plotted which include the original speech signal, the synthesized speech signal at 9.6 kbps and 16 kbps, and a comparison plot of the different bit rate signals with the original speech signal.
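For reference, a minimal MATLAB version of the Levinson-Durbin recursion is sketched below. It is an illustrative routine with assumed names (not necessarily the routine used in the implemented coder); it converts the frame autocorrelation into the LP coefficients used by the analysis and synthesis filters.

    function [a, E] = lev_durbin(r, M)
        % r : autocorrelation values r(1) ... r(M+1) of a windowed frame (row vector)
        % M : prediction order (M = 10 here)
        % a : prediction error filter [1 a1 ... aM],  E : final prediction error power
        a = zeros(1, M + 1); a(1) = 1; E = r(1);
        for i = 1:M
            k = -(a(1:i) * r(i+1:-1:2)') / E;               % reflection coefficient
            a(1:i+1) = a(1:i+1) + k * [0, fliplr(a(1:i))];  % order update of the coefficients
            E = E * (1 - k^2);                              % updated prediction error power
        end
    end

The autocorrelation itself can be obtained as r(m+1) = sum(s(1:end-m).*s(1+m:end)) for m = 0 ... M on the windowed frame, and the resulting coefficients are used as filter(a, 1, s) for analysis and filter(1, a, excitation) for synthesis.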
Fig.5 Flowchart of CELP speech coding technique
C. Flowchart of MELP speech coding technique
Fig.6 below shows the flowchart of the Mixed Excitation Linear Prediction technique. The MELP algorithm begins in the same way as CELP: the speech signal recorded in Praat is used for further analysis and synthesis. At the beginning of the program the prediction order and the frame size of the Hamming window are defined. Zero padding is applied to the signal if needed, and the signal is then multiplied by the window. Thereafter the Levinson-Durbin algorithm is applied, and the voiced/unvoiced decision is taken on the basis of filters. Subsequently the gain for each speech frame is calculated and the pitch value is determined. At this point the analysis stage is completed and the synthesis stage begins. The voiced and unvoiced frames are checked again, the speech signal is converted into a single sequence, and the synthesized speech is generated and plotted in MATLAB.
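The per-frame quantities mentioned above (pitch, voiced/unvoiced decision and gain) can be estimated along the following lines. These are illustrative MATLAB fragments with assumed names and thresholds, shown together for brevity (each function would normally live in its own file); they are not necessarily identical to the helper functions of the implemented coder.

    function p = frame_pitch(s, fs)
        % autocorrelation pitch estimate, search restricted to 50-400 Hz
        s = s(:) - mean(s);
        lags = round(fs/400):round(fs/50);
        r = zeros(size(lags));
        for i = 1:numel(lags)
            L = lags(i);
            r(i) = sum(s(1:end-L) .* s(1+L:end));
        end
        [rmax, imax] = max(r);
        p = lags(imax);                 % pitch period in samples
    end

    function z = frame_zcr(s)
        % zero-crossing rate per sample; high values suggest unvoiced speech
        z = sum(abs(diff(sign(s(:))))) / (2 * numel(s));
    end

    function g = frame_gain(s)
        % frame energy in dB
        g = 10 * log10(sum(s(:).^2) / numel(s) + eps);
    end

    % Illustrative voiced/unvoiced rule (thresholds are assumptions): treat a frame
    % as voiced when the zero-crossing rate is low and the gain is reasonably high.
    % voiced = (frame_zcr(frame) < 0.1) && (frame_gain(frame) > -30);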
Fig.6 Flowchart of MELP speech coding technique

V. SIMULATION RESULTS
A. Results of CELP speech coding technique
The CELP coder is implemented using MATLAB R2009a. It was simulated on different speech inputs such as the vowels /a/, /i/, /o/ and /uu/, the word 'hello', the 'h' file and a sentence. The original speech sound is a mono sound recorded using the Praat software and sampled at 8000 Hz. The CELP analysis-by-synthesis began by loading the original speech into MATLAB using the 'wavread' command and creating a Gaussian codebook. Different functions are written in MATLAB for invoking the CELP coders.
This technique is implemented for a low as well as a high bit rate, i.e. 9.6 kbps and 16 kbps respectively. Different figures are obtained for different values of the perceptual weighting constant (c). This constant mainly helps to reduce the perceptually weighted error in the synthesized speech signal. Fig.7 is a plot of the 'h' file along with the 16 kbps and 9.6 kbps CELP synthesized speech signals. Fig.8 and Fig.9 are the plots of the 'h' file for c=0.9 at 16 kbps and 9.6 kbps, and Fig.10 and Fig.11 are the plots of the 'h' file for c=0.5 at 16 kbps and 9.6 kbps. All figures have time on the x-axis and amplitude on the y-axis.
Fig.7 'h' file plot along with the 16 kbps and 9.6 kbps CELP synthesized samples (original, 16 kbps and 9.6 kbps panels)
Fig.8 'h' file compared with the 16 kbps synthesized signal [c=0.9]
Fig.9 'h' file compared with the 9.6 kbps synthesized signal [c=0.9]
Fig.10 'h' file compared with the 16 kbps synthesized signal [c=0.5]
Fig.11 'h' file compared with the 9.6 kbps synthesized signal [c=0.5]
B. Analysis of CELP speech coding technique
Here the 'h' file and a 'sentence' are synthesized using the CELP technique. These speech signals are analysed at 16 kbps and 9.6 kbps for different values of the constant parameter of the perceptual weighting filter (c). It is observed that for the lower value of c, i.e. c=0.5, the reconstruction of the speech signal is better, and the audio quality and intelligibility are also maintained. Hence it can be concluded that the CELP technique depends on the value of c; c ranges from 0.5 to 0.9, with the most appropriate result obtained for c=0.5. The mean square error values obtained at the different bit rates for c=0.9 and c=0.5 are given in Table 1. The mean opinion score (MOS) values calculated for the different signals are given in Table 2. The SNR values calculated for the different bit rates and values of c are given in Table 3.

Table 1. Comparison of MSE for 16 kbps and 9.6 kbps synthesized speech for different values of 'c' (perceptual weighting constant)
Speech signals | 16 kbps, c=0.9 | 16 kbps, c=0.5 | 9.6 kbps, c=0.9 | 9.6 kbps, c=0.5
'h' file       | 0.0318         | 0.0019         | 0.1746          | 0.0318
'sentence'     | 0.1131         | 0.0916         | 0.0064          | 0.0019

Table 2. Mean Opinion Score (MOS) calculation for different speech signals (subjective evaluation)
Speech signals | MOS
'h' file       | 3.9
'sentence'     | 3.7

Table 3. Signal-to-Noise Ratio (dB) calculation for different values of 'c' at different bit rates
Speech signals | 16 kbps, c=0.9 | 16 kbps, c=0.5 | 9.6 kbps, c=0.9 | 9.6 kbps, c=0.5
'h' file       | 104.3526       | 102.8461       | 108.1873        | 101.0187
'sentence'     | 103.8115       | 104.9030       | 110.4880        | 103.6551
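For reference, the objective measures in Tables 1 and 3 can be computed along the following lines once the original and synthesized signals are available in MATLAB. This is an illustrative fragment; the variable names are assumptions, and the exact formulas used in the implementation may differ (for example in segmentation or normalisation).

    % x : original speech, y : synthesized speech, trimmed to the same length
    e       = x(:) - y(:);                                     % reconstruction error
    mse_val = mean(e.^2);                                      % Mean Square Error
    snr_val = 10 * log10(sum(x(:).^2) / max(sum(e.^2), eps));  % overall SNR in dB

The MOS values, in contrast, are purely subjective and are obtained by averaging the listeners' ratings.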
C. Results of MELP speech coding technique
The MELP coder is implemented using MATLAB R2009a. It was studied on different input speech signals such as the vowels /a/, /i/ and /o/, the syllables /fa/ and /mi/, the word 'hello' and different sentences named 'mysp1', 'mysp2', 'mysp3' and 'savitha1'. The original speech sound is a mono sound recorded using the Praat software at a sampling rate of 8000 Hz. The MELP analysis began by loading the original speech using the 'wavread' command. Different functions, namely mygain, mylevinson, mypitch and myzcr, are written in MATLAB for invoking the MELP coder. Fig.12 and Fig.14 are plots of the original speech signal and the synthesized speech signal; for these plots the x-axes have amplitude and the y-axes have time in milliseconds (ms). In Fig.13 and Fig.15 the x-axes have the frame length of the speech signal in milliseconds and the y-axes have the pitch values in hertz (Hz). These figures are plotted for two values of the frame length, namely 30 ms and 25 ms.
Fig.12 /fa/ original and MELP synthesized speech signal
Fig.13 Plot of pitch and voiced/unvoiced (VO/UV) decision for /fa/ (frame length = 30 ms)
Fig.14 /fa/ original and MELP synthesized speech signal
Fig.15 Plot of pitch and voiced/unvoiced (VO/UV) decision for /fa/ (frame length = 25 ms)

D. Analysis of MELP speech coding technique
The different speech signals synthesized by the MELP technique are the vowel /o/, /fa/, /mi/ and 'mysp2'. These signals are analysed for two different values of the frame length, i.e. 30 ms and 25 ms. The window used in the MELP synthesis is the Hamming window. The synthesized speech signal quality is better for the higher value of the frame length (frame length = 30 ms) and degrades as the frame length decreases (frame length = 25 ms). The subjective evaluation is done on the basis of the MOS values shown in Table 4; the synthesized speech signal is evaluated by ten different users and rated as per the ITU recommendation. The objective evaluation is done by calculating the MSE (Mean Square Error) and SNR (Signal-to-Noise Ratio) values for the different values of frame length, as shown in Table 5 and Table 6 respectively. It can be concluded that the MELP technique depends on the value of the frame length and also on the type of window used.

Table 4. MOS values calculation
Speech signals | MOS values
/fa/           | 3
mysp1          | 3.5

Table 5. MSE values calculation for different values of frame length
Speech signals | Frame length = 30 ms | Frame length = 25 ms
/fa/           | 0.1682               | 0.2291
mysp1          | 0.0528               | 0.0597

Table 6. SNR values calculation for different values of frame length
Speech signals | Frame length = 30 ms | Frame length = 25 ms
/fa/           | 65.8072              | 72.2378
mysp1          | 85.8439              | 87.787
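The dependence on frame length and window type discussed above comes from the framing step of the analysis. A minimal MATLAB sketch of that step is given below (illustrative code with assumed names; the Hamming window is written out explicitly, and frame_ms would be 30 or 25 as in Tables 5 and 6).

    function frames = frame_signal(s, fs, frame_ms)
        % split a speech signal into non-overlapping Hamming-windowed frames
        N = round(fs * frame_ms / 1000);                         % samples per frame
        s = s(:);
        s = [s; zeros(mod(-numel(s), N), 1)];                    % zero pad to a whole number of frames
        w = 0.54 - 0.46 * cos(2 * pi * (0:N-1)' / (N-1));        % Hamming window of length N
        frames = reshape(s, N, []) .* repmat(w, 1, numel(s)/N);  % windowed frames as columns
    end
    % example call: frames = frame_signal(speech, 8000, 30);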
E. Comparison of results obtained for the CELP and MELP speech coding techniques
This section presents a comparative study of the CELP and MELP speech coding techniques on the basis of the MOS, MSE and SNR values obtained during simulation. The MOS values obtained for the different speech signals with the CELP and MELP techniques are shown in Table 2 and Table 4 respectively. From these values it can be concluded that the CELP technique gives toll quality speech when compared to the MELP technique, since higher MOS values are obtained with CELP. The MSE values obtained at different bit rates for the CELP technique are shown in Table 1, and the MSE values for the MELP technique at different frame lengths are shown in Table 5. From these values, the CELP coded speech is observed to be less accurate than the MELP coded speech; the reason may be the poorer representation of the excitation signal in the CELP coding technique. In terms of SNR, the performance of the CELP coder is better than that of MELP. The improved performance of CELP is due to the high SNR characteristics of the speech signal present in the glottal closure phase. The higher SNR values obtained in Table 3 show that the signal strength is stronger in relation to the noise level, which offers a higher data rate and better throughput. The CELP coder uses a codebook to represent the excitation signal, which introduces more approximation in the synthesized speech when compared to the MELP coding technique. Overall, it can be concluded that the performance of the CELP coding technique is better than that of the MELP coding technique.
VI. CONCLUSION AND FUTURE WORK
It can be concluded from the simulation that the CELP technique provides better toll quality speech than existing low bit-rate algorithms, such as RELP and LPC vocoders, for lower values of c, and that it works well in the bit rate range of 9.6 kbps to 16 kbps. Along with its variants, such as algebraic CELP, relaxed CELP and low-delay CELP, it is currently the most widely used coding algorithm. The MSE values obtained are high for 9.6 kbps and low for 16 kbps, which makes it clear that CELP can work efficiently at low bit rates. On the other hand, the MELP coder provides much better quality than all older military standards, especially in noisy environments such as the battlefield, vehicles and aircraft. Compared to other parametric coders, the MELP coder has additional features like mixed excitation, aperiodic pulses, pulse dispersion and adaptive spectral enhancement, which help to remove annoying artifacts, buzzes and tonal noises. The subjective evaluation of the MELP technique suggests that it is also a low bit rate and intelligible coder among the parametric coders. The literature indicates that there are other speech coding techniques which can perform well when compared to CELP and MELP. One such technique is RELP (Residual Excited Linear Prediction), which directly transmits the residual signal; to achieve lower bit rates, the residual signal is usually down-sampled. It is used in some text-to-speech voices, such as the diphone databases mainly found in speech synthesizers. The variants of the CELP technique, such as Algebraic CELP (ACELP), Relaxed CELP (RCELP), Low Delay CELP (LD-CELP) and Vector Sum Excited Linear Prediction, can also be implemented, because these are the most widely used speech coding algorithms in MPEG-4 Audio and speech coding.
REFERENCES
1. http://shodhganga.inflibnet.ac.in/bitstream/10603/2268/14/14_chapter%201.pdf, accessed June 2015.
2. Sanjeev Gupta, "Science of Speech Coding," DRDO Science Spectrum, March 2009, pp. 120-127.
3. M. Kulesza, G. Szwoch, A. Czyżewski, "High quality speech coding using combined parametric and perceptual modules," World Academy of Science, Engineering and Technology, 2006, pp. 106-111.
4. Atsushi Murashima, Masahiro Serizawa, Kazunori Ozawa, "A post-processing technique to improve coding quality of CELP under background noise," Proceedings of the IEEE Workshop on Speech Coding, 2000, pp. 102-104.
5. Jacek Stachurski, Alan McCree, Vishu Viswanathan, Anssi Ram, Sakari Himanen, Peter Blocher, "Hybrid MELP and CELP coding at bit rates from 6.4 to 2.4 kbps," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, 1999, pp. 153-156.
6. Bernt Ribbum, Andrew Perkis, K. K. Paliwal, "Enhancing the codebook for improving the speech quality of CELP coder," ELAB, Norwegian Institute of Technology, pp. 408-413.
7. Rainer Martin, Richard V. Cox, "New speech enhancement techniques for low bit rate speech coding," Proceedings of the IEEE Workshop on Speech Coding (SCW), June 1999, pp. 165-167.
8. Tahereh Fazel Hughes, Thomas Fuja, "Robust transmission of MELP-compressed speech: An illustrative example of joint source-channel decoding," IEEE Transactions on Communications, Vol. 51, No. 6, June 2003, pp. 973-982.
9. Hamza Kheddar, Bachir Boudraa, "Implementation of interleaving methods on MELP 2.4 coder to reduce packet loss in the Voice over IP (VoIP) transmission," International Journal of Engineering Research and Applications, Vol. 5, Issue 3, Part 4, March 2015, pp. 85-89.
10. Alan V. McCree, Thomas Barnwell, "Implementation and evaluation of a 2400 bps Mixed Excitation LPC Vocoder," IEEE Transactions on Acoustics, Speech and Signal Processing, 1993, pp. 159-162.
11. Wai C. Chu, "Speech Coding Algorithms: Foundation and Evolution of Standardized Coders," John Wiley & Sons, New Jersey, 2003.
12. http://www.ece.mcmaster.ca/~shirani/multi12/subband.pdf, accessed July 2015.
13. http://read.pudn.com/downloads167/ebook/767824/chap_7_speech.pdf, accessed July 2015.
14. E. Harborg, J. E. Knudsen, A. Fuldreth, F. T. Johansen, "A real-time wideband CELP coder for a videophone application," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 2, April 1994, pp. 121-124.
15. A. R. Sahab, M. Khoshroo, "Speech Coding Algorithms: LPC10, ADPCM, CELP and VSELP," Journal of Applied Mathematics, Islamic Azad University of Lahijan, Vol. 5, No. 16, 2008, pp. 37-49.