US005434948A
United States Patent [19]
Holt et al.
[11] Patent Number: 5,434,948
[45] Date of Patent: Jul. 18, 1995
[54] POLYPHONIC CODING
[75] Inventors: Christopher E. Holt, Melton; Edward Munday, Ipswich; Barry M. G. Cheetham, Liverpool, all of England
5,091,944 2/1992 Takahashi 381/38
5,113,437 5/1992 Best et al. 380/3
5,142,656 8/1992 Fielder et al. 381/37
5,285,498 2/1994 Johnston 395/2.12
[73] Assignee: czmesggmilgg?ggnslaguli?d
[21] Appl. No.: 109,479
[22] Filed: Aug. 20, 1993
OTHER PUBLICATIONS
Nelson et al., "Adaptive inverse filters for stereophonic sound reproduction"; IEEE Transactions on Signal Processing, vol. 40, iss. 7, pp. 1621-1632, Jul. 1992.
Minami et al., "Stereophonic ADPCM voice coding method"; ICASSP 90, pp. 1113-1116, 3-6 Apr. 1990.
Related U.S. Application Data
[63] Continuation of Ser. No. 834,548, Feb. 12, 1992, abandoned.
[30] Foreign Application Priority Data
Jun. 15, 1989 [GB] United Kingdom ............... 8913758
Primary Examiner: Allen R. MacDonald
Assistant Examiner: Tariq Hafiz
Attorney, Agent, or Firm: Nixon & Vanderhye
[51] Int. Cl.6 .............................................. G10L 9/00
[57] ABSTRACT
A polyphonic (e.g. stereo) audioconferencing system, in which input left and right channels are time-aligned by variable delay stages (10a, 10b), controlled by a delay calculator (9) (e.g. by deriving the maximum cross-correlation value), and then summed in an adder (2) and subtracted in subtracter (3) to form sum and difference signals. The sum signal is transmitted in relatively high quality; the difference signal is reconstructed at the decoder by prediction from the sum signal using an adaptive filter (5). The decoder adaptive filter (5) is configured either by received filter coefficients, or using backwards adaptation from a received residual signal produced by a corresponding adaptive filter (4) in the coder, or both. Preferably, the adaptive filter (4) is a lattice filter, employing a gradient algorithm for coefficient update. The complexity of the adaptive filter (4) is reduced by pre-whitening, in the encoder, both the sum and difference signals using corresponding whitening filters (14a, 14b) derived from the sum channel.
17 Claims, 5 Drawing Sheets
[52] U.S. Cl. ..................................... 395/2.29; 395/2; 395/2.1; 395/2.67; 381/51
[58] Field of Search .............. 395/2, 2.24, 2.33, 2.12, 395/2.25-2.28; 381/1, 10, 17, 38, 47, 51, 31
[56] References Cited
U.S. PATENT DOCUMENTS
4,236,039 11/1980 Cooper 179/1
4,538,234 8/1985 Honda et al. 381/31
4,559,602 12/1985 Bates, Jr. 395/2
4,704,730 11/1987 Turner et al. 395/2.24
4,852,169 7/1989 Veeneman et al. 395/2
4,956,871 9/1990 Swaminathan 381/31
5,040,217 8/1991 Brandenburg et al. 381/47
5,042,069 8/1991 Chhatwal et al. 355/238
5,060,268 10/1991 Asakawa et al. 381/38
5,060,269 10/1991 Zinser 381/38
[Front-page drawing: block diagram of the encoder and decoder. Recoverable labels: 1st channel (e.g. xL+xR), 2nd channel (e.g. xL-xR), delay, adaptive pre-whitening filter, adaptive predictive filter, filter coefficients, encoder, local decoder, decoder.]
[Drawing sheets, U.S. Patent, Jul. 18, 1995, 5,434,948 (Sheets 1, 2 and 5 of 5 legible). Recoverable labels: filter calculation, MUX, residual, difference signal reconstruction data, controlled filter, XS, XD, adaptive delay calculator, filter coefficients, delay.]
POLYPHONIC CODING
This is a continuation of application Ser. No. 07/834,548, filed Feb. 12, 1992, now abandoned.
This invention relates to polyphonic coding techniques, particularly, but not exclusively, for coding speech signals.
It is well known that polyphonic, specifically stereophonic, sound is more perceptually appealing than monophonic sound. Where several sound sources, say within a conference room, are to be transmitted to a second room, polyphonic sound allows a spatial reconstruction of the original sound field with an image of each sound source being perceived at an identifiable point corresponding to its position in the original conference room. This can eliminate confusion and misunderstandings during audio-conference discussions since each participant may be identified both by the sound of his voice and by his perceived position within the conference room.
Inevitably, polyphonic transmissions require an increase in transmission capacity as compared with monophonic transmissions. The conventional approach of transmitting two independent channels, thus doubling the required transmission capacity, imposes an unacceptably high cost penalty in many applications and is not possible in some cases because of the need to use existing channels with fixed transmission capacities.
In stereophonic (i.e. two-channel polyphonic) systems, two microphones (hereinafter referred to as left and right microphones), at different positions, are used to pick up sound generated within a room (for example by a person or persons speaking). The signals picked up by the microphones are in general different. Each microphone signal (referred to hereinafter as xL(t) with Laplace transform XL(s) and xR(t) with Laplace transform XR(s) respectively) may be considered to be the superposition of source signals processed by respective acoustic transfer functions. These transfer functions are strongly affected by the distances between the sound sources and each microphone and also by the acoustic properties of the room. Taking the case of a single source, e.g. a single person speaking at some fixed point within the room, the distances between the source and the left and right microphones give rise to different delays, and there will also be different degrees of attenuation. In most practical environments such as conference rooms, the signal reaching each microphone may have travelled via many reflected paths (e.g. from walls or ceilings) as well as directly, producing time spreading, frequency dependent colouration due to resonances and antiresonances, and perhaps discrete echoes.
From the foregoing, in theory, the signal from one microphone may be formally related to that from the other by designating an interchannel transfer function H say; i.e. XL(s)=H(s) XR(s) where s is the complex frequency parameter. This statement is based on an assumption of linearity and time-invariance for the effect of room acoustics on a sound signal as it travels from its source to a microphone. However, in the absence of knowledge as to the nature of H, this statement does no more than postulate a correlation between the two signals. Such a postulation seems inherently sensible, however, at least in the special case of a single sound source, and therefore one way of reducing the bit-rate needed to represent stereo signals should be to reduce the redundancy of one relative to the other (to reduce this correlation) prior to transmission and re-introduce it after reception.
In general, H(s) is not unique and can be signal- and time-dependent. However, when the source signals are white and uncorrelated, i.e. when their autocorrelation functions are zero except at t=0 and their cross-correlation functions are zero for all t, H(s) will depend on factors not subject to rapid change, such as room acoustics and the positions of the microphones and sound sources, rather than the nature of the source signals which may be rapidly changing.
To realise such a system in physical form, the fundamental problems of causality and stability must be overcome. Consider for a moment a single source signal which is delayed by dL seconds before reaching the left microphone and by dR seconds before reaching the right microphone (although the point to be made has more general implications). If the source is near to, say, the left microphone, then dL will be smaller than dR. The interchannel transfer function H(s) must delay xL(t) by the difference between the two delays, dR-dL, to produce the right channel xR(t). Since dR-dL is positive, H(s) will be causal. If the signal source is now moved closer to the right microphone than to the left, dR-dL becomes negative and H(s) becomes non-causal; in other words, there is no causal relationship between the right channel and the left channel, but rather the reverse, so the right channel can no longer be predicted from the left channel, since a given event occurs first in the right channel. It will therefore be realised that a simple system in which one fixed channel is always transmitted and the other is reconstructed from it is impossible to realise in a direct sense.
According to a first aspect of the invention, there is provided a polyphonic signal coding apparatus comprising:
means for receiving at least two input channels from different sources;
means for producing a sum channel representing the sum of such signals, and for producing at least one difference channel representing a difference therebetween;
means for periodically generating a plurality of parametric coefficients which, if applied to a plural order predictor filter, would enable the prediction of the difference channel from the sum channel thus filtered; and
means for outputting data representing the said sum channel and data enabling the reconstruction of the said difference channel therefrom.
In a first embodiment, the difference signal reconstruction data are filter coefficients. In a second embodiment, the residual signal representing the difference between the difference signal and the sum signal when thus filtered is formed at the transmitter, and this is transmitted as the difference signal reconstruction data. In this embodiment, the prediction residual signal may be efficiently encoded to allow a backward adaptation technique to be used at the decoder for deriving the prediction filter coefficients. The residual is also used as an error signal which is added to the prediction filter's output at the decoder to correct for inaccuracies in the prediction of the difference channel from the sum channel. This "residual only" embodiment is also useful where the left channel, say, is predicted from the right channel (without forming sum and difference signals), provided suitable measures are taken to ensure causality, to give high quality polyphonic reproduction. In a third embodiment, both are transmitted.
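By way of illustration only, the relationship between the sum channel, the predicted difference channel and the residual in the embodiments just described may be sketched as follows. This is a minimal Python sketch under stated assumptions, not the coder of this specification: the function names, the fixed (non-adaptive) FIR coefficients and the factor of one half used when recovering the left and right channels are all hypothetical choices made for the example.

```python
def fir_predict(coeffs, x):
    """Causal FIR prediction: y[n] = sum over k of coeffs[k] * x[n - k]."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]


def encode(left, right, coeffs):
    """Form sum and difference, then the residual of the difference prediction."""
    s = [l + r for l, r in zip(left, right)]      # sum channel
    d = [l - r for l, r in zip(left, right)]      # difference channel
    pred = fir_predict(coeffs, s)                 # difference predicted from sum
    residual = [dn - pn for dn, pn in zip(d, pred)]
    return s, residual


def decode(s, residual, coeffs):
    """Rebuild the difference from the sum, correct it with the residual,
    then recover left and right (the 1/2 scaling is an assumption)."""
    pred = fir_predict(coeffs, s)
    d = [pn + rn for pn, rn in zip(pred, residual)]
    left = [(sn + dn) / 2 for sn, dn in zip(s, d)]
    right = [(sn - dn) / 2 for sn, dn in zip(s, d)]
    return left, right
```

With the residual available, the decoder output matches the input whatever coefficients are used, since the residual corrects the prediction error; when only coefficients are sent, the residual term would be absent and the quality would depend on how well the filter models the difference channel.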
Preferably, the means for generating the filter coefficients is an adaptive filter, advantageously a lattice filter. This type of filter also gives advantages in non-sum and difference polyphonic systems.
In preferred embodiments, variable delay means are disposed in at least one of the input signal paths, and controlled to time align the two signals prior to forming the sum and difference signals so that causal prediction filters of reasonable order can be used.
This aspect of the invention has several important advantages:
(i) The 'sum signal' is fully compatible with monophonic encoding and is unaffected by the polyphonic coding except for the introduction of an imperceptible delay. In the event of loss of stereo, monophonic back-up is thus available.
(ii) The sum signal may be transmitted by conventional low bit-rate coding techniques (e.g. LPC) without modification.
(iii) The encoding technique for the difference signals can be varied to suit the application and the available transmission capacity between the above three embodiments. The type of residual signal and prediction coefficients can also be selected in various different ways, while still conforming to the basic encoding principle.
(iv) Overall, the apparatus encodes polyphonic signals with only a modest increase in bit-rate requirement as compared with monophonic transmission.
(v) The encoding is digital and hence the performance of the apparatus will be predictable, not subject to ageing effects or component drift and easily mass-produced.
A method of calculating approximations to H(s) when the source signals are not white (which, of course, includes all speech or music signals) is proposed in a second aspect of the invention, using the idea of a 'pre-whitening filter'.
According to a second aspect of the invention, there is provided a polyphonic signal coding apparatus comprising:
means for receiving at least two input channels;
means for filtering each input channel in accordance with a filter approximating the spectral inverse of a first of said channels to produce respective filtered channels, the first said filtered channel thereby being substantially spectrally whitened;
means for receiving said filtered channels and for periodically generating parametric data for each filtered channel (other than said first), which would enable the prediction of each input channel from said first; and
means for outputting data representing the first channel, and data representing said parametric data.
This aspect of the invention provides, as above, the advantages of a digital system compatible with existing techniques and simplifies the process of modelling (at the encoder) the required interchannel transfer function.
Broadly corresponding decoding apparatus is also provided according to the invention, as are systems including such encoding and decoding apparatus, particularly in an audioconferencing application, but also in a polyphonic recording application. Other aspects of the invention are as claimed and disclosed herein.
The words "prediction" and "predictor" in this specification include not only prediction of future data from past data, but also estimation of present data of a channel from past and present data of another channel.
The invention will now be illustrated, by way of example only, with reference to the accompanying drawings in which:
FIG. 1 illustrates generally an encoder according to a first aspect of the invention;
FIG. 2 illustrates generally a corresponding decoder;
FIG. 3a illustrates an encoder according to a preferred embodiment of the invention;
FIG. 3b illustrates a corresponding decoder;
FIGS. 4a and 4b show respectively a corresponding encoder and decoder according to a second aspect of the invention;
FIGS. 5a and 5b illustrate an encoder and a decoder according to a second aspect of the invention;
FIG. 6 illustrates part of an encoder according to a yet further embodiment of the invention.
The embodiments illustrated are restricted to 2 channels (stereo) for ease of presentation, but the invention may be generalised to any number of channels. One possible way of removing the redundancy between two input signals (or predicting one from the other) would be to connect between the two channels an adaptive predictor filter whose slowly changing parameters are calculated by standard techniques (such as, for example, block cross-correlation analysis or sequential lattice adaptation). In an audioconferencing environment, the two signals will originate from sound sources within a room, and the acoustic transfer function between each source and each microphone will be characterised typically by weak poles (from room resonances) and strong zeros (due to absorption and destructive interference). An all-zero filter could therefore produce a reasonable approximation to the acoustic transfer function between a source and a microphone and such a filter could also be used to predict, say, the left microphone signal xL(t) from xR(t) when the source is close to the right microphone. However, if the source were now moved away from the right microphone and placed close to the left, the nature of the required filter would be effectively inverted even when delays are introduced to guarantee causality. The filter must now model a transfer function with weak zeros and strong poles, a difficult task for an all-zero filter. Other types of filter are not, in general, inherently stable. The net effect of this is to cause unequal degradation in the reconstructed channel when the source shifts from one microphone to the other. This further makes the simplistic prediction of one channel (say, the left) from the other (say, the right) hard to realise.
In a system according to the first aspect of the invention, better results have been obtained by forming a "sum signal" xS(t)=xL(t)+xR(t) and predicting either a difference signal xD(t)=xL(t)-xR(t) or simply xL(t) or xR(t) using an all-zero adaptive digital filter.
In practice, xR(t) and xL(t) (or xS(t) and xD(t)) will be processed in sampled data form as the digital signals xR[n] and xL[n] (or xS[n] and xD[n]) and it will be more convenient to use the 'z-transform' transfer function H(z) rather than H(s).
Referring to FIG. 1, in its essential form the invention comprises a pair of inputs 1a, 1b for receiving a pair of speech signals, e.g. from left and right microphones. The signals at the inputs, xR(t) and xL(t), may be in digital form. It may be convenient at this point to pre-process the signals, e.g. by band limiting. Each signal is then supplied to an adder 2 and a subtracter 3, the output of the adder being the sum signal xS(t)=xR(t)+xL(t), and the output of the subtracter 3
being the difference signal xD(t)=xL(t)-xR(t), i.e. XD(s)=H(s) XS(s). The sum and difference signals are then supplied to filter derivation stage 4, which derives the coefficients of a multi-stage prediction filter which, when driven with the sum signal, will approximate the difference signal. The difference between the approximated difference signal and the actual difference signal, the prediction residual signal, will usually also be produced (although this is not invariably necessary). The sum signal is then encoded (preferably using LPC or sub-band coding), for transmission or storage, along with further data enabling reconstruction of the difference signal. The filter coefficients may be sent, or alternatively (as discussed further below), the residual signal may be transmitted, the difference channel being reconstituted by deriving the filter parameters at the receiver using a backwards adaptive process known in the art; or both may be transmitted.
Although it would be possible to calculate filter parameters directly (using LPC analysis techniques), one simple and effective way of providing the derivation stage 4 is to use an adaptive filter (for example, an adaptive transversal filter) receiving as input the sum channel and modelling the difference channel so as to reduce the prediction residual. Such general techniques of filter adaptation are well known in the art.
Our initial experiments with this structure have used a transversal FIR filter with coefficient update by an algorithm for minimising the mean square value of the residual, which is simple to implement. The filter coefficients change only slowly because the room acoustic (and hence the interchannel transfer function) is relatively stable.
Referring to FIG. 2, in a corresponding receiver, the sum signal xS(t) is received together with either the filter parameters or the residual signal, or both, for the difference channel, and an adaptive filter 5 corresponding to that for which the parameters were derived at the coder receives as input the sum signal and produces as output the reconstructed difference signal when configured either with the received parameters or with parameters derived by backwards adaptation from the received residual signal. Sum and difference signals are then both fed to an adder 6 and a subtracter 7, which produce as outputs respectively the reconstructed left and right channels at output nodes 8a and 8b.
Since a high-quality sum signal is sent, the encoder is fully mono-compatible. In the event of loss of stereo information, monophonic back-up is thus available.
As discussed above, one component of the transfer functions HL and HR is a delay component relating to the direct distance between the signal source and each of the microphones, and there is a corresponding delay difference d. There is thus a strong cross-correlation between one channel and the other when delayed by d. This method, however, requires considerable processing power.
An alternative method of delay estimation found in papers on sonar research is to use an adaptive filter. The left channel input is delayed by half the filter length and the coefficients are updated using the LMS algorithm to minimise the mean-square error of the output. The transversal filter coefficients will, in theory, become the required cross-correlation coefficients. This may seem like unnecessary repetition of filter coefficient derivation were it not for the property of this delay estimator that the maximum value of the cross-correlation coefficient (at the position of the maximum filter coefficient) is obtained some time before the filter has converged. This method may be improved further because spatial information is also available from the relative amplitudes of the input channels; this could be used to apply a weighting function to the filter coefficients to speed convergence.
Referring to FIG. 3a, in a preferred embodiment of the invention, the complexity and length of the filter to be calculated is therefore reduced by calculating the required value of d in a delay calculator stage 9 (preferably employing one of the above methods), and then bringing the channels into time alignment by delaying one or other by d using, for example, a pair of variable delays 10a, 10b (although one fixed and one variable delay could be used) controlled by the delay calculator 9. With the major part of the speech information in the channels time aligned, the sum and difference signals are then formed.
Referring to FIG. 3b, the delay length d is preferably transmitted to the decoder, so that after reconstructing the difference channel and subsequently the left and right channels, corresponding variable length delay stages 11a, 11b in one or other of the channels can restore the interchannel delay.
In the illustrated structure, the "sum" signal is thus no longer quite the true sum of xL(t)+xR(t); because of the delay d it is xL(t)+xR(t-d). It may therefore be preferred to locate the delays 10a, 10b (and, possibly, the delay calculator) downstream of the adder and subtracter 2 and 3; this gives, for practical purposes, the same benefits of reducing the necessary filter length.
In practice, the delay is generally imperceptible; typically, up to 1.6 ms. Alternatively, a fixed delay, sufficiently long to guarantee causality, may be used, thus removing the need to encode the delay parameter.
In the first embodiment of the invention, as stated above, only the filter parameters are transmitted as difference signal data. With 16 bits per coefficient, this meant that a transmission capacity of 5120 bits/sec is needed for the difference channel (plus 8 bits for the delay parameter). This is well within the capacity of a standard 64 kbit/sec transmission system which allocates 48 kbits/sec to the sum channel (efficiently transmitted by an existing monophonic encoding technique) and offers 16 kbits/sec for other "overhead" data. This mode of the embodiment gives a good signal to noise ratio and the stereo image is present, although it is highly dependent on the accuracy of the algorithm used to adapt the predictive filter. Inaccuracies tend to cause the stereo image to wander during the course of a conference, particularly when the conversation is passed from one speaking person to another at some distance from the first.
Referring to FIG. 4a, in a second embodiment of the invention, only the residual signal is transmitted as difference signal data. The sum signal is encoded (12a) using, for example, sub-band coding. It is also locally decoded (13a) to provide a signal equivalent to that at the decoder, for input to adaptive filter 4. The residual difference channel is also encoded (possibly including bandlimiting) by residual coder 12b, and a corresponding local decoder 13b provides the signal minimised to adaptive filter 4. The advantage this creates is that inaccuracies in generating the parameters cause an increase in the dynamic range of the residual channel and a corresponding decrease in SNR, but with no loss in stereo image.
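The residual-only arrangement, in which the coder adapts its filter from the locally decoded residual so that a decoder can repeat the same adaptation, may be illustrated by the following minimal Python sketch. It rests on several assumptions that are not taken from this specification: a plain LMS coefficient update, a uniform quantiser standing in for residual coder 12b and local decoder 13b, a sum channel assumed identical at both ends, and hypothetical function names, filter order and step sizes.

```python
def tap_vector(s, n, order):
    """The most recent `order` sum samples, newest first, zero before the start."""
    return [s[n - k] if n - k >= 0 else 0.0 for k in range(order)]


def lms_step(w, x_vec, e, mu):
    """One LMS update: w <- w + mu * e * x_vec."""
    return [wk + mu * e * xk for wk, xk in zip(w, x_vec)]


def quantise(e, step):
    """Uniform quantiser (stand-in for the residual coder and its local decoder)."""
    return step * round(e / step)


def encode_residual(s, d, order=4, mu=0.05, step=0.01):
    """Coder side: predict d from s, send the quantised residual, adapt from it."""
    w = [0.0] * order
    sent = []
    for n in range(len(s)):
        x_vec = tap_vector(s, n, order)
        pred = sum(wk * xk for wk, xk in zip(w, x_vec))
        e_q = quantise(d[n] - pred, step)     # exactly what the decoder will see
        sent.append(e_q)
        w = lms_step(w, x_vec, e_q, mu)       # adapt from the decoded residual only
    return sent, w


def decode_residual(s, sent, order=4, mu=0.05):
    """Decoder side: same filter, same update, driven by the received residual."""
    w = [0.0] * order
    d_hat = []
    for n in range(len(s)):
        x_vec = tap_vector(s, n, order)
        pred = sum(wk * xk for wk, xk in zip(w, x_vec))
        d_hat.append(pred + sent[n])          # residual corrects the prediction
        w = lms_step(w, x_vec, sent[n], mu)
    return d_hat, w
```

Because both loops see the same sum samples and the same quantised residual, they arrive at the same coefficients without any coefficients being transmitted, and the reconstructed difference signal differs from the original only by the quantisation error of the residual.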
Referring to FIG. 4b, at the decoder, the analysis filter parameters are recovered from the transmitted residual by using a backwards-adapting replica filter 5 of the adaptive filter 4 at the coder. Decoders 13c, 13d are identical to local decoders 13a, 13b and so the filter 5 receives the same inputs, and thus produces the same parameters, as that of encoder filter 4.
In a further embodiment (not shown), both filter parameters and residual signal are transmitted as side information, overcoming many of the problems with the residual-only embodiment because the important stereo information in the first 2 kHz is preserved intact and the relative amplitude information at higher frequencies is largely retained by the filter parameters.
Both the above residual-only and hybrid (i.e. residual plus parameters) embodiments are preferably employed, as described, to predict the difference channel from the sum channel. However, it is found that the same advantages of retaining the stereo image (albeit with a decrease in SNR) are found when the input channels are left and right, rather than sum and difference, provided the problem of causality is overcome in some manner (e.g. by inserting a relatively long fixed delay in one or other path). The scope of the invention therefore encompasses this also.
The parameter-only embodiment described above preferably uses a single adaptive filter 4 to remove redundancy between the sum and difference channels. An effect discovered during testing was a curious 'whispering' effect if the coefficients were not sent at a certain rate, which was far above what should have been necessary to describe changes in the acoustic environment. This was because the adaptive filter, in addition to modelling the room acoustic transfer function, was also trying to perform an LPC analysis of the speech.
This is solved in the second aspect of the invention by whitening the spectra of the input signals to the adaptive filter as shown in FIG. 5, so as to reduce the rapidly changing speech component leaving principally the room acoustic component.
In the second aspect of the invention, the adaptive filter 4 which models the acoustic transfer functions may be the same as before (for example, a lattice filter of order 10). The sum channel is passed through a whitening filter 14a (which may be lattice or a simple transversal structure).
The master whitening filter 14a receives the sum channel and adapts to derive an approximate spectral inverse filter to the sum signal (or, at least, the speech components thereof) by minimising its own output. The output of the filter 14a is therefore substantially white. The parameters derived by the master filter 14a are supplied to the slave whitening filter 14b, which is connected to receive and filter the difference signal. The output of the slave whitening filter 14b is therefore the difference signal filtered by the inverse of the sum signal, which substantially removes common signal components, reducing the correlation between the two and leaving the output of 14b as consisting primarily of the acoustic response of the room. It thus reduces the dynamic range of the residual considerably.
The effect is to whiten the sum channel and to partially whiten the difference channel without affecting the spectral differences between them as a result of room acoustics, so that the derived coefficients of adaptive filter 4 are model parameters of the room acoustics.
In one embodiment, the coefficients only are transmitted and the decoder is simply that of FIG. 2 (needing no further filters). In this embodiment, of course, residual encoder 12b and decoder 13b are omitted.
An adaptive filter will generally not be long enough to filter out long-term information, such as pitch information in speech, so the sum channel will not be completely "white". However, if a long-term predictor (known in LPC coding) is additionally employed in filters 14a and 14b, then filter 4 could, in principle, be connected to filter the difference channel alone, and thus to model the inverse of the room acoustic.
Since this second aspect of the invention reduces the dynamic range of the residual, it is particularly advantageous to employ this whitening scheme with the residual-only transmission described above. In this case, prior to backwards adaptation at the decoder, it is necessary to filter the residual using the inverse of the whitening filter, or to filter the sum channel using the whitening filter. Either filter can be derived from the sum channel information which is transmitted.
Referring to FIG. 5b, in residual-only transmission, an adaptive whitening filter 24a (identical to 14a at the encoder) receives the (decoded) sum channel and adapts to whiten its output. A slave filter 24b (identical to 14b at the encoder) receives the coefficients of 24a. Using the whitened sum channel as its input, and adapting from the (decoded) residual by backwards adaptation, adaptive filter 5 regenerates a filtered signal which is added to the (decoded) residual and the sum is filtered by slave filter 24b to yield the difference channel. The sum and difference channels are then processed (6, 7 not shown) to yield the original left and right channels.
In a further embodiment (not shown), both residual and coefficients are transmitted.
Although this pre-whitening aspect of the invention has been described in relation to the preferred embodiment of the invention using sum and difference channels, it is also applicable where the two channels are 'left' and 'right' channels.
For a typical audioconferencing application, the residual will have a bandwidth of 8 kHz and must be quantised and transmitted using spare channel capacity of about 16 kbit/s. The whitened residual will be, in principle, small in mean square value, but will not be optimally whitened since the copy pre-whitening filter 14b through which the residual passes has coefficients derived to whiten the sum channel and not necessarily the difference channel. Typically, the dynamic range of the filtered signal is reduced by 12 dB over the unfiltered difference channel. One approach to this residual quantisation problem is to reduce the bandwidth of the residual signal. This allows downsampling to a lower rate, with a consequential increase in bits per sample. It is well known that most of the spatial information in a stereo signal is contained within the 0-2 kHz band, and therefore reducing the residual bandwidth from 8 kHz to a value in excess of 2 kHz does not affect the perceived stereo image appreciably. Results have shown that reducing the residual bandwidth to 4 kHz (and taking the upper 4 kHz band to be identical to that of the sum channel) produces good quality stereophonic speech when the reduced bandwidth residual is sub-band coded using a standard technique.
Experiments with various adaptive filters for the filter 4 (and, where applicable, 12) showed that a standard transversal FIR filter was slow to converge. A faster performance can be obtained by using a lattice
structure, with coefficient update using a gradient algorithm based on Burg's method, as shown in FIG. 6.
The structure uses a lattice filter 14a to pre-whiten the spectrum of the primary input. The decorrelated backwards residual outputs are then used as inputs to a simple linear combiner which attempts to model the input spectrum of the secondary input. Although the modelling process is the same as with the simple transversal FIR filter, the effect of the lattice filter is to point the error vector in the direction of the optimum LMS residual solution. This speeds convergence considerably. A lattice filter of order 20 is found effective in practice.
The lattice filter structure is particularly useful as described above, but could also be used in a system in which, instead of forming sum and difference signals, a (suitably delayed) left channel is predicted from the right channel.
filtered channels, the first said filtered channel thereby being substantially spectrally whitened;
the generating means being connected to receive the filtered channels.
4. Apparatus according to claim 3, wherein said filtering means comprises an adaptive, master, filter arranged to filter the first channel so as to produce a whitened output, and a slave filter arranged to filter said second channel, the slave filter being configured so as to have an equivalent response to the adaptive master filter of the filtering means.
5. Apparatus according to claim 1 further comprising:
input means for receiving input signals; and
means for producing the said channels therefrom, the first channel being a sum channel representing the sum of such input signals and the second or further channels representing the differences therebetween.
6. Apparatus according to claim 5 including variable delay means for delaying at least one of the input signals, and means for controlling a differential delay applied to the input signals so as to increase the correlation upstream of the generating means, the output means
being arranged to output also data representing the said
Although the embodiments described show a stereo
phonic system, it will be appreciated that with, for ex
ample, quadrophonic systems, the invention is imple
mented by forming a sum signal and 3 difference signals,
and predicting each from the sum signal as above.
Whilst the invention has been described as applied to
a low bit-rate transmission system, e.g. for telecon 25 differential delay.
7. Polyphonic signal coding apparatus comprising:
ferencing, it is also useful for example for digital storage
means for receiving data representing plural channels
of information signals;
of music on well known digital record carriers such as
Compact Discs, by providing a formatting means for
arranging the data in a format suitable for such record
30
carriers.
Conveniently, much or all of the signal processing
involved is realised in a single suitably programmed
correlations are well known.
We claim:
?lter being controlled in dependence on said sec
ond channel so that said adaptive ?lter produces a
1. Polyphonic signal coding apparatus for transmit
ting data representing plural correlated channels of
predicted second channel therefrom, and means for
producing a residual signal representing the differ
,
means for receiving data representing plural channels
of information signals;
generating means connected to the receiving means
and responsive to said plural channels for periodi
cally generating channel reconstruction data
which, when applied to a plural order predictor
and responsive to said plural channels for periodi
cally generating channel reconstruction data
which, when applied to a plural order predictor
?lter, enables the prediction of a second of said
plural channels from a ?rst of said plural channels
thus ?ltered; in which the generating means in
cludes a plural order adaptive ?lter connected to
receive the ?rst channel, said plural order adaptive
digital signal processing (dsp) chip package; two chan
nel packages are also commercially available. Software
to implement adaptive ?lters, LPC analysis and cross
audio signals, said apparatus comprising:
generating means connected to the receiving means
45
ence between the said predicted second channel
and the second channel,
means for outputting data representing the said ?rst
channel and channel reconstruction data including
data representing said residual signal.
8. Apparatus according to claim 7, in which the
?lter, enables the prediction of a second of said
adaptive ?lter is controlled only by the said residual
plural channels from a ?rst of said plural channels
signal and the said channel reconstruction data consists
thus ?ltered; and
50 of the said residual signal.
means connected to said generating means for output
ting data representing the said ?rst channel data
and said channel reconstruction data thereby en
abling the reconstruction of said second channel
data therefrom.
55
2. Apparatus according to claim 1, wherein the gener
ating means includes means for generating a plurality of
?lter coef?cients which, when applied to a plural order
predictor ?lter, enables the prediction of a second of
said plural channels from a ?rst of said plural channels 60
thus ?ltered;
and in which the said channel reconstruction data
comprises data representing the said ?lter coef?ci
ents.
3. Apparatus according to claim 1 further comprising: 65
9. Polyphonic signal decoding apparatus comprising:
means for receiving data representing a sum signal
and difference signal reconstruction data, said sum
signal representing the sum of at least ?rst and
second channel signals and said difference signal
represents the difference between said at least ?rst
and second channel signals;
a con?gurable plural order predictor ?lter connected
to said receiving means for receiving said differ
ence signal reconstruction data and modifying its
coef?cients in accordance therewith, the ?lter
being connected to receive the said sum signal and
reconstruct therefrom an output difference signal;
and
means connected to said con?gurable plural order
means for ?ltering the ?rst and second channel in
accordance with a ?lter approximating the spectral
predictor ?lter for adding the reconstructed differ
inverse of the ?rst channel to produce respective
subtracting the reconstructed difference signal
ence signal to the received sum signal, and for
from the received sum signal, so as to produce at least two output signals representing said at least first and second channel signals respectively.

10. Apparatus as claimed in claim 9, in which the difference signal reconstruction data comprises residual signal data and the apparatus includes means for adding the residual signal data to the output of the filter to form the reconstructed difference signal.

11. Apparatus as claimed in claim 10 in which the configurable plural order predictor filter is connected to receive the residual signal data and to modify its coefficients in accordance therewith.

12. A method of coding polyphonic input signals comprising:
producing a sum signal representing the sum of said input signals;
producing at least one difference signal representing a difference between said input signals;
analyzing said sum and difference signals and generating therefrom a plurality of coefficients to a multi-stage predictor filter, thereby enabling the prediction of the difference signal(s) from the sum signal thus filtered;
outputting data representing the said sum signal and data enabling the reconstruction of the said difference signal(s) therefrom.

13. Polyphonic audio signal coding apparatus for transmitting digital data representing plural correlated channels of audio signals, said apparatus comprising:
data generating means responsive to said plural channels of audio signals for periodically generating a plurality of filter coefficients which, when applied to a plural order predictor filter, enables the prediction of a second of said channels from a first of said channels thus filtered; and
output means connected to the data generating means for outputting data representing the said first channel of audio signals and data representing said filter coefficients thus enabling the reconstruction of the said second channel of audio signals therefrom.

14. Apparatus according to claim 13 in which the generating means includes an adaptive plural order filter connected to receive the first channel of audio signals, said adaptive filter being controlled in dependence on said second channel so that said adaptive filter produces a predicted second channel of audio signals therefrom; and including means for producing a residual signal which represents the difference between the said predicted second channel of audio signals and the said second channel of audio signals, and in which the output means is arranged also to output data representing the residual signal.

15. Polyphonic audio signal coding method for transmitting digital data representing plural correlated channels of audio signals, said method comprising:
responsive to said plural channels of audio signals, periodically generating a plurality of filter coefficients which, when applied to a plural order predictor filter, enables the prediction of a second of said channels from a first of said channels thus filtered; and
outputting data representing said first channel of audio signals and data representing said filter coefficients thus enabling the reconstruction of the said second channel of audio signals therefrom.

16. Polyphonic audio signal according to claim 15, in which the generating step includes adaptively filtering the first channel of audio signals and producing a predicted second channel of audio signals therefrom; and including the step of producing a residual signal which represents the difference between the said predicted second channel of audio signals and the said second channel of audio signals, and in which the data representing the said residual signal is also output.

17. Polyphonic signal coding method for transmitting data representing plural correlated channels of audio signals, said method comprising:
responsive to said plural channels of audio signals, adaptively filtering a first channel of said plural channels, said adaptive filtering being controlled in dependence on a second of said plural channels, to produce a predicted second channel;
producing a residual signal representing the difference between the said predicted second channel and the said second channel which, when applied to a plural order predictor filter, enables the prediction of the second of said plural channels from the first of said plural channels thus filtered; and
outputting data representing the said first channel and data representing the said residual signal.
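
By way of informal illustration only, the residual-driven arrangement recited in claims 7 to 11 and 17 might be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the class and function names, the tap count, the step size, the normalised-LMS update and the division by two at the decoder are all assumptions chosen for the example, and no quantisation of the sum or residual signals is modelled.

```python
import random

class BackwardPredictor:
    """Hypothetical FIR predictor of the difference signal from the sum signal.

    It is adapted only from the residual (normalised LMS, an assumed update
    rule), so an identical copy at the decoder stays in step with the encoder.
    """
    def __init__(self, taps=12, mu=0.5):
        self.w = [0.0] * taps      # predictor coefficients
        self.hist = [0.0] * taps   # most recent sum-signal samples
        self.mu = mu

    def predict(self, s):
        self.hist = [s] + self.hist[:-1]
        return sum(w * h for w, h in zip(self.w, self.hist))

    def adapt(self, residual):
        norm = sum(h * h for h in self.hist) + 1e-9
        g = self.mu * residual / norm
        self.w = [w + g * h for w, h in zip(self.w, self.hist)]

def encode(left, right):
    """Form sum and difference, emit (sum, residual) pairs."""
    pred, out = BackwardPredictor(), []
    for l, r in zip(left, right):
        s, d = l + r, l - r
        res = d - pred.predict(s)   # residual = difference minus prediction
        pred.adapt(res)
        out.append((s, res))
    return out

def decode(stream):
    """Rebuild the difference from the sum plus residual, then left and right."""
    pred, left, right = BackwardPredictor(), [], []
    for s, res in stream:
        d = pred.predict(s) + res
        pred.adapt(res)
        left.append((s + d) / 2)    # halving is an assumed scaling convention
        right.append((s - d) / 2)
    return left, right

# Toy input: the right channel is a delayed, attenuated copy of the left.
random.seed(1)
left = [random.gauss(0.0, 1.0) for _ in range(2000)]
right = [0.6 * left[n - 2] if n >= 2 else 0.0 for n in range(len(left))]
stream = encode(left, right)
dec_left, dec_right = decode(stream)
max_err = max(abs(a - b) for a, b in zip(left + right, dec_left + dec_right))
tail = 500
res_energy = sum(res * res for _, res in stream[-tail:])
diff_energy = sum((l - r) ** 2 for l, r in zip(left[-tail:], right[-tail:]))
```

Because both predictors see the same sum samples and the same residual, their coefficients evolve identically, which is why no coefficients need to be sent in this variant. In a real system the residual would be quantised, and the encoder would need to adapt from the quantised value for the two ends to remain aligned.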
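
Likewise for illustration only, the lattice structure described above with reference to FIG. 6 (a pre-whitening lattice whose backward residuals feed a simple linear combiner) might be sketched as below. The function name, the forgetting factor, the step size, the power normalisation and the particular recursive form of the Burg-style coefficient estimate are assumptions made for this sketch; the description states only that a gradient algorithm based on Burg's method is used and that order 20 is found effective, whereas a small order is used here to keep the toy example short.

```python
import random

def lattice_joint_process(primary, secondary, order=4, lam=0.999, mu=0.05):
    """Predict `secondary` from `primary`; return the prediction errors."""
    eps = 1e-9
    k = [0.0] * order             # reflection coefficients of the lattice
    num = [0.0] * order           # running numerators of the Burg-style estimate
    den = [eps] * order           # running denominators
    c = [0.0] * (order + 1)       # linear-combiner weights, one per residual
    power = [eps] * (order + 1)   # smoothed power of each backward residual
    b_prev = [0.0] * (order + 1)  # backward residuals of the previous sample
    errors = []
    for x, d in zip(primary, secondary):
        f = x
        b = [x] + [0.0] * order
        for m in range(order):
            f_in, b_del = f, b_prev[m]
            # Lattice stage, using the coefficient from the previous sample.
            f = f_in - k[m] * b_del
            b[m + 1] = b_del - k[m] * f_in
            # Recursive Burg-style estimate: k = 2*E[f*b] / E[f^2 + b^2].
            num[m] = lam * num[m] + 2.0 * f_in * b_del
            den[m] = lam * den[m] + f_in * f_in + b_del * b_del
            k[m] = num[m] / den[m]
        # Linear combiner on the (approximately decorrelated) backward residuals.
        e = d - sum(cm * bm for cm, bm in zip(c, b))
        for m in range(order + 1):
            power[m] = 0.99 * power[m] + 0.01 * b[m] * b[m]
            c[m] += mu * e * b[m] / (power[m] + b[m] * b[m] + eps)
        b_prev = b
        errors.append(e)
    return errors

# Toy data: a coloured primary input and a secondary input that is a short
# FIR-filtered version of it (standing in for sum and difference signals).
random.seed(0)
x, state = [], 0.0
for _ in range(5000):
    state = 0.9 * state + random.gauss(0.0, 1.0)
    x.append(state)
d = [0.5 * x[n] - 0.3 * x[n - 1] + 0.2 * x[n - 2] if n >= 2 else 0.0
     for n in range(len(x))]
err = lattice_joint_process(x, d)
ratio = sum(e * e for e in err[-1000:]) / sum(v * v for v in d[-1000:])
```

Each combiner weight here is normalised by the power of its own backward residual, which is one way of exploiting the decorrelation that the lattice provides; a transversal LMS filter driven directly by the coloured input has no such per-mode normalisation.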