US008553895B2

(12) United States Patent
     Plogsties et al.

(10) Patent No.:      US 8,553,895 B2
(45) Date of Patent:  Oct. 8, 2013

(54) DEVICE AND METHOD FOR GENERATING AN ENCODED STEREO SIGNAL OF AN AUDIO PIECE OR AUDIO DATASTREAM

(75) Inventors: Jan Plogsties, Erlangen (DE); Harald Mundt, Erlangen (DE); Harald Popp, Tuchenbach (DE)

(73) Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V., Munich (DE)

(*)  Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1777 days.

(21) Appl. No.: 11/840,273

(22) Filed: Aug. 17, 2007

(65) Prior Publication Data
     US 2007/0297616 A1    Dec. 27, 2007

     Related U.S. Application Data
(63) Continuation of application No. PCT/EP2006/001622, filed on Feb. 22, 2006.

(30) Foreign Application Priority Data
     Mar. 4, 2005    (DE)    10 2005 010 057

(51) Int. Cl.
     H04R 5/00    (2006.01)

(52) U.S. Cl.
     USPC 381/23; 381/17; 381/310; 381/300; 381/306; 381/309; 381/312; 700/94; 704/500; 704/501

(58) Field of Classification Search
     USPC 381/310, 300, 306, 23, 17, 309, 312, 381/313, 317, 323; 700/94; 704/500, 501
     See application file for complete search history.

(56) References Cited

     U.S. PATENT DOCUMENTS
     5,488,665 A      1/1996   Johnston et al.
     5,491,754 A      2/1996   Jot et al.
     5,632,005 A      5/1997   Davis et al.
     5,659,619 A *    8/1997   Abel                 381/17
     (Continued)

     FOREIGN PATENT DOCUMENTS
     CN   1212580 A   3/1999
     CN   1469684 A   1/2004
     (Continued)

     OTHER PUBLICATIONS
     Takamizawa et al., High-Quality and Processor-Efficient Implementation of an MPEG-2 AAC Encoder, 2001, IEEE, pp. 985-988.*
     (Continued)

Primary Examiner: Mohammad Islam
Assistant Examiner: Kuassi Ganmavo
(74) Attorney, Agent, or Firm: Keating & Bennett, LLP

(57) ABSTRACT

A device for generating an encoded stereo signal from a multi-channel representation includes a multi-channel decoder generating three or more multi-channels from at least one basic channel and parametric information. The three or more multi-channels are subjected to headphone signal processing to generate an uncoded first stereo channel and an uncoded second stereo channel which are then supplied to a stereo encoder to generate an encoded stereo file on the output side. The encoded stereo file may be supplied to any suitable player in the form of a CD player or a hardware player such that a user of the player does not only get a normal stereo impression but a multi-channel impression.

9 Claims, 7 Drawing Sheets

[FIG. 1 (front-page drawing): multi-channel representation (basic channel + parameter) -> multi-channel decoder 11 -> uncoded multi-channels -> headphone signal processing 12 -> uncoded 1st stereo channel 10a and uncoded 2nd stereo channel 10b -> stereo encoder 13 -> encoded stereo data 14 (e.g. MP3 file, AAC file, ...)]

(56) References Cited (continued from the front page)

U.S. PATENT DOCUMENTS

5,703,999 A         12/1997   Herre et al.
5,706,309 A          1/1998   Eberlein et al.
5,742,689 A *        4/1998   Tucker et al.        381/17
5,812,971 A *        9/1998   Herre                704/230
5,982,903 A *       11/1999   Kinoshita et al.     381/18
6,023,490 A          2/2000   Ten Kate
6,741,706 B1 *       5/2004   McGrath et al.       381/22
6,766,028 B1 *       7/2004   Dickens              381/310
7,394,903 B2 *       7/2008   Herre et al.         381/23
7,447,629 B2 *      11/2008   Breebaart            704/219
7,949,141 B2 *       5/2011   Reilly et al.        381/63
2002/0038158 A1      3/2002   Hashimoto et al.
2003/0026441 A1      2/2003   Faller
2003/0035553 A1      2/2003   Baumgarte et al.
2003/0219130 A1     11/2003   Baumgarte et al.
2004/0008847 A1      1/2004   Kim
2005/0273324 A1 *   12/2005                        704/226
2005/0276430 A1 *   12/2005                        381/309
2008/0052089 A1 *    2/2008   Takagi               704/503

FOREIGN PATENT DOCUMENTS

EP   1768 451           A1      3/2007
JP   06-043890          A       2/1994
JP   06-269097          A       9/1994
JP   09-500252          A       1/1997
JP   2001-100792        A       4/2001
JP   2001-255892        A       9/2001
JP   2001-331198        A      11/2001
JP   2002-191099        A       7/2002
JP   04-240896          A       8/2002
JP   2002-262385        A       9/2002
JP   2003-009296        A       1/2003
JP   2003-522441        A       7/2003
JP   2004170610         A  *    6/2004
JP   2004-246224        A       9/2004
KR   1020040027015         *    1/2004
KR   10-2004-0027015       *    4/2004
WO   94/01933           A1      1/1994
WO   95/16333           A1      6/1995
WO   99/14983           A1      3/1999
WO   99/49574           A1      9/1999
WO   01/05074           A2      1/2001
WO   03/086017          A2     10/2003
WO   03/090207          A1     10/2003

OTHER PUBLICATIONS

Durand R. Begault, Perceptual Effects of Synthetic Reverberation on Three-Dimensional Audio Systems, Nov. 1992, J. Audio Eng. Soc., vol. 40, No. 11, pp. 895-903.*

U.S. Appl. No. 60/578,717, filed Jun. 2004, Yi Kyueun.*

English language translation of Official Communication issued in corresponding Japanese Patent Application No. 2007-557373, mailed on Jul. 13, 2010.

English translation of the official communication issued in counterpart International Application No. PCT/EP2006/001622, mailed on Jan. 31, 2008.

Official communication issued in counterpart European Application No. 06 707 184.5, mailed on Nov. 3, 2008.

Official Communication issued in corresponding Taiwanese Patent Application No. 95106978, mailed on Sep. 23, 2009.

English translation of the official communication issued in counterpart Taiwanese Application No. 95106978, mailed on Apr. 27, 2009.

Official communication issued in the counterpart International Application No. PCT/EP2006/001622, mailed on Aug. 18, 2006.

Herre et al., "MP3 Surround: Efficient and Compatible Coding of Multi-Channel Audio"; Audio Engineering Society Convention Paper 6049, 116th Convention; Berlin, Germany; pp. 1-14; May 8-11, 2004.

Faller et al., "Binaural Cue Coding Applied to Stereo and Multi-Channel Audio Compression"; Audio Engineering Society Convention Paper 5574, 112th Convention; Munich, Germany; pp. 1-9; May 10-13, 2002.

Herre et al., "Intensity Stereo Coding"; Preprints of Papers Presented at the AES Convention, Amsterdam, pp. 1-10; Feb. 26-Mar. 1, 1994.

Herre et al., "Spatial Audio Coding: Next-generation Efficient and Compatible Coding of Multi-Channel Audio"; Audio Engineering Society Convention Paper 6186, 117th Convention; San Francisco, CA; pp. 1-13; Oct. 28-31, 2004.

Faller, "Coding of Spatial Audio Compatible With Different Playback Formats"; Audio Engineering Society Convention Paper, 117th Convention; San Francisco, CA; pp. 1-12; Oct. 28-31, 2004.

Faller et al., "Binaural Cue Coding: Part II: Schemes and Applications", 2003 IEEE Transactions on Speech and Audio Processing; vol. 11, No. 6; pp. 520-531; Nov. 2003.

Official Communication issued in corresponding Japanese Patent Application No. 2007-557373, mailed on May 10, 2011.

* cited by examiner

[Sheet 2 of 7. FIG. 2: multi-channel inputs 20]

[Sheet 3 of 7. FIG. 3: joint stereo device 60 (IS or BCC device); inputs CH1 ... CHN; outputs: carrier channel, parametric multi-channel information]

[Sheet 4 of 7. FIG. 5: BCC encoder (downmix, sum signal, side info) and BCC decoder (BCC synthesis, side info processing)]

[Sheet 5 of 7. FIG. 7: multi-channel representation -> multi-channel decoder 11 with filterbank/FFT -> multi-channel representation in frequency domain -> 12: multiplication of frequency domain representation by spectral representation of filter impulse response -> uncoded stereo signal in frequency domain. FIG. 8: headphone signal processing 12 (input in time or frequency domain) -> uncoded stereo signal in frequency domain -> stereo encoder 13 (without filterbank/FFT) -> file with encoded stereo signal. FIG. 9: joint stereo module, quantizer 16 with masking threshold, entropy encoder (e.g. Huffman)]

[Sheet 6 of 7. FIG. 10]

[Sheet 7 of 7. FIG. 11: impulse response over time; regions labeled direct sound, early reflections, diffuse reverberation]

DEVICE AND METHOD FOR GENERATING AN ENCODED STEREO SIGNAL OF AN AUDIO PIECE OR AUDIO DATASTREAM

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of copending International Application No. PCT/EP2006/001622, filed Feb. 22, 2006, which designated the United States and was not published in English.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to multi-channel audio technology and, in particular, to multi-channel audio applications in connection with headphone technologies.

2. Description of the Related Art

The international patent applications WO 99/49574 and WO 99/14983 disclose audio signal processing technologies for driving a pair of oppositely arranged headphone loudspeakers in order for a user to get a spatial perception of the audio scene via the two headphones, which is not only a stereo representation but a multi-channel representation. Thus, the listener will get, via his or her headphones, a spatial perception of an audio piece which in the best case equals his or her spatial perception, should the user be sitting in a reproduction room which is exemplarily equipped with a 5.1 audio system. For this purpose, for each headphone loudspeaker, each channel of the multi-channel audio piece or the multi-channel audio datastream, as is illustrated in FIG. 2, is supplied to a separate filter, whereupon the respective filtered channels belonging together are added, as will be illustrated subsequently.

On a left side in FIG. 2, there are the multi-channel inputs 20 which together represent a multi-channel representation of the audio piece or the audio datastream. Such a scenario is exemplarily schematically shown in FIG. 10. FIG. 10 shows a reproduction space 200 in which a so-called 5.1 audio system is arranged. The 5.1 audio system includes a center loudspeaker 201, a front-left loudspeaker 202, a front-right loudspeaker 203, a back-left loudspeaker 204 and a back-right loudspeaker 205. A 5.1 audio system comprises an additional subwoofer 206 which is also referred to as low-frequency enhancement channel. In the so-called "sweet spot" of the reproduction space 200, there is a listener 207 wearing a headphone 208 comprising a left headphone loudspeaker 209 and a right headphone loudspeaker 210.

The processing means shown in FIG. 2 is formed to filter each channel 1, 2, 3 of the multi-channel inputs 20 by a filter HiL describing the sound channel from the loudspeaker to the left loudspeaker 209 in FIG. 10 and to additionally filter the same channel by a filter HiR representing the sound from one of the five loudspeakers to the right ear or the right loudspeaker 210 of the headphone 208.

If, for example, channel 1 in FIG. 2 were the front-left channel emitted by the loudspeaker 202 in FIG. 10, the filter HiL would represent the channel indicated by a broken line 212, whereas the filter HiR would represent the channel indicated by a broken line 213. As is exemplarily indicated in FIG. 10 by a broken line 214, the left headphone loudspeaker 209 does not only receive the direct sound, but also early reflections at an edge of the reproduction space and, of course, also late reflections expressed in a diffuse reverberation.

Such a filter representation is illustrated in FIG. 11. In particular, FIG. 11 shows a schematic example of an impulse response of a filter, such as, for example, of the filter HiL of FIG. 2. The direct or primary sound illustrated in FIG. 11 by the line 212 is represented by a peak at the beginning of the filter, whereas early reflections, as are illustrated exemplarily in FIG. 10 by 214, are reproduced by a center region having several (discrete) small peaks in FIG. 11. The diffuse reverberation is typically no longer resolved for individual peaks, since the sound of the loudspeaker 202 in principle is reflected arbitrarily frequently, wherein the energy of course decreases with each reflection and additional propagation distance, as is illustrated by the decreasing energy in the back portion which in FIG. 11 is referred to as "diffuse reverberation".

Each filter shown in FIG. 2 thus includes a filter impulse response roughly having a profile as is shown by the schematic impulse response illustration of FIG. 11. It is obvious that the individual filter impulse response will depend on the reproduction space, the positioning of the loudspeakers, possible attenuation features in the reproduction space, for example due to several persons present or due to furniture in the reproduction space, and ideally also on the characteristics of the individual loudspeakers 201 to 206.

The fact that the signals of all loudspeakers are superposed at the ear of the listener 207 is illustrated by the adders 22 and 23 in FIG. 2. Thus, each channel is filtered by a corresponding filter for the left ear to then simply add up the signals output by the filters which are destined for the left ear to obtain the headphone output signal for the left ear L. In analogy, an addition by the adder 23 for the right ear or the right headphone loudspeaker 210 in FIG. 10 is performed to obtain the headphone output signal for the right ear by superposing all the loudspeaker signals filtered by a corresponding filter for the right ear.

Due to the fact that, apart from the direct sound, there are also early reflections and, in particular, a diffuse reverberation, which is of particularly high importance for the space perception, in order for the tone not to sound synthetic or "awkward" but to give the listener the impression that he or she is actually sitting in a concert room with its acoustic characteristics, impulse responses of the individual filters 21 will all be of considerable lengths. The convolution of each individual multi-channel of the multi-channel representation having two filters already results in a considerable computing task. Since two filters are necessary for each individual multi-channel, namely one for the left ear and another one for the right ear, when the subwoofer channel is also treated separately, a total amount of 12 completely different filters is necessary for a headphone reproduction of a 5.1 multi-channel representation. All filters have, as becomes obvious from FIG. 11, a very long impulse response to be able to not only consider the direct sound but also early reflections and the diffuse reverberation, which really only gives an audio piece the proper sound reproduction and a good spatial impression.

In order to put the well-known concept into practice, apart from a multi-channel player 220, as is shown in FIG. 10, very complicated virtual sound processing 222 is necessary, which provides the signals for the two loudspeakers 209 and 210 represented by lines 224 and 226 in FIG. 10.

Headphone systems for generating a multi-channel headphone sound are complicated, bulky and expensive, which is due to the high computing power, the high current requirement for the high computing power necessary and the high working memory requirements for the evaluations to be performed of the impulse response and the high volume or expensive elements for the player connected thereto. Applications of this kind are thus tied to home PC sound cards or laptop sound cards or home stereo systems.
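
The filter-and-sum structure of FIG. 2 described in this section (one filter per channel and ear, followed by one adder per ear) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the implementation of the cited applications: the function names `convolve`, `add_into` and `headphone_downmix`, the three toy channels and the two-tap impulse responses are all hypothetical, and real impulse responses are far longer, as explained above.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of a channel with a filter impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def add_into(acc, contribution):
    """Add `contribution` sample by sample into `acc`, growing `acc` if needed."""
    if len(contribution) > len(acc):
        acc.extend([0.0] * (len(contribution) - len(acc)))
    for i, value in enumerate(contribution):
        acc[i] += value


def headphone_downmix(channels, filters_left, filters_right):
    """Filter every channel once per ear, then sum per ear (the two adders)."""
    left, right = [], []
    for x, h_left, h_right in zip(channels, filters_left, filters_right):
        add_into(left, convolve(x, h_left))    # contribution to the left ear
        add_into(right, convolve(x, h_right))  # contribution to the right ear
    return left, right


# Hypothetical toy data: three channels, each a unit impulse at a different time,
# and made-up two-tap impulse responses (assumptions for illustration only).
channels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
filters_left = [[1.0, 0.5], [0.5, 0.25], [0.25, 0.125]]
filters_right = [[0.25, 0.125], [0.5, 0.25], [1.0, 0.5]]

left, right = headphone_downmix(channels, filters_left, filters_right)
print(left)
print(right)
```

For a 5.1 representation, `channels` would hold six signals and each filter list six impulse responses, which gives the 12 filters mentioned above. The nested loops in `convolve` also show why the cost grows with the impulse response length.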

In particular, the multi-channel headphone sound remains inaccessible for the continually increasing market of mobile players, such as, for example, mobile CD players, or, in particular, hardware players, since the calculating requirements for filtering the multi-channels with exemplarily 12 different filters cannot be realized in this price segment neither with regard to the processor resources nor with regard to the current requirements of typically battery-driven apparatuses. This refers to a price segment at the bottom (lower) end of the scale. However, this very price segment is economically very interesting due to the high numbers of pieces.

SUMMARY OF THE INVENTION

According to an embodiment, a device for generating an encoded stereo signal of an audio piece or an audio datastream having a first stereo channel and a second stereo channel from a multi-channel representation of the audio piece or the audio datastream having information on more than two multi-channels, may have: means for providing the more than two multi-channels from the multi-channel representation; means for performing headphone signal processing to generate an uncoded stereo signal with an uncoded first stereo channel and an uncoded second stereo channel, the means for performing being formed to evaluate each multi-channel by a first filter function derived from a virtual position of a loudspeaker for reproducing the multi-channel and a virtual first ear position of a listener, for the first stereo channel, and a second filter function derived from a virtual position of the loudspeaker and a virtual second ear position of the listener, for the second stereo channel, to generate a first evaluated channel and a second evaluated channel for each multi-channel, the two virtual ear positions of the listener being different, to add the evaluated first channels to obtain the uncoded first stereo channel, and to add the evaluated second channels to obtain the uncoded second stereo channel; and a stereo encoder for encoding the uncoded first stereo channel and the uncoded second stereo channel to obtain the encoded stereo signal, the stereo encoder being formed such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal.

According to another embodiment, a method for generating an encoded stereo signal of an audio piece or an audio datastream having a first stereo channel and a second stereo channel from a multi-channel representation of the audio piece or the audio datastream having information on more than two multi-channels, may have the steps of: providing the more than two multi-channels from the multi-channel representation; performing headphone signal processing to generate an uncoded stereo signal with an uncoded first stereo channel and an uncoded second stereo channel, the step of performing having: evaluating each multi-channel by a first filter function derived from a virtual position of a loudspeaker for reproducing the multi-channel and a virtual first ear position of a listener, for the first stereo channel, and a second filter function derived from a virtual position of the loudspeaker and a virtual second ear position of the listener, for the second stereo channel, to generate a first evaluated channel and a second evaluated channel for each multi-channel, the two virtual ear positions of the listener being different, adding the evaluated first channels to obtain the uncoded first stereo channel, and adding the evaluated second channels to obtain the uncoded second stereo channel; and stereo-coding the uncoded first stereo channel and the uncoded second stereo channel to obtain the encoded stereo signal, the step of stereo-coding being executed such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal.

An embodiment may have a computer program having a program code for performing the method for generating an encoded stereo signal mentioned above, when the computer program runs on a computer.

Embodiments of the present invention are based on the finding that the high-quality and attractive multi-channel headphone sound can be made available to all players available, such as, for example, CD players or hardware players, by subjecting a multi-channel representation of an audio piece or audio datastream, i.e. exemplarily a 5.1 representation of an audio piece, to headphone signal processing outside a hardware player, i.e. exemplarily in a computer of a provider having a high calculating power. According to an embodiment of the invention, the result of a headphone signal processing is, however, not simply played but supplied to a typical audio stereo encoder which then generates an encoded stereo signal from the left headphone channel and the right headphone channel.

This encoded stereo signal may then, like any other encoded stereo signal not comprising a multi-channel representation, be supplied to the hardware player or, for example, a mobile CD player in the form of a CD. The reproduction or replay apparatus will then provide the user with a headphone multi-channel sound without any additional resources or means having to be added to devices already existing. Inventively, the result of the headphone signal processing, i.e. the left and the right headphone signal, is not reproduced in a headphone, as has been the case so far, but encoded and output as encoded stereo data.

Such an output may be storage, transmission or the like. Such a file having encoded stereo data may then easily be supplied to any reproduction device designed for stereo reproduction, without the user having to perform any changes on his device.

The inventive concept of generating an encoded stereo signal from the result of the headphone signal processing thus allows multi-channel representation providing a considerably improved and more real quality for the user, to be also employed on all simple and widespread and, in future, even more widespread hardware players.

In an embodiment of the present invention, the starting point is an encoded multi-channel representation, i.e. a parametric representation comprising one or typically two basic channels and additionally comprising parametric data to generate the multi-channels of the multi-channel representation on the basis of the basic channels and the parametric data. Since a frequency domain-based method for multi-channel decoding is of advantage, the headphone signal processing is, according to an embodiment of the invention, not performed in the time domain by convoluting the time signal by an impulse response, but in the frequency domain by multiplication by the filter transmission function.

This allows at least one retransformation before the headphone signal processing to be saved and is of particular advantage when the subsequent stereo encoder also operates in the frequency domain, such that the stereo encoding of the headphone stereo signal, without ever having to go to the time domain, may also take place without going to the time domain. The processing from the multi-channel representation to the encoded stereo signal, without the time domain taking part or by an at least reduced number of transformations, is interesting not only with regard to the calculating time efficiency, but puts a limit to quality losses since fewer processing stages will introduce fewer artefacts into the audio signal.
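
The equivalence this summary relies on, namely that convolving a time signal with an impulse response gives the same result as multiplying the two spectra, can be checked numerically. The sketch below is an illustration under assumptions, not the patented processing chain: it uses a naive DFT written with the standard library, the helper names (`dft`, `idft`, `filter_in_frequency_domain`) and the sample values are hypothetical, and the inverse transform is included only so the two results can be compared. In the chain described above the spectral values would instead be handed on without returning to the time domain.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), sufficient for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def convolve(signal, impulse_response):
    """Time-domain reference: direct linear convolution."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def filter_in_frequency_domain(signal, impulse_response):
    """Multiply the spectrum of the signal by the filter transfer function."""
    # Zero-pad both to the full output length so the circular convolution
    # implied by the DFT equals the linear convolution.
    n = len(signal) + len(impulse_response) - 1
    x_spec = dft(list(signal) + [0.0] * (n - len(signal)))
    h_spec = dft(list(impulse_response) + [0.0] * (n - len(impulse_response)))
    y_spec = [xk * hk for xk, hk in zip(x_spec, h_spec)]
    return [value.real for value in idft(y_spec)]


# Hypothetical sample values, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0]
impulse_response = [1.0, 0.5, 0.25]

time_result = convolve(signal, impulse_response)
freq_result = filter_in_frequency_domain(signal, impulse_response)
max_error = max(abs(a - b) for a, b in zip(time_result, freq_result))
print(time_result)
print(max_error < 1e-9)
```

The zero-padding step is the design point worth noting: without it, the product of the spectra corresponds to a circular convolution, which differs from the linear convolution a filter performs.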

In particular in block-based methods performing quantization considering a psycho-acoustic masking threshold, as is of advantage for the stereo encoder, it is important to prevent as many tandem encoding artefacts as possible.

In an embodiment of the present invention, a BCC representation having one or advantageously two basic channels is used as a multi-channel representation. Since the BCC method operates in the frequency domain, the multi-channels are not transformed to the time domain after synthesis, as is usually done in a BCC decoder. Instead, the spectral representation of the multi-channels in the form of blocks is used and subjected to the headphone signal processing. For this, the transformation functions of the filters, i.e. the Fourier transforms of the impulse responses, are used to perform a multiplication of the spectral representation of the multi-channels by the filter transformation functions. When the impulse responses of the filters are, in time, longer than a block of spectral components at the output of the BCC decoder, a block-wise filter processing is of advantage where the impulse responses of the filters are separated in the time domain and are transformed block by block in order to then perform corresponding spectrum weightings necessary for measures of this kind, as is, for example, disclosed in WO 94/01933.

Other features, elements, processes, steps, characteristics and advantages of the present invention will become more apparent from the following detailed description of preferred embodiments of the present invention with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:

FIG. 1 shows a block circuit diagram of the inventive device for generating an encoded stereo signal.

FIG. 2 is a detailed illustration of an implementation of the headphone signal processing of FIG. 1.

FIG. 3 shows a well-known joint stereo encoder for generating channel data and parametric multi-channel information.

FIG. 4 is an illustration of a scheme for determining ICLD, ICTD and ICC parameters for BCC encoding/decoding.

FIG. 5 is a block diagram illustration of a BCC encoder/decoder chain.

FIG. 6 shows a block diagram of an implementation of the BCC synthesis block of FIG. 5.

FIG. 7 shows cascading between a multi-channel decoder and the headphone signal processing without any transformation to the time domain.

FIG. 8 shows cascading between the headphone signal processing and a stereo encoder without any transformation to the time domain.

FIG. 9 shows a principle block diagram of a stereo encoder.

FIG. 10 is a principle illustration of a reproduction scenario for determining the filter functions of FIG. 2.

FIG. 11 is a principle illustration of an expected impulse response of a filter determined according to FIG. 10.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

FIG. 1 shows a principle block circuit diagram of an inventive device for generating an encoded stereo signal of an audio piece or an audio datastream. The stereo signal includes, in an uncoded form, an uncoded first stereo channel 10a and an uncoded second stereo channel 10b and is generated from a multi-channel representation of the audio piece or the audio data stream, wherein the multi-channel representation comprises information on more than two multi-channels. As will be explained later, the multi-channel representation may be in an uncoded or an encoded form. If the multi-channel representation is in an uncoded form, it will include three or more multi-channels. With an application scenario, the multi-channel representation includes five channels and one subwoofer channel.

If the multi-channel representation is, however, in an encoded form, this encoded form will typically include one or several basic channels as well as parameters for synthesizing the three or more multi-channels from the one or two basic channels. A multi-channel decoder 11 thus is an example of means for providing the more than two multi-channels from the multi-channel representation. If the multi-channel representation is, however, already in an uncoded form, i.e., for example, in the form of 5+1 PCM channels, the means for providing corresponds to an input terminal for means 12 for performing headphone signal processing to generate the uncoded stereo signal with the uncoded first stereo channel 10a and the uncoded second stereo channel 10b.

Advantageously, the means 12 for performing headphone signal processing is formed to evaluate the multi-channels of the multi-channel representation each by a first filter function for the first stereo channel and by a second filter function for the second stereo channel and to add the respective evaluated multi-channels to obtain the uncoded first stereo channel and the uncoded second stereo channel, as is illustrated referring to FIG. 2. Downstream of the means 12 for performing the headphone signal processing is a stereo encoder 13 which is formed to encode the first uncoded stereo channel 10a and the second uncoded stereo channel 10b to obtain the encoded stereo signal at an output 14 of the stereo encoder 13. The stereo encoder performs a data rate reduction such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal.

According to the invention, a concept is achieved which allows supplying a multi-channel tone, which is also referred to as "surround", to stereo headphones via simple players, such as, for example, hardware players.

The sum of certain channels may exemplarily be formed as simple headphone signal processing to obtain the output channels for the stereo data. Improved methods operate with more complex algorithms which in turn obtain an improved reproduction quality.

It is to be mentioned that the inventive concept allows the calculating-intense steps for multi-channel decoding and for performing the headphone signal processing not to be performed in the player itself but to be performed externally. The result of the inventive concept is an encoded stereo file which is, for example, an MP3 file, an AAC file, an HE-AAC file or some other stereo file.

In other embodiments, the multi-channel decoding, headphone signal processing and stereo encoding may be performed on different devices since the output data and input data, respectively, of the individual blocks may be ported easily and be generated and stored in a standardized way.

Subsequently, reference will be made to FIG. 7 showing an embodiment of the present invention where the multi-channel decoder 11 comprises a filter bank or FFT function such that the multi-channel representation is provided in the frequency domain. In particular, the individual multi-channels are generated as blocks of spectral values for each channel. Inventively, the headphone signal processing is not performed in the time domain by convoluting the temporal channels with the filter impulse responses, but a multiplication of the frequency domain representation of the multi-channels by a spectral representation of the filter impulse response is performed. An uncoded stereo signal is achieved at the output of the headphone signal processing, which is, however, not in the time domain but includes a left and a right stereo channel, wherein such a stereo channel is given as a sequence of blocks of spectral values, each block of spectral values representing a short-term spectrum of the stereo channel.

In the embodiment shown in FIG. 8, the headphone signal processing block 12 is, on the input side, supplied with either time-domain or frequency-domain data. On the output side, the uncoded stereo channels are generated in the frequency domain, i.e. again as a sequence of blocks of spectral values. A stereo encoder which is based on a transformation, i.e. which processes spectral values without a frequency/time conversion and a subsequent time/frequency conversion being necessary between the headphone signal processing 12 and the stereo encoder 13, is of advantage as the stereo encoder 13 in this case. On the output side, the stereo encoder 13 then outputs a file with the encoded stereo signal which, apart from side information, includes an encoded form of spectral values.

In an embodiment of the present invention, a continuous frequency domain processing is performed on the way from the multi-channel representation at the input of block 11 of [...] possibly, a re-transformation to the frequency domain having [...] the Fourier spectrum at the output of the headphone signal [...]

[...]ing, in particular with higher frequencies, provides a considerable encoding gain without audible artefacts arising. The output of the joint stereo module 15 is then processed further using different other redundancy-reducing measures, such as, for example, TNS filtering, noise substitution, etc., to then supply the results to a quantizer 16 which achieves a quantization of the spectral values using a psycho-acoustic masking threshold. The quantizer step size here is selected such that the noise introduced by quantizing remains below the psycho-acoustic masking threshold, such that a data rate reduction is achieved without the distortions introduced by the lossy quantization to be audible. Downstream of the quantizer 16, there is an entropy encoder 17 performing lossless entropy [...] apart from the entropy-coded spectral values, includes side information necessary for decoding.

Subsequently, reference will be made to implementations of the multi-channel decoder and to multi-channel illustrations using FIGS. 3 to 6.

There are several techniques for reducing the amount of data necessary for transmitting a multi-channel audio signal. Such techniques are also called joint stereo techniques. For this purpose, reference is made to FIG. 3 showing a joint stereo device 60. This device may be a device implementing, for example, the intensity stereo (IS) technique or the binaural cue encoding technique (BCC). Such a device generally receives at least two channels CH1, CH2, ..., CHn as input signal and outputs a single carrier channel and parametric multi-channel information. The parametric data are defined so that an approximation of an original channel (CH1, CH2, ..., CHn) may be calculated in a decoder.

Normally, the carrier channel will include subband samples, spectral coefficients, time domain samples, etc., [...] controlling a certain reconstruction algorithm, such as, for example, weighting by multiplication, time shifting, fre[...] factors, intensity stereo information or BCC parameters, as will be described below.

[...] real stereophonic reproduction techniques. Thus, this technique is modified in that the second orthogonal component is excluded from being transmitted in the bitstream. Thus, the reconstructed signals for the left and right channels consist of differently weighted or scaled versions of the same transmit
encoding of the quantiZed spectral values. At the output of the
entropy encoder, there is the encoded stereo signal Which,
of the tWo stereophonic audio channels. If most data points
are concentrated around the ?rst main axis, an encoding gain
may be achieved by rotating both signals by a certain angle
before encoding takes place. HoWever, this does not apply to
FIG. 9 shoWs a general block circuit diagram for a stereo
of a center/ side encoding, provides a higher encoding gain
than a separate processing of the left and right channels. The
joint stereo module 15 may further be formed to perform an
intensity stereo encoding, Wherein an intensity stereo encod
The intensity stereo encoding technique is described in the
AES Preprint 3799 entitled “Intensity Stereo Coding” by J.
Herre, K. H. Brandenburg, D. Lederer, February 1994,
Amsterdam. In general, the concept of intensity stereo is
based on a main axis transform Which is to be applied to data
40
to a normal MP3 encoder or a normal AAC encoder.
encoder. The stereo encoder includes, on the input side, a joint
stereo module 15 Which is determining in an adaptive Way
Whether a common stereo encoding, for example in the form
numbers apply to compressed data. A non-compressed CD
channel of course necessitates approximately tenfold data
rates. An example of parametric data are the known scale
used as the stereo encoder, it Will be of advantage to transform
processing block to an MDCT spectrum. Thus, it is ensured
according to the invention that the phase information neces
sary in a precise form for the convolution/evaluation of the
channels in the headphone signal-processing block is con
verted to the MDCT representation not operating in such a
phase-correct Way, such that means for transforming from the
time domain to the frequency domain, i.e. to the MDCT
spectrum, is not necessary for the stereo encoder, in contrast
quency shifting, etc. The parametric multi-channel informa
tion thus includes a relatively rough representation of the
signal or the associated channel. Expressed in numbers, the
amount of data necessary for a carrier channel is in the range
of 60 to 70 kbits/ s, Whereas the amount of data necessary for
parametric side information for a channel is in the range from
1.5 to 2.5 kbits/sec. It is to be mentioned that the above
FIG. 1 to the encoded stereo ?le at the output 14 of the means
of FIG. 1, Without a transformation to the time domain and,
to take place. When an MP3 encoder or an AAC encoder is
Which provide a relatively ?ne representation of the underly
ing signal, Whereas the parametric data do not include such
samples or spectral coe?icients, but control parameters for
ted signal. Nevertheless, the reconstructed signals differ in
amplitude, but they are identical With respect to their phase
information. The energy time envelopes of both original
audio channels, hoWever, are maintained by means of the
selective scaling operation typically operating in a frequency
selective manner. This corresponds to human sound percep
55
tion at high frequencies Where the dominant spatial informa
tion is determined by the energy envelopes.
In addition, in practical implementations, the transmitted
signal, i.e. the carrier channel, is produced from the sum
signal of the left channel and the right channel instead of
60
rotating both components. Additionally, this processing, i.e.
generating intensity stereo parameters for performing the
scaling operations, is performed in a frequency-selective
manner, i.e. independently for each scale factor band, i.e. for
each encoder frequency partition. Advantageously, both
65
channels are combined to form a combined or “carrier” chan
nel and, in addition to the combined channel, the intensity
stereo information. The intensity stereo information depends
on the energy of the first channel, the energy of the second channel or the energy of the combined channel.

The BCC technique is described in the AES Convention Paper 5574 entitled "Binaural Cue Coding applied to stereo and multichannel audio compression" by C. Faller, F. Baumgarte, May 2002, Munich. In BCC encoding, a number of audio input channels are converted to a spectral representation using a DFT-based transform with overlapping windows. The resulting spectrum is divided into non-overlapping portions, of which each has an index. Each partition has a bandwidth which is proportional to the equivalent rectangular bandwidth (ERB). The inter-channel level differences (ICLD) and the inter-channel time differences (ICTD) are determined for each partition and for each frame k. The ICLD and ICTD are quantized and encoded to finally reach a BCC bitstream as side information. The inter-channel level differences and the inter-channel time differences are given for each channel with regard to a reference channel. Then, the parameters are calculated according to predetermined formulae depending on the particular partitions of the signal to be processed.

On the decoder side, the decoder typically receives a mono signal and the BCC bitstream. The mono signal is transformed to the frequency domain and input into a spatial synthesis block which also receives decoded ICLD and ICTD values. In the spatial synthesis block, the BCC parameters (ICLD and ICTD) are used to perform a weighting operation of the mono signal, to synthesize the multi-channel signals which, after a frequency/time conversion, represent a reconstruction of the original multi-channel audio signal.

In the case of BCC, the joint stereo module 60 is operative to output the channel-side information such that the parametric channel data are quantized and encoded ICLD or ICTD parameters, wherein one of the original channels is used as a reference channel for encoding the channel-side information.

Normally, the carrier signal is formed of the sum of the participating original channels.

The above techniques of course only provide a mono representation for a decoder which can only process the carrier channel, but which is not able to process parametric data for generating one or several approximations of more than one input channel.

The BCC technique is also described in the US patent publications US 2003/0219130 A1, US 2003/0026441 A1 and US 2003/0035553 A1. Additionally, reference is made to the expert publication "Binaural Cue Coding. Part II: Schemes and Applications" by C. Faller and F. Baumgarte, IEEE Trans. on Speech and Audio Proc., Vol. 11, No. 6, November 2003.

Subsequently, a typical BCC scheme for multi-channel audio encoding will be illustrated in greater detail referring to FIGS. 4 to 6.

FIG. 5 shows such a BCC scheme for encoding/transmitting multi-channel audio signals. The multi-channel audio input signal at an input 110 of a BCC encoder 112 is mixed down in a so-called downmix block 114. With this example, the original multi-channel signal at the input 110 is a 5-channel surround signal having a front-left channel, a front-right channel, a left surround channel, a right surround channel and a center channel. In the embodiment of the present invention, the downmix block 114 generates a sum signal by means of a simple addition of these five channels into one mono signal.

Other downmix schemes are known in the art, so that using a multi-channel input signal, a downmix channel having a single channel is obtained.

This single channel is output on a sum signal line 115. Side information obtained from the BCC analysis block 116 is output on a side-information line 117.

Inter-channel level differences (ICLD) and inter-channel time differences (ICTD) are calculated in the BCC analysis block, as has been illustrated above. Now, the BCC analysis block 116 is also able to calculate inter-channel correlation values (ICC values). The sum signal and the side information are transmitted to a BCC decoder 120 in a quantized and encoded format. The BCC decoder splits the transmitted sum signal into a number of subbands and performs scalings, delays and further processing steps to provide the subbands of the multi-channel audio channels to be output. This processing is performed such that the ICLD, ICTD and ICC parameters (cues) of a reconstructed multi-channel signal at the output 121 match the corresponding cues for the original multi-channel signal at the input 110 in the BCC encoder 112. For this purpose, the BCC decoder 120 includes a BCC synthesis block 122 and a side information-processing block 123.

Subsequently, the internal setup of the BCC synthesis block 122 will be illustrated referring to FIG. 6. The sum signal on the line 115 is supplied to a time/frequency conversion unit or filter bank FB 125. At the output of block 125, there is a number N of subband signals or, in an extreme case, a block of spectral coefficients when the audio filter bank 125 performs a 1:1 transformation, i.e. a transformation generating N spectral coefficients from N time domain samples.

The BCC synthesis block 122 further includes a delay stage 126, a level modification stage 127, a correlation processing stage 128 and an inverse filter bank stage IFB 129. At the output of stage 129, the reconstructed multi-channel audio signal having, for example, five channels in the case of a 5-channel surround system, may be output to a set of loudspeakers 124, as are illustrated in FIG. 5 or FIG. 4.

The input signal s(n) is converted to the frequency domain or the filter bank domain by means of the element 125. The signal output by the element 125 is copied such that several versions of the same signal are obtained, as is illustrated by the copy node 130. The number of versions of the original signal equals the number of output channels in the output signal. Then, each version of the original signal at the node 130 is subjected to a certain delay d1, d2, ..., di, ..., dN. The delay parameters are calculated by the side information-processing block 123 in FIG. 5 and derived from the inter-channel time differences as they were calculated by the BCC analysis block 116 of FIG. 5.

The same applies to the multiplication parameters a1, a2, ..., ai, ..., aN, which are also calculated by the side information-processing block 123 based on the inter-channel level differences as they were calculated by the BCC analysis block 116.

The ICC parameters calculated by the BCC analysis block 116 are used for controlling the functionality of block 128 so that certain correlations between the delayed and level-manipulated signals are obtained at the outputs of block 128. It is to be noted here that the order of the stages 126, 127, 128 may differ from the order shown in FIG. 6.

It is also to be noted that in a frame-wise processing of the audio signal, the BCC analysis is also performed frame-wise, i.e. temporally variable, and that further a frequency-wise BCC analysis is obtained, as can be seen by the filter bank division of FIG. 6. This means that the BCC parameters are obtained for each spectral band. This also means that in the case that the audio filter bank 125 breaks down the input signal into, for example, 32 band-pass signals, the BCC analysis block obtains a set of BCC parameters for each of the 32 bands. Of course, the BCC synthesis block 122 of FIG. 5, which is illustrated in greater detail in FIG. 6, also performs a reconstruction which is also based on the exemplarily mentioned 32 bands.
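The synthesis chain described for FIG. 6 (copy node 130, delay stage 126, level modification stage 127) can be sketched for a single subband coefficient as follows. This is only a minimal illustration under stated assumptions, not the implementation of the cited BCC papers: the function names `icld_to_gains` and `bcc_synthesize_band` are hypothetical, the ICLD values are assumed to be given in dB relative to the reference channel, the delay is approximated by a phase rotation at the band's centre frequency, the gains are normalised so that the total output energy equals the energy of the sum signal, and the correlation processing of stage 128 is left out.

```python
import cmath
import math


def icld_to_gains(icld_db):
    """Map level differences (in dB, relative to the reference channel)
    to gains a_1..a_N whose squares sum to 1 (assumed normalisation)."""
    rel = [10.0 ** (d / 20.0) for d in icld_db]
    norm = math.sqrt(sum(r * r for r in rel))
    return [r / norm for r in rel]


def bcc_synthesize_band(sum_coeff, icld_db, ictd_samples, band_freq):
    """Copy one subband coefficient of the sum signal to every output
    channel, then apply a per-channel delay and level modification.

    band_freq is the band's centre frequency in cycles per sample
    (an assumption of this sketch); the delay d_i becomes the phase
    factor exp(-j * 2 * pi * band_freq * d_i).
    """
    gains = icld_to_gains(icld_db)
    out = []
    for a_i, d_i in zip(gains, ictd_samples):
        delay = cmath.exp(-2j * math.pi * band_freq * d_i)
        out.append(a_i * delay * sum_coeff)
    return out


# Hypothetical cue values for five output channels in one band.
channels = bcc_synthesize_band(
    1 + 0j, [0.0, -6.0, -6.0, -12.0, -12.0], [0, 1, 1, 2, 2], 0.05
)
```

With 32 bands, as in the example in the text, the same function would be called once per band and per frame with that band's set of cues.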
Subsequently, a scenario used for determining individual BCC parameters will be illustrated referring to FIG. 4. Normally, the ICLD, ICTD and ICC parameters may be defined between channel pairs. It is, however, of advantage for the ICLD and ICTD parameters to be determined between a reference channel and each other channel. This is illustrated in FIG. 4A.

ICC parameters may be defined in different manners. In general, ICC parameters may be determined in the encoder between all possible channel pairs, as is illustrated in FIG. 4B. There has been the suggestion to calculate only ICC parameters between the two strongest channels at any time, as is illustrated in FIG. 4C, which shows an example in which, at one time, an ICC parameter between the channels 1 and 2 is calculated and, at another time, an ICC parameter between the channels 1 and 5 is calculated. The decoder then synthesizes the inter-channel correlation between the strongest channels in the decoder and uses certain heuristic rules for calculating and synthesizing the inter-channel coherence for the remaining channel pairs.

With respect to the calculation of, for example, the multiplication parameters a1, ..., aN based on the transmitted ICLD parameters, reference is made to the AES Convention Paper No. 5574. The ICLD parameters represent an energy distribution of an original multi-channel signal. Without loss of generality, it is of advantage, as is shown in FIG. 4A, to take 4 ICLD parameters representing the energy difference between the respective channels and the front-left channel. In the side information-processing block 123, the multiplication parameters a1, ..., aN are derived from the ICLD parameters so that the total energy of all reconstructed output channels is the same as (or proportional to) the energy of the sum signal transmitted.

In the embodiment shown in FIG. 7, the frequency/time conversion obtained by the inverse filter banks IFB 129 of FIG. 6 is dispensed with. Instead, the spectral representations of the individual channels at the input of these inverse filter banks are used and supplied to the headphone signal-processing device of FIG. 7 to perform the evaluation of the individual multi-channels with the respective two filters per multi-channel without an additional frequency/time transformation.

With regard to a complete processing taking place in the frequency domain, it is to be noted that in this case the multi-channel decoder, i.e., for example, the filter bank 125 of FIG. 6, and the stereo encoder should have the same time/frequency resolution. Additionally, it is of advantage to use one and the same filter bank, which is particularly of advantage in that only a single filter bank is necessary for the entire processing, as is illustrated in FIG. 1. In this case, the result is a particularly efficient processing since the transformations in the multi-channel decoder and the stereo encoder need not be calculated.

The input data and output data, respectively, in the inventive concept are thus encoded in the frequency domain by means of transformation/filter bank and are encoded under psycho-acoustic guidelines using masking effects, wherein in particular in the decoder there should be a spectral representation of the signals. Examples of this are MP3 files, AAC files or AC3 files. However, the input data and output data, respectively, may also be encoded by forming the sum and difference, as is the case in so-called matrixed processes. Examples of this are Dolby ProLogic, Logic7 or Circle Surround. The data of, in particular, the multi-channel representation may additionally be encoded by means of parametric methods, as is the case in MP3 surround, wherein this method is based on the BCC technique.

Depending on the circumstances, the inventive method for generating may be implemented in either hardware or software. The implementation may be on a digital storage medium, in particular on a disc or CD having control signals which can be read out electronically, which can cooperate with a programmable computer system such that the method will be executed. In general, the invention also is in a computer program product having a program code stored on a machine-readable carrier for performing an inventive method when the computer program product runs on a computer. Put differently, the invention may also be realized as a computer program having a program code for performing the method when the computer program runs on a computer.

While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.

The invention claimed is:

1. A device for generating an encoded stereo signal of an audio piece or an audio datastream comprising a first stereo channel and a second stereo channel from a multi-channel representation of the audio piece or the audio datastream comprising information on more than two multi-channels, comprising:
  a provider configured to provide the more than two multi-channels from the multi-channel representation;
  a performer configured to perform headphone signal processing to generate an uncoded stereo signal with an uncoded first stereo channel and an uncoded second stereo channel, the performer being configured to:
    evaluate each multi-channel by a first filter function derived from a virtual position of a loudspeaker for reproducing the multi-channel and a virtual first ear position of a listener, for the first stereo channel, and a second filter function derived from a virtual position of the loudspeaker and a virtual second ear position of the listener, for the second stereo channel, to generate a first evaluated channel and a second evaluated channel for each multi-channel, the two virtual ear positions of the listener being different,
    add the evaluated first channels to obtain the uncoded first stereo channel, and
    add the evaluated second channels to obtain the uncoded second stereo channel; and
  a stereo encoder configured to encode the uncoded first stereo channel and the uncoded second stereo channel to obtain the encoded stereo signal, the stereo encoder being formed such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal; wherein
  the multi-channel representation comprises one or several basic channels as well as parametric information for calculating each multi-channel from the one or several basic channels;
  the provider is configured to calculate each multi-channel from the one or the several basic channels and the parametric information;
  the provider is configured to provide, on an output side of the provider, a block-wise frequency domain representation for each multi-channel;
  the performer is configured to evaluate the block-wise frequency domain representation for each multi-channel by a frequency domain representation of the first and second filter functions without a frequency domain to time domain conversion;
  the performer is configured to generate a block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel; and
  the stereo encoder is a transformation-based encoder and is configured to process the block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel without a frequency domain to time domain conversion.

2. The device according to claim 1, wherein the performer is configured to use the first filter function considering direct sound, reflections and diffuse reverberation and the second filter function considering direct sound, reflections and diffuse reverberation.

3. The device according to claim 2, wherein the first and the second filter functions correspond to a filter impulse response comprising a peak at a first time value representing the direct sound, several smaller peaks at second time values representing the reflections, each of the second time values being greater than the first time value, and a continuous region no longer resolved for individual peaks and representing the diffuse reverberation for third time values, each of the third time values being greater than a greatest time value of the second time values.

4. The device according to claim 1,
  wherein the stereo encoder is configured to perform a common stereo encoding of the first and second stereo channels.

5. The device according to claim 1,
  wherein the stereo encoder is configured to quantize a block of spectral values using a psycho-acoustic masking threshold and subject it to entropy encoding to obtain the encoded stereo signal.

6. The device according to claim 1,
  wherein the provider is formed as a BCC decoder.

7. The device according to claim 1,
  wherein the provider is a multi-channel decoder comprising a filter bank comprising several outputs,
  wherein the performer is configured to evaluate signals at the filter bank outputs by the first and second filter functions, and
  wherein the stereo encoder is configured to quantize the uncoded first stereo channel in the frequency domain and the uncoded second stereo channel in the frequency domain and subject it to entropy encoding to obtain the encoded stereo signal.

8. A method for generating an encoded stereo signal of an audio piece or an audio datastream comprising a first stereo channel and a second stereo channel from a multi-channel representation of the audio piece or the audio datastream comprising information on more than two multi-channels, comprising:
  providing the more than two multi-channels from the multi-channel representation;
  performing headphone signal processing to generate an uncoded stereo signal with an uncoded first stereo channel and an uncoded second stereo channel, the step of performing comprising:
    evaluating each multi-channel by a first filter function derived from a virtual position of a loudspeaker for reproducing the multi-channel and a virtual first ear position of a listener, for the first stereo channel, and a second filter function derived from a virtual position of the loudspeaker and a virtual second ear position of the listener, for the second stereo channel, to generate a first evaluated channel and a second evaluated channel for each multi-channel, the two virtual ear positions of the listener being different,
    adding the evaluated first channels to obtain the uncoded first stereo channel, and
    adding the evaluated second channels to obtain the uncoded second stereo channel; and
  stereo-coding the uncoded first stereo channel and the uncoded second stereo channel to obtain the encoded stereo signal, the step of stereo-coding being executed such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal; wherein
  the multi-channel representation comprises one or several basic channels as well as parametric information for calculating each multi-channel from the one or several basic channels;
  each multi-channel is calculated from the one or the several basic channels and the parametric information;
  as a result of the step of providing, a block-wise frequency domain representation for each multi-channel is obtained;
  the step of performing includes evaluating the block-wise frequency domain representation for each multi-channel by a frequency domain representation of the first and second filter functions without a frequency domain to time domain conversion;
  the step of performing includes generating a block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel; and
  the step of stereo-coding includes using a transformation-based encoder and processing the block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel without a frequency domain to time domain conversion.

9. A non-transitory storage medium having stored thereon a computer program comprising a program code for performing a method when the computer program runs on a computer for generating an encoded stereo signal of an audio piece or an audio datastream comprising a first stereo channel and a second stereo channel from a multi-channel representation of the audio piece or the audio datastream comprising information on more than two multi-channels, comprising:
  providing the more than two multi-channels from the multi-channel representation;
  performing headphone signal processing to generate an uncoded stereo signal with an uncoded first stereo channel and an uncoded second stereo channel, the step of performing comprising:
    evaluating each multi-channel by a first filter function derived from a virtual position of a loudspeaker for reproducing the multi-channel and a virtual first ear position of a listener, for the first stereo channel, and a second filter function derived from a virtual position of the loudspeaker and a virtual second ear position of the listener, for the second stereo channel, to generate a first evaluated channel and a second evaluated channel for each multi-channel, the two virtual ear positions of the listener being different,
    adding the evaluated first channels to obtain the uncoded first stereo channel, and
    adding the evaluated second channels to obtain the uncoded second stereo channel; and
  stereo-coding the uncoded first stereo channel and the uncoded second stereo channel to obtain the encoded stereo signal, the step of stereo-coding being executed such that a data rate necessary for transmitting the encoded stereo signal is smaller than a data rate necessary for transmitting the uncoded stereo signal; wherein
  the multi-channel representation comprises one or several basic channels as well as parametric information for calculating each multi-channel from the one or several basic channels;
  each multi-channel is calculated from the one or the several basic channels and the parametric information;
  as a result of the step of providing, a block-wise frequency domain representation for each multi-channel is obtained;
  the step of performing includes evaluating the block-wise frequency domain representation for each multi-channel by a frequency domain representation of the first and second filter functions without a frequency domain to time domain conversion;
  the step of performing includes generating a block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel; and
  the step of stereo-coding includes using a transformation-based encoder and processing the block-wise frequency domain representation of the uncoded first stereo channel and the uncoded second stereo channel without a frequency domain to time domain conversion.
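
The frequency-domain evaluation and summation recited in the claims can be illustrated for one block of spectral values as follows. This is a hedged sketch rather than the claimed implementation: the function name `binaural_downmix_block` and its argument layout are hypothetical, and the filter functions are assumed to be already available as frequency responses sampled on the same bins as the channel spectra. Note that a bin-wise product per block only matches a time-domain convolution if the surrounding transform handles block overlap appropriately, which is outside the scope of this example.

```python
def binaural_downmix_block(channel_spectra, left_filters, right_filters):
    """Evaluate each multi-channel by its two filter functions and sum.

    channel_spectra[c][k]: spectral value k of multi-channel c (one block).
    left_filters[c][k], right_filters[c][k]: assumed frequency responses of
    the first and second filter function for multi-channel c.
    Returns the block spectra of the uncoded first and second stereo channel.
    """
    n_bins = len(channel_spectra[0])
    left = [0j] * n_bins
    right = [0j] * n_bins
    for x, h_left, h_right in zip(channel_spectra, left_filters, right_filters):
        for k in range(n_bins):
            # Multiplication in the frequency domain replaces convolution
            # with the filter impulse response in the time domain.
            left[k] += h_left[k] * x[k]
            right[k] += h_right[k] * x[k]
    return left, right


# Toy data: two multi-channels, two spectral bins (values are made up).
left, right = binaural_downmix_block(
    [[1, 2], [3, 4]],
    [[1, 1], [0.5, 0.5]],
    [[0, 1], [1, 0]],
)
```

The two returned lists stay in the frequency domain, so in this sketch they could be handed to a transformation-based stereo encoder without an intermediate conversion to the time domain.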
* * * * *