EP 1 266 469 B1

(11) EP 1 266 469 B1
(12) EUROPEAN PATENT SPECIFICATION
(45) Date of publication and mention of the grant of the patent: 24.09.2008 Bulletin 2008/39
(21) Application number: 01903410.7
(22) Date of filing: 30.01.2001
(51) Int Cl.: H04H 1/02 (2008.01)
(86) International application number: PCT/US2001/002908
(87) International publication number: WO 2001/058064 (09.08.2001 Gazette 2001/32)
(54) USE OF VOICE-TO-REMAINING AUDIO (VRA) IN CONSUMER APPLICATIONS
VERWENDUNG VON VRA (VOICE-TO-REMAINING AUDIO) BEI VERBRAUCHERANWENDUNGEN
UTILISATION DU REGLAGE SIGNAL VOCAL A SIGNAL AUDIO RESTANT DANS DES APPLICATIONS CONSOMMATEURS
(84) Designated Contracting States: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR
(30) Priority: 04.02.2000 US 180220 P; 26.05.2000 US 580206
(43) Date of publication of application: 18.12.2002 Bulletin 2002/51
(60) Divisional application: 08160819.2
(73) Proprietor: Hearing Enhancement Company LLC, Roanoke, VA 24017 (US)
(72) Inventors:
• VAUDREY, Michael, A., Blacksburg, VA 24060 (US)
• SAUNDERS, William, R., Blacksburg, VA 24060 (US)
(74) Representative: Vossius & Partner, Siebertstrasse 4, 81675 München (DE)
(56) References cited:
US-A- 4 799 260
US-A- 5 285 503
US-A- 5 820 384
US-A- 5 155 770
US-A- 5 617 331
US-A- 5 852 800
Note: Within nine months of the publication of the mention of the grant of the European patent in the European Patent
Bulletin, any person may give notice to the European Patent Office of opposition to that patent, in accordance with the
Implementing Regulations. Notice of opposition shall not be deemed to have been filed until the opposition fee has been
paid. (Art. 99(1) European Patent Convention).
Description
FIELD OF THE INVENTION
[0001] Embodiments of the present invention relate generally to a method and apparatus for processing audio signals,
and more particularly, to a method and apparatus for processing audio signals for use in consumer applications.
BACKGROUND OF THE INVENTION
[0002] End users with "high-end" or expensive equipment, including multi-channel amplifiers and multi-speaker systems, currently have a limited capability to adjust the volume of the center channel signal of a multi-channel audio system independently of the audio signals on the remaining channels. Since many movies carry mostly dialog on the center channel and other sound effects on the other channels, this limited adjustment capability allows the end user to raise the amplitude of the mostly dialog channel so that it is more intelligible during sections with loud sound effects. Currently, this limited adjustment has important shortcomings. First, it is only available to end users who have an expensive digital versatile disc (DVD) player and a multi-channel speaker system, such as a six-speaker home theater system, that permits independent volume level adjustment of all speakers. Thus, users who cannot afford such a system cannot enjoy a recording or broadcast program in which they can raise the amplitude of the mostly dialog channel to make it more intelligible.
[0003] Second, it is an adjustment that can only serve one listener at a time. For example, if a user selects a level of mostly dialog to background that improves his listening intelligibility, this level may not be satisfactory to other individuals in the room. Thus, there is no way to provide this adjustment feature simultaneously to multiple listeners having different hearing preferences.
[0004] Third, it is an adjustment that needs to be continuously modified during transients in the preferred audio signal or voice dialog (center channel) and in the remaining audio signal (all other channels). The final shortcoming is that voice-to-remaining audio (VRA) adjustments that were acceptable during one audio segment of a movie may not be suitable for another segment if the remaining audio level increases too much or the dialog level decreases too much.
[0005] A large majority of end users do not have, and for many years will not have, a home theater that permits this adjustment capability, i.e., a Dolby Digital decoder, a six-channel variable gain amplifier and a multi-speaker system. In addition, end users have no way to ensure that the preferred voice-to-remaining audio (PVRA) ratio selected at the beginning of a program will stay the same for the entire program.
[0006] FIG 3 illustrates the intended spatial positioning setup of a common home theater system. Although there are no written rules for audio production in 5.1 spatial channels, there are industry standards. As used herein, the term "spatial channels" refers to the physical location of an output device (e.g., a speaker) and how the sound from that output device is delivered to the end user. One of these standards is to locate the majority of dialog on the center channel 526. Likewise, other sound effects that require spatial positioning are placed on any of the other four speakers, labeled L 521, R 522, Ls 523 and Rs 524 for left, right, left surround and right surround. In addition, to avoid damage to midrange speakers, low frequency effects (LFE) are placed on the 0.1 channel, which is directed to a subwoofer speaker 525. Digital audio compression allows the producer to provide the user with a greater dynamic range than was possible through analog transmission. This greater dynamic range causes most dialog to sound too low in the presence of some very loud sound effects. Consider the following example. Suppose an analog transmission (or recording) can carry amplitudes up to 95 dB and dialog is typically recorded at 80 dB. Loud segments of remaining audio may obscure the dialog when that remaining audio reaches the upper limit while someone is speaking. This situation is exacerbated when digital audio compression allows a dynamic range up to 105 dB. The dialog remains at the same level (80 dB) with respect to other sounds, but the loud remaining audio can now be reproduced with a more realistic amplitude. User complaints that dialog has been recorded too low on DVDs are very common. In fact, the dialog is at the proper level, and that level is more appropriate and realistic than what exists for analog recordings with limited dynamic range.
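The arithmetic in the example above can be checked with a few lines of Python. The sketch only restates the paragraph's illustrative figures (80 dB dialog, 95 dB analog ceiling, 105 dB digital ceiling); the helper name `db_to_amplitude_ratio` is hypothetical, and it assumes the dB figures are sound pressure levels, so that a difference of D dB corresponds to an amplitude ratio of 10^(D/20).

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude (pressure) ratio."""
    return 10.0 ** (db / 20.0)


DIALOG_DB = 80.0            # typical dialog recording level in the example
ANALOG_CEILING_DB = 95.0    # loudest level the analog path can carry
DIGITAL_CEILING_DB = 105.0  # loudest level the digital path can carry

# How far the loudest remaining audio may rise above the (unchanged) dialog.
analog_headroom = ANALOG_CEILING_DB - DIALOG_DB
digital_headroom = DIGITAL_CEILING_DB - DIALOG_DB

for name, headroom in (("analog", analog_headroom), ("digital", digital_headroom)):
    ratio = db_to_amplitude_ratio(headroom)
    print(f"{name}: effects may peak {headroom:.0f} dB ({ratio:.1f}x amplitude) above dialog")
```

The dialog level itself does not move; only the ceiling for the remaining audio rises by 10 dB, roughly 3.2 times in amplitude, which is why the same dialog can seem too quiet next to loud effects.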
[0007] Even for consumers who currently have properly calibrated home theater systems, dialog is frequently masked by the loud remaining audio sections in many DVD movies produced today. A small group of consumers is able to find some improvement in intelligibility by increasing the volume of the center channel and/or decreasing the volume of all the other channels. However, this fixed adjustment is only acceptable for certain audio passages, and it disturbs the proper calibration levels. The speaker levels are typically calibrated to produce specific sound pressure levels (SPLs) at the viewing location. Proper calibration ensures that the viewing is as realistic as possible. Unfortunately, this means that loud sounds are reproduced very loud, which may not be desirable during late-night viewing. However, any adjustment of the speaker levels will disrupt the calibration.
SUMMARY OF THE INVENTION
[0008] A method for providing multiple users with preferred voice-to-remaining audio (PVRA) adjustment capability includes receiving a voice signal and a remaining audio signal at a first decoder and simultaneously receiving the voice signal and the remaining audio signal at a second decoder, wherein the voice signal and the remaining audio signal are received separately, and separately adjusting, by each of the decoders, the separately received voice and remaining audio signals.
BRIEF DESCRIPTION OF THE DRAWING
[0009]
FIG 1 illustrates a general approach according to the present invention for separating relevant voice information
from general background audio in a recorded or broadcast program.
FIG 2 illustrates an exemplary embodiment according to the present invention for receiving and playing back the
encoded program signals.
FIG 3 illustrates the intended spatial positioning setup of a common home theater system.
FIG 4 illustrates a block diagram of a voice-to-remaining audio (VRA) system for simultaneous multiple broadcasting
according to an embodiment of the present invention.
FIG 5 illustrates an embodiment for multi-channel transmission according to the present invention.
FIG 6 illustrates an alternative embodiment of the present invention.
FIG 7 illustrates an alternative embodiment of the present invention.
FIG 8 illustrates an alternative embodiment of the present invention that includes signal processing for multi-channel
presentation.
FIG 9 illustrates an alternative embodiment of the present invention.
FIG 10 illustrates an embodiment having a voice component and a remaining audio component summed and continually adjusted by a single control.
FIG 11 illustrates an alternative embodiment of the present invention utilizing Auto VRA.
FIG 12 illustrates an embodiment of the present invention wherein various functions of a slide control are shown.
FIG 13 illustrates a flow diagram of the various functions of the slide control.
FIG 14 illustrates an alternative embodiment of the present invention.
FIG 15 illustrates an alternative embodiment of the present invention.
FIG 16 illustrates an airline VRA adjustment box according to an embodiment of the present invention.
FIG 17 illustrates an alternative embodiment of the present invention.
FIG 18 illustrates an alternative embodiment of the present invention.
FIG 19 illustrates a headset configuration according to an embodiment of the present invention.
FIG 20 illustrates an embodiment for retaining delivery of a production mix to an end user in addition to providing
VRA adjustment capability according to the principles of the present invention.
FIG 21 illustrates an alternative embodiment of FIG 20.
FIG 22 illustrates a production process according to an embodiment of the present invention.
FIG 23 illustrates an alternative embodiment of the present invention.
FIG 24 illustrates a user in a multi-channel listening environment.
FIG 25 illustrates VRA and AutoVRA on multi-channel processed headphones according to the principles of the
present invention.
FIG 26 illustrates a conventional reproduction process.
FIG 27 illustrates an alternative embodiment of the present invention.
FIG 28 illustrates an alternative embodiment of the present invention.
DETAILED DESCRIPTION
[0010] The present invention describes a method and apparatus for providing voice-to-remaining audio capability. In addition, the invention discloses technological, ergonomic, economic and application-specific improvements to voice-to-remaining audio (VRA) and AutoVRA. VRA refers to the personalized adjustment of an audio program's voice-to-remaining audio ratio by adjusting the vocal (speech or voice) volume independently of the remaining audio volume (which may include music, sound effects, laughter or other non-speech sounds that are part of the total audio program). AutoVRA, or AutoVRA hold, refers to the automatic adjustment of the VRA ratio so that program transients (such as an explosion) do not obscure the voice.
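This section defines AutoVRA only by its goal; it does not say how the automatic adjustment is computed. The sketch below is therefore a hypothetical illustration, not the patented method: it splits both signals into blocks, measures each block's RMS level and, while the voice is active, lowers the remaining audio gain just enough to keep the voice-to-remaining ratio at or above a listener-chosen target. The names `auto_vra`, `target_ratio`, `block_size` and `voice_floor` are invented for this example, and a practical version would also smooth gain changes to avoid audible pumping.

```python
import math
from typing import List, Sequence


def rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0


def auto_vra(voice: Sequence[float], remaining: Sequence[float], target_ratio: float,
             block_size: int = 4, user_gain: float = 1.0,
             voice_floor: float = 1e-3) -> List[float]:
    """Hypothetical AutoVRA: per block, cap the remaining audio gain so that
    rms(voice) / rms(gain * remaining) does not fall below target_ratio."""
    out: List[float] = []
    for start in range(0, min(len(voice), len(remaining)), block_size):
        v = voice[start:start + block_size]
        r = remaining[start:start + block_size]
        voice_level, remaining_level = rms(v), rms(r)
        gain = user_gain
        # Intervene only while the voice is active; otherwise keep the user's setting.
        if voice_level > voice_floor and remaining_level > 0.0:
            gain = min(user_gain, voice_level / (target_ratio * remaining_level))
        out.extend(a + gain * b for a, b in zip(v, r))
    return out


# A steady voice over a quiet background, then a loud "explosion" in the second block.
mixed = auto_vra(voice=[0.5] * 8, remaining=[0.1] * 4 + [1.0] * 4, target_ratio=2.0)
```

In the loud block the remaining audio gain drops to 0.25, which keeps the voice RMS at twice the scaled background RMS; in the quiet block the listener's own gain of 1.0 is left alone.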
Significance of Ratio of Preferred Audio to Remaining Audio
[0011] The present invention begins with the realization that the range of listener preferences for the ratio of a preferred audio signal to any remaining audio is rather large, and certainly larger than previously expected. This significant discovery is the result of a test of a small sample of the population regarding their preferred ratio of the preferred audio signal level to the signal level of all remaining audio.
Specific Adjustment of Desired Range for Hearing Impaired or Normal Listeners
[0012] Focused research has been conducted to understand how normal-hearing and hearing-impaired users perceive the ratio between dialog and remaining audio for different types of audio programming. It has been found that the population varies widely in the desired range of adjustment between voice and remaining audio.
[0013] Two experiments have been conducted on a random sample of the population including elementary school
children, middle school children, middle-aged citizens and senior citizens. A total of 71 people were tested. The test
consisted of asking the user to adjust the level of voice and the level of remaining audio for a football game (where the
remaining audio was the crowd noise) and a popular song (where the remaining audio was the music). A metric called
the VRA (voice to remaining audio) ratio was formed by dividing the linear value of the volume of the dialog or voice by
the linear value of the volume of the remaining audio for each selection.
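The VRA ratio used in these tests is a simple quotient. The sketch below assumes the "linear values" are amplitude-like quantities, so converting the ratio to decibels uses 20*log10; that assumption is consistent with the 46 dB figure quoted below for a 0.1 to 20.17 range. The function names and the sample levels are invented for illustration, not taken from the test data.

```python
import math


def vra_ratio(voice_level: float, remaining_level: float) -> float:
    """VRA ratio: linear voice level divided by linear remaining audio level."""
    if voice_level < 0 or remaining_level <= 0:
        raise ValueError("voice level must be >= 0 and remaining level > 0")
    return voice_level / remaining_level


def ratio_to_db(ratio: float) -> float:
    """Express an amplitude ratio in decibels (assumes amplitude, not power)."""
    return 20.0 * math.log10(ratio)


# Hypothetical listener who set the voice to 2.0 and the remaining audio to 0.5.
r = vra_ratio(2.0, 0.5)
print(r, round(ratio_to_db(r), 2))
```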
[0014] Several things became clear as a result of this testing. First, no two people preferred the identical voice-to-remaining audio ratio for both the sports and the music media. This is very important, since the population has relied upon producers to provide a VRA (which cannot be adjusted by the consumer) that will appeal to everyone. Given the results of these tests, this clearly cannot occur. Second, while the preferred VRA is typically higher for those with hearing impairments (to improve intelligibility), people with normal hearing also prefer ratios different from those currently provided by producers.
[0015] It is also important to note that any device providing VRA adjustment must offer at least as much adjustment range as these tests indicate in order to satisfy a significant segment of the population. Since the video and home theater medium supplies a variety of programming, the adjustable ratio should extend at least from the lowest ratio measured for any media (music or sports) to the highest ratio measured for music or sports. This would be 0.1 to 20.17, a range of 46 dB. It should also be noted that this is merely a sampling of the population and that the adjustment range should theoretically be infinite, since it is very likely that one person may prefer no crowd noise when viewing a sports broadcast while another would prefer no announcer commentary. Note that this type of study, and the specific desire for widely varying VRA ratios, has not been reported or discussed in the literature or prior art.
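The 46 dB figure follows from treating the measured ratios as amplitude ratios: 20*log10(20.17 / 0.1) is about 46.1 dB. A short check, with a hypothetical helper name:

```python
import math

LOWEST_RATIO = 0.1     # lowest VRA ratio measured for any media
HIGHEST_RATIO = 20.17  # highest VRA ratio measured for any media


def span_db(low: float, high: float) -> float:
    """Width, in dB, of an adjustment range that must cover low..high."""
    return 20.0 * math.log10(high / low)


print(f"{span_db(LOWEST_RATIO, HIGHEST_RATIO):.1f} dB")
```

A "theoretically infinite" range, as the paragraph puts it, amounts to being able to mute either signal completely, which no finite dB span covers.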
[0016] In this test, a group of older men was asked to set the balance between a fixed background noise and the voice of an announcer (the same test was later performed on a group of students). Only the voice level could be varied; the background was fixed at a setting of 6.00. The results for the older group were as follows:
Table I
Individual   Setting
1            7.50
2            4.50
3            4.00
4            7.50
5            3.00
6            7.00
7            6.50
8            7.75
9            5.50
10           7.00
11           5.00
[0017] To further illustrate that people of all ages have different hearing needs and preferences, a group of 21 college students was selected to listen to a mixture of voice and background and to select the ratio of voice to background by making a single adjustment to the voice level. The background noise, in this case crowd noise at a football game, was fixed at a setting of six (6.00), and the students were allowed to adjust the volume of the announcers' play-by-play voice, which had been recorded separately and was pure or mostly pure voice. In other words, the students performed the same test as the group of older men. Students were selected so as to minimize hearing infirmities caused by age; they were all in their late teens or early twenties. The results were as follows:
Table II
Student   Setting of Voice
1         4.75
2         3.75
3         4.25
4         4.50
5         5.20
6         5.75
7         4.25
8         6.70
9         3.25
10        6.00
11        5.00
12        5.25
13        3.00
14        4.25
15        3.25
16        3.00
17        6.00
18        2.00
19        4.00
20        5.50
21        6.00
[0018] The ages of the older group (Table I) ranged from 36 to 59, with most individuals in their forties or fifties. As the test results indicate, the average setting tended to be reasonably high, suggesting some loss of hearing across the board. The settings again varied, from 3.00 to 7.75, a spread of 4.75, which confirmed the wide variance in people's preferred listening ratio of voice to background, or of any preferred signal to remaining audio (PSRA). The overall span of volume settings for both groups of subjects ranged from 2.0 to 7.75. These levels are the actual values on the volume adjustment mechanism used to perform this experiment. They indicate the range of signal-to-noise values (relative to the "noise" level of 6.0) that different users may find desirable.
[0019] To relate this to the relative loudness variations chosen by different users, note that on the non-linear volume control used, the variation from 2.0 to 7.75 represents an increase of 20 dB, or a factor of ten (10). Thus, even for this small sampling of the population and a single type of audio programming, different listeners preferred drastically different levels of "preferred signal" with respect to "remaining audio." This preference cuts across age groups, showing that it is consistent with individual preference and basic hearing ability, which was heretofore totally unexpected.
[0020] As the test results show, the settings selected by the students (Table II), who had no hearing infirmities caused by age, varied considerably, from a low of 2.00 to a high of 6.70, a spread of 4.70, or almost one half of the control's total range of 1 to 10. The test illustrates how the "one size fits all" approach of most recorded and broadcast audio falls far short of giving the individual listener the ability to adjust the mix to suit his or her own preferences and hearing needs. As with the older group, the students' settings were widely spread, demonstrating individual differences in preferences and hearing needs. One result of this test is that hearing preferences are widely disparate.
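The ranges quoted for Tables I and II can be recomputed directly from the table entries. The sketch below does that with Python's standard `statistics` module; the means it prints are not reported in the text and are shown only as a by-product of the same data. The helper name `summarize` is hypothetical.

```python
from statistics import mean

# Voice settings from Table I (older group) and Table II (students);
# the background was fixed at 6.00 on the same non-linear control.
older = [7.50, 4.50, 4.00, 7.50, 3.00, 7.00, 6.50, 7.75, 5.50, 7.00, 5.00]
students = [4.75, 3.75, 4.25, 4.50, 5.20, 5.75, 4.25, 6.70, 3.25, 6.00, 5.00,
            5.25, 3.00, 4.25, 3.25, 3.00, 6.00, 2.00, 4.00, 5.50, 6.00]


def summarize(settings):
    """Minimum, maximum, spread (max - min) and mean of one group's settings."""
    lo, hi = min(settings), max(settings)
    return {"min": lo, "max": hi, "spread": hi - lo, "mean": mean(settings)}


older_stats = summarize(older)
student_stats = summarize(students)
combined_span = (min(older + students), max(older + students))

for name, s in (("Table I (older)", older_stats), ("Table II (students)", student_stats)):
    print(f"{name}: min {s['min']:.2f}, max {s['max']:.2f}, "
          f"spread {s['spread']:.2f}, mean {s['mean']:.2f}")
print("overall span:", combined_span)
```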
[0021] Further testing has confirmed this result over a larger sample group. Moreover, the results vary depending upon the type of audio. For example, as shown in FIG 3, when the audio source was music, the ratio of voice to remaining audio varied from approximately zero to about 10, whereas when the audio source was sports programming the same ratio varied between approximately zero and about 20. In addition, for sports programming the standard deviation was almost three times, and the mean more than twice, that obtained for music.
[0022] The end result of the above testing is that if one selects a preferred audio to remaining audio ratio and fixes it forever, one has most likely created an audio program that is less than desirable for a significant fraction of the population. And, as stated above, the optimum ratio may be both a short-term and a long-term time-varying function. Consequently, complete control over this preferred audio to remaining audio ratio is desirable to satisfy the listening needs of "normal" or non-hearing-impaired listeners. Moreover, providing the end user with ultimate control over this ratio allows the end user to optimize his or her listening experience.
[0023] The end-user’s independent adjustments of the preferred audio signal and the remaining audio signal will be
the apparent manifestation of one aspect of the present invention. To illustrate the details of the present invention,
consider the application where the preferred audio signal is the relevant voice information.
Creation of the Preferred Audio Signal and the Remaining Audio Signal
[0024] FIG 1 illustrates a general approach to separating relevant voice information from general background audio in a recorded or broadcast program. First, the programming director must determine what constitutes relevant voice: an actor, a group of actors, or commentators must be identified as the relevant speakers.
[0025] Once the relevant speakers are identified, their voices will be picked up by the voice microphone 301. The voice microphone 301 will need to be either a close-talking microphone (in the case of commentators) or a highly directional shotgun microphone of the kind used in sound recording. In addition to being highly directional, these microphones 301 will need to be voice-band limited, preferably to 200-5000 Hz. The combination of directionality and band-pass filtering minimizes the background noise acoustically coupled to the relevant voice information during recording. For certain types of programming, acoustic coupling can be avoided altogether by recording the relevant voice or dialogue off-line and dubbing it, where appropriate, onto the video portion of the program. The background microphones 302 should be fairly broadband to provide the full audio quality of background information, such as music.
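The 200-5000 Hz voice band can also be illustrated in software, although the paragraph describes band-limited microphones and does not say how the filtering is done. The sketch below assumes a digital implementation and swaps in a standard technique: a second-order Butterworth high-pass at 200 Hz followed by a second-order Butterworth low-pass at 5000 Hz, using the biquad formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The function names and the 48 kHz sample rate are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def biquad(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct-form I biquad; coefficients are normalized so that a[0] == 1."""
    x1 = x2 = y1 = y2 = 0.0
    y: List[float] = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rbj_coeffs(kind: str, f0: float, fs: float,
               q: float = 1 / math.sqrt(2)) -> Tuple[List[float], List[float]]:
    """Second-order high- or low-pass coefficients (Audio EQ Cookbook formulas)."""
    w0 = 2 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    elif kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    else:
        raise ValueError(f"unsupported filter kind: {kind}")
    a0 = 1 + alpha
    return [c / a0 for c in b], [1.0, -2 * cw / a0, (1 - alpha) / a0]


def voice_band(x: Sequence[float], fs: float,
               low: float = 200.0, high: float = 5000.0) -> List[float]:
    """Band-limit a signal to roughly low..high Hz: high-pass stage, then low-pass stage."""
    y = biquad(x, *rbj_coeffs("highpass", low, fs))
    return biquad(y, *rbj_coeffs("lowpass", high, fs))


def tone(freq: float, fs: float, seconds: float) -> List[float]:
    """Unit-amplitude sine test signal."""
    return [math.sin(2 * math.pi * freq * n / fs) for n in range(int(fs * seconds))]


fs = 48_000.0
for f in (50.0, 1000.0, 12_000.0):
    out = voice_band(tone(f, fs, 1.0), fs)
    steady = out[len(out) // 2:]  # skip the filters' start-up transient
    print(f"{f:>7.0f} Hz -> peak amplitude {max(abs(v) for v in steady):.3f}")
```

With these second-order sections a 1 kHz tone passes almost unchanged, while a 50 Hz hum is reduced to roughly 6% of its amplitude; steeper skirts would need higher-order filters.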
[0026] A camera 303 will be used to provide the video portion of the program. The audio signals (background and relevant voice) will be encoded with the video signal at the encoder 304. In general, the audio signal is separated from the video signal simply by modulating it with a different carrier frequency. Since most broadcasts are now in stereo, one way to encode the relevant voice information with the background is to multiplex the relevant voice information on the separate stereo channels, in much the same way that additional channels were added to two-channel stereo to produce quadraphonic disc recordings. Although this would create the need for additional broadcast bandwidth, it would not present a problem for recorded media, as long as the audio circuitry in the video disc or tape player is designed to demodulate the relevant voice information.
[0027] Once the signals are encoded, by whatever means is deemed appropriate, the encoded signals are sent out for broadcast by broadcast system 305 over antenna 313, or recorded onto tape or disc by recording system 306. In the case of recorded audio/video information, the background and voice information could simply be placed on separate recording tracks.
Receiving and Demodulating the Preferred Audio Signal and the Remaining Audio
[0028] FIG 2 illustrates an exemplary embodiment for receiving and playing back the encoded program signals. A
receiver system 307 demodulates the main carrier frequency from the encoded audio/video signals, in the case of
broadcast information. In the case of recorded media 314, the heads from a VCR or the laser reader from a CD player
308 would produce the encoded audio/video signals.
[0029] In either case, these signals would be sent to a decoding system 309. The decoder 309 would separate the signals into video, voice audio and background audio using standard decoding techniques, such as envelope detection in combination with frequency or time division demodulation. The background audio signal is sent to a separate variable gain amplifier 310, which the listener can adjust to his or her preference. The voice signal is sent to a variable gain amplifier 311, which the listener can adjust to his or her particular needs, as discussed above.
[0030] The two adjusted signals are summed by a unity gain summing amplifier 312 to produce the final audio output. Alternatively, the two adjusted signals are summed by unity gain summing amplifier 312 and further adjusted by variable gain amplifier 315 to produce the final audio output. In this manner the listener can adjust the relevant voice and background levels to optimize the audio program to his or her unique listening requirements at the time of playing the audio program. Because the preferred ratio may change each time the same listener plays the same audio, owing to changes in the listener's hearing, the setting remains infinitely adjustable to accommodate this flexibility.
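The FIG 2 playback chain after the decoder reduces to a few arithmetic steps. The sketch below assumes decoder 309 has already produced separate, equal-length lists of voice and background samples, and it assumes (the text does not specify a control law) that each variable gain amplifier is set in decibels. The function names are hypothetical; the reference numerals in the comments map the steps onto FIG 2.

```python
import math
from typing import List, Sequence


def db_to_gain(db: float) -> float:
    """Convert a gain setting in dB to a linear factor (-math.inf mutes)."""
    return 10.0 ** (db / 20.0)


def vra_mix(voice: Sequence[float], background: Sequence[float],
            voice_db: float, background_db: float, master_db: float = 0.0) -> List[float]:
    """FIG 2 chain after decoder 309: background gain (310), voice gain (311),
    unity-gain sum (312) and optional output gain (315)."""
    g_voice = db_to_gain(voice_db)
    g_back = db_to_gain(background_db)
    g_master = db_to_gain(master_db)
    return [g_master * (g_voice * v + g_back * b) for v, b in zip(voice, background)]


voice = [0.2, 0.4, -0.1]
background = [0.5, -0.6, 0.3]
louder_dialog = vra_mix(voice, background, voice_db=6.0, background_db=-6.0)
no_background = vra_mix(voice, background, voice_db=0.0, background_db=-math.inf)
```

Expressing the two controls in dB makes each step of the knob a fixed ratio change, and allowing minus infinity covers the extreme case of removing one component entirely.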
VRA and Auto VRA Embodiments
[0031] As stated above, the preferred ratio of voice to remaining audio differs significantly among people and among types of programming (sports versus music, etc.). FIG 4 is a block diagram illustrating a PVRA system for simultaneous multiple users according to an embodiment of the present invention. As shown, system 400 includes transceiver 221 and a plurality of playback devices, such as personal listening devices (PLDs) 220. Although only three PLDs are shown, more PLDs can be used without departing from the spirit and scope of the present invention.
[0032] Transceiver 221 includes a receiver component 223 and a transmitter component 222 and receives either a broadcast or a recorded signal 235. According to one embodiment of the present invention, signal 235 includes a separate voice component signal and a separate remaining audio component signal, transmitted to transceiver 221 simultaneously. These signals could be decoded by a decoder (not shown) before further processing. Alternatively, signal 235 could be processed by system components and circuitry in transmitter 222 such that a separate voice component 239 and a separate remaining audio component 240 are created.
[0033] The separate voice and remaining audio signal components are transmitted by transceiver 221 to each PLD by wireless or infrared transmission, or over multiple wires. The signals are received by PLD receiver 231, which may be, for example, an infrared receiver, a wireless radio frequency receiver, or a multi-port audio input jack for a wired connection. One output of PLD receiver 231, the received voice signal 239, is sent to a separate variable gain amplifier 229 that the end user can adjust to his or her preference. The other output, the received remaining audio signal 240, is sent to a variable gain amplifier 230 that the listener can adjust to his or her particular listening preference. These adjusted signals are summed by adder 228 and may be further adjusted by gain amplifier 227 before being forwarded to transducer 226. Transducer 226 converts the electrical signal from gain amplifier 227 into an audible acoustic audio signal 232.
[0034] As discussed above, the embodiment shown in FIG 4 transmits two (or more) signals, where at least one signal is a speech-only or mostly speech-only signal (voice) and the other signal(s) contain the remaining audio (which may also contain some speech). Even if the remaining audio contains some speech, the VRA ratio can still be made more positive, which will improve dialog intelligibility.
[0035] For multiple users in the same environment, a separate adjustment of the VRA can be accomplished if each user listens to the program on a personal listening device (PLD), which may include, but is not limited to, headphones, hearing aids, cochlear implants, assistive listening devices, and eyewear or headwear that incorporates speakers. Such eyewear may include, for example, eyeglasses fitted with speakers, or wearable computers. A PLD, as used in this context, means an audio reproduction device capable of receiving an electrical or wireless signal and converting it into audible sound in a manner that does not disturb other listeners in the same general environment.
[0036] After the two (or more) signals are received at the personal listening device, they are separately adjusted by independent volume controls (or other types of controls, as described later) so that a preferred VRA for that individual user is achieved. The signals are then combined, further amplified and adjusted, and transduced by the personal listening device into audible sound. Since a personal listening device does not interfere with others in the same listening environment who may also have personal listening devices (with different preferred PVRA settings), multiple listeners in the same environment can independently adjust the PVRA for their own listening pleasure. This is facilitated by transmitting the signals (wired or wirelessly) to everybody in the audience simultaneously. One possible application for this technology is a public movie theater, where multiple listeners can all enjoy the same movie with independent VRA adjustments on their headsets, ALDs, hearing aids or other personal listening devices, as discussed above. FIG 4 illustrates these points through the following brief descriptions.
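The point of FIG 4 is that every PLD receives the same two signals but applies its own gains. The following sketch models that with a hypothetical `PersonalListeningDevice` class whose fields stand in for amplifiers 229, 230 and 227; transmission, reception and transduction are idealized away, and the gain values are arbitrary examples.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PersonalListeningDevice:
    """Hypothetical PLD model: voice gain (229), remaining audio gain (230),
    adder (228) and output gain (227) ahead of the transducer."""
    voice_gain: float
    remaining_gain: float
    output_gain: float = 1.0

    def render(self, voice: Sequence[float], remaining: Sequence[float]) -> List[float]:
        """Apply this listener's own gains to the shared signals and sum them."""
        return [self.output_gain * (self.voice_gain * v + self.remaining_gain * r)
                for v, r in zip(voice, remaining)]


# The transceiver sends the same voice and remaining audio to every PLD...
voice = [0.2, 0.4, 0.2]
remaining = [0.5, -0.5, 0.5]

# ...and each listener applies a different preferred VRA at the same time.
audience: Dict[str, PersonalListeningDevice] = {
    "listener_a": PersonalListeningDevice(voice_gain=1.0, remaining_gain=1.0),
    "listener_b": PersonalListeningDevice(voice_gain=2.0, remaining_gain=0.25),
    "listener_c": PersonalListeningDevice(voice_gain=0.5, remaining_gain=1.0),
}
outputs = {name: pld.render(voice, remaining) for name, pld in audience.items()}
```

Because the adjustment happens after transmission, adding another listener only adds another device; nothing upstream has to change.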
Transmission Into Audience
[0037] In order for each end user to adjust the level of the voice independently of the remaining audio, the signals must either arrive at the personal listening device separately or arrive in such a way (possibly encoded) that the two signals can be separated prior to independent adjustment. The transmission of two signals, for example, can be accomplished using an FM stereo transmission in which the voice (or remaining audio) is transmitted on the left (or right) channel. If a stereo program is desired at the PLD, a more complex multi-channel transmission is required. If both the voice and the remaining audio have spatial information, a four-channel transmission (either wired or wireless) and reception is required to present the multi-channel program to the end user.
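The FM stereo option amounts to a fixed channel assignment. This sketch only models that assignment (voice on the left, remaining audio on the right) over an idealized, lossless stereo link; FM modulation, pilot tones and noise are not simulated, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Tuple[float, float]  # one (left, right) sample pair on the stereo link


def pack_for_stereo_link(voice: Sequence[float], remaining: Sequence[float]) -> List[Frame]:
    """Transmitter side: voice on the left channel, remaining audio on the right."""
    if len(voice) != len(remaining):
        raise ValueError("voice and remaining audio must have the same length")
    return list(zip(voice, remaining))


def unpack_at_pld(frames: Sequence[Frame]) -> Tuple[List[float], List[float]]:
    """PLD side: split the stereo pair back into voice and remaining audio."""
    voice = [left for left, _ in frames]
    remaining = [right for _, right in frames]
    return voice, remaining


frames = pack_for_stereo_link([0.1, 0.2, 0.3], [0.9, 0.8, 0.7])
voice, remaining = unpack_at_pld(frames)
```

Once separated, the two lists can be fed to independent gains in the same way as the PLD chain of FIG 4.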
[0038] FIG 5 illustrates one possible embodiment of this multi-channel transmission according to the principles of the present invention. The left voice and right voice programs are multiplexed (or otherwise coded) together by a multiplexer 9, as are the remaining audio left and right programs by a multiplexer 10. This allows a two-channel stereo transmission, sent by a transmitter 11 through wired or wireless means 12, to be received by a stereo receiver 13. The four signals are then derived and adjusted 16 independently to form a total left 17 and right 18 program with spatial information from the voice and remaining audio signals. There are many possible ways to transmit these signals for separate adjustment while retaining spatial information. Others might include the transmission of a left and right remaining audio program along with a separate mono voice channel (since vocal information is predominantly non-spatial).
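The text leaves open exactly how multiplexers 9 and 10 combine each left/right pair. As one hypothetical choice, the sketch below uses simple sample interleaving (time-division multiplexing), which doubles the sample rate each link channel must carry. It then recovers the four signals and forms the total left and right programs with a single voice gain and a single remaining audio gain; the signal values and gain settings are made up for the example.

```python
from typing import List, Sequence, Tuple


def interleave(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Time-division multiplex two equal-length signals: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("signals must have equal length")
    stream: List[float] = []
    for left_sample, right_sample in zip(left, right):
        stream.extend((left_sample, right_sample))
    return stream


def deinterleave(stream: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Inverse of interleave: even samples are left, odd samples are right."""
    return list(stream[0::2]), list(stream[1::2])


# Multiplexer 9 carries the voice pair, multiplexer 10 the remaining audio pair.
voice_l, voice_r = [0.1, 0.2], [0.3, 0.4]
rem_l, rem_r = [0.5, 0.6], [0.7, 0.8]
link = (interleave(voice_l, voice_r), interleave(rem_l, rem_r))

# Receiver side: recover the four signals, adjust them, then combine per side.
vl, vr = deinterleave(link[0])
rl, rr = deinterleave(link[1])
g_voice, g_rem = 1.5, 0.5  # hypothetical user settings
total_left = [g_voice * v + g_rem * r for v, r in zip(vl, rl)]
total_right = [g_voice * v + g_rem * r for v, r in zip(vr, rr)]
```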
Center Channel Adjustment
[0039] As an extension to the above discussion, the transmission of the center channel of multi-channel programming
is also considered to be related to the VRA adjustments capability. For most multi-channel programming, the center
7
EP 1 266 469 B1
5
10
channel contains most of the dialog in a movie, while most sound effects and music are directed to one or more of the other
4.1 audio channels. A purely voice channel is currently not made available to the general public. Therefore, until a pure voice
channel is made available for most broadcasting and recording, the center channel can serve as the voice channel described
above. The receiver in FIG 4 could therefore be a multi-channel sound decoder such as a Digital Theater Systems (DTS), Sony
Dynamic Digital Sound (SDDS), Dolby Digital, or other multi-channel format decoder. Such a decoder 19, shown in FIG 6, converts
a digital input into left, right, left surround, right surround, center and subwoofer analog outputs. The mixer 20 can combine
all channels but the center (in varying ratios depending on the desired spatial effect) so that the output is either a stereo or
mono signal that is transmitted 21 separately and simultaneously with the center channel, which approximates a dialog-only
channel. The reception can be carried out as shown in FIG 4.
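As a rough illustration of the FIG 6 mixer, the sketch below separates the center channel (used as the voice stand-in) from a stereo fold-down of all the other channels. The `Frame51` container, the function name and the default mix coefficients are hypothetical choices for illustration only; real downmix coefficients would depend on the desired spatial effect.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Frame51:
    """One sample of decoded 5.1 audio (hypothetical container, not a decoder API)."""
    left: float
    right: float
    center: float
    left_surround: float
    right_surround: float
    lfe: float


def split_voice_and_remaining(
    frame: Frame51, surround_mix: float = 0.707, lfe_mix: float = 0.5
) -> Tuple[float, float, float]:
    """Return (voice, remaining_left, remaining_right) for one sample.

    The center channel approximates the dialog-only channel; every other channel
    is folded into a stereo remaining-audio pair, as mixer 20 does in FIG 6.
    The mix coefficients are illustrative assumptions, not values from the source.
    """
    voice = frame.center
    remaining_left = frame.left + surround_mix * frame.left_surround + lfe_mix * frame.lfe
    remaining_right = frame.right + surround_mix * frame.right_surround + lfe_mix * frame.lfe
    return voice, remaining_left, remaining_right
```

The voice and the remaining-audio pair would then be transmitted separately, as with transmitter 21.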
Decoder Inside Personal Listening Device and Provision for Spatial Processing
[0040] It should be noted that although the embodiment in FIG 6 combined with FIG 4 implies transmission of analog
signals to the PLD with the receiver or multi-channel decoder in a centralized location, it is not beyond the scope of this
invention to include the multi-channel decoder in the PLD, where the transmitted signal is a digital signal that requires
decoding in order to extract the voice and remaining audio. FIG 7 illustrates this concept. A digital signal that is read
from a media source (by way of example, a DVD, CD, TIVO or Replay recorder, etc.) or received from a broadcast (as in digital
television or digital radio) is transmitted 22 directly to the PLD 28. The PLD has a built-in receiver 23 that receives the
infrared, radio or other broadcast signal and feeds the decoder 24, which is designed to meet the decoding specification of
the compression format it is intended to handle (by way of example, Dolby Digital or DTS). The mixer 26 uses the outputs of
the decoder to generate the remaining audio and voice signals (either a pure voice signal or the center channel signal), which
are adjusted separately by the user with gain amplifiers and/or attenuators 25, recombined as disclosed above, and transduced
27 to audible audio that is the output of the PLD. This particular embodiment may make a multi-channel audio presentation at
the PLD more feasible, because the transmitted signal is digital (less susceptible to interfering noise) and requires only one
transmission channel; however, it may be more expensive, since the decoding is done in each individual PLD rather than at a
centralized location. The multi-channel presentation may include any signal processing that spatially relocates the left,
left surround, right, right surround and/or center aural presentations so that they feel more natural in a PLD such as, by
way of example, a headset. The VRA adjustment is intended to work in conjunction with this type of processing in order to
provide improved dialog intelligibility without affecting any spatial processing done in addition to the user adjustable VRA
mixture.
[0041] FIG 8 provides further detail of one possible embodiment that includes signal processing for multi-channel
presentation. The receiver 29 and decoder 30 are located either centrally or on the PLD, depending on the embodiment
preferred for implementation. The center channel or other pure voice channel is adjusted separately 31 before the spatial
processing occurs, as is the level of all remaining audio 32. The spatial processing 33 then receives the multi-channel
presentation (or in some cases a two channel presentation) as it was originally recorded and generates a more realistic sound
stage for the PLD. In this case the spatial processing 33 is not affected by the VRA adjustment, but the user can still select
the desired level of the voice with respect to the remaining audio.
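The FIG 8 ordering (VRA gains first, spatial processing second) can be sketched as follows. The channel names and the fixed fold-down weights in `toy_spatializer` are hypothetical placeholders for spatial processing 33; a real system would use something like HRTF filtering. The point shown is only that the spatializer itself is identical whatever VRA setting is chosen.

```python
from typing import Dict, Tuple

Channels = Dict[str, float]  # one sample per named channel (hypothetical names)


def apply_vra(channels: Channels, voice_gain: float, ra_gain: float, voice_key: str = "C") -> Channels:
    """Scale the voice (center) channel (31) and all remaining-audio channels (32) separately."""
    return {
        name: value * (voice_gain if name == voice_key else ra_gain)
        for name, value in channels.items()
    }


def toy_spatializer(channels: Channels) -> Tuple[float, float]:
    """Placeholder for spatial processing 33: a fixed headphone fold-down.

    The weights are illustrative assumptions, not values from the source.
    """
    weights = {"L": (1.0, 0.0), "R": (0.0, 1.0), "C": (0.5, 0.5), "Ls": (0.7, 0.3), "Rs": (0.3, 0.7)}
    out_l = sum(weights[k][0] * v for k, v in channels.items() if k in weights)
    out_r = sum(weights[k][1] * v for k, v in channels.items() if k in weights)
    return out_l, out_r


def fig8_chain(channels: Channels, voice_gain: float, ra_gain: float) -> Tuple[float, float]:
    # The VRA adjustment happens first; the spatializer is not changed by it.
    return toy_spatializer(apply_vra(channels, voice_gain, ra_gain))
```

For example, with the remaining audio muted (`ra_gain=0.0`) only the center channel reaches the headset, still split evenly between the ears by the unchanged spatializer.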
"Volume Controls" enabled by VRA purpose (attenuator)
[0042] The physical adjustment mechanisms for the voice, the remaining audio and the total volume have many possible
embodiments. The most general adjustment mechanism occurs when the voice has a user adjustable gain, the remaining audio
signal has a user adjustable gain, and the summed signal has a further gain adjustment (total volume control). This is shown
in detail in FIG 4. When a user is attempting to set the VRA ratio in addition to a desired overall level for comfort, a further
embodiment provides a more user friendly adjustment mechanism with fewer steps in the adjustment process. In most
entertainment, dialog is the target sound around which the program centers. Therefore, as shown in FIG 9, the overall level of
the dialog will control the loudness of the program, i.e. the overall program level is usually set based on the level of the
dialog. By having only two controls (a total volume level adjustment and a remaining audio attenuator) the user can select the
desired PVRA and total volume level with a simple two-step process. First, the overall level is set with the total volume
adjustment 37 (in FIG 9), thereby adjusting the level of the voice in the main program. At this point the dialog is at the
desired listening level and only the VRA needs to be set. Because there is only an attenuator on the remaining audio, the
intelligibility can be improved to a theoretical 100% by turning down the remaining audio without affecting the voice level.
In addition, an attenuator can be implemented using a variable voltage divider, which requires no power but still allows the
user to adjust to all VRA ratios greater than 0 dB. To conserve additional power, the main volume adjustment 37 following the
summer 36 can also be implemented as an attenuator, provided the amplifier 38 has been designed with sufficient gain to power
the transducer 39 at the loudest volume level when the attenuator 37 is passing the full signal with no voltage division.
As an additional embodiment, the volume control 35 can be placed on the voice instead of the remaining audio, allowing the user
to control the overall program level as a function of the remaining audio rather than the dialog. It would not be desirable
to place an attenuator on the voice, because positive PVRA ratios could not then be achieved. Instead, if this embodiment were
implemented, it would be necessary to place an active gain stage on the voice so that its level could be raised sufficiently
above the unaffected remaining audio to provide a sufficiently positive VRA ratio. The overall loudness is then controlled by
the total volume control as before.
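The arithmetic behind the two-control FIG 9 arrangement can be sketched as below, assuming ideal resistive dividers and a linear signal model. Function names and parameters are hypothetical. It shows why a passive remaining-audio attenuator can only raise the VRA relative to the original mix (0 dB or more), and why the total volume attenuator needs amplifier 38 downstream to reach full loudness.

```python
import math


def divider_ratio(r_top: float, r_bottom: float) -> float:
    """Unpowered voltage divider: Vout / Vin = R_bottom / (R_top + R_bottom), never above 1."""
    return r_bottom / (r_top + r_bottom)


def fig9_output(voice: float, remaining: float, ra_atten: float, total_atten: float, amp_gain: float) -> float:
    """Voice passes unaltered, remaining audio is attenuated, the sum (36) is
    attenuated by the total volume control (37) and then amplified (38)."""
    for a in (ra_atten, total_atten):
        if not 0.0 <= a <= 1.0:
            raise ValueError("a passive attenuator passes between 0% and 100% of the signal")
    return amp_gain * total_atten * (voice + ra_atten * remaining)


def vra_boost_db(ra_atten: float) -> float:
    """VRA relative to the original mix; always >= 0 dB, infinite when the RA is fully cut."""
    if ra_atten == 0.0:
        return math.inf
    return -20.0 * math.log10(ra_atten)
```

For example, a 9 kΩ / 1 kΩ divider passes 10% of the remaining audio, which raises the VRA by 20 dB while leaving the dialog level untouched.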
"Ratio Balance" Realized in a Single Dial
[0043] A further embodiment for VRA and total volume adjustment according to the present invention is provided with
a VRA knob used as a single adjustment with two inputs. This single knob adjustment, unlike the dual knob level controls and
the single knob attenuator, can adjust the balance between the voice and remaining audio. FIG 10 illustrates the voice and
remaining audio being summed and continuously adjusted by a single control 40 and further adjusted with a total audio gain
control (active or attenuator) 41. The balance control itself is well known from the adjustment of front-to-back fade or
right-to-left balance on car or home stereo systems. The key difference in this application is that it adjusts the ratio of
the content rather than the location of the audio on individual speakers. In fact, further balance style controls can be
implemented in order to adjust the spatial positioning of the audio if desired. By implementing the single knob VRA control,
the user can completely adjust the VRA (all ratios are available) using a single knob. The overall volume can then be adjusted
based on the desired level.
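A minimal sketch of the FIG 10 balance control 40 follows, assuming a knob position normalized to the range -1 to +1 and a linear law (a real control might use a dB taper). The function names are hypothetical. Unlike a left/right balance, the two outputs here are content (voice and remaining audio), not speaker locations.

```python
from typing import List, Tuple


def balance_gains(position: float) -> Tuple[float, float]:
    """Map one balance knob to (voice_gain, remaining_gain).

    position = -1.0: voice only; 0.0: original mix; +1.0: remaining audio only.
    """
    p = max(-1.0, min(1.0, position))
    voice_gain = 1.0 - max(0.0, p)
    remaining_gain = 1.0 - max(0.0, -p)
    return voice_gain, remaining_gain


def single_knob_mix(voice: List[float], remaining: List[float], position: float, total_gain: float = 1.0) -> List[float]:
    """Sum the two signals using the knob gains, then apply total gain control 41."""
    g_v, g_ra = balance_gains(position)
    return [total_gain * (g_v * v + g_ra * r) for v, r in zip(voice, remaining)]
```

Every VRA ratio, from voice only to remaining audio only, is reachable from this one control, and the overall level remains a separate adjustment.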
AutoVRA
[0044] The AutoVRA hold feature allows the end user not only to adjust the desired ratio of voice to remaining audio
but also to "lock in" that ratio with respect to transient volume changes in either the voice or the remaining audio. For
example, a football game contains dialog from the announcers and background noise from the cheering fans. If the desired VRA
is set during a point in time when the fans are relatively quiet, it is possible that as the fans become louder (but the
announcers stay at the same level) the crowd noise could obscure the voice of the announcers. Likewise, if the VRA is set
during a segment when the announcers are speaking very loudly, it is possible that when the announcers return to a normal
speaking volume, the level is too low for good intelligibility.
VRA Technique Based on Standard Deviation
[0045] To spare the user from continually adjusting these levels, the user can press a button after the ratio has been
set, and that ratio will be stored and maintained for the rest of the programming. One method for accomplishing this is as
follows. At the moment the button is selected, the standard deviations of the voice signal and the remaining audio signal are
computed and stored. Then, as the program continues, the standard deviation of each signal is computed in real time. If that
deviation exceeds the stored value, the signal is multiplied by the ratio of the stored value over the actual value, thereby
lowering the volume. Likewise, if the deviation is too far below the stored value, the signal can be multiplied by the same
ratio to raise the level. If it is desired to raise the level (when the actual deviation is below the stored deviation), it is
necessary to detect segments when the signal is not present so that the noise floor is not amplified unnecessarily; otherwise
the ratio could approach infinity when the actual deviation is near zero. The most general form of the AutoVRA method discussed
here is shown in the following equation, wherein:
G1 = volume control of the voice;
G2 = volume control of remaining audio;
G3 = total volume control;
V = voice;
RA = remaining audio;
σVactual = standard deviation of the actual voice;
σRactual = standard deviation of the actual remaining audio;
σVstored = standard deviation of the stored voice;
σRstored = standard deviation of the stored remaining audio.
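The equation itself did not survive in this copy of the text. A plausible reconstruction from the definitions above, assuming the gains G1 and G2 are applied after the deviation-based scaling and G3 acts on the sum, is:

```latex
\text{Output} = G_3\left[\, G_1\,\frac{\sigma_{V\text{stored}}}{\sigma_{V\text{actual}}}\,V \;+\; G_2\,\frac{\sigma_{R\text{stored}}}{\sigma_{R\text{actual}}}\,RA \,\right]
```

Each ratio falls below 1 when the corresponding signal is louder than it was when the hold was set, pulling that signal back toward its stored level.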
[0046] The standard deviation of each of the respective signals (voice and remaining audio) is stored and compared to
the actual standard deviation in real time. Here, standard deviation is used as a measure of the level of each of the
signals; other metrics may also be used, including the peak level over a period of time. In order to control the volume
adjustment and its effect on the overall output level, it may be desirable to compute the standard deviations after the
gains G1 and G2 have been applied to the signal. The results are slightly different in that, after the standard deviations
have been stored, further volume adjustments on the voice and remaining audio will be ineffective until a new stored value is
entered. If this is a desirable feature, then the deviation computations for V (voice) and RA (remaining audio) should include
the user selectable gains G1 and G2. If further adjustment is desired, then the gains can be applied after the deviation
computation and multiplication as shown above.
[0047] FIG 11 illustrates these concepts in further detail. The voice and remaining audio signals each undergo the same
operations separately. It should be noted that a simpler and very effective implementation of this concept is to remove the
operations performed on the voice signal and to modify only the remaining audio as its standard deviation changes. The
reasoning behind this is to halve the required computational overhead (by removing the computation performed on the voice
signal), on the assumption that variation in the dialog channel is small compared to the variation that might exist on the
remaining audio channel. Nevertheless, the most general implementation is presented in FIG 11, showing operations on both the
voice and remaining audio signals. The dotted lines indicate alternative options that are not to be used in conjunction with
the solid lines, but do offer differences in performance as described in the previous paragraph. The user adjustable dialog
gain 45 can be applied either before (using element 46) or after (using element 44) the standard deviation calculation occurs.
When a desired performance is selected by the user at an instant in time, the standard deviations of the voice and remaining
audio are stored to a memory location (47 and 47A) that can be volatile or non-volatile memory. This stored value is used in
the numerator of the multiplication process 48 and 48A for each signal, while the denominator is the current actual standard
deviation, whether measured before or after the user adjustable gain stage. (Note that the solid and dotted paths are not
implemented at the same time.) Not shown is a conditional that decides whether the current ratio is higher or lower than
unity. If it is lower than unity, the current actual level is higher than the stored level and the volume should be reduced by
the ratio. If it is higher, it may be desirable to do nothing and simply pass the signal affected only by the user adjustable
gain value (this requires an "if" style statement that checks the ratio against a condition and makes a decision). This
prevents very large ratios from multiplying lower level signals, which would result in very high noise levels for quiet
passages. Furthermore, a lower limit may be set on the ratio through a further condition that will allow moderately low levels
to be amplified accordingly, but very low levels (or absence) of signal to be left unmodified or modified by the last value of
the ratio before the condition was violated.
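The hold logic and the conditionals described above can be sketched for block-based processing as follows. Several things here are assumptions rather than details from the source: audio arrives in fixed-size blocks, `statistics.pstdev` serves as the level metric, and `floor` is a hypothetical threshold below which a block is treated as silence. The class and function names are likewise illustrative.

```python
import statistics
from typing import List, Optional


class AutoVraHold:
    """Ratio hold for one signal path (voice or remaining audio), after FIG 11."""

    def __init__(self, floor: float = 1e-3) -> None:
        self.stored_sigma: Optional[float] = None  # memory 47 / 47A
        self.floor = floor                         # hypothetical silence threshold
        self.last_ratio = 1.0

    def store(self, block: List[float]) -> None:
        # Called when the user presses the hold button.
        self.stored_sigma = statistics.pstdev(block)

    def process(self, block: List[float], allow_boost: bool = False) -> List[float]:
        if self.stored_sigma is None:
            return list(block)  # hold not engaged: pass the signal through
        actual = statistics.pstdev(block)
        if actual < self.floor:
            ratio = self.last_ratio  # near silence: reuse last ratio instead of a huge gain
        else:
            ratio = self.stored_sigma / actual  # multiplication 48 / 48A
            if ratio > 1.0 and not allow_boost:
                ratio = 1.0  # quieter than stored: leave the level alone
            self.last_ratio = ratio
        return [ratio * x for x in block]


def auto_vra_mix(voice, remaining, voice_hold, ra_hold, g1, g2, g3):
    """User gains G1, G2 applied after the hold, then total volume G3."""
    v = voice_hold.process(voice)
    r = ra_hold.process(remaining)
    return [g3 * (g1 * a + g2 * b) for a, b in zip(v, r)]
```

The simpler variant mentioned above corresponds to leaving the voice hold disengaged (never calling `store` on it), so only the remaining audio is tracked.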
Storage of Distinct VRA and AutoVRA Settings
[0048] Storing user preferred ratio levels in 47 and 47A is beneficial for controlling the sound over different types
of programming or for different listeners using the same hardware/software. Since different users prefer different VRA
settings and the respective audio levels may change for different types of programming, it is considered necessary to provide
multiple storage areas for different types of programming and different users. Applying, by way of example, a name or a
password to each storage location will allow different users to recall different VRA settings for specific programs.
Depending on the method used from FIG 11, the storage elements may include the desired voice level, the desired voice standard
deviation, the desired remaining audio level, and/or the desired remaining audio standard deviation. This will allow a user to
return to the playback device and find the same setting (possibly a different one for sports than for sitcoms) without
readjusting the VRA levels and resetting the Hold feature. No limit is specified on the number of storage locations that can
be available on the playback device. FIG 11 illustrates the user adjustments, by way of example, as buttons that select the
current standard deviations to be the stored standard deviations. In addition, there are user controls over G1, G2, and G3.
There are several methods to provide these adjustments to the end user, depending on the hardware they are used on. A headset,
for example, might have several buttons to store different ratios and select those ratios depending on how long the button is
held. If these controls are used with a personal computer, personal data assistant, or cell phone, they might be graphical user
interface controls implemented in software. In order to further simplify the adjustments, it is possible to combine all of the
adjustments (VRA and AutoVRA ratio hold) into a single control. The ratio of the remaining audio to the voice can be controlled
as in FIG 10 with a single balance control.
However, in order to implement the AutoVRA feature described by FIG 11, it is necessary to modify the performance of
the knob by adding the ratio hold.
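Multiple storage locations keyed by user and program type might be organized as in the sketch below. The `VraPreset` fields mirror the storage elements listed above; the class names, the in-memory dictionary and the (user, program type) key are hypothetical choices. A real device might instead use numbered non-volatile memory slots.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VraPreset:
    """Settings one storage location might hold; which fields matter depends on the FIG 11 variant."""
    voice_gain: float
    remaining_gain: float
    voice_sigma: Optional[float] = None      # stored voice standard deviation (47)
    remaining_sigma: Optional[float] = None  # stored remaining-audio standard deviation (47A)


class PresetStore:
    """Hypothetical in-memory store keyed by (user name, program type)."""

    def __init__(self) -> None:
        self._presets: Dict[Tuple[str, str], VraPreset] = {}

    def save(self, user: str, program_type: str, preset: VraPreset) -> None:
        self._presets[(user, program_type)] = preset

    def recall(self, user: str, program_type: str) -> Optional[VraPreset]:
        # Returns None when this user has not stored a setting for this program type.
        return self._presets.get((user, program_type))
```

With this structure, one listener can keep a lower remaining-audio setting for sports than for sitcoms, and a second listener's presets never overwrite the first listener's.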
[0049] FIG 12 is a diagram that illustrates the various functions of a slider control designed to accomplish all of the
functions using a single control. (Note that this can be any type of control, including a rotary knob, software control,
incremental push button, etc., but the functionality will be the same.) The center position of the VRA/AutoVRA control
provides the user with the original mix, where voice and remaining audio are approximately equal. As the knob is gradually
moved to the left, the voice level does not change but the remaining audio level begins to decrease, although the hold function
has not yet engaged. At some predetermined distance away from the dead center position (N denotes this distance and may be as
small as zero if desired), the value to which the standard deviation is compared begins decreasing with movement of the knob,
i.e. compression of the remaining audio begins. This continues until the stored standard deviation (changing with knob
movement) becomes so small that the division results in a number near zero, the output of the multiplication is essentially
zero, and only voice remains. The reverse occurs at the other end of the knob, where the voice is reduced in the same way.
[0050] FIG 13 illustrates the block diagram of the all-in-one knob shown in FIG 12. The knob controls the stored value
of the standard deviation of the remaining audio after the knob passes the point N on the left side of the dial. Likewise,
the stored standard deviation of the voice is adjusted by moving the knob to the far right. One possible alternative to the
embodiment shown in FIG 13 is to eliminate the voice AutoVRA control, thereby only lowering the actual level of the voice as
the knob moves to the left. (The argument for such an embodiment is presented in preceding paragraphs.) Referring to the block
diagram in FIG 13, the remaining audio standard deviation is computed 52 and compared 53 with the stored remaining audio
standard deviation 56, which is controlled by moving the knob 57 to the left past the point N. If the actual standard deviation
exceeds the stored standard deviation, then the remaining audio is multiplied by the stored value and divided by the actual
value before it is modified by the volume level of the remaining audio 55, which is also controlled by the main knob 57. If it
is not greater, then the remaining audio is simply multiplied by the current knob setting 55 before being combined with the
adjusted voice. As the knob is moved further to the right, the same operations occur with the voice track. This single knob
adjustment is particularly useful in applications where space is an issue, such as a headset or hearing aid. It provides all
the functionality of multiple controls, while only the single knob is needed to effect all of the adjustments.
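The all-in-one knob of FIGs 12 and 13 can be sketched as below. Several details are assumptions: the knob position is normalized to -1 (left) through +1 (right); the gains follow a linear balance law; past the point `n` the stored standard deviation equals a reference value times a scale that falls linearly to 0 at the end stop; and the reference value is supplied by the caller (for example, the deviation measured when the knob crossed N). All function names are hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def knob_settings(position: float, n: float = 0.3) -> Tuple[float, float, Optional[float], Optional[float]]:
    """Map one knob position to (voice_gain, ra_gain, voice_hold_scale, ra_hold_scale).

    A hold scale of None means the hold is not engaged for that path.
    """
    if not 0.0 <= n < 1.0:
        raise ValueError("n must lie in [0, 1)")
    p = max(-1.0, min(1.0, position))
    voice_gain = 1.0 - max(0.0, p)   # voice lowered toward the right
    ra_gain = 1.0 - max(0.0, -p)     # remaining audio lowered toward the left (setting 55)
    ra_scale = (1.0 + p) / (1.0 - n) if p < -n else None
    voice_scale = (1.0 - p) / (1.0 - n) if p > n else None
    return voice_gain, ra_gain, voice_scale, ra_scale


def _hold(block: List[float], scale: Optional[float], reference_sigma: float) -> List[float]:
    """Compare (53) the actual deviation (52) with the stored one (56); compress only if louder."""
    if scale is None:
        return block
    stored = reference_sigma * scale
    actual = statistics.pstdev(block)
    if actual > stored:  # implies actual > 0, so the division is safe
        return [x * stored / actual for x in block]
    return block


def single_knob_block(voice, remaining, position, ref_sigma_voice, ref_sigma_ra, n=0.3):
    g_v, g_ra, s_v, s_ra = knob_settings(position, n)
    v = _hold(voice, s_v, ref_sigma_voice)
    r = _hold(remaining, s_ra, ref_sigma_ra)
    # Knob-controlled gains are applied after the hold, then the paths are summed.
    return [g_v * a + g_ra * b for a, b in zip(v, r)]
```

At the center position both gains are unity and neither hold is engaged, which reproduces the original mix.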
Further VRA Consumer Applications
[0051] Other VRA consumer applications may include:
• A portable "belt box" that receives and transmits an adjusted and/or controlled signal to an acoustic transducer
• A retrofittable device used to facilitate PVRA adjustment for airline in-flight movies
• A separate audio decoder that can be used in conjunction with existing home theater hardware to provide additional
VRA adjustments for multi-user applications
• Headphone VRA adjustments, remote commander with VRA adjustment capability
VRA hardware has been designed specifically to provide the VRA adjustment capability, whereas the descriptions in the
following section explain how the VRA hardware can be integrated into existing audio reproduction hardware. This, however, does
not preclude the hardware described in this section from being used in conjunction with existing audio reproduction hardware.
In fact, as will be seen, the VRA specific hardware has been designed to interface specifically with existing audio
reproduction hardware such as a television or home theater system. It should also be noted that each of the specific
embodiments discussed in the preceding section can be directly applied to each of the inventions discussed in this section to
form a new user friendly invention for adjusting the PVRA ratio. For example, the first invention discussed in this section is
a portable electronic component that can receive two (or more) signals, one being voice and the others being remaining audio,
combine and adjust those signals, and retransmit them to, for example, an inconspicuous headset, ALD, hearing aid, earphone,
cochlear implant, or eyewear or headwear incorporating speakers. The single knob invention discussed in the previous section
and detailed in FIG 9 could be included in this portable component to provide the end user with a single adjustment capability
in a portable format. However, each of the respective combinations of technologies will not be discussed in detail; rather, they
can be inferred from the VRA methodologies described above (operating on two signals) combined with VRA hardware that receives
two signals.
Portable Voice-to-Remaining Audio (PVRA) Devices
[0052] As used herein, PVRA refers to portable VRA devices used in a variety of environments in conjunction with
standard PLDs (personal listening devices such as, by way of example, headsets, hearing aids, cochlear implants and
eyewear or headwear incorporating speakers, etc.). The PVRA device is capable of receiving wireless (or wired) transmissions
from a source that provides at least two signals, one of which is pure or mostly pure dialog and the other of which is remaining
audio. (More channels may also be included for further spatial positioning capability, as described in the previous section.)
The transmission 58 in FIG 14 may be standardized to a certain bandwidth and low power so that the PVRA device 59 can be used
in a variety of environments. The band might be 900 MHz for radio transmission, or a line-of-sight infrared type of
transmission may be standardized instead. Once providers agree on a standard wireless transmission format, venues such as
churches and movie theaters may transmit the voice and remaining audio into the audience. The PVRA can be a universal player,
designed to receive 60 these signals, adjust the voice 61,68 separately from the remaining audio 62,69, combine them to form a
total content program 64,70, and retransmit them 63 via a wired or wireless connection to the PLD 67, which has a receiver 65
and a transducer 66 for converting the signal into audible sound. The methods for adjustment are described in detail in the
previous section and include variable gain amplifiers or attenuators, and may also include the AutoVRA hold capability. The
PVRA box can become a standardized component that works with many existing PLDs by incorporating, by way of example, a stereo
headphone jack at the transmit stage 63. This embodiment may also require a headphone amplifier in the PVRA device.
Additionally, by way of example, a standardized plug for connecting a wired hearing aid to the PVRA can also be included. To
make the PVRA compatible with home theater equipment, all that is needed is a stereo transmitter where one channel is the voice
and the other is the remaining audio, with the receiver tuned to receive the two signals.
[0053] As a subset of the above description of the PVRA device, another device disclosed herein is a VRA/AutoVRA
adjustment intended for use in conjunction with airline in-flight movie viewing. Dialog intelligibility can be particularly
poor with in-flight movies, where the background noise from the airplane further obscures the movie dialog. By providing the
end user with the ability to adjust the voice and remaining audio separately, improved intelligibility can be realized during
in-flight entertainment. It is possible to achieve this goal without interfering with the existing infrastructure for audio
transmission. It must be assumed that the audio source (VCR, DVD, broadcast, or other audio source) has a dialog track separate
from the remaining audio track or tracks. This can be accomplished in several ways, one of which is by using the center channel
of multi-channel formats; alternatively, a pure voice track that may exist in several audio compression standards can be used.
(Generating the pure voice track is not the focus of this particular invention, but rather the hardware and implementation
used to adjust it and deliver it to the end user.) The airline audio delivery infrastructure includes a stereo (2-channel)
path to the end user, which is implemented via either (1) an electronic transfer of information to the armrest of every chair
(requiring standard headphones with an airline standard connector) or (2) a waveguide system with tiny speakers in the armrest
that, when connected to a plastic tube, transmit sound to the user's ears. To implement the retrofit airline VRA armrest
adjustment device, it is necessary to assume that the voice is sent on either the left or right channel, while the remaining
audio is sent on the other channel, to all the armrests in the airplane. While this removes the stereo effect, it is viewed as
a minor sacrifice given the potential improvement in overall program enjoyment. In addition, the airplane noise will often mask
the subtle stereo effects during in-flight entertainment. Since the electronic and waveguide methods are so different, two
different embodiments are required, and they are shown in FIGs 15 and 16. However, if a universal adjustment mechanism is
desired, the components of FIGs 15 and 16 can be combined into a single hardware unit that will work with any airline armrest.
[0054] FIG 15 is an illustration of the version of the airline box that can be used with the electronic connections in the
airline armrests. Plugs 71 may, for example, be standardized male plugs designed to fit into the armrest, connecting to the
left and right signals sent from the central location. These signals are then adjusted 72,73 and 74,75 to achieve the
preferred VRA ratio from the voice (left) and remaining audio (right). These adjusted signals are then combined to form the
total audio program and further adjusted by 77 and 78. An amplifier 79 is required to power the transducers in the PLD. The
outputs 80 of the airline VRA box consist of connectors, for example mating female connectors, identical to those that exist
in the current armrests, so as to allow the airlines to use their existing headsets as the PLD device.
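The FIG 15 signal chain reduces to a few multiplications per sample, sketched below under the left = voice, right = remaining audio convention the text requires. Collapsing each pair of adjustment elements (72,73 and 74,75) into a single gain, and sending the same mono program to both earpieces, are simplifying assumptions; the function name is hypothetical.

```python
from typing import List, Tuple


def airline_vra_box(
    left_in: List[float],
    right_in: List[float],
    voice_gain: float,
    ra_gain: float,
    total_gain: float,
    amp_gain: float = 1.0,
) -> List[Tuple[float, float]]:
    """Per-sample sketch of the FIG 15 airline box.

    left_in carries the voice and right_in the remaining audio. The two paths are
    scaled separately, summed, scaled for overall level (77, 78) and amplified (79).
    """
    out = []
    for voice, remaining in zip(left_in, right_in):
        sample = amp_gain * total_gain * (voice_gain * voice + ra_gain * remaining)
        out.append((sample, sample))  # same program to left and right earpieces
    return out
```

Setting `ra_gain` to 0 gives a dialog-only feed, which is the extreme case of the intelligibility improvement described for noisy cabins.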
[0055] FIG 16 illustrates the airline VRA adjustment box that can be used in conjunction with the waveguides in the
existing armrests. In order to adjust the levels of each of the two signals (voice and remaining audio), it is necessary to
convert the signals back into electronic format. Two microphones 82,83 and microphone amplifiers 84, 85 are located
in the device which measure the output of the armrest speakers that normally drive the waveguides. The output of the
amplifiers represents an electronic signal that is the voice or remaining audio. These signals are independently adjusted
via 86,87 and 88,89 and a total signal is created 90. This is further adjusted for overall level 91,92 and used to drive
another speaker 93. A waveguide and output jacks 94 identical to those in the armrest represent the output of the airline
PVRA box so that the standard waveguide style headphones can be used in conjunction with this design.
Wireless Transmitter in DVD, Television, etc. to Transmit Two Signals
[0056] Another application for multi-user VRA adjustment arises when home theaters as well as home television and
movie viewing are considered. Often there are multiple viewers in a single room with different listening preferences for
voice to remaining audio ratios. This invention allows multiple signals to be made available to PLDs worn
by multiple users so that each individual can adjust the VRA (and AutoVRA) to his liking. As before, by way of example, the
audio source (television broadcast, DVD player, etc.) contains at least one track that can be considered pure voice or mostly
voice, along with at least one track that is considered to contain remaining audio. (Different audio standards and formats may
support some form of a pure voice track in the future.) What is anticipated by this invention is the need for multiple users in
the same listening environment to access the at least two audio signals separately and simultaneously, allowing each of them
to set the ratio for his own personal listening pleasure. In the first and most preferred embodiment (for space and cost
considerations), a multi-channel wireless transmitter is located inside the audio reproduction hardware (such as a television
or DVD player) and sends the voice and remaining audio signals separately into the audience, so that listeners with VRA capable
PLDs that can receive the transmission can adjust the VRA independently. In addition to locating a wireless transmitter inside
the audio reproduction device, separate audio output jacks can be made available that provide access to the remaining audio
(either mono or multi-channel) and the pure voice (mono or multi-channel) for the purpose of wired adjustment by a separate PLD
without wireless reception capability. Where the audio jacks are available without a wireless transmitter in the television
set, DVD player, or other device, the user can connect a separate multi-channel wireless transmitter to those output jacks to
make the separate audio signals available to the audience. FIG 17 illustrates these concepts. The signal source 96 delivers a
coded or modulated version of the entire program, which may also include video information if applicable. Signal sources might
include, by way of example, a television broadcast signal (over satellite, cable, or terrestrial links) and a coded DVD or CD
signal read by the laser. This information must be received and decoded before it becomes electrical signals that represent the
audio information. The decoder will be capable of extracting the pure voice channel(s) (if present in the signal source) and
keeping it (them) separate from the remaining audio channel(s). After the receiving/decoding stage 98, there are two options
available for the separate voice and remaining audio signals: (1) they can be made available through separate output audio
jacks 97 (such as phono type connectors), or (2) they can be sent to a multi-channel wireless transmitter 99 that is also
housed in the playback device 95. The playback device 95 might be a DVD player, where the signal source is internal to the
device, or a television, where the source is external. If the signals are made available as hardware outputs, a separate
external multi-channel transmitter 100 that is capable of interfacing to a receiving PLD can be attached to these outputs to
provide wireless reception and VRA adjustment at the PLD location for all users in the same listening environment.
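The key property of this arrangement, one shared pair of transmitted signals and an independent mix at each PLD, can be sketched as follows. The listener names, the (voice_gain, ra_gain) tuples and the dB helper are hypothetical illustrations.

```python
import math
from typing import Dict, List, Tuple


def per_listener_mixes(
    voice: List[float],
    remaining: List[float],
    settings: Dict[str, Tuple[float, float]],
) -> Dict[str, List[float]]:
    """Every PLD receives the same two signals; each applies its own (voice_gain, ra_gain)."""
    return {
        listener: [g_v * v + g_ra * r for v, r in zip(voice, remaining)]
        for listener, (g_v, g_ra) in settings.items()
    }


def vra_change_db(voice_gain: float, ra_gain: float) -> float:
    """VRA change relative to the transmitted mix, in dB (infinite when the RA is muted)."""
    if ra_gain == 0:
        return math.inf
    return 20.0 * math.log10(voice_gain / ra_gain)
```

Because the transmitter never mixes the two signals, one listener can choose dialog only while another keeps the original balance, without either affecting the other.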
Additional VRA Decoder for Use With Other Systems That Are Not VRA Compatible
[0057] As an alternative embodiment, it is also possible to have a dedicated decoder specifically designed for VRA
applications. This will allow users who do not currently own a VRA capable decoder/transmitter to still access the VRA
capability without upgrading any specific component (i.e. without losing their current investment). Suppose DVD’s or
broadcast TV or broadcast radio begins transmitting an encoded pure voice channel. Current reception devices will not
only be incapable of receiving and extracting that information but will also be incapable of providing the VRA feature to
any individual, let alone multiple users in the same environment. The device 111 shown in FIG 18 will provide all of these
capabilities for the user who has available to him the signal sources with the voice separate but not the means to extract
and adjust the PVRA ratio. The signal source 101 may be as before (television broadcast; DVD information, etc.) and
can be transmitted directly to the primary reproduction system 102 which would be the TV or DVD player that cannot
provide the VRA adjustment because it was originally not equipped with this feature. The same signal source is "T’d off"
to the external VRA box 111 where the appropriate receiver or decoder has been installed to remove and separate the
voice and audio components from the signal source according to the standard that supports that type of data present
in the signal. Such a decoder might be a Dolby Digital decoder capable of extracting the hearing impaired mode but this
invention is certainly not limited to that particular decoder. Other popular audio formats may someday provide a means
to transmit a pure vocal track in addition to the existing audio. The device 111 of this invention will incorporate the appropriate
decoder depending on the desired application. There are then three options for the external device after it has decoded
and separated the vocal track from the remaining audio tracks. First, the device can provide hardware outputs 104, 110
in the form of audio jacks such as RCA type or phono connectors that can provide a line level signal to a transmitter or
wired PLD capable of VRA adjustment; second, the receiver can provide a multi-channel transmitter 105 with the separate
voice and remaining audio signal(s) for wireless transmission to a PLD capable of VRA adjustment; or, third, it can provide the
VRA adjustment for a single user application 107 directly on the device, where the voice and remaining audio are
separately adjusted, summed together, and further adjusted for total volume level 108 before being provided as output(s)
109 to any other audio playback component. It should be noted that although only a single adjusted output is shown
in FIG 18 (implying a total mono signal output) it is not beyond the scope of this invention (as mentioned in previous
descriptions) to provide several outputs for multi-channel spatial positioning of the audio. For example, if a 5 channel
remaining audio program (left, center, right, left surround, and right surround) were available along with a 5 channel (or
fewer) voice audio program, these could be combined after level adjustment so that the voice could be placed on any
speaker by using separate volume adjustments, or routed directly to the center speaker (the typical case). This still provides the end user
with VRA adjustment capability while also providing an immersive surround sound experience.
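The single-user path of FIG 18 (separate voice and remaining audio adjustment 107, summation, then total level 108), extended to the five-channel case just described, might look like the following Python sketch. It assumes already-decoded, sample-aligned signals held as lists of floats and a single (mono) voice track; the function `vra_mix_5ch` and its `voice_placement` weights are hypothetical names chosen for illustration, not part of the specification.

```python
from typing import Dict, List, Optional

# Channel order of the five-channel example in the text.
CHANNELS = ("L", "C", "R", "LS", "RS")


def vra_mix_5ch(
    remaining: Dict[str, List[float]],
    voice: List[float],
    voice_gain: float,
    remaining_gain: float,
    total_gain: float,
    voice_placement: Optional[Dict[str, float]] = None,
) -> Dict[str, List[float]]:
    """Adjust voice and remaining audio separately, place the voice, then apply total volume.

    voice_placement (hypothetical) maps a speaker name to the share of the voice it
    receives; by default all of the voice goes to the center speaker.
    """
    if voice_placement is None:
        voice_placement = {"C": 1.0}  # typical case: dialog on the center speaker
    out: Dict[str, List[float]] = {}
    for ch in CHANNELS:
        w = voice_placement.get(ch, 0.0)  # 0.0 means this speaker carries no voice
        out[ch] = [
            total_gain * (remaining_gain * r + voice_gain * w * v)
            for r, v in zip(remaining[ch], voice)
        ]
    return out
```

Passing `voice_placement={"L": 0.5, "R": 0.5}` would instead place the voice as a phantom center between the left and right speakers.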
Reception and VRA to be used in Conjunction with Existing Wireless Transmission of Mixed (Video and Audio) DVD Signals
[0058] A new class of products is emerging that allows consumers to enjoy DVD video and audio from remote locations.
This wireless technology delivers the audio and video information from a remote DVD player to a television or home
theater. This allows owners of a DVD drive in their personal computer to use that drive to view the content of DVD’s at
locations other than their computer. The VRA adjustment at the PLD location as well as at the centralized home theater
can be used in conjunction with the wireless DVD technology in two ways. First, since the DVD player is already sending
wireless audio signals to the home theater system, the PLD can be equipped with a wireless receiver set to the same
frequency so the PLD can intercept the same transmission. Since the video information is only needed at the viewing
location and not at the PLD, the audio can be selectively decoded at the PLD for reproduction. It is important to note
that the receiving process of the wireless signal will be followed by the decoding process to extract the voice and remaining
audio from the wireless DVD signal. Decoding is then followed by adjustment, recombination, and transduction of the signal to audible
sound, with full adjustment of the voice level, remaining audio level, total level, and any AutoVRA features also
included. All hardware mentioned above will be located in the personal listening device so each user can adjust the VRA
and level to his/her liking; such a system is depicted in FIG 4 where the transmitter is sending the entire DVD signal but
the PLD decoder is designed only to extract the audio from the incoming bitstream.
[0059] There is a second embodiment of the VRA used in conjunction with wireless DVD transmission that may result
in a lowered overall cost but an increase in the number of required components. As before, the total DVD signal is
transmitted from the DVD player to the playback location, a home theater for example. The receiver of the total DVD
signal at the centralized home theater location can then retransmit only the voice and remaining audio to achieve an
embodiment similar to that which is shown in FIG 4 but different in that the transceiver is receiving the wireless signal
from the wireless DVD transmitter at the centralized location. This allows multiple PLD’s in the same location to be
equipped with only a wireless receiver and not a digital decoder. The decoding process is performed centrally at the
home theater location with the video being sent to the viewing device and the audio (having been received from the
DVD player location and decoded) being retransmitted by the central receiver/decoder/transmitter to the PLDs receiving
the voice and remaining audio.
VRA Knob on Headset Cup
[0060] The next class of inventions discussed focuses on incorporating the VRA adjustment features into specific
hardware designed for VRA applications, namely personal listening devices with PVRA capability. The classes of PLDs
that are the focus of this embodiment include headsets, hearing aids, assistive listening devices, cochlear implants,
eyewear, and headwear incorporating speakers, using wired or wireless technology. In general, an
assistive listening device utilizes a headset in conjunction with a type of microphone or wireless transmitter, depending
on the purpose of the product. In some sense, the wireless VRA system itself shown in FIG 4 might be considered as
an assistive listening device. But in general the VRA technology of receiving, separating, adjusting, recombining and
delivering can be used in products besides ALD’s. A wireless headset, headphones or earbuds might include volume
controls directly on the side of the headset itself. FIG 19 illustrates such an embodiment where all the necessary hardware
is located in the headset 112 itself and the necessary adjustments for VRA control are easily available to the end user
on the external housing of the earcup. First the wireless receiver 113 receives the multiple audio signals transmitted
from the source location after decoding occurs (if required). The demodulator 114 basebands the audio signals resulting
in a vocal track(s) and a remaining audio track(s) that are then manually adjustable 115, summed 117, and further
adjusted for level 116, amplified by the headphone amplifier 118 and reproduced through the headphone speaker 119.
The volume knobs 115, 116 can be located on the external portion of the headset and can be accompanied by a balance
adjustment if multi-channel (stereo) reproduction is used. If multiple channels of audio are transmitted and received at
the PLD, they can be adjusted and combined to form the needed stereo image or spatial projection by incorporating the
hardware and software shown in FIGs 5 and/or 8.
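The FIG 19 signal chain after the demodulator 114 (knobs 115, summer 117, level knob 116) can be sketched as follows for a stereo headset. The decibel knob scale, the linear balance law, and feeding the voice equally to both ears are assumptions made for illustration; `headset_vra` and `db_to_gain` are hypothetical names.

```python
from typing import List, Tuple


def db_to_gain(db: float) -> float:
    """Convert a knob setting in decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


def headset_vra(
    voice: List[float],
    remaining_left: List[float],
    remaining_right: List[float],
    voice_db: float,       # voice knob (part of 115)
    remaining_db: float,   # remaining audio knob (part of 115)
    level_db: float,       # total level knob 116
    balance: float = 0.0,  # -1.0 = full left, 0.0 = centered, +1.0 = full right
) -> Tuple[List[float], List[float]]:
    gv = db_to_gain(voice_db)
    gr = db_to_gain(remaining_db)
    gt = db_to_gain(level_db)
    # Assumed linear balance law: attenuate only the side opposite the balance setting.
    left_scale = min(1.0, 1.0 - balance)
    right_scale = min(1.0, 1.0 + balance)
    left: List[float] = []
    right: List[float] = []
    for v, rl, rr in zip(voice, remaining_left, remaining_right):
        # Summer 117: the voice feeds both ears equally, then total level 116 is applied.
        left.append(gt * left_scale * (gv * v + gr * rl))
        right.append(gt * right_scale * (gv * v + gr * rr))
    return left, right
```

The returned pair would then go to the headphone amplifier 118.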
Remote Control Adjustment of VRA in a Hearing Aid
[0061] In addition to PVRA adjustment carried out on a headphone device that is physically large enough to accommodate the aforementioned hardware, it may also be desirable to allow PVRA adjustment in smaller PLD’s including
hearing aids or smaller headphone devices. These smaller PLD’s may not be able to accommodate all of the needed
hardware for adjusting and manually controlling the levels of the two or more signals. In this case it may be more desirable
to use a device such as that described by FIG 13. With this embodiment used with a small PLD, only the wireless receiver
in the PLD is needed. Surface mount technology and miniaturization of electronic components facilitate incorporating
a low power wireless receiver into small spaces. (Note that less antenna amplifier power is required because the
transmission of the adjusted signal only needs about a five foot range, since the handheld adjustment mechanism is
receiving the split signals from the source transmission location). The remote transceiver described by FIG 13 can also
be constructed in the form of a remote controller for the PLD. Therefore the remote controller controls the volume of the
voice and the remaining audio and the total volume of the PLD as well as acting as a transmitter to the PLD and a
receiver from the source location. Future technology may permit the entire set of electronics needed for VRA adjustment to be
located in the hearing aid (or other miniaturized PLD) itself. Even then, it may still be necessary to use a remote
controller to control the volume levels, since it is desirable to keep hearing aids as inconspicuous as possible.
In this case a remote controller may be desirable for aesthetic reasons.
Embodiments for VRA Headphones in Movie Theater Environment
[0062] Movie theaters offer yet another opportunity for individual adjustment of the VRA, provided the headset design
is equipped with the technology disclosed herein. Individuals can take advantage of the ability to control the voice level
independently of the remaining audio while enjoying the surround sound or large screen of a movie theater. If full control
over the remaining audio and voice is desired in a movie theater assistive listening device or headset, a
circumaural headset with sufficient passive and/or active noise control is needed. The passive noise control via circumaural
cushions, double cup designs and damping material is effective at blocking ambient frequencies down to approximately
500 Hz. It may also be desirable to incorporate active noise control into such a headset for further reduction of lower
frequencies that are not effectively controlled by passive measures. Such a theater headset can be designed as discussed
in previous headset embodiments. A second alternative, with slightly reduced functionality, is available if control over only
the vocals is required. Many movies are mastered such that the voice levels are too low to provide good intelligibility
during loud segments of the remaining audio. It may be desirable to raise the dialog-only level during these segments.
The reception, volume control, and reproduction of the dialog-only signal can be accomplished using a non-circumaural
headset that allows ambient sound to reach the ears. Allowing the ambient sound to reach the user's ears lets spatial
cues from multi-channel surround sound still arrive at the ears and be heard, while the dialog can be adjusted for improved
intelligibility.
[0063] It should be recognized that the adjustment of the voice to remaining audio (PVRA) ratio (a component of the
mixing process at the production end of professional recordings) may interfere with some individual’s concepts of artistic
freedom. Audio engineers for instance focus heavily on obtaining the right mix of sounds to generate the desired effect
in music, film and television. Therefore it is necessary to include a means for either transmitting the original (unaffected)
production level mixture of the total program or providing a means to easily arrive at that mixture. This allows the end
user to choose between the ability to select the production mixture (the way that the producers designed the mix) or to
adjust the VRA ratio themselves. There are at least four possibilities for accomplishing this goal and they are presented
next.
Embodiments for Selection of VRA Versus Artistic Mix (Original Audio Mix)
Method 1
[0064] FIG 20 presents the first two options for retaining delivery of the production mix (the mix designed as the
producers intended it) to the end user in addition to providing the PVRA adjustment capability. The producer begins with
separate tracks 120 of all the elements that make up an entire program and combines them 121 to form a mono or
multi-channel program 122 that is recorded or broadcast 123 toward the end user. In addition to the production mix, the
voice (time aligned, delayed, and processed 124 with the same processing as 121) that is used to create the production
mix should be kept separate from the production mix through the recording and broadcast stages 123. Typically, a signal
is broadcast at a single frequency with a certain bandwidth, and therefore it is often represented as a single signal, as shown
in FIG 20 as the input to 126. (Even though this is shown as a single signal, multiple signals are contained in the
modulated/coded signal). Decoder/playback device 125 decodes or demodulates the recording or transmission to provide
the original production mix 126 in addition to the dialog only track 127 that was produced and recorded in conjunction
with the production level mix. The dialog signal is passed through a switch 129 that can disconnect the dialog signal
from the output 133 altogether. When this switch is opened the production mix in its original form is available for playback
through the reproduction hardware 133 in any multi-channel configuration 131 that it was originally produced in, and the
volume knob 128 acts as a total volume control for the entire program. However, when the switch 129 is closed, the
dialog is passed through the volume adjustment 130 and added 132 to the production mix (if it is a multi-channel format
it will typically be added to the center or equally to the left and right channels). This allows the end user to adjust the
level of the total program 128 with respect to the level of the dialog adjusted by the dialog volume knob 130. If the dialog
volume knob 130 is turned down completely, the production mix is again available. If the dialog in the production mix
and the separate dialog track are recorded or broadcast simultaneously (i.e. time aligned), there will be no delay between
the two tracks; therefore, as the volume of the dialog is increased in combination with the production mix, it begins to
raise the voice to remaining audio ratio as heard through the reproduction system 133. Time alignment may occur by
processing the dialog signal with all of the same delay causing elements (such as reverberation or filtering) that the
dialog in the production mix experiences. This will ensure that both the dialog in the production mix and the separate
dialog track line up identically in time. It should be noted that this particular embodiment does not permit achieving VRA
ratios that are below the production mix VRA ratio. If this is a desirable feature, the next two inventions will accomplish
that goal.
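A minimal sketch of Method 1 (FIG 20), assuming decoded, time-aligned sample lists keyed by channel name; `method1_output` and its parameters are hypothetical names. Because the dialog is only ever added on top of the production mix, a dialog gain of zero (or an open switch 129) returns the production mix scaled by knob 128, and any positive dialog gain raises the VRA ratio, which is the limitation noted above.

```python
from typing import Dict, List, Tuple


def method1_output(
    production_mix: Dict[str, List[float]],
    dialog: List[float],
    total_gain: float,    # volume knob 128
    dialog_gain: float,   # dialog volume knob 130
    switch_closed: bool,  # switch 129
    dialog_channels: Tuple[str, ...] = ("C",),
) -> Dict[str, List[float]]:
    """Sketch of FIG 20: production mix plus an optional, separately scaled dialog track."""
    share = 1.0 / len(dialog_channels)  # assumption: dialog split equally, e.g. over L and R
    out: Dict[str, List[float]] = {}
    for ch, samples in production_mix.items():
        scaled = [total_gain * s for s in samples]
        if switch_closed and ch in dialog_channels:
            # Summer 132: adding dialog on top of the mix can only raise the VRA ratio.
            scaled = [x + dialog_gain * share * d for x, d in zip(scaled, dialog)]
        out[ch] = scaled
    return out
```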
Method 2
[0065] As before it is assumed that both the production mix and the dialog are available from the broadcast or recording
and that both dialog signals (one in production mix and one pure) are time aligned with respect to each other. FIG 21
shows an alternate configuration for the processing that follows the decoding of the production mix and the dialog channel, which will allow
negative VRA ratios (i.e. for the user to lower the voice level and raise the remaining audio level if desired). The decoded
dialog signal 135 is subtracted 136 from the production mix 134 resulting in a purely remaining audio 137 mix. At this
point it is possible to adjust the remaining audio level 139 independently of the dialog level 138 before combining 140
to form the total user adjusted program. The production mix is made available as one pole of the switch 141, which selects
between the user adjusted mix and the production mix. This configuration allows for the lowering of the
dialog signal to the point that only remaining audio is left. The next invention also provides a fully user adjustable VRA
ratio with the availability of a production level mix by recording production information before the encoding process.
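Method 2 (FIG 21) can be sketched for a single channel as below. The subtraction 136 only yields clean remaining audio if the dialog inside the production mix is time aligned and level matched with the separate dialog track, which this sketch simply assumes; `method2_output` is a hypothetical name.

```python
from typing import List


def method2_output(
    production_mix: List[float],
    dialog: List[float],
    remaining_gain: float,    # remaining audio level 139
    dialog_gain: float,       # dialog level 138
    user_mix_selected: bool,  # position of switch 141
) -> List[float]:
    """Sketch of FIG 21 for one channel: recover the remaining audio by subtraction, then remix."""
    if not user_mix_selected:
        return list(production_mix)  # original production mix, untouched
    # Subtractor 136: assumes the dialog in the mix is time aligned and level matched.
    remaining = [p - d for p, d in zip(production_mix, dialog)]
    # Summer 140: any ratio is reachable, including dialog_gain = 0 (remaining audio only).
    return [remaining_gain * r + dialog_gain * d for r, d in zip(remaining, dialog)]
```

Setting `dialog_gain=0.0` with the user mix selected leaves only the remaining audio, the negative VRA case that Method 1 cannot reach.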
Method 3
[0066] FIG 22 illustrates a production process that transmits information about the way a program was mixed to ensure
that the production mix is available at the consumer level as well as providing the capability to adjust the VRA ratio
universally. The original program elements 143 are mixed 144 to form a multi-channel or mono program that is continually
auditioned until the levels of all inputs 143 are correct for accurate reproduction as determined by the audio engineer
producing the program. The outputs of the production mix are then designed to be the dialog signal by itself 149, the
level that the producer determined was appropriate for the dialog signal 148, all combined audio that is not considered
to be critical dialog 146, and the overall level of that remaining audio. The respective level information is digitally encoded
150,151 as metadata into the actual audio signals themselves 152,153. These encoded signals with their respective
level information are transmitted, broadcast, or recorded 154. The playback device is equipped with a decoder 155 that
extracts the audio information as well as the header information 156 containing the original production mix levels (possibly
with respect to some digital full scale value depending on the resolution of the recording). The remaining audio level
and dialog level 157 can be provided to the gain adjustment circuits 158 and 159 so that the levels are correct and the
original production mix is realized upon combination 162 of the automatically adjusted dialog and remaining audio. This
can only occur if the switch (or switches) 160 and 161 connect the output of the header information block 156 to the
gain adjustment circuits. If the switch (or switches) is thrown to the right position connecting the user mix of PVRA 164
then the user selects the remaining audio level 158 and the dialog level 159. Although the paths shown indicate single
signals in many cases, it is well within the scope of this invention to view each signal path as a vector where multiple
signals are present, such as left, right, left surround, right surround, and center channels for spatial positioning, whose
levels are all controlled by the remaining audio level control of 158 in FIG 22. Likewise the pure voice track may also
consist of multi-channel information which can be adjusted by the control 159.
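The switching logic of Method 3 (FIG 22) can be sketched as follows. When the production mix is selected, the levels decoded from the header information 156 drive the gain stages; otherwise the user's settings drive them. The `ProductionLevels` container, the use of linear gain values, and the dictionary-of-channels representation are assumptions, since the specification does not define a metadata format. Each gain is applied across every channel of its signal vector, as described above for controls 158 and 159.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProductionLevels:
    """Hypothetical container for the level metadata decoded from header information 156."""
    dialog_level: float     # producer's dialog level 148, as a linear gain
    remaining_level: float  # producer's remaining audio level, as a linear gain


def method3_output(
    remaining: Dict[str, List[float]],  # remaining audio vector (e.g. L, R, C, LS, RS)
    dialog: Dict[str, List[float]],     # pure voice vector, possibly fewer channels
    meta: ProductionLevels,
    user_remaining_level: float,        # user setting for gain stage 158
    user_dialog_level: float,           # user setting for gain stage 159
    use_production_mix: bool,           # position of switches 160 and 161
) -> Dict[str, List[float]]:
    if use_production_mix:
        g_rem, g_dia = meta.remaining_level, meta.dialog_level
    else:
        g_rem, g_dia = user_remaining_level, user_dialog_level
    out: Dict[str, List[float]] = {}
    for ch, rem in remaining.items():
        dia = dialog.get(ch, [0.0] * len(rem))  # channels without voice contribute nothing
        out[ch] = [g_rem * r + g_dia * d for r, d in zip(rem, dia)]  # combination 162
    return out
```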
VRA In Conjunction With Existing Audio Reproduction Hardware
[0067] The above discussion focused on providing the end user with the ability to adjust the VRA of electronically
reproduced media (either broadcast playback or recording playback) on personal listening devices (PLDs) so that individual
listeners in the same environment can enjoy different VRA ratios simultaneously. Further inventions are made when the
personal listening devices described above are extended to include the following electronics:
• Cellular telephones
• Wearable Computers
• Personal data assistants
• MP3 playback devices
• Personal audio players that use magnetic storage media to store the music
[0068] These devices can be used for personal level playback of music or audio containing dialog and remaining audio
that might obscure the dialog. The embodiments discussed in the previous sections can be applied to the devices
listed above to provide VRA adjustment for the playback of audio that has been prerecorded or produced with the dialog
separate from the remaining audio regardless of the coding format.
VRA on Personal Computers
[0069] The internet has become a popular avenue for the distribution of digital quality media. If consumers are receiving
music, movies, or other audio in real time (or delayed) over a data connection to their personal computer, they can
implement the VRA and AutoVRA features in a variety of ways. The control of the gains applied to the voice and remaining
audio can be entirely software driven through a graphical user interface. The voice and remaining audio are decoded
separately either through hardware or software, depending on the personal computer system. The signals can be
amplified in the digital domain by incorporating several lines of source code into the decoding program that indicate
that each of the two signals (voice and remaining audio) are to be multiplied by user adjustable constants. These constants
are controlled by the user before the signals are summed together, through a software user interface that allows adjustment of the two constants being multiplied by the decoded voice and remaining audio signals. In addition a further volume
adjustment may be applied to the overall combined level to allow the user to control the total volume of the program
prior to playback. Alternatively, if it is desirable to provide the user with more "user friendly" adjustment capability, the
VRA knob(s) (see methodologies discussion for possible control knob implementations) can be made available as actual
hardware on computer speakers, keyboards, mice, or monitors, all components of the PC system. If the VRA adjustments
are made available via hardware (as in a knob on the monitor) and the signal decoding is implemented via software, a
handshake protocol is needed to ensure that the adjustments made using the hardware knob are translated to software
gain change and multiplication operations. FIG 23 illustrates one possible option for interfacing the hardware and software.
The movement and location of the hardware knob(s) 165 generating a voltage output 167 relative to position and a full
scale voltage 166 must be sampled via an A/D 168 to convert the position information into a number that represents the
volume relative to full scale 166. One possible hardware implementation of such a knob is a rotary voltage divider which
is a potentiometer with a full scale voltage at one terminal, a ground at the other, and the wiper provides the divided
voltage as a function of rotary position. The output of the A/D is then polled periodically by the software 169 controlling
the signal flow to read the digitized number the user has selected. These numbers (one for voice and one for remaining
audio) are multiplied 170 by the respective signals and those outputs are summed 172 to form the total VRA adjusted
program. There are several other combinations of using software and hardware for controlling the VRA on a personal
computer playback device. The hardware only version requires that the signals are decoded and provided as outputs
of a sound card or from the PC motherboard itself as in the device shown in FIG 18. This implementation allows volume
adjustment of two signals using hardware gain or attenuation without the need for a graphical user interface. Each
configuration has its own advantages:
• All hardware: inexpensive, knobs readily available and easy to adjust with high visibility
• All software: no hardware upgrade required to implement VRA, greater flexibility for adjustment options and features, GUI controls can be customized by end user
• Software/hardware: high visibility of adjustment mechanism; fewer D/A converters required because the sum is output rather than the remaining audio and the dialog separately
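The hardware/software handshake of FIG 23 can be sketched as follows. Each knob 165 is modeled as a potentiometer voltage divider sampled by an A/D 168, the software 169 polls the digitized value once per audio block, and those values become the gains for the multipliers 170 and summer 172. The 10-bit converter, the linear mapping from code to gain, and the `read_*_knob` callables (stand-ins for whatever driver actually reads the converter) are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

ADC_BITS = 10                  # assumed converter resolution
ADC_MAX = (1 << ADC_BITS) - 1  # code produced at the full scale voltage 166


def adc_sample(wiper_voltage: float, full_scale_voltage: float) -> int:
    """Model of A/D 168: quantize the wiper voltage 167 relative to full scale 166."""
    ratio = min(max(wiper_voltage / full_scale_voltage, 0.0), 1.0)
    return round(ratio * ADC_MAX)


def code_to_gain(code: int) -> float:
    """Interpret a digitized knob position as a linear gain from 0.0 to 1.0."""
    return code / ADC_MAX


def mix_block(voice: List[float], remaining: List[float],
              voice_code: int, remaining_code: int) -> List[float]:
    """Multipliers 170 and summer 172 for one block of samples."""
    gv = code_to_gain(voice_code)
    gr = code_to_gain(remaining_code)
    return [gv * v + gr * r for v, r in zip(voice, remaining)]


def poll_and_mix(read_voice_knob: Callable[[], int],
                 read_remaining_knob: Callable[[], int],
                 blocks: Iterable[Tuple[List[float], List[float]]]) -> List[float]:
    """Software 169: poll both knobs once per block, then apply the gains to that block."""
    out: List[float] = []
    for voice, remaining in blocks:
        out.extend(mix_block(voice, remaining, read_voice_knob(), read_remaining_knob()))
    return out
```

For testing without hardware, the readers can be simulated, e.g. `poll_and_mix(lambda: adc_sample(3.3, 3.3), lambda: 0, blocks)`.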
AutoVRA on Personal Computers
[0070] It should also be noted that while the personal computer is considered to be a personal listening device, a
headset (also a PLD) is often used in conjunction with the PC. Therefore the PC can act as the signal source for other
PLD’s. So previous embodiments discussing the television and DVD can also include signal sources on the PC such
as, for example, a headphone jack with VRA controls connected to it. In addition, while the focus has been on the VRA
adjustments alone, the AutoVRA hold feature can also be implemented on the PC. In fact, because of the computation
required to determine real-time signal properties, a central processing unit (CPU) or digital signal processor
(DSP) capable of handling a large number of operations per second is required. Therefore the PC implementation of
VRA facilitates the implementation of the AutoVRA features by virtue of the available computing power. All of the
AutoVRA features and user controls discussed in the previous sections can be implemented using any of the
hardware/software interface options discussed in the previous paragraph. However, a purely hardware implementation may be
difficult given the required computing power and real time operations required to continually limit the signal levels.
Therefore an implementation where either all software is used, or a combination of hardware controls and software
mathematical operations will be preferred for the AutoVRA implementation.
[0071] FIG 24 illustrates a user in a multi-channel listening environment. Such an environment might have 2, 3, 4, 5
or more speakers although FIG 24 shows the situation with 5 speakers, left, center, right, left surround, and right surround.
Each speaker has a frequency response path from itself to each ear, creating a total of 10 paths. If the electrical signals
that drive those speakers are filtered with estimates of those paths before being combined to form a left ear and right
ear signal, a more realistic sound stage can be generated for headphone listening. This is clearly a desirable result that
allows individuals to experience the multi-channel surround sound experience without purchasing a multi-channel amplifier/speaker system. However, an invention is needed to provide the end user with the ability to adjust the VRA in
conjunction with the multi-channel spatial processing in order to achieve the desired intelligibility while experiencing the
surround sound stage using headphones.
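The ten-path filtering described for FIG 24 amounts to filtering each speaker feed with an estimate of its response to each ear, then summing per ear. The sketch below shows that structure with direct convolution. The impulse responses passed in `paths` are hypothetical placeholders; a practical system would use measured or modeled head-related responses and faster FFT-based filtering.

```python
from typing import Dict, List, Tuple


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Full linear convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def binaural_downmix(
    speakers: Dict[str, List[float]],
    paths: Dict[Tuple[str, str], List[float]],
) -> Tuple[List[float], List[float]]:
    """Filter every speaker feed with its (speaker, ear) response and sum per ear.

    With five speakers and two ears, `paths` holds 10 impulse responses.
    """
    result: Dict[str, List[float]] = {}
    for ear in ("left", "right"):
        acc: List[float] = []
        for spk, sig in speakers.items():
            y = convolve(sig, paths[(spk, ear)])
            acc.extend([0.0] * max(0, len(y) - len(acc)))  # grow to the longest output
            for n, v in enumerate(y):
                acc[n] += v
        result[ear] = acc
    return result["left"], result["right"]
```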
VRA and AutoVRA on Multi-Channel Processed Headphones - Used in Conjunction with Dolby Headphones
[0072] The design described by FIG 25 assumes that the dialog track is provided separately in addition to the total
audio program. Therefore negative VRA ratios cannot be achieved with this particular embodiment. However, if two
multi-channel programs were simultaneously decoded (remaining audio and voice) then all possible VRA ratios would
be achievable by completely lowering or raising the remaining audio and/or voice. FIG 25 shows the total multi-channel
audio program (remaining audio plus voice) 173 being passed through an overall volume control 174 that will ultimately
act as the remaining audio control. The spatial processing in 175 refers to the prior art where each of the signals is
filtered and/or delayed to produce the desired multi-channel effect. However, before they are combined together to form
the two channel headphone program, the altered dialog is added 176 to the appropriate channel(s). As a component of
the spatial processing, information is retained as to which speaker the dialog track should appear on and that information
is relayed to a decision making step 178. For most productions, speech resides either on the center channel or on the
right and left channel simultaneously (phantom center). For this description it will be assumed that the dialog will be
routed to the center channel only. When the center channel is indicated by 178 as the proper location for the dialog,
the dialog processing 181 copies the spatial processing information (filter, delay time, etc.) for the channel that the dialog
will be placed on. The dialog 179 is first adjusted for level (the voice level adjustment) 180 before being processed by
the (center) channel processing 181 that was copied from the multi-channel spatial process 175. After the dialog has
been processed 181, block 178 is again consulted to identify the speaker to which it will be applied, and the signal is then added into
the appropriate channel 176. If for example the voice was meant to be on the left and right channels, the 178 block would
deliver the appropriately processed voice to each of those channels via 176 and not to any of the others. The remaining
audio signal can be raised and lowered using 174 and the dialog can be raised and lowered using 180, providing positive
VRA adjustments because the voice is included in the total program 173. The VRA adjusted, spatially processed, multichannel program is then further processed (prior art) by 177 to generate a two channel headphone program. This two
channel program is further adjusted for volume 182 and delivered to the headphone speakers 183. A recent implementation of the prior art discussed as elements 175 and 177 in FIG 25 is the Dolby Headphones. The VRA invention above
is designed to work in conjunction with Dolby Headphones as well as any other multi-channel processing headphone
that derives two headphone channels from multiple spatial channels.
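The FIG 25 flow can be sketched as below, under heavy simplifying assumptions. The prior-art spatial processing 175 is reduced to one hypothetical FIR filter per channel, and the two-channel stage 177 is reduced to fixed left/right weights per channel. Neither represents how Dolby Headphones or any other product actually works; the sketch only illustrates the ordering of the VRA steps (174, 180, 181, 178/176, 177, 182).

```python
from typing import Dict, List, Sequence, Tuple


def convolve(x: List[float], h: List[float]) -> List[float]:
    """Full linear convolution, standing in for the per-channel filter/delay of 175."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def add_into(acc: List[float], y: List[float]) -> List[float]:
    """Return acc + y sample by sample, zero-padding the shorter list."""
    out = list(acc) + [0.0] * max(0, len(y) - len(acc))
    for i, v in enumerate(y):
        out[i] += v
    return out


def vra_spatial_headphones(
    program: Dict[str, List[float]],          # 173: total program, voice included
    dialog: List[float],                      # 179: separate dialog track
    channel_filters: Dict[str, List[float]],  # 175: hypothetical per-channel impulse responses
    downmix: Dict[str, Tuple[float, float]],  # 177: placeholder (left, right) weights per channel
    remaining_gain: float,                    # 174
    dialog_gain: float,                       # 180
    headphone_gain: float,                    # 182
    dialog_channels: Sequence[str] = ("C",),  # routing decision 178
) -> Tuple[List[float], List[float]]:
    processed = {
        ch: convolve([remaining_gain * s for s in sig], channel_filters[ch])
        for ch, sig in program.items()
    }
    for ch in dialog_channels:
        # 181: reuse the spatial processing of the channel the dialog is routed to.
        d = convolve([dialog_gain * s for s in dialog], channel_filters[ch])
        processed[ch] = add_into(processed[ch], d)  # summer 176
    left: List[float] = []
    right: List[float] = []
    for ch, sig in processed.items():
        wl, wr = downmix[ch]
        left = add_into(left, [wl * s for s in sig])
        right = add_into(right, [wr * s for s in sig])
    return [headphone_gain * s for s in left], [headphone_gain * s for s in right]
```

Since the program 173 already contains the voice, raising `dialog_gain` from zero can only increase the VRA ratio, consistent with the text.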
VRA on "Stored Program" Playback Devices
[0073] Nonlinear television viewing offers a significant advantage to viewers in that programs can be recorded and
stored for later viewing. The newest technology, hard drive television recorders (offered by companies including TiVo, Replay,
and Microsoft), differs from conventional VCR technology in that the recording methods are much more user friendly,
separate recorded channels can be set up and playback is almost instantaneous. The future of television viewing will
most likely favor non-live, nonlinear viewing as opposed to live viewing at otherwise inconvenient times. It is therefore
important that the VRA adjustment capability is able to work in conjunction with these replay devices. The adjustment
hardware can be located as part of a remote control, a video screen GUI or physical hardware on the replay mechanism.
The recording process simply needs to record all information as it has been transmitted, including the separate voice
track. Then the playback and adjustment mechanisms will incorporate the same components as discussed in previous
embodiments of the VRA and AutoVRA adjustment hardware.
Re-Production
[0074] Presently, the production of audio (for broadcasts, movies, music, etc.) can be considered as a multi-step
procedure, considered to be prior art, as described by FIG 26. At the production level, there exist several types of sounds
that will be recorded to form the entire audio program. These sounds 184 can be divided into several types that include
sound effects, music, voice and other sounds. Typically, for understanding of the plot or lyrics of a program or audio,
the voice segment of production level sounds is considered critical. All of these sounds are first recorded separately
185. Some sounds, such as sound effects, are not recorded via a microphone but are often dubbed from prerecorded
sets of effects tracks. It is not always necessary to record all sounds so that they are synchronized on the master.
Nonlinear recording and playback allow the production 186 to line up sounds with video and with other sounds over the
course of several playback/record cycles, often accomplished using software driving recording and editing. The production
process 186 synchronizes all sounds with each other (and the video if present), mixes them with appropriate ratios, and
assigns them to the surround channels (if applicable) where the audio engineer feels they best belong 187. For example,
actors that are talking on screen will often have their voice placed on the center channel closest to the screen location.
In order to transmit or record large amounts of information, an encoding or compression process is often required 188.
This is not always the case if analog recording and playback is used, but multi-channel digital playback will often have
some sort of encoding (for copy protection), and/or compression (lossy or lossless) depending on the recording media
or broadcast requirements. The encoded and/or compressed program is then recorded or broadcast 189 and played
back or received 190 at the end user location. It is clear from the process in FIG 26 that the end user cannot have any
adjustment over the level of the dialog with respect to the remaining audio because the producer has complete control
at 186. Once mixed, it is nearly impossible to extract the voice from the remaining audio and return it to its original
recorded quality 184.
[0075] However, it is possible for producers to return to the master program 185, where all recorded elements are separate from each other, and obtain the voice and the remaining audio separately. This invention focuses on providing the means and ability to obtain the dialog and remaining audio and RE-record them separately, so that the end user can adjust their relative levels to suit his/her hearing needs. FIG 27 shows one possible way to accomplish this goal. The master recordings 191 of nearly all movies and multi-track audio programs exist as archived media in which the tracks are kept completely separate. In addition, information about the levels and locations of each of the tracks as produced in the original recording also exists along with those master recordings. This information is used in two separate multi-channel mixers, 192 (for all remaining audio) and 193 (for voice only), to prepare two separate multi-channel programs, which may have six channels as shown in 187, or more or fewer depending on the desired effects. The production information from the original recording process 186 is therefore used to generate exactly the same effects for the remaining audio and the voice, separately and simultaneously. The outputs of 192 and 193 can thus be combined to form the same total audio program 187 as the original mix, yet the two programs remain completely separate. Each of these two multi-channel programs is then encoded and/or compressed as in 188, but separately, using 194 and 195. These two encoded programs can then be further encoded or multiplexed 196 to produce a single signal that is broadcast or recorded 197. Upon playback, this signal is decoded to form at least two multi-channel signals that can be VRA adjusted 199 using the hardware and embodiments discussed throughout the previous sections of this document.
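Because mixing by level and placement is a linear operation, rendering the voice stems and the remaining stems through the same production levels (mixers 193 and 192) yields two programs whose channel-by-channel sum reproduces the original mix 187. The Python sketch below illustrates this under simplifying assumptions: stems are mono lists of float samples, and the production information is reduced to a hypothetical per-channel gain for each stem (real productions also involve panning laws, effects processing, and automation). The names `render`, `combine`, and the toy stems are illustrative only.

```python
from typing import Dict, List

Signal = List[float]
Program = List[Signal]  # one list of samples per output channel


def render(stems: Dict[str, Signal], gains: Dict[str, List[float]],
           n_channels: int, n_samples: int) -> Program:
    """Mix mono stems into a multi-channel program.

    gains[name][ch] is a hypothetical production level for stem `name` in
    output channel `ch`, standing in for the level/location information kept
    with the master recording.
    """
    out = [[0.0] * n_samples for _ in range(n_channels)]
    for name, samples in stems.items():
        for ch, g in enumerate(gains[name]):
            for i, x in enumerate(samples):
                out[ch][i] += g * x
    return out


def combine(voice: Program, remaining: Program,
            voice_gain: float = 1.0, remaining_gain: float = 1.0) -> Program:
    """Apply independent VRA gains, then sum the two programs channel by channel."""
    return [[voice_gain * v + remaining_gain * r for v, r in zip(vc, rc)]
            for vc, rc in zip(voice, remaining)]


# Toy example with three output channels (left, center, right).
gains = {
    "dialog": [0.0, 1.0, 0.0],   # dialog on the center channel
    "music": [0.7, 0.0, 0.7],    # music spread left/right
    "effects": [1.0, 0.5, 0.0],  # effects weighted toward the left
}
voice_stems = {"dialog": [0.5, -0.25, 0.1]}
remaining_stems = {"music": [0.2, 0.2, 0.2], "effects": [0.0, 1.0, 0.0]}

voice_mix = render(voice_stems, gains, 3, 3)          # plays the role of mixer 193
remaining_mix = render(remaining_stems, gains, 3, 3)  # plays the role of mixer 192
original_mix = render({**voice_stems, **remaining_stems}, gains, 3, 3)
```

With both gains at 1.0, `combine(voice_mix, remaining_mix)` matches `original_mix` up to floating-point rounding; raising `voice_gain` boosts the dialog in every channel it occupies while leaving the placement of both programs intact.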
Remastering Multi-Channel Audio to a "VRA Friendly" Form, Remastering Stereo to a "VRA Friendly" Form
[0076] FIG 27 represents the most general method for transmitting the voice and remaining audio separately to the end user so that all spatial information for both the remaining audio and the voice is retained during playback. FIG 28 illustrates the opposite end of this spectrum: no spatial information is retained, but the end user can still adjust the voice and remaining audio levels independently. The master recording (185 and 191 in the earlier figures) appears as 200 in FIG 28. All components that are considered remaining audio (the non-voice elements of the total audio recording) are mixed for level only by 201. This means that the contribution of each remaining audio component relative to the others is decided, but no spatial placement is performed, since the output of 201 is a single signal rather than a multi-channel signal. In addition, the voice level is adjusted 202 so that, when the two signals are combined without further adjustment, the level of the voice with respect to the remaining audio is exactly as the producer intends. The adjusted voice and adjusted remaining audio programs are then recorded onto a stereo medium 203, which can include, for example, CDs, DVDs, and analog tapes, but may also be stereo broadcast audio. At this point a recorded or broadcast stereo program exists in which the voice and the remaining audio are kept separate, on the left and right tracks respectively or vice versa. A convention is needed so that all products recognize which channel (left or right) contains the voice and which contains the remaining audio. This convention can be chosen by consensus among consumer electronics manufacturers, and choosing one assignment over the other does not limit the scope of this invention. The stereo playback device 204 then provides two output signals (left and right), one containing only voice and the other only remaining audio. To experience the whole program with VRA adjustment, these two signals are passed through two variable gain amplifiers 205 and 206, which control their respective levels, and are then summed to form a total program. This total program can then be further adjusted for level 207. If it is to be reproduced 208 by a stereo playback system, the fully adjusted total program is then split to both channels. The advantage of this configuration is that VRA media production and playback can be accomplished with today's consumer electronics and master recordings; only a minimal amount of external hardware (205, 206, 207) is needed to enjoy VRA adjustment. The disadvantage is that the stereo image is lost. However, many stereo effects are so subtle, and many playback systems of such low fidelity, that most consumers will prefer VRA adjustment to the stereo image.
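The FIG 28 chain reduces to a few multiplications and additions. The Python sketch below follows the block numbers in the text under stated assumptions: sample lists of equal length, plain linear gains standing in for the variable gain amplifiers 205 and 206 and the level control 207, and a hypothetical `VOICE_ON_LEFT` convention (the text deliberately leaves the left/right assignment open). Function names are illustrative, not taken from the patent.

```python
from typing import List, Sequence, Tuple

Signal = List[float]

# Hypothetical convention: the text leaves the channel assignment to a
# consensus among manufacturers; voice on the left is assumed here.
VOICE_ON_LEFT = True


def master_vra_stereo(voice: Sequence[float],
                      remaining_stems: Sequence[Tuple[float, Sequence[float]]],
                      voice_level: float) -> Tuple[Signal, Signal]:
    """Blocks 201-203: level-only mono mix of the remaining audio, voice set to
    the producer's level, both returned as one stereo pair (left, right)."""
    n = len(voice)
    remaining = [0.0] * n
    for level, stem in remaining_stems:  # 201: relative levels only, no panning
        for i in range(n):
            remaining[i] += level * stem[i]
    adjusted_voice = [voice_level * x for x in voice]  # 202
    if VOICE_ON_LEFT:
        return adjusted_voice, remaining
    return remaining, adjusted_voice


def vra_playback(left: Sequence[float], right: Sequence[float],
                 voice_gain: float, remaining_gain: float,
                 master_gain: float) -> Tuple[Signal, Signal]:
    """Blocks 205-208: independent gains (205, 206), sum, overall level (207),
    then split the mono total to both outputs of a stereo system (208)."""
    voice, remaining = (left, right) if VOICE_ON_LEFT else (right, left)
    total = [master_gain * (voice_gain * v + remaining_gain * r)
             for v, r in zip(voice, remaining)]
    return total, list(total)


# Example: one voice track and two remaining-audio stems.
left, right = master_vra_stereo(
    voice=[0.5, 0.5],
    remaining_stems=[(0.5, [1.0, 0.0]), (0.25, [0.0, 1.0])],
    voice_level=2.0,
)
out_l, out_r = vra_playback(left, right, voice_gain=1.0,
                            remaining_gain=1.0, master_gain=1.0)
```

Because the total is mono before the split at 208, the two outputs are identical; this is the loss of stereo image that the text accepts in exchange for VRA control with unmodified stereo media.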
[0077] The two embodiments of FIGs 27 and 28 discussed above represent the most complex and the simplest ways of providing VRA to the end user. Embodiments with any number of recorded, produced, or playback channels can be envisioned from the descriptions in the above specification, which is not limited to the two specific embodiments shown in FIGs 27 and 28.
Claims
1. A movie theatre system that outputs a motion picture to each of a plurality of listeners in a manner that permits
individualized audio volume adjustment for the plurality of listeners located in an audience of the movie theatre
environment, comprising:
a video device that displays a video portion of the motion picture to the audience;
a speaker system that emits a corresponding audio portion of the motion picture to the audience;
one or more elements of a storage medium that stores the video portion of the motion picture, the corresponding
audio portion of the motion picture, a first audio signal being substantially a vocal signal and having one or more
channels of spatial information, and a second audio signal including substantially audio content other than the
audio content of the first audio signal and having one or more channels of spatial information;
a transmitter that transmits the first and second audio signals from the one or more elements of the storage
medium in synchronization with the transmission of the video portion and the corresponding audio portion of
the motion picture, to a plurality of personal listening devices, wherein each of the plurality of personal listening
devices is associated with each of the plurality of listeners in the theatre audience, and each of the personal
listening devices comprises:
a first receiver that receives the transmitted first audio signal independently of the speaker system;
a second receiver that receives the transmitted second audio signal independently of the speaker system;
a first adjustment device that adjusts the first audio signal based on inputs from one of the listeners;
a second adjustment device that adjusts the second audio signal based on inputs from one of the listeners;
an audio signal combining device that combines the first audio signal’s spatial information channels with
corresponding spatial information channels of the second audio signal, to produce a combined audio signal;
and
one or more transducers that receives the combined audio signal, converts the combined audio signal to
sounds, and outputs the sounds so the sounds are heard by each of the plurality of listeners associated
with each of the plurality of personal listening devices in the theatre audience;
wherein the system permits each of the plurality of listeners to adjust the first and second audio signals independently
of other ones of the plurality of listeners in the audience.
2. The system as in claim 1, wherein the first and second adjustment devices are volume controllable active amplifiers.
3. The system as in claim 1, wherein the first and second adjustment devices are volume controllable passive attenuators.
4. The system as in claim 1, wherein the first adjustment and the second adjustment devices are combined into a
single volume control device.
5. The system as in claim 1, wherein the first receiver receives a first digital bit stream that includes the first audio
signal and the second receiver receives a second digital bit stream that includes the second audio signal, the system
further comprising:
a first decoder that decodes the first digital bit stream; and
a second decoder that decodes the second digital bit stream.
6. The system as in claim 1, wherein the first receiver and the second receiver are integrated into a single receiver.
7. The system as in claim 6, wherein the single receiver receives a single digital bit stream that includes the first audio
signal and the second audio signal, the system further comprising:
a single decoder that decodes the single digital bit stream.
8. The system as in claim 1, wherein the first and second receivers receive wireless transmissions.
9. The system as in claim 1, further comprising at least one of a wave guide and an amplifier to enhance the combined
audio signal.
10. The system as in claim 1, further comprising a processor that computes a ratio of a volume of the first audio signal
to a volume of the second audio signal, wherein at least one of the first adjustment device, the second adjustment
device and the audio signal combining device automatically adjusts for and maintains the ratio of the first audio
signal volume to the second audio signal volume.
11. The system as in claim 10, wherein the processor computes a standard deviation of the first or second audio signal
over a finite time period.
12. The system as in claim 10, wherein the ratio is stored in a memory for use by the audio signal combining device.
13. The system as in claim 10, wherein the first and second adjustment devices are controlled by one of the listeners
through a graphical user interface.
14. The system as in claim 10, wherein the first adjustment device and the second adjustment device are coupled to a
single user controllable volume adjustment device, the single user controllable volume adjustment device operating
to adjust a volume of the combined audio signal so that a movement of the single user controllable volume adjustment
device in a first direction increases the volume of the first audio signal level and decreases the volume of the second
audio signal, and a movement in a second direction increases the volume of the second audio signal and decreases
the volume of the first audio signal.
15. A computer, comprising:
the system as in claim 1, wherein the first and second adjustment devices are operated using at least one of
computer software and hardware.
16. The system as in claim 1, further comprising a third adjustment device that adjusts a volume of the combined audio
signal.
17. The system as in claim 16, wherein the third adjustment device includes a user-controllable switch that instantaneously achieves an original production mixture of the first audio signal and the second audio signal.
18. The system as in claim 16, wherein the third adjustment device includes a surround processor that converts the
combined audio signal to an audio signal having a predetermined number of spatial information channels.
19. The system as in claim 18, wherein the surround processor converts the combined audio signal having left, center,
right, right surround and left surround spatial information channels to a signal having only left and right spatial
information channels.
20. A method of operating a movie theatre system that outputs a motion picture to each of a plurality of listeners in a
manner that permits individualized audio volume adjustment for the plurality of listeners located in an audience of
the movie theatre environment, comprising:
displaying a video portion of the motion picture to the audience;
outputting a corresponding audio portion of the motion picture to the audience using a speaker system;
providing one or more elements of a storage medium that stores the video portion of the motion picture, the
corresponding audio portion of the motion picture, a first audio signal being substantially a vocal signal and
having one or more channels of spatial information, and a second audio signal including audio content other
than the audio content of the first audio signal and having one or more channels of spatial information;
transmitting the first and second audio signals from the one or more elements of the storage medium in synchronization with the transmission of the video portion and the corresponding audio portion of the motion picture;
receiving the transmitted first and second audio signals using a plurality of personal listening devices that are
independent of the speaker system, wherein each of the plurality of personal listening devices is associated
with each of the plurality of listeners in the theatre audience,
adjusting the first audio signal based on inputs from one of the listeners;
adjusting the second audio signal based on inputs from one of the listeners;
combining the first audio signal’s spatial information channels with corresponding spatial information channels
of the second audio signal, to produce a combined audio signal;
converting the combined audio signal to sounds using one or more transducers; and
outputting the sounds so the sounds are heard by each of the plurality of listeners;
wherein the method permits each of the plurality of listeners to adjust the first and second audio signals independently
of other ones of the plurality of listeners.
21. The system as in claim 1 or the method as in claim 20, wherein at least one of the first audio signal and the second
audio signal is a monaural signal.
22. The system as in claim 1 or the method as in claim 20, wherein at least one of the first audio signal and the second
audio signal is a stereo signal, the stereo signal having left and right spatial information channels.
23. The system as in claim 1 or the method as in claim 20, wherein at least one of the first audio signal and the second
audio signal is a surround signal, the surround signal having spatial information channels that include left, center,
right, and one or more surround channels.
24. The system as in claim 1 or the method as in claim 20, wherein at least one of the first audio signal and the second
audio signal is a multiple channel surround signal, the multiple channel surround signal having spatial information
channels including left, left center, center, right center, right, and one or more surround channels.
25. The method as in claim 20, wherein the adjusting steps are performed using volume controllable active amplifiers.
26. The method as in claim 20, wherein the adjusting steps are performed using volume controllable passive attenuators.
27. The method as in claim 20, wherein the adjusting steps are performed using a single volume control device.
28. The system as in claim 4 or the method as in claim 27, wherein when the volume control device is moved in one
direction, a volume of the first audio signal increases while a volume of the second signal decreases, and when the
volume control device is moved in the other direction, the volume of the second signal increases while the volume
of the first signal decreases.
29. The method as in claim 20, wherein the receiving step includes receiving a first digital bit stream that includes the
first audio signal and a second digital bit stream that includes the second audio signal, the method further comprising:
decoding the first digital bit stream; and
decoding the second digital bit stream.
30. The method as in claim 20, wherein the receiving step is performed using a single receiver.
31. The method as in claim 30, wherein the receiving step receives a single digital bit stream that includes the first audio
signal and the second audio signal, the method further comprising:
decoding the single digital bit stream.
32. The method as in claim 20, wherein the transmissions received in the receiving step are wireless transmissions.
33. The system as in claim 1 or the method as in claim 20, wherein the personal listening devices are at least one of
stereo headphones, monaural earphones, hearing aids, and assistive listening devices.
34. The system as in claim 1 or the method as in claim 20, wherein the personal listening devices are body-worn
receivers, the body-worn receivers providing the combined audio signal to one or more electro-acoustic transducers.
35. The method as in claim 20, further comprising enhancing the combined audio signal using at least one of a wave
guide and an amplifier.
36. The method as in claim 20, further comprising:
computing a ratio of a volume of the first audio signal to a volume of the second audio signal; and
automatically adjusting for and maintaining the ratio of the first audio signal volume to the second audio signal
volume.
37. The method as in claim 36, wherein the computing step computes a standard deviation of the first or second audio
signal over a finite time period.
38. The method as in claim 36, further comprising storing the ratio in a memory for use by the audio signal combining
device.
39. The method as in claim 36, wherein the adjusting steps are performed by one of the listeners through a graphical
user interface.
40. The method as in claim 36, wherein the adjusting step is performed using a single user controllable volume adjustment
device, the single user controllable volume adjustment device operating to adjust a volume of the combined audio
signal so that a movement of the single user controllable volume adjustment device in a first direction increases the
volume of the first audio signal level and decreases the volume of the second audio signal, and a movement in a
second direction increases the volume of the second audio signal and decreases the volume of the first audio signal.
41. The system as in claim 1 or the method as in claim 20, wherein the corresponding audio portion of the motion picture
is stored as the first audio signal and the second audio signal.
42. The method as in claim 20, wherein the adjusting steps are performed using at least one of computer software and
hardware.
43. The system as in claim 1 or the method as in claim 20, wherein the personal listening devices are at least one of a
cellular telephone, a wireless communication device, a body-worn computer, a personal data assistant, a personal
audio playback device, a television and a DVD player.
44. The method as in claim 20, further comprising adjusting a volume of the combined audio signal.
45. The method as in claim 44, wherein the adjustment of the combined audio signal includes using a user-controllable
switch that instantaneously achieves an original production mixture of the first audio signal and the second audio signal.
46. The method as in claim 44, wherein the adjustment of the combined audio signal includes converting the combined
audio signal to an audio signal having a predetermined number of spatial information channels.
47. The method as in claim 46, wherein the converting step includes converting the combined audio signal having left,
center, right, right surround and left surround spatial information channels to a signal having only left and right spatial
information channels.
48. The system as in claim 1 or the method as in claim 20, wherein the second audio signal includes at least a portion
of the first audio signal.
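Claims 10, 11, 14 and 28 describe two control behaviours that can be hard to picture from the claim language alone: a single control whose movement raises one signal while lowering the other, and an automatic gain that holds a voice-to-remaining volume ratio, with volume estimated from a standard deviation over a finite time window. The Python sketch below is one possible reading, not the patented implementation; the linear crossfade, the knob range [0, 1], the use of the population standard deviation, and the silence guard `eps` are all assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def balance_gains(position: float) -> Tuple[float, float]:
    """Single-control VRA balance (cf. claims 14 and 28).

    `position` is a hypothetical knob value in [0, 1]. Moving toward 1 raises
    the voice gain and lowers the remaining-audio gain; moving toward 0 does
    the opposite. A linear crossfade is assumed.
    """
    p = min(max(position, 0.0), 1.0)  # clamp out-of-range knob values
    return p, 1.0 - p  # (voice_gain, remaining_gain)


def level(window: Sequence[float]) -> float:
    """Volume estimate over a finite window: population standard deviation (cf. claim 11)."""
    return statistics.pstdev(window)


def ratio_keeping_gain(voice_window: Sequence[float],
                       remaining_window: Sequence[float],
                       target_ratio: float, eps: float = 1e-12) -> float:
    """Gain for the voice so that level(voice) / level(remaining) equals
    target_ratio, e.g. a ratio stored in memory (cf. claims 10 and 12)."""
    v = level(voice_window)
    r = level(remaining_window)
    if v < eps or r < eps:
        return 1.0  # silence in either signal: leave the voice gain unchanged
    return target_ratio * r / v


# Example: the remaining audio is twice as loud as the voice in this window,
# and the listener's stored preference is a voice-to-remaining ratio of 1.5.
voice_gain, remaining_gain = balance_gains(0.75)
g = ratio_keeping_gain([1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0],
                       target_ratio=1.5)
```

With this linear mapping the midpoint (0.5) keeps the producer's proportion between the two signals but at half amplitude; a design that also wants unity level at the midpoint would pair it with a separate overall level control such as the third adjustment device of claim 16.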
Claims (German-language text)
1. A movie theatre system that outputs a motion picture to each of a plurality of listeners in such a manner that
individualized volume adjustment is enabled for the plurality of listeners located in an audience of the movie theatre
environment, comprising:
a video device that shows a video portion of the motion picture to the audience,
a speaker system that outputs a corresponding audio portion of the motion picture to the audience,
one or more elements of a storage medium that stores the video portion and the corresponding audio portion of the
motion picture, wherein a first audio signal is substantially a voice signal and has one or more channels of spatial
information, and a second audio signal has substantially audio content that differs from the audio content of the
first audio signal and has one or more channels of spatial information,
a transmitter that transmits the first and second audio signals from the one or more elements of the storage medium,
in synchronization with the transmission of the video portion and the corresponding audio portion of the motion picture,
to a plurality of personal listening devices, wherein each of the plurality of personal listening devices is associated
with each of the plurality of listeners in the theatre audience, and wherein each of the personal listening devices comprises:
a first receiver that receives the transmitted first audio signal independently of the speaker system,
a second receiver that receives the transmitted second audio signal independently of the speaker system,
a first adjustment device that adjusts the first audio signal on the basis of inputs from one of the listeners,
a second adjustment device that adjusts the second audio signal on the basis of inputs from one of the listeners,
an audio signal combining device that combines the spatial information channels of the first audio signal
with corresponding spatial information channels of the second audio signal to produce a combined audio signal, and
one or more transducers that receive the combined audio signal, convert the combined audio signal into
sounds and output the sounds, so that the sounds are heard by each of the plurality of listeners associated
with each of the plurality of personal listening devices in the theatre audience,
wherein the system enables each of the plurality of listeners to adjust the first and second audio signals independently
of other ones of the plurality of listeners in the audience.
2. System according to claim 1, wherein the first and second adjustment devices are active amplifiers with controllable
volume.
3. System according to claim 1, wherein the first and second adjustment devices are passive attenuators with controllable
volume.
4. System according to claim 1, wherein the first and second adjustment devices are combined into a single volume
control device.
5. System according to claim 1, wherein the first receiver receives a first digital bit stream that includes the first
audio signal, and the second receiver receives a second digital bit stream that includes the second audio signal, the
system further comprising:
a first decoder that decodes the first digital bit stream, and
a second decoder that decodes the second digital bit stream.
6. System according to claim 1, wherein the first and second receivers are integrated into a single receiver.
7. System according to claim 6, wherein the single receiver receives a single digital bit stream that includes the first
and second audio signals, the system further comprising:
a single decoder that decodes the single digital bit stream.
8. System according to claim 1, wherein the first and second receivers receive wireless transmissions.
9. System according to claim 1, further comprising at least one of a waveguide and an amplifier to enhance the
combined audio signal.
10. System according to claim 1, further comprising a processor that computes a ratio between a volume of the first
audio signal and a volume of the second audio signal, wherein at least one of the first adjustment device, the second
adjustment device and the audio signal combining device automatically adjusts and maintains the ratio between the
volume of the first audio signal and the volume of the second audio signal.
11. System according to claim 10, wherein the processor computes a standard deviation of the first or second audio signal
over a finite time period.
12. System according to claim 10, wherein the ratio is stored in a memory for use by the audio signal combining device.
13. System according to claim 10, wherein the first and second adjustment devices are controlled by one of the listeners
through a graphical user interface.
14. System according to claim 10, wherein the first and second adjustment devices are connected to a single user
controllable volume adjustment device, the single user controllable volume adjustment device operating to adjust a
volume of the combined audio signal such that a movement of the single user controllable volume adjustment device
in a first direction increases the volume of the first audio signal and decreases the volume of the second audio signal,
and a movement in a second direction increases the volume of the second audio signal and decreases the volume of
the first audio signal.
15. Computer, comprising:
the system according to claim 1, wherein the first and second adjustment devices are operated using at least one of
computer software and computer hardware.
16. System according to claim 1, further comprising a third adjustment device that adjusts a volume of the combined
audio signal.
17. System according to claim 16, wherein the third adjustment device comprises a user-controllable switch that
instantaneously achieves an original production mix of the first audio signal and the second audio signal.
18. System according to claim 16, wherein the third adjustment device comprises a surround processor that converts the
combined audio signal into an audio signal having a predetermined number of spatial information channels.
19. System according to claim 18, wherein the surround processor converts the combined audio signal, which has spatial
information channels for left, center, right, right surround and left surround, into a signal that has spatial information
channels only for left and right.
20. Method of operating a movie theatre system that outputs a motion picture to each of a plurality of listeners in such
a manner that individualized volume adjustment is enabled for the plurality of listeners located in an audience of the
movie theatre environment, comprising the following steps:
showing a video portion of the motion picture to the audience,
outputting a corresponding audio portion of the motion picture to the audience using a speaker system,
providing one or more elements of a storage medium that stores the video portion and the corresponding audio portion
of the motion picture, wherein a first audio signal is substantially a voice signal and has one or more channels of
spatial information, and a second audio signal has audio content that differs from the audio content of the first audio
signal and has one or more channels of spatial information,
transmitting the first and second audio signals from the one or more elements of the storage medium in synchronization
with the transmission of the video portion and the corresponding audio portion of the motion picture,
receiving the transmitted first and second audio signals using a plurality of personal listening devices that are
independent of the speaker system, wherein each of the plurality of personal listening devices is associated with each
of the plurality of listeners in the theatre audience,
adjusting the first audio signal on the basis of inputs from one of the listeners,
adjusting the second audio signal on the basis of inputs from one of the listeners,
combining the spatial information channels of the first audio signal with corresponding spatial information channels
of the second audio signal to produce a combined audio signal,
converting the combined audio signal into sounds using one or more transducers, and
outputting the sounds so that the sounds are heard by each of the plurality of listeners,
wherein the method enables each of the plurality of listeners to adjust the first and second audio signals independently
of other ones of the plurality of listeners.
21. System according to claim 1 or method according to claim 20, wherein at least one of the first and second audio
signals is a monaural signal.
22. System according to claim 1 or method according to claim 20, wherein at least one of the first and second audio
signals is a stereo signal, the stereo signal having a left and a right spatial information channel.
23. System according to claim 1 or method according to claim 20, wherein at least one of the first and second audio
signals is a surround signal, the surround signal having spatial information channels that comprise left, center, right
and one or more surround channels.
24. System according to claim 1 or method according to claim 20, wherein at least one of the first and second audio
signals is a multichannel surround signal, the multichannel surround signal having spatial information channels that
comprise left, left center, center, right center, right and one or more surround channels.
25. Method according to claim 20, wherein the adjusting steps are performed using active amplifiers with controllable
volume.
26. Method according to claim 20, wherein the adjusting steps are performed using passive attenuators with controllable
volume.
27. Method according to claim 20, wherein the adjusting steps are performed using a single volume control device.
28. System according to claim 4 or method according to claim 27, wherein, when the volume control device is moved in one
direction, a volume of the first audio signal increases while a volume of the second signal decreases, and when the
volume control device is moved in the other direction, the volume of the second signal increases while the volume of
the first signal decreases.
29. Method according to claim 20, wherein in the receiving step a first digital bit stream that includes the first audio
signal and a second digital bit stream that includes the second audio signal are received, the method further comprising
the following steps:
decoding the first digital bit stream and
decoding the second digital bit stream.
30. Method according to claim 20, wherein the receiving step is performed using a single receiver.
31. Method according to claim 30, wherein in the receiving step a single digital bit stream that includes the first and
second audio signals is received, the method further comprising:
decoding the single digital bit stream.
32. Method according to claim 20, wherein the transmissions received in the receiving step are wireless transmissions.
33. System according to claim 1 or method according to claim 20, wherein the personal listening devices are at least one
of the following devices: stereo headphones, monaural earphones, hearing aids and assistive listening devices.
26
EP 1 266 469 B1
34. System nach Anspruch 1 oder Verfahren nach Anspruch 20, wobei die persönlichen Abhörvorrichtungen am Körper
getragene Empfänger sind, wobei die am Körper getragenen Empfänger das kombinierte Audiosignal einem oder
mehreren elektroakustischen Wandlern bereitstellen.
5
35. Verfahren nach Anspruch 20, bei dem weiter das kombinierte Audiosignal unter Verwendung von mindestens einem
von einem Wellenleiter und einem Verstärker verbessert wird.
36. Method according to claim 20, further comprising the following steps:
calculating a ratio between a volume of the first audio signal and a volume of the second audio signal; and
automatically adjusting and maintaining the ratio between the volume of the first audio signal and the volume of
the second audio signal.
37. Method according to claim 36, wherein the calculating step calculates a standard deviation of the first or the second
audio signal over a finite period of time.
38. Method according to claim 36, further comprising storing the ratio in a memory for use by the audio signal
combining device.
39. Method according to claim 36, wherein the adjusting steps are performed by one of the listeners through a graphical user interface.
40. Method according to claim 36, wherein the adjusting step is performed using a single-user-controllable volume
adjustment device, the single-user-controllable volume adjustment device operating to adjust a volume of the combined
audio signal such that movement of the single-user-controllable volume adjustment device in a first direction
increases the volume of the first audio signal and decreases the volume of the second audio signal, and movement
in a second direction increases the volume of the second audio signal and decreases the volume of the first
audio signal.
41. System according to claim 1 or method according to claim 20, wherein the corresponding audio portion of the motion
picture is stored as the first audio signal and the second audio signal.
42. Method according to claim 20, wherein the adjusting steps are performed using at least one of computer software and computer hardware.
43. System according to claim 1 or method according to claim 20, wherein the personal listening devices are at least
one of the following devices: a cellular telephone, a wireless communication device, a body-worn
computer, a PDA, a personal audio playback device, a television set and a DVD player.
44. Method according to claim 20, further comprising adjusting a volume of the combined audio signal.
45. Method according to claim 44, wherein adjusting the combined audio signal comprises using a user-controllable switch
that instantaneously achieves an original production mix of the first audio signal and the second audio signal.
46. Method according to claim 44, wherein adjusting the combined audio signal comprises converting the combined
audio signal into an audio signal having a predetermined number of channels of spatial information.
47. Method according to claim 46, wherein in the converting step the combined audio signal, having spatial information
channels for left, center, right, right surround and left surround, is converted into a signal
having only spatial information channels for left and right.
48. System according to claim 1 or method according to claim 20, wherein the second audio signal comprises at least a portion
of the first audio signal.
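Claims 36 and 37 describe two steps. First, a ratio between the volume of the first (voice) signal and the second (remaining audio) signal is calculated from a standard deviation over a finite period. Second, the signals are adjusted automatically so that this ratio is maintained.

The following minimal Python sketch illustrates that loop under stated assumptions; it is not the proprietor's implementation. The assumptions are:

- The "volume" of each signal is its population standard deviation over non-overlapping windows of `window` samples.
- Only the gain of the remaining audio is adjusted.
- A hypothetical `floor` threshold holds the previous gain whenever either signal is effectively silent.

The function and parameter names are illustrative only.

```python
import statistics
from typing import List, Sequence, Tuple


def maintain_ratio(
    voice: Sequence[float],
    remaining: Sequence[float],
    target_ratio: float,
    window: int = 1024,
    floor: float = 1e-9,
) -> Tuple[List[float], List[float]]:
    """Return (combined_samples, per_window_gains) for a voice/remaining pair.

    For each window, the level of each signal is taken as its standard
    deviation (an assumption), and the remaining-audio gain is chosen so that
    level(voice) / (gain * level(remaining)) == target_ratio.
    """
    if len(voice) != len(remaining):
        raise ValueError("voice and remaining must have the same length")
    if target_ratio <= 0 or window <= 0:
        raise ValueError("target_ratio and window must be positive")
    combined: List[float] = []
    gains: List[float] = []
    gain = 1.0  # used until both signals are measurably non-silent
    for start in range(0, len(voice), window):
        v = voice[start:start + window]
        r = remaining[start:start + window]
        # Level over this finite period = population standard deviation.
        v_level = statistics.pstdev(v)
        r_level = statistics.pstdev(r)
        if v_level > floor and r_level > floor:
            # Solve v_level / (gain * r_level) == target_ratio for gain.
            gain = v_level / (target_ratio * r_level)
        # Otherwise keep the previous gain (hypothetical silence handling).
        gains.append(gain)
        combined.extend(a + gain * b for a, b in zip(v, r))
    return combined, gains
```

Holding the gain during silent windows is a design choice of this sketch. Without it, a pause in the dialogue would drive the computed remaining-audio gain to zero and mute the background entirely.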
Claims
1. Cinema theater system transmitting a motion picture to each of a plurality of listeners so as to allow personalized
audio volume adjustment for the plurality of listeners of the audience of the cinema theater environment, comprising:
a video device displaying a video portion of the motion picture to the audience;
a speaker system emitting a corresponding audio portion of the motion picture to the audience;
one or more elements of a storage means storing the video portion of the motion picture, the corresponding
audio portion of the motion picture, a first audio signal being substantially a voice signal and having one or more
channels of spatial information, and a second audio signal comprising substantially audio content other than the
audio content of the first audio signal and having one or more channels of spatial information;
a transmitter transmitting the first and second audio signals from the one or more elements of the storage means
in synchronization with the transmission of the video portion and the corresponding audio portion of the motion
picture to a plurality of personal listening devices, wherein each of the plurality of personal listening devices is
associated with each of the plurality of listeners in the cinema theater, and each of the personal listening devices comprises:
a first receiver receiving the transmitted first audio signal independently of the speaker system;
a second receiver receiving the transmitted second audio signal independently of the speaker system;
a first adjustment device adjusting the first audio signal based on inputs from one of the listeners;
a second adjustment device adjusting the second audio signal based on inputs from one of the listeners;
an audio signal combining device combining the spatial information channels of the first audio signal with the
corresponding spatial information channels of the second audio signal to produce a combined audio signal; and
wherein one or more transducers receive the combined audio signal, convert the combined audio signal into sound,
and emit the sound such that it is heard by each of the plurality of listeners associated with each of the plurality
of personal listening devices of the cinema audience;
wherein the system allows each of the plurality of listeners to adjust the first and second audio signals
independently of the others of the plurality of listeners of the audience.
2. System according to claim 1, wherein the first and second adjustment devices are active volume-adjusting amplifiers.
3. System according to claim 1, wherein the first and second adjustment devices are passive volume-adjusting attenuators.
4. System according to claim 1, wherein the first and second adjustment devices are combined into a single volume
adjustment device.
5. System according to claim 1, wherein the first receiver receives a first bit stream comprising the first audio signal,
and the second receiver receives a second bit stream comprising the second audio signal, the system further comprising:
a first decoder decoding the first bit stream; and
a second decoder decoding the second bit stream.
6. System according to claim 1, wherein the first receiver and the second receiver are integrated into a single receiver.
7. System according to claim 6, wherein the single receiver receives a single bit stream comprising the first audio signal
and the second audio signal, the system further comprising:
a single decoder decoding the single bit stream.
8. System according to claim 1, wherein the first and second receivers receive wireless transmissions.
9. System according to claim 1, further comprising at least one of a waveguide and an amplifier to enhance the combined
audio signal.
10. System according to claim 1, further comprising a processor calculating a ratio of a volume of the first audio signal
to a volume of the second audio signal, wherein at least one of the first adjustment device, the second adjustment
device and the audio signal combining device automatically adjusts and maintains the ratio of the first audio signal
volume to the second audio signal volume.
11. System according to claim 10, wherein the processor calculates a standard deviation of the first or second audio
signal over a finite period of time.
12. System according to claim 10, wherein the ratio is stored in a memory usable by the audio signal combining device.
13. System according to claim 10, wherein the first and second adjustment devices are adjusted by one of the listeners
via a graphical user interface.
14. System according to claim 10, wherein the first adjustment device and the second adjustment device are coupled
to a single-user volume adjustment device, the single-user volume adjustment device operating to adjust the volume
of the combined audio signal such that movement of the single-user volume adjustment device in a first direction
increases the volume of the first audio signal and decreases the volume of the second audio signal, and movement
in a second direction increases the volume of the second audio signal and decreases the volume of the first audio signal.
15. A computer, comprising:
the system according to claim 1, wherein the first and second adjustment devices are operated using at least one
of computer software and computer hardware.
16. System according to claim 1, further comprising a third adjustment device adjusting a volume of the combined audio signal.
17. System according to claim 16, wherein the third adjustment device comprises a user-adjustable switch instantaneously
achieving an original production mix of the first audio signal and the second audio signal.
18. System according to claim 16, wherein the third adjustment device comprises a surround processor that converts
the combined audio signal into an audio signal having a predetermined number of spatial information channels.
19. System according to claim 18, wherein the surround processor converts the combined audio signal having left, center,
right, right surround and left surround spatial information channels into a signal having left and right spatial
information channels.
20. Method of operating a cinema theater system transmitting a motion picture to each of a plurality of listeners in a
manner allowing personalized audio volume adjustment for the plurality of listeners of the audience of the cinema
theater environment, comprising:
displaying a video portion of the motion picture to the audience;
emitting a corresponding audio portion of the motion picture to the audience via a speaker system;
providing one or more elements of a storage means storing the video portion of the motion picture, the corresponding
audio portion of the motion picture, a first audio signal being substantially a voice signal and having one or more
channels of spatial information, and a second audio signal comprising audio content other than the audio content
of the first audio signal and having one or more channels of spatial information;
transmitting the first and second audio signals from the one or more elements of the storage means in synchronization
with the transmission of the video portion and the corresponding audio portion of the motion picture;
receiving the transmitted first and second audio signals via a plurality of personal listening devices that are
independent of the speaker system, each of the plurality of personal listening devices being associated with each
of the plurality of listeners of the cinema audience;
adjusting the first audio signal based on inputs from one of the listeners;
adjusting the second audio signal based on inputs from one of the listeners;
combining the spatial information channels of the first audio signal with the corresponding spatial information
channels of the second audio signal to produce a combined audio signal;
converting the combined audio signal into sound via one or more transducers; and
emitting the sound such that it is heard by each of the plurality of listeners;
the method allowing each of the plurality of listeners to adjust the first and second audio signals independently
of the other listeners of the plurality of listeners.
21. System according to claim 1 or method according to claim 20, wherein at least one of the first audio signal and the
second audio signal is a monophonic signal.
22. System according to claim 1 or method according to claim 20, wherein at least one of the first audio signal and the
second audio signal is a stereophonic signal, the stereophonic signal having left and right spatial information channels.
23. System according to claim 1 or method according to claim 20, wherein at least one of the first audio signal and the
second audio signal is a surround signal, the surround signal having spatial information channels comprising left,
center, right and one or more surround channels.
24. System according to claim 1 or method according to claim 20, wherein at least one of the first audio signal and the
second audio signal is a multichannel surround signal, the multichannel surround signal having spatial information
channels comprising left, left center, center, right center, right and one or more surround channels.
25. Method according to claim 20, wherein the adjusting steps are performed by means of active volume-adjusting amplifiers.
26. Method according to claim 20, wherein the adjusting steps are performed by means of passive volume-adjusting attenuators.
27. Method according to claim 20, wherein the adjusting steps are performed via a single volume adjustment device.
28. System according to claim 4 or method according to claim 27, wherein, when the volume adjustment device is moved
in one direction, a volume of the first audio signal increases while a volume of the second audio signal decreases,
and when the volume adjustment device is moved in the other direction, the volume of the second signal increases
while the volume of the first signal decreases.
29. Method according to claim 20, wherein the receiving step comprises receiving a first bit stream comprising the first
audio signal and a second bit stream comprising the second audio signal, the method further comprising:
decoding the first bit stream; and
decoding the second bit stream.
30. Method according to claim 20, wherein the receiving step is performed by means of a single receiver.
31. Method according to claim 30, wherein the receiving step receives a single digital bit stream comprising the first
audio signal and the second audio signal, the method further comprising:
decoding the single bit stream.
32. Method according to claim 20, wherein the transmissions received in the receiving step are wireless transmissions.
33. System according to claim 1 or method according to claim 20, wherein the personal listening devices are at least
one of stereo headphones, monaural earphones, a hearing aid and assistive listening devices.
34. System according to claim 1 or method according to claim 20, wherein the personal listening devices are body-worn
receivers, the body-worn receivers delivering a combined audio signal to one or more electroacoustic transducers.
35. Method according to claim 20, further comprising enhancing the combined audio signal via at least one of a
waveguide and an amplifier.
36. Method according to claim 20, further comprising:
calculating a ratio of a volume of the first audio signal to a volume of the second audio signal; and
automatically adjusting and maintaining the ratio of the first audio signal volume to the second audio signal volume.
37. Method according to claim 36, wherein the calculating step produces a standard deviation of the first or second audio
signal over a finite period of time.
38. Method according to claim 36, further comprising storing the ratio in a memory usable by the audio signal combining device.
39. Method according to claim 36, wherein the adjusting steps are performed by one of the listeners by means of a
graphical user interface.
40. Method according to claim 36, wherein the adjusting step is performed by means of a single-user volume adjustment
device, the single-user volume adjustment device operating to adjust the volume of the combined audio signal such
that movement of the single-user volume adjustment device in a first direction increases the volume of the first
audio signal and decreases the volume of the second audio signal, and movement in a second direction increases
the volume of the second audio signal and decreases the volume of the first audio signal.
41. System according to claim 1 or method according to claim 20, wherein the corresponding audio portion of the motion
picture is stored as the first audio signal and the second audio signal.
42. Method according to claim 20, wherein the adjusting steps are performed by means of at least one of software and hardware.
43. System according to claim 1 or method according to claim 20, wherein the personal listening devices are at least
one of a cellular telephone, a wireless communication device, a body-worn computer, a personal digital assistant,
a personal audio playback device, a television and a DVD player.
44. Method according to claim 20, further comprising adjusting the volume of the combined audio signal.
45. Method according to claim 44, wherein adjusting the combined audio signal comprises using a user-adjustable switch
instantaneously achieving an original production mix of the first audio signal and the second audio signal.
46. Method according to claim 44, wherein adjusting the combined audio signal comprises converting the combined audio
signal into an audio signal having a predetermined number of spatial information channels.
47. Method according to claim 46, wherein the converting step comprises converting the combined audio signal having
left, center, right, right surround and left surround spatial information channels into a signal having only left
and right spatial information channels.
48. System according to claim 1 or method according to claim 20, wherein the second audio signal comprises at least
a portion of the first audio signal.
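Claims 1, 28 and 40 combine each spatial channel of the voice signal with the corresponding channel of the remaining audio. They do this under a single control that raises one signal while lowering the other. Claims 19 and 47 then fold a left/center/right/surround mix down to left and right only.

The hypothetical Python sketch below illustrates both ideas. The claims specify none of the following, so each is an assumption:

- the linear gain law in `control_gains`;
- the channel names `L`, `C`, `R`, `Ls`, `Rs`;
- the 1/√2 downmix coefficients, a common ITU-R BS.775-style choice.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: channel name -> list of samples.
Channels = Dict[str, List[float]]


def control_gains(position: float) -> Tuple[float, float]:
    """Map one control position in [0, 1] to (voice_gain, remaining_gain).

    Assumed linear law: moving toward 1 raises the voice gain and lowers the
    remaining-audio gain by the same amount; moving toward 0 does the reverse.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    return position, 1.0 - position


def combine(voice: Channels, remaining: Channels, position: float) -> Channels:
    """Mix each voice channel with the same-named remaining-audio channel.

    A channel present in only one of the two signals is mixed against silence.
    """
    g_voice, g_rem = control_gains(position)
    out: Channels = {}
    for name in sorted(set(voice) | set(remaining)):
        v = voice.get(name)
        r = remaining.get(name)
        if v is None:
            v = [0.0] * len(r)
        if r is None:
            r = [0.0] * len(v)
        if len(v) != len(r):
            raise ValueError(f"length mismatch on channel {name}")
        out[name] = [g_voice * a + g_rem * b for a, b in zip(v, r)]
    return out


def downmix_to_stereo(sig: Channels) -> Channels:
    """Fold L, C, R, Ls, Rs into L and R using assumed 1/sqrt(2) coefficients."""
    if not sig:
        return {"L": [], "R": []}
    k = 1.0 / math.sqrt(2.0)
    n = len(next(iter(sig.values())))

    def ch(name: str) -> List[float]:
        # Missing channels are treated as silence.
        return sig.get(name, [0.0] * n)

    left = [a + k * c + k * s for a, c, s in zip(ch("L"), ch("C"), ch("Ls"))]
    right = [a + k * c + k * s for a, c, s in zip(ch("R"), ch("C"), ch("Rs"))]
    return {"L": left, "R": right}
```

The linear law is used only because it makes the opposite-direction behavior of claims 28 and 40 easy to check. A constant-power law, for example cosine and sine of a control angle, would keep perceived loudness steadier across the control's travel.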