Sound Signal Processing for a Virtual Room

Jarmo Hiipakka1, Tommi Ilmonen2, Tapio Lokki2, Lauri Savioja2
[email protected]
Helsinki University of Technology
1Laboratory of Acoustics and Audio Signal Processing
P.O.Box 3000, FIN-02015 HUT
2Telecommunications Software and Multimedia Laboratory
P.O.Box 5400, FIN-02015 HUT
This paper presents the audio system built for the virtual room at Helsinki University of Technology. First we discuss the general problems for multichannel sound reproduction caused by the construction of, and the equipment in, virtual rooms. We also describe the acoustics of the room in question, and the effect of the back-projected screens and of reverberation on the sound. Compensation of the spectral deficiencies and the problems posed by the large listening area and high-frequency attenuation are introduced. The hardware configuration used for sound reproduction is briefly described. We also report the software applications and libraries built for sound signal processing and 3-D sound reproduction.
Sound Reproduction Techniques
In binaural 3-D sound reproduction techniques, the principle is to control the sound signals at the entrances of the listener’s ear canals as accurately as possible. This requirement is easiest to fulfil with headphones, but it can also be met with a pair of loudspeakers. In virtual rooms, the use of loudspeakers for binaural sound reproduction is not feasible because of the small sweet spot in which correct 3-D sound imaging can be achieved.
A virtual room is a fully immersive multiuser virtual reality
system [1]. The visual display of the virtual room consists of
large back-projected screens surrounding the users. Lightweight shutter glasses are used for a completely immersive
stereoscopic view of the virtual world. A tracking device is
used to monitor head position and orientation of one user so
that a correct stereo perspective can be obtained. If more than one user is present in the virtual room simultaneously, the stereo perspective is exact only for the tracked observer. However, the perspective for the other observers is not overly distorted, and real cooperation in the virtual world is possible.
Many virtual reality systems use only computer graphics
to produce immersive virtual environments. However, sound
is an important part of our everyday life and should be
included in the creation of an immersive virtual environment. In the following sections we present the audio system
built for the virtual room at Helsinki University of Technology (HUT) (Fig. 1) [2].
For an immersive virtual reality experience, a three-dimensional soundscape needs to be created. This can be done using either binaural or multichannel techniques. The quality of the resulting sound field depends on the quality of the equipment, on the applied processing techniques, and, with multichannel reproduction, on the acoustics of the room. Some deficiencies in the response of the reproduction system can be compensated for, but there are practical limits to this approach.
Figure 1: A schematic figure of the virtual room at HUT.
Color indicates the height of the loudspeaker. Dark loudspeakers are on the floor, and light ones are ceiling-mounted.
Acoustics of the Room
3-D sound reproduction in virtual rooms is far from being
trivial. In virtual rooms there are back-projected screens and
data projectors that hinder arbitrary positioning of loudspeakers, and typical back-projection screens are not acoustically transparent. Headphone reproduction of spatial audio would solve these problems; however, one of the most important features of virtual rooms is that many users can simultaneously interact with the virtual environment and with each other. If headphones were used, this interaction could be disturbed. Another disadvantage of headphones is that the required computational capacity depends on the number of users: an individual soundscape has to be calculated for every user, and everyone needs his or her own head-tracking device to obtain a correct soundscape.
To minimize the influence of the room acoustics, the room must be as anechoic as possible. This means that the walls and the ceiling must be covered with absorbent material. For the room to be silent enough, all noisy equipment should be placed in another room. For practical reasons, however, at least the data projectors are usually in the same room as the screens. The mirrors and the screens also produce reflections that may be disadvantageous for sound reproduction.
To investigate the acoustics of our virtual room at HUT,
we conducted a series of impulse response measurements
from all fifteen loudspeakers to nine microphone positions
inside the virtual room. The height of all microphone positions was 160 cm, and the measuring points formed a
square-shaped grid with 150 cm long sides with the central
point in the middle of the virtual room. In the measurements
we used the multichannel IRMA measurement system [4] with maximum-length sequences (MLS) as the source signal.
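The IRMA system itself is described in [4]; as a sketch of the underlying idea, an MLS excitation signal can be generated with a linear feedback shift register. The tap choice below corresponds to one known maximal-length polynomial for a 16-bit register, not necessarily the one IRMA uses:

```cpp
#include <vector>
#include <cstdint>

// Generate a maximum-length sequence (MLS) of length 2^16 - 1 with a
// right-shifting Fibonacci LFSR. The feedback taps implement the
// primitive polynomial x^16 + x^14 + x^13 + x^11 + 1 (bits 0, 2, 3, 5
// of the register), so the register cycles through every non-zero
// 16-bit state exactly once before repeating.
std::vector<float> generateMls()
{
    const std::uint32_t length = (1u << 16) - 1;   // 65535 samples
    std::uint32_t reg = 1u;                        // any non-zero seed works
    std::vector<float> seq(length);
    for (std::uint32_t i = 0; i < length; ++i) {
        // XOR of the tap bits gives the feedback bit.
        std::uint32_t fb =
            ((reg >> 0) ^ (reg >> 2) ^ (reg >> 3) ^ (reg >> 5)) & 1u;
        seq[i] = (reg & 1u) ? 1.0f : -1.0f;        // map bits to +/-1 excitation
        reg = ((reg >> 1) | (fb << 15)) & 0xFFFFu;
    }
    return seq;
}
```

The two-level +/-1 signal has an almost flat spectrum, which is what makes MLS attractive for impulse response measurements.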
According to our measurements, the reverberation time in our virtual room is on the order of 400 ms. Figure 2 depicts
the frequency contents calculated from short segments of
the impulse responses from two different loudspeaker positions to the middle of the virtual room. Both loudspeakers
are on the floor. In the first case (Fig. 2a), there is no screen
between the loudspeaker and the microphone (loudspeaker
number 13 in figure 1), but in the second case (Fig. 2b), there
is a screen (loudspeaker 8). Before analysis, the impulse
responses have been truncated so that the direct sound is in
the beginning of the response, i.e., the propagation delay has
been removed from the responses. The distance from the
loudspeaker to the microphone is approximately the same in
each case. The different line styles in the figures denote the
segment of the impulse response that has been used for
response calculation. For clarity, the frequency response has
been smoothed with quarter-octave resolution.
From the figure, the influence of the screen on sound can
be easily seen. The higher frequencies of the direct sound
are attenuated by more than 10 dB when there is a screen between the loudspeaker and the microphone. The reverberant sound field, however, bypasses the screen, i.e., the level of the reverberant sound is hardly influenced at all by the presence of the screen. The attenuation of the direct sound affects the
localization of the sound sources and blurs directional cues.
In multichannel reproduction, multiple loudspeakers are
placed around the listener. With multichannel techniques, it
is possible to reproduce sound signals naturally from correct
directions. When the direction of the virtual sound source
coincides with the direction of a real loudspeaker, the source
direction is exact. When these directions do not coincide,
different panning methods can be used. In our system we
apply vector base amplitude panning (VBAP), which is a simple mathematical way to calculate the gain coefficients of the adjacent loudspeakers [3].
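In the two-dimensional case, the VBAP gains follow from solving a 2x2 linear system, as in the following sketch (the function name and the normalization to unit total power are illustrative; see [3] for the full formulation):

```cpp
#include <cmath>
#include <array>

// Two-dimensional VBAP: given the directions of two adjacent loudspeakers
// and the desired virtual source direction, solve p = g1*l1 + g2*l2 for
// the gain factors, where p, l1, l2 are unit vectors in the horizontal
// plane, and normalize so that g1^2 + g2^2 = 1 (constant total power).
std::array<double, 2> vbapGains(double sourceAzimuthRad,
                                double spk1AzimuthRad,
                                double spk2AzimuthRad)
{
    // Unit direction vectors.
    double px = std::cos(sourceAzimuthRad), py = std::sin(sourceAzimuthRad);
    double l1x = std::cos(spk1AzimuthRad), l1y = std::sin(spk1AzimuthRad);
    double l2x = std::cos(spk2AzimuthRad), l2y = std::sin(spk2AzimuthRad);

    // Solve the 2x2 system [l1 l2] * g = p by Cramer's rule.
    double det = l1x * l2y - l2x * l1y;
    double g1 = (px * l2y - l2x * py) / det;
    double g2 = (l1x * py - l1y * px) / det;

    // Normalize for constant total power.
    double norm = std::sqrt(g1 * g1 + g2 * g2);
    return {g1 / norm, g2 / norm};
}
```

When the source direction coincides with one loudspeaker, the solution degenerates to a gain of one for that loudspeaker and zero for the other, which matches the exact-direction case described above.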
Figure 2: Frequency responses calculated from several
11.6ms segments of two impulse responses. The starting
times of the segments are: 0ms for the solid line, 25ms for
the dotted line, 50ms for the dashdot line, and 75ms for the
dashed line. The responses in figure a) are for the loudspeaker on the floor in the back of the virtual room (without a
screen), and those in figure b) are for the loudspeaker on the
floor to the right of the virtual room.
When the simulated acoustics of a virtual environment is
reproduced using loudspeakers, the room response has to be
equalized so that desired acoustical conditions are faithfully
reproduced. Room equalization in the whole audible frequency range is an impossible task when we have multiple
loudspeakers and the listener is moving. (Room equalization
has been discussed in more detail, e.g., in [5,6]).
Filters that perform spectral whitening can be designed
automatically by using, e.g., linear prediction. With automated design methods, it is in principle possible to produce
a very flat frequency response from the loudspeakers to the
listening area. In our case, the difficulty comes from the fact that the high frequencies are strongly attenuated. If we attempted to achieve a flat frequency response, the level of the high frequencies would need to be very high even at moderate overall sound pressure levels.
An easy way to achieve the desired non-flat frequency
response is to pre-filter the original measured impulse
response before automatic flattening-filter design. We prefiltered the impulse responses with a filter that slightly
boosted the high frequencies.
After trying out different compensation filter design techniques, we chose FIR filters of order 25 for practical implementation of spectral compensation. The filters were
designed using linear prediction. There are a few reasons for
selecting such a small filter order. The computational
requirements for the filters must be very modest; the compensation has to be done for fifteen channels, and the computer must also be able to simultaneously run the
auralization engine. Additionally, the listening area inside
the virtual room is so large that it is not feasible to try to
compensate the response very accurately. The chosen equalization method provides us with sufficiently uniform timbre
across the whole listening area.
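As a sketch of such an automated design, the following routine derives an FIR whitening filter from a measured impulse response using autocorrelation analysis and the Levinson-Durbin recursion; the optional pre-emphasis coefficient stands in for the high-frequency-boosting pre-filter described above. The function is illustrative, not our production code:

```cpp
#include <vector>
#include <cmath>

// Design an FIR whitening (spectral-flattening) filter of the given order
// from a measured impulse response via linear prediction. The returned
// coefficients a[0..order] (with a[0] = 1) define the FIR filter A(z) that
// approximately inverts the spectral envelope of the response. Pre-filtering
// the response with h'[n] = h[n] - preEmphasis * h[n-1] before analysis
// leaves a gentle high-frequency boost in the resulting equalizer.
std::vector<double> designWhiteningFir(std::vector<double> h, int order,
                                       double preEmphasis = 0.0)
{
    // Optional first-order pre-emphasis of the impulse response.
    if (preEmphasis != 0.0)
        for (int n = (int)h.size() - 1; n > 0; --n)
            h[n] -= preEmphasis * h[n - 1];

    // Autocorrelation of the (pre-filtered) impulse response.
    std::vector<double> r(order + 1, 0.0);
    for (int k = 0; k <= order; ++k)
        for (std::size_t n = k; n < h.size(); ++n)
            r[k] += h[n] * h[n - k];

    // Levinson-Durbin recursion solving the normal equations.
    std::vector<double> a(order + 1, 0.0);
    a[0] = 1.0;
    double err = r[0];
    for (int i = 1; i <= order; ++i) {
        double acc = r[i];
        for (int j = 1; j < i; ++j)
            acc += a[j] * r[i - j];
        double k = -acc / err;
        std::vector<double> aPrev(a);
        for (int j = 1; j < i; ++j)
            a[j] = aPrev[j] + k * aPrev[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return a;   // FIR coefficients of the compensation filter
}
```

For the system described here, the call would use order 25 and one designed filter per loudspeaker channel.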
It is practically impossible to implement the digital compensation so that the level of the direct sound is increased relative to the reflections; the only way to achieve this goal is to decrease the energy of the reflections by adding more absorbent material to the walls of the room.
The hardware for sound reproduction in the virtual room is built around one dual-processor PC running the Linux operating system. This computer acts as the central sound-processing unit and runs special software for acoustic modeling, sound source panning, and equalization filtering. The problems with using PC hardware and Linux for sound processing are mostly due to the still rather immature support for advanced multichannel sound cards on Linux. However, an inexpensive low-latency Linux system outperforms many other computers and operating systems in overall performance.
As sound sources we currently use audio files and sound
synthesis applications running on the central audio processing PC or on other hosts. The interconnections between
computers and other audio equipment can be changed in a
flexible manner, because we use only standard audio cables
to connect the equipment together, and we have written all
the software ourselves.
Sound output from the Linux PC is taken from two eight-channel ADAT interfaces that are connected to two eight-channel D/A converters. The current loudspeaker system in the virtual room consists of fifteen active Genelec monitoring loudspeakers. We plan to add subwoofer(s) to the system, because high-quality bass reproduction is vital for immersion in many applications.
In both cases, the level of the highest frequencies is substantially reduced, which makes the sound dull.
Figure 3: Frequency responses calculated from segments of
the compensated impulse responses. The loudspeakers and
the segment boundaries are the same as those in figure 2.
All software used for signal processing has been written in C++ using an object-oriented approach. The software is split into an efficient low-level signal processing library and a higher-level dynamic signal processing/routing toolkit.
Low-Level Signal Processing
For efficient signal processing needed for the virtual room,
we have implemented a low-level signal processing library.
The library contains generic base classes for different types
of signal processing blocks, and optimized versions of several general-purpose and special-purpose DSP structures.
The signal processing units of the low-level library do their calculations in a sample-by-sample fashion. The units are combined at the code level, and new combinations generally require recompilation of the application.
The DSP library contains implementations for several
types of typical filter structures. Generic filter classes can be used as filters of any order, and, for the sake of efficiency, special implementations have been written for the most commonly used filters of first, second, and third order. Auxiliary functions can be used to calculate filter coefficients using a few common filter design methods.
High-Level Signal Processing
To complement the low-level signal processing tools, a
high-level signal processing application has been built. The
application—called Mustajuuri—is a generic plugin-based
real-time signal processing tool. With proper plugins this
application can be used to run the audio processing for the
virtual room. Mustajuuri is available at <http://>.
5.2.1 Modularity
Mustajuuri is a highly modular signal processing platform. Arbitrary signal processing modules can be chained at run time to create the desired DSP network. Since most of the functionality comes from the plugins, the system can be easily extended: new plugin types can be written in C++, and they are loaded by the application without recompilation. By nature, all plugins can deal with audio signals and with events (for example, MIDI events). Modularity is
also a natural way to handle platform-dependent features
such as audio I/O and MIDI I/O.
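Mustajuuri’s real plugin API is not reproduced here; the following sketch merely illustrates the kind of C++ interface such a plugin system can be built on (all names and signatures are hypothetical):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of a plugin interface for a modular DSP host.
// These names and signatures are invented for illustration; they are
// not Mustajuuri's actual API.
struct Event {
    std::string type;    // e.g. "midi", "control"
    int value;
};

class Plugin {
public:
    virtual ~Plugin() {}
    // Process one block of audio samples in place.
    virtual void processAudio(std::vector<float>& block) = 0;
    // React to control events (for example MIDI events); optional.
    virtual void processEvent(const Event&) {}
};

// A trivial gain plugin: events of type "gain" set the level in percent.
class GainPlugin : public Plugin {
public:
    void processAudio(std::vector<float>& block) override
    {
        for (float& s : block) s *= gain_;
    }
    void processEvent(const Event& e) override
    {
        if (e.type == "gain") gain_ = e.value / 100.0f;
    }
private:
    float gain_ = 1.0f;
};

// Plugins chained at run time form the processing network.
inline void runChain(std::vector<Plugin*>& chain, std::vector<float>& block)
{
    for (Plugin* p : chain) p->processAudio(block);
}
```

With an interface of this shape, new plugin types compiled into shared libraries can be loaded and chained by the host without recompiling the application itself.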
5.2.2 Customizability
For this project, the customizability of Mustajuuri has been
a deciding factor. We have been able to incorporate plugins
for flexible auralization into the software. By creating
proper plugins one can control the software in a novel
way—it is even possible to directly access the internal data
structures of the application. Since all the source code is available, one has total control over the internals of the software. As the software is plugin-based, we do not need to modify the application itself when adding or modifying plugins.
5.2.3 Connectivity
By nature, Mustajuuri is an application that is used locally (the user and the software are on the same machine). In virtual reality applications, however, graphics and sound are typically processed on different computers, so the system components must communicate over a network. For the needs of the virtual room, we created an auralization control plugin that adds auralization server features to Mustajuuri. Auralization can be controlled via a standard socket interface; a lightweight C++ library is used to hide the socket interface from the client application.
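The actual wire protocol is not documented here; purely as an illustration, a client-side helper might format control messages along these lines (the command name and message format are invented for this example):

```cpp
#include <string>
#include <sstream>

// Hypothetical sketch of the kind of message a client library could send
// to an auralization server over a socket. The command name and wire
// format are invented for illustration; they are not Mustajuuri's protocol.
std::string makeSourcePositionMessage(int sourceId, double x, double y, double z)
{
    std::ostringstream out;
    out << "SOURCE_POS " << sourceId << ' '
        << x << ' ' << y << ' ' << z << '\n';
    return out.str();
}
```

A helper library of this kind lets the graphics host move virtual sound sources without knowing anything about sockets or the audio engine's internals.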
Conclusions

In this paper we have described sound signal processing issues for a virtual room. The virtual room is not an optimal listening environment, but with measurement and compensation the problems can be reduced. We have presented responses measured in the virtual room at HUT; these responses clearly show the effect of the back-projection screens on the sound.

In the virtual room at HUT, we have chosen to improve the spectral responses with digital equalization filters. The low- and high-level signal processing requirements were discussed, and an example implementation was described.
Acknowledgments

This work has been supported by the Academy of Finland and Tekes, the National Technology Agency.
References

[1] Cruz-Neira C., Sandin D., DeFanti T., Kenyon R. and
Hart J. 1992, “The Cave – Audio Visual Experience
Automatic Virtual Environment,” Communications of the ACM, 35(6), pp. 64–72.
[2] Jalkanen J., 2000, Building a Spatially Immersive Display: HUTCAVE, Licentiate Thesis, Helsinki University of Technology, 4 March 2000, p. 132.
[3] Pulkki V. 1997, “Virtual Sound Source Positioning
Using Vector Base Amplitude Panning,” Journal of the
Audio Engineering Society, 45(6), pp. 456–466.
[4] Peltonen T. 2000, A Multichannel Measurement System for Room Acoustics Analysis, Master’s thesis,
Helsinki University of Technology, 2000.
[5] Walker R. 1998, “Equalisation of Room Acoustics and
Adaptive Systems in the Equalisation of Small Room
Acoustics,” In Proc. AES 15th Int. Conf. Copenhagen,
Denmark, pp. 32–47.
[6] Mourjopoulos J. N. 1994, “Digital equalization of
room acoustics,” Journal of the Audio Engineering
Society, 42(11), pp. 884–900.