SOUND SIGNAL PROCESSING FOR A VIRTUAL ROOM

Jarmo Hiipakka1, Tommi Ilmonen2, Tapio Lokki2, Lauri Savioja2

Helsinki University of Technology
1Laboratory of Acoustics and Audio Signal Processing, P.O.Box 3000, FIN-02015 HUT
2Telecommunications Software and Multimedia Laboratory, P.O.Box 5400, FIN-02015 HUT

ABSTRACT

This paper presents the audio system built for the virtual room at Helsinki University of Technology. First we discuss the general problems that the construction of virtual rooms, and the equipment in them, cause for multichannel sound reproduction. We also describe the acoustics of the room in question, and the effect of the back-projected screens and reverberation on the sound. Compensation of the spectral deficiencies is introduced, together with the problems caused by the large listening area and high-frequency attenuation. The hardware configuration used for sound reproduction is briefly described. We also report the software applications and libraries built for sound signal processing and 3-D sound reproduction.

1 INTRODUCTION

A virtual room is a fully immersive multiuser virtual reality system [1]. The visual display of the virtual room consists of large back-projected screens surrounding the users. Lightweight shutter glasses are used for a completely immersive stereoscopic view of the virtual world. A tracking device monitors the head position and orientation of one user so that the correct stereo perspective can be obtained. If more than one user is present in the virtual room at the same time, the stereo perspective is exact only for the tracked observer. However, the perspective for other observers is not overly distorted, and real cooperation in the virtual world is possible.

Many virtual reality systems use only computer graphics to produce immersive virtual environments. However, sound is an important part of our everyday life and should be included in the creation of an immersive virtual environment. In the following sections we present the audio system built for the virtual room at Helsinki University of Technology (HUT) (Fig. 1) [2].

Figure 1: A schematic figure of the virtual room at HUT. Color indicates the height of the loudspeaker. Dark loudspeakers are on the floor, and light ones are ceiling-mounted.

2 AUDIO IN THE VIRTUAL ROOM

For an immersive virtual reality experience, a three-dimensional soundscape needs to be created. This can be done using either binaural or multichannel techniques. The quality of the resulting sound field depends on the quality of the equipment, on the applied processing techniques, and, with multichannel reproduction, on the acoustics of the room. Some deficiencies in the response of the reproduction system can be compensated for, but there are practical limits to this approach.

2.1 Sound Reproduction Techniques

In binaural 3-D sound reproduction techniques, the principle is to control the sound signal at the entrances of the listener's ear canals as accurately as possible. This requirement is easiest to fulfil with headphones, but it can also be met with a pair of loudspeakers. In virtual rooms, using loudspeakers for binaural sound reproduction is not feasible, because the sweet spot where correct 3-D sound imaging can be achieved is small.

In multichannel reproduction, multiple loudspeakers are placed around the listener. With multichannel techniques, it is possible to reproduce sound signals naturally from correct directions. When the direction of the virtual sound source coincides with the direction of a real loudspeaker, the source direction is exact. When these directions do not coincide, different panning methods can be used. In our system we apply vector base amplitude panning (VBAP), which is a simple mathematical way to calculate the gain coefficients of the adjacent loudspeakers [3].

2.2 Acoustics of the Room

3-D sound reproduction in virtual rooms is far from trivial. In virtual rooms there are back-projected screens and data projectors that hinder arbitrary positioning of loudspeakers, and typical back-projection screens are not acoustically transparent. Headphone reproduction of spatial audio solves these problems. However, one of the most important features of virtual rooms is that many users can simultaneously interact with the virtual environment and with each other. If headphones were used, this interaction could be disturbed. Another disadvantage of headphones is that the required computational capacity depends on the number of users. For every user, an individual soundscape has to be calculated, and everyone needs his or her own head-tracking device to get a correct soundscape.

To minimize the influence of the room acoustics, the room must be as anechoic as possible. This means that the walls and the ceiling must be covered with absorbent material. For the room to be silent enough, all noisy equipment should be placed in another room. For practical reasons, however, at least the data projectors are usually in the same room as the screens. The mirrors and the screens also produce reflections that may be disadvantageous for sound reproduction.

To investigate the acoustics of our virtual room at HUT, we conducted a series of impulse response measurements from all fifteen loudspeakers to nine microphone positions inside the virtual room. The height of all microphone positions was 160 cm, and the measuring points formed a square grid with 150 cm long sides, with the central point in the middle of the virtual room. In the measurements we used the multichannel IRMA measurement system [4] with MLS sequences as the source signal.

According to our measurements, the reverberation time in our virtual room is on the order of 400 ms. Figure 2 depicts the frequency content calculated from short segments of the impulse responses from two different loudspeaker positions to the middle of the virtual room. Both loudspeakers are on the floor. In the first case (Fig. 2a), there is no screen between the loudspeaker and the microphone (loudspeaker number 13 in Figure 1), but in the second case (Fig. 2b), there is a screen (loudspeaker 8). Before analysis, the impulse responses were truncated so that the direct sound is at the beginning of the response, i.e., the propagation delay was removed from the responses. The distance from the loudspeaker to the microphone is approximately the same in each case. The different line styles in the figures denote the segment of the impulse response that was used for the response calculation. For clarity, the frequency responses have been smoothed with quarter-octave resolution.

Figure 2: Frequency responses (magnitude in dB versus frequency in Hz) calculated from several 11.6 ms segments of two impulse responses. The starting times of the segments are: 0 ms for the solid line, 25 ms for the dotted line, 50 ms for the dash-dot line, and 75 ms for the dashed line. The responses in panel a) are for the loudspeaker on the floor in the back of the virtual room (without a screen), and those in panel b) are for the loudspeaker on the floor to the right of the virtual room (behind a screen).

The influence of the screen on the sound can be easily seen from the figure. The higher frequencies of the direct sound are attenuated by more than 10 dB when there is a screen between the loudspeaker and the microphone. The reverberant sound field bypasses the screen, i.e., the level of the reverberant sound is hardly influenced by the presence of the screen. The attenuation of the direct sound affects the localization of the sound sources and blurs directional cues.
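As an illustration of the vector base amplitude panning mentioned in Section 2.1, the following is a minimal sketch of the gain calculation for a single loudspeaker triplet, following the general formulation in [3]: the source direction is written as a non-negative linear combination of the three loudspeaker direction vectors, and the resulting gains are scaled to unit total power. This is not the authors' implementation. The names `Vec3`, `det3`, and `vbapGains` are hypothetical, and the selection of the active triplet among the fifteen loudspeakers is omitted.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 3-D direction vector (listener at the origin).
struct Vec3 { double x, y, z; };

// Scalar triple product a . (b x c), i.e. the determinant of [a b c].
double det3(const Vec3& a, const Vec3& b, const Vec3& c) {
    return a.x * (b.y * c.z - b.z * c.y)
         - a.y * (b.x * c.z - b.z * c.x)
         + a.z * (b.x * c.y - b.y * c.x);
}

// Solves source = g0*spk[0] + g1*spk[1] + g2*spk[2] with Cramer's rule and
// normalizes the gains so that g0^2 + g1^2 + g2^2 = 1.
// Returns false if the triplet is degenerate or the source direction lies
// outside the triplet (some gain would be negative).
bool vbapGains(const Vec3& source, const std::array<Vec3, 3>& spk,
               std::array<double, 3>& gains) {
    // The source direction must be a non-zero vector.
    assert(source.x != 0.0 || source.y != 0.0 || source.z != 0.0);

    const double d = det3(spk[0], spk[1], spk[2]);
    if (std::fabs(d) < 1e-12) return false;  // loudspeakers are coplanar with the origin

    const double g[3] = { det3(source, spk[1], spk[2]) / d,
                          det3(spk[0], source, spk[2]) / d,
                          det3(spk[0], spk[1], source) / d };

    double power = 0.0;
    for (double gi : g) {
        if (gi < -1e-9) return false;  // source is not inside this triplet
        power += gi * gi;
    }

    const double norm = 1.0 / std::sqrt(power);
    for (int i = 0; i < 3; ++i) gains[i] = g[i] * norm;
    return true;
}
```

When the source direction coincides with one loudspeaker of the triplet, the sketch returns a gain of one for that loudspeaker and zero for the other two, which matches the statement that the source direction is then exact. A full panner would try the candidate triplets in turn and use the first one for which the function returns true.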
3 COMPENSATION

When the simulated acoustics of a virtual environment are reproduced using loudspeakers, the room response has to be equalized so that the desired acoustical conditions are faithfully reproduced. Room equalization over the whole audible frequency range is an impossible task when there are multiple loudspeakers and the listener is moving. (Room equalization is discussed in more detail in, e.g., [5,6].)

Filters that perform spectral whitening can be designed automatically by using, e.g., linear prediction. With automated design methods, it is in principle possible to produce a very flat frequency response from the loudspeakers to the listening area. In our case, the difficulty is that the high frequencies are strongly attenuated. If we attempted to achieve a flat frequency response, the level of the high frequencies would have to be very high even at moderate overall sound pressure levels. An easy way to achieve the desired non-flat frequency response is to pre-filter the measured impulse response before the automatic flattening-filter design. We pre-filtered the impulse responses with a filter that slightly boosted the high frequencies, so that the designed compensation filters apply correspondingly less high-frequency gain.

After trying out different compensation filter design techniques, we chose FIR filters of order 25 for the practical implementation of spectral compensation. The filters were designed using linear prediction. There are a few reasons for selecting such a low filter order. The computational requirements for the filters must be very modest: the compensation has to be done for fifteen channels, and the computer must also be able to run the auralization engine simultaneously. Additionally, the listening area inside the virtual room is so large that it is not feasible to try to compensate the response very accurately. The chosen equalization method provides us with sufficiently uniform timbre across the whole listening area.
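The linear-prediction design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses the autocorrelation method with the Levinson-Durbin recursion, the function names `designWhiteningFir` and `firFilter` are hypothetical, and the high-frequency pre-filtering step is left out because the paper does not specify the pre-filter. The returned coefficients form the prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, which flattens the spectrum of the response it was designed from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Designs an FIR whitening filter of the given order from an impulse
// response h using autocorrelation-method linear prediction.
// Returns a[0..order] with a[0] == 1.
std::vector<double> designWhiteningFir(const std::vector<double>& h,
                                       std::size_t order) {
    assert(order >= 1 && order < h.size());

    // Autocorrelation lags r[0..order] of the impulse response.
    std::vector<double> r(order + 1, 0.0);
    for (std::size_t k = 0; k <= order; ++k)
        for (std::size_t n = 0; n + k < h.size(); ++n)
            r[k] += h[n] * h[n + k];

    // Levinson-Durbin recursion.
    std::vector<double> a(order + 1, 0.0);
    a[0] = 1.0;
    double err = r[0];                       // prediction error energy
    for (std::size_t i = 1; i <= order; ++i) {
        if (err <= 0.0) break;               // silent or fully predicted input
        double acc = r[i];
        for (std::size_t j = 1; j < i; ++j) acc += a[j] * r[i - j];
        const double k = -acc / err;         // reflection coefficient
        const std::vector<double> prev(a);   // coefficients of order i-1
        for (std::size_t j = 1; j < i; ++j) a[j] = prev[j] + k * prev[i - j];
        a[i] = k;
        err *= (1.0 - k * k);
    }
    return a;
}

// Direct-form FIR filtering: y[n] = sum over k of b[k] * x[n-k].
std::vector<double> firFilter(const std::vector<double>& b,
                              const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < b.size() && k <= n; ++k)
            y[n] += b[k] * x[n - k];
    return y;
}
```

With `order` set to 25, each of the fifteen channels would get its own coefficient vector designed from its measured (and pre-filtered) impulse response. At run time each output sample then costs 26 multiply-accumulate operations per channel, which is consistent with the modest computational budget described above.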
It is practically impossible to implement the digital compensation so that the level of the direct sound is increased with respect to the reflections; the only way to achieve this goal is to decrease the energy of the reflections by adding more absorbent material to the walls of the room.

In both cases shown in Figure 2 (with and without a screen), the level of the highest frequencies is considerably reduced, and this makes the sound dull.

Figure 3: Frequency responses (magnitude in dB versus frequency in Hz) calculated from segments of the compensated impulse responses. The loudspeakers and the segment boundaries are the same as those in Figure 2.

4 SOUND REPRODUCTION HARDWARE

The hardware for sound reproduction in the virtual room is built around one dual-processor PC running the Linux operating system. This computer acts as a central sound processing unit. It runs special software that is used for acoustic modeling, sound source panning, and equalization filtering. The problems with using PC hardware and Linux for sound processing are mostly due to the fact that the support for advanced multichannel sound cards on Linux is still quite immature. However, inexpensive low-latency Linux systems outperform many other computers and operating systems.

As sound sources we currently use audio files and sound synthesis applications running on the central audio processing PC or on other hosts. The interconnections between computers and other audio equipment can be changed flexibly, because we use only standard audio cables to connect the equipment together, and we have written all the software ourselves.

Sound output from the Linux PC is taken from two eight-channel ADAT interfaces that are connected to two eight-channel D/A converters. The current loudspeaker system in the virtual room consists of fifteen active Genelec monitoring loudspeakers. We plan to include subwoofer(s) in the system, because high-quality bass reproduction is vital for immersion in many applications.

5 SOFTWARE IMPLEMENTATION

All software used for signal processing has been written in C++ using an object-oriented programming approach. The software is split into an efficient low-level signal processing library and a higher-level dynamic signal processing and routing toolkit.

5.1 Low-Level Signal Processing

For the efficient signal processing needed in the virtual room, we have implemented a low-level signal processing library. The library contains generic base classes for different types of signal processing blocks, and optimized versions of several general-purpose and special-purpose DSP structures. The signal processing units of the low-level library do their calculations sample by sample. The units are combined at code level, and new combinations generally require recompilation of the application.

The DSP library contains implementations of several typical filter structures. Generic filter classes can be used as filters of any order, and for the sake of efficiency, special implementations have been written for the most used filters of first, second, and third order. Auxiliary functions can be used to calculate filter coefficients using a few common filter design methods.

5.2 High-Level Signal Processing

To complement the low-level signal processing tools, a high-level signal processing application has been built. The application, called Mustajuuri, is a generic plugin-based real-time signal processing tool. With proper plugins this application can be used to run the audio processing for the virtual room. Mustajuuri is available at <http://www.tcm.hut.fi/~tilmonen/mustajuuri/>.

5.2.1 Modularity

Mustajuuri is a highly modular signal processing platform.
Arbitrary signal processing modules (plugins) can be chained at run time to create the desired DSP network. Since most of the functionality comes from the plugins, the system can be easily extended. New plugin types can be written in C++, and they are loaded by the application without recompilation. By nature, all plugins can deal with audio signals and with events (for example MIDI events). Modularity is also a natural way to handle platform-dependent features such as audio I/O and MIDI I/O.

5.2.2 Customizability

For this project, the customizability of Mustajuuri has been a deciding factor. We have been able to incorporate plugins for flexible auralization into the software. By creating proper plugins one can control the software in a novel way; it is even possible to directly access the internal data structures of the application. Since all the source code is available, one has total control over the internals of the software. As the software is plugin-based, we do not need to modify the application when adding or modifying features.

5.2.3 Connectivity

By nature, Mustajuuri is an application that is used locally (the user and the software are on the same machine). In virtual reality applications, graphics and sound are typically processed on different computers. For this reason, system components must communicate via a network. For the needs of the virtual room, we created an auralization control plugin. This plugin adds auralization server features to Mustajuuri. Auralization can be controlled via a standard socket interface. A lightweight C++ library is used to hide the socket interface from the client application.

6 CONCLUSIONS

In this paper we have described sound signal processing issues for a virtual room. The virtual room is not an optimal listening environment, but with measurement and compensation the problems can be reduced.
In this paper we have presented responses measured in the virtual room at HUT. These responses clearly show the effect of the back-projection screens on sound. In the virtual room at HUT, we have chosen to improve the spectral responses by digital equalization filters. The low-level and high-level signal processing requirements were discussed and an example implementation was described.

ACKNOWLEDGMENTS

This work has been supported by the Academy of Finland and Tekes, the National Technology Agency.

REFERENCES

[1] Cruz-Neira C., Sandin D., DeFanti T., Kenyon R. and Hart J. 1992, "The Cave – Audio Visual Experience Automatic Virtual Environment," Communications of ACM, 35(6), pp. 64–72.
[2] Jalkanen J. 2000, Building a Spatially Immersive Display: HUTCAVE, Licentiate Thesis, Helsinki University of Technology, 4 March 2000, p. 132.
[3] Pulkki V. 1997, "Virtual Sound Source Positioning Using Vector Base Amplitude Panning," Journal of the Audio Engineering Society, 45(6), pp. 456–466.
[4] Peltonen T. 2000, A Multichannel Measurement System for Room Acoustics Analysis, Master's thesis, Helsinki University of Technology, 2000.
[5] Walker R. 1998, "Equalisation of Room Acoustics and Adaptive Systems in the Equalisation of Small Room Acoustics," In Proc. AES 15th Int. Conf., Copenhagen, Denmark, pp. 32–47.
[6] Mourjopoulos J. N. 1994, "Digital equalization of room acoustics," Journal of the Audio Engineering Society, 42(11), pp. 884–900.