Master`s Thesis - Chalmers Publication Library

Master`s Thesis - Chalmers Publication Library
An Interactive Auralization Method
Auralization using ambisonics & real-time sound sources
JOSEFIN LINDEBRINK
Department of Civil and Environmental Engineering
Divison of Applied Acoustics
Vibroacoustics Group
Chalmers University of Technology
Gothenburg, Sweden 2013
Master’s Thesis 2013:132
MASTER THESIS 2013:132
An Interactive Auralization Method
Auralization using ambisonics & real-time sound sources
Masters Thesis in the Masters Programme in Sound & Vibration
JOSEFIN LINDEBRINK
Department of Civil and Environmental Engineering
Division of Applied Acoustics
Vibroacoustics Group
CHALMERS UNIVERSITY OF TECHNOLOGY
Göteborg, Sweden 2013
An Interactive Auralization Method
Auralization using ambisonics & real-time sound sources
c
JOSEFIN
LINDEBRINK, 2013
Master’s Thesis 2013:132
Department of Civil and Environmental Engineering
Division of Applied Acoustics
Vibroacoustics Group
Chalmers University of Technology
SE-412 96 Göteborg
Sweden
Telephone: + 46 (0)31 - 772 100
Cover:
Interpretation of audible environments, Martin Boyton, Stockholm,2012
Reproservice / Department of Civil and Environmental Engineering
Göteborg, Sweden 2013
An Interactive Auralization Method
Auralization using ambisonics & real-time sound sources
JOSEFIN LINDEBRINK
Department of Civil and Environmental Engineering
Divison of Applied Acoustics
Vibroacoustics Group
Chalmers University of Technology
Abstract
During recent years auralization methods have evolved towards using interactive measures. The use of interactive elements such as navigation in static sound fields has proven
to be very significant in order to better integrate the listener with the simulated soundscape. In this study the possibility of engaging the user by actively contributing to the
sound field is explored by enabling the subject to make sounds and communicate within
the environment. Auralization allows for a subjective evaluation of the acoustical space
and therefore plays an important part in a wider understanding of how we are affected by
different environmental characteristics. With an auralization framework utilizing realtime sound sources direct experience of the acoustical response of the physical space is
enabled and can thus be used as a tool for evaluation. Real-time convolution software
implementing this method for performing auralizations has been designed. A subjective
evaluation has been made using a ambisonics decoded sound field reproduced through a
multi-channel loudspeaker system and a directional microphone with feedback control.
Evaluation results indicate a positive response from the subjects to the added control
over the simulated space. Further studies need however to be made to analyze the effects
the added activity of the listener has on his or her perception of the space.
Key words : Auralization, Virtual Acoustics, Real-time convolution, Ambisonics, Acoustic feedback control, Interactive Virtual Environments, Pure Data
En interaktiv auraliseringsmetod
Auralisering med ambisonics & real-tidskällor
JOSEFIN LINDEBRINK
Institutionen för bygg- och miljöteknik
Avdelningen för Teknisk akustik
Vibroakustiksgruppen
Chalmers Tekniska Högskola
Sammanfattning
Under senare år har metoder för auralisering utvecklats till att allt mer tillämpa interaktion mellan lyssnare och ljudmiljö. Möjligheter så som navigering i statiska ljudfält
har visats vara signifikanta i arbetet med att förbättra integreringen av lyssnaren i det
simulerade ljudlandskapet. I detta arbete har möjligheten att låta lyssnare aktivt bidra
till ljudlandskapet genom att agera ljudkälla och kommunicera i den simulerade ljudmiljön studerats. Auralisering möjliggör en subjektiv bedömning av ljudmiljöer och är
därmed en viktig del i arbetet med att studera hur akustiska egenskaper hos fysikaliska
miljöer påverkar oss. Med ett ramverk för auralisering som tillämpar real-tidsljudkällor
tillåts en direkt upplevelse av miljöns akustiska återkoppling som i sin tur skulle kunna
användas som ett verktyg i en bedömning av dess egenskaper. Initiella studier har genomförts för att fastställa eventuella fördelar och nackdelar med en sådan metod. Subjekt
i genomförda studier har reagerat positivt till den adderade kontrollen av vad som hörs
och när. Fortsatta studier krävs dock för att fullständigt utreda metodens betydelse när
det gäller genomförandet av auraliseringar i syfte att bedömma akustiska egenskaper hos
fysikaliska miljöer.
Nyckelord : Auralisering, interaktiva ljudmiljöer, real-tids faltning, real-tids ljudkällor,
ambisonics, akustisk rundgångskontroll, pure data
Acknowledgements
First and foremost I would like to thank my supervisor at Chalmers University of Technology, Associate Professor Jens Forssén for all the great advice and support during the
course of this work.
I would also like to thank the people at the acoustics department of Tyréns for all the
input and support when working on this project, especially Philip Zalyaletdinov and
Martin Höjer for discussions had during this time.
Penny Bergman, Mendel Kleiner and Erkin Asutay for all their advice and expertise, as
well as the rest of the staff at the division of Applied Acoustics, Chalmers University of
Technology.
Peter Lundén at SP Technical Research Institute of Sweden for advice on real-time audio
processing and the Pure Data software. Pontus Larsson at Volvo Technology for advice
on real-time audio processing and reproduction systems and Bengt-Inge Dahlenbäck,
founder and creator of CATT-Acoustics for input on interactive auralization methods
and help with the CATT-Acoustics software.
Niklas Billström, Ricardo Atienza and Björn Hellström at Konstfack for allowing access
to the Sound Design Lab as well as help with the setup.
An important thank you should also be given to family and friends for endless support
during the work on this thesis.
Josefin Lindebrink, Stockholm, May 2012
Notations
Lp,rec
Sound pressure level at receiver
dB
Lw
Sound power level
dB
Q(θ,ϕ)
Angular direcitivity
r
Radius
A’
Total absorption area
V
Volume
ITDG
Initial time delay gap
s
t
Time
s
RT60
Reverberation time
s
H(w)
Transfer function
s
q
Gain factor
fs
Sampling frequency
Hz
B
Bandwidth
Hz
m
m2 S
m3
Contents
1 Introduction
1.1 Background . . . . .
1.2 Purpose . . . . . . .
1.3 Problem description
1.4 Limitations . . . . .
.
.
.
.
1
1
2
2
3
2 Auralization
2.1 The concept of auralization . . . . . . . . . . . . . . . . . . . . . . . . . .
2.2 Auralization in the acoustical design process . . . . . . . . . . . . . . . . .
4
4
4
3 Room Acoustics
3.1 Room acoustical parameters . .
3.2 Retrieving a RIR . . . . . . . . .
3.2.1 Measuring a RIR . . . . .
3.2.2 Modelling a RIR . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
7
7
7
4 Methods for Performing Auralizations
4.1 Sound source . . . . . . . . . . . . . .
4.2 Listening room . . . . . . . . . . . . .
4.3 Reproduction . . . . . . . . . . . . . .
4.4 Interactive auralization . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
8
8
8
8
9
.
.
.
.
.
.
.
.
5 The Acoustic Feedback Problem and Measures of Control
5.1 Acoustic feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.1 The acoustical path – predicting the acoustic feedback problem .
5.1.2 Criteria for system stability . . . . . . . . . . . . . . . . . . . . .
5.1.3 Predicting maximum stable gain from reverberation time and system bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5.1.4 Acoustic echo control . . . . . . . . . . . . . . . . . . . . . . . .
5.1.5 Acoustic feedback control . . . . . . . . . . . . . . . . . . . . . .
ix
11
. 11
. 12
. 13
. 13
. 14
. 14
CONTENTS
5.2
5.3
5.1.6 The multichannel case . . . . . . .
Modelling and measurement approach . .
5.2.1 The single-channel case . . . . . .
5.2.2 The multi-channel case . . . . . .
Analysis of the acoustic feedback problem
5.3.1 The single-channel case . . . . . .
5.3.2 The multi-channel case . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
14
15
15
19
20
20
20
6 FFT-based Block Convolution
21
6.1 Concept of blocked FFT-convolution . . . . . . . . . . . . . . . . . . . . 21
6.2 Time delay of convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.3 Results from time delay measurements . . . . . . . . . . . . . . . . . . . . 23
7 Ambisonics
24
7.1 The B-format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
7.1.1 Encoding and decoding ambisonics . . . . . . . . . . . . . . . . . . 25
8 Application Architecture
26
9 Psychoacoustic Evaluation
9.1 Criteria for sound field evaluation . . . . . . . . . . . . . . . . . . . . .
9.2 Test approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9.3 Results of the evaluation test . . . . . . . . . . . . . . . . . . . . . . .
9.3.1 Part 1 - Evaluating room size using only real-time synthesis . .
9.3.2 Part 2 - Comparing real-time and pre-convolved sound sources
.
.
.
.
.
.
.
.
.
.
28
28
28
32
32
32
10 Discussion
34
10.1 Design criteria for the auralization framework . . . . . . . . . . . . . . . . 34
10.2 Reproduction level and the acoustic feedback
problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
10.3 The subjective evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
11 Conclusion
11.1 the Auralization framework . . . . . . . . . . . . . . . . . . . . . . . . . .
11.2 Choice of equipment and acoustic feedback problem . . . . . . . . . . . .
11.3 Short notes on a transportable auralization set-up . . . . . . . . . . . . .
37
37
37
38
12 Future Work
39
References
41
A
Measurement Set-Up, Feedback Measurements in the Semi-Anechoic
Chamber
43
A.1 Equipment data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
A.2 Microphone directivity data . . . . . . . . . . . . . . . . . . . . . . . . . . 44
x
CONTENTS
A.3 Measurement set-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
A.4 Measurement results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
B Measurement Set-Up, Feedback Measurements in the Sound Design
Lab
49
B.1 Equipment data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
B.2 Measurement results Konstfack, multichannel loudspeaker set-up . . . . . 52
C Listening Test Questionnaire
54
D Auralization Set-Up
57
xi
CONTENTS
xii
CHAPTER 1
Introduction
1.1
Background
In acoustical design numerous parameters help us evaluate the acoustical qualities of
physical space. However, for a complete assessment perception-based data should be
included. For this the sound field needs to actually be heard. Therefor studies have
been made on how to recreate the acoustical ques of physical space through modelling
and measurements.
Beginning with pioneering work performed by Spandöck et al [1], this early work meant
measuring in physical scale models. Today, as computers have become much more powerful, calculations are primarily performed using digital 3D models. Dedicated software
such as Odeon or CATT-Acoustics has been made available for calculation of the objective data of room models as well as generating audio for auralization purposes. Efficiency
of testing and changing acoustical parameters has also gained through the use of digital
computer models. Today the main objectives are to optimize the auralization practice
through more efficient calculation of the impulse response, calculation of the impulse
response in real-time as well as developing methods for adding listener influence on the
simulated soundscape.
When it comes to presenting simulated soundscapes or virtual acoustic environments,
VAE, interactive real-time auralization methods have in recent time been adopted more
and more. Instead of having the listener simply be subjected to the environment through
a listening experience, added control is given with the aim of merging the listener more
with the soundscape. Recent interactive methods includes allowing the listener to move
around in static sound fields or alter its acoustical characteristics in real-time, instantly
hearing the effects [2, 3, 4]. By activating and engaging the listener more in the virtual
environment the possibilities of a subjective assessment seems to be improving.
With auralization a more detailed evaluation of the qualities of a soundscape can be
assessed, beyond the scope of objective data. With the indicated possibilities of utilizing
1
CHAPTER 1. INTRODUCTION
interaction, the question now arise as to what modes of interaction should be included
in the auralization process, arriving at methods that further improves the conditions
for subjective assessment. Possibly several modalities of interaction could be used that
engages the listener in more ways than one, resembling the control benefitted from real
environments.
Studies made by Appel and Beerends [5] on assessing the quality of one’s own voice
in telecommunication systems and enclosures suggests that experiencing the feedback
or response in ways of reflected energy, aspects of the speech production are affected.
Adjusting one’s voice when speaking, such as increasing the loudness due to background
noise (i.e. the Lombard effect) or altering the speech rate, implies that conscious or
subconscious mechanisms are set to work. If these mechanisms in turn can be used to
assess the conditions provided by the environment it offers a great tool to assess VAE.
It also allows the subject to use its own voice as reference.
Methods of enabling musicians to play their instrument in VAE have been proposed
[5, 6] as it has been found that musicians are well aware of the environment of which
they play their instrument in. Musicians are able to alter how they play their instrument depending on the characteristics of the environment. Similarly, the pre-existing
knowledge of the use of our own voice can potentially be used as a reference to assess
the room acoustical qualities of VAE. Knowing the sound of our own voice and being
able from a very early age to adapt the way of using it depending on the surroundings,
these functions could possibly be used to evaluate the acoustical qualities of a space.
1.2
Purpose
The purpose of this work is to study how to compile an auralization framework that
enables real-time sound sources within VAE. The scope of this work also includes performing initial studies of the significance of such an interaction with the hypothesis that
the added interaction is beneficial to the possibilities of assessment.
1.3
Problem description
The application should utilize a multi-channel loudspeaker system and a continuous microphone feed in the listening room. The application also needs to handle real-time convolution between pre-calculated impulse responses together with a sound source present
at the time of auralization. There needs to be an ability to use a flexible amount of
impulse responses, i.e. enable modelling of rooms with a wide variety of characteristics.
There can be no excess delays as it will deteriorate the listening experience. The framework for the application should preferably be made so that expansion such as also using
pre-convolved environmental sounds is made possible. Auralization of several real-time
sound sources would allow communication, the system should therefore allow for several
2
CHAPTER 1. INTRODUCTION
users to participate in the virtual acoustic environment. For several participants using
a 3D – surround loudspeaker system is called for. The application is compiled with
adjustable parameters to allow user flexibility. Tyréns have also requested a flexible
solution that allows an effective way of performing simulations outside of treated sound
labs. Only short notes on how this could be done have been included in the scope of this
work. These are given at the end of this report.
1.4
Limitations
The application compiled in this work is restricted to one interactive component making
it possible to only study the effects of this. However, the framework is designed in a
platform that allows for further development. As only the effects of real-time sound
sources are studied, fixed source and receiver positions are used, which also decreases
complexity. Utilizing pre-calculated impulse responses will substantially decrease the
computational load so implementation could be done using a personal working station.
Pre-convolved sound sources are only used in a comparative context during listening
tests. The ability to use such together with the real-time sound sources is not included
at this point in the auralization framework. Performing auralizations in regular rooms,
without sufficient absorption is a tedious task as the acoustics of the listening room will
blend with the simulated space. In this work short notes are given as on what to avoid
to prevent this from happening. The aim has been to compile an auralization framework
that is suitable for presenting sound fields in the acoustical design process. Necessary
approximations concerning the reproduction and simulation of VAE has been undertaken
when necessary.
3
CHAPTER 2
Auralization
2.1
The concept of auralization
Auralization is a way of re-creating or simulating the experience of an acoustic environment and thus enable subjective evaluation of sound fields. With auralizations, possible
health effects, subjective preference and environment suitability can be assessed. In practice auralization is made by modelling or measuring a room’s impulse response, RIR.
The impulse response includes information on how sound is transmitted from a source
to a receiver position in a physical space. The transmission path can later be convolved
with audio and thus the transmission can be simulated. The RIR contains information
on the transmission of a sound that is emitted in space. The sound will be absorbed,
scattered or reflected interacting with the different physical objects or restraints of an
environment.
2.2
Auralization in the acoustical design process
Since auralizations are a way of making the transmission path audible of sound emitted in an environment by a sound source to a receiver, it is a great tool to use in
the acoustical design process. Auralization enables direct comparison with and without
measures of control. The theoretical nature of traditional acoustics has made the possibilities and importance of acoustic design somewhat inaccessible to people outside of the
acoustics-community. Using auralizations, immediate access can be given to experiencing possibilities of an adequate acoustical design. As Furlong et al. have discussed [4]
simple solutions need to be applied, enabling an insight to the acoustical design process,
allowing this to be considered at an early designing stage.
4
CHAPTER 3
Room Acoustics
3.1
Room acoustical parameters
Sound pressure level, Lp,rec at the receiving point will consist of direct sound from the
sound source as well as reflected sound from the enclosure. The resulting sound pressure
level, Lp,rec , is dependent on the sound power level of the sound source, directivity of
the sound source as well as total absorption area of the enclosure.
Lp,rec = Lw + 10 log[(Q(θ,ϕ))/(4πr2 ) + 4/(A0 )]
(3.1)
A room impulse response, RIR, gives time-based information on the transmission of
sound from a source to a receiver position. The initial time delay gap, ITDG, seen in
the RIR, is a measure of the time between the first incoming sound at t0 , and the first
reflection, occurring at time t1 , depicted in figure 3.1. If this gap is long there is a risk
of echoes occurring.
Figure 3.1: Example of an impulse response showing the direct sound, early reflections
and reverberation tail
Echoes occur when reflections of a certain sound pressure level arrive at the listener
with a sufficient time delay to the direct sound. For speech applications echoes can be
5
CHAPTER 3. ROOM ACOUSTICS
detrimental as the talker is disrupted by retardant information. The nature of speech
makes us more sensitive to echoes occurring. Our sensitivity to echoes is dependent on
speech rate, angle of reflection incidence as well as reflection ratio to direct sound concerning sound pressure level and time delay. Tests conducted by Haas, figure 3.2, show
that with an equal sound pressure level, the annoyance or the detection of an echo was
reported by 50 % of the test subjects at a delay time of about 60 ms when subjected
to two sounds with the same incidence angle. From these tests one can also see indications that around 50 ms, the delay of a reflection with a -10 dB lower sound pressure
level to the direct sound would start being detectable, indicating a threshold for echo
detection. Many spatially dependent parameters utilize the time threshold of 50 ms to
distinguish early sound information from late. Haas tests indicate that a 50 ms time difference starts to have importance when the sound pressure level difference is about 10 dB.
Figure 3.2: Results from Haas measurements of echo annoyance between two signals.
Results show dependence on sound pressure level difference as well as with different time
delays. Both with a 0 degree angle of incidence [7]
Reverberation time, or reverberance, often abbreviated RT60 measures the time from
steady state sound pressure level in diffuse field to the time it decreases by 60 dB as the
sound source is switched off. The reverberation time can be estimated using Sabine’s
formula:
RT60 = 0.161 AV0 ,
(3.2)
Where V is the volume of the enclosure and A’ is the total absorption area. Reverberation time gives a sense of room size and is one of the most fundamental room acoustical
parameters for subjective perception [8].
Coloration of the sound field is unwanted changes in the frequency spectrum either
caused by non-linearity’s in the recording or reproduction system or by room reflections.
6
CHAPTER 3. ROOM ACOUSTICS
Extensive coloration causes deterioration of sound quality.
3.2
Retrieving a RIR
RIR is dependent on source and receiver positions, source characteristics as well as geometry of the enclosure and materialistic properties of spatial boundaries. The RIR
needs to be adapted to the chosen reproduction method. That is, for binaural reproduction two channels are produced corresponding to the right and left ear signals when
using headphones utilizing Head-Related Transfer Functions, HRTF. For multichannel
loudspeaker set-ups, the 3D sound field reproduced by the loudspeakers needs to be
encoded. Most commonly the ambisonics technique is utilized resulting in a 4-channel
RIR covering the coordinates of the physical space.
3.2.1
Measuring a RIR
Different methods for measuring RIRs are available such as the Maximum Length Sequence, MLS, the Inverse Repeated Sequence, IRS, Time-Stretched Pulses or sine sweep.
For a multi-channel loudspeaker reproduction the RIR should be measured using an
Omni-sound source combined with a microphone containing multiple capsules for incidence angle. For binaural reproduction a binaural dummy-head can be used. When
measuring RIRs it is important to keep the electronic SNR low. All combinations of the
source and receiver positions need to be recorded separately.
3.2.2
Modelling a RIR
Modelling and calculating a room impulse response, the geometry and material data
must be defined. Calculations usually include image-source modelling of the early order
reflections combined with calculation methods such as ray-tracing for the later order
reflections, including the reverberation tail. Room boundaries need to be defined as well
as surface and material properties. Absorption and scattering coefficients are set for
each surface. The sound source is defined with sound power level, directivity properties,
aim and position. The receiver is defined with position and head direction. The receiver
characteristics also needs to be specified from the reproduction method chosen to derive
a suitable RIR.
7
CHAPTER 4
Methods for Performing
Auralizations
4.1
Sound source
For auralization purposes, audio needs to be convolved with the impulse response. The
sound source needs to be free of any room influence and should preferably be recorded
in an anechoic room or in a highly damped environment.
4.2
Listening room
For loudspeaker reproduction the listening room needs to be free of any distinct room
influence. The total absorption should be high in a broad range of frequencies. Early
reflections from the listening room should be avoided and the reverberation time should
be kept sufficiently shorter than that of the simulated environment’s. Essentially the
listening room sets a limit for which rooms can be simulated.
4.3
Reproduction
Most commonly a binaural setup or multiple loudspeakers are used for reproduction of
acoustic simulations. With headphones the listening room has less impact on the simulation but the binaural signals need HRTF-filtering. When using a loudspeaker system
there needs to be sufficient loudspeakers so that the 3D-acoustical cues are reproduced
at an appropriate detailed level. The 3D-image needs to be kept intact. Most commonly
ambisonics or in some cases wave-field synthesis is used to decode the sound field when
utilizing loudspeakers.
8
CHAPTER 4. METHODS FOR PERFORMING AURALIZATIONS
4.4
Interactive auralization
For a real-time interactive auralization application proposed in this thesis the source and
receiver would be one and the same. Therefor the source and receiver distance should
theoretically be the distance of a person’s mouth to his or her ears. The reproduced
sound field should not contain the direct sound of the talker as this would be present at
the time of auralization. As the source signal will be generated at the time of auralization, the direct source signal of the RIR, shown in figure 4.1, should be edited out prior
to auralization.
A microphone needs to be added within the listening room to retrieve the source signal.
This signal needs to be continuously feed to the convolution algorithm as the convolution
needs to occur in real-time. The source signal is convolved with the edited RIR containing
only reflected energy of the simulated environment. The calculated sound pressure level
difference between direct energy and reflected sound energy needs to be kept intact at
the point of the receiver in the listening room. Therefore the system’s reproduction
level needs to be calibrated so that the corresponding sound pressure level difference
is achieved. As the sound power level of the source is changing during the course of
auralization this ratio needs to be intact and the reproduction level a consequence of the
dynamic change of the sound source power level.
The ratio between direct sound and reflected sound would be varying between different
RIR’s, thus the system needs to be calibrated for each RIR. If normalizing the RIR’s
too each other, the same reproduction level could theoretically be used. Modelling the
source and receiver at such close distance as the distance between mouth and ear, approximately 0.10-0.15 m, can create problems. An average standing height of 1.7 m
means the travelling distance is about 34 times longer for one of the earlier reflection to
reach the receiver than the direct sound. A sufficient dynamical range is needed to avoid
electronic or digital noise. To model the heads effect on the direct sound transmission
is also a tedious task and approximations are called for.
Excess time delays should be suppressed so that no audible echo, unnatural to the
Figure 4.1: Illustration, editing of RIR.
9
CHAPTER 4. METHODS FOR PERFORMING AURALIZATIONS
simulated environment occurs. The time of processing needs to be kept as short as possible.The time it takes from the sound being produced by the talker, runned through
the application and reproduced at the listener’s ear should not exceed the ITDG of the
RIR beyond the limit of audible delay, that is 20-30 ms. [5]
10
CHAPTER 5
The Acoustic Feedback Problem
and Measures of Control
5.1
Acoustic feedback
Acoustic feedback is a problem one usually has to address when dealing with electroacoustic systems. Having chosen to use a multichannel loudspeaker system coupled with
a live microphone placed within the same enclosure, this is likely to occur as the loudspeaker signal can re-enter the system through the microphone. Transmission through
reflection from enclosure boundaries makes this even more likely to occur. The acoustic
feedback phenomenon can manifest through single narrow-band oscillations giving rise
to self-oscillation or a so-called howling effect. This is referred to acoustical feedback.
When a broader band of the signal re-enters the microphone one instead talk about
acoustic echo. Depending on the nature of the problem, different measures of control
are necessary to adapt. With a single-channel case there is a forward- and feedback
path (figure 5.1-5.2). The forward path, G(w), will be defined as the electro-acoustical
path from the microphone to the loudspeaker, transmitted through the enclosure to the
listener. Here the microphone signal undergoes processing determined by the user, is
amplified with a frequency independent gain factor q, and sent to the loudspeaker. The
feedback path, F(w), connects the loudspeaker output to the microphone, creating a
closed-loop system [9].
11
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
Figure 5.1: Schematic of the acoustical feedback problem showing the forward and feedback
path of a single-channel case.
Figure 5.2: The forward and feedback path
This being a highly simplified case, not dealing with the transfer functions of the electronic system components and also not the specific nature of room boundary reflections
and their impact on the transmission of the closed loop system. It is also a highly simplified form of describing the transfer functions of the enclosure, that is HR(w) and HM (w) .
5.1.1
The acoustical path – predicting the acoustic feedback problem
The acoustic feedback problem is hard to predict as it is a consequence of many factors.
The transfer paths within the enclosure are subjected to change from people’s movement
as well as air conditions. This makes it hard to foresee and difficult to predict. Expanding
the problem to a multi-transmissional case it gets even more complex to predict. Several
ways of explaining and dealing with acoustic feedback and acoustic echo issues have been
proposed, one of them being the Hänsler and Schmidt’s LEM-model [10] including the
effects of the enclosure.
The Loudspeaker- enclosure – microphone system, (LEM), take the positions of the
different elements within in the enclosure in to account. If the main loudspeaker transmission lobes are directed away from the microphone, the LEM system mainly connects
12
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
through reflections from room boundaries, or off subjects and objects in the room, so
called indirect coupling. Direct coupling occurs when the loudspeaker signal is directly
transmitted to the microphone. Studying the single-channel case in figure 5.1, the input
spectrum will be that produced by the talker at the point of the microphone, X(w). The
signal is sent through the electro-acoustic forward path, G(w), with processing and amplification by a factor q. Two transfer paths are of interest, from the loudspeaker system
to the receiver position and to the microphone,HM . The resulting signal spectrum at
the receiver, X’(w), is determined using:
(X 0
)
X 0 (w) = qHR(w) [X(w) + HM (w) (H (w) ) ],
(5.1)
R(w)
Where q is the gain factor, HR(w) is the transfer function between loudspeaker and
receiver and HM (w) is the transfer function between the loudspeaker and microphone.
Equation 5.2 gives us the resulting transfer function, Htot(w) of the entire system:
Htot(w) =
Out
In
=
0
X(w)
X(w)
qHR(w)
(1−qHM (w) )
=
= qHR(w)
P∞
n
(n=0) [qHM (w) ]
The factor qHM (w) is characteristic for the amount of feedback and is called the openloop gain factor. This factor dominates the difference between the input spectrum X(w)
0 . For indirect-coupled acoustic feedback,
and the resulting spectrum at the listener X(w)
Schroeder explained that the usual behaviour of the feedback is peaks at the favoured
frequencies of about 10 dB higher than the average sound pressure level, spaced about
10 Hz apart. [12]
5.1.2
Criteria for system stability
To ensure a stable system one can use Nyqvist’s criteria which say that if for an angular
frequency of w, the following statements are true:
(
| (qHM (w) |≥ 1
6
qHM (w) = n2π
, n = 1,2,3.....(5.3)
the system is likely to become unstable and audible ringing will probably occur.
5.1.3
Predicting maximum stable gain from reverberation time and
system bandwidth
In cases where G(w) has a relatively smooth magnitude response, and the Schroeder
condition is fulfilled, i.e. the coupling between loudspeaker and microphone is mainly
due to reflections of the enclosure, Schroeder developed a criterion for maximum stable
gain, MSG also called gain before instability (GBI). With a known system bandwidth B
and a reverberation time of RT60 the MSG can be calculated using equation 5.4. [11]
13
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
60
M SG(t) = −10 log[log( BRT
22 )] − 3.8, dB
(5.4)
The MSG states the maximum, i.e. the highest possible amplification before feedback is
a definite problem. However, a margin of safety is important to ensure a stable system
throughout use, especially when possibly having moving subjects in the listening room.
For speech applications, it is recommended to use a margin of about 5 dB to the MSG.
[12]
20 log( qq0 ) ≤ −5 dB
(5.5)
Where:
q0 : Critical value of the gain factor for which the system goes unstable
q: Gain factor used
5.1.4
Acoustic echo control
Acoustic echo control is needed whenever a broad-band portion of the loudspeaker spectrum is transferred back into the microphone of any active system, not fulfilling the
Nyqvist’s criteria. This is usually done using adaptive filtering like the least mean
square, LMS - method. The basic idea is to record the source signal with a control
microphone and use the inverted frequency spectrum of the recorded signal as a filter to
cancel out the unwanted sound. Different echo-cancellation systems have been compiled
for telecommunication conference systems. These are however based upon the sending
and receiving room being two separate.
5.1.5
Acoustic feedback control
If only acoustic feedback control is necessary, there are frequency-cancellation systems
available. Many of them are based upon narrow-band filtering like the notch-filter based
howling suppression, NHS. These can either be automatic, then the system itself senses
frequency components starting to oscillate and cancels these, or they can be manually
controlled where the filters are set either beforehand or during run-time. For the auralization application, an automatic system is favourable. The system needs to sense
oscillating frequencies in a very short time so that audible ringing does not occur. The
system also needs to have a sufficient amount of filters, to handle multiple feedback frequencies, especially when going from a single-channel to the multichannel case.
5.1.6
The multichannel case
Expanding the reproduction system from one single channel to multiple channels, the
problem complexity grows vast dealing with multiple transfer paths, both direct and
indirect as well as multiple transfer functions. However, the NHS-method can still be
14
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
used as long as the problem is restricted to only narrow-band frequency oscillation. Since
loudspeakers are placed at different positions to the microphone, there are possibilities
of more feedback frequencies and additional filters are therefore necessary.
5.2
Modelling and measurement approach
To ensure a stable system, direct or indirect coupling needs to be avoided. Thus the
sound pressure level of the sound source, i.e. the subject, should be high in comparison
to the loudspeaker signals at the point of the microphone. This is in conflict with the
criteria of the auralization method, which states that the sound pressure level ratio between direct sound energy from the subject and reflected sound energy reproduced by
the loudspeakers should remain intact at the point of the sound source (as well as receiver’s position).. With a multichannel loudspeaker set up, the loudspeakers should not
be directive as the aim is to reproduce a full 3D sound field However, for a transportable
auralization set,a directive stereo system could be necessary to use as the reproduction
rooms are likely to not be sufficiently treated. In both cases, narrow pick-up equipment
can be used to prevent feedback from occurring. To ensure a sufficient sound source signal, the microphone should be placed as close to the talker as possible, but still keeping
the microphone’s presence negligible to the talker.
A model of the acoustical feedback problem has been made and validation measurements
performed to study what suppression of acoustic feedback can be achieved for the auralization application. To start off, a single-channel case is studied, later expanded to a
multi-channel case. A model of the acoustical feedback problem, using the single channel
case was made using the CATT-Acoustics software and different directional characteristics of the recording and reproduction system. The results of the models were then
validated through measurements in a semi-anechoic chamber. Finally a multi-channel
case was measured in a treated sound lab, the Sound Design Lab at the at the University
College of Arts, Crafts and Design.
5.2.1
The single-channel case
Measuring in a semi-anechoic chamber, room boundary reflections except from a hard
floor surface can be neglected, studying only the effects of directivity and positioning
of the equipment as well as direct coupling together with a controlled indirect coupling.
The subject is for measurements replaced with an omni-directional loudspeaker referred
to the talker to avoid any confusion. Positions of the loudspeaker and microphone were
chosen so that their main lobes where directed towards the talker and away from each
other, figure 5.3 shows early sketches of this. In configuration 1, the loudspeaker is placed
facing the talker with a 0 degree vertical and horizontal angle and the microphone in
a 45 degree vertical and 0 degree horizontal angle. Two different distances between
15
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
loudspeaker and talker were used, 2 and 3 m. The microphone was placed at distances
0.2, 0.4, 0.8 and 1.6 m from the talker. In configuration 2, the loudspeaker was instead
hanging from the ceiling directly above the talker. The microphone was placed in the
same way as in configuration 1. The derived transfer functions where then compared
directivity and position- wise.
Figure 5.3: Early sketch of the positioning and directivity of equipment used in the model
and for the measurements.
Model of the single-channel case
A model of the semi-anechoic chamber was made using Google Sketchup and calculations
were made using the CATT-Acoustics software. The model is shown in figure5.4 and 5.5.
For reproduction both a line source and an omni-directional source were tested. The
talker and loudspeaker sources were given similar directivity data as the loudspeaker
used for measurements. Directivity data for the loudspeakers used for measurements
were not available so data on similar loudspeakers was in the end used for the model.
The talker source was defined using a human voice directivity as the loudspeaker used
for measurements was not determined at the time of modelling. Source directivity is
depicted in figure 5.6. The sound pressure level of all sound sources were calibrated to
each other at 1 m distance before calculations for comparison reasons.
16
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
Figure 5.4: Configuration 1, loudspeaker placed in front of talker.
Figure 5.5: Configuration 2, Loudspeaker hanging above the talker.
17
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
Figure 5.6: Directivity plots at 1000Hz for the a) talker source and b) line source used in
the model
Different microphone directivities were used, OMNI, Cardioid, Super-Cardioid and Shotgun. The shotgun directivity had to be manually specified using equation 5.6.
F = Z + X cos θ(5.6)
Where X defines the sensitivity on the main axis that is at a 0 degree angle, Z defines the
sensitivity at an angle, θ. For OMNI directivity, X is set to 0 and Z to 1, for Cardioid
directivity, X =1 and Z =0. In these tests, X=0.85 and Z=0.15 were used to define the
shotgun directivity. Calculated transfer functions between the talker and microphone
where compared to the derived transfer functions between loudspeaker and microphone.
Measurements of the single-channel case
The semi-anechoic chamber at Tyréns was used for validation measurements, with the
walls and ceiling of this chamber covered with wedged mineral wool, and the floor surface made of heavy untreated concretes figure 5.7. The same configurations used in the
model were also applied, having the talker source replaced with an OMNI-directional
loudspeaker for comparison purposes.
18
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
Figure 5.7: Schematic of the semi-anechoic chamber. Red arrows indicate the angle and
direction of the microphone.
Again the transfer function between talker and microphone, and loudspeaker and microphone was derived. These were measured one at a time, using a sine sweep as a source
signal and in such an order so that measuring positions stayed the same.
5.2.2
The multi-channel case
Measurements were carried out in the Sound Design Lab where the application is being
implemented. The room is treated with absorbers and diffusors having a very short
reverberation time, around 0.20-0.25 s. A large volume above the ceiling of the room
controls the low frequency modes. In the room a 5.1 channel system is installed utilizing a
total of 9 mid- and high-range loudspeakers, three in the front and sets of three surround
loudspeakers on each side. Four subwoofers are also installed. The room is furnished
with a control desk and an audience seating arranged in the back of the room.
There was two possibilities of installing a microphone in the room one having it suspended from the ceiling at an approximate 2 m distance from the audience seating at
a 45 degree angle (configuration 1), or having it placed on the control desk at a same
angle to the audience area (configuration 2). Configuration 1 would mean placing the
microphone outside of the main horizontal axis of the loudspeaker system but further
away from the sound source. The opposite conditions were the case when testing configuration 2. Since the loudspeaker calibration had to be disconnected, calibration was
made ensuring the same reproduction level at the receiver point. The talker source gain
was set sufficiently high and remained the same for each measurement. Again the transfer functions were measured one at a time.
19
CHAPTER 5. THE ACOUSTIC FEEDBACK PROBLEM AND MEASURES OF
CONTROL
5.3
5.3.1
Analysis of the acoustic feedback problem
The single-channel case
The shotgun directivity showed the worst results, i.e. less differences in sound pressure
level between the loudspeaker and the talker, at the larger tested distances than the other
microphone directivities. This could possibly be due to the back-lobe of highly directive
microphones. However at closer distances to the talker the shotgun directivity showed
an average 10 dB difference over the lower tested frequency spectrum (250-500Hz) and
an average 15 dB difference in the upper tested spectrum (1000-2000Hz) between the
transfer functions, giving the best results between the different microphone directivities. Possibly due to the floor reflection, having the loudspeaker hung from the ceiling
gave less difference in sound pressure level between the different transfer functions. Test
showed that for the single-channel case, a shotgun microphone at close distance to the
talker should be used in combination with a loudspeaker placed in front of the talker.
The different loudspeaker directivities showed smaller differences.
5.3.2
The multi-channel case
Measurements of the multi-channel case showed best results when placing the microphone
close to the sound source within the main axis range of the loudspeakers. Audible ringing
was proven difficult to avoid for this case as multiple narrow band frequency oscillations
occurred frequently. Tests indicate that the microphone should be placed even closer to
the sound source.
20
CHAPTER 6
FFT-based Block Convolution
6.1
Concept of blocked FFT-convolution
Convolution is a method of ’adding’ two signals where an input signal is processed by
a filter kernel. With auralization the room impulse response, RIR, is used to filter the
source signal and so the sound source is ’placed’ at the point defined in the model of
the physical space. There are different ways of performing this convolution and many
algorithms to choose from. For real-time audio processing the computational time and
computational load are of highest importance. Most algorithms today have sufficiently
low time delays and with hardware available, processing is more powerful. However,
with long filter kernels as in the case of RIR’s, the execution scheme of the convolution
needs to be efficient. As can be seen from eq. 6.1, RIR-filters are very long even those
derived from small and damped rooms. Thus the RIR should be divided into segments
before processing. The segments should be processed in parallel to keep the execution
time short.All signal processing of the application together with delays caused by acoustic transmission paths all need to sum up to an execution time below that of audible
delay. As the convolution is one of the most time-consuming efforts have been made to
optimize the execution of this.
Blocksize = t × fs = 0.5 × 44100 = 22050 samples,
(6.1)
Blocked Fast Fourier Transform - convolution, (FFT-convolution), uses the overlap-add
method to divide the convolution filter (i.e. the RIR) into segments or blocks. To be able
to use the FFT-convolution the blocksize needs to correspond to 2N . FFT-convolution is
much more time effective than Discrete Fourier Transform for signals longer than about
40-60 samples. A schematic of the convolution process using the overlap-add method is
shown in figure 6.1.
21
CHAPTER 6. FFT-BASED BLOCK CONVOLUTION
Figure 6.1: Convolution processes of an input signal and a filter using FFT-convolution
and the overlap-add method.
After dividing the filter and input signal into sections, the segments are transformed
with FFT to real and imaginary part frequency spectrums in the frequency domain. In
the frequency domain multiplication occurs which is the equivalent to convolution in the
time domain. The resulting output segments are transformed back into the time domain
using inverse-FFT. Here the output segments are then re-combined into the resulting
output signal.
6.2
Time delay of convolution
Since dealing with long room impulse response filters, the main time consumer in the
application will be the convolution between the microphone feed and the room impulse
response. To ensure that the convolution algorithm chosen is suitable, a measurement
of the time delay caused by this signal processing was measured. A convolution patch in
the Pd-extended library called partconv ∼ , created by Ben Saylor, proved to be sufficent
to use for this purpose. The partconv ∼ uses the FFTW library which contains several algorithms for performing blocked fast Fourier transforms. Depending on available
hardware a suitable algorithm for computing the FFT-convolution is chosen. [13]With
a user defined block-size the convolution process could be optimized to the length of
the impulse response used. The convolution was tested in-software using a normalized
sine sweep of 0.4 s/octave as sound source. By sending the sine sweep from a wav-file
player directly to a wav-recorder as well as through the partconv ∼-object to another
wav-recorder, the time difference between these processes would give the time delay of
the convolution.
22
CHAPTER 6. FFT-BASED BLOCK CONVOLUTION
6.3
Results from time delay measurements
The time delay proved to be correlated to the size of one block. Thus a blocksize of 512
samples and a sampling frequency of 44.1 kHz would result in a time delay of:
512
44100
= 11,6 ms,
(6.2)
which is below the audible limit and also gives some extra headroom for additional
processing delays. One limitation of using this convolution algorithm is that the input
signals can only be divided into maximum 256 partitions i.e. blocks, of 2N size, which
limits the total length of the RIR to about 3 s when using a block size of 512 samples.
This is deemed acceptable for the application at this stage. When using shorter RIRs,
the process could be further optimized by lowering the segment size to 256 samples. This
would result in a much more time efficient convolution process giving only a delay time
of 5.8 ms.
23
CHAPTER 7
Ambisonics
Ambisonics is a method of encoding and decoding a 3D sound field. It allows for great
variability of loudspeaker set-ups. The sound field is recorded in a single point and the
sound field encoded in different channels depending on angle of incidence. Microphones
dedicated to the ambisonics technique are built out of several membranes directed in
a pattern shown in fig 7.1. The microphone utilizes small membranes for point reception. Ambisonics allows for any number of loudspeakers to be used although a sufficient
amount is necessary to reproduce a complete sound field and to provide necessary localization ques. The loudspeaker formation is not fixed although the loudspeaker set-up
should make sure there are no holes in the auditory image. Different orders of ambisonics
can be used depending on the number of input channels. The 1th order ambisonics is
the basic ambisonics format containing 4 channels, the so-called B-format.
7.1
The B-format
The B-format is a 4 channel recording method, where sound is recorded in a single
point. The 4 channels, W,X,Y, Z, contains information depending of angle of incidence,
W being the OMNI input, an average of all incidence angles, X contains information on
sound arriving from left and right, Y the front and back and finally Z which handles
height information. The four channel input, handling different angles of incidence is a
consequence of the microphone membrane placement. From these four channels one can
reproduce an entire sound field. Often sound reproduction in 2D, i.e. only the horizontal
field is used, thus the Z-channel can be excluded. This method many believe still gives an
accurate depiction of the sound field since often humans have a hard time discriminating
sounds in the median plane.
24
CHAPTER 7. AMBISONICS
Figure 7.1: Ambisonics B-format pick-up pattern.
7.1.1
Encoding and decoding ambisonics
Most ambisonics recording microphones first registers the incoming sound field in what
is called the A-format. The recording then needs to be encoded to the B-format. This
is either done by dedicated hardware whilst recording or in software applications. From
the B-format, the recording then needs to be decoded to the reproduction system. One
advantage of Ambisonics is that any number of loudspeakers can be used, however, a
sufficient amount is necessary to be able to reproduce a uniform sound field, and still,
the more loudspeakers used, the more details of the sound field can be reproduced.
The B-format is decoded to the loudspeakers depending on their angular position to a
reference point. The loudspeaker will receive a certain amount of each B-format channel
determined by a scaling matrix compiled during decoding.
25
CHAPTER 8
Application Architecture
The auralization application was implemented using the graphical programming software Pure Data, Pd. The application in its rough stage contains a loading function
of the 4-channel RIR imported as combined wav-files. The RIR together with a continuous signal from the microphone in the listening room are sent to the convolution
function, partconv ∼, of the Pd extended library. An ambisonics decoder is utilized,
also obtained from the Pd extended library compiled by Thomas Musil at the Institut
für Elektronische Musik und Akustik in Graz, Austria. This is used for the distribution
of the convolved signal to the loudspeakers, able to handle higher orders of ambisonics,
producing both two- and three-dimensional sound fields, including the z-channel of the
B-format. Number of loudspeakers as well as relative angle is easily set within the application. Time delay- as well as separate gain-units is applied to each loudspeaker signal,
giving opportunity to calibrate the loudspeaker system in-software if necessary.
26
CHAPTER 8. APPLICATION ARCHITECTURE
Figure 8.1: Application architecture scheme
27
CHAPTER 9
Psychoacoustic Evaluation
9.1
Criteria for sound field evaluation
The intention of the method proposed in this thesis is to study the effects of the added
interaction by having the subject contribute as a sound source within the simulated
space. Tests have been conducted seeking differences in the subject’s response when
exposed to environments utilizing different methods for performing auralization. As a
hypothesis, the added control of the real-time sound sources application, enabling the
utilization of factors such as altered speech rate and the Lombard effect, benefits the
ability to perceptually judge the physical spaces acoustical qualities through the direct
experience of the acoustical response. However, the added activity of the subject could
also be distracting to the experience of the acoustic simulation. Thus at this initial
stage of testing the preservation of fundamental acoustical qualities as well as potential
benefits/restrictions of experiencing this through a direct response compared to a simple
listening experience seems relevant to start with.
The tests have been conducted in a comparative context between having the subject
only listen to sound events occurring during simulation to the one where real-time sound
sources are utilized.The aim of the tests have not been to judge if the subject can perceive
a large hard surfaced room as such but rather to see if the same subject would judge
the same simulated room with the same parameters or perceive it as different using the
different methods of auralization. Perceived size of rooms based on the respective RT60
as well as tonal character was used as a basis for these tests.
9.2
Test approach
The tests were performed using a smaller group of 9 participants in the Sound Design Lab
at Konstfack, (University College of Arts, Crafts and Design). For the tests the installed
multi-channel loudspeaker system was used and a directional microphone was placed on
a 1 m distance from a designated seat where the subjects were asked to sit. During tests
28
CHAPTER 9. PSYCHOACOUSTIC EVALUATION
the room was kept dark to avoid any visual influence on the results. For the study, RIRs
of realistic environments were chosen with varying room geometry but trying to keep
the total absorption area not to any extreme, as reverberation time are dependent on
both factors. Again, usually listening tests using auralization are not conducted with
such a comparative purpose and the aim is not to arrive at an absolute value for these.
Therefor rooms with widely varying size and absorption were used. When evaluating
the results, the rooms where categorized into large, medium sized and small rooms, as a
more detailed description would for these tests be insignificant. Modelled rooms used for
testing where provided by Tyréns AB. Also a measured room was used, the Great Hall
of People’s Palace in London, recorded by students at Centre for Digital Music, Queen
Mary, University of London [14]. The latter proved to have low-frequency artifacts in its
RIR, and the RT60 of this had to be estimated. The respective reverberation time of the
environments are presented in table 9.1 and 9.2. The models used as well as a picture
of the great hall is shown in figure 9.1 and a representative RIR for each environment is
shown in figure 9.2.
Table 9.1: Acoustic parameters for rooms used in listening tests. The RIR of the Great
hall contained low-frequency artifacts and had to be estimated.
Venue
Reverberation time, [s]
Opera Hall
1.4
Lecture Room (small auditorium)
0.4
Great Hall (measured)
2.1*
Open-plan Office
0.4
Canteen/Cafeteria
0.4
Table 9.2: Tested rooms divided into groups depending on the length of the reverberation
time.
Room Size
Reverberation time, [s]
Large SizedRooms
≥ 1.4
Medium Sized Rooms
0.5 − 1.4
Small Sized Rooms
≤ 0.5
29
CHAPTER 9. PSYCHOACOUSTIC EVALUATION
Figure 9.1: a)-e) - Models of the rooms used for evaluation tests.
The test was divided into two parts; the first dealing with the ability to perceive room size
when only utilizing real-time sound sources. For this part rooms with a wider spectrum
of characteristics were used , the opera hall and the great hall with RT60 ≥ 1.4 s and the
office and lecture room with RT60 <0.5 s. The subjects were shown pictures of different
rooms and had for each presented environment choose one of the depicted spaces which
they sensed the closest to what they were hearing.
30
CHAPTER 9. PSYCHOACOUSTIC EVALUATION
Figure 9.2: a)-e) Early parts of the impulse responses used for listening test including the
direct sound for all cases. Al arel normalized with a direct sound equal to one except for
the measured Great hall. Shown time lengths as well as amplitudes have been adjusted for
visualization purposes.
The second part of the listening test was designed to evaluate the interactive element
compared to using pre-recorded wav-files. Again four environments were used, the cafeteria, the smaller auditorium, the opera hall and the small open office space. Minimizing
the time between presented spaces, the first section of testing was made using only
pre-recorded sound sources switching to using real-time auralization in a latter section.
When using the pre-convolved sound sources, the source position was determined arbitrarily in the environment. For each environment the user had to answer if the room
volume was perceived as big or small and if the sounds heard within the room where
perceived as either hard or soft. On a running scale from 1-5 they also had to judge
the perceived naturalness of the environment, 1 being completely unnatural to 5, similar
to a real environment experience. Again these factors can be hard to judge given the
situation although for comparative purposes these where used but with some caution.
Finally they had to correlate the perceived environment to different types of venues and
rate the correspondence to their judgment of different environments from 1-5 in fixed
steps. A number of different environments where written down and the subject had
to place the simulated environment on the scale for each. As spaces used for different
31
CHAPTER 9. PSYCHOACOUSTIC EVALUATION
purposes can vary vastly in size and geometry, indication where given as to what size the
exampled room had, for example: a smaller lecture room or a larger exhibition hall. The
results of the two different parts of the test were later compared. Once again dividing
the different environments into groups of large, small and medium sized, judging the
correlation between exampled rooms and the simulated one’s.
9.3
9.3.1
Results of the evaluation test
Part 1 - Evaluating room size using only real-time synthesis
From the first section of the test where the subjects had to correlate depicted rooms with
what they were hearing, it seemed easier to separate the larger rooms (the opera and
great hall) from the medium-sized, and small sized rooms. Greater confusion seemed to
occur when discriminating between the simulated small rooms with the depicted medium
sized, where the answers had around a 50% spread. Results are presented in table 9.3.
9.3.2
Part 2 - Comparing real-time and pre-convolved sound sources
As can be seen from the results presented in table 9.4, the subjects had no problem
distinguishing room size of the opera hall either with pre-convolved sound sources or
with the real-time auralization.
Table 9.3: Test results part 1, perception of room size using only real-time auralization.
RT60
Approx.
Mean
volume, [m3 ]
absorption*
Perceived Size
Large
Medium
Small
Great Hall
2.1
≤ 15000
-
9
0
0
Opera Hall
1.4
15000
35%
7
0
2
Open-plan office
0.4
450
43%
0
4
5
Lecture room
0.4
400
35%
0
5
4
32
CHAPTER 9. PSYCHOACOUSTIC EVALUATION
Table 9.4: Results from subjective listening test section 2, part 2.
RT60
Approx.
Mean
Perceived Size
volume,
absorption*
Pre-convolved
[m3 ]
sound source
Real-time
sound source
Large
Small
Large
Small
Opera Hall
1.4
15000
35%
8
1
8
1
Canteen
0.4
3600
45%
4
5
3
6
Open-plan office
0.4
450
43%
1
8
1
8
Lecture room
0.4
400
35%
1
8
2
7
The canteen or cafeteria, although large in volume had unfortunately a very high mean
absorption, resulting in the short reverberation time. The answered room size varied,
more answered that it was smaller than large using real-time convolution. Tendencies
where shown to judge the environments as softer using the real-time convolution, figure
9.3.
Figure 9.3: Judged tonal character, comparison between using the pre-convolved sound
source and the real-time sound source. Results shown out of a 100% of the subjects i.e. 67%
means amount of subjects judging it as soft whilst 33% judged it as hard.
In other results fewer differences were seen. Naturalness or authenticity ratings where
overall deemed higher using the real-time convolution than when using pre-convolved
ones. One can however not exclude that the subject had ideas about the aim of these
tests, which might have had an effect on the results. When having to correlate the
perceived room size to different exampled environments, differences in answers could be
seen, however no distinct tendencies could be distinguished.
33
CHAPTER 10
Discussion
10.1
Design criteria for the auralization framework
One of the most important factors to consider when designing a real-time audio application for perception-based analysis is the time delay caused by signal processing. In
this case the only time delay should be that of the initial time delay gap, determined
by the RIR used. Acceptable added time delays should be below that of audibility. In
this case, Haas criteria for audible echo is not sufficient as ‘ lagging’ effects could be
heard already at lesser time delays. Therefor the added time due to processing should
preferably be less than 20-30 ms. If also removing the initial time delay gap from the
RIR one can add to this time buffer. Adjustable block-size of the convolution algorithm
can make auralization of smaller rooms more time efficient, reducing the block-size and
the inherent time delay.
Another is to prepare the RIR used for simulation. As the subject will act as sound
source and receiver the positioning of these should correspond to the distance of the
subject’s mouth and ear. Modelling the RIR this way creates problems concerning large
dynamical differences between direct and reverberant sound. Approximations with this
distance are deemed necessary and from user test results acceptable. The auralization
framework is prepared to also use the Z-channel of the B-format to reproduce a threedimensional sound field giving height information. At the time of writing, loudspeakers
in the ceiling of the sound design lab are being installed making it possible to reproduce a
hemisphere sound field in the future. Measurements suggests however that more feedback
oscillations would then occur requiring a closer distance between the microphone and
subject as well as additional filtering.
34
CHAPTER 10. DISCUSSION
10.2
Reproduction level and the acoustic feedback
problem
As the problem with acoustic feedback is dependent on needed amplification of the
loudspeaker system, stated by the open-loop gain factor, qHM (w) , this needs to be known.
The needed amplification is in turn dependent on the SPL difference between direct
sound and reflected sound, determined by the RIR used for simulation, and will thereby
differ depending on simulated environment. If accepting the approximations relating to
the added distance between source and receiver, and determining that this relationship
should hold at the position of the talker as well, this will state which reproduction
level is necessary for the loudspeaker system, and thus the needed amplification. Still
this needed amplification is dependent on which RIR is used and how high the sound
pressure level of the talker will be, making it difficult to predict to what extent acoustic
feedback will occur and also to use above stated criterion and estimations. However,
one should still strive to make the relationship between talker sound pressure level and
that of the loudspeaker system as substantial as possible at the point of the microphone.
From measurement results the microphone should have a narrow pick-up range and
be placed as close to the talker as possible without making its presence to apparent.
During measurements it was clear that multiple narrow band oscillations occurred when
increasing amplification. This in turn implies that acoustic feedback control needs to
be applied to ensure a stable system, handling different SPL differences of the RIR
and source signal levels. As the problem seems to be that of acoustic feedback, not
acoustic echo, a parametric-filter based equalizer can be used. Adaptive filtering, like
the LMS method could also be used, but configuring the reference signal recording, used
to produce the filter, is a tedious task when the signal needs to be recorded in the same
enclosure as the wanted signal.
10.3
The subjective evaluation
Even though perceiving room size being one of the fundamental qualities of room perception, it is even for the acoustician hard to predict exact reverberation time, even
dividing the different venues into large groups. Giving the subjects clues in form of depicted venues can be both helpful and detrimental for the evaluation. However, results
from subjective evaluation test show differences in the perception of environments used
for simulation differ using real-time sound sources and pre-convolved ones, suggesting
further studies be made on this topic. A part of the easier recognition of large halls
might possible be due to the deteriorated speaking conditions, another possible reason
is also the vast difference between the listening rooms acoustical characteristics and the
simulated one’s. This however would imply that the subject has coupled reverberance
with room size. As this was a fact using both pre-convolved sound sources and the
real-time application, no discrepancy could be made between the methods. However it
indicates that the perception of the sound field is not diminished by the added activity
of the user. A possible reason for the small room confusion might be due to the listening
35
CHAPTER 10. DISCUSSION
room and the simulated room having little difference in reverberation time. Subject’s
response to the tonal character of the environment, judging the environments in general
as softer using the real-time auralization, might be due to the added control. It can
however not be excluded that the characteristics of the sound source used for the preconvolved auralization could have participated to these results. The authenticity ratings
are hard to judge as different meanings could be put to the expression. The familiarity
of one’s voice could have a possible effect to the higher ratings when using real-time
convolution. Clear indication of appreciation could although be confirmed as a large
majority of the subjects responded they preferred hearing the simulated environments
with the possibility to contribute to the environment, and being a part of it.
36
CHAPTER 11
Conclusion
11.1
the Auralization framework
An auralization method utilizing real-time sound sources has in this project been implemented and evaluated. The application, developed and compiled in the open source
software Pd (Pure Data) is at this point able to auralize rooms with a room impulse
response (RIR) length of up to 3 s, without any noticeable delays. At present, only static
source and receiver positions are used, calculating and preparing the RIR offline. The
application has been implemented in a controlled listening environment using an ordinary personal computer, a narrow-pick up microphone and a multi-channel loudspeaker
system as well as feedback cancellation by a parametric-filter equalizer. Approximated
RIRs have been necessary to use, calculating the RIR using a larger distance between
source and receiver than that between the subject’s mouth and ears (0.5 m instead of
0.1 m). From evaluation tests, a large majority of the subjects responded that they
appreciated experiencing the simulated sound fields this way. This combined with high
ratings of authenticity suggests that further investigations should be performed.
11.2
Choice of equipment and acoustic feedback problem
The microphone used should have a narrow shotgun-like directivity, and placed close to
the talker (i.e. the subject). As shotgun-directivity microphones usually have both a
distinct main-lobe and back-lobe at both ends of the microphone, it should be placed
so that the main-lobe of the loudspeakers are in line with the microphone’s sides. Amplification of the loudspeaker reproduction level is determined by the RIR and level of
the sound source. As these varies for different environments, the amount of feedback is
hard to determine although measurements show multiple feedback oscillations for most
RIRs used. This requires measures of control in form of a parametric equalizer, ensuring
stability by detecting oscillating frequencies rapidly.
37
CHAPTER 11. CONCLUSION
11.3
Short notes on a transportable auralization set-up
As for compiling a transportable auralization set-up it can be a tedious task since many
factors will affect the result. The room will still have to be sufficiently damped, with a
lower RT60 than that of the modelled environment. Strong early reflections should be
avoided and the background noise level should be kept low. As for reproduction system,
possibly a stereo-channel loudspeaker system could be used, however this will not create
the same feel of spaciousness as the multi-channel case. Usually these are implemented
using cross-talk cancellation, if although the room gives rise to reflections the cross-talk
cancellation will fail. For a transportable solution preferably headphone-reproduction
should be used, implementing correct head related transfer function filtering. If the
source signal is recorded at a close enough distance, there should be minimal room influence. Calibration of correct reproduction level will be difficult to achieve, and approximations might have to be made. If using a loudspeaker set-up possibly both loudspeaker
and microphone should be directive, minimizing the effects of the environment. Measurements of the single-channel case, suggests that the loudspeaker should preferably be
placed in front of the subject. The microphone used should have a narrow pick-up and
should be placed at a 45 degree angle to the subject, at close distance.
38
CHAPTER 12
Future Work
A more extensive evaluation should be made as to what effects the mode of interaction
interaction has on the subjective perception of the sound field. Further studies could
be made as to what way the inherent speech alteration when subjected to different
environments affect the room assessment. In this study only the real-time sound source
was present in the simulated environment. Added environmental sounds as well as
possible other sound sources might add to the experience, making the sound field more
complete and natural. Using several modalities of interaction would also be of interest,
using for example the real-time sound sources combined with possibility to move around
in the environment. Since different interactive methods have shown positive results,
combining several modalities of interaction might take us closer to the goal of reproducing
realistic and immersive sound fields.
39
Bibliography
[1] M. Kleiner, B.-I. Dalenbäck, P. Svensson, Auralization-an overview, J. Audio Eng.
Soc 41 (11) (1993) 861–875.
URL http://www.aes.org/e-lib/browse.cfm?elib=6976
[2] L. Savioja, J. Huopaniemi, T. Lokki, R. Väänänen, Creating interactive virtual
acoustic environments, J. Audio Eng. Soc 47 (9) (1999) 675–705.
URL http://www.aes.org/e-lib/browse.cfm?elib=12095
[3] T. Lentz, D. Schröder, M. Vorländer, I. Assenmacher, Virtual reality system with
integrated sound field simulation and reproduction, EURASIP J. Appl. Signal Process. 2007 (1) (2007) 187–187.
URL http://dx.doi.org/10.1155/2007/70540
[4] D. J. Furlong, M. P. Doyle, E. Kelly, C. J. MacCabe, R. MacLaverty, Interactive
virtual acoustics synthesis system for architectural acoustics design, in: Audio Engineering Society Convention 93, 1992.
URL http://www.aes.org/e-lib/browse.cfm?elib=6691
[5] R. Appel, J. G. Beerends, On the quality of hearing one’s own voice, J. Audio Eng.
Soc 50 (4) (2002) 237–248.
URL http://www.aes.org/e-lib/browse.cfm?elib=11084
[6] K. Ueno, H. Tachibana, Experimental study on the evaluation of stage acoustics
by musicians using a 6-channel sound simulation system, Acoustical Science and
Technology 24 (3) (2003) 130–138.
[7] M. Kleiner, Acoustics and Audio Technology, Acoustics: Information and Communication, J. Ross Pub., 2011.
[8] B. Shinn-Cunningham, Acoustics and perception of sound in everyday environments, in: Proceedings of the 3rd International Workshop on Spatial Media, AisuWakamatsu, Japan, 2003.
40
BIBLIOGRAPHY
[9] T. van Watershoot, M. Monnen, Fifty years of acoustic feedback control: State of
the art and future challenges, in: Proceedings of the IEEE, Vol. 93, 2011, pp. 288–
327.
URL http://dx.doi.org/10.1109/JPROC.2010.2090998
[10] E. Hansler, G. Schmidt, Acoustic Echo and Noise Control: A Practical Approach,
Wiley-Interscience, 2004.
[11] P. Svensson, On reverberation enhancement in auditoria (1994).
[12] H. Kuttruff, Acoustics: An Introduction.
[13] M. Frigo, S. G. Johnson, FFTW: An adaptive software architecture for the FFT, in:
Proc. 1998 IEEE Intl. Conf. Acoustics Speech and Signal Processing, Vol. 3, IEEE,
1998, pp. 1381–1384.
[14] R. Stewart, M. Sandler, Database of omnidirectional and b-format impulse responses, in: Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
(ICASSP 2010), Dallas, Texas, 2010.
41
BIBLIOGRAPHY
42
APPENDIX A
Measurement Set-Up, Feedback
Measurements in the
Semi-Anechoic Chamber
A.1
Equipment data
Table A.1: Equipment used for acoustic feedback measurements - semianechoic chamber
Type
Model
Manufacturer
Talker source & RIR reproduction loudspeaker
Mask 6T-BL
Apart- Audio
Line loudspeaker, RIR reproduction loudspeaker
COLW-101
Apart- Audio
Loudspeaker amplification
PA2240BP
Apart-Audio
Capsules &
CK31-33, CK47
AKG
Preamplifier
HM1000
AKG
Capsule &
Type 4189
Brüel & Kjaer
DSP
Soundweb BLU-101
BSS Audio
Measurement software
Room Capture
Wave capture
Data treatment
Matlab
Mathworks
Microphones:
43
APPENDIX A. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SEMI-ANECHOIC CHAMBER
A.2
Microphone directivity data
Figure A.1: Omni-directional capsule, CK 32
Figure A.2: Cardioid capsule, CK 31
Figure A.3: Hyper-cardioid capsule, CK 33
44
APPENDIX A. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SEMI-ANECHOIC CHAMBER
Figure A.4: Hyper-cardioid (shotgun) capsule, CK 47
A.3
Measurement set-up
Source signal: Sine Sweep
No. of averages: 4
Frequency range: 20-20000Hz
Time windowing: 200ms
45
APPENDIX A. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SEMI-ANECHOIC CHAMBER
Figure A.5: Cases, measurement configuration
A.4
Measurement results
Exampled results from measurements
46
APPENDIX A. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SEMI-ANECHOIC CHAMBER
Figure A.6: Case 1: Transfer functions between talker source loudspeaker and microphone
compared to transfer function between reproduction loudspeaker(omni-directional) and microphone. A, using the CK31 capsule, B, using the CK32 capsule, C, using the CK33 capsule
and D, using the CK47 capsule Microphone is placed 0.4m from the talker source.
47
APPENDIX A. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SEMI-ANECHOIC CHAMBER
Figure A.7: Case 3: Transfer functions between talker source loudspeaker and microphone
compared to transfer function between reproduction loudspeaker (omni-directional) and microphone. A, using the CK31 capsule, B, using the CK32 capsule, C, using the CK33 capsule
and D, using the CK47 capsule Microphone is placed 0.4m from the talker source.
48
APPENDIX B
Measurement Set-Up, Feedback
Measurements in the Sound
Design Lab
B.1
Equipment data
Table B.1: Equipment used for acoustic feedback measurements -multichannel reproduction
Type
Model
Manufacturer
Simulated talker source
Mask 6T-BL
Apart- Audio
Loudspeaker amplification
PA2240BP
Apart-Audio
RIR Reproduction
Pre-installed multichannel system
Ino-Audio
Capsules &
CK31-33, CK47
AKG
Preamplifier
HM1000
AKG
Capsule &
Type 4189
Brüel & Kjaer
Measurement software
Room Capture
Wave capture
Data treatment
Matlab
Mathworks
Microphones:
49
APPENDIX B. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SOUND DESIGN LAB
Figure B.1: The sound design lab at Konstfack.
50
APPENDIX B. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SOUND DESIGN LAB
Figure B.2: Loudspeaker set up at the Sound Design Lab. Left surround and right surround
signal are fed to clusters of three loudspeakers.
51
APPENDIX B. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SOUND DESIGN LAB
B.2
Measurement results Konstfack, multichannel loudspeaker set-up
Figure B.3: Omni-directional capsule, CK 32, comparison hanging from the ceiling and
placed on table
Figure B.4: Cardioid-directional capsule, CK 31, comparison hanging from the ceiling and
placed on table
52
APPENDIX B. MEASUREMENT SET-UP, FEEDBACK MEASUREMENTS IN
THE SOUND DESIGN LAB
Figure B.5: Hyper-cardioid-directional capsule, CK 33, comparison hanging from the
ceiling and placed on table
Figure B.6: Shotgun-directional capsule, CK 47, comparison hanging from the ceiling and
placed on table
53
APPENDIX C
Listening Test Questionnaire
54
APPENDIX C. LISTENING TEST QUESTIONNAIRE
55
APPENDIX C. LISTENING TEST QUESTIONNAIRE
56
57
APPENDIX D. AURALIZATION SET-UP
APPENDIX D
Auralization Set-Up
Figure D.1: Schematic of the auralization set-up
58
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement