Aalto University publication series
DOCTORAL DISSERTATIONS 143/2011

Localization and tracing of early acoustic reflections in enclosures

Sakari Tervo

Doctoral dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the School of Science for public examination and debate in Auditorium T2 at the Aalto University School of Science (Espoo, Finland) on the 13th of January 2012 at 12 noon.

Aalto University
School of Science
Department of Media Technology

Supervisor: Professor Lauri Savioja
Instructor: Adjunct professor Tapio Lokki
Preliminary examiners: Prof. Angelo Farina, University of Parma, Italy; Prof. Dr. ir. Emanuël A.P. Habets, International Audio Laboratories Erlangen, Germany
Opponent: Associate professor Ramani Duraiswami, University of Maryland, the United States of America

© Sakari Tervo
ISBN 978-952-60-4437-8 (printed)
ISBN 978-952-60-4438-5 (pdf)
ISSN-L 1799-4934
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)
Unigrafia Oy, Helsinki 2011, Finland

The dissertation can be read at http://lib.tkk.fi/Diss/
Publication orders (printed book): http://lib.tkk.fi/Diss/

Abstract
Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi

Author: Sakari Tervo
Name of the doctoral dissertation: Localization and tracing of early acoustic reflections in enclosures
Publisher: School of Science
Unit: Department of Media Technology
Series: Aalto University publication series DOCTORAL DISSERTATIONS 143/2011
Field of research: Acoustics
Manuscript submitted: 23 August 2011
Manuscript revised: 9 November 2011
Date of the defence: 13 January 2012
Language: English
Dissertation type (form options): Monograph; Article dissertation (summary + original articles)

Abstract

Objective room acoustic studies are conducted by measuring room impulse responses. The standard techniques use an omni-directional source and, in most cases, one omni-directional microphone. This approach is well defined when measuring the standard room acoustic parameters. Recently, early reflections, the first sound waves to arrive in the room impulse response after the direct sound, have gained attention in research. The spatial location of an early reflection, i.e., the location of the image-source, can be used in room acoustic studies, auralization, room geometry inference, and in-situ measurement of acoustic properties of surfaces from room impulse responses. The location, however, cannot be obtained from the standard room impulse response measurement. Therefore, special microphone array techniques have been used for spatial analysis of room impulse responses.

This thesis studies the localization of early reflections. Firstly, a technique for measuring room impulse responses with directional loudspeakers is proposed. It allows better spatial and temporal separability between the reflections than the standard omni-directional loudspeaker. Secondly, the use of microphone array techniques for the localization of early reflections is studied. Several techniques used in other localization tasks are adapted and applied to the localization of early reflections.
In detail, the combination of time of arrival and time difference of arrival is studied. Moreover, interpolation of the time difference of arrival estimation function is proposed. The use of sound intensity vector based localization is also considered. Finally, novel ad-hoc localization techniques for early reflections are proposed. Results for theoretical, simulation, and real data experiments are presented.

Keywords: Early reflections, localization, room acoustics
ISBN (printed): 978-952-60-4437-8
ISBN (pdf): 978-952-60-4438-5
ISSN-L: 1799-4934
ISSN (printed): 1799-4934
ISSN (pdf): 1799-4942
Location of publisher: Espoo
Location of printing: Helsinki
Year: 2012
Pages: 202
The dissertation can be read at http://lib.tkk.fi/Diss/

Tiivistelmä (abstract in Finnish, translated)
Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi

Name of the doctoral dissertation in Finnish: Varhaisten akustisten heijastusten paikannus huoneissa (Localization of early acoustic reflections in rooms)

Room acoustic studies are carried out by measuring room impulse responses. According to the standard, the source should be omni-directional. Most measurements use a single microphone. This method is well suited to measuring the standard room acoustic parameters. Recently, the need to understand the effect of early reflections in acoustics has grown. Early reflections are known to affect the perceived spatial sound significantly, especially in concert halls. The first step in studying early reflections is to localize them. An accurately known location helps in examining the features of a reflection. However, the location of a reflection cannot be estimated or measured from standard room impulse response measurements. Therefore, various multi-microphone methods have been used for reflection localization and for the spatial analysis of room acoustics.

This dissertation studies the localization of reflections with various signal processing, multi-microphone, and loudspeaker techniques. First, directional loudspeakers are proposed in place of the omni-directional loudspeaker, because they achieve better spatial and temporal separation between reflections. Second, various multi-microphone techniques are studied and applied to reflection localization. These localization techniques have previously been used in various localization tasks in many fields of science. In particular, the combination of times of arrival and time differences of arrival in location estimation is studied. In addition, an interpolation method is proposed to improve localization accuracy. The use of sound intensity in reflection localization is also studied. Finally, special methods are proposed specifically for the localization of acoustic reflections.

Keywords (translated): Reflection, localization, room acoustics

Preface

The research work for this thesis has been carried out in the Department of Media Technology at Helsinki University of Technology in 2008-2009, and at Aalto University in 2010-2011. Part of the research was started during a research visit to Philips Research in 2007. Moreover, some parts of the thesis were written during another research visit, to the University of York in 2010.

The work has been supported by the Helsinki Doctoral Programme in Computer Science, the Finnish Foundation for Technology Promotion, and the Nokia Foundation. The work has also received funding from the Academy of Finland, project no.
[119092], the European Research Council under the European Community's Seventh Framework Programme / ERC grant agreement no. [203636], and European Cooperation in Science and Technology [TD0804].

Firstly, I wish to acknowledge my supervisor, Prof. Lauri Savioja, and my instructor, Dr. Tapio Lokki, for discussions, guidance, and the inspiring atmosphere in our research group. I thank Prof. Angelo Farina and Prof. Dr. ir. Emanuël Habets for the pre-examination of the thesis. Their expertise and valuable comments helped to improve the quality of the thesis. Special thanks go to my coauthors Dr. Teemu Korhonen and Dr. Jukka Pätynen for the collaboration in research, and to Mr. Philip Robinson for proofreading the manuscript.

I wish to thank all the people I have had the pleasure of working or discussing with during the course of my work. Thanks to all my coworkers in our department, Dr. Jukka Pätynen, Dr. Sampo Vesa, Dr. Samuel Siltanen, Dr. Alex Southern, Mr. Raine Kajastila, Mr. Hannes Gamper, Mr. Antti Kuusinen, Mr. Heikki Vertanen, Mr. Philip Robinson, Mr. Robert Albrecht, and Dr. Timo Tossavainen; it has been a joy working with you. I also thank my colleagues at the Acoustics laboratory, especially Dr. Ville Pulkki, Mr. Marko Hiipakka, Mr. Jukka Ahonen, and Mr. Mikko-Ville Laitinen. Moreover, I would like to thank my instructors at Philips Research, Dr. Aki Härmä and Dr. Steven van de Par, and my former co-workers at Tampere University of Technology, especially Dr. Teemu Korhonen, Dr. Pasi Pertilä, and Dr. Tuomo Pirinen. In addition, I thank Dr. Damian Murphy for collaboration during my visit to the University of York.

Thanks to all my friends for keeping me entertained outside of work. Finally, I wish to express my gratitude to my family for supporting me during the making of this thesis.

Helsinki, November 29, 2011,
Sakari Tervo

Contents

Preface
Contents
List of Publications
Author's Contribution
1 Introduction
  1.1 Scope
  1.2 Organization
2 Background
  2.1 Estimation theory
    2.1.1 Maximum likelihood estimation
    2.1.2 Gauss-Markov theorem
    2.1.3 Monte-Carlo simulations and error metrics
    2.1.4 Cramér-Rao lower bound
  2.2 Sound
    2.2.1 Sound pressure
    2.2.2 The wave equation
    2.2.3 Sound intensity
  2.3 Measurement of sound pressure and intensity
    2.3.1 Fourier transform and spectral density
    2.3.2 Sound intensity measurement using microphone pairs
  2.4 Directivity of the sources
  2.5 Geometrical quantities
    2.5.1 Time of arrival
    2.5.2 Time difference of arrival
  2.6 Propagation of sound in enclosures in short
    2.6.1 Speed of sound
    2.6.2 Attenuation and air absorption
    2.6.3 Specular reflections
    2.6.4 Specific acoustic impedance and absorption
    2.6.5 Diffraction
    2.6.6 Scattered reflections or diffusion
    2.6.7 Definitions of the diffuse sound field
    2.6.8 Measurement of instantaneous diffusion
  2.7 The room impulse response
    2.7.1 Modal and echo density
    2.7.2 Central limit theorem
    2.7.3 Statistical models of the room impulse response
  2.8 Mixing time
    2.8.1 Formal definitions
    2.8.2 Estimation methods
3 Related research
  3.1 Room impulse response measurement
  3.2 Localization methods
  3.3 Localization of reflections and room geometry estimation
  3.4 Automatic calibration
  3.5 Visualization of reflections
  3.6 Application areas
4 Room impulse response measurement
  4.1 Standard measurement technique
    4.1.1 Sine-sweep technique
    4.1.2 On the use of natural sound sources
  4.2 The sparse impulse response technique
    4.2.1 Measurement
    4.2.2 Comparison to other techniques and discussion
  4.3 Experiments
    4.3.1 Setup
    4.3.2 Results
5 Localization Methods
  5.1 Signal Model
  5.2 Time difference of arrival estimation
    5.2.1 Generalized correlation method
    5.2.2 Average square difference function
  5.3 Time of arrival estimation
    5.3.1 Auto correlation method
    5.3.2 Maximum absolute pressure
    5.3.3 Other methods
  5.4 Localization functions
    5.4.1 Maximum likelihood estimation for time of arrival and time difference of arrival
    5.4.2 Maximum likelihood estimation for the signal model
    5.4.3 Steered response power
    5.4.4 Maximum pseudo-likelihood
    5.4.5 Least squares localization approaches
    5.4.6 Sound intensity vector based localization
  5.5 Examples of the localization maps
  5.6 Search of the extremum
  5.7 Automatic calibration of the loudspeaker and microphone positions
  5.8 Localization of reflections
  5.9 Computational complexity of the localization methods
  5.10 Interpolation Methods
    5.10.1 Signal
    5.10.2 Time difference of arrival estimate
    5.10.3 Time difference of arrival estimation function
6 Theoretical performance
  6.1 Overview
  6.2 Time difference of arrival estimation
  6.3 Time of arrival estimation
  6.4 Localization
  6.5 Time difference of arrival based localization
  6.6 Time of arrival based localization
  6.7 Combination of time difference and time of arrival information based localization
  6.8 Theoretical comparison
7 Experiments
  7.1 Monte-Carlo simulations
    7.1.1 Time difference of arrival estimation
    7.1.2 Time of arrival estimation
    7.1.3 Localization
  7.2 Real data experiments
    7.2.1 Results
  7.3 Discussion
8 Summary
  8.1 Main results
  8.2 Future work
Bibliography
Errata
Publications

List of Publications

This thesis consists of an overview and of the following publications, which are referred to in the text by their Roman numerals.

I Sakari Tervo, Jukka Pätynen, and Tapio Lokki. Acoustic Reflection Path Tracing Using A Highly Directional Loudspeaker. In Proceedings of the 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2009), p. 245–248, New Paltz, NY, USA, October 18-21 2009.

II Sakari Tervo, Lauri Savioja, and Tapio Lokki. Maximum Likelihood Estimation of Loudspeaker Locations from Room Impulse Responses. Journal of the Audio Engineering Society, (in press), 14 pages, 2012.

III Sakari Tervo. Direction Estimation Based on Sound Intensity Vectors. In Proceedings of the 2009 European Signal Processing Conference (EUSIPCO 2009), p. 700–704, Glasgow, Scotland, August 24-28 2009.

IV Sakari Tervo and Teemu Korhonen. Estimation of Reflective Surfaces from Continuous Signals. In Proceedings of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), p. 153–156, Dallas, TX, USA, March 14-19 2010.

V Sakari Tervo and Tapio Lokki. Interpolation Methods for the SRP-PHAT Algorithm.
In Proceedings of the 11th International Workshop for Acoustic Echo and Noise Control (IWAENC 2008), Article ID 9037, Seattle, WA, USA, September 14-17 2008.

VI Sakari Tervo, Teemu Korhonen, and Tapio Lokki. Estimation of Reflections from Impulse Responses. Journal of Building Acoustics, Volume 18, Number 1-2, p. 159–174, March 2011.

Author's Contribution

Publication I: "Acoustic Reflection Path Tracing Using A Highly Directional Loudspeaker"

A measurement technique for the investigation of early reflections is developed. The method is based on measuring the impulse responses with a directional loudspeaker spanned over a set of angles. In addition, the direction of arrival is estimated from the sound intensity vectors. The results show that the use of directional loudspeakers provides better spatial and temporal resolution for early reflections than the standard omni-directional loudspeaker. The present author invented the measurement technique, wrote 90 % of the article, and implemented all of the experiments. Dr. Jukka Pätynen and Dr. Tapio Lokki assisted in taking measurements for this article.

Publication II: "Maximum Likelihood Estimation of Loudspeaker Locations from Room Impulse Responses"

A method for calibrating loudspeaker locations in acoustic measurements is developed. The method uses the time of arrival and time difference of arrival obtained from the direct sound of room impulse response measurements. Results show that the method outperforms previously proposed methods in the loudspeaker localization task in theory and in practical situations. The present author invented the method, wrote 90 % of the article, and implemented all of the experiments.

Publication III: "Direction Estimation Based on Sound Intensity Vectors"

Direction estimation methods that use sound intensity vectors are compared in real situations. The comparison reveals that the mixture model-based direction estimation methods outperform the direct average based methods.
In addition, phase information alone is found to provide more reliable direction estimation than amplitude and phase information used together. The present author is the sole author of the article.

Publication IV: "Estimation of Reflective Surfaces from Continuous Signals"

An inverse mapping of the first order reflections in the time difference of arrival framework is given. The mapping is used together with acoustic source localization to deduce the surface location and normal from a speech signal. The inverse mapping proposed in this article was developed by Dr. Teemu Korhonen. The method is demonstrated in an auditorium. The present author designed and conducted the experiments that validate the method in real situations, and wrote 50 % of the article. Additional results produced with the method are found in the doctoral thesis of Dr. Korhonen.

Publication V: "Interpolation Methods for the SRP-PHAT Algorithm"

An interpolation method is developed for a popular acoustic source localization algorithm, the steered response power phase transform (SRP-PHAT). The interpolation is applied to the cross correlation functions, which are then used by the SRP-PHAT algorithm. Experiments are conducted in a concert hall environment and the results are compared to the standard Fourier-interpolation method. The results indicate that the developed method outperforms the standard method. The present author invented the method under the supervision of Dr. Aki Härmä and Dr. Steven van de Par at Philips Research. The present author wrote 95 % of the article.

Publication VI: "Estimation of Reflections from Impulse Responses"

An overview of acoustic localization techniques for the localization of early reflections is given and a visualization example is presented. The performance of the methods for direction estimation of the early reflections is studied. In addition, approaches for diffusion estimation of the early reflections are proposed and studied.
The present author wrote 90 % of this article and implemented all of the experiments.

List of Abbreviations

2-D    Two-dimensional
3-D    Three-dimensional
CRLB   Cramér-Rao Lower Bound
FFT    Fast Fourier Transform
SIRR   Spatial Impulse Response Rendering
MSE    Mean Squared Error
MMSE   Minimum Mean Squared Error
MLE    Maximum Likelihood Estimation
PL     Pseudo-Likelihood
SRP    Steered Response Power
LS     Least Squares
GCC    Generalized Cross Correlation
PHAT   Phase Transform
CC     Direct Cross Correlation
ASDF   Average Squared Difference Function
TDOA   Time Difference of Arrival
TOA    Time of Arrival, Time of Flight
CM     Combined Method

Some mathematical notations and symbols

$a$    Scalar
$|a|$    Absolute value of $a$
$j = \sqrt{-1}$    Imaginary unit
$a^{*}$    Complex conjugate
$\mathbf{a}$    Vector
$\|\mathbf{a}\|$    Euclidean distance
$\mathbf{A}$    Matrix
$\mathbf{A}^{-1}$    Inverse of $\mathbf{A}$
$\mathbf{A}^{T}$    Transpose of $\mathbf{A}$
$\mathrm{trace}\{\mathbf{A}\}$    Trace of $\mathbf{A}$
$\mathbf{I}$    Identity matrix
$\mathbf{J}(\boldsymbol{\theta})$    Fisher information matrix for $\boldsymbol{\theta}$
$\theta$    Parameter
$\boldsymbol{\theta}$    Parameter vector
$\hat{\chi}$    Measurement
$\hat{\boldsymbol{\chi}}$    Measurement vector
$\phi, \theta, \varphi$    Angle
$t$    Time
$f$    Frequency
$\omega$    Angular frequency
$\mathcal{F}\{\cdot\}$    Fourier transform
$\mathcal{F}^{-1}\{\cdot\}$    Inverse Fourier transform
$p(t)$    Time domain signal
$P(\omega)$    Frequency domain signal
$G_{p,p}(\omega)$    Auto spectral density of $p(t)$
$E\{\cdot\}$    Expected value (over time)
$\mathrm{var}\{\cdot\}$    Measured variance
$\hat{\cdot}$    Estimate
$\mathrm{cov}\{\cdot\}$    Covariance
$\Sigma$    Covariance matrix
$\mathcal{N}(\mu, \sigma)$    Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$
$\mathcal{U}(a_1, a_2)$    Uniform distribution from $a_1$ to $a_2$
$\mathcal{R}(a)$    Rayleigh distribution with parameter $a$
$\mathcal{E}(a)$    Exponential distribution with parameter $a$
$p(\cdot|\cdot)$    Probability density function
$L(\cdot|\cdot)$    Likelihood
$\lambda(\cdot|\cdot)$    Log-likelihood
$x, y, z$    Cartesian coordinates
$c$    Speed of sound
$\alpha$    Absorption coefficient
$\beta$    Reflection coefficient
$\mathbf{s}$    Source position
$\mathbf{x}$    Source candidate position
$\mathbf{r}$    Receiver candidate position
$\triangleq$    Denote
$(x, y, z)$    3D coordinate location
$\sim$    Distributed according to
$\{\cdot\}$    Set
$x \in [a_1, a_2]$    $x$ belongs to a closed interval from $a_1$ to $a_2$
$x \in (a_1, a_2)$    $x$ belongs to an open interval from $a_1$ to $a_2$

1. Introduction

The location of acoustic reflections, i.e., the image-sources, is a useful piece of information in room acoustic studies, auralization, room geometry inference, and in-situ measurement of acoustic properties of surfaces from room impulse responses. In spatial room impulse response rendering [1,2] the locations of the reflections are used in spatial reproduction. Incorrect or inaccurate reflection localization leads to incorrect auralization of the space. Moreover, the locations of the reflections can be used together with the source location to deduce the normals and the locations of the reflective surfaces [3–5], that is, to infer the room geometry. In addition, the location of the reflection is needed for accurate time windowing of the reflection from the room impulse response when estimating, for example, the absorption coefficient of the surface from in-situ measurements [6, 7].

The standardized way of studying room acoustics is to measure an impulse response using a sound source in the performance area and a microphone in the audience area [8]. The impulse response is considered to consist of three parts with distinct features. The direct sound arrives first, then the early reflections, followed by the late reverberation. The important difference between early reflections and late reverberation is that late reverberation, or simply reverberation, refers to the part of the impulse response that has specific statistical properties [9–11]. The early reflections are the discrete events before the late reverberation, and they do not have these statistical features.

The topic of this thesis is the objective localization of early reflections and the direct sound, using measurement devices and related applied mathematics. Instead of a mono room impulse response, a spatial room impulse response is preferred when studying the location of reflections. The spatial impulse response is measured with a microphone array instead of a single microphone.
Special microphone arrays and techniques are presented and applied to this problem [12, 13]. Typically the spatial impulse response measurement is done with techniques that also make the auralization of the enclosure possible. In our studies [14], the auralization is based on sound intensity vector analysis and synthesis [1,2], and therefore a specially designed open spherical microphone array is used.

1.1 Scope

This thesis studies the localization and tracing of early reflections, as well as the calibration of the measurement system and the measurement of room impulse responses. All of the analysis is based on measured spatial room impulse responses. Figure 1.1 shows the subtasks required in the localization of reflections.

Reflection locations can be used in several applications, for example in speech source localization [15]. Initially, the main motivation for this study was to better explain some objective properties of the acoustics of concert halls together with the subjective evaluations, as in [16]. This is not yet completed and remains future work for the author.

The contributions of this thesis are shown in Table 1.1. In detail, the contributions are:

1. Room impulse response measurement
A measurement technique that improves the spatial and temporal separability of reflections has been developed. The method is based on the use of a highly directional loudspeaker. The method was demonstrated with a Panphonics panel loudspeaker in Publication I. A comparison between the standard omni-directional loudspeaker and two directional loudspeakers is given in Chapter 4.

2. Localization methods
The theoretical and practical performance of localization methods in the acoustic reflection localization task is studied, and the methods are developed further, in Publications IV, V, and VI. Some results for the theoretical performance are presented in Publication II and in Chapter 6. Additional results for practical situations are presented in Chapter 8.

3. Calibration of the measurement system
A noise-robust method for calibrating the loudspeaker and microphone array positions in acoustic measurements is developed in Publication II.

4. Visualization of reflections
A technique for the visualization of early reflections is presented in Publication I. The method is based on using the ray-tracing approach inversely. In the Appendix, a comparison between the different visualization techniques is given.

Table 1.1. Contributions of this thesis to various subtasks of acoustic reflection localization.

Measurement of room impulse responses
- Sparse impulse response technique [I]

Automatic calibration
- Source position estimation [II]

Localization of reflections
- Comparison of methods [VI]
- Interpolation methods [V]
- Sound intensity based direction estimation [III]
- Localization of reflective surfaces from speech [IV]

Visualization
- Tracing of reflections [I]

Figure 1.1. Subtasks in the localization of acoustic reflections.

1.2 Organization

This thesis presents six publications and related background information. Chapter 2 gives some basic information about signal processing techniques and room acoustics. Chapter 3 lists the research related to reflection localization. In Chapter 4, the standard and the proposed room impulse response measurement techniques are presented. Relevant localization methods are reviewed in Chapter 5, and the theoretical and practical performance of the methods is presented in Chapters 6 and 7, respectively. Visualization examples of early reflections are provided in the Appendix. A summary of the work is given in Chapter 8.

2. Background

The goal of this thesis is to study estimation and methods related to the localization of acoustic reflections. This chapter outlines the background on estimation theory, sound, and acoustics, as related to the localization of reflections in the context of this thesis.
2.1 Estimation theory

The measurement or estimation of a physical phenomenon always includes a random error. This error is due to non-ideal conditions in real situations and is commonly referred to as noise. In the scope of this thesis the noise is always considered to be additive. That is, if the parameter to be measured is $\theta$, then the measurement or the estimate can be given as [17]:

\[ \hat{\theta} = \theta + \varepsilon, \tag{2.1} \]

where $\hat{\cdot}$ denotes an estimate and $\varepsilon$ is the error term. A set of logical operations and calculations that produces the estimate is called the estimator. The estimator is unbiased if, on average, it produces the correct value, i.e.:

\[ E\{\hat{\theta} - \theta\} = 0, \tag{2.2} \]

where $E\{\cdot\}$ denotes the expectation. Usually the error is assumed to be normally distributed with zero mean. Under this assumption the random error term can be described by a single quantity, the variance:

\[ \sigma_e^2 = \mathrm{var}(\varepsilon) = E\{[\varepsilon - \mu_\varepsilon]^2\}, \tag{2.3} \]

where $\mu_\varepsilon = E\{\varepsilon\}$. Perhaps a more intuitive quantity describing the error variance is the signal-to-noise ratio (SNR)

\[ \mathrm{SNR} = \theta^2 / \sigma_e^2, \tag{2.4} \]

which is given on the decibel scale as

\[ \mathrm{SNR\,[dB]} = 10 \log_{10}\{\theta^2 / \sigma_e^2\}\ \mathrm{[dB]}. \tag{2.5} \]

In a typical estimation task, instead of a single parameter $\theta$, a parameter vector $\boldsymbol{\theta} = [\theta_1, \theta_2, \ldots, \theta_K]^T \in \mathbb{R}^K$ is estimated, and the estimation vector is then given as

\[ \hat{\boldsymbol{\theta}} = \boldsymbol{\theta} + \mathbf{e}. \tag{2.6} \]

In that case the noise term is also a $K$-dimensional vector $\mathbf{e} = [\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K]^T \in \mathbb{R}^K$. Again, if the estimator is unbiased,

\[ E\{\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}\} = \mathbf{0}. \tag{2.7} \]

The error vector is described by the error covariance matrix,

\[ \Sigma = E\left\{[\mathbf{e} - \boldsymbol{\mu}_e][\mathbf{e} - \boldsymbol{\mu}_e]^T\right\}, \tag{2.8} \]

where $\boldsymbol{\mu}_e = E\{\mathbf{e}\}$. The individual components of the error covariance matrix are given as

\[ \mathrm{cov}(\varepsilon_x, \varepsilon_y) = E\left\{[\varepsilon_x - \mu_{\varepsilon_x}][\varepsilon_y - \mu_{\varepsilon_y}]^T\right\}. \tag{2.9} \]

In the case studied in this thesis, the parameter vector $\boldsymbol{\theta}$ is the 3-D location of the reflection. Often the parameters cannot be measured directly. Instead some other variable $\hat{\chi}$ is measured, which is then related to the estimated parameter by a linear or non-linear model, i.e. $\chi(\theta)$.
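As a numerical illustration of Eqs. (2.1)-(2.5), the following minimal sketch simulates the additive noise model and checks the bias, the error variance, and the SNR. It is not taken from the thesis: the function names `simulate_measurements` and `snr_db`, the parameter values, and the choice of Gaussian noise are assumptions made only for this example.

```python
import math
import random
import statistics


def simulate_measurements(theta, sigma, n, seed=0):
    """Draw n noisy measurements theta_hat = theta + eps (Eq. 2.1).

    The zero-mean Gaussian noise with standard deviation sigma is an
    assumption of this sketch.
    """
    rng = random.Random(seed)
    return [theta + rng.gauss(0.0, sigma) for _ in range(n)]


def snr_db(theta, noise_variance):
    """Signal-to-noise ratio on the decibel scale (Eq. 2.5)."""
    return 10.0 * math.log10(theta ** 2 / noise_variance)


theta_true = 2.0   # hypothetical parameter value
sigma = 0.5        # hypothetical noise standard deviation

estimates = simulate_measurements(theta_true, sigma, n=20000)

# Empirical bias, approximating E{theta_hat - theta} of Eq. (2.2).
bias = statistics.fmean(estimates) - theta_true

# Empirical error variance, approximating Eq. (2.3).
noise_var = statistics.pvariance([e - theta_true for e in estimates])

print(f"bias      = {bias:.4f}")
print(f"variance  = {noise_var:.4f}")
print(f"SNR [dB]  = {snr_db(theta_true, noise_var):.2f}")
```

With these values the empirical bias should be close to zero and the empirical variance close to $\sigma^2 = 0.25$, so the SNR is close to $10 \log_{10}(16) \approx 12$ dB.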
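2.1.1 Maximum likelihood estimation

The parameter $\boldsymbol{\theta}$ can be estimated in several ways. One of the most popular methods is maximum likelihood estimation (MLE). MLE can be considered a two-step estimation approach. Firstly, the measurements

\[ \hat{\boldsymbol{\chi}} = [\hat{\chi}_1, \hat{\chi}_2, \ldots, \hat{\chi}_N], \quad \hat{\boldsymbol{\chi}} \in \mathbb{R}^N, \]

are assumed to have an error probability density function $f(\hat{\chi}_i; \chi_i(\boldsymbol{\theta}))$, where the true values of the variable are related to the parameter,

\[ \boldsymbol{\chi}(\boldsymbol{\theta}) = [\chi_1(\boldsymbol{\theta}), \chi_2(\boldsymbol{\theta}), \ldots, \chi_N(\boldsymbol{\theta})], \quad \boldsymbol{\chi}(\boldsymbol{\theta}) \in \mathbb{R}^{N \times K}. \]

The joint probability density function for the variables $\boldsymbol{\chi}(\boldsymbol{\theta})$ given the measurements $\hat{\boldsymbol{\chi}}$ is formed by multiplying the individual density functions [17]

\[ L(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}}) = f(\hat{\boldsymbol{\chi}}; \boldsymbol{\chi}(\boldsymbol{\theta})) = \prod_{i=1}^{N} f(\hat{\chi}_i; \chi_i(\boldsymbol{\theta})). \tag{2.10} \]

This joint density function is referred to as the likelihood, and it is denoted by $L(\cdot\,;\cdot)$. Assuming normal distributions in Eq. (2.10) gives a multivariate normal distribution [17]

\[ L(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}}) = \frac{\exp\left(-\frac{1}{2}\,[\hat{\chi}_1 - \chi_1(\boldsymbol{\theta}), \ldots, \hat{\chi}_N - \chi_N(\boldsymbol{\theta})]\,\Sigma^{-1}\,[\hat{\chi}_1 - \chi_1(\boldsymbol{\theta}), \ldots, \hat{\chi}_N - \chi_N(\boldsymbol{\theta})]^T\right)}{(2\pi)^{N/2}\sqrt{\det(\Sigma)}}, \tag{2.11} \]

where $\Sigma$ is the covariance matrix that includes the variances of the individual error probability functions and their covariances. In the case of independent variables, $\Sigma$ is a diagonal matrix with diagonal components corresponding to the error variances $\sigma^2$. In the dependent case, the covariance matrix is symmetric and it includes information on the correlation between the variables.

In the second part of MLE, the likelihood is maximized. However, it is more common to use the log-likelihood instead

\[ \lambda(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}}) \triangleq \log\{L(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}})\} = \sum_{i=1}^{N} \log\{f(\hat{\chi}_i; \chi_i(\boldsymbol{\theta}))\}. \tag{2.12} \]

The argument that maximizes the likelihood function is called the maximum likelihood estimate

\[ \hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \{\lambda(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}})\}, \tag{2.13} \]

where $\hat{\boldsymbol{\theta}}$ is the $K$-dimensional estimated parameter vector.

2.1.2 Gauss-Markov theorem

The Gauss-Markov theorem states that [18, p.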
217], in the case when the noise variances are equal, $\mathrm{var}\{\varepsilon_i\} = \sigma^2$, the noise has zero mean, $E\{\varepsilon_i\} = 0$, and the noise terms are uncorrelated, i.e., $\mathrm{cov}\{\varepsilon_i, \varepsilon_j\} = 0$, the best linear unbiased estimator (BLUE) is the ordinary least squares estimator, i.e.,

$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}}\left\{\sum_{i=1}^{N}(\hat{\chi}_i - \chi_i(\boldsymbol{\theta}))^2\right\}. \tag{2.14}$$

This is also called the minimum mean squared error estimator (MMSE). It is straightforward to show that Eq. (2.14) is a direct result of Eq. (2.12) with the given assumptions.

2.1.3 Monte-Carlo simulations and error metrics

Monte-Carlo simulations are a useful tool for inspecting the performance of an estimator. In the simulations the modeled process is simulated $N$ times, with selected models for the signal and error. The output of the estimator is then observed, and the estimator variance can be calculated directly from the output values. Often, instead of the variance, the root mean squared error (RMSE) of the estimator is calculated

$$\mathrm{RMSE}(\hat{\boldsymbol{\theta}}) = \sqrt{\mathrm{MSE}(\hat{\boldsymbol{\theta}})} = \sqrt{E\{\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\|^2\}}. \tag{2.15}$$

Other alternatives for the error measure are the mean absolute error or the median error. These measures do not weight large errors as heavily as the RMSE. Another metric used in estimation is the number of anomalous estimates, or the anomaly percentage. It is defined as the ratio between the number of estimates that have an error greater than some threshold and the total number of estimates

$$\mathrm{AN}(\hat{\boldsymbol{\theta}}) = E\left\{\mathbb{1}\left\{\|\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\| > \varepsilon\right\}\right\}, \tag{2.16}$$

where $\varepsilon$ is here the error threshold, and $\mathbb{1}\{\cdot\} = 1$ if the condition is true and 0 otherwise.

2.1.4 Cramér-Rao lower bound

The lower bound for the estimator covariance is given by the Cramér-Rao lower bound (CRLB). In the multivariate case, it is given by the matrix inverse of the Fisher information matrix $\mathbf{J}(\boldsymbol{\theta})$ [17, Ch. 3]

$$\mathrm{cov}(\hat{\boldsymbol{\theta}}) \geq \mathbf{J}(\boldsymbol{\theta})^{-1}. \tag{2.17}$$

In the single variable case, the Fisher information is a scalar and the covariance reduces to the variance.
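The error metrics of Section 2.1.3, Eqs. (2.15) and (2.16), can be sketched as a short Monte-Carlo experiment. The estimator below is a hypothetical stand-in (Gaussian noise plus occasional gross outliers); it is not one of the localization methods of the thesis, and the noise level, outlier probability and threshold are assumed values.

```python
import math
import random


def rmse(estimates, theta):
    """Eq. (2.15): square root of the mean squared Euclidean error."""
    squared = [sum((a - b) ** 2 for a, b in zip(est, theta)) for est in estimates]
    return math.sqrt(sum(squared) / len(squared))


def anomaly_ratio(estimates, theta, threshold):
    """Eq. (2.16): fraction of estimates whose error norm exceeds the threshold."""
    anomalous = sum(1 for est in estimates if math.dist(est, theta) > threshold)
    return anomalous / len(estimates)


def toy_estimator(theta, sigma, outlier_prob, rng):
    """Hypothetical estimator: small Gaussian error, sometimes a gross error."""
    if rng.random() < outlier_prob:
        return [t + rng.uniform(-5.0, 5.0) for t in theta]
    return [t + rng.gauss(0.0, sigma) for t in theta]


rng = random.Random(1)
theta = [2.0, 1.0, 1.5]  # assumed true location in metres
runs = [toy_estimator(theta, 0.05, 0.05, rng) for _ in range(5000)]

total_rmse = rmse(runs, theta)                    # dominated by the outliers
anomalies = anomaly_ratio(runs, theta, threshold=0.5)
```

In this sketch a few gross errors dominate the RMSE while the anomaly ratio stays near the outlier probability, which illustrates why the two metrics are reported side by side.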
The Fisher information is defined as the expected value of the squared derivative of the log-likelihood, and it is given in the single parameter case as [17, Ch. 3]

$$J(\theta) = E\left\{\left(\frac{\partial\lambda(\chi(\theta); \hat{\chi})}{\partial\theta}\right)^2\right\}, \tag{2.18}$$

and in the multivariate case as

$$\mathbf{J}(\boldsymbol{\theta}) = E\left\{\frac{\partial\lambda(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}})}{\partial\boldsymbol{\theta}}\left(\frac{\partial\lambda(\boldsymbol{\chi}(\boldsymbol{\theta}); \hat{\boldsymbol{\chi}})}{\partial\boldsymbol{\theta}}\right)^T\right\}. \tag{2.19}$$

The mean squared error of an unbiased estimator is bounded from below by the CRLB

$$\mathrm{MSE}(\hat{\boldsymbol{\theta}}) \geq \mathrm{trace}\{\mathbf{J}(\boldsymbol{\theta})^{-1}\}. \tag{2.20}$$

If the estimator achieves the CRLB and is unbiased, it is called an efficient estimator. The CRLB may not be achieved by any estimator. In particular, if the measured variable is not an injection, an efficient estimator does not exist [19] and thus the CRLB is not achieved by any estimator.

2.2 Sound

A sound source emits sound energy in a medium. The sound energy causes the fluid particles of the medium to move from their initial state. The movements of the particles are described by the instantaneous particle velocity. In addition, the pressure of the medium changes due to the density variations introduced by the particle movements. That is, the sound pressure is the effect of the sound power emitted by a sound source. This pressure is often referred to as acoustic pressure.

The sound field has certain characteristics that are different in the near-field and the far-field. The sound field is predominantly reactive in the near-field and active in the far-field [20]. In this thesis, the source is always considered to be in the far-field.

2.2.1 Sound pressure

The total sound pressure is the superposition of the atmospheric pressure $p_0$ and the acoustic pressure $p$ [11, 21–23]:

$$p_{tot} = p_0 + p. \tag{2.21}$$

Often, in acoustics, a quantity called the sound pressure level is used instead of the total sound pressure. It expresses the acoustic pressure relative to the threshold of hearing ($2 \times 10^{-5}$ Pa) on a logarithmic scale [11].

2.2.2 The wave equation

Using Newton's laws of motion, and assuming that the air has no net velocity, i.e.
the air does not move, sound pressure can be expressed using the wave equation [11, 23]

$$\nabla^2 p - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0, \tag{2.22}$$

where $t$ is time and $c$ is the speed of sound. In this case, the sound pressure is a scalar function of four variables, three spatial coordinates and time, i.e., $p = p(x, y, z, t)$.

2.2.3 Sound intensity

Particle velocity $\mathbf{v}$ describes the velocity of the air (fluid) particle movements. Together with the sound pressure it defines the instantaneous sound intensity [21–23],

$$\mathbf{I} = p\mathbf{v}. \tag{2.23}$$

Note that the sound intensity is a vector quantity, as is the particle velocity. Sound intensity is perhaps best described as the flow of energy, or the sound power per area.

2.3 Measurement of sound pressure and intensity

Sound pressure is measured with a pressure microphone. The microphones used in this thesis translate the mechanical vibration of the diaphragm (membrane) of the microphone into electric current using capacitance change or electromagnetic induction. Although special intensity sensors exist, such as the ones Microflown has developed [24], here the intensity is measured using pressure microphone pairs.

2.3.1 Fourier transform and spectral density

The pressure signal recorded with a microphone is denoted by $p(t)$. The Fourier transform of the continuous time signal $p(t)$ is given as [25, 26]

$$\tilde{P}(\omega) = \int_{-\infty}^{\infty} p(t)e^{-j\omega t}\,dt \tag{2.24}$$

$$\phantom{\tilde{P}(\omega)} = \lim_{T\to\infty} T P(\omega). \tag{2.25}$$

[Figure 2.1. Microphone arrays used in this thesis: (a) TKK 3-D microphone array, (b) G.R.A.S. sound intensity probe. The TKK 3-D microphone array has 12 microphones equally spaced on two spheres with diameters of 10 mm and 100 mm. The G.R.A.S. array has 6 microphones on a single sphere with a diameter of 100 mm. The spacing $d_{spc}$ is equal for each microphone pair on a single axis on a single sphere. See Table 2.1 for the locations of the microphones in the array.]
In Eqs. (2.24) and (2.25), $\omega = 2\pi f$ is the angular frequency. The discrete Fourier transform of a signal of length $T$ is given by

$$P(k) = \frac{1}{T}\int_{-T/2}^{T/2} p(t)e^{-jk\omega_\Delta t}\,dt, \tag{2.26}$$

with $\omega_\Delta = 2\pi/T$. The discrete signal has a power spectral density which is equal to [25, 26]

$$E[P(k)P^*(k)] = \frac{1}{T}G_{p,p}(k), \tag{2.27}$$

where $(\cdot)^*$ denotes the complex conjugate. The spectral density of the continuous signal approaches [25, 26]

$$E[\tilde{P}(\omega)\tilde{P}^*(\omega)] = \tilde{G}_{p,p}(\omega). \tag{2.28}$$

2.3.2 Sound intensity measurement using microphone pairs

Throughout this thesis, the microphone array design that is used is an open spherical microphone array. Examples are shown in Fig. 2.1. The microphones are omni-directional. This setup is the optimal six-microphone setup for localization, as shown in [27]. The use of this kind of array makes it possible to measure sound intensity in a 3-D coordinate system. Other microphone configurations can be used as well to obtain the 3-D sound intensity [28].

Since sound intensity can be measured, auralization using the spatial impulse response rendering technique (SIRR) is possible [1, 12]. In SIRR the features relevant for human perception are analyzed from the sound intensity vectors.

Table 2.1. Origin centered coordinates for the microphone arrays. Spacing $d_{spc}$ is equal for each microphone pair on a single axis.

| Microphone No. | X [m] | Y [m] | Z [m] |
|---|---|---|---|
| 1 | $d_{spc}/2$ | 0 | 0 |
| 2 | $-d_{spc}/2$ | 0 | 0 |
| 3 | 0 | $d_{spc}/2$ | 0 |
| 4 | 0 | $-d_{spc}/2$ | 0 |
| 5 | 0 | 0 | $d_{spc}/2$ |
| 6 | 0 | 0 | $-d_{spc}/2$ |

On a certain axis $x$, the instantaneous active sound intensity is given in the frequency domain as

$$I_x(\omega) = \Re\{P^*(\omega)U_x(\omega)\}, \tag{2.29}$$

where $P(\omega)$ and $U_x(\omega)$ are the frequency domain representations of the sound pressure and of the particle velocity at angular frequency $\omega$ [12]. In addition, $\Re\{\cdot\}$ is the real part of a complex number and $(\cdot)^*$ denotes the complex conjugate. The pressure in the middle of the array, shown in Fig. 2.1, can be estimated as the average pressure of the microphones [12, 29]:

$$P(\omega) \approx \frac{1}{6}\sum_{n=1}^{6} P_n(\omega). \tag{2.30}$$

In the frequency domain, the particle velocity is estimated for the x-axis as:

$$U_x(\omega) \approx \frac{-j}{\omega\rho_0 d}[P_1(\omega) - P_2(\omega)], \tag{2.31}$$

where $d$ is the distance between the two receivers, $j$ is the imaginary unit, and $\rho_0$ is the density of air; for example, $\rho_0 = 1.204$ kg/m$^3$ when the speed of sound is $c = 343$ m/s. The particle velocity is calculated similarly for the y-axis

$$U_y(\omega) \approx \frac{-j}{\omega\rho_0 d}[P_3(\omega) - P_4(\omega)], \tag{2.32}$$

and for the z-axis

$$U_z(\omega) \approx \frac{-j}{\omega\rho_0 d}[P_5(\omega) - P_6(\omega)]. \tag{2.33}$$

The overall sound intensity vector for a frequency $\omega$ is then denoted by $\mathbf{I}(\omega) = [I_x(\omega), I_y(\omega), I_z(\omega)]$.

The sound intensity estimation with the microphone pair technique is limited by the distance between the microphones. Frequencies above

$$f > \frac{c}{d} \tag{2.34}$$

are spatially aliased, and the sound intensity for them cannot be properly estimated using the above equations. The low frequency limit is typically set by the properties of the pressure microphones.

The estimation of sound intensity vectors using Eqs. (2.30) and (2.31) has been shown to be biased [29]. The bias is described by the equation [29]

$$g(\theta) = \arctan\left(\frac{\sin(\omega d\sin(\theta)/(2c))}{\sin(\omega d\cos(\theta)/(2c))}\right). \tag{2.35}$$

The unbiased estimate $\theta_{unb}$ is obtained via the inverse function as $\theta_{unb} = g^{-1}(\theta)$. The bias is caused by the fact that the pressure gradient is sinusoidal instead of the assumed constant. The bias cannot be corrected for frequencies [29]

$$f > \frac{1}{\sqrt{2}}\frac{c}{d}, \tag{2.36}$$

which is lower than the threshold previously set by spatial aliasing. In this thesis, the bias correction is not used, since the highest frequency in the experiments is selected to be so low that the bias can be neglected.

2.4 Directivity of the sources

Assuming a homogeneous medium and a free path between the source and the sensor, the direct sound wave arriving at a sensor depends on the characteristics of the sound source. The most used characterization is the directivity of the source [30].
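The microphone-pair intensity estimate of Section 2.3.2, Eqs. (2.29) to (2.33), can be sketched as follows. The test signal (a unit-amplitude plane wave with an assumed $e^{j\omega t}$ time convention), the frequency, and the function names are assumptions made for illustration. The sign of Eq. (2.31) is used as printed, so the sketch only checks that the estimated vector lies along the propagation axis, without asserting whether it points toward or away from the source.

```python
import cmath
import math

RHO0 = 1.204  # density of air [kg/m^3], example value quoted in the text
C = 343.0     # speed of sound [m/s], example value quoted in the text


def mic_positions(d):
    """Microphone coordinates of Table 2.1 for pair spacing d."""
    h = d / 2.0
    return [(h, 0, 0), (-h, 0, 0), (0, h, 0), (0, -h, 0), (0, 0, h), (0, 0, -h)]


def intensity_vector(pressures, omega, d):
    """Intensity estimate from six complex pressures ordered as in Table 2.1."""
    p_mean = sum(pressures) / 6.0                       # Eq. (2.30)
    scale = -1j / (omega * RHO0 * d)
    velocity = [scale * (pressures[0] - pressures[1]),  # Eq. (2.31)
                scale * (pressures[2] - pressures[3]),  # Eq. (2.32)
                scale * (pressures[4] - pressures[5])]  # Eq. (2.33)
    return [(p_mean.conjugate() * u).real for u in velocity]  # Eq. (2.29)


def alias_limit(d):
    """Upper frequency limit of Eq. (2.34)."""
    return C / d


def bias_correction_limit(d):
    """Frequency above which the bias cannot be corrected, Eq. (2.36)."""
    return C / (math.sqrt(2.0) * d)


# Hypothetical test signal: plane wave travelling along the unit vector n.
f, d = 500.0, 0.1
omega = 2.0 * math.pi * f
k = omega / C
n = (0.6, 0.8, 0.0)
pressures = [cmath.exp(-1j * k * sum(ni * ri for ni, ri in zip(n, r)))
             for r in mic_positions(d)]

intensity = intensity_vector(pressures, omega, d)
norm = math.sqrt(sum(c * c for c in intensity))
alignment = sum(a * b for a, b in zip(intensity, n)) / norm  # close to +1 or -1
```

At 500 Hz and a 100 mm spacing the estimate is almost exactly collinear with the propagation axis; the small residual angle is the bias described by Eq. (2.35).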
It is a measure of how much energy the source emits to a certain angle at a certain distance. It is measured in free-field conditions: in an anechoic chamber, or in a room where the reflective surfaces are sufficiently far away so that windowing can be applied to isolate the direct sound from the reflections. The more measurements are made around the source, the more accurate the estimate of the directivity. The acoustic power can be estimated with a surface integral over the directivity measurement, as defined by an ISO standard [31]. Figure 2.2 shows examples of the directivities of three loudspeakers.

The directivity and acoustic power measurements assume that the sound source is a point source. This is not true for real sound sources. For example, a violin has a vibrating body which emits energy in addition to that emitted from the f-holes [30]. Also, loudspeakers are not point sources.

[Figure 2.2. Pictures and dimensions of three loudspeakers, and their directivity measured in 1/3-octave bands, at 12 m distance at every 10 degrees azimuth. The speaker is facing the microphone when the azimuth angle is 90 degrees. Panels: (a) Standard omni-directional (240 mm), (b) Genelec 1029A (191 mm, 247 mm, 151 mm), (c) Panphonics panel loudspeaker (0.4 m, 2.40 m). Axes: frequency [Hz] from 63 Hz to 16 kHz versus azimuth $\theta_s$ [°] from 0 to 350; level scale from 0 dB to −60 dB.]
For instance, the widely used monitor loudspeaker Genelec 1029A has two elements, the bass element and the tweeter, which both emit sound energy. The bass element reacts more slowly than the tweeter to the changes that the coil passes on. For this reason, and due to the different locations of the elements, the high frequencies arrive at a sensor placed in front of the loudspeaker earlier than the low frequencies, as shown in Fig. 2.3, where the impulse response is filtered at the cross-over frequency of the loudspeaker. The fact that a loudspeaker consists of several sound sources affects the phase of the received signal. When a single location for the sound source is wanted, the acoustic center of the source is used, which is the weighted average of the sound energy over an area.

Near-field acoustic holography [32, 33] is a useful tool for describing the sound source. Acoustic holography is concerned with the inverse problem of what the sound source has emitted, given the observations of sound pressure at some distance. Typically, a grid of sensors is placed in the vicinity of the source, and the Kirchhoff-Helmholtz integral is used to inversely predict where the energy is distributed on a hologram plane [34]. It can be used, for example, in noise source measurements or to investigate which parts of an instrument emit sound energy.

[Figure 2.3. Impulse response of Genelec 1029A, measured at approximately 5.1 m distance, in front of the loudspeaker. The low frequencies arrive later at the microphone than the high frequencies. Curves: wide-band; low-pass filtered, f < 3.3 kHz; high-pass filtered, f > 3.3 kHz. Axes: time [ms] versus normalized sound pressure.]

2.5 Geometrical quantities

Useful geometrical quantities in acoustic source localization are time of arrival (TOA) and time difference of arrival (TDOA). The calculation of these quantities depends on the selected wave propagation model.
Two commonly used wave propagation models are the spherical and the plane wave propagation models. Fig. 2.4 illustrates the principles of these two models in 2-D. The plane wave propagation model is usually assumed and used if the inter-sensor distances are small and the source is far away from the sensors.

[Figure 2.4. Plane and spherical wave propagation models: (a) plane wave propagation, (b) spherical wave propagation. Time of arrival (TOA) and time difference of arrival (TDOA) are also presented for both cases.]

2.5.1 Time of arrival

Time of arrival (TOA), often also referred to as time of flight, is the time that the sound wave takes to travel from the source to the receiver. In the case of the spherical wave propagation model it is given as:

$$t(\mathbf{r}_n; \mathbf{x}) = c^{-1}\|\mathbf{r}_n - \mathbf{x}\| \tag{2.37}$$

and for the plane wave propagation model as

$$t(\mathbf{r}_n; \mathbf{x}) = |c^{-1}\mathbf{n}^T(\mathbf{r}_n - \mathbf{x})|, \tag{2.38}$$

where $\mathbf{r}_n$ is the position of receiver $n$, $\mathbf{x}$ is the position of the source, $c$ is the speed of sound, $\mathbf{n}$ is the direction of the plane wave, and $(\cdot)^T$ denotes vector transpose.

2.5.2 Time difference of arrival

Time difference of arrival (TDOA) for the spherical wave propagation model is the difference of two TOAs:

$$\tau(\mathbf{r}_i, \mathbf{r}_j; \mathbf{x}) = c^{-1}(\|\mathbf{r}_i - \mathbf{x}\| - \|\mathbf{r}_j - \mathbf{x}\|), \tag{2.39}$$

where $c$ is again the speed of sound. For the plane wave propagation model, the TDOA becomes

$$\tau(\mathbf{r}_i, \mathbf{r}_j; \mathbf{x}) = c^{-1}\mathbf{n}^T(\mathbf{r}_i - \mathbf{r}_j). \tag{2.40}$$

2.6 Propagation of sound in enclosures in short

The inspection of sound phenomena is now restricted to room conditions. In a room environment, when a wave confronts a surface $S$, the reflected wave depends on the features of the surface. The surfaces considered here are impenetrable, rigid, or porous. An impenetrable surface does not transmit any waves to the other side of the surface [22]. A rigid surface is stationary, i.e. does not move, and a porous wall is not necessarily rigid or impenetrable [22]. A porous surface can transmit some of the arriving energy through refraction [35].
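The TOA and TDOA definitions of Section 2.5, Eqs. (2.37) to (2.40), can be written as four small functions. This is a minimal sketch: the function names, the example geometry, and the choice of $\mathbf{n}$ as the unit vector along which the plane wave travels (from the source toward the array) are assumptions made for illustration.

```python
import math

C = 343.0  # assumed speed of sound [m/s]


def toa_spherical(r, x, c=C):
    """Eq. (2.37): travel time from source x to receiver r."""
    return math.dist(r, x) / c


def toa_plane(r, x, n, c=C):
    """Eq. (2.38): plane wave travel time along the unit direction n."""
    return abs(sum(nk * (rk - xk) for nk, rk, xk in zip(n, r, x))) / c


def tdoa_spherical(ri, rj, x, c=C):
    """Eq. (2.39): difference of two spherical-wave TOAs."""
    return (math.dist(ri, x) - math.dist(rj, x)) / c


def tdoa_plane(ri, rj, n, c=C):
    """Eq. (2.40): plane wave TDOA, independent of the source distance."""
    return sum(nk * (a - b) for nk, a, b in zip(n, ri, rj)) / c


# Hypothetical geometry: a source 100 m away on the x-axis and a small array.
source = (100.0, 0.0, 0.0)
n = (-1.0, 0.0, 0.0)  # the wave travels from the source toward the origin
r1, r2, r3 = (0.05, 0.0, 0.0), (-0.05, 0.0, 0.0), (0.0, 0.05, 0.0)

on_axis_gap = abs(tdoa_spherical(r1, r2, source) - tdoa_plane(r1, r2, n))
off_axis_gap = abs(tdoa_spherical(r3, r2, source) - tdoa_plane(r3, r2, n))
```

For this far source the two models agree to well under a microsecond, which is consistent with the statement that the plane wave model is used when the sensors are close together and the source is distant.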
2.6.1 Speed of sound

Particle velocity describes the speed of the particle movements. However, the more interesting quantity in room acoustics is the speed of the propagating sound pressure wave, commonly known as the speed of sound. In room air, the most prominent factors that affect the speed of sound are the temperature, relative humidity, barometric pressure, and carbon dioxide content [36]. Several approximations, all derived from fluid theory, exist for the calculation of the speed of sound [36]. Throughout this thesis the speed of sound is calculated using the approximation presented in [36, p. 1046], assuming that the carbon dioxide content and the barometric (atmospheric) pressure are 0 % and 1013 hPa, respectively. The relative humidity of the air and the temperature are measured using commercially available equipment.

Based on measurements by the author, these factors change over time during an acoustic measurement, for example in a concert hall. In this thesis, it is assumed that the air in the enclosure is homogeneous during each measurement.

2.6.2 Attenuation and air absorption

In general, the amplitude of the sound pressure decreases in relation to $1/r$, where $r$ is the distance from the source, for spherical waves, and in relation to $1/\sqrt{r}$ for cylindrical waves. This is caused by the fact that the energy is spread over a larger area and is thereby attenuated.

In addition to this attenuation, the air absorbs some of the energy of the sound wave [35, 37, 38]. Air absorption is a function of frequency, and in general it depends on distance and on the same physical quantities as the speed of sound [38].

2.6.3 Specular reflections

An impenetrable surface $S$ can be stationary or vibrating. Consider a point $\mathbf{x}_S$ on the surface $S$ with velocity of the (moving) surface $\mathbf{v}_S = d\mathbf{x}_S/dt$ near the point $\mathbf{x}_S$. The velocity of the fluid $\mathbf{v}$ at the boundary has to be equal to the velocity of the particles near the boundary, i.e.
[22]:

$$\mathbf{v} \cdot \mathbf{n}_S = \mathbf{v}_S \cdot \mathbf{n}_S, \tag{2.41}$$

where $\mathbf{n}_S$ is the normal of the surface at $\mathbf{x}_S$. On stationary surfaces, the surface does not move ($\mathbf{v}_S = 0$), and one has $\mathbf{v} \cdot \mathbf{n}_S = 0$ [22]. This implies that the normal component of the particle velocity at the boundary is 0. Therefore, a plane wave at a flat rigid surface is reflected according to the law of mirrors (also included in Snell's law), i.e. the reflected wave leaves the surface at the mirrored angle with respect to the normal of the surface, as shown in Fig. 2.5. The specularly reflected wave can be modeled conveniently using the image-source principle [39], shown in Fig. 2.6.

[Figure 2.5. A plane wave at a flat rigid surface is reflected according to the law of mirrors. After [23].]

[Figure 2.6. Concepts of image-source and image-sensor. Labels: surface, sensor, image sensor, reflection point, image source / reflection, source.]

2.6.4 Specific acoustic impedance and absorption

A boundary condition where the surface is not necessarily rigid or impenetrable is described by the specific acoustic impedance. Specific acoustic impedance $Z$ is the relation between sound pressure $p$ and particle velocity $v$ on the surface $S$, i.e. [22]:

$$Z(\omega) = \frac{p(\omega)}{v(\omega)}. \tag{2.42}$$

This is analogous to electrical circuits, i.e. to the relation between impedance, current, and voltage. Note that in this case, the particle velocity is not written in vector form because the measurement is considered at one connection point.

The specific acoustic impedance consists of the specific acoustic resistance and reactance, which are the real and imaginary parts of $Z(\omega)$, respectively. The resistance can be seen as the part where energy is lost, and the reactance as the part where energy is stored.

A quantity closely related to the specific acoustic impedance is the pressure-amplitude reflection coefficient $\beta$, which describes the relation between the pressures of the incident wave and the reflected wave.
Through some theoretical examination (see [22] or [23] for details) the relation to the specific acoustic impedance is given as:

$$\frac{Z(\omega)}{\rho c} = \frac{1 + \beta(\theta, \phi, \omega)}{1 - \beta(\theta, \phi, \omega)}, \tag{2.43}$$

where $\rho c$ is the characteristic specific impedance of air, and $\theta$ is the angle of the incident wave. So, $\beta(\theta, \phi, \omega)$ depends on the angle of incidence and on frequency. For a plane wave at a flat rigid surface the reflection coefficient is independent of the angle of incidence.

If the magnitude of the reflection coefficient is less than 1, the material absorbs energy. The absorption coefficient is defined as [22]:

$$\alpha(\theta, \phi, \omega) = 1 - |\beta(\theta, \phi, \omega)|^2. \tag{2.44}$$

Measurement of the absorption coefficient can be done using an impedance tube measurement [40, 41], various in-situ measurement methods [6, 7, 42], or the reverberation chamber technique [43].

2.6.5 Diffraction

Diffraction occurs when a sound wave confronts an edge. A practical example is encountered in everyday life: a person is able to hear what someone is saying on the other side of a corner.

There are three regions around the corner where different waves occur besides the diffracted wave. In the first region, a reflected wave is possible; in the second region, there is a direct wave but no reflected wave; and in the third region, there is only the diffracted wave. The formal definitions for these cases are given in [23]. Modeling the diffraction has been found to be important for auralization purposes [44].

2.6.6 Scattered reflections or diffusion

When the surface is rough or otherwise uneven, the measurement or the modeling of specular reflections becomes difficult. In this case, scattering and diffusion coefficients are a useful way to describe the behavior of the sound field [43]. The phenomenon that causes diffuse reflection is diffraction on a very small scale [45].

Scattering and diffusion coefficients describe the reflection from a surface that is not perfectly specular.
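Equations (2.43) and (2.44) can be combined to obtain the absorption coefficient from a given specific acoustic impedance. Solving Eq. (2.43) for the reflection coefficient gives $\beta = (Z/\rho c - 1)/(Z/\rho c + 1)$. The sketch below assumes this rearrangement and the example air values quoted in Section 2.3.2; the function names and the impedance values are illustrative assumptions.

```python
RHO0 = 1.204  # density of air [kg/m^3], example value quoted in the text
C = 343.0     # speed of sound [m/s], example value quoted in the text
RHO_C = RHO0 * C  # characteristic specific impedance of air


def reflection_coefficient(z, rho_c=RHO_C):
    """Eq. (2.43) solved for beta, for a (possibly complex) impedance z."""
    zeta = z / rho_c
    return (zeta - 1.0) / (zeta + 1.0)


def absorption_coefficient(beta):
    """Eq. (2.44): alpha = 1 - |beta|^2."""
    return 1.0 - abs(beta) ** 2


# Hypothetical surface impedances, expressed as multiples of rho*c.
matched = absorption_coefficient(reflection_coefficient(RHO_C))
resistive = absorption_coefficient(reflection_coefficient(3.0 * RHO_C))
complex_case = absorption_coefficient(reflection_coefficient((2.0 + 1.0j) * RHO_C))
nearly_rigid = absorption_coefficient(reflection_coefficient(1e12 * RHO_C))
```

A surface whose impedance equals $\rho c$ absorbs all incident energy in this model, while a very large impedance approaches the rigid case with $\beta \to 1$ and $\alpha \to 0$.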
For example, the scattering coefficient is calculated by dividing the reflection into two components: the specular reflection and the scattered reflections [46]. Several measurement approaches and different definitions of the coefficients exist for diffusion and scattering [43, 47].

2.6.7 Definitions of the diffuse sound field

A sound field is perfectly diffuse if the directional energy density inside a volume is equal for each point and direction [22]. In practice this means that the direction and the phase of the sound field are uniformly distributed and the amplitude is equally distributed for each point. Thus, the sound field is spatially homogeneous and isotropic.

Another definition of a diffuse sound field is that the net energy flow over the volume is zero, i.e. the sound intensity integrated over the surface $S$ of the volume $V$ is zero,

$$\oint_S \mathbf{I}\,d\mathbf{S} = 0, \tag{2.45}$$

where $d\mathbf{S}$ is a surface element.

A way of constructing a diffuse sound field is by superposing an infinite number of plane waves with random phases in the volume. In practice, a finite number of plane waves, e.g., 1145, will produce a diffuse sound field [48].

The diffuseness of a sound field can be measured with the spatial correlation function [48–50], its variations [51–53], spatial coherence [54], its variations [53], or the spatial uniformity of the sound field [51].

2.6.8 Measurement of instantaneous diffusion

All of the above methods measure the diffuseness of a sound field over a large set of measurements. A more interesting method in the context of this thesis is one that can describe the diffusion of a part of the room impulse response. An example of this kind of method is the diffuseness analysis used in SIRR [1, 12, 55]. Other methods are presented in Publication VI and [56].

2.7 The room impulse response

When a sound wave propagates in an enclosure, it is affected by the phenomena listed above. The signal received at the sensor is therefore a modified version of the signal emitted by the source.
If the source signal is a single impulse, the signal arriving at a sensor is called the impulse response. In the context of this thesis, the impulse response can be presented as

$$h(t) = \sum_{k=1}^{K} h_k(t) + w(t), \tag{2.46}$$

where

$$h_k(t) = \int \alpha_k(\omega)e^{j\omega(t - t_k + \phi_k(\omega))}\,d\omega \tag{2.47}$$

is a single reflection, $\alpha_k(\omega)$ is the frequency dependent attenuation factor for each sound wave $k$, $t_k$ is the time delay related to the distance of the path of a reflection, and $w(t)$ is measurement noise that is assumed to be independent and normally distributed. The attenuation factor $\alpha_k(\omega)$ is dependent on the properties of the surface and on air absorption [11]. Quite often in real situations the phase term $\phi_k(\omega)$ is dependent on the frequency. Here the frequency dependency of the phase term $\phi_k(\omega)$ is acknowledged, but the analysis of the room impulse responses assumes that for early reflections the phase is independent of the frequency, i.e. $\phi_k(\omega) = 0, \forall k$.

The first arriving sound wave in Eq. (2.46) is referred to as the direct sound. The sound waves arriving after the direct sound are called the early reflections, up to a time instant called the mixing time $t_m$ [10]. The early reflections are considered to be discrete events, with only small deviations in the phase of the sound wave. After the mixing time, the impulse response is called late reverberation. The impulse response, especially the late reverberation, exhibits some statistical behavior [9–11]. The reflections cannot therefore be identified or localized from the late reverberation. Figure 2.7 illustrates the three parts of the impulse response.

2.7.1 Modal and echo density

The modal density, i.e. the number of modes (resonance frequencies) per unit frequency at a frequency $f$, is given as [11, p. 61]

$$\frac{dN_f}{df} = 4\pi V\frac{f^2}{c^3}, \tag{2.48}$$

where $V$ is the volume, $c$ is the speed of sound, and $N_f$ is the number of modes. The echo density, i.e. the number of reflections arriving per unit time at time $t$, is [11, p. 92]

$$\frac{dN_r}{dt} = 4\pi\frac{c^3t^2}{V}, \tag{2.49}$$

where $N_r$ is the number of reflections.
Both of these equations apply to rooms with arbitrary shape [11, p. 92]. When frequency increases, the modal density becomes large, and when time increases, the echo density becomes large.

2.7.2 Central limit theorem

In the discrete time domain, the samples $\{h_k^{(i)}\}$, $i \in \{1, \ldots, L\}$, of a reflection $k$ arriving within the time window $dt$ are considered random variables with mean $E\{h_k\} = \mu$, variance $\mathrm{var}\{h_k\} = \sigma^2$, and some unknown probability density function. According to the central limit theorem, as $K$ approaches infinity, the mean of the samples approaches a normal distribution [57, p. 357]:

$$\frac{1}{K}\sum_{k=1}^{K}\{h_k^{(i)}\} \xrightarrow{d} N(\mu_r, \sigma_r^2), \tag{2.50}$$

with some mean $\mu_r$ and variance $\sigma_r^2$. Thus, the sum of an infinite number of reflections can be considered normally distributed in the discrete time domain. Note that if $\{h_k^{(i)}\}_{i=1}^{L}$ is normally distributed, then the mean of the reflections is always normal. If $\{h_k^{(i)}\}_{i=1}^{L}$ is not normally distributed, then it takes $K \to \infty$ reflections to achieve normality, as stated by Cramér's theorem.

In practice, an infinite number of reflections is not required to achieve normality. The number of reflections at which their average is normally distributed depends on the reflection signals $h_k$. However, no matter what the reflection signal $h_k$ is, after a certain number of reflections the distribution of their average is inevitably normal.

Since the modal density and the echo density are differential measures, the frequency and time intervals in them, respectively, are infinitesimal. Then, in those infinitesimal intervals, the time domain and frequency domain pressure signals are normally distributed when the number of modes and reflections is high enough. This model was introduced by Schroeder [9] and later complemented by Polack [58]; these models are summarized in the following section.
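The modal density and echo density of Section 2.7.1, Eqs. (2.48) and (2.49), are simple enough to evaluate directly. The sketch below also inverts Eq. (2.49) to find the time at which a given echo density is reached; the room volume, the target density, and the function names are assumed values chosen for illustration.

```python
import math

C = 343.0  # assumed speed of sound [m/s]


def modal_density(f, volume, c=C):
    """Eq. (2.48): number of modes per hertz at frequency f."""
    return 4.0 * math.pi * volume * f ** 2 / c ** 3


def echo_density(t, volume, c=C):
    """Eq. (2.49): number of reflections per second at time t."""
    return 4.0 * math.pi * c ** 3 * t ** 2 / volume


def time_for_echo_density(density, volume, c=C):
    """Eq. (2.49) solved for t: time at which a given echo density is reached."""
    return math.sqrt(density * volume / (4.0 * math.pi * c ** 3))


# Hypothetical hall of 10000 cubic metres.
volume = 10000.0
modes_per_hz_at_1k = modal_density(1000.0, volume)
echoes_per_s_at_100ms = echo_density(0.1, volume)
t_for_1000 = time_for_echo_density(1000.0, volume)
```

Both densities grow quadratically, in frequency and in time respectively, so doubling the time quadruples the echo density, while a larger room lowers the echo density and raises the modal density.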
2.7.3 Statistical models of the room impulse response

For a given room impulse response considered in the frequency domain, if the distance from the source to the receiver is sufficient, and if enough room modes are excited simultaneously, then the real and imaginary parts can be considered independent Gaussian processes [9]. The amplitude of the frequency domain room impulse response $H(f)$ therefore follows the Rayleigh distribution, i.e. $\|H(f)\| \sim R(\sigma_f^2)$, where $\sigma_f^2$ is the variance in the frequency domain [9, 59]. This applies for frequencies above the Schroeder frequency [9],

$$f_{Schroeder} \approx 2000\sqrt{\frac{T_{60}}{V}}, \tag{2.51}$$

where $T_{60}$ is the reverberation time and $V$ is the volume of the room. It should be noted that the energy follows the exponential distribution, i.e., $\|H(f)\|^2 \sim E(\lambda_e)$, where $\lambda_e > 0$ is a parameter which can be calculated from $\sigma_f$.

[Figure 2.7. Impulse response from a concert hall: (a) time-domain, (b) frequency-time-domain. Early reflections appear before the mixing time and late reverberation.]

In the time domain, a statistical model of the room impulse response can be applied after the mixing time [10]. Discussion about the mixing time is presented in Section 2.8. The statistical model in the time domain is given as [10, 60]:

$$h(t) = b(t)f(t), \tag{2.52}$$

where $f(t)$ is a monotonically decaying function, and $b(t)$ is zero mean normally distributed noise, i.e. $b(t) \sim N(0, \sigma_t^2)$, where $\sigma_t$ is the fixed standard deviation. The decaying function is dependent on the reverberation time $T_{60}$ of the room, determined by the following relation [10]:

$$f(t) = e^{-\delta t}, \tag{2.53}$$

where $\delta = 3\ln(10)/T_{60}$ is the damping factor. The reader is reminded that the total distribution of $h(t)$ is not normal since it is multiplied with the decaying function $f(t)$.
However, the frequency domain transform $\mathcal{F}\{h(t)\}$, where $\mathcal{F}\{\cdot\}$ is the Fourier transform, follows the Rayleigh distribution with variance

$$\sigma_f^2 = \frac{\sigma_t^2 E\{f^2(t)\}}{2}, \tag{2.54}$$

where $E\{\cdot\}$ denotes the expected value.

In real rooms, the reverberation time and the decaying function are functions of frequency, i.e. $\delta(f) = 3\ln(10)/T_{60}(f)$. Figure 2.7 shows an example of the frequency dependent reverberation time in a concert hall, estimated as proposed in [10]. A generalization of Eq. (2.52) to the frequency dependent case (originally suggested by J.-D. Polack according to [10]) is given by the ensemble average of the Wigner-Ville distribution:

$$\langle W(t, f)\rangle = \|H(f)\|^2e^{-2\delta(f)t}, \tag{2.55}$$

where $\|H(f)\|^2$ is the power spectral density. That is, the average of the Wigner-Ville distribution over a set of time instants and frequencies has an exponentially decaying shape in the time domain multiplied by the power spectral density.

The Wigner-Ville distribution itself is defined as [10]:

$$W(t, f) = \int_{-\infty}^{\infty} h(t - \tau/2)h(t + \tau/2)e^{-j2\pi f\tau}\,d\tau. \tag{2.56}$$

The Wigner-Ville distribution has the following properties. The integration of Eq. (2.56) with respect to frequency produces the temporal energy density

$$h^2(t) = \int_{-\infty}^{\infty} W(t, f)\,df, \tag{2.57}$$

and integration with respect to time gives the spectral energy density

$$\|H(f)\|^2 = \int_{-\infty}^{\infty} W(t, f)\,dt. \tag{2.58}$$

Equation (2.56) allows the attenuation factor $\delta$ in Eq. (2.53) to be dependent on frequency.

2.8 Mixing time

Mixing time appears in various applications and research studies that use or study room impulse responses [10, 37, 61–73]. Generally, in impulse response analysis and synthesis, the mixing time is used as the time after which the impulse response can be approximated by an appropriate model. This is generally much more efficient than modeling all of the reflections in the impulse response.
Consequently, the use of statistical models can save considerably on computation time and/or required system memory, which are important aspects, for example, in auralization applications [37, 61, 62, 68, 69], particularly if real-time interaction and dynamic source and receiver positioning are required.

Traditionally, the mixing time is subjectively defined simply to be 80 ms [8, 74]. Furthermore, values from 50 to 200 ms have been suggested for the mixing time from the human hearing point of view [75–77]. Although these figures are reasonable as a subjective parameter, they might not correspond to the objective mixing time. That is, as the objective mixing time is dependent on the physical properties of a concert hall, it is not reasonable to assume that these physical properties do not change between concert halls. Therefore, there is a need to estimate the mixing time directly from a room impulse response.

2.8.1 Formal definitions

Echo density, i.e. Eq. (2.49), is related to the room volume through the billiard theory [10, 60]. A sufficiently large echo density should also indicate the mixing time of a room. Different values for the sufficiently large echo density have been proposed, varying from 1000 to 10000, according to [62].

Several authors define the mixing time as the time instant when 10 or more reflections overlap in a time window of 24 ms [10, 58, 78, 79]. This corresponds to approximately [10]

$$t_m \approx \gamma\sqrt{V}, \tag{2.59}$$

where $\gamma = 1 \times 10^{-3}$ [s/m$^{3/2}$] is a normalizing constant.

Another approach is to define the mixing time through energy. In [67], the mixing time is defined as the time when the energy of the impulse response has decreased by a certain amount from the overall energy level. Values from -20 dB to -15 dB are used [67].

2.8.2 Estimation methods

The estimation of mixing time is a thankless research area, because an absolute reference, i.e. ground truth, for the mixing time does not exist.
Yet, several research articles about the topic exist [59, 63–66, 70–73]. The approximation in Eq. (2.59) is based on theoretical developments and has not been verified by experimental results from real data. In addition, it is debatable whether the mixing time should be given as a transition time zone, rather than a strict transition time. Therefore, all that can be done is to compare the output of the methods in different situations, as done, for example, in [72].

There exist several methods that estimate the mixing time of a room impulse response based on statistical assumptions about the properties of the signal. In [63], the mixing time is estimated as the time when the kurtosis and the standard deviation ratio are close to those of a Gaussian distribution. The same approach is used for separating the late reverberation of impulse responses so that the spatial coherence and correlation functions of impulse responses can be examined [53, Fig. 5]. In addition, the echo density is estimated with the standard deviation ratio in [80]. However, the actual relation between the standard deviation ratio and the echo density is not shown.

The relation in Eq. (2.59) suggests that the room volume or echo density can be used to calculate the mixing time. Hence, if the echo density or the volume of the room is estimated from a single impulse response, as in [81], then the mixing time is also estimated.

In [64, 70], the mixing time is estimated from the phase of the impulse response, assuming that the phase of the impulse response is linear when the early reflections are dominant and non-linear when the late reverberation starts. From this non-linearity the mixing time can then be determined. A theoretical relation between the non-linearity of the phase of the impulse response and the mixing time has not been presented in [64, 70].
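As a rough illustration of the statistical approach mentioned above, the sketch below slides a window over the impulse response and reports the first window whose excess kurtosis is close to zero, the value expected for a Gaussian signal. This is only a simplified stand-in and not the method of [63]: it ignores the standard deviation ratio, and the window length, hop size, tolerance, and function names are assumptions chosen for the example.

```python
import random


def excess_kurtosis(x):
    """Sample excess kurtosis (0 for a Gaussian signal)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0 if m2 > 0 else float("inf")


def mixing_time_by_kurtosis(h, fs, win=1024, hop=256, tol=0.5):
    """Start time (s) of the first window with |excess kurtosis| < tol.

    win, hop and tol are illustrative assumptions, not published values.
    Returns None if no window qualifies.
    """
    for start in range(0, len(h) - win + 1, hop):
        if abs(excess_kurtosis(h[start:start + win])) < tol:
            return start / fs
    return None


# Toy response: sparse unit impulses followed by Gaussian noise.
rng = random.Random(1)
h = [0.0] * 5120
for n in range(0, 5120, 400):
    h[n] = 1.0
h += [rng.gauss(0.0, 1.0) for _ in range(9600)]
print(mixing_time_by_kurtosis(h, fs=48000))
```

On the toy signal the estimate falls near the point where the sparse part ends (5120 samples, about 0.107 s at 48 kHz), within one window length.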
It is suggested in [65, 66] that matching pursuit can be used for finding the reflections within a room impulse response to estimate the mixing time. Matching pursuit, in this case, is essentially the same as calculating the cross correlation between a prototype of the direct sound signal and the rest of the impulse response. The time instant when the number of reflections no longer follows a predefined cubic model of the echo density, given in Eq. (2.49), is then the mixing time.

According to [59], mixing time can be identified when the correlation between the amplitude of certain frequencies of the whole impulse response and late sound is sufficiently low. This is proposed as the definition of mixing time, and the relevance of this definition and its relation to other acoustic parameters are discussed. It is found that mixing time and reverberation time have the highest correlation out of the studied acoustic parameters [59].

The temporal overlap of reflections is used to define the mixing time in [71]. The basic assumption is that the original emitted sound wave from the sound source widens after each reflection. The width of these reflections is compared to the time differences between the reflections to deduce the mixing time.

In [73] the room's free path temporal distribution is considered to be an indicator of the mixing time. The free path temporal distribution is obtained by ray tracing and it describes the energy of the reflections at each time instant. In ergodic rooms, the energetic average of the path lengths converges rapidly after the mixing and the free path value becomes independent of time.

3. Related research

This chapter describes previous work and related research on the localization of reflections. Measurement of room impulse responses is presented. The most relevant localization methods are discussed, and different approaches and setups for localization of reflections used in the previous research are reviewed.
Possible application areas for reflection localization are listed at the end of the chapter.

3.1 Room impulse response measurement

The standard way of studying room acoustics is to measure a room impulse response [8]. The standard states that an omni-directional source and, in most cases, an omni-directional microphone are to be used in the measurement. Recently, advanced microphone array techniques have been applied for room impulse response measurements [1, 13, 82–91]. The advantage over the traditional omni-directional microphone measurement is that spatial analysis of the impulse response can be applied. In addition, auralization of the space is made possible [1, 2, 83].

3.2 Localization methods

Source localization methods are based on time of arrival (TOA) estimation, time difference of arrival (TDOA) estimation, or directly on the signals. TDOA estimation is a far more popular topic than TOA estimation, because the TOA cannot be directly measured with unknown source signals.

More than ten methods have been developed for the TDOA problem over the last decades [25, 92–98]. One of the most popular approaches is the generalized correlation [25]. Other methods include the time domain difference function [96], and the use of some additional information such as the fundamental frequency [93]. The theoretical performance of TDOA estimation is well known for the case when additive noise is present [99, 100]. Lower bounds, such as the Cramér-Rao lower bound, describe the variance of the TDOA estimation in the case of additive noise [99]. The accuracy of TDOA based localization is limited by the sampling frequency. In [101] a parabolic fit and in [102] an exponential fit are proposed for interpolating the TDOA estimate.

The most straightforward algorithm for TOA estimation is a simple peak-picking algorithm [65, 66, 103]. In addition, it has been proposed that statistical features, such as kurtosis, can be used to detect peaks [104].
Other methods are based on correlation or some other similarity measure and they usually require a priori knowledge of the signal [66, 103]. In principle, the onset detection methods used in music signal analysis could be used here [105]. To the knowledge of the present author, the theoretical performance of TOA estimation under additive noise has not been studied extensively. TOA accuracy can be improved by basic Fourier interpolation or by assuming a shape for the estimation function, similarly to TDOA estimation.

When two- or three-dimensional localization is desired, the TDOAs, TOAs, or the signals are combined spatially using an acoustic source localization function. Popular acoustic source localization functions are the maximum likelihood estimation (MLE) function [106], steered response power (SRP) functions [106, 107], and pseudo-likelihood functions [108].

MLE methods have been formulated for TOA [109], TDOA [106, 109–111], and signal models [112–117]. Advancements in MLE for a signal model come from an updated noise model [117] or an updated signal model [112]. The MLE for TDOA, with certain assumptions, can be presented as a least squares (LS) problem. The TDOA LS problem has gained a lot of attention in research [109, 118–136], mostly because the LS solution can be given in closed form by first making some assumption on the error or on the signal. LS problems and their solutions are so widely addressed in research that several textbooks deal with them (e.g. [137, 138]). The MLE for TOA can also be presented as an LS problem. Closed form solutions for the TOA LS problem have also been applied [139–142].

The SRP-Phase Transform algorithm has been studied extensively [106, 107, 116, 143–146] and it has been followed by various modifications and optimizations [116, 144, 145, 147–150]. The SRP method is shown to be equivalent to basic beamforming [151, 152].
The performance of the localization can be studied with the Cramér-Rao lower bound (CRLB) [17], dilution of precision, which is a special case of CRLB [153, Ch. 3.3], or similar variance analysis [128, 154].

The direction of arrival of the sound wave can be estimated using the sound intensity vectors [1, 12, 29, 155–159]. These vectors can be measured using a special microphone, such as a first-order B-format microphone. The location of the source can be estimated as the average of the intensity vectors over time or frequency.

3.3 Localization of reflections and room geometry estimation

A topic closely related to the localization of reflections is the localization of the reflective surfaces, or the blind estimation of room geometry. Namely, the estimation of reflective surfaces is equivalent to the localization of first order reflections.

The localization of reflections and estimation of room geometry from room impulse responses have been studied in several research articles [1, 3–5, 12, 13, 85–87, 160–165]. The approaches are based on TOA, TDOA, and direction of arrival (DOA) estimation. TOA estimation requires that the loudspeakers and microphones are time-synchronized, whereas the TDOA and DOA based methods do not require synchronization.

In [1, 12] a technique called spatial impulse response rendering (SIRR) is developed. The analysis part of SIRR inspects the direction of arrival of the reflection and the diffuseness of the sound field. Since the analysis is done in short time windows, the location of the reflections can be deduced using a priori knowledge of the speed of sound, the time of arrival, and the estimated DOA, which is calculated from sound intensity vectors.

In [13, 85, 160], a spherical microphone array with an integrated video camera is used for visually inspecting the reflections.
The impulse response is divided into short time windows, a spherical beamformer is applied to each window, and the energy of the beamformer output is overlaid on top of a panorama video image taken from the center of the microphones. The location of the reflection is then inspected visually for each frame. The maximum of the beamformer output corresponds to the DOA of the reflection and the distance to the reflection is calculated from the time stamp of the current frame.

In [161] the reflections are localized using TDOA estimation with a microphone array that consists of 8 microphones. The method is demonstrated in an auditorium.

In [5] the room geometry is estimated by rotating a B-format microphone around a loudspeaker that is directed towards the microphone. The estimation is based on the TOA and the DOA of the first arriving reflection in each direction. For each direction a single TOA and DOA estimate is obtained. In the post-processing phase the TOA and DOA measurements are grouped using hierarchical clustering to avoid estimating the same plane multiple times.

In [3] the reflecting plane parameters are estimated by rotating an omnidirectional microphone around a loudspeaker which is directed towards the microphone. The impulse responses are transformed into an acoustic localization map in which the local maxima correspond to the plane locations. As the source position is known, the plane parameters can be calculated.

In [4] the reflecting plane parameters are estimated with a common tangent algorithm in two-dimensional space. The problem is first formulated as a quadratic equation that describes the relation between the TOAs, the plane parameters, and the source location. For a single reflection the solution of this quadratic equation provides the parameters of a single plane. The solution is called the common tangent algorithm (COTA).
For multiple planes, the estimated TOAs are first grouped using the generalized Hough transform and then the plane for each group is solved using the COTA. The generalized Hough transform detects the TOAs that describe the same plane. The approaches in [4] are extended to three dimensions in [164]. Moreover, a closed form solution for the plane parameter estimation from the quadratic equation is presented in [165].

COTA is applied in [162] for the estimation of multiple plane parameters in two-dimensional space. Whereas in [4] the grouping was done with the generalized Hough transform, in [162] the grouping is done with an iterative search. The iteration proceeds as follows. First the parameters of the closest plane are estimated. Then the TOAs associated with the first plane are removed and the search is performed again. This iteration is performed as many times as there are a priori known planes.

In [163] a closed form solution to the above mentioned quadratic equation that describes the relation between the TOA and plane parameters is presented for the 2-D case. In the solution, two planes are selected where the cost function is inhomogeneous. Then, the gradients of the cost function on these planes are solved analytically. The minimum of the obtained solutions corresponds to the plane parameters. Moreover, the generalized Hough transform is applied to improve the estimation of the parameters of a single plane.

The room geometry has also been estimated from continuous signals [15, 166–169]. The advantage of these approaches is that they are blind, i.e. there is no need for a controlled source signal.

Inverse mapping of the multi-path propagation problem for first order reflections in the TDOA framework is presented in [15]. The mapping is used together with acoustic source localization to estimate reflective surfaces from speech signals in meeting rooms.

In [167] a circular microphone array is used around a loudspeaker to estimate the room geometry.
A constrained room model and an L1-regularized least-squares method are used to obtain the locations of walls. This method can be considered semi-blind since it requires knowledge of the number of walls.

Acoustic imaging for finding room geometry and other acoustic properties of an enclosure is applied in [86, 87]. Acoustic imaging is based on the inverse extrapolation of the Kirchhoff-Helmholtz and Rayleigh integrals. An acoustic image can be created by measuring multiple impulse responses, for example, on a line grid with a B-format microphone [86, 87].

In [166], the location of the reflections is found by beamforming a speech signal. The direction of the source is found from the maximum direction and the direction of the reflections corresponds to smaller local maxima in the beamformer output. The TDOA between the reflection and the direct sound can be estimated from the beamformer output. From the directions and the TDOA the location of the reflector can then be deduced.

The location of a planar reflector is estimated in two dimensions from direction of arrival estimates in [168]. An unconstrained least squares solution is developed for quadratic constraints that represent the reflection path parameters.

In [169] the location of a planar reflector is estimated in two dimensions using a white noise source and spherical beamforming. A very similar approach is used in [170], where reflectors are localized in three dimensions using music signals and spherical beamforming. The difference is that a spherical microphone array is used in [170] and a circular one in [169]. The basic idea in [169, 170] is identical to the one presented in [166]; the difference is in the beamforming techniques and in the TDOA estimation method.

3.4 Automatic calibration

In principle, any general localization method can be used to calibrate the positions of the loudspeakers and the microphones in the measurement system.
Previously, at least MLE for TOA or TDOA [109–111, 171, 172], LS for TOA [142], Multi-PHAT for TDOA [15], and beamforming [173] have been used to calibrate some parts of the measurement system. The requirements for the number of microphones and/or loudspeakers are given for different calibration cases in [109].

3.5 Visualization of reflections

The visualization of the reflections is an important step in studying them. A good visualization enables intuitive and quick inspection of the reflections and their properties. The reflections can be illustrated by overlaying them on top of an image as in [13, 85, 160, 174]. In [1] the directions of the reflections are plotted on top of the spectrogram of the impulse response.

3.6 Application areas

Concert hall acoustics can be studied effectively by subjective listening tests [16, 175]. The methodology used in [16] and [175] allows the comparison between objectively measured physical features of the concert halls and subjectively elicited attributes. It is not yet fully understood which physical properties of the concert hall acoustics explain the subjective perception of the acoustics [175].

The main motivation for the studies in this thesis is the hypothesis that some properties of the acoustics of concert halls and other musical performance spaces can be explained by the features of the early reflections. As an example, the importance of early reflections that preserve the temporal envelope has recently been demonstrated in concert halls [176]. These reflections are reflected from flat surfaces.

If some feature of the reflection is to be extracted, the location of the reflection is needed. Although the location can be calculated from the computer aided design schemes obtained from architectural design, this might be cumbersome if the geometry is complex. In addition, the architectural design schemes of the enclosure are not always available.
Since the spatial room impulse responses are measured in the acoustic studies anyway, the localization of the reflections from them is a natural choice.

In addition to the main motivation of this thesis, the locations of the reflections are useful to know, for example, in acoustic source localization that utilizes reflections [15, 177–184]. Overall, these methods exhibit better performance than the traditional acoustic source localization methods when strong enough specular reflections are present.

4. Room impulse response measurement

In a room environment, sound $s(t)$, emitted from the sound source at position $\mathbf{s}$, and received at receiver $n$ at position $\mathbf{r}_n$, is affected by the impulse response $h(t; \mathbf{r}_n, \mathbf{s})$:

$$p(t; \mathbf{r}_n, \mathbf{s}) = h(t; \mathbf{r}_n, \mathbf{s}) * s(t) + w(t), \tag{4.1}$$

where $*$ denotes convolution and $w(t)$ is the measurement noise, independent and identically distributed for each receiver. For simplicity, the impulse response measured at receiver $n$ is denoted by $h_n(t)$ in this section.

4.1 Standard measurement technique

In the majority of cases, the impulse response is measured by playing back a signal $s(t)$ from a loudspeaker and recording it with a microphone. The most popular signals for the source excitation are the sine-sweep [185, 186], the maximum length sequence [187], and the optimized time-stretched pulse [188].

An estimate of the impulse response is then obtained by deconvolution of the received signal and the source signal

$$\hat{h}(t; \mathbf{r}_n, \mathbf{s}) = p(t; \mathbf{r}_n, \mathbf{s}) *^{-1} s(t) + \tilde{w}(t), \tag{4.2}$$

where $*^{-1}$ is the deconvolution operator and $\tilde{w}(t)$ is the (i.i.d.) noise term. As is well known, deconvolution corresponds to division in the frequency domain.

The signal-to-noise ratio in the impulse response measurement is defined by the ratio

$$\mathrm{SNR\ [dB]} = 10 \log_{10}\!\left( \frac{\hat{h}^2(t_{\mathrm{dir}})}{E\{\tilde{w}^2(t)\}} \right), \tag{4.3}$$

where $t_{\mathrm{dir}}$ is the time of arrival of the direct sound.
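The deconvolution of Eq. (4.2) and the SNR of Eq. (4.3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the measurement code used for the thesis: the function names and the small regularization constant `eps` are hypothetical, the excitation in the usage example is white noise rather than a sine sweep, and the noise power is estimated from the samples preceding the direct sound.

```python
import numpy as np


def deconvolve(p, s, eps=1e-12):
    """Estimate h from p = h * s by division in the frequency domain.

    eps is an assumed regularization constant that avoids division by
    (near) zero; it is not part of Eq. (4.2).
    """
    n = len(p)
    P = np.fft.rfft(p, n)
    S = np.fft.rfft(s, n)                      # s is zero-padded to len(p)
    H = P * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)


def snr_db(h_hat, n_dir, noise_slice):
    """Eq. (4.3): direct-sound power over mean noise power, in dB."""
    noise_power = np.mean(h_hat[noise_slice] ** 2)
    return 10.0 * np.log10(h_hat[n_dir] ** 2 / noise_power)


# Toy example: direct sound at sample 100, one reflection at sample 300.
rng = np.random.default_rng(0)
h = np.zeros(512)
h[100], h[300] = 1.0, 0.5
s = rng.standard_normal(1024)
p = np.convolve(h, s) + 1e-3 * rng.standard_normal(1535)
h_hat = deconvolve(p, s)[:512]
print(int(np.argmax(np.abs(h_hat))), snr_db(h_hat, 100, slice(0, 90)))
```

Because `p` here has the full linear-convolution length, the circular division recovers the response without wrap-around; a shorter recording would need explicit zero-padding.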
The noise variance (the energy) can be approximated from the beginning of the impulse response, before the direct sound, or from the end, where there is no signal.

The ISO standard for room acoustic parameters states that an omnidirectional source is to be used [8]. By definition, the omnidirectional source emits an equal amount of energy in all directions. In reality, according to the standard [8], small variations of up to 6 dB are allowed in different frequency bands.

4.1.1 Sine-sweep technique

The sine sweep source signal is given as [186]:

$$s(t) = \sin\!\left[ \frac{\omega_1 T}{\log\{\omega_2/\omega_1\}} \left( \exp\!\left( \frac{t}{T} \log\{\omega_2/\omega_1\} \right) - 1 \right) \right], \tag{4.4}$$

where $\omega_1$ and $\omega_2$ are the lower and upper frequencies of the sweep, and $T$ is the total length of the sweep. The advantage of the sine-sweep signal over the maximum length sequence is that the harmonic distortion of the loudspeaker can be removed from the impulse response, as pointed out by Farina, e.g. in [186]. The sine-sweep of Eq. (4.4) is sometimes referred to as the logarithmic sine sweep, since the frequency changes linearly on a logarithmic scale. The SNR achieved with the sine-sweep technique is approximately from 60 to 90 dB in the measurements taken for this thesis.

4.1.2 On the use of natural sound sources

Sometimes balloon bursts and gunshots [66, 70] are used as the source signal. In this case, the exact source signal is unknown, and therefore the emitted sound is usually assumed to resemble an impulse closely enough. However, at least balloon bursts have been shown not to fulfill the ISO standard on the directivity of an omnidirectional source [189]. In addition, the balloon burst has poor repeatability if the balloon type, the pressure, or the bursting technique changes [189].

4.2 The sparse impulse response technique

The ISO-standard measurement is well suited for the estimation of the traditional room acoustic parameters. However, here the interest is in the early reflections and their properties.
With the omni-directional source, if the length of the reflection path, that is the path from the source via the reflecting surfaces to the receiver, is equal for two or more reflections, then they arrive at the receiver at the same time and cannot be localized properly. In practical situations, a short time window is used in the analysis of the reflections. The reflection path lengths then need only be approximately equal for the reflections to overlap in the analysis window and interfere with the directional analysis. In addition, since the reflections in real situations are often not discrete events, they tend to spread over time and overlap with each other even more. Moreover, due to the physical limitations of the loudspeakers in dimensions and in frequency band, even the emitted sound field is not a perfect Dirac impulse, especially if the loudspeaker consists of several elements as shown in Fig. 2.3.

Recently a novel measurement technique, the sparse impulse response technique, for the investigation of early reflections was developed in Publication I. The technique takes advantage of directional loudspeakers. A directional loudspeaker emits more sound in some directions than in others. When a room impulse response is measured with such a directional loudspeaker, some reflections are excited with more energy than others. This way, some reflections have a better signal-to-noise or signal-to-interference ratio than others and should be more separable in the impulse response.

4.2.1 Measurement

The impulse response measured with a directional loudspeaker that is directed to an angle $\{\theta_s, \phi_s\}$ at time instant $t$ is denoted by $h(t, \theta_s, \phi_s)$ and is named a sparse impulse response in Publication I. Here, the loudspeaker is only rotated around the z-axis; therefore $\phi_s$ is no longer used in the notation.
In theory, if the loudspeaker has an infinitely narrow directivity, all the reflections that do not have exactly the same reflection path length should be separable in time and space. The idea is analogous to the ray tracing method [190], used, for example, in room acoustics simulations, where the rays are first sent from the source position and then observed at the receiver position. However, since infinitely narrow directivity is not achievable in practice with loudspeakers, the idea is more analogous to beam tracing [191] than to ray tracing.

In the case of unequal reflection paths and ideal reflections, the rotation angle that produces the largest absolute pressure at some time instant $t$ gives the direction to which the loudspeaker was directed to produce the sound pressure observed at the receiver. Thus, the direction of the loudspeaker can be estimated as the maximum argument of the absolute pressure values of the sparse responses, i.e.:

$$\hat{\theta}_s(t) = \arg\max_{\theta_s} \{ |h(t, \theta_s)| \}. \tag{4.5}$$

When $\hat{\theta}_s(t)$ is used as an argument in the sparse impulse response as $h(t, \hat{\theta}_s(t))$, an impulse response that includes the reflections from the strongest direction of the loudspeaker is formed. This response is named the compound sparse response. The separability in space, together with knowledge of the geometry of the enclosure, allows the tracing of reflections at each time instant from the source to the receiver.

In theory, by using only one microphone and an infinitely directive loudspeaker that produces Dirac impulses, and having only ideal, perfectly specular reflections, Eq. (4.5) will produce all the reflections that do not have equal path lengths. However, since in reality the impulses are not perfect Dirac impulses, not all the reflections are separable in real situations.

4.2.2 Comparison to other techniques and discussion

Other authors have also rotated a directional loudspeaker around its axis to achieve more spatial separation.
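For reference, the selection rule of Eq. (4.5) that forms the compound sparse response can be sketched as below. The data layout (one list of samples per loudspeaker angle) and the function name are assumptions made for this illustration and are not taken from Publication I.

```python
def compound_sparse_response(sparse, angles):
    """Apply Eq. (4.5) sample by sample.

    sparse[k][n] is the response measured with the loudspeaker directed
    to angles[k], at sample n (assumed layout). Returns the estimated
    angle per sample and the compound sparse response h(t, theta_hat(t)).
    """
    n_samples = len(sparse[0])
    theta_hat, compound = [], []
    for n in range(n_samples):
        # Index of the angle with the largest absolute pressure at sample n.
        k_best = max(range(len(angles)), key=lambda k: abs(sparse[k][n]))
        theta_hat.append(angles[k_best])
        compound.append(sparse[k_best][n])
    return theta_hat, compound


angles = [0, 10, 20]                       # degrees, illustrative values
sparse = [[0.1, -0.9, 0.0],
          [0.5, 0.2, 0.0],
          [-0.2, 0.3, 0.7]]
print(compound_sparse_response(sparse, angles))
# -> ([10, 0, 20], [0.5, -0.9, 0.7])
```

The compound response keeps the signed pressure value, so only the choice of angle depends on the absolute value.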
Günel was the first to present this idea in room acoustic measurements [5]. In [5] a loudspeaker is directed to different angles around its z-axis and a B-format microphone is at a fixed distance and direction with respect to the loudspeaker. Antonacci et al. also rotate a directional loudspeaker around its z-axis [3, 4]. The setup is otherwise the same as in Günel's method but the B-format microphone is replaced with an omnidirectional microphone.

The difference between the proposed method and the other methods is that the sparse impulse response and the compound sparse impulse response can be measured with any loudspeaker and microphone setup, whereas the other methods are designed for a setup where a microphone and a loudspeaker are interconnected. Moreover, Günel's method only considers one impulse response in one direction at a time, whereas the presented method considers all the directions simultaneously in the compound sparse phase presented in Eq. (4.5). Thus, the presented method is designed to replace the traditional single source impulse response measurement, whereas the other presented methods with loudspeaker rotation are specially designed for a certain measurement task, e.g. room geometry estimation as in [5].

The spatio-temporal separability of the reflections can be achieved by using directional microphones or directional loudspeakers. Here, the advantages or disadvantages of the directional loudspeakers over the directional microphones are not studied. A comparison is made only between the traditional omni-directional and the directional loudspeaker measurements.

4.3 Experiments

The goal is to compare the proposed technique, using two directional loudspeakers, to the standard measurement technique in the same conditions. The loudspeakers are a Genelec 1029A, a Panphonics panel loudspeaker, and a standard omni-directional loudspeaker. The directivities of the sources are discussed and depicted in Section 2.4.
The impulse responses are measured using the sine-sweep technique at a sampling frequency of 48 kHz, and the frequency band is from 40 Hz to 24 kHz.

4.3.1 Setup

Experiments are conducted in two auditoria illustrated in Fig. 4.1. Auditorium 1 has a volume of 250 m³. The auditorium was stripped of all furniture and has a shoebox shape. The acoustic center of the source and the location of the microphone array center are shown in Fig. 4.1(a). In addition, the height for both the source and the array was set to 1.4 m. The G.R.A.S. microphone array with 6 microphones, shown in Fig. 2.1(b), is used in both receiver locations.

In Auditorium 2, shown in Fig. 4.1(b), the audience area has an inclination of about 10 degrees, as the height of the auditorium decreases from about 8 m to 5 m, leading to a volume of 1800 m³. One source position and two array positions were used in the experiments of Auditorium 2. The height of the source and the array in this auditorium was about 1.2 m from the floor level. Each of the receiver locations in Auditorium 2 has the TKK-3D 12-microphone array illustrated in Fig. 2.1(a).

[Figure 4.1: (a) Auditorium 1, floor plan with axes X [m] and Y [m], source positions S1 and S2, array positions R1 and R2; (b) Auditorium 2, floor plan of 14.5 m by 25 m with the source position and the array positions.]

Figure 4.1. Array (R) and source (S) positions and the floorplans of the auditoriums.

4.3.2 Results

Figure 4.2 shows examples of the sparse impulse responses measured at every 10 degrees in azimuth angle with the Genelec 1029A and the Panphonics loudspeaker from Auditorium 2. The corresponding compound impulse responses are shown below the sparse responses in Fig. 4.2. Visual inspection shows that, when compared to a response measured with an omnidirectional source in the same position, shown in the bottom plot of Fig. 4.2, the proposed measurement technique provides higher peaks that can be easily recognized with both tested loudspeakers.
Although the sparse response for the Panphonics loudspeaker is shown from 0 to 350 degrees in Fig. 4.2, only the angles from 0 to 170 degrees are used in the analysis with the Panphonics loudspeaker due to the dipole directivity pattern.

In addition to the visual inspection, the performance of the impulse response measurement can be verified by counting the number of recognizable reflections within the impulse response. A good impulse response for the reflection tracing task is one that has more identifiable reflections. Here, the number of recognizable reflections is calculated using the local energy ratio [81]. The identification of a reflection is based on the relation between the absolute sound pressure in a small analysis window and the current absolute sound pressure. The local energy is calculated for the directional sources from the compound sparse responses as:

$$E_{\mathrm{loc}}(t) = \frac{1}{T_{\mathrm{loc}}} \int_{\tau = t - T_{\mathrm{loc}}/2}^{\tau = t + T_{\mathrm{loc}}/2} w_H(t)\, |h(\tau, \hat{\theta}_s(t))|\, d\tau, \tag{4.6}$$

where $w_H(t)$ is a Hanning window function of length $T_{\mathrm{loc}} = 128$ samples (2.67 ms). Note that, unlike in [81], here a Hanning window function is used. The decision whether the sample is a reflection or not is given by [81]:

$$h_{\mathrm{refl}}(t) = \begin{cases} 1, & \text{if } |h(t, \hat{\theta}_s(t))| > \varepsilon E_{\mathrm{loc}}(t) \\ 0, & \text{otherwise,} \end{cases} \tag{4.7}$$

where $\varepsilon$ is the threshold value for the detection. For the omnidirectional source, the detection procedure is the same with the exception that the standard impulse response is used instead of the compound sparse response.

The number of identified reflections, denoted here by $K$, with respect to the threshold $\varepsilon$ is shown in Fig. 4.3. The results are averaged over all the measured impulse responses for each auditorium. That is, for Auditorium 1 and 2, the results are averaged over 24 measurements for each loudspeaker type. In Auditorium 1 the 24 measurements consist of two source positions and the six microphones of the G.R.A.S. microphone array in two receiver positions. In Auditorium 2 the 24 measurements consist of a single source position and the 12 microphones of the TKK-3D array in two different receiver locations, in accordance with the setup of Section 4.3.1.

The results indicate that the proposed measurement technique provides more recognizable reflections than the standard measurement technique. With an arbitrarily selected threshold of $\varepsilon = 4$, the omnidirectional source, the Genelec 1029A, and the Panphonics panel loudspeaker give 2, 61, and 132 reflections for Auditorium 1, and 3, 101, and 169 for Auditorium 2, respectively. Thus, the more directional the loudspeaker is, the more individual reflections can be identified. The number of identified reflections depends strongly on the threshold. However, the ranking of the loudspeakers by the number of detected reflections stays the same, no matter what threshold value is selected. In addition, as expected, the larger space (Auditorium 2) has more identifiable reflections. As the distance between the individual reflections is longer in a larger space, the reflections become more separable.

[Figure 4.2: panels (a) Genelec 1029A and (b) Panphonics. Each panel shows the sparse response $h(t, \theta_s)$ over azimuth $\theta_s$ from 0 to 350 degrees, the compound response $\max\{|h(t, \theta_s)|\}$, and the omnidirectional response $|h(t)|$, plotted against time $t$ [s] from 0 to 0.2 s.]

Figure 4.2. Sparse impulse responses with (a) Genelec 1029A and (b) the Panphonics loudspeaker on a wide band from 40 Hz to 24 kHz in Auditorium 2. The Panphonics loudspeaker provides sharper peaks to the sparse impulse response due to its higher directionality. The maximum is normalized to 0 dB for the compound sparse responses and the response measured with the standard method.
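The detection rule of Eqs. (4.6) and (4.7) can be sketched in discrete time as follows. This is an illustrative reading rather than the implementation used for the results above: the Hann window is applied across the analysis window centered on the current sample, samples closer than half a window to either end are left undetected, and the function names are hypothetical.

```python
import math


def hann(n):
    """Symmetric Hann (Hanning) window of n samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def detect_reflections(h, eps=4.0, t_loc=128):
    """Flag samples where |h| exceeds eps times the local level.

    h is the compound sparse response (or the standard response for an
    omnidirectional source). The local level is the Hann-weighted mean
    of |h| over t_loc samples, an assumed discretization of Eq. (4.6).
    """
    w = hann(t_loc)
    half = t_loc // 2
    flags = [0] * len(h)
    for n in range(half, len(h) - half):
        segment = h[n - half:n + half]
        e_loc = sum(wi * abs(v) for wi, v in zip(w, segment)) / t_loc
        flags[n] = 1 if abs(h[n]) > eps * e_loc else 0
    return flags


h = [0.0] * 1000
h[300], h[600] = 1.0, -0.5
flags = detect_reflections(h)
print(sum(flags))   # number of identified reflections K -> 2
```

Summing the flags gives the count $K$ for one response; averaging over responses and sweeping `eps` would give curves of the kind reported in Fig. 4.3.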
[Figure 4.3: number of identified reflections, log10{K}, versus threshold ε for the omnidirectional source, the Genelec 1029A, and the Panphonics loudspeaker; panels (a) Auditorium 1 and (b) Auditorium 2.]

Figure 4.3. Number of identified reflections versus the local energy ratio threshold. The selected threshold (ε = 4) is depicted with a thick black line. The proposed measurement technique with the Genelec 1029A and the Panphonics loudspeaker provides more spatial separability than the standard measurement technique with an omnidirectional source, since more reflections are found.

5. Localization Methods

This chapter presents methods from earlier research that are applied in this thesis for the localization of reflections. In addition, some novel ad-hoc localization functions are proposed. It should be noted that the analysis does not differentiate between a reflection and other acoustic phenomena: all the sound waves arriving at the receivers are considered reflections and are treated in the same manner.

5.1 Signal Model

The spherical wave propagation model (see Chapter 2) is assumed. The formulations are the same for the plane wave propagation model, except that the TOA and TDOA terms are replaced by those given in Eqs. (2.38) and (2.40), respectively. The only exception is the sound intensity vector based localization, which is only capable of direction of arrival estimation and always assumes the plane wave propagation model. The assumed signal model is the following:

$$\begin{aligned} h_1(t) &= a_1 s(t - t_1) + w_1(t) \\ h_2(t) &= a_2 s(t - t_2) + w_2(t) \\ &\;\;\vdots \\ h_N(t) &= a_N s(t - t_N) + w_N(t), \end{aligned} \tag{5.1}$$

where the noises $w_i(t)$, $i = 1, \dots, N$, are normally distributed and uncorrelated with each other and with the loudspeaker impulse response $s(t)$. Then, in the frequency domain, the signal model is given as

$$\begin{aligned} H_1(\omega) &= A_1(\omega) S(\omega) e^{-j\omega t_1} + W_1(\omega) \\ H_2(\omega) &= A_2(\omega) S(\omega) e^{-j\omega t_2} + W_2(\omega) \\ &\;\;\vdots \\ H_N(\omega) &= A_N(\omega) S(\omega) e^{-j\omega t_N} + W_N(\omega), \end{aligned} \tag{5.2}$$

where the signal, noise, and received signal have spectral densities $G_{s,s}(\omega) = E[S(\omega)S^*(\omega)]$, $G_{w_1,w_1}(\omega) = E[W_1(\omega)W_1^*(\omega)]$, and $G_{h_1,h_1}(\omega) = E[H_1(\omega)H_1^*(\omega)]$, respectively. The amplitudes $A_n(\omega)$, $n = 1, \dots, N$, depend on the distance from the source to the microphones, the directivity of the source and of the microphones, the properties of the reflective surfaces, and the air absorption. Here the amplitudes are assumed to be equal to unity, i.e., $A_n(\omega) = 1, \forall n, \omega$. This model is assumed for simplicity in the cases studied in this thesis, since omnidirectional microphones are used, the aperture size of the microphone array is small, and the loudspeaker is in the far-field. Moreover, it is assumed that the reflections can be windowed from the spatial impulse responses.

5.2 Time difference of arrival estimation

In TDOA estimation, the task is to estimate the time delay $\tau_{i,j} = t_i - t_j$ between two received signals $h_i(t)$ and $h_j(t)$. The maximum argument of the estimation function $R_{h_i,h_j}(\tau)$ is the TDOA estimate, i.e.,

$$\hat{\tau}_{i,j} = \arg\max_{\tau} \{R_{h_i,h_j}(\tau)\}. \tag{5.3}$$

Next, two approaches used in previous research for TDOA estimation are formulated.

5.2.1 Generalized correlation method

The most widely used TDOA estimation approach is the generalized correlation method [25]. The generalized cross correlation (GCC) function between two received impulse responses, written here for $h_1$ and $h_2$, is calculated as [25]:

$$R^{\mathrm{GCC}}_{h_1,h_2}(\tau) = \mathcal{F}^{-1}\{W(\omega)\hat{G}_{h_1,h_2}(\omega)\}, \tag{5.4}$$

where $W(\omega)$ and $\mathcal{F}^{-1}$ are the weighting function and the inverse Fourier transform, respectively.

Maximum likelihood estimation

The well-known maximum likelihood weighting is given as [25]

$$W^{\mathrm{MLE}}_{h_1,h_2}(\omega) = \frac{1}{|G_{h_1,h_2}(\omega)|} \frac{C_{h_1,h_2}(\omega)}{[1 - C_{h_1,h_2}(\omega)]}, \tag{5.5}$$

where

$$C_{h_1,h_2}(\omega) = \frac{|G_{h_1,h_2}(\omega)|^2}{G_{h_1 h_1}(\omega) G_{h_2 h_2}(\omega)} \tag{5.6}$$

is the magnitude squared coherence function. For the derivation of the MLE weighting function see [25].
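As an illustration of Eqs. (5.3) and (5.4), the following sketch estimates a TDOA with the generalized cross correlation using either a unit weighting (direct cross correlation) or the phase transform, both of which are defined in Eqs. (5.14) and (5.15). It is a minimal sketch under stated assumptions: the function name `gcc_tdoa`, the zero-padding length, and the small constant guarding the division are illustrative choices, and the cross spectrum is estimated from a single windowed frame.

```python
import numpy as np

def gcc_tdoa(h_i, h_j, fs, weighting="phat"):
    """Return the TDOA estimate tau_ij = t_i - t_j in seconds."""
    n = len(h_i) + len(h_j)            # zero-pad to avoid circular wrap-around
    G = np.fft.rfft(h_i, n) * np.conj(np.fft.rfft(h_j, n))  # cross spectrum
    if weighting == "phat":
        G = G / np.maximum(np.abs(G), 1e-12)   # keep the phase only
    elif weighting != "cc":
        raise ValueError("weighting must be 'cc' or 'phat'")
    r = np.fft.fftshift(np.fft.irfft(G, n))    # zero lag moved to index n // 2
    lags = np.arange(n) - n // 2
    return lags[np.argmax(r)] / fs             # maximum argument, Eq. (5.3)

# Two windowed responses containing the same pulse, 5 samples apart.
fs = 48000.0
pulse = np.random.default_rng(0).standard_normal(32)
h_i, h_j = np.zeros(256), np.zeros(256)
h_i[105:137] = pulse
h_j[100:132] = pulse
tau_hat = gcc_tdoa(h_i, h_j, fs)
```

With this sign convention a positive estimate means that the pulse arrives later in `h_i` than in `h_j`, which agrees with the definition $\tau_{i,j} = t_i - t_j$.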
Since the noises are assumed to be uncorrelated, the true spectral densities can be written as [25]

$$G_{h_1,h_2}(\omega) = G_{s,s}(\omega) e^{-j\omega\tau_{i,j}}, \tag{5.7}$$

$$G_{h_1,h_1}(\omega) = G_{s,s}(\omega) + G_{w_1,w_1}(\omega), \text{ and} \tag{5.8}$$

$$G_{h_2,h_2}(\omega) = G_{s,s}(\omega) + G_{w_2,w_2}(\omega). \tag{5.9}$$

Then, by using these equivalences in Eq. (5.5), one has

$$W^{\mathrm{MLE}}_{h_1,h_2}(\omega) = \frac{G_{s,s}(\omega)}{G_{w_1,w_1}(\omega)G_{w_2,w_2}(\omega) + G_{s,s}(\omega)G_{w_1,w_1}(\omega) + G_{s,s}(\omega)G_{w_2,w_2}(\omega)}. \tag{5.10}$$

In practice, since the signal is an impulse response, it is easy to estimate the noise auto power spectral density $G_{w_1,w_1}(\omega)$ from the beginning of the impulse response. Then, the auto spectral density of the source signal is obtained from Eq. (5.8), e.g., $G_{s,s}(\omega) = G_{h_1,h_1}(\omega) - G_{w_1,w_1}(\omega)$. The MLE weighting then becomes

$$W^{\mathrm{MLE}}_{h_1,h_2}(\omega) = G_{h_1,h_2}(\omega) \times \{G_{w_1,w_1}(\omega)G_{w_2,w_2}(\omega) + [G_{h_2,h_2}(\omega) - G_{w_1,w_1}(\omega)]G_{w_1,w_1}(\omega) + [G_{h_1,h_1}(\omega) - G_{w_2,w_2}(\omega)]G_{w_2,w_2}(\omega)\}^{-1}. \tag{5.11}$$

By assuming that the spectral densities of the noise signals are equal, $G_{w,w}(\omega) = G_{w_2,w_2}(\omega) = G_{w_1,w_1}(\omega)$, one has

$$W^{\mathrm{MLE}}_{h_1,h_2}(\omega) = \frac{G_{h_1,h_2}(\omega)}{(G_{h_2,h_2}(\omega) + G_{h_1,h_1}(\omega))G_{w,w}(\omega) - G^2_{w,w}(\omega)}. \tag{5.12}$$

Note that there are three options for estimating $G_{s,s}(\omega)$ and two options for estimating $G_{w,w}(\omega)$. One possibility is to estimate $G_{s,s}(\omega)$ as the (weighted) average over the different estimates, and to insert the estimates into

$$W^{\mathrm{MLE}}_{h_1,h_2}(\omega) = \frac{1}{2G_{w,w}(\omega) + G^2_{w,w}(\omega)/G_{s,s}(\omega)}. \tag{5.13}$$

If the noise cannot be estimated, the first version of the MLE weighting in Eq. (5.5) can be used, but the coherence should then be estimated using, for example, Welch's approach [26, 192]. Coherence estimation can be problematic for non-stationary signals [92]. In addition, since it adds computational load, it is not used in this thesis.

Other weighting functions

Practical weighting functions that do not require estimation of the noise auto power spectral densities exist.
In this thesis, the direct cross correlation (CC) weighting [25]

$$W^{\mathrm{CC}}(\omega) = 1 \tag{5.14}$$

and the phase transform (PHAT) weighting

$$W^{\mathrm{PHAT}}_{h_i,h_j}(\omega) = 1/\|G_{h_i,h_j}(\omega)\| \tag{5.15}$$

are used.

5.2.2 Average square difference function

Difference function based methods [96] are similar to the generalized correlation method. In these methods, two signals are subtracted from each other, while the other signal is delayed by the TDOA. Here, the average squared difference function (ASDF) is also tested [95, 96]:

$$R^{\mathrm{ASDF}}_{h_i,h_j}(\tau) = \int_{-T/2}^{T/2} [h_i(t) - h_j(t - \tau)]^2 \, dt, \tag{5.16}$$

where T is the length of the integration window. With ASDF, the minimum argument of the estimation function, instead of the maximum, is the TDOA estimate:

$$\hat{\tau}_{i,j} = \arg\min_{\tau} \{R^{\mathrm{ASDF}}_{h_i,h_j}(\tau)\}. \tag{5.17}$$

5.3 Time of arrival estimation

In time of arrival estimation, the delay $t_n$ of a signal is estimated. In a short time window, the maximum argument of the estimation function $D_n(t)$ is the TOA estimate

$$\hat{t}_n = t_{\mathrm{start}} + \arg\max_{t} \{D_n(t)\}, \tag{5.18}$$

where t is limited by the starting point and the ending point of the time window, i.e., $t_{\mathrm{start}} < t < t_{\mathrm{end}}$. Since the problem is similar to TDOA estimation, the TDOA estimation methods introduced above can also be applied to TOA estimation. This requires knowledge of the source signal.

5.3.1 Auto correlation method

This method requires a priori information on the sound source used. First, a reference $s(t)$ is obtained for the sound source in free-field conditions: it is either measured in an anechoic chamber or windowed from an in-situ impulse response. The reference represents the waveform of the impulse response emitted by the source. The reference is then correlated with the impulse response

$$D^{\mathrm{AC}}_{s,h_1}(t) = \int_{-T/2}^{T/2} s(\xi) h_1(\xi + t) \, d\xi, \tag{5.19}$$

where AC denotes auto correlation, and T (in ms) is the length of the short-time analysis window. Defrance et al. use a similar auto correlation approach for detecting reflections from a single impulse response [65, 66].
In addition, a similar auto correlation method has been used to detect the TOA of a reflection as a preliminary task before absorption coefficient calculations [42].

Maximum likelihood estimation

The autocorrelation function can be given in the frequency domain as the generalized correlation function

$$D^{\mathrm{AC}}_{s,h_1}(\tau) = \mathcal{F}^{-1}\{W_{s,h_1}(\omega) G_{s,h_1}(\omega)\}. \tag{5.20}$$

By definition, the maximum likelihood weighting for this method is also given by Eq. (5.5). Since the other signal is the true signal without noise, the spectral densities can be written as

$$G_{s,h_1}(\omega) = G_{s,s}(\omega) e^{-j\omega t_1}, \text{ and} \tag{5.21}$$

$$G_{h_1,h_1}(\omega) = G_{s,s}(\omega) + G_{w_1,w_1}(\omega). \tag{5.22}$$

Then, the MLE weighting for the auto correlation method is given as

$$W^{\mathrm{MLE\text{-}AC}}_{s,h_1}(\omega) = \frac{1}{|G_{s,h_1}(\omega)|} \frac{C_{s,h_1}(\omega)}{[1 - C_{s,h_1}(\omega)]} \tag{5.23}$$

$$= \cdots \tag{5.24}$$

$$= \frac{1}{G_{w_1,w_1}(\omega)}, \tag{5.25}$$

where $C_{s,h_1}(\omega)$ is the magnitude squared coherence between s and $h_1$. The analogy between the AC method for TOA and the generalized correlation method for TDOA is obvious. The difficulty with the AC method is that a real loudspeaker emits different impulses in different directions. Thus, the method requires the response of the loudspeaker in each direction as a priori knowledge. This can be done artificially using the sparse impulse response technique, as in Publication I.

5.3.2 Maximum absolute pressure

Peak detection is a straightforward method to detect the TOA of a sound wave. It is assumed that the arriving sound wave introduces an impulse, a local maximum or minimum, that can be detected. The maximum argument is then the estimated TOA

$$\hat{t}_n = \arg\max_{t} \{|h_n(t)|\}. \tag{5.26}$$

This may also include some windowing or filtering.

5.3.3 Other methods

The statistical features of the impulse response differ when there is a reflection present in the analysis window [53, 63, 70, 104]. One way of measuring the statistical difference is the kurtosis [104].
Another option is to detect the peak from a local absolute pressure ratio between the current absolute pressure and its surroundings [81]. These statistical approaches are not pursued further here for TOA estimation.

5.4 Localization functions

When robust 3-D or 2-D localization is required, the TOA or TDOA information is combined spatially over several microphones and microphone pairs, respectively. Three commonly used state-of-the-art acoustic source localization functions are formulated next for TOA, TDOA, and their combination. This leads to nine different localization functions in total. That is, for each dataset (TOA, TDOA, or their combination) three methods are formulated. In addition, the methods are compared to an MLE function designed for the signal model. Some least-squares localization approaches and sound intensity vector based methods are also discussed. For each method, the maximum argument of the localization function P(x) is the location estimate, i.e.,

$$\hat{\mathbf{x}} = \arg\max_{\mathbf{x}} \{P(\mathbf{x})\}. \tag{5.27}$$

For notational convenience, a TOA, $t(\mathbf{r}_n; \mathbf{x})$, is denoted by $t_n(\mathbf{x})$, where $n = 1 \dots N$, and N is the number of microphones. In addition, a TDOA, $\tau(\mathbf{r}_i, \mathbf{r}_j; \mathbf{x})$, is denoted by $\tau_m(\mathbf{x})$, where $m = \{i, j\} = 1 \dots M$ is a tuple, and M is the number of microphone pairs. The TDOA estimates are denoted by $\hat{\tau}_m$, and the TDOA estimation function $R_{h_i,h_j}(\tau)$ by $R_m(\tau)$. In this thesis, the number of microphones is N = 6, and the number of microphone pairs is M = 15. Then, the microphone pairs m from 1 to 15 are {{1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6}, {2, 3}, {2, 4}, {2, 5}, {2, 6}, {3, 4}, {3, 5}, {3, 6}, {4, 5}, {4, 6}, {5, 6}}.
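The pair indexing above follows the usual lexicographic enumeration of all unordered microphone pairs, so that M = N(N - 1)/2. A short sketch, with variable names chosen only for illustration, reproduces the list:

```python
from itertools import combinations

N = 6  # number of microphones
# All unordered pairs {i, j} with i < j, in lexicographic order.
pairs = list(combinations(range(1, N + 1), 2))
M = len(pairs)  # equals N * (N - 1) // 2
```

Indexing the TDOA estimates and estimation functions by the position of a pair in this list gives the index m used in the localization functions.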
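5.4.1 Maximum likelihood estimation for time of arrival and time difference of arrival

The MLE function for TDOA is given as the joint probability density function [109]

$$P_{\mathrm{MLE\text{-}TDOA}}(\mathbf{x}) = \prod_{m=1}^{M} p(\hat{\tau}_m; \tau_m(\mathbf{x})) \tag{5.28}$$

$$= \frac{\exp\left(-\frac{1}{2}[\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}(\mathbf{x})]\boldsymbol{\Sigma}_{\mathrm{TDOA}}^{-1}[\hat{\boldsymbol{\tau}} - \boldsymbol{\tau}(\mathbf{x})]^T\right)}{(2\pi)^{M/2}\sqrt{\det(\boldsymbol{\Sigma}_{\mathrm{TDOA}})}}, \tag{5.29}$$

where $p(\hat{\tau}_m; \tau_m(\mathbf{x}))$ is the normal error probability density function for a TDOA estimate,

$$\hat{\boldsymbol{\tau}} = [\hat{\tau}_1, \hat{\tau}_2, \dots, \hat{\tau}_M], \tag{5.30}$$

$$\boldsymbol{\tau}(\mathbf{x}) = [\tau_1(\mathbf{x}), \tau_2(\mathbf{x}), \dots, \tau_M(\mathbf{x})], \tag{5.31}$$

$$\boldsymbol{\Sigma}_{\mathrm{TDOA}} = \mathbf{I}\sigma^2_{\mathrm{TDOA}}, \tag{5.32}$$

with $\sigma_{\mathrm{TDOA}}$ as the standard deviation of the error, and $\tau_m(\mathbf{x})$ is given by Eq. (2.39). The MLE function for TOAs, assuming normally distributed errors, is given as [109]

$$P_{\mathrm{MLE\text{-}TOA}}(\mathbf{x}) = \prod_{n=1}^{N} p(\hat{t}_n; t_n(\mathbf{x})) \tag{5.33}$$

$$= \frac{\exp\left(-\frac{1}{2}[\hat{\mathbf{t}} - \mathbf{t}(\mathbf{x})]\boldsymbol{\Sigma}_{\mathrm{TOA}}^{-1}[\hat{\mathbf{t}} - \mathbf{t}(\mathbf{x})]^T\right)}{(2\pi)^{N/2}\sqrt{\det(\boldsymbol{\Sigma}_{\mathrm{TOA}})}}, \tag{5.34}$$

where $p(\hat{t}_n; t_n(\mathbf{x}))$ is the normal error probability density function for a TOA estimate,

$$\hat{\mathbf{t}} = [\hat{t}_1, \hat{t}_2, \dots, \hat{t}_N], \tag{5.35}$$

$$\mathbf{t}(\mathbf{x}) = [t_1(\mathbf{x}), t_2(\mathbf{x}), \dots, t_N(\mathbf{x})], \tag{5.36}$$

$$\boldsymbol{\Sigma}_{\mathrm{TOA}} = \mathbf{I}\sigma^2_{\mathrm{TOA}}, \tag{5.37}$$

with $\sigma_{\mathrm{TOA}}$ as the error standard deviation, and $t_n(\mathbf{x})$ is given by Eq. (2.37).

For combining the TOA and TDOA information with MLE, it is assumed that the TDOA and TOA have independent errors. Then, the MLE function for the combined method (CM) is given as the product of the MLE-TOA and MLE-TDOA functions:

$$P_{\mathrm{MLE\text{-}CM}}(\mathbf{x}) = P_{\mathrm{MLE\text{-}TOA}}(\mathbf{x}, \sigma_{\mathrm{TOA}}) P_{\mathrm{MLE\text{-}TDOA}}(\mathbf{x}, \sigma_{\mathrm{TDOA}}). \tag{5.38}$$

If different error variances $\sigma^2_{\mathrm{TOA}}$ and $\sigma^2_{\mathrm{TDOA}}$ are assumed for TOA and TDOA, respectively, the MLE-TOA and MLE-TDOA functions have different weightings. In Publication II, it is found that $\sigma^2_{\mathrm{TOA}} = \sigma^2_{\mathrm{TDOA}}$ is a reasonable choice.

The measurement errors of the TDOAs and TOAs can be highly correlated if certain TOA and TDOA estimation methods are used. As a consequence, the covariance matrix of the combined method is no longer a diagonal matrix as assumed above. A further investigation should be conducted to study which of the estimators produce errors that correlate.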
In Publication II, the maximum absolute pressure is used for the TOA estimation and the direct cross correlation for the TDOA estimation, and it is found that in most of the cases the errors do not correlate, i.e., the covariance matrix is diagonal. If the TDOAs are calculated directly from the estimated TOAs, the combined method has the same performance as MLE-TOA and gains no advantage.

5.4.2 Maximum likelihood estimation for the signal model

Earlier, the maximum likelihood estimation was formulated with respect to TOA and TDOA estimates. It is also possible to formulate the MLE directly with respect to the source signal and the measurement noise [112–117]:

$$P(\mathbf{x}) = \prod_{\omega} p(\mathbf{H}(\omega); \mathbf{x}) = \prod_{\omega} \frac{\exp\left(-\frac{1}{2}[\mathbf{H}(\omega) - \mathbf{D}(\omega, \mathbf{x})S(\omega)]\mathbf{Q}^{-1}(\omega)[\mathbf{H}(\omega) - \mathbf{D}(\omega, \mathbf{x})S(\omega)]\right)}{(2\pi)^{N/2}\sqrt{\det(\mathbf{Q}(\omega))}}, \tag{5.39}$$

where

$$\mathbf{H}(\omega) = [H_1(\omega), H_2(\omega), \dots, H_N(\omega)]^T, \tag{5.40}$$

$$\mathbf{D}(\omega, \mathbf{x}) = [e^{-j\omega t_1(\mathbf{x})}, e^{-j\omega t_2(\mathbf{x})}, \dots, e^{-j\omega t_N(\mathbf{x})}]^T, \tag{5.41}$$

$$\mathbf{Q}(\omega) = \mathbf{I}\sigma_F^2, \tag{5.42}$$

where $\sigma_F^2 = E\{G_{w,w}(\omega)\}$ is the expected noise variance, which is assumed constant for all frequencies. Assuming independent errors and using the log-likelihood leads to a maximization function [112, 114, 116, 117]

$$L_{\mathrm{MLE\text{-}S}}(\mathbf{x}) = \int_{\omega} \left| \sum_{n=1}^{N} H_n(\omega) e^{j\omega t_n(\mathbf{x})} / \sigma_F \right|^2 d\omega. \tag{5.43}$$

This approach is denoted by MLE-S, which stands for MLE for the signal model.

5.4.3 Steered response power

A popular family of TDOA-based acoustic source localization functions is the SRP methods. In these methods, the acoustic source localization likelihood is evaluated as a spatial combination of cross correlation functions $R_m(\tau)$ for each location candidate, denoted by x [106, 107]:

$$P_{\mathrm{SRP\text{-}TDOA}}(\mathbf{x}) = 1/M \sum_{m=1}^{M} R_m(\tau_m(\mathbf{x})). \tag{5.44}$$

The SRP using the generalized correlation method with PHAT weighting is commonly referred to as the SRP-PHAT function, introduced originally in [107]. The signals can be steered using TOAs in the same way as the TDOA estimation functions are steered using TDOAs.
In steered beamforming, the signals are artificially steered by delaying them towards a location. The sum-and-delay beamformer is considered the most basic case of beamforming [193]. When the sum-and-delay beamformer output is squared, the output is the SRP [106]

$$P_{\mathrm{SRP\text{-}TOA}}(\mathbf{x}) = \int \left| 1/N \sum_{n=1}^{N} h_n(t - t_n(\mathbf{x})) \right|^2 dt. \tag{5.45}$$

This function is the same as the MLE with the signal model in Eq. (5.43) without the variance term. However, if Eq. (5.46) is implemented in the frequency domain, the TOA information is lost, since SRP-TOA becomes the same as SRP-TDOA with an additional (constant) energy term [151, 152].

Since the room impulse responses are already directly mapped into the TOAs, the time variable becomes t = 0. The time integral over dt then has no effect on the localization function, and Eq. (5.45) is written as

$$P_{\mathrm{SRP\text{-}TOA}}(\mathbf{x}) = \left| 1/N \sum_{n=1}^{N} h_n(t_n(\mathbf{x})) \right|^2, \tag{5.46}$$

which is a computationally more efficient implementation of SRP-TOA than the first one.

Both the TOA and the TDOA information can be used to measure the position of a reflection. Intuitively, the next step is to combine them. The SRP function that uses both TDOA and TOA information is here proposed to be calculated as

$$P_{\mathrm{SRP\text{-}CM}}(\mathbf{x}) = (1 - W) P_{\mathrm{SRP\text{-}TOA}}(\mathbf{x}) + W P_{\mathrm{SRP\text{-}TDOA}}(\mathbf{x}), \tag{5.47}$$

where CM stands for combined method, and 0 < W < 1 is a weighting factor, included in this function since the steered response is effectively used twice in SRP-CM.

5.4.4 Maximum pseudo-likelihood

Recently it was shown in [108] and [153] that the use of multiplication instead of addition is advantageous in the steering function. This leads to a pseudo-likelihood function [108, 152, 153]

$$P_{\mathrm{PL\text{-}TDOA}}(\mathbf{x}) = \prod_{m=1}^{M} R_m(\tau_m(\mathbf{x})), \tag{5.48}$$

where PL stands for pseudo-likelihood. It should be noted that thresholding and shaping have to be applied to the TDOA estimation functions so that they are non-negative pseudo-likelihoods [15].
It is straightforward to show that, if the maximum of the TDOA estimation function is modeled with a probability density function, the PL-TDOA and MLE-TDOA methods are the same.

Here it is proposed that the PL function for TOA is formed by multiplying the individual TOA estimation functions, i.e.,

$$P_{\mathrm{PL\text{-}TOA}}(\mathbf{x}) = \prod_{n=1}^{N} D_n(t_n(\mathbf{x})). \tag{5.49}$$

Thresholding and shaping can be applied to the TOA estimation functions so that they are non-negative pseudo-likelihoods. In the simplest case, the TOA estimation function is the absolute value of the room impulse response:

$$P_{\mathrm{PL\text{-}TOA}}(\mathbf{x}) = \prod_{n=1}^{N} |h_n(t_n(\mathbf{x}))|. \tag{5.50}$$

The analogy between PL-TOA and MLE-TOA is the same as with TDOAs. If only one maximum is selected in PL-TOA from the impulse response, and the corresponding TOA is assigned an error probability density function, PL-TOA and MLE-TOA are the same methods.

The combined maximum pseudo-likelihood is here proposed to be the product of the PL-TOA and PL-TDOA functions

$$P_{\mathrm{PL\text{-}CM}}(\mathbf{x}) = P_{\mathrm{PL\text{-}TOA}}(\mathbf{x}) P_{\mathrm{PL\text{-}TDOA}}(\mathbf{x}). \tag{5.51}$$

As in MLE-CM, weighting can also be applied to the PL-TOA and PL-TDOA functions in PL-CM. If the shaping functions for PL-TOA and PL-TDOA are selected as probability density functions, then PL-CM is equal to the MLE function. Note that weighting PL-TOA or PL-TDOA in the same way as SRP-TOA and SRP-TDOA are weighted in SRP-CM has no effect here, since the weighting does not change the location of the maximum of PL-CM. However, although PL-CM cannot be weighted, its logarithmic version can be, i.e.,

$$\lambda_{\mathrm{PL\text{-}CM}}(\mathbf{x}) = (1 - W) \log\{P_{\mathrm{PL\text{-}TOA}}(\mathbf{x})/N\} + W \log\{P_{\mathrm{PL\text{-}TDOA}}(\mathbf{x})/M\}, \tag{5.52}$$

where the log-pseudo-likelihoods of TOA and TDOA are normalized with N and M, respectively. The weighting W is ad hoc and does not correspond to anything in theory.
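To make the TOA-based functions concrete, the sketch below evaluates Eq. (5.46) and Eq. (5.50) on a grid of candidate locations. It is an illustration under stated assumptions rather than the thesis implementation: the six-microphone geometry (pairs 100 mm apart on each coordinate axis), the Gaussian pulse used as a noiseless reflection, the 0.5 m grid, and all names are chosen only for this example, and the impulse responses are read at $t_n(\mathbf{x})$ with linear interpolation.

```python
import numpy as np

c, fs = 345.0, 48000.0
# Assumed array: six microphones, 50 mm from the origin on each axis.
mics = 0.05 * np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
src = np.array([2.0, 11.0, 1.5])   # simulated reflection (image source)

def toas(x):
    """Spherical-wave times of arrival t_n(x) from location x to each microphone."""
    return np.linalg.norm(mics - x, axis=1) / c

# Noiseless impulse responses: one Gaussian pulse per microphone.
t = np.arange(4096) / fs
h = np.exp(-0.5 * ((t[None, :] - toas(src)[:, None]) / 1e-4) ** 2)

def steer(x):
    """Read each impulse response at its own t_n(x)."""
    return np.array([np.interp(tn, t, hn) for tn, hn in zip(toas(x), h)])

def p_srp_toa(x):
    return np.mean(steer(x)) ** 2        # Eq. (5.46)

def p_pl_toa(x):
    return np.prod(np.abs(steer(x)))     # Eq. (5.50)

# Grid search over the plane z = 1.5 m.
grid = [np.array([x, y, 1.5]) for x in np.arange(-5, 5.01, 0.5)
        for y in np.arange(0, 15.01, 0.5)]
best_srp = max(grid, key=p_srp_toa)
best_pl = max(grid, key=p_pl_toa)
```

Because the product in `p_pl_toa` is penalized by any single microphone that disagrees with the candidate location, its map is typically sharper than the averaged `p_srp_toa` map, at the price of being more sensitive to a missing or weak peak in one channel.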
5.4.5 Least squares localization approaches

When independent and normally distributed errors are assumed for MLE-TOA, MLE-TDOA, or MLE-CM, it follows from the properties of the normal distribution that the mean square error function (MMSE) of the estimates is also the MLE function of the estimates [106, 194]:

$$P_{\mathrm{MMSE}}(\mathbf{x}) = \sum_{m=1}^{M} (\hat{\theta}_m - \theta_m(\mathbf{x}))^2. \tag{5.53}$$

With MMSE, the minimum argument, instead of the maximum, is the position estimate. The solution in Eq. (5.53) is of least squares form. Possibly the most straightforward solution for TOA and TDOA data is the unconstrained least squares (ULS). Table 5.1 lists some of the optimization methods and closed form solutions used for the least squares problem of TOA and TDOA. Other optimization methods have possibly also been proposed for the problem, but optimization methods are not the main focus of this work. In principle, any well-behaved global optimization algorithm can be used for the problem, as long as the initial guess given to the algorithm is good enough.

As shown in Table 5.1, interestingly, the ULS solution for TOA has not been presented for planar waves. However, since the plane wave equations are linear, the ULS solutions are trivial to formulate. The ULS solution for the spherical wave propagation model with TOA and TDOA can be formulated by integrating the solutions in [139] and [118].

5.4.6 Sound intensity vector based localization

Sound intensity measurement assumes the plane wave propagation model. Therefore, with the microphone array used here, only the direction of the arriving sound can be obtained. The direction of the arriving sound wave can be estimated as the spherical mean (SME) of the sound intensity vectors over a frequency band [199]

$$\hat{\mathbf{n}} = \frac{\mathbf{S}}{\|\mathbf{S}\|}, \tag{5.54}$$

where

$$\mathbf{S} = \sum_{\omega_1}^{\omega_2} \bar{\mathbf{I}}(\omega) \tag{5.55}$$

with $\bar{\mathbf{I}}(\omega) = \mathbf{I}(\omega)/\|\mathbf{I}(\omega)\|$, which is the amplitude normalized version of the discretized sound intensity vector.
The length of each sound intensity vector is first normalized to unity based on the results in Publication III, where the normalized vectors are found to provide more noise-robust results than the unnormalized ones. In Publication III, four other possibilities for estimating the direction of arrival from the sound intensity vectors are presented and discussed. Although the methods in Publication III are given in two dimensions, they can be extended to three-dimensional data by using spherical probability density functions instead of circular ones.

It is shown in Publication VI that the sound intensity vector based methods do not perform as well in the direction estimation of the reflections as the TDOA-based methods. This is due to the limited frequency band, which is a feature of sound intensity vector based direction estimation.

5.5 Examples of the localization maps

Examples of localization maps with different methods are provided in Fig. 5.1. The data is a simulated perfect reflection with no noise at (2,11,1.5) m, and the array is at (0,0,0) m. As can be seen, the TDOA-based methods provide good information about the direction, whereas the TOA-based methods seem to work well in the distance estimation. When the TOA and TDOA are combined, a better localization method is obtained. As seen in Fig. 5.1, SRP methods have more "ghosts" than the other methods, i.e., local maxima that do not correspond to the true reflection location.

Table 5.1. Some of the least squares localization approaches that have been applied for time of arrival (TOA) and time difference of arrival (TDOA)-based localization.

Wave model Data Reference Solution / Optimization Method Spherical TDOA [118] ULS [119] SI, SX, PX [120] SX [121] TWLS [124] EULS [125] CLS 1 TOA TOA & TDOA [126] ULS [127] WCLS [109, 129] LM [130] ALS [131]1 TWLS [132] PSO [133] SDP [134] CRM [139] ULS, NCLS [140] ALS [109] LM [141]2 TWLS, MMA None ULS [195] - [196] ALS [197]3 - TDOA [198] ULS TOA None ULS TOA & TDOA None ULS Planar

ALS: Approximate least squares through Taylor-Series expansion, CLS: Constrained least squares, CRM: Convex relaxation methods, EULS: Extended unconstrained least squares, LM: Levenberg–Marquardt method, MMA: Min-max algorithm, NCLS: Nonconvex constrained least squares, PSO: Particle swarm optimization, PX: Plane intersection, SI: Spherical interpolation, SX: Spherical intersection, SDP: Semi-definite programming, TWLS: Two-step weighted least squares, ULS: Unconstrained least squares, WCLS: Weighted constrained least squares, 1: Used for joint speed of sound and position estimation, 2: TOA and unknown time term, 3: Review article.

[Figure 5.1: 3 × 3 grid of localization maps. Columns are the data types TDOA, TOA, and CM; rows are the localization functions SRP, PL, and MLE; the axes are the x-coordinate [m] from −5 to 5 and the y-coordinate [m] from 0 to 15.]

Figure 5.1. Examples of acoustic source localization functions for a grid of (x, y, z) locations with z = 1.5 m in the case of no noise. The microphone array, with six microphones with a spacing of 100 mm, at (0,0,0) m is denoted with a star, and the reflection at (2,11,1.5) m is denoted with a circle.
In this example, the simplest search for the maximum is presented. That is, the maximum is found using a predefined grid of locations. However, this is often not very efficient; therefore, some other methods for the search of the maximum are discussed next.

5.6 Search of the extremum

Basically any global optimization method can be used for the search of the extremum. In general, there is no way of ensuring that the global optimization method converges to the global extremum, since localization with the spherical wave propagation model is a non-linear problem. Therefore, Monte-Carlo simulations are usually needed to validate the optimization method for a certain problem. Since the literature on optimization methods is extensive, only some selected methods used for localization are discussed here.

In addition to the closed form solutions and optimization methods listed in Table 5.1 for TOA and TDOA based methods, other optimization methods have been proposed for other ASL functions. In particular, the search of the maximum of the SRP-PHAT function has been of interest [144, 147–150, 200, 201].

The most naive and straightforward method for the search of the maximum is to use a (predefined) grid of location candidates. The drawback of this approach is the slowness of the computation when the grid is large. Namely, for a 3-D grid covering the volume of, say, a concert hall, the number of data points becomes very large, and the estimation meets the curse of dimensionality. The number of data points naturally depends on the selected grid spacing. However, since the evaluation of the ASL function is the same at each selected time instant for any data point, the process can be parallelized as in [202]. Parallel computation decreases the total time used for the evaluation, but requires special implementation considerations and special equipment, such as a general-purpose graphics processing unit.
Specially designed sequential Monte-Carlo methods, a.k.a. particle filtering, can be used to track speech and other sources [15, 108, 152, 203–205]. The advantage of particle filtering is that only a small subset of samples is needed to represent the underlying probability distribution. For reflection localization, particle filtering approaches are not useful, since the reflections are not moving targets but discrete events in the spatial room impulse response. However, particle swarm optimization [206] has features similar to particle filtering, i.e., it includes a randomization step, and it has been used, for example, with the LS approach [132] and with the MLE [207]. It could also be applied to other ASL functions.

In this thesis, the well-known Nelder-Mead method is used to find the extremum of the ASL functions [208]. In the source localization problem, the Nelder-Mead method requires a proper initial guess for the parameters to be estimated.

5.7 Automatic calibration of the loudspeaker and microphone positions

When the room impulse responses have been measured, the calibration of the loudspeaker and/or microphone positions and the estimation of the speed of sound in the measurement system can be done from the direct sound, which is the first event in the impulse response. In principle, any of the above methods can be used to localize the loudspeakers and/or the microphones. Raykar et al. have listed the number of required microphones and loudspeakers in different calibration schemes [109].

5.8 Localization of reflections

After the direct sound, the rest of the events in the room impulse response are reflections. The spatial impulse response measured with a compact microphone array is processed in short time windows [1, 12, 13, 81, 160], and [Publication I]. The analysis window size is selected so that the window includes as few reflections as possible while still containing enough data for processing.
Using proper time windowing, the reflections can be temporally and spatially separated. Since the maximum intra-sensor distance in the microphone array is 10 cm, the minimum time window length is about 0.3 ms. Based on previous knowledge [1, 12, 81], and [Publication I], a good window size for the analysis of early reflections is approximately from 1 ms to 4 ms. Naturally, the number of reflections arriving within one window depends on the echo density defined by Eq. (2.49). Echo density states that the larger the room, the larger the temporal and spatial spacing between the reflections. In addition, the smaller the time interval, the fewer reflections there are within a time window.

In this work, it is assumed that there is only one reflection present per analysis window. This is generally true for the first order reflections with the suggested 4 ms analysis window in large spaces, such as auditoriums and concert halls. Using Eq. (2.49), on average, only one individual reflection should be present in an analysis window when

$$t \le \sqrt{\frac{dN_r}{dt} \frac{V}{4\pi c^3}}. \tag{5.56}$$

For example, if V = 1800 m³, dt = 0.004 s, and c = 345 m/s, then fewer than two reflections ($dN_r = 2$) are present in the analysis window until about t < 0.042 s after the direct sound, which corresponds to about 14.4 m in distance. In practice, the number of reflections within a window depends greatly on the location of the source and of the receiver, so Eq. (5.56) should be seen as a guideline.

The case where more than one reflection is present within one analysis window is left for future research. In principle, it is the same problem as the multiple source localization problem, and some of the methods used for that problem, e.g. [209], should also be applicable here.

With the assumption of only one reflection per analysis window, the measurement noise is the only aspect corrupting the localization results. A recognizable feature, also shown in Fig. 2.7, is the fact that the signal-to-noise ratio (SNR) decreases as time increases. Thus, the reflections that arrive later in time have a lower SNR.

5.9 Computational complexity of the localization methods

Although reflection localization within the framework of this thesis is always an offline task, some comparison between the complexity of the methods is provided. The complexity is compared using the 'Big O notation', O(·). For basic beamforming, the complexity is built up from the number of ASL function evaluations E, the length of the signal L, and the number of microphones N. For cross correlation, the complexity of the estimation function is O(L log{L}), and since all the microphones are used twice in the calculation of the ASL function, the complexity increases by O(N²) [15]. Moreover, the complexity of the TOA estimation with the simple peak picking method is O(L). For TOA estimation with the AC approach the complexity is O(L log{L}), but that approach is not used here. Since the MLE-S method calculates the ASL function over a frequency band, its complexity is increased by the number of frequencies used, O(F).

Table 5.2 lists the computational complexity of the methods introduced in this chapter. The TOA-based methods have lower computational complexity than the other methods, since the room impulse responses are directly mapped into the TOAs. As the number of evaluations increases, the computational complexity and time of MLE-S increase.

Table 5.2. Computational complexity of the localization methods in the reflection localization task.

Data          Method            Complexity
TOA           MLE-S             O(E L log{L} N F)
TOA           SRP, PL, & MLE    O(N L + N E)
TDOA          SRP, PL, & MLE    O(N² L log{L} + E N²)
TOA & TDOA    SRP, PL, & MLE    O(N L + N E + N² L log{L} + E N²)

E: Number of ASL function evaluations, L: The length of the signal, N: The number of the microphones, and F: The number of the frequency bins.
This result was also pointed out by Korhonen for the time domain beamformer [15]. However, when the number of evaluations increases, the computational complexity of the time domain beamformer (SRP-TOA) does not increase as rapidly as the computational complexity of the conventional time-domain beamformer. This is due to the direct mapping of the impulse response to TOAs, which does not require additional calculations.

5.10 Interpolation Methods

Due to the limited sampling frequency, interpolation is required in practical situations in TDOA and TOA based localization. Namely, the sampling frequency sets an upper limit for the spatial resolution that can be achieved. Here, three possibilities for interpolation are presented. The first one interpolates the received signal. The second one interpolates the TDOA or TOA estimates by making an assumption on the shape of the estimation function. These estimates can be used directly in the TOA and TDOA based MLE methods. The third approach extends the function fitting to the TOA and TDOA estimation functions. These interpolated estimation functions can then be used in the SRP and PL methods.

5.10.1 Signal

The most straightforward way of interpolating is to upsample the signals by Fourier-interpolation. Upsampling the signals with Fourier-interpolation consists of two parts. First zeros are added, and then the signal is lowpass filtered [210].

5.10.2 Time difference of arrival estimate

In traditional TDOA estimation, the interpolation is usually done by fitting a parabola [101] or an exponential function [102] to the maximum peak of the TDOA function. TDOA estimation followed by interpolation leads to a single time delay estimate. These interpolated values can then be used in the MLE methods. Similarly, the TOA estimates can be interpolated by assuming some shape for the energy or the pressure of the room impulse response. This would require a priori knowledge of the impulse shape, as does the interpolation of the TDOA estimate.
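The peak fitting described above can be sketched with the standard three-point formula. This is a minimal illustration: the function name `refine_peak` is hypothetical, the fit uses only the maximum sample and its two neighbours, and the exponential variant assumes that these three samples are positive so that their logarithm exists.

```python
import numpy as np

def refine_peak(r, shape="exponential"):
    """Return the sub-sample index of the maximum of r.

    shape="parabolic" fits a parabola through the peak and its neighbours;
    shape="exponential" fits a parabola to the logarithm of the same samples,
    which corresponds to an a * exp(-b * (tau - c)**2) shaped peak.
    Assumes the maximum is not the first or last sample of r.
    """
    k = int(np.argmax(r))
    y = np.asarray(r[k - 1:k + 2], dtype=float)
    if shape == "exponential":
        y = np.log(y)
    # Vertex of the parabola through (-1, y[0]), (0, y[1]), (1, y[2]).
    offset = 0.5 * (y[0] - y[2]) / (y[0] - 2.0 * y[1] + y[2])
    return k + offset

# Synthetic correlation peak whose true maximum lies between two samples.
lags = np.arange(-10, 11)
r = np.exp(-0.3 * (lags - 3.3) ** 2)
lag_exp = lags[0] + refine_peak(r, "exponential")
lag_par = lags[0] + refine_peak(r, "parabolic")
```

For this synthetic peak the exponential fit recovers the true lag up to rounding, while the parabolic fit is slightly biased towards the nearest sample, which illustrates why the assumed shape matters.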
5.10.3 Time difference of arrival estimation function

In the SRP and PL methods, the spatial response is built on the TDOA and TOA estimation functions. The above TDOA interpolation methods cannot be used directly for interpolating the TDOA estimation functions for the SRP or PL methods. Therefore, an algorithm for using the function fitting approaches in the steered response function is developed in Publication V. Although this approach is designed for TDOA estimation functions, it can be directly applied also to TOA estimation functions. Here, for clarity, it is formulated for TDOA estimation functions.

The algorithm makes an assumption on the shape of the TDOA estimation function near the maxima. Throughout this thesis, the exponential shape is used:

$$ f_l(\tau) = a_l e^{-b_l (\tau - c_l)^2}, \tag{5.57} $$

where $a_l$, $b_l$, and $c_l$ are the coefficients and $f_l$ is the function for the $l$th local maximum. Another possibility is the parabolic shape, but it is shown to perform worse than the exponential shape in the interpolation of the cross correlation function in [102] and in Publication V.

The interpolation of a TDOA estimation function is described by the following steps. Firstly, the TDOA estimation function is normalized so that it is positive. Secondly, the local maxima are searched from the TDOA function in the region of interest. Thirdly, the coefficients in Eq. (5.57) are solved using the local maximum and two neighboring points on both sides of the maximum. This leads to a function $f_l(\tau)$ for each local maximum $l$. Finally, as a result, the interpolated TDOA function can be evaluated at any time delay $\tau$:

$$ R_{\text{interpolated}}(\tau) = \max_l f_l(\tau). \tag{5.58} $$

If the number of the local maxima is reduced similarly as in [145], the method will be more efficient in terms of computational time. In addition, an advantage of the proposed algorithm over, e.g., Fourier-interpolation is that the TDOA function is represented with a limited number of coefficients, whereas in Fourier-interpolation the number of samples increases with the sampling frequency. The interpolation method is suitable for other TDOA estimation functions than the cross correlation function as well, and the shape assumption is not limited to the ones presented here. Figure 5.2 shows an example of the interpolation with the exponential assumption for the SRP-PHAT function.

[Figure 5.2, panels: (a) SRP-PHAT at 48 kHz, no interpolation; (b) SRP-PHAT at 48 kHz, exponential interpolation.]
Figure 5.2. Example of the interpolation of SRP-PHAT function with exponential fitting applied to the cross correlation vectors. The microphone array is at (0,0,0) m.

6. Theoretical performance

This chapter presents theoretical performance limits in the acoustic reflection localization framework. Different localization approaches are compared in the theoretical framework.

6.1 Overview

The positions of the sensors and the source as well as the signal and the noise have an effect on the localization variance. These effects can be theoretically measured using Cramér-Rao lower bound (CRLB) [17] analysis. The theoretical bounds given in this chapter use the assumption that the source signal and noise signals are white Gaussian noise. This assumption is required to make the signal model in Eq. (5.2) mathematically tractable [25, 26].

6.2 Time difference of arrival estimation

The theoretical performance bounds for TDOA estimation have been a topic of various research studies [26, 99, 100, 211–213]. In addition to the CRLB, other performance bounds have been presented. For example, the Ziv-Zakai lower bound is of interest in the presence of large errors [26, 213]. Here only the CRLB is considered.

The Fisher information for TDOA estimation is given as [26, 213]

$$ J(\tau) = \frac{2T}{2\pi} \int_0^\infty \omega^2 \frac{C_{h_1,h_2}(\omega)}{1 - C_{h_1,h_2}(\omega)} \, d\omega, \tag{6.1} $$

where $T$ is the window length and the magnitude squared coherence $C_{h_1,h_2}(\omega)$ is related to the SNR via [26]

$$ \frac{C_{h_1,h_2}(\omega)}{1 - C_{h_1,h_2}(\omega)} = \frac{\mathrm{SNR}^2(\omega)}{1 + 2\,\mathrm{SNR}(\omega)}. \tag{6.2} $$

Note that the Fisher information above is independent of $\tau$. The power spectral density of the signal is set flat as

$$ G_{s,s}(\omega) = \begin{cases} G_{s,s}, & |\omega| \in [\omega_c - B/2,\ \omega_c + B/2] \\ 0, & \text{otherwise} \end{cases} \tag{6.3} $$

with center frequency $\omega_c$. Assuming also that the noises are equal, $G_{n_1,n_1}(\omega) = G_{n_2,n_2}(\omega) = G_{n,n}(\omega)$, the Fisher information becomes [213]

$$ J(\tau) = \frac{\mathrm{SNR}^2}{1 + 2\,\mathrm{SNR}} \, \frac{T}{\pi} \left( B\omega_c^2 + B^3/12 \right). \tag{6.4} $$

This analysis is valid only for $T \gg 2\pi/B$ and for sufficiently large SNR values, in detail [100, 213]

$$ \mathrm{SNR} > \frac{12\,\omega_c^2}{\pi T B^3} \left[ \Phi^{-1}\!\left( \frac{1}{24} \frac{B^2}{\omega_c^2} \right) \right]^2, \tag{6.5} $$

where $\Phi^{-1}(x)$ is the inverse of the exponential integral

$$ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\mu^2/2} \, d\mu. \tag{6.6} $$

6.3 Time of arrival estimation

The Fisher information for TOA estimation is obtained through the derivation of the MLE function in Eqs. (5.20) and (5.23), and it is equal to

$$ J(t) = \frac{2T}{2\pi} \int_0^\infty \omega^2 \frac{C_{s,h_1}(\omega)}{1 - C_{s,h_1}(\omega)} \, d\omega, \tag{6.7} $$

where

$$ \frac{C_{s,h_1}(\omega)}{1 - C_{s,h_1}(\omega)} = \frac{G_{s,s}(\omega)}{G_{w,w}(\omega)} = \mathrm{SNR}(\omega). \tag{6.8} $$

With the same assumptions on the spectral densities as in TDOA estimation, the Fisher information becomes

$$ J(t) = \mathrm{SNR} \, \frac{T}{\pi} \left( B\omega_c^2 + B^3/12 \right). \tag{6.9} $$

Since $\mathrm{SNR} > 0$, the factor $\mathrm{SNR}$ in Eq. (6.9) is always larger than the factor $\mathrm{SNR}^2/(1 + 2\,\mathrm{SNR})$ in Eq. (6.4). The Fisher information is therefore higher for TOA estimation, and the CRLB is always smaller for TOA estimation than for TDOA estimation.

6.4 Localization

The log-likelihood of the localization with respect to the signal model is given by Eq. (5.39). The Fisher information matrix is formulated as [113, 114, 214]

$$ J(x) = 2\,\Re\!\left[ H(D(\omega, x))^H Q^{-1} H(D(\omega, x)) \right], \tag{6.10} $$

where

$$ H(\omega, D(x)) = \left[ \frac{\partial S(\omega) D_1(\omega, x)}{\partial x}, \ldots, \frac{\partial S(\omega) D_N(\omega, x)}{\partial x} \right] = S(\omega) \left[ \frac{\partial e^{-j\omega t_1(x)}}{\partial x}, \ldots, \frac{\partial e^{-j\omega t_N(x)}}{\partial x} \right]. $$

For a single microphone and frequency the differential with respect to location $x$ is given by

$$ \frac{\partial S(\omega) e^{-j\omega t_n(x)}}{\partial x} = -S(\omega)\, j\omega \, \frac{\partial t_n(x)}{\partial x} \, e^{-j\omega t_n(x)}, \tag{6.11} $$

where

$$ \frac{\partial}{\partial x} t_n(x) = c^{-1} \frac{x - r_n}{\| x - r_n \|}. \tag{6.12} $$

When assuming independent errors and equal error variances, the Fisher information matrix can be expressed as

$$ J(x) = \frac{2T}{2\pi} \int_{\omega=0}^{\infty} \left( \omega \| S(\omega) \| \right)^2 df \, \left[ H_{\mathrm{TOA}}^T(t(x)) \, Q^{-1} H_{\mathrm{TOA}}(t(x)) \right], \tag{6.13} $$

where the design matrix for TOAs is given as

$$ H_{\mathrm{TOA}}(t(x)) = \begin{bmatrix} \frac{\partial}{\partial x} t_1(x) \\ \frac{\partial}{\partial x} t_2(x) \\ \vdots \\ \frac{\partial}{\partial x} t_N(x) \end{bmatrix}. \tag{6.14} $$

Moreover, when constant spectral densities for noise and signal are assumed on a certain frequency band $B$, and within some time window of length $T$, the Fisher information becomes

$$ J(x) = \frac{2T}{2\pi} \int_{\omega=0}^{\infty} \omega^2 \frac{\| S(\omega) \|^2}{\sigma_F^2} \, df \, \left[ H_{\mathrm{TOA}}^T(t(x)) H_{\mathrm{TOA}}(t(x)) \right] \tag{6.15} $$

$$ = \frac{2T}{2\pi} \int_{\omega=0}^{\infty} \omega^2 \, \mathrm{SNR}(\omega) \, d\omega \, \left[ H_{\mathrm{TOA}}^T(t(x)) H_{\mathrm{TOA}}(t(x)) \right] \tag{6.16} $$

$$ = \mathrm{SNR} \, \frac{T}{\pi} \left( B\omega_c^2 + B^3/12 \right) \left[ H_{\mathrm{TOA}}^T(t(x)) H_{\mathrm{TOA}}(t(x)) \right]. \tag{6.17} $$

6.5 Time difference of arrival based localization

The probability density function for TDOAs is given in Eq. (5.28). The Fisher information matrix for TDOA is given by [121, 153]:

$$ J(x) = E\!\left[ \left( \frac{\partial}{\partial x} \log p(\tau; x) \right) \left( \frac{\partial}{\partial x} \log p(\tau; x) \right)^T \right] \Bigg|_{x = x_0}, \tag{6.18} $$

where $x_0$ is the true source position. The partial derivative with respect to the source position $x$ is

$$ \frac{\partial}{\partial x} \log p(\tau; x) = - \left( \frac{\partial}{\partial x} \tau(x) \right)^T \Sigma_{\mathrm{TDOA}}^{-1} \left( \hat{\tau} - \tau(x) \right), \tag{6.19} $$

where for a single TDOA the partial derivative of Eq. (2.39) is:

$$ \frac{\partial}{\partial x} \tau_m(x) = c^{-1} \left( \frac{x - r_i}{\| x - r_i \|} - \frac{x - r_j}{\| x - r_j \|} \right). \tag{6.20} $$

Following [153], the partial derivatives can be reformulated into a matrix

$$ H_{\mathrm{TDOA}}(\tau(x)) = \begin{bmatrix} \frac{\partial}{\partial x} \tau_1(x) \\ \frac{\partial}{\partial x} \tau_2(x) \\ \vdots \\ \frac{\partial}{\partial x} \tau_M(x) \end{bmatrix}. \tag{6.21} $$

The Fisher information matrix is then given by [153]:

$$ I(x) = H_{\mathrm{TDOA}}^T \Sigma_{\mathrm{TDOA}}^{-1} H_{\mathrm{TDOA}}, \tag{6.22} $$

and the Cramér-Rao lower bound is calculated using Eq. (2.17). The minimum variance that TDOA estimation can achieve is given by Eq. (6.4). The covariance matrix can be replaced by this information, which yields

$$ J(x) = \frac{1}{\sigma_{\mathrm{TDOA}}^2} H_{\mathrm{TDOA}}^T H_{\mathrm{TDOA}} = H_{\mathrm{TDOA}}^T H_{\mathrm{TDOA}} \, J(\tau), \tag{6.23} $$

since $\min \sigma_{\mathrm{TDOA}}^2 = 1/J(\tau)$, and $\mathbf{J}(\boldsymbol{\tau}) = \mathbf{I} \times J(\tau)$ due to the independence assumption.

6.6 Time of arrival based localization

The probability density function of the error is given in Eq. (5.33).
The calculation of the CRLB for TOA proceeds as previously for TDOAs. The difference is that the partial derivative in Eq. (6.20) for TOAs has the form given in Eq. (6.12). The partial derivatives are reformulated into a matrix, which has the form given in Eq. (6.14). The Fisher information matrix is then given as in Eq. (6.22) by replacing $H_{\mathrm{TDOA}}$ with $H_{\mathrm{TOA}}$, and the Cramér-Rao lower bound is calculated using Eq. (2.17).

When the partial derivatives of the TOAs are substituted into the Fisher information matrix in Eq. (6.22), and the minimum variance of the TOA estimation given in Eq. (6.9) is used as the variance for the covariance matrix, one has

$$ J(x) = H_{\mathrm{TOA}}^T \Sigma_{\mathrm{TOA}}^{-1} H_{\mathrm{TOA}} = \frac{1}{\sigma_{\mathrm{TOA}}^2} H_{\mathrm{TOA}}^T H_{\mathrm{TOA}} = J(t) \, H_{\mathrm{TOA}}^T H_{\mathrm{TOA}}, \tag{6.24} $$

which is exactly the same as Eq. (6.17). That is, in theory, localization using time of arrival estimation, the SRP-TOA function, or the MLE-S function has the same performance.

6.7 Combination of time difference and time of arrival information based localization

When the errors are independent, the covariance matrix for the combination of TOA and TDOA estimates is given as

$$ \Sigma_{\mathrm{CM}} = \mathrm{diag}(\sigma_{\mathrm{TOA}}^2, \ldots, \sigma_{\mathrm{TOA}}^2, \sigma_{\mathrm{TDOA}}^2, \ldots, \sigma_{\mathrm{TDOA}}^2), $$

where the first values are TOA variances and the rest are TDOA variances. For notational convenience, it is of use to define a measurement vector including both TOA and TDOA measurements

$$ \hat{\chi} = [\hat{\chi}_1, \hat{\chi}_2, \ldots, \hat{\chi}_{N+M}] = [\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_N, \hat{\tau}_1, \hat{\tau}_2, \ldots, \hat{\tau}_M]. $$

That is, with the 6 microphones used in this thesis, the first 6 values of the vector are TOAs and the rest are TDOAs. Then the combined design matrix is given as

$$ H_{\mathrm{CM}}(\chi(x)) = \begin{bmatrix} \frac{\partial}{\partial x} \chi_1(x) \\ \frac{\partial}{\partial x} \chi_2(x) \\ \vdots \\ \frac{\partial}{\partial x} \chi_{N+M}(x) \end{bmatrix} = \begin{bmatrix} \frac{\partial}{\partial x} t_1(x) \\ \vdots \\ \frac{\partial}{\partial x} t_N(x) \\ \frac{\partial}{\partial x} \tau_1(x) \\ \vdots \\ \frac{\partial}{\partial x} \tau_M(x) \end{bmatrix}. \tag{6.25} $$

The Fisher information matrix is then given by

$$ J(x) = H_{\mathrm{CM}}^T \Sigma_{\mathrm{CM}}^{-1} H_{\mathrm{CM}} = H_{\mathrm{CM}}^T \, \mathrm{diag}(J(t), \ldots, J(t), J(\tau), \ldots, J(\tau)) \, H_{\mathrm{CM}}. \tag{6.26} $$

6.8 Theoretical comparison

In this theoretical comparison the frequency and the temporal parameters are fixed to $\omega_c/(2\pi) = 12$ kHz, $B/(2\pi) = 24$ kHz, and $T = 0.004$ s. This corresponds to a situation where the full bandwidth at 48 kHz sampling frequency and a 4 ms time window are used in the analysis. The idea is to compare the localization methods in the same conditions.

Figure 6.1 presents the CRLB for TOA and TDOA against SNR. In addition, the CRLB for a TDOA that is calculated as the difference of two TOA estimates is presented. TOA estimation has a smaller CRLB than TDOA estimation, which is not surprising, since in TOA estimation it is assumed that both source and noise signals are known. The CRLB of the traditional TDOA estimation approaches the CRLB of the TDOA estimation which is calculated as the difference of two TOAs, as expected from their equations.

In Fig. 6.2 the CRLBs for TOA, TDOA, and CM are shown with the above parameters at location (10.5, 8.2, 2) m. The microphone array is the one given in Table 2.1 with $d_{\mathrm{spc}} = 100$ mm. As mentioned, the CRLB for the signal model is the same as the CRLB for TOA. Clearly, CM has the smallest CRLB and TOA the second smallest. Interestingly, around -25 dB, TOA and CM have the same performance. This is caused by the increase in the variance of TDOA, shown in Fig. 6.1.

Figure 6.3 shows an example of the CRLB for the x, y, and z components with TOA, TDOA, and CM data with SNR = 30 dB. It can be seen that CM has the smallest CRLB in all conditions, and TOA the second smallest. Thus it is expected that CM and TOA will perform well in the reflection localization with the given setup. In the next chapter, the theoretical performance bounds are compared with Monte-Carlo simulation results.

[Figure 6.1: $\log_{10}\{\mathrm{CRLB}(\theta)\}$ [$\log_{10}$ s$^2$] versus SNR [dB] for TDOA $\tau_{i,j}$, TOA $t_i$, and TDOA computed as $\tau_{i,j} = t_i - t_j$.]
Figure 6.1. Cramér-Rao lower bound versus signal-to-noise ratio (SNR) for TDOA and TOA.
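As a rough numerical companion to the comparison above, the scalar bounds of Eqs. (6.4) and (6.9) can be evaluated with the parameters of Section 6.8. The sketch below is not the code used for Fig. 6.1. The function names are hypothetical, and it assumes that the SNR in decibels is a power ratio, i.e. SNR = 10^(dB/10).

```python
import math

def band_term(T, B, wc):
    """Common factor (T/pi) * (B*wc^2 + B^3/12) of Eqs. (6.4) and (6.9).
    B and wc are in rad/s, T in seconds."""
    return T / math.pi * (B * wc ** 2 + B ** 3 / 12.0)

def fisher_tdoa(snr, T, B, wc):
    """Fisher information of TDOA estimation, Eq. (6.4); snr is linear."""
    return snr ** 2 / (1.0 + 2.0 * snr) * band_term(T, B, wc)

def fisher_toa(snr, T, B, wc):
    """Fisher information of TOA estimation, Eq. (6.9); snr is linear."""
    return snr * band_term(T, B, wc)

# Parameters of Section 6.8.
WC = 2.0 * math.pi * 12e3   # center frequency [rad/s]
B = 2.0 * math.pi * 24e3    # bandwidth [rad/s]
T = 0.004                   # window length [s]

def crlb_table(snr_db_values):
    """Rows (snr_db, CRLB_TOA, CRLB_TDOA), with each CRLB = 1/J in s^2."""
    rows = []
    for snr_db in snr_db_values:
        snr = 10.0 ** (snr_db / 10.0)  # assumption: dB value is a power ratio
        rows.append((snr_db,
                     1.0 / fisher_toa(snr, T, B, WC),
                     1.0 / fisher_tdoa(snr, T, B, WC)))
    return rows

for snr_db, c_toa, c_tdoa in crlb_table([-20, 0, 20, 40, 60]):
    print(f"{snr_db:4d} dB  log10 CRLB(TOA) = {math.log10(c_toa):7.2f}"
          f"  log10 CRLB(TDOA) = {math.log10(c_tdoa):7.2f}")
```

The ratio of the two bounds is (1 + 2 SNR)/SNR, which tends to 2 at high SNR. This is the variance of the difference of two independent TOA estimates, in line with the convergence described for Fig. 6.1.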
[Figure 6.2: $\log_{10}\{\mathrm{trace}(\mathrm{CRLB}(x))\}$ [$\log_{10}$ m$^2$] versus SNR [dB] for TDOA, TOA / SM, and CM.]
Figure 6.2. Cramér-Rao lower bound versus signal-to-noise ratio (SNR) for localization using TDOA, TOA, and CM at (10.5, 8.2, 2) m.

[Figure 6.3: maps of $\log_{10}\{\mathrm{CRLB}(x)\}$, $\log_{10}\{\mathrm{CRLB}(y)\}$, and $\log_{10}\{\mathrm{CRLB}(z)\}$ for TDOA, TOA, and CM data; axes X-coordinate [m] and Y-coordinate [m].]
Figure 6.3. Cramér-Rao lower bound presented for three diagonal components of Eq. (2.17) for a grid of {x, y, z}-locations (with z = 0). The signal-to-noise ratio is 30 dB. The array is depicted with a star. The maps are presented in 10-base logarithmic scale to enhance the differences between them. The color scale is the same for all maps.

7. Experiments

This chapter presents simulated and real data experiments. The performance of TOA estimation, TDOA estimation, and localization methods is investigated. The CRLB for each estimation task is also presented.

7.1 Monte-Carlo simulations

The reflection signal model in the following Monte-Carlo simulations is of exponential form

$$ s_n(t \mid t_n(x), \sigma^2) = e^{-(t - t_n(x))^2/\sigma^2}. \tag{7.1} $$

Throughout the simulations the 'variance' parameter of the reflection signal is set to $\sigma = 2/f_s$, where $f_s = 10{,}000$ Hz is the sampling frequency. The TOA $t_n(x)$ is calculated assuming the spherical wave propagation model.

Since the assumed reflection signal is exponential, the exponential fitting for the TDOA and TOA estimates and for the TDOA and TOA estimation functions, presented in [102] and in Publication V, respectively, is applied. As an example, in the case of no noise the direct cross correlation of two exponential functions is an exponential function.
This result is well known for the example with normal distributions.

7.1.1 Time difference of arrival estimation

The time difference of arrival estimation methods introduced in Section 5.2 are compared against the signal-to-noise ratio. The length of the time window is set to 4 ms in this experiment and the reflection signal in Eq. (7.1) is used. The TDOAs are randomized from a uniform distribution between -1 and 1 ms, i.e. U(−1, 1) ms. The results of 10,000 Monte-Carlo samples are presented in Fig. 7.1.

As expected, the MLE is the most robust against noise, having the smallest number of anomalous estimates. ASDF has the smallest number of anomalous estimates when SNR < 20 dB, but this is due to its limitations in the TDOA estimation. That is, the maximum TDOA error with ASDF is half of that of the other methods. The most accurate method is MLE when SNR < 60 dB. When 60 dB < SNR < 80 dB, CC and ASDF are the most accurate, and when SNR > 80 dB, ASDF is the most accurate.

As shown in Fig. 7.1, ASDF and GCC-CC achieve the CRLB when 25 dB < SNR < 75 dB. Moreover, the MSE of GCC-MLE is lower than the CRLB when 15 dB < SNR < 55 dB. This result indicates that the GCC-MLE TDOA estimation is biased. The bias is a result of the exponential fitting.

With very high SNR values the CRLB does not predict the MSE of the methods. This behaviour was also noticed in [95]. The reason for this behaviour is the truncated window size [95]. The two different windows include two different peaks that have different samples [95]. A true zero delay value can therefore only be achieved with autocorrelation and zero noise level.

Direct cross correlation (CC) is the most reasonable selection for TDOA estimation for reflection localization since it does not require a priori information about the noise as MLE does. Moreover, CC performs well when compared to the other methods, and the calculation is straightforward and computationally light.
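The noise-free case of this experiment can be sketched as follows. The code generates two reflection signals according to Eq. (7.1) with the simulation parameters of Section 7.1, computes the direct cross correlation in the time domain, and refines the maximum with a three-point exponential fit. It is a minimal illustration and not the simulation code of the thesis: the function names, the chosen TOAs, and the lag range of ±1 ms are assumptions.

```python
import math

FS = 10000.0      # sampling frequency of Section 7.1 [Hz]
SIGMA = 2.0 / FS  # width parameter of Eq. (7.1) [s]
WINDOW = 0.004    # analysis window [s], i.e. 40 samples

def reflection(toa, n_samples):
    """Sampled reflection signal of Eq. (7.1) with time of arrival toa [s]."""
    return [math.exp(-((n / FS - toa) / SIGMA) ** 2) for n in range(n_samples)]

def cross_correlation(h1, h2, max_lag):
    """Direct cross correlation r[m] = sum_n h1[n + m] * h2[n] for lags
    m = -max_lag..max_lag; samples outside the window count as zero."""
    length = len(h1)
    r = []
    for m in range(-max_lag, max_lag + 1):
        acc = 0.0
        for n in range(length):
            if 0 <= n + m < length:
                acc += h1[n + m] * h2[n]
        r.append(acc)
    return r

def estimate_tdoa(h1, h2, max_lag):
    """TDOA t1 - t2 in seconds: maximum of the cross correlation, refined
    by fitting a*exp(-b*(tau - c)**2) to the peak and its two neighbors."""
    r = cross_correlation(h1, h2, max_lag)
    k = max(range(1, len(r) - 1), key=lambda i: r[i])
    y0, y1, y2 = r[k - 1], r[k], r[k + 1]
    offset = 0.0
    if min(y0, y1, y2) > 0.0:  # the log-domain fit needs positive values
        l0, l1, l2 = math.log(y0), math.log(y1), math.log(y2)
        denom = l0 - 2.0 * l1 + l2
        if denom != 0.0:
            offset = 0.5 * (l0 - l2) / denom
    return (k - max_lag + offset) / FS

n_samples = int(round(WINDOW * FS))
h1 = reflection(0.00210, n_samples)  # hypothetical TOA at microphone 1
h2 = reflection(0.00176, n_samples)  # hypothetical TOA at microphone 2
tdoa = estimate_tdoa(h1, h2, max_lag=10)  # lags within +/- 1 ms
print("estimated TDOA [ms]:", 1e3 * tdoa)
```

Because the cross correlation of two exponential pulses is again exponential, the fit recovers the sub-sample TDOA very closely when no noise is present. With additive noise the correlation values near the peak can become non-positive, in which case this sketch falls back to the integer lag.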
7.1.2 Time of arrival estimation

The time of arrival estimation methods introduced in Section 5.3 are tested against the signal-to-noise ratio. The length of the time window is set to 4 ms. The TOAs are randomized from a uniform distribution between -1 and 1 ms, i.e. U(−1, 1) ms. The results of 10,000 Monte-Carlo samples are presented in Fig. 7.2. The simple peak picking method is denoted by $\arg\max\{h(t)\}$ in the results of Fig. 7.2.

ASDF and CC are the most accurate methods for the TOA estimation. MLE is the most robust against noise, but loses accuracy because the exponential fit does not describe the shape of the MLE function. The peak picking method, which does not require any a priori knowledge about the source signal or the noise signal, performs in general better than PHAT and has smaller variance than MLE when SNR > 20 dB. As in TDOA estimation, the maximum TOA errors for ASDF are half of the maximum error of the other methods.

[Figure 7.1, panels: (a) Variance, $\log_{10}(\mathrm{MSE}(\tau))$ [$\log_{10}$ s$^2$] versus SNR [dB] for GCC-PHAT, GCC-CC, GCC-MLE, ASDF, and the CRLB; (b) Anomaly %, estimates with $|\hat{\tau} - \tau| > 1/f_s$, versus SNR [dB].]
Figure 7.1. Results for TDOA estimation against signal-to-noise ratio.

As shown in Fig. 7.2, ASDF and AC-CC achieve the CRLB when 15 dB < SNR < 75 dB. When SNR < 15 dB, the estimation is saturated, as the large number of anomalies suggests. As with the TDOA estimation, the MSEs of the methods do not achieve the CRLB with very high SNR values. The explanation for this behaviour is the same as earlier for TDOA estimation.

The TOA estimation with GCC-MLE is not realistic, since it would require the knowledge of both source and noise signals. Here, the focus is on the blind methods that do not require a priori information.
Since the peak picking method is the only blind method and has a performance that is comparable to the other methods, it is the most reasonable choice in the general case for the estimation of TOAs.

[Figure 7.2, panels: (a) Variance, $\log_{10}(\mathrm{MSE}(t))$ [$\log_{10}$ s$^2$] versus SNR [dB] for AC-PHAT, AC-CC, AC-MLE, AC-ASDF, $\arg\max\{|h(t)|\}$, and the CRLB; (b) Anomaly %, estimates with $|\hat{t} - t| > 1/f_s$, versus SNR [dB].]
Figure 7.2. Results for TOA estimation against signal-to-noise ratio.

7.1.3 Localization

Nine different localization methods are tested. In detail, SRP, MLE, and PL with TOA, TDOA, and CM data are used for localization of reflections. The formulation for the methods is given in Section 5. Direct cross correlation and direct peak picking methods with exponential fitting provided in Sections 3 and 4 are used for TDOA and TOA estimation, respectively. Since MLE-S will lead to the same localization result as SRP-TDOA, as shown in [151], it is not tested here.

The reflection location is drawn 1,000 times from a 3-D uniform distribution between -20 and 20 m, i.e. x ∼ U(−20, 20) m, y ∼ U(−20, 20) m, and z ∼ U(−20, 20) m. The microphone array is set to (0,0,0) and the reflection signal is windowed with a 4 ms time window around the TOA between the reflection location and (0,0,0). The reflection signal model is the one presented in Eq. (7.1). The location is searched from the localization function using the Nelder-Mead simplex method implemented in MATLAB's fminsearch. The initial location value for the optimization method is set to the vicinity of the true location.

Optimization of the parameters

The weighting parameters for the combined methods are optimized. The question is which weight produces the best result for each method. For MLE the weighting factor $\kappa$ is defined as the relation between the TOA and TDOA variance, as

$$ \kappa = \frac{\sigma_{\mathrm{TDOA}}}{\sigma_{\mathrm{TOA}}}. \tag{7.2} $$
This selection sets the following limits, as shown in Publication II:

$$ \lim_{\kappa \to \infty} P_{\text{MLE-CM}}(x) = P_{\text{MLE-TOA}}(x), \tag{7.3} $$

$$ \lim_{\kappa \to 0} P_{\text{MLE-CM}}(x) = P_{\text{MLE-TDOA}}(x). \tag{7.4} $$

For PL-CM and SRP-CM the weighting is limited to $0 < W < 1$. This gives the following obvious limits for the SRP-CM function

$$ \lim_{W \to 0} P_{\text{SRP-CM}}(x) = P_{\text{SRP-TOA}}(x), \tag{7.5} $$

$$ \lim_{W \to 1} P_{\text{SRP-CM}}(x) = P_{\text{SRP-TDOA}}(x), \tag{7.6} $$

and for PL-CM

$$ \lim_{W \to 0} \lambda_{\text{PL-CM}}(x) = \log\{P_{\text{PL-TOA}}(x)/N\}, \tag{7.7} $$

$$ \lim_{W \to 1} \lambda_{\text{PL-CM}}(x) = \log\{P_{\text{PL-TDOA}}(x)/M\}. \tag{7.8} $$

The weight factor $\kappa$ is varied over $\log_{10}\{\kappa\} = -10, \ldots, 10$. For MLE-CM the variance of the TOA error is set to $\sigma_{\mathrm{TOA}}^2 = 1$, and the variance of the TDOA error is altered as $\sigma_{\mathrm{TDOA}} = \kappa \sigma_{\mathrm{TOA}}$. The weight for SRP-CM and PL-CM is $0 < W < 1$, and it is calculated as $W = 1/(10^{\kappa} + 1)$, where the exponent is the value of $\log_{10}\{\kappa\}$, as in the example $W = 1/(10^{-2} + 1)$ used later in this section.

The results of this experiment are shown in Fig. 7.3. Also shown is the performance of the TOA and TDOA based methods. All the combined methods achieve the same performance with some weighting. As shown in Fig. 7.3, the optimal weight for SRP-CM is $W \in \left( \frac{1}{10^{-6} + 1}, \frac{1}{10^{-1.2} + 1} \right)$, for PL-CM $W \in \left( \frac{1}{10^{-7} + 1}, \frac{1}{10^{-1.4} + 1} \right)$, and for MLE-CM $\log_{10}\{\kappa\} \in (-3.4, -0.2)$.

A reasonable choice for the MLE-CM weighting factor is $\log_{10}(\kappa) = -2$ since it is close to the middle region of the optimal values. For MLE-CM this means that TOA variance is about 100 times the TDOA variance. For SRP-CM and PL-CM the value $W = 1/(10^{-2} + 1) = 0.99$ for the weight is a good choice because it is in the middle region of the optimal values. This means that SRP-TOA has a weight of 0.01 and SRP-TDOA has a weight of 0.99. The same applies for PL-TOA and PL-TDOA in the PL-CM method. These optimized values are used in the following experiments.

Simulation results for localization methods

As can be seen from Fig. 7.4, the CM-based methods have the smallest RMSE, MLE-CM having the smallest and SRP-CM the highest out of the combined data methods. At 15 dB MLE-CM has smaller RMSE than MLE-CM.
This is due to the fact that at 15 dB, the probability of an anomalous estimate grows quite large for the TDOA estimation, which is weighted heavily in SRP-CM and PL-CM.

[Figure 7.3, panels: (a) Mean squared error, $\log_{10}\{\mathrm{MSE}(x)\}$ [m$^2$] versus weight $\log_{10}\{\kappa\}$ for PL-TDOA, SRP-TDOA, MLE-TDOA, PL-TOA, SRP-TOA, MLE-TOA, PL-CM, SRP-CM, and MLE-CM; (b) Details of (a).]
Figure 7.3. Optimization results against weighting parameter κ with signal-to-noise ratio of 60 dB and with 1,000 Monte-Carlo samples for each SNR condition.

TOA based methods have clearly smaller RMSE than TDOA based methods. Again, MLE-TOA has the smallest RMSE and SRP-TOA the highest out of the TOA-based methods. The results thus indicate that combining TOA and TDOA data is advantageous in the localization of reflections in the current framework. Moreover, methods based only on TDOA information do not perform well in the reflection localization task with the given setup.

As shown in Fig. 7.4, the methods achieve the CRLB for TOA but not the CRLB for CM. This is due to the selection of the TOA and TDOA estimation methods. Since no a priori information of the source signal or the noise signal is used in the localization, the CRLB-CM cannot be achieved. As with TOA and TDOA estimation, the CRLB is best achieved when 15 dB < SNR < 75 dB. It is evident from the results that combining the TOA and TDOA estimation benefits the localization, since without any a priori information, the same performance can be achieved as if the source and noise signals were known.

7.2 Real data experiments

Real data experiments were conducted in Lahti concert hall. The measurement setup is depicted in Fig. 7.5. One source location on the stage and one receiver location in the audience area were used.
The loudspeaker on the stage was of type Genelec 1029A, and the G.R.A.S. microphone array, introduced in Section 2.2, with $d_{\mathrm{spc}} = 100$ mm spacing, was used in the receiver location. The height of the loudspeaker and the microphone array from the stage level was about 1.2 m and 1.0 m, respectively. The sampling frequency was set to 48 kHz in the measurements. The impulse responses were measured using the sine-sweep technique with a 6 s long source signal with bandwidth from 40 Hz to 24 kHz.

Three reflections are windowed from the room impulse responses based on the source and receiver positions and the geometry of the hall. The estimated traces of the reflections are shown in Fig. 7.5. The time domain signals and frequency responses of the reflections in microphone no. 1 (-x direction) of the microphone array are shown in Fig. 7.6. The first two reflections, illustrated in Fig. 7.6, are from the curved side walls. The third reflection is a second order reflection via the same curved walls and it is already disturbed by another reflection arriving 1.2 ms before it. This shows as emphasized low frequency content in the signal.

[Figure 7.4, panels: (a) Mean squared error, $\log_{10}\{\mathrm{MSE}(x)\}$ [m$^2$] versus SNR [dB] for PL-TDOA, PL-TOA, PL-CM, SRP-TDOA, SRP-TOA, SRP-CM, MLE-TDOA, MLE-TOA, MLE-CM, CRLB-TDOA, CRLB-TOA, and CRLB-CM; (b) Anomaly %, estimates with $\|\hat{x} - x\| > 1$, versus SNR [dB].]
Figure 7.4. Results for localization against signal-to-noise ratio (SNR) from 1,000 Monte-Carlo samples.

[Figure 7.5: floor plan with x and y axes in meters, showing the Genelec 1029A loudspeaker and the microphone array.]
Figure 7.5. The setup in the real experiments. A loudspeaker of type Genelec 1029A is located in the stage area, and the G.R.A.S. microphone array in the audience area. The reflections used in the experiments are illustrated with lines.

[Figure 7.6, rows: direct sound, left wall reflection, right wall reflection, reflection via left and right wall; left column SPL [dB] versus 1/3-octave band center frequency [Hz], right column time signal versus time [s].]
Figure 7.6. Time domain presentation of the reflections (right) and their frequency responses (left) on 1/3-octave bands. Please note that the y-axis scaling changes between the subplots and the sound pressure levels are relative. A typical impulse response of Genelec 1029A has two peaks and is visible in all the time domain impulses.

As can be seen from Fig. 7.6, the wall reflections have quite a similar shape to the direct sound. This is due to the fact that the directionality of the loudspeaker stays similar in the frontal plane and the wall materials are highly reflective. Namely, the curved side walls in the inner stalls, characteristic of the Lahti concert hall, are of painted concrete, which has a reflection coefficient of about 0.99 over the audible frequencies [43].

The measurement noise is removed from the impulse responses using the spectral subtraction method [215]. The spectral subtraction will not benefit the localization accuracy. The spectral subtraction is made so that it can be assumed that the noise level is 0, and the SNR can be calculated in a more precise manner. The localization result after the spectral subtraction is chosen as the reference in these experiments. White noise is added to the clean signals as earlier in the simulations. The setup corresponds to the situation that was simulated earlier in this chapter; the difference is that here the reflection signals are measured in a real situation.
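Adding white noise at a prescribed SNR can be sketched as follows. The SNR definition used here, the ratio of the mean signal power to the mean noise power over the analysis window, is an assumption made for this illustration, as are the function names and the test signal. The text above does not state which definition was used in the experiments.

```python
import math
import random

def mean_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def add_white_noise(signal, snr_db, rng):
    """Return signal plus white Gaussian noise, with the noise scaled so
    that 10*log10(mean_power(signal) / mean_power(noise)) equals snr_db.
    Assumption: SNR is a window-averaged power ratio."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * target))
    return [s + gain * n for s, n in zip(signal, noise)]

# Hypothetical clean signal standing in for a windowed reflection.
rng = random.Random(0)
clean = [math.sin(0.3 * n) for n in range(200)]
noisy = add_white_noise(clean, 30.0, rng)

residual = [a - b for a, b in zip(noisy, clean)]
measured_snr_db = 10.0 * math.log10(mean_power(clean) / mean_power(residual))
print("realized SNR [dB]:", measured_snr_db)
```

Scaling by the power of the drawn noise realization, rather than by its nominal variance, makes the realized SNR match the target in every Monte-Carlo draw. Whether that is preferable to a fixed noise variance depends on how the SNR axis of the result figures is meant to be read.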
7.2.1 Results

The localization results for the real reflections are shown in Fig. 7.7. Overall, the performance is clearly worse in the real situation than in the simulated situation. This is due to the fact that the real signals are not as easily localized as the simulated ones, since their frequency content is not constant and they include several peaks instead of a single peak. This makes the TOA-based localization especially difficult. The secondary peak in the reflection signal causes the localization to vary between several locations. This is visible as an increase in the RMSE in Fig. 7.7, when 35 dB < SNR < 70 dB.

The real experiments reveal the weaknesses of MLE-TOA. When SNR < 70 dB, MLE-TOA has worse performance than the other TOA-based localization methods. This is due to the fact that the time domain impulse response has two peaks. In the TOA estimation, only the maximum is selected. Since both of the peaks are almost equally strong, it is very probable that the wrong one is selected when additive noise is present.

MLE-CM and PL-CM have the best performance in the localization of real reflections. SRP-CM has clearly worse performance than the other combined methods when SNR < 70 dB. The reason for the weak performance of SRP-CM is thought to be the fact that the competing maxima in the TOA estimation functions induce even more ghosts to the localization functions than with a single peak.

7.3 Discussion

The errors in the localization with all the methods might be caused by other acoustic phenomena, for example the diffraction from chairs in the enclosure.

The case where there is more than one reflection present within one analysis window was not studied in this thesis. In principle, it is the same problem as the multi-source localization problem, and some of the methods used for that problem, e.g. the one presented in [209], should also be applicable for this problem.
The MLE method presented in this thesis cannot be directly applied to the multi-reflection localization problem. However, the PL method is directly applicable. Therefore, the PL method is preferred in future work.

Since SRP-CM adds the squared impulse responses and TDOA estimation functions, it is possible that the true reflection location gets less evidence than a "ghost" or a competing reflection. This behavior is also recognized in speech source localization [108]. The problem is not present in the PL-CM method, as shown in Fig. 5.1, since the ghosts are effectively downsized. Therefore, PL-CM outperforms SRP-CM in real situations.

One reason for the anomalous estimates with all the methods is that the arriving sound wave from the direction of the reflection is not as "impulse-like" as the sound wave in front of the loudspeaker. Thus, the magnitude of the emitted sound wave in the direction of the reflections is lower, and does not contain as much high frequency energy as the impulse in front of the loudspeaker. Moreover, the impulse response of the loudspeaker consists of two impulses instead of one, as shown in Fig. 7.6. In this case, the reflections do not introduce sharp peaks in the localization function with the TOA methods, and the intersection of the spheres is "blurred".

By analyzing the two impulses of the loudspeaker impulse response with linear filtering, it is revealed that the first peak consists of frequencies that are above approximately 3.3 kHz, which is the cut-off frequency between loudspeaker elements, and the second peak of the frequencies below 3.3 kHz.
Thus, the lower frequencies arrive about 0.3 ms later than the high frequencies, in front of the loudspeaker.

[Figure 7.7, panels: (a) Mean squared error, $\log_{10}\{\mathrm{MSE}(x)\}$ [m$^2$] versus SNR [dB]; (b) Anomaly %, estimates with $\|\hat{x} - x\| > 1$, versus SNR [dB]; curves for PL-TDOA, SRP-TDOA, MLE-TDOA, PL-TOA, SRP-TOA, MLE-TOA, PL-CM, SRP-CM, and MLE-CM.]
Figure 7.7. Results for localization against signal-to-noise ratio (SNR) with the real reflection signals.

The two-peaked impulse response of Genelec 1029A is caused by two issues. Firstly, the loudspeaker consists of two elements that are separated by approximately 10 cm. This causes some differences in the delays for low and high frequencies, depending on the direction of the loudspeaker with respect to the microphone. Secondly, the low-frequency element of the loudspeaker has a higher mass and therefore does not respond to the voltage changes in the coil as quickly as the tweeter, which causes the low frequencies to be delayed.

All of the above makes accurate localization of the loudspeaker quite difficult using only TOA information. One can also ask: what, then, is the location of a two-way loudspeaker? The methods in this thesis assume that it is the acoustic center of the loudspeaker.

One possibility to get around the above problems related to the loudspeaker non-idealities is to use only the phase information of the signal. However, this decreases the SNR in the frequencies that have a low magnitude and as a result decreases the performance, as seen in the simulations with PHAT, which uses only the phase information.

Another possibility to obtain more accurate TOA information is to measure the impulse response of the loudspeaker to a grid of directions in free-field conditions.
The loudspeaker response can then be removed from the measured impulse response by deconvolving the reflection with the free-field impulse response measured in the corresponding direction. This, however, would require a large database of a priori measurements of the loudspeaker. The accuracy could be further improved by using a one-way loudspeaker. TOA estimation can also be improved by applying the sparse impulse response technique presented in Publication I. The more directional the loudspeaker is, the better the TOA estimation accuracy when the sparse impulse response technique is used.

8. Summary

This thesis presented techniques for localizing early reflections from room impulse responses. A measurement technique for the investigation of early reflections was proposed and studied. Several localization methods were proposed. The performance of the localization methods was studied in theory, in simulations, and in realistic situations.

8.1 Main results

The main results of this thesis can be summarized as follows:

• When studying early reflections, a directional loudspeaker should be preferred because it provides better spatial and temporal separation between the reflections. The more directional the loudspeaker is, the better the separability.
• A one-way loudspeaker is preferred in the localization of early reflections. Each element in the loudspeaker produces a peak in the impulse response. Therefore, multi-element loudspeakers cause multi-peaked impulses in many cases, which complicates the localization.
• Localization of the reflections should use both the time of arrival and the time difference of arrival information. The combination of these two pieces of information was shown to provide better performance than either the time of arrival or the time difference of arrival alone.
• Simple direct cross-correlation and peak-picking are adequate methods for TDOA estimation and TOA estimation, respectively, in reflection localization. Although better performing methods exist for both TDOA and TOA estimation, they require a priori knowledge of the source or of the noise signals.
• Localization methods that use pressure signals directly should be preferred over sound intensity vector based methods in the reflection localization task.
• Maximum pseudo-likelihood and maximum likelihood estimation methods should be preferred over steered response power methods in the localization of reflections, since they perform better. The lower performance of the steered response power methods is attributed to the ghosts in the localization functions.
• Interpolation is needed to achieve better spatial resolution. The proposed interpolation method was found to provide a clear improvement over the baseline method. The method is based on assuming the shape of the local maxima of the time difference of arrival or time of arrival estimation functions.
• In addition to room impulse responses, it is possible to localize reflections from speech or other continuous signals without any a priori knowledge of the source signal. The localization of a reflection performs worse with speech sources than with impulse responses, since the signal-to-noise ratio is typically lower for speech than for impulse responses.

8.2 Future work

Future work in the area of reflection localization includes:

• The development of an algorithm that can deal with multiple reflections arriving during the same time window. This thesis considered the case in which a single reflection arrives during a short time window.
• Theoretical performance of the localization of reflections when directional loudspeakers are used.
• The use of superdirectional microphone arrays along with superdirectional loudspeakers should be investigated.
This could be applied, for example, in the in-situ measurement of absorption coefficients.

Bibliography

[1] J. Merimaa and V. Pulkki. Spatial Impulse Response Rendering I: Analysis and Synthesis. The Journal of the Audio Engineering Society, 53(12):1115–1127, 2005.
[2] V. Pulkki and J. Merimaa. Spatial impulse response rendering II: Reproduction of diffuse sound and listening tests. The Journal of the Audio Engineering Society, 54(1-2):3–20, 2006.
[3] D. Aprea, F. Antonacci, A. Sarti, and S. Tubaro. Acoustic reconstruction of the geometry of an environment through acquisition of a controlled emission. In 17th European Signal Processing Conference, pages 710–714, 2009.
[4] F. Antonacci, A. Sarti, and S. Tubaro. Geometric Reconstruction of the Environment from its Response to Multiple Acoustic Emissions. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2822–2825, 2010.
[5] B. Günel. Room shape and size estimation using directional impulse response measurements. In 3rd EAA Congress on Acoustics, Forum Acusticum, 2002.
[6] E. Mommertz. Angle-dependent in-situ measurements of reflection coefficients using a subtraction technique. Applied Acoustics, 46(3):251–263, 1995.
[7] C. Nocke. In-situ acoustic impedance measurement using a free-field transfer function method. Applied Acoustics, 59(3):253–264, 2000.
[8] ISO Standard 3382-1. Acoustics – measurement of room acoustic parameters – part 1: Performance spaces, 2009.
[9] M.R. Schroeder. Statistical parameters of the frequency response curves of large rooms. The Journal of the Audio Engineering Society, 35(5):299–305, 1987.
[10] J.M. Jot, L. Cerveau, and O. Warusfel. Analysis and synthesis of room reverberation based on a statistical time-frequency model. In 103rd Audio Engineering Society Convention, 1997. Paper number 4629.
[11] H. Kuttruff. Room acoustics, 4th Ed. Spon Press, NY, NY, USA, 2000.
[12] J. Merimaa.
Analysis, Synthesis, and Perception of Spatial Sound–Binaural Localization Modeling and Multichannel Loudspeaker Reproduction. PhD thesis, Helsinki University of Technology, 2006.
[13] A. O’Donovan, R. Duraiswami, and D. Zotkin. Imaging concert hall acoustics using visual and audio cameras. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 5284–5287, 2008.
[14] T. Lokki, L. Savioja, et al. Auralization, url: auralization.tkk.fi, (Last accessed) July 2011.
[15] T. Korhonen. Acoustic Source Localization Utilizing Reflective Surfaces. PhD thesis, Tampere University of Technology, 2010.
[16] T. Lokki, H. Vertanen, A. Kuusinen, J. Pätynen, and S. Tervo. Concert hall acoustics assessment with individually elicited attributes. The Journal of the Acoustical Society of America, 130(2):835–849, Aug. 2011.
[17] S.M. Kay. Fundamentals of Statistical signal processing, Volume 1: Estimation theory. Prentice-Hall, New Jersey, USA, 1998.
[18] J. Fox. Applied regression analysis, linear models, and related methods. Sage Publications, Inc, London, UK, 1997.
[19] A. Høst-Madsen. On the existence of efficient estimators. IEEE Transactions on Signal Processing, 48(11):3028–3031, 2000.
[20] F. Jacobsen. Springer handbook of acoustics, chapter 25 Sound Intensity, pages 1053–1075. Springer, NY, NY, USA, 2007.
[21] F. Fahy. Sound intensity (2nd ed.). E&FN Spon, Chapman & Hall, London, UK, 1995.
[22] A.D. Pierce. Acoustics: an introduction to its physical principles and applications. Acoustical Society of America, 1994.
[23] A.D. Pierce. Springer handbook of acoustics, chapter 3. Basic Linear Acoustics. New York: Springer, 2007.
[24] H.E. de Bree. An overview of microflown technologies. Acta Acustica united with Acustica, 89(1):163–172, 2003.
[25] C. Knapp and G. Carter. The generalized correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(4):320–327, 1976.
[26] G.C. Carter.
Coherence and time delay estimation. Proceedings of the IEEE, 75(2):236–255, 1987.
[27] B. Yang and J. Scheuing. Cramer-rao bound and optimum sensor array for source localization from time differences of arrival. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages 961–964, 2005.
[28] R. Hickling and A. W. Brown. Determining the direction to a sound source in air using vector sound-intensity probes. The Journal of the Acoustical Society of America, 129(1):219–224, 2011.
[29] M. Kallinger, F. Kuech, R. Schultz-Amling, G. del Galdo, J. Ahonen, and V. Pulkki. Enhanced Direction Estimation Using Microphone Arrays for Directional Audio Coding. In Hands-Free Speech Communication and Microphone Arrays, pages 45–48, 2008.
[30] J. Pätynen and T. Lokki. Directivities of Symphony Orchestra Instruments. Acta Acustica united with Acustica, 96(1):138–167, 2010.
[31] ISO Standard 3745-1. Determination of sound power levels of noise sources using sound pressure – Precision methods for anechoic and hemi-anechoic rooms, 2009.
[32] J.D. Maynard, E.G. Williams, and Y. Lee. Nearfield acoustic holography: I. Theory of generalized holography and the development of NAH. The Journal of the Acoustical Society of America, 78:1395–1413, 1985.
[33] W.A. Veronesi and J.D. Maynard. Nearfield acoustic holography (NAH) II. Holographic reconstruction algorithms and computer implementation. The Journal of the Acoustical Society of America, 81:1307–1322, 1987.
[34] K. Yang-Hann. Springer Handbook of Acoustics, chapter 26. Acoustic Holography. Springer-Verlag, New York, NY, USA, 2007.
[35] M. McPherson M.A. Breazeale. Springer handbook of Acoustics, (Ed. Rossing, Thomas D.), chapter 6. Physical Acoustics, pages 209–237. Springer, New York, NY, USA, 2007.
[36] G.S.K. Wong. Springer handbook of acoustics, chapter 24 Microphones and Their Calibration, pages 1024–1048. Springer, NY, NY, USA, 2007.
[37] T. Lokki.
Physically-based Auralization-Design, Implementation, and Evaluation. PhD thesis, Helsinki University of Technology, 2002.
[38] C.M. Harris. Absorption of sound in air versus humidity and temperature. The Journal of the Acoustical Society of America, 40:148–159, 1966.
[39] J.B. Allen and D.A. Berkley. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4):943–950, 1979.
[40] Y. Liu and F. Jacobsen. Measurement of absorption with a pu sound intensity probe in an impedance tube. The Journal of the Acoustical Society of America, 118:2117–2120, 2005.
[41] W.T. Chu. Transfer function technique for impedance and absorption measurements in an impedance tube using a single microphone. Journal of the Acoustical Society of America, 80(2):555–560, 2010.
[42] P. Robinson and N. Xiang. On the subtraction method for in-situ reflection and diffusion coefficient measurements. Journal of the Acoustical Society of America, Express Letters, pages EL99–EL104, 2010.
[43] T.J. Cox and P. D’Antonio. Acoustic absorbers and diffusers: theory, design, and application. London and New York: Spon Press, 2004.
[44] R.R. Torres, U.P. Svensson, and M. Kleiner. Computation of edge diffraction for more accurate room acoustics auralization. The Journal of the Acoustical Society of America, 109:600–610, 2001.
[45] B.I. Dalenbäck, M. Kleiner, and P. Svensson. A macroscopic view of diffuse reflection. Journal of the Audio Engineering Society, 42(10):793–807, 1994.
[46] M. Vorländer and E. Mommertz. Definition and measurement of random-incidence scattering coefficients. Applied Acoustics, 60(2):187–199, 2000.
[47] T.J. Cox, B.I.L. Dalenback, P. D’Antonio, J.J. Embrechts, J.Y. Jeon, E. Mommertz, and M. Vorlander. A tutorial on scattering and diffusion coefficients for room acoustic surfaces. Acta Acustica united with Acustica, 92(1):1–15, 2006.
[48] B. Rafaely. Spatial-temporal correlation of a diffuse sound field.
The Journal of the Acoustical Society of America, 107:3254–3258, 2000.
[49] R.V. Waterhouse. Statistical Properties of Reverberant Soundfields. The Journal of the Acoustical Society of America, 43:1436–1443, 1968.
[50] C.T. Morrow. Point-to-point correlation of sound pressures in reverberation chambers. Journal of Sound and Vibration, 16(1):29–42, 1971.
[51] H. Nelisse and J. Nicolas. Characterization of a diffuse field in a reverberant room. The Journal of the Acoustical Society of America, 101(6):3517–3524, 1997.
[52] W.K. Blake and R.V. Waterhouse. The use of cross-spectral density measurements in partially reverberant sound fields. Journal of Sound and Vibration, 54(4):589–599, 1977.
[53] M. Kuster. Spatial correlation and coherence in reverberant acoustic fields: Extension to microphones with arbitrary first-order directivity. The Journal of the Acoustical Society of America, 123:154–162, 2008.
[54] F. Jacobsen and T. Roisin. The coherence of reverberant sound fields. The Journal of the Acoustical Society of America, 108:204–210, 2000.
[55] J. Ahonen and V. Pulkki. Diffuseness estimation using temporal variation of intensity vectors. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 285–288, 2009.
[56] O. Thiergart, G. Del Galdo, and E.A.P. Habets. Diffuseness estimation with high temporal resolution via spatial coherence between virtual first-order microphones. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 217–220, 2011.
[57] P. Billingsley. Probability and measure. John Wiley & Sons, New York, NY, USA, 1995.
[58] J.-D. Polack. La transmission de l’énergie sonore dans les salles. PhD thesis, Université du Maine, Le Mans, 1988.
[59] T. Hidaka, Y. Yamada, and T. Nakagawa. A new definition of boundary point between early reflections and late reverberation in room impulse responses. The Journal of the Acoustical Society of America, 122:326–332, 2007.
[60] J.-D. Polack.
Modifying chambers to play billiards, the foundations of reverberation theory. Acustica, 76(6):257–272, 1992.
[61] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen. Creating interactive virtual acoustic environments. The Journal of the Audio Engineering Society, 47(9):675–705, 1999.
[62] K. Meesawat and D. Hammershøi. An investigation on the transition from early reflections to a reverberation tail in a BRIR. In International Conference on Auditory Display, 2002.
[63] R. Stewart and M. Sandler. Statistical Measures of Early Reflections of Room Impulse Responses. In 10th International Conference on Digital Audio Effects (DAFx-07), pages 59–62, 2007.
[64] G. Defrance and J.-D. Polack. Measuring the mixing time in auditoria. In Acoustics 08, pages 3869–3874, 2008.
[65] G. Defrance, L. Daudet, and J.-D. Polack. Detecting arrivals within room impulse responses using matching pursuit. In 11th International Conference on Digital Audio Effects (DAFx-08), pages 1–4, 2008.
[66] G. Defrance, L. Daudet, and J.-D. Polack. Using matching pursuit for estimating mixing time within room impulse responses. Acta Acustica united with Acustica, 95(6):1082–1092, 2009.
[67] E. Lehmann and A. Johansson. Diffuse reverberation model for efficient image-source simulation of room impulse responses. IEEE Transactions on Audio, Speech, and Language Processing, 17(8), 2009.
[68] C. Borß. A Novel Approach for Optimally Matching a Late Reverberation Model to an Image Source Model-Or: What Does a Football Have to Do With Shoebox Shaped Rooms? In EAA Symposium on Auralization, 2009.
[69] G. Kearney, C. Masterson, S. Adams, and F. Boland. Towards Efficient Binaural Room Impulse Response Synthesis. In EAA Symposium on Auralization, 2009.
[70] G. Defrance and J.-D. Polack. Estimating the mixing time of concert halls using the eXtensible Fourier Transform. Applied Acoustics, 71:777–792, 2010.
[71] Cheol-Ho Jeong, Jonas Brunskog, and Finn Jacobsen.
Room acoustic transition time based on reflection overlap. The Journal of the Acoustical Society of America, 127(5):2733–2736, 2010.
[72] A. Lindau, L. Kosanke, and S. Weinzierl. Perceptual Evaluation of Physical Predictors of the Mixing Time in Binaural Room Impulse Responses. In 128th Audio Engineering Society Convention, 2010. Paper number 8089.
[73] Alexis Billon and Jean-Jacques Embrechts. Numerical evidence of mixing in rooms using the free path temporal distribution. The Journal of the Acoustical Society of America, 130(3):1381–1389, 2011.
[74] A.C. Gade. Springer handbook of Acoustics, (Ed. Rossing, T.D.), chapter 9. Acoustics in Halls for Speech and Music, pages 301–353. Springer, New York, NY, USA, 2007.
[75] K.H. Kuttruff. Auralization of impulse responses modeled on the basis of ray-tracing results. The Journal of the Audio Engineering Society, 41:876–876, 1993.
[76] L.L. Beranek. Concert hall acoustics–1992. The Journal of the Acoustical Society of America, 92:1–39, 1992.
[77] T. Hidaka, L.L. Beranek, and T. Okano. Interaural cross-correlation, lateral fraction, and low-and high-frequency sound levels as measures of acoustical quality in concert halls. The Journal of the Acoustical Society of America, 98:988–1007, 1995.
[78] L. Cremer, H.A. Müller, and T.D. Northwood. Principles and applications of room acoustics. Applied Science, London, UK, 1982.
[79] W. Reichardt and U. Lehmann. Raumeidruck als oberbegriff von raumlichkeit und halligkeit. Acoustica, 40:174–183, 1978.
[80] J. Abel and P. Huang. A Simple, Robust Measure of Reverberation Echo Density. In 121st Audio Engineering Society Convention, 2006. Paper number 6985.
[81] M. Kuster. Reliability of estimating the room volume from a single room impulse response. The Journal of the Acoustical Society of America, 124:982–993, 2008.
[82] A. Farina and R. Ayalon. Recording concert hall acoustics for posterity.
In 24th Audio Engineering Society Conference on Multichannel Audio, Banff, Canada, pages 26–28, 2003.
[83] A. Farina, A. Capra, L. Conti, P. Martignon, and F.M. Fazi. Measuring spatial impulse responses in concert halls and opera houses employing a spherical microphone array. In 19th International Congress on Acoustics (ICA), Madrid, 2007. Paper number RBA-07-010.
[84] A. Farina, P. Martignon, A. Capra, and S. Fontana. Measuring impulse responses containing complete spatial information. In 22nd Audio Engineering Society UK Conference, 2007.
[85] A. Farina, A. Amendola, A. Capra, and C. Varani. Spatial analysis of room impulse responses captured with a 32-capsules microphone array. In 130th Audio Engineering Society Convention, London, 13-16 May 2011, 2011. Paper number 8400.
[86] M. Kuster, D. de Vries, E.M. Hulsebos, and A. Gisolf. Acoustic imaging in enclosed spaces: Analysis of room geometry modifications on the impulse response. The Journal of the Acoustical Society of America, 116:2126–2137, 2004.
[87] M. Kuster and D. de Vries. Modelling and Order of Acoustic Transfer Functions Due to Reflections from Augmented Objects. EURASIP Journal on Advances in Signal Processing, 2007. Article ID 30253.
[88] H. Okubo, M. Otani, R. Ikezawa, S. Komiyama, and K. Nakabayashi. A system for measuring the directional room acoustical parameters. Applied Acoustics, 62(2):203–215, 2001.
[89] A. Omoto and H. Uchida. Evaluation method of artificial acoustical environment: Visualization of sound intensity. Journal of Physiological Anthropology and Applied Human Science, 23(6):249–253, 2004.
[90] B.N. Gover, J.G. Ryan, and M.R. Stinson. Measurements of directional properties of reverberant sound fields in rooms using a spherical microphone array. The Journal of the Acoustical Society of America, 116(4):2138–2148, 2004.
[91] M. Park and B. Rafaely. Sound-field analysis by plane-wave decomposition using spherical microphone array.
The Journal of the Acoustical Society of America, 118:3094, 2005.
[92] M.S. Brandstein and H.F. Silverman. A robust method for speech signal time-delay estimation in reverberant rooms. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 375–378, 1997.
[93] M.S. Brandstein. Time-delay estimation of reverberated speech exploiting harmonic structure. The Journal of the Acoustical Society of America, 105:2914, 1999.
[94] J. Benesty. Adaptive eigenvalue decomposition algorithm for passive acoustic source localization. The Journal of the Acoustical Society of America, 107:384–391, 2000.
[95] G. Jacovitti and G. Scarano. Discrete time techniques for time delay estimation. IEEE Transactions on Signal Processing, 41(2):525–533, 1993.
[96] J. Chen, J. Benesty, and Y. Huang. Performance of GCC-and AMDF-based time-delay estimation in practical reverberant environments. EURASIP Journal on Applied Signal Processing, 2005:25–36, 2005.
[97] J. Chen, J. Benesty, and Y. Huang. Time delay estimation in room acoustic environments: an overview. EURASIP Journal on Applied Signal Processing, 2006(1), 2006. Article ID 26503.
[98] Eric A. Lehmann. Particle filtering approach to adaptive time-delay estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages 1129–1132, 2006.
[99] A. Weiss and E. Weinstein. Fundamental limitations in passive time delay estimation–Part I: Narrow-band systems. IEEE Transactions on Acoustics, Speech and Signal Processing, 31(2):472–486, 1983.
[100] E. Weinstein and A. Weiss. Fundamental limitations in passive time-delay estimation–Part II: Wide-band systems. IEEE Transactions on Acoustics, Speech and Signal Processing, 32(5):1064–1078, 1984.
[101] X. Lai and H. Torp. Interpolation methods for time delay using cross-correlation for blood velocity measurement. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 46(2):277–290, 1999.
[102] L. Zhang and X. Wu.
On cross correlation based discrete time delay estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume IV, pages 981–984, 2005.
[103] C. Falsi, D. Dardari, L. Mucchi, and M.Z. Win. Time of arrival estimation for uwb localizers in realistic environments. EURASIP Journal on Applied Signal Processing, 2006:152–152, 2006.
[104] John Usher. An improved method to determine the onset timings of reflections in an acoustic impulse response. The Journal of the Acoustical Society of America, Express Letters, 127(4):EL172–EL177, 2010.
[105] J.P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M.B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5):1035–1047, 2005.
[106] J.H. DiBiase, H.F. Silverman, and M.S. Brandstein. Microphone arrays: signal processing techniques and applications, chapter 8 Robust Localization in Reverberant Rooms, pages 157–180. Springer Verlag, New York, NY, USA, 2001.
[107] M. Omologo, P. Svaizer, and R. De Mori. Spoken dialogues with computers (Ed. E. De Mori), chapter Acoustic Transduction, page 61. Academic Press, London, UK, 1998.
[108] P. Pertilä, T. Korhonen, and A. Visa. Measurement combination for acoustic source localization in a room environment. EURASIP Journal on Audio, Speech, and Music Processing, 2008. Article ID 278185.
[109] V.C. Raykar, I.V. Kozintsev, and R. Lienhart. Position calibration of microphones and loudspeakers in distributed computing platforms. IEEE Transactions on Speech and Audio Processing, 13(1):70–83, 2005.
[110] V.C. Raykar, I. Kozintsev, and R. Lienhart. Self localization of acoustic sensors and actuators on distributed platforms. In International Workshop on Multimedia Technologies in E-Learning and Collaboration, 2003.
[111] V.C. Raykar, I. Kozintsev, and R. Lienhart. Position calibration of audio sensors and actuators in a distributed computing platform.
In ACM international conference on Multimedia, pages 572–581, 2003.
[112] I. Ziskind and M. Wax. Maximum likelihood localization of multiple sources by alternating projection. IEEE Transactions on Acoustics, Speech and Signal Processing, 36(10):1553–1560, 1988.
[113] J.C. Chen, R.E. Hudson, and K. Yao. A maximum-likelihood parametric approach to source localizations. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 3013–3016, 2001.
[114] J.C. Chen, R.E. Hudson, and K. Yao. Maximum-likelihood source localization and unknown sensor location estimation for wideband signals in the near-field. IEEE Transactions on Signal Processing, 50(8):1843–1854, 2002.
[115] B. Mungamuru and P. Aarabi. Joint sound localization and orientation estimation. In IEEE International Conference on Information Fusion, pages 81–85, 2003.
[116] B. Mungamuru and P. Aarabi. Enhanced Sound Localization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(3):1526–1540, 2004.
[117] C. Zhang, Z. Zhang, and D. Florêncio. Maximum likelihood sound source localization for multiple directional microphones. In IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 125–128, 2007.
[118] H. Schau and A. Robinson. Passive source localization employing intersecting spherical surfaces from time-of-arrival differences. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(8):1223–1225, 1987.
[119] J. Smith and J. Abel. Closed-form least-squares source location estimation from range-difference measurements. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(12):1661–1669, 1987.
[120] J. Abel and J. Smith. The spherical interpolation method for closed-form passive source localization using range difference measurements. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 12, pages 471–474, 1987.
[121] Y.T. Chan and K.C. Ho.
A simple and efficient estimator for hyperbolic location. IEEE Transactions on Signal Processing, 42(8):1905–1915, 1994.
[122] K. Yao, R.E. Hudson, C.W. Reed, D. Chen, and F. Lorenzelli. Blind beamforming on a randomly distributed sensor array system. IEEE Journal on Selected Areas in Communications, 16(8):1555–1567, 1998.
[123] C.W. Reed, R. Hudson, and K. Yao. Direct joint source localization and propagation speed estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 1169–1172, 1999.
[124] M.D. Gillette and H.F. Silverman. A linear closed-form algorithm for source localization from time-differences of arrival. IEEE Signal Processing Letters, 15:1–4, 2008.
[125] Y. Huang, J. Benesty, G.W. Elko, and R.M. Mersereau. Real-time passive source localization: A practical linear-correction least-squares approach. IEEE Transactions on Speech and Audio Processing, 9(8):943–956, 2001.
[126] A. Mahajan and M. Walworth. 3D position sensing using the differences in the time-of-flights from a wave source to various receivers. IEEE Transactions on Robotics and Automation, 17(1):91–94, 2002.
[127] H.C. So and S.P. Hui. Constrained location algorithm using TDOA measurements. IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 86(12):3291–3293, 2003.
[128] V.C. Raykar and R. Duraiswami. Approximate expressions for the mean and the covariance of the maximum likelihood estimator for acoustic source localization. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 73–76, 2005.
[129] T. Ajdler, I. Kozintsev, R. Lienhart, and M. Vetterli. Acoustic source localization in distributed sensor networks. In IEEE 38th Asilomar Conference on Signals, Systems and Computers, volume 2, pages 1328–1332, 2004.
[130] P. Stoica and J. Li. Lecture notes-source localization from range-difference measurements. IEEE Signal Processing Magazine, 23(6):63–66, 2006.
[131] J. Zheng, K.W.K. Lui, and H.C. So. Accurate three-step algorithm for joint source position and propagation speed estimation. Signal Processing, 87(12):3096–3100, 2007.
[132] K.W.K. Lui, J. Zheng, and HC So. Particle swarm optimization for time-difference-of-arrival based localization. In European Signal Processing Conference, pages 414–417, 2007.
[133] K.W.K. Lui, W.K. Ma, HC So, and F.K.W. Chan. Semi-definite programming algorithms for sensor network node localization with uncertainties in anchor positions and/or propagation speed. IEEE Transactions on Signal Processing, 57(2):752–763, 2009.
[134] K. Yang, G. Wang, and Z.-Q. Luo. Efficient convex relaxation methods for robust target localization by a sensor network using time differences of arrivals. IEEE Transactions on Signal Processing, 57:2775–2784, July 2009.
[135] H. Jwu-Sheng and Y. Chia-Hsin. Estimation of Sound Source Number and Directions under a Multisource Reverberant Environment. EURASIP Journal on Advances in Signal Processing, 2010:Article ID 870756, 2010.
[136] M. Walworth and A. Mahajan. 3D position sensing using the difference in the time-of-flights from a wave source to various receivers. In 8th International Conference on Advanced Robotics, pages 611–616, 1997.
[137] Å. Björck. Numerical methods for least squares problems. Society for Industrial and Applied Mathematics, Amsterdam, Holland, 1996.
[138] C.L. Lawson and R.J. Hanson. Solving least squares problems, volume 15. Society for Industrial and Applied Mathematics, Amsterdam, Holland, 1995.
[139] K.W. Cheung, H.C. So, W.K. Ma, and Y.T. Chan. Least squares algorithms for time-of-arrival-based mobile location. IEEE Transactions on Signal Processing, 52(4):1121–1130, 2004.
[140] W. Kim, J.G. Lee, and G.I. Jee. The interior-point method for an optimal treatment of bias in trilateration location. IEEE Transactions on Vehicular Technology, 55(4):1291–1301, 2006.
[141] E. Xu, Z. Ding, and S. Dasgupta.
Source Localization in Wireless Sensor Networks from Signal Time-of-Arrival Measurements. IEEE Transactions on Signal Processing, –(Early Access):1–11, 2011.
[142] P. Pertilä, M. Mieskolainen, and M.S. Hämäläinen. Closed-form self-localization of asynchronous microphone arrays. In Joint Workshop on Hands-free Speech Communication and Microphone Arrays, pages 139–144, 2011.
[143] P. Aarabi. The Fusion of Distributed Microphone Arrays for Sound Localization. EURASIP Journal on Applied Signal Processing, 2003(4):338–347, 2003.
[144] D.N. Zotkin and R. Duraiswami. Accelerated speech source localization via a hierarchical search of steered response power. IEEE Transactions on Speech and Audio Processing, 12:499–508, 2004.
[145] J.M. Peterson and C. Kyriakakis. Hybrid algorithm for robust, real-time source localization in reverberant environments. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages iv/1053–iv/1056, 2005.
[146] A. Johansson, G. Cook, and S. Nordholm. Acoustic direction of arrival estimation, a comparison between Root-MUSIC and SRP-PHAT. In TENCON, IEEE Region 10, volume B, pages 629–632, 2004.
[147] H. Do and H.F. Silverman. A fast microphone array srp-phat source location implementation using coarse-to-fine region contraction (CFRC). In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 295–298, 2007.
[148] H. Do, H. F. Silverman, and Y. Yu. A real-time srp-phat source location implementation using stochastic region contraction (src) on a large-aperture microphone array. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 121–124, 2007.
[149] H. Do and H.F. Silverman. A method for locating multiple sources from a frame of a large-aperture microphone array data without tracking. In IEEE International Conference on Acoustics, Speech and Signal Processing, pages 301–304, 2008.
[150] H. Do and H. F. Silverman.
Stochastic particle filtering: A fast SRP-PHAT single source localization algorithm. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 213–216, 2009.
[151] J.M. Valin, F. Michaud, and J. Rouat. Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering. Robotics and Autonomous Systems, 55(3):216–228, 2007.
[152] E.A. Lehmann. Particle Filtering Methods for Acoustic Source Localisation and Tracking. PhD thesis, Australian National University, 2004.
[153] P. Pertilä. Acoustic Source Localization in a Room Environment and at Moderate Distances. PhD thesis, Tampere University of Technology, 2009.
[154] M.S. Brandstein, J.E. Adcock, and H.F. Silverman. Microphone-array localization error estimation with application to sensor placement. The Journal of the Acoustical Society of America, 99(6):3807–3816, 1996.
[155] B. Günel, H. Hacıhabiboğlu, and A.M. Kondoz. Acoustic Source Separation of Convolutive Mixtures Based on Intensity Vector Statistics. IEEE Transactions on Acoustics, Speech and Signal Processing, 16(4):748–756, 2008.
[156] M. Hawkes and A. Nehorai. Wideband source localization using a distributed acoustic vector-sensor array. IEEE Transactions on Signal Processing, 51(6):1479–1491, 2003.
[157] A. Nehorai and E. Paldi. Acoustic vector-sensor array processing. IEEE Transactions on Signal Processing, 42(9):2481–2491, 1994.
[158] D. Levin, E.A.P. Habets, and S. Gannot. On the angular error of intensity vector based direction of arrival estimation in reverberant sound fields. The Journal of the Acoustical Society of America, 128:1800–1811, 2010.
[159] D. Levin, S. Gannot, and E.A.P. Habets. Direction-of-arrival estimation using acoustic sensor-vectors in the presence of noise. In IEEE International Conference of Acoustics, Speech, and Signal Processing, pages 105–108, 2011.
[160] A. O’Donovan, R. Duraiswami, and J. Neumann.
Microphone arrays as generalized cameras for integrated audio visual processing. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.
[161] Eric Van Lancker. Localization of reflections in auditoriums using time delay estimation. In 108th Audio Engineering Society Convention, 2000. Paper number 5168.
[162] J. Filos, E.A.P. Habets, and P.A. Naylor. A two-step approach to blindly infer room geometries. In International Workshop on Acoustic Echo and Noise Cancellation, 2010.
[163] A. Canclini, M.R.P. Thomas, F. Antonacci, A. Sarti, and P.A. Naylor. Robust inference of room geometry from acoustic impulse responses. In 19th European Signal Processing Conference, pages 161–165, 2011.
[164] E.A. Nastasia, F. Antonacci, A. Sarti, and S. Tubaro. Localization of planar acoustic reflections through emission of controlled stimuli. In 19th European Signal Processing Conference, pages 156–160, 2011.
[165] A. Canclini, F. Antonacci, M.R.P. Thomas, J. Filos, A. Sarti, P.A. Naylor, and S. Tubaro. Exact localization of acoustic reflectors from quadratic constraints. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 17–20, 2011.
[166] A.E. O’Donovan, R. Duraiswami, and D.N. Zotkin. Automatic matched filter recovery via the audio camera. In IEEE International Conference on Acoustics Speech and Signal Processing, pages 2826–2829, 2010.
[167] D. Ba, F. Ribeiro, C. Zhang, and D. Florencio. L1 regularized room modeling with compact microphone arrays. In 35th IEEE International Conference Acoustics, Speech and Signal Processing, pages 157–160, 2010.
[168] A. Canclini, P. Annibale, F. Antonacci, A. Sarti, R. Rabenstein, and S. Tubaro. From direction of arrival estimates to localization of planar reflectors in a two dimensional geometry. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 2620–2623, 2011.
[169] E. Mabande, K. Sun, K. Kowalczyk, and W. Kellermann.
On 2D-localization of reflectors using robust beamforming techniques. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 153–156, 2011.
[170] H. Sun, E. Mabande, K. Kowalczyk, and W. Kellermann. Joint DOA and TDOA estimation for 3D localization of reflective surfaces using eigenbeam MVDR and spherical microphone arrays. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 113–116, 2011.
[171] V.C. Raykar. Position calibration of acoustic sensors and actuators on distributed general purpose computing platforms. PhD thesis, University of Maryland, Maryland, USA, 2003.
[172] V.C. Raykar and R. Duraiswami. Automatic position calibration of multiple microphones. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume iv, pages 69–72, 2004.
[173] A. Redondi, M. Tagliasacchi, F. Antonacci, and A. Sarti. Geometric calibration of distributed microphone arrays. In IEEE International Workshop Multimedia Signal Processing, pages 1–5, 2009.
[174] M. Binelli, A. Venturi, A. Amendola, and A. Farina. Experimental analysis of spatial properties of the sound field inside a car employing a spherical microphone array. In 130th Audio Engineering Society Convention, London, 13-16 May 2011, 2011. Paper number 8338.
[175] T. Lokki, H. Vertanen, A. Kuusinen, J. Pätynen, and S. Tervo. Auditorium acoustics assessment with sensory evaluation methods. In International Symposium on Room Acoustics, pages 29–31, 2010.
[176] T. Lokki, J. Pätynen, S. Tervo, S. Siltanen, and L. Savioja. Engaging concert hall acoustics is made up of temporal envelope preserving reflections. The Journal of the Acoustical Society of America, 129(6):EL223–EL228, 2011.
[177] P. Bergamo, S. Asgari, H. Wang, D. Maniezzo, L. Yip, R.E. Hudson, K. Yao, and D. Estrin. Collaborative sensor networking towards real-time acoustical beamforming in free-space and limited reverberance.
IEEE Transactions on Mobile Computing, 3(3):211–224, 2004.
[178] W. Yan, W. Qun, B. Danping, and J. Jin. Acoustic localization in multi-path aware environments. In International Conference on Communications, Circuits and Systems, pages 667–670, 2007.
[179] T. Korhonen. Acoustic localization using reverberation with virtual microphones. In International Workshop on Acoustic Echo and Noise Control, 2008. Paper ID 9038.
[180] J. Scheuing and B. Yang. Disambiguation of TDOA estimation for multiple sources in reverberant environments. IEEE Transactions on Audio, Speech, and Language Processing, 16(8):1479–1489, 2008.
[181] F. Ribeiro, C. Zhang, D.A. Florêncio, and D.E. Ba. Using reverberation to improve range and elevation discrimination for small array sound source localization. IEEE Transactions on Audio, Speech, and Language Processing, 18(7):1781–1792, 2010.
[182] F. Ribeiro, D. Ba, C. Zhang, and D. Florêncio. Turning enemies into friends: using reflections to improve sound source localization. In IEEE International Conference on Multimedia and Expo, pages 731–736, 2010.
[183] P. Svaizer, A. Brutti, and M. Omologo. Use of reflected wavefronts for acoustic source localization with a line array. In Joint Workshop on Hands-free Speech Communication and Microphone Arrays, pages 165–169, 2011.
[184] P. Svaizer, A. Brutti, and M. Omologo. Analysis of reflected wavefronts by means of a line microphone array. In International Workshop on Acoustic Echo and Noise Control, 2010. Paper ID 965.
[185] A. Farina. Simultaneous measurement of impulse response and distortion with a swept-sine technique. In 108th Audio Engineering Society Convention, 2000. Paper number 5093.
[186] A. Farina. Advancements in impulse response measurements by sine sweeps. In 122nd Convention Audio Engineering Society, 2007. Paper number 7121.
[187] D.D. Rife and J. Vanderkooy. Transfer-function measurement with maximum-length sequences. Journal of the Audio Engineering Society, 37(6):419–444, 1989.
[188] Yoiti Suzuki, Futoshi Asano, Hack-Yoon Kim, and Toshio Sone. An optimum computer-generated pulse signal suitable for the measurement of very long impulse responses. The Journal of the Acoustical Society of America, 97(2):1119–1123, 1995.
[189] J. Pätynen, B.F.G. Katz, and T. Lokki. Investigations on the balloon as an impulse source. The Journal of the Acoustical Society of America, 129(1):EL27–EL33, 2011.
[190] A. Krokstad, S. Strom, and S. Sorsdal. Calculating the acoustical room response by the use of a ray tracing technique. Journal of Sound and Vibration, 8(1):118–125, 1968.
[191] T. Funkhouser, N. Tsingos, I. Carlbom, G. Elko, M. Sondhi, J.E. West, G. Pingali, P. Min, and A. Ngan. A beam tracing method for interactive architectural acoustics. The Journal of the Acoustical Society of America, 115:739, 2004.
[192] P. Welch. The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, 1967.
[193] R. Mucci. A comparison of efficient beamforming algorithms. IEEE Transactions on Acoustics, Speech and Signal Processing, 32(3):548–558, 1984.
[194] M.S. Brandstein, J.E. Adcock, and H.F. Silverman. A closed-form location estimator for use with room environment microphone arrays. IEEE Transactions on Speech and Audio Processing, 5(1):45–50, 1997.
[195] G.P. Yost and S. Panchapakesan. Automatic location identification using a hybrid technique. In IEEE Vehicular Technology Conference, volume 1, pages 264–267, 1998.
[196] R.I. Reza. Data fusion for improved TOA/TDOA position determination in wireless systems. PhD thesis, Faculty of the Virginia Polytechnic Institute and State University, 2000.
[197] S. Gezici and H.V. Poor. Position estimation via ultra-wide-band signals. Proceedings of the IEEE, 97(2):386–403, 2009.
[198] J. Yli-Hietanen, K. Kalliojärvi, and J. Astola.
Low-complexity angle of arrival estimation of wideband signals using small arrays. In IEEE Signal Processing Workshop on Statistical Signal and Array Processing, pages 109–112, 1996.
[199] K.V. Mardia and P.E. Jupp. Directional statistics. Wiley, New York, NY, USA, 2000.
[200] A.D. Firoozabadi and H.R. Abutalebi. A new region search method based on DOA estimation for speech source localization by SRP-PHAT method. In 18th European Signal Processing Conference, pages 656–660, 2010.
[201] J.P. Dmochowski, J. Benesty, and S. Affes. A generalized steered response power method for computationally viable source localization. IEEE Transactions on Audio, Speech, and Language Processing, 15(8):2510–2526, 2007.
[202] L.G. da Silveira, V.P. Minotto, C.R. Jung, and B. Lee. A GPU implementation of the SRP-PHAT sound source localization algorithm. In International Workshop on Acoustic Echo and Noise Control, 2010. Paper ID 1062.
[203] J. Vermaak and A. Blake. Nonlinear filtering for speaker tracking in noisy and reverberant environments. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 5, pages 3021–3024, 2001.
[204] D.B. Ward, E.A. Lehmann, and R.C. Williamson. Particle filtering algorithms for tracking an acoustic source in a reverberant environment. IEEE Transactions on Speech and Audio Processing, 11(6):826–836, 2003.
[205] F. Antonacci, D. Riva, A. Sarti, M. Tagliasacchi, and S. Tubaro. Tracking of two acoustic sources in reverberant environments using a particle swarm optimizer. In IEEE Conference on Advanced Video and Signal Based Surveillance, 2007.
[206] J. Kennedy and R. Eberhart. Particle swarm optimization. In IEEE International Conference on Neural Networks, volume 4, pages 1942–1948, 1995.
[207] R. Parisi, P. Croene, and A. Uncini. Particle swarm localization of acoustic sources in the presence of reverberation. In IEEE International Symposium on Circuits and Systems, pages 4739–4742, 2006.
[208] J.A.
Nelder and R. Mead. A Simplex Method for Function Minimization. The Computer Journal, 7(4):308–313, 1965.
[209] A. Brutti, M. Omologo, and P. Svaizer. Multiple source localization based on acoustic map de-emphasis. EURASIP Journal on Audio, Speech, and Music Processing, 2010:Article ID 147495, 2010.
[210] A.V. Oppenheim and R.W. Schafer. Discrete-time signal processing (2nd ed.). Prentice Hall Press Upper Saddle River, NJ, USA, page 1120, 2009.
[211] S. Bellini and G. Tartara. Bounds on error in signal parameter estimation. IEEE Transactions on Communications, 22(3):340–342, 1974.
[212] J. Ianniello, E. Weinstein, and A. Weiss. Comparison of the Ziv-Zakai lower bound on time delay estimation with correlator performance. In IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 8, pages 875–878, 1983.
[213] B.M. Sadler and R.J. Kozick. A survey of time delay estimation performance bounds. In IEEE Workshop on Sensor Array and Multichannel Processing, pages 282–288, 2006.
[214] J.C. Chen, K. Yao, and R.E. Hudson. Acoustic source localization and beamforming: theory and practice. EURASIP Journal on Applied Signal Processing, pages 359–370, 2003.
[215] S. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2):113–120, 1979.
[216] J. Merimaa, T. Lokki, T. Peltonen, and M. Karjalainen. Measurement, Analysis, and Visualization of Directional Room Responses. In 111th Audio Engineering Society Convention, 2001. Paper number 5449.
[217] Norsonic. Nor848 acoustic camera. Technical report, Norsonic, Jan. 2011 (last accessed).

Appendix: Visualization examples

This appendix considers the visualization of early reflections. Three visualization techniques are implemented and demonstrated for two reflections in a concert hall. Other visualization techniques for reflections also exist, such as acoustic holography [86,87].
However, acoustic holography requires a line or a plane microphone array and therefore differs from the setup used in this example.

Overlaying the sound intensity vectors on top of a spectrogram

Possibly the first visualization of spatial room impulse responses was presented by Merimaa et al. [216]. The same approach is further developed and used in [1, 12]. The spatial room impulse response is divided into short time windows. For each pre-selected frequency band in each short time window, the direction of arrival is estimated using sound intensity vectors. The vector is then plotted on top of a spectrogram consisting of these time-frequency “tiles”. The azimuth and elevation of the direction of arrival are plotted separately.

An example of this visualization technique is shown in Fig. A.1. The setup for the measurements is shown in Fig. A.2 and the measured impulse responses in Fig. A.3. The intensity vectors are calculated from two measurements. The first measurement, with a microphone array spacing of d_spc = 25 mm, is used for frequencies above 1000 Hz, and the second, with d_spc = 100 mm, is used for frequencies below 1000 Hz.

Audio or acoustic camera

Combined visual and audio information is used in several application areas [13,85,160,174]. In [13], the output of a spherical beamformer is overlaid on top of a 360° camera view of the enclosure. This is done in short time windows for an impulse response, and the locations of the reflections are then inspected visually. This idea is widely applied. “Acoustic cameras” (see e.g. [217]) take advantage of beamforming to enhance speech or to study, for example, noise sources.

The same data as above is visualized with the acoustic camera principle in Fig. A.3. This visualization technique lacks frequency response information, but that information can be provided as an additional plot. The technique is intuitive, since the visual cues of the enclosure support the visualized reflection.
One drawback is the lack of three-dimensionality in the visualization of the reflection location. To be practical, this visualization technique requires an interactive user interface.

Mapping the reflections to the geometrical model

The localized reflections can be traced back to the source via the reflective surfaces. This approach requires a priori information on the normals and the locations of the reflective surfaces, which can be extracted from the architectural models of the enclosure, if available, or estimated from the impulse responses.

In Publication I, a ray-tracing approach is used inversely to trace the reflections. The tracing is iterative: at each iteration, the ray is traced to the nearest surface. Before each iteration, it is checked that the remaining ray is long enough to reach the nearest surface. If it is not, the iteration is stopped, and ideally the ray ends at the position of the source.

The same data as above is traced in Fig. A.2. This visualization technique lacks frequency response information, but it could easily be added to the visualization.

[Figure A.1: two spectrogram panels, "Horiz. plane" and "Median plane"; horizontal axis Time [ms], 40 to 80; vertical axis Frequency [kHz], 0.5 to 5; color scale 0 dB to 60 dB.]
Figure A.1. Visualization of reflections using the SIRR-framework.

Figure A.2. Visualization of reflections using the tracing of reflections principle.

[Figure A.3: audio camera view with numbered events: 1 Direct sound, path length 16.7 m; 2 Left wall reflection, path length 21.3 m; 3 Right wall reflection, path length 25.2 m. Lower panel "Room impulse responses": normalized sound pressure versus time t [s], 0 to 0.2, for Mic. 1 to Mic. 6.]
Figure A.3. Visualization using the audio camera. The steered responses are calculated using PL-TDOA. Also shown are the impulse responses for 6 microphones.
The numbered boxes indicate events shown in the audio camera.

Errata

Publication IV

The character α is overloaded.
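
The iterative tracing described in the appendix under "Mapping the reflections to the geometrical model" can be illustrated with a minimal sketch. It is not the implementation of Publication I. It assumes that the reflective surfaces are infinite planes given as (point, normal) pairs, that reflections are purely specular, and that the total path length is known (for example, the speed of sound times the time of arrival). The names trace_reflection and shoebox_planes are hypothetical helpers introduced only for this illustration.

```python
import numpy as np


def shoebox_planes(lx, ly, lz):
    """Six walls of a rectangular room as (point, inward normal) pairs (assumed geometry)."""
    return [
        ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)), ((lx, 0.0, 0.0), (-1.0, 0.0, 0.0)),
        ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)), ((0.0, ly, 0.0), (0.0, -1.0, 0.0)),
        ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)), ((0.0, 0.0, lz), (0.0, 0.0, -1.0)),
    ]


def trace_reflection(receiver, doa, path_length, planes, max_order=10, eps=1e-9):
    """Trace a localized reflection backwards from the receiver.

    receiver    : receiver position [m]
    doa         : vector pointing from the receiver towards the arriving reflection
    path_length : total propagation path length [m]
    planes      : list of (point_on_plane, normal); treated as infinite planes
    Returns the estimated source position and the list of surface hit points.
    """
    pos = np.asarray(receiver, dtype=float)
    d = np.asarray(doa, dtype=float)
    d = d / np.linalg.norm(d)
    remaining = float(path_length)
    hits = []
    for _ in range(max_order):
        # Find the nearest surface in front of the ray.
        nearest_t, nearest_n = np.inf, None
        for p0, n in planes:
            n = np.asarray(n, dtype=float)
            n = n / np.linalg.norm(n)
            denom = d @ n
            if abs(denom) < eps:
                continue  # ray is parallel to this plane
            t = ((np.asarray(p0, dtype=float) - pos) @ n) / denom
            if eps < t < nearest_t:
                nearest_t, nearest_n = t, n
        # Stop when the remaining ray is too short to reach the nearest surface.
        if nearest_n is None or remaining < nearest_t:
            break
        pos = pos + nearest_t * d
        hits.append(pos.copy())
        remaining -= nearest_t
        d = d - 2.0 * (d @ nearest_n) * nearest_n  # specular reflection
    # Ideally, the end point of the ray is the source position.
    return pos + remaining * d, hits


if __name__ == "__main__":
    planes = shoebox_planes(10.0, 8.0, 5.0)
    receiver = np.array([7.0, 4.0, 1.5])
    image_source = np.array([-2.0, 3.0, 1.5])  # source (2, 3, 1.5) mirrored in the wall x = 0
    estimate, hits = trace_reflection(
        receiver, image_source - receiver,
        np.linalg.norm(image_source - receiver), planes)
    print(estimate, hits)
```

In this sketch the direction of arrival and the path length are taken from a known image source, so the ray ends at the true source. With measured data, the end point deviates from the source according to the localization error, and bounded surfaces would need an additional check that the hit point lies on the actual wall.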
