Master Thesis, Electrical Engineering
Thesis no:
April 2012

BINAURAL HEARING PLATFORM FOR FUTURE HEARING AID RESEARCH
Implementation in the Matlab Audio Processing framework

ABEL GLADSTONE MANGAM
School of Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the School of Engineering at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Electrical Engineering. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information
Author: Abel Gladstone Mangam, E-mail: [email protected], Phone: +46-707 555424
Main Supervisor: Dr. Nedelko Grbic, School of Engineering (ING), Blekinge Institute of Technology, Phone: +46 455 385727
Additional Supervisor: Dr. Bo Schenkman, School of Management (MAM), Blekinge Institute of Technology, Phone: +46 455 385647
School of Engineering, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden. Internet: www.bth.se/ing, Phone: +46 455 385000

Abstract

Binaural technology is used to recreate realistic sound for the listener. However, many signal processing and noise cancellation techniques do not preserve the binaural characteristics of the sound and thus destroy its realism. Binaural technologies are used in various applications such as hearing aids, auditory displays and echo location for the blind. At the heart of binaural technology are the Head Related Transfer Functions, which capture the transfer functions of the human pinnae, the shoulders, etc. These Head Related Transfer Functions provide the Interaural Level Differences that contribute to binaural sound. Apart from the Head Related Transfer Functions there are other cues in binaural hearing, such as the Interaural Time Differences, that provide information about the nature and location of the source.
The Interaural Time Differences are known to contribute to localization at low frequencies, and the Interaural Level Differences contribute to localization at high frequencies. This thesis provides a design solution for reproducing the sound field (preserving the binaural cues) with the help of microphone arrays placed on either side of the listener's ears. The primary work in this thesis is to build a model of a binaural hearing system and implement it in real time using the Matlab Audio Processing framework. This provides a binaural platform for further enhancements such as noise reduction and speech enhancement. The system is evaluated mathematically and perceptually. The results show that the platform tracks a single user very accurately, and perceptual tests show that the system can be used to localize sounds in the azimuthal plane.

Keywords: Acoustics, Acoustic signal detection, Acoustic signal processing, Microphone arrays, Hearing aids, Speaker localization.

Contents

Abstract
Contents
List of Figures
List of Tables

Introduction
1 Introduction
1.1 The Physiology of Hearing (A Signal Processing Perspective)
1.2 Localization
1.3 Interaural Time Differences
1.4 Interaural Level Differences
1.5 Duplex Theory of Localization
1.6 Other cues that help in localization

Design
2 Analysis of ITD
2.1 Analysis of the Interaural Time differences
2.2 Modelling ITD in the binaural hearing system
3 Analysis of ILD
3.1 Compression of the HRTF
3.2 Complex Minimax Criterion

Implementation in real time
4 Implementation
4.1 Time Difference of Arrival (TDOA) estimation and far field source location
4.2 Generalized Cross Correlation (GCC)
4.3 Phase Alignment Transform (PHAT)
4.4 Steered Response Power - Phase Alignment Transform (SRP-PHAT)
4.5 Windowed Discrete Fourier Transform (WDFT)
4.6 Results of PHAT in Real Time

Results
5 Methods, Instrumentation and Results
5.1 Construction
5.2 Algorithm
5.3 Testing
5.4 Results
5.5 Analysis of Variance
5.6 Other Conclusions and Observations
6 Conclusion and future work

Appendix
A Matlab Audio Processing (MAP) Framework
Bibliography

List of Figures

1.1 Components of a Binaural Hearing System
1.2 The Interaural Coordinate Axis
1.3 Illustration of Interaural Time Differences
2.1 Illustration of ITD
2.2 Plot of ITD for various elevations and azimuths, assuming a head radius HR = 87.5 mm and v = 340.3 m/s; the colorbar indicates the time difference
2.3 Diagram showing the excess path difference because of the introduction of the ear cups
3.1 Magnitude and phase plots of the Complex Minimax Design with 25% compression
3.2 Magnitude and phase plots of the Complex Minimax Design with 34% compression
3.3 Magnitude and phase plots of the Complex Minimax Design with 54% compression
4.1 Far Field model for estimation of Angle of Arrival
4.2 The relationship between the Direction of Arrival vector and the Angle of Arrival
4.3 Model for estimating the TDOA
4.4 The output of the PHAT with different simulated delays, with a sampling frequency of 32000 Hz
4.5 PHAT real-time implementation using the MAP framework
4.6 Schematic of the PHAT setup
4.7 Components used for real time testing
4.8 Calculated angle using the PHAT
4.9 Spline (red) and cubic (blue) interpolation of the angles
5.1 Left side of the Binaural Hearing System
5.2 BHS showing microphones on the left and the right side
5.3 Microphone bias and the sound card used to interface with the MAP framework
5.4 Block diagram for the implementation of the system using the Matlab Audio Processing framework
5.5 Frequency response of the BOSE AE2 headphones used (source: http://www.head-fi.org)
5.6 Flow chart of the BHS implementation
5.7 User interface for behavioural testing
5.8 Deviation of location plots for various sources; the radius and the color bar both show the deviation
5.9 Mean position plot
5.10 The Mean Squared Error of the localization for each user at all positions
5.11 Bar graph showing the mean opinion score of the ease of localization
5.12 Room impulse response of the test room
A.1 Schematic showing the MAP framework
A.2 A sample UI from the MAP framework

List of Tables

5.1 Analysis of Variance

Chapter 1

Introduction

Binaural signal processing is an important aspect of hearing aids. It refers to the processing of audio signals without loss of the binaural characteristics in the signal. Intelligent hearing aids further process these signals to reduce noise and provide high speech quality. However, binaural processing is considerably more complex than mono processing, because the two signals (left and right) are correlated with each other and carry more information, such as the spatial location and the nature of the source. Most hearing aids process the audio at the ear by amplification, pitch shifting, etc., so that persons with impairment are still able to hear. However, these hearing aids often do not preserve the binaural characteristics. Some hearing aids try to preserve these characteristics by being made as in-ear devices, i.e. the hearing aid is inserted into the person's ear. However, in-ear hearing aids are very costly and are rarely used by persons without hearing impairment. Moreover, it is not feasible to test new methodologies on such a small device, as the cost is very high. The main objective of this thesis is to recreate the effects of spatial hearing in a larger, inexpensive over-the-ear headphone, and to create a system that can be used for testing various hearing-aid methods. Such a system can also be used for other purposes, such as noise cancellation for people working in industrial areas with high noise levels. If a speech enhancement system is added, it will improve the hearing capabilities of a person in challenging environments with high noise levels, such as industry.
Figure 1.1 shows the schematic of the components of such a system. It must consist of two microphone arrays on either side of the head for estimating the Direction of Arrival (DOA) of the source, a processor for performing the various signal processing tasks, and a pair of speakers for rendering the sound.

Figure 1.1: Components of a Binaural Hearing System

The microphone arrays are used for direction-of-arrival estimation. Here an assumption is made that the source is in the far field; only the DOA can be estimated in the far field. The far-field assumption lets the waves arriving at the sensors (microphones) be treated as plane waves, which simplifies the mathematics substantially. The processor is the core of the system design and performs all the required signal processing. For this thesis the processor is the MAP framework developed in [1]. The headphones are used for the reproduction of the sound at the listener's ears. Ideally the headphones should be high-fidelity headphones with an almost linear response throughout the audible frequencies, i.e. 20 Hz - 20 kHz. The following chapters discuss the system and its evaluation in detail.

The problem is to reproduce the sound field at the ears as accurately as possible in real time. The hypothesis in this thesis is that the sound field at the ear can be reproduced fairly accurately by using the Binaural Hearing System. The measure of accuracy is the ability of subjects to localize (to locate) the source when it is presented to them.

1.1 The Physiology of Hearing (A Signal Processing Perspective)

Human hearing plays a vital role in the localization of sound sources; we are able to locate the spatial position of a source very accurately just by listening. This information is confirmed by the visual feedback of the eyes. Blind people rely completely on their ears to locate sound sources very accurately.
There are three main parts of the human ear: the outer ear (auris externa), the middle ear (auris media) and the inner ear (auris interna) [2]. The external ear, a very important part for location determination, consists of the pinna (or auricle) and the external ear canal. The sound that reaches the human ears is transformed by the shape of the pinna and also by the human body (nose, shoulders, etc.). These act as obstacles to the sound, causing gain differences (attenuation and resonance) and delays (time and phase differences). These changes can be described by measuring the spectral changes in the amplitude and the phase of the signal in different spectral bins or components. The spectral change caused by the head and torso is found by first determining the spectrum of the source and then subtracting it from the spectrum of the wave after it enters the ear canal [3]. These spectral differences in amplitude and phase, which describe the sound pressure differences due to the intervening structures (head, torso and pinnae), can be modelled as a transfer function H(.). Mathematically, using Fourier transform notation, this can be written as in Equation 1.1, where y(f, θ) is the spectrum of the wave after it enters the ear canal, x(f, θ) is the input spectrum, both defined as functions of the frequency f and the angle θ, and G(.) is a gain.

\[ H(f,\theta) = G(f,\theta)\,\frac{y(f,\theta)}{x(f,\theta)} \qquad (1.1) \]

Here, H(.) is a complex function that is specific for each direction of the wave around each of the ears. "These transfer functions that describe the changes between the source and the outer ear are called Head Related Transfer Functions (HRTF)" [3]. "HRTFs characterize the transformation of a sound source to the sounds reaching the eardrums, and are central to binaural hearing" [4]. The HRTF is highly specific to a person, and it is very difficult to mathematically model a general HRTF that would work for all persons.
However, many models are available, using wavelets and genetic processing [5][6][7]. One simple model assumes the HRTFs H(θ, z) to be functions of the angle of the location of the source θ, which can be subdivided into the azimuth and elevation angles, and of z, the distance of the sound source from the listener. This is called the Common Acoustic Pole Zero (CAPZ) model. However, this thesis uses a set of standard HRTFs available from the CIPIC (Center for Image Processing and Integrated Computing) database [8]. The reason for using the CIPIC database is that the HRTFs are given in simple Finite Impulse Response (FIR) filter form, with 200 coefficients for each angle, and can easily be looked up in the database. This thesis also proposes a method to compress these HRTFs without any loss of the binaural characteristics.

1.2 Localization

Localization refers to the ability of humans to locate a sound source in space. The localization can be described in space assuming the listener's head as the origin. The coordinate system used to denote the angles relative to the listener can be called the interaural coordinate system, with the line joining the centres of the two ears as the interaural axis. It is a spherical coordinate system with the origin at the centre of the interaural axis. The azimuthal plane is defined as the plane that contains the interaural axis, and the elevation is defined as the angle made with the azimuthal plane, see Figure 1.2. The plane perpendicular to the azimuthal plane is called the frontal plane. The median plane is defined as the plane that divides the head into two symmetrical halves along the nose, as shown in Figure 1.2.

Figure 1.2: The Interaural Coordinate Axis

Localization is performed mainly by the time difference and level difference cues. These cues, along with other cues like the Interaural Phase Differences, are collectively called binaural cues.
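Since the CIPIC HRTFs come as plain 200-tap FIR filters, rendering a mono signal at a given direction amounts to a pair of convolutions, one per ear. The sketch below is in Python rather than Matlab, purely for illustration; the 3-tap filters are toy stand-ins for real 200-coefficient HRIRs looked up from the database:

```python
def fir_filter(h, x):
    """Direct-form FIR convolution: y[n] = sum_k h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += hk * xn
    return y

def binauralize(x, hrir_left, hrir_right):
    """Render a mono signal at one direction using a left/right HRIR pair."""
    return fir_filter(hrir_left, x), fir_filter(hrir_right, x)

# A unit impulse simply reproduces each HRIR, padded with zeros.
left, right = binauralize([1.0, 0.0, 0.0], [0.5, 0.2, 0.1], [0.3, 0.1, 0.05])
print(left)   # [0.5, 0.2, 0.1, 0.0, 0.0]
```

In the real system the HRIR pair would be selected from the CIPIC database according to the estimated source direction.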
The salience of the binaural cues depends on many factors, such as knowledge of the source characteristics, the frequency of the source, the reliability and plausibility of the cues, and the consistency of the cues across the spectrum [9].

"It is evident that the perception of locale and attention to a sound source require a transformation relative to the arriving wave front. Since transformation of the sound is necessary for the sound localization it is reasonable to assume that the external ear served as the acoustical device to perform requisite transformation." [10]

1.3 Interaural Time Differences

Interaural time differences (ITD) are binaural cues that characterize localization primarily in the azimuthal plane. When sound travels from a position in space to the listener, it first reaches the ipsilateral ear (the ear closest to the source) and then, after some delay, the contralateral ear (the ear away from the source). This time difference is called the ITD. It is one of the major cues responsible for the localization ability in the azimuthal plane. However, ITD are not unique for all locations in space: the locus of all points that have equal ITD forms a cone around each ear, called the cone of confusion. Since the ITD arriving from sources located on the cone of confusion are the same, such sources cannot be localized uniquely by these cues; they are differentiated by the use of another set of cues described in the next section. The ITD depend on the frequency of the sound, while the speed of sound in air is relatively constant; the ITD do not depend on the amplitude. According to the duplex theory, the ITD are used for localization primarily at low frequencies, up to about 1500 Hz [9]. At higher frequencies the time differences are small and their significance is low.
There will be small differences in the ITD between persons due to anatomical differences. The ITD are illustrated in Figure 1.3.

Figure 1.3: Illustration of Interaural Time Differences

The ITD cues are very dominant below 1500 Hz, i.e. at low frequencies [9]. Assuming 90◦ to be the direction completely to the right of the listener, 0◦ straight in front, −90◦ to the left and 180◦ behind the listener, the ITD increases as the source moves in the azimuthal plane towards either ear, and is zero at 0◦ and 180◦. The ITD can be modelled using diffraction theory:

\[ \tau_{lf} = \frac{a}{v}\,\bigl(3\sin(\theta)\bigr) \qquad (1.2) \]

\[ \tau_{lf} = \frac{a}{v}\,\bigl(\sin(\theta) + \theta\bigr) \qquad (1.3) \]

Equation 1.2 is used when the source is located on the ipsilateral side, and Equation 1.3 when the source is located on the contralateral side. Equations 1.2 and 1.3 give approximate values of the ITD (τlf) for a source located at an angle θ, assuming a spherical head with its centre at the origin and the line joining the ears (the interaural axis) as its axis, as shown in Figure 1.2. Here a is the head radius and v is the speed of sound.

1.4 Interaural Level Differences

The Interaural Level Differences (ILD) are the other set of binaural cues used by people to localize sound in space. The ILD are more complex and very hard to model mathematically. The ILD are cues that help localize sources in the median plane. The level differences are primarily due to the presence of the pinna and other body parts such as the head, neck and torso. The ILD are used for localization at higher frequencies. The ILD are described by the HRTF of each ear and are unique for each spatial location. The ILD cues are frequency dependent and are very reliable cues for localization, especially at higher frequencies and, more importantly, in the median plane. If the ILD cues are removed from a binaural system, front-back confusions increase.
However, there is no definitive evidence that front-back confusions are completely eliminated by the ILD cues [9].

1.5 Duplex Theory of Localization

The Duplex theory of localization was proposed by Lord Rayleigh in 1907. According to Feddersen et al., "ITD are used to localise low frequency sounds, in particular, while ILD are used in the localisation of high frequency sound inputs. However, the frequency ranges for which the auditory system can use ITD and ILD significantly overlap, and most natural sounds will have both high and low frequency components, so that the auditory system will in most cases have to combine information from both ITD and ILD to judge the location of a sound source." [11] The duplex theory does not take into account the role of the external ear (pinna) in localization.

1.6 Other cues that help in localization

Apart from the binaural cues there are several other cues that help in localization: monaural cues, the Interaural Phase Differences (IPD), knowledge of the source characteristics, realism and plausibility of the source content, source frequency, etc. The next chapter discusses the design and implementation of the two most important cues, ITD and ILD, for designing a binaural hearing system. From here on, Binaural Hearing System (BHS) refers to the model of the binaural system being built.

Chapter 2

Analysis of ITD

2.1 Analysis of the Interaural Time differences

The ITD can be modelled mathematically with fair accuracy using simple geometry and construction. Assuming a spherical head [12], the construction is shown in Figure 2.1.

Figure 2.1: Illustration of ITD

The symbols used in the construction are:
COH - centre of the head (in polar coordinates)
HR - head radius
L - distance from the tangent point
D - distance from the source to COH
DLD - distance to the left ear, direct path
DLA - arc distance to the left ear
DL - total distance to the left ear
DRD - distance to the right ear, direct path
DRA - distance from the tangent point to the right ear, arc length
DR - total distance to the right ear
AZ - azimuth of the source (−180◦ left to +180◦ right)
EL - elevation of the source (−90◦ above to +90◦ below)
θL - angle at COH from the left ear to the left ear tangent point
θR - angle at COH from the right ear to the right ear tangent point
φ - angle at COH from the line D to the tangent points
AL - angle at COH from the left ear to the line D
AR - angle at COH from the right ear to the line D
Xk, Yk, Zk - location of the source in Cartesian coordinates
v - speed of sound

The ITD is defined as in Equation 2.1:

\[ ITD = \text{arrival time at the ipsilateral ear} - \text{arrival time at the contralateral ear} \qquad (2.1) \]

Assuming no elevation (EL = 0), from Figure 2.1,

\[ L = \sqrt{D^2 - HR^2} \qquad (2.2) \]

\[ \varphi = \arccos\frac{HR}{D} \qquad (2.3) \]

\[ DLD = \sqrt{D^2 + HR^2 + 2\,D\,HR\,\sin(AZ)} \qquad (2.4) \]

\[ DRD = \sqrt{D^2 + HR^2 - 2\,D\,HR\,\sin(AZ)} \qquad (2.5) \]

\[ DL = \begin{cases} DLD, & \text{if } DLD < L \\ L + DLA, & \text{otherwise} \end{cases} \qquad (2.6) \]

\[ DR = \begin{cases} DRD, & \text{if } DRD < L \\ L + DRA, & \text{otherwise} \end{cases} \qquad (2.7) \]

DLA and DRA are given by

\[ DLA = \theta_L\, HR \qquad (2.8) \]

\[ DRA = \theta_R\, HR \qquad (2.9) \]

For non-zero elevations, modifications are made to D and AZ only, as given by Equations 2.10 and 2.11:

\[ D = \sqrt{X_K^2 + Y_K^2 + Z_K^2} \qquad (2.10) \]

\[ AZ = \arcsin\bigl(\cos(EL)\,\sin(AZ)\bigr) \qquad (2.11) \]

The ITD were implemented in Matlab for various angles of azimuth and elevation. Figure 2.2 shows the ITD and its dependence on the elevation and the azimuth. It can be observed from Figure 2.2 that the ITD are almost the same for symmetrically placed azimuths and elevations. Moreover, at an elevation angle of 90◦ (the median plane) the ITD are zero for all azimuths. This model confirms that ITD are not the cues used for localization in the median plane, because the median plane geometrically has zero ITD.
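The diffraction-model ITD of Equations 1.2-1.3, and the zero-ITD prediction for the median plane, can be checked numerically. A minimal sketch (Python rather than Matlab, for illustration; the head radius and speed of sound follow the values used for Figure 2.2):

```python
import math

def itd_woodworth(az_deg, a=0.0875, v=340.3):
    """Approximate ITD in seconds for a spherical head of radius a (m),
    using tau = (a / v) * (sin(theta) + theta) from Equation 1.3.
    Azimuth convention: 0 deg straight ahead, +90 deg fully to the right."""
    theta = math.radians(az_deg)
    return (a / v) * (math.sin(theta) + theta)

print(itd_woodworth(0))           # 0.0 -- zero ITD straight ahead
print(itd_woodworth(90) * 1e6)    # roughly 661 microseconds at 90 degrees
```

The value at 90 degrees is consistent in order of magnitude with the maxima visible in Figure 2.2, and the zero at 0 degrees reproduces the median-plane behaviour discussed above.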
However, in reality there are some minor differences in the median plane, and the human ear is very sensitive to even very small time differences in the median plane.

Figure 2.2: Plot of ITD for various elevations and azimuths, assuming a head radius HR = 87.5 mm and v = 340.3 m/s; the colorbar indicates the time difference

2.2 Modelling ITD in the binaural hearing system

The ITD are not modelled explicitly in the binaural hearing system, because the system itself preserves these binaural cues: it uses two microphones placed approximately where the ear canals are located. However, it is important to understand the ITD models, because the sensors lie outside the ear cups, so there is a slight change in the geometry of the system, as seen in Figure 2.3.

Figure 2.3: Diagram showing the excess path difference because of the introduction of the ear cups

Here a and b are the positions of the ears, and A and B are the positions of the microphone arrays. OO′ is the excess path difference caused by the new positions. Using simple geometry the excess path can be calculated as

\[ OO' = d'\,\sin(\theta) \qquad (2.12) \]

In Equation 2.12, θ is the angle of arrival of the source wave, and the source is assumed to be in the far field. As seen in Figure 2.3, d′ is very small compared to the distance d, the interaural distance, and hence it is ignored in the design. One advantage of the Binaural Hearing System is that, by its design, the system will have the ITD of the person wearing it.

Chapter 3

Analysis of ILD

Modelling the interaural level differences mathematically, for the purpose of simulating binaural hearing, is a difficult task. The Head Related Transfer Functions (HRTF) are the basic functions used to transform the electrical signals from the earphones at the listener's ears. They replicate the transformation of the sound as it enters the ear canal through the pinna.
These HRTF are highly dependent on the spatial location of the source. The HRTF are complex functions of the azimuth, the elevation and the distance of the source, and depend on the anthropometry of the person's ears. There has been a lot of research on interpolating the HRTF; one such method is presented in [13]. HRTFs can be obtained from various databases available on the internet. For this thesis the HRTF have been taken from the CIPIC database, available at [8]. The reasons for using this database are that it is well documented and easy to use with Matlab.

3.1 Compression of the HRTF

The CIPIC HRTF database is a very large database. Although this thesis does not try to interpolate the HRTF, it proposes a method for compressing the HRTF using a filter design technique known as the Complex Minimax design. This is done to address the limited memory available when implementing on a Digital Signal Processor (DSP). The Complex Minimax design produces a filter under complex constraints such as magnitude, phase and group delay, and is described in the next section.

3.2 Complex Minimax Criterion

In general, the complex minimax criterion can be defined with an additional max-norm constraint and auxiliary linear constraints, as given by Equations 3.1-3.3:

\[ \min_{\zeta \in R^N} \max_{\omega \in \Omega_1} v(\omega)\,\lvert H(\omega) - H_d(\omega) \rvert \qquad (3.1) \]

subject to

\[ \max_{\omega \in \Omega_2} v(\omega)\,\lvert H(\omega) - H_d(\omega) \rvert \le \varepsilon_2 \qquad (3.2) \]

\[ P\xi \le p \qquad (3.3) \]

Here, Hd(ω) is the desired filter function taken from the HRTF, v(ω) is a strictly positive weighting function, and ε2 is the max-norm bound. P is an M × N constraint matrix and p is an M × 1 constraint vector. It is assumed that the domains Ω1 and Ω2 are closed and that all the associated frequency functions are continuous. The above constraints can be rewritten in a continuous semi-infinite linear form by using the Real Rotation Theorem.
The Real Rotation Theorem states that for a complex number Z = x + iy the magnitude is bounded by

\[ \lvert Z \rvert = \sqrt{x^2 + y^2} = \max_{-0.5 \le u \le 0.5} \Re\!\left\{ Z e^{j2\pi u} \right\} \qquad (3.4) \]

The variable u denotes the rotation of the vector, and a set of new constraints is formed for every rotation; e.g. a rectangular approximation has four values of the rotation of the complex vector. Now the complex constrained problem of Equations 3.1-3.3 can be written as a semi-infinite linear programming problem:

\[ \min \delta \quad \text{subject to} \quad \begin{cases} v(\omega)\,\Re\!\left\{ (H_d(\omega) - H(\omega))\, e^{j\theta} \right\} \le \delta, & \omega \in \Omega_1,\ \theta \in [0, 2\pi) \\ v(\omega)\,\Re\!\left\{ (H_d(\omega) - H(\omega))\, e^{j\theta} \right\} \le \varepsilon_2, & \omega \in \Omega_2,\ \theta \in [0, 2\pi) \\ P\xi \le p \end{cases} \qquad (3.5) \]

Here, θ is taken over an octagonal approximation, i.e. θ ∈ {0, π/8, 2π/8, ..., 7π/8}, and v(ω) = 1. The results of the compression are shown in Figures 3.1-3.3. Equation 3.5 can be written compactly in matrix form as

\[ \min \delta \quad \text{subject to} \quad A x \le B \qquad (3.6) \]

where A, B and x can be written as

\[ A = \begin{bmatrix} \Re\{C\} & -1 \\ \Re\{C e^{j\pi/8}\} & -1 \\ \Re\{C e^{j2\pi/8}\} & -1 \\ \vdots & \vdots \\ \Re\{C e^{j7\pi/8}\} & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} \Re\{D\} \\ \Re\{D e^{j\pi/8}\} \\ \vdots \\ \Re\{D e^{j7\pi/8}\} \end{bmatrix}, \qquad x = \begin{bmatrix} h \\ \delta \end{bmatrix} \]

where

\[ C = \begin{bmatrix} v_1 \varphi_1 \\ \vdots \\ v_I \varphi_I \end{bmatrix}, \qquad D = \begin{bmatrix} v_1 H_{d1} \\ \vdots \\ v_I H_{dI} \end{bmatrix}, \qquad \varphi_i = \begin{bmatrix} 1 & e^{-j\omega_i} & e^{-j2\omega_i} & \cdots & e^{-j(N-1)\omega_i} \end{bmatrix} \]

Here h is the desired filter, δ is the maximum Chebyshev error, and ω1, ω2, ..., ωI are the radial frequencies. Equation 3.6 can be implemented in Matlab using the linprog command, which solves linear optimization problems. It can be noticed from Figure 3.2 that the magnitude of the HRTF is matched very well, but there is a certain frequency at which the phase is not matched. The performance of the Complex Minimax method with 25% compression (Figure 3.1) is almost identical to the actual HRTF. The magnitude plots show the magnitude in dB on the y-axis and the normalized frequency on the x-axis, and the phase plots show the phase angle on the y-axis and the normalized frequency on the x-axis.
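The Real Rotation Theorem, and the effect of discretizing the rotation angle as in Equation 3.5, can be verified numerically. A small illustrative sketch in Python (not part of the thesis implementation):

```python
import cmath
import math

def max_norm_approx(z, K):
    """Approximate |z| via the real rotation theorem:
    |z| = max over u of Re{ z * exp(j*2*pi*u) },
    with the rotation discretized to K equally spaced angles."""
    return max((z * cmath.exp(2j * math.pi * k / K)).real for k in range(K))

z = 3 + 4j
print(abs(z))                    # 5.0, the exact magnitude
print(max_norm_approx(z, 8))     # octagonal approximation, slightly below 5
print(max_norm_approx(z, 1024))  # fine discretization, very close to 5
```

With K rotations the approximation underestimates |z| by at most a factor cos(π/K), which is why a coarse octagonal set of rotations is already adequate for the filter design.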
Figure 3.1: Magnitude and phase plots of the Complex Minimax Design with 25% compression; (a) magnitude, (b) phase

Figure 3.2: Magnitude and phase plots of the Complex Minimax Design with 34% compression; (a) magnitude, (b) phase

Figure 3.3: Magnitude and phase plots of the Complex Minimax Design with 54% compression; (a) magnitude, (b) phase

Chapter 4

Implementation

4.1 Time Difference of Arrival (TDOA) estimation and far field source location

Consider two sensors (microphones) p and q placed at locations Lp and Lq; the line joining these two sensors is called the array baseline. Assuming that the speed of propagation of sound in the medium is C and that the source is located at a position S, the TDOA is given by Equation 4.1:

\[ T(L_p, L_q, S) = \frac{\lVert L_p - S \rVert - \lVert L_q - S \rVert}{C} \qquad (4.1) \]

The idea of source location is to determine S from Equation 4.1, given that the time delay τ(p,q) observed between the sensors is known. However, the solution for S from Equation 4.1 is not unique: for a given TDOA estimate τ(p,q), the locus of S is a hyperboloid surface with the sensors at the foci. However, the Angle of Arrival (AOA) can still be determined, because the far-field assumption makes the hyperboloid surface asymptotically approach the vector drawn from the centre of the array baseline to the source. In the wave model this means that in the far field the wave is assumed to have a planar wave front, as compared to a spherical wave front in the near field. Under the far-field assumption, Equation 4.1 can be rewritten as

\[ T(L_p, L_q, v) = \frac{L_p^T v - L_q^T v}{C} \qquad (4.2) \]

Here v is the unit vector, also called the Direction of Arrival (DOA) vector, pointing from the baseline towards the source. It is related to the AOA as

\[ v = [\sin(\alpha)\ \ \cos(\alpha)]^T \qquad (4.3) \]

where α is the AOA. From Equations 4.2 and 4.3 the TDOA can be written as
\[ T(L_p, L_q, v) = \frac{\lVert L_p - L_q \rVert \sin(\alpha)}{C} = \frac{d\,\sin(\alpha)}{C} \qquad (4.4) \]

Here d·sin(α) is called the effective sensor spacing, as seen from the direction of the source. It is assumed that there are two sensors and that the unit vector v is two dimensional; hence the source is assumed to be located in a two dimensional plane [1].

Figure 4.1: Far Field model for estimation of Angle of Arrival [1]

Figure 4.2 shows the relationship between the Direction of Arrival and the Angle of Arrival in the far field.

Figure 4.2: The relationship between the Direction of Arrival vector and the Angle of Arrival [1]

Figure 4.3 shows the model of a system for estimating the TDOA between any two sensors. In Figure 4.3 the delays τ1 and τ2 are due to the physical separation of sensors 1 and 2, as shown in Equation 4.4. The filter banks are used to convert the signal from the time domain to the frequency domain. The correlator correlates the signals from the different sensors at the various frequencies and then adds the results. The peak detector detects the maximum of the correlator output and estimates the TDOA τ1 − τ2 from the location of the maximum peak. The correlator used here is the Generalized Cross Correlation (GCC) and the filter banks used are the Windowed Discrete Fourier Transform (WDFT).

Figure 4.3: Model for estimating the TDOA

4.2 Generalized Cross Correlation (GCC)

The Generalized Cross Correlation is an extension of the cross correlation, achieved by the addition of a specialized gain function, also called a processor [14]. The cross correlation is given by Equation 4.5:

\[ R_{xy}(\tau) = \int_{-\infty}^{\infty} G_{xy}(f)\, e^{j2\pi f \tau}\, df \qquad (4.5) \]

where x(t) and y(t) are the signals from adjacent elements of a uniform linear microphone array and Gxy(f) is their cross power spectrum. The generalized cross correlation between x(t) and y(t) is given by Equation 4.6.
R^g_xy(τ) = ∫_{−∞}^{∞} ψg(f) Gxy(f) e^{j2πfτ} df        (4.6)

Here ψg is given by Equation 4.7

ψg(f) = H1(f) H2*(f)        (4.7)

where ψg is called the frequency weighting or the processor, H1(f) and H2(f) are the transfer functions of the sensors, and * denotes the complex conjugate. In practice only an estimate of the cross power spectrum can be calculated from finite observations. There are many processors, depending upon the application. Here, for localization, the Phase Alignment Transform (PHAT) processor is used.

4.3 Phase Alignment Transform (PHAT)

The PHAT processor whitens the cross power spectrum of two signals. It is given by Equation 4.8 [14].

ψPHAT(f) = 1 / |Gxy(f)|        (4.8)

The advantage of the PHAT processor is that it whitens the cross correlation. If the propagation path is not a single path, the cross power spectrum is not uniform; instead it has many dips (nulls) caused by reverberation in a real room environment. The PHAT processor removes these effects of multipath propagation, making it a very robust processor in reverberant environments. The GCC-PHAT is used to find the location of the source and is given by Equation 4.9.

τ̂xy = arg max_τ ∫_{−∞}^{∞} ψPHAT(f) Gxy(f) e^{j2πfτ} df        (4.9)

4.4 Steered Response Power - Phase Alignment Transform (SRP-PHAT)

The SRP (steered response power) beamformer is given by Equation 4.10. The SRP adds the N microphone elements by aligning their phases, effectively steering the beamformer towards the direction of maximum power [15]. This can be combined with the PHAT processor to obtain the SRP-PHAT.

Y(ω, q) = Σ_{n=1}^{N} Gn(ω) Xn(ω) e^{jω∆n}        (4.10)

Here Xn(ω) and Gn(ω) are the Fourier transform of the n-th microphone element's signal and its associated filter response, respectively. The SRP-PHAT is given by

P(q) = arg max_{∆k−∆l} Σ_{l=1}^{N} Σ_{k=1}^{N} ∫_{−∞}^{∞} Ψlk(ω) Xl(ω) Xk*(ω) e^{jω(∆k − ∆l)} dω        (4.11)
Here Ψlk(ω) = Gl(ω) Gk*(ω), which is analogous to Equation 4.8. Equation 4.11 is an extension of the GCC-PHAT that takes the GCCs of all possible microphone pairs and then maximizes the TDOA estimate using the SRP; this improves the resolution of the peak at the true TDOA. There are N autocorrelation terms (when l = k) in Equation 4.11. These terms contribute a DC offset to the SRP that is independent of the steering delays.

4.5 Windowed Discrete Fourier Transform (WDFT)

The windowed DFT is used to calculate the cross power spectrum in each filter bank in Equation 4.9. The use of the WDFT reduces the effects of spectral leakage. The WDFT is given by Equation 4.12.

X{m, ωi} = Σ_{n=0}^{N−1} W[n] x{Mm + n} e^{−jωi n}        (4.12)

Here W[n] is the windowing function, ωi = 2πi/N with i = {0, 1, 2, . . . , N − 1}, M is the overlap and m is the sub band index. For the sake of completeness the IWDFT is also given in Equation 4.13.

x{Mm + n} = (1/KN) Σ_{i=0}^{N−1} (1/s[i]) Σ_{k=0}^{K−1} X{m + k, ωi} e^{jωi n}        (4.13)

The WDFT is used instead of the usual DTFT to avoid spectral leakage and to obtain a proper estimate of the cross power spectrum. Further results on the WDFT can be found in [16].

4.6 Results of PHAT in Real Time

The performance of the GCC-PHAT is simulated and shown below. The delay is simulated using a delay filter, which can be found in [17]. Figure 4.4 shows the simulated results of the PHAT taken from two sensors with varying simulated delays.

Figure 4.4: The output of the PHAT with different simulated delays with a sampling frequency of 32000 Hz

Figure 4.5 shows the implementation of PHAT in the MAP framework, which is presented in Appendix A. The speakers are first placed at fixed positions and are then switched from one side to the other using an analogue mixer. The speakers are identical, and white noise is played at each of them.
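The simulated experiment just described can be reproduced in miniature. Below is a minimal Python sketch (an illustration, not the thesis's Matlab code) combining the GCC-PHAT estimator of Equations 4.8 and 4.9 with the AOA conversion of Equation 4.4; the 2 cm sensor spacing and 32 kHz sampling rate follow the setup described in this chapter, everything else is an assumption.

```python
import numpy as np

C = 343.0  # assumed speed of sound in air, m/s

def gcc_phat(x, y, fs):
    """TDOA estimate via GCC-PHAT (Equations 4.8 and 4.9): the cross power
    spectrum is divided by its own magnitude, so only phase information
    contributes to the correlation peak."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    Gxy = np.conj(X) * Y                               # cross power spectrum
    r = np.fft.irfft(Gxy / np.maximum(np.abs(Gxy), 1e-12), n=n)
    half = n // 2
    r = np.concatenate((r[-half:], r[:half + 1]))      # lags -half .. +half
    return (np.argmax(np.abs(r)) - half) / fs          # positive: y lags x

def aoa_from_tdoa(tau, d, c=C):
    """Invert Equation 4.4: alpha = arcsin(tau * C / d)."""
    return np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))

fs, d = 32000, 0.02                    # sampling rate and sensor spacing
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)          # white noise at sensor 1
y = np.concatenate(([0.0], x[:-1]))    # sensor 2: same noise, one sample later

tau = gcc_phat(x, y, fs)
print(tau * fs, aoa_from_tdoa(tau, d))  # delay in samples, and the AOA
```

With d = 2 cm at 32 kHz, only integer delays of −1, 0 and +1 samples fit within the physical limit ±d/C, so the raw estimate quantizes the angle coarsely; this is one reason the angle error grows toward the end-fire directions and why interpolation of the estimates is useful in practice.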
The first speaker is placed at 0° and the other speaker at −30°. The results (calculated angle of arrival) are presented in Figure 4.5.

Figure 4.5: PHAT in real time implementation using the MAP framework

Figure 4.6 shows the implementation schematic of the above set up. Here 1 and 2 are the sensors (microphones), and A and B are the speaker positions. Speaker A is placed at 0° and speaker B at −30°. Figure 4.7 shows the actual set up of the sensor array and the speakers used for the test. The sensors are placed at a distance of 2 cm from each other.

Figure 4.6: Schematic of the PHAT setup

Figure 4.7 shows the microphone array and the Fostex 610B speaker used for the testing of PHAT.

(a) Microphone array used for testing the PHAT (b) Speaker used for testing the PHAT
Figure 4.7: Components used for real time testing

Figure 4.8 shows the calculated angle using PHAT for various positions of the speaker. It can be noticed that the error in the calculated angle increases as the source moves towards the end fire regions of the microphone array, i.e. towards 90°. This could be corrected using a neural network or hidden Markov modelling for a limited set of source positions. Figure 4.9 shows the interpolated values for the real data in Figure 4.8, with spline and cubic interpolations using the interp1 command in Matlab.

Figure 4.8: Calculated angle using the PHAT (calculated angle vs. position of the source, 0° to 90°)

Figure 4.9: Spline (red) and cubic (blue) interpolation of the angles (interpolated angle vs. position of the source, 0° to 90°)

Chapter 5 Methods, Instrumentation and Results

5.1 Construction

A system was built to preserve the binaural characteristics. The system was then tested in both simulated and real time environments.
It consists of a set of hearing protectors, modified to fit the microphones needed for the recordings. Figure 5.1 shows one side of the BHS.

Figure 5.1: Left side of the Binaural Hearing System.

Each side of the BHS requires an array of at least three microphones for calculating the AOA in space, i.e. in three dimensions. The microphones are of the AKG C417 PP model with a standard XLR connector for phantom powering. They were all taken from a single batch to ensure that their frequency responses remain more or less the same. The microphones are numbered 1-6 from left to right as shown in Figure 5.2. Figure 5.3 shows some more pictures of the BHS and its setup.

Figure 5.2: BHS showing the microphones on the left and the right side

Figure 5.3: Microphone bias and the sound card used to interface with the MAP framework

The microphones were connected to an external sound card, a Phase 88 FW Rack, which in turn drives the output to the BHS. The sound card supports the ASIO (Audio Stream Input/Output) standard and is connected via a FireWire cable. It is initiated and controlled by the MAP (Matlab Audio Processing) framework (see Appendix A). The block diagram of the BHS is shown in Figure 5.4.

Figure 5.4: Block diagram for the implementation of the system using the Matlab Audio Processing framework

Each side of the BHS was fitted with an array of three microphones placed in the shape of an L. The horizontal arm of the microphone array is used to find the angle in the azimuthal plane and the vertical arm to find the angle in the vertical plane. This is the minimal configuration that can determine the 3D direction of one source in the far field. In Figure 5.4 the headphones used are Bose AE2 headphones.
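The geometry of the L-shaped array can be illustrated with a short Python sketch (a simplified far-field illustration, not code from the thesis): the horizontal pair of microphones yields one direction cosine via Equation 4.4, the vertical pair another, and together they fix the azimuth and elevation of a frontal source.

```python
import numpy as np

C = 343.0  # assumed speed of sound, m/s

def doa_from_L_array(tau_h, tau_v, d, c=C):
    """Recover azimuth and elevation (degrees) from the TDOAs of the two arms
    of an L-shaped array (horizontal pair along x, vertical pair along z),
    applying Equation 4.4 to each arm and assuming a frontal far-field source."""
    ux = np.clip(c * tau_h / d, -1.0, 1.0)        # direction cosine, horizontal arm
    uz = np.clip(c * tau_v / d, -1.0, 1.0)        # direction cosine, vertical arm
    uy = np.sqrt(max(0.0, 1.0 - ux**2 - uz**2))   # forward component, front assumed
    azimuth = np.degrees(np.arctan2(ux, uy))
    elevation = np.degrees(np.arcsin(uz))
    return azimuth, elevation

# A source at 30 degrees azimuth, 0 degrees elevation, with 2 cm arms:
d = 0.02
tau_h = d * np.sin(np.radians(30.0)) / C   # TDOA seen by the horizontal arm
tau_v = 0.0                                # source lies in the horizontal plane
print(doa_from_L_array(tau_h, tau_v, d))
```

Note that the sign of the forward component uy is ambiguous in this model; that ambiguity is one geometric reason for the front-back confusions discussed in Section 5.4.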
These headphones have a very linear frequency response in the audio frequencies (see Figure 5.5). The BHS itself can be fitted with a pair of speakers for rendering sound in real time; however, this thesis uses the real time recording capability of the MAP framework.

Figure 5.5: Frequency response of the Bose AE2 headphones used. (source: http://www.head-fi.org)

5.2 Algorithm

An algorithm was developed to implement the BHS. A flow chart for the algorithm is shown in Figure 5.6.

Figure 5.6: Flow chart of the BHS implementation.

5.3 Testing

Testing the BHS is a challenge, as the system is used to reproduce 3D sound for people. The best way to evaluate and test such a system is a behavioural hearing test on people. The test was designed to probe the capability of the BHS to preserve the binaural cues of the recorded sound, i.e. localization, and it was carried out only in the azimuthal plane. During these tests the record feature of the MAP framework was used. This was done so that the subjects taking the test would have no knowledge of the real location of the source, since the recording and the testing can then be done in different places. It also prevents any leakage that may occur through the headphone caps, and it counters head movements, which the BHS cannot accommodate since it does not use any fixed point of reference. The test can be conducted in real time provided that

• The leakage into the caps of the BHS is very small compared to the signal source.
• The head is kept very still.

The behavioural test consisted of four different varieties of sound sources: white noise, a click train, male speech and female speech. All the sounds were presented to the subject at a white noise equivalent of 32 dB SPL, A-weighted.
All the source sounds were prerecorded on a dummy head with the average head radius of 87.5 mm. The speaker used for the recording was a Fostex 6310B (Figure 4.7(b)). The distance from the center of the head to the speaker was 180 cm. The recordings were made in a natural room which was quiet (26 dB(A) SPL). This was done to allow for room reverberation, which makes the sound feel as if it is outside the head [18] [19]. The sound card used with the MAP framework was a Terratec Phase 88 Rack FW 24 bit/96 kHz recording system. The sound card can record 8 analog microphones and drive up to 8 sound channels. For the purpose of this experiment, 6 input channels (one from each microphone on the BHS) and two output channels (left and right) were used. Twelve measurements were made for each sound source, giving a total of 48 measurements. There were 19 subjects, of whom 7 were female. The subjects who took the test were aged between 20 and 39 years, the average age being 24 years. None of the subjects had previous experience of taking a hearing test, but a few of them reported that they had taken other behavioural tests. All the subjects were given prior instructions about the test. All the subjects were students. None of the subjects reported any hearing impairments, except one (Subject 7) who reported a hearing impairment in the high frequencies. The average time each subject took to complete the test was 27 min. The task of the subject was to find the direction of the sound. However, marking the directions is often a difficult task. To make this easier, the recordings were made at 30° intervals and each of the positions is uniquely identified by a corresponding number on a 12 hour clock. This makes it easier for the subject to identify the direction. A User Interface (UI) was developed to make it easier for the subjects to listen and identify the angles. Figure 5.7 shows the UI for the localization test.
The subjects were given instructions on how to use this UI prior to the test. The UI can present any of the recorded sources in random order, and it also gives the test leader the possibility to store the data provided by each subject in an easy and convenient way for further analysis.

Figure 5.7: User Interface for behavioural testing

5.4 Results

The subjects marked the locations using the UI. However, some subjects could not mark the data for a given angle; these skipped directions are omitted from the results. Figure 5.8 shows the position plots for all the sound sources. A position plot shows all the locations marked by all the subjects for a given source, and thereby the deviation for each position. The color intensity shows the number of times a position was selected by the subjects, and the radius of each bubble in Figure 5.8 shows the deviation.

Figure 5.8: Position plots (angles perceived by the subjects vs. angles presented to the subjects) for white noise, click train, male voice and female voice. The radius and the color bar both show the deviation.

The mean of the directions as observed by all the subjects is shown in Figure 5.9.
Figure 5.9: Mean position plot

Figure 5.9 shows that there are visible front-back confusions at 120°, 180°, and 300°. This is probably a result of using a generic set of HRTFs for all the subjects. 15% of the subjects reported that they had no confusions resolving front and back. However, one subject quit the test because he did not feel able to localize any sounds. Figure 5.10 shows the Mean Squared Error of the positions as found by each subject taking the test, for the different types of sources.

Figure 5.10: The Mean Squared Error of the localization for each subject (1-19) at all the positions, per source type (white noise, click train, male voice, female voice)

One interesting observation can be made from Figure 5.10: the subject (Subject 7) who reported a hearing impairment at higher frequencies had a significantly higher error for the female voice.

5.5 Analysis of Variance

A two way Analysis of Variance was done on the localization data collected from the subjects to test the significance of the differences in the means of the sources, the directions and their interaction. Table 5.1 shows the summary of the analysis.

Table 5.1: Analysis of Variance

Source of variation    | Sum of Squares | Degrees of Freedom | Mean Squares | F      | Prob > F
Sound Sources          | 11.46          | 3                  | 3.819        | 1.66   | 0.1746
Directions             | 6746.93        | 11                 | 613.357      | 266.25 | 0
Sources x Directions   | 81.2           | 33                 | 2.461        | 1.07   | 0.3659
Error Term             | 1990.4         | 864                | 2.304        |        |
Total                  | 8829.99        | 911                |              |        |

It is evident from the last column in Table 5.1 that, at a critical significance level of p = 0.05, the effect of the sound sources was not significant, the effect of the directions was significant, and the interaction of the sources and the directions was not significant. This shows that the localization depends strongly on the direction of the source rather than on the source itself.
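The F values and p-values in Table 5.1 follow directly from the sums of squares and degrees of freedom. As a check, the "Sound Sources" row can be reproduced in a few lines of Python (scipy is assumed to be available):

```python
from scipy.stats import f

# Reproduce the "Sound Sources" row of Table 5.1 from its sum of
# squares (SS) and degrees of freedom (df).
ss_sources, df_sources = 11.46, 3
ss_error, df_error = 1990.4, 864

ms_sources = ss_sources / df_sources   # mean square, ~3.819
ms_error = ss_error / df_error         # mean square of the error term, ~2.304

F = ms_sources / ms_error              # ~1.66, as in Table 5.1
p = f.sf(F, df_sources, df_error)      # ~0.17 > 0.05 -> not significant
print(round(F, 2), round(p, 4))
```

The same computation applied to the "Directions" row gives the very large F value in the table, confirming the strong effect of direction.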
This also confirms that the designed test is primarily a test of the localization capability of the subjects.

5.6 Other Conclusions and Observations

The ease of localization, as measured by a Mean Opinion Score (MOS), was collected from the subjects. The subjects were asked to rate the difficulty of finding the direction for the various sources on a scale of 1-10, with 10 being the most difficult. The average MOS shows that natural sources are much easier to localize than artificial sources. The MOS scores are presented in Figure 5.11. It can be seen from Figure 5.10 that the error is almost the same for all the sources, which is contrary to the MOS scores (see Figure 5.11), where the subjects reported that the natural sources were easier to localize. This is probably because of psychological effects on the localization of a natural source.

Figure 5.11: Bar graph showing the mean opinion score of the ease of localization

Other phenomena, such as the precedence effect, have also been observed. The precedence effect is defined in [20] as: "A group of phenomena that are thought to be involved in resolving competition for perception and localization between a direct sound and a reflection." The precedence of a sound produces a fused image of the direct path and the reflected path of the sound source, which helps the listener to localize the source. The BHS has not been psycho-acoustically tested for the precedence effect. However, the precedence effect can be modelled; a GCC based filter for the precedence effect can be found in [19]. Although the recordings were made in a natural room environment, the subjects reported that they could not hear any audible echoes. However, two subjects reported that they were able to hear echoes when presented with the clicks. This suggests that the precedence effect is preserved in the BHS. Figure 5.12 shows one of the clicks used on the subjects.
It can be seen that the recorded echoes are well within the limits of the echo threshold. However, the echo arrives at around 5 ms; at this delay the localization dominance of the fused sound image is not effective. Figure 5.12 is also the room impulse response of the room used for the recording and testing.

Figure 5.12: Room impulse response of the test room.

Chapter 6 Conclusion and future work

A binaural hearing system was built in Matlab, running in real time using the MAP framework. Some basic challenges, such as the system's capability to locate a single source and the rendering of sources using HRTFs, have been addressed. However, a lot of work still needs to be done to make the BHS a complete and useful model, and more subjective and behavioural tests are required to validate the system. Future work should include the following.

• Building speakers into the ear cups. This poses a challenge, as the system has to be designed so that there are no feedback effects, since the microphones are very close to the speakers.
• Adding more microphones, which helps to resolve the sources with a smaller error. Constellations other than linear arrays can also be tested, and the microphones may be calibrated.
• The ability to track head movements and adjust the HRTF accordingly.
• The ability to remove room reverberation to improve the speech quality.
• The ability to resolve the front-back confusions completely.
• The ability to resolve more than one source and render the sources separately.
• The ability to resolve all possible directions around the head completely and clearly.
• The development of subsystems such as noise cancellation, echo reduction and frequency shifting for the BHS, thus mimicking an intelligent hearing aid.
• Implementation of the BHS on a DSP processor to make a standalone system that does not need a computer.
• To study the perceptual effects on localization in a quantitative manner.

Appendix A Matlab Audio Processing (MAP) Framework

The MAP (Matlab Audio Processing) framework is a tool developed by the acoustic research group at Blekinge Institute of Technology [1]. It is a data acquisition tool used to get data in real time from a sound card that supports the ASIO standard. The framework provides an interface between the Matlab environment, the user and the real time environment.

Figure A.1: Schematic showing the MAP framework. [1]

Figure A.1 shows the MAP framework. The framework takes its input from the audio driver. (Note: the audio driver must support the ASIO standard to work with the framework.) The Matlab environment is used to write a user defined script that transforms the input signal, and this signal is then fed to the output as shown in Equation A.1. All the processing is done in blocks.

xn = f(yn[k])        (A.1)

In Equation A.1, k represents the length of the block (the buffer length) and n is the frame number. The function f(.) is the user defined script that processes the input signal. The major advantage of using the MAP framework is that it provides the user with a very clean interface that requires no set-up or installation; all the user needs is a sound card that supports the ASIO standard. The ASIO standard was developed to bypass the operating system's kernel mixer, so that it provides a low latency interface. The MAP framework has a UI that informs the user about dropped blocks and processor usage. Figure A.2 shows an example of the UI of the MAP framework.

Figure A.2: A sample UI from the MAP framework

More information about the MAP framework and its evaluation can be found in [1].

Bibliography

[1] M.
Swartling, Direction of arrival estimation and localization of multiple speech sources in enclosed environments [Electronic resource]. Karlskrona: Blekinge Institute of Technology, 2012.

[2] J. Blauert, Spatial hearing: The psychophysics of human sound localization. The MIT Press, 1997.

[3] W. A. Yost, Fundamentals of hearing: An introduction. Academic Press, 1994.

[4] R. O. Duda, "Modeling head related transfer functions," in Signals, Systems and Computers, 1993. Conference Record of The Twenty-Seventh Asilomar Conference on, pp. 996-1000, 1993.

[5] N. M. Cheung and S. Trautman, "Genetic algorithm approach to head-related transfer functions modeling in 3-D sound system," in Multimedia Signal Processing, 1997. IEEE First Workshop on, pp. 83-88, 1997.

[6] J. C. Torres, M. R. Petraglia, and R. A. Tenenbaum, "HRTF modeling for efficient auralization," in Industrial Electronics, 2003. ISIE'03. 2003 IEEE International Symposium on, vol. 2, pp. 919-923, 2003.

[7] J. C. Torres, M. R. Petraglia, and R. A. Tenenbaum, "Low-order modeling and grouping of HRTFs for auralization using wavelet transforms," in Acoustics, Speech, and Signal Processing, 2004. Proceedings (ICASSP'04). IEEE International Conference on, vol. 4, p. iv-33, 2004.

[8] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, "The CIPIC HRTF database," in Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the, pp. 99-102, 2001.

[9] E. A. Shaw, "Acoustical features of the human external ear," Binaural and spatial hearing in real and virtual environments, vol. 25, p. 47, 1997.

[10] D. W. Batteau, "The role of the pinna in human localization," Proceedings of the Royal Society of London. Series B. Biological Sciences, vol. 168, no. 1011, pp. 158-180, 1967.

[11] W. E. Feddersen, T. T. Sandel, D. C. Teas, and L. A. Jeffress, "Localization of high-frequency tones," The Journal of the Acoustical Society of America, vol. 29, p. 988, 1957.

[12] J. D.
Miller, Modeling interaural time difference assuming a spherical head. Musical Acoustic, Stanford University, 2001.

[13] Y. Haneda, S. Makino, Y. Kaneda, and N. Kitawaki, "Common-acoustical-pole and zero modeling of head-related transfer functions," Speech and Audio Processing, IEEE Transactions on, vol. 7, no. 2, pp. 188-196, 1999.

[14] C. Knapp and G. Carter, "The generalized correlation method for estimation of time delay," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 24, no. 4, pp. 320-327, 1976.

[15] M. Brandstein and D. Ward, Microphone arrays: signal processing techniques and applications. Springer Verlag, 2001.

[16] E. Jacobsen and R. Lyons, "The sliding DFT," Signal Processing Magazine, IEEE, vol. 20, no. 2, pp. 74-80, 2003.

[17] V. Välimäki, "Simple design of fractional delay allpass filters," in Proc. Euro. Conf. Circuit Theory Design, vol. 1, p. 48, 2000.

[18] M. Cobos, J. J. Lopez, and S. Spors, "Analysis of room reverberation effects in source localization using small microphone arrays," in Communications, Control and Signal Processing (ISCCSP), 2010 4th International Symposium on, pp. 1-4, 2010.

[19] K. Wilson and T. Darrell, "Learning a precedence effect-like weighting function for the generalized cross-correlation framework," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, pp. 2156-2164, Nov. 2006.

[20] R. Y. Litovsky, H. S. Colburn, W. A. Yost, and S. J. Guzman, "The precedence effect," The Journal of the Acoustical Society of America, vol. 106, p. 1633, 1999.

[21] L. Wang, F. Yin, and Z. Chen, "An out of head sound field enhancement system for headphone," in Neural Networks and Signal Processing, 2008 International Conference on, pp. 517-521, 2008.

[22] T. Rohdenburg, S. Goetze, V. Hohmann, K. D. Kammeyer, and B. Kollmeier, "Objective perceptual quality assessment for self-steering binaural hearing aid microphone arrays," in Acoustics, Speech and Signal Processing, 2008. ICASSP 2008.
IEEE International Conference on, pp. 2449-2452, 2008.

[23] C. P. Brown and R. O. Duda, "A structural model for binaural sound synthesis," Speech and Audio Processing, IEEE Transactions on, vol. 6, no. 5, pp. 476-488, 1998.
