ACOUSTIC BEAMFORMING: DESIGN AND DEVELOPMENT OF STEERED RESPONSE POWER WITH PHASE TRANSFORMATION (SRP-PHAT)


Master Thesis

Electrical Engineering with Emphasis on Signal Processing


AJOY KUMAR DEY

AND

SUSMITA SAHA

THIS THESIS IS PRESENTED AS PART OF THE DEGREE OF MASTER OF SCIENCE IN

ELECTRICAL ENGINEERING WITH EMPHASIS ON SIGNAL PROCESSING

BLEKINGE INSTITUTE OF TECHNOLOGY

AUGUST, 2011

Blekinge Institute of Technology

School of Engineering,

Department of Electrical Engineering

Supervisor: Dr. Benny Sällberg

Co-Supervisor & Examiner: Dr. Nedelko Grbic

BLEKINGE TEKNISKA HÖGSKOLA

SE-371 79 KARLSKRONA, SWEDEN

TEL. 0455-385000

FAX. 0455-385057


ABSTRACT

Acoustic sound source localization using signal processing is required to estimate the direction from which a particular acoustic source signal arrives, and it is also important for hands-free communication. Video conferencing and hands-free communication are examples of applications that require acoustic sound source localization.

These applications need a robust algorithm that can reliably localize and position acoustic sound sources. The Steered Response Power Phase Transform (SRP-PHAT) is an important and robust algorithm for localizing acoustic sound sources. However, its high computational complexity makes it unsuitable for real-time applications. This thesis describes the implementation of the SRP-PHAT algorithm as a function of source type, reverberation level and ambient noise. The main objective of this thesis is to present different approaches to SRP-PHAT and to verify the algorithm in terms of acoustic environment, microphone array configuration, acoustic source position, and levels of reverberation and noise.


ACKNOWLEDGEMENT

First of all, we would like to thank Dr. Nedelko Grbic from the bottom of our hearts for his remarkable contribution and for the help he provided throughout the thesis work. Without his help and guidance, we would not have been able to complete this thesis.

We would also like to thank God for giving us the ability to carry out this thesis, and our families and friends for all their support throughout our studies.


CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 Acoustic Source Localization
1.1.1 Acoustic Source Localization and Source Tracking
1.1.2 Acoustic Source Localization Methods
1.2 Time Difference of Arrival (TDOA) Estimation
1.3 Methods of Steered Beam forming
1.4 Hypothesis
1.5 Organization of the Thesis
2. ACOUSTIC MODEL
2.1 Active Sound Sources
2.2 Multipath Propagation Model of Acoustic Sound Waves
2.3 Measurement Scenarios
2.4 Direction of Arrival
2.5 Summary of the Acoustic Model
3. TIME DIFFERENCE OF ARRIVAL APPROACHES
3.1 Time Difference of Arrival
3.2 Estimation of Time Difference of Arrival (TDOA) with LMA
4. GENERALIZED CROSS CORRELATION (GCC) USING THE PHASE TRANSFORMATION METHOD (GCC-PHAT)
4.1 Generalized Cross-Correlation (GCC)
4.2 Derivation of the Generalized Cross Correlation
4.3 The Phase Transform (PHAT)
4.4 Generalized Cross Correlation Phase Transform (GCC-PHAT)
4.5 Generalized Cross Correlation Phase Transform (GCC-PHAT) in the System
5. STEERED RESPONSE POWER (SRP) USING THE PHASE TRANSFORM (SRP-PHAT)
5.1 Overview of SRP-PHAT
5.2 Beam forming for Steered Response of Power
5.3 The Steered Response of Power
5.4 The Phase Transform (PHAT)
5.5 SRP-PHAT
5.6 SRP-PHAT as a Source Localization Method
6. EXPERIMENTAL DESIGN AND MODELING
6.1 Requirements
6.2 Architecture
6.3 Test Environment
6.4 Test Signals Used
7. IMPLEMENTATION OF SRP-PHAT ALGORITHM
7.1 Implementation of Steered Response Power Phase Transform Algorithm
7.2 Acoustic Sound Source Localization by using Linear Microphone Array
7.3 Implementation of Two-Dimensional Speaker Position with SMA
7.4 Implementation of Three-Dimensional SMA
7.5 Optimization
7.5.1 Stochastic Region Contraction in SRP-PHAT
7.5.2 Coarse to Fine Region Contraction (CFRC) in SRP-PHAT
7.6 Computational Cost
7.6.1 Signal Processing Cost
7.6.2 Cost per Functional Evaluation, fe
7.6.3 Cost of Full Grid Search
7.6.4 Cost of SRC and CFRC
7.7 Experiments and Result Analysis
7.8 Experimental System
7.9 Preliminary Processing for SRP-PHAT Data
7.9.1 Interpolation
7.9.2 Energy Discriminator
7.10 Result Analysis
8. CONCLUSION AND THE FUTURE WORK
8.1 Conclusion
8.2 Future work
REFERENCES

LIST OF TABLES

Table 1: Summary of the Acoustic Experimental Room Setup for Data Acquisition
Table 2: Summary of the Signals Used in the System to Drive the Acoustic Source
Table 3: The SRC Algorithm for Finding the Global Maximum
Table 4: The CFRC Algorithm for Finding the Global Maximum
Table 5: A Simple Energy Discriminator
Table 6: Performance Evaluation of SRP-PHAT using Full Grid Search Over all Frames

LIST OF FIGURES

Figure 1.1: Sound Source Localization system using Time Difference of Arrival (TDOA)
Figure 2.1: Four Microphones in an Array for the Near Field Acoustic Condition of Direction of Arrival
Figure 2.2: Determination of Azimuth Angle and Elevation Angle from the Far Field Condition in the Direction of Arrival
Figure 3.1: Acoustic Wave Arrival at the Microphones and Relative Time Delay between Consecutive Microphone Pairs
Figure 4.1: Time Difference of Arrival (TDOA) Estimation Approaches between Two Microphones
Figure 5.1: The Steered Response Power Algorithm using the Delay-and-Sum Beam forming Method
Figure 6.1: Ideal Reverberant Room Conditions
Figure 6.2: Graphical Representation of the Microphone Array Design and the Sound Source Position
Figure 7.1: An Acoustic Environment with the Linear Microphone Array Geometry
Figure 7.2: Two-Dimensional SMA Geometry with Sixteen Speaker Positions
Figure 7.3: Positions of a Speaker at Different Locations in Three-Dimensional SMA Systems
Figure 7.4: Two-Dimensional Example of SRC
Figure 7.5: Two-Dimensional Example of Coarse to Fine Region Contraction (CFRC)
Figure 7.6: Top View of the Microphone Array Indicating the Source Locations and Panels
Figure 7.7: SRP-PHAT Surface (a) Without Interpolation, (b) With Filter Interpolation, (c) With Cubic Interpolation
Figure 7.8: The Simple Energy Discriminator for (a) Source 4, (b) Source 3, (c) Source 4, (d) Source 1
Figure 7.9: Performance of SRP-PHAT using CFRC Relative to Grid Search
Figure 7.10: Performance of SRP-PHAT using (a) SRC-I, (b) SRC-II, (c) SRC-III Relative to Grid Search

CHAPTER 1

INTRODUCTION

1.1 Acoustic Source Localization

New technologies make daily life easier, more convenient and more comfortable, and they help ensure our quality of living. Modern technologies evolve very quickly to meet the demands of fifth-generation applications. In communication research, sound source localization occupies an important place, and its importance keeps growing in new-generation applications such as automobile speech enhancement, active noise cancellation for audio and voice communication, teleconferencing, speech recognition and localization, true source detection, talker characterization, and voice capture in reverberant and other environments [11, 15, 25]. Many more advanced and specialized applications, such as robot navigation in artificial intelligence, speech separation and sound tracking, and security surveillance systems, depend wholly or partly on this technology [7, 10, 16]. Many approaches are already under research and development, and their applications can be expected to reach practical use in the coming years.

Linear or distributed microphone array approaches have many applications, most of which are directly related to everyday life. In most cases, the final goal of these applications is to detect and locate sound sources. For instance, in a meeting or conference environment it is very important to detect and locate all voices and to beamform toward each voice, creating an independent channel for each speaker [34, 19]. If the system cannot reliably identify the active source in such an environment, it fails to maintain its required performance level [27]. Sound source localization methods can be used in various operations and to measure the performance of many applications, but their basic aim is to ensure an acceptable performance level across varied operating conditions.

In real-world applications, active sound source localization must meet various reliability constraints. In practice, good localization performance cannot always be achieved because of poor room conditions, varying levels of noise, relative movement of the sound sources, the design of the microphone array, and the robustness of the localization algorithm [26]. Better performance can, however, be obtained by accounting for the following factors [2]:

i. High-quality microphones, in sufficient number, are used to create a suitable environment for localization.
ii. An appropriate and effective microphone placement geometry is used.
iii. The number of active sources in the environment.
iv. The ambient noise and reverberation levels.

The detection and decision processes of sound source localization methods depend mostly on the factors above. Increasing the number of microphones in the array generally improves performance under adverse environmental conditions. Key factors in finding the optimal array geometry are the layout of the experimental room, the existing acoustic conditions, and the number and type of sources [8, 12]. For these reasons, many acoustic sound source localization systems are nowadays designed with the specific application conditions, hardware availability and cost in mind.

1.1.1 Acoustic Source Localization and Source Tracking

Sensor configuration and efficient microphone placement geometry strongly affect the accuracy of acoustic localization and sound source tracking systems [20, 29, 46]. Many factors must be considered when designing the environment and system for a specific acoustic source localization task, such as the layout of the experimental room, source placement, source movement, speaking scenarios, acoustic conditions, the noise level and the prevailing environment [31, 36].

Moreover, further considerations relate to the specific objectives of the sound source localization. Depending on the specification (single-source or multi-source), different approaches are needed to find the actual position of the acoustic source.

1.1.2 Acoustic Source Localization Methods

Acoustic source localization algorithms used in real systems fall into two categories: two-stage and one-stage algorithms [27, 29, 45]. The performance and computational cost of both categories depend mainly on the environmental conditions, the data being processed, the way the algorithm is implemented, and the active acoustic source itself.

A two-stage algorithm completes two algorithmic steps. In the first step, commonly known as the Time Delay Estimation (TDE) step, the system estimates the pairwise time differences of arrival (TDOAs) of the speech signal between pairs of microphones [9, 21]. In the second step, the time delay estimates and the known microphone positions are used to generate hyperbolic curves, whose intersection gives the source position. Maximum likelihood estimation, the least-squares error method, linear intersection and spherical interpolation have been proposed to solve this task. A one-stage algorithm, in contrast, does not process the microphone signals in a pairwise manner; to overcome the limitations associated with making early decisions and with reverberation, it exploits the signals of all microphones jointly. Acoustic beamforming is the typical approach of this kind. Beamforming is the process of delaying the outputs of the microphones in an array and then adding them together, so that a signal arriving from a particular direction is reinforced with respect to noise and to waves propagating from other directions. In a localization system, the beamformer is used to scan, or steer, over a predefined region of all possible source positions [15, 20]. The point at which the beamformer output power is maximal is taken as the most likely source location. This method is known as the Steered Response Power (SRP) [1, 4]. SRP also allows short data segments to be processed efficiently by integrating the data from many microphones.
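The steer-and-pick-the-maximum idea can be illustrated with a minimal sketch. It assumes an ideal pure-delay model (no reverberation, no noise), delays rounded to whole samples, a speed of sound of 343 m/s and a 16 kHz sampling rate. The names `delay_samples`, `delay_and_sum_power` and `srp_grid_search` are hypothetical helpers for illustration only, not the implementation used in this thesis.

```python
import math
import random

C = 343.0    # assumed speed of sound (m/s)
FS = 16000   # assumed sampling rate (Hz)


def delay_samples(src, mic):
    """Propagation delay from src to mic, rounded to whole samples."""
    return round(math.dist(src, mic) / C * FS)


def delay_and_sum_power(signals, mics, candidate):
    """Output power of a delay-and-sum beamformer focused on `candidate`."""
    delays = [delay_samples(candidate, m) for m in mics]
    shifts = [d - min(delays) for d in delays]      # relative delays, all >= 0
    length = min(len(s) - k for s, k in zip(signals, shifts))
    power = 0.0
    for n in range(length):
        # advance each channel by its relative delay, then add the channels
        y = sum(s[n + k] for s, k in zip(signals, shifts))
        power += y * y
    return power


def srp_grid_search(signals, mics, grid):
    """Return the candidate point with the largest steered response power."""
    return max(grid, key=lambda q: delay_and_sum_power(signals, mics, q))


# Usage: three microphones on a line, true source at (0.2, 1.0) m.
random.seed(0)
mics = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
true_src = (0.2, 1.0)
source = [random.gauss(0.0, 1.0) for _ in range(400)]
signals = [[0.0] * delay_samples(true_src, m) + source for m in mics]
grid = [(x / 10, 1.0) for x in range(11)]
estimate = srp_grid_search(signals, mics, grid)   # expected: (0.2, 1.0)
```

At the true location all channels line up and add coherently, so the power is roughly M² times the single-channel energy; elsewhere the channels add incoherently. Practical systems typically apply fractional delays in the frequency domain rather than rounding to whole samples.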

Another remarkable advantage of the one-stage method is that it can recognize and localize multiple simultaneous acoustic sources [27, 36]. In that case, the output of the steered beamformer shows multiple peaks corresponding to the locations of the different talkers.

From the above discussion, acoustic source localization procedures can be grouped into three general categories according to their approach:

i. Approaches employing Time Difference of Arrival (TDOA) information.
ii. Approaches adopting high-resolution spectral estimation techniques.
iii. Approaches maximizing the Steered Response Power (SRP) of a beamformer.

These broad classifications are based mainly on the estimation method and the application environment. In the first category, the source location is calculated from a set of delay estimates measured across various microphone pairs of the array [19, 37]. The second category refers to any localization method that estimates the source position from the spatial correlation matrix of the received signals. The last category refers to any method in which the source position is estimated directly from a filtered, weighted and summed version of the signal data received at the sensors.

1.2 Time Difference of Arrival (TDOA) Estimation

Time Difference of Arrival (TDOA) approaches are the most widely used acoustic source localization approaches [21]. This localization strategy follows a two-step procedure. In the TDOA approach, pairs of microphones placed in a particular array configuration are used to obtain time delay estimates (TDE) of the signal from a point source [9]. These delay estimates, together with precise knowledge of the microphone positions in the array, are then used to estimate the source location. A specific delay maps to a set of spatial points lying along a hyperbolic curve (a hyperboloid in three dimensions), and the curves from the different microphone pairs are then intersected in some optimal sense to arrive at a source location estimate [47].

The bulk of passive talker localization methods are TDOA-based, owing to their computational practicality and reasonable performance under favorable conditions.
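The mapping from a single delay to a hyperbolic curve can be checked numerically. The sketch below assumes a two-dimensional geometry, a speed of sound of 343 m/s and a hypothetical microphone pair 1 m apart; the helper name `tdoa` is illustrative only. It computes the TDOA from geometry and shows that every point on one branch of a hyperbola whose foci are the two microphones yields the same TDOA.

```python
import math

C = 343.0  # assumed speed of sound (m/s)


def tdoa(src, mic_i, mic_j, c=C):
    """Time difference of arrival (s) of a source at mic_i relative to mic_j."""
    return (math.dist(src, mic_i) - math.dist(src, mic_j)) / c


# Hypothetical microphone pair on the x axis, 1 m apart.
mic_i, mic_j = (-0.5, 0.0), (0.5, 0.0)

# A source on the perpendicular bisector reaches both microphones simultaneously.
broadside = tdoa((0.0, 2.0), mic_i, mic_j)   # 0.0

# Points on the hyperbola branch nearest mic_i (foci at mic_i and mic_j)
# all satisfy |q - mic_i| - |q - mic_j| = -2a, so they share one TDOA.
a, f = 0.2, 0.5                      # half the range difference, half the mic spacing
b = math.sqrt(f * f - a * a)
branch = [(-a * math.cosh(t), b * math.sinh(t)) for t in (0.0, 0.5, 1.0, 2.0)]
taus = [tdoa(q, mic_i, mic_j) for q in branch]   # all equal to -2 * a / C
```

Because one pair only constrains the source to such a curve, at least one more pair (in two dimensions) is needed before the curves can be intersected.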

To achieve effective acoustic source localization, it is important to obtain good time delay estimates (TDE) of the received speech signals.

Background noise and multipath propagation due to room reverberation are the two major problems that degrade the received signals and complicate the source estimation problem. Another common limitation is the inability to accommodate multi-source scenarios. Some TDOA-based methods operate over short analysis intervals in order to track several individuals. However, in the presence of multiple simultaneous sources, excessive ambient noise, or moderate to high reverberation in the acoustic field, such methods typically yield poor TDOA estimates and unreliable location estimates, because the algorithms assume a single-source model [22]. A TDOA-based locator therefore relies on valid and accurate delay and location estimates to locate any acoustic source.

[Figure labels: Microphone Array; Microphone Pairs; Delay Difference; Estimation of Source Position]

Figure 1.1: Sound Source Localization system using Time Difference of Arrival (TDOA).

1.3 Methods of Steered Beam forming

A microphone array can focus on a specific location or direction, in effect steering its sensitivity toward a region of interest. This capability is referred to as beamforming [15]. For sound source localization, the beamformer can be steered over a region of candidate positions, and the resulting output is known as the steered response. The Steered Response Power (SRP) peaks when the focus point matches the true source location. The simplest beamformer, and the basic building block of SRP, is the delay-and-sum beamformer [4].

When the steered response beamformer uses the Phase Transform (PHAT) filter, it defines a one-stage method commonly known as the Steered Response Power using the Phase Transform, or SRP-PHAT [1, 2, 4]. In environments with high noise and high reverberation, SRP-PHAT has been shown to be more robust than two-stage methods. The focus of this thesis is on improving the SRP-PHAT method through different approaches so that it works more efficiently in real time, despite the high computational cost inherent in one-stage algorithms.
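The effect of the PHAT weighting can be seen on a single microphone pair. The sketch below is a minimal, assumption-laden illustration: it uses a naive O(N²) DFT so that it needs only the Python standard library, and it treats the delay as circular. A practical implementation would zero-pad and use an FFT. The helpers `dft`, `idft` and `gcc_phat_lag` are hypothetical names. SRP-PHAT can be viewed as summing such PHAT-weighted cross-correlations over all microphone pairs at the delays implied by each candidate location.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * f / n) for k in range(n))
            for f in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching `dft`."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * k * f / n) for f in range(n)) / n
            for k in range(n)]


def gcc_phat_lag(x1, x2, eps=1e-12):
    """Circular lag (samples) by which x1 lags x2, from the PHAT-weighted cross-spectrum."""
    cross = [a * b.conjugate() for a, b in zip(dft(x1), dft(x2))]
    # PHAT weighting: divide every bin by its magnitude so only the phase remains
    weighted = [g / (abs(g) + eps) for g in cross]
    r = [v.real for v in idft(weighted)]
    n = len(r)
    peak = max(range(n), key=lambda k: r[k])
    return peak if peak <= n // 2 else peak - n     # map to a signed lag


# Usage: x1 is x2 circularly delayed by 5 samples.
random.seed(1)
x2 = [random.gauss(0.0, 1.0) for _ in range(64)]
x1 = [x2[(n - 5) % 64] for n in range(64)]
lag = gcc_phat_lag(x1, x2)   # expected: 5
```

Whitening the cross-spectrum turns the correlation into a sharp impulse at the true lag, which is why PHAT-based methods tend to cope better with reverberation than plain cross-correlation.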

1.4 Hypothesis

Research on acoustic sound source localization has focused on algorithms for enhancing the detection and localization of targets. The main objective of this thesis is to present different approaches to the Steered Response Power Phase Transform (SRP-PHAT), to verify the algorithm in terms of the acoustic environment, microphone array configuration, acoustic source position, and levels of reverberation and noise, and then to evaluate and justify the best approach for accurately localizing acoustically active sources [14, 18]. To study the performance level, separate experiments are designed for active sound source detection in highly reverberant and noisy rooms, and an effective methodology is developed for the problem.

For an efficient evaluation of acoustic sound source localization, this thesis focuses in particular on the performance and implementation of the SRP-PHAT algorithm as a function of source type, reverberation level and ambient noise, rather than on changes in the specific environmental scenario or the microphone geometry. When the aim of an experiment is to analyze a different source scenario, the above techniques remain applicable with slight modifications as the situation demands.

1.5 Organization of the Thesis

The thesis is organized around the design and development of the Steered Response Power using the Phase Transform (SRP-PHAT). We have therefore structured this work around different SRP-PHAT approaches, their design, and how the algorithm can be developed.

Chapter 2, Acoustic Model, describes issues related to active sound sources, the multipath propagation model, measurement scenarios and direction of arrival, and closes with a summary of the acoustic model and acoustic environment. Chapter 3, Time Difference of Arrival, focuses on the basic time difference of arrival and on TDOA estimation with a linear microphone array (LMA).

Chapter 4, Generalized Cross Correlation with Phase Transformation (GCC-PHAT), presents the basic Generalized Cross Correlation (GCC) model and its derivation, the phase transformation and GCC-PHAT, and closes by describing GCC-PHAT in the system. Chapter 5, Steered Response Power with Phase Transformation (SRP-PHAT), gives an overview of the Steered Response Power, beamforming for the steered response power and the basic derivation of SRP, followed by the Phase Transformation (PHAT), SRP-PHAT itself, and SRP-PHAT as a source localization method.

Chapter 6, Experimental Design and Modeling, discusses the requirements, architecture, test environment and test signals needed to create a realistic environment for acoustic sound source localization with the SRP-PHAT algorithm. Chapter 7, Implementation of the SRP-PHAT Algorithm, discusses the basic implementation of SRP-PHAT, acoustic source localization using a linear microphone array, the implementation of two-dimensional speaker positions with the SMA, and the three-dimensional SMA. Chapter 8 presents the conclusions and future work on SRP-PHAT.

The next chapter discusses the acoustic model underlying SRP-PHAT: the propagation model of acoustic sound waves, the measurement scenarios for different field behaviors, and the direction of arrival.


CHAPTER 2

ACOUSTIC MODEL

2.1 Active Sound Sources

In the real world, acoustic signals of different types and levels come from different kinds of sources. An acoustic source, whether a human talker or a mechanical sound source, is not an ideal spherical radiator; in real acoustic environments it exhibits directionality and spatial attenuation [6]. Naturally, microphones facing the active source or talker receive stronger signals than microphones placed to the side of or behind the source. To simplify the system, we assume that the acoustic sources can be modeled as point sources [37, 38].

To keep the model simple enough, the following assumptions are made:

I. Spherical sound waves are emitted by the source: the complex radiation patterns of human head models are not incorporated.

II. Homogeneous medium: sound propagation is non-refractive, which ensures that the speed of sound c is constant everywhere.

III. Lossless medium: the medium does not absorb energy from the propagating waves.

The speed of sound c may differ from one experiment to another because of changes in temperature, but it does not change within a single experiment.

2.2 Multipath Propagation Model of Acoustic Sound Waves

Unlike an ideal free-field environment, a real acoustic environment contains objects such as room walls, furniture and people that interfere with the propagation of sound waves. This interference creates a reverberant environment and multipath propagation of the waves [30, 32]. Reverberation severely affects the performance of microphone arrays, so reverberation and multipath propagation must be included in the acoustic model to cope with realistic conditions.


For both the direct path and the reflected paths from the active sound source at location $\mathbf{q}_s$ to microphone $m$ at location $\mathbf{q}_m$, we consider the room impulse response, denoted $a_m(\mathbf{q}_s, t)$. The characteristics of microphone $m$ itself are described by its response $r_m(t)$. The room response depends mostly on the source location $\mathbf{q}_s$, since the location and orientation of the microphones are known and fixed. The signal at microphone $m$ can then be modeled as

$$x_m(t) = s(t) * a_m(\mathbf{q}_s, t) * r_m(t) + n_m(t),$$

where $s(t)$ denotes the source signal, $n_m(t)$ denotes the noise of channel $m$, and $*$ denotes linear convolution. The noise $n_m(t)$ is assumed to be uncorrelated with the source signal. The convolution $a_m(\mathbf{q}_s, t) * r_m(t)$ is the overall impulse response from the source output to the microphone output; because the microphone is at a fixed location, it depends only on the source location. Denoting this response by $h_m(\mathbf{q}_s, t)$, the equation becomes

$$x_m(t) = s(t) * h_m(\mathbf{q}_s, t) + n_m(t).$$

This equation models the signal received at microphone $m$, taking into account both the impulse response of the multipath propagation channel and the uncorrelated noise.
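A discrete-time version of the model $x_m = s * h_m + n_m$ is easy to simulate. The sketch below assumes a hypothetical impulse response consisting of a direct path (delay 2 samples, gain 1.0) and a single wall reflection (delay 5 samples, gain 0.5); real room impulse responses contain many more reflections. The helper names `convolve` and `mic_signal` are illustrative only.

```python
import random


def convolve(s, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(s) + len(h) - 1)
    for i, si in enumerate(s):
        for j, hj in enumerate(h):
            y[i + j] += si * hj
    return y


def mic_signal(s, h_m, n_m):
    """Discrete form of x_m = s * h_m + n_m; n_m must be len(s) + len(h_m) - 1 long."""
    y = convolve(s, h_m)
    if len(n_m) != len(y):
        raise ValueError("noise length must match the convolution length")
    return [yi + ni for yi, ni in zip(y, n_m)]


# Hypothetical impulse response: direct path after 2 samples (gain 1.0)
# plus one wall reflection after 5 samples (gain 0.5).
h_m = [0.0, 0.0, 1.0, 0.0, 0.0, 0.5]
s = [1.0, 2.0]
clean = mic_signal(s, h_m, [0.0] * 7)   # [0.0, 0.0, 1.0, 2.0, 0.0, 0.5, 1.0]
noisy = mic_signal(s, h_m, [random.gauss(0.0, 0.01) for _ in range(7)])
```

The delayed, attenuated copy of the source in `clean` is exactly the multipath component that makes time delay estimation harder in reverberant rooms.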

2.3 Measurement Scenarios

To characterize an ideal transducer, the microphone setting and the orientation of the microphone relative to the pressure wave must be taken into account [2, 10, 32]. Different measurements are obtained depending on the distance between the active source and the microphone, and the measurement scenarios can be divided as follows:

i. Free Field Behavior: In the free field scenario, the microphone or microphone array is placed in an ideal anechoic chamber. An ideal anechoic chamber provides free-field conditions because it has no reverberation or multipath propagation. Under these conditions, the sound level measured from a fixed point source decreases by 6 dB each time the measurement distance doubles.

ii. Near Field Behavior: In the near field scenario, the microphone is located very close to the source, at a distance of no more than 50 cm. In this case the wavelength of the signal is of the same order of magnitude as the size of the source, which results in an almost lossless condition in the system. The spherical character of the acoustic field is emphasized in this measurement setup, which is one of the primary concerns of the measurement, since some microphones respond differently as the spherical divergence increases.

iii. Far Field Behavior: In the far field scenario, the microphone and the source are far from each other, more than 50 cm apart, so that the distance does not affect the measurement results.
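The 6 dB figure in the free field case follows from the inverse-distance law: for a spherical wave from a point source the sound pressure amplitude falls as 1/r, and the level is 20 log10 of the pressure ratio, so each doubling of distance costs 20 log10(2) ≈ 6.02 dB. A minimal sketch (the helper name `level_drop_db` is hypothetical):

```python
import math


def level_drop_db(r_near, r_far):
    """Free-field SPL drop (dB) of a point source between two distances (inverse-distance law)."""
    return 20.0 * math.log10(r_far / r_near)


per_doubling = level_drop_db(1.0, 2.0)   # about 6.02 dB
two_doublings = level_drop_db(0.5, 2.0)  # about 12.04 dB
```

This relation holds only under the free-field assumption; in a reverberant room the reflected energy makes the level fall off more slowly with distance.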

2.4 Direction of Arrival

In the near field situation, the microphones are very close to the acoustic source, so each of the M elements of the microphone array has its own direction of arrival (DOA) [10, 21]. Each DOA is the direct path from the microphone to the acoustic source. Mathematically, the DOA of microphone m can be expressed as a point on the unit sphere, that is, as the unit vector

$$\mathbf{u}_m = \frac{\mathbf{q}_s - \mathbf{q}_m}{\lVert \mathbf{q}_s - \mathbf{q}_m \rVert}, \qquad m = 1, \dots, M,$$

where $\mathbf{q}_s$ is the source position, $\mathbf{q}_m$ is the position of microphone $m$, and $\lVert \mathbf{u}_m \rVert = 1$. Figure 2.1 illustrates an ideal four-microphone array design for the near field acoustic condition.

[Figure labels: Source; Mic 1; Mic 2; Mic 3; Mic 4]

Figure 2.1: Four Microphones in an Array for the Near Field Acoustic Condition of Direction of Arrival.
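The difference between near field and far field can be made concrete by computing the per-microphone DOA unit vectors for a hypothetical four-microphone linear array with 10 cm spacing. The geometry is an assumption chosen for illustration, and `doa_unit_vector` and `spread_deg` are hypothetical helper names. With the source 30 cm away, the four DOAs differ by tens of degrees; with the source 30 m away, they are nearly identical, which is what justifies a single shared DOA in the far field.

```python
import math


def doa_unit_vector(src, mic):
    """Unit vector pointing from `mic` toward `src` (the DOA seen by that element)."""
    d = [a - b for a, b in zip(src, mic)]
    norm = math.sqrt(sum(v * v for v in d))
    return tuple(v / norm for v in d)


def spread_deg(vectors):
    """Largest azimuth difference (degrees) among DOA vectors lying in the x-y plane."""
    az = [math.degrees(math.atan2(v[1], v[0])) for v in vectors]
    return max(az) - min(az)


# Hypothetical 4-microphone linear array with 10 cm spacing.
mics = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.0, 0.0)]
near = [doa_unit_vector((0.15, 0.3, 0.0), m) for m in mics]    # source 30 cm away
far = [doa_unit_vector((0.15, 30.0, 0.0), m) for m in mics]    # source 30 m away

near_spread = spread_deg(near)   # about 53 degrees: each element sees its own DOA
far_spread = spread_deg(far)     # well under 1 degree: one shared DOA is adequate
```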

In the far field condition, the microphones are located far away from the acoustic source, and all microphones in the array share the same direction of arrival (DOA), which is commonly chosen as the direction from the origin of the array to the active acoustic source [21, 22]. With the origin of the array denoted O, with position vector $\mathbf{o}$, and the source position denoted $\mathbf{q}_s$, this can be expressed mathematically as

$$\mathbf{u} = \frac{\mathbf{q}_s - \mathbf{o}}{\lVert \mathbf{q}_s - \mathbf{o} \rVert}.$$

The DOA can equivalently be expressed through the standard azimuth angle $\theta$ and elevation angle $\phi$.

[Figure labels: axes x, y, z; Microphone; Elevation Angle; Azimuth Angle]

Figure 2.2: Determination of Azimuth Angle and Elevation Angle from the far field condition in the Direction of Arrival.

In this angle measurement, we can express it educationally in the following way,

In the far field condition, the distance (range) between the microphone array and the acoustic source cannot be determined in acoustic source localization problems; the Direction of Arrival is the only spatial information about the source.
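The azimuth/elevation parametrization and the loss of range information can be illustrated with a short NumPy sketch. It assumes the angle convention stated above (azimuth from the x-axis in the x-y plane, elevation from the x-y plane); the function names and the 30 and 10 degree test angles are hypothetical.

```python
import numpy as np

def far_field_doa(azimuth, elevation):
    """Unit DOA vector; azimuth measured in the x-y plane from the x-axis,
    elevation measured from the x-y plane (both in radians)."""
    return np.array([np.cos(elevation) * np.cos(azimuth),
                     np.cos(elevation) * np.sin(azimuth),
                     np.sin(elevation)])

def doa_to_angles(u):
    """Inverse mapping: (azimuth, elevation) in radians from a unit vector."""
    u = np.asarray(u, dtype=float)
    return np.arctan2(u[1], u[0]), np.arcsin(np.clip(u[2], -1.0, 1.0))

u = far_field_doa(np.deg2rad(30.0), np.deg2rad(10.0))
# Two hypothetical sources 2 m and 20 m away along the same direction give the
# same unit vector, so only the direction (not the range) can be recovered.
near, far = 2.0 * u, 20.0 * u
print(np.allclose(near / np.linalg.norm(near), far / np.linalg.norm(far)))   # True
az, el = doa_to_angles(u)
print(np.rad2deg(az), np.rad2deg(el))   # about 30 and 10 degrees
```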


2.5 Summary of the Acoustic Model

For an ideal case, we would like to localize a single acoustic source in a perfect room. For this work, the experimental acoustic model is as follows:

i. The acoustic sound source is modeled as a point source.

ii. For single acoustic source localization, the near field condition is applied, so the source position can be estimated in three dimensions (3D).

iii. Within a single experiment, the speed of sound is constant.

iv. To create the most realistic acoustic conditions, reverberation, multipath effects and various real-life noise should be taken into account.

v. Source positions are tested with real data collected from human sources.

The next chapter discusses the Time Difference of Arrival (TDOA) approach: TDOA estimation in general and TDOA estimation with a linear microphone array (LMA), which introduces the ideas behind the SRP approach.


CHAPTER 3

TIME DIFFERENCE OF ARRIVAL APPROACH

3.1 Time Difference of Arrival

To find the position of the acoustic source, a suitable acoustic source localization algorithm is required. Time Delay Estimation (TDE), or the Time Difference of Arrival (TDOA) technique, is the key part of such algorithms [21].

Given accurate knowledge of the microphone geometry and the time differences of arrival of the source signal at the different microphone pairs, the source location can be estimated. The reliability of a time delay estimate depends on the spatial coherence of the acoustic signal reaching the sensors, which is influenced by the distance between the two microphones, the background noise level, and the reverberation of the room.

The TDOA of a microphone pair is estimated as the time lag that maximizes the Generalized Cross-Correlation (GCC) between the two delayed microphone signals; this is the basis of maximum-GCC TDOA schemes [22]. The GCC method is a very popular way to estimate time delays [33], largely because of its low computational complexity when implemented with the Fast Fourier Transform (FFT). For two microphone channels, let $x_i(t)$ denote the signal at microphone $i$ and $X_i(\omega)$ its Fourier transform over a finite interval $T$. The generalized cross correlation between the two channels is

$$R_{12}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{12}(\omega)\, G_{12}(\omega)\, e^{j\omega\tau}\, d\omega, \qquad G_{12}(\omega) = X_1(\omega)\, X_2^{*}(\omega),$$

where $G_{12}(\omega)$ denotes the cross power spectrum, $\Psi_{12}(\omega)$ denotes the weighting function, and $(^{*})$ denotes the complex conjugate. If the weighting function is set to 1 in this expression, the GCC reduces to the ordinary cross correlation, and the estimated time delay is

$$\hat{\tau}_{12} = \arg\max_{\tau} R_{12}(\tau).$$


Multipath propagation, multiple sources, a high level of background noise and reverberation can degrade the performance of the Generalized Cross Correlation (GCC) method [3]. Under such conditions, the Generalized Cross Correlation with Phase Transform (GCC-PHAT) gives considerably better performance than the conventional approaches for Time Difference of Arrival (TDOA) based sound source localization systems. For GCC-PHAT, the weighting function is

$$\Psi_{12}(\omega) = \frac{1}{\lvert G_{12}(\omega) \rvert} = \frac{1}{\lvert X_1(\omega)\, X_2^{*}(\omega) \rvert}.$$
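The GCC-PHAT estimator above can be sketched with NumPy's FFT routines. This is a minimal illustration, not the thesis' MATLAB code: the function name `gcc_phat`, the 16 kHz sampling rate and the synthetic five-sample delay are assumptions. The cross power spectrum is divided by its magnitude (the PHAT weighting), and the peak of the inverse FFT gives the delay of `x1` relative to `x2`.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Delay of x1 relative to x2 (seconds) from the peak of the GCC-PHAT.

    Returns (tau_hat, lags_in_samples, r), where r holds R_12 at those lags."""
    n = len(x1) + len(x2)                        # zero-padding avoids circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G12 = X1 * np.conj(X2)                       # cross power spectrum G_12
    G12 /= np.maximum(np.abs(G12), 1e-12)        # PHAT weighting 1/|G_12|
    r = np.fft.irfft(G12, n=n)                   # R_12 at integer lags (lag k at index k mod n)
    max_shift = n // 2
    if max_tau is not None:                      # restrict the search to |tau| <= max_tau
        max_shift = min(int(fs * max_tau), max_shift)
    lags = np.arange(-max_shift, max_shift + 1)
    r = np.concatenate((r[n - max_shift:], r[:max_shift + 1]))
    return lags[np.argmax(r)] / fs, lags, r

# Hypothetical test signal: white noise that reaches microphone 1
# five samples later than microphone 2.
rng = np.random.default_rng(0)
fs = 16000
s = rng.standard_normal(4096)
x2 = s
x1 = np.concatenate((np.zeros(5), s[:-5]))
tau_hat, lags, r = gcc_phat(x1, x2, fs)
print(round(tau_hat * fs))   # 5
```

The optional `max_tau` argument restricts the search to physically possible delays, which is the idea behind the search interval discussed in Section 3.2.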

3.2 Estimation to Time Difference of Arrival (TDOA) with LMA

With a linear microphone array (LMA), the acoustic source is localized by searching for the time delay that maximizes the steered response power [39].

Depending on the source location, the sound needs a different time to arrive at each microphone. The relative delay between pairs of microphones is what the GCC computation uses.

[Figure: a linear array of microphones M1 to M4 and two speaker positions, Position 1 and Position 2.]

Figure 3.1: Acoustic wave arrival to microphones and relative time delay between the consecutive microphone pairs.

The figure shows that the two different source positions produce different relative time delays of the incoming acoustic signal.


The sound wave from source position 1 arrives at all microphones of the array at the same time, so the relative time delay is zero. In contrast, the sound wave from source position 2 arrives at the microphones with different time delays [5]. Consequently, the maximum relative time delay between a microphone pair occurs when the source is placed at the rightmost position, on the same line as the linear array of microphones [39].

It is very important to choose a suitable source localization algorithm and a suitable spacing between the microphones in the array [42], because both play an important role in the localization process and also affect its computational load. The search interval for the time delay depends on the distance between consecutive microphones. The maximum time delay that characterizes the search interval is

$$-\frac{d\, f_s}{c} \;\le\; \tau \;\le\; \frac{d\, f_s}{c},$$

where $f_s$ is the sampling frequency, $c$ is the speed of sound, $d$ is the distance between the microphones, and $\tau$ (in samples) is the relative time delay that maximizes the output power of the steered beamformer. To find the maximum of the Generalized Cross Correlation (GCC), we therefore search over a set of time delays constrained by the distance between the two consecutive microphones of the pair [41].

The Time Difference of Arrival (TDOA) is the time delay that maximizes the Generalized Cross Correlation (GCC). Given the TDOA estimate, the Direction of Arrival (DOA) of the incoming acoustic signal is computed as

$$\theta = \cos^{-1}\!\left(\frac{c\,\tau}{f_s\, d}\right),$$

where the parameter $\theta$ denotes the direction of arrival of the incoming acoustic signal at the microphone array, measured from the array axis.
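The search interval and the TDOA-to-DOA conversion can be sketched as follows. This is an illustration under assumed values: a 10 cm spacing, 16 kHz sampling and c = 346.2 m/s (the value used for the room in Section 6.3); the helper names are hypothetical, and the angle is measured from the array axis as in the expression above.

```python
import numpy as np

def max_lag_samples(d, fs, c):
    """Largest physically possible |tau| in samples for spacing d: floor(d * fs / c)."""
    return int(np.floor(d * fs / c))

def doa_from_tdoa(tau, d, fs, c):
    """theta = arccos(c * tau / (fs * d)) in degrees, tau in samples and theta
    measured from the array axis; the ratio is clipped to [-1, 1]."""
    return np.degrees(np.arccos(np.clip(c * tau / (fs * d), -1.0, 1.0)))

# Assumed values: 10 cm spacing, 16 kHz sampling, c = 346.2 m/s.
d, fs, c = 0.10, 16000, 346.2
max_lag = max_lag_samples(d, fs, c)
print(max_lag)                                   # 4
for tau in range(-max_lag, max_lag + 1):         # the whole search interval
    print(tau, round(float(doa_from_tdoa(tau, d, fs, c)), 1))
```

A zero delay maps to 90 degrees (source position 1 in Figure 3.1), and the largest delays map towards 0 or 180 degrees (source on the array line). With only nine integer lags, the angular resolution is coarse, which is one reason the microphone spacing matters.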

The next chapter discusses the Generalized Cross Correlation (GCC), the derivation of GCC-PHAT, and GCC-PHAT in practical systems, which prepares the derivation of SRP-PHAT.


CHAPTER 4

GENERALIZED CROSS- CORRELATION (GCC)

USING THE PHASE TRANSFORMATION METHOD

(GCC-PHAT)

4.1 Generalized Cross- Correlation (GCC)

The delay-and-sum approach averages the speech signals of multiple microphones. Coherence is achieved by focusing on an acoustic source, specified by the estimated Time Difference of Arrival (TDOA). The Generalized Cross Correlation (GCC) has been a powerful method for finding the TDOA between the two microphones of a pair [3, 33]. To create a general setting for time delay of arrival estimation, we consider here a linear array of 4 microphones and one source.

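[Figure: a source and a linear array of microphones M1 to M4; r1 to r4 are the distances from the source to the microphones, and d1, d2, d3 are the spacings between adjacent microphones.]

Figure 4.1: Time Difference of Arrival (TDOA) estimation between two microphones approaches.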

We denote the microphones by $M_1$, $M_2$, $M_3$ and $M_4$. They are placed in a linear array, and the distances between adjacent microphones are $d_1$, $d_2$ and $d_3$.


If $r_i$ denotes the distance from the acoustic source to microphone $M_i$ and $c$ the speed of sound, the travel time from the source to that microphone is

$$\tau_i = \frac{r_i}{c}. \tag{4.1}$$

For two microphones $M_i$ and $M_j$, the Time Difference of Arrival (TDOA) is defined as

$$\tau_{ij} = \tau_i - \tau_j = \frac{r_i - r_j}{c}. \tag{4.2}$$

Equation (4.2) expresses the relationship between the Time Difference of Arrival (TDOA) and the distances from the acoustic source to the two microphones. When multiple TDOAs are available, several techniques, such as linear intersection and spherical interpolation, are used to estimate the source location [22].
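Equations (4.1) and (4.2) can be evaluated for every microphone pair at once. The sketch below assumes NumPy, a hypothetical 2-D geometry with d1 = d2 = d3 = 0.1 m, and c = 346.2 m/s; `tdoa_matrix` is an illustrative helper name.

```python
import numpy as np

def tdoa_matrix(source, mics, c):
    """tau[i, j] = (r_i - r_j) / c with r_i = ||q_s - r_mic_i||  (Eqs. 4.1 and 4.2)."""
    r = np.linalg.norm(np.asarray(mics, float) - np.asarray(source, float), axis=1)
    tau = r / c                                  # Eq. 4.1: travel time to each microphone
    return tau[:, None] - tau[None, :]           # Eq. 4.2: all pairwise differences

# Hypothetical linear array M1..M4 with 0.1 m spacing and a source at (1 m, 1 m).
mics = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]])
tau = tdoa_matrix([1.0, 1.0], mics, c=346.2)
print(np.round(tau * 1e3, 4))                    # milliseconds
```

The matrix is antisymmetric ($\tau_{ji} = -\tau_{ij}$) with a zero diagonal, and TDOAs chain ($\tau_{12} + \tau_{23} = \tau_{13}$), so only $M-1$ of the pairwise values are independent in the noise-free case.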

We now derive the Generalized Cross Correlation (GCC) and show how the Time Difference of Arrival (TDOA) is obtained from it [].

4.2 Derivation of the Generalized Cross Correlation

Let $g(\mathbf{q}_s, \mathbf{r}_l, t)$ denote the room impulse response, covering both the direct path and the reflected paths, from the active sound source at $\mathbf{q}_s$ to microphone $l$ at location $\mathbf{r}_l$, and let $a_l(t)$ describe the response of microphone $l$. These responses depend mostly on the source location $\mathbf{q}_s$, since the location and orientation of the microphones are known and fixed. The signal at microphone $l$ can then be modeled as

$$x_l(t) = s(t) * g(\mathbf{q}_s, \mathbf{r}_l, t) * a_l(t) + n_l(t), \tag{4.3}$$

where $s(t)$ denotes the source signal, $n_l(t)$ denotes the noise of the $l$-th channel, and $*$ denotes linear convolution. The noise $n_l(t)$ is assumed to be uncorrelated with the source. The combined response $g(\mathbf{q}_s, \mathbf{r}_l, t) * a_l(t)$ is the impulse response from the source output to the microphone output; since the microphone is at a fixed location, it depends only on the source location. Denoting this response by $h_l(\mathbf{q}_s, t)$, the model becomes

$$x_l(t) = s(t) * h_l(\mathbf{q}_s, t) + n_l(t). \tag{4.4}$$


Similarly, the signal at another microphone $k$ is

$$x_k(t) = s(t) * h_k(\mathbf{q}_s, t) + n_k(t). \tag{4.5}$$

To make the propagation delay explicit, it can be included in the source signal: the source component arriving at a microphone is a delayed copy of $s(t)$. Normalizing the time axis so that the delay from the source to microphone $k$ is zero, the quantity of interest is the relative time difference of arrival $\tau_{kl}$ between microphones $k$ and $l$ [3].

When these two microphone signals are cross correlated, a peak appears at the time lag where the two shifted signals are aligned; this lag corresponds to the Time Difference of Arrival (TDOA) $\tau_{kl}$. The cross correlation of $x_k(t)$ and $x_l(t)$ is

$$R_{kl}(\tau) = \int_{-\infty}^{\infty} x_k(t)\, x_l(t - \tau)\, dt. \tag{4.6}$$

Taking the Fourier transform of the cross correlation gives the cross power spectrum

$$G_{kl}(\omega) = \int_{-\infty}^{\infty} R_{kl}(\tau)\, e^{-j\omega\tau}\, d\tau. \tag{4.7}$$

Applying the convolution property of the Fourier transform gives

$$G_{kl}(\omega) = X_k(\omega)\, X_l^{*}(\omega), \tag{4.8}$$

where $X_k(\omega)$ is the Fourier transform of $x_k(t)$ and $(^{*})$ denotes the complex conjugate.

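Taking the inverse Fourier transform of (4.8) gives the cross correlation in terms of the Fourier transforms of the microphone signals:

$$R_{kl}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau}\, d\omega. \tag{4.9}$$

The Generalized Cross Correlation (GCC) is the cross correlation of filtered versions of $x_k(t)$ and $x_l(t)$. Let $W_k(\omega)$ and $W_l(\omega)$ be the Fourier transforms of these two filters. The GCC can then be expressed as

$$R_{kl}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \big(W_k(\omega)\, X_k(\omega)\big)\big(W_l(\omega)\, X_l(\omega)\big)^{*}\, e^{j\omega\tau}\, d\omega. \tag{4.10}$$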

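Here $W_l^{*}(\omega)$ is the complex conjugate of $W_l(\omega)$. The combined weighting function is defined as

$$\Psi_{kl}(\omega) = W_k(\omega)\, W_l^{*}(\omega). \tag{4.11}$$

Substituting (4.11) into (4.10) gives the Generalized Cross Correlation (GCC):

$$R_{kl}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau}\, d\omega. \tag{4.12}$$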

The Time Difference of Arrival (TDOA) between microphones $k$ and $l$ is the time lag that maximizes the GCC over the real line:

$$\hat{\tau}_{kl} = \arg\max_{\tau \in \mathbb{R}} R_{kl}(\tau). \tag{4.13}$$

In real applications, $R_{kl}(\tau)$ has many local maxima, which can make it hard to find the global maximum. The performance of the Generalized Cross Correlation (GCC) is affected by the choice of the weighting function $\Psi_{kl}(\omega)$ [33].
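Equations (4.6) to (4.9) state that the time-domain cross correlation and the inverse Fourier transform of $X_k X_l^{*}$ are the same function. The NumPy sketch below checks this numerically for random test signals (an assumption for illustration); zero-padding to the full linear length makes the FFT result match `np.correlate`, and a final step applies (4.13) to a pair with a known 3-sample delay.

```python
import numpy as np

rng = np.random.default_rng(1)
xk = rng.standard_normal(256)
xl = rng.standard_normal(256)

# Eq. 4.6 in discrete time: R_kl[tau] = sum_t xk[t] * xl[t - tau].
# np.correlate(a, v, "full") returns lags -(len(v) - 1) ... len(a) - 1.
lags = np.arange(-(len(xl) - 1), len(xk))
r_time = np.correlate(xk, xl, mode="full")

# Eqs. 4.8-4.9: inverse FFT of Xk * conj(Xl); zero-padding to the full linear
# length makes the circular result equal to the linear cross correlation.
n = len(xk) + len(xl) - 1
G = np.fft.rfft(xk, n) * np.conj(np.fft.rfft(xl, n))
r_freq = np.fft.irfft(G, n)[lags % n]
print(np.allclose(r_time, r_freq))       # True

# Eq. 4.13 on a delayed pair: xl_delayed[t] = xk[t - 3], so x_k leads x_l.
xl_delayed = np.concatenate((np.zeros(3), xk[:-3]))
r = np.correlate(xk, xl_delayed, mode="full")
print(lags[np.argmax(r)])                # -3, i.e. tau_kl = tau_k - tau_l
```

The FFT route is what makes GCC cheap: one forward transform per channel and one inverse transform per pair, instead of a direct sum for every lag.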

4.3 The Phase Transform (PHAT)

It has been shown that the Phase Transform (PHAT) weighting function is robust in real acoustic environments. In reverberation-free conditions, PHAT is suboptimal compared with the maximum likelihood weighting function [2]. Mathematically, PHAT is expressed as

$$\Psi_{kl}(\omega) = \frac{1}{\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert}. \tag{4.14}$$


4.4 Generalized Cross Correlation Phase Transform (GCC-PHAT)

To obtain the Generalized Cross Correlation Phase Transform (GCC-PHAT), we substitute the PHAT weighting function (4.14) into the GCC expression (4.12), which gives the GCC-PHAT between microphones $k$ and $l$ [2, 3, 10]:

$$R^{\mathrm{PHAT}}_{kl}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{1}{\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert}\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau}\, d\omega.$$

Here the factor $1/\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert$ is the Phase Transform part of the expression; the rest of the expression is the Generalized Cross Correlation.

In a microphone array with $M$ microphones there are $M(M-1)/2$ microphone pairs; let $D$ be a subset of these pairs. GCC-PHAT is used to estimate the TDOA $\hat{\tau}_{kl}$ of each pair $(k, l) \in D$. For a hypothesized point $\mathbf{q}$ in the 3D space of the experimental room containing the active acoustic source, the true TDOA of each pair is

$$\tau_{kl}(\mathbf{q}) = \frac{\lVert \mathbf{q} - \mathbf{r}_k \rVert - \lVert \mathbf{q} - \mathbf{r}_l \rVert}{c}.$$

The root mean square (RMS) error between the estimated and the true TDOAs is

$$E(\mathbf{q}) = \sqrt{\frac{1}{\lvert D \rvert}\sum_{(k,l)\in D} \big(\hat{\tau}_{kl} - \tau_{kl}(\mathbf{q})\big)^2},$$

and the estimate of the source location is the point that minimizes this error:

$$\hat{\mathbf{q}}_s = \arg\min_{\mathbf{q}} E(\mathbf{q}).$$
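The RMS-error localization above can be sketched as a grid search. This is a simplified illustration, not the LEMS implementation: it assumes a hypothetical 2-D square array, uses all six pairs as D, and generates the "estimated" TDOAs from a known source instead of running GCC-PHAT, so that the result can be checked. The helper names are illustrative.

```python
import itertools
import numpy as np

def true_tdoa(q, mics, pairs, c):
    """tau_kl(q) = (||q - r_k|| - ||q - r_l||) / c for every pair (k, l) in D."""
    dist = np.linalg.norm(mics - q, axis=1)
    return np.array([(dist[k] - dist[l]) / c for k, l in pairs])

def rms_tdoa_error(q, tau_hat, mics, pairs, c):
    """E(q): RMS difference between estimated and hypothesised TDOAs over D."""
    return np.sqrt(np.mean((tau_hat - true_tdoa(q, mics, pairs, c)) ** 2))

def locate_min_rms(tau_hat, mics, pairs, c, grid):
    """q_hat = argmin E(q), searched over a finite grid of candidate points."""
    errors = [rms_tdoa_error(q, tau_hat, mics, pairs, c) for q in grid]
    return grid[int(np.argmin(errors))]

# Hypothetical 2-D example: microphones at the corners of a 1 m square,
# D = all M(M-1)/2 = 6 pairs, TDOAs generated from a known source position
# (in practice tau_hat would come from GCC-PHAT on each pair).
c = 346.2
mics = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
pairs = list(itertools.combinations(range(len(mics)), 2))
source = np.array([0.3, 0.7])
tau_hat = true_tdoa(source, mics, pairs, c)
xs = np.linspace(0.0, 1.0, 21)                        # 5 cm grid spacing
grid = np.array([[gx, gy] for gx in xs for gy in xs])
q_hat = locate_min_rms(tau_hat, mics, pairs, c, grid)
print(q_hat)                                          # close to [0.3, 0.7]
```

Note that this is a two-stage method: each TDOA is decided first, and an error in one pair cannot be corrected by the others, which is the weakness SRP-PHAT addresses.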

4.5 Generalized Cross Correlation Phase Transform (GCC-PHAT) in the System

The LEMS algorithm, a Generalized Cross Correlation Phase Transform (GCC-PHAT) based acoustic source localization algorithm, can be used in real-time acoustic systems. Very large microphone arrays, such as one with 512 microphones, can run 8 LEMS locators simultaneously in real time. In the LEMS algorithm, 16 pairs of microphones are selected manually for each locator [6, 11, 18]. We select three groups of microphones from the 24 microphones of each locator, taking two pairs from each group. Most microphones are selected from orthogonal sections of the array, namely from panels near a corner of the array. Microphone pairs on orthogonal planes have complementary sensitivity of their Time Difference of Arrival (TDOA) to the source direction, and exploiting this effect improves directional discrimination [44, 48].

Reverberation and noise are key factors in the performance of the LEMS algorithm, which performs well only when reverberation and noise are relatively low. In real-time use, the LEMS algorithm also has a long latency, because its implementation uses over 200 ms of data, and its performance degrades in comparatively high noise and high reverberation conditions.

Steered Response Power Phase Transform (SRP-PHAT) is a one-stage sound source localization method that meets the need for accurate source location estimation in the presence of high noise and strong reverberation [1]. The next chapters discuss SRP-PHAT and its implementation in different acoustic environments.


CHAPTER 5

STEERED RESPONSE POWER (SRP) USING THE

PHASE TRANSFORM (SRP-PHAT)

5.1 Overview of SRP-PHAT

Speech array applications that use beamforming for sound source localization naturally rely on microphone arrays, such as a linear array, for voice capture.

When beamforming is applied to acoustic source localization, the output of the beamformer is maximized when the array is focused on the target location. To overcome the limited estimation accuracy of Time Difference of Arrival (TDOA) based approaches in the presence of noise and reverberation, the Steered Response Power (SRP) algorithm uses a multitude of microphones [23]. SRP exploits the spatial filtering capability of a microphone array, which further increases its applicability to the sound source localization problem, and it enables selective enhancement of the signal from the source of interest [4]. This property makes the Steered Response Power (SRP) algorithm more robust for sound source localization.

Compared with Time Difference of Arrival (TDOA) based approaches, SRP offers higher robustness to reverberation in acoustic sound source localization [2]. This chapter discusses the Steered Response Power (SRP) with the Phase Transform (PHAT), which applies a magnitude-normalizing weighting function to the cross spectrum of two microphone signals.

5.2 Beam forming for Steered Response of Power

Let $g(\mathbf{q}_s, \mathbf{r}_m, t)$ denote the room impulse response, covering both the direct path and the reflected paths, from the active sound source at $\mathbf{q}_s$ to microphone $m$ at location $\mathbf{r}_m$, and let $a_m(t)$ describe the response of microphone $m$. These responses depend mostly on the source location $\mathbf{q}_s$, since the location and orientation of the microphones are known and fixed [17, 49]. The signal at microphone $m$ can then be modeled as

$$x_m(t) = s(t) * g(\mathbf{q}_s, \mathbf{r}_m, t) * a_m(t) + n_m(t), \tag{5.1}$$

where $s(t)$ denotes the source signal, $n_m(t)$ denotes the noise of the $m$-th channel, and $*$ denotes linear convolution. The noise $n_m(t)$ is assumed to be uncorrelated with the source. The combined response $g(\mathbf{q}_s, \mathbf{r}_m, t) * a_m(t)$ is the impulse response from the source output to the microphone output; since the microphone is at a fixed location, it depends only on the source location. Denoting this response by $h_m(\mathbf{q}_s, t)$, the model becomes

$$x_m(t) = s(t) * h_m(\mathbf{q}_s, t) + n_m(t). \tag{5.2}$$

This equation models the signal received at microphone $m$, taking into account the impulse response of the multipath propagation channel and the uncorrelated noise.

A unitarily weighted delay-and-sum beamformer for an $M$-microphone array is created by delaying the microphone signals with appropriate steering delays $\delta_m$, $m = 1, 2, \dots, M$, so that they are aligned in time, and then summing all of these time-aligned signals.

[Figure: a source, the microphone channels each followed by a delay block, and a summing stage producing the output.]

Figure 5.1: The Steered Response Power algorithm using the delay sum beam forming method

The figure shows an array of $M$ microphones in which each microphone channel $x_m(t)$ contains a delayed and filtered version of the source signal $s(t)$. Time-aligning these delayed copies of $s(t)$ makes them add constructively when the channels are summed, while the uncorrelated noise present in $x_m(t)$ does not.

The copies of $s(t)$ at the individual microphones are time-aligned by setting each steering delay equal to the negative of its propagation delay plus a constant delay $\delta_0$:

$$\delta_m = \delta_0 - \tau_m. \tag{5.3}$$


Here $m = 1, 2, \dots, M$, and $\delta_0$ sets the phase center of the array; it is chosen as the largest propagation delay among all microphones, which makes all steering delays greater than or equal to zero [43]. As a result, every shifting operation is causal, as a practical implementation requires, and the steering delays are expressed relative to one microphone. $\tau_m$ is the propagation delay from the source to microphone $m$. The output of the delay-and-sum beamformer is

$$y(t; \delta_1, \dots, \delta_M) = \sum_{m=1}^{M} x_m(t - \delta_m), \tag{5.4}$$

where the $M$ steering delays $\delta_1, \dots, \delta_M$ focus, or steer, the beamformer on a spatial position or direction, and $x_m(t)$ is the signal received at the $m$-th microphone.

Using the microphone signal model (5.2) and the steering delays $\delta_m$, the delay-and-sum output (5.4) can be written in terms of the source signal, the channel impulse responses and the noise [49]:

$$y(t; \delta_1, \dots, \delta_M) = \sum_{m=1}^{M} s(t - \delta_m) * h_m(\mathbf{q}_s, t) + \sum_{m=1}^{M} n_m(t - \delta_m). \tag{5.5}$$

A filter-and-sum beamformer is obtained when a filter (possibly adaptive) is applied to each channel of the delay-and-sum beamformer. Assume that the impulse response of each microphone channel approximates a common band-pass filter $h(t)$ delayed by the propagation delay, $h_m(\mathbf{q}_s, t) \approx h(t - \tau_m)$. With the steering delays of (5.3), the source components then add with amplitude $M$, and the output is a band-limited version of $s(t)$ that is larger than the signal from any single microphone. Separating the noise term in (5.5) gives

$$y(t; \delta_1, \dots, \delta_M) = M\, s(t - \delta_0) * h(t) + \sum_{m=1}^{M} n_m(t - \delta_m). \tag{5.6}$$

Equation (5.6) gives the output of an $M$-element delay-and-sum beamformer in the time domain. In the frequency domain, the output of the filter-and-sum beamformer is

$$Y(\omega; \delta_1, \dots, \delta_M) = \sum_{m=1}^{M} W_m(\omega)\, X_m(\omega)\, e^{-j\omega\delta_m}, \tag{5.7}$$

where $W_m(\omega)$ is the Fourier transform of the filter response $w_m(t)$ and $X_m(\omega)$ is the Fourier transform of the microphone signal $x_m(t)$.
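The delay-and-sum steering of (5.3) and (5.4) can be sketched with integer sample delays. This NumPy illustration is a simplification under stated assumptions: the propagation delays (0, 2, 4 and 6 samples) are hypothetical, the channels are pure delays with no room response, and noise is omitted so the coherent sum can be checked exactly. `delay_and_sum` is an illustrative name.

```python
import numpy as np

def delay_and_sum(x, steer):
    """y[n] = sum_m x_m[n - delta_m]  (Eq. 5.4); x has shape (M, N) and the
    steering delays delta_m are non-negative integers (samples)."""
    M, N = x.shape
    y = np.zeros(N)
    for m in range(M):
        d = int(steer[m])
        y[d:] += x[m, :N - d]          # causal delay by d samples
    return y

# Hypothetical simulation: integer propagation delays tau_m (samples), no noise.
rng = np.random.default_rng(4)
N = 1000
s = rng.standard_normal(N)
tau = np.array([0, 2, 4, 6])
x = np.array([np.concatenate((np.zeros(t), s[:N - t])) for t in tau])   # x_m[n] = s[n - tau_m]

delta0 = tau.max()                     # phase centre: largest propagation delay
steer = delta0 - tau                   # Eq. 5.3, all steering delays >= 0
y = delay_and_sum(x, steer)
print(np.allclose(y[delta0:], len(tau) * s[:N - delta0]))            # True: M * s[n - delta0]
print(np.sum(y ** 2) > np.sum(delay_and_sum(x, np.zeros(4)) ** 2))   # True: steering helps
```

With correctly chosen delays the output is $M$ times the delayed source, as in (5.6); with additive uncorrelated noise, the source power would grow as $M^2$ while the noise power grows only as $M$.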

5.3 The Steered Response of Power

In general, the steered response is a function of the $M$ steering delays, which are used to focus the beamformer on a particular position or direction in space.

The steered response is obtained by sweeping the focus of the beamformer. When the focus corresponds to the source location, the time-aligned signals in the microphone channels add up, and the power of the steered response reaches its maximum due to constructive interference [4, 6, 15].

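The Steered Response Power (SRP) is the output power of the filter-and-sum beamformer as it is steered over all points of a predefined region. In the frequency domain,

$$P(\delta_1, \dots, \delta_M) = \int_{-\infty}^{\infty} Y(\omega; \delta_1, \dots, \delta_M)\, Y^{*}(\omega; \delta_1, \dots, \delta_M)\, d\omega, \tag{5.8}$$

where $Y^{*}$ is the complex conjugate of the filter-and-sum output $Y$. Substituting (5.7) into (5.8) gives

$$P(\delta_1, \dots, \delta_M) = \int_{-\infty}^{\infty} \left(\sum_{k=1}^{M} W_k(\omega)\, X_k(\omega)\, e^{-j\omega\delta_k}\right)\left(\sum_{l=1}^{M} W_l^{*}(\omega)\, X_l^{*}(\omega)\, e^{j\omega\delta_l}\right) d\omega. \tag{5.9}$$

Rearranging gives

$$P(\delta_1, \dots, \delta_M) = \int_{-\infty}^{\infty} \sum_{k=1}^{M}\sum_{l=1}^{M} W_k(\omega)\, W_l^{*}(\omega)\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega(\delta_l - \delta_k)}\, d\omega. \tag{5.10}$$

From (5.3), the difference of two steering delays equals the TDOA of the corresponding microphone pair:

$$\delta_l - \delta_k = (\delta_0 - \tau_l) - (\delta_0 - \tau_k) = \tau_k - \tau_l = \tau_{kl}. \tag{5.11}$$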


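Inserting (5.11) into (5.10) gives

$$P(\delta_1, \dots, \delta_M) = \int_{-\infty}^{\infty} \sum_{k=1}^{M}\sum_{l=1}^{M} W_k(\omega)\, W_l^{*}(\omega)\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau_{kl}}\, d\omega. \tag{5.12}$$

Because the microphone signals and the filters have finite energy, the integral converges, and the summations can be interchanged with the integral:

$$P(\delta_1, \dots, \delta_M) = \sum_{k=1}^{M}\sum_{l=1}^{M} \int_{-\infty}^{\infty} W_k(\omega)\, W_l^{*}(\omega)\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau_{kl}}\, d\omega. \tag{5.13}$$

The combined weighting function is defined as

$$\Psi_{kl}(\omega) = W_k(\omega)\, W_l^{*}(\omega), \tag{5.14}$$

and the cross power spectrum of microphones $k$ and $l$ is

$$G_{kl}(\omega) = X_k(\omega)\, X_l^{*}(\omega). \tag{5.15}$$

Substituting (5.14) and (5.15) into (5.13) gives

$$P(\delta_1, \dots, \delta_M) = \sum_{k=1}^{M}\sum_{l=1}^{M} \int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, G_{kl}(\omega)\, e^{j\omega\tau_{kl}}\, d\omega. \tag{5.16}$$

Recall the Generalized Cross Correlation (GCC):

$$R_{kl}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Psi_{kl}(\omega)\, G_{kl}(\omega)\, e^{j\omega\tau}\, d\omega. \tag{5.17}$$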


The Steered Response Power (SRP) and the Generalized Cross Correlation (GCC) have almost identical expressions, except that the SRP is summed over all pairs of microphones and carries a constant factor of $2\pi$:

$$P(\delta_1, \dots, \delta_M) = 2\pi \sum_{k=1}^{M}\sum_{l=1}^{M} R_{kl}(\tau_{kl}).$$

Therefore, the SRP of a microphone array can be calculated by summing the GCCs of all microphone pairs in the array [2].
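The equivalence between the steered output power and the double sum of pairwise cross correlations can be checked numerically in discrete time, where the $2\pi$ factor disappears (Parseval). The sketch assumes NumPy, random test signals, unit filters ($W_m = 1$) and arbitrary integer steering delays; the function names are hypothetical.

```python
import numpy as np

def srp_direct(x, steer):
    """Energy of the delay-and-sum output, keeping the full length of every shifted channel."""
    M, N = x.shape
    y = np.zeros(N + int(max(steer)))
    for m in range(M):
        d = int(steer[m])
        y[d:d + N] += x[m]
    return np.sum(y ** 2)

def srp_from_pairs(x, steer):
    """Sum over all (k, l) of R_kl[tau] = sum_t x_k[t] x_l[t - tau] at tau = delta_l - delta_k."""
    M, N = x.shape
    total = 0.0
    for k in range(M):
        for l in range(M):
            r = np.correlate(x[k], x[l], mode="full")     # lags -(N-1) ... N-1
            total += r[int(steer[l]) - int(steer[k]) + N - 1]
    return total

rng = np.random.default_rng(5)
x = rng.standard_normal((4, 200))
steer = np.array([3, 0, 7, 5])
print(np.isclose(srp_direct(x, steer), srp_from_pairs(x, steer)))   # True
```

The diagonal terms $k = l$ are the channel energies $R_{kk}(0)$ and do not depend on the steering delays; only the off-diagonal pairs change as the beam is steered.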

5.4 The Phase Transform (PHAT)

The filter-and-sum beamforming operation at the core of the Steered Response Power (SRP) reduces the noise power in proportion to the number of uncorrelated microphone channels [1]. For beamforming, correlated noise presents a greater challenge than uncorrelated noise. Correlated noise originates from independent interfering sources and from reverberation, and it is handled by the various types of spectral weighting that are also used in the Generalized Cross Correlation (GCC) [33].

If the noise spectrum is known, maximum likelihood weights can be developed that de-emphasize spectral regions with a low Signal to Noise Ratio (SNR). When the noise spectrum is not known, the Phase Transform (PHAT) can be used instead; it effectively whitens the signal spectrum [35]. This approach is widely accepted, and these correlations are used both for the Steered Response Power (SRP) likelihood functions and for estimating the time delays in the system. The performance of PHAT estimation depends on the reverberant environment [13]. PHAT performs the optimal weighting strategy for minimizing the variance of the time delay estimate.

In general, the PHAT filter for microphone $m$ is

$$W_m(\omega) = \frac{1}{\lvert X_m(\omega) \rvert}, \qquad \text{so that} \qquad \Psi_{kl}(\omega) = W_k(\omega)\, W_l^{*}(\omega) = \frac{1}{\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert}, \tag{5.18}$$

where $W_m(\omega)$ is the weighting function, aimed at emphasizing the actual acoustic source over undesired signals, and $X_m(\omega)$ is the signal spectrum. The PHAT filter whitens the microphone signal spectrum [4, 15], i.e. it effectively flattens its magnitude. Whitening the microphone signals allows the Steered Response Power (SRP) to be applied effectively in microphone array applications.
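The whitening effect of the PHAT filter $W(\omega) = 1/\lvert X(\omega) \rvert$ can be illustrated with NumPy. In this sketch the low-pass source (white noise smoothed by a 16-tap moving average) and the 7-sample circular delay are assumptions chosen so the result is exact: after whitening, the cross correlation collapses to an impulse at the delay, while the plain cross correlation is a broad peak.

```python
import numpy as np

def phat(X, eps=1e-12):
    """PHAT filter W(omega) = 1/|X(omega)| applied to a spectrum: phase kept, magnitude set to 1."""
    return X / np.maximum(np.abs(X), eps)

rng = np.random.default_rng(2)
n, D = 1024, 7
# Hypothetical coloured source: white noise smoothed by a 16-tap moving average (low-pass).
s = np.convolve(rng.standard_normal(n), np.ones(16) / 16, mode="same")
xd = np.roll(s, D)                       # circularly delayed copy: X_d = S * exp(-j omega D)
S, Xd = np.fft.rfft(s), np.fft.rfft(xd)

print(np.allclose(np.abs(phat(S)), 1.0))                    # True: whitened spectrum is flat
r_plain = np.fft.irfft(Xd * np.conj(S), n)                  # ordinary cross correlation
r_phat = np.fft.irfft(phat(Xd) * np.conj(phat(S)), n)       # PHAT-weighted, Eq. 5.18
print(np.argmax(r_plain), np.argmax(r_phat))                # both peak at lag D = 7
print(round(float(r_phat[D]), 6))                           # 1.0: a single sharp impulse
```

Both estimates find the delay here, but the sharp PHAT peak is what makes it easier to separate the direct path from nearby reflections in a reverberant room.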

Compared with other similar weighting functions, PHAT gives the Steered Response Power (SRP) better performance under realistic reverberant operating conditions [26]. The hypothesis behind SRP-PHAT is that, even in high noise and high reverberation conditions, the Steered Response Power Phase Transform (SRP-PHAT) function peaks at the actual acoustic source location.

5.5 SRP-PHAT

Applying the PHAT weighting function to the Steered Response Power gives the Steered Response Power with Phase Transform (SRP-PHAT). Substituting the PHAT weighting (5.18) into (5.16) gives

$$P_{\mathrm{PHAT}}(\delta_1, \dots, \delta_M) = \sum_{k=1}^{M}\sum_{l=1}^{M} \int_{-\infty}^{\infty} \frac{1}{\lvert X_k(\omega)\, X_l^{*}(\omega) \rvert}\, X_k(\omega)\, X_l^{*}(\omega)\, e^{j\omega\tau_{kl}}\, d\omega. \tag{5.19}$$

This is the complete expression for the Steered Response Power Phase Transform (SRP-PHAT).

5.6 SRP-PHAT as a Source Localization Method

The SRP-PHAT function (5.19) is a sum over the elements of a symmetric $M \times M$ matrix whose diagonal holds fixed energy terms, because the Generalized Cross Correlation (GCC) between microphones $k$ and $l$ is the same as the GCC between microphones $l$ and $k$ [3, 4]. Therefore, only the upper (or, equivalently, the lower) triangle of the matrix changes with the steering location $\mathbf{q}$.

The part of SRP-PHAT that changes with $\mathbf{q}$ can be computed by summing the GCC-PHAT values not over all pairs of the $M$-microphone array, but only over a subset $D$ of the pairs, $D \subseteq \{(k, l) : 1 \le k < l \le M\}$:

$$P(\mathbf{q}) = \sum_{(k,l) \in D} R^{\mathrm{PHAT}}_{kl}\big(\tau_{kl}(\mathbf{q})\big), \qquad \tau_{kl}(\mathbf{q}) = \frac{\lVert \mathbf{q} - \mathbf{r}_k \rVert - \lVert \mathbf{q} - \mathbf{r}_l \rVert}{c}. \tag{5.20}$$

The acoustic source is located at the point that gives the maximum weighted output power (SRP-PHAT) of the beamformer. The location estimate for a single sound source is

$$\hat{\mathbf{q}}_s = \arg\max_{\mathbf{q}} P(\mathbf{q}), \tag{5.21}$$

where $P(\mathbf{q})$ denotes the SRP-PHAT at point $\mathbf{q}$ as described in (5.20). Calculating $P(\mathbf{q})$ at any particular point $\mathbf{q}$ is called a functional evaluation.
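A compact SRP-PHAT grid search following (5.20) and (5.21) can be sketched in NumPy. This is a minimal illustration, not the thesis' MATLAB code: the 2-D square array, the 16 kHz rate, the source position and the grid are hypothetical, and the microphone signals are simulated with circular (frequency-domain) fractional delays so that they match the model exactly. Each pairwise GCC-PHAT is evaluated directly in the frequency domain at the hypothesized TDOA, which avoids rounding delays to whole samples.

```python
import itertools
import numpy as np

def srp_phat_map(x, mics, grid, fs, c):
    """P(q) of Eq. 5.20 for every candidate point q in `grid`, using all pairs k < l."""
    M, N = x.shape
    X = np.fft.rfft(x, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)            # rad/s
    weights = np.full(len(omega), 2.0)                             # one-sided spectrum:
    weights[0] = 1.0                                               # DC (and Nyquist for
    if N % 2 == 0:                                                 # even N) counted once
        weights[-1] = 1.0
    dist = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)   # (Q, M)
    P = np.zeros(len(grid))
    for k, l in itertools.combinations(range(M), 2):
        G = X[k] * np.conj(X[l])
        G /= np.maximum(np.abs(G), 1e-12)                          # PHAT weighting
        tau = (dist[:, k] - dist[:, l]) / c                         # tau_kl(q) for all q
        P += np.real(np.exp(1j * np.outer(tau, omega)) @ (weights * G)) / N
    return P

# Hypothetical 2-D simulation with circular fractional delays.
fs, c, N = 16000, 346.2, 2048
mics = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
source = np.array([0.35, 0.6])
rng = np.random.default_rng(3)
S = np.fft.rfft(rng.standard_normal(N))
omega = 2 * np.pi * np.fft.rfftfreq(N, d=1.0 / fs)
t_prop = np.linalg.norm(mics - source, axis=1) / c                 # Eq. 4.1, in seconds
x = np.array([np.fft.irfft(S * np.exp(-1j * omega * t), n=N) for t in t_prop])

xs = np.linspace(0.0, 1.0, 21)                                     # 5 cm grid spacing
grid = np.array([[gx, gy] for gx in xs for gy in xs])
P = srp_phat_map(x, mics, grid, fs, c)
q_hat = grid[np.argmax(P)]                                         # Eq. 5.21
print(q_hat)                                                       # close to [0.35, 0.6]
```

Every grid point costs one evaluation per pair and frequency bin, which illustrates the high computational complexity of SRP-PHAT mentioned in the abstract; real recordings would additionally need framing, windowing and a finer or coarse-to-fine grid.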


CHAPTER 6

EXPERIMENTAL DESIGN AND MODELING FOR

SRP-PHAT

6.1 Requirements

Implementing acoustic sound source localization requires both hardware support for capturing acoustic data and software support for computer simulation.

On the hardware side, microphone array geometries such as linear and square arrays are commonly used for beamforming. Each array contains four, eight or sixteen microphones, depending on the array size or acoustic system model [18, 40].

In a real acoustic system, the sound captured by the microphones is naturally weak and requires amplification before processing, so a pre-amplifier is used. Good recordings from the microphone array also require a good sound recording system that can be connected to both a computer and a speaker. In the laboratory, an acoustic speaker acts as the source in the acoustic environment [47].

To implement the proposed algorithm, a personal laptop or desktop computer can be used both in the laboratory and in the simulation environment, with a recent version of MATLAB as the software for organizing the simulation. Interfacing the device with the computer and recording the sound require a device driver and recording software, respectively.

6.2 Architecture

The Steered Response Power Phase Transform (SRP-PHAT) simulation generates white noise, and the captured signals are modeled as coming from different angles and from different sources [50]. The acoustic microphones are also simulated to capture the acoustic signal, which is then processed with different algorithms and estimation methods, such as the Steered Response Power Phase Transform algorithm and Time Difference of Arrival estimation.

Data acquisition is another important part of this kind of experiment, for which a number of devices are used in a real reverberant acoustic environment.

An acoustic speaker, a microphone array, a pre-amplifier, a sound recorder and a laptop or desktop computer are the main devices required for efficient data acquisition.

In the real experimental environment, the speaker is placed at different positions and acts as an active sound source, and the generated sound is captured by the array of microphones while the sound recorder records it.

For this purpose, sound recording software runs on a personal computer to which the speaker is directly connected: recorded sound is played back through the computer and the speaker, which then acts as the active sound source in the experimental environment [44, 45]. An array of four, eight or sixteen microphones, depending on the array size or acoustic system model, is connected to the pre-amplifier, which amplifies the weak signals so that they are ready for recording. The amplified, good quality recordings are then used in the computer-based simulation for acoustic sound source localization, which is carried out with high-level software such as MATLAB.

6.3 Test Environment

Consider an acoustic experimental room set up for data collection in an audio lab, where acoustic treatments can be mounted on the walls to produce various kinds of noise and reflective properties [24].

The experimental room is 3.66 m long, 3.66 m wide and 2.22 m high. The average speed of sound can be estimated by emitting sound from a predetermined source location and measuring the delay of arrival between two microphones. In our acoustic environment, we assume that the speed of sound is about 346.2 m/s.
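The speed-of-sound estimate described above divides the path difference by the measured delay. A minimal sketch follows. The function name and the collinear test geometry are hypothetical, chosen only so that the path difference equals the 1.284 m microphone spacing used in this chapter.

```python
import math


def estimate_speed_of_sound(source, mic_a, mic_b, measured_delay_s):
    """Speed of sound from a known source position and one measured TDOA.

    measured_delay_s is (arrival time at mic_b) - (arrival time at mic_a).
    Positions are (x, y, z) tuples in metres.  The source must not be
    equidistant from the two microphones, otherwise the delay is zero.
    """
    path_difference = math.dist(source, mic_b) - math.dist(source, mic_a)
    return path_difference / measured_delay_s


# Hypothetical geometry: source in line with both microphones at 1.57 m height.
src = (0.0, 0.0, 1.57)
m_a = (1.0, 0.0, 1.57)
m_b = (2.284, 0.0, 1.57)   # 1.284 m behind m_a
delay = 1.284 / 346.2      # the delay a 346.2 m/s medium would produce
c_est = estimate_speed_of_sound(src, m_a, m_b, delay)
```

Placing the source near the line through the two microphones maximizes the path difference. This makes the estimate less sensitive to timing errors.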

Figure 6.1: Ideal Reverberant Room Conditions.


For data collection, we consider a microphone array design in which the microphones are arranged in different geometries, such as a linear array or a square array, consisting of four, eight or sixteen omnidirectional microphones depending on the array size or acoustic system model [35, 38]. In this design, the spacing between two microphones is 1.284 m. Each microphone is placed 1.57 m above the floor, so the effective height of the microphones is 1.57 m, and the microphones are mounted 28 cm from the cage surface, measured perpendicular to it [27]. A laser measuring device can also be used to verify the actual microphone positions. The room conditions are summarized below:

Table 1: Summary of the Acoustic Experimental Room Setup for Data Acquisition.

Acoustic Room Condition | Acoustic Experimental Environment | Effective Measurement
Room | Length and width | 3.66 m
Room | Height | 2.22 m
Sound | Velocity of sound | 346.2 m/s
Microphone array model | Microphone spacing | 1.284 m
Microphone array model | Microphone height above the floor | 1.57 m
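The spacing and the speed of sound in Table 1 bound the time delays a cross-correlation has to search. A small worked sketch follows; the function name is hypothetical. The 16 kHz rate is the downsampled rate of Section 6.4, and the 4 cm spacing at 8 kHz is the linear-array setup of Section 7.2.

```python
import math


def max_lag_samples(spacing_m, fs_hz, c=346.2):
    """Largest possible |TDOA| between two microphones, rounded up to samples.

    Sound cannot reach one microphone earlier than the other by more than
    spacing/c seconds, so a cross-correlation only needs lags in
    [-max_lag, +max_lag].
    """
    return math.ceil(fs_hz * spacing_m / c)


room_pair = max_lag_samples(1.284, 16000)  # room-array spacing at 16 kHz
linear_pair = max_lag_samples(0.04, 8000)  # 4 cm linear-array spacing at 8 kHz
```

The 1.284 m pair needs lags up to about 60 samples. The 4 cm pair produces less than one sample of delay at 8 kHz, so sub-sample (fractional) delays must be resolved there.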

The sound sources, placed at predetermined locations, are moved within a fixed region during data capture. To obtain data under different conditions, the room reverberation level can be varied [13, 23] by covering the walls with different materials: acoustic soundproof foam creates a low-reverberation environment, and Plexiglas creates a high-reverberation environment [24].

When acoustic soundproof foam is mounted on the walls of the experimental room, its absorption removes much of the multipath energy. The foam therefore decreases the reverberation level, by an amount that depends on its thickness. Low-frequency signals pass through the foam, while higher-frequency signals are attenuated [26].

When Plexiglas is mounted on the walls instead, the walls act as very good reflectors and create a worst-case multipath scenario. Plexiglas is therefore used to increase the reverberation effectively.

From the discussion above, the system can be switched from a low-reverberation to a high-reverberation environment by changing the wall materials, which yields data sets for different types of acoustic environment. Measuring and controlling the acoustic environment efficiently is an important part of any sound source localization method [35].
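To see roughly how the wall material changes the reverberation of the 3.66 m × 3.66 m × 2.22 m room, we can use Sabine's formula, T60 = 0.161 V / (S α). This is a sketch only: the average absorption coefficients (0.8 for foam, 0.05 for Plexiglas) are hypothetical illustration values, not measurements from this setup. Sabine's formula also tends to overestimate T60 in strongly absorbing rooms, where Eyring's formula is more accurate.

```python
def sabine_rt60(length_m, width_m, height_m, mean_absorption):
    """Sabine estimate of the reverberation time T60 = 0.161 * V / (S * alpha)."""
    volume = length_m * width_m * height_m
    surface = 2 * (length_m * width_m + length_m * height_m + width_m * height_m)
    return 0.161 * volume / (surface * mean_absorption)


# Hypothetical average absorption coefficients, for illustration only.
rt60_foam = sabine_rt60(3.66, 3.66, 2.22, mean_absorption=0.8)        # about 0.1 s
rt60_plexiglas = sabine_rt60(3.66, 3.66, 2.22, mean_absorption=0.05)  # about 1.6 s
```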

Figure 6.2: Graphical Representation of Microphone Array Design and the Sound Source Position (S: speech source; microphones M1 to M4; the relative time delay between microphones is indicated).

6.4 Test Signals Used

In the test environment, two types of input signals are used to drive the source loudspeaker. The first is the impulse response of a 4th-order Butterworth band-pass filter with 3 dB cutoff frequencies of 400 Hz and 600 Hz for the narrowband signal, and of 400 Hz and 5600 Hz for the broadband signal [38]. The Butterworth impulse response is chosen for its maximally flat passband, which distributes the spectral power uniformly over the band, and because it is a causal signal with well-behaved phase for the phase transform.

The second is a colored noise signal, generated by filtering a white noise source with the same 3 dB cutoff frequencies (400 Hz and 600 Hz for the narrowband signal, 400 Hz and 5600 Hz for the broadband signal). Colored noise is selected as a test signal because its power spectrum covers all frequencies in the range of interest [37].

The choice of impulse and colored noise sources allows the performance to be analyzed both for a signal that exists only for a short time interval and for a signal that is spread out in time. Likewise, the broadband and narrowband variants allow the performance to be analyzed for signals with different spectral characteristics. All signals are generated at a sampling rate of 32 kHz and then downsampled to 16 kHz to reduce the size of the audio files saved on the hard drive. Downsampling to 16 kHz does not affect performance, because the bandwidth of interest (300 Hz to 6 kHz) lies below the new Nyquist frequency of 8 kHz.

Table 2: Summary of the Signals Used to Drive the Acoustic Source

Type of Signal | Narrowband Bandwidth | Broadband Bandwidth
Impulse signal | 400 Hz to 600 Hz | 400 Hz to 5600 Hz
Colored noise | 400 Hz to 600 Hz | 400 Hz to 5600 Hz
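The colored-noise signals in Table 2 and the 32 kHz to 16 kHz downsampling can be sketched as follows. To keep the example short, it swaps in an ideal FFT brick-wall band-pass (spectral masking of white noise) for the 4th-order Butterworth filter described in the text. The function name `colored_noise` is hypothetical.

```python
import numpy as np


def colored_noise(n, fs, f_lo, f_hi, seed=0):
    """Band-limited ("colored") noise made by masking the spectrum of white noise.

    NOTE: an ideal FFT brick-wall band-pass is used here as a simple stand-in
    for the 4th-order Butterworth filter described in the text.
    """
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    x = np.fft.irfft(spectrum, n=n)
    return x / np.max(np.abs(x))


fs = 32_000
narrow = colored_noise(fs, fs, 400.0, 600.0)  # 1 s of 400-600 Hz noise
broad = colored_noise(fs, fs, 400.0, 5600.0)  # 1 s of 400-5600 Hz noise
# All content lies below 8 kHz (the Nyquist frequency at 16 kHz), so keeping
# every second sample downsamples to 16 kHz without aliasing.
broad_16k = broad[::2]
```

With a real Butterworth filter, some energy leaks past the cutoffs. An anti-aliasing low-pass filter before decimation would then be the safer choice.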


CHAPTER 7

IMPLEMENTATION OF SRP-PHAT ALGORITHM

7.1 Implementation of Steered Response Power Phase Transform Algorithm

In this section, we describe the acoustic sound source localization system and the implementation of the supporting algorithms, including the implementation of the Steered Response Power Phase Transform (SRP-PHAT) for different microphone array geometries [1, 4, 17].

We then describe how SRP-PHAT localizes an active sound source with three microphone array models: a linear microphone array, a two-dimensional speaker-position setup with an SMA, and a three-dimensional SMA [7, 8].

7.2 Acoustic Sound Source Localization Using a Linear Microphone Array

A loudspeaker in the acoustic environment generates sound that propagates omnidirectionally through space and eventually reaches the linear microphone array [18]. The speaker volume is adjusted according to the desired signal-to-noise ratio (SNR).

When a microphone of the linear array captures a weak signal, it passes the signal to a pre-amplifier stage for amplification and then to the sound recorder. The sound recorder is connected to a laptop or desktop computer that runs the Steered Response Power (SRP) algorithm [41]. In the ideal experimental setup, the source is 160 cm from the linear microphone array, and adjacent microphones in the array are 4 cm apart.

The sound played through the loudspeaker is white noise. For the linear microphone array, the white noise is 240,000 samples long at a sampling rate of 8 kHz, which corresponds to 30 s. The white noise is the input of the main function, which drives the speaker and commands the recording system to start, so that playback and recording start at the same time [44].
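Two quick checks on this setup follow: the signal duration, and whether the 160 cm source distance is in the far field of a four-microphone, 4 cm array. The far-field check uses the common Fraunhofer rule of thumb 2D²/λ, evaluated at the 4 kHz Nyquist frequency. This is an assumed criterion; the text does not state which one the setup relied on.

```python
def far_field_distance(aperture_m, f_max_hz, c=346.2):
    """Fraunhofer distance 2*D**2/lambda for an array of aperture D."""
    wavelength = c / f_max_hz
    return 2.0 * aperture_m ** 2 / wavelength


duration_s = 240_000 / 8000                                 # 30 s of white noise
aperture = 3 * 0.04                                         # four microphones, 4 cm apart
boundary = far_field_distance(aperture, f_max_hz=8000 / 2)  # about 0.33 m
source_is_far_field = 1.60 > boundary                       # source at 160 cm
```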

SRP-PHAT localization of an active sound source relies on three main functions, all implemented as MATLAB scripts: the Welch function, the relative delay function and the SRP function. The relative delay function estimates the relative time delays between the different microphone pairs of the linear array. The SRP function computes the steered power from all microphones using their respective relative time delays. The Welch function uses a block size of 1024 samples with an overlap of 512 samples.

Figure 7.1: An Acoustic Environment with the Linear Microphone Array Geometry (microphones M1 to M4 and the source position).

The Welch algorithm uses a Hanning window with an FFT length of 1024 samples. To find the maximum of the steered response power (SRP), the range of time delays is divided into 1024 elements for the linear microphone array [47, 48].
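The Welch-averaged, PHAT-weighted cross-correlation for one microphone pair can be sketched in NumPy as follows. It uses Hanning-windowed blocks of 1024 samples with a 512-sample overlap, as above. The function names are hypothetical. The sketch returns only an integer-sample delay, whereas the text's SRP search uses a finer delay grid.

```python
import numpy as np


def welch_cross_spectrum(x1, x2, block=1024, overlap=512):
    """Welch-averaged cross-power spectrum X1 * conj(X2) with a Hanning window."""
    hop = block - overlap
    window = np.hanning(block)
    n_blocks = 1 + (len(x1) - block) // hop
    acc = np.zeros(block // 2 + 1, dtype=complex)
    for b in range(n_blocks):
        seg = slice(b * hop, b * hop + block)
        acc += np.fft.rfft(window * x1[seg]) * np.conj(np.fft.rfft(window * x2[seg]))
    return acc / n_blocks


def gcc_phat_delay(x1, x2, max_lag, block=1024, overlap=512):
    """Integer-sample delay of x1 relative to x2 estimated with GCC-PHAT."""
    cross = welch_cross_spectrum(x1, x2, block, overlap)
    cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT: keep phase, drop magnitude
    gcc = np.fft.irfft(cross, n=block)         # generalized cross-correlation
    lags = np.arange(-max_lag, max_lag + 1)    # negative lags index from the end
    return int(lags[np.argmax(gcc[lags])])


rng = np.random.default_rng(0)
s = rng.standard_normal(16000)
x1 = np.concatenate([np.zeros(5), s[:-5]])  # x1 is s delayed by 5 samples
estimate = gcc_phat_delay(x1, s, max_lag=20)
```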

In this setup, six microphone pairs are involved in the linear microphone array, and the same pairs are used to compute the Generalized Cross Correlation (GCC) [33]. From Figure 7.1, the microphone pairs are M1 and M2, M1 and M3, M1 and M4, M2 and M3, M2 and M4, and M3 and M4.
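The six pairs are simply all 2-element combinations of the four microphones. The same count, Q = M(M − 1)/2, gives 276 pairs for a 24-microphone array. A short sketch:

```python
from itertools import combinations

microphones = ["M1", "M2", "M3", "M4"]
pairs = list(combinations(microphones, 2))
# [('M1', 'M2'), ('M1', 'M3'), ('M1', 'M4'), ('M2', 'M3'), ('M2', 'M4'), ('M3', 'M4')]


def number_of_pairs(m):
    """Q = M(M - 1) / 2 distinct microphone pairs."""
    return m * (m - 1) // 2
```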

An angle of zero degrees corresponds to the direction of the positive x-axis, and angles increase counterclockwise. When the source signal arrives at the microphones of the linear array, it arrives from a positive angle, which we denote by θ.
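A minimal far-field SRP-PHAT scan for the linear array follows, using the angle convention above (0° along +x, counterclockwise). It sums, over all six pairs, the PHAT-weighted cross-correlation evaluated at the delay each candidate angle implies. The GCC is evaluated directly in the frequency domain, so fractional delays are handled without interpolation.

Several elements are illustrative assumptions rather than the thesis's MATLAB `SRP` function:

- the function name `srp_phat_linear`;
- the single-block (no Welch averaging) spectra;
- the noise-free simulated test signal.

A linear array cannot tell θ from −θ, so only 0° to 180° is scanned.

```python
import numpy as np
from itertools import combinations


def srp_phat_linear(signals, fs, mic_x, angles_deg, c=346.2):
    """Far-field SRP-PHAT scan for a linear array on the x axis.

    signals: array of shape (M, n); mic_x: microphone x positions in metres;
    angles_deg: candidate directions, measured counterclockwise from +x.
    Returns the steered response power for each candidate angle.
    """
    n = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    omega = 2 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    power = np.zeros(len(theta))
    for k, l in combinations(range(len(mic_x)), 2):
        cross = spectra[k] * np.conj(spectra[l])
        cross /= np.maximum(np.abs(cross), 1e-12)  # PHAT weighting
        # Expected arrival time of mic k minus mic l for each steering angle.
        tau = -(mic_x[k] - mic_x[l]) * np.cos(theta) / c
        # GCC-PHAT of the pair evaluated at the (fractional) lag tau.
        power += np.real(np.exp(1j * np.outer(tau, omega)) @ cross)
    return power


# Noise-free example: white noise arriving from 60 degrees at four mics 4 cm apart.
fs, n = 16000, 4096
mic_x = np.array([0.0, 0.04, 0.08, 0.12])
S = np.fft.rfft(np.random.default_rng(1).standard_normal(n))
S[-1] = 0.0  # drop the Nyquist bin so the delayed signals stay exactly real
omega = 2 * np.pi * np.fft.rfftfreq(n, d=1 / fs)
arrival = -mic_x * np.cos(np.deg2rad(60.0)) / 346.2
signals = np.array([np.fft.irfft(S * np.exp(-1j * omega * t), n=n) for t in arrival])
angles = np.arange(0, 181)
estimate = angles[np.argmax(srp_phat_linear(signals, fs, mic_x, angles))]
```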


7.3 Implementation of Two-Dimensional Speaker Position with SMA

The SMA geometry can be implemented in the acoustic system in two or three dimensions. The same white noise function and the same Welch function used for the linear microphone array are reused here [45]. The Steered Response Power (SRP) can likewise be implemented in two or three dimensions.

Figure 7.2: SMA geometry in two dimensions with sixteen speaker positions (Positions 1 to 16 arranged around MIC 1 to MIC 4, with the 0 degree and 180 degree directions marked).

Figure 7.2 shows that the microphone array and the sound sources are placed on the same conference table, so both lie in the XY plane.

The two-dimensional SMA geometry uses steering angles from 0 to 359 degrees. Because the two-dimensional implementation only scans the azimuth angle, the azimuth can be estimated directly. Figure 7.2 shows sixteen different speaker positions, and ten measurements are recorded at each position [38].
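The steering delays for the two-dimensional azimuth scan can be precomputed once for all 360 angles. A sketch follows, assuming a square layout read from Figure 7.2 and centred on the origin:

- MIC 1 at bottom-left;
- MIC 2 at top-left;
- MIC 3 at top-right;
- MIC 4 at bottom-right.

The side length is a hypothetical parameter, since the text does not give it. Unlike the linear array, the square array gives different delay patterns for θ and −θ, so the full 0° to 359° range is meaningful.

```python
import numpy as np
from itertools import combinations


def square_array_tdoas(side_m, azimuths_deg, c=346.2):
    """Far-field TDOAs for the six pairs of a 4-microphone square array.

    Assumed layout (read from Figure 7.2, side length hypothetical):
    MIC 1 bottom-left, MIC 2 top-left, MIC 3 top-right, MIC 4 bottom-right,
    centred on the origin in the XY plane.
    """
    h = side_m / 2.0
    mics = np.array([[-h, -h], [-h, h], [h, h], [h, -h]])
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    u = np.stack([np.cos(az), np.sin(az)], axis=1)  # unit vectors toward the source
    arrival = -(u @ mics.T) / c                     # relative arrival time per mic
    pairs = list(combinations(range(4), 2))
    tdoas = np.stack([arrival[:, k] - arrival[:, l] for k, l in pairs], axis=1)
    return tdoas, pairs


tdoas, pairs = square_array_tdoas(0.2, np.arange(0, 360))  # 0 to 359 degrees
```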

7.4 Implementation of Three-Dimensional SMA

Once the two-dimensional speaker positions have been established, the three-dimensional SMA is a direct extension of the two-dimensional SMA for active source localization [28]. In three dimensions, the source is localized by scanning both the azimuth and the elevation angles, whose ranges are preset to 0 to 359 degrees and 0 to 90 degrees, respectively.
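With 1° steps, the three-dimensional scan covers 360 × 91 = 32,760 look directions. A sketch of the direction grid and the far-field delay for one microphone pair follows. The step sizes and function names are illustrative.

```python
import numpy as np


def direction_grid(az_step_deg=1, el_step_deg=1):
    """Unit vectors for azimuth 0..359 and elevation 0..90 degrees."""
    az = np.deg2rad(np.arange(0, 360, az_step_deg))
    el = np.deg2rad(np.arange(0, 91, el_step_deg))
    A, E = np.meshgrid(az, el, indexing="ij")
    return np.stack([np.cos(E) * np.cos(A), np.cos(E) * np.sin(A), np.sin(E)], axis=-1)


def far_field_tdoa(u, mic_k, mic_l, c=346.2):
    """TDOA (mic k minus mic l) for a plane wave arriving from direction u."""
    return -np.dot(u, np.asarray(mic_k) - np.asarray(mic_l)) / c


grid = direction_grid()  # shape (360, 91, 3)
```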

The nonparametric Welch algorithm is used here to estimate the power spectrum. The cross correlations between the microphone pairs are obtained from these spectrum estimates and are in turn used to compute the Steered Response Power (SRP) [4, 8]. In real experiments, the noise associated with the source signal must always be considered.

To keep the noise within a tolerable range, Welch's algorithm is used to estimate both the noise spectrum and the signal spectrum [12]. A Wiener filter then suppresses the noise power perturbing the signal, which can improve the localization performance.
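Given Welch estimates of the noisy-signal and noise power spectra, a per-frequency Wiener gain can be sketched as S/(S + N). Here the clean-signal spectrum S is estimated by spectral subtraction as max(Y − N, 0). The text does not specify the exact Wiener formulation, so this textbook form and the function name are assumptions.

```python
import numpy as np


def wiener_gain(noisy_psd, noise_psd):
    """Per-frequency Wiener gain S / (S + N), with S estimated as max(Y - N, 0)."""
    noisy_psd = np.asarray(noisy_psd, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)
    signal_psd = np.maximum(noisy_psd - noise_psd, 0.0)  # spectral subtraction estimate
    return signal_psd / np.maximum(signal_psd + noise_psd, 1e-12)


gain = wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Bins dominated by noise get a gain near zero, and high-SNR bins pass almost unchanged. Multiplying each spectrum by this gain is one way to apply the suppression before the cross-spectra are formed.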

Figure 7.3: Position of a speaker at different locations in the three-dimensional SMA system (Positions 1 to 5 and an imaginary point; microphones M1 to M4).

As Figure 7.3 shows, locating sources in three dimensions always requires searching both the azimuth and the elevation angle, which increases the computational complexity of the algorithm [21]. In many cases, the localization measurements are averaged and summarized by their statistical standard deviation.

7.5 Optimization

7.5.1 Stochastic Region Contraction in SRP-PHAT

Stochastic Region Contraction (SRC) is a well-known technique for optimizing the pattern of a microphone array: the placement and gains of the microphones can be optimized by minimizing a power spectral density (PSD) function. SRC, a nonlinear optimization technique, has been shown to be robust in finding the global optimum [2]. The basic idea of SRC is the following: given an initial rectangular search volume that contains the desired global optimum and possibly many local maxima or minima, contract the volume iteratively until a sufficiently small subvolume is reached in which the global optimum is trapped [2, 3, 5]. In SRP-PHAT, the contraction at iteration $i$ is controlled by a stochastic exploration of the SRP-PHAT functional $f(\mathbf{x})$ over the current subvolume. As a first step, we determine the number of random points, $J_0$, that must be evaluated so that one or more of them is likely to fall in the volume $V_g$ of higher values surrounding the global maximum of $f(\mathbf{x})$.

Figure 7.4: Two-dimensional example of Stochastic Region Contraction (SRC); $i$ is the iteration index. The rectangular regions show the contracting search regions.

When the initial search volume is $V_s$, the probability of a random point hitting $V_g$ is

$$P_{hit} = \frac{V_g}{V_s}.$$

Likewise, the probability of a random point missing $V_g$ is

$$1 - P_{hit} = 1 - \frac{V_g}{V_s}.$$

Random points are thrown independently of one another, so the probability that all $J_0$ random points miss $V_g$ is

$$P_{miss} = \left(1 - \frac{V_g}{V_s}\right)^{J_0}.$$

Taking the logarithm of both sides gives

$$\ln P_{miss} = J_0 \ln\left(1 - \frac{V_g}{V_s}\right), \qquad J_0 = \frac{\ln P_{miss}}{\ln\left(1 - V_g/V_s\right)}.$$

The number of random points needed to keep the missing probability at or below a chosen $P_{miss}$ is therefore determined by $P_{miss}$ and the ratio $V_s/V_g$. In our experiment, $V_s = 24\ \mathrm{m}^3$, and $V_g$ was obtained from preliminary experimental results for a low signal-to-noise ratio situation; together these values fix $J_0$.
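The relation for $J_0$ can be evaluated directly. In the sketch below, $V_s$ = 24 m³ follows the text. $V_g$ = 0.008 m³ and $P_{miss}$ = 0.01 are hypothetical example values, because the measured $V_g$ is not given here.

```python
import math


def random_points_needed(v_search, v_global, p_miss):
    """Smallest J0 such that (1 - v_global / v_search) ** J0 <= p_miss."""
    return math.ceil(math.log(p_miss) / math.log(1.0 - v_global / v_search))


# V_s = 24 m^3 as in the text; V_g and P_miss are hypothetical example values.
j0 = random_points_needed(v_search=24.0, v_global=0.008, p_miss=0.01)
```

For $V_g \ll V_s$ this is close to $(V_s/V_g)\ln(1/P_{miss})$, here about 1.4 × 10⁴ points. Demanding a smaller miss probability therefore costs only logarithmically more evaluations.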

We now define $J_i$ as the number of random points evaluated at iteration $i$, $N$ as the number of points used to define the new source volume $V_{i+1}$ (which has a rectangular boundary), and $\Phi_i$ as the total number of functional evaluations (fe's) up to iteration $i$, where $\Phi$ is the maximum number of fe's allowed to be computed [2, 5]. With these definitions, the SRC algorithm for finding the global maximum is as follows:

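Table 3: The SRC algorithm for finding the global maximum

Step | Category | Operation
1 | Initialize | $i = 0$; set $J_0$, $N = 100$ and $\Phi$.
2 | Evaluate | $f(\mathbf{x})$ for $J_0$ random points.
3 | Sort | Keep the best $N$ points.
4 | Contract | Contract the search region to the smaller region $V_{i+1}$ that contains these $N$ points.
5 | Test | If $V_{i+1}$ is sufficiently small (this test uses a parameter of about 10): determine $\mathbf{x}^*$, stop, keep the result. Else if $\Phi_i \ge \Phi$: stop, discard the result. Else: among the $N$ points, keep the subset of $N_i$ points whose values are greater than the mean $\mu_i$ of the $N$ points.
6 | Evaluate | $J_i$ new random points in $V_{i+1}$.
7 | Form | The set of $N$ points as the union of the $N_i$ points and the best $N - N_i$ points from the $J_i$ just evaluated. This gives $N$ high points for iteration $i + 1$.
8 | Iterate | $i = i + 1$; go to Step 4.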

Depending on how $J_i$ is chosen, there are three variants of the SRC algorithm:

1. SRC-I: $J_i$ is the number of random points needed to find $N - N_i$ points greater than $\mu_i$, with a finite value of $\Phi$. This guarantees that $\mu_i$ increases.

2. SRC-II: $J_i$ is the number of random points needed to find $N - N_i$ points higher than the minimum of the full set of $N$ points, so that $\mu_i$ increases on almost all iterations. A finite value of $\Phi$ is used here as well.

3. SRC-III: $J_i$ is fixed at a constant value, and the $N$ highest points are kept on every iteration. There is no guarantee that $\mu_i$ increases; $\Phi$ is set as a parameter.
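The loop of Table 3 can be sketched in NumPy with a fixed number of new random points per iteration, which corresponds to the SRC-III choice. The sketch uses stand-ins for the SRP-PHAT functional and for the thesis's stopping tests:

- a smooth test function with a single known maximum replaces the functional;
- the stopping tests are simplified to a minimum box extent and an evaluation budget.

All parameter values and names are illustrative.

```python
import numpy as np


def src_maximize(f, lower, upper, j0=3000, n_best=100, j_iter=500,
                 min_extent=1e-3, max_evals=200_000, seed=0):
    """Stochastic region contraction (SRC) sketch for maximizing f over a box.

    f maps an (n, d) array of points to n values.  A fixed number j_iter of
    new random points per iteration is used (an SRC-III-like choice).
    Returns (best_point, best_value, evaluations, converged).
    """
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pts = rng.uniform(lower, upper, size=(j0, lower.size))  # Step 2: J0 random points
    vals = f(pts)
    evals = j0
    while True:
        order = np.argsort(vals)[::-1][:n_best]             # Step 3: best N points
        pts, vals = pts[order], vals[order]
        lower, upper = pts.min(axis=0), pts.max(axis=0)     # Step 4: contract
        converged = bool(np.all(upper - lower < min_extent))
        if converged or evals >= max_evals:                 # Step 5: keep or discard
            return pts[0], vals[0], evals, converged
        keep = vals > vals.mean()                           # the N_i points above the mean
        new_pts = rng.uniform(lower, upper, size=(j_iter, lower.size))  # Step 6
        new_vals = f(new_pts)
        evals += j_iter
        best_new = np.argsort(new_vals)[::-1][: n_best - int(keep.sum())]
        pts = np.vstack([pts[keep], new_pts[best_new]])     # Step 7: form N points
        vals = np.concatenate([vals[keep], new_vals[best_new]])


def peak(p):
    """Smooth test functional with a single maximum at (1.0, 2.0, 0.5)."""
    return -np.sum((p - np.array([1.0, 2.0, 0.5])) ** 2, axis=1)


best, value, n_evals, ok = src_maximize(peak, [0, 0, 0], [4, 6, 1])
```

On a real SRP-PHAT surface, f would evaluate the summed PHAT values at each candidate source position. A result returned with `converged == False` corresponds to the "stop, discard result" branch.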

7.5.2 Coarse-to-Fine Region Contraction (CFRC) in SRP-PHAT

Coarse-to-Fine Region Contraction (CFRC) is another region contraction method, very similar to SRC. In CFRC, each iteration is based on a sub-grid search of the functional $f(\mathbf{x})$ over the current subvolume. For a low signal-to-noise ratio case, $V_g$ is determined from our preliminary experimental data. A fixed number of grid points is then evaluated along each of the x, y and z axes, giving about 3000 equally spaced grid points in 3D, so that at least one grid point falls in $V_g$.

Figure 7.5: Two-dimensional example of Coarse-to-Fine Region Contraction (CFRC); $i$ is the iteration index. The rectangular regions show the contracting search regions.

For CFRC, the number of points evaluated per iteration and the number of points kept can be updated with the same methods as in SRC. Naturally, CFRC can be implemented in many different ways at each iteration. The general algorithm is:

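Table 4: The CFRC algorithm for finding the global maximum

Step | Category | Operation
1 | Initialize | $i = 0$ and the initial parameters.
2 | Evaluate | $f(\mathbf{x})$ for $J_0$ grid points.
3 | Sort | Keep the best $N$ points.
4 | Contract | Contract the search region to the smaller region $V_{i+1}$ that contains these $N$ points.
5 | Test | If the stopping condition is met: determine $\mathbf{x}^*$, stop, keep the result. Else if the failure condition is met: stop, discard the result. Else: among the $N$ points, keep the subset of $N_i$ points whose values are greater than the mean $\mu_i$ of the $N$ points.
6 | Evaluate | $J_i$ new grid points in $V_{i+1}$.
7 | Form | The set of $N$ points as the union of the $N_i$ points and the best $N - N_i$ points from the $J_i$ just evaluated.
8 | Iterate | $i = i + 1$; go to Step 4.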
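CFRC runs the same loop as SRC, but every evaluation stage uses a regular sub-grid of the current box instead of random points. The NumPy sketch below starts from a 15 × 20 × 10 = 3000-point coarse grid, echoing the roughly 3000 grid points in the text. The per-axis split, the 10 × 10 × 10 refinement grid, the stopping tests and the test function are all hypothetical choices.

```python
import numpy as np


def grid_points(lower, upper, per_axis):
    """Regular grid with per_axis[d] points along each dimension d of a box."""
    axes = [np.linspace(lo, hi, n) for lo, hi, n in zip(lower, upper, per_axis)]
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)


def cfrc_maximize(f, lower, upper, coarse=(15, 20, 10), fine=(10, 10, 10),
                  n_best=100, min_extent=1e-3, max_evals=1_000_000):
    """Coarse-to-fine region contraction (CFRC) sketch for maximizing f.

    Returns (best_point, best_value, evaluations, converged).
    """
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pts = grid_points(lower, upper, coarse)               # 3000-point coarse grid
    vals = f(pts)
    evals = len(pts)
    while True:
        order = np.argsort(vals)[::-1][:n_best]           # best N points
        pts, vals = pts[order], vals[order]
        lower, upper = pts.min(axis=0), pts.max(axis=0)   # contract
        converged = bool(np.all(upper - lower < min_extent))
        if converged or evals >= max_evals:
            return pts[0], vals[0], evals, converged
        keep = vals > vals.mean()                         # the N_i points above the mean
        new_pts = grid_points(lower, upper, fine)         # sub-grid of the new box
        new_vals = f(new_pts)
        evals += len(new_pts)
        best_new = np.argsort(new_vals)[::-1][: n_best - int(keep.sum())]
        pts = np.vstack([pts[keep], new_pts[best_new]])
        vals = np.concatenate([vals[keep], new_vals[best_new]])


def peak(p):
    """Smooth test functional with a single maximum at (1.0, 2.0, 0.5)."""
    return -np.sum((p - np.array([1.0, 2.0, 0.5])) ** 2, axis=1)


best, value, n_evals, ok = cfrc_maximize(peak, [0, 0, 0], [4, 6, 1])
```

Because CFRC is deterministic, repeated runs on the same frame give identical results. SRC results depend on the random seed.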


For the parameter selection, we chose the simplest algorithm consistent with our data set. The parameter values that gave full performance at the lowest cost were selected; in particular, a value of 750 points turned out to preserve perfect performance at a low cost [3, 5, 10].

To see how much the two efficient algorithms, SRC and CFRC, improve SRP-PHAT, we examine both their reduction in computational cost and their accuracy relative to a full grid search. We therefore calculate the computational cost of SRP-PHAT using a full grid search, SRC and CFRC.

7.6 Computational Cost

7.6.1 Signal Processing Cost

SRP-PHAT requires frequency-domain processing to perform the phase transformation [3]. If $M$ microphones are used, the phase transformation requires computing the cross power spectra of the $Q = M(M-1)/2$ microphone pairs. Counting additions and multiplications as separate arithmetic operations, an FFT of DFT size $L$ takes $\frac{L}{2}\log_2 L$ complex multiplications and $L\log_2 L$ complex additions, that is, $6 \cdot \frac{L}{2}\log_2 L + 2 \cdot L\log_2 L = 5L\log_2 L$ real operations.

1. DFT: an FFT for each of the $M$ microphones gives a total DFT cost of $5ML\log_2 L$.

2. Spectral processing: for each pair of microphones,
i. Cross power spectrum: one complex multiplication per point of the $L$-point cross power spectrum, or $(4+2)L = 6L$ operations.
ii. Phase transform: normalization of the $L$ points by the magnitude of the cross power spectrum, which costs $L$ operations.
Hence the total spectral processing cost for the $Q$ pairs of microphones is $7QL$.

3. IDFT: $5QL\log_2 L$ operations are required for the $Q$ pairs of microphones.
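The per-frame signal-processing cost can be tallied with the counts above: 5L log₂L per (I)FFT, 6L per cross power spectrum and L for the PHAT normalization. The sketch uses M = 24 as in Section 7.6.2. The DFT size L = 2048 is a hypothetical example value, since the text does not fix L.

```python
import math


def signal_processing_ops(num_mics, n_fft):
    """Per-frame signal-processing operation counts for SRP-PHAT.

    Uses 5*L*log2(L) real ops per L-point (I)FFT, 6L ops per cross power
    spectrum and L ops for the PHAT normalization, as counted in the text.
    """
    m, L = num_mics, n_fft
    q = m * (m - 1) // 2                # microphone pairs
    log2_l = math.log2(L)
    dft = 5 * m * L * log2_l            # one FFT per microphone
    spectral = q * (6 * L + L)          # cross spectrum + phase transform
    idft = 5 * q * L * log2_l           # one inverse FFT per pair
    return {"dft": dft, "spectral": spectral, "idft": idft,
            "total": dft + spectral + idft}


ops = signal_processing_ops(num_mics=24, n_fft=2048)  # L = 2048 is hypothetical
```

The total is about 38 million operations per frame. That is small next to the tens of billions of operations the full grid search needs (Section 7.6.3).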

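7.6.2 Cost per Functional Evaluation (fe)

The following steps are required for each point $\mathbf{x}$ at which $f(\mathbf{x})$ is evaluated:

1. Obtain the $M$ (where $M = 24$) Euclidean distances $d_k = \lVert \mathbf{x} - \mathbf{m}_k \rVert$ from $\mathbf{x}$ to each microphone position $\mathbf{m}_k$. Cost: 3 multiplications, 5 additions and 1 square root, about 20 ops/mic or 480 ops/fe.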

2. Determine the $Q$ (where $Q = 276$) TDOAs, $\tau_{kl} = (d_k - d_l)\,c^{-1}$, where $c^{-1}$ denotes the inverse of the speed of sound. Cost: 3 ops/pair or 828 ops/fe.

3. Sum the PHAT values; for each pair, a multiplication, an addition and a truncation are needed to find the PHAT value at the TDOA. Cost: 5 ops/pair or 1380 ops/fe.

In total, we need (480 + 828 + 1380) ops/fe = 2688 ops/fe.

7.6.3 Cost of Full Grid Search

In this case, our search volume is $V_s = 24\ \mathrm{m}^3$. For a grid resolution of 1 cm, we require $24 \times 10^6$ fe's. Therefore, the cost of the full grid search is $24 \times 10^6 \times 2688 \approx 64{,}512$ Mops/frame. This is far greater than the computational costs of signal processing and interpolation [1, 3].
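The per-fe and full-grid totals can be rechecked from the step counts:

- 20 ops for each of the 24 microphones;
- 3 ops for each of the 276 pairs, which is 828 ops;
- 5 ops for each pair, which is 1380 ops.

These give 2688 ops/fe. At 1 cm resolution over 24 m³ there are 24 × 10⁶ points. The function names below are hypothetical.

```python
def ops_per_fe(num_mics=24):
    """Operations per functional evaluation, using the per-step counts above."""
    q = num_mics * (num_mics - 1) // 2  # number of microphone pairs
    distances = 20 * num_mics           # about 20 ops per Euclidean distance
    tdoas = 3 * q                       # (d_k - d_l) * (1/c) per pair
    phat_sum = 5 * q                    # 5 ops per pair to find and sum the PHAT value
    return distances + tdoas + phat_sum


def full_grid_fes(volume_m3, resolution_m):
    """Number of grid points at the given resolution inside the search volume."""
    return round(volume_m3 / resolution_m ** 3)


per_fe = ops_per_fe()                 # 480 + 828 + 1380
grid_fes = full_grid_fes(24.0, 0.01)  # 24 m^3 at 1 cm resolution
grid_mops = per_fe * grid_fes / 1e6   # mega-operations per frame
```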

7.6.4 Cost of SRC and CFRC

SRP-PHAT with SRC or CFRC requires the same signal processing and interpolation as the full grid search, but it needs significantly fewer fe's to find the global maximum [2, 3].

Both SRC and CFRC need a few small additional computations: determining each random point (for SRC) costs about 21 ops/fe, and determining each grid point (for CFRC) costs about 12 ops/fe [2, 3].

7.7 Experiments and Result Analysis

For our experiments, we consider the Huge Microphone Array (HMA), which can support up to 512 microphones in real time [3]. Only the parts of the HMA system and the experimental room that we used in our experiments are described below.

7.8 Experimental System

A human talker, approximately facing the locator microphones, repeated the first four seconds of speech from four locations, with the distances and SNRs shown in the following figure:

Figure 7.6: Top view of the microphone array indicating the source locations and panels (Panels A, F, G, H, I, N, O and P; a 6.5 m dimension is marked; average source distances are 2.14 m, 2.76 m, 3.47 m and 4.25 m, the last for Source 4).

7.9 Preliminary Processing for SRP-PHAT Data

7.9.1 Interpolation

The SRP-PHAT surface is highly discontinuous near its peak, so interpolation is needed to make the surface smoother in this region [3, 5, 12]. Two interpolation techniques are applied to the SRP-PHAT surface: low-pass FIR filter interpolation and cubic interpolation of the spectral samples. A 131-tap low-pass FIR filter interpolates about 824 original samples to obtain 8001 interpolated points [3].

After low-pass FIR interpolation, the surface becomes much smoother (Figure 7.7(b)) than the original (Figure 7.7(a)). Cubic interpolation gives almost the same smoothing effect (Figure 7.7(c)) as the low-pass filter interpolation (Figure 7.7(b)).
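Low-pass FIR interpolation can be sketched as zero insertion followed by a windowed-sinc low-pass filter. The 131-tap length follows the text. The upsampling factor of 10 and the Hamming window are assumptions, and the cubic alternative is not shown. With factor 10, the kernel spans ±6.5 original samples and is zero at every nonzero multiple of 10. The original samples therefore pass through unchanged, and only the in-between points are filled in.

```python
import numpy as np


def fir_interpolate(x, factor=10, taps=131):
    """Band-limited interpolation: zero insertion + windowed-sinc low-pass FIR."""
    up = np.zeros(len(x) * factor)
    up[::factor] = x                          # original samples, zeros in between
    n = np.arange(taps) - (taps - 1) / 2      # tap index centred on 0
    h = np.sinc(n / factor) * np.hamming(taps)  # cutoff at the original Nyquist
    return np.convolve(up, h, mode="same")    # output aligned with `up`


x = np.sin(2 * np.pi * 0.05 * np.arange(82))  # a smooth example "surface" slice
y = fir_interpolate(x)                        # 820 points, 10 per original sample
```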

Figure 7.7: SRP-PHAT surface (a) without interpolation, (b) with filter interpolation, (c) with cubic interpolation.

7.9.2 Energy Discriminator

Once reliable SRP-PHAT estimates have been obtained with the full grid search, we compare them with the performance of SRP-PHAT using Stochastic Region Contraction (SRC) and Coarse-to-Fine Region Contraction (CFRC) [2, 3]. This comparison demonstrates the effectiveness of SRC and CFRC in SRP-PHAT. The question is then when an SRP-PHAT estimate can be considered reliable [3].

In our observation, SRP-PHAT makes good estimates whenever sufficient speech information is present in the frame [1, 3, 8]. It fails to estimate correctly only on frames that contain a large portion of silence. Therefore, on good frames, where speech is adequately present, we compare the performance of SRP-PHAT using SRC and CFRC against the full grid search [2, 3, 8, 10].

Speech frames normally have higher energy than non-speech frames; at the same time, low-energy events are difficult to detect in very noisy conditions [3].

An energy-based discriminator can therefore be used to detect frames that contain speech. In this work, we built the energy discriminator as follows:

Table 5: A Simple Energy Discriminator

Step | Category | Operation
1 | Initialize | Set the energy threshold $T$ to the background energy (in our case, -30 dB).
2 | Determine | The energy of all frames in the processed data, normalizing the highest-energy frame to 0 dB.
3 | Select | A set of frames, G, that have energy greater than $T$.
4 | Calculate | The standard deviations $\sigma_x$, $\sigma_y$ and $\sigma_z$ of the source location estimates given by SRP-PHAT using a full grid search on the G frames.
5 | Test | If $\sigma_x$, $\sigma_y$ and $\sigma_z$ are all within their limits: stop, record $\sigma_x$, $\sigma_y$ and $\sigma_z$, and keep $T$.
6 | Increase | Raise the threshold $T$ to a higher value.
7 | Iterate | Go to Step 2.
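Steps 1 to 3 of Table 5 can be sketched as follows: compute frame energies, normalize the loudest frame to 0 dB, and keep the frames above −30 dB. The outer loop that raises the threshold is not shown, because it needs full-grid SRP-PHAT estimates. The 102.4 ms frames advancing by 25.6 ms are from Section 7.10. The 20 kHz sampling rate and the synthetic "silence then speech" signal are assumptions for illustration.

```python
import numpy as np


def frame_energies_db(x, frame_len, hop):
    """Frame energies in dB, normalized so the loudest frame is 0 dB."""
    n_frames = 1 + (len(x) - frame_len) // hop
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2)
                       for i in range(n_frames)])
    energy_db = 10.0 * np.log10(np.maximum(energy, 1e-20))
    return energy_db - energy_db.max()


def select_good_frames(x, frame_len, hop, threshold_db=-30.0):
    """Indices of frames whose normalized energy exceeds the threshold."""
    return np.flatnonzero(frame_energies_db(x, frame_len, hop) > threshold_db)


# 102.4 ms frames advancing 25.6 ms at an assumed 20 kHz sampling rate.
fs = 20_000
frame_len, hop = round(0.1024 * fs), round(0.0256 * fs)    # 2048 and 512 samples
x = np.concatenate([0.001 * np.ones(8192), np.ones(8192)])  # "silence" then "speech"
good = select_good_frames(x, frame_len, hop)
```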

After applying this discriminator to the four positions with different SNRs, we obtained the standard deviations of the location estimates and the corresponding energy threshold, which we then use as the threshold of our energy discriminator.

For SRP-PHAT, all frames with energy above this threshold are considered good frames, and only on these good frames do we compare the performance of SRC and CFRC with that of the full grid search. If SRC and CFRC reach the global maximum on every good frame, the performance is listed as 100% [3, 7, 11].

Figure 7.8: The Simple Energy Discriminator for (a) Source 4, (b) Source 3, (c) Source 2, (d) Source 1.

7.10 Result Analysis

Table 6 shows the percentage of correct estimates (accuracy) and the average number of functional evaluations (fe's) used by the full grid search on all frames and by CFRC and SRC on the good frames. For the evaluation, frames of 102.4 ms advancing by 25.6 ms within the speech were used, and an estimate was counted as an error if it was off by more than 5 cm or 10 cm in the respective coordinates [3, 7].

Table 6: Performance evaluation of SRP-PHAT using full grid search over all frames; CFRC and three SRC variants (SRC-I, SRC-II, SRC-III) over good frames for 4 different locations.

Source (SNR) | Grid Search %Corr / fe's | CFRC %Corr / fe's | SRC-I %Corr / fe's | SRC-II %Corr / fe's | SRC-III %Corr / fe's
Source 1 (7.9 dB) | 100 / 24 × 10^6 | 100 / 9123 | 100 / 7486 | 100 / 7729 | 100 / 20164
Source 2 (5.7 dB) | 96.6 / 24 × 10^6 | 100 / 9960 | 100 / 6531 | 99.1 / 6538 | 100 / 20391
Source 3 (3.47 dB) | 87.8 / 24 × 10^6 | 100 / 15889 | 100 / 22568 | 97.5 / 24185 | 98.14 / 26778
Source 4 (1.9 dB) | 67.3 / 24 × 10^6 | 100 / 19273 | 100 / 33601 | 98.4 / 53304 | 84.61 / 31219
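The computational advantage of each method over the full grid search is the ratio of 24 × 10⁶ fe's to its average fe count in Table 6. A small sketch recomputes these ratios from the table entries:

```python
GRID_FES = 24_000_000  # full grid search: 24 m^3 at 1 cm resolution (Section 7.6.3)

# Average fe's per frame on good frames, Sources 1 to 4, copied from Table 6.
table6_fes = {
    "CFRC": [9123, 9960, 15889, 19273],
    "SRC-I": [7486, 6531, 22568, 33601],
    "SRC-II": [7729, 6538, 24185, 53304],
    "SRC-III": [20164, 20391, 26778, 31219],
}

advantage = {name: [round(GRID_FES / fe) for fe in fes]
             for name, fes in table6_fes.items()}
```

This reproduces 2631:1 for CFRC and 3206:1 for SRC-I at Source 1, and 714:1 for SRC-I at Source 4. The same calculation for the CFRC entry of Source 4 (19273 fe's) gives about 1245:1.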

The performance of CFRC and the three variants of SRC (SRC-I, SRC-II, SRC-III) is visualized below.

Figure 7.9: Performance of SRP-PHAT using CFRC relative to grid search (percentage of correct estimates, 0 to 100, for Sources 1 to 4).

Figure 7.10: Performance of SRP-PHAT using (a) SRC-I, (b) SRC-II, (c) SRC-III relative to grid search (percentage of correct estimates, 0 to 100, for Sources 1 to 4).

CHAPTER 8

CONCLUSION AND THE FUTURE WORK

8.1 Conclusion

Our evaluation shows that, compared with real-time two-stage location estimation algorithms, the Steered Response Power Phase Transform (SRP-PHAT) is superior, especially under higher noise conditions. Relative to a full grid search (Table 6), both SRC and CFRC show perfect performance, and at the same time they greatly reduce the large computational cost of SRP-PHAT.

In our research, under the worst-case conditions (Source 4, SNR = 1.9 dB), CFRC achieves the full accuracy of SRP-PHAT with a computational advantage of 1217:1. Under our best-case, less noisy conditions (Source 1, SNR = 7.9 dB), CFRC gives full accuracy with a computational advantage of 2631:1. SRC-I gives computational advantages over the full grid search of 714:1 at 1.9 dB and 3206:1 at 7.9 dB. Notably, CFRC is less costly than SRC-I under noisy conditions (1.9 dB) but more expensive under less noisy conditions (7.9 dB). Therefore, when the noise is comparatively high, CFRC should be used rather than SRC; when the noise is low, SRC gives the same accuracy at a lower computational cost.

Many optimization techniques could be used to reduce the computational cost of Steered Response Power Phase Transformation (SRP-PHAT). Among them, the two global optimization techniques, Stochastic Region Contraction (SRC) and Coarse-to-Fine Region Contraction (CFRC), are efficient and robust, and they make SRP-PHAT practical in real time by reducing its cost by three orders of magnitude relative to a full grid search [2].

Both optimization techniques have been shown to fully preserve the three-dimensional performance of SRP-PHAT, which is superior to that of fast two-stage methods, while allowing a real-time implementation in our system.

In real-life conditions, SRP-PHAT is important for localizing active sound sources. The Phase Transform (PHAT) weighting itself can be varied parametrically, and this parameter directly controls how much the original spectral amplitudes influence the final coherent power values [4, 20].

Under different operating conditions, choosing the right parameter value is important: it determines the impact of the Phase Transform for the given conditions, and it can make the PHAT more robust for both narrowband and broadband sources [2].
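One common way to parameterize the PHAT is to divide the cross-power spectrum by its magnitude raised to a power β. With β = 1 this gives the full phase transform; with β = 0 it gives the plain cross-correlation. The sketch below assumes this |G|^β form; the text does not state the exact parameterization, and the function name is hypothetical.

```python
import numpy as np


def weighted_gcc(x1, x2, beta=1.0):
    """Circular GCC with partial PHAT weighting 1 / |G|**beta.

    beta = 1 gives the full phase transform, beta = 0 the plain cross-correlation.
    """
    n = len(x1)
    cross = np.fft.rfft(x1) * np.conj(np.fft.rfft(x2))
    cross = cross / np.maximum(np.abs(cross), 1e-12) ** beta
    return np.fft.irfft(cross, n=n)


rng = np.random.default_rng(0)
s = rng.standard_normal(1024)
x1 = np.roll(s, 3)  # x1 is s circularly delayed by 3 samples
curves = {beta: weighted_gcc(x1, s, beta) for beta in (0.0, 0.5, 1.0)}
```

All three weightings peak at the true lag of 3 samples. As β grows toward 1, the spectral amplitudes matter less and the peak becomes sharper; with β = 1 it collapses to a single impulse.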


8.2 Future work

To fulfill the overall goal of this research, Acoustic Beamforming: Design and Development of Steered Response Power with Phase Transformation (SRP-PHAT), the simulated environment and conditions described here were kept essentially the same in terms of the partial weighting factor, the experimental setup and the test conditions. For a more comprehensive performance evaluation, other setups can be investigated with respect to the following issues:
a. Changing the number of microphones in the array.
b. Rearranging the spacing of the microphones.
c. Developing other microphone array geometries.

Across different investigations, Generalized Cross Correlation (GCC) with a pitch-based weighting has been found to be more robust than the conventional Phase Transformation (PHAT) in noise-only environments. Under reverberant conditions, it performs at the same level as PHAT. It is also more robust than maximum likelihood (ML) weighting under reverberant conditions. It would be interesting to investigate whether applying this kind of weighting function to the Steered Response Power (SRP) yields any improvement.
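SRP can be written as a sum, over microphone pairs, of weighted cross-spectra steered by the candidate delays. In principle, this means any GCC weighting function, including a pitch-based one, could be substituted for PHAT. The following Python sketch assumes a far-field source, a 2-D array and a circularly delayed test signal. The names `phat_weight`, `no_weight`, `simulate_far_field` and `srp_scan` are hypothetical. No pitch-based weighting is implemented; the sketch only shows the hook where one would be passed in.

```python
import numpy as np


def phat_weight(X_i, X_j):
    """PHAT: divide each cross-spectrum bin by its magnitude."""
    return 1.0 / (np.abs(X_i * np.conj(X_j)) + 1e-12)


def no_weight(X_i, X_j):
    """Unweighted cross-spectrum (plain steered cross-correlation)."""
    return np.ones(X_i.shape)


def simulate_far_field(source, positions, fs, azimuth, c=343.0):
    """Apply each microphone's far-field delay to `source` as a circular frequency-domain shift."""
    n = len(source)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(source)
    arrival = -(positions @ np.array([np.cos(azimuth), np.sin(azimuth)])) / c
    return np.array([np.fft.irfft(spectrum * np.exp(-1j * omega * t), n) for t in arrival])


def srp_scan(frames, positions, fs, azimuths, weight=phat_weight, c=343.0):
    """Steered response power for each candidate azimuth (radians).

    frames: (num_mics, n) block of samples; positions: (num_mics, 2) in metres.
    `weight(X_i, X_j)` returns per-bin weights; phat_weight gives SRP-PHAT.
    """
    num_mics, n = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=1.0 / fs)
    pairs = [(i, j) for i in range(num_mics) for j in range(i + 1, num_mics)]
    crosses = [spectra[i] * np.conj(spectra[j]) * weight(spectra[i], spectra[j])
               for i, j in pairs]
    power = np.zeros(len(azimuths))
    for a, az in enumerate(azimuths):
        arrival = -(positions @ np.array([np.cos(az), np.sin(az)])) / c
        for (i, j), cross in zip(pairs, crosses):
            tau = arrival[i] - arrival[j]   # steered TDOA for this pair
            power[a] += np.real(np.sum(cross * np.exp(1j * omega * tau)))
    return power


rng = np.random.default_rng(1)
fs = 16000
positions = np.column_stack((np.arange(4) * 0.05, np.zeros(4)))   # 4-mic ULA, 5 cm spacing
frames = simulate_far_field(rng.standard_normal(1024), positions, fs, np.deg2rad(60.0))
azimuths = np.deg2rad(np.arange(0, 181))
for w in (phat_weight, no_weight):
    print(w.__name__, np.arange(0, 181)[np.argmax(srp_scan(frames, positions, fs, azimuths, w))])
```

Only the `weight` argument differs between plain SRP and SRP-PHAT here. A pitch-based weighting would be one more function with the same `(X_i, X_j)` signature, and its design is left open, as in the text.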


REFERENCES

[1] A. Johansson, N. Grbic and S. Nordholm, “Speaker Localization Using the Far-Field SRP-PHAT in Conference Telephone”, International Symposium on Intelligent Signal Processing and Communication Systems, Kaohsiung, 2002.

[2] H. T. H. Do, “Real-Time SRP-PHAT Source Localization Implementations on a Large-Aperture Microphone Array”, Brown University, Providence, RI, Sep. 2009.

[3] H. Do, H. F. Silverman and Y. Yu, “A Real-Time SRP-PHAT Source Location Implementation Using Stochastic Region Contraction (SRC) on a Large-Aperture Microphone Array”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, HI, vol. 1, pp. 121-124, Apr. 2007.

[4] J. Dmochowski, J. Benesty and S. Affes, “A Generalized Steered Response Power Method for Computationally Viable Source Localization”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, pp. 2510-2526, Nov. 2007.

[5] J. Dmochowski, J. Benesty and S. Affes, “Fast Steered Response Power Source Localization Using Inverse Mapping of Relative Delays”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 289-292, 2008.

[6] M. Brandstein, J. Adcock and H. Silverman, “A Closed-Form Location Estimator for Use with Room Environment Microphone Arrays”, IEEE Transactions on Speech and Audio Processing, vol. 5, pp. 45-50, Jan. 1997.

[7] A. Karbasi and A. Sugiyama, “A New DOA Estimation Method Using a Circular Microphone Array”, EURASIP, pp. 778-782, 2007.

[8] J. E. Adcock, Y. Gotoh, D. J. Mashao and H. F. Silverman, “Microphone-Array Speech Recognition via Incremental MAP Training”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Atlanta, GA, USA, 1996.

[9] M. F. Berger and H. F. Silverman, “Microphone Array Optimization by Stochastic Region Contraction”, IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 2377.

[10] M. S. Brandstein and H. F. Silverman, “A Robust Method for Speech Signal Time Delay Estimation in Reverberant Rooms”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-97), Munich, Germany, pp. 375-378, Apr. 1997.

[11] J. C. Chen, R. E. Hudson and K. Yao, “Joint Maximum Likelihood Source Localization and Unknown Sensor Location Estimation for Near-Field Wideband Signals”, Proceedings of SPIE, vol. 4474, pp. 521-532, Nov. 2001.


[12] C. Che, Q. Lin, J. Pearson, B. De Vries and J. Flanagan, “Microphone Arrays and Neural Networks for Robust Speech Recognition”, Proceedings of the Human Language Technology Workshop, Plainsboro, NJ, pp. 342-347, Mar. 1994.

[13] P. L. Chu, “Desktop Microphone Array for Teleconferencing”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 5, pp. 2999-3002, MI, USA, May 1995.

[14] J. H. DiBiase, “A High Accuracy, Low Latency Technique for Talker Localization in Reverberant Environment Using Microphone Arrays”, PhD Thesis, Brown University, Providence, RI, May 2000.

[15] P. L. Chu, “Superdirective Microphone Array for a Set-Top Videoconferencing System”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 235-238, May 1997.

[16] M. Fiala, D. Green and G. Roth, “A Panoramic Video and Acoustic Beamforming Sensor for Videoconferencing”, IEEE International Workshop on Haptic, Audio and Visual Environments and Their Applications, 2004.

[17] J. L. Flanagan, D. A. Berkley, G. W. Elko, J. E. West and M. M. Sondhi, “Autodirective Microphone Systems”, Acustica, vol. 73, pp. 58-71, 1991.

[18] D. Giuliani, M. Omologo and P. Svaizer, “Talker Localization and Speech Recognition Using a Microphone Array and a Cross-Power Spectrum Phase Analysis”, ICSLP, vol. 3, pp. 1243-1246, Sept. 1994.

[19] G. W. Elko and A. N. Pong, “A Steerable and Variable First-Order Differential Microphone Array”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 1997.

[20] T. B. Hughes, H. Kim, J. H. DiBiase and H. F. Silverman, “Using a Real-Time Tracking Microphone Array as Input to an HMM Speech Recognizer”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1998.

[21] T. B. Hughes, H. Kim, J. H. DiBiase and H. F. Silverman, “Performance of an HMM Speech Recognizer Using a Real-Time Tracking Microphone Array as Input”, IEEE Transactions on Speech and Audio Processing, vol. 7, pp. 346-349, May 1999.

[22] M. Morf, J. Delosme and B. Friedlander, “A Linear Equation Approach to Locating Sources from Time Difference of Arrival Measurements”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 818-824, 1980.

[23] P. Zou, Z. Huang and J. Lu, “Passive Stationary Target Positioning Using Adaptive Particle Filter with TDOA and FDOA Measurements”, Joint Conference of the 10th Asia-Pacific Conference on Communications and the 5th International Symposium on Multi-Dimensional Mobile Communications Proceedings, 2004.

[24] D. Ward, E. Lehmann and R. Williamson, “Particle Filtering Algorithms for Tracking an Acoustic Source in a Reverberant Environment”, IEEE Transactions on Speech and Audio Processing, 2003.

[25] J. Vermaak and A. Blake, “Nonlinear Filtering for Speaker Tracking in Noisy and Reverberant Environments”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2001.

[26] G. Shi and P. Aarabi, “Robust Speech Recognition Using Near-Field Superdirective Beamforming with Post-Filtering”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2003.

[27] T. Gustafsson, B. Rao and M. Trivedi, “Source Localization in Reverberant Environments: Modeling and Statistical Analysis”, IEEE Transactions on Speech and Audio Processing, pp. 791-803, 2003.

[28] J. H. DiBiase, H. Silverman and M. S. Brandstein, “Robust Localization in Reverberant Rooms”, in Microphone Arrays: Signal Processing Techniques and Applications, Springer-Verlag, Berlin, pp. 157-180, 2001.

[29] P. Svaizer, M. Matassoni and M. Omologo, “Acoustic Source Location in a Three-Dimensional Space Using Cross-Power Spectrum Phase”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-97), Munich, Germany, pp. 231-234, 1997.

[30] M. S. Brandstein and H. Silverman, “A Practical Methodology for Speech Source Localization with Microphone Arrays”, Computer Speech and Language, pp. 91-126, 1997.

[31] S. M. Griebel and M. S. Brandstein, “Microphone Array Speech Dereverberation Using Coarse Channel Modeling”, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 201-204, 2001.

[32] E. Jan, P. Svaizer and J. Flanagan, “Matched Filter Processing of Microphone Array for Spatial Volume Selectivity”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1460-1463, 1995.

[33] W. Kellermann, “A Self-Steering Digital Microphone Array”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 1991.

[34] C. H. Knapp and G. C. Carter, “The Generalized Correlation Method for Estimation of Time Delay”, IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 320-327, Aug. 1976.


[35] J. Mosher and R. Leahy, “Source Localization Using Recursively Applied and Projected MUSIC”, IEEE Transactions on Signal Processing, pp. 332-340, Feb. 1999.

[36] M. Omologo and P. Svaizer, “Acoustic Source Localization in Noisy and Reverberant Environments Using CSP Analysis”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 901-904, 1996.

[37] H. Schau and A. Robinson, “Passive Source Localization Employing Intersecting Spherical Surfaces from Time-of-Arrival Differences”, IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 1223-1225, 1987.

[38] R. Schmidt, “A New Approach to Geometry of Range Difference Location”, IEEE Transactions on Aerospace and Electronic Systems, pp. 821-835, 1972.

[39] R. O. Schmidt, “Multiple Emitter Location and Signal Parameter Estimation”, IEEE Transactions on Antennas and Propagation, pp. 276-280, Mar. 1986.

[40] H. F. Silverman, “Some Analysis of Microphone Arrays for Speech Data Acquisition”, IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 1699-1711, Dec. 1987.

[41] H. F. Silverman, W. R. Patterson III and J. L. Flanagan, “The Huge Microphone Array (HMA), Part I”, IEEE Concurrency, pp. 36-46, Oct.-Dec. 1998.

[42] H. F. Silverman, W. R. Patterson III and J. L. Flanagan, “The Huge Microphone Array (HMA), Part II”, IEEE Concurrency, pp. 32-47, 1999.

[43] H. F. Silverman, W. R. Patterson III and J. M. Sachar, “Early Results for a Large-Aperture Microphone Array System”, Proceedings of SAM 1999, Boston, MA, pp. 207-211, 1999.

[44] H. F. Silverman, W. R. Patterson III and J. M. Sachar, “First Measurements of a Large-Aperture Microphone Array System for Remote Audio Acquisition”, Proceedings of the IEEE International Conference on Multimedia and Expo, 2000.

[45] H. F. Silverman, W. R. Patterson III, J. M. Sachar and Y. Yu, “Performance of Real-Time Source Location Estimators for a Large-Aperture Microphone Array”, IEEE Transactions on Speech and Audio Processing, pp. 593-606, July 2005.

[46] J. Smith and J. Abel, “The Spherical Interpolation Method for Closed-Form Passive Source Localization Using Range Difference Measurements”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1987.

[47] D. E. Sturim, “Talker Characterization Using Microphone Array Measurements”, PhD Thesis, Brown University, Providence, RI, 1999.


[48] H. Wang and P. Chu, “Voice Source Localization for Automatic Camera Pointing System in Videoconferencing”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 187-190, Munich, Germany, Apr. 1997.

[49] H. Wang and M. Kaveh, “Coherent Signal-Subspace Processing for the Detection and Estimation of Angles of Arrival of Multiple Wideband Sources”, IEEE Transactions on Acoustics, Speech and Signal Processing, pp. 823-831, 1985.

[50] E. Weinstein, K. Steele, A. Agarwal and J. Glass, “LOUD: A 1020-Node Modular Microphone Array and Beamformer for Intelligent Computing Spaces”, MIT, Apr. 2004.

[51] R. Zelinski, “A Microphone Array with Adaptive Post-Filtering for Noise Reduction in Reverberant Rooms”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 5, pp. 2578-2581, New York, USA, Apr. 1998.
