4. Implementation

This chapter covers implementation-related topics. Section 4.1 describes the steps followed to prepare the data set for use with our application.

4.1. Data set pre-processing

As explained in the previous chapter, the binaural data sets require special preparation before they can be used with our application: the ITD of the data set to be processed has to be collected and minimum-phase IRs extracted from it.

These tasks were accomplished using a Matlab™ script consisting of the following steps:

• Read the HRIRs (stored as .wav files).

• Up-sample by a factor of 10 (i.e. from 44100 Hz to 441000 Hz).

• Find the indexes at which the left and right onsets reach the given threshold (i.e. -35 dB from the peak).

• Write a matrix of the detected indexes for the left and right IRs.

• Compute the ITD as the difference of the left and right (up-sampled) onset indexes.

• Convert the ITD to µs.

• Find the maximal time of flight.

• Find the new length of the IRs as: new_length = size_IR - max_time_of_flight

• Extract the minimum-phase IR of length new_length starting at the onset indexes.

• Down-sample the minimum-phase IRs by a factor of 10.

• Store the minimum-phase IRs.

• Store the ITD matrix in a machine-readable format (.txt).

• Generate a description of the modified data set (start and end angles, angular resolution in degrees for azimuth and elevation) in a machine-readable format (.xml).

The code for the steps above can be found in Appendix B.
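The onset detection and ITD computation steps can be sketched in C++ as follows. This is only an illustrative sketch and not the Matlab script from Appendix B: the function names find_onset and itd_microseconds are hypothetical, the impulse responses are assumed to be already up-sampled, and the sign convention (left minus right) is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Index of the first sample whose magnitude reaches the threshold, given in
// dB relative to the absolute peak (e.g. -35.0). Returns ir.size() if the
// response is empty or contains only zeros.
std::size_t find_onset(const std::vector<double>& ir, double threshold_db)
{
    double peak = 0.0;
    for (double x : ir) peak = std::max(peak, std::fabs(x));
    if (peak == 0.0) return ir.size();

    const double threshold = peak * std::pow(10.0, threshold_db / 20.0);
    for (std::size_t i = 0; i < ir.size(); ++i)
        if (std::fabs(ir[i]) >= threshold) return i;
    return ir.size();
}

// ITD in microseconds from the left and right onset indexes, both measured
// at the (up-sampled) sample rate fs_hz. Assumed convention: left minus right.
double itd_microseconds(std::size_t onset_left, std::size_t onset_right,
                        double fs_hz)
{
    const double diff = static_cast<double>(onset_left)
                      - static_cast<double>(onset_right);
    return diff / fs_hz * 1e6;
}
```

With a threshold of -35 dB, a sample counts as the onset once its magnitude reaches roughly 1.8% of the peak. At the up-sampled rate of 441000 Hz one index step corresponds to about 2.27 µs.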


4.2. Functional requirements

Guidelines for the development of the ITD individualizer (ITD-I) are taken from the functional requirements of the complete system. These can be listed as follows:

• Real-time audio. Since dynamic binaural synthesis consists in essence of interactive processes, the application must be capable of operating in real-time at low latencies.

• High signal-to-noise ratio (SNR).

• Bandwidth covering the frequency range of human hearing.

• Compatibility with the fwonder project. Modular integration with the real-time convolution engine should be guaranteed.

• Fractional delays. According to Mills (1958) the just noticeable difference (JND) of ITD changes can be as low as 10 µs. Since at a sample rate of 44100 Hz the sample period is already 22.7 µs, the ITD-I has to be capable of delivering finer resolutions, which means fractional delays.

• Support of configuration scripts. To facilitate the start of the software with customized parameters, easy to configure parameter-scripting should be possible.

• Command-line configuration. Another technique to provide pre-configuration is to use command-line instructions and arguments.

• Real-time control. ITD individualization should be possible not only with pre-configured settings, but also in real time using a graphical user interface (GUI), run-time hotkeys and OSC messages.
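The fractional-delay requirement can be made concrete with a short calculation that uses only the figures quoted above (a sample rate of 44100 Hz and the 10 µs value from Mills (1958)):

```latex
T = \frac{1}{44100\,\mathrm{Hz}} \approx 22.7\,\mu\mathrm{s},
\qquad
\frac{10\,\mu\mathrm{s}}{T} = 10 \times 10^{-6}\,\mathrm{s} \times 44100\,\mathrm{Hz} \approx 0.44 \text{ samples}
```

A delay change of 10 µs therefore corresponds to less than half a sample, which integer-sample delay lines cannot represent.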

4.3. Software components

This section reviews the main software components employed in the application. It should be first stated that the application was written in C++ since several efficient libraries of methods are available in this programming language.


4.3.1. Low latency high priority audio thread: The JACK Audio application programming interface (API)

One of the requirements listed in the previous section is compatibility with the fwonder project. fwonder was written for Linux and uses JACK as real time low latency audio system. JACK’s most important functional characteristics are:

• Low latency audio recording and reproduction,

• Sending audio between applications,

• Sharing an audio interface (soundcard),

• Sample-accurate synchronization between applications,

• Open-source software, thus freely available and modifiable.

Usage

In its most typical use, an application working with JACK does not start a new thread to execute its audio processing; instead it provides a callback function to the JACK server (thus acting like a plugin or client), which takes care of calling that function and keeping all clients synchronized.

From the developer's point of view, these are the steps required to run a JACK client using the C++ API:

1. Include JACK’s header file

Necessary for the compiler in order to include JACK’s functions, type definitions and enumerators in the application.

#include <jack/jack.h>

2. Opening a new client in the server

jack_client_open has to be called to create a new client, requisites are:

• a unique name for reference purposes,

• starting options like server name to register the client to, or session identification, etc,

• a status pointer used by JACK to return client information.

jack_client_t *jack_client_open(const char *client_name, jack_options_t options, jack_status_t *status, ...)

3. Register the client’s ports

The pointers (memory addresses) to the buffers that will be used for collecting input and output should be registered in the JACK server. The parameters to register are:

• the client whose port(s) we are registering

• the name of the port

• the type of port, audio or MIDI (i.e. JACK_DEFAULT_AUDIO_TYPE)

• the flags indicating whether the port is input, output, physical, monitor or terminal

jack_port_t *jack_port_register(jack_client_t *client, const char *port_name, const char *port_type, unsigned long flags, unsigned long buffer_size)

4. Register a process callback-function

Here we tell JACK what audio processing routine it should call. The parameters to use are:

• the name of the client

• the name of the callback-function

• a pointer for passing our own arguments (if any) to the function

int jack_set_process_callback(jack_client_t *client, JackProcessCallback process_callback, void *arg)

The callback function should have the form:

int process(jack_nframes_t nframes, void *arg)
{
    /* operations */
}

and it is here that the audio processing has to be executed; in our case, real-time fractional delay using sinc interpolation. The algorithm developed to dynamically reinsert the individualized ITD as a VDL is explained in detail later in this chapter.

5. Activate the JACK client

By calling this method the server starts processing audio. The only parameter required is the name of the client that was previously registered.

int jack_activate ( jack_client_t * client )

4.3.2. Delay-lines based on sample rate conversion (SRC)

In section 3.4 we saw that the optimal approach to achieving fractional delays in discrete-time systems is to reconstruct the audio signal and resample it after shifting the sampling filter (sinc function), thus making use of the Nyquist sampling theorem. The implementation of SRC to achieve delay lines is explained in this section with the aid of an example from Wefers (2007).

Let us consider a discrete-time signal $s(n)$ with sample period $T$, and let us define $f(n)$ as the function that assigns to every sample of $s(n)$ the point in time of its reproduction. For a non-delayed $s(n)$ we have $f(n) = nT$.

Let us now consider two delayed versions of $s(n)$:

$$s_1(n) = s(n - n_1) \;\Rightarrow\; f_1(n) = (n + n_1)T$$

$$s_2(n) = s(n - n_2) \;\Rightarrow\; f_2(n) = (n + n_2)T$$

Now, during the time interval $[n_a T, n_b T]$ ($n_a, n_b \in \mathbb{N}$, $n_a < n_b$) the delay changes from $n_1$ to $n_2$ samples. If $k_{1a}$ denotes the sample of $s_1(n)$ that is reproduced at the beginning of the interval $[n_a T, n_b T]$, then:

$$f_1(k_{1a}) = (k_{1a} + n_1)T \overset{!}{=} n_a T \;\Rightarrow\; k_{1a} = n_a - n_1$$

And for both signals at the beginning and end of the interval we have:

$$k_{1a} = n_a - n_1, \qquad k_{1b} = n_b - n_1$$

$$k_{2a} = n_a - n_2, \qquad k_{2b} = n_b - n_2$$

At the beginning of the interval $[n_a T, n_b T]$ (at time $n_a T$) the delay has not yet changed, hence the sample to reproduce is $k_{1a}$. At the end of the interval (at time $n_b T$), on the other hand, the new delay $n_2$ should have been reached, meaning that the sample $k_{2b}$ should be reproduced at this point. Since $n_1 \neq n_2$, the only way to meet this condition is to stretch or squeeze the signal by altering its sampling rate by a factor $r \in \mathbb{R}^+$.

Let us consider the modified signal $\tilde{s}(n)$ as a variant of $s(n)$ with sampling period $rT$. If $\tilde{s}(n)$ is delayed by a time $\tilde{t}$ and $\tilde{f}(n)$ is the counterpart of $f(n)$, we have:

$$\tilde{s}(n) = s(n) \quad \text{and} \quad \tilde{f}(n) = (n - \tilde{t})\,rT$$

The idea is now to stretch or squeeze $\tilde{s}(n)$ and adjust $\tilde{t}$ so as to meet the conditions:

$$\text{i)}\;\; \tilde{f}(k_{1a}) = n_a T \qquad\qquad \text{ii)}\;\; \tilde{f}(k_{2b}) = n_b T$$

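From i), $\tilde{t}$ can be isolated:

$$\tilde{f}(k_{1a}) = (k_{1a} - \tilde{t})\,rT = n_a T \;\Rightarrow\; k_{1a} - \tilde{t} = \frac{n_a}{r} \;\Rightarrow\; \tilde{t} = k_{1a} - \frac{n_a}{r}$$

Applying this in ii), the sampling rate conversion ratio $r$ can be obtained:

$$\tilde{f}(k_{2b}) = \left(k_{2b} - k_{1a} + \frac{n_a}{r}\right) rT = n_b T \;\Rightarrow\; r\,(k_{2b} - k_{1a}) = n_b - n_a \;\Rightarrow\; r = \frac{n_b - n_a}{k_{2b} - k_{1a}}$$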

Inserting the definitions of $k_{1a}$ and $k_{2b}$ we obtain:

$$r = \frac{n_b - n_a}{n_b - n_2 - n_a + n_1} = \frac{n_b - n_a}{n_b - n_a + n_1 - n_2}$$

If the interval of sample rate conversion is kept constant, $N = n_b - n_a$, and we express the change in delay as $\Delta = n_2 - n_1$, the sample rate conversion ratio is:

$$r = \frac{N}{N - \Delta} \qquad (4.1)$$

where $N \in \mathbb{N}$ and $\Delta \in \mathbb{Z}$.

With this definition an increase in delay, $\Delta > 0$, gives $r > 1$, while a decrease in delay, $\Delta < 0$, gives $r < 1$.

Note that $\Delta = N$ must be avoided to prevent division by zero, and that $|\Delta| \leq N$ should hold in order to keep $r \in \mathbb{R}^+$.


Figure 4.1.: Time stretching for achieving one sample delay (22µs). Note the use of sample rate conversion at the stretching region.

Example

Let us suppose the simple case of delaying an audio stream by one sample, as presented in figure 4.1. Here, the sample rate conversion ratio to apply in the yellow region would be:

$$r = \frac{N}{N - \Delta} = \frac{\text{input samples}}{\text{output samples}} \qquad (4.2)$$

In our example, the number of input samples in the SRC is $N = 5$ and the number of output samples is $N - \Delta = 4$, so the conversion ratio equals:

$$r = \frac{5}{4} = 1.25$$

Now, if we want to delay the signal by 2 more samples we can either use the SRC with $r = 1.25$ two times, or change the conversion ratio to:

$$r = \frac{5}{3} \approx 1.66667$$

and use it once. As long as no further delay is needed the SRC should be driven with a conversion ratio $r = 1$.

4.3.3. The libsamplerate API

Since implementing SRC software is very complex and beyond the scope of this work, an open-source library, Libsamplerate SRC API, was used. This specific library was chosen because:

• SRC on audio streams is possible,

• Libsamplerate implements sinc interpolation (among other interpolators) in three bandwidth variants: 97%, 90% and 80%,

• The API has been maintained and actively developed since 2002,


• SNR reaches 97dB for all its sinc interpolators.

The libsamplerate API has three operation modes: simple, for static audio files; full and callback mode for audio streams. We used the full API since it allows more operation parameters to be specified.

Usage

To use libsamplerate’s functions it is required to include its header file:

#include <samplerate.h>

The SRC needs to be initialized using the following function:

SRC_STATE *src_new(int converter_type, int channels, int *error)

This function returns a pointer to a SRC object and requires as parameters the converter type, the number of audio channels and an error pointer that the converter will fill if errors are encountered. The library can work with five converters:

SRC_SINC_BEST_QUALITY      A sinc SRC with a bandwidth of 97% (i.e. (44100 ÷ 2) · 0.97 = 21388.5 Hz)

SRC_SINC_MEDIUM_QUALITY    A sinc SRC with a bandwidth of 90%

SRC_SINC_FASTEST           A sinc SRC with a bandwidth of 80%

SRC_ZERO_ORDER_HOLD        A very fast converter with poor quality

SRC_LINEAR                 A very fast converter based on linear interpolation
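For comparison, the same calculation as in the 97% case can be repeated for the other two sinc converters. This assumes a sample rate of 44100 Hz and that the percentages refer to the Nyquist frequency, as in the example given for SRC_SINC_BEST_QUALITY:

```latex
\frac{44100\,\mathrm{Hz}}{2} \cdot 0.97 = 21388.5\,\mathrm{Hz}, \qquad
\frac{44100\,\mathrm{Hz}}{2} \cdot 0.90 = 19845\,\mathrm{Hz}, \qquad
\frac{44100\,\mathrm{Hz}}{2} \cdot 0.80 = 17640\,\mathrm{Hz}
```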

Once a SRC handle is created the function src_process needs to be called every time there is a process to execute.

int src_process (SRC_STATE *state, SRC_DATA *data)

This function uses the data of an object of type SRC_DATA containing the following information:

data_in          A pointer to the input data samples

input_frames     The number of frames of data pointed to by data_in

data_out         A pointer to the output data samples

output_frames    Maximum number of frames pointed to by data_out

src_ratio        Equal to output_sample_rate / input_sample_rate

end_of_input     Equal to 0 if more input data is available and 1 otherwise

The function reports the number of output samples generated as well as the number of input samples used in the conversion (through members of the SRC_DATA struct). This information should be used to manage the audio buffers in order to ensure glitch-free continuity of the audio streams. In our application it is extremely important to guarantee a steady number of output samples to fill the output buffer. Remaining samples are reinserted into the audio stream with the aid of a ring buffer.

Every time there is a new delay to reach, a properly computed new conversion ratio should be set using:

int src_set_ratio(SRC_STATE *state, double new_ratio)
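The ring buffer bookkeeping described above can be illustrated with a minimal single-threaded sketch. The class RingBuffer and its methods are hypothetical and are not the buffer used in the ITD-I; the idea is only that the converter is offered all buffered samples and that just the samples it reports as used (input_frames_used in SRC_DATA) are then removed, so the rest stays available for the next call.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity FIFO of audio samples (illustrative, not thread-safe).
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity)
        : data_(capacity), head_(0), size_(0) {}

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return data_.size(); }

    // Append up to n samples; returns how many were actually stored.
    std::size_t write(const float* src, std::size_t n) {
        std::size_t written = 0;
        while (written < n && size_ < data_.size()) {
            data_[(head_ + size_) % data_.size()] = src[written++];
            ++size_;
        }
        return written;
    }

    // Copy up to n of the oldest samples to dst without removing them.
    std::size_t peek(float* dst, std::size_t n) const {
        const std::size_t count = n < size_ ? n : size_;
        for (std::size_t i = 0; i < count; ++i)
            dst[i] = data_[(head_ + i) % data_.size()];
        return count;
    }

    // Remove up to n of the oldest samples (those the converter consumed).
    std::size_t consume(std::size_t n) {
        const std::size_t count = n < size_ ? n : size_;
        if (!data_.empty()) head_ = (head_ + count) % data_.size();
        size_ -= count;
        return count;
    }

private:
    std::vector<float> data_;
    std::size_t head_;  // index of the oldest stored sample
    std::size_t size_;  // number of stored samples
};
```

In a processing block one would write the new input frames, peek all buffered samples into the converter's input, and afterwards call consume with the number of frames the converter actually used.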

4.3.4. OSC control: The liblo API

Another functional requirement of the application is to receive open sound control (OSC) messages to update the individualized ITD according to the user’s head position and to customize the ITD scaling factor in real time.

OSC was developed at the Center for New Music and Audio Technologies (CNMAT) of the University of California at Berkeley. One of its goals was to create a protocol among multimedia devices that is optimized for modern networking technology (Wright 2005).

Another goal was to create a protocol "open" to the generality of messages. Unlike the limited messaging possibilities of MIDI, OSC can handle strings and numbers as 32-bit floating point, 32-bit integer or 64-bit double precision among other formats; it also handles boolean values, binary "blobs" and more (Liblo OSC API).

OSC is also called "open" because no standard messaging parameters are established, thus allowing the implementer to decide the nature and organization of the messages.

In our project OSC was implemented using the Liblo OSC API, an open-source (GNU LGPL), very easy to use and well-documented API.


Usage

To use Liblo in C++ there are basically 5 steps to follow:

1. Include the header files:

#include "lo/lo.h"

2. Create an OSC server

Defining a port and a function that will be called in the event of an error being raised:

lo_server_thread lo_server_thread_new(const char *port, lo_err_handler err_h)

3. Register methods for parsing OSC messages

The functions to be called on the occurrence of a given kind of message are passed to the OSC server thread using the lo_server_thread_add_method function.

lo_method lo_server_thread_add_method(lo_server_thread st, const char *path, const char *typespec, lo_method_handler h, void *user_data)

The parameters are:

st           The server thread the method is to be added to.

path         The OSC path to register the method to. If NULL is passed the method will match all paths.

typespec     The type specification the method accepts. Incoming messages with similar types (e.g. ones with numerical types in the same position) will be coerced to the typespec given here.

h            The method handler callback function that will be called if a matching message is received.

user_data    A value that will be passed to the callback function h.

4. Start the OSC server

Once configured, the OSC server can start receiving messages. The function lo_server_thread_start performs this task, taking the server thread as its parameter.

int lo_server_thread_start(lo_server_thread st)

5. Stop the OSC server

If the server is no longer required or if the application is being closed, the thread should be stopped and the reserved memory released. The Liblo API provides two functions for these purposes; both take the OSC server thread as argument:

int lo_server_thread_stop(lo_server_thread st)

and

void lo_server_thread_free(lo_server_thread st)

4.3.5. XML script parsing: The libxml++-2.x API

Another requirement of our software is the parsing of start-up configuration scripts. The extensible markup language (XML) is very appropriate for that purpose since, for the number of parameters we are using, it can easily be generated and/or edited either by hand or automatically by a computer.

An XML file contains root nodes, child nodes and their elements (attributes, parameters). The procedure to parse the contents of the parameters is to compare strings iteratively to first find a given node and then assign its content to a receptacle (i.e. a member of a class object).

The Libxml++ API, also an open-source library (GNU LGPL), contains methods to facilitate navigation in XML files (Document::get_root_node(), Node::get_children() among other functions).

The ITD-I parses three XML files at start-up:

• Application configuration file. Contains relevant information regarding the OSC port, the JACK client name to adopt, the SRC method to apply, etc. Table 4.1 presents the nodes and their configuration attributes in detail.

• Modified BRIR information file. Contains information about the angular ranges and resolution of the data set.

• fwonder configuration file. Some operating parameters of fwonder are read by the ITD-I to ensure coherence between the two programs. These are the path to the data set in use and the JACK name, which is used to automatically connect fwonder's output ports to the ITD-I input ports.

The first configuration file is explained in the next section, while the other two are described in the appendix.

Application configuration file

The ITD-I needs to be configured at start-up; the required parameters are reviewed in detail in this section.

<?xml version="1.0"?>
<!DOCTYPE stretcher_config PUBLIC "" "stretcher_config.dtd">
<individualizer_config>
  <jack name="ITD_stretcher"/>
  <path fwonder_config_file="/home/jorgose/configs/fwonder_config.xml"/>
  <config OSC_listen_port="58800" source_number="1" SRC_modus="2" processing="4" scale="1" user_tragus="0" data_tragus="148"/>
</individualizer_config>

| Node | Attribute | Description |
| jack | name | Client name to be used when working with JACK. |
| path | fwonder_config_file | The path of the configuration file that fwonder uses. |
| config | OSC_listening_port | Number of the OSC port to "listen" to. |
| config | source_number | The source number that fwonder uses. |
| config | SRC_modus | One of the 5 SRC modes explained in section 4.3.3. |
| config | processing | Smooths the squeezing and stretching of new delays over a given number of processing chunks. |
| config | scale | ITD scaling factor. |
| config | user_tragus | Intertragus distance of the user. 0 means use the scaling factor instead of the anthropometric formula. Section 5.3 covers this use in detail. |
| config | data_tragus | Intertragus distance of the data set. |

Table 4.1.: Starting parameters of the configuration file of the ITD individualizer.


Figure 4.2.: Graphical user interface of the ITD individualizer developed using GTK+2.2

4.3.6. GUI control: The GTK+2.0 Project

In order to provide the end user with intuitive manipulation of the ITD, a graphical user interface was developed using the GTK+ project. The GTK+ toolkit was chosen because it is an open-source project (licensed under the GNU LGPL), has a thoroughly documented API, and has a large supporting community of developers.

Figure 4.2 shows the developed GUI. The GUI offers:

• A space for displaying text messages.

• Smoothing of the change from one delay to another by increasing the number of processing chunks used to reach a new delay.

• ITD scaling factors using anthropometry (see section 5.3).

• Verbose output with messages about the current ITD or other events (mute, bypass, scaling factor).

• A horizontal fader that scales the ITD in real time by the factor displayed above it. The fader ranges from 0.000 (no ITD) to 2.000 (twice the ITD in the data set).

• Bypass: the output ports reflect the input ports exactly.

• Mute: the output port buffers are filled with zeros.

4.4. Flowchart of the audio process

Figure 4.3 shows the flowchart of the callback function that JACK processes in its audio thread. This function is executed in real time for every incoming audio buffer, and it is here that the ITD is individualized and reinserted into the audio path. In the following, the numbered blocks of the flowchart are explained:

1 When the function is called, the pointers to the input and output buffers have to be obtained; these memory addresses are changed by JACK every time the function is called.

The approach used in our application is to affect only one ear's audio stream with positive and negative delays to synthesize the individualized ITD. Since positive delays imply more output than input samples, ring buffers initialized with zeros are necessary to provide the system with some headroom. Therefore the incoming samples of both the left and the right ear are inserted into ring buffers.

2 A new delay (ITD) can only be set after the previous delay has been reached.

The user can decide how many processing blocks a delay change takes to complete. This is controlled with the smoothing_size parameter. If the delay is not updated, the SRC keeps processing conversions using the already computed SRC ratio.

3 A counter for the delay-smoothing feature is incremented every processing block.

4 When the specified number of processing blocks has been completed, a flag is set that schedules the computation of a new delay.

5 The SRC converter is called using the parameters explained in section 4.3.3. In case of a new SRC ratio, the change_ratio flag is passed to the SRC so that it makes use of the src_set_ratio() method.

6 The change_ratio flag should only be set at every new delay computation; therefore it has to be cleared after its use.

[Figure 4.3 flowchart. Recoverable block contents: ITD_new = K * ITD(azimuth, elevation); delay_new = ITD_new / sample_size; delta = (delay_new - delay_old) / smoothing_size; SRC_ratio = nframes / (nframes - delta).]

Figure 4.3.: Flowchart of the callback function that manages time stretching.

7 Since we provide the SRC with more samples than needed (in order to be capable of big delay changes), the unused samples have to be reinserted into the ring buffer. We have to keep track of the number of samples in the ring buffer to tell the SRC how many samples it may use as input.

8 Now that the SRC has delivered the processed samples, they have to be written to the output buffers JACK gave us. At this point the algorithm checks whether the mute or bypass flags are set. In that case we write zeros or copy the input samples into those buffers, respectively. Note that even if the ITD-I is muted or bypassed, the output stream containing the individualized ITD is still computed; thus, returning to normal operation is fast and easy.

9 In case of a delay update the algorithm gathers the current scaling factor and smoothing_size to compute the parameters of the new delay.

10 As explained before, the individualized ITD consists of the ITD of the data set scaled by a frequency-independent factor. The ITD of the data set collected in the pre-processing stage (see section 4.1) was already parsed from a .txt file and stored in a two-dimensional array at start-up.

11 If the verbose flag is set, the new ITD is displayed in the console.

12 To compute the SRC ratio, the new ITD must be expressed in samples according to the sample rate in use. The change in delay (since the last computation) is divided by the number of processing blocks stated in smoothing_size.

13 The counter of those processing blocks is restarted.

14 The SRC ratio is computed using equation 4.1.

15 The flag for a new SRC ratio is set and the flag for computing a new delay is cleared.

Figure 4.4 shows a schematic description of our JACK client process callback function.

[Figure 4.4 block labels: head position, ITD.txt, BRIR info, scale factor, delay to sample rate conversion, left and right input buffers, SRC smooth stretching, left and right output.]

Figure 4.4.: Schematic description of the processing callback function of the ITD individualizer.
