A mobile augmented reality audio system with binaural microphones

Robert Albrecht, Tapio Lokki, and Lauri Savioja
Aalto University School of Science
Department of Media Technology
P.O. Box 15400, FI-00076 Aalto
{robert.albrecht, tapio.lokki, lauri.savioja}@aalto.fi
from the surroundings. Using audio for the presentation of
information, and possibly also for the input of information,
leaves the user’s sight as well as hands free to observe and
interact with the environment.
In mobile applications, certain criteria can be defined for
the usage of audio. The audio reproduction system must
naturally be mobile, which implies that it should be small
in size and wearable. In most applications, it is also desirable
that the audio is heard only by the user, either not to disturb
people in the surroundings, or because the information might
be of a private nature. These criteria typically restrict the
sound reproduction method to the use of headphones of some
kind. Most types of headphones, however, attenuate sounds
from the environment, and thus impair the perception of the
acoustic environment.
Techniques allowing a user to hear the surrounding acoustic environment while adding virtual sounds to the auditory perception have been studied in the field of augmented reality. Two possible techniques are “acoustic-hear-through” (or hear-through) augmented reality and
“microphone-hear-through” (or mic-through) augmented reality (AR) [4]. Hear-through AR can be achieved using,
e.g., bone-conduction headphones, which do not attenuate
sounds from the surroundings. The headset used in mic-through AR attenuates sounds from the surroundings, but
has microphones located on both earpieces. The microphone
signals are mixed with virtual sounds and played through
the headphones. In this way, the acoustic environment is
perceived unattenuated.
This paper discusses design issues related to mic-through
AR. It presents an implementation of mic-through AR hardware consisting of insert headphones with binaural microphones, and a separate mixer and equalizer unit. The usability of the implementation is briefly evaluated. Finally,
some applications that could take advantage of an AR system like this are presented.
This paper presents a microphone-hear-through augmented
reality hardware system, consisting of insert headphones
with binaural microphones, i.e., a microphone at each ear,
together with a separate mixer and equalizer unit. The
equalizer compensates for changes in ear canal resonances
caused by the headphones blocking the ear canals, and
is used to achieve a natural-sounding reproduction of the
acoustic environment when playing the microphone signals
through the headphones. The system can be used both for
pure augmented reality applications and for any applications
presenting audio through headphones while allowing the user
to listen to the environment at the same time. Examples of
both types of applications are given. The applications can
utilize the binaural microphone signals and the user can adjust the level of the environmental sounds heard through the headphones.
Categories and Subject Descriptors
H.5.1 [Information Interfaces and Presentation]:
Multimedia Information Systems—artificial, augmented,
and virtual realities; audio input/output
General Terms
Design, Measurement
Keywords
Microphone-hear-through augmented reality, audio, binaural microphones
While the original use of mobile phones is voice communication, new mobile applications typically present information through visual displays and accept input of information
through touch interfaces. These types of interfaces, however,
draw the attention of the user to the mobile device and away
Although using bone-conduction headphones to render
virtual sound sources would allow one to hear natural sounds
unattenuated and without loss of quality, using headphones
with binaural microphones has its advantages. Firstly, reproducing the natural sound environment using headphones
allows the listener to either attenuate the level of natural
sounds, when desired, or alternatively amplify those sounds.
Secondly, the signals from the binaural microphones can be
used for many interesting applications. The sound quality
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
IWS ’11, August 30, 2011, Stockholm, Sweden.
Copyright 2011 ACM 978-1-4503-0883-0/11/08 ...$10.00.
of bone-conduction headphones is also in general inferior to
that of headphones. Using headphones for displaying both
the natural sound environment and sound events that augment it would probably result in a better integration of natural and virtual sounds.
universal serial bus (USB), allowing input of the microphone
signals on, e.g., laptops, which often lack an analog stereo
microphone input. The mixer should also draw its power
through the USB connector, positively affecting both size
and ease of operation compared with the original battery-powered mixer. Additionally, the level of the microphone
signals passed to the headphones should be adjustable.
Ideally, the mixer would not be a separate unit, but integrated either into the headset or into the mobile phone.
The microphone signal level should be adjustable in software. The novel mixer is, however, aimed at providing a
reasonably sized platform for testing mobile augmented reality audio applications, and not as a prototype for a final
The design of a mic-through AR system introduces some
issues. The environmental sounds heard through the system must sound close to natural, especially if the system
might be used for extended periods of time. For this reason,
it must be taken into consideration, e.g., how the natural
acoustics of the ear canal change when blocking the ear-canal entrance.
Ear Canal Resonances
The headsets used for the implemented mic-through augmented reality system are modified Philips SHN2500 headphones (see Figure 1). For noise-cancelling purposes, these
headphones have a microphone attached to each earpiece.
The noise-cancelling electronics from the headphones have
been removed, and instead, an additional 3.5 mm tip-ring-sleeve connector has been added for microphone signal output.
The open ear canal acts as a quarter-wavelength resonator.
The lowest resonance lies between 3 and 5.5 kHz [2], mainly
depending on the length of the ear canal. When the ear-canal entrance is blocked, e.g., by a headphone earpiece,
the ear canal instead acts as a half-wavelength resonator,
with the lowest resonance at a frequency double that of the
quarter-wavelength resonance. When designing headphones,
the missing quarter-wavelength resonance is often taken into
account by adding a peak in the magnitude response, but the
unnatural half-wavelength resonance is seldom compensated
for [7]. Both these resonances should be taken into consideration when designing a natural-sounding mic-through AR system.
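The relation between the two resonances can be illustrated with a minimal sketch that treats the ear canal as an idealized uniform tube. The speed of sound and the canal length below are assumed values chosen for illustration, not measurements from this work, and the model ignores the shortening of the canal caused by the earpiece itself.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def open_canal_resonance_hz(length_m: float) -> float:
    """Lowest resonance of a tube closed at one end (quarter wavelength)."""
    return SPEED_OF_SOUND_M_S / (4.0 * length_m)


def blocked_canal_resonance_hz(length_m: float) -> float:
    """Lowest resonance of a tube closed at both ends (half wavelength)."""
    return SPEED_OF_SOUND_M_S / (2.0 * length_m)


canal_length_m = 0.025  # hypothetical ear canal length
print(open_canal_resonance_hz(canal_length_m))
print(blocked_canal_resonance_hz(canal_length_m))
```

With the assumed 25 mm canal, the open-canal resonance falls within the 3 to 5.5 kHz range cited above, and the blocked-canal resonance lies at twice that frequency.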
Although insert headphones quite effectively attenuate
sounds from the environment, some sound always leaks
through and past the headphones. This leakage is larger at
lower frequencies [8]. In a mic-through AR system, where
the leaking natural sound is heard together with the microphone signals reproduced with headphones, this leakage
would boost environmental sounds at low frequencies if not
compensated for.
Another problem is also introduced by this leakage. If
the sound reproduced with the headphones is delayed relative to the leaking natural sound, a comb-filtering effect
is heard. The latency introduced when processing the microphone signals and mixing them with the signals from
an external source must therefore be kept at a minimum.
Therefore this task cannot be performed on a computer, for
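The comb-filtering effect can be sketched by summing a leaked sound with a delayed reproduced copy of itself. The model below is a simplification that assumes a frequency-independent leakage gain and unit gain for the reproduced sound; the function names, the delay, and the gain are hypothetical.

```python
import cmath
import math


def comb_notch_frequencies_hz(delay_s: float, count: int = 3) -> list:
    """First `count` frequencies at which the delayed copy arrives in antiphase."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


def comb_magnitude_db(freq_hz: float, delay_s: float, leak_gain: float) -> float:
    """Level of leaked sound plus delayed reproduced sound (unit gain), in dB."""
    delayed = cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(leak_gain + delayed))


delay_s = 0.001   # hypothetical 1 ms processing latency
leak_gain = 0.5   # hypothetical leakage, 6 dB below the reproduced sound
for f in comb_notch_frequencies_hz(delay_s):
    print(f, comb_magnitude_db(f, delay_s, leak_gain))
```

In this model a shorter delay moves the first notch upwards in frequency, which is one way to see why the latency should be kept small: the leakage is strongest at low frequencies, so notches that fall higher up are less pronounced.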
Figure 1: Earpieces of the Philips SHN2500 noise-cancelling headphones. These have microphones located at the end of the earpieces opposite to the
headphone drivers.
Mixer and Equalizer
The mixer in the mic-through augmented reality audio
system performs two tasks. First, it performs equalization
of the microphone signals. Then, it mixes the equalized
microphone signals with signals from external sources, and
passes the mixed stereo signal to the headphones.
In the equalization circuit, there are two peak/notch filters
with adjustable center frequency, quality factor, and gain.
These can be used to add a peak to the magnitude response
representing the missing quarter-wavelength resonance, and
remove the peak caused by the half-wavelength resonance.
Additionally, a high-pass filter with adjustable cutoff frequency can be used to attenuate low frequencies, which otherwise would be pronounced because of leakage. Insert headphones often also have a pronounced low-frequency response
due to the pressure chamber principle [7]. To ensure a short
enough delay for the signal travelling from the microphone
The hardware used for mic-through AR presented here
is based on earlier research [7, 8]. The system consists of
a headset with binaural microphones and a separate mixer
and equalizer unit, henceforth simply called the mixer. A
usability evaluation made with the original mixer concluded,
among other things, that the mixer was unnecessarily large,
and suggested that users should be able to adjust the level
of the microphone signals reproduced with the headphones
Based on the original mixer, a novel mixer was designed.
Two main requirements were set. It should be smaller in
size than the original mixer and thus easier to carry around.
It should also connect to computers and mobile devices via
to the headphone, the equalizer is implemented as an analog
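The combined effect of two peak/notch filters and a high-pass filter can be illustrated numerically. The sketch below evaluates textbook analog prototypes, a second-order peaking section and a first-order high-pass; these forms are an assumption and not necessarily the topology of the actual circuit. All center frequencies, quality factors, gains, and the cutoff frequency are hypothetical placeholders.

```python
import math


def peak_notch_db(freq_hz, center_hz, q, gain_db):
    """Magnitude (dB) of a second-order analog peak (gain > 0) or notch (gain < 0)."""
    a = 10.0 ** (gain_db / 40.0)
    s = 1j * freq_hz / center_hz  # normalized complex frequency
    h = (s * s + s * a / q + 1.0) / (s * s + s / (a * q) + 1.0)
    return 20.0 * math.log10(abs(h))


def highpass_db(freq_hz, cutoff_hz):
    """Magnitude (dB) of a first-order analog high-pass filter (freq_hz > 0)."""
    s = 1j * freq_hz / cutoff_hz
    return 20.0 * math.log10(abs(s / (s + 1.0)))


def equalizer_db(freq_hz):
    """Sum of the three stages in dB; every parameter value is hypothetical."""
    return (peak_notch_db(freq_hz, 3000.0, 2.0, 6.0)     # restore quarter-wave peak
            + peak_notch_db(freq_hz, 7000.0, 3.0, -8.0)  # remove half-wave peak
            + highpass_db(freq_hz, 100.0))               # counter low-frequency leakage


for f in (50.0, 1000.0, 3000.0, 7000.0):
    print(f, round(equalizer_db(f), 2))
```

Because the stages are cascaded, their magnitude responses add on the dB scale, which makes it straightforward to compare such a model against a measured target curve.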
Suitable parameters for the equalization filters of the original mixer were found by doing measurements with four test
subjects. The transfer function from a loudspeaker in front
of the test subject to a microphone 5 mm inside the entrance of the ear canal was measured both with the ear canal
unoccluded and with the headset and mixer used without
equalization. Figure 2 shows the transfer functions measured with one of the test subjects. The black line shows
the magnitude response with the ear canal unoccluded and
the grey line using the headset and mixer without equalization.
Individual equalization curves were calculated for each
test subject as the difference between the magnitude responses of the two measurements on the dB scale. Figure
3 shows a generic equalization curve based on the average of
the individual curves, as measured from the mixer. Figure
4 shows transfer function measurements with the headset
and mixer using individual equalization (grey line), compared with measurements done with the ear canal unoccluded (black line).
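The calculation described above can be sketched as follows. The function names and the sample values are hypothetical; each list holds magnitudes in dB at the same set of frequency points.

```python
def individual_eq_db(unoccluded_db, occluded_db):
    """Equalization curve: unoccluded response minus occluded response, in dB."""
    return [u - o for u, o in zip(unoccluded_db, occluded_db)]


def generic_eq_db(individual_curves):
    """Average the individual curves point by point to obtain a generic curve."""
    n = len(individual_curves)
    return [sum(values) / n for values in zip(*individual_curves)]


# Hypothetical magnitudes (dB) at three frequency points for two subjects
subject_a = individual_eq_db([0.0, 10.0, 2.0], [2.0, 4.0, 8.0])
subject_b = individual_eq_db([1.0, 9.0, 3.0], [1.0, 5.0, 7.0])
print(generic_eq_db([subject_a, subject_b]))
```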
Figure 5: The mixer of the augmented reality audio
system. On the left side of the mixer are the headphone output, the auxiliary and microphone inputs,
and the USB connector. On the right side, there
are two potentiometer knobs for adjusting the level
of the microphone signals passed to the headphones.
The dimensions of the box, excluding knobs and terminals, are 73 mm × 50 mm × 25 mm.
The analog equalization circuit receives power through the
USB connector. The mixer is shown in Figure 5. Two potentiometer knobs allow adjusting the level of the microphone
signals passed to the headphones. A block diagram of the
mixer is shown in Figure 6.
Figure 2: The transfer function measured from a
loudspeaker in front of a test subject to a microphone inside the subject’s ear canal. The black line
is the magnitude response with the ear canal unoccluded, and the grey line when using the headset
and mixer without equalization. Adapted from [7].
Figure 4: The transfer function measured from a
loudspeaker in front of a test subject to a microphone inside the subject’s ear canal. The black line
is the magnitude response with the ear canal unoccluded (the same as in Figure 2), and the grey line
when using the headset and mixer with individual
equalization. Adapted from [7].
For the usability study, the mixers were powered by 1150
mAh rechargeable batteries supplying the required 5 V operating voltage through a USB socket. Three test users, of
whom two were authors of this paper and the third also familiar with the previous mixer, tested the mixer and headset
in different situations for several hours in total and reported
their experiences with the system.
Figure 3: A generic equalization curve based on
measurements with all four test subjects. The curve
was measured from the mixer. Adapted from [7].
Two of the users commented that the timbre of the system was not entirely natural, but that it had a bit too
much emphasis on mid frequencies or higher frequencies.
The colouration seemed to amplify certain sounds, e.g.,
keyboard strokes. In general, however, the audio quality was
considered good and adaptation to the sound might not take
more than a few minutes. One user said that after maybe
The mixer also contains an integrated circuit performing
the task of a USB sound card. The sound card allows input
of the microphone signals to a computer, and output from
the computer to the headphones via the mixer. The mixer
also has an auxiliary analog stereo input. The equalized microphone signals are thus mixed with the signals input from
an external source, either via USB or analog input, or both.
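The mixing stage amounts to a per-channel weighted sum. The following is a minimal digital sketch of that signal flow with one gain per ear; the actual mixer performs this in analog circuitry, and the function names and sample values are hypothetical.

```python
def mix_channel(mic_eq, external, mic_gain):
    """Scale the equalized microphone samples and add the external samples."""
    return [mic_gain * m + e for m, e in zip(mic_eq, external)]


def mix_stereo(mic_left, mic_right, ext_left, ext_right, gain_left, gain_right):
    """Mix both channels, each with its own microphone gain."""
    return (mix_channel(mic_left, ext_left, gain_left),
            mix_channel(mic_right, ext_right, gain_right))


# Hypothetical samples: microphone level halved on the left, muted on the right
left, right = mix_stereo([1.0, -1.0], [1.0, -1.0], [0.5, 0.5], [0.5, 0.5], 0.5, 0.0)
print(left, right)
```

Setting a microphone gain to zero corresponds to attenuating the environmental sounds completely, leaving only the external source in that channel.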
Two of the users reported wanting to take off the headset when talking to other people. One user speculated that
because you have your ears plugged while talking, you assume that you cannot understand other people. He thought
that wearing the headset in reality might affect speech intelligibility less than you think. Overall, however, speech
intelligibility was considered good and the most problematic
thing with talking to other people might be that your own
voice is amplified and sounds strange. It might also seem
impolite to talk to people while wearing a headset.
While using the headset and mixer, talking on the phone
was possible, but the fact that the headset earpiece slightly
protrudes from the ear was found problematic. This makes
it difficult to put the phone in the correct position where
the phone speaker is placed against the microphone of the
headset. Naturally, a headset like this should be connected
directly to the mobile phone and used like any hands-free
Figure 6: Block diagram of the mixer. On the left
side are the analog microphone and auxiliary inputs.
With the USB audio chip (at the bottom) the mixer
can function as a USB audio device, allowing digital
auxiliary input and microphone output to platforms
supporting such devices. The USB connector also
provides the mixer with a 5 V operating voltage.
The equalizer (at the top) provides natural-sounding
reproduction of the microphone signals through the
headphones. The level of the equalized microphone
signals can be adjusted, and the signals are mixed
with any signals from the auxiliary inputs, before
they are passed to the headphones.
When riding a bike, wind noise was found disturbing when
using the headset and mixer. When walking outside, wind
noise was generally not a problem. One of the users reported that he didn’t hear bikes approaching from behind.
This might have been because of wind noise, but it still felt
that they should have been heard. Increasing the level of the
microphone signal might help, but would also increase the
level of noise. When driving a car, the headset felt transparent and comfortable.
When riding a bus, especially when listening to music,
the possibility to attenuate environmental sounds played
through the headphones completely was seen as a good option. In this case the headphones block sounds from the
outside quite nicely. On the other hand, it might be nice
to hear environmental sounds while walking and listening to
music, allowing you to be aware of your surroundings.
one hour you might even forget that you are wearing the
Localization of sound sources worked well with the headset. Noise sources like laptop fans might, however, be problematic to localize. Normally, there was no noticeable distortion of sounds, but listening to music from loudspeakers
at loud levels caused distortion.
The Occlusion Effect
The earpieces of the headset occlude the ear canals, amplifying especially low-frequency sounds which are conducted
through bone and tissue, or through the headset, to the
ear canal and cannot escape through the ear canal opening.
This causes, e.g., the headset wearer’s own voice to sound
boomy and strange. Sounds of eating and drinking are also
conducted to the ear canal through bones and flesh and thus
are amplified when wearing the headset.
Mobility and Comfort
The test setup was somewhat difficult to wear, because it
consisted of the mixer and the separate battery with quite
a long cable in between. It would definitely be much easier
to use the mixer if it was, e.g., integrated into the headset,
also reducing the number of cables required.
The current mixer has two separate potentiometers for
controlling the amplification of the microphone signal passed
to each ear separately. A single stereo potentiometer for
adjusting both channels at the same time would be much
more convenient, possibly with separate balance adjustment.
The potentiometer knobs were found to be unnecessarily
large and easily adjusted by mistake, e.g., when keeping the
mixer in the pocket.
All of the users reported some discomfort wearing the
insert-type headphones. One user reported feeling slight
pain after 30 minutes of usage.
The noise level from the microphones and mixer is somewhat high, masking low-level sounds from the environment.
When the level of the microphone signal is kept at a normal level, the noise is not particularly disturbing. When
the level is increased, the noise begins to disturb. One user
complained about noise caused by movement of the headset
On the Go
The applications that can take advantage of the presented
AR system can be divided into two categories. The first
category consists of what can be called pure AR applications, where the real environment is augmented with virtual
objects, which depend on the real environment and add information related to it. These applications typically require
information about the user’s location and head orientation
with respect to real-world points of interest, to be able to
render relevant information at the correct locations. Alternatively, these applications could extract and analyze information from the binaural microphone signals. The second
category of applications only uses the system as a means to
present virtual objects without hindering perception of the
real environment at the same time. An example of each category is given below. More application scenarios have been
presented in previous research [3, 5, 6].
An example of the first category would be a virtual museum guide [10]. Using 3D sound, the guide presents the
exhibit that the visitor is focusing his or her attention on,
based on location and head orientation. The guide also
makes recommendations about other exhibits that the visitor might find interesting, based on which types of exhibits
the visitor has shown interest in. The same concept could
be used, e.g., for a virtual tourist guide.
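How a guide might select the exhibit in focus from location and head orientation can be sketched as below. This is an illustrative assumption, not the method of the cited system: the exhibit closest to the viewing direction is chosen, provided it lies within a hypothetical angular window and distance limit. Headings are in degrees, measured counterclockwise from the positive x axis.

```python
import math


def angular_difference_deg(a, b):
    """Signed difference a - b wrapped to the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def focused_exhibit(user_xy, heading_deg, exhibits,
                    max_angle_deg=30.0, max_distance_m=5.0):
    """Return the name of the exhibit nearest the viewing direction, or None."""
    best_name, best_offset = None, None
    for name, (ex, ey) in exhibits.items():
        dx, dy = ex - user_xy[0], ey - user_xy[1]
        if math.hypot(dx, dy) > max_distance_m:
            continue  # too far away to be the object of attention
        bearing = math.degrees(math.atan2(dy, dx))
        offset = abs(angular_difference_deg(bearing, heading_deg))
        if offset <= max_angle_deg and (best_offset is None or offset < best_offset):
            best_name, best_offset = name, offset
    return best_name


exhibits = {"A": (2.0, 0.0), "B": (0.0, 2.0)}  # hypothetical exhibit positions (m)
print(focused_exhibit((0.0, 0.0), 0.0, exhibits))
print(focused_exhibit((0.0, 0.0), 90.0, exhibits))
```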
An example of the second category is the Mobile Augmented Messaging application [1]. The application is used
for sharing short audio messages between members of groups.
Messaging happens asynchronously, i.e., users can browse
through and listen to old messages at any time, in addition
to listening to new messages when they arrive. Thus, it is
an audio equivalent to text chats and discussion forums. A
mic-through augmented reality system is not necessary for
the application, but allows users to receive messages and listen to the environment at the same time, as well as record
messages using the microphones in the headset. In situations where recording of audio messages is not possible or
desirable, messages can be input as text and converted to
audio using text-to-speech synthesis. The application also
includes the possibility to do binaural recordings and post
these on Facebook.
If a user belongs to several groups, messages from different
groups are played back from different directions in front of
the user, using head-related transfer functions. Test users
found this to be a convenient way to identify which group
a new message belonged to in some cases. However, the
ability to separate different angles was quickly reduced as
the number of groups increased.
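One way to see why angular separation suffers with more groups is to spread the groups evenly over a frontal arc and look at the spacing. The even spacing and the 180 degree arc are assumptions made for illustration; the actual placement used by the application may differ.

```python
def group_azimuths_deg(num_groups, arc_deg=180.0):
    """Spread groups evenly over a frontal arc; 0 is straight ahead, negative is left."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    if num_groups == 1:
        return [0.0]
    step = arc_deg / (num_groups - 1)
    return [-arc_deg / 2.0 + i * step for i in range(num_groups)]


for n in (2, 3, 5):
    print(n, group_azimuths_deg(n))
```

Under these assumptions, the spacing between adjacent directions shrinks from 180 degrees with two groups to 45 degrees with five, so neighbouring groups become progressively harder to tell apart.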
In addition to voice message input being faster than text
input and leaving the user’s hands and vision free for other
tasks, test users found the expressiveness of spoken messages
to be one of the biggest advantages. Browsing through old
voice messages was considered problematic and slow compared with text chats. The test users also thought it was
easier and less confusing to participate in several discussions
at the same time with text chats. Compared with, e.g., a
real-time audio teleconference, the asynchronous communication in this application does not allow for a natural flow
of conversation. This was, however, not considered particularly problematic by the test users. In some cases, it might
actually be an advantage, allowing all users to say what they
want to say without having to wait for their turn.
This kind of application is ideally suited for push-to-talktype communication. The users can constantly wear headsets and send each other audio messages. Other users hear
these messages directly, but also have the possibility to listen
to earlier messages again.
the level of sounds from the environment, at the cost of
slightly impaired perception of the acoustic environment,
compared with hear-through techniques. Also, the available binaural
microphone signals make many interesting applications possible. The implemented system with headset and
mixer provides a useful system for testing AR applications
on platforms supporting USB audio devices. The main areas for improvement are reduction of noise and colouration
of the sound, as well as making the system integrate better
with mobile phones.
The research leading to these results has received funding from Nokia Research Center, the Academy of Finland,
project no. [218238] and the European Research Council under the European Community’s Seventh Framework
Programme (FP7/2007-2013) / ERC grant agreement no.
[1] R. Albrecht. Messaging in mobile augmented reality
audio. Master’s thesis, Aalto University School of
Electrical Engineering, 2011.
[2] D. Hammershøi and H. Møller. Sound transmission to
and within the human ear canal. The Journal of the
Acoustical Society of America, 100(1):408–427, 1996.
[3] A. Härmä, J. Jakka, M. Tikander, M. Karjalainen,
T. Lokki, H. Nironen, and S. Vesa. Techniques and
applications of wearable augmented reality audio. In
114th Audio Engineering Society Convention, 2003.
Preprint no. 5768.
[4] R. Lindeman, H. Noma, and P. de Barros.
Hear-through and mic-through augmented reality:
Using bone conduction to display spatialized audio. In
Proc. ISMAR 2007, pages 173–176. IEEE, 2007.
[5] T. Lokki, H. Nironen, S. Vesa, L. Savioja, A. Härmä,
and M. Karjalainen. Application scenarios of wearable
and mobile augmented reality audio. In 116th Audio
Engineering Society Convention, 2004. Preprint no.
[6] M. Peltola, T. Lokki, and L. Savioja. Augmented
reality audio for location-based games. In 35th
International Conference of the Audio Engineering
Society, 2009.
[7] V. Riikonen. User-related acoustics in a two-way
augmented reality audio system. Master’s thesis,
Helsinki University of Technology, 2008.
[8] V. Riikonen, M. Tikander, and M. Karjalainen. An
augmented reality audio mixer and equalizer. In 124th
Audio Engineering Society Convention, 2008. Preprint
no. 7372.
[9] M. Tikander. Usability issues in listening to natural
sounds with an augmented reality audio headset.
Journal of the Audio Engineering Society,
57(6):430–441, 2009.
[10] A. Zimmermann and A. Lorenz. LISTEN: a
user-adaptive audio-augmented museum guide. User
Modeling & User-Adapted Interaction, 18(5):389–416,
Using mic-through techniques for augmented reality applications provides the benefit of allowing users to adjust