Audio Engineering Society
Convention Paper 5768
Presented at the 114th Convention
2003 March 22–25
Amsterdam, The Netherlands
This convention paper has been reproduced from the author’s advance manuscript, without editing, corrections, or
consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be
obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York
10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof,
is not permitted without direct permission from the Journal of the Audio Engineering Society.
Techniques and applications of wearable
augmented reality audio
Aki Härmä1 , Julia Jakka1 , Miikka Tikander1 , Matti Karjalainen1 , Tapio Lokki2 , Heli Nironen2 , Sampo Vesa2
1 Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 HUT, Finland
2 Telecommunications Software and Multimedia Laboratory, Helsinki University of Technology, P.O. Box 5400, 02015 HUT, Finland
Correspondence should be addressed to Aki Härmä ([email protected])
ABSTRACT
The concept of augmented reality audio characterizes techniques where the real sound environment is extended with virtual auditory environments and communications scenarios. This article introduces a framework for Wearable Augmented Reality Audio (WARA) based on a specific headset configuration and a real-time audio software system. We review the relevant literature and aim to identify the most promising application scenarios for WARA. Listening test results obtained with a prototype system are presented.
1 INTRODUCTION
The era of wearable audio appliances started with the introduction of the portable cassette player two decades ago. The development of digital technology led to portable CD players and finally to fully digital mp3 players. Digital cellular phones have been around for ten years. While speech communication is the main application, many manufacturers have recently integrated a digital audio player into a phone to enable high-quality audio playback. However, the basic application scenario for wideband audio is still the same as in the early Walkmans. Another trend, driven by the aging population in developed countries, is that the number of users of hearing aid devices is constantly increasing. With digital technology the quality of hearing aid devices has improved significantly while prices are dropping, preparing the way for an even higher number of users. Yet another related application is personal active hearing protectors.
We may have multiple wearable audio appliances but
only one pair of ears. At some point it may make sense
to integrate all those functions into the same physical device. Mechanical and electrical integration is already
feasible. However, in application scenarios there are
many interesting new possibilities and problems to explore. Also, progress in speech and audio technology, computing, and communications predicts the introduction of completely new types of intelligent and interactive audio and speech applications. For example, auditory displays which can provide a user with different types of information in the form of spatialized sound events have already been introduced.
We consider a device which a user could be wearing at all
times. It would resemble portable audio players in some
respects and also provide speech and audio communications services, e.g., over a wireless network. But, at the
same time, it would also make it possible for a user to
hear and interact with the real acoustic environment in a
natural way. Thus it would facilitate ordinary speech communication with other people, safe navigation in traffic, and the operation of machines where acoustic feedback is important. In addition, there
would be a large number of new functions which provide
information and communications channels which are not
available in a natural acoustic environment or in current
appliances.
The possibility to hear the natural acoustic environment
around a user differentiates the concept of augmented
reality audio from the traditional concept of a virtual
audio environment where a user is typically immersed
into a completely synthetic acoustic environment. The
proposed system for WARA requires specific transducer
systems and auralization techniques. In the prototype
system introduced in this article the transducer configuration is based on a headset where miniature microphones are integrated into earphone elements in both
ears. When microphone sounds are routed directly to
earphones, a user can perceive a representation of the
real acoustic environment. Since the experience may differ from the open-ear case we call this representation a
pseudo-acoustic environment. It has been demonstrated
that users can adapt to a modified binaural representation
[1]. Virtual and synthetic sound events, such as talk of a
remote user, music, or audio markers are superimposed
onto the pseudo-acoustic sound environment in a device
which may be called an augmentation mixer. At one extreme, virtual sounds can be combined with the pseudo-acoustic signals in the augmentation mixer in such a way that a user may not be able to determine which sound sources are local and which ones are artificially rendered by means of digital signal processing. Preliminary listening tests show that this is relatively easy to achieve when personalized in-situ HRIRs are used in the synthesis of virtual sounds.
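As a rough illustration of this augmentation-mixing idea (a sketch of ours, not the actual WARA implementation), the following C++ fragment convolves a dry virtual source with one measured head-related impulse response and superimposes the result on the pseudo-acoustic microphone signal of one ear. The function names and the gain parameter are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Convolve a dry (monophonic) virtual source with one head-related impulse
// response; plain time-domain FIR filtering, adequate for short in-situ HRIRs.
std::vector<float> convolve(const std::vector<float>& x, const std::vector<float>& h)
{
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n)
        for (std::size_t k = 0; k < h.size(); ++k)
            y[n + k] += x[n] * h[k];
    return y;
}

// Augmentation mixing for one ear: the pseudo-acoustic microphone signal is
// passed through and the auralized virtual source is superimposed on it.
// 'virtualGain' is a hypothetical parameter controlling the mixing balance.
std::vector<float> augmentEar(const std::vector<float>& micSignal,
                              const std::vector<float>& drySource,
                              const std::vector<float>& hrir,
                              float virtualGain = 1.0f)
{
    std::vector<float> out = micSignal;               // pseudo-acoustic part
    std::vector<float> virt = convolve(drySource, hrir);
    const std::size_t n = std::min(out.size(), virt.size());
    for (std::size_t i = 0; i < n; ++i)
        out[i] += virtualGain * virt[i];              // superimpose virtual part
    return out;
}
```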
Real-time implementation of a WARA system requires low audio latency and seamless integration of audio
streams, signal processing algorithms and network communications. We will introduce a modular and flexible
software architecture developed for testing of different
application scenarios.
One class of applications is based on localized audio
messages. For example, we may consider an auditory Post-it application where a user can leave and receive audio messages related to certain places or objects in the environment. We review the technology for detecting the location and orientation of a user in such applications
and report recent experimental results. In speech communication applications the use of binaural microphones
makes it possible to explore new ways of communication
between people.
We start with an introductory section where we put the
WARA concept into a wider context and define the central parts of the proposed framework. Then we review
the literature and give an organized representation of previous works and ideas in related application fields and hardware solutions available at the time of writing this paper. In Sections 4 and 5 we introduce a software platform developed for testing different techniques and applications.
2 REAL, VIRTUAL, AND AUGMENTED AUDIO ENVIRONMENTS
The basic difference between real and virtual sound environments is that virtual sounds originate in another environment or are created artificially. Augmented reality audio (or augmented audio reality) combines these aspects: real and virtual sound scenes are mixed so that virtual sounds are perceived as an extension of the natural ones. At one extreme, an augmented reality audio system should pass a test which is closely related to the classical Turing test for artificial intelligence [2]. That is, if a listener is unable to determine whether a sound source is part of the real or the virtual audio environment, the system implements a subjectively perfect augmentation of the listener's acoustic environment. At the other extreme, virtual auditory scenes could be rendered in high quality with characteristics that are not possible in normal acoustic environments, making them easily separable from real ones.
In the current paper the focus is on developing techniques for future wearable applications. Hence, it is clear that the transducers used for producing virtual sounds must be wearable. In practice, headphones or earphones are the most viable alternatives. Headphones have been used successfully in many virtual reality applications reviewed in Section 3. Figs. 1a and 1b illustrate a user in a real acoustic environment and with a headphone-based virtual acoustic system where the sound environment is created using a computer.
2.1 Headphone problems in virtual spatial audio
Headphone auralization often leads to a perceived effect of having the virtual source localized inside the listener's head. This is usually called intracranial, or inside-the-head, locatedness (IHL) [3]. Spatial localization of sound sources which are perceived to be inside the listener's head is termed lateralization. It has been demonstrated that in headphone listening a listener can make a clear distinction between localized (that is, outside-the-head) and lateralized sound sources, and that the two types can coexist in the listener's experience [4].
The effect of a lateralized sound in headphone listening
can be produced using amplitude and delay differences in
Fig. 1: A listener in a) a real and b) a virtual environment.
two headphone channels corresponding to each source.
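A minimal sketch of how such a lateralized image can be produced is given below; the delay and level values passed in are illustrative, not taken from the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Produce a lateralized two-channel signal from a mono input by applying an
// interaural time difference (ITD, here a whole-sample delay) and an
// interaural level difference (ILD, in dB) between the channels.
void lateralize(const std::vector<float>& mono, float sampleRate,
                float itdSeconds, float ildDb,
                std::vector<float>& left, std::vector<float>& right)
{
    const std::size_t delay =
        static_cast<std::size_t>(std::lround(std::fabs(itdSeconds) * sampleRate));
    const float gain = std::pow(10.0f, -std::fabs(ildDb) / 20.0f); // attenuate far ear

    left.assign(mono.size() + delay, 0.0f);
    right.assign(mono.size() + delay, 0.0f);

    for (std::size_t n = 0; n < mono.size(); ++n) {
        if (itdSeconds >= 0.0f) {            // positive ITD: image toward the left
            left[n]          += mono[n];
            right[n + delay] += gain * mono[n];
        } else {                             // negative ITD: image toward the right
            right[n]         += mono[n];
            left[n + delay]  += gain * mono[n];
        }
    }
}
```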
In order to make a sound source externalized, more sophisticated binaural techniques are needed [5]. In particular, we may list the following aspects:
1. Spectrum differences in the two ear signals due to head-related transfer functions (HRTFs). In a laboratory environment, personalized HRTFs can be used to produce a realistic illusion of an externalized source [6]. However, there is great variability in HRTFs among subjects.
2. Acoustic cues such as the amount of reverberation in the virtual sound. It has been demonstrated that the use of artificial reverberation can help in forming an externalized sound image in headphone listening [7].
3. Dynamic cues related to head turning and other movements of a listener or a source. The virtual source should be stable in relation to the environment. In headphone listening this requires that the auralization processor is controlled by information on the position and orientation of the listener's head; see the review of head-tracking techniques in Section 6.
4. Multimodality aspects such as connection of a
sound event to a visible real-world object.
5. The user's expectations of the performance of the system also affect the externalization and localization of virtual sound sources; see, e.g., [8].
2.2 The proposed transducer system
In this study, we focus on a rather specific type of
microphone-earphone systems. There are many other alternatives but this was chosen as a starting point because
it seems to have many beneficial properties and it directly
facilitates testing of rather unconventional applications
and services. This is a microphone-earphone system
where microphones are mounted directly on earplug-type
earphone elements and are therefore placed very close to
the ears. Ideally, the whole system could fit in the user's ears
and there would be no additional wires. The wearability of this device would be the same as for a hearing aid
device. Earplug-type earphones have a low power consumption which results in extended operating times and
low weight of the device. For testing we constructed two
different types of headsets. Attenuation of direct sound
in model I measured in an anechoic chamber with an audiometer is plotted in Fig. 2. In model I, the earphones
are of the earplug type and provide 10–30 dB attenuation of direct sound. In model II, the headphones are traditional open earphones placed at the entrance of the ear canal, providing only 1–5 dB of attenuation. The right-ear element of model II is shown in Fig. 3.
The use of earplug-type headphones makes it possible to control the signals entering the listener's ears more accurately than, e.g., with open headphones. For example,
one may mute or amplify sounds selectively if the direct
acoustic propagation of sound into the ears is suppressed
by the blocking of the ear canal by the device. The attenuation of external sounds is one of the parameters which
need to be studied in comparing different transducer systems.
Fig. 2: Attenuation in headset model I (earplug); the plot shows attenuation in dB as a function of frequency in Hz.
Microphones are located in the ears or very close to the
ears at both sides. When microphone signals are routed
directly to earphones the system exposes a user to a ’binaural’ representation of the real acoustic environment
around the user. However, in practice it is almost impossible to position and tune transducers so that signals
entering the listener's ears are identical to those in the open-ear case. Hence, the produced spatial impression corresponding to the real acoustic environment is also altered. To make a distinction between the physically real acoustic environment and its electronic representation in the user's ears we call the latter a pseudo-acoustic environment. This is illustrated in Fig. 4a, where a user is
wearing a microphone-earphone system.
2.3 Pseudo-acoustic environment
The pseudo-acoustic environment is a modified representation of the real acoustic environment around a user. In
many applications it is most convenient to try to make
the pseudo-acoustic environment as identical to the real
environment as possible. In principle, this could be
achieved by means of digital filtering of microphone signals. Equalization filters could be estimated in a measurement where a probe microphone was inserted into
the ear canal. However, this is difficult and would probably lead to highly individualized filters specific to a particular piece of equipment. Therefore, in the current phase, we only try to control the signal level and do some coarse equalization of signals to make the pseudo-acoustic environment sound as natural as possible. Accordingly, some
difference is expected to remain in the user’s spatial impression. However, it has been demonstrated in many
experiments that listeners can adapt to atypical [1] or supernormal [9] binaural inputs. Therefore, we may expect that a user could adapt to the use of the proposed transducer system, too.
The proposed model could also be used to produce a modified representation of reality which could be advantageous, more convenient, or simply an entertaining feature for the user. For example, in some cases the system could provide hearing protection, hearing aid functions, noise reduction, or spatial filtering, or it could emphasize important sound signals such as alarm and warning sounds.
Fig. 3: Model II headset element with an open-type earphone (constructed of a Sony MDR-ED268LP earphone and an electret microphone element). The position of the microphone element is indicated by the arrow.
2.4 Virtual sound environment
A virtual sound environment is created using traditional techniques of auralization [10, 11, 12, 13]. This mainly involves binaural filtering using HRTF filters. However, this may not be sufficient for good externalization of sound sources, especially in a reverberant pseudo-acoustic environment. Therefore it is probably necessary to bring some early reflections and reverberation also into the virtual sound environment to make it match the local environment better. How this should actually be done is currently an open research issue, and therefore many different approaches have to be tested.
In some applications the rendered virtual sound environment should be independent of the position and rotation of the user's head. The sound source should be localized and often somehow connected to the real environment around the user. In those applications some system for finding the position and orientation of the user is needed. Several alternative techniques for head-tracking are reviewed in Section 6.
2.5 Augmented reality audio environment
An augmented audio environment is produced by superimposing a virtual sound environment onto the pseudo-acoustic environment. First, the local environment should be delivered to the user in such a way that its loudness and binaural properties are acceptable to the user. Secondly, the virtual sound environment should be mixed with the local environment carefully to produce a coherent perception for the user. A goal is to find the best mixing rules for the local and virtual environments, leading to a meaningful fusion of these two main components of augmented reality audio. In Fig. 4b the mixing of the pseudo-acoustic and virtual audio environments is performed in a device which is called the augmented reality audio (ARA) mixer.
Fig. 4: A listener in a) a pseudo-acoustic and b) an augmented environment.
Fig. 5: A listener in an augmented environment and another user experiencing telepresence based on a real-time
binaural recording.
The proposed scheme makes a number of different application ideas possible. One of them is illustrated in Fig. 5. The person on the left-hand side is using the WARA system introduced earlier, and the person on the right, at a remote location, can experience telepresence based on direct transmission of the first user's pseudo-acoustic environment. For the user on the right, the ARA mixer would combine the local pseudo-acoustic environment with the remote pseudo-acoustic environment to produce a specific type of augmented audio reality experience.
2.6 Listening test
A preliminary listening test was carried out in order to
evaluate the level of performance of the WARA system in
virtually imitating localized pseudo-acoustic sound. The
test was carried out in a standardized listening room and
the authors of this paper served as test subjects. The setup consisted of a head-supporting chair for the test subject wearing the WARA headset, and a loudspeaker not visible to the subject. A similar test setup has been used earlier in studies on the externalization of virtual sources, e.g., in [6]. The listening test system is illustrated in Fig. 6.
At the beginning of the test session, an impulse response from the loudspeaker to the WARA headset microphones in the test subject's ears was measured. This binaural impulse response contains both the subject's individual head-related impulse response for the location of the loudspeaker and the room response. Two
types of test signals were then played for the test subject
to compare.
Fig. 6: The listening test configuration. Head-related room impulse responses (HRIRs) were measured for each listener. The HRIRs were then used to synthesize test signals for headset playback. In the listening test, test signals were switched between loudspeaker and earphone playback.
The pseudo-acoustic signals were acoustically dry signals played from the loudspeaker. The virtual test signals were the same signals filtered with the
measured binaural response in order to auralize them.
The test was a two-alternative forced choice test, where
the subject was asked which one of the signals was played
from the loudspeaker.
As both types of the test signals are perceived through
the earphones, hypothetically no difference in the signals should be audible. On the other hand, a perceivable
difference may occur due to the direct sound leaking into
the ear past the headset system in the case of loudspeaker
playback, and also to low-frequency components being conducted to the ear by bone. In particular, the model II headset attenuated direct sound leaking into the ear by only a few decibels. In the listening test, the level of the pseudo-acoustic environment in the listeners' ears was 3–5 dB higher than it would have been without the headset. The listening test results show that pseudo-acoustic sound can indeed be imitated almost indistinguishably. The test subjects were able to recognize or guess the source of the sound correctly in only 68% of the cases (pure guessing gives 50%). The sources of samples with a broader frequency band were easier to distinguish, whereas
distinguishing the sources of simple instrument sounds or speech was almost impossible. It seems that the pseudo-acoustic sound masks the leaking direct sound very efficiently.
The test was carried out using the WARA model II headset. Similar results were obtained with headset model
I with a limited number of listeners. A more extensive
listening test on mixing of pseudo-acoustic and virtual
sound environments is currently being prepared.
3 TAXONOMY AND OVERVIEW OF PREVIOUS APPLICATIONS
Different types of applications can be categorized in many ways. Some cases are clearly examples of communications, while others can be seen as information services. Another meaningful division is to consider whether a particular application represents human-to-human or human-to-machine communications. Yet another important division is the way in which virtual audio events are connected to the real environment: in some cases the created virtual events are localized and in other cases freely floating. In the following, ideas and applications have been categorized into three different classes.
3.1 Speech communications
It can be expected that speech communication will remain one of the most important applications for wearable augmented reality audio systems. However, it may take many different forms. As a communications application, Figure 1b would show the case where a remote talker is presented to a user using techniques of binaural auralization. Fig. 4b shows the case of augmented reality audio where the virtual source is mixed with the pseudo-acoustic environment around the user. Conceptually this can be interpreted as a situation where the remote talker has been rendered into the user's local environment. Figure 5 illustrates a quite different scenario where the user on the right-hand side is virtually transported to the location of the first talker. This is called telepresence.
The idea of telepresence was approached as early as
1933, when binaural hearing was examined by letting
test persons with two receivers listen to sound examples through microphones in a dummy head’s ears, the
dummy head being in another room [14].
Cohen et al. introduced a system where a robot puppet was controlled from a distance by a pilot wearing
effectors corresponding to the puppet’s sensors, so that
the puppet could venture into hazardous environments
like fires or toxic waste dumps [15]. The relationship
between acoustic telepresence and virtual reality presentations was examined, considering virtual reality as the
perception of the sound field by the dummy head, yielded
by HRTF filtering. Experiments were also made in which the user dynamically selected the transfer functions and thus placed the sound sources himself. No head-tracking was used.
Visiphone [16] is an "augmented home" system: a telephone augmented with graphics that can be used for casual, hands-free, and somewhat mobile conversation over a distance, with a visual reminder, a feedback system, and a focal point for directing speech. In some sense the idea could be described as an audio version of Internet relay chat.
Another speech communications application is teleconferencing. A spatialized stereo audio conferencing system was designed by J. Virolainen [17]. The advantage of being able to virtually localize the other attendees around the user is that it helps the user to separate the different speakers, making the conference more natural and easier to lead with multiple attendees. The naturalness and accuracy of the perception of directions are significantly increased when head-tracking is added to the system. It is also possible to extend the application to wearable augmented reality teleconferencing.
3.2 Localized acoustic events connected to real
world objects
The idea here is to immerse additional audio information about the real environment, e.g., alerting sounds, instructions, or even advertisements, into one's normal audio scene. The point is to connect them to real-world objects by localizing them auditorily. Superimposing synthesized sound on top of the real audio scene, instead of isolating the user from the real world by building a virtual audio reality, was originally proposed by M. Krueger [18].
The LISTEN project [19] reports on augmenting the
physical environment through a dynamic soundscape,
which users experience over motion-tracked wireless
headphones. The application experiment promotes a type
of information system for intuitive navigation of a visually dominated exhibition space, in this experiment a
museum. The augmenting part of the application consists of speech, music, and sound effects creating a dynamic soundscape including exhibit-related information
as well as "the right atmosphere", immersed individually in the visitor's auditory scene based on his or her spatial behaviour and also on previously entered information. The experiment includes a pedagogical as well as
an artistic approach.
The idea of an automated tour guide was proposed earlier by Bederson [20] to replace taped tour guides, which had been found obtrusive to normal socializing. Bederson's prototype application included a
random access digital audio source, a microprocessor
to control the audio source, an infrared receiver to tell
the processor the location of the viewer, and an infrared
transmitter in the ceiling above each object to tell the system to play the pre-recorded descriptions of the piece,
and also to stop the playing when the subject moved
away.
A spatial auditory display was evaluated by air crews on
simulated night mission flights [21]. The ordinary visual
display was compared to one augmented with a spatial auditory display. The results indicate that an improved auditory system would improve navigation.
A navigation aid for the visually impaired is considered in [22], where the person's visual environment is registered by a camera and transformed into a sound scene providing information about obstacles or possible danger.
The equipment used is a camera, a portable computer,
and a pair of earphones.
A method to teach visually impaired persons orientation
and mobility skills has been described by Inman [23],
where simulated environments are used to train sound identification, localization, and the tracking of sound sources. The simulation was implemented on a platform consisting of a Pentium-based computer, headphones, a head-tracker, and a joystick. As the student moved his or her
head, the virtual environment sound source or sources
changed accordingly. The different training tasks included identifying and reacting to the simulation sound
sources in a particular way, completing an action like
crossing a street in the virtual environment and completing an action by getting audio feedback from the student’s own actions like walking along a line. The environment was also displayed on a screen so that the trainer
could follow the student's actions.
The KnowWhere System is a method to present geographic information to blind people [24] by using a
touchable virtual map surface recorded by a camera and
transferring the features touched in the map to signals
describing the objects.
Guided by Voices [25] is a game application that uses
an audio augmented reality system to provide location-triggered, context-related playback sounds that inform the
player. The sounds depend on the current location and
also the history of the locations of an individual player.
The system is based on wearable computers and an RF
based location system with transmitters placed in the environment.
Another game based on an auditory display system is Sleuth [26], in which the player is placed in the middle of a crime scene and is supposed to use the information given by auditory clues to determine the course of the crime and find out who is guilty.
3.3 Freely-floating acoustic events
Freely-floating acoustic events, in contrast to localized events, are events that are not connected to objects in the subject's physical environment. Typically this means that the only anchor point relative to which the event is localized is the user's head. Potential applications are information services such as news, calendar events,
or announcements, or many different forms of entertainment such as music listening.
A prime example of a freely-floating spatial audio event is the DateBook application introduced in [27]. Here the calendar application of a PDA device was remapped to a spatial audio display of calendar events. Calendar entries at different times were rendered around the user so that noon appeared in front of the user, 3 p.m. to the right, and 6 p.m. behind the user.
The Nomadic Radio project [28, 29] presents an idea of a
wearable audio platform that provides access to all the messaging and information services now common on the desktop,
as well as an awareness of people and background events
situated elsewhere in a location of individual interest.
The audio messaging and information services, such as
email, voice mail, news broadcasts and weather reports,
are downloaded to the device throughout the day and
delivered to the user as synthesized speech. The messages are localized around the user’s head based on their
time of arrival. Different categories of messages can
also be assigned different levels of urgency or interest, and localized at different distances from the head. The awareness of a certain remote location is based on the cocktail party effect [30], meaning that one is able
to monitor several audio streams simultaneously, selectively focusing on one and leaving the rest in the background. This is even easier if the streams are segregated
by localizing them away from each other. The device
chosen for Nomadic Radio is a wearable shoulder-mounted audio-only device. The messages are broadcast as spatial audio streams of synthesized speech. Navigation and control of the system are done using a set of speech commands.
MacIntyre et al. [31] considered a combination of auditory and visual displays to provide both foreground interaction and background information, as well as both public and personal information. The focus is on developing new user interface paradigms and software, and creating testable prototype applications. The
current audio part of the system is Audio Aura [32]. Audio Aura is a distributed system that employs a wearable active badge to transmit the individualized information to wireless headphones. The goal is to provide the
user with auditory cues on other people’s physical actions in the workplace, such as information on how long
ago a certain person left a room or where he or she is at the moment. The system detects this traffic and updates a database. Certain changes in the database cause auditory cues to be sent.
One aspect of localizing acoustic events is sonification.
It is mostly understood as an additional, supporting dimension of visual interfaces to help in navigation or in performing tasks. Sonification as a topic has been studied extensively. The concept in relation to a prototype computer graphics extension has been studied by V. Salvador et al. [33], and an experiment with a mobile phone interface was carried out by S. Helle et al. [34]. In the computer graphics study it was found that there are even more ways to implement sonification in the system than was predicted. In the mobile phone experiment, sonification was found to be irritating at times but also helpful on some occasions.
3.4 Other topics
In relation to wearable augmented audio equipment there
are some interesting new aspects concerning hearing aid
design. To reduce noise amplified along with the desired signal (usually speech), some new techniques have been introduced. The perceptual time-frequency subtraction algorithm presented by Min Li et al. [35] was developed to simulate masking phenomena and thereby enhance the speech signal, and it is meant to be applicable both in digital hearing aids and in portable communication systems.
The individualization of spatialized audio displays has been researched by P. Runkle et al. [36], who developed a system of active sensory tuning (AST) in which subjective preferences can be set.
4 THE WARA SOFTWARE SYSTEM
A generic block diagram of the WARA system sketched
in Section 2 is shown in Fig. 7. The signals entering
the user's ears are composed of pseudo-acoustic input signals captured with microphones located close to the ears,
and virtual sounds which may be speech signals from
remote talkers or some other signals, such as recorded
or synthesized announcements, advertisements, instructions, warnings, or music. Mixing of the two components
is performed in the Augmented Reality Audio (ARA)
mixer. The network shown above the head in Fig. 7
is used to produce the output signal of the system. Microphone signals are recorded and some preprocessing
may be applied to produce a monophonic signal which
is, in a communications application, transmitted to another user. Alternatively, the binaurally recorded signal
(remote pseudo-acoustic signal for the other user, see,
Fig. 5) could be transmitted in its original form. Both
options have been implemented in the current software
system.
Fig. 7: A generic system diagram applicable to the majority of the augmented reality audio applications studied in this paper.
The WARA software system is written in C++ and is (almost) completely built over the Mustajuuri framework.
Mustajuuri is a plugin-based real-time audio signal processing software package which is designed for low-latency performance [37]. The different configurations
required by applications are constructed by adding plugins into a mixer console window (Figure 8) of the Mustajuuri program. The Mustajuuri plugin system allows
different configurations to be built in a very quick and
flexible way.
5 WARA APPLICATIONS
Different application schemes have been tried using
the plugins of the WARA software system as building
blocks. In this section these applications are described and discussed.
5.1 Binaural telephone
The basic application of the WARA system is a binaural telephone. It is a two-way voice-over-IP communication scheme using the binaural signals that the WARA transducers provide. The binaural telephone enables a kind of telepresence rather than a normal telephone call, since the whole 3-D
soundscape is transmitted to the receiver end. The binaural signals are transmitted via the UDP protocol in their original form to the remote end of the communication. In the current implementation no compression is applied to the binaural signals, but for more efficient transfer some compression could be utilized.
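The paper states only that uncompressed binaural signals are sent over UDP; the following POSIX-socket sketch shows what the transmission of one frame might look like. The port number, frame size, and the absence of any packet header are our assumptions, not the actual WARA packet format.

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Send one frame of uncompressed, interleaved binaural audio (16-bit PCM,
// left/right) over UDP. Frame size and port are illustrative only.
bool sendBinauralFrame(int sock, const sockaddr_in& dest,
                       const std::vector<int16_t>& interleavedLR)
{
    const ssize_t bytes = static_cast<ssize_t>(interleavedLR.size() * sizeof(int16_t));
    const ssize_t sent = sendto(sock, interleavedLR.data(), bytes, 0,
                                reinterpret_cast<const sockaddr*>(&dest), sizeof(dest));
    return sent == bytes;
}

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) return 1;

    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(50000);                       // hypothetical port
    inet_pton(AF_INET, "192.168.0.2", &dest.sin_addr);  // hypothetical remote end

    // 256 samples per channel => 512 interleaved samples per frame.
    std::vector<int16_t> frame(512, 0);                 // silence, standing in for mic input
    sendBinauralFrame(sock, dest, frame);
    close(sock);
    return 0;
}
```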
In informal usability testing it was found that the implemented binaural telephone really gives a convincing feeling of telepresence. The remote-end soundscape is naturally transmitted to the other user. It is easy to get confused about which people are actually in which room if there are other people present besides the ones wearing the headsets at both ends. However, when the system was used in a meeting (with only one user wearing a WARA headset) an obvious shortcoming was discovered: the user at the remote end gets frustrated when he or she cannot talk to the persons at the other end who are not wearing the headset. Another problem in such a multiuser situation is that the volume level of the local speaker tends to be much louder than the level of the other speakers in the room.
5.2 Speech communications with head-tracked auralization
When applying the WARA system as a binaural telephone, the remote talker is localized inside the user's head. This does not sound very natural, and it would be better if the remote talker could be positioned at a certain place in the user's auditory environment. To position a remote talker, a speech communication application with head-tracked auralization has been implemented. This application is similar to the binaural telephone, but it also contains plugins for head-tracked or non-head-tracked auralization of the incoming audio signal. The binaural signal is transmitted via the UDP protocol in its original form, then converted to a mono signal, and finally auralized with HRTFs. An example of such an application, implemented with Mustajuuri [37], is illustrated in Fig. 8.
Fig. 8: Mustajuuri mixer console with the two-way speech communications application.
The remote talker can be positioned outside the user’s
head by means of room acoustic modeling and HRTF
filtering. Room acoustic modeling would consist of the
synthesis of a number of early reflections and late reverberation.
In the current version, the WARA auralization has been implemented by applying the well-known concept of direct sound, early reflections, and statistical late reverberation (see, e.g., [10]). For the auralized sound source (a remote talker), the direct sound and 14 early reflections (six first-order reflections and eight lateral-plane second-order reflections) are calculated from a simple shoe-box room model with user-adjustable wall, floor, and ceiling locations. Each of these reflections is panned to the correct direction using FIR filters, the responses of which have been matched to the minimum-phase HRTF functions. The applied filter orders for each of the three virtual sound source groups (direct sound, first-order and second-order reflections) can be adjusted individually, and the computational burden can thus be reduced [38]. Informal listening tests suggest that the early reflections can be rendered with fewer filter taps than the direct sound without degrading the spatial impression.
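As a sketch of the shoe-box reflection computation, the fragment below shows the familiar image-source construction for the six first-order reflections only (the paper also uses eight lateral second-order reflections). The room dimensions and positions are illustrative, and the single reflection coefficient is a crude stand-in for the material absorption filters discussed next.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdio>

struct Vec3 { double x, y, z; };
struct Reflection { Vec3 imageSource; double distance; double gain; };

// First-order image sources for a shoe-box room with one corner at the origin
// and dimensions (Lx, Ly, Lz): each wall mirrors the source once. The 1/r law
// gives the distance attenuation; 'reflCoeff' roughly models reflection loss.
std::array<Reflection, 6> firstOrderReflections(Vec3 src, Vec3 listener,
                                                double Lx, double Ly, double Lz,
                                                double reflCoeff = 0.9)
{
    const std::array<Vec3, 6> images = {{
        { -src.x,            src.y,             src.z },             // wall x = 0
        {  2.0 * Lx - src.x, src.y,             src.z },             // wall x = Lx
        {  src.x,           -src.y,             src.z },             // wall y = 0
        {  src.x,            2.0 * Ly - src.y,  src.z },             // wall y = Ly
        {  src.x,            src.y,            -src.z },             // floor z = 0
        {  src.x,            src.y,             2.0 * Lz - src.z }   // ceiling z = Lz
    }};

    std::array<Reflection, 6> out{};
    for (std::size_t i = 0; i < images.size(); ++i) {
        const Vec3 d = { images[i].x - listener.x,
                         images[i].y - listener.y,
                         images[i].z - listener.z };
        const double r = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        out[i] = { images[i], r, reflCoeff / r };   // 1/r attenuation times reflection loss
    }
    return out;
}

int main()
{
    const auto refl = firstOrderReflections({2.0, 1.5, 1.2}, {4.0, 3.0, 1.2}, 6.0, 5.0, 2.5);
    for (const auto& r : refl)
        std::printf("distance %.2f m, gain %.3f\n", r.distance, r.gain);
    return 0;
}
```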
In addition to the HRTF filtering, each early reflection is filtered with a material absorption filter. Such filtering smooths out the comb-filter-like coloration caused by the successive summing of an identical sound source; because 1/r-law attenuation is implemented and the echoes come from different directions, the summed sounds are not exactly identical, but they are close enough for some unnatural coloration to take place. In the current version of the WARA auralization the material filters are first-order IIR filters with coefficients calculated from given absorption coefficients at octave bands [38]. The most important thing with the material filtering is simply to obtain somewhat realistic low-pass filtering for the reflections. The early reflections and their parameters are controlled with a plugin illustrated in Fig. 9.
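A minimal sketch of the kind of low-pass material filtering described above is given below: a first-order (one-pole) IIR filter applied to a reflection. The derivation of the coefficients from octave-band absorption data follows [38] and is not reproduced here; the pole value used is purely illustrative.

```cpp
#include <cstddef>
#include <vector>

// One-pole low-pass, y[n] = (1 - a) * x[n] + a * y[n-1], applied to a
// reflection signal to mimic high-frequency loss at a wall. The pole value
// 'a' (0 < a < 1) is illustrative; in the WARA auralization the coefficients
// are derived from octave-band absorption coefficients [38].
std::vector<float> materialFilter(const std::vector<float>& reflection, float a = 0.3f)
{
    std::vector<float> y(reflection.size());
    float state = 0.0f;
    for (std::size_t n = 0; n < reflection.size(); ++n) {
        state = (1.0f - a) * reflection[n] + a * state;
        y[n] = state;
    }
    return y;
}
```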
The early reflections, although they are computed from a simplified geometry, help in the externalization of the virtual sound sources, but the auralized sound is still somewhat unnatural. For even better results, some diffuse late reverberation is added to match the length of the reverberation to that of the pseudo-acoustic environment. The applied reverberation algorithm [39] is a variant of the feedback delay network (FDN) algorithm. The parameters for the artificial reverberation (reverberation time, etc.) are chosen manually by analysing the reverberation of the pseudo-acoustic environment. A more sophisticated approach would be to estimate the parameters automatically from the binaural microphone signals (see Section 6.4).
In cases where virtual sources are localized in the environment both room modeling and binaural synthesis
need additional information about the orientation and location of the user. Orientation and location information
of a user can be estimated using some head-tracking device. A limitation of many current technologies (see Section 6) is the need for an external device placed in the environment. These can be used only at close range and are therefore basically limited to applications where an augmented sound source is localized to some real-world object which could also host a transmitter of a head-tracking system.
Fig. 9: Control window of the HRTF auralize plugin.
5.3 Auditory Post-it application
The WARA framework also enables different location-based applications. One such application is an auditory Post-it, in which short messages can be attached to certain objects or left at certain places.
With the current implementation of the auditory Post-it, it is possible to leave and receive five messages. However, with small adjustments to the code it would be possible to compile versions with any number of messages. A message is left by recording it to one of the files found in the common soundfile directory. The coordinates of the message(s) are given in a text file. Messages are received by selecting the command Get new messages (see Fig. 10). After this, it is possible to listen to received messages by selecting Play all messages once, Play all messages repeatedly, or Play selected message. If the last one, Play selected message, is chosen, the user also has to select a message to play. All these commands are controlled using a
GUI (Fig. 10). It is also possible to change the coordinates of the reference point (the coordinates of the messages given in the text file are relative to this point) and the size of the message area (its radius). The localization of the user is done by a head-tracker which sends the coordinates of the user to the application. When the user, wearing a detector, is inside a message area, the message is played if the selected command allows it.
Fig. 10: Auditory Post-it application.
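A sketch of the proximity logic described above: a message becomes eligible for playback when the tracked head position enters the message area, a sphere of the given radius around the message coordinates relative to the reference point. The structure and names below are ours, not the actual plugin code.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

struct Position { double x, y, z; };

struct AudioMessage {
    std::string soundFile;   // file in the common soundfile directory
    Position    location;    // coordinates relative to the reference point
};

// Return the indices of all messages whose area (a sphere of 'radius' metres
// around the message coordinates) contains the head position reported by the
// head-tracker.
std::vector<std::size_t> messagesInRange(const std::vector<AudioMessage>& messages,
                                         const Position& head, double radius)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < messages.size(); ++i) {
        const double dx = head.x - messages[i].location.x;
        const double dy = head.y - messages[i].location.y;
        const double dz = head.z - messages[i].location.z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) <= radius)
            hits.push_back(i);
    }
    return hits;
}
```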
The current version of the auditory Post-it application
works well when the messages are located not more than
1–1.5 m from the head-tracker. However, with a better tracker it should be possible to detect messages over a larger area. The next improvement to the auditory Post-it application will be to implement the possibility of leaving a message while the application is running. In the
current version messages have to be defined beforehand.
5.4 Calendar application
The last implemented application is a 3-D audio calendar reminder, based on the idea presented by Walker et al. [40]. The user can record calendar entries and place them in directions corresponding to the time of day: 9 A.M. at -90 degrees azimuth (to the left of the user), noon at 0 degrees, three o'clock at +90 degrees (to the right of the user), and so on. The calendar application was implemented in two versions: head-tracked and non-head-tracked. The head-tracked version spins the calendar entries around a reference point (which is specified relative to the transmitter of the head-tracking system), while the non-head-tracked version has the user's head as the only reference point.
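The mapping from the time of day to azimuth implied above is a simple linear one, 30 degrees per hour with noon at 0 degrees; a small sketch (the function name is ours):

```cpp
#include <cstdio>

// Map the time of day of a calendar entry to an azimuth angle around the
// reference point: 30 degrees per hour, noon at 0 degrees in front of the
// user, 9 A.M. at -90 degrees (left), 3 P.M. at +90 degrees (right).
double entryAzimuthDegrees(double hourOfDay)            // e.g. 14.5 for 2:30 P.M.
{
    return (hourOfDay - 12.0) * 30.0;
}

int main()
{
    std::printf("9:00  -> %+.0f deg\n", entryAzimuthDegrees(9.0));   // -90
    std::printf("12:00 -> %+.0f deg\n", entryAzimuthDegrees(12.0));  //   0
    std::printf("15:00 -> %+.0f deg\n", entryAzimuthDegrees(15.0));  // +90
    return 0;
}
```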
The calendar application seems to work quite well, even though it would certainly help to have an easier interface, for example some kind of external controller device with which the user could browse through the calendar entries.
6 OVERVIEW OF RELATED TECHNOLOGY
6.1 Headsets
In order to send an acoustic environment from one place to another and then make it audible, there is a natural need for microphones to record the environment or the communication, and for transducers to reproduce the sound. As the system is intended to be worn all the time, the transducers have to be small and light. Nowadays some people wear headphones throughout the day to listen to music. Adding a microphone to the headphones could then be a logical way to make a two-way transducer system to record and reproduce sound.
For augmented reality purposes it would be ideal if both the surrounding and the augmented acoustic environment could be totally controlled with the headset. To achieve this, the headset should block the surrounding sounds as effectively as possible and simultaneously play the acoustic environment back to the user. With headsets there are practically two ways to attenuate surrounding sound: in-ear passive earplugs and active noise-cancelling headphones. Traditional passive hearing protectors would be effective but rather impractical for this purpose. There are various application-dependent headset systems available on the market. Most models rely on passive background noise rejection caused by a tight fit of the headset inside the ear canal, thus working like a traditional earplug. To get more attenuation at low frequencies, some models have added electronics to cancel out background noise. There is a small microphone in the earplug, and the signal it detects is fed back to the earplug in inverted phase. In some more advanced models there are two microphones and a separate DSP unit for processing the signals. The microphone signal is not available as an output in these headset systems. On the other hand, there are systems where the microphone signal is available either for communication or for recording use. Table 1 lists some of the headsets available on the market that have some relevance to the system considered in this article.
6.2 Head-trackers and location devices
Many of the applications discussed in this paper are
based on knowledge of the location or orientation of the
user. In the following presentation we roughly follow the
taxonomy introduced in a recent review article [41].
In WARA applications, tracking of the user's location and orientation takes place on at least two levels. First, in
some applications we may be interested in the global po-
sition of a user, where the system is aware of the physical coordinates, e.g., by using the Global Positioning System (GPS) or the emerging European Galileo positioning system. Global information is typically needed in applications in which augmented acoustic events are localized to known real-world objects (such as the central train station, the parking lot of a movie theater, or the department copying machine). An overview of different techniques operating at the global, metropolitan, building, or room level is given in [41]. Global position information
is needed to launch an application related to some object.
In this article we put more emphasis on the problem of
head-tracking. Estimates of the short-range relative location and orientation of the user's head are needed for
successful augmentation of the acoustic environment.
The three major techniques for location-sensing are triangulation, proximity, and scene analysis. Triangulation can be done via lateration, which uses multiple distance measurements to known points, or via angulation, which measures bearing angles relative to points with known separation. In the proximity technique, the distance to a known set of points is measured, and in scene analysis a view from a particular vantage point is examined.
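As a concrete illustration of lateration, the sketch below solves for a 2-D position from distances to three known points using the standard linearization (subtracting one range equation from the other two). The beacon coordinates and distances are made up for the example.

```cpp
#include <cstdio>

struct Point { double x, y; };

// 2-D lateration: solve for an unknown position from distances d1, d2, d3 to
// three beacons at known positions p1, p2, p3. Subtracting the first range
// equation from the other two gives a linear 2x2 system in (x, y).
bool laterate2D(Point p1, double d1, Point p2, double d2, Point p3, double d3, Point& out)
{
    const double a11 = 2.0 * (p2.x - p1.x), a12 = 2.0 * (p2.y - p1.y);
    const double a21 = 2.0 * (p3.x - p1.x), a22 = 2.0 * (p3.y - p1.y);
    const double b1  = d1 * d1 - d2 * d2 + p2.x * p2.x - p1.x * p1.x + p2.y * p2.y - p1.y * p1.y;
    const double b2  = d1 * d1 - d3 * d3 + p3.x * p3.x - p1.x * p1.x + p3.y * p3.y - p1.y * p1.y;

    const double det = a11 * a22 - a12 * a21;
    if (det == 0.0) return false;                    // beacons are collinear
    out.x = (b1 * a22 - b2 * a12) / det;
    out.y = (a11 * b2 - a21 * b1) / det;
    return true;
}

int main()
{
    Point pos{};
    // Hypothetical beacons at (0,0), (10,0), (0,10) and equal measured distances
    // of 7.071 m, which place the receiver near (5, 5).
    if (laterate2D({0, 0}, 7.071, {10, 0}, 7.071, {0, 10}, 7.071, pos))
        std::printf("estimated position: (%.2f, %.2f)\n", pos.x, pos.y);
    return 0;
}
```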
GPS The Global Positioning System (GPS) is the most
widely publicized location-sensing system at the moment. GPS is a satellite-based navigation system made
up of a network of 24 satellites. The system is governed
by the U.S. Department of Defense. GPS satellites circle
the earth twice a day in a very precise orbit and transmit signal information to earth. GPS receivers take this
information and use triangulation to calculate the user’s
exact location. Essentially, the GPS receiver compares
the time a signal was transmitted by a satellite with the
time it was received. The time difference tells the GPS
receiver how far away the satellite is. With distance measurements from a few satellites, the receiver can determine the user’s position. The accuracy of the system is
around 1–5 meters. The system does not work reliably indoors [42].
GALILEO The Transport Council of the EU started the GALILEO project in 1999. The system is a satellite-based global positioning system and is interoperable with other systems such as GPS and UMTS. GALILEO will cover all European states. The services offered are divided into three distinct categories: General-Purpose, Commercial, and Public-Utility Services. The General-Purpose Services will be available to everybody without any need for authorisation. The accuracy will be 5–30 meters depending on the technology. The Commercial Services provide an accuracy of 1–10 meters and are available as added-value services on payment of a fee. The Public-Utility Services will provide highly secure and accurate services for safety-of-life and other critical applications. The accuracy will be 1–6 meters. By the end of 2007, GALILEO should be fully operational [43].
Active Badges Active Badge is an indoor locating system developed at AT&T Cambridge. The system consists of a central server, fixed infrared sensors around the building, and badges worn by the users. The badges emit a globally unique identifier every 10 seconds or on demand. The central server collects this data and determines the absolute location of the badge by using a diffuse infrared technique. The system can only tell which room the badge is in. On the other hand, the badge may have a globally unique ID (GUID) for recognition. Sunlight and fluorescent light interfere with the infrared signals [41].
Active Bats Also developed by AT&T researchers, the Active Bat location system uses an ultrasound time-of-flight lateration technique to provide the physical location of an Active Bat tag. The tags emit ultrasonic pulses, synchronized by radio frequencies, to a grid of ceiling-mounted sensors. The distance measurement data is forwarded to a central unit that calculates the position. The accuracy is 9 cm in 95 percent of the measurements. The tags can have GUIDs for recognition [41].
Cricket Instead of central computing, the Cricket Location Support System lets the user calculate the location. The infrastructure consists of ultrasound beacon transmitters, and the user objects act as receivers. Radio frequencies are used for synchronization and to delineate the time region during which the sounds should be received by the receiver. A randomized algorithm allows multiple uncoordinated beacons to coexist in the same space. Cricket implements both the lateration and proximity techniques. The user can be located within an area of about 1.2 x 1.2 meters within a room. Because all the computation is done on the user's side, power consumption may become a burden in wearable devices [41].
RADAR RADAR, developed by the Microsoft Research group, is an RF-based location system using the
IEEE 802.11 WaveLAN wireless networking technology. At the base station, the system measures the signal
strength and signal-to-noise ratio of signals sent by wireless devices. This data is then used to compute the 2-D position within the building. There are two implementations of RADAR: one uses scene analysis and the other uses lateration. With scene analysis the accuracy is around 5 meters and with lateration 4.3 meters, for 50 percent of the measurements. The system requires at least three base stations per floor. There are also some companies offering 3-D indoor position tracking [41].
SpotON SpotON is an example of an ad hoc approach where no infrastructure is implemented. The system implements ad hoc lateration with low-cost tags. SpotON uses radio signal attenuation to estimate inter-tag distance. Repeating the measurement with a few tags provides more accuracy. The achieved accuracy depends on the number of tags. Ad hoc systems could be made to communicate with other, fixed location systems to yield more accurate and usable positioning [41].
6.3 Head-trackers
Head-tracking technologies can be roughly divided into
acoustic, electromagnetic, inertial, mechanical, and optical tracking. Any of these techniques can be used to
make a 6-degrees-of-freedom tracker (x, y, and z for position and yaw, pitch, and roll for orientation).
Acoustic tracking uses stationary microphones and one or more movable 'user' elements. The user's movable elements emit high-frequency sound that is detected by a collection of stationary microphones. By measuring the time it takes for sound to travel from the transmitter to the receivers it is possible to calculate the distance. The orientation is obtained by using multiple sensors. Because it takes some time for sound to travel from the transmitter to the sensors, time-of-flight trackers suffer from a slow update rate. Also, the speed of sound is affected by environmental variables (temperature, pressure, etc.), and this may occasionally cause problems. Acoustic tracking can also be realized using so-called phase coherence tracking, in which the stationary sensors detect the phase difference between the signal sent by the user emitter and a stationary emitter. As long as the distances are shorter than the wavelengths used, the distance can be determined. By using multiple sensors the orientation can also be calculated. Phase coherence tracking updates the position incrementally, and over time error may accumulate in the location estimate.
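A sketch of the time-of-flight distance calculation mentioned above, with the speed of sound corrected for air temperature (one of the environmental variables that can otherwise bias the result). The formula for the speed of sound is the standard dry-air approximation.

```cpp
#include <cmath>
#include <cstdio>

// Approximate speed of sound in dry air as a function of temperature in
// degrees Celsius: c ≈ 331.3 * sqrt(1 + T / 273.15) m/s.
double speedOfSound(double tempCelsius)
{
    return 331.3 * std::sqrt(1.0 + tempCelsius / 273.15);
}

// Distance between an acoustic emitter and a sensor from the measured
// time of flight.
double distanceFromTimeOfFlight(double timeOfFlightSeconds, double tempCelsius)
{
    return speedOfSound(tempCelsius) * timeOfFlightSeconds;
}

int main()
{
    // A 3 ms flight time corresponds to roughly one metre at room temperature.
    std::printf("%.3f m at 20 C\n", distanceFromTimeOfFlight(0.003, 20.0));
    std::printf("%.3f m at  0 C\n", distanceFromTimeOfFlight(0.003, 0.0));
    return 0;
}
```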
In electromagnetic tracking, a stationary element emits
a pulsed magnetic field. A movable sensor attached to
the user senses the field and reports the position and orientation relative to the source. Electromagnetic tracking
can be accurate to within 2.5 mm in position and 0.1 degree in rotation, although accuracy deteriorates with distance from the source. Electromagnetic tracking is also
susceptible to interference from metallic objects in the
environment.
Inertial tracking devices represent a different mechanical approach, relying on the principle of conservation
of angular momentum. These trackers use a couple of
miniature gyroscopes to measure orientation changes. If
full 6-DOF tracking ability is required, an inertial tracker
must be supplemented by some position tracking device.
A gyroscope consists of a rapidly spinning wheel suspended in a housing. The laws of mechanics cause the wheel to resist any change in orientation. This resistance can be measured and converted into yaw, pitch, and roll values. Inertial trackers sometimes suffer from drift away from the true orientation, an error that accumulates over time. Some kind of occasional position verification would fix this problem.
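The drift problem can be seen directly from how an inertial tracker obtains orientation: the angular rate is integrated over time, so any constant bias in the rate measurement grows linearly into the orientation estimate. A small sketch, with an illustrative bias value:

```cpp
#include <cstdio>

// Integrate gyroscope yaw-rate samples (degrees per second) into a yaw angle.
// A constant rate bias integrates into a linearly growing orientation error,
// which is why occasional absolute orientation fixes are needed.
double integrateYaw(const double* rateDps, int numSamples, double dt, double biasDps)
{
    double yaw = 0.0;
    for (int n = 0; n < numSamples; ++n)
        yaw += (rateDps[n] + biasDps) * dt;          // rectangular integration
    return yaw;
}

int main()
{
    // A stationary head (true rate 0) observed for 60 s at 100 Hz with a
    // 0.1 deg/s sensor bias drifts by 6 degrees.
    const int N = 6000;
    static double rate[N] = { 0.0 };
    std::printf("accumulated drift: %.1f deg\n", integrateYaw(rate, N, 0.01, 0.1));
    return 0;
}
```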
Mechanical tracker devices measure position and orientation by using a direct mechanical connection between
a reference point and a target. This could be done for
example by an arm that connects a control box to a headband. The sensors at the joints measure the change in
position and orientation with respect to a reference point.
Mechanical trackers in general are very fast and accurate.
Optical trackers come in two variants. In the first one,
one or several cameras are mounted on top of the headset, and a set of infrared LEDs is placed above the head
at fixed locations in the environment. In the alternative
setup, the cameras are mounted on the ceiling or on a fixed frame, and a few LEDs are placed at fixed and known positions on the headset. In both approaches, the projections of the LEDs on the cameras' image planes contain enough information to uniquely identify the position
and orientation of the head. Various photogrammetric
methods exist for computing this transformation. Optical
trackers in general have high update rates and sufficiently
short lags. However, they suffer from the line-of-sight problem, in that any obstacle between sensor and source seriously degrades the tracker's performance. Ambient
light and infrared radiation also adversely affect optical
tracker performance.
Table 2 lists some head-trackers available on the market.
6.4 Estimation of room acoustical parameters
Acoustic parameters, such as the amount of reverberation and the geometric properties of the real environment around the user, may be needed for successful augmentation of the pseudo-acoustic environment. There are basically two mechanisms for acquiring this information. In some applications this information can be derived off-line and made available to the system as information related to that particular location. This would
rely on a global localization system and an extensive
database. This is basically the approach taken in the LISTEN project [19]. A more generic alternative would be
based on on-line estimation of acoustic parameters from
microphone input signals. The latter approach may be
more suitable for wearable devices.
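As an example of one such parameter, if a room impulse response can be measured or identified from the microphone signals, the reverberation time can be estimated with Schroeder backward integration; the following minimal sketch (an assumed approach, not part of the authors' system) fits the decay between -5 dB and -25 dB and extrapolates to a 60 dB decay:

import numpy as np

def reverberation_time(h, fs):
    """Estimate T60 [s] from a room impulse response h sampled at fs [Hz]."""
    energy = np.asarray(h, dtype=float) ** 2
    # Schroeder backward integration: energy remaining after time t.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)
    t = np.arange(len(h)) / fs
    # Linear fit to the decay curve between -5 dB and -25 dB.
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope               # extrapolate the fit to 60 dB of decay

# Quick check with a synthetic exponentially decaying noise "response".
fs = 16000
t = np.arange(0.0, 1.0, 1.0 / fs)
h = np.random.randn(len(t)) * np.exp(-t / 0.1)     # true T60 is about 0.7 s
print("estimated T60: %.2f s" % reverberation_time(h, fs))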
7 CONCLUSIONS
In this article we presented an overview of the concept of
wearable augmented reality audio (WARA). We defined
the idea of WARA and compared it to previous work
on virtual and augmented reality audio. Different types
of applications were discussed and compared in terms of
their signal processing and hardware requirements. We
also presented an overview of related transducer technology and localization and headtracking devices available
at the time of writing this article.
In the case of open ears, a listener perceives the acoustic environment in its natural form. When wearing a headset with binaural microphones and earphones, the user is exposed to a modified representation of the surrounding acoustic environment, which is here called the pseudo-acoustic environment. A traditional example of a pseudo-acoustic environment is that of a person wearing a binaural hearing aid device. A virtual sound environment is a synthesized binaural representation or a binaural recording. In augmentation of the user's acoustic environment, virtual sound objects are rendered over the natural or a pseudo-acoustic representation of the sound field around the user. In this article we focused on the latter case, where virtual sound objects are combined with a pseudo-acoustic representation of the acoustic environment around the user. This is obtained using a specific headset where miniature microphone elements are integrated into small earphone elements. The pseudo-acoustic representation is produced by routing the microphone signals to the earphones.
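Schematically, this signal path can be summarized as below (a sketch only, with placeholder data and a hypothetical helper function, not the authors' real-time software): the binaural microphone signals form the pseudo-acoustic representation, and a virtual sound object is added by convolving a dry source signal with left and right HRIRs before mixing.

import numpy as np

def augment(mic_left, mic_right, source, hrir_left, hrir_right, gain=0.5):
    """Mix binaural microphone signals with an HRIR-rendered virtual source."""
    n = len(mic_left)
    virt_l = np.convolve(source, hrir_left)[:n]    # spatialize the virtual
    virt_r = np.convolve(source, hrir_right)[:n]   # sound object
    out_l = mic_left + gain * virt_l               # pseudo-acoustic + virtual
    out_r = mic_right + gain * virt_r
    return out_l, out_r

# Toy example with noise in place of real microphone and HRIR data.
fs = 44100
mic_l = 0.05 * np.random.randn(fs)                 # one second of "ambient" input
mic_r = 0.05 * np.random.randn(fs)
source = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)          # virtual beep
decay = np.exp(-np.arange(256) / 32.0)
hrir_l = np.random.randn(256) * decay              # placeholder HRIRs; a real
hrir_r = np.random.randn(256) * decay              # system would use measured ones
left, right = augment(mic_l, mic_r, source, hrir_l, hrir_r)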
We argue that it is technically and perceptually easier to produce a convincing fusion of virtual sounds and the acoustic environment around a user if the virtual sounds are embedded into a pseudo-acoustic representation rather than into the natural acoustic representation heard with open ears. In the preliminary listening tests presented in this article we found that, in the idealized case where in-situ HRIRs are used to produce the virtual sound objects, even experienced listeners typically cannot discriminate virtual sounds from test sounds coming from loudspeakers placed behind a curtain in a listening room. It also turned out that the difference in results is small between earplug-type earphones, which block the ear canal, and conventional in-the-ear earphones, which attenuate direct sounds by only a few decibels. Users also reported relatively fast adaptation to the pseudo-acoustic representation. The results are encouraging and suggest that the proposed transducer system may provide a suitable framework for the development of applications of wearable augmented reality audio.
Finally, we introduced a system developed by the authors for testing different applications and techniques of WARA. It is based on a specific headset configuration and a real-time software system running on a Linux machine.
8 ACKNOWLEDGEMENTS
A. Härmä’s work was partially supported by the Graduate School GETA and the Academy of Finland.
9 REFERENCES
[1] R. Held, “Shifts in binaural localization after prolonged exposures to atypical combinations of stimuli,” J. Am. Psychol., vol. 68, 1955.
[2] A. M. Turing, “Computing machinery and intelligence,” Quarterly Rev. Psychol. Phil., vol. 109, October 1950.
[3] J. Blauert, Spatial Hearing: The psychophysics of
human sound localization. Cambridge, MA, USA:
The MIT Press, 1999.
[4] G. Plenge, “On the differences between localization and lateralization,” J. Acoust. Soc. Am., vol. 56,
pp. 944–951, September 1972.
[5] B. M. Sayers and E. C. Cherry, “Mechanism of binaural fusion in the hearing of speech,” J. Acoust. Soc. Am., vol. 29, pp. 973–987, September 1957.
[6] W. M. Hartmann and A. Wittenberg, “On the externalization of sound images,” J. Acoust. Soc. Am., vol. 99, pp. 3678–3688, June 1996.
[7] N. Sakamoto, T. Gotoh, and Y. Kimbura, “On out-of-head localization in headphone listening,” J. Audio Eng. Soc., vol. 24, pp. 710–715, 1976.
[8] T. Lokki and H. Järveläinen, “Subjective evaluation of auralization of physics-based room acoustics modeling,” in Proc. Int. Conf. Auditory Display, (Espoo, Finland), pp. 26–31, July 2001.
[9] B. G. Shinn-Cunningham, N. I. Durlach, and R. M. Held, “Adapting to supernormal auditory localization cues. II. Constraints on adaptation of mean response,” J. Acoust. Soc. Am., vol. 103, pp. 3667–3676, June 1998.
[10] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia. New York, USA: Academic Press, 1994.
[11] J. Huopaniemi, Virtual Acoustics and 3-D Sound in Multimedia Signal Processing. PhD thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Espoo, Finland, 1999. Report no. 53.
[12] L. Savioja, Modeling Techniques for Virtual Acoustics. PhD thesis, Helsinki University of Technology, Telecommunications Software and Multimedia Laboratory, Espoo, Finland, 1999. Report TML-A3.
[13] T. Lokki, Physically-based Auralization - Design, Implementation, and Evaluation. PhD thesis, Helsinki University of Technology, Telecommunications Software and Multimedia Laboratory, report TML-A5, 2002. Available at http://lib.hut.fi/Diss/2002/isbn9512261588/.
[14] H. Fletcher, “An acoustic illusion telephonically achieved,” Bell Laboratories Record, vol. 11, pp. 286–289, June 1933.
[15] M. Cohen, S. Aoki, and N. Koizumi, “Augmented audio reality: Telepresence/VR hybrid acoustic environments,” in Proc. IEEE International Workshop on Robot and Human Communication, pp. 361–364, IEEE, July 1993.
[16] J. Donath, K. Karahalios, and F. Viégas, “Visiphone,” in International Conference on Auditory Display, ICAD, April 2000.
[17] J. Virolainen, “Design and implementation of a stereo audio conferencing system,” Master’s thesis, Helsinki University of Technology, 2001.
[18] M. W. Krueger, Artificial Reality II. Addison-Wesley, 1991.
[19] G. Eckel, “Immersive audio-augmented environments: The LISTEN project,” in Proc. Fifth International Conference on Information Visualisation, pp. 571–573, IEEE, 2001.
[20] B. Bederson, “Audio augmented reality: A prototype automated tour guide,” in Human Factors in Computing Systems, pp. 210–211, ACM, 1995.
[21] R. D. Shilling, T. Letowski, and R. Storms, “Spatial auditory displays for use within attack rotary wing aircraft,” in International Conference on Auditory Display, (Atlanta, Georgia, USA), ICAD, April 2000.
[22] N. Belkhamza, A. Chekima, R. Nagarajan, F. Wong, and S. Yaacob, “A stereo auditory display for visually impaired,” in IEEE TENCON 2000, pp. II–377–382, IEEE, 2000.
[23] D. P. Inman, K. Loge, and A. Cram, “Teaching orientation and mobility skills to blind children using computer generated 3-D sound environments,” in International Conference on Auditory Display, ICAD, April 2000.
[24] M. W. Krueger and D. Gilden, “KnowWhere(TM): An audio/spatial interface for blind people,” in International Conference on Auditory Display, ICAD, November 1997.
[25] K. Lyons, M. Gandy, and T. Starner, “Guided by voices: An audio augmented reality system,” in International Conference on Auditory Display, (Atlanta, Georgia, USA), ICAD, April 2000.
[26] T. M. Drewes, E. D. Mynatt, and M. Gandy, “Sleuth: An audio experience,” in International Conference on Auditory Display, ICAD, April 2000.
[27] A. Walker, S. A. Webster, D. McGookin, and
A. Ng, “A diary in the sky: A spatial audio display for a mobile calendar,” in Proc. BCS IHM-HCI
2001, (Lille, France), pp. 531–540, Springer, 2001.
[28] N. Sawhney and C. Schmandt, “Design of spatialized audio in nomadic environments,” in International Conference on Auditory Display, ICAD, November 1997.
[29] N. Sawhney and C. Schmandt, “Nomadic radio:
Speech and audio interaction for contextual messaging in nomadic environments,” ACM Transactions on Computer-Human Interaction, vol. 7,
pp. 353–383, September 2000.
[30] E. C. Cherry, “Some experiments on the recognition of speech, with one and two ears,” J. Acoust.
Soc. Am., vol. 25, pp. 975–979, September 1953.
[31] B. MacIntyre and E. D. Mynatt, “Augmenting intelligent environments: Augmented reality as an
interface to intelligent environments,” in AAAI
1998 Spring Symposium Series, Intelligent Environments Symposium, (Stanford University, CA),
AAAI, March 1998.
[32] E. D. Mynatt, M. Back, R. Want, and R. Frederick, “Audio aura: Light-weight audio augmented
reality,” in The Tenth ACM Symposium on User Interface Software and Technology 1997, (Banff, Alberta, Canada), ACM, October 1997.
[33] V. C. L. Salvador, R. Minghim, and M. L. Pacheco,
“Sonification to support visualization tasks,” in International Symposium on Computer Graphics, Image Processing, and Vision, 1998, pp. 150–157,
SIBGRAPI, 1998.
[34] S. Helle, G. Leplâtre, J. Marila, and P. Laine,
“Menu sonification in a mobile phone - a prototype
study,” in Proceedings of the 2001 International
Conference on Auditory Display, (Espoo, Finland),
pp. 255–260, ICAD, July 2001.
[35] M. Li, H. G. McAllister, N. D. Black, and T. A. D. Pérez, “Perceptual time-frequency subtraction algorithm for noise reduction in hearing aids,” IEEE Transactions on Biomedical Engineering, pp. 979–988, September 2001.
[36] P. Runkle, A. Yendiki, and G. H. Wakefield, “Active sensory tuning for immersive spatialized audio,” in International Conference on Auditory Display, ICAD, April 2000.
[37] T. Ilmonen, “Mustajuuri - An Application and
Toolkit for Interactive Audio Processing,” in Proceedings of the 7th International Conference on Auditory Display, 2001.
[38] L. Savioja, J. Huopaniemi, T. Lokki, and R. Väänänen, “Creating interactive virtual acoustic environments,” Journal of the Audio Engineering Society,
vol. 47, pp. 675–705, Sept. 1999.
[39] R. Väänänen, V. Välimäki, J. Huopaniemi, and
M. Karjalainen, “Efficient and parametric reverberator for room acoustics modeling,” in Proc. Int.
Computer Music Conf. (ICMC’97), (Thessaloniki,
Greece), pp. 200–203, Sept. 1997.
[40] A. Walker, S. Brewster, D. McGookin, and A. Ng,
“Diary in the Sky: A Spatial Audio Display for
a Mobile Calendar,” in Proceedings of BCS IHMHCI 2001, pp. 531–540, 2001.
[41] J. Hightower and G. Borriello, “Location systems
for ubiquitous computing,” in Trends, Technologies & Applications in Mobile Computing (N. Dimitrova, ed.), special report 5, pp. 57–66, IEEE Computer Society, 2001.
[42] “GPS primer.” http://www.aero.org/publications/GPSPRIMER/index.html.
[43] “Global satellite navigation services for Europe,” GALILEO Definition Summary, European Space Agency (ESA), 2000.
[44] “Invisio and bluespoon communication earplugs.”
http://www.nextlink.to.
[45] “KOSS in-ear headsets.” http://www.koss.com.
[46] “Etymotic headsets.” http://www.etymotic.com.
[47] “OKM microphones.” http://www.okmmicrophones.com.
[48] “SINTEF.” http://www.sintef.no/units/informatics/RoSandSoS/.
[49] “Polhemus 3D tracking devices.” http://www.polhemus.com.
[50] “Motion trackers for computer graphic applications.” http://www.ascension-tech.com.
[51] “Intersense motion trackers.”
http://www.isense.com.
NEXTLINK.TO [44]
  Invisio: An in-ear communication earplug with an internal microphone. The microphone detects jaw-bone vibration and is therefore fairly insensitive to surrounding noise and other sounds. The plug is wired to a radio unit.
  Bluespoon: A wireless, Bluetooth version of the Invisio. Available only for the right ear.

Sony
  MDR-NC10: An in-ear noise-cancelling headphone. The earplug fits tightly in the ear canal and attenuates surrounding sound fairly well even without the noise cancelling switched on. The plugs are somewhat bigger than conventional earplugs. The noise cancelling is done by analog means and works fairly well at low frequencies.
  MDR-NC11: A newer model of the NC-10. The earplug is much smaller and lighter; otherwise similar to the NC-10.
  MDR-EX70LP: A small in-ear earplug. It fits tightly inside the ear canal and attenuates the surroundings fairly well, especially at higher frequencies. Similar to KOSS's ThePlug.

KOSS [45]
  ThePlug: The plug consists of a foam-like cushion that fills the ear canal and thereby blocks the surrounding sounds. Similar to Sony's MDR-EX70LP.
  Complug: A communication earplug with a microphone. The earplug is of the in-ear type and background noise rejection is increased with electric noise-cancelling circuitry.

Etymotic Research [46]
  ER-4: A high-end in-ear headphone with very high background noise attenuation (20-25 dB). The attenuation is passive. There are slightly different models (binaural, power, and stereo) for different applications.
  ER-6: Similar to the ER-4 but with a little less isolation from background noise.

OKM [47]
  In-ear microphones: In-ear electret microphones for binaural recordings. There is no headphone in this system.

PARAT [48]
  Communication earplug: A communication earplug developed at the research institute SINTEF (Trondheim, Norway). The earplug consists of a miniature loudspeaker and two microphones; one microphone is outside the plug and the other is inside it. The earplug also has a built-in microchip for signal processing and a built-in radio unit for wireless communication. In normal situations the earplug passes the surrounding sounds through, but when high noise levels occur it blocks the surrounding sounds. The user's voice can be picked up via the outside microphone or, in a noisy environment, via the inner microphone.

Table 1: Transducer review
Brand / model      Method               Range          DOF  Other
Logitech
  Head-Tracker     acoustic             1.6 m          6    max angle 100 degree cone
Polhemus [49]
  FASTRAK          magnetic             1.2-2 m        6    max. 16 receivers
  ISOTRAK          magnetic             1.6-2.3 m      6    max. two receivers
Ascension [50]
  3D Bird          inertial             cable length   3    only orientation, computing in a PC
  Flock of Birds   magnetic             3 m            6    up to four sensors
  Laser Bird       optical              2 m            6    high update rate (240 Hz), max angle ±55 deg horizontal and ±54 deg vertical up to 1.2 m
  pciBIRD          magnetic             76 cm          6    small sensors (8 and 22 mm), computing in a PC
  miniBIRD         magnetic             76 cm          6    miniature sensors (10 x 5 x 5 mm)
  Nest of Birds    magnetic             91 cm          6    up to four sensors, USB
  Motion Star      magnetic             up to 3 m      6    also a wireless version, primarily for motion tracking of up to five characters with 18 sensors
InterSense [51]
  900-series       inertial             20-72 m²       6    also a wireless version available, up to four sensors simultaneously
  600-series       inertial/ultrasonic  16-50 m²       6    can be used wirelessly if only orientation is needed
  300-series       inertial             cable length   3    only orientation, up to four sensors
  InterTrax        inertial             cable length   3    only orientation, USB
  InertiaCube      inertial             cable length   3    only orientation, computing in a PC

Table 2: Headtrackers