A Weakly Intrusive Ambient Soundscape for Intuitive State Perception

Fredrik Kilander & Peter Lönnqvist
Department of Computer and Systems Sciences, Stockholm University/KTH
Electrum 230, SE-164 40 KISTA, Sweden
{fk, peterl}@dsv.su.se
This paper reports on ongoing work with WISP, a weakly intrusive ambient soundscape
which conveys information to a recipient by means of subtle sound cues. A WISP is stationary
in personal or shared locations and does not depend on wearable sound generation. This poses
several interesting problems, especially in shared spaces where the soundscapes from different
users mix, interfere and possibly conflict.
The use of audio cues in computer interfaces has seen a remarkable development in recent years
with the general proliferation of soundcards and sound sources. The audio schemes in the
Microsoft Windows operating systems, for example, offer the association between a rich set of
system events and sounds. The events most commonly encountered are, perhaps, login and logoff
events, notification of new email and the proverbial beep when something goes wrong. All of these
sounds, however, are tied to events and changes of state which are either obvious anyway or
announced with the unconcealed intent to seize the user's attention.
In this paper we present WISP, the concept of an ambient soundscape in which events and states in
the computing and physical environment are reflected in the form of non-intrusive audio cues. A
WISP is meant to service a physical space, like a personal office or a common room. The listening
experience is intended to convey intuition rather than interruption; each signal should be
sufficiently non-intrusive to be accepted without disturbing the focus of the task at hand, while
distinctive enough to be separable from the other cues.
The selection of auditory messages that humans choose to process can be seen as depending on
context rather than physical characteristics [1]. At a cocktail party we can follow one conversation,
but as we hear our name mentioned in another conversation our focus is suddenly changed. Thus
we can say that the auditory messages that bear meaning are the ones that grasp our attention.
Meaning in this sense is not always mapped to semantics; it may just as well be learned. For
instance, when the telephone rings, we know that someone is probably calling us. Auditory
messages of this type are often quite disturbing, and intentionally so, in order to capture our
attention. We propose less intrusive auditory messages and predict that, if meaning is mapped to
them, they can be just as useful without disturbing the user the way a telephone signal does, and
that they may even be comfortable to listen to.
A technology that emits sounds intended to be heard by a particular user must solve several
problems in terms of addressing and audibility. For example, what is the dynamic range provided
by the sound reproduction equipment, and what is the natural level of noise in the environment?
The issue becomes even more complicated when several receivers occupy the same physical area,
in which sounds may interfere with each other or be misunderstood by unintended recipients.
There have been a number of interesting efforts in the area of auditory monitoring and signalling.
Gaver, Smith and O’Shea [5] found in the ARKola bottling plant simulation that auditory icons
helped collaboration and diagnosis among workers in a shared and complex task. By virtue of its
many different sounds, the simulation automatically forced the user to place certain sounds in
focus. One result of this was that the simple cessation of a sound did not always convey the
information that the corresponding process had stopped.
Bederson and Druin [6] constructed an automated tour guide which supported ad-hoc paths and
pauses in a museum tour, with personal, auditory addresses at selected exhibits. Like many other
techniques, the tour guide depended on wearable equipment for the reproduction of sound,
whereas a WISP (at least for now) relies on loudspeakers concealed in the physical environment.
Hudson and Smith [7] demonstrated an interesting encoding technique in a system for electronic
mail preview using non-speech audio. In their system, incoming mail was analysed for size,
recipients and content, and then encoded into a sentence of audio symbols, carefully designed to be
non-intrusive yet distinctive. The effort of loading semantics into a language of noises is very
interesting, because humans are often adept at learning such languages, just as we learn Morse code
or the semantics of movie music. With learning and adaptation the need to attend to a sound is
reduced and this seems to argue for the possibility of loading a WISP with a rich set of semantics,
provided that the learning curve has an acceptable slope.
Mynatt et al [8,9] developed Audio Aura, a system with wearable playback and personal
positioning facilities. The system was designed for task and calendar reminders, email status and
information about the activities of colleagues. In particular, a user departing one location could leave a
lingering ‘audible aura’ which the system would pick up and play back to other users if they
approached the spot in time. Depending heavily on localisation, Audio Aura could provide audio
which was relevant not only in the general context of the recipient, but also semantically
connected to the physical environment.
Ishii et al. report on ambientROOM [2,3,4], a specially designed room in which lighting, air, video
and sounds together form an ambience laden with content. The use of natural, ambient sounds
carrying information makes ambientROOM the earlier work most closely related to WISP.
However, whereas ambientROOM explores many dimensions as conduits for awareness, WISP
only concerns itself with ambient sound.
Sawhney and Schmandt [10,11] have reported on Nomadic Radio, another system in which the
user wears the sound generation equipment. Like Walker and Brewster [12], they employ
techniques for three-dimensional sounds, by which another dimension of semantics can be
introduced by positioning the perceived sound in relation to the listener’s head. Nomadic Radio is
very sophisticated and hosts a range of features; seven levels from silence, through ambient
background information to insistent signalling. In addition, tactile and spoken commands may be
given to the system via the wearable. Nomadic Radio also monitors the user’s activities and speech
in order to minimise intrusiveness. In relation to a WISP, Nomadic Radio is designed to follow
and interact directly with its user at almost all times. It is much more like a personal assistant
shaped like a collar, while WISP attempts to be a more ghost-like entity. In particular, we do not
intend the WISP for signalling, but for intuition.
A particularly interesting result from Nomadic Radio is that users preferred to have ambient
background information continuously audible, as it reassured them that the system was
operational. Ambient sound was also most easily located at the periphery of awareness.
For our prototype we elected to work with natural sounds. We recognise that the human reaction to
sound is individual and culturally conditioned; it is therefore important to emphasise that no
particular set of natural sounds is a priori more suitable than others. Still, we believe that a
selection of natural sounding soundscapes can be narrowed down to an environment in which the
WISP is not perceived as noise pollution. In order to provide support for both discrete events and
long-term state intuition, we are interested in both short sounds and backdrops, which can be
conveniently looped without sounding artificial. Short sounds, i.e. sounds with discrete playback
times, can of course be quite long. The sound of rolling thunder, for example, may be of several
seconds duration.
As a starting point we explore the sounds of the forest. These include many possible variations that
include both natural and man-made audio cues. Birdcalls from various species, insects, and the
sound of wind, rainfall and thunder are all immediately available. Man-made sounds include
trains, planes and automobiles passing in the distance, as well as church bells and the occasional
distant voice, although the latter two are probably too demanding of attention to be non-intrusive.
Before we go on to the usage of these sounds in a WISP, we should recognise other soundscapes
of interest. The sounds of a city provide a powerful backdrop on which to project suitable signals:
traffic, car horns, passing music, dogs etc. Likewise, the seashore, a train station or an airport
should provide interesting avenues of exploration. Common to these man-made environments is
that they are generally much noisier than a forest, which may work counter to the intended
effect of the WISP, namely that it should not pollute the natural environment.
In this section we will sometimes assume the availability of a localiser service that can be queried
for the location of people. The WISP concept is in no way dependent on such information, but the
particular application we use in the example is.
4.1 Co-worker presence
We could, for example, associate the presence of selected co-workers with certain birdcalls and
playback frequencies. The net effect is that as more and more colleagues arrive at work, the WISP
forest fills up with birds. Rather than consciously watching for a particular sound, the listener is
expected to acquire an intuition for the number of co-workers present (and, presumably, of those
out of the office).
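As a concrete illustration of this mapping, consider the following sketch. It is our own invention, not the system described in the paper: the class, the sample names and the rate function are illustrative assumptions. Each present co-worker is assigned a birdcall, and the overall call rate grows gently with the number of people present, so the forest fills up without becoming noisy.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: map present co-workers to birdcalls and an overall call rate.
 *  All names and constants here are illustrative assumptions. */
public class PresenceSoundscape {
    // co-worker -> birdcall sample (hypothetical file names)
    private final Map<String, String> birdOf = new HashMap<>();

    public void assign(String coworker, String birdcallSample) {
        birdOf.put(coworker, birdcallSample);
    }

    /** Birdcall sample for one co-worker, or null if none assigned. */
    public String cueFor(String coworker) {
        return birdOf.get(coworker);
    }

    /** Calls per minute grow sub-linearly with the number of people present,
     *  so a full office sounds richer but not proportionally louder. */
    public static double callsPerMinute(int present) {
        return 2.0 * Math.sqrt(Math.max(0, present));
    }
}
```

The sub-linear rate is one possible design choice; a listener needs only a rough sense of "more" or "fewer" birds, not an exact count.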
4.2 Email inbox state
The simple arrival of yet another mail is in many cases not reason enough to disrupt one's work.
While the notification may often be welcome, it can also be perceived as intrusive and stressful.
Because of this, we explore the possibility of using the WISP to convey more information about
the waiting mail. A classifier scans the waiting mail and provides the WISP with categories of
email: personal mail, mailing-list submissions and suspected spam. Each category then results in a
different sound cue, which may be either looped or repeated with a particular intensity. The net
result is that the listener gains an intuition of the expected contents of the unread mail, rather
than suffering an interruption for each new message.
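A minimal sketch of this category-to-cue mapping might look as follows. The keyword classifier is a trivial stand-in for the real classifier, and the sample names and interval constants are invented for illustration:

```java
/** Sketch of the email-to-cue mapping: categories map to sound cues, and the
 *  repeat interval encodes how much unread mail is waiting. The classifier is
 *  a crude keyword stand-in; all names and constants are illustrative. */
public class MailCues {
    public enum Category { PERSONAL, MAILING_LIST, SPAM }

    /** Trivial stand-in for the real classifier: list header wins,
     *  then a keyword check, otherwise the mail is assumed personal. */
    public static Category classify(String subject, String listHeader) {
        if (listHeader != null && !listHeader.isEmpty()) return Category.MAILING_LIST;
        String s = subject.toLowerCase();
        if (s.contains("winner") || s.contains("free offer")) return Category.SPAM;
        return Category.PERSONAL;
    }

    /** More waiting mail -> denser repetition of the cue, with a floor so the
     *  cue never becomes insistent; zero waiting mail means silence. */
    public static int repeatIntervalSeconds(int unreadCount) {
        if (unreadCount <= 0) return 0;          // silence: nothing waiting
        return Math.max(30, 600 / unreadCount);  // floor at 30 s between cues
    }

    /** One natural sound per category (hypothetical sample names). */
    public static String cueFor(Category c) {
        switch (c) {
            case PERSONAL:     return "songbird.wav";
            case MAILING_LIST: return "cricket.wav";
            default:           return "distant-crow.wav";
        }
    }
}
```

Encoding volume of mail as repetition density, rather than playing one cue per message, is what turns the notification from an interruption into a background state.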
The deployment of a WISP in a shared space, like a meeting room or kitchen, is accompanied by
the issue of how to join the preferences of several people when all are to be catered for by a single
WISP (again we assume a localisation service which tracks the movement of people within a building).
One immediate conclusion is that users of the shared WISP also subscribe to a shared set of
available notifications. In this way, each notification is associated with a single sound, forcing
listeners to share sound cues. This greatly reduces the possibility that users confuse a certain cue
with different semantics. It also allows the WISP to join events directed at several subscribers into
a single playback event.
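The joining of events directed at several subscribers can be sketched as simple de-duplication on notification type; the class and field names below are our own illustration, not the paper's code:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: in a shared WISP every notification type has exactly one sound, so
 *  several pending events of the same type, even when aimed at different
 *  subscribers, collapse into a single playback. Names are illustrative. */
public class SharedWisp {
    /** A pending notification: a type (e.g. "new-mail") aimed at one subscriber. */
    public static final class Event {
        final String type;
        final String subscriber;
        public Event(String type, String subscriber) {
            this.type = type;
            this.subscriber = subscriber;
        }
    }

    /** Join pending events into the sounds actually played: one playback per
     *  notification type, regardless of how many subscribers it addresses. */
    public static Set<String> playbacks(List<Event> pending) {
        Set<String> types = new LinkedHashSet<>(); // preserves arrival order
        for (Event e : pending) types.add(e.type);
        return types;
    }
}
```

For example, "new mail" pending for two listeners produces one cue, not two, which is exactly the reduction in sound volume that makes a shared cue set attractive.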
Another property of a shared WISP is that it becomes a broadcast environment. With the sound
available for everyone to hear, it becomes difficult to aim particular cues at a single person. This in
turn makes it important to consider the risk of the WISP becoming a source of noise pollution. For
example, when a heavy subscriber joins a group that does not subscribe to the services of the local
WISP, the group may be annoyed by the sudden presence of audio cues. To counter this, the
WISP should allow users to subscribe to silence as well as to sound. For shared physical spaces,
like conference rooms, hallways and kitchens, we see basically three alternatives for the
coexistence of simultaneous soundscapes.
1. Silence, i.e. sound spaces are banned from shared physical spaces.
2. Shared cues, i.e. only sounds from a common set may be presented.
3. Mixed cues, i.e. each person present in the shared physical space is served with his or her
personal sounds.
The first alternative, silence, does appear to have its advantages, but will not be lingered upon here
for obvious reasons. There may, however, be a point in designating certain areas as silent,
especially if the sound producing equipment is worn by the user or if the sound space conflicts
with social or safety concerns.
The second alternative, shared cues, is dependent on a standard or set of conventions in which the
semantics of a particular sound are agreed on by most visitors to the location. Unfortunately, the
concept of shared sound cues defeats the original purpose of the WISP, namely that it should be
personal and intuitive. If the semantics of a particular sound are obvious to everyone, it cannot be
addressed to a specific person and is transformed into a broadcast message with either existential
or global qualification. For example, if a particular sound in a personal soundscape means that
email is waiting for the user, the corresponding sound in a shared sound space would mean either
that someone among those present has email or that all present have email. The intuition is lost
without an addressing mechanism.
Assume for the moment that an addressing mechanism is introduced. Each user is given a personal
quality, like a prefix sound, followed by a cue from the common set. In this case we have increased
the number of sounds being heard in order to convey the message. We have also introduced the
possibility that less trained visitors mistake the address for the message, and added the
requirement of parsing the sequence ADDRESS, CUE in order to completely acquire the
semantics. For the above reasons, we argue that shared sound cues are only applicable when it is
desired that the acquired intuition is relevant to each visitor in the physical space. For example, the
time of day, the weather outside or the completion of a joint process may have states that are of
interest and relate in the same way to a group of people.
The third alternative for shared spaces, mixed cues, simply allows each visitor to be exposed to the
mixture of all personal soundscapes. This may of course have a deteriorating effect simply because
the number of cues may cause noise pollution. There is also the possibility that humans behave
socially and talk enough to void all hope of appreciating discrete sounds. Simultaneous playback
of sound cues aimed at different users at low amplification levels may also create interference and
distort both sounds. The psychological conflict that may arise from the apparent inconsistency
between disparate soundscapes (for example, a forest and a church) may also defeat the purpose of
the soundscape by provoking the attention and interest of the visitors.

A certain amount of technological control may be introduced to deter, if not completely abate,
some of the problems with mixed cues. A queuing mechanism may be used to schedule the
playback of sounds and prevent unwarranted mixing and interference. Microphones can be used to
detect the basic level of background noise in the physical location as well as the amount of
conversation. Playback volume may then be adjusted to the ambient noise, and certain cues can be
fitted into pauses in the conversation. Fine-grained location information and motion tracking can
be used to direct a particular cue to the loudspeaker closest to the recipient (this has obvious
applications for signalling systems as well).

The final issue, that of conflicting sound metaphors, is less readily solved by straightforward
engineering. We believe that the experience offered by a particular soundscape is personal to the
listener. On the other hand, we also hold it plausible that listeners under regular exposure will
adjust to and accept what at first appears to be an impossibility. It seems likely that a particular
soundscape will eventually be associated not with the environment it simulates, but with the
presence of its owner.
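The queuing and volume-adaptation controls described above could look roughly like this in code. This is a sketch under our own assumptions: the decibel offset and ceiling are invented values, and real conversation detection would come from the ambient microphones, not a boolean flag.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Sketch of technological controls for mixed cues: a queue that serialises
 *  playback (preventing interference between simultaneous cues), releases
 *  cues only during pauses in conversation, and derives the playback level
 *  from the measured ambient noise. Constants are illustrative. */
public class AdaptivePlayback {
    // Pending cues, played one at a time in arrival order.
    private final Queue<String> queue = new ArrayDeque<>();

    public void enqueue(String cue) {
        queue.add(cue);
    }

    /** Called periodically by the scheduler: release the next cue only while
     *  no conversation is detected, otherwise hold everything back. */
    public String next(boolean conversationDetected) {
        return conversationDetected ? null : queue.poll();
    }

    /** Target cue level: a few dB above the ambient noise floor, clamped so
     *  the cue never becomes loud enough to dominate the room. */
    public static double cueLevelDb(double ambientDb) {
        return Math.min(ambientDb + 3.0, 55.0); // offset and ceiling are assumptions
    }
}
```

Serialising the queue addresses the interference problem; tying the level to ambient noise keeps a cue audible in a lively kitchen yet unobtrusive in a quiet office.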
Given the assumption that a soundscape is allowed to track its owner, other interesting
developments in social interactions may follow. For example, being present is enhanced in the
audio dimension in that a person is being heard beyond the sounds emitted by the person's own
body. A dark room can reveal the identity of its occupants by their soundscapes. Also, with human
vanity being what it is, a certain amount of competition can be expected, with some users vying to
outdo each other in terms of refinement, posture or abuse. A soundscape loaded with obnoxious
vocalisations is probably far more attention-grabbing and intrusive than the aural metaphors
outlined in this paper. We can only hope that such excess manifests itself in other modes of
communication as well, and that the potential problem therefore is one of social interaction and
control, not of weakly intrusive soundscapes.
We have implemented a simple WISP in Java. The audio playback features of JDK 1.2 offer
simultaneous playback of up to 64 channels of sampled sound (or MIDI data, which we do not
use). At the time of this writing, an email state provider is being prepared.
[1] Anderson, J. R., Cognitive Psychology and its Implications (Fourth Edition), New York:
W. H. Freeman and Company, 1995.
[2] Ishii, H. and Ullmer, B., Tangible Bits: Towards Seamless Interfaces between People, Bits and
Atoms, Proceedings of the Conference on Human Factors in Computing Systems (CHI '97),
Atlanta, March 1997, ACM Press, pp. 234-241.
[3] Ishii, H., Wisneski, C., Brave, S., Dahley, A., Gorbet, M., Ullmer, B. and Yarin, P.,
ambientROOM: Integrating Ambient Media with Architectural Space, Proceedings of CHI '98,
ACM, New York, pp. 173-174.
[4] Wisneski, C., Ishii, H., Dahley, A., Gorbet, M., Brave, S., Ullmer, B. and Yarin, P., Ambient
Displays: Turning Architectural Space into an Interface between People and Digital Information,
Proceedings of the First International Workshop on Cooperative Buildings (CoBuild '98),
February 25-26, Springer, 1998.
[5] Gaver, W. W., Smith, R. B. and O'Shea, T., Effective Sounds in Complex Systems: The
ARKola Simulation, Proceedings of CHI '91, April 28-May 2, 1991.
[6] Bederson, B. B. and Druin, A., Audio Augmented Reality: A Prototype Automated Tour
Guide, Conference Companion of the ACM Conference on Human Factors in Computing Systems
(CHI '95), pp. 210-211.
[7] Hudson, S. E. and Smith, I., Electronic Mail Previews Using Non-Speech Audio, CHI '96
Conference Companion, ACM, pp. 237-238.
[8] Mynatt, E. D., Back, M., Want, R. and Frederick, R., Audio Aura: Light-weight Audio
Augmented Reality, Xerox Palo Alto Research Center, Palo Alto, CA.
[9] Mynatt, E. D., Back, M., Want, R., Baer, M. and Ellis, J. B., Designing Audio Aura, Xerox
Palo Alto Research Center, Stanford University and Georgia Institute of Technology.
[10] Sawhney, N. and Schmandt, C., Nomadic Radio: Scaleable and Contextual Notification for
Wearable Audio Messaging, Proceedings of CHI '99, the ACM SIGCHI Conference on Human
Factors in Computing Systems, May 15-20, Pittsburgh, Pennsylvania.
[11] Sawhney, N. and Schmandt, C., Nomadic Radio: Speech and Audio Interaction for
Contextual Messaging in Nomadic Environments, ACM Transactions on Computer-Human
Interaction, Vol. 7, No. 3, September 2000, pp. 353-383.
[12] Walker, A. and Brewster, S., Spatial Audio in Small Screen Device Displays, Department of
Computing Science, University of Glasgow.