
AES 40th INTERNATIONAL CONFERENCE
Spatial Audio: Sense the Sound of Space
NHK Science & Technology Research Laboratories & Tokyo University of the Arts Tokyo, Japan
October 8–10, 2010
Thursday, October 7
18:00–20:00
PRE-CONFERENCE SPECIAL WORKSHOP
Venue: Tokyo University of the Arts
Registration Fee: Free
Organizers: AES Japan Student Section, AES 40th Conference
Committee, and Tokyo University of the Arts
Anatomy of 3-D Technology

Chair:      Erisa Sato, Chair of AES Japan Student Section / Tokyo University of the Arts
Panelists:  Sungyoung Kim, Yamaha Corporation
            Seigen Ono, Saidera Paradiso Ltd.
Faculty Adviser: Toru Kamekawa, Tokyo University of the Arts

In this workshop we will invite corporate members, manufacturers, and engineers from the areas of professional sound recording and videogame production, allowing students to deepen their understanding of the technology and R&D activities of 3-D audio and visual production in a Q&A setting.

FRIDAY, OCTOBER 8: ALL EVENTS TAKE PLACE AT NHK SCIENCE & TECHNOLOGY RESEARCH LABORATORIES

Friday, October 8
08:55
OPENING REMARKS

Friday, October 8
09:00–10:00
KEYNOTE ADDRESS 1
History and Current State of the War between the Two Main Approaches to Surround Sound: Discrete Loudspeaker Feeds versus Hierarchical Matrix

Keynote Speaker: Angelo Farina, University of Parma, Parma, Italy

This keynote speech reconstructs the history of the competition, clashing, and hybridization between two opposite concepts proposed for the creation of surround sound.
The first approach is the discrete method, which became feasible only when storage media allowed complete channel separation of more than two channels; in practice, with the advent of the DVD. Conceptually this approach is based on the paradigm “one microphone feeding one loudspeaker,” repeated as many times as the number of loudspeakers employed during playback. Of course, these “microphones” can either be real capsules properly placed and aimed, or the speaker feeds can be created as “virtual microphones,” most commonly by amplitude-panning a number of dry mono signals obtained by closely miking musicians and singers. The competitor of this approach was, from the beginning, the use of a hierarchical system, in which the signals are first matrixed (or encoded), possibly employing a reduced number of channels for storage or transmission of the surround mix, and later dematrixed (or decoded) to recreate a number of speaker feeds, which can generally be larger than the number of transmitted channels. This can be very clever, but it requires that the signals being matrixed and subsequently dematrixed be perfectly time-aligned (so the microphones should all be coincident).
Between these two extremes, a number of intermediate approaches were developed. The presentation will explore the advantages and disadvantages of the two concepts, providing a historical analysis of the past and up-to-date information about current developments and forthcoming research results.
The presentation will be accompanied by short sound samples prepared in 5.1 format, which will be played during the presentation if the lecture room is equipped with a surround sound system.

Friday, October 8
10:00–11:30
PAPER SESSION 1: PERCEPTION AND EVALUATION OF SPATIAL AUDIO—PART 1

1-1
Validation of a Simple Spherical Head Model as a Signal Capture Device for Head-Movement-Aware Prediction of Perceived Spatial Impression—Chungeun Kim, Russell Mason, Tim Brookes, Institute of Sound Recording, University of Surrey, Guildford, Surrey, UK

In order to take head movement into account in the objective evaluation of perceived spatial impression (including source direction), a suitable binaural capture device is required. A signal capture system was suggested that consists of a head-sized sphere containing multiple pairs of microphones; in comparison to a rotating head and torso simulator (HATS), it has the potential for improved measurement speed and the capability to measure time-varying systems, albeit at the expense of some accuracy. The error introduced by using a relatively simple sphere compared to a more physically accurate HATS was evaluated in terms of three binaural parameters related to perceived spatial impression: interaural time and level differences (ITD and ILD) and the interaural cross-correlation coefficient (IACC). It was found that while the error in the IACC measurements was perceptually negligible, the differences in measured ITD and ILD values between the sphere and the HATS were not, although they were reduced slightly when the sphere was mounted on a torso. However, it was found that the sphere-with-torso could give accurate predictions of source location based on ITD and ILD, through the use of a look-up table created from known ITD-ILD-direction mappings. Therefore the validity of the sphere-with-torso as a potential head-movement-aware binaural signal capture device for perceptually relevant measurements of source direction (based on ITD and ILD) and spatial impression (based on IACC) was demonstrated.
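The three binaural parameters used in this paper have compact signal-level definitions. The sketch below is our illustration, not code from the study: ITD is taken as the lag of the maximum of the normalized interaural cross-correlation function, ILD as the broadband energy ratio, and IACC as the magnitude of that maximum, using the ±1 ms lag window that is common practice for IACC measurement.

```python
# Illustrative only: textbook definitions of ITD, ILD, and IACC for one
# left/right microphone pair; the paper's measurement pipeline is not shown.
import numpy as np

def binaural_parameters(left, right, fs, max_lag_ms=1.0):
    """Return (itd_s, ild_db, iacc) for a pair of time-aligned signals."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    lags = np.arange(-max_lag, max_lag + 1)
    norm = np.sqrt(np.sum(left**2) * np.sum(right**2))
    # Normalized interaural cross-correlation function over the lag window
    # (a circular shift is used here for brevity).
    iacf = np.array([np.sum(left * np.roll(right, lag)) for lag in lags]) / norm
    iacc = np.max(np.abs(iacf))                # interaural cross-correlation coefficient
    itd = lags[np.argmax(np.abs(iacf))] / fs   # lag of the IACF maximum, in seconds
    ild = 10.0 * np.log10(np.sum(left**2) / np.sum(right**2))  # level difference, dB
    return itd, ild, iacc
```

The paper's look-up-table prediction would then map a measured (ITD, ILD) pair to the nearest entry in a table of known ITD-ILD-direction mappings.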
1-2
Effects of Individualized Headphone Correction
on Front/Back Discrimination of Virtual Sound Sources
Displayed Using Individualized Head Related Transfer
Functions—Abhishek Guru, William Martens, Doheon Lee,
The University of Sydney, Sydney, NSW, Australia
Individualized Head-Related Transfer Functions (HRTFs) were used to process brief noise bursts for a 2-interval forced choice (2IFC) front/back discrimination of virtual sound source locations, presented via two models of headphones whose frequency responses could be made nearly flat for each of 21 listeners using individualized headphone correction filters. In order to remove virtual source timbre as a cue for front/back discrimination, the spectral centroid of sources processed using rearward HRTFs was manipulated so as to be more or less similar to that of sources processed using frontward HRTFs. As this manipulation reduced front/back discrimination to chance levels for 12 of the 21 listeners, the performance of the 9 listeners showing “good discrimination” was analyzed separately. For these 9 listeners, virtual sources presented using individualized headphone correction filters supported significantly better front/back discrimination rates than did virtual sources presented without correction of the headphone responses.
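The abstract does not specify how the correction filters were designed; a common approach, shown here purely as an illustration, is regularized inversion of a measured headphone response, where the regularization constant limits the gain at frequencies where the response has deep notches.

```python
# Minimal sketch of regularized headphone-response inversion (an assumption,
# not the authors' documented method). h: measured headphone impulse response.
import numpy as np

def correction_filter(h, n_fft=4096, beta=0.005):
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + beta)  # regularized spectral inverse
    g = np.fft.irfft(H_inv, n_fft)
    return np.roll(g, n_fft // 2)                 # half-length delay keeps it causal
```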
1-3
Learning to Remediate Sound Localization in the Median Plane Using a Virtual Auditory Display—Kenji Ozawa, Tomohiro Sato, University of Yamanashi, Kofu, Japan

Previous studies have shown the efficacy of sound localization training with non-individualized head-related transfer functions (HRTFs) using a virtual auditory display (VAD) in the horizontal plane. In this study, the efficacy of training for sound localization in the upper part of the median plane was experimentally examined. For each training session, noise stimuli for nine elevation angles were prepared, and each of them was presented five times in random order with feedback regarding the correct position. During test sessions, in contrast, music stimuli for ten elevation angles were tested. Just one training session was sufficient to induce a significant learning effect with regard to the remediation of sound localization, while a ceiling effect was observed by three sessions. Seven training sessions resulted in the persistence of the learning efficacy for over one month. The efficiency of short periods of training should enable the use of a VAD with non-individualized HRTFs in various applications.

Friday, October 8
10:00 – 12:00
WORKSHOP 1
Measuring High-Quality Room Impulse Responses for Artistic Applications

Chair:      Wieslaw Woszczyk, McGill University
Panelists:  Ralph Kessler, Pinguin
            Masahito Ikeda, Yamaha Corporation
            Akio Takahashi, Yamaha Corporation

The workshop will focus on techniques for measuring high-quality impulse responses of architectural spaces that find applications in various artistic domains including music production and post-production, recording, live sound reinforcement, and acoustic support of music performance. The topics will include the selection of sound sources, microphone systems, processing techniques, and techniques for capturing height information. Examples of productions will illustrate the audible differences between different approaches taken in measuring rooms and reverberant spaces.

Friday, October 8
11:30–13:30
PAPER SESSION 2: SPATIAL RENDERING AND REPRODUCTION—PART 1

2-1
Some Recent Works on Head-Related Transfer Functions and Virtual Auditory Display in China—Bo-Sun Xie,1 Xiao-Li Zhong,1 Guang-Zheng Yu,1 Shan-Qun Guan,2 Dan Rao,1 Zhi-Qiang Liang,1 Cheng-Yun Zhang1
1South China University of Technology, Guangzhou, China
2Beijing University of Posts and Telecommunications, Beijing, China

Head-related transfer functions (HRTFs) and virtual auditory displays are hot topics in research on acoustics, signal processing, and hearing, and have been employed in a variety of applications. In recent years they have received increasing attention in China. This paper reviews the latest developments in head-related transfer functions and virtual auditory displays in China, especially work accomplished by our group.

2-2
Sound Field Reproduction by Wavefront Synthesis Using Directly Aligned Multi Point Control—Noriyoshi Kamado,1 Haruhide Hokari,2 Shoji Shimada,2 Hiroshi Saruwatari,1 Kiyohiro Shikano1
1Nara Institute of Science and Technology (NAIST), Nara, Japan
2Nagaoka University of Technology, Nagaoka, Niigata, Japan

In this paper we present a comparative study of directly aligned multi point controlled wavefront synthesis (DMCWS) and wave field synthesis (WFS) for the realization of a high-accuracy sound reproduction system; the amplitude and phase characteristics of the wavefronts generated by DMCWS and WFS are assessed in computer simulations and in measurements in actual environments. First, the results of computer simulations revealed that the DMCWS wavefront has wide applicability in both the spatial and frequency domains, with small amplitude and phase errors, particularly above the spatial aliasing frequency of WFS. Next, we developed a wavefront measurement system and measured a DMCWS wavefront with this system and a proposed algorithm. The measurement results clarified the effect of a reflected wave and the frequency characteristics of a loudspeaker, and showed that DMCWS also has wide applicability in the frequency domain in actual environments. From these findings, we conclude that DMCWS is advantageous compared with WFS.

2-3
Local Sound Field Synthesis by Virtual Secondary Sources—Sascha Spors, Jens Ahrens, Deutsche Telekom Laboratories, Technical University of Berlin, Berlin, Germany

Sound field synthesis techniques like Wave Field Synthesis and Higher-Order Ambisonics aim at the physical synthesis of a desired sound field over an extended listening area. However, for practical setups the accuracy up to which the desired sound field can be synthesized over an extended area is limited. For certain applications it is desirable to limit the spatial extent of the listening area in order to increase the accuracy within this limited region for a given loudspeaker arrangement. Local sound field synthesis aims at higher accuracy within a local listening area. An approach to local sound field synthesis is presented that is based on the concept of virtual loudspeakers placed more densely around the local listening area than the existing loudspeakers. The approach is illustrated using Wave Field Synthesis as an example.
2-4
Comparison of Higher Order Ambisonics and Wave Field
Synthesis with Respect to Spatial Discretization Artifacts
in Time Domain—Jens Ahrens, Hagen Wierstorf, Sascha
Spors, Deutsche Telekom Laboratories, Technische
Universität Berlin, Berlin, Germany
We present a time domain analysis and comparison of spatial
discretization artifacts in near-field compensated higher order
Ambisonics and wave field synthesis. Simulations of both methods on the same circular loudspeaker array are investigated and
the results are interpreted in terms of fundamental psychoacoustical properties of the human auditory system, most notably the
precedence effect. It can be shown that both methods exhibit
fundamentally different properties regarding the synthesized first
arriving wave fronts as well as additional correlated wave fronts
(echoes). The properties of both types of wave fronts are a consequence of the combination of the spatial bandwidth of the
loudspeaker driving function and the fact that a finite number of
spatially discrete loudspeakers are employed.
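Both methods in this comparison drive the same discrete circular array, so their synthesized fields can be simulated as a weighted sum of free-field Green's functions. The sketch below is ours (array size and geometry are arbitrary) and shows only that common simulation core; the NFC-HOA or WFS driving functions under comparison would be supplied as the weight vector.

```python
# Generic monochromatic field of an N-loudspeaker circular array: the sum of
# 3-D free-field Green's functions weighted by a complex driving function.
import numpy as np

def synthesized_field(points, drive, radius=1.5, f=1000.0, c=343.0):
    """points: (M, 2) listening positions [m]; drive: (N,) complex weights."""
    n_ls = len(drive)
    k = 2 * np.pi * f / c
    phi = 2 * np.pi * np.arange(n_ls) / n_ls
    ls_pos = radius * np.stack([np.cos(phi), np.sin(phi)], axis=1)
    r = np.linalg.norm(points[:, None, :] - ls_pos[None, :, :], axis=2)
    return (np.exp(-1j * k * r) / (4 * np.pi * r)) @ drive
```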
Friday, October 8
12:00 – 13:30
WORKSHOP 2
Space Builder: A Comprehensive Production Tool for 22.2 Channel Sound Design

Co-chairs: Kimio Hamasaki, NHK
           Wieslaw Woszczyk, McGill University
Panelists: Richard King, McGill University
           Doyuen Ko, McGill University
           Brett Leonard, McGill University
           Kentaro Matsui, NHK

Researchers from McGill University’s Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) and the NHK Science and Technology Research Laboratories will present a convolution-based Space Builder developed for ambient sound design and flexible spatial processing in 22.2 channel audio productions. They will show hardware and software implementations and a detailed system architecture with a dedicated GUI, and present mixes mastered using the system. The goal is to deliver a production tool that easily lends a high-quality spatial treatment in three dimensions to recordings originally made in mono, stereo, or 5-channel surround without height. Attendees will have a chance to try the Space Builder user interface and to hear the results.

Friday, October 8
14:30–17:30
PAPER SESSION 3: SPATIAL RENDERING AND REPRODUCTION—PART 2
3-1
A Design Tool to Produce Optimized Ambisonic Decoders
—David Moore,1 Jonathan Wakefield2
1Glasgow Caledonian University, Glasgow, UK
2University of Huddersfield, Huddersfield, UK
This paper describes a tool for designing Ambisonic surround
sound decoders. The tool is highly flexible and provides a
decoder designer with powerful features to enable the design
of a decoder to their specific requirements. The tool employs
computer search to find decoder parameters that best meet
design criteria specified in a multi-objective fitness function.
Features include: objective range-removal and importance, even performance by angle, performance that correlates with human spatial resolution, and frequency-dependent and frequency-independent decoders of different orders. Performance can be optimized for a single listener or for multiple off-center listeners. The current tool works for 5.0 surround sound; however, it can be
Results are shown that demonstrate the tool’s capability and
flexibility for various scenarios.
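Two standard ingredients of such fitness functions are Gerzon's velocity and energy localization vectors, which a search can push toward unit length in the target direction at each evaluation angle. A minimal sketch (ours, horizontal-only; the tool's actual multi-objective function contains more terms):

```python
# Gerzon velocity (rV) and energy (rE) vectors for one reproduced direction,
# given the per-loudspeaker gains a candidate decoder produces for it.
import numpy as np

def gerzon_vectors(gains, ls_azimuths_rad):
    u = np.stack([np.cos(ls_azimuths_rad), np.sin(ls_azimuths_rad)], axis=1)
    rV = (gains @ u) / np.sum(gains)         # low-frequency localization predictor
    rE = (gains**2 @ u) / np.sum(gains**2)   # high-frequency localization predictor
    return rV, rE  # ideally unit-length vectors pointing at the target direction
```

A search then adjusts the decoder coefficients so that, across all evaluation angles, rV and rE stay close to the intended source direction, with the range-removal and importance features arbitrating between competing criteria.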
3-2
Sound Image Localization Tests of a 21-Channel Surround
System—Jae-Hyoun Yoo, Jeongil Seo, Kyeongok Kang,
ETRI, Yuseong-gu, Daejeon, Korea
A 21-channel sound field reconstruction system has been developed that physically reconstructs a three-dimensional target sound field over a pre-defined control volume. We performed subjective listening tests of distance and vertical localization in comparison with VBAP (Vector Based Amplitude Panning); the results showed that the LSM (Least Squares Method) had better performance than VBAP in distance localization and similar performance in vertical localization.
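VBAP, the reference method here, computes pairwise (2-D) or triplet-wise (3-D) gains by inverting the matrix of loudspeaker direction vectors. A minimal 2-D sketch with an assumed ±30° loudspeaker pair:

```python
# Pairwise 2-D VBAP (Pulkki): solve for gains g such that the gain-weighted
# sum of loudspeaker unit vectors points at the target, then normalize power.
import numpy as np

def vbap_pair_gains(target_az_deg, ls_az_deg=(30.0, -30.0)):
    p = np.array([np.cos(np.radians(target_az_deg)),
                  np.sin(np.radians(target_az_deg))])
    L = np.array([[np.cos(np.radians(a)), np.sin(np.radians(a))]
                  for a in ls_az_deg])       # rows: loudspeaker unit vectors
    g = np.linalg.solve(L.T, p)              # p = g[0]*l0 + g[1]*l1
    return g / np.linalg.norm(g)             # constant-power normalization
```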
3-3
A Novel Design for a Gel-Type DML Transducer
Incorporating a Solid Panel Projecting Multiple
Independent Sound Sources—Minsung Cho,1,2
Elena Prokofieva,1 Mike Barker,1 Jordi Munoz2
1Edinburgh Napier University, Edinburgh, UK
2SFX Technologies Ltd., Edinburgh, UK
The gel-type DML transducer (referred to as the gel transducer in this paper) excites a panel to radiate sound waves through the gel surround. The panel breaks into different characteristic modes of vibration as frequency increases. At high frequencies, above 1,600 Hz, the extent of the moving area of the panel begins to reduce and becomes localized at the position of the gel transducer. This results in both the sound-radiating area and the volume velocity being reduced, so that the directivity of the sound field narrows. As a result, a single panel with the two gel transducers attached can radiate independent sound sources with minimal acoustic cancellation at high frequencies. The current paper reports on this effect using theoretical and practical approaches.
3-4
Sound Generators Using Electroactive Elastomer for Multichannel Audio—Takehiro Sugimoto,1 Kazuho Ono,1 Yuichi Morita,2 Kosuke Hosoda,2 Daisaku Ishii,2 Akio Ando1
1NHK Science and Technology Research Laboratories, Setagaya-ku, Tokyo, Japan
2Foster Electric Co., Ltd., Akishima, Tokyo, Japan

To develop sound generators suitable for multichannel audio, we studied the transformation efficiency of electroactive elastomer (EAE), a soft material that can be deformed by applying a voltage. From the results of our analysis, we propose two types of sound generators, both of which are lightweight because they do not use conventional driving parts. The first type is a cylindrical sound generator that radiates sound omnidirectionally in the horizontal plane using the EAE’s flexibility. The second type is a push-pull sound generator with a comparatively improved frequency response obtained by more effective use of the transformation of the EAE. The obtained acoustic characteristics and future applications are discussed in this paper.
3-5
A New Method for B-Format to Binaural Transcoding—
Svein Berge, Natasha Barrett, Berges Allmenndigitale
Rådgivningstjeneste, Oslo, Norway
A frequency-domain parametric method for transcoding first-order B-format signals to a binaural format is introduced. The method provides better spatial sharpness than linear methods allow. A high angular resolution plane-wave decomposition of the B-format establishes two independent direction estimates per time/frequency bin. This alleviates the requirement, implicit in previous nonlinear methods, that the sound sources in a mix be W-disjoint orthogonal. The characteristics and causes of audible artifacts are discussed, and methods are introduced that suppress the different types of artifacts. A listening test is presented that ranks the sound quality of the method between third-order and fifth-order linear Ambisonics systems.
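For context, the conventional first-order baseline that this method improves on derives a single direction estimate per time/frequency bin from the active intensity of the B-format channels; the paper's plane-wave decomposition instead yields two independent estimates per bin. Baseline sketch (ours, not the paper's method):

```python
# Classic single-direction estimate per STFT bin from first-order B-format,
# via the active sound intensity carried by W (omni) and X/Y (figure-eight).
import numpy as np

def bformat_azimuth(W, X, Y):
    ix = np.real(np.conj(W) * X)   # x component of active intensity (up to scale)
    iy = np.real(np.conj(W) * Y)   # y component
    return np.arctan2(iy, ix)      # one azimuth estimate per time/frequency bin
```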
3-6
A Virtual Acoustic Film Dubbing Stage—Stephen Smyth,
Michael Smyth, Steve Cheung, Lorr Kramer, Smyth Research
LLC, Camarillo, CA, USA
By capturing personalized binaural room responses within a film dubbing stage, a highly accurate three-dimensional virtualization of the stage acoustics is possible using standard stereo headphones. For the first time a virtual dubbing stage can be captured and brought to the desktop audio workstation. This paper focuses on one particular virtualization technology, SVS: it discusses some of the virtualization issues relating to dubbing stages, describes how the technology addresses them, and highlights some of the remaining problems of the virtualization technique. Finally, the actual measurement of a large film theater is described.
Friday, October 8
14:30–17:30
WORKSHOP 3
Periphony—More than Just Over Your Head
Chair:
Jeff Levison, SurroundExpert.com
Panelists: Stuart Bowling, Dolby Laboratories
Kimio Hamasaki, NHK
Ulrike Kristina Schwarz, Bayerischer Rundfunk
Wilfried Van Baelen, Galaxy Studios
Helmut Wittek, Schoeps Mikrofone GmbH
Wieslaw Woszczyk, McGill University
A growing number of playback formats are being made available, either in experimental systems or as real-life possibilities, for playback of audio that is not only in surround sound but also incorporates additional channels to create the sensation of height, thus maximizing what some have referred to as the sensations of envelopment and engulfment. Systems in use include UHDTV, IMAX, Dolby ProLogic IIz, Chesky 2+2+2, DTS-HD, and Ambisonics, plus other experimental methods. Great interest has been rekindled in these systems with the success of 3-D motion pictures and now the introduction of 3-D video to the consumer at home.
Presentations will include discussions of scientific research, classical recording, pop music production, radio-style drama, expanded cinema, and theatrical stage sound design. A brief history will be offered so that the uninitiated can understand the basic premises at work in periphony. The panelists will offer playback of samples of their work in formats ranging from 5.1+2 to 22.2.
Friday, October 8
17:30
SPECIAL EVENT
Super Hi-Vision and 22.2 Multichannel Sound Demonstrations

Friday, October 8
19:30
SPECIAL EVENT
Welcome Concert: Japanese Traditional Music and Multichannel Sound Music

SATURDAY, OCTOBER 9: ALL EVENTS TAKE PLACE AT TOKYO UNIVERSITY OF THE ARTS

Saturday, October 9
08:55
AES PRESIDENT ADDRESS

Saturday, October 9
09:00–10:00
KEYNOTE ADDRESS 2
Audio Displays and Microphone Arrays for Active Listening

Keynote Speaker: Yôiti Suzuki, Tohoku University

To realize future communications that are interactive and convey a high sense of presence, it is important to recall that we humans are active creatures, moving through the environment to acquire accurate spatial information. In terms of spatial hearing, for instance, humans usually make slight head and body movements unconsciously, even when trying to keep still while listening. Such movement is known to be effective in improving the precision of auditory spatial recognition. We designate this style of listening as active listening. It is therefore particularly important that sound systems that synthesize sound fields, which we call auditory displays, be responsive to a listener’s movement, at least to the listener’s head rotation. Auditory displays matching the motions of active listening are therefore eagerly sought for use in future communications. In this presentation we first show that a sound field synthesized responsively to a listener’s movement significantly enhances the listener’s perceived sense of presence. Our auditory display, which is responsive to a listener’s movement and based on a binaural reproduction architecture, was used for this experiment. Then we introduce our high-definition small spherical microphone array based on the Symmetrical object with ENchased Zillion microphones (SENZI) architecture and its implementation with 252 microphone channels. SENZI can sense spatial sound information comprehensively and precisely, so that the sensed (recorded) sounds can be suitably reproduced by auditory displays that are responsive to a listener’s movement, beyond place and time. Finally, we introduce a high-definition auditory display based on a fifth-order high-order Ambisonics (HOA) architecture, the highest order realized to date, implemented with a surrounding array of 157 loudspeakers. These systems are expected to be useful in realizing new and advanced communications systems providing a high sense of presence. Moreover, they are expected to be useful as experimental systems for accumulating new knowledge of human perception, which is crucially important for the advancement of communications.
Acknowledgments: Parts of this research are supported by the Tohoku University GCOE program CERIES, by a Grant-in-Aid for Specially Promoted Research (no. 19001004) to Suzuki from JSPS, and by SCOPE (no. 082102005) to Sakamoto from MIC Japan.

Saturday, October 9
10:00–11:00
PAPER SESSION 4: APPLICATIONS OF SPATIAL AUDIO

4-1
Applying Spatial Audio to Human Interfaces: 25 Years of NASA Experience—Durand R. Begault,1 Elizabeth M. Wenzel,1 Martine Godfroy,1,2 Joel D. Miller,1,3 Mark R. Anderson1,3
1Human Systems Integration Division, NASA Ames Research Center, Moffett Field, CA, USA
2San José State University Foundation, San José, CA, USA
3Dell Services-Perot Systems, Plano, TX, USA

From the perspective of human factors engineering, the inclusion of spatial audio within a human-machine interface is advantageous from several perspectives. Demonstrated benefits include the ability to monitor multiple streams of speech and non-speech warning tones using a “cocktail party” advantage, and support for aurally guided visual search. Other potential benefits include the spatial coordination and interaction of multimodal events, and the evaluation of new communication technologies and alerting systems using virtual simulation. Many of these technologies were developed at NASA Ames Research Center, beginning in 1985. This paper reviews examples and describes the advantages of spatial sound in NASA-related technologies, including space operations, aeronautics, and search and rescue. The work has involved hardware and software development as well as basic and applied research.
4-2
“GABRIEL”: Geo-Aware BRoadcasting for In-Vehicle
Entertainment and Localizability—Julian Villegas, Michael
Cohen, University of Aizu, Aizu-Wakamatsu, Japan
We have retrofitted a vehicle with location-aware advisories
/announcements, delivered via wireless headphones for passengers and “nearphones” or bone-conduction headphones for
the driver. Our prototype differs from other projects exploring spatialization of aural information: besides the commonly used landmarks to trigger audio stream delivery, our prototype
uses geo-located virtual sources to synthesize spatial soundscapes. Intended as a “proof of concept” and testbed for future
research, our development features multilingual tourist information, navigation instructions, and traffic advisories rendered
simultaneously.
Saturday, October 9
10:00 – 12:00
WORKSHOP 4
3-D Sound
Co-Chairs:
Kazuho Ono, Thomas Sporer
Panelists:
Diemer de Vries, Delft University of Technology
Kimio Hamasaki, NHK Science and Technology
Research Laboratories
Shiro Ise, Kyoto University
Toshiyuki Kimura, National Institute of Information
and Communications Technology
Frank Melchior, IOSONO GmbH
Sascha Spors, Deutsche Telekom Laboratories/TU Berlin
The current state of 3-D sound encompasses various ideas and technologies. This workshop gives an overview of the current state of these technologies. WFS, Ambisonics, and 22.2 ch audio, together with other innovative ideas such as focused sound sources and the boundary control method, will be presented and discussed.
Saturday, October 9
11:00
POSTER SESSION
P-1
Developing Common Attributes to Evaluate Spatial
Impression of Surround Sound Recording—Toru
Kamekawa, Atsushi Marui, Tokyo University of the Arts,
Tokyo, Japan
For the evaluation of spatial impression, several attributes are used. However, in more critical evaluations of spatial impression, such as comparisons of surround microphone settings, it is very difficult to share common meanings for each perceptual attribute. The authors attempted to elicit common attributes from surround sound recordings by a triadic elicitation procedure. Three attributes, “brightness,” “temporal separability,” and “spatial homogeneity,” were elicited. Pair-wise comparison was implemented to evaluate five different microphone placements for surround recordings using these attributes. ANOVA showed significant differences between microphone placements, and an effect of subjects’ individual differences, for all attributes. After removing the subjects who had circular triads and applying a cluster analysis procedure, 60 to 70 percent (depending on the attribute) of the professional subjects remained; this was more stable than the student subjects for all three attributes. This suggests that training is necessary for naïve listeners to share the same meanings of these attributes. Focusing on one of the elicited attributes related to spatial impression, “spatial homogeneity,” which showed significant differences between a pair of recording excerpts, the authors studied its correspondence with physical factors. D/R (the direct-to-reverb ratio) was calculated from running IACC, measured from a binaural recording of the music and filtered into third-octave bands. It is hypothesized that differences in mean D/R in a certain frequency band (around 3 kHz), measured between different listening positions, correspond to “spatial homogeneity.”
P-2
Spatial Audio Analysis Based on Perceptually Empirical Mode Decomposition—ChingShun Lin, YungCheng Chao, National Taiwan University of Science and Technology, Taipei, Taiwan

The goal of an immersion system is to create the illusion of proximity for people in different areas. To achieve this, it is essential to pick up and regenerate all the crucial visual and aural information that is perceptible by human senses. One of the important decisions in any audio processing system is the choice of appropriate features. The criteria may depend on how exactly the signals are represented and how easily the subsequent processing can be performed. In addition, it is more challenging if we would like to simulate the acoustical characteristics around a specific instrument based on a universal recording. The totally distinct spectral characteristics generated by the target instrument should dominate the recording, whereas the subtle surrounding audio cues must be preserved as well. In this paper we analyze such spatial audio and demonstrate how it can be systematically computed by perception-based empirical mode decomposition. As an example of our method, an analysis of a reference microphone signal and its counterpart simulating what a spot microphone would pick up near a percussive instrument is presented.

P-3
Standardization of PEAQ-MC: Extension of ITU-R BS.1387-1 to Multichannel Audio—Judith Liebetrau, Thomas Sporer, Sven Kämpf, Sebastian Schneider, Fraunhofer Institute for Digital Media Technology IDMT, Ilmenau, Germany

PEAQ (Perceptual Evaluation of Audio Quality) is a standardized algorithm for the objective measurement of perceived audio quality. It predicts the perceived audio quality of mono and stereo audio files as listeners would judge it in a subjective listening test according to ITU-R BS.1116-1. Unfortunately this prediction is not intended for multichannel material, such as 5.1 or beyond. Additionally, the quality estimation of the standard does not consider spatial artifacts. Members of ITU-R Working Party 6C are developing an extension toward multichannel compatibility, as well as an integration of the modeling of spatial artifact assessments. In this paper the current status of the standardization progress is illustrated. The concept of one of the three proposals currently under consideration is explained, and the results of a first verification test conducted by ITU-R are presented.

P-4
Improved ITD Estimation in Reverberant Environments—Gavin Kearney, Damien Kelly, Frank Boland, Trinity College, Dublin, Ireland

In this paper we present an improved model for Interaural Time Difference (ITD) estimation in reverberant environments. The Phase Transform (PHAT) weighting function, used in generalized cross-correlation, is investigated here for application to Interaural Cross-Correlation (IACC) measurements from binaural microphones. A binaural model is developed to accommodate the method, and demonstrations of its implementation in a real reverberant room show improved ITD estimation over non-weighted IACC, when compared to subjective listening results.
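The PHAT weighting the abstract refers to is the classic GCC-PHAT estimator (Knapp and Carter, 1976): whiten the cross-spectrum so that only phase contributes, then pick the lag of the correlation peak. A compact sketch applied to a binaural pair (the authors' surrounding binaural model is not reproduced):

```python
# GCC-PHAT interaural delay estimate for one frame of a binaural signal pair.
import numpy as np

def gcc_phat_itd(left, right, fs, max_lag_ms=1.0):
    n = 2 * max(len(left), len(right))
    cross = np.fft.rfft(left, n) * np.conj(np.fft.rfft(right, n))
    cross /= np.abs(cross) + 1e-12            # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n)
    max_lag = int(fs * max_lag_ms / 1000.0)
    cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])   # lags -max..+max
    return (np.argmax(cc) - max_lag) / fs     # delay in seconds
```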
P-5
Wave Front Reconstruction Using the Spatial Covariance
Matrices Method—Hiroki Hagiwara,1 Yoshinori Takahashi,1,2
Kazunori Miyoshi1
1Kogakuin University, Tokyo, Japan
2Tokyo Metropolitan College of Industrial Technology,
Tokyo, Japan
This paper describes a sound-field reproduction method, called the Co-Variance Method (CVM), that reproduces the spatial covariance matrix of the original sound field without information about sound-source locations. Multiple channels of reproduced sound in a listening area were generated from the signals recorded in the original field according to an adaptively optimized process, in which the quasi-Newton method is utilized to obtain the solution. Numerical simulation results confirmed that the method works well in principle, and consequently the reconstructed wavefronts are similar to the original ones. The relationship between the number of microphones or loudspeakers and the reproduction errors is investigated. It is shown that the phase error diminishes as the number of microphone and loudspeaker pairs increases beyond the number of positions set for the covariance analysis.
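Read abstractly, CVM is a covariance-matching problem, which can be prototyped directly. The sketch below is our reading of the abstract, with invented names and SciPy's L-BFGS-B standing in for the quasi-Newton optimizer; it assumes uncorrelated unit-power loudspeaker signals scaled by complex weights.

```python
# Toy covariance matching: choose loudspeaker weights so the covariance of
# the reproduced field at M control points approximates the original field's.
import numpy as np
from scipy.optimize import minimize

def match_covariance(R_target, G):
    """R_target: (M, M) original-field covariance; G: (M, L) plant matrix."""
    L = G.shape[1]

    def cost(w):
        W = w[:L] + 1j * w[L:]                    # complex weights, packed as reals
        R = (G * W) @ (G * W).conj().T            # reproduced-field covariance
        return np.sum(np.abs(R - R_target) ** 2)  # Frobenius-norm mismatch

    res = minimize(cost, np.zeros(2 * L), method="L-BFGS-B")
    return res.x[:L] + 1j * res.x[L:]
```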
P-6
Inverse Wave Propagation in Wave Field Synthesis—
Shoichi Koyama, Yusuke Hiwasaki, Ken’ichi Furuya, Yoichi
Haneda, NTT Cyber Space Laboratories, Musashino-Shi,
Tokyo, Japan
A combined method of wave field synthesis and inverse wave-field propagation that enables the recreation of sound images in front of a loudspeaker array is presented. By use of inverse wave propagation, the sound pressure at a virtual receiving plane, which lies in the backward direction of wave propagation from the actual receiving plane, can be estimated. This estimation is an analytical method based on the wave equation, as typically used in holography. Shifting the virtual receiving plane backward from the actual receiving plane corresponds to shifting the reconstruction plane forward from the secondary sources. As a result, this method has the advantage of being able to reproduce virtual primary sources in front of the secondary sources even when the position or directivity of the primary sources is unknown; this was not possible with the conventional focused-source technique. Numerical simulation results are also presented to show the efficacy of the proposed method.
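The holography-based estimation step can be illustrated with the standard angular-spectrum formulation: transform the measured pressure along the receiving plane into spatial frequencies and invert the propagation phase over the back-propagation distance. A 2-D sketch under our own simplifications (monochromatic field; evanescent components passed through unamplified to keep the sketch stable):

```python
# Angular-spectrum back-propagation of a measured pressure line by distance d
# toward the source (standard acoustical holography, not code from the paper).
import numpy as np

def back_propagate(p_line, dx, d, f, c=343.0):
    k = 2 * np.pi * f / c
    kx = 2 * np.pi * np.fft.fftfreq(len(p_line), dx)
    kz = np.sqrt(np.maximum(k**2 - kx**2, 0.0))   # propagating components only
    P = np.fft.fft(p_line)
    return np.fft.ifft(P * np.exp(1j * kz * d))   # undo the e^{-j kz d} propagation
```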
P-7
Sound Field Equalization by Active Acoustic Impedance
Control—Jyunji Hagio,1 Akihiro Kakiuchi,1,2 Akira Omoto1
1Kyushu University, Minami-ku, Fukuoka, Japan
2Currently with NHK, Tokyo, Japan
This study examined sound field equalization using active control. The specific acoustic impedance, i.e., the ratio of sound pressure to particle velocity, was adopted as the controlled quantity. Provided that the impedance in the aimed direction is controlled to equal the characteristic impedance of the medium, a plane wave propagating in that particular direction is expected. In addition, the directional characteristics of the sound propagation were also controlled by adjusting the weights of the ratios of pressure and velocity in three orthogonal directions. The results of numerical simulations indicated the potential efficiency of the proposed method. Furthermore, results of subjective experiments showed the possibility of the proposed control changing the perceived direction of the incoming sound.
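The controlled quantity is easy to state in code: per frequency, the specific acoustic impedance is the ratio of pressure to the particle-velocity component in the aimed direction, and the control target is the characteristic impedance ρc of the medium. A tiny sketch of the error such a controller would drive to zero (the controller itself is not shown; constants for air are assumed):

```python
# Deviation of the measured specific acoustic impedance from rho*c; zero
# deviation in the aimed direction corresponds to a propagating plane wave.
import numpy as np

def impedance_error(p_spec, v_spec, rho=1.21, c=343.0):
    """p_spec, v_spec: complex spectra of pressure and particle velocity."""
    return p_spec / v_spec - rho * c
```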
P-8
Microphone Configurations for Teleconference Application of Directional Audio Coding and Subjective Evaluation—Jukka Ahonen, Aalto University School of Science and Technology, Aalto, Finland

Directional Audio Coding (DirAC) is a spatial-sound processing technique in which the arrival direction and diffuseness of sound are analyzed in frequency bands from microphone signals, transmitted with one or multiple audio channels, and utilized for various purposes in synthesis. Among other applications, DirAC has been used in low bit-rate teleconferencing to provide spatial separation of remote talkers corresponding to reality. Here, the use of different microphone configurations, consisting of omnidirectional or directional microphones, is discussed for DirAC teleconferencing. The audio quality obtained with the different microphone techniques for DirAC teleconferencing is evaluated by subjective listening tests, and the results are presented in this paper.

P-9
Real-Time Tracking of Speech Sources Using Binaural Audio and Orientation Tracking—Marko Takanen, Matti Karjalainen, Aalto University School of Science and Technology, Espoo, Finland

This paper presents a method for real-time estimation of the directions of speech sources from captured binaural audio. Accurate direction estimates are required in order to embed the sound sources correctly into the auditory environment of the far-end user in telecommunication between two augmented reality audio (ARA) users. The dependency of the estimation accuracy on the orientation of the near-end user is avoided in this method by combining information from an orientation tracker with the direction estimates. Results from anechoic experiments illustrate that the presented method can estimate the direction(s) of non-simultaneous speech source(s) in real time, and that head movement improves the estimation accuracy for sources at the sides of the user.
P-10
Sound Field Reproduction Applied to Flight Vehicle Sound Environments—Cédric Camier, Philippe-Aubert Gauthier, Yann Pasco, Alain Berry, Université de Sherbrooke, Sherbrooke, Quebec, Canada and McGill University, Montreal, Quebec, Canada

This paper proposes a preliminary theoretical study of sound field and sound environment reproduction in flight vehicles. A fully coupled cavity, cylindrical shell, and exterior radiation model approximates an aircraft cabin mock-up. Material and geometry characteristics are inspired by measurements performed on a cabin mock-up. The sound field reproduction is based on reproduction error minimization at a microphone array positioned in the cavity. Two reproduction systems, based on actuators or loudspeakers, are simulated in order to compare their feasibility and performance. The model linking exciter strength to the pressure on the spatially extended array region is developed in matrix form. The promising results obtained in terms of reproduced pressure in the array region in both cases suggest the reliability of such dedicated systems.
P-11
Interactive Enhancement of Stereo Recordings Using
Time-Frequency Selective Panning—Maximo Cobos, Jose
J. Lopez, Universidad Politecnica de Valencia, Valencia, Spain
Localization of sounds in physical space plays a very important role in multiple audio-related disciplines, such as music, sound art, and sound editing for audiovisual productions. The best-known technique for providing such a spatial impression is stereo panning, which creates a virtual location of a sound source by distributing its energy between two independent channels during the mixing process. However, once all the sound events have been mixed, re-distributing source locations to widen the stereo image is certainly not an easy task. Motivated by this problem, we propose a source spatialization technique based on time-frequency processing of the stereo mixture. The energy distribution over the stereo panorama is modified according to a nonlinear warping function, providing a widening effect that enhances the stereo experience without degrading the sound quality, while preserving the original conception of the mixing engineer.
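The core idea is compact: estimate a panning index per time/frequency bin from the channel magnitudes, push it outward with a nonlinear warp, and re-pan the bin. The sketch below is our illustration with an invented power-law warp; the paper's warping function and resynthesis details differ.

```python
# Time-frequency selective widening of a stereo mix (illustrative warp only).
import numpy as np

def widen_stereo_bins(L, R, alpha=2.0):
    """L, R: complex STFT matrices of the stereo channels; alpha > 1 widens."""
    eps = 1e-12
    pan = (np.abs(R) - np.abs(L)) / (np.abs(L) + np.abs(R) + eps)  # -1 .. +1
    warped = np.sign(pan) * np.abs(pan) ** (1.0 / alpha)  # push bins outward
    theta = (warped + 1.0) * np.pi / 4.0      # constant-power pan angle, 0..pi/2
    mid = L + R                               # crude per-bin downmix
    return np.cos(theta) * mid, np.sin(theta) * mid
```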
P-12
On the Influence of Coding Method on Japanese Speech Intelligibility in Virtual 3-D Audio Space—Yosuke Kobayashi,1 Kazuhiro Kondo,1 Kiyoshi Nakagawa,1 Yukio Iwaya2
1Yamagata University, Yamagata, Japan
2Tohoku University, Miyagi, Japan

In this paper we investigated the influence of stereo coding on 3-D audio for Japanese speech. We encoded localized test samples using the joint stereo and parametric stereo modes of the HE-AAC encoder at identical coding rates. The Japanese word intelligibility test employed was the Japanese Diagnostic Rhyme Test (JDRT). First, we localized the speaker in front of the listener at an arbitrary distance a (1.00a). Next, we compared the effect of noise located at a distance of 0.25a from the listener, at each of the angles 15 degrees apart on the horizontal plane. The results showed that the target speech cannot be separated from the noise with either stereo coding when the noise is in front of the speaker, between azimuths of +30 deg. and –30 deg. However, at other azimuths, the intelligibility scores were far better. Stereo coding shows degraded intelligibility compared to the reference at all noise azimuths. However, joint stereo was shown to be consistently better than parametric coding, suggesting that the former is the stereo coding of choice for transmission of localized 3-D audio.
P-13
Acoustic Measurement System for 3-D Loudspeaker
Set-Ups—Andreas Silzle, Matthias Lang, Jose Angel Pineda
Pardo, Oliver Thiergart, Giovanni Del Galdo, Fraunhofer
Institute for Integrated Circuits IIS, Erlangen, Germany
For any reproducible listening set-up it is crucial to verify whether the reproduction system is operating properly. This is a tedious and cumbersome task, as state-of-the-art listening rooms are equipped with intricate signal processing chains and a large number of loudspeakers arranged at very specific positions. Building an automatic test system with adequate accuracy and reliability represents a rather challenging engineering problem. This paper presents a multi-loudspeaker test system that accomplishes this task, realized by carefully combining existing measurement techniques. A thorough validation of the developed test system indicates an accuracy in position of ±3 degrees, in distance of ±4 cm, and in magnitude response of ±1 dB for each loudspeaker in the listening room, with a total measurement time of less than 10 s per loudspeaker.
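As one concrete example of the measurements involved, loudspeaker distance follows from the time of flight of a measured impulse response; position and magnitude response come from analogous per-loudspeaker measurements. Illustrative fragment (our simplification; the validated system's processing is more careful):

```python
# Loudspeaker distance from impulse-response time of flight, taking the IR's
# absolute peak as the direct-sound arrival (system latency must be removed).
import numpy as np

def distance_from_ir(ir, fs, c=343.0, latency_samples=0):
    onset = np.argmax(np.abs(ir)) - latency_samples
    return onset * c / fs   # distance in meters
```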
P-14

Saturday, October 9
13:00–16:00
PAPER SESSION 5: SURROUND SOUND WITH HEIGHT

5-1
Influence of the Listening Position in the Perception of Elevated Sources in Wave-Field Synthesis—Jose J. Lopez,1 Maximo Cobos,1 Basilio Pueo2
1Universidad Politecnica de Valencia, Valencia, Spain
2Universidad de Alicante, Alicante, Spain

This paper describes some effects related to the perception of elevated sources in Wave-Field Synthesis using HRTF elevation cues. In this recently proposed hybrid system, the conventional WFS approach is used to achieve localization in the horizontal plane, whereas elevation effects are simulated by means of spectral elevation cues. Since the simulated HRTF cues are the same within the listening area, the height of the virtual source changes depending on the listening position. Thus, different listeners perceive different source heights, with a perception that changes as they move around the listening area. Experiments aimed at investigating this effect are presented.

5-2
Investigating Listeners’ Localization of Virtually Elevated Sound Sources—Sungyoung Kim, Masahiro Ikeda, Yusuke Ono, Akio Takahashi, Yamaha Corporation, Iwata, Shizuoka, Japan

The object of this study was to experimentally observe and compare the perceived directions of elevated sound sources in two conditions: one reproduced from a real loudspeaker, the other from a virtually manipulated loudspeaker using a newly proposed transaural crosstalk cancellation. A total of twelve listeners evaluated the perceived directions of various sound sources through direct estimation of azimuth and elevation angles. The results showed that virtually elevated sound sources were generally perceived as lower than physically elevated ones, possibly due to the discrepancy between the Head-Related Transfer Functions (HRTFs) used and the listeners’ own HRTFs. Subsequent analysis showed that localization in both conditions was influenced by the type and the bandwidth of the stimuli, yet not by whether or not a listener had a reference position.

5-3
Spatial Audio and Reverberation in an Augmented Reality Game Sound Design—Natasa Paterson, Fionnuala Conway, University of Dublin, Trinity College, Dublin, Ireland

In this paper we describe the sound design of a location-based mobile phone game and investigate the role of spatialized audio and reverberation. Particular attention was given to the effect of audio on the immersion and emotional engagement of participants. We present the sound design, implementation, and evaluation of our working prototype, Viking Ghost Hunt, and the results and interpretations obtained. Evaluation of the game was undertaken over a three-day period with the participation of 19 subjects, in which the prototype was evaluated for game immersion and engagement. Further tests of a subset of participants were undertaken in order to test the specific audio parameters of spatialized audio and reverberation and their effect on game immersion and engagement. The results indicated that audio, and specifically reverberation, plays an important role in immersing a player within the game space.

5-4
Perception of Sound Image Elevation in Various Acoustic Environments—Kentaro Matsui, Akio Ando, NHK Science & Technology Research Laboratories, Setagaya-ku, Tokyo, Japan

To investigate the discrimination threshold of sound image elevation for a three-dimensional sound system, we conducted subjective evaluation experiments using a paired comparison method. Experiments on the discrimination of sound image elevation with loudspeakers were done in three chambers, each of which had a different reverberation time. Experiments with a headphone system were also done, in which the stimuli were recorded binaurally. The experiments showed that (1) when the loudspeakers were set laterally to the subject, the elevation of the perceived sound image increased linearly with the elevation of the sound source; and (2) when the loudspeakers were set in front of or behind the subject, the perceptual resolution deteriorated as the sound source ascended. The experiments also indicated that there is no relation between the room reverberation time and perceptual resolution, and that the deterioration of perceptual resolution is conspicuous in headphone listening.
5-5
Evaluating Candidate Sets of Head-Related Transfer
Functions for Control of Virtual Source Elevation—
Hyun Jo,1 Youngjin Park,1 William Martens2
1KAIST, Daejeon, Korea
2The University of Sydney, Sydney, NSW, Australia
The performance of 3 candidate sets of generic Head-Related Transfer Functions (HRTFs) was evaluated in terms of differences in their ability to create distinct elevation angles for virtual sources located on a listener’s median plane. In comparison to directional errors observed when 6 listeners were presented with sources processed using their own individually measured Head-Related Impulse Responses (HRIRs), the errors associated with the 3 alternative candidate sets were tested for 7 virtual source target elevation angles on the median plane, spanning the upper hemifield with 30-degree resolution from front to back. One candidate set was a selected subject’s HRIRs that had been modified through a customization procedure designed to provide optimal elevation performance for that subject (the selected ‘representative’ subject denoted ‘CH’ in Hwang et al. [12]). Another candidate set was formed by taking the mean over the optimized HRIRs of 9 subjects resulting from that same customization procedure. The third set was generated by taking the mean over the HRIRs of 43 subjects found in the CIPIC database. Performance under identical conditions was also observed for sources convolved with the appropriate individually customized HRIR for each of the 6 listeners, and with a familiar KEMAR dataset serving as a reference. Localization performance was quantified using the 3 error metrics of absolute polar angle error, vertical polar angle error, and front-back confusion rate. Only in terms of front-back confusion rate was performance shown to be inferior for virtual sources processed using the mean of the HRIRs from the CIPIC dataset, compared to either those processed using the representative subject’s customized HRIRs or those using the mean of the customized HRIRs of 9 subjects. Although the study investigated virtual sources varying in elevation on the median plane only, the results may generalize to the whole upper hemifield.
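Two of the paper's three error metrics have simple formulations for median-plane targets, sketched below with standard conventions (polar angle 0° = front, 90° = above, 180° = rear); this is our bookkeeping, which may differ in detail from the paper's.

```python
# Mean vertical-polar angle error and front/back confusion rate for
# median-plane localization judgments, with angles in degrees.
import numpy as np

def median_plane_errors(target_polar_deg, response_polar_deg):
    t = np.asarray(target_polar_deg, dtype=float)
    r = np.asarray(response_polar_deg, dtype=float)
    polar_error = np.abs(r - t)
    confusion = ((t < 90) & (r > 90)) | ((t > 90) & (r < 90))
    return polar_error.mean(), confusion.mean()
```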
5-6
The Effect of Processing Two Overhead Channels on the Spatial Impression of Virtual Rooms Rendered in Eight-Channel Surround Sound—Wieslaw Woszczyk, Doyuen Ko, David Benson, Brett Leonard, McGill University, Montreal, Quebec, Canada

In eight-channel surround sound reproduction, two of the channels are elevated above the listener, and their signals are transformed to provide three listening conditions for the evaluation of spatial impression. The elevation channels are either unprocessed, convolved with a short impulse response of a dummy head, or convolved with a dummy head response and also equalized. The six horizontal channels remain the same for each of the overhead conditions, and the loudness of the overhead channels is calibrated to be equal for all three conditions. Two anechoic monophonic sound sources are convolved with eight-channel impulse responses previously measured in two large rooms using eight microphones, with the overhead information captured by a pair of bidirectional spaced-apart microphones angled 90° to each other and pointing diagonally upward. Listening tests conducted with ten expert subjects show a dependence of spatial impression (height, immersion, preference) on the nature of the overhead signals, on the program (sound source), and on the choice of room.

Saturday, October 9
13:00 – 15:00
WORKSHOP 5
Surround Recording for Music
Presenter: Akira Fukada, NHK

What is the goal of surround recording? This is the eternal question for all engineers working in this area. This workshop is expected to be a hands-on session demonstrating the recording process between musicians and recording engineers through the actual recording of a jazz trio performance. Participants can experience what happens during the recording process in the studio, the control room, or both.

Performers: Aaron Choulai (piano), Yasushi Fukumori (drums), Akiyoshi Shimizu (bass)

Saturday, October 9
15:00 – 16:00
WORKSHOP 6
New Theoretical Model of Sound Field Diffusion
Presenter: Toshiki Hanyu, Nihon University

A new theoretical model for quantitatively characterizing sound field diffusion, based on the scattering and absorption coefficients of walls, was developed. The concepts of equivalent scattering area, equivalent scatter reflection area, average scattering coefficient, and average scatter reflection coefficient are introduced in order to express the capability of all the walls in a room to scatter sound. Using these concepts and the mean free path, the scatter-to-absorption ratio, mean scatter time, and diffusion time are defined in order to evaluate the degree of diffusion of a space. Furthermore, the effect of spatially distributed scattering objects on sound field diffusion is formulated, as is the time variation of specular and scattered components in a room impulse response. Verification of these characterization methods was performed with computer simulations based on the sound-ray-tracing method, and the results supported the basic validity of the ideas presented. Because the method for measuring the scattering coefficient has already been defined by an ISO standard, it is possible to prepare a database of the coefficients gradually. One can therefore design the degree of sound field diffusion of a space by applying the equations presented.

Saturday, October 9
16:00–17:30
PAPER SESSION 6: SIGNAL PROCESSING AND CODING

6-1
Signal Models and Upmixing Techniques for Generating Multichannel Audio—Mark S. Vinton, Mark F. Davis, Charles Q. Robinson, Dolby Laboratories, San Francisco, CA, USA

Most systems for upmixing stereo content have traditionally used sums and differences of the source signals, an arrangement referred to as matrixing. Matrix-based upmixers have evolved from passive operation to sophisticated active matrix designs and have achieved widespread commercial use. This paper introduces a new algorithm for upmixing from two to five channels using a hybrid of both scale-factor and variable-matrix techniques. The algorithm applies equally well to both Lt/Rt-encoded and conventional stereo programs. It improves on traditional methods, providing better reproduction of the front sound stage, while anchoring center images for off-center listeners and creating more compelling ambience and envelopment. Data from subjective listening tests are provided to support these conclusions.
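For orientation, the passive matrixing that the abstract names as the historical starting point is a static sum/difference decode; the paper's hybrid scale-factor/variable-matrix algorithm is an adaptive refinement of this and is not shown.

```python
# Textbook passive 2-to-4 matrix decode of a stereo or Lt/Rt pair.
import numpy as np

def passive_upmix(lt, rt):
    center = 0.7071 * (lt + rt)     # sum signal feeds the center
    surround = 0.7071 * (lt - rt)   # difference signal feeds the surrounds
    return lt, rt, center, surround
```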
6-2
Multichannel Audio Coding Based on Minimum Audible Angles—Adrien Daniel,1,2 Rozenn Nicol,1 Stephen McAdams2
1Orange Labs, Lannion, France
2McGill University, Montreal, Quebec, Canada

The method described in this paper provides a scheme for encoding multichannel audio signals representing a spatial auditory scene, based on human sound perception in space. It relies on a psychoacoustic model based on measures of minimum audible angles (MAA) in the presence of distracting sound sources. A compression gain is obtained by truncating the order of representation of the auditory scene in the Higher-Order Ambisonics (HOA) domain according to the psychoacoustic model. Numerical simulations were conducted in order to link the error of representation of the field with the angular distortion of the apparent direction of the sound sources.
6-3
Three-Dimensional Sound Field Analysis with Directional
Audio Coding Based on Signal Adaptive Parameter
Estimators—Oliver Thiergart, Giovanni Del Galdo, Magdalena
Prus, Fabian Kuech, Fraunhofer Institute for Integrated
Circuits, IIS, Erlangen, Germany
Directional audio coding (DirAC) provides an efficient description of spatial sound in terms of an audio downmix signal and
parametric side information, namely the direction of arrival and
diffuseness of sound. The sound scene can be reproduced
based on this information with any audio reproduction system
such as multichannel playback or binaural rendering. The input to the DirAC analysis is a set of acoustic signals, e.g., captured by a microphone array. The accuracy of the DirAC parameter estimation can suffer from a low signal-to-noise ratio (SNR) and high temporal variance of the input signals. To handle these problems, this contribution proposes signal-adaptive parameter estimators that increase the estimation accuracy by considering the SNR and the stationarity interval of the input. Simulations show that the DirAC analysis is significantly improved.
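The parameters being estimated are the standard DirAC quantities (Pulkki): direction from the active intensity vector and diffuseness from the ratio of net intensity to energy density per time/frequency bin. The baseline estimators are sketched below with the physical scaling constants and the temporal averaging omitted; the paper's contribution, adapting that averaging to SNR and stationarity, is not reproduced.

```python
# Baseline DirAC parameter estimation from B-format STFT bins (illustrative;
# scaling constants and time averaging are omitted for brevity).
import numpy as np

def dirac_parameters(W, X, Y, Z):
    I = np.stack([np.real(np.conj(W) * V) for V in (X, Y, Z)])  # ~active intensity
    E = 0.5 * (np.abs(W)**2 + (np.abs(X)**2 + np.abs(Y)**2 + np.abs(Z)**2) / 3.0)
    azimuth = np.arctan2(I[1], I[0])
    elevation = np.arctan2(I[2], np.hypot(I[0], I[1]))
    diffuseness = 1.0 - np.linalg.norm(I, axis=0) / (E + 1e-12)
    return azimuth, elevation, diffuseness
```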
Saturday, October 9
16:00 – 17:30
WORKSHOP 7
Mixing and Production of Spatial Audio
Mixing with Perspective
Presenter: Florian Camerer, ORF, Austria
Concerts of classical music, as well as operas, have been a part of
broadcast programing since the beginning of television. The aesthetic
relationship between sound and picture plays an important part in the
satisfactory experience of the consumer. The question how far the audio
perspective (if at all!) should follow the video angle (or vice versa) has
always been a subject of discussion among sound engineers and
producers. In the course of a diploma work this aspect has been
investigated systematically. One excerpt of the famous New Year's
Concert (from 2009) has been remixed into four distinctly different
surround sound versions. Close to 80 laymen who expressed an interest
in classical music had the task of judging these versions to the same
picture if they found the audio perspective appropriate to the video or not.
A Disruptive Production Tool i nthe Workflow of Spatial Audio
Masters
Presenter: Wilfried Van Baelen, Galaxy Studios, Belgium
evaluate the angles of octave-band signals. The subjects
changed the panning angle until the real sound source and virtually panned source were coincident. A spatial blurring can be
measured by examining the differences of the panning angles
perceived with respect to each band. The listening tests show
that the triplet panning method has better performance than vertical panning in view of perceptual localization and spatial blurring at both on-axis and off-axis positions.
7-2
This presentation will highlight the many practical and technical (sometimes hard-to-remedy) issues that have to be solved to bring spatial audio formats to the market. Content owners, audio engineers, and consumers each have their own specific requirements, and they have become very demanding when it comes to compatibility combined with high-quality audio. Drawing on his many years of pioneering work in the audio and film industry, the presenter will share the vision, strategy, and experience that led to the development of a unique production tool allowing an easy workflow for creating spatial, discrete 3-D Sound Masters that are backward compatible with existing mainstream standards and formats without any audible concessions.
Saturday, October 9
18:00
SPECIAL EVENT: BANQUET
Dinner Cruise—Japanese Cuisine
in a Japanese Houseboat
SUNDAY, OCTOBER 10: ALL EVENTS TAKE PLACE
AT TOKYO UNIVERSITY OF THE ARTS
Sunday, October 10
09:00–10:00
KEYNOTE ADDRESS 3
Space Concept in the Contemporary Music
Keynote Speaker: Mikako Mizuno, Nagoya City University
This presentation discusses the idea of space in contemporary music. The term "contemporary music" is used here in a limited sense: music in the avant-garde style, especially that created by composers in the second half of the twentieth century.
The relationship between space and music can be discussed only when composers have some technical method for realizing their spatial ideas. The main example presented is "Tekkokan," produced for the 1970 World's Fair by Toru Takemitsu, an up-and-coming composer at that time. His spatial idea was realized through the architectural design and a huge loudspeaker system controlled in an original way. Further examples that realize musical ideas as architecture will also be discussed, including pieces by Iannis Xenakis and Luigi Nono.
Sunday, October 10
10:00–12:00
PAPER SESSION 7: PERCEPTION AND EVALUATION
OF SPATIAL AUDIO—PART 2
7-1
Perceptual Localization of a Phantom Sound Image
for Ultrahigh-Definition TV—Young Woo Lee, Sunmin Kim,
Samsung Electronics Co., Ltd., Suwon, Gyeonggi-do, Korea
This paper presents a study of the perceived localization of a phantom sound image for ultrahigh-definition TV with respect to various loudspeaker configurations: two horizontal, two vertical, and triplet loudspeakers. A vector base amplitude panning (VBAP) algorithm, modified for non-equidistant loudspeaker setups, is applied to create the phantom sound image. In order to study the localization performance in real situations, listening tests were conducted at on-axis and off-axis positions in normal listening rooms. A method of adjustment that reduces the ambiguity of the perceived angle is exploited to evaluate the angles of octave-band signals. The subjects changed the panning angle until the real sound source and the virtually panned source were coincident. Spatial blurring can be measured by examining the differences of the panning angles perceived in each band. The listening tests show that the triplet panning method outperforms vertical panning in terms of perceptual localization and spatial blurring at both on-axis and off-axis positions.
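As background on the panning method, here is a minimal sketch of standard vector base amplitude panning for one loudspeaker triplet (after Pulkki); the triplet geometry is hypothetical, and the paper's modification for non-equidistant loudspeaker setups is not shown.

    import numpy as np

    def vbap_triplet_gains(source_dir, speaker_dirs):
        """Gains for a phantom source inside a loudspeaker triplet.
        source_dir: unit vector toward the source, shape (3,).
        speaker_dirs: rows are the three loudspeaker unit vectors, shape (3, 3)."""
        L = np.asarray(speaker_dirs, dtype=float)
        g = np.linalg.solve(L.T, np.asarray(source_dir, dtype=float))  # p = L^T g
        if np.any(g < -1e-9):
            raise ValueError("source lies outside this triplet")
        return g / np.linalg.norm(g)   # normalize for constant power

    # Hypothetical triplet: left/right at +/-30 degrees plus one height speaker
    def sph(az_deg, el_deg):
        az, el = np.radians(az_deg), np.radians(el_deg)
        return np.array([np.cos(el)*np.cos(az), np.cos(el)*np.sin(az), np.sin(el)])

    spk = np.stack([sph(30, 0), sph(-30, 0), sph(0, 60)])
    print(vbap_triplet_gains(sph(10, 20), spk))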
7-2
Speech Intelligibility in Teleconference Application of Directional Audio Coding—Jukka Ahonen, Ville Pulkki, Aalto University School of Science and Technology, Aalto, Finland
Directional Audio Coding (DirAC), a method to parametrize the directional sound field, can be applied to low-bit-rate teleconferencing. The direction and diffuseness of the sound field are analyzed from microphone signals within frequency bands at one end, transmitted to the other end as metadata alongside a single audio channel, and used to reproduce spatial sound. In this paper, one- and two-dimensional arrays of omnidirectional microphones that provide the input signals for DirAC teleconferencing are reviewed. A listening test measuring speech intelligibility in DirAC teleconferencing was conducted with both one- and two-dimensional microphone arrays, and its results are presented.
7-3
Evaluation of a Speech-Based and Binaural Speech Transmission Index—Anton Schlesinger,1 Juan-Pablo Ramirez,2 Marinus M. Boone1
1Delft University of Technology, Delft, The Netherlands
2Deutsche Telekom Laboratories, Berlin Institute of Technology, Berlin, Germany
A speech-based and binaural Speech Transmission Index is presented and evaluated in a variety of acoustical degradations and spatial conditions. The proposed method facilitates the assessment of speech intelligibility in classical room acoustics and electroacoustics by simply comparing a binaural speech recording made in adverse conditions with its clean original. Both the binaural processing stage and the speech-based Speech Transmission Index method are effective and computationally fast realizations. The central part of the binaural processor is a cross-correlation stage designed to replicate psychoacoustic data on binaural interaction. Supplemented with the head-shadow effect, which is generated in a "better-ear" fashion, a fair amount of the binaural advantage in speech intelligibility is modeled. The method was evaluated in a battery of listening tests incorporating different degradations, e.g., stationary and fluctuating noise, a set of nonlinear signal alterations including a speech-enhancement processor, and a multitude of spatial configurations with different room acoustics and up to four interferers. The objective method offers a stable prediction of the subjective results in binaural speech intelligibility throughout most of the linear degradations. However, the full binaural advantage is not captured by the current implementation, which suggests further research.
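To make the cross-correlation stage concrete, a minimal sketch of a normalized interaural cross-correlation evaluated over physiologically plausible lags follows; it illustrates the kind of binaural front end such a method builds on, not the authors' exact implementation.

    import numpy as np

    def iacc(left, right, fs, max_lag_ms=1.0):
        """Peak of the normalized interaural cross-correlation within +/-1 ms."""
        max_lag = int(fs * max_lag_ms / 1000)
        denom = np.sqrt(np.sum(left**2) * np.sum(right**2)) + 1e-12
        full = np.correlate(left, right, mode='full') / denom
        center = len(right) - 1                              # zero-lag index
        return np.max(np.abs(full[center - max_lag : center + max_lag + 1]))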
7-4
Listening and Conversational Quality of Spatial Audio
Conferencing—Alexander Raake, Claudia Schlegel, Matthias
Geier, Jens Ahrens, Deutsche Telekom Laboratories, TU
Berlin, Berlin, Germany
We present the results of a listening test and a conversation test on the quality of spatial and non-spatial audio conferences. To this aim, we have developed conversation test scenarios for audio conferences with three remote participants, in order to carry out quality evaluation tests for audio conferences that are comparable with similar scenarios for traditional one-to-one telephone conversation assessment. We have applied the test scenarios during a conversation test to (i) validate the test
scenarios, (ii) measure, in a realistic usage context, the advantages of spatial versus non-spatial audio conferencing and relate them to the quality impact of the transmitted speech bandwidth, and (iii) provide recordings of conferences for later use in listening tests. In the conversation test we compared different bandwidths (narrowband/NB, 300-3400 Hz; wideband/WB, 50-7000 Hz; full-band/FB, 20-22000 Hz), spatial versus non-spatial headphone-based rendering, and channels with and without talker echo. In a subsequent listening test using the recorded conferences, we assessed the quality of spatial and non-spatial audio conferencing in more detail, including aspects such as speaker identification and memory.
Sunday, October 10
10:00 – 12:00
WORKSHOP 8
The Art and Practice of Multichannel Field Recording
Chair:
Charles Fox, University of Regina
Panelists: Florian Camerer, ORF
Yasuo Hijikata, Field recordist
Mick Sawaguchi, Mick Sound Lab.
Carrying their audio kit in a backpack, sound recordists have been
trekking into the wild, meeting the unique demands of multichannel
field recording in the natural environment with creativity and skills that
continue to push the boundaries of sound recording. The extreme
weather and remote locations have not prevented these adventurous
audio practitioners from developing and experimenting with recording
methods, achieving high quality results that provide unique, engaging
sonic experiences for audiences. The "Art and Practice of Multichannel Field Recording" panelists will share their knowledge of multichannel, immersive location recording research and methods, an invaluable part of creating the three-dimensional audio experience.
Sunday, October 10
12:00
SPECIAL EVENT
The Surround-Scape Lunchtime Concert
Presenters: Shiro Murakami, Tokyo University of the Arts
Mick Sawaguchi, Mick Sound Lab.
What Is "Surround-Scape"?
You may have heard of the term "soundscape"; surround-scape is related to that. The concept is a combination of nature surround sound and music. Mick Sawaguchi has been recording surround sound out in nature since 2000, concentrating on specific themes such as wind, waves, forest sounds, or the sounds of the seasons. A careful selection of these recordings forms the basis for musicians who play live while hearing them, combining composed elements and improvisation, all inspired by the natural surround ambiences. Surround-scape will let you feel the "Power of the Earth" and give you a "Hug by Nature"!
The lunch concert will consist of a few of Sawaguchi's surround clips together with piano music played by Shiro Murakami, a fourth-year student at Tokyo University of the Arts. The venue will have a surround loudspeaker setup around the seating area, with the piano placed in the middle of the studio. Enjoy a peaceful atmosphere and feel a hug by nature surround sound after lunch!
Sunday, October 10
13:00–14:00
PAPER SESSION 8: MICROPHONE AND MIXING TECHNIQUES
8-1
Sound Field Extrapolation: Inverse Problems, Virtual Microphone Arrays, and Spatial Filters—Philippe-Aubert Gauthier, Cédric Camier, Yann Pasco, Éric Chambatte, Alain Berry, Université de Sherbrooke, Sherbrooke, Quebec, Canada and McGill University, Montreal, Quebec, Canada
Sound field extrapolation is useful for the measurement, description, and characterization of sound environments and sound fields that must be reproduced using spatial sound systems such as Wave Field Synthesis, Ambisonics, etc. In this paper two methods are compared: inverse problems and virtual microphone arrays with filtering in the cylindrical harmonics domain. The goal was to identify methods that can accommodate various non-uniform sensor arrays (i.e., non-array-specific methods) and that are less sensitive to measurement noise. According to the results presented in this paper, the method based on the inverse problem with Tikhonov regularization appears less sensitive to measurement noise.
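For readers unfamiliar with the inverse-problem formulation, a minimal sketch: given a propagation matrix G from candidate sources to the microphone array and measured pressures p at one frequency, Tikhonov regularization trades the residual against the solution energy. The names and the regularization parameter are illustrative.

    import numpy as np

    def tikhonov_extrapolation(G, p, lam=1e-2):
        """Solve min_q ||G q - p||^2 + lam ||q||^2 via the normal equations.
        G: (mics x sources) complex propagation matrix, p: measured pressures.
        The returned q can then predict the field at unmeasured positions."""
        GH = G.conj().T
        return np.linalg.solve(GH @ G + lam * np.eye(G.shape[1]), GH @ p)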
8-2
A Spherical Microphone Array for Synthesizing Virtual Directive Microphones in Live Broadcasting and in Postproduction—Angelo Farina,1 Andrea Capra,1 Lorenzo Chiesi,1 Leonardo Scopece2
1University of Parma, Parma, Italy
2RAI CRIT Research and Technology Innovation Center, Turin, Italy
The paper describes the theory and the first operational results of a new multichannel recording system based on a 32-capsule spherical microphone array. Up to 7 virtual microphones can be synthesized in real time, dynamically choosing the directivity pattern (from cardioid to 6th-order ultradirective) and the aiming. A graphical user interface allows the virtual microphones to be moved over a 360-degree video image. The system employs a novel mathematical theory for computing the matrix of massive FIR filters, which are convolved in real time and with small latency thanks to a partitioned convolution processor.
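Conceptually, each virtual microphone is the sum of all capsule signals convolved with dedicated FIR filters, as sketched below; computing those filters and running them at low latency with partitioned convolution is the substance of the paper and is omitted here.

    import numpy as np
    from scipy.signal import fftconvolve

    def synthesize_virtual_mics(capsules, fir_matrix):
        """capsules: list of capsule signals (equal lengths assumed);
        fir_matrix[v][i]: precomputed FIR filter (equal lengths assumed)
        from capsule i to virtual microphone v."""
        return [sum(fftconvolve(x, h) for x, h in zip(capsules, row))
                for row in fir_matrix]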
Sunday, October 10
13:00 – 14:30
WORKSHOP 9
New Spatial Audio Coding Methods Based on Time-Frequency Processing
Chair: Ville Pulkki, Aalto University
Panelists: Jean-Marc Jot, DTS, Inc.
Juha Vilkamo, Fraunhofer IIS
The time-frequency resolution of human hearing has long been taken into account in perceptual audio codecs. Recently, the spatial resolution of human hearing has also been exploited in time-frequency processing. This has already led to new methods for representing multichannel audio files and B-format recordings parametrically in the time-frequency domain, yielding a drastic decrease in bit rate and, in the reproduction of B-format recordings, increased audio quality compared to traditional methods. This workshop covers the capabilities and limitations of human spatial hearing and the audio techniques that exploit these features. Typically the techniques are based on estimating directional information for each auditory frequency channel, which is then used in further processing.
Sunday, October 10
14:00–16:30
PAPER SESSION 9: SPATIALIZATION AND REVERBERATION
9-1
Binaural Reverberation Using Two Parallel Feedback
Delay Networks—Fritz Menzer, Ecole Polytechnique Fédérale
de Lausanne, Lausanne, Switzerland
Binaural reverberators often use a tapped delay line to model
early reflections, to which a delayed late reverberation tail is
added. This paper introduces a reverberator structure where
the early reflections are modeled using a feedback delay network (FDN) and the late reverberation is modeled by a second
FDN in parallel. The impulse responses of both FDNs are
highly overlapping, simulating the presence of diffuse reverberation from the beginning of the impulse response. A particular
feature of this reverberator is the reproduction of first- and second-order reflections using only head-related transfer functions (HRTFs) for the directions of the first-order reflections, which reduces the computational complexity.
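As a point of reference, a single feedback delay network with an orthonormal mixing matrix can be sketched as below; the paper runs two such FDNs in parallel (early and late) with HRTF filtering, which this sketch omits. Delay lengths and the feedback gain are illustrative.

    import numpy as np

    def fdn(x, fs, delays_ms=(29.7, 37.1, 41.1, 43.7), g=0.85):
        """Minimal mono FDN with a 4x4 orthonormal Hadamard feedback matrix."""
        x = np.asarray(x, dtype=float)
        delays = [max(1, int(fs * d / 1000)) for d in delays_ms]
        A = 0.5 * np.array([[1, 1, 1, 1], [1, -1, 1, -1],
                            [1, 1, -1, -1], [1, -1, -1, 1]])   # orthonormal
        bufs = [np.zeros(d) for d in delays]
        idx = [0] * 4
        y = np.zeros(len(x))
        for n in range(len(x)):
            outs = np.array([bufs[i][idx[i]] for i in range(4)])  # delay outputs
            y[n] = outs.sum()
            fb = g * (A @ outs)               # mixed, attenuated feedback (g < 1)
            for i in range(4):
                bufs[i][idx[i]] = x[n] + fb[i]
                idx[i] = (idx[i] + 1) % delays[i]
        return y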
9-2
Space Builder: An Impulse Response-Based Tool for
Immersive 22.2 Channel Ambiance Design—Wieslaw
Woszczyk, Brett Leonard, Doyuen Ko, McGill University,
Montreal, Quebec, Canada
The convolution-based Space Builder employs segments of impulse responses to construct flexible spatial designs using an intuitive graphic interface. The system uses multiple low-latency convolution engines loading data from a library of multichannel impulse responses, a 128-channel MADI router and mixer operating at 24/96 resolution, and a MIDI controller. The controller reveals different levels of complexity depending on the needs of the user. The design, architecture, and functionality of the modules and the system are described. The system provides spatial up-conversion from 1 to 24 channels and has the capacity to expand the number of channels beyond 22.2. Future applications for the system, including live sound applications, will be presented.
9-3
Sparse Frequency-Domain Reverberator—Juha Vilkamo, Bernhard Neugebauer, Jan Plogsties, Fraunhofer IIS, Erlangen, Germany
Numerous applications require realistic and computationally efficient late reverberation. In this paper the perceptually relevant properties of reverberation are identified, and a novel frequency-transform-domain reverberator that fulfills these properties is proposed. Listening test results confirm that the proposed reverberator has a perceptual quality equivalent to the ideal solution of decaying Gaussian noise in frequency bands.
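The "ideal solution" used as the listening-test reference can be sketched as an impulse response of Gaussian noise that is band-filtered and given a separate exponential decay per frequency band; the band edges and RT60 values below are illustrative.

    import numpy as np

    def decaying_gaussian_ir(fs=48000, length_s=1.5,
                             band_edges=((0, 500), (500, 2000), (2000, 24000)),
                             rt60_s=(1.2, 0.9, 0.6)):
        """Reference reverberation tail: per-band decaying Gaussian noise."""
        n = int(fs * length_s)
        t = np.arange(n) / fs
        f = np.fft.rfftfreq(n, 1.0 / fs)
        ir = np.zeros(n)
        for (lo, hi), rt60 in zip(band_edges, rt60_s):
            spec = np.fft.rfft(np.random.randn(n))
            spec[(f < lo) | (f >= hi)] = 0.0                # isolate the band
            band = np.fft.irfft(spec, n)
            ir += band * 10.0 ** (-3.0 * t / rt60)          # -60 dB at t = RT60
        return ir / (np.max(np.abs(ir)) + 1e-12)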
9-4
Theoretical Study and Numerical Analysis of 3-D Sound Field Reproduction System Based on Directional Microphones and Boundary Surface Control—Toshiyuki Kimura, National Institute of Information and Communications Technology, Koganei, Tokyo, Japan
Three-dimensional sound field reproduction using directional microphones and wave field synthesis can synthesize wave fronts in a listening area using directional microphones and loudspeakers placed at the boundary of the area; in this technique the loudspeakers occupy the same positions as the microphones. This makes it very difficult to construct an audio-visual virtual reality system, because the screen or display of the visual system cannot be placed at the position of the loudspeakers. In order to reproduce the 3-D sound field of the listening area even when the loudspeakers are not placed at its boundary, this paper proposes a 3-D sound field reproduction system using directional microphones and boundary surface control. Results of a computer simulation show that the proposed system reproduces the 3-D sound field in the listening area more accurately than the conventional system.
9-5
Creation of Binaural Impulse Responses for High-Presence Reproduction—Takashi Nishi,1 Shintarou Hijii2
1University of Kitakyushu, Fukuoka, Japan
2Kyushu Regional Police Bureau Info-Communications
Division, Fukuoka, Japan
Geometrical acoustic simulation is time consuming and can only produce fixed impulse responses. A new system to create binaural impulse responses whose acoustical characteristics can be arbitrarily specified is proposed. Psychologically important parameters such as reverberation time, clarity index, and interaural cross-correlation can be controlled independently and over a wide range. The fundamental procedures to realize this system are described. Simulation experiments were conducted to verify the validity and effectiveness of the algorithm.
Sunday, October 10
14:30 – 15:30
WORKSHOP 10
The Importance of the Direct to Reverberant Ratio in the Perception of Distance, Localization, Clarity, and Envelopment
Presenter: David Griesinger
The Direct to Reverberant ratio (D/R), the ratio of the energy in the first wave front to the reflected sound energy, is absent from most discussions of room acoustics. Yet only the direct sound (DS) provides information about the localization and distance of a sound source. This presentation discusses how the perception of DS in a reverberant field depends on the D/R and on the time delay between the DS and the reverberant energy. Threshold data for DS perception will be presented, and the implications for listening rooms, hall design, and electronic enhancement will be discussed. We find that both clarity and envelopment depend on DS detection. In listening rooms the direct sound must be at least equal to the total reflected energy for accurate imaging. As the room becomes larger (and the time delay increases) the threshold goes down. Some conclusions: typical listening rooms benefit from directional loudspeakers, small concert halls should not have a shoe-box shape, early reflections need not be lateral, and electroacoustic enhancement of late reverberation may be vital in small halls.
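For a concrete handle on the quantity, the D/R of a measured impulse response can be estimated by splitting the response at the end of an assumed direct-sound window, as in this sketch (the window length is an assumption):

    import numpy as np

    def direct_to_reverberant_db(ir, fs, direct_ms=2.5):
        """Energy of the first wavefront vs. all later reflected energy, in dB."""
        onset = int(np.argmax(np.abs(ir)))         # approximate direct arrival
        split = onset + int(fs * direct_ms / 1000)
        e_direct = float(np.sum(ir[:split] ** 2))
        e_reverb = float(np.sum(ir[split:] ** 2))
        return 10.0 * np.log10(e_direct / (e_reverb + 1e-12))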
Sunday, October 10
16:30–18:00
PAPER SESSION 10: MEASUREMENT AND ANALYSIS OF 3-D SOUND
10-1
Design and Implementation of an Interactive Room Simulation for Wave Field Synthesis—Frank Melchior,1 Christoph Sladeczek,2 Andreas Partzsch,1 Sandra Brix2
1IOSONO GmbH, Erfurt, Germany
2Fraunhofer IDMT, Ilmenau, Germany
This paper describes a novel concept and implementation of a reproduction-system-independent room simulation. The system is based on circular array measurements that deliver impulse responses of high spatial resolution. Using these data, a reproduction-system-independent interaction method has been developed. A detailed description of the system and its interaction methods will be given; furthermore, the first implementation of the system for Wave Field Synthesis will be described in detail.
10-2
Characteristics of Near-Field Head-Related Transfer
Function for KEMAR—Guang-Zheng Yu, Bo-Sun Xie, Dan
Rao, South China University of Technology, Guangzhou,
China
A spherical dodecahedron sound source was designed to approximate a point source. The resulting source is approximately omnidirectional below 10 kHz, the multiple scattering caused by the source is negligible, and its low-frequency characteristics are improved. Using this source, a near-field head-related transfer function (HRTF) database for a Knowles Electronics Manikin for Acoustic Research (KEMAR) has been established. Based on the database, characteristics of the near-field HRTFs were analyzed and compared with those of the far-field HRTFs in the frequency and time domains, respectively. Finally, the variations of the binaural localization cues, including the interaural level difference (ILD) and the interaural time difference (ITD), with source distance were analyzed, so as to evaluate the distance
localization cues contained in the near-field HRTFs.
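As an illustration of how such cues are extracted, broadband ILD and ITD can be estimated from a measured head-related impulse response pair roughly as follows (a sketch, not the authors' procedure):

    import numpy as np

    def ild_itd(hrir_l, hrir_r, fs):
        """Broadband ILD (dB) and ITD (s) from one HRIR pair."""
        ild = 10.0 * np.log10(np.sum(hrir_l**2) / (np.sum(hrir_r**2) + 1e-12))
        xc = np.correlate(hrir_l, hrir_r, mode='full')
        lag = int(np.argmax(np.abs(xc))) - (len(hrir_r) - 1)
        itd = lag / fs          # positive lag: left response delayed vs. right
        return ild, itd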
10-3
Application of Synchronous Averaging Method
in Frequency Domain for Asynchronous System to Sound
Distance Measurement—Hiroshi Koide, Qiusheng Xie,
Kouichi Tsuchiya, Tomohiko Endo, Akira Ebisawa, Shokichiro
Hino, Etani Electronics Co., Ltd., Tokyo, Japan
In this paper we first introduce the principle of synchronous averaging in the frequency domain, which is applicable to asynchronous systems where the transmitting and receiving sides are not synchronized. We then discuss the possibility of applying synchronous averaging to measuring the distance of real or virtual sound sources. If the position of real or virtual sound sources in asynchronous systems can be identified, there is no need for wiring between the transmitting and receiving sides, which works well in applications requiring measurement at two or more points over a large space. Here, we identify the virtual sound source in asynchronous systems by using an adjacent-four-points method.
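The principle can be sketched as follows: each received block is phase-aligned to a reference block in the frequency domain, with the lag found by cross-correlation, before the spectra are averaged; the deterministic part then adds coherently while uncorrelated noise averages out. This illustrates the idea, not the paper's exact method.

    import numpy as np

    def averaged_spectrum_async(blocks):
        """Frequency-domain synchronous averaging for an asynchronous system.
        blocks: list of equal-length real signal blocks from repeated excitations."""
        n = len(blocks[0])
        R = np.fft.rfft(blocks[0])                  # reference spectrum
        k = np.arange(len(R))
        acc = np.zeros_like(R)
        for b in blocks:
            B = np.fft.rfft(b)
            lag = int(np.argmax(np.fft.irfft(R * np.conj(B), n)))  # circular lag
            acc += B * np.exp(-2j * np.pi * k * lag / n)   # undo the time offset
        return acc / len(blocks)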
Sunday, October 10
15:30 – 18:00
WORKSHOP 11
In Pursuit of Spatialized Sound in Games
Co-chairs: Steven P. Martz, THX Ltd.
Kazutaka Someya, Art Spirits Inc.
Panelists: Kanako Kakino, Namco Bandai Games Inc.
Tetsukazu Nakanishi, Namco Bandai Games Inc.
Masayuki Sato, Square Enix Co. Ltd.
Atsushi Suganuma, Square Enix Co., Ltd.
Kazuya Takimoto, Capcom Co., Ltd.
Recently it has become possible to create highly interactive surround environments in games, thanks to the rapid technological advancement of the latest game consoles. Now we can enjoy high-quality surround sound with a very realistic experience.
We usually create spatialized sound by mixing individual sound elements and use sound effects to fit them to each scene of the game in real time as it progresses. In order to accomplish that, we have been trying to combine various raw sounds and make full use of current technology. We, game audio producers, would like to create a sense of reality in spatialized sound; we think this is a very important role for us.
Therefore, in this workshop, we would like to focus on the "pursuit of spatialized sound in games" and take a good look at various examples of spatialized sound that are based on current technological approaches and techniques. We will also examine a range of issues and perspectives in game audio. All of our presenters are well-known game-audio creators with various backgrounds who work in Japan, the USA, and Europe. Each panelist will give an original presentation related to their own culture and background. We plan to have a discussion as well as a Q&A session with the audience at the end of the workshop, hoping to enrich their view of game audio.
Sunday, October 10
18:00
CLOSING REMARKS