Research & Development
White Paper
WHP 324
September 2016
From Clean Audio to Object Based Broadcasting
Mike Armstrong
This paper reviews the history of the “clean audio” concept, the nature of the
proposals and the limited scope of the work done in this area. It then examines
the issues that would have to be investigated in order to justify the creation of a
specific clean audio access service, using the development of the television
audio description service as a comparison. It then looks at the role of speech,
sound effects and music in television sound and how they combine to create the
complete narrative and what may happen if the sound is remixed by an automatic
process. It also lists all the other factors which cause problems for our audience
when trying to follow the speech content in television programmes. It then goes
on to look at the progress made in recent years in providing alternative mixes as
part of work on object based broadcasting, an approach which can
provide a wider set of tools for the personalisation of content.
Additional key words: hearing impairment, deaf, television, radio, music, sound
effects, background sound, object based audio.
White Papers are distributed freely on request.
Authorisation of the Chief Scientist or General Manager
is required for publication.
© BBC 2016. All rights reserved. Except as provided below, no part of this document may be
reproduced in any material form (including photocopying or storing it in any medium by electronic
means) without the prior written permission of BBC except in accordance with the provisions of the
(UK) Copyright, Designs and Patents Act 1988.
The BBC grants permission to individuals and organisations to make copies of the entire document
(including this copyright notice) for their own internal use. No copies of this document may be
published, distributed or made available to third parties whether by paper, electronic or other
means without the BBC's prior written permission. Where necessary, third parties should be
directed to the relevant page on BBC's website at for a copy of
this document.
It is well known that background sounds reduce the ability to comprehend speech and that this
effect is greater for those who have some form of hearing loss. With changing population
demographics the proportion of TV audiences with age related hearing loss is going to increase, so
the proportion of our audience affected will increase. For broadcasters, a tension can arise
between the desire to create rich multilayered soundtracks and the ability of sections of the
audience to follow the speech free of background sounds. However, the issues that affect whether
our audience can follow the content of a programme are more complex than just the
comprehension of speech in noise. The background sound in a TV programme is not just noise, it
contains sound effects and music that help tell the story, pointing the audience towards events and
moods that complete the narrative.
It has been suggested that broadcasters could provide a separate “clean audio” soundtrack with
reduced or even removed background sound effects and music. It has even been proposed as a
fourth access service [1]. However, for broadcasters the production of an alternative mix can be
both time-consuming and expensive and as a result there has been little progress, if any, towards
defining the needs of a clean audio service.
One of the most common audience complaints made about the BBC’s programming is that
narration or dialogue is obscured by excessive background sound and it regularly features in BBC
programmes like Points of View and Feedback. Whilst occasionally the BBC broadcasts
programmes where the sound balance is poor or faulty, the majority of complaints are about
programmes where the sound is broadcast as intended. Audience complaints about background
music are often portrayed as a recent issue caused by modern trends in TV production; however
this is not entirely the case. The TV programme Points of View 30th November 2008 carried an
item featuring a cutting from over 60 years ago complaining about the incidental music in
broadcasts being too loud and BBC R&D first published a research paper on the subject by
Mathers 25 years ago [2].
The television audience is, on average, getting older and along with increasing age comes hearing
loss (see below). Figures published in 2015 by the Office for National Statistics said that:
The number of people aged 75 and over is projected to rise by 89.3%, to 9.9 million, by
mid-2039. The number of people aged 85 and over is projected to more than double, to
reach 3.6 million by mid-2039 and the number of centenarians is projected to rise nearly 6
fold, from 14,000 at mid-2014 to 83,000 at mid-2039. [3]
So while complaints about background music are not a recent trend, in the coming years
broadcasters will have to serve an audience which will have a higher proportion of
older people who may have difficulty following the speech content of their programmes.
2.1 Age Related Hearing Loss
Around 14% of the UK population suffer noticeable hearing loss of which about 2 million use
hearing aids [4]. Hearing, as measured by pure tones, declines with age, more significantly from
the age of 50, a process known as presbycusis or age related hearing loss.
Hearing loss in older people is greatest at higher frequencies and this loss increases with age and
moves to lower frequencies. As the hearing loss impinges on the frequency bands important for the
comprehension of speech the problems of communication and understanding become more
significant. As a result, older people tend to listen to the television or radio at higher volume
settings. However, the volume at which sound becomes uncomfortable tends to reduce, resulting
in discomfort when broadcast sound levels suddenly rise. This can cause particular problems at
programme boundaries, during commercial breaks and programme trails [5].
Stanley Coren at the University of British Columbia plotted the trend of the listeners’ preference
across a group of 799 subjects ranging from 17 to 92 years of age. From the results he predicts that the
comfort level for listening to the TV or radio ranges from around 55dB at 20 years of age, to 63dB
at 40, 71dB at 60 and around 82dB at 80 years of age. Coren notes that this trend does not ‘keep
pace’ with age related loss in hearing sensitivity and speculates as to whether this is because older
people become pragmatic about getting their TV or radio to produce such high sound levels [6].
The fact that an older viewer might need their television set to produce sound levels over 20dB
louder than younger viewers is a challenge for a TV or radio with small loudspeakers and it is likely
that producing such increased sound levels also results in significantly higher levels of distortion.
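The size of that gap can be made concrete with a short sketch. It uses only the four comfort levels quoted above; the linear interpolation between them is an assumption made for illustration and is not Coren's fitted curve. The function names are hypothetical.

```python
# Comfort levels quoted in the text from Coren's results: (age in years, level in dB).
COREN_POINTS = [(20, 55.0), (40, 63.0), (60, 71.0), (80, 82.0)]


def preferred_level_db(age):
    """Estimate the comfort level at a given age.

    Assumption: straight-line interpolation between the quoted points.
    """
    if not COREN_POINTS[0][0] <= age <= COREN_POINTS[-1][0]:
        raise ValueError("age is outside the range of the quoted points")
    for (age0, level0), (age1, level1) in zip(COREN_POINTS, COREN_POINTS[1:]):
        if age0 <= age <= age1:
            return level0 + (level1 - level0) * (age - age0) / (age1 - age0)


def power_ratio(db_difference):
    """Convert a level difference in dB to a ratio of acoustic power."""
    return 10 ** (db_difference / 10)


# Difference between the quoted levels for an 80 year old and a 20 year old.
difference_db = preferred_level_db(80) - preferred_level_db(20)
print(f"{difference_db:.0f} dB, roughly {power_ratio(difference_db):.0f} times the power")
```

On these figures the difference is 27dB, which corresponds to roughly 500 times the acoustic power, which illustrates why small loudspeakers may struggle to deliver it cleanly.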
Older people also have reduced temporal hearing abilities. They are less able to detect short gaps
in sound, which reduces their ability to detect differences in words like “catch” and “cash” which are
differentiated by a brief pause. They also have a reduced ability to follow the pitch of sounds, a
process known as phase locking, which is thought to be important for the listener’s ability to track a
voice in the presence of competing sounds. There is also a reduction in spectral resolution with a
widening of the auditory filters resulting in a smeared representation of the spectral shape of a
sound, further reducing the ability to perceive speech [5]. It may be possible to gain some feel for
the impact of these problems with a hearing loss simulator [7].
The ability to comprehend speech therefore declines with age. Even for people with relatively
normal hearing (when measured through the ability to hear pure tones) the ability to hear speech
declines significantly if the speech has suffered distortion or is competing with other sounds.
Bergman tested adults ranging in ages from 20 to 89 and plotted their performance by age. He
tested the ability to understand undistorted speech without interference, alongside tests for speech
with reverberation, speech with overlapping words and speech with interruptions (i.e. regular
dropouts). In all three tests, the older subjects showed a more dramatic drop in performance
compared to the subjects aged 20 to 29 years. The reverberant speech had the least additional
impact, except in the 80-89 year old group. The next greatest impact was caused by the
overlapping speech. The greatest relative decline by age was caused by the interrupted speech
with the 60-69 age group scoring 50% below the 20-29 year olds. On undistorted speech the
performance of the older group had been only 5% below the 20-29 year olds. [8]
Whilst there is a range of different types of hearing loss with different characteristics, on average
there is a trend for increased hearing loss, as measured in terms of sensitivity to pure tones, to
correlate with the need for an increased speech to noise ratio to achieve the same level of
intelligibility. Speech in noise tests adjust the level of background noise to find the point at which
subjects achieve a 50% word recognition score. To match the performance of people with normal
hearing, subjects with mild to moderate hearing loss require an increase in the speech to noise
ratio of around 4-6dB, those with moderate to severe hearing loss around 7-9dB and those with
severe to profound hearing loss up to 18dB. [9]
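As a rough illustration of what these figures imply for a programme mix, the sketch below treats the extra speech to noise ratio as a cut applied to the background while the speech level is held fixed. That is a simplifying assumption: the figures come from 50% word recognition tests against noise, not from television soundtracks, and the table and function names are hypothetical.

```python
# Extra speech to noise ratio (dB) quoted in the text, as (low, high) ranges.
# Only an upper figure is quoted for severe to profound loss, so low is None.
EXTRA_SNR_DB = {
    "mild to moderate": (4.0, 6.0),
    "moderate to severe": (7.0, 9.0),
    "severe to profound": (None, 18.0),
}


def background_cut_needed_db(category):
    """Upper figure for the cut in background level, with speech level unchanged."""
    _low, high = EXTRA_SNR_DB[category]
    return high


def amplitude_scale(cut_db):
    """Linear amplitude factor equivalent to lowering a signal by cut_db."""
    return 10 ** (-cut_db / 20)


for category in EXTRA_SNR_DB:
    cut = background_cut_needed_db(category)
    print(f"{category}: cut background by up to {cut:.0f} dB "
          f"(amplitude x {amplitude_scale(cut):.2f})")
```

Under this assumption a 6dB cut roughly halves the background amplitude, whereas an 18dB cut leaves about an eighth of it.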
3 Television Sound
Television sound is remarkably complex. Also, compared to film soundtracks, TV sound is a
neglected area of study. Hilmes argues that this may be because the formalised nature of film
soundtracks makes it relatively easy to study when compared to the wide variety and complexity of
television sound [10]. The form and style of television sound has its roots in radio broadcasting and
sound plays a greater role than it does in film. Television and radio, unlike film, are both essentially
live media, continually broadcast at this moment in time whilst film is an entirely recorded medium
with an explicit beginning and end [11]. Television sound also contains many more implicit and
explicit layers. This is perhaps most obvious in the many roles that speech plays in television.
Butler points out that the first function of television sound is to direct the viewer’s attention. Unlike
cinema or theatre, television is watched amongst a variety of competing stimuli, and whilst the
audience can turn away from the screen, sound can pull the audience’s attention back from other
things of interest. Cues like the sound of a ringing phone, the rising excitement of a commentator’s
voice or a change in the music can all cause the viewer to look up from some other task in time to
see an important part of the action. [12]
Television is primarily an entertainment medium and narrative is its dominant mode, even in
documentary or factual programmes. Unlike film, the television narrative is by and large carried by
the soundtrack. As Cohen puts it, “...narrative is primary, and the audience member is actively
engaged in constructing a narrative” [13]. Fisch argues that all programmes contain some level of
narrative [14]. Cohen’s point about the audience constructing the narrative, underlines the fact that
the narrative experienced by a member of the audience is the narrative that they construct for
themselves from what they see and hear, and from the context within which they comprehend
these elements. In this way television can be seen as a process of the encoding of a narrative into
sound and vision, its transmission via the television broadcast system, and its subsequent
decoding and understanding by the audience. The skill of television production lies in the extent to
which programme makers are able to convey the intended meaning and experience to the audience, and the
soundtrack is the primary conduit for the meaning.
Cohen, referencing Stam (footnote 1), notes the five domains of film, video and multimedia: three sound
domains (music, speech and sound effects) and two visual domains (the image and text). However,
in practice, because television is often either live or made as-live, we need to add noise to
the three sound domains. Noise is any unwanted sound, be it music, voices or general
environmental sounds like traffic or wind noise. It is also worth noting the difference between
diegetic and non-diegetic (footnote 2) noise, as the cause of the first is visible and therefore the source is
easily identifiable, whilst the latter is less easily understood by the audience.
3.1 The Role of Speech
In film soundtracks diegetic speech is dominant, and non-diegetic narration is unusual. By contrast
the non-diegetic narrator is commonplace in TV, particularly in factual and news programmes.
Often a programme presenter will appear in shot to begin a narrative that continues as the video is
cut away to a video sequence, whilst at other times in the programme or clip the same narrator’s
voice is added to the programme in post production. In some cases the voice track added in post
production will be bridging gaps or faults in the original shoot whilst in other cases the presenter
may be reflecting back on what they said to camera at the time. In some programmes there may
even be multiple narrators. For example the BBC series “Who Do You Think You Are” has both a
non-diegetic series narrator and both diegetic and non-diegetic narration by the programme’s
subject. Sports coverage will often have several commentators or pundits who are rarely shown on
Footnote 1: Stam, R. 2000, “Film Theory,” Malden, MA: Blackwell.
Footnote 2: A diegetic sound is one which has a visible or implied source that is part of the on-screen narrative of the
programme, whilst a non-diegetic sound is one which has no on-screen source or implied presence in the
on-screen world.
In addition to the diverse range of non-diegetic voices within a programme, the TV channel itself
usually has a continuity announcer who links the programmes together. The announcer acts as the
host, sometimes talking over part of the programme, especially the end credits and occasionally
intervening to plug gaps caused by technical faults. These multiple layers of voices and roles are
largely taken for granted by television viewers, but can become problematic for people with poor
hearing who cannot distinguish between the voices.
The complexity of these structures and roles can be more clearly understood by examining the
ways in which they were undermined in television programmes like Monty Python’s Flying Circus.
Earlier forms of comedic subversion were pioneered on radio by the Goon Show which created
comedy around the very structures of a radio programme itself, subverting the conventions in a
way that the audience could understand. It is clear from both examples that there is an implicit
understanding of the convention of broadcast soundtracks between the programme makers and
the audience. This convention varies between different audiences and programme genres, and
programme makers who inadvertently cross the boundaries between conventions can find their
work subject to complaints from the audience.
3.2 The Roles of Music
This section concentrates on the use of non-diegetic soundtrack music to accompany narrative and
does not cover music based programmes where the musicians are part of the programme. Cases
where background music is used in a diegetic mode are quite rare and often done for comedic effect (footnote 3).
Whilst little work on the role of music has focused on television, there are some sources which are
broad enough to include TV. From an examination of multimedia, Cohen lists eight separate
functions of music [15]:
• Masking, covering gaps and masking unwanted noises.
• Provision of Continuity, signalling the continuation of a narrative with a theme carried
over a scene change, or signalling discontinuity with a change of musical theme.
• Direction of Attention, music that is congruent with particular on-screen actions will direct
attention to that action.
• Mood Induction, changing how the viewer feels about the scene.
• Communication of Meaning, providing interpretation of the visual images. Where a visual
excerpt is ambiguous, music can disambiguate the interpretation of the scene [13]. It can
also be used to signify location in some way through the style of music.
• Music as Cue for Memory, links between the visual images and music can form within one
viewing and the repeated use of the combination will reinforce the recognition of the scene.
• Arousal and Focal Attention, the inclusion of music will increase the level of stimulation of
the viewer as different parts of the brain are active in listening to music.
• Musical Aesthetics, associating the pleasurable experience of listening to the music with
the programme. Though this can also work in a negative manner.
Programme makers use music in many, if not all, of these ways, though the style and nature of its use
can vary enormously: from a very literal approach, where a well known song is used to directly
comment on the visual images (footnote 4), all the way through to specially composed music where the
musical score forms the basis for the pace and timing of the narration and video editing. The
impact on the audience will vary according to the tastes and expectations of the individual viewer.
Footnote 3: One example of this in film is Hitchcock's "Rear Window", where the source of the music is either seen or suggested
as coming from a neighbouring apartment window, whilst other cases are done for comedic effect. For example in
“Blue Harvest”, the Family Guy remake of "Star Wars", the cartoon versions of John Williams the composer
and the orchestra are introduced in shot and in "Blazing Saddles" Count Basie and his band appear in shot
as the camera pans following Mel Brooks as he rides across the prairie.
Footnote 4: Such as the use of the song “Daisy Bell” whenever there is a tandem in shot.
Whilst many people will be almost unaware of the music, others can find it highly objectionable
[16]. Certainly there is a significant issue around empathy and timing and the extent to which the
music complements the images and shows congruency with the narrative. Howard Davidson,
Professor of composition for screen at the Royal College of Music, says that music is best used to
provide a subliminal mood, conducive to understanding a complex narrative and subtext [17].
Studies of the perception of a video sequence when accompanied by different soundtracks show
the impact that the choice of music can have. Infante and Berg studied the impact of music
modality on perceptions of a scene, and showed how it had a stronger impact on neutral or sad
scenes but less effect on happy ones [18]. Cohen points out the importance of temporal structural
congruence between music and the visual activity in the video and the way in which that can
change the impact of a piece of music on the video images [13]. Hung’s studies, based on TV
advertising, showed the effect that music can have on the perceived pace of a video sequence. A
heavy rock track caused viewers to complain about rapid cutting and the camera work, where no
such comments were made when the same video sequence was shown with light classical music
[19]. A further study looked at the impact of exchanging the music soundtracks between two
different coffee adverts. Here the subjects were able to construct a common meaning from one of
the new, incongruent pairings through association with the Indiana Jones films, but the second
incongruent pairing just created confusion. The work demonstrated how the subjects understood
the meaning of the adverts through the combination of the music and video and their cultural
references [20].
Wakshlag, Reitz & Zillman’s work on the impact of music on learning suggests that, for children at
least, appealing, fast tempo background music causes attention to fall off and detracts from
learning [21]. However, another conclusion of their work is the observation that fast and appealing
music was effective in attracting children to select a programme and grabbing their interest. For the
programme maker this may be a key reason to make use of such music. It gets people watching
and this is more important in an entertainment context than any negative impact on learning. It may
also explain the conflict between making programmes for those parts of the audience who are
keen to learn and prefer a music-free soundtrack and making a programme which grabs the
interest of a wider audience.
However, in many high quality factual programmes music is an integral part of the programme. In
many of the BBC’s wildlife series the music is composed alongside draft scripts and rough video
edits and the three elements influence each other during the production process to create the final
product. Composer Ben Salisbury gives an account of this approach for the BBC series The Life of
Mammals where he worked with the presenter David Attenborough, himself an accomplished piano
player, in the composition of the music. The result was a narration which ebbs and flows with the
music, the two complementing each other [22]. The fact that the music is an equal partner with the
speech and video in this style of programme making can be seen in the success of the Blue Planet
Live! orchestral shows where another BBC wildlife series was adapted to create a two hour concert
accompanied by edited video footage projected on a large video screen [23].
3.3 The Role of Sound Effects
Sound effects can, to some extent, take on all of the roles of non-diegetic music noted above.
Indeed the boundary between what is classified as music and what is a sound effect blurs
completely when looking at the work of the BBC’s Radiophonic Workshop. Many early soundtracks
created for Doctor Who defy classification [24].
However, the key difference with music is in the role that diegetic sound effects can have in a
programme’s narrative. Sound effects often have a key role in establishing the location or
environment for the onscreen action. The sound effects may have been part of the location
recording at the time of filming, or may be added in post production, but their transition will signal a
scene change, from a quiet office to a station platform, or from a busy roadside to a Scottish loch.
Without the effects the visual scene would feel incongruent and detached from the narrative. The
detachment is even more of an issue with on-screen sources of sound where the sound is missing
or out of sync with the pictures.
Non-diegetic sounds can also be used to convey events that are not seen by the audience. For
example the off-screen crash is a common device used for comedic effect or to save the cost of
filming or creating an otherwise complex and expensive visual sequence [25]. The sound effect
paints the visual images in the viewer’s imagination. Such effects cannot be removed without
substantially affecting the narrative.
Sound effects can also be used in a non-narrative way to give a scene the feeling of continuity
where the video has been heavily edited. The continuity of the background sound, such as bird
song or wind noise can be used to mask the cuts in the video and the soundtrack. They can be
used to create a mood, for example altering a natural sound to give it a creepy, unworldly nature.
3.4 Noise Issues
In the context of this paper, noise is taken to be unwanted sounds that have become part of a
recording. In the case of field recordings, such as an interview by a busy main road, the diegetic
nature of the noise gives it a context for the viewer, making it easier to identify and understand
than noise that has no apparent source. Non-diegetic noise is more difficult for the audience to
identify and put in context and so more likely to result in complaints. For example, a presenter
talking to camera in the open stand at a sports event can often struggle to be heard against the
local PA system which may be playing music, giving out announcements or even running a fire
alarm test. In such circumstances an explanation of the noise from the presenter can help place
the noise in context.
3.5 Other Cultural Objections to the Use of Music
So why are so many of the complaints about the background music? The issue may be that the
audience understands that the music is under the control of the production team and something
that could have been left out [26]. Also there is a section of the audience who may object to the use of
music as background in many contexts. For example, this particularly forthright opinion on the
matter was originally posted on the Points of View message board in June 2008 (footnote 5):
I strongly dislike ALL background music. In documentarys, in dramas, in lifts, in restaurants.
All of it. Not because I dislike music, but because I do like it. It is an insult to treat it as a
mere background.
Notice that the objection isn’t just the use of music in television; it is about the “inappropriate” use
of music in any context. There are a number of campaigning groups in this area, for example,
“Pipedown, the campaign for freedom from piped music” [27]. Pipedown was amongst the
organisations that supported Lord Beaumont of Whitley’s unsuccessful 2006 bill to ban piped
music and the showing of television programmes on public transport and in the public areas of
hospitals [28]. Before that, in 2000, the then MP for Salisbury, Robert Key, submitted a ten minute
rule bill on a similar theme [29]. So there is a wider cultural context for the objection to the use of
music as background that extends well beyond broadcasting.
Footnote 5: The message board is no longer online.
3.6 Music Free Alternative Soundtrack – The Nature of Britain
To the best of my knowledge there has only ever been one case of a programme being remade
without music. The programme was “The Nature of Britain: Secret Britain”, broadcast on BBC One
and the Red Button service on 5th December 2007. The programme came about because the then
head of interactive TV, Tom Williams, asked Points of View to run a web survey about audio
switching using the red button service. Examples given were the removal of the background music
from documentaries and the removal of commentators from sports coverage. Subsequently the
decision was made to create a music-free option for a programme. The episode chosen was the
last in the series of The Nature of Britain, and series producer Stephen Moss agreed to remix the
sound of the final programme to make a music free version. Stephen Moss described the new
version as a “very different soundtrack” and commented “personally I still think it works best with
the music there.”
The remixed version of the programme was, however, not simply the original soundtrack without
music. To make the new soundtrack work as a complete entity the production team increased the
level and use of sound effects and brought the presenter back into the studio to add new narration
to bridge the gaps between elements of the programme left by the removal of the music. This
illustrated the problem that the removal of music would, in many cases, result in an incomplete
soundtrack. Therefore the production of music free versions of programmes is likely to require a
complete remix to create the new version. Despite claims of positive feedback to the BBC from a
few audience members and lobby groups this exercise has never been repeated. The costs of
making the music-free version are currently too high. The cost of distribution is not an issue as the
BBC Red Button service often provides alternative audio choices for sport. These have been for
live events such as football cup finals where the alternative audio already exists, such as the two
different local radio commentaries, for each team, in place of the TV commentary, or in other cases
the BBC Radio 5 Live commentary.
A clean audio service does not necessarily require a technological advancement. There are many
different ways in which the BBC could provide alternative mixes of TV programmes at present. The
Nature of Britain programme showed that the Red Button service is one route that doesn’t require
the audience to buy new hardware. Another similar option would be to provide a switch to an
alternative audio from within an MHEG application. It would also be possible to provide such
versions via iPlayer in the way that the Audio Description service is provided. The barrier to
providing these alternative mixes is the cost of making the new versions of the programme.
3.7 BBC Vision Audibility Project
Research carried out by BBC Vision in 2010 revealed that the pattern of complaints does not
reflect the far wider range of problems which viewers have with TV sound. BBC Vision carried out
two separate surveys with its Pulse on-line panel of 20,000 viewers to identify the issues which
caused problems with television sound. The research showed that nearly 60% of viewers had
some trouble hearing what was said in at least one TV programme during the one week survey
period [30].
The research identified four key factors that can make it hard for viewers to hear what is being
said, which were, in order of importance:
• Clarity of speech: poor and very fast delivery, mumbling and muffled dialogue, turning away
from camera, people talking over each other, trailing off at the end of sentences.
• Unfamiliar or strong accents: Audiences find accents other than their own harder to
• Background noise: locations with heavy traffic, babbling streams, farmyard animals, in fact
any intrusive background noise can make it difficult to hear what's being said.
• Background music: particularly heavily percussive music or music with spikes that cut
across dialogue.
Any of these issues can create problems for viewers, but BBC Vision’s work showed that when
these factors combine, then many people struggle to understand the speech [31].
As part of the study the Voice of the Listener & Viewer also gathered diaries from another 506
people all aged 65 and over who did not use the internet. The VLV give figures for the response
across 22 “problematical” programmes. They report that 11% of the respondents cited problems
with background music and 13% with background noise whilst 19% had problems with accents &
dialects, 14% with mumbling and poor diction and 11% with talking too fast [32]. These figures
suggest that the removal of music and sound effects to create a “clean audio” soundtrack could, at
best, improve the audibility of the speech in only a quarter of cases. New BBC Editorial Policy
Guidelines [31] have been produced alongside a series of online training videos setting out best
practice in creating clear sound [33]. This is part of a series of online guides for audio production
by BBC Academy covering the key issues [34] which includes a video from the Sound Matters
workshops in which I presented many of the issues covered in this paper [35].
The approach that BBC Vision has taken has been about improving the viewing experience for all
audience members. Improving the main soundtrack makes the TV viewing experience as inclusive
as possible, whereas providing alternative mixes for a particular part of the audience would make
TV viewing a more fragmented activity. Most of the issues can be addressed by making better
sound a high priority during production [34]. Any alternative soundtrack without music or with
reduced music and effects still needs to make sense as a complete programme and still doesn’t
address the majority of the issues that the BBC Vision survey identified.
3.8 BBC Vision Audibility Project - Background Sound Trial
The final part of the BBC Vision study of audibility was a test involving different mixes of
programme sound where the initial surveys had identified problems. Each of the eight BBC
programmes was remixed with the music and effects 4dB louder and again with the music and
effects 4dB quieter to give three versions including the original. Each of the 452 people who took
part in the online test was played two out of the three different mixes from a random sample of three
programmes. A further test was carried out by Elizabeth Valentine at BBC R&D with a set of
thirteen participants, recruited by the RNID, who had some hearing loss. This test included interviews
with the participants about their experience of watching television.
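The remixing described above amounts to scaling the music and effects by a fixed number of decibels before summing them with the speech. The following is a minimal sketch, assuming the speech and the background are available as separate lists of samples. The function names and sample values are illustrative only and are not taken from the BBC Vision trial.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def remix(speech, background, background_offset_db):
    """Sum the speech with the background after offsetting the background level."""
    gain = db_to_gain(background_offset_db)
    return [s + gain * b for s, b in zip(speech, background)]


# Hypothetical sample values standing in for real audio stems.
speech = [0.5, -0.25, 0.125]
background = [0.1, 0.1, -0.1]

# Three versions: the original mix, background 4dB louder, background 4dB quieter.
versions = {offset: remix(speech, background, offset) for offset in (0.0, 4.0, -4.0)}
```

A 4dB change corresponds to an amplitude factor of roughly 1.58, so the louder and quieter versions differ from the original by a noticeable but not drastic amount.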
The tests with the online panel showed consistently that the 4dB increase in background sound
made the programme harder to follow in all cases. However, the results for the 4dB reduction in
background sound were mixed. In the factual programmes, two showed modest improvements with
the reduced background music and effects, but one was worse. The two drama programmes
showed modest improvements, as did the news clip. The sport clip showed almost no
improvement, whilst the trailer was worse with the reduced music and effects. Interestingly, the two
pieces where the panel scored the broadcast mix higher than the mix with reduced background
were the more complex soundtracks.
The interviews with the RNID subjects revealed some of the issues of interest. People with hearing
loss find it difficult to identify and filter background sounds from speech. There is an inherent
conflict then between being able to identify a sound and the problem with that sound masking
speech. If that sound or the musical item forms part of the narrative then even people with impaired
hearing may not benefit from reduced background sounds as much as anticipated. However,
unnecessary background sounds or music with no clear narrative function are undesirable because
they overcomplicate the soundtrack.
3.9 Additional Issues
The BBC Vision work exposed a number of additional issues around the audibility of speech which
cannot be addressed by the creation of an alternative soundtrack. If you look along the complete
chain from production to the ears of the audience then several more issues are obvious. The table
below lists many of the issues which can impact on the audibility of the speech content of a
television programme.
Audibility Problem Space
Writing style – complexity of
Direction and
Choice of shots – showing actor’s face at vital moments in the
Voices, accents, dialects & clarity of delivery
Choice of location - control of background noise
Choice of
Skill of sound
Manual level control vs
Priority given to sound over video – e.g.
allowed in shot
Retakes for
Voiceover booth matching to field recordings - ADR
Use of digital compression and codec choice (mobile phone
Post Production
Voice processing, noise reduction/gating, level processing or other dynamic range control not currently used
Television sound quality – receiver, amplifier and
Added music, level, type, purpose, style
Added sound effects, level, type purpose
Added voiceover
Video edit, choice of shots
Loudness control
Dynamic range
Stereo or 5.1 mix
Inclusion of open
Audio encoding quality and any cascaded coding
Quality of closed
Use of mono, stereo or 5.1
Room Acoustics and the positions of the television and the viewer in the room
Viewer’s hearing
- level of interest in the programme
- knowledge of the programme’s subject matter
- expectations of the programme
- language skills
- willingness to use subtitles
Whilst all of these issues can degrade the audibility, only the ones that arise in post production and
broadcast can be addressed by a clean audio service. All of the issues in the home will remain the
same, because the television and the room acoustics are out of the broadcaster’s control. The
production and capture issues cannot be overcome by simply removing music and sound effects.
The BBC Vision work on audibility focused on the wider issues of TV production, seeking to
improve the quality of production and sound capture as well as influencing the sound balance to
improve the quality and audibility of the main programme soundtracks to benefit the widest
possible audience.
4 The History of the Clean Audio Concept
The concept of providing an alternative feed of “clean dialogue” TV sound can be traced back to
the late 1980s as engineers started to consider the possibilities for sound for HDTV services. The
earliest reference I have been able to find is a paper on HDTV sound to the 10th AES conference
by Meares who said that “additional services such as clean dialogue for the Hard of Hearing”
should be considered [36], also see [37, 38]. Meares references Mathers [2] in support of his case.
However, Mathers’s paper is inconclusive on the matter of background sound levels and does not
support Meares’s case – see below. Meares’s second reference in support of clean dialogue
proposes a special sound track for the hearing-impaired and suggests that music and effects
should be inserted 20dB below normal. The same two documents are referenced in the current ITU
document BS.2001 [39] but no further evidence is given.
4.1 Alternative Soundtrack Mixes
In 1991 BBC Research Department published the results of a joint working party with the IBA,
ITCA, RNID and BAHoH on the subject of sound balances in television [2]. The research asked a
mixed group of normal-hearing and hearing-impaired people to rate the “intelligibility of speech” on an
integer scale of 1 to 5. The test material consisted of 18 clips from TV programmes which had
been remixed to make three versions, one as broadcast, one with the background sound 6dB
louder and one with the background 6dB quieter. Each test featured only one version of each item
to prevent training distorting the results. Three versions of the test were made, and each subject
was randomly assigned one of the three tapes.
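The description above does not say how the three mixes were distributed across the three tapes. One common way to meet the stated constraint, that each tape carries only one version of each clip, is a simple rotation. The sketch below assumes such a rotation; the function name and the labels are hypothetical and this is not a record of how the 1991 tapes were actually compiled.

```python
MIXES = ["as broadcast", "background +6dB", "background -6dB"]


def build_tapes(num_clips, mixes=MIXES):
    """Return one running order per tape: tapes[t][c] is the mix of clip c on tape t."""
    tapes = []
    for tape_index in range(len(mixes)):
        # Shift the starting mix by one for each tape so that no tape
        # repeats a clip and every clip appears in every mix across the tapes.
        tape = [mixes[(clip + tape_index) % len(mixes)] for clip in range(num_clips)]
        tapes.append(tape)
    return tapes


tapes = build_tapes(18)  # 18 clips, as in the study described above
```

With 18 clips and three mixes, each tape ends up with six clips in each mix, so no tape is biased towards louder or quieter backgrounds.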
This work did not show a clear relationship between background sound levels and “intelligibility”
and no statistical analysis was carried out on the data, so it is impossible to draw any firm
conclusions on its value. Indeed, Mathers expresses surprise at the inconclusive nature of the
results. His conclusion starts:
“At the outset, the tests were expected to confirm very clearly the general belief that the
intelligibility of television speech is highly dependent on the levels of background effects.
The results suggest that this belief is mistaken; or that the tests do not show what they
purport to show; or that the truth lies somewhere between these two extremes; or finally,
that there is some altogether different explanation of the results.”
The lack of any clear results may have been exacerbated by a number of problems with the tests.
Firstly, as Mathers points out, the tests did not control for lip-reading and so those clips featuring
the person talking in shot would have potentially different results. Secondly, the tests did not
measure actual intelligibility of the speech; they only asked for the opinion of the subjects, which will be
dependent on the level of interest that the test subject has in the clip. Thirdly, there is no record of
the relative levels between the speech and the background sounds, so no conclusions can be
drawn about mix levels in TV programmes from this work.
4.2 Post-Processing Tests
DICTION - (Digitally Improving the Clarity of TelevisIOn Narrative for hard of hearing viewers) was
an ITC project which ran from 13th Feb 1999 to 12th Feb 2000, funded under the DTI/SERC Link
programme. The ambition was to produce a processor which could remove television background
sound in the home. The University of Surrey developed the algorithm and the hardware was
designed as a plug-in box using the SCART connector between set top box and the TV [40].
However, tests carried out by Alex Carmichael at the Age and Cognitive Performance Research
Centre, University of Manchester showed that the algorithm was unsuccessful in meeting its aims
when tested on a sample of target users [41]. The processing algorithm, designed to make the
speech more intelligible, actually seemed to make the speech harder to understand.
More recently the DTV4All project carried out a piece of work on “clean audio”. The project was
unable to source separate audio mixes and so used post-processing of TV soundtracks to remove
the background sounds. This processing proved very difficult to achieve and the test results were
very mixed. The test participants were asked to rate how well they could hear the dialogue; the tests did
not measure intelligibility. This work demonstrated the difficulty of separating speech from
background sound but did not provide any insight into the requirements of a clean audio service
[42, 43, 44].
In response to these issues BBC R&D published a literature review of the research on speech
processing, noise removal and intelligibility. This found no evidence of significantly improved
intelligibility where noise removal is carried out on speech, and in many cases noise removal will
significantly degrade intelligibility [45].
The follow-up project, HBB4ALL, is now continuing the DTV4All work on clean audio with a mixture
of processed stereo, using a “CenterCut” algorithm from an open source student project called
VirtualDub [46] and altered 5.1 downmix parameters alongside audio processing. The publications
from the project [47, 48] make no reference to the previous work carried out in the UK using
alternative 5.1 downmixes, see below.
4.3 ITC “Clean Audio” Project
The ITC Clean Audio Project ran from April 2003 through to 2006 with the research run by Ben
Shirley of the University of Salford. The premise was that the 5.1 format could be used to improve
the clarity of the dialogue for hard of hearing viewers [49]. For this to work the programme has to
be made in 5.1 and the centre channel has to be used as a dedicated dialogue channel. The tests
were carried out using the ITU-R BS.775-1 5.1 loudspeaker layout which, whilst standard for 5.1
monitoring, is rare in a domestic setting.
The work presented video with 5.1 sound through just three loudspeakers, left, centre and right;
the other three signals were discarded [50]. The subjects were asked to choose between two
versions of a soundtrack based on overall sound quality, their enjoyment, and the clarity of the
dialogue. The clips, between 60 and 90 seconds long, were presented with four different versions of
the soundtrack: all three channels at normal level, the left and right attenuated by 3dB, the left and
right attenuated by 6dB, and the centre channel only. The test subjects had a mixture of normal
hearing and a range of hearing impairments.
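The four test conditions can be expressed as a gain applied to the left and right channels while the centre channel is left untouched. The following is a minimal sketch under that reading; the function name, the condition labels and the use of plain Python lists for the channel samples are assumptions made for illustration and do not describe the processing actually used in the Salford tests.

```python
def front_mix(left, centre, right, side_attenuation_db):
    """Return (left, centre, right) with the side channels attenuated.

    side_attenuation_db is the attenuation in dB applied to left and right;
    None means the side channels are muted (centre channel only).
    """
    if side_attenuation_db is None:
        side_gain = 0.0
    else:
        side_gain = 10.0 ** (-side_attenuation_db / 20.0)
    return (
        [side_gain * x for x in left],
        list(centre),
        [side_gain * x for x in right],
    )


# The four conditions described above, keyed by hypothetical labels.
CONDITIONS = {
    "normal": 0.0,
    "sides -3dB": 3.0,
    "sides -6dB": 6.0,
    "centre only": None,
}

# Hypothetical one-sample channels, just to show the call.
mixes = {name: front_mix([1.0], [1.0], [1.0], att) for name, att in CONDITIONS.items()}
```

A 6dB attenuation roughly halves the amplitude of the side channels, which only helps the dialogue if the programme maker has kept the dialogue in the centre channel.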
By contrast with the earlier BBC work, both hearing-impaired and non-hearing-impaired subjects
rated the clarity of the dialogue as improving as the side channels were reduced in level. However,
whilst reduced levels in the side channels improved the enjoyment and perceived sound quality for
the hearing-impaired group, for the non-hearing-impaired subjects the reverse trend was
suggested by the data. The paper also notes a significant influence on the results from lip reading
by hearing-impaired subjects.
A second piece of work [51] moved from subjective scores of clarity to measuring true intelligibility
using the SPIN (Speech Perception In Noise) test. The work compared the intelligibility of speech
when presented in a central speaker or as a phantom central stereo image against a background
of (pseudo-stereo) babble through left and right speakers. The work showed an improvement in
intelligibility with the central speaker, but did not include hearing-impaired listeners. A further paper
[52] summarises the papers above and adds more context based on the Dolby Digital 5.1 surround
sound format and points to some of the difficulties experienced using the format.
Following on from this work the UK Clean Audio Forum was brought together by Ofcom. Its
members included representatives from the BBC, Ofcom, RNID, Salford University and Dolby. The
forum met across 2006/7 and produced a cautiously worded example approach to creating a clean
audio downmix from the 5.1 audio in the television receiver for the ITU [53]. This example
approach involves creating an alternative stereo mixdown from the 5.1 audio. This approach was
then cascaded to DVB and appears in Annex E (informative): Supplementary Audio Services of the
DVB Blue Book [1]. In the UK few television programmes are made in 5.1 and not all of these are
presented with all the dialogue in the centre channel. Across Europe this approach has made it into
some receiver requirements, such as the NorDig Unified Requirements [54]; however, they are not
yet mandatory. A comprehensive overview of Ben Shirley's work on clean audio is available online
in the form of his PhD thesis [55]. This goes on to examine the potential of an object based
approach to production for the creation of accessible audio through work on the FascinatE project,
see below.
4.4 Remaining Issues with Clean Audio
The idea of a “clean audio” service has been around now for over 20 years. It has been put forward
as the fourth access service after subtitling, audio description and signing. So why has this idea not
progressed beyond small research projects and 5.1 downmix solutions?
The first issue is that there is, as yet, no substantial body of work examining the requirements for a
"clean audio" soundtrack. It is usually taken to mean a programme mix that is free of music, or has
reduced background music and sound effects. Secondly, the three current access services all
provide additional channels of information for the audience. The clean audio concept, however,
would involve removing information. This raises two problems: firstly, the production of the clean
audio mix would have to be part of the programme production process unless programme
production were to undergo a radical change in techniques and formats; and secondly, there has
been little work done to examine the requirements for the clean audio mix and what it would need
to contain in order to provide a more accessible and meaningful soundtrack for the target
audience. As noted above, all the elements of the TV soundtrack combine to create the meaning
and narrative.
User tests on clean audio have, by and large, used short programme clips and asked about the
clarity of the voice track. They have not tested what happens to the overall experience of watching a
programme and whether a clean audio track would provide an improved viewing experience. The
one exception is the one-off Nature of Britain broadcast, but there was no user testing carried out
with this broadcast and no follow up work. Also, the programme team did not just remove the
music, they remade the soundtrack with additional sound effects and speech elements. This
demonstrates the potential complexity of this issue.
Any system created for the automatic removal of music is likely to have an impact on the meaning
of the programme with the loss of emotional or narrative cues and changes of perceived pace.
Similarly the removal of sound effects can remove cues to location and narrative or lead to
dissonance with the video image. It is likely that any automatic system will need to have the
narrative and diegetic sound effects flagged for inclusion in order to retain the programme
narrative. Automation is likely to work well on simple programme formats where the speech carries
all of the narrative, but not so well on complex material where effects and music play a narrative
role. An evaluation of the impact of clean audio versioning of a diverse range of programmes would
be needed before a service could be considered viable. This exercise is likely to be lengthy and
complex because of the need to include a sufficiently wide range of programme types and
production techniques. One possible route might be to replace or enhance the role of music and/or
effects with alternative modalities through kinaesthetic devices that give a feeling for what is going
on. This was explored by BBC R&D, following on from a project with the Royal College of Art [56].
It is worth noting that there is no sign of activity around the concept of clean audio or music-free
versioning in other media such as film, theatre or DVD & BluRay releases. Whilst there has been a
significant effort put into providing induction loops or similar Assistive Hearing Systems or Hard of
Hearing Systems in public venues, there is no evidence of alternative mixes being provided to such
systems [57]. This is despite the fact that alternative soundtracks are commonplace on DVD and
BluRay releases, providing alternative 5.1 versions along with different language versions and
additional content such as director’s commentaries.
4.5 The Development of Previous Access Services
When considering the possibility of clean audio as the next access service it is instructive to look at
how previous access services came about. All previous TV access services have started in other
arenas; subtitling had been used in the cinema throughout its history, audio description started in
the theatre and sign language interpretation at theatre and other public events. A great deal of
work went into the technical and artistic requirements to create the TV service. The best
documented is that of audio description.
Audio description began in the theatre. Starting in 1981 at the Arena Stage Theatre in Washington
DC, the idea spread around the USA and reached the UK by the late 1980s. Projects then began
providing audio description in a number of cinemas and the idea was picked up by television. A
regular service was started in 1990 by the Public Broadcasting Service network in the USA using the
secondary audio channel, which was available in the USA and reached 50% of homes. In 1991
the ITC launched the Audetel consortium, a European project which explored all the artistic,
production and technical issues of providing a service with input from visually impaired people. A
full trial service was run for a few months in 1994 on peak time analogue BBC and ITV services to
around 100 receivers across the UK and was monitored for its impact [58].
The Broadcasting Act of 1996 paved the way for digital television services to be launched in the
UK and contained minimum requirements for subtitling, audio description and sign language
provision [59]. The provision for audio description was at least 10% of the service after the TV service
had been on air for 10 years. It was this legislation that drove through the development of audio
description services. Digital TV was launched in late 1998 and the BBC launched a regular audio
description service in April 2000. However, the first AD capable Freeview set top box did not
become available until 2004 and integrated televisions with AD only began to appear in 2007.
By the time audio description services were launched in the UK there had been around 20 years of
experience of audio description in the theatre. There had been extensive research into both the
technical and artistic requirements for a TV service which had focused on the experience of the
target audience in broadcast trials. Furthermore there was legislation making it mandatory for
broadcasters to develop and provide this service.
5 Beyond Clean Audio to Object Based Production
Is it possible to provide a soundtrack with reduced background sound such that the result is an
improved experience for the audience? The evidence from the BBC Vision study is that a different
re-mix may well be needed for different programme styles. The danger of simply removing or
reducing the music and/or effects is that the result may well be a boring or confusing programme
which fails to meet the needs of the target audience. However, if we are to be able to give our
audience control over their listening experience then we need to be able to describe the items
within a sound mix, their roles and how they can be manipulated. Such an approach was first
publicly proposed in a BBC R&D blog post back in 2010, which also outlined many of the
additional benefits that could be enabled for the general audience such as changing the length or
the pace of a programme and removing the music or even the presenter [60].
Giving the audience the ability to control the sound balance of a programme is an idea that is now
being actively explored. In 2011 Radio 5 Live was involved in an experiment to demonstrate the
idea of giving the audience the ability to alter the balance between the commentator and the stereo
sound effects microphones. The test ran during the Wimbledon tennis coverage and was called
“NetMix” [61, 62]. The experiment turned out to be complex and technically challenging and there
were problems in providing the correct sound when the coverage switched over to other items like
the travel news. An interesting finding from this test was that the audience was almost evenly split
between people who chose to decrease the background sound and people who chose to increase
the level of the effects [63].
Following on from this, another public facing experiment was carried out by BBC R&D in 2013 at
an English Football League Championship Play-off Final between Crystal Palace and Watford from
Wembley Stadium, London. Here three live streams were provided over IP. They consisted of one
pair of stereo microphones pointing at the Crystal Palace fans, one pair pointing at the Watford
fans and a mono feed from the commentary box. The user interface employed the HTML5 Web
Audio JavaScript API to control the streams. The interface provided a single slider that allowed the
audience to move between the two sets of fans at either end of the stadium and the central
position which favoured the commentary. Of the 2692 people who took part in the trial, around
twice as many people chose to increase the contribution of the crowd as compared with
commentary and once they had chosen their preferred balance almost everyone stayed with that
balance for the rest of the coverage [64].
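The single slider can be thought of as a mapping from one position value to a gain for each of the three streams. The sketch below is written in Python rather than the JavaScript used in the trial, and the mapping itself is an assumption: the exact gain curves used in the experiment are not described here. The names `slider_gains`, `crowd_a`, `crowd_b` and `crowd_floor` are hypothetical.

```python
def slider_gains(position: float, crowd_floor: float = 0.2):
    """Map a slider position in [-1, 1] to linear gains for three streams.

    -1 favours the crowd at one end, +1 the crowd at the other end and
    0 favours the commentary. crowd_floor is the assumed crowd level kept
    under the commentary at the central position.
    """
    p = max(-1.0, min(1.0, position))  # clamp out-of-range input
    commentary = 1.0 - abs(p)
    crowd_a = crowd_floor + (1.0 - crowd_floor) * max(0.0, -p)
    crowd_b = crowd_floor + (1.0 - crowd_floor) * max(0.0, p)
    return {"crowd_a": crowd_a, "commentary": commentary, "crowd_b": crowd_b}


centre = slider_gains(0.0)      # commentary at full level, both crowds at the floor
one_end = slider_gains(-1.0)    # crowd_a at full level, commentary silent
```

In a browser each of these gains would typically be applied to a gain node on the corresponding stream, but that part is outside the scope of this sketch.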
At around the same time a more complex piece of work demonstrating object based audio was
carried out within the FascinatE project. This was a wide ranging project looking at high definition
video, advanced sound formats and audience interaction and included an element demonstrating
how this object based approach could benefit viewers with hearing impairments [65].
Beyond sports coverage the issues involved become more complex. The first pre-recorded
demonstration of this approach from BBC R&D was a short radio drama called “Breaking Out”.
This drama, set in a lift, adapted to the location of the listener and the date to include information
about things like the weather outside and the films showing at the local cinema. This was achieved
by using a text to speech engine to generate the voice of the lift. The user interface also gave the
ability to vary the pace of the drama and the level of the background sounds in the lift. Because all
the audio assets were recorded separately and tagged, changes to the background sound level
could be made separately from sound effects that were key to the narrative, such as the sound of
the lift moving [66].
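The principle of tagging each audio asset, so that background sounds can be adjusted without touching the sounds that carry the narrative, can be sketched as follows. The data model is an assumption made for illustration: the field names, the role labels and the "lift hum" asset are hypothetical and do not describe the metadata actually used in "Breaking Out".

```python
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    role: str              # hypothetical labels: "speech", "effect" or "music"
    narrative: bool        # True if the sound is key to the story
    gain_db: float = 0.0   # level offset relative to the producer's mix


def adjust_background(objects, offset_db):
    """Return copies of the objects with only background items offset in level."""
    adjusted = []
    for obj in objects:
        is_background = obj.role != "speech" and not obj.narrative
        new_gain = obj.gain_db + offset_db if is_background else obj.gain_db
        adjusted.append(AudioObject(obj.name, obj.role, obj.narrative, new_gain))
    return adjusted


scene = [
    AudioObject("lift voice", "speech", True),
    AudioObject("lift moving", "effect", True),   # key to the narrative
    AudioObject("lift hum", "effect", False),     # hypothetical background asset
]

quieter = adjust_background(scene, -6.0)
```

Only the untagged background effect is reduced; the speech and the narrative effect keep the level chosen by the producer.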
A follow-up project took an existing radio documentary and reconstructed it from the original assets
as an object based production. This demonstrated that an existing programme, built up from
interviews, archive material, sound effects and music, could be re-created in an object based
format. This work also gave a great deal of insight into the narrative structure of the programme.
The result was a radio programme that could be varied in length according to the user's
preferences whilst retaining the structure of the stories and the overall narrative. The results
also demonstrated the ability to vary the relationship between the speech and background
elements. The work was published and demonstrated at IBC2014 [67]. The demonstration
programme was subsequently labelled Responsive Radio and made available to the public on the
BBC Taster site [68].
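How the programme length was varied is not described here. One simple way to illustrate the idea is to give each segment an importance score, keep the most important segments that fit the requested duration, and play them in their original order. The sketch below assumes exactly that; the `Segment` fields, the importance scale and the example running order are hypothetical and are not taken from the Responsive Radio production.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    seconds: float
    importance: int  # hypothetical scale: higher means more important to the narrative


def fit_to_length(segments, target_seconds):
    """Pick segments by importance until the target is reached, keeping story order."""
    chosen = set()
    total = 0.0
    # sorted() is stable, so segments of equal importance keep their original order.
    ranked = sorted(range(len(segments)), key=lambda i: -segments[i].importance)
    for i in ranked:
        if total + segments[i].seconds <= target_seconds:
            chosen.add(i)
            total += segments[i].seconds
    return [segments[i] for i in sorted(chosen)]


programme = [
    Segment("intro", 30, 3),
    Segment("interview", 120, 2),
    Segment("archive", 60, 1),
    Segment("ending", 30, 3),
]

medium = fit_to_length(programme, 200)
short = fit_to_length(programme, 100)
```

A real production would also need to respect dependencies between segments, for example an answer that makes no sense without its question, which this sketch ignores.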
The next steps are to build tools that describe programme sound in a semantic form that enables
sounds with similar functions to be grouped together and described in a meaningful fashion and
the tools our audience might use to manipulate the mix. BBC R&D is now part of the European
research project ORPHEUS which is working to improve the management of audio content and
develop, implement and validate an end-to-end object-based media chain for audio content [69].
This work is building on new audio standards that enable the exchange of object based audio and
associated metadata [70, 71].
Further work has also taken place at the University of Salford where user tests have been carried
out in partnership with DTS. Object based content from a range of genres was shown to
participants with hearing impairments who could select a mix to suit their own preferences. The
system was demonstrated at IBC2016 by Ben Shirley and the results of the tests will be published
The concept of “clean audio” has been around for over 20 years but there has been little progress
towards providing it as an access service. Technical specifications have been written that describe
how such a service might be delivered, but little work has been done on the production
requirements of the clean audio soundtrack itself. Routes for alternative audio provision into
existing TV receivers already exist, such as the Red Button or iPlayer services, so the barriers to the
provision of alternative audio mixes are not technical.
The topic turns out to be more complex than the original proponents had envisaged and the costs
of providing alternative mixes are too high for current production techniques. The BBC has gone a
long way in recent years in investigating complaints about television sound. Research work has
identified a range of issues in TV production that have a negative impact on the audience’s
experience and new training materials have been created to get the message out to productions.
Whilst it is possible for a receiver to make use of the 5.1 soundtrack to alter the level of the
background music and effects, very few programmes are made in this format. Furthermore, the
provision of personalised sound mixes cannot overcome issues with poor quality sound recording
or unfamiliar accents or indistinct diction. There is no substitute for good quality sound capture and
production techniques that ensure that the key elements of a programme narrative are conveyed
with clarity.
The provision of alternative sound mixes is more likely to come about through the production of
interactive and personalised content where the content maker has designed the experience in a
way that enables the audience or their device to create their own version. The ability to remix the
sound would then be part of a wider set of options. Such content would have a much wider
application than access services whilst providing a more effective route to an improved experience
for people who would prefer to watch television with an alternative sound mix. The recent work with
object based audio production has shown that this approach is possible and where audience data
has been gathered the results have shown that the audience have quite diverse needs and
[1] “Digital Video Broadcasting (DVB); Specification for the use of Video and Audio Coding in
Broadcasting Applications based on the MPEG-2 Transport Stream”, ETSI TS 101 154 V2.2.1
(2015-06). Available at
[2] “A Study of Sound Balances for the Hard of Hearing”, Mathers, BBC Research Dept 1991/3.
[3] Office for National Statistics, “National Population Projections: 2014-based Statistical Bulletin”,
29 October 2015.
ons/bulletins/nationalpopulationprojections/2015-10-29 - older-people
[4] Peter Mapp, “Assessing the acoustic performance and potential intelligibility of assistive audio
systems for hard of hearing and other users”, 125th Convention of the Audio Engineering Society,
paper 7626, October 2008.
[5] Hannes Müsch, “Aging and sound perception: Desirable characteristics of entertainment audio
for the elderly”, 125th Convention of the Audio Engineering Society, paper 7627, October 2008.
[6] Stanley Coren, “Most comfortable listening level as a function of age”, Ergonomics, 1994,
vol.37, No. 7, 1269-1274.
[7] HearLoss - Hearing Loss Demonstrator
[8] M Bergman, “Hearing and Aging: Implications of Recent Research Findings”, International
Journal of Audiology, 1971, Vol. 10, No. 3, Pages 164-171.
[9] Mead C. Killion, “Hearing aids: Past, present, future: Moving toward normal conversations in
noise”, Br J Audiol. 1997 Jun;31(3):141-8.
[10] “Television Sound: Why the Silence?”, Michele Hilmes, MSMI, 2:2 Autumn 08, pp 153.
[11] “Television’s First Seventy-Five Years: The Interpretive Flexibility of a Medium in Transition,”
William Uricchio, The Oxford Handbook of Film and Media Studies (Oxford: Oxford University
Press, 2008): 286–305.
[12] “Television: Critical Methods and Applications,” Jeremy G. Butler, Chapter 8, Style and Sound,
Lawrence Erlbaum Associates Inc, 2007.
[13] “How music Influences the Interpretation of Film and Video: Approaches from Experimental
Psychology,” Annabel J. Cohen, in R.A. Kendall & R. W. Savage (Eds.). Selected Reports in
Ethnomusicology: Perspectives in Systematic Musicology, 12, 15-36.
[14] “A Capacity Model of Children’s Comprehension of Educational Content on Television,” S. M.
Fisch, paper presented at the Biennial Meeting of the Society for Research in Child Development
(63rd, Albuquerque, NM, April 15-18, 1999).
[15] “The Functions of music in multimedia: A cognitive approach,” Cohen, A. J., Proceedings of
the Fifth International Conference on Music Perception and Cognition, 1998.
[16] “The Problem of Music in Actuality Television,” R. Bates & S. Deutsch, The Soundtrack, Vol 1,
Number 3, 2008, pp. 183-191.
[17] “Soundtracks with street credibility”, letter to the Guardian, 5th August 2011.
[18] “The Impact of Music Modality on the Perception of Communication Situations in Video
Sequences,” D. A. Infante & C.M. Berg, Communication Monographs, 46, June 1979, 135-148.
[19] “Framing meaning perceptions with music: The case of teaser ads,” K. Hung, Journal of
Advertising, Volume XXX, Number 3, Fall 2001.
[20] “Narrative Music in Congruent and Incongruent TV Advertising,” K. Hung, Journal of
Advertising, Vol XXIX, Number 1, Spring 2000.
[21] “Selective Exposure to and Acquisition of Information From Educational Television Programs
as a Function of Appeal and Tempo of Background Music,” J. Wakshlag, R. Reitz & D Zillmann,
Journal of Educational Psychology, 1982, Vol. 74, No. 5, 666-677.
[22] “Ben Salisbury, Composing for The Life of Mammals,” Tom Flint, Sound on Sound, June 2003.
[23] BBC Worldwide Press Releases - The Blue Planet Live! World Class BBC Show Tours the UK
[24] “Special Sound: The Creation and Legacy of the BBC Radiophonic Workshop”, Louis Niebur
Oxford Music/Media Series, 2010.
[25] TV Tropes wiki page, gives a
number of examples including Spielberg’s film The War of the Worlds.
[26] Coast, Series 5, Galway to Arranmore Island, first broadcast Wed 11 Aug 2010, 20:00 BBC
Two. clip at
[27] Pipedown web site
[28] “Piped Music and Showing of Television Programmes Bill”, House of Lords, 2005-6. see also “Plan to ban
muzak goes to Lords”, BBC News June 2006.
[29] “Broadcasting of Recorded Music in Public Places”, Hansard, March 2000.
[30] Danny Cohen, “Sound Matters”, BBC Academy web site, March 2011.
[31] BBC, “Editorial Policy Guidance: Hearing Impaired Audiences”, March 2011.
[32] Voice of the Listener & Viewer, “VLV’s Audibility of Speech on Television Project will make a
real difference”, VLV News Release 06/11, March 2011.
[33] “Clear Sound: best practice tips”, BBC Academy web site, March 2011.
[35] "Audibility and our audience",
[36] “High Quality Sound for High Definition Television,” David Meares, AES 10th International
Conference: Images of Audio (September 1991)
[37] “Multichannel sound for HDTV”, D.J. Meares, Applied Acoustics 36 (1992) 245-257.
[38] “Developments in multichannel sound for HDTV”, D.J. Meares, R&D Report 1991-13.
[39] “Ancillary Services for the Visually Impaired and Hearing Impaired in Multi-Channel Sound
Systems”, Report ITU-R BS.2001, (Question ITU-R 84/10) 1994,
[40] “Do not adjust your television set”, The Engineer, 14 August 2000.
[41] A.R. Carmichael, “Evaluating digital ‘on-line’ background noise suppression: Clarifying
television dialogue for older, hard-of-hearing viewers”, Neuropsychological Rehabilitation: An
International Journal, 1464-0694, Volume 14, Issue 1, 2004, Pages 241 – 249.
[42] Werner Brückner & Ralf Neudel, “A Shortlist of Emerging Access Services”, DTV4All,
Deliverable D3.1, December 2008.
[43] Werner Brückner, “Interim Report on Expert User Tests”, DTV4All, Deliverable D3.4, January
[44] “2nd Phase Emerging Access Service Demonstrators”, DTV4All, Deliverable D3.5, September
[45] Mike Armstrong, "Audio Processing and Speech Intelligibility: a literature review", BBC R&D
White Paper, WHP 190, May 2011.
[46] VirtualDub, The "center cut" algorithm.
[47] HBB4ALL, "D4.1 – Pilot-B Progress Report", 2014.
[48] HBB4ALL, "D4.2 – Pilot-B Solution Integration and Trials",
[49] Ben Shirley and Paul Kendrick, “The Clean Audio Project: Digital TV as assistive technology”,
Technology and Disability, Volume 18, Number 1/2006, Pages 31-41.
[50] B.G. Shirley, P. Kendrick, “ITC Clean Audio Project”, AES 116th Convention Paper 6027, May
[51] Ben Shirley, Paul Kendrick, “Measurement of speech intelligibility in noise: A comparison of a
stereo image source and a central loudspeaker source”, AES 118th Convention Paper 6372, May
[52] Ben Shirley, Paul Kendrick and Claire Churchill, “The Effect of Stereo Crosstalk on
Intelligibility: Comparison of a Phantom Stereo Image and a Central Loudspeaker Source”, J.
Audio Eng Soc. Vol. 55, No. 10, 2007 October.
[53] UK Clean Audio Forum, “Liaison Statement from UK Clean Audio Forum to ITU FG IPTV”, ITU
Focus Group on IPTV, FG IPTV-IL-0039.
[54] NorDig, NorDig Unified Requirements, ver 2.5.1, Aug 2014.
[55] Ben Shirley, "Improving Television Sound for People with Hearing Impairments", PhD Thesis,
University of Salford, 2013.
[56] Designing Connected Media Spaces for Evolving Audiences, Brendan Crowther, R&D Blog
post, March 2011.
[57] “Assessing the acoustic performance and potential intelligibility of assistive audio systems for
the hard of hearing and other users”, Peter Mapp, AES Convention Paper 7626, 125th convention,
October 2008.
[58] ITC Guidance on Standards for Audio Description – May 2000
[59] Broadcasting Act 1996, as enacted, from the National Archives web site
[60] Mike Armstrong, "Remix Radio - a personalised listening experience," BBC R&D Blog Post,
Oct 2010.
[61] “NetMix: Create your own sound balance from Centre Court”, Rupert Brun blog post, June
[62] Harald Fuchs and Dirk Oetting, "Advanced clean audio solution: Dialogue Enhancement", IBC
Conference 2013.
[63] H. Fuchs, S. Tuff, C. Bustad, “Dialog enhancement – technology and experiments,” EBU
Technical review 2012 Q2, Internet resource:
[64] Mark Mann, Anthony Churnside, Andrew Bonney, Frank Melchior, "Object-Based Audio
Applied to Football Broadcasts", BBC White Paper WHP272, 2013.
[65] Ben Shirley and Rob Oldfield, "Clean Audio for TV broadcast: An Object-Based Approach for
Hearing-Impaired Viewers", Journal of the Audio Engineering Society Vol. 63, No. 4, April 2015.
[66] Forrester I. and Churnside A., “The creation of a perceptive audio drama,” Paper presented at
the NEM Summit, October 2012.
[67] M. Armstrong, M. Brooks, A. Churnside, M. Evans, F. Melchior & M. Shotton, "Object-based
broadcasting - curation, responsiveness and user experience", IBC2014 Conference, 2014 page
[68] Matthew Brooks, "Future Content Experiences: The First Steps For Object-Based
Broadcasting", BBC R&D Blog Post, March 2015.
[69] ORPHEUS web site
[70] ITU-R BS.2076-0 (06/2015) "Audio Definition Model"
[71] TU-R BS.2051-0 (02/2014) Advanced sound system for programme production