Implementing Methods for Equal Loudness in Radio Broadcasting

Matti Zemack

Supervisors
Royal Institute of Technology: Professor Sten Ternström
Swedish Radio: Technical Strategist Lars Jonsson

Date of approval: 12th June 2007
Approved by: Professor Sten Ternström

Master of Science Thesis
KTH - Skolan för Datavetenskap och kommunikation (CSC)
Avdelningen för Tal, musik och hörsel
100 44 Stockholm

Table of Contents

Abstract in English
Abstract in Swedish
Recommendations for Swedish Radio at implementing better loudness control
1 What is loudness?
1.1 ‘Perceived loudness’ or just ‘loudness’
1.2 How does the ear interpret loudness?
1.2.1 Spectral effects on loudness
1.2.2 Phon scale
1.2.3 Sone scale
1.2.4 Temporal aspects of loudness
1.3 Approach to the problem
2 Recent research
2.1 Different models
2.1.1 Leq (Linear, A-, B-, C-, D-, M-, RLB-, R2LB-weighted)
2.1.2 PPM
2.1.3 Zwicker (SI++, ISO 532-B)
2.1.4 CBS Loudness Indicator
2.1.5 Moore & Glasberg
2.1.6 TC LARM
2.1.7 TC HEIMDAL
2.1.8 Replay gain
2.2 Comparison of models
2.2.1 Conclusions – comparing methods
3 Loudness at the Swedish Radio
3.1 Measured loudness. Comparing before and after the Swedish Radio final dynamic processors.
3.1.1 Method
3.1.2 Different software measuring systems
3.1.3 Measurements
3.2 Results of measurements
3.2.1 Results – short recordings
3.2.1.1 P1 speech channel
3.2.1.2 P3 pop music / speech channel
3.2.2 Results – long recordings
3.2.2.1 P1 speech channel
3.2.3 Comparing Replay gain with Leq(R2LB)
3.3 Conclusions of measurements
4 Workflows and levels at the Swedish Radio
4.1 The pre-digital era
4.1.1 Methods and workflow – pre digital era
4.2 Digital era
4.2.1 Methods and workflow – pre-produced in the digital era
4.2.2 Methods and workflow – live radio in the digital era
4.3 The future – method and workflows
4.3.1 Production monitor sound levels?
4.3.2 How to implement fully controlled loudness levels in an automatic broadcast
4.3.3 Manually calculating parts of a show
4.3.4 Full sound file levelling
5 Real life meter usage
6 Discussion concerning a new meter
7 Other comments closely related to loudness in broadcasting
7.1 Balance speech/music
7.2 Classical music
7.3 Different meter usage between channels
7.4 Different dynamics for different usages
8 Acknowledgment
9 Glossary
10 Production Flowchart for Swedish Radio
11 References

Abstract

Sound levels are perceived as a growing problem in radio and TV. Quite often, great variations in perceived sound level exist inside a single program or between adjacent programs. Today the broadcaster uses a plethora of media platforms, all with different listener groups. They all have one thing in common; they all want even perceived sound levels. How can a broadcasting company accomplish this?

This objective can be achieved by intentional work in consecutive steps. The first step is to assimilate the latest research in this area. The second step is to choose the best measurement method. The third step is to implement this single measurement method in all steps of production. Training and support for all programme producing staff is a must. The fourth step is to implement an automatic gain measurement and correction feature in the metadata of the play out system. The fifth step is that the broadcast company must itself try to control as much as possible of the final dynamic processing.

In this paper, the above steps are examined, and some recommendations, large and small, are proposed for Swedish Radio about how their broadcast chain may be improved so that better perceived sound levels are achieved. The methods of measurement that are tested in this report are both Leq(R2LB) and Replay gain. I have also compared the final dynamic processing systems at Swedish radio.
Both of these measurement methods, and the final processing systems (Factum Cadenza together with Orban 8200), work very well. With these tools, Swedish Radio can achieve more even perceived sound levels, which is important for keeping existing listeners and attracting new ones.

Automatic methods for even listening levels in broadcasting

Summary

Sound levels are perceived as an ever greater problem in radio and TV. Large jumps in sound level are noticeable within programmes and between programmes. Today the listener uses a multitude of media platforms, each with a different listener group. But they have one thing in common: they all want even sound levels. How can a broadcasting company achieve listening levels that are perceived as even?

The goal can be reached through deliberate work in several steps. The first step is to assimilate the latest research in the field. The second step is to find a reliable measurement method in this research. The third step is that all production must follow this single measurement standard. All programme-producing staff must be trained. The fourth step is that the finished programmes must be corrected automatically at or before scheduling for transmission. The fifth step is that the company itself must take responsibility for its own outgoing signal by means of final processors, so that the transmitted dynamics and levels can be controlled in all distribution channels.

This thesis investigates all of the above steps. The report also gives recommendations, large and small, to SR on how its own broadcast chain can reach more even sound levels, which leads to better audibility. The measurement methods mainly analysed are Leq(R2LB) and Replay gain, for measuring perceived sound levels in spoken radio. These are compared with the final processing already in place at Swedish Radio today. Both methods give good measured results, as does the final process, Factum Cadenza. Swedish Radio can achieve better audibility with regard to perceived sound levels, which is important for keeping existing listeners and recruiting new ones.

Recommendations for Swedish Radio at implementing better loudness control

Below are the recommendations to Swedish Radio regarding the implementation of loudness control.

• Define a metering standard, the same for all channels (including web channels, podcasts, etc.), preferably using a new metering model. The recommendation is Leq(R2LB) as proposed by ITU BS.1770 (ITU-R, 2006).
• The meter is recommended to be similar to the BBC meter, which resembles the old-style VU meter. This type is easier to read in the corner of the field of vision. The meter must also be adjusted so that the needle points straight up at the stipulated Loudness Unit (LU) level. The meter should also respond clearly to small level changes; the most interesting metering area spans a range of about 12 dB.
• Educate all co-workers in the usage of this new meter.
• Incorporate automatic measurement into the system, such that when a finished sound file is submitted to the broadcast intake, a sound level measurement and metadata adjustment is carried out automatically.
• Document and study the main processing units. Try to use them more actively. Use the dynamic processors harder (or maybe softer for some program types). More experiments and discussions must be introduced.
• Equip the voice tracking studios (where the main play out level is set) with different listening devices, such as big speakers, small speakers, computer speakers and headphones. Encourage the producers to use them all, so that they can understand the differences.
• Post process the web feeds and pod feeds to a much higher extent. Let the dynamic processors work harder.
• Design a few examples of sound excerpts where correct levels are set, to establish a norm for producers and engineers.
These examples can also show how loud music or sound effects should usually be mixed behind speech, or how loud pop music can be relative to the presenter. Distribute these sound files both through Swedish Radio’s internal distribution systems and to external production studios.

1 What is loudness?

Loudness is a subjective entity. Every individual’s subjective impression of sound intensity is unique. Can loudness even be measured? Can any fragment of sound be given an exact value, so that two different sound fragments can be presented in sequence without the listener reaching for the volume knob on their radio or TV? And how exactly can this measurement be done?

The research concerning loudness evaluation has made great progress recently. This may be a result of television and radio starting digital distribution. The sound chain, from the producer handling the interview or the ad to the consumer in front of their screen or radio, is implemented entirely in the digital domain. Globalisation in media production has also led to problems where a TV channel in one country controls its outgoing loudness with the type of meter used in that country, while another country, which adds the commercials, controls loudness according to a different standard and meter. And as we know from commercial radio, a “louder channel is a better channel”, which leads to louder and more processed signal levels in a digital signal chain, with little control at the receiving point.

Producers of commercials have always used hard limiting in order to outperform their competitors. Nowadays the sound is also multi-band compressed in a further attempt to become louder, and as an easy way of levelling out the frequency content of neighbouring sound signals. All this has led to an increased interest in measuring loudness, as a necessary means to quantify perceived sound levels.

1.1 ‘Perceived loudness’ or just ‘loudness’

Loudness is defined as a subjective entity.
Research papers in this area often use the term “perceived loudness”. Is the wording “perceived loudness” a tautology? In this paper perceived loudness will be named only “loudness”.

1.2 How does the ear interpret loudness?

1.2.1 Spectral effects on loudness

Our sense of hearing assesses loudness by how the cilia and the corresponding auditory nerve fibres are excited on the basilar membrane in the inner ear [Bonello, 2007]. This excitation is distributed by frequency band along the membrane, forming a kind of biological spectrum analyzer. Each frequency excites a certain zone on the basilar membrane, and each excited zone adds to the total loudness. If two sounds with similar frequency content arrive at the ear, they compete to excite the same hair cells and the same nerves. These nerves have a maximum rate of firing, and this is thought to be the reason why doubling the sound intensity does not double the perceived sound level. See fig. 1.

Fig. 1. The figure shows why doubling the sound intensity at nearly the same frequencies does not double the loudness. [Hyperphysics loud]

On the other hand, if the different sounds contain different frequencies they do not occupy the same hair cells and therefore not the same nerves. Hence adding two equally loud sounds doubles the loudness, see fig. 2.

Fig. 2. 1) The figure shows why doubling the sound intensity at differing frequencies doubles the loudness. [Hyperphysics loud] 2) Showing the placement of the basilar membrane in the inner ear. [Hyperphysics place]

The frequency ranges within which differing sounds fire the same nerves are called critical bands. One way of measuring these bands has been proposed by Zwicker et al. [Zwicker et al., 1957]. The critical bands are narrower at low frequencies (90 Hz wide for sounds under 200 Hz) and wider at higher frequencies (900 Hz wide for sounds around 5000 Hz) [Backus, 1977].
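The widening of the critical bands with frequency can be illustrated numerically. The sketch below is only an illustration: it assumes the analytic approximation commonly attributed to Zwicker and Terhardt (1980), which is not one of the sources cited in this thesis, and the function name is hypothetical.

```python
def critical_bandwidth_hz(centre_hz):
    """Approximate critical bandwidth in Hz around a centre frequency in Hz.

    Assumption: the Zwicker and Terhardt (1980) approximation,
    CB = 25 + 75 * (1 + 1.4 * f_kHz**2) ** 0.69, is used here.
    """
    khz = centre_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * khz * khz) ** 0.69


if __name__ == "__main__":
    # Print the approximate bandwidth at a few centre frequencies.
    for f in (100, 200, 1000, 5000):
        print(f, "Hz ->", round(critical_bandwidth_hz(f)), "Hz wide")
```

With this approximation the bands stay close to 100 Hz wide at low frequencies and grow steadily wider towards the upper end of the audible range.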
1.2.2 Phon scale

The ear does not have a flat frequency response. The ear has its own built-in equaliser. Fletcher/Munson tested many recruits. Their findings were later standardised by ISO, see fig. 3.

Fig. 3. Shows the ear’s frequency response at differing levels of sound intensity. By Fletcher/Munson 1933. Revised by ISO in several steps since then. [Hyperphysics (eqloud)]

These measurements were done with sinusoidal tones presented from the front. As can be seen, the curves do not look the same at all test levels. At lower listening levels the lowest frequencies are perceived as quieter than the mid frequencies. Therefore home stereo systems usually have a loudness button that enhances these low frequencies at low listening levels.

Using the Fletcher/Munson-ISO curves we can find the loudness level of a sinusoidal tone. This is measured in phon. Each phon curve is named after the level in dB of the 1 kHz sinusoidal tone that lies on it. For example, the 40 phon curve passes through 40 dB at 1 kHz.

It should be noted that the Fletcher/Munson curves are constructed from subjective responses to a sinusoidal tone presented frontally. If instead a narrowband noise is used and the sound arrives in a diffuse field rather than a free field, the corrections in fig. 4 need to be applied to the Fletcher/Munson-ISO curves.

Fig. 4. Attenuation necessary to produce the same equal loudness of a pure tone in a diffuse and in a free sound field [Zwicker & Fastl 1999].

1.2.3 Sone scale

The sone is a unit of loudness proposed by Stanley Smith Stevens in 1936. The sone scale was developed in tests where listeners were asked to indicate when the loudness of a sound had doubled. One sone equals 40 phon, and doubling the loudness doubles the number of sones. The relation between the sone and phon scales is shown in fig. 5.

Fig. 5. Conversion chart between sones and phons. According to ISO/R 131-1959. [sengpielaudio]

1.2.4 Temporal aspects of loudness

The duration of the sound stimulus is also of importance.
Zwicker/Fastl investigated short sound impulses. They noticed a decreasing loudness as the sound impulse became shorter, see fig. 6.

Fig. 6. Relative loudness of a 2 kHz tone-burst as a function of duration [Zwicker & Fastl 1999].

Sound bursts longer than 100 milliseconds are steady in their loudness [Zwicker & Fastl 1999]. Sound bursts as short as 10 milliseconds have a loudness that is reduced by a factor of 2, in other words half the loudness.

1.3 Approach to the problem

Why can we not measure loudness with a digital, sample-accurate peak measurement (with fast attack times)? Such meters are easy to fool. Take for instance uncompressed (today, rather: non-limited) pop music. The drums account for most of the peak values. Because our perception is also influenced by duration, these instantaneous drum sounds do not contribute fully to the loudness. If we use a meter with a slower attack time, these drum peaks will pass unnoticed by the meter. So some type of slower response time matches our ears’ reactions better.

2 Recent research

2.1 Different models

Recent research (Skovenborg / Nielsen) has evaluated the different algorithms for measuring loudness. Below, their findings are presented together with one other algorithm for measuring loudness:

2.1.1 Leq (Linear, A-, B-, C-, D-, M-, RLB-, R2LB-weighted)

Leq is the equivalent continuous sound level. Leq is measured over time. The measurement time may be either a whole sound file or, for live sound material, a much shorter integration time, for example 300 ms. Leq is measured as the root-mean-square (RMS) in dB relative to a reference level that must be given explicitly or defined by the context. Leq is often measured together with different weighting curves to make the measurement fit the human ear’s sensitivity at different frequencies [Moore 1982]. The different weighting curves can be seen in fig. 7.

• Leq(Linear) is Leq without any frequency weighting.
• Leq(A) is Leq measured by first applying the A-weighting to the sound source. The A-weighting is a simple approximation of the 40-phon equal loudness curve. The A-weighting is usually used in sound level measurements, dB(A).
• Leq(B) is Leq measured with a B-weighting curve, which is an A-weighting modified for higher listening sound levels. The low frequencies are more apparent here than in the A-weighting.
• Leq(C) is Leq measured with the C-weighting frequency response filter. The C-weighting is constructed for even higher listening levels than the B-weighting. Additional low frequencies are allowed.
• Leq(D) is Leq measured with the D-frequency curve. Hardly used except “for aircraft engine noise measurement” [Todd, 2007].
• Leq(M) is Leq where the M stands for Movie. It has mainly been “promoted by Dolby, to be used for measuring the loudness of different segments of movie soundtracks such as advertisements” [Skovenborg & Nielsen, 2004].
• Leq(RLB) is Leq measured with the Revised Low frequency B-weighting. This is actually a simple 50 Hz high-pass filter. Since Skovenborg and Nielsen’s paper, RLB has been modified by the ITU-R group to also include a high frequency boost [ITU-R; BS.1770, 2006]. The boost was added to account for the acoustics of the head when measuring surround sound, where the head is modelled as a rigid sphere. This new RLB has no official name; it is seen as an updated RLB. To avoid confusion I call this new frequency curve R2LB. It is not included in Skovenborg and Nielsen’s comparison below.

Fig. 7. The different weighting filters used in Leq measure. A, B, C, M, RLB (green), R2LB (blue). [Lund, 2007]

2.1.2 PPM

PPM is a Peak Program Meter. This is a peak meter with a fast attack time and a slow decay time. The attack is fast enough to make sure no peaks go over digital full scale, and the decay is slow enough to make sure the user has time to read the display.
Note that the attack time is not as fast as that of a digital peak meter. The reading usually lies between those of digital peak meters and VU meters. For deciding the level of a whole sound file, the PPM measure is often presented in a histogram. The level value of the sound file then becomes the 95th (or 75th or 50th) percentile of this histogram. Swedish Radio uses a version of the PPM called EBU/Nordic PPM.

2.1.3 Zwicker (SI++, ISO 532-B)

ISO 532-B: Zwicker has constructed a loudness model in which the frequencies are divided into critical bands. Usually a sound’s frequency content is divided into 32 1/3-octave bands. The excitation level in each critical band is calculated. The total loudness is calculated by integrating the levels over all critical bands. The resulting value is in sones. This model became the ISO 532-B standard. Zwicker also described a manual version of this model [Brixen, 2001:2].

SI++: There also exists a variant of the Zwicker model which is implemented in the acoustics software SI++. In this model, the 95th percentile of the loudness values is used as an estimator of the loudness of a sound file. It is said that “The perceived loudness of a long, non-stationary sound is the loudness value that is exceeded 5% of the time in the loudness/time course.” [Akustik Technologie Göttingen, 2004]

2.1.4 CBS Loudness Indicator

This is the de facto standard in the US broadcast community. Like the Zwicker model, it is based on filter banks. The difference is that only eight bands are used, covering three critical bands each.

2.1.5 Moore & Glasberg

This is also a multi-band loudness model, comparable to the Zwicker model.

2.1.6 TC LARM

TC LARM is a new single-band method of measuring loudness. It uses RLB-weighting, see fig. 7, and also an asymmetrical low pass filter (with a release time slower than the attack time) so that the peaks in the higher frequencies are accentuated.
Esben Skovenborg pointed out in private mail conversation that “Essentially, LARM was conceived to demonstrate that another full-band model could be (at least) as accurate as the Leq(RLB)”. [Skovenborg, 2007]

2.1.7 TC HEIMDAL

TC HEIMDAL is a multi-band model. The sound source is filtered into 9 bands, which are then processed further. This method also uses an asymmetrical low pass filter. As this model was patent pending at the time of writing, the exact details were not available. Skovenborg further explained that “HEIMDAL then confirms that a multi-band model is required, in order to achieve an even better accuracy of loudness measurements especially for input signals with atypical spectra”.

2.1.8 Replay gain

Replay gain is published openly as a community developed measuring system [Replay Gain, 2001]. David Robinson began work on Replay gain in 2001. It is widely used, for example at Swedish Radio in the music ingestion system A-Wave. The publisher stresses the difference between short term and long term loudness measures. In this sense the short / long term difference is between CD track mode (“Radio”) and CD album mode (“Audiophile”).

The loudness is calculated in two stages. First, two filters are applied: an IIR filter which simulates the high frequency response of an average of the Fletcher/Munson curves above, and a high-pass filter which Replay gain adds to match the Fletcher/Munson curves. Secondly, the RMS of the sound file is calculated in 50 ms windows. These values are plotted in a histogram, and the 95th percentile (sorted from the highest values) is chosen as the loudness of the sound file.

Audio engineers at Swedish Radio believe that this algorithm works very well for pop music. It has been very useful when ripping pop music CDs into files. These files have often been broadcast by self-operators using only headphones. No audio engineers with good acoustical environments using loudspeakers have been involved in the level adjustment process.
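The windowing and percentile stage described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Replay gain reference implementation: the two equal-loudness filters are left out, samples are assumed to be normalised to ±1.0, the 95th percentile is taken here as the level that only about 5 % of the windows exceed, and the function names are hypothetical.

```python
import math


def window_rms_db(samples, sample_rate, window_s=0.05):
    """RMS level in dB (relative to full scale 1.0) of each complete window."""
    n = max(1, round(sample_rate * window_s))
    levels = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        mean_square = sum(x * x for x in block) / n
        # Floor the value so that digital silence does not give log10(0).
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels


def percentile_loudness_db(samples, sample_rate, percentile=95.0):
    """Level below which `percentile` percent of the 50 ms windows fall."""
    levels = sorted(window_rms_db(samples, sample_rate))
    if not levels:
        raise ValueError("need at least one complete window")
    index = min(len(levels) - 1, int(len(levels) * percentile / 100.0))
    return levels[index]
```

Because the percentile is taken near the top of the distribution, a file that is quiet for half its length and loud for the other half is rated by its loud part, which is the behaviour one would want when matching pop music tracks to each other.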
2.2 Comparison of models

In “Evaluation of Different Loudness Models” by Skovenborg/Nielsen, published at AES in 2004, the models were classified into different classes based on their overall performance compared to subjective listener tests at two separate locations (TC Electronic, McGill University). Their results are presented in table 1.

Model                              Median absolute error (dB)   95th percentile maximum absolute error (dB)
TC HEIMDAL                         0.52                         1.50
TC LARM                            0.61                         1.64
Leq(RLB)                           0.67                         1.58
Leq(C)                             0.72                         1.95
Leq(Linear)                        0.77                         2.16
Leq(B)                             0.84                         1.59
PPM (50th percentile)              1.12                         2.70
Zwicker-ISO                        1.22                         1.95
Zwicker & Fastl (95th percentile)  1.22                         2.92
Leq(D)                             1.42                         3.00
Leq(A)                             1.78                         4.13
Leq(M)                             1.68                         3.83

Table 1. Classes of loudness models, based on the overall evaluation. The models are listed best in class first, running from Class 1 (best) to Class 4 (worst). (Absolute errors are taken from the TC data set because TC HEIMDAL and TC LARM were optimised using the McGill data set.)

The evaluation states that TC HEIMDAL and TC LARM have a median absolute error of only 0.5-0.6 dB compared to subjective tests at TC. Leq(RLB) is not much worse in practical use. The 95th percentile maximum absolute error is even smaller for Leq(RLB) than for TC LARM. The standard error between subjects for the TC data set, which consisted of 8 subjects, is 0.43 dB. The median absolute error of Leq(RLB), 0.67 dB, is not far from this.

2.2.1 Conclusions – comparing methods

The Skovenborg/Nielsen research suggests that it may be possible to devise a universal machine for calculating loudness. This machine could calculate the gain setting in an automatic broadcast system. Skovenborg/Nielsen pointed out that, apart from TC’s own models, the most exact measure is Leq(RLB). In the following chapter everything is measured using the ITU recommendation Leq(R2LB).

3 Loudness at the Swedish Radio

3.1 Measured loudness. Comparing before and after the Swedish Radio final dynamic processors.
The final dynamic processors at the Swedish Radio for P1 (the speech channel) and P3 (mixed pop music and speech channel for young listeners) consist of an automatic gain controller and a multi-band dynamic processor in series. The measurements were focused on the question: Do broadcasters need Leq(R2LB) meters in live broadcast and automatic pre-broadcast measurement of sound files using Leq(R2LB), or can we handle the varying loudness satisfactorily with the same final dynamic processors that we use today? The previously unmeasured Replay gain algorithm has also been tested against the final processors at the Swedish Radio.

3.1.1 Method

For this study, the P1 and P3 channels at Swedish Radio were recorded both directly before (pre process) and directly after (post process) the dynamic processing. See section 10 of this thesis for a detailed description of the points of recording.

P1 is mainly the speech channel, with news, discussions, documentaries, theatre, some music and listener contact through telephone. For many years P1 audio engineers used a single compressor on the master from the mixing desk. With the new digital mixing desks, which have separate compressors on every channel, every microphone had its own dynamic settings at the time of my recordings.

P3 is a channel for the younger listeners. It is a high tempo pop music channel with a high degree of speech content. The speech is often loud and heavily compressed at the mixing desk. The music is level controlled using the Replay gain standard when it is imported into the play out system.

These two radio channels were recorded digitally on an 8-channel Pro Tools-888 system. The recordings have a sampling rate of 48 000 Hz, as this is the sampling rate standard in the Swedish Radio on-air system Digas. Both the pre and post process signals were digital AES/EBU, received through a break-out box in the central apparatus room.
The router matrix for the outgoing channels at Swedish Radio is in parts still analogue; therefore the pre-process signal was recorded after the main A/D converters. These A/D converters have different reference levels because the different channels use their PPM EBU/Nordic meters in different ways. The P1 reference level is set to -18 dB, equivalent to a test tone in the studio at TEST level (PPM EBU/Nordic 0 dB). P3’s reference level is set to -22 dB, equivalent to a test tone in the studio at TEST level (PPM EBU/Nordic 0 dB).

Recordings were done during October 2006. This report is based on recordings with varying types of programme material; for this reason recordings from 26 October were chosen. The P1 recordings from this day included

• Phone-in programmes containing a mixture of narrowband telephones and broadband studio microphones.
• News, both a few short 3 minute versions and one half hour long version from another studio.
• Weather, live from SMHI, the weather bureau, without audio engineers.
• Presenter, live in channel master studio.
• Trailer, pre-recorded louder than other programme material.
• Trailer, live from another studio.
• Theatre, pre-recorded with greater dynamic usage than other programmes.
• Interviews, pre-recorded.
• Dialogue in studio, live.
• Sound effects, church bell every day at noon.
• Music, pre-recorded.

Most of the material broadcast on P1 is mono, so only the left channel was used to measure the Leq(R2LB).

3.1.2 Different software measuring systems

The Leq(R2LB) measurements in this paper were done using a meter by the Communications Research Centre (Ottawa, Canada). It is called the CRC Loudness Meter, see fig. 8. This is one of the two measuring devices available at the time of writing this paper that properly use the R2LB frequency weighting. The other measuring software with the correct frequency weighting is the Loudness Meter Comparison Utility (LMCU) by the Australian Broadcasting Corporation, see fig. 9.
In this report the CRC meter was chosen because it can be used as a live meter, which also made it possible to measure live sounds at an early stage of the research. All measurements reported in this paper were, however, made on pre-recorded and edited files.

The LMCU, on the other hand, has many advantages. It allows easy comparison between PPM, VU and Leq(R2LB) loudness. The LMCU also lets the user set a sound level threshold, which in effect makes the meter react only to sound, so that the Leq(R2LB) is not biased by silence between syllables and sentences in speech. It also gives the user the option of selecting an output filter. This controls how the Leq(R2LB) is biased by the length of the sounds. The fast setting is based on Zwicker/Fastl's tone-burst calculations in fig. 6. Both the CRC and LMCU meters calculate a cumulative loudness value.

Another software measuring system that was tested is the BBC's Baptools, see fig. 10. This software can only measure live input, and it only measures Leq(R2LB). Another mixing engineer working for Swedish Radio and I have been using this meter with very pleasing results.

Fig. 8. The CRC meter by Gilbert Soulodre.

Fig. 9. The LMCU meter by the Australian Broadcasting Corporation.

Fig. 10. The BBC Baptools meter by Andrew Mason, BBC.

3.1.3 Measurements

The above-mentioned study [Skovenborg & Nielsen, 2004] classified Leq(RLB) as the best measuring method apart from TC's own methods. In this report this method and its extension Leq(R2LB) are used. This method is also proposed by the ITU-R BS.1770 standard.

All measurements are made both pre and post the final processing at Swedish Radio. The measurements have been made both on short excerpts of sound (less than 20 seconds) and on whole radio shows.

The Leq(R2LB) measured before processing gives a hint of the differing loudnesses in the incoming signal. It answers the question: how big a difference is there in the material arriving at the central apparatus room?
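To make the quantity behind these meters concrete, the following is a minimal sketch of a cumulative Leq calculation with an optional level threshold of the kind the LMCU offers. It is not the CRC or LMCU implementation: the R2LB frequency weighting is deliberately left out, the level is expressed relative to digital full scale, and the function names `leq_db` and `sine`, the 10 ms block length and the -60 dB threshold are assumptions made for illustration only.

```python
import math


def leq_db(samples, threshold_db=None, block=480):
    """Return the equivalent level (Leq) of `samples` in dB re full scale.

    Assumptions (not from the thesis): samples are floats in [-1.0, 1.0],
    no frequency weighting is applied, and the threshold works on fixed
    blocks (480 samples = 10 ms at 48 kHz).
    """
    kept = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean_square = sum(s * s for s in chunk) / len(chunk)
        if threshold_db is not None:
            # Skip blocks quieter than the threshold, so that pauses
            # between words do not pull the result down.
            if mean_square <= 0.0 or 10.0 * math.log10(mean_square) < threshold_db:
                continue
        kept.append((mean_square, len(chunk)))
    total = sum(n for _, n in kept)
    if total == 0:
        return float("-inf")
    energy = sum(ms * n for ms, n in kept) / total
    return 10.0 * math.log10(energy) if energy > 0.0 else float("-inf")


def sine(freq_hz, seconds, rate=48000, amplitude=1.0):
    """Generate a test tone as a list of floats."""
    count = int(seconds * rate)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / rate)
            for i in range(count)]


tone = sine(1000.0, 1.0)
speech_like = tone + [0.0] * len(tone)  # one second of tone, one of silence
ungated = leq_db(speech_like)           # silence counts towards the average
gated = leq_db(speech_like, threshold_db=-60.0)  # silence is ignored
```

With the threshold enabled, the two-second test signal reads about 3 dB higher than without it, because the silent second no longer dilutes the average. A real meter would apply the R2LB weighting filter to the samples before this calculation.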
The post-processing measurement shows how much the final processors helped in smoothing the level jumps to improve the listening experience.

By measuring both short excerpts and long shows we can test the different methods of measurement. One hypothesis is that an audio engineer or a producer works to find an even loudness within her own show, but has no control over the adjacent shows or trailers. Will a 20 second measurement give a good enough hint of the loudness of the full show? Will a 20 second quick listen be enough for an audio engineer to smooth the play out levels between shows? Is it better to use processing equipment for control of loudness inside shows, but use some form of overall loudness calculation as a help to the processors?

As an additional measure, Replay gain is compared to the processing equipment at Swedish Radio. This measurement is made to see how well the Replay gain algorithm correlates with Leq(R2LB) and the final processing units at Swedish Radio.

3.2 Results of measurements

3.2.1 Results – short recordings

The first part of the results concerns the short recordings. These are excerpts from Swedish Radio on 26 October 2006. The excerpts were deliberately chosen to exemplify special cases in broadcast; for example, some were chosen because they contain a narrowband telephone and others because there is loud applause from the studio guests after a short live song.
3.2.1.1 P1 speech channel

Table 2 shows the Leq(R2LB) results from short file excerpts both before and after the final dynamic processors.

| Sample number | Info | Duration in seconds | CRC Leq Pre Processor (dB) | CRC Leq Post Processor (dB) |
|---|---|---|---|---|
| 1 | male_female_iv | 20 | -29.1 | -21.2 |
| 2 | male_iv | 20 | -28.3 | -20.1 |
| 3 | female_studio | 20 | -23.6 | -18.4 |
| 4 | male2_iv | 20 | -24.5 | -18.5 |
| 5 | male1LoudStreet_iv | 15 | -25.7 | -18.2 |
| 6 | female1_tel | 20 | -24.7 | -18.1 |
| 7 | male_studio_Presenter | 17 | -28.8 | -20.3 |
| 8 | maleAndmale_trailer | 20 | -26.0 | -18.8 |
| 9a | female_news | 20 | -24.4 | -18.1 |
| 10 | maleFemale_trailBlockEnd | 14 | -30.5 | -21.3 |
| 11 | maleFemale_trailBlockStartNext | 12 | -23.8 | -19.4 |
| 12 | female_ivQuiet | 15 | -35.1 | -23.6 |
| 13 | maleAndmale_studio | 19 | -27.3 | -20.3 |
| 14 | male_studioLoud | 10 | -23.9 | -18.4 |
| 15 | male_tele | 19 | -24.0 | -17.9 |
| 16 | female_tele | 19 | -25.9 | -18.6 |
| 17 | female_studioLoud | 13 | -24.5 | -17.4 |
| 18 | male_teleSoft | 20 | -28.4 | -19.3 |
| 19 | male_teleLoud | 09 | -24.1 | -17.6 |

Table 2. Leq(R2LB) measurements from the speech channel, P1, at Swedish Radio.

In fig. 11 the measurements are plotted graphically.

Fig. 11. The Leq(R2LB) short file recordings from the speech channel P1 at Swedish Radio (P1 pre and post final processors, plotted against sample number).

Already from this graph it is clear that the final dynamic processing does a good job with material that is mainly speech. Fig. 12 shows the basic statistics of these data.

| Leq(R2LB), Short P1 (dB) | P1 PRE processors | P1 POST processors |
|---|---|---|
| Mean | -26.5 | -19.2 |
| Standard deviation | 3.0 | 1.6 |
| Standard error | 0.7 | 0.4 |
| Max distance to mean | 8.6 | 4.4 |
| Range | 11.5 | 6.3 |
| Max | -23.6 | -17.4 |
| Min | -35.1 | -23.6 |

Fig. 12. The mean, standard deviation, standard error and max distance from mean for short recordings from the speech channel P1 at Swedish Radio.
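The statistics reported in fig. 12 can be reproduced from the pre-processor column of table 2. The sketch below shows one way to do this; the function name `summarize` is hypothetical, and the use of the sample (n-1) standard deviation is an assumption, chosen because it agrees with the 3.0 dB reported in the figure.

```python
import math
import statistics

# CRC Leq(R2LB) values before the final processors, samples 1-19 of table 2.
P1_SHORT_PRE = [
    -29.1, -28.3, -23.6, -24.5, -25.7, -24.7, -28.8, -26.0, -24.4, -30.5,
    -23.8, -35.1, -27.3, -23.9, -24.0, -25.9, -24.5, -28.4, -24.1,
]


def summarize(levels):
    """Return the summary statistics used in the figures, all in dB."""
    mean = statistics.mean(levels)
    std_dev = statistics.stdev(levels)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "std_dev": std_dev,
        "std_err": std_dev / math.sqrt(len(levels)),
        # Largest deviation of any single excerpt from the channel mean.
        "max_distance_to_mean": max(abs(x - mean) for x in levels),
        "range": max(levels) - min(levels),
        "max": max(levels),
        "min": min(levels),
    }


summary = summarize(P1_SHORT_PRE)
for name, value in summary.items():
    print(f"{name:>22}: {value:6.1f}")
```

Rounded to one decimal, this gives a mean of -26.5, a standard deviation of 3.0, a standard error of 0.7, a max distance to mean of 8.6 and a range of 11.5, matching the P1 PRE column of fig. 12. The max distance to mean is dominated by a single quiet interview excerpt (sample 12).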
3.2.1.2 P3 pop music / speech channel

Table 3 shows the Leq(R2LB) results from short file excerpts both before and after the final dynamic processors.

| Sample number | Info | Duration in seconds | CRC Leq Pre Processor (dB) | CRC Leq Post Processor (dB) |
|---|---|---|---|---|
| 20 | female_studLoud | 20 | -28.5 | -17.1 |
| 21 | male_studEND | 18 | -28.1 | -18.0 |
| 22 | music_showID | 20 | -28.2 | -15.6 |
| 23 | music_aCappellaPunkRocker | 20 | -29.9 | -16.7 |
| 24 | music_EminemSmackThat | 20 | -30.6 | -16.9 |
| 25 | MaleAndMale_TrailerStuIv | 20 | -24.8 | -15.8 |
| 26 | FeMale_Studio | 18 | -28.1 | -17.2 |
| 27 | Music_RedHotChillipeppersSnow | 17 | -25.9 | -13.9 |
| 28 | Music_ShowID | 15 | -29.6 | -16.2 |
| 29 | Female_Studio | 17 | -33.2 | -19.9 |
| 30 | MusicIntro_MartinStenmarkSjumilakliv | 12 | -35.0 | -18.1 |
| 31 | MusicSong_MartinStenmarkSjumilakliv | 20 | -28.1 | -14.3 |
| 32 | FemaleAndFemale_studio | 21 | -30.3 | -18.2 |
| 33 | Female_studioNews | 12 | -34.9 | -19.0 |
| 34 | MaleAndMale_ivNews | 07 | -31.0 | -18.7 |
| 35 | MaleAndFemale_teleLittleStudio | 17 | -30.8 | -18.9 |
| 36 | MaleAndFemale_ShoutAndLaughter | 09 | -26.4 | -17.2 |
| 37 | Music_GainsbourgTheSOngsThatWeSing | 10 | -27.6 | -14.4 |
| 38 | FemaleAndFemale_Studio | 18 | -28.7 | -17.0 |
| 39 | Music_LiveAccordionSong | 20 | -24.7 | -14.7 |
| 40 | Studio_FewAppleause | 05 | -33.6 | -22.6 |
| 9b | female_news | 20 | -28.4 | -17.9 |

Table 3. Leq(R2LB) measurements from the pop music and speech channel, P3, at Swedish Radio.

In fig. 13 the measurements are plotted graphically.

Fig. 13. The Leq(R2LB) short file recordings from the pop music and speech channel P3 at Swedish Radio (P3 pre and post final processors, plotted against sample number).

For this channel, it is not obvious that the final dynamic processing does a good job. The spread of the Leq(R2LB) values does not differ much pre and post the final dynamic processing. It should be noted both that the music on P3 is pre-processed using the Replay gain algorithm before it enters the play out system, and that parts of the final P3 dynamic processing system work at a lower ratio. Fig. 14 shows the basic statistics of these data.
| Leq(R2LB), Short P3 (dB) | P3 PRE processors | P3 POST processors |
|---|---|---|
| Mean | -29.4 | -17.2 |
| Standard deviation | 2.9 | 2.0 |
| Standard error | 0.6 | 0.4 |
| Max distance to mean | 5.7 | 5.4 |
| Range | 10.4 | 8.7 |
| Max | -24.7 | -13.9 |
| Min | -35.0 | -22.6 |

Fig. 14. The mean, standard deviation, standard error and max distance from mean for short recordings from the pop music and speech channel P3 at Swedish Radio.

3.2.2 Results – long recordings

The previous results concerned the use of a short-term meter measuring selected sound bites. The results give us some answers to the question: are our final dynamic processors good enough? In real production, perhaps with an automatic pre-processor that decides what corrections should be made to the sound file prior to broadcast, some research has to be made on the files of complete shows. Below is the speech channel, P1, for the same day, but this time edited so that one file example is exactly one programme. This is the way a pre-broadcast measurement device would work.

3.2.2.1 P1 speech channel

Table 4 shows the Leq(R2LB) results from long file excerpts both before and after the final dynamic processors.

| Sample number | Info | Duration mm.ss | CRC Leq Pre Processor (dB) | CRC Leq Post Processor (dB) |
|---|---|---|---|---|
| 101 | Ring P1 - Phone in show | 38.56 | -26.4 | -18.7 |
| 102 | Trailer 1 | 00.18 | -28.9 | -20.1 |
| 103 | Trailer 2 | 00.31 | -27.5 | -18.7 |
| 104 | Ekot10 - Short News | 02.58 | -25.1 | -18.4 |
| 105 | Trailer 3 | 00.30 | -28.3 | -19.5 |
| 106 | Trailer 4 | 00.33 | -26.8 | -19.1 |
| 107 | Meny - Studio show, Food | 54.42 | -29.6 | -20.8 |
| 108 | Trailer 5 | 00.32 | -29.7 | -20.5 |
| 109 | Trailer 6 | 00.30 | -24.9 | -19.7 |
| 110 | Ekot11 - Short News | 02.59 | -25.3 | -18.7 |
| 111 | Presenter 1 - Christoffer Murray | 01.22 | -28.6 | -20.5 |
| 112 | Trailer 7 | 00.35 | -25.9 | -18.9 |
| 113 | Presenter 2 | 00.30 | -29.2 | -21.1 |
| 114 | Tendens - Interview show | 29.29 | -25.8 | -19.0 |
| 115 | Presenter 3 | 00.26 | -32.7 | -22.6 |
| 116 | Pre Theatre Talk studio | 01.26 | -29.4 | -20.1 |
| 117 | Theatre book reading | 22.24 | -30.2 | -20.6 |
| 118 | Presenter 4 | 00.43 | -32.5 | -22.5 |
| 119 | Noon Church Bells | 00.54 | -25.8 | -17.4 |

Table 4. Leq(R2LB) measurements from the speech channel, P1, at Swedish Radio.

In fig. 15 the measurements are plotted graphically.

Fig. 15. The Leq(R2LB) long file recordings from the speech channel P1 at Swedish Radio (P1 long pre and post final processors, plotted against sample number).

Fig. 16 shows the basic statistics of these data.

| Leq(R2LB), Long P1 (dB) | P1 Long PRE processors | P1 Long POST processors |
|---|---|---|
| Mean | -28.4 | -20.2 |
| Standard deviation | 2.5 | 1.5 |
| Standard error | 0.4 | 0.3 |
| Max distance to mean | 5.3 | 4.3 |
| Range | 10.4 | 7.1 |
| Max | -23.1 | -17.4 |
| Min | -33.5 | -24.5 |

Fig. 16. The mean, standard deviation, standard error and max distance from mean for long recordings from the speech channel P1 at Swedish Radio.

3.2.3 Comparing Replay gain with Leq(R2LB)

For pop music, Swedish Radio uses the Replay gain algorithm for measuring and correcting the loudness of imported CD tracks before storing the file in the central database. Swedish Radio uses a Replay gain software implementation called A-Wave by Fmj-Software (www.fmjsoft.com). Can this software be used on full-length pre-produced programmes? Below is a comparison between the post-processing equipment at Swedish Radio and A-Wave's Replay gain algorithm.

Table 5 shows the Leq(R2LB) results from long file excerpts, both after the final dynamic processors and for the same unprocessed files gain corrected by A-Wave.
| Sample number | Info | Duration mm.ss | CRC Leq Post Processor (dB) | CRC Leq Replay gained files (dB) |
|---|---|---|---|---|
| 101 | Ring P1 - Phone in show | 38.56 | -18.7 | -31.4 |
| 102 | Trailer 1 | 00.18 | -20.1 | -30.9 |
| 103 | Trailer 2 | 00.31 | -18.7 | -30.2 |
| 104 | Ekot10 - Short News | 02.58 | -18.4 | -30.0 |
| 105 | Trailer 3 | 00.30 | -19.5 | -30.3 |
| 106 | Trailer 4 | 00.33 | -19.1 | -29.9 |
| 107 | Meny - Studio show, Food | 54.42 | -20.8 | -30.9 |
| 108 | Trailer 5 | 00.32 | -20.5 | -30.1 |
| 109 | Trailer 6 | 00.30 | -19.7 | -29.6 |
| 110 | Ekot11 - Short News | 02.59 | -18.7 | -30.2 |
| 111 | Presenter 1 - Christoffer Murray | 01.22 | -20.5 | -28.3 |
| 112 | Trailer 7 | 00.35 | -18.9 | -29.7 |
| 113 | Presenter 2 | 00.30 | -21.1 | -28.4 |
| 114 | Tendens - Interview show | 29.29 | -19.0 | -30.3 |
| 115 | Presenter 3 | 00.26 | -22.6 | -28.5 |
| 116 | Pre Theatre Talk studio | 01.26 | -20.1 | -29.4 |
| 117 | Theatre book reading | 22.24 | -20.6 | -28.8 |
| 118 | Presenter 4 | 00.43 | -22.5 | -27.8 |
| 119 | Noon Church Bells | 00.54 | -17.4 | -29.8 |

Table 5. Measurements of the post processor versus the Replay gain algorithm. Long recordings (full programmes) from the speech channel P1 at Swedish Radio.

In fig. 17 the measurements are plotted graphically.

Fig. 17. Measurements of the post processor versus the Replay gain algorithm (P1 long Replay gain and P1 long post processors, plotted against sample number). Long recordings (full programmes) from the speech channel P1 at Swedish Radio.

| Leq(R2LB), Long P1 (dB) | P1 Long Replay Gain | P1 Long POST processors |
|---|---|---|
| Mean | -29.8 | -20.2 |
| Standard deviation | 1.3 | 1.5 |
| Standard error | 0.2 | 0.3 |
| Max distance to mean | 3.3 | 4.3 |
| Range | 5.8 | 7.1 |
| Max | -27.3 | -17.4 |
| Min | -33.1 | -24.5 |

Fig. 18. Comparison of the post processor versus the Replay gain algorithm. Long recordings (full programmes) from the speech channel P1 at Swedish Radio. All measurements made according to Leq(R2LB).

In fig. 18 it can be seen that the Replay gain algorithm, as measured by Leq(R2LB), outperforms the post processors in use today at Swedish Radio. The standard deviation is only 1.3 dB. The important factor "Max distance to mean" shows how poorly the worst case would end up in an automatic system for this type of material.

3.3 Conclusions of measurements

All the above measures are calculated with the use of Leq(R2LB), which is the proposed ITU standard BS.1770. According to Skovenborg/Nielsen's research, this type of measure is subjectively the best in terms of listener satisfaction.

The effect of introducing measurements of files prior to broadcast can be seen in fig. 16. This figure displays the values measured from complete sound files from the talk channel. The maximum distance to the channel's mean loudness is the maximum error from the listeners' point of view. If an automatic levelling system processed the full radio programme file, this figure shows how big the maximum error in dB would be for the listener. Using this measure, the maximum error would be 4.3 dB compared to a Leq(R2LB) measurement. In the measurement with shorter excerpts from the speech channel (see fig. 12) the maximum error is 4.4 dB. The Replay gain algorithm was slightly closer to the mean value, with a maximum distance of 3.3 dB, see fig. 18.

The final processors at the Swedish Radio do even out the loudness. Previous research also tells us that pre-measurement using Leq(R2LB) evens out the loudness. An implementation of equal loudness at Swedish Radio could consist of, firstly, measurement with Leq(R2LB) meters at production; secondly, pre-measurement of files using either Leq(R2LB) or Replay gain; and thirdly, the same final processing that is already in use today.

The pop music channel, P3, did not show much difference in the spread of the measured Leq(R2LB) pre or post the final processing, see fig. 14. The maximum error to mean is 5.7 dB for the pre-process and 5.4 dB for the post-process measurement.
All music channel measurements were made with short excerpts. The music channel would probably gain in equal loudness if the final processors were used more aggressively.

4 Workflows and levels at the Swedish Radio

4.1 The pre-digital era

Before digital media became available, most recordings at the Swedish Radio were made (in the very early days) on shellac records or steel tape, and later, from about 1950, on ¼ inch open reel tape. I have assessed a great number of recordings made at different times through my collaboration with Swedish Radio's archive channel, SR Minnen. The oldest recordings date back to the 1940s and the newest are from 2007. The early recordings have a much narrower bandwidth. The only treble that exists is hiss and crackle from the shellac records.

During the Second World War the recordings were mainly made on 800 recycled ¼ inch open reel tapes. The recordings that were to be kept in the archives were transferred to shellac records.

In the early nineteen-fifties, Swedish Radio began to save open reel tapes in the archive. The bandwidth and the dynamics of the recordings increased, and hence the sound quality improved. The dynamics were quite large at that time, presumably due to less overall control and the lack of dynamic processing during the recordings. There was still a place in the living room where the family spent time together and where listening was done in full concentration, close to the radio. Radio theatre shows from this time are dynamic. They are totally unusable today without remastering, gain riding or heavy compression as preparation for use in an iPod or retransmission on the digital archive channel. But the shows probably worked quite well so long as the radio was the main focus.

Soon there was mobile listening in cars and on small plastic radios. Gone was the living room with full-concentration listening. Now the radio had to compete with a growing number of loud sources.
All radio broadcasts have a maximum modulation level. If a transmitter exceeds the maximum level of modulation, it will interfere with adjacent radio frequencies. The solution for controlling transmission levels is the compressor. It was also heavily used when recording shows.

Around 1960, other channels started their transmissions (Radio Nord) and something had to be done to maintain the competitiveness of public service radio. Soon new public service radio channels were on air, and these had much less dynamics, probably because everyone looked to the U.S., where radio commercials drove the need for high listener ratings. The thesis was formulated that a louder station caught more listeners, and so the race for more compression was on.

Since the compressor began its days in radio, not much has changed in the production process from the loudness perspective.

4.1.1 Methods and workflow – pre digital era

• Recording in the field to ¼ inch open reel tape, later in time recording to DAT.
• Editing by cutting and splicing the original tape, or an open reel copy if the original recording was on DAT.
• Correcting levels manually during the copying process. An audio engineer copies the edited tape, constantly moving the fader to counteract differences in loudness. At this stage compression and equalisation may also be applied. The copy could be made either to ¼ inch open reel or, later, to DAT tape. The sound levels are controlled using a PPM EBU/Nordic bar graph, usually an RTW meter. For P1 the maximum levels touched +6 dB (0 = -18 dBFS); for P3 the maximum levels were allowed to peak at +9 dB.
• Analogue broadcasts were constantly controlled in the FM continuity (Swedish: Programkontroll). Manual level control was applied so that adjoining radio shows could be heard without the need for the listener to reach for the volume control on their radio.
This last part was preferably done with the same audio engineer controlling the flow for as many continuous hours as possible.
• The A/D conversion is done with 18 dB headroom to Full Scale (the music channel P3 is converted utilising 22 dB headroom).
• Digital final processing at the Master Control of Swedish Radio. This processing consisted of a limiter as the last precaution before handing over the signal for final transmission. Swedish Radio had full control of the settings in these processors even when they were physically placed close to the FM transmitters, operated by Televerket Radio (new name: Teracom). The control of these processors has always belonged to Swedish Radio.
• Today the processors consist of two units in series. First there is an automatic gain controller, the Cadenza, jointly developed by Swedish Radio's Torbjörn Wallentinus and the company Factum Electronics. The Cadenza was developed over a ten-year period. It began as an analogue prototype and later became a digital DSP-based unit. The second processor in the chain is the multiband processor Orban 8200. Today, this unit is used mainly as a top limiter. These processors are used on all Swedish Radio's FM channels except the classical music channel P2, which instead uses a processor from Omnia. The web channels use only the Cadenza for levelling.

The sound level measuring device at the Swedish Radio has been a PPM meter using the EBU/Nordic scale.

4.2 Digital era

Here, we define the digital era as the era beginning with non-linear editing with computer software. The workflows below are presented graphically in section 10.

4.2.1 Methods and workflow – pre-produced in the digital era

• The journalist uses a compact flash or hard disk recorder. The recordings are made either as uncompressed PCM 16-bit 48 kHz stereo files or directly encoded into MPEG-1 Layer II at 384 kbit/s.
• At the radio station, the recorded sound files are copied into the sound managing system DiGAS (by DaVID Gmbh in Munich).
If originally recorded in PCM, they are at this point converted to Layer II 384 kbit/s.
• Editing is done mainly in DaVID's Multitrack editor, where plug-ins for dynamics and equalization are applied. This editing is often done on an ordinary office computer without functional loudness meters.
• The final mix is often made together with an audio engineer using a PPM EBU/Nordic meter. Sometimes the final mix is done by the journalist using a PPM EBU/Nordic meter in a small pre-production studio.
• The final mix is then placed in the play list by an engineer, who also makes sure that the levels are correct by using his or her ears.
• The broadcast server automatically transmits the packaged radio.
• The final processors are the automatic gain controller Cadenza by Factum Electronics followed by a multiband 8200 processor by Orban.
• The signal is fed to the transmitter operator, Teracom, via digital J.57 linear PCM circuits. In principle, no gain shift should occur through this chain.
• The signal is also fed through a break-out box to web feeds, reference recordings etc.

4.2.2 Methods and workflow – live radio in the digital era

• The sound engineer mixes the live show with the help of a Nordic/EBU PPM meter.
• Sometimes the pre-recorded sound segments are evaluated prior to broadcast utilising a Nordic/EBU PPM meter.
• The final mix is transmitted to the central apparatus room.
• The final processors are the automatic gain controller Cadenza by Factum Electronics followed by a multiband 8200 processor by Orban. The latter is used mostly as a top limiter.
• The signal is fed to the transmitter operator, Teracom, via digital J.57 linear PCM circuits.
• The signal is also fed through a break-out box to web feeds, reference recordings etc.

4.3 The future – method and workflows

4.3.1 Production monitor sound levels?

As can be seen in fig. 3, the ear does not have a linear frequency response at differing sound levels. How loud should the production environment be?
The sound production facility should be listening at the same intensity as the typical listener. The typical listening level for TV in actual homes is 60 dBA [Benjamin, 2004]. With the new patterns of broadcast consumption this single level is not enough. The listeners use both headphones and speakers, and each listening device has its own preferred sound level, see fig. 19.

Fig. 19. Preferred listening levels for different groups of employees at the Danish Radio & TV: (1) Administration (non-engineer), (2) Journalists (non-engineer), (3) Classical music engineers, (4) Pop/Rock music engineers, (5) Noise engineer [Brixen (2001)].

Many believe that there is never one correct volume to listen at during production. One must listen at all levels, on both headphones and loudspeakers. If there is a passage where music or sound effects are layered with the speech, this part should be test listened in at least two different configurations.

4.3.2 How to implement fully controlled loudness levels in an automatic broadcast

Swedish Radio uses the sound production system DiGAS by DaVID Gmbh [http://www.david-gmbh.de]. This system contains all parts of modern radio production:

• DBM is the database manager. It contains all sound files and all their metadata. These metadata contain, among other things, the play out volume.
• Multitrack is the sound editor.
• BCS is the play out backbone database system. It contains all data regarding the delivery of the files.
• Digairange is the front end of the BCS. It is used in packaging the pre-produced sound files. This tool is also used for recording speech between adjacent sound files (voice tracks).
• Digaroc is the play out part of the BCS. It handles play out of both pre-produced shows and items in a live show.

Automatic loudness control can be used in Swedish Radio's workflow in two ways:

1. As an automatic loudness leveller early in the production flow.
2. As an automatic loudness leveller processing complete sound files or their metadata prior to play out.
4.3.3 Manually calculating parts of a show

The journalist records in the field without absolute control of the sound levels. The journalist's focus should be on the interviewed person. The two level problems that matter for the journalist are digital overload and recording at too low a level.

A topic of discussion relating to loudness is that of True Peak problems. This area of audio technology studies how an almost full-scale digital signal can distort [Lund & Nielsen, 2004]. As a precaution, the sound level for MPEG-1 Layer II 384 kbit/s is recommended never to exceed -3 dBFS on the digital peak meter on the recording device.

When the journalist returns from the field, all the recorded sounds are transferred into the DiGAS database. The show item is edited in the Multitracker or another appropriate sound editor. During the editing process the show item is levelled with a short-term meter. Later a button can be clicked to calculate the Leq(R2LB) of the complete show item. The usage would be: save the file (bounce) from the Multitracker, click on the loudness button, and the appropriate sound level correction is noted in the file's metadata. When this file is later planned for play out inside a show, the sound level correction is automatically transferred to the planner Digairange, to the play out database BCS and to the delivery system Digaroc.

This system could e-mail its correction levels back to the journalist. This would in turn allow the journalist to improve her level usage in future productions. It would also warn the journalist and let the journalist manually override the automatic level settings.

4.3.4 Full sound file levelling

The levels can also be set automatically as the files are imported into the planner Digairange. When a show item is planned, computer software could easily be programmed to measure the sound file's Leq(R2LB) and then set the play out level according to the measurement's correction gain value.
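The metadata-based correction described in sections 4.3.3 and 4.3.4 can be sketched in a few lines. This is only an illustration under stated assumptions: `ShowItem`, `store_playout_correction`, the metadata key `playout_gain_db`, the -26.0 dB target and the 12 dB safety limit are all hypothetical and are not part of DiGAS or any DaVID product. The measured levels are taken from the pre-processor column of table 4.

```python
from dataclasses import dataclass, field


@dataclass
class ShowItem:
    """Hypothetical stand-in for a sound file entry in the database."""
    title: str
    leq_r2lb_db: float  # Leq(R2LB) measured over the complete file
    metadata: dict = field(default_factory=dict)


def store_playout_correction(item, target_db, max_gain_db=12.0):
    """Write the gain that brings the item to target_db into its metadata.

    The audio itself is not touched; only the play out gain is recorded.
    The correction is limited to +/- max_gain_db as a safety margin
    (an assumed value, not a figure from the thesis).
    """
    gain = target_db - item.leq_r2lb_db
    gain = max(-max_gain_db, min(max_gain_db, gain))
    item.metadata["playout_gain_db"] = round(gain, 1)
    return item.metadata["playout_gain_db"]


TARGET_DB = -26.0  # assumed channel target, for illustration only

trailer = ShowItem("Trailer 1", -28.9)
news = ShowItem("Ekot10 - Short News", -25.1)
for item in (trailer, news):
    store_playout_correction(item, TARGET_DB)
```

Storing the gain as metadata rather than rewriting the file keeps the original recording intact and lets the journalist override the value, as suggested above.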
5 Real life meter usage

I have been using the BBC Leq(R2LB) meter together with a mixing engineer in production for over a year. How well does it perform in real life production? Below are some comments on situations where the Leq(R2LB) measure does not function as expected.

The meter is set up in loudness mode. It is calibrated so that normal speech keeps the needle close to 4. Peaks are controlled with a security limiter after the metering. We biased our readings to make sure the same loudness was achieved, see table 6.

| Situation | Actual reading to obtain equivalent loudness |
|---|---|
| Normal speed speech | 4 |
| Fast speech | 3 |
| Close voices (proximity effect) | 3 |
| Telephone hybrid | 5 |
| Old recordings (from the 1940s) | 4 |
| Modern music without speech | 3 |

Table 6. Biasing meter readings to accomplish equal loudness over different programme material.

6 Discussion concerning a new meter

The description above shows how the Leq(R2LB) meter functions in real life. The meter can be easily fooled. A user must judge the meter readings. Could we possibly build a meter that did this for us? It is outside the scope of this report, but I believe a meter could be built that works better for non-technical staff or an automatic levelling system.

First of all, the meter should classify the material. Is it music, speech or telephone? Is someone too close to the microphone? Is it broadband sound from a recent recording or is it an old narrowband recording? Is it a quick talker? Is it a dynamic talker? Is the measured signal pre-limited? After classification, each of the above-mentioned types of sound can be biased before measurement so that the meter becomes a help in loudness measurement. This meter is hereafter called the category meter.

The idea of a category meter has some resemblance to measuring with the help of critical bands, but in a less universal implementation. In turn, with less universality the meter might improve its exactness.
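The biasing step of such a category meter can be sketched with the readings from table 6. The classification itself, which is the hard part, is not attempted here: the category is simply passed in by the caller. The names `EQUAL_LOUDNESS_READING` and `category_reading` are hypothetical, and treating the bias as a simple additive offset on the meter scale is an assumption.

```python
NORMAL_READING = 4  # the meter is calibrated so normal speech reads close to 4

# Readings at which each type of material sounded equally loud (table 6).
EQUAL_LOUDNESS_READING = {
    "normal speech": 4,
    "fast speech": 3,
    "close voice": 3,
    "telephone hybrid": 5,
    "old recording": 4,
    "modern music": 3,
}


def category_reading(raw_reading, category):
    """Bias a raw meter reading so equal loudness always reads NORMAL_READING.

    Assumption: the bias is an additive offset on the meter scale.
    """
    offset = NORMAL_READING - EQUAL_LOUDNESS_READING[category]
    return raw_reading + offset


# A telephone hybrid mixed to a raw reading of 5 shows as 4 on the
# category meter, i.e. equally loud as normal speech at 4.
telephone = category_reading(5, "telephone hybrid")
fast = category_reading(3, "fast speech")
```

With this bias in place the operator only has to keep the needle at one mark, whatever the material.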
The new category meter could probably be a meter with a needle that should constantly stay within a well-defined range. The implementation could also be used to bias the reading if, for example, the broadcaster chooses to suppress the music volume.

7 Other comments closely related to loudness in broadcasting

7.1 Balance speech/music

During the spring of 2002 I worked as an audio engineer at Swedish Radio's young music and speech channel, P3. For morning and day shows we usually mixed speech 3-6 dB above the highly limited CD music. But in the evening there is a show containing live music, P3 LIVE. We all understood that the listeners to this show sit at home with the volume turned up, and that the short speech segments would blow their ears away if played out at the usual level. We usually fixed this by mixing the speech comparatively much softer in these late live shows.

7.2 Classical music

Classical music cannot be measured and automatically processed in the same manner as speech or pop music. Classical music is dynamic, and the listeners often want this wide use of dynamics to be kept intact the whole way to the receiver. It can be argued whether this will remain true in the future, with a new generation of listeners and their new listening devices. Can an iPod user on the bus in the morning rush hour really hear the quiet passages in classical music without turning up the volume, and at the same time turning up the peaks in the music? Maybe the iPods of tomorrow will include a compressor for these types of dynamics problems.

7.3 Different meter usage between channels

P1's audio engineers let the speech peaks after compression touch the +6 dB mark on the PPM EBU/Nordic meter. P3's engineers let these peaks touch +9 dB instead. At the Swedish Radio's central apparatus room, P3 is turned down 4 dB relative to P1. This discrepancy was introduced at P3 during the seventies, when mixing desks with 22 dB headroom were installed in place of the old desks with 18 dB headroom.
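The arithmetic behind these channel conventions is simple but easy to get wrong in practice. The sketch below uses the reference levels and peak marks given in this report; the function names `ppm_to_dbfs` and `reference_offset` are hypothetical, and a plain additive mapping from PPM reading to dBFS is assumed (meter ballistics are ignored).

```python
# Digital level of PPM EBU/Nordic 0 dB (TEST level) per channel, in dBFS.
REFERENCE_DBFS = {"P1": -18.0, "P3": -22.0}

# PPM mark that speech peaks are allowed to touch on each channel.
SPEECH_PEAK_PPM = {"P1": 6.0, "P3": 9.0}


def ppm_to_dbfs(channel, ppm_db):
    """Convert a PPM reading to dBFS using the channel's reference level."""
    return REFERENCE_DBFS[channel] + ppm_db


def reference_offset(source, destination):
    """Level change in dB when source material is played as destination."""
    return REFERENCE_DBFS[source] - REFERENCE_DBFS[destination]


p1_peak = ppm_to_dbfs("P1", SPEECH_PEAK_PPM["P1"])  # -12.0 dBFS
p3_peak = ppm_to_dbfs("P3", SPEECH_PEAK_PPM["P3"])  # -13.0 dBFS
p3_in_p1 = reference_offset("P3", "P1")             # -4.0 dB
```

Under these assumptions the two channels' speech peaks end up within 1 dB of each other in the digital domain, while a P3 recording replayed through a chain set up for P1 comes out 4 dB low.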
This difference between the channels works fine as long as it is a simple FM transmission. The problems arise later in the production. The unprocessed P1 and P3 signals are recorded in the central apparatus room for later reprise use. If this signal is fed back a second time through the broadcast chain, the P3 signal is lowered by another 4 dB, a total of 8 dB.

A different problematic situation arises when different channels use each other's material live. For example, the archive channel SR Minnen is set up like P1, with a reference level of -18 dB equivalent to a test tone in the studio at TEST level (PPM EBU/Nordic 0 dB). If the main play out computer breaks, the fallback transmission is the classical music channel P2. P2 too uses a different reference level, -15 dB equivalent to a test tone in the studio at TEST level (PPM EBU/Nordic 0 dB). When the broadcast is redirected to use P2 instead of SR Minnen, some listener almost always phones in reacting to this massive volume increase (apart from complaining about the missing main programme).

7.4 Different dynamics for different usages

Today, material from Swedish Radio is transmitted to many listeners in many different ways. Table 7 gives examples of the different forms of distribution and the listening devices most commonly assumed for each.

| Distribution form | Assumed listening device |
|---|---|
| FM transmission | Car stereo; small speaker radio; living room stereo system with large speakers and subwoofer |
| AM transmission | Advanced small speaker radio |
| DAB transmission | Car stereo; small speaker radio; mobile telephone's hands free and loudspeaker; living room stereo system with large speakers and subwoofer |
| Web streaming transmission | (Cheap) computer speakers; soon mobile telephone's hands free and loudspeaker |
| Web On Demand transmission | (Cheap) computer speakers |
| Podcast | Earphones |

Table 7. Different distribution forms and different listening devices.

These different listening situations demand differing dynamic ranges.
Dynamic range is the distance between the softest and loudest sound in a recording, usually measured between the quietest and loudest speech levels. Thomas Lund at TC Electronic has studied varied broadcast consumption. In fig. 20 he shows how the differing consumption patterns need different dynamic ranges.

Fig. 20. Dynamic Range Tolerance for consumers under different listening conditions [Lund, 2006]

Media is today consumed at almost any location. Different locations have varying background disturbances, see table 8.

Environment | SPL A weighted | SPL C weighted
--- | --- | ---
Living Room, Suburban | 45 dB |
Living Room, Urban | 55 dB | 70 dB
Inside Car | 65 dB | 85 dB
Inside Jet | 75 dB | 90 dB
Walk in Traffic | 80 dB | 92 dB
Subway | 90 dB | 100 dB

Table 8. Typical surrounding noise levels measured by Lund. All environments are realistic for broadcast consumption today.

8 Acknowledgment

I would like to thank the following people for helping me with this paper:

• My boss at Swedish Radio, Mikael Cohen, for discussions and help with sound levels. He taught me the importance of even sound levels, mainly in documentary production.
• My professor, Sten Ternström, at Kungliga Tekniska Högskolan in Stockholm. He has opened the world of research for me, and helped with my English. He is also really good at deciding deadlines, and forgiving when I missed them. Without these this paper would never have been completed.
• My supervisor at Swedish Radio, Lars Jonsson, for all his help with my texts. I would also like to thank him for all the valuable contacts both inside Swedish Radio and outside in the world of research.
• Pelle Holmquist in the central apparatus room, for explaining the final dynamic processes and also letting me record pre- and post-process signals.
• Fredrik Nilsson, a sound engineer who has helped me evaluate the BBC loudness meter.
• Björn Melander, my tutor in the studio. He produces relaxation music with relaxing speech.
He taught me the neurotic habit of placing a million level breakpoints in every production to ensure even loudness, which is very important for relaxation (for the listener, not the engineer).
• Gösta Konnebäck, the grand old man of radio. He opened my eyes to the world of radio in my early teens.
• My wife Naomi and my two kids Hannah and Maya, for helping me test loudness. Every morning and evening I slowly turned down the volume of the TV set, noting where they started complaining. At the same time I measured the dynamic content of the cartoons. Hannah and Maya also helped me with the yellow marking pen while I was reading all the previous research.

9 Glossary

AES/EBU – A standard for digital sound, covering both the data format on the cable and the cable's connectors.
DAT – Digital audio tape. A digital tape system. Not used in production anymore.
DiGAS – A broadcasting system consisting of a sound database, recorder, editor, multitrack mixer and play-out system.
Master Control at Swedish Radio – The room at Swedish Radio where all the nationwide signals are collected before transmission to Teracom, the FM distributor. The final processing units are located in this room.
Metadata – Extra data associated with a sound file, such as production number, title, length and, most importantly for loudness purposes, the play-out volume correction. Writing this figure in metadata instead of recalculating the file saves processing power and leaves the sound file untouched.
Nordic PPM – The meter used at Swedish Radio at the time of this writing.
Teracom – The FM distributor used by Swedish Radio. Swedish Radio feeds the master signal to them for further distribution.

10 Production Flowchart for Swedish Radio

11 References

Akustik Technologie Göttingen (2004) Webpage http://www.akutech.de/mainpage/psychoa.htm, mentioned in Skovenborg and Nielsen, “Evaluation of different Loudness Models”, AES convention paper 2004.
Backus, John (1977) “The Acoustical Foundations of Music”, 2nd Ed, W W Norton, New York.
Benjamin E (2004) “Preferred Listening Levels and Acceptance Windows for Dialog Reproduction in the Domestic Environment”. 117th AES convention, San Francisco. Preprint 6223.
Bonello (2007) “Multiband Audio Processing and Its Influence on the Coverage Area of FM Stereo Transmission”, JAES March 2007.
Brixen (2001) “Report on Listening Level in Headphones”. Document KKDK-06801-ebb-1 for the Danish Radio, Copenhagen.
Brixen (2001:2) “Audio Metering”, Broadcast Publishing & DK Audio A/S, Denmark.
Hyperphysics (loud) Internet; http://hyperphysics.phy-astr.gsu.edu/hbase/sound/loud.html
Hyperphysics (place) Internet; http://hyperphysics.phy-astr.gsu.edu/hbase/sound/place.html
Hyperphysics (eqloud) Internet; http://hyperphysics.phy-astr.gsu.edu/hbase/sound/eqloud.html
ITU-R (2006) Proposition for BS.1770, from Srg-List at yahoogroups.com 2006-04-26.
Lund, Thomas & Nielsen, Søren H. (2003) “Overload in Signal Conversion”. Presented at 23rd AES International conference, Copenhagen, Denmark, 2003-05-23.
Lund, Thomas (2006) NAB BEC Proceedings.
Lund, Thomas (2007) Presentation at AES 2007-05-05, Vienna. Privately mailed keynote.
Moore (1982) “An introduction to the Psychology of Hearing”.
ReplayGain (2001) http://replaygain.hydrogenaudio.org/
Sengpielaudio Internet; www.sengpielaudio.com/calculatorSonephon.htm
Skovenborg & Nielsen (2004) “Evaluation of different Loudness Models”, AES convention paper.
Skovenborg (2007) Private mail to the author of this text 2007-04-24.
Ternström, Sten (2002) “Ljud”, (2. ed.), KTH.
Todd, Craig (2007) In e-mail to the mailing list Srg3-List at yahoogroups.com 2007-03-23.
Zwicker E. & Fastl H. (1999) “Psychoacoustics: Facts and Models” (2. ed.), Springer Series in Information Sciences, 22, Berlin: Springer-Verlag.
Zwicker E. & Flottorp G. & Stevens S. S. (1957) “Critical bandwidth in loudness summation.” J. Acoustical Society of America, 29(548).