The Audio Expert
Everything You Need to Know About Audio
Ethan Winer
Focal Press is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
The Boulevard, Langford Lane, Kidlington, Oxford, OX5 1GB, UK
© 2012 Elsevier Inc. All rights reserved.
All figures and photos not explicitly credited are copyright © 2012 by Ethan Winer or RealTraps, LLC. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements
with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website:
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or
medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein.
In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of
products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
Winer, Ethan.
The audio expert : everything you need to know about audio / Ethan Winer.
p. cm.
ISBN 978-0-240-82100-9
1. Sound—Recording and reproducing. 2. Music—Acoustics and physics. I. Title.
TK7881.4.W56 2013
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
For information on all Focal Press publications visit our website at
Printed in the United States of America
For my Wife, Elli Mastrangelo Winer, who endured many late dinners and missed pinball games during the ten months it took me to write
this book.
It’s important to thank the people who were most influential over the years in teaching me “how technology works.” First was Marty Yolles,
who, when I was about 12, spent many Saturday afternoons helping me overhaul an old lawnmower engine I would later use for a home-made
go-cart. Then later in my teens and early 20s, Cliff Mills, Alan Podell, Marvin Fleishman, and others graciously gave their valuable time to
answer my endless beginner questions about audio and radio electronics. Later, Leo Taylor and I designed and built several analog synthesizers
and other audio devices, with Leo doing most of the designing and me doing all of the learning. Leo worked as an engineer at Hewlett-Packard,
so I got to play with every type of audio test gear imaginable as we created and refined our designs. Finally, my thanks especially to Bill Eppler,
who for more than 30 years has been my friend, mentor, and advisor on all technical matters.
I mention these influences because, aside from lawnmower engines and electronics, these people also taught me what I call the “ham radio
ethic” of sharing information freely and with endless patience. When I grew up in the 1950s and 1960s, ham radio was popular, and many
towns had a local ham radio club where you could learn about electronics for free in a friendly and informal atmosphere. This ethic of sharing
has been a driving force throughout my entire life. It was why I wrote my first book in 1991—a manual for programmers to explain the
unpublished internal workings of Microsoft’s DOS BASIC compilers—even though at the time I owned a software company that sold products
based on the same information! My company was unique because we included all of the assembly language source code so people would not
only buy a packaged solution but could also learn how the programs they bought worked.
It always bothers me when people are secretive or protective of their knowledge, as if sharing what they know means they’ll no longer be
as valued or even needed. This is why today I continue—through my posts in audio forums and articles on my personal website—to help people
build acoustic treatment, even though my current business manufactures and sells acoustic treatment products.
Many people contributed and helped me over the ten months it took to write this book. I’d like to thank Catharine Steers of Focal Press for
recognizing the value of this book and helping to make it a reality. Carlin Reagan, also of Focal Press, guided me throughout the process. Expert
audio engineer and journalist Mike Rivers did the technical review, offering many incredibly helpful comments and suggestions. My friend,
cellist, and electronics engineer, Andy Woodruff, provided formulas for the decibels.xls spreadsheet, and also performed the cello demonstration
video. My friend, black belt electronics engineer Bill Eppler, contributed his considerable wisdom and helped with fact checking, as did
microphone expert Bruce Bartlett, video expert Mark Weiss, acoustics expert Wes Lachot, and loudspeaker maven Floyd Toole.
I must also thank Collin Wade for the saxophone demo; David Gale for his violin expertise; Steve Isaacson and Terry Flynn, who
demonstrated the piano; expert arranger Arnie Gross for years of musical tutoring; luthier Bob Spear for his string instrument expertise; and
composer Ed Dzubak, who contributed several pieces for the plug-in effects demos. I offer my special thanks to John Roberts, a professional
electronics engineer and friend since we worked together in the 1970s, for his many valuable suggestions.
About the Author
Ethan Winer is a reformed rock ’n’ roll guitar and bass player who sold his successful software business in 1992 at the
age of 43 to take up the cello. Ethan has, at various times, earned a living as a recording engineer, studio musician, computer programmer,
circuit designer, composer/arranger, technical writer, and college instructor. In addition to a best-selling book about computer programming,
more than 100 of his feature articles have been published in audio and computer magazines, including Mix, PC Magazine, Electronic Musician,
EQ Magazine, Audio Media, Sound on Sound, Computer Language, Microsoft Systems Journal, IBM Exchange, Strings, Keyboard, Programmers
Journal, Skeptic, The Strad, Pro Sound News, and Recording. Ethan is also famous (some might say infamous) for his no-nonsense posts about
audio science in online audio forums.
Besides his interest in audio and computers, Ethan produced two popular Master Class videos featuring renowned cellist Bernard
Greenhouse and five CDs for Music Minus One, including a recording of his own cello concerto. His Cello Rondo video has received more than a
million views on YouTube and other websites. Besides writing, playing, and recording pop tunes, Ethan has composed three pieces for full
orchestra, all of which have been performed publicly. He has also played in several amateur orchestras, often as principal cellist.
Ethan lives in New Milford, Connecticut, with his wife Elli and cat Noah. In 2002 he started the company RealTraps to manufacture bass
traps and other acoustic treatment, which he continues to this day. When he’s not watching reruns of The Simpsons or writing music, Ethan
enjoys cooking and playing his collection of vintage pinball machines.
Hundreds of books about audio have been published over the years, so you might well ask why we need another book on the topic. I’ll start
with what this book is not. This book will not explain the features of some currently popular audio software or describe how to get the most out
of a specific model hard disk recorder. It will not tell you which home theater receiver or loudspeakers to buy or how to download songs to
your MP3 player. This book assumes you already know the difference between a woofer and a tweeter, and line versus loudspeaker signal levels.
But it doesn’t go as deep as semiconductor physics or coding digital signal processing algorithms in assembly language. Those are highly
specialized topics that are of little use to most recording engineers and audio enthusiasts. However, this book is definitely not for beginners!
The intended audience is intermediate- to advanced-level recording engineers—both practicing and aspiring—as well as audiophiles, home
theater owners, and people who sell and install audio equipment. This book will teach you advanced audio concepts in a way that can be
applied to all past, current, and future technology. In short, this book explains how audio really “works.” It not only tells you what, but why. It
delves into some of the deepest aspects of audio theory using plain English and mechanical analogies, with minimal math. It explains signal
flow, digital audio theory, room acoustics, product testing methods, recording and mixing techniques, musical instruments, electronic
components and circuits, and much more. Therefore, this book is for everyone who wants to truly understand audio but prefers practical rather
than theoretical explanations. Using short chapter sections that are easy to digest, every subject is described in depth using the clearest language
possible, without jargon. All that’s required of you is a genuine interest and the desire to learn.
Equally important are dispelling the many myths that are so prevalent in audio and explaining what really matters and what doesn’t about
audio fidelity. Even professional recording engineers, who should know better, sometimes fall prey to illogical beliefs that defy what science
knows about audio. Most aspects of audio have been understood fully for 50 years or more, with only a little added in recent years. Yet people
still argue about the value of cables made from silver or oxygen-free copper or believe that ultra-high-digital sample rates are necessary even
though nobody can hear or be influenced by ultrasonic frequencies.
In this Internet age, anyone can run a blog site or post in web forums and claim to be an “expert.” Audio magazines print endless interviews
with well-intentioned but clueless pop stars who lecture on aspects of audio and recording they don’t understand. The amount of public
misinformation about audio science is truly staggering. This book, therefore, includes healthy doses of skepticism and consumerism, which, to
me, are intimately related. There’s a lot of magic and pseudo-science associated with audio products, and often price has surprisingly little to do
with quality.
Hopefully you’ll find this book much more valuable than an “audio cookbook” or buyer’s guide because it gives you the knowledge to
separate fact from fiction and teaches you how to discern real science from marketing hype. Once you truly understand how audio works, you’ll
be able to recognize the latest fads and sales pitches for what they are. So while I won’t tell you what model power amplifier to buy, I explain
in great detail what defines a good amplifier so you can choose a first-rate model wisely without overpaying.
Finally, this book includes audio and video examples for many of the explanations offered in the text. If you’ve never used professional
recording software, you’ll get to see compressors and equalizers and other common audio processing devices in action and hear what they do.
When the text describes mechanical and electrical resonance, you can play the demo video to better appreciate why resonance is such an
important concept in audio. Although I’ll use software I’m familiar with for my examples, the basic concepts and principles apply to all audio
software and hardware. Several examples of pop tunes and other music are mentioned throughout this book, and most can be found easily by
searching YouTube if you’re not familiar with a piece.
As with every creative project, we always find something to improve or add after it’s been put to bed. So be sure to look for an addendum
on the same website listed below that hosts the audio and video example files that accompany this book. You’re also welcome to browse my
own website for additional information.
Bonus Web Content
This book provides a large amount of bonus material online. In addition to three extra chapters, several audio and video demos are available to
enhance the explanations in the printed text. Spreadsheets and other software are provided to perform common audio-related calculations. All of
this additional content is on the book’s website. In addition, the book’s website contains links to the
external YouTube videos, related articles and technical papers, and the software mentioned in the text, so you don’t have to type them all
Table of Contents
Cover Image
About the Author
PART 1. Audio Defined
Chapter 1. Audio Basics
Volume and Decibels
Standard Signal Levels
Signal Levels and Metering
Calculating Decibels
Graphing Audio
Standard Octave and Third-Octave Bands
Phase Shift and Time Delay
Comb Filtering
Fourier and the Fast Fourier Transform
Sine Waves, Square Waves, and Pink Noise—Oh My!
Audio Terminology
The Null Test
Chapter e1. MIDI Basics
MIDI Internal Details
MIDI Hardware
MIDI Channels and Data
MIDI Data Transmission
General MIDI
Standard MIDI Files
MIDI Clock Resolution
MIDI Minutiae
Playing and Editing MIDI
Chapter 2. Audio Fidelity, Measurements, and Myths
High Fidelity Defined
The Four Parameters
Lies, Damn Lies, and Audio Gear Specs
Test Equipment
Audio Transparency
Common Audio Myths
The Stacking Myth
The Big Picture
Chapter e2. Computers
Divide and Conquer
Back Up Your Data
Optimizing Performance
Practice Safe Computing
If It Ain’t Broke, Don’t Fix It
Avoid Email Viruses
Bits ’n’ Bytes
Computer Programming
Website Programming
Image Files
Custom Error Page with Redirect
Embedded Audio and Video
Chapter 3. Hearing, Perception, and Artifact Audibility
Fletcher-Munson and the Masking Effect
Distortion and Noise
Audibility Testing
Dither and Truncation Distortion
Hearing Below the Noise Floor
Frequency Response Changes
Ultrasonics
Phase Shift
Absolute Polarity
Ears Are Not Linear!
Blind Testing
Psychoacoustic Effects
Placebo Effect and Expectation Bias
When Subjectivists Are (Almost) Correct
Chapter e3. Video Production
Video Production Basics
Live Concert Example
Color Correction
Synchronizing Video to Audio
Panning and Zooming
Video Transitions
Key Frames
Orchestra Example
Cello Rondo and Tele-Vision Examples
Time-Lapse Video
Media File Formats
Chapter 4. Gozintas and Gozoutas
Audio Signals
Audio Wiring
Audio Connectors
Patch Panels
PART 2. Analog and Digital Recording, Processing, and Methods
Chapter 5. Mixers, Buses, Routing, and Summing
Solo, Mute, and Channel Routing
Buses and Routing
Console Automation
Other Console Features
Digital Audio Workstation Software and Mixing
The Pan Law
Connecting a Digital Audio Workstation to a Mixer
Inputs and Outputs
Setting Record Levels
Monitoring with Effects
The Windows Mixer
Related Digital Audio Workstation Advice
5.1 Surround Sound Basics
Gain Staging
Microphone Preamplifiers
Preamp Input Impedance
Preamp Noise
Clean and Flat Is Where It’s At
Chapter 6. Recording Devices and Methods
Recording Hardware
Analog Tape Recording
Tape Bias
Tape Pre-Emphasis and De-Emphasis
Tape Noise Reduction
Tape Pre-Distortion
The Failings of Analog Tape
Digital Recording
In the Box Versus Out of the Box
Record Levels
Recording Methods
Specific Advice on Digital Audio Workstations
Copy Protection
Microphone Types and Methods
Micing Techniques
The 3-to-1 Rule
Microphone Placement
DI=Direct Injection
Additional Recording Considerations
Advanced Recording Techniques
Chapter 7. Mixing Devices and Methods
Volume Automation
Basic Music Mixing Strategies
Be Organized
Monitor Volume
Reference Mixes
Getting the Bass Right
Avoid Too Much Reverb
Verify Your Mixes
Thin Your Tracks
Distance and Depth
Bus versus Insert
Pre and Post, Mute and Solo
Room Tone
Perception Is Fleeting
Be Creative!
In the Box Versus Out of the Box—Yes, Again
Using Digital Audio Workstation Software
Slip-Editing and Cross-Fading
Track Lanes
Editing and Comping
Rendering the Mix
Who’s on First?
Time Alignment
Editing Music
Editing Narration
Backward Audio
Save Your Butt
Chapter 8. Digital Audio Basics
Sampling Theory
Sample Rate and Bit Depth
The Reconstruction Filter
Bit Depth
Pulse-Code Modulation versus Direct Stream Digital
Digital Notation
Sample Rate and Bit Depth Conversion
Dither and Jitter
External Clocks
Digital Converter Internals
Digital Signal Processing
Floating Point Math
Digital Audio Quality
Chapter 9. Dynamics Processors
Compressors and Limiters
Using a Compressor
Common Pitfalls
Multiband Compressors
Noise Gates and Expanders
Noise Gate Tricks
But …
Dynamics Processor Special Techniques
Other Dynamics Processors
Compressor Internals
Time Constants
Chapter 10. Frequency Processors
Equalizer Types
All Equalizers (Should) Sound the Same
Digital Equalizers
EQ Techniques
Boosting versus Cutting
Common EQ Frequencies
Mixes That Sound Great Loud
Complementary EQ
Extreme EQ
Linear Phase Equalizers
Equalizer Internals
Other Frequency Processors
Chapter 11. Time Domain Processors
Phasers and Flangers
Chapter 12. Pitch and Time Manipulation Processors
Pitch Shifting Basics
Auto-Tune and Melodyne
Acidized Wave Files
Chapter 13. Other Audio Processors
Tape-Sims and Amp-Sims
Other Distortion Effects
Software Noise Reduction
Other Processors
Vocal Removal
Chapter 14. Synthesizers
Analog versus Digital Synthesizers
Additive versus Subtractive Synthesis
Voltage Control
Sound Generators
MIDI Keyboards
Beyond Presets
Alternate Controllers
Software Synthesizers and Samplers
Sample Libraries
Creating Sample Libraries
Key and Velocity Switching
Sampler Bank Architecture
FM Synthesis
Physical Modeling
Granular Synthesis
Algorithmic Composition
Notation Software
PART 3. Transducers
Chapter 15. Microphones and Pickups
Microphone Types
Dynamic Microphones
Dynamic Directional Patterns
Ribbon Microphones
Condenser Microphones
Condenser Directional Patterns
Other Microphone Types
Phantom Power
Microphone Specs
Measuring Microphone Response
Microphone Modeling
Guitar Pickups and Vibrating Strings
Chapter 16. Loudspeakers and Earphones
Loudspeaker Basics
Loudspeaker Driver Types
Loudspeaker Enclosure Types
Enclosure Refinements
Active versus Passive Speakers
Room Acoustics Considerations
Loudspeaker Impedance
Loudspeaker Isolation
Loudspeaker Polarity
Loudspeaker Specs
Accurate or Pleasing?
PART 4. Room Acoustics, Treatment, and Monitoring
Chapter 17. Acoustic Basics
Room Orientation and Speaker Placement
Reflection Points
Calculating Reflection Points
Angling the Walls and Ceiling
Low Frequency Problems
Reverb Decay Time
Stereo Monitoring
Surround Monitoring
Chapter 18. Room Shapes, Modes, and Isolation
Modal Distribution
Room Ratios
Modes, Nodes, and Standing Waves
ModeCalc Program
Room Anomalies
Odd Room Layouts
One Room versus Two Rooms
Vocal Booths
Surface Reflectivity
Calculating Reflections
Isolation and Noise Control
Air Leaks
Room within a Room
Chapter 19. Acoustic Treatment
Acoustic Treatment Overview
Buy or Build?
Flutter Echo
Absorb or Diffuse?
Rigid Fiberglass
Absorption Specs
Material Thickness and Density
Acoustic Fabric
Wave Velocity, Pressure, and Air Gaps
Bass Traps
DIY Bass Traps
Free Bass Traps!
Treating Listening Rooms and Home Theaters
Bass in the Place
Front Wall Absorption
Treating Live Recording Rooms
Hard Floor, Soft Ceiling
Variable Acoustics
Treating Odd Shaped Rooms
Treating Large Venues
Room Equalization
Chapter 20. Room Measuring
Why We Measure
How We Measure
Room Measuring Software
Configuring Room EQ Wizard
Using Room EQ Wizard
Interpreting the Data
Waterfall Plots
RT60 Reverb Time
Energy Time Curve
Using the Real Time Analyzer
Measuring Microphones
Microphones Comparison
The Results
Calibrating Loudspeakers
PART 5. Electronics and Computers
Chapter 21. Basic Electronics in 60 Minutes
Volts, Amps, Watts, and Ohms
Electronic Components
Capacitor Upgrades
Power Ratings
Parasitic Elements
Active Solid-State Devices
Amplifier Damping
Negative Feedback
Power Supplies
Passive Filters
Active Filters
Digital Logic
Practical Electronics
Splitters and Pads
Phone Patch
Chapter 22. Test Procedures
Frequency Response
Harmonic Distortion
IM Distortion
Null Tests
Disproving Common Beliefs
PART 6. Musical Instruments
Chapter 23. Musical Instruments
Instrument Types
Sympathetic Resonance
The Harmonic Series Is Out of Tune
Equal Temperament
“Wood Box” Instruments
Bowed Instruments
The Bow
The Stradivarius
Plucked Instruments
Solid Body Electric Guitars
Blown Instruments
Single Reeds
Double Reeds
Brass Instruments
Percussion Instruments
The Piano
Mozart, Beethoven, and Archie Bell
Part 1
Audio Defined
If a tree falls in the forest and no one is there to hear it, does it make a sound?
I hope it’s obvious that the answer to the preceding question is yes, because sounds exist in the air whether or not a person, or a microphone,
is present to hear them. At its most basic, audio is sound waves—patterns of compression and expansion that travel through a medium such as
air—at frequencies humans can hear. Therefore, audio can be as simple as the sound of someone singing or clapping her hands outdoors or as
complex as a symphony orchestra performing in a reverberant concert hall. Audio also encompasses the reproduction of sound as it passes
through electrical wires and circuits. For example, you might place a microphone in front of an orchestra, connect the microphone to a
preamplifier, which then goes to a tape recorder, which in turn connects to a power amplifier, which is finally sent to one or more
loudspeakers. At every stage as the music passes through the air to the microphone, and on through the chain of devices, including the
connecting wires in between, the entire path is considered “audio.”
Chapter 1
Audio Basics
“When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in
numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of science.”
—Lord Kelvin (Sir William Thomson), nineteenth-century physicist
Volume and Decibels
When talking about sound that exists in the air and is heard by our ears (or picked up by a microphone), volume level is referred to as sound
pressure level, or SPL. Our ears respond to changing air pressure, which in turn deflects our eardrums, sending the perception of sound to our
brains. The standard unit of measurement for SPL is the decibel, abbreviated dB. The “B” refers to Alexander Graham Bell (1847–1922), and the
unit of measure is actually the Bel. But one Bel is too large for most audio applications, so one-tenth of a Bel, or one decibel, became the
common unit we use today.
By definition, decibels express a ratio between two volume levels, but in practice SPL can also represent an absolute volume level. In that case
there’s an implied reference to a level of 0 dB SPL—the softest sound the average human ear can hear, also known as the threshold of hearing.
So when the volume of a rock concert is said to be 100 dB SPL when measured 20 feet in front of the stage, that means the sound is 100 dB
louder than the softest sound most people can hear. Since SPL is relative to an absolute volume level, SPL meters must be calibrated at the
factory to a standard acoustic volume.
For completeness, 0 dB SPL is equal to a pressure level of 20 micropascals (millionths of 1 Pascal, abbreviated μPa). Like pounds per square
inch (PSI), the Pascal is a general unit of pressure—not only air pressure—and it is named in honor of the French mathematician Blaise Pascal.
Note that decibels use a logarithmic scale, which is a form of numeric “compression.” Adding dB values actually represents a multiplication of
sound pressure levels, or voltages when it relates to electrical signals. Each time you add some number of decibels, the underlying change in air
pressure, or volts for audio circuits, increases by a multiplying factor:
+6 dB = 2 times the air pressure or volts
+20 dB = 10 times the air pressure or volts
+40 dB = 100 times the air pressure or volts
+60 dB = 1,000 times the air pressure or volts
+80 dB = 10,000 times the air pressure or volts
Likewise, subtracting decibels results in division:
−6 dB = 1/2 the air pressure or volts
−20 dB = 1/10 the air pressure or volts
−40 dB = 1/100 the air pressure or volts
−60 dB = 1/1,000 the air pressure or volts
−80 dB = 1/10,000 the air pressure or volts
So when the level of an acoustic source or voltage increases by a factor of 10, that increase is said to be 20 dB louder. But increasing the
original level by 100 times adds only another 20 dB, and raising the volume by a factor of 1,000 adds only 20 dB more. Using decibels instead
of ratios makes it easier to describe and notate the full range of volume levels we can hear. The span between the softest sound audible and the
onset of extreme physical pain is about 140 dB. If that difference were expressed using normal (not logarithmic) numbers, the span would be
written as 10,000,000,000,000 to 1, which is very unwieldy! Logarithmic values are also used because that’s just how our ears hear. An increase
of 3 dB represents a doubling of power,1 but it sounds only a little louder. To sound twice as loud, the volume needs to increase by about 8 to
10 dB, depending on various factors, including the frequencies present in the source.
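The multiplying factors listed above can be checked with a short calculation. The following Python sketch is an illustration rather than anything taken from the book, and its function names are hypothetical. It assumes the conventional definitions: 20 times the base-10 logarithm for ratios of pressure or volts, and 10 times the base-10 logarithm for ratios of power.

```python
import math

def db_to_amplitude_ratio(db):
    """Convert decibels to a ratio of air pressure or volts (20 * log10 rule)."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Convert decibels to a ratio of power (10 * log10 rule)."""
    return 10 ** (db / 10)

def amplitude_ratio_to_db(ratio):
    """Convert a ratio of air pressure or volts to decibels."""
    return 20 * math.log10(ratio)

# Reproduce the list of multiplying factors from the text.
for db in (6, 20, 40, 60, 80):
    ratio = db_to_amplitude_ratio(db)
    print(f"+{db} dB = {ratio:,.1f} times the air pressure or volts")
```

Strictly speaking, 6 dB corresponds to a pressure or voltage ratio of about 1.995, and 3 dB to a power ratio of about 1.995, so the familiar "doubling" figures are convenient round numbers rather than exact values.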
Note that distortion and noise specs for audio gear can be expressed using either decibels or percents. For example, if an amplifier adds 1
percent distortion, that amount of distortion could be stated as being 40 dB below the original signal. Likewise, noise can be stated as a percent
or dB difference relative to some output level. Chapter 2 explains how audio equipment is measured in more detail.
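As a quick illustration of converting between the two notations, here is a minimal Python sketch. The function names are hypothetical, and it assumes the percentage describes a voltage ratio, which is the usual convention for distortion specs but is not stated explicitly in the text.

```python
import math

def percent_to_db(percent):
    """Express a distortion or noise percentage as dB relative to the signal.

    Assumes the percentage is a voltage ratio, so the 20 * log10 rule applies.
    """
    return 20 * math.log10(percent / 100)

def db_to_percent(db):
    """Convert a level in dB relative to the signal back to a percentage."""
    return 100 * 10 ** (db / 20)

for percent in (1, 0.1, 0.01):
    print(f"{percent}% distortion is {percent_to_db(percent):.0f} dB relative to the signal")
```

With these assumptions, 1 percent comes out 40 dB below the signal, matching the example in the text, and each further factor of 10 reduction in the percentage lowers the figure by another 20 dB.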
You may have read that the smallest volume change people can hear is 1 dB. Or you may have heard it as 3 dB. In truth, the smallest level
change that can be noticed depends on several factors, including the frequencies present in the source. We can hear smaller volume differences at
midrange frequencies than at very low or very high frequencies. The room you listen in also has a large effect. When a room is treated with
absorbers to avoid strong reflections from nearby surfaces, it’s easier to hear small volume changes because echoes don’t drown out the
loudspeaker’s direct sound. In a room outfitted with proper acoustic treatment, most people can easily hear level differences smaller than 0.5 dB
at midrange frequencies.
It’s also worth mentioning the inverse square law. As sound radiates from a source, it becomes softer with distance. This decrease is due partly
to absorption by the air, which affects high frequencies more than low frequencies, as shown in Table 1.1. But the more important reason is
simply because sound radiates outward in an arc, as shown in Figure 1.1. Each time the distance from a sound source is doubled, the same
amount of energy is spread over an area four times as large (twice as wide and twice as tall), so the sound pressure is halved. Therefore, the level reduces by a corresponding amount, which in this case is 6 dB.
Table 1.1: Frequencies over Distances at 20°C (68°F) with a Relative Humidity of 70%.
Frequency    Attenuation Over Distance
125 Hz       0.3 dB/km
250 Hz       1.1 dB/km
500 Hz       2.8 dB/km
1,000 Hz     5.0 dB/km
2,000 Hz     9.0 dB/km
8,000 Hz     76.6 dB/km
Figure 1.1:
Sound radiates outward from a source in an arc, so its volume is reduced by 6 dB with each doubling of distance. As you can see in Table 1.1, very high frequencies are
reduced over distances due to absorption by the air. This attenuation is in addition to losses caused by the inverse square law, which applies equally to all frequencies.
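To see how the two kinds of loss combine, the following Python sketch adds the inverse square law reduction to the air absorption figures from Table 1.1. It is an illustrative sketch, not a method given in the book: the function names are hypothetical, distances are assumed to be in meters, and the source is assumed to radiate freely outdoors with no reflections.

```python
import math

# Air absorption in dB per kilometer, copied from Table 1.1
# (20 degrees C, 70% relative humidity).
AIR_ABSORPTION_DB_PER_KM = {
    125: 0.3, 250: 1.1, 500: 2.8, 1000: 5.0, 2000: 9.0, 8000: 76.6,
}

def inverse_square_loss_db(near_m, far_m):
    """Level drop from spreading alone when moving from near_m to far_m."""
    return 20 * math.log10(far_m / near_m)

def total_loss_db(frequency_hz, near_m, far_m):
    """Spreading loss plus air absorption over the extra distance traveled."""
    absorption = AIR_ABSORPTION_DB_PER_KM[frequency_hz] * (far_m - near_m) / 1000
    return inverse_square_loss_db(near_m, far_m) + absorption

# Compare a low and a high frequency over the same path.
for freq in (125, 8000):
    print(f"{freq} Hz, 1 m to 101 m: {total_loss_db(freq, 1, 101):.1f} dB quieter")
```

Over short distances the spreading loss dominates at every frequency. Air absorption only becomes a significant part of the total at high frequencies and long distances.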
Standard Signal Levels
As with acoustic volume levels, the level of an audio signal in a wire or electrical circuit is also expressed in decibels, either relative to another
signal or relative to one of several common standard reference levels. An amplifier that doubles its input voltage is said to have a gain of 6 dB.
This is the same amount of increase whether the input is 0.001 volts or 5 volts; whatever voltage happens to be at the input becomes twice as
large at the output. But, as with SPL, the dB is also used to express absolute levels for electronic signals using an implied reference. The most
common standard reference levels used for audio are dBm, dBV, dBu, and dBFS.
The “m” in dBm stands for milliwatt, with 0 dBm equal to 1 milliwatt (thousandth of a watt) of power. Other dBm values describe absolute
amounts of power either lower or higher than the 1 milliwatt reference level. Therefore, 10 dBm is the same as 10 milliwatts, 20 dBm is the
same as 100 milliwatts, and −3 dBm is 0.5 milliwatt. The “V” in dBV stands for volts. So 0 dBV equals 1 volt, 20 dBV equals 10 volts, and
−6 dBV is half a volt.
The dBu standard is similar to dBV, but the reference level for 0 dBu is 0.775 volts. The value 0.775 is used because telephone systems (and
older audio devices) were originally designed with an input and output impedance of 600 ohms. The electrical symbol for ohms is the Greek
letter omega, shown as Ω. When 0.775 volts is applied to 600 ohms, the result is 1 milliwatt of power. Impedance is explained in more depth in
later chapters.
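The reference levels described above can be turned into absolute values with a few lines of arithmetic. This Python sketch is only an illustration with hypothetical function names. It uses the reference values given in the text (1 milliwatt, 1 volt, and 0.775 volts) and the standard relationship that power equals voltage squared divided by resistance.

```python
DBV_REFERENCE_VOLTS = 1.0     # 0 dBV, as stated in the text
DBU_REFERENCE_VOLTS = 0.775   # 0 dBu, as stated in the text

def dbm_to_milliwatts(dbm):
    """0 dBm is 1 milliwatt; power ratios use the 10 * log10 rule."""
    return 10 ** (dbm / 10)

def dbv_to_volts(dbv):
    """0 dBV is 1 volt; voltage ratios use the 20 * log10 rule."""
    return DBV_REFERENCE_VOLTS * 10 ** (dbv / 20)

def dbu_to_volts(dbu):
    """0 dBu is 0.775 volts."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20)

def power_in_watts(volts, ohms):
    """Power dissipated when a voltage is applied across a resistance."""
    return volts ** 2 / ohms

print(f"20 dBm = {dbm_to_milliwatts(20):.0f} milliwatts")
print(f"20 dBV = {dbv_to_volts(20):.0f} volts")
print(f"0.775 volts into 600 ohms = {power_in_watts(0.775, 600) * 1000:.3f} milliwatts")
```

The last line shows why 0.775 volts was chosen: applied to 600 ohms, it yields very nearly 1 milliwatt, so 0 dBu and 0 dBm describe the same signal in a 600-ohm system.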
The unit dBFS is specific to digital audio, with FS standing for full scale. This is the maximum level a digital device can accommodate or, in
other words, the largest number a converter can accept or output. No reference voltage level is needed or implied. Whatever input and output
level your particular sound card or outboard converter is calibrated for, 0 dBFS equals the maximum level possible before the onset of gross distortion.
Again, more complete explanations of volts, amps, watts, and impedance are provided in later chapters. But for now, the main point is that
dB levels are always relative, even though there’s often an implied reference to a specific voltage or power level.
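Because dBFS is measured against the largest number a converter can handle, it can be computed directly from sample values with no voltage involved. The sketch below is a hypothetical illustration, not something described in the book. It assumes signed integer samples, with 16 bits as the default, and treats the largest negative value as full scale.

```python
import math

def sample_to_dbfs(sample, bits=16):
    """Level of one signed integer sample relative to digital full scale.

    Assumes signed samples, so full scale is 2 ** (bits - 1),
    which is 32768 for 16-bit audio.
    """
    full_scale = 2 ** (bits - 1)
    if sample == 0:
        return float("-inf")   # digital silence has no finite dB value
    return 20 * math.log10(abs(sample) / full_scale)

for value in (-32768, 16384, 328):
    print(f"sample {value:>6} = {sample_to_dbfs(value):.1f} dBFS")
```

Every result is zero or negative, because nothing can exceed full scale. A sample at half the maximum value sits about 6 dB below 0 dBFS, regardless of what voltage the converter happens to be calibrated for.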
Signal Levels and Metering
Level meters are an important part of recording and mixing because every recording medium has a limited range of volume levels it can
accommodate. For example, when recording to analog tape, if the audio is recorded too softly, you’ll hear a “hiss” in the background when you
play back the recording. And if the music is recorded too loudly, an audible distortion can result. The earliest type of audio meter used for
recording (and broadcast) is called the VU meter, where VU stands for volume units. This is shown in Figure 1.2.
Figure 1.2:
Standard VU meters display a range of levels from −20 to +3 dB.
Early VU meters were mechanical, made with springs, magnets, and coils of wire. The spring holds the meter pointer at the lowest volume
position, and then when electricity is applied, the coil becomes magnetized, moving the pointer. Since magnets and coils have a finite mass, VU
meters do not respond instantly to audio signals. Highly transient sounds such as claves or other percussion instruments can come and go before
the meter has a chance to register their full level. So when recording percussive instruments, and instruments that have a lot of high-frequency
content, you need to record at levels lower than the meter indicates to avoid distortion.
Adding “driver” electronics to a VU meter offers many potential advantages. One common feature holds the input voltage for half a second or
so, giving the meter time to respond to the full level of a transient signal. Another useful option is to expand the range displayed beyond the
typical 23 dB. This is often coupled with a logarithmic scale, so, for example, a meter might display a total span of 40 or even 50 dB, shown as
10 dB steps equally spaced across the meter face. This is different from the VU meter in Figure 1.2, where the nonlinear log spacing is
incorporated into the dB scale printed on the meter’s face.
Modern digital meters use either an LED “ladder” array as shown in Figure 1.3 or an equivalent as displayed on a computer screen by audio
recording software. Besides showing a wider range of volumes and holding peaks long enough to display their true level, many digital meters
can also be switched to show either peak or average volumes. This is an important concept in audio because our ears respond to a sound’s
average loudness, where computer sound cards and analog tape distort when the peak level reaches what’s known as the clipping point.
Mechanical VU meters inherently average the voltages they receive by nature of their construction. Just as it takes some amount of time (50 to
500 milliseconds) for a meter needle to deflect fully and stabilize, it also takes time for the needle to return to zero after the sound stops. The
needle simply can’t keep up with the rapid changes that occur in music and speech, so it tends to hover around the average volume level.
Therefore, when measuring audio whose level changes constantly—which includes most music—VU meters are ideal because they indicate how
loud the music actually sounds. But mechanical VU meters won’t tell you whether audio exceeds the maximum allowable peak level unless
extra circuitry is added.
Figure 1.3:
Digital meters feature instant response times, a wide range of display levels, and often peak-hold capability.
Modern digital meters often show both peak and average levels at the same time. All of the lights in the row light up from left to right to
show the average level, while single lights farther to the right blink one at a time to indicate the peak level, which is always higher. Some digital
meters can even be told to hold peaks indefinitely. So you can step away and later, after the recording finishes, you’ll know if the audio clipped
when you weren’t watching. The difference between a signal’s peak and average levels is called its crest factor. By the way, the crest factor
relation between peak and average levels applies equally to acoustic sounds in the air.
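To make the crest factor idea concrete, here is a minimal Python sketch. It assumes that "average" is measured as RMS, which is a common convention but an assumption here, and the function name is made up for the example.

```python
import math

def crest_factor_db(samples):
    """Peak level minus RMS level, in dB (RMS stands in for 'average' here)."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

# One full cycle of a sine wave, sampled at 1,000 points
sine = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]
print(round(crest_factor_db(sine), 2))   # about 3.01 dB for a pure sine wave
```

Percussive material with brief, tall peaks yields a much larger number than a steady tone, which is why a slow averaging meter can badly understate the peak level.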
The concept of peak versus average levels also applies to the output ratings of power amplifiers. Depending on their design, some amplifiers
can output twice as much power for brief periods as they can provide continuously. Many years ago, it was common practice for amplifier
makers to list only peak power output in their advertisements, and some of the claims bordered on fraud. For example, an amplifier that could
output only 30 watts continuously might claim a peak power output of hundreds of watts, even if it could provide that elevated power for only
one millisecond. Thankfully, the US Federal Trade Commission adopted a rule (16 CFR Part 432) in 1974 making this practice illegal.
Calculating Decibels
The included Excel spreadsheet Decibels.xls calculates decibel values from voltages or ratios or percents, and also computes decibel changes
when combining two identical signals having opposite polarities. This is useful because it lets you determine the extent of peaks and nulls
caused by acoustic reflections having a known strength. It also works the other way around, letting you use room testing software to derive
absorption coefficients of acoustic materials based on the measured strength of reflections at various frequencies. That type of testing is explained
in Chapter 18.
The spreadsheet clearly shows what you enter for each section and what information is returned, so I won’t elaborate here. All input and
output values are in Column B, with the input fields you enter shown in bold type. Simply replace the sample values in those bold fields. The
first section accepts two voltages and tells you the dB difference between them. The second section is similar but accepts a voltage or SPL
difference as a ratio and returns the difference in decibels. The third section does the opposite: It accepts a decibel difference and returns the
equivalent voltage or SPL difference as a ratio. The fourth section computes percent distortion from a decibel relationship, and vice versa.
Although this book aims to avoid math as much as possible, for completeness the following formulas show how to calculate decibels, where
the asterisk (*) signifies multiplication:
dB = 10 * LOG10(Power 1 / Power 2)
dB = 20 * LOG10(Voltage 1 / Voltage 2)
Both of these formulas are used in the spreadsheet. The effectiveness of acoustic materials is measured by how much acoustic power they
absorb, so 10 * LOG10 is used for those calculations. But peak increases and null depths are dependent on sound pressure differences, which
are calculated like voltages, so those cells instead use 20 * LOG10. I encourage you to look at the formulas in the various result cells to see how
they work.
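For readers who prefer code to a spreadsheet, the same relationships can be sketched in Python. This is not the contents of Decibels.xls; the function names are hypothetical, and the percent calculation assumes distortion is expressed as a voltage ratio.

```python
import math

def db_from_voltage_ratio(v1, v2):
    """Decibel difference between two voltages (or sound pressures)."""
    return 20 * math.log10(v1 / v2)

def db_from_power_ratio(p1, p2):
    """Decibel difference between two power levels."""
    return 10 * math.log10(p1 / p2)

def voltage_ratio_from_db(db):
    """Inverse of db_from_voltage_ratio."""
    return 10 ** (db / 20)

def percent_from_db(db):
    """A level in dB below the main signal, expressed as a percent of it."""
    return 100 * voltage_ratio_from_db(db)

print(round(db_from_voltage_ratio(2, 1), 2))   # doubling the voltage: 6.02
print(round(db_from_power_ratio(2, 1), 2))     # doubling the power: 3.01
print(round(percent_from_db(-40), 3))          # 40 dB down: 1.0 percent
```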
The unit of frequency measurement is the hertz, abbreviated Hz, in honor of German physicist Heinrich Hertz (1857–1894). However, before
1960, frequencies were stated as cycles per second (CPS), kilocycles (KC=1,000 Hz), megacycles (MC=1,000,000 Hz), or gigacycles
(GC=1,000,000,000 Hz). As with volume, frequencies are also heard and often expressed logarithmically. Raising the pitch of a note by one
octave represents a doubling of frequency, and going down one octave divides the frequency in half. Figure 1.4 shows all of the A notes on a
standard 88-key piano. You can see that each unit of one octave doubles or halves the frequency. This logarithmic frequency relation also
corresponds to how our ears naturally hear. For an A note at 440 Hz, the pitch might have to be shifted by 5 to 10 Hz before it’s perceived as
out of tune. But the A note at 1,760 Hz two octaves higher would have to be off by 20 to 40 Hz before you’d notice that it’s out of tune. Musical
notes can also be divided into cents, where 1 cent equals a pitch change equal to 1 percent of the difference between adjacent half-steps. Using a
percent variance for note frequencies also expresses a ratio, since the number of Hz contained in 1 cent depends on the note’s fundamental
frequency.
Figure 1.4:
A span of one musical octave corresponds to a doubling, or halving, of frequency.
Again, for completeness, I’ll mention that for equal tempered instruments such as the piano, the frequency ratio between any two adjacent
musical half-steps is equal to the 12th root of 2, or 1.0595. We use the 12th root because there are 12 musical half-steps in one musical octave.
This divides the range of one octave logarithmically rather than into equal Hz steps. Therefore, for the A note at 440 Hz, you can calculate the
frequency of the Bb a half-step higher as follows:
440 * 1.0595 = 466.2 Hz
To find the frequency that lies musically halfway between two other frequencies, multiply one times the other, then take the square root:
Halfway frequency = SQRT(Frequency 1 * Frequency 2)
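The half-step, halfway-point, and cents relationships described above can be checked with a few lines of Python. The function names are invented for this sketch, and equal temperament is assumed throughout.

```python
import math

HALF_STEP = 2 ** (1 / 12)      # the 12th root of 2, about 1.0595

def shift_half_steps(freq_hz, half_steps):
    """Frequency after moving up (or down, if negative) by some half-steps."""
    return freq_hz * HALF_STEP ** half_steps

def musical_midpoint(f1_hz, f2_hz):
    """Frequency musically halfway between two others (geometric mean)."""
    return math.sqrt(f1_hz * f2_hz)

def cents_between(f1_hz, f2_hz):
    """Interval from f1 to f2 in cents (100 cents per half-step)."""
    return 1200 * math.log2(f2_hz / f1_hz)

print(round(shift_half_steps(440, 1), 1))     # Bb above A 440: 466.2
print(round(shift_half_steps(440, 12), 1))    # one octave up: 880.0
print(round(musical_midpoint(220, 880), 1))   # halfway between 220 and 880: 440.0
print(round(cents_between(440, 445), 1))      # 5 Hz sharp at A 440, in cents
```

The last line shows why cents express a ratio: the same 5 Hz error at 1,760 Hz would be only about a quarter as many cents.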
Table 1.2 lists all of the note frequencies you’re likely to encounter in real music. You’ll probably never encounter a fundamental frequency in
the highest octave, but musical instruments and other sound sources create overtones—also called harmonics or partials—that can extend that
high and beyond. Indeed, cymbals and violins generate overtones extending to frequencies much higher than the 20 KHz limit of human hearing.
Table 1.2: Standard Frequencies for Musical Notes.
Note: The C at 261.6 Hz is middle C.
Graphing Audio
As we have seen, both volume levels and frequencies with audio are usually expressed logarithmically. The frequency response graph in Figure
1.5 is typical, though any decibel and frequency ranges could be used. This type of graph is called semi-log because only the horizontal
frequency axis is logarithmic, while the vertical decibels scale is linear. Although the dB scale is drawn with equal spacing, the levels it represents are not linear, because decibels are
inherently logarithmic. If the vertical values were shown as volts instead of decibels, then the horizontal lines would be spaced logarithmically
one above the other, rather than spaced equally as shown. And then the graph would be called log-log instead of semi-log.
Figure 1.5:
This is the typical layout for a semi-log audio graph, where the horizontal frequency axis is logarithmic and the vertical dB volume axis is linear (equal step distances).
Standard Octave and Third-Octave Bands
As we have seen, audio frequencies are usually expressed logarithmically. When considering a range of frequencies, the size of each range, or
band, varies rather than contains a constant number of Hz. As shown in Figure 1.5, the octave distance left to right between 20 and 40 Hz is the
same as the octave between 200 and 400 Hz. Likewise for the 10-to-1 range (called a decade) between 100 and 1,000 Hz, and the decade
spanning 1,000 to 10,000 Hz. Table 1.3 shows the standard frequencies for audio based on octave (boldface) and third-octave bandwidths. These
bands are used for measuring the frequency response of microphones and other gear, for specifying the absorption of acoustic products, as well
as for the available frequencies in most graphic equalizers. The stated frequency is at the center of a band that encompasses a range of
frequencies, and so it is called the center frequency.
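As a hedged illustration of how third-octave center frequencies are spaced, the sketch below steps up and down from 1 KHz in thirds of an octave. The function name and the step counts (17 below and 13 above, chosen to span roughly 20 Hz to 20 KHz) are assumptions for this example. Published tables round the results to nominal values such as 20, 25, and 31.5 Hz, so the exact numbers printed here differ slightly from the standard labels.

```python
def third_octave_centers(reference_hz=1000.0, steps_below=17, steps_above=13):
    """Exact third-octave center frequencies spaced around a reference."""
    return [reference_hz * 2 ** (n / 3)
            for n in range(-steps_below, steps_above + 1)]

centers = third_octave_centers()
print(len(centers))                          # 31 bands
print([round(f, 1) for f in centers[:4]])    # lowest bands, near 20 to 40 Hz
print([round(f, 1) for f in centers[17:21]]) # 1 KHz up to 2 KHz
```

Every third entry is exactly one octave above the entry three places earlier, which is how the octave bands fall out of the third-octave list.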
Table 1.3: Standard Octave and Third-Octave Audio Bands.
An audio filter is a device that selectively passes, or suppresses, a range of frequencies. A common filter type familiar to all audio enthusiasts is
the equalizer, though in practice most equalizers are more complex than the basic filters from which they’re created. Table 1.4 shows the
basic filter types, which are named for how they pass or suppress frequencies.
Table 1.4: Common Audio Filter Types.
High-Pass    Passes frequencies above the stated cutoff
Low-Pass     Passes frequencies below the stated cutoff
Band-Pass    Passes frequencies within a range surrounding the center frequency
Band-Stop    Passes all frequencies except those within a range around the center frequency
All-Pass     Passes all frequencies equally, but applies phase shift
Besides the stated cutoff frequency, high-pass and low-pass filters also have a fall-off property called slope, which is specified in dB per octave.
The cutoff for these filter types is defined as the frequency at which the response has fallen by 3 dB, also known as the half-power point. The
slope is the rate in dB at which the level continues to fall at higher or lower frequencies. The high-pass filter in Figure 1.6 has a cutoff frequency
of 125 Hz, which is where the response is 3 dB below unity gain. At 62 Hz, one octave below 125 Hz, the response is therefore at −9 dB. An
octave below that, it’s −15 dB at 31 Hz. The low-pass filter in Figure 1.7 has a cutoff frequency of 1 KHz, which again is where the response is
3 dB below unity gain. At 2 KHz, one octave above 1 KHz, the response is therefore at −9 dB. An octave higher, it’s −15 dB at 4 KHz.
Figure 1.6:
This high-pass filter has a −3 dB cutoff frequency of 125 Hz, with a slope of 6 dB per octave.
Figure 1.7:
This low-pass filter has a cutoff frequency of 1 KHz, with a slope of 6 dB per octave.
Although the official names for these filters describe the frequencies they pass, I prefer to call them by the frequencies they actually affect. For
example, when a high-pass filter is set to a low frequency to remove rumble picked up by a microphone, I think of that as a low-cut filter
because that’s how it’s being used. Likewise for a low-pass filter with a cutoff frequency in the treble range. Yes, it passes frequencies below the
treble range cutoff, but in practice it’s really reducing high frequencies in relation to the rest of the audio range. So I prefer to call it a high-cut
filter. Again, this is just my preference, and either wording is technically correct.
Band-pass and band-stop filters have a center frequency rather than a cutoff frequency. Like high-pass and low-pass filters, band-pass filters
also have a slope stated in dB per octave. Both of these filter types are shown in Figures 1.8 and 1.9. Determining the slope for a band-stop filter
is tricky because it’s always very steep at the center frequency where the output level is reduced to near zero. Note that band-stop filters are
sometimes called notch filters due to the shape of their response when graphed.
Figure 1.8:
This band-pass filter has a center frequency of 1 KHz, with a slope of 18 dB per octave.
Figure 1.9:
This band-stop filter has a center frequency of 1 KHz, with a slope of 6 dB per octave. In practice, the response at the center frequency approaches zero output. Therefore,
the slope becomes much steeper than 6 dB per octave at the center frequency.
Most filters used for audio equalizers are variants of these basic filter types, and they typically limit the maximum amount of boost or cut. For
example, when cutting a treble frequency range to remove harshness from a cymbal, you’ll usually apply some amount of dB cut at the chosen
frequency rather than reduce it to zero, as happens with a band-stop filter. When boosting or cutting a range above or below a cutoff frequency,
we call them shelving filters because the shape of the curve resembles a shelf, as shown in Figure 1.10. When an equalizer boosts or cuts a range
by some amount around a center frequency, it’s called a peaking filter or a bell filter because its shape resembles a bell, like the one shown in
Figure 1.11.
Figure 1.10:
Shelving filters are similar to high-pass and low-pass filters, but they level out at some maximum amount of boost or cut. The high-frequency shelving filter (top) boosts
high frequencies by up to 12 dB, and the low-frequency shelving filter (bottom) cuts low frequencies by no more than 12 dB.
Figure 1.11:
Peaking-style EQ has three basic properties: center frequency, amount of boost or cut in dB, and Q, or bandwidth. Both of these screens show an equalizer adding 18 dB
boost at 1 KHz, but the top EQ has a Q of 0.5, while the bottom Q is 6.0.
At the circuit level, filters are made from passive components (capacitors, inductors, and resistors) and are inherently cut-only. So obtaining
boost requires active electronics. In practice, you may find equalizers that claim to be passive, but they usually include active circuitry to raise
the entire signal level either before or after the passive filter, or both.
Filter slopes are inherently multiples of 6 dB per octave, which is the same as 20 dB per decade. A decade is a range of ten to one, also
referred to as an order of magnitude. A filter made from a single capacitor and resistor falls off at a rate of 6 dB per octave, as does a filter made
from one inductor and one resistor. To get a slope of 12 or 18 dB per octave, or larger, requires multiple filter sections, where each section
contributes one pole to the response. Therefore, a three-pole low-pass filter has a slope of 18 dB per octave.
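The equivalence between 6 dB per octave and 20 dB per decade can be verified numerically. This short Python sketch assumes the ideal final slope of a filter, where the output voltage halves for each doubling of frequency per pole; the names are made up for the example.

```python
import math

DB_PER_OCTAVE_PER_POLE = 20 * math.log10(2)    # about 6.02 dB for a 2-to-1 range
DB_PER_DECADE_PER_POLE = 20 * math.log10(10)   # 20 dB for a 10-to-1 range

def slope_db_per_octave(poles):
    """Ideal final slope of a filter with the given number of poles."""
    return poles * DB_PER_OCTAVE_PER_POLE

print(round(DB_PER_OCTAVE_PER_POLE, 2))     # 6.02
print(round(DB_PER_DECADE_PER_POLE, 2))     # 20.0
print(round(slope_db_per_octave(3), 1))     # three poles: 18.1, usually quoted as 18
```

The commonly quoted 6, 12, 18, and 24 dB figures are therefore rounded versions of multiples of 6.02 dB.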
Another important filter parameter is its Q, which stands for quality. How the Q of a filter is interpreted depends on the filter type. Q usually
applies to band-pass filters, but it can also be used for peaking filters whether they’re set to boost or to cut. The equalizer response graphs
shown in Figure 1.11 both have 18 dB of boost at 1 KHz, but the boost shown in the top graph has a fairly low Q of 0.5, while the bottom graph
shows a higher Q of 6.0. EQ changes made using a low Q are more audible simply because a larger span of frequencies is affected. Of course, for
equalization to be audible at all, the source must contain frequencies within the range being boosted or cut. Cutting 10 KHz and above on an
electric bass track will not have much audible effect, because most basses have little or no content at those high frequencies.
High-pass, low-pass, and shelving filters can also have a Q property, which affects the response and slope around the cutoff frequency, as
shown in Figure 1.12. As you can see, as the Q is increased, a peak forms around the cutoff frequency. However, the slope eventually settles to
6 dB per octave (or a multiple of 6 dB). Applying a high Q to a low-pass filter is the basis for analog synthesizer filters, as made famous by early
Moog models. For example, the low-pass filter in a MiniMoog has a slope of 24 dB per octave; the sharp slope coupled with a resonant peak at
the cutoff frequency creates its characteristic sound. These days, digital filters are often used to create the same type of sounds using the same
slope and Q.
Figure 1.12:
High-pass and low-pass filters can also have a Q parameter. Both of these low-pass filters have a cutoff frequency of 2 KHz and an eventual slope of 6 dB per octave, but
the top filter has a Q of 1.4, while the bottom filter’s Q is 6.0.
Again, to be complete, Figure 1.13 shows the mathematical relation between frequencies, bandwidth, and Q. The cutoff frequency of low-pass
and high-pass filters is defined as the frequency at which the response has fallen by 3 dB, and the same applies to band-pass, band-stop, and
peaking EQ filters.
Figure 1.13:
Bandwidth is the reciprocal (opposite) of Q, with higher Q values having a narrower bandwidth.
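Since the figure is not reproduced here, the following Python sketch uses the common textbook definition, Q equals center frequency divided by the bandwidth between the two 3 dB points. Treat that formula, the geometric-mean center, and the function names as assumptions of this example rather than a transcription of Figure 1.13.

```python
def q_from_band_edges(low_hz, high_hz):
    """Q from the lower and upper -3 dB frequencies of a band."""
    center_hz = (low_hz * high_hz) ** 0.5    # geometric center of the band
    return center_hz / (high_hz - low_hz)

def bandwidth_hz(center_hz, q):
    """Width in Hz between the -3 dB points for a given center and Q."""
    return center_hz / q

print(round(bandwidth_hz(1000, 0.5), 1))       # low Q at 1 KHz: 2000.0 Hz wide
print(round(bandwidth_hz(1000, 6.0), 1))       # high Q at 1 KHz: 166.7 Hz wide
print(round(q_from_band_edges(900, 1100), 2))  # a 200 Hz band near 1 KHz
```

The two bandwidth results correspond to the Q settings of 0.5 and 6.0 used for the peaking EQ curves discussed earlier.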
Phase Shift and Time Delay
Earlier I mentioned the all-pass filter, which might seem nonsensical at first. After all, what good is an audio filter that doesn’t boost or cut any
frequencies? In truth, a filter that applies phase shift without changing the frequency balance has several uses in audio. For example, all-pass
filters are at the heart of phase shifter effects. They can also be used to create artificial stereo from a mono sound source. Flanger effects are
similar, using a simple time delay rather than phase shift. But let’s first consider what phase shift really is, since it’s at the heart of every filter
and equalizer.
Like a circle, one complete cycle of a sine wave is divided into 360 degrees. The upper sine wave in Figure 1.14 shows the wave starting at a
level of zero at an arbitrary point in time called Time Zero. Since one cycle contains 360 degrees, after 90 degrees—one-quarter of the way
through the cycle—the wave has reached its peak positive amplitude. After 180 degrees the level is back to zero, and at 270 degrees the wave
reaches its maximum negative level. At 360 degrees it’s back to zero again. Note that the span between the maximum positive level at 90
degrees and the maximum negative level at 270 degrees is a difference of 180 degrees. The significance of this will become obvious soon.
Figure 1.14:
Phase shift is similar to time delay in that certain frequencies exit an all-pass filter later than they arrived at its input.
Now let’s consider the lower sine wave, which started at the same time as the upper wave but was sent through an all-pass filter that delays
this particular frequency by 90 degrees. Viewing both waves together you can see the time delay added by the all-pass filter. The phase shift
from an all-pass filter is similar to a simple time delay, but not exactly the same. Time delay shifts all frequencies by the same amount of time,
whereas phase shift delays some frequencies longer than others. In fact, an all-pass filter’s center frequency is defined as the frequency at which the
phase shift is 90 degrees.
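The relation between a fixed time delay and the resulting phase shift at each frequency is simple enough to sketch in Python. The function name is hypothetical; the arithmetic follows directly from one cycle being 360 degrees.

```python
def phase_shift_degrees(freq_hz, delay_seconds):
    """Phase shift a pure time delay imposes on a given frequency."""
    return 360.0 * freq_hz * delay_seconds

# The same 1 millisecond delay shifts each frequency by a different amount
for freq in (250, 500, 1000):
    print(freq, "Hz:", round(phase_shift_degrees(freq, 0.001), 1), "degrees")
```

With a 1 millisecond delay this gives 90 degrees at 250 Hz, 180 degrees at 500 Hz, and a full 360 degrees at 1 KHz. A time delay therefore produces phase shift that grows in direct proportion to frequency, while an all-pass filter produces a different, frequency-dependent pattern.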
To put this theory into practice, let’s see what happens when music—which typically contains many frequencies at once—is sent through an
all-pass filter or time delay, and the delayed audio is mixed with the original. When you combine audio with a delayed version of itself, the
frequency response is altered. As one cycle of the wave is rising, the delayed version is falling, or perhaps it hasn’t yet risen as high. So when the
two are combined, they partially cancel at some frequencies only. This is the basis for all analog equalizers. They shift the phase for a range of
frequencies and then combine the phase-shifted audio with the original.
An all-pass filter can also be used to create a pseudo-stereo effect. This is done by combining both the original and phase-shifted audio
through two separate paths, with the polarity of one path reversed. A block diagram is shown in Figure 1.15, and the output showing the
response of one channel is in Figure 1.17. One path is combined with the original, as already explained, and the other path is combined with a
reversed polarity at the same time. This creates equal but opposite comb filter responses such that whatever frequencies are peaks at the left
output become nulls at the right output, and vice versa. The different frequency responses at the left and right channel outputs are what create
the pseudo-stereo effect.
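The complementary left and right responses can be illustrated with a little complex arithmetic in Python. This is a simplified sketch, not a model of any particular product: it assumes equal levels for the original and shifted paths, and the function name is invented.

```python
import cmath
import math

def pseudo_stereo_gains_db(phase_shift_degrees):
    """Left (sum) and right (difference) output levels in dB at one frequency."""
    shifted = cmath.exp(-1j * math.radians(phase_shift_degrees))
    left = abs(1 + shifted)     # original plus phase-shifted copy
    right = abs(1 - shifted)    # original plus polarity-reversed copy

    def to_db(gain):
        return 20 * math.log10(gain) if gain > 1e-12 else float("-inf")

    return to_db(left), to_db(right)

for degrees in (0, 90, 180):
    left_db, right_db = pseudo_stereo_gains_db(degrees)
    print(degrees, round(left_db, 2), round(right_db, 2))
```

Where the shift is 0 degrees the left output peaks and the right output nulls; at 180 degrees the roles swap; at 90 degrees both outputs are equal.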
Figure 1.15:
A stereo synthesizer is similar to a phaser effect unit, except it combines the phase-shifted output twice, with the polarity of one path reversed.
Figure 1.17:
Comb filtering is characterized by a repeating pattern of equally spaced peaks and deep nulls. Because the frequencies are at even Hz multiples, when graphed, a linear
rather than logarithmic frequency axis is often used.
A different method for using an all-pass filter to create fake stereo is to simply apply different amounts of phase shift to the left and right
channels without mixing the original and shifted versions together. In this case the frequency response is not altered, but the sound takes on an
exaggerated sense of width and dimension. This technique can also make sounds seem to come from a point beyond the physical location of the
speakers. Further, if the amount of phase shift is modulated to change over time, the audible result is similar to a Leslie rotating speaker. In fact,
this is pretty much what a Leslie speaker does, creating phase shift and constantly varying time delays via the motion of its speaker driver. With
a real Leslie speaker, the Doppler effect causes the pitch to rise and fall as the rotating horn driver moves toward and away from you. But this
also happens when phase shift is varied over time.
Most digital equalizers mimic the behavior of analog equalizers, though with a totally different circuit design. Instead of using capacitors or
inductors to shift phase, they use taps on a digital delay line. A digital delay line is a series of memory locations that the digitized audio samples
pass through. The first number that arrives is stored in Address 0. Then, at the next clock cycle (44,100 times per second for a 44.1 KHz sample
rate), the number now in Address 0 is shifted over to Address 1, and the next incoming sample is stored at Address 0. As more numbers enter the
input, they are all shifted through each memory location in turn, until they eventually arrive at the output. This is the basis for a digital delay.
You can alter the delay time by changing the total number of addresses the numbers pass through or the clock rate (shift speed), or both. Indeed,
a block of memory addresses used this way is called a shift register because of the way numbers are shifted through them in turn.
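A digital delay line of this kind can be sketched in a few lines of Python. The class and method names are hypothetical, and a `deque` stands in for the block of memory addresses; real hardware or plug-ins would be organized differently.

```python
from collections import deque

class DelayLine:
    def __init__(self, length):
        # Every address starts out holding silence (zero).
        self.addresses = deque([0.0] * length, maxlen=length)

    def tick(self, sample):
        """One clock cycle: store a new sample, return the one leaving the line."""
        oldest = self.addresses[-1]          # value at the last address exits
        self.addresses.appendleft(sample)    # new value goes into Address 0,
        return oldest                        # shifting the rest along by one

line = DelayLine(3)
print([line.tick(x) for x in [1, 2, 3, 4, 5]])   # [0.0, 0.0, 0.0, 1, 2]
```

Each input emerges three clock cycles later. At a 44.1 KHz sample rate, a line about 44 addresses long would give a delay of roughly 1 millisecond.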
To create an equalizer from a digital delay line, you tap into one of the intermediate memory addresses, then send that at some volume level
back to the input. It’s just like the feedback control on an old EchoPlex tape-based delay, except without all the flutter, noise, and distortion. You
can also reverse the polarity of the tapped signal, so a positive signal becomes negative and vice versa, before sending it back to the input to get
either cut or boost. By controlling which addresses along the delay route you tap into and how much of the tapped signal is fed back into the
input and with which polarity, an equalizer is created. With an analog EQ the phase shift is created with capacitors and inductors. In a digital EQ
the delays are created with a tapped shift register. But the key point is that all equalizers rely on phase shift unless they use special trickery.
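The idea of tapping a delayed sample and feeding it back to the input can be shown with a deliberately simplified Python sketch. It uses a single tap and a fixed feedback amount, which is far cruder than a practical equalizer, and all names are assumptions for illustration.

```python
def feedback_tap(samples, tap_delay, feedback):
    """Add a delayed copy of the output back into the input.

    tap_delay is how many samples back the tap sits; feedback is the level
    of the tapped signal, where a negative value reverses its polarity.
    """
    out = []
    for n, x in enumerate(samples):
        tapped = out[n - tap_delay] if n >= tap_delay else 0.0
        out.append(x + feedback * tapped)
    return out

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(feedback_tap(impulse, 2, 0.5))     # [1.0, 0.0, 0.5, 0.0, 0.25]
print(feedback_tap(impulse, 2, -0.5))    # same echoes, alternating polarity
```

Changing the tap position moves the affected frequencies, while the feedback level and polarity set how strongly they are boosted or cut.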
Finally, phase shift can also be used to alter the ratio of peak to average volume levels without affecting the sound or tone quality. The top
waveform in Figure 1.16 shows one sentence from a narration Wave file I recorded for a tutorial video. The bottom waveform shows the same
file after applying phase shift using an all-pass filter.
Figure 1.16:
Phase shift can alter the peak level of a source (top) without changing its average level.
The Orban company sells audio processors to the broadcast market, and its Optimod products contain a phase rotator feature that’s tailored to
reduce the peak level of typical male voices without lowering the overall volume. Just as the maximum level you can pass through a tape
recorder or preamp is limited by the waveform peaks, broadcast transmitters also clip based on the peak level. By reducing the peak heights
with phase shift, broadcasters can increase the overall volume of an announcer without using a limiter, which might negatively affect the sound
quality. Or they can use both phase shift and limiting to get even more volume without distortion.
Comb Filtering
A comb filter is a unique type of filter characterized by a series of peaks and deep nulls that repeat at equally spaced (not logarithmic) frequency
intervals. As with other filter types, a comb filter is created by combining an audio signal with a delayed version of itself. If time delay is used
rather than phase shift, the result is an infinite number of peaks and nulls, as shown in Figure 1.17. A comb filter response also occurs naturally
when sound reflects off a wall or other surface and later combines in the air with the original sound. Indeed, comb filtering will make an
appearance many times throughout this book.
The flanging effect is the classic implementation of a comb filter and is easily recognized by its characteristic hollow sound. The earliest
flanging effects were created manually using two tape recorders playing the same music at once but with one playback delayed a few
milliseconds compared to the other. Analog tape recorders lack the precision needed to control playback speed and timing accurately to within
a few milliseconds. So recording engineers would lay their hand on the tape reel’s flange (metal or plastic side plate) to slightly slow the speed
of whichever playback was ahead in time. When the outputs of both recorders were then mixed together, the brief time delay created a comb
filter response, giving the hollow sound we all love and know as the flanging effect.
Comb filtering peaks and nulls occur whenever an audio source is combined with a delayed version of itself, as shown in Figure 1.18. For any
given delay time, some frequency will be shifted exactly 180 degrees. So when the original wave at that frequency is positive, the delayed
version is negative, and vice versa. If both the original and delayed signals are exactly the same volume, the nulls will be extremely deep, though
the peaks are boosted by only 6 dB. When used as an audio effect, the comb filter frequency is usually swept slowly up and down to add some
animation to, for example, an otherwise static-sounding rhythm guitar part. Faster speeds can also be used to create a warbling or vibrato effect.
Figure 1.18:
The flanging effect is created by sending audio through a time delay, then combining the delayed output with the original audio.
Figure 1.19 shows a single frequency tone delayed so its phase is shifted first by 90 degrees, then by a longer delay equal to 180 degrees. If the
original tone is combined with the version shifted by 180 degrees, the result is complete silence. Other frequencies present in the audio will not
be canceled unless they are odd multiples of the same frequency. That is, a delay time that shifts 100 Hz by 180 degrees will also shift 300 Hz by one
full cycle plus 180 degrees. The result therefore is a series of deep nulls at 100 Hz, 300 Hz, 500 Hz, and so forth. Please understand that the
severely skewed frequency response is what creates the hollow “swooshy” sound associated with flanger and phaser effect units. You are not
hearing the phase shift itself.
Figure 1.19:
Delaying audio is similar to applying phase shift, and the amount of phase shift at any given frequency is related to the delay time.
For comb filter type effects, the delay is typically just a few milliseconds. Most of us will simply turn the delay knob until we get a sound we
like, though it’s simple to determine the first (lowest frequency) peak from the delay time:
Lowest peak frequency in Hz = 1 / Delay time in seconds
The lowest null frequency is always half the lowest peak frequency. Both then repeat at intervals equal to the first peak’s frequency. So for a delay
of 1 millisecond, the first peak is at 1 KHz, with subsequent peaks at 2 KHz, 3 KHz, 4 KHz, and so forth. The lowest null is at 500 Hz, and
subsequent nulls are at 1,500 Hz, 2,500 Hz, 3,500 Hz, and so forth. You can see this in Figure 1.17.
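The peak and null frequencies for any delay time can be listed with a short Python sketch. The function name and the `count` parameter are assumptions for the example; the arithmetic is the reciprocal relationship between delay time and first peak frequency.

```python
def comb_frequencies(delay_ms, count=4):
    """First few peak and null frequencies (Hz) for a given delay in milliseconds."""
    first_peak_hz = 1000.0 / delay_ms             # reciprocal of the delay time
    peaks = [first_peak_hz * k for k in range(1, count + 1)]
    nulls = [first_peak_hz * (k - 0.5) for k in range(1, count + 1)]
    return peaks, nulls

peaks, nulls = comb_frequencies(1.0)
print(peaks)   # [1000.0, 2000.0, 3000.0, 4000.0]
print(nulls)   # [500.0, 1500.0, 2500.0, 3500.0]
```

Halving the delay to 0.5 milliseconds doubles every frequency in both lists, which is what you hear as the comb sweeping upward.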
Getting the strongest effect requires mixing the original and delayed sounds together at precisely equal volumes. Then the peak frequencies are
boosted by 6 dB, and the nulls become infinitely deep. Nearby frequencies that are shifted less, or more, than 180 degrees also cancel, but not as
much. Likewise, when the two signal levels are not precisely equal, the peaks and nulls are less severe. This is the same as using a lower
Strength or Mix setting on phaser or flanger effect units. The Decibels.xls calculator described earlier can tell you the maximum extent of peaks
and nulls when original and delayed sounds are mixed together at various levels.
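As an alternative to the spreadsheet, this Python sketch estimates the largest peak and deepest null when a delayed copy is mixed in at a given level. It is not the spreadsheet's own formula; the function name is made up, and the delayed level is assumed to be a voltage ratio between 0 and 1 relative to the original.

```python
import math

def comb_extremes_db(delayed_gain):
    """Maximum peak and null (dB) when a delayed copy is mixed with the original.

    delayed_gain is the delayed signal's level relative to the original,
    expressed as a voltage ratio from 0 to 1.
    """
    peak_db = 20 * math.log10(1 + delayed_gain)
    if delayed_gain < 1:
        null_db = 20 * math.log10(1 - delayed_gain)
    else:
        null_db = float("-inf")    # equal levels cancel completely
    return peak_db, null_db

for gain in (1.0, 0.5, 0.1):
    peak_db, null_db = comb_extremes_db(gain)
    print(gain, round(peak_db, 2), round(null_db, 2))
```

At equal levels the peak is about 6 dB and the null is infinitely deep. With the delayed copy at half the voltage, the peak drops to about 3.5 dB and the null is only about 6 dB deep.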
There’s a subtle technical difference between flanging and phasing effects. A flanger effect creates an infinite series of peaks and nulls starting
at some lower frequency. But a phaser creates a limited number of peaks and nulls, depending on how many stages of phase shift are used.
Many phaser guitar pedal effects use six stages, though some use more. Each stage adds up to 90 degrees of phase shift, so they’re always used in
pairs. Each pair of phase shift stages (0 to 180 degrees) yields one peak and one null. Phaser hardware effects are easier to design and build than
flanger effects, because they use only a few simple components. Compare this to a flanger that requires A/D and D/A converters to implement a
time delay digitally. Of course, with computer plug-ins the audio they process is already digitized, avoiding this extra complication.
This same hollow sound occurs acoustically in the air when reflections off a wall or floor arrive delayed at your ears or a microphone. Sound
travels at a speed of about 1.1 feet per millisecond, but rounding that down to a simpler 1 foot=1 millisecond is often close enough. So for
every foot of distance between two sound sources, or between a sound source and a reflecting room surface, there’s a delay of about 1
millisecond. Since 1 millisecond is the time it takes a 1 KHz tone to complete one cycle, it’s not difficult to relate distances and delay times to
frequencies without needing a calculator.
Figure 1.20 shows that for any frequency where the distance between a listener (or a microphone) and a reflective wall is equal to one-quarter
wavelength, a null occurs. The delay-induced phase shift occurs at predictable distances related to the frequency’s wavelength, because at that
point in space your ear hears a mix of both the direct and reflected sounds. The depth of the notch depends on the strength of the reflection at
that frequency. Therefore, hard, reflective surfaces create a stronger comb filter effect.
Figure 1.20:
Reflections off a nearby wall or other surface create the same type of comb filter response as a flanger effect unit.
Understand that a one-quarter wavelength distance means the total round trip is one-half wavelength, so the reflection arrives after 180
degrees of phase shift, not 90. Nulls also occur at related higher frequencies where the distance is equal to three-quarters wavelengths, one and
one-quarter wavelengths, and so forth. This is why the frequency response has a series of peaks and nulls instead of only one. Note that comb
filtering also occurs at lower frequencies where the distances are larger, causing peaks and nulls there, too. The Frequency-Distance Calculator
for Windows described in Chapter 17 will tell you the relationship between frequencies and distances in one-quarter wavelength increments.
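The quarter-wavelength relationship can also be worked out directly. The sketch below is not the Frequency-Distance Calculator mentioned above, just a minimal stand-in. It assumes a speed of sound of 1,130 feet per second, a reflection that arrives straight back from the wall without a polarity flip, and hypothetical function names.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed round figure for room temperature


def null_frequencies(distance_ft, count=4, speed=SPEED_OF_SOUND_FT_PER_S):
    """Frequencies where distance_ft equals 1/4, 3/4, 5/4 ... of a wavelength."""
    return [(2 * k + 1) * speed / (4.0 * distance_ft) for k in range(count)]


def peak_frequencies(distance_ft, count=4, speed=SPEED_OF_SOUND_FT_PER_S):
    """Frequencies where the round trip is a whole number of wavelengths."""
    return [k * speed / (2.0 * distance_ft) for k in range(1, count + 1)]


distance = 2.0  # feet from the microphone to the reflecting surface
print("nulls:", [round(f, 1) for f in null_frequencies(distance)])
print("peaks:", [round(f, 1) for f in peak_frequencies(distance)])
```

Moving the microphone closer to the surface raises every null and peak frequency together, which is the sweeping sound of a flanger.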
To make this easier to visualize and hear, the video “comb_filtering” shows pink noise playing through a loudspeaker that’s pointed at a
reflecting window about two feet away. I held a DPA 4090 measuring microphone and then moved it slowly toward and away from the
window. The audio you hear in this video is what the microphone captured, and the sweeping comb filter frequencies are very easy to hear. It’s
not always easy to hear this comb filter effect when standing near a boundary because we have two ears. So the peak and null frequencies in one
ear are different in the other ear, which dilutes the effect. Just for fun, I ran the recorded signal through the Room EQ Wizard measuring
software, to show a live screen capture of its Real Time Analyzer at the same time.
Besides reflections, comb filtering can also occur when mixing tracks that were recorded with multiple microphones due to arrival time
differences. For example, a microphone near the snare drum picks up the snare, as well as sound from the nearby kick drum. So when the snare
and kick drum mics are mixed, it’s possible for the low end to be reduced (or boosted) because of the arrival time difference between
microphones. Again, while phase shift is the cause of the response change, it’s the response change that you hear and not the phase shift itself.
Indeed, some people claim they can hear phase shift in equalizers because when they boost the treble, they hear a sound reminiscent of phaser
effects units. So they wrongly assume what they hear is the damaging phase shift everyone talks about. In truth, what they’re really hearing is
high-frequency comb filtering that was already present in the recording, but not loud enough to be noticed.
For example, when a microphone is placed near a reflective boundary such as the wooden lid of a grand piano, the delay between the direct
and reflected sounds creates a comb filter acoustically in the air that the microphone picks up. If the treble is then boosted with EQ, the comb
filtering already present becomes more apparent. So the equalizer did not add the comb filtered sound, but merely brought it out. The
“problems” caused by phase shift have been repeated so many times by magazine writers and audio salespeople that it’s now commonly
accepted, even though there’s not a shred of truth to it.
Comb filtering also intrudes in our lives by causing reception dropouts and other disturbances at radio frequencies. If you listen to AM radio in
the evening, you’ll sometimes notice a hollow sound much like a flanger effect. In fact, it is a flanger effect. The comb filtering occurs when your
AM radio receives both the direct signal from the transmitting antenna and a delayed version that’s been reflected off the ionosphere.
Likewise, FM radio suffers from a comb filtering effect called picket fencing, where the signal fades in and out rapidly as the receiving antenna
travels through a series of nulls. As with audio nulls, radio nulls are also caused by reflections as the waves bounce off nearby large objects such
as a truck next to your car on the highway. The signal fades in and out if either you or the large object is moving, and this is often noticeable as
you slow down to approach a stoplight in your car. The slower you travel, the longer the timing between dropouts.
Reception dropouts also occur with wireless microphones used by musicians. This is sometimes called multi-path fading because, just as with
acoustics, comb filtering results when a signal arrives through two paths and one is delayed. The solution is a diversity system having multiple
receivers and multiple antennas spaced some distance apart. In this case it’s the transmitter (performer) that moves, which causes the peak and
null locations to change. When one receiving antenna is in a null, the other is likely not in a null. Logic within the receiver switches quickly
from one antenna to another to ensure reception free of the dropouts that would otherwise occur as the performer moves around.
The same type of comb filtering also occurs in microwave ovens, and this is why these ovens have a rotating base. (Or the rotor is attached to
the internal microwave antenna hidden from view.) But even with rotation, hot and cold spots still occur. Wherever comb filter peaks form, the
food is hot, and at null locations, the food remains cold. As you can see, comb filtering is prevalent in nature and has a huge impact on many
aspects of our daily lives; it’s not just an audio effect!
Fourier and the Fast Fourier Transform
Joseph Fourier (1768–1830) showed that all sounds can be represented by one or more sine waves having various frequencies, amplitudes,
durations, and phase relations. Conversely, any sound can be broken down into its component parts and identified completely using a Fourier
analysis; one common method is the fast Fourier transform (FFT). Fourier’s finding is significant because it proves that audio and music contain
no unusual or magical properties. Any sound you hear, and any change you might make to a sound using an audio effect such as an equalizer,
can be known and understood using this basic analysis of frequency versus volume level.
FFT is a valuable tool because it lets you assess the frequency content for any sound, such as how much noise and distortion are added at every
frequency by an amplifier or sound card. The FFT analysis in Figure 1.21 shows the spectrum of a pure 1 KHz sine wave after being played back
and recorded through a modestly priced sound card (M-Audio Delta 66) at 16 bits. A sine wave is said to be “pure” because it contains only a
single frequency, with no noise or distortion components. In this case the sine wave was generated digitally inside an audio editor program
(Sony Sound Forge), and its purity is limited only by the precision of the math used to create it. Creating a sine wave this way avoids the added
noise and distortion that are typical with hardware signal generators used to test audio gear. Therefore, any components you see other than the
1 KHz tone were added by the sound card or other device being tested.
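To see why a digitally generated sine wave makes a clean test signal, the following sketch creates a 1 KHz tone with plain math and analyzes it with a simple discrete Fourier transform. It is not how Sound Forge or any particular FFT tool works internally. The sample rate and transform size are hypothetical values chosen so 1 KHz lands exactly on one analysis bin, and the slow textbook transform stands in for a real FFT.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Plain O(N^2) discrete Fourier transform; returns the amplitude in each bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        scale = 1.0 if k in (0, n // 2) else 2.0  # DC and Nyquist bins are not doubled
        mags.append(scale * abs(acc) / n)
    return mags


SAMPLE_RATE = 8000  # hypothetical values: 8000 / 256 = 31.25 Hz per bin,
N = 256             # so 1 KHz falls exactly on bin 32
TONE_HZ = 1000.0

tone = [math.sin(2 * math.pi * TONE_HZ * i / SAMPLE_RATE) for i in range(N)]
mags = dft_magnitudes(tone)
bin_hz = SAMPLE_RATE / N
peak_bin = max(range(len(mags)), key=mags.__getitem__)
residual = max(m for k, m in enumerate(mags) if k != peak_bin)

print(f"peak at {peak_bin * bin_hz:.0f} Hz with amplitude {mags[peak_bin]:.3f}")
print(f"largest component at any other frequency: {residual:.2e}")
```

Everything outside the 1 KHz bin is down at the level of floating point rounding error, so anything larger that shows up after passing the tone through a device was added by that device.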
Figure 1.21:
An FFT analysis shows volume level versus frequency, and it is commonly used to assess both the noise and distortion added by audio equipment.
In Figure 1.21, you can see that the noise floor is very low, with small “blips” at the odd-number harmonic distortion frequencies of 3, 5, 7,
and 9 KHz. Note the slight rise at 0 Hz on the far left of the graph, which indicates the sound card added a small amount of DC offset to the
recording. Although the noise floor of 16 bit digital audio used for this test is at −96 dB, the noise in the FFT screen appears lower, well below
the −114 dB marker line. This is because the −96 dB noise level of 16 bits is really the sum of the noise at all frequencies.
A complete explanation of FFT spectrum analysis quickly gets into very deep math, so I’ve addressed only the high points. The main settings
you’ll deal with when using an FFT display are the upper and lower dB levels, the start and end frequencies (often 20 Hz to 20 KHz), and FFT
resolution. The resolution is established by the FFT Size setting, with larger sizes giving more accurate results. I recommend using the largest FFT
size your audio editor software offers.
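Two of the points above lend themselves to quick arithmetic: how FFT Size sets the frequency resolution, and why the displayed noise floor sits below the total noise level. The sketch below assumes the noise is spread evenly across all frequencies and ignores window corrections, so the numbers will not match any particular analyzer exactly. The function names are hypothetical.

```python
import math


def bin_width_hz(sample_rate, fft_size):
    """Spacing between adjacent FFT frequency bins."""
    return sample_rate / fft_size


def per_bin_noise_db(total_noise_db, fft_size):
    """Level in each bin when total_noise_db is spread evenly over fft_size / 2 bins.

    Assumes flat (white) noise and no window correction.
    """
    return total_noise_db - 10.0 * math.log10(fft_size / 2)


for size in (1024, 4096, 16384, 65536):
    print(f"FFT size {size:>5}: {bin_width_hz(44100, size):6.2f} Hz per bin, "
          f"-96 dB total noise shows as about {per_bin_noise_db(-96.0, size):.0f} dB per bin")
```

Each doubling of the FFT size halves the bin width and lowers the displayed noise floor by about 3 dB, while a steady tone such as the 1 KHz test signal stays at the same level.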
Sine Waves, Square Waves, and Pink Noise—Oh My!
Fourier proved that all sounds are comprised of individual sine waves, and obviously the same applies to simple repeating waveforms such as
sine and square waves. Figure 1.22 shows the five basic waveform types: sine, triangle, sawtooth, square, and pulse. A sine wave contains a
single frequency, so it’s a good choice for measuring harmonic distortion in audio gear. You send a single frequency through the amplifier or
other device being tested, and any additional frequencies at the output must have been added by the device. Triangle waves contain only
odd-numbered harmonics. So if the fundamental pitch is 100 Hz, the wave also contains 300 Hz, 500 Hz, 700 Hz, and so forth. Each higher harmonic
is also softer than the previous one. The FFT display in Figure 1.23 shows the spectrum for a 100 Hz triangle wave having a peak level of
−1 dB, or 1 dB below full scale (dBFS). Note that the 100 Hz fundamental frequency has a level of −12, well below the file’s peak level of
−1 dBFS. This is because the total energy in the file is the sum of the fundamental frequency plus all the harmonics. The same applies to other
waveforms, and indeed to any audio data being analyzed with an FFT. So when analyzing a music Wave file whose peak level is close to full
scale, it’s common for no one frequency to be higher than −10 or even −20.
Figure 1.22:
The five basic waveforms are sine, triangle, sawtooth, square, and pulse.
Figure 1.23:
Triangle waves contain only odd-numbered harmonics, with each higher harmonic at a level lower than the one before.
Sawtooth waves contain both odd and even harmonics. Again, the level of each progressively higher harmonic is softer than the one before, as
shown in Figure 1.24. Triangle waves and square waves have only odd-numbered harmonics because they are symmetrical; the waveform goes
up the same way it comes down. But sawtooth waves contain both odd and even harmonics because they are not symmetrical. As you can see in
Figure 1.25, the level of each harmonic in a square wave is higher than for a triangle wave because the rising and falling slopes are steeper. The
faster a waveform rises or falls—called its rise time—the more high-frequency components it contains. This principle applies to all sounds, not
just static waveforms.
Figure 1.24:
Sawtooth waves contain both odd and even harmonics, with the level of each falling off at higher frequencies.
Figure 1.25:
Square waves contain only odd-numbered harmonics because they’re symmetrical, and, again, the level of each harmonic becomes progressively softer at higher
frequencies.
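The harmonic patterns described for triangle, sawtooth, and square waves can be tabulated with the standard textbook amplitude laws for ideal waveforms. These laws are not stated in the text, so treat them as an assumption: sawtooth and square harmonics fall off as 1/n, and triangle harmonics as 1/n squared. Levels are shown relative to each waveform's own fundamental, so they will not match the absolute readings of any particular FFT display.

```python
import math


def harmonic_level_db(shape, n):
    """Level of harmonic n relative to the fundamental, or None if that harmonic is absent."""
    if shape == "sawtooth":
        amplitude = 1.0 / n                       # every harmonic present
    elif shape == "square":
        amplitude = 1.0 / n if n % 2 else 0.0     # odd harmonics only
    elif shape == "triangle":
        amplitude = 1.0 / n ** 2 if n % 2 else 0.0  # odd only, falling faster
    else:
        raise ValueError(f"unknown waveform: {shape}")
    return 20.0 * math.log10(amplitude) if amplitude else None


for shape in ("triangle", "sawtooth", "square"):
    levels = []
    for n in range(1, 8):
        db = harmonic_level_db(shape, n)
        levels.append("  --  " if db is None else f"{db:6.1f}")
    print(f"{shape:>8}: " + " ".join(levels))
```

The square wave's third harmonic is about 9.5 dB below its fundamental, while the triangle wave's third harmonic is about 19 dB down, which is consistent with the steeper edges of the square wave carrying more high-frequency content.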
Pulse waves are a subset of square waves. The difference is that pulse waves also possess a property called pulse width or duty cycle. For
example, a pulse wave that’s positive for 5 percent of the time, then zero or negative for the rest of the time, is said to have a duty cycle of 5
percent. So a square wave is really just a pulse wave with a 50 percent duty cycle, meaning the voltage is positive half the time and zero or
negative half the time. Duty cycle is directly related to crest factor mentioned earlier describing the difference between average and peak levels.
As the duty cycle of a pulse wave is reduced from 50 percent, the peak level remains the same, but the average level (representing the total
amount of energy) becomes lower and lower. Since nonsquare pulse waves are asymmetrical, they contain both odd and even harmonics.
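The effect of duty cycle on both the average level and the harmonic content can be checked numerically. This sketch assumes an ideal pulse that sits at its peak value for the stated fraction of each cycle and at zero otherwise, and it uses the standard Fourier series result that harmonic n has a relative amplitude of sin(pi x n x duty) / n. The function names are hypothetical.

```python
import math


def pulse_harmonic_amplitude(n, duty):
    """Relative amplitude of harmonic n for an ideal pulse wave; duty is 0..1."""
    return abs(math.sin(math.pi * n * duty)) / n


def average_level(duty, peak=1.0):
    """Mean value of a pulse that is at `peak` for `duty` of the cycle and zero otherwise."""
    return peak * duty


for duty in (0.50, 0.25, 0.05):
    harmonics = " ".join(f"{pulse_harmonic_amplitude(n, duty):.3f}" for n in range(1, 7))
    print(f"duty {duty:4.0%}: average {average_level(duty):.2f} of peak, "
          f"harmonics 1-6: {harmonics}")
```

At 50 percent the even harmonics vanish, which is the square wave case. At narrower duty cycles the even harmonics appear, while the average level drops and the peak stays put.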
Just to prove that Fourier is correct, Figure 1.26 shows a square wave being built from a series of harmonically related sine waves. The top
sine wave is 100 Hz, then 300 Hz was added, then 500 Hz, and finally 700 Hz. As each higher harmonic is added, the waveform becomes closer
and closer to square, and the rising and falling edges also become steeper, reflecting the added high-frequency content. If an infinite number of
odd harmonics were all mixed together at the appropriate levels and phase relations, the result would be a perfect square wave.
Figure 1.26:
A square wave can be built from an infinite number of frequency-related sine waves.
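The construction in Figure 1.26 can be reproduced by summing odd harmonics. The sketch below assumes the usual Fourier series for a square wave, in which harmonic n is mixed in at 1/n of the fundamental's level and all harmonics start in phase. The 4/pi scale factor makes the finished wave swing between −1 and +1.

```python
import math


def square_partial_sum(t, fundamental_hz, highest_harmonic):
    """Sum of odd sine harmonics up to highest_harmonic, each at 1/n amplitude."""
    total = 0.0
    for n in range(1, highest_harmonic + 1, 2):
        total += math.sin(2 * math.pi * n * fundamental_hz * t) / n
    return 4.0 / math.pi * total


FUNDAMENTAL = 100.0                   # Hz, as in the figure
MID_PLATEAU = 0.25 / FUNDAMENTAL      # a quarter of the way through one cycle
NEAR_EDGE = 0.01 / FUNDAMENTAL        # just after the rising zero crossing

for highest in (1, 3, 5, 7, 99):
    print(f"harmonics up to {highest:>2}: "
          f"mid-plateau {square_partial_sum(MID_PLATEAU, FUNDAMENTAL, highest):.3f}, "
          f"just after the edge {square_partial_sum(NEAR_EDGE, FUNDAMENTAL, highest):.3f}")
```

As more harmonics are added, the value in the middle of the flat top settles toward 1, and the value just after the zero crossing climbs faster, which is the steeper edge described above.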
All of the waveforms other than sine contain harmonics that fall off in level at higher frequencies. The same is true for most musical
instruments. It’s possible to synthesize a waveform having harmonics that fall off and then rise again at high frequencies, but that doesn’t usually
occur in nature. It’s also worth mentioning that there’s no such thing as subharmonics, contrary to what you might have read. Some musical
sounds contain harmonics that aren’t structured in a mathematically related series of frequencies, but the lowest frequency present is always
considered the fundamental.
Just as basic waveforms and most musical instruments create waveforms having a fundamental frequency plus a series of numerically related
harmonics, the same happens when audio circuits create distortion. Some amplifier circuits tend to create more odd-numbered harmonics than
even-numbered, and others create both types. But the numerical relation between the distortion frequencies added by electronic gear is
essentially the same as for basic waveforms and most musical instruments. The FFT graph in Figure 1.21 shows the odd harmonics added by a
typical sound card. Of course, the level of each harmonic is much lower than that of a square wave, but the basic principle still applies.
One exception to the numerically sequential harmonic series is the harmonic content of bells, chimes, and similar percussion instruments such
as steel drums. The FFT in Figure 1.27 shows the spectrum of a tubular bell tuned to a D note. Even though this bell is tuned to a D note and
sounds like a D note, only two of the five major peaks (at 1,175 Hz and 2,349 Hz) are related to either the fundamental or harmonics of a D
note. The harmonic content of cymbals is even more complex and dense, containing many frequencies all at once with a sound not unlike white
noise (hiss). However, you can coax a more normal series of harmonics from a cymbal by striking it near the bell portion at its center. Chapter
23 examines harmonic and inharmonic sound sources in more detail.
Figure 1.27:
The harmonic series for a tubular bell is not a linear sequence as occurs with basic waveforms or nonpercussive musical instruments such as violins and clarinets.
One important difference between the harmonics added by electronic devices and the harmonics present in simple waveforms and musical
instruments is that audio circuits also add nonharmonic components known as intermodulation distortion (IMD). These are sum and difference
frequencies created when two or more frequencies are present in the source. For example, if audio contains an A note at 440 Hz and also the B
note above at 494 Hz, audio circuits will add a series of harmonics related to 440 Hz, plus another series related to 494 Hz, plus an additional
series related to the sum of 440+494=934 Hz, plus another series related to the difference of 494−440=54 Hz. Distortion is impossible to
avoid in any audio circuit, but most designers aim for distortion that’s too soft to hear. Figure 1.28 shows the spectrum of a Wave file containing
440 Hz and 494 Hz mixed together, after adding distortion. The many closely spaced peaks are multiples of the 54 Hz difference between the
two primary frequencies.
Figure 1.28:
Adding distortion to a signal containing only 440 Hz and 494 Hz creates harmonics related to both frequencies, plus IM components related to the sum and difference
of those frequencies.
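The frequencies that harmonic and intermodulation distortion can add to the 440 Hz and 494 Hz example are easy to enumerate. The sketch below uses the common model in which a nonlinear circuit produces components at every combination m × f1 ± n × f2. The `max_order` cutoff and the function name are assumptions for illustration, and the code says nothing about how loud each component will be in a real circuit.

```python
def distortion_products(f1, f2, max_order=3):
    """Frequencies |m*f1 + n*f2| for integers m, n with 1 <= |m| + |n| <= max_order."""
    products = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if 1 <= order <= max_order:
                freq = abs(m * f1 + n * f2)
                if freq > 0:
                    products.add(freq)
    return sorted(products)


print("up to 2nd order:", distortion_products(440, 494, max_order=2))
print("up to 3rd order:", distortion_products(440, 494, max_order=3))
```

The second-order list contains the two source tones, their second harmonics at 880 Hz and 988 Hz, and the 54 Hz difference and 934 Hz sum frequencies named above. Higher orders fill in more components spaced 54 Hz apart.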
The last wave type I’ll describe is noise, which contains all frequencies playing simultaneously. Noise sounds like background hiss and comes
in several flavors. The two noise types most relevant to audio are white noise and pink noise, both of which are common test signals. White
noise has the same amount of energy at every frequency, so when displayed on an FFT, it appears as a straight horizontal line. Pink noise is
similar, but it falls off at higher frequencies at a rate of 3 dB per octave. Pink noise has two important advantages for audio testing: Because it
contains less energy at treble frequencies, it is less irritating to hear at loud levels when testing loudspeakers, and it is also less likely to damage
your tweeters. The other advantage is it contains equal energy per octave rather than per fixed number of Hz, which corresponds to how we
hear, as explained at the beginning of this chapter. Therefore, the octave between 1 KHz and 2 KHz contains the same total amount of energy as
the octave between 100 Hz and 200 Hz. Compare that to white noise, which has the same amount of energy within every 100 Hz or 1 KHz span.
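The difference between equal energy per Hz and equal energy per octave can be shown with a little integration. The sketch below assumes idealized noise: white noise with a constant energy density, and pink noise with a density proportional to 1/f, which is what a 3 dB per octave roll-off amounts to. The scale constants are arbitrary.

```python
import math


def white_band_energy(f_low, f_high, density=1.0):
    """White noise: constant energy per Hz, so band energy is proportional to bandwidth."""
    return density * (f_high - f_low)


def pink_band_energy(f_low, f_high, k=1.0):
    """Pink noise: density k / f, so band energy is k * ln(f_high / f_low)."""
    return k * math.log(f_high / f_low)


for low, high in ((100, 200), (1000, 2000), (10000, 20000)):
    print(f"{low:>5}-{high:<5} Hz: white {white_band_energy(low, high):8.0f}, "
          f"pink {pink_band_energy(low, high):.3f}")

# Halving the density for each doubling of frequency is a 3 dB drop per octave.
print(f"pink density change per octave: {10 * math.log10(0.5):.2f} dB")
```

Every octave of pink noise carries the same energy, while each higher octave of white noise carries ten times more for every tenfold increase in frequency.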
Earlier I mentioned that audio filters come in multiples of 6 dB per octave, because that’s the minimum slope obtainable from a single
capacitor or inductor. So obtaining the 3 dB per octave slope needed to build a pink noise generator is actually quite complex. The article
“Spectrum Analyzer and Equalizer Designs” listed on the Magazine Articles page of my website shows a circuit for a pink
noise generator, and you’ll see that it requires four resistors and five capacitors just to get half the roll-off of a single resistor and capacitor!
Resonance is an important concept in audio because it often improves the perceived quality of musical instruments, but it harms reproduction
when it occurs in electronic circuits, loudspeakers, and listening rooms. Mechanical resonance results when an object having a finite weight
(mass) is coupled to a spring having some amount of tension. One simple example of resonance is a ball hanging from a spring (or rubber
band), as shown in Figure 1.29. Pendulums and tuning forks are other common mass-spring devices that resonate.
Figure 1.29:
A ball hanging from a spring will resonate at a frequency determined by the weight of the ball and the stiffness of the spring.
In all cases, when the mass is set in motion, it vibrates at a frequency determined by the weight of the mass and the stiffness of the spring.
With a pendulum, the resonant frequency depends on the length of the pivot arm and the constant force of gravity. A tuning fork’s natural
resonant frequency depends on the mass of its tines and the springiness of the material it’s made from. The resonant frequency of a singer’s vocal
cords depends on the mass and length of the tissues, as well as their tension, which is controlled by the singer.
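For the two simplest cases, the relationships above have well-known closed forms. The sketch below assumes an ideal, undamped mass on a spring and a pendulum swinging through small angles. The metric units, example values, and function names are choices made for this illustration.

```python
import math


def mass_spring_hz(stiffness_n_per_m, mass_kg):
    """Resonant frequency of an ideal mass hanging from a spring."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2 * math.pi)


def pendulum_hz(length_m, gravity=9.81):
    """Resonant frequency of an ideal pendulum swinging through small angles."""
    return math.sqrt(gravity / length_m) / (2 * math.pi)


# Hypothetical example: a 0.1 kg ball on a spring with a stiffness of 40 N/m.
print(f"ball on spring:     {mass_spring_hz(40.0, 0.1):.2f} Hz")
print(f"4x heavier ball:    {mass_spring_hz(40.0, 0.4):.2f} Hz")
print(f"4x stiffer spring:  {mass_spring_hz(160.0, 0.1):.2f} Hz")
print(f"1 meter pendulum:   {pendulum_hz(1.0):.2f} Hz")
```

Quadrupling the mass halves the resonant frequency, and quadrupling the stiffness doubles it. The pendulum's frequency depends only on its length and gravity, not on the weight at the end.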
A close cousin of resonance is called damping, which is generally caused by friction that converts some of the motion energy to heat. The more
friction and damping that are applied, the faster the motion slows to a stop. The shock absorbers in your car use a viscous fluid to damp the
car’s vibration as it rests on the supporting springs. Otherwise, every time you hit a bump in the road, your car would bounce up and down
many times before eventually stabilizing.
As relates to audio, resonance occurs in mechanical devices such as loudspeaker drivers and microphone diaphragms. In a speaker driver, the
mass is the cone with its attached wire coil, and the spring is the foam or rubber surrounds that join the cone’s inner and outer edges to the
metal frame. The same applies to microphone diaphragms and phonograph cartridges. Most musical instruments also resonate, such as the wood
front and back plates on a cello or acoustic guitar and the head of a snare or timpani drum. Air within the cavity of a musical instrument also
resonates, such as inside a violin or clarinet. With a violin, the resonant frequencies are constant, but with a clarinet or flute, the pipe’s internal
resonance depends on which of the various finger keys are open or closed.
Another way to view damping is by its opposite property, Q. This is the same Q that applies to filters and equalizers as described earlier. For
example, a plank of wood has a low Q due to friction within its fibers. If you strike a typical wooden 2 by 4 used for home construction with a
hammer, you’ll hear its main resonant frequency, but the sound will be more like a thud that dies out fairly rapidly. However, denser types of
wood, such as rosewood or ebony used to make xylophones and claves, have less internal friction and thus have a higher Q. So the tone from a
xylophone or marimba is more defined and rings out for a longer time, making it musically useful.
Rooms also resonate naturally, and they too can be damped using absorbers made from rigid fiberglass or acoustic foam. As sound travels
through the fissures in the fiberglass or foam, friction inside the material converts the acoustic energy into heat. A rectangular-shaped room has
three resonant frequencies, with one each for its length, width, and height. If you clap your hands in a small room that’s totally empty, you’ll
generally hear a “boing” type of sound commonly known as flutter echo. This is due to sound bouncing repeatedly between two or more
opposing surfaces. With nothing on the walls or floor to absorb the reflections, the sound waves continue to bounce back and forth for a few
seconds, or even longer. But rooms also resonate at very low frequencies, and hand claps don’t contain enough low-frequency energy to excite
the resonances and make them audible. However, these resonances are definitely present, and Chapter 17 explains room resonance in much
more detail, including how to control it.
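The lowest resonance along each room dimension can be estimated from the dimension alone. This sketch relies on the standard relationship that the lowest resonance between two parallel surfaces occurs where the distance equals one-half wavelength. That formula is not given in the text above, the 1,130 feet per second speed of sound is an assumed round figure, and the room size is hypothetical.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed round figure for room temperature


def axial_mode_hz(dimension_ft, order=1, speed=SPEED_OF_SOUND_FT_PER_S):
    """Resonance between two parallel surfaces; order 1 is the lowest."""
    return order * speed / (2.0 * dimension_ft)


room = {"length": 16.0, "width": 10.0, "height": 8.0}  # hypothetical room, in feet

for name, feet in room.items():
    print(f"{name:>6} {feet:4.0f} ft: lowest resonance near {axial_mode_hz(feet):.1f} Hz, "
          f"next at {axial_mode_hz(feet, order=2):.1f} Hz")
```

All three results fall well below the range where a hand clap has much energy, which fits the observation that these resonances are present even though clapping does not reveal them.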
One key point about resonance with audio is understanding when you want it and when you don’t. Musical instruments often benefit from
resonance, and many instruments such as violins and cellos require resonance to create their characteristic sound. The primary differences
between a cheap violin and a Stradivarius are the number, strength, Q, and frequency of their natural body resonances. But resonance is always
damaging in playback equipment and listening rooms because it adds a response peak at each resonant frequency, and those frequencies also
continue to sound even after the music source has stopped. If I record myself playing a quarter note on my Fender bass, then press my hand on
the string to stop the sound, that note should not continue for a longer time when played back later through loudspeakers.
Earlier I mentioned the natural resonance of loudspeaker drivers. This resonance tends to be stronger and more damaging with larger speakers
used as woofers, because they have much more mass than the small, lightweight cones of typical tweeter drivers. Some speaker drivers use
viscous fluid for cooling and to help damp vibration and ringing, but a useful amount of damping is also obtained from the power amplifier that
drives the speaker. Loudspeaker electrical damping is similar to mechanical damping, but it uses electromagnetic properties instead of viscous
fluids or shock absorber type devices. This is explained in more detail in Chapter 21.
To illustrate the resonant frequencies in speaker drivers, I created the short video “loudspeaker_resonance” that compares three speaker
models: Yamaha NS-10M, Mackie HR624, and JBL 4430. I placed a Zoom H2 portable recorder near each speaker’s woofer to record the sound,
then tapped each woofer’s cone with my finger. You can hear that each speaker has a different resonant frequency, and the video also shows an
FFT of each recording on-screen. Note that the JBL’s self-resonance is very low at 50 Hz, so you won’t hear it if you listen on small speakers.
Audio Terminology
Earlier I mentioned that I prefer the term “low-cut” rather than “high-pass” when changing the frequency response in the bass range. Both are
technically correct, but some common audio terms make less sense. For example, “warm,” “cold,” “sterile,” “digital,” “forward,” “silky,” and so
forth are not useful because they don’t mean the same thing to everyone. On the other hand, “3 dB down at 200 Hz” is precise and leaves no
room for misinterpretation. Of course, “warm” and “cold” or “sterile” could describe the relative amount of high-frequency content. But saying
“subdued or exaggerated highs” is still better than “sterile” in my opinion. However, many of the terms I see are nonsensical.
Sometimes people refer to a piece of gear as being “musical” sounding or “resolving,” but what does that really mean? What sounds musical
to you may not sound musical to me. Some people like the added bass you get from a hi-fi receiver’s Loudness setting. To me that usually makes
music sound tubby, unless the music is already too thin sounding. The same goes for a slight treble boost to add sheen or a slight treble cut to
reduce harshness. Whether these response changes sound pleasing or not is highly dependent on the music being played, the specific frequencies
being boosted or cut, and personal preference.
I don’t think we need yet more adjectives to describe audio fidelity when we already have perfectly good ones. Some audiophile words are
even sillier, such as “fast bass,” which is an oxymoron. The common audiophile terms “PRaT” (Pace, Rhythm, and Timing) take this absurdity to
new heights, because these words already have a specific musical meaning unrelated to whatever audiophiles believe they are conveying. Some
of the worst examples of nonsensical audio terms I’ve seen arose from a discussion in a hi-fi audio forum. A fellow claimed that digital audio
misses capturing certain aspects of music compared to analog tape and LP records. So I asked him to state some specific properties of sound that
digital audio is unable to record. Among his list were tonal texture, transparency in the midrange, bloom and openness, substance, and the
organic signature of instruments. I explained that those are not legitimate audio properties, but he remained convinced of his beliefs anyway.
Perhaps my next book will be titled Scientists Are from Mars, Audiophiles Are from Venus.
Another terminology pet peeve of mine relates to the words “hum,” “buzz,” and “whine.” To me, hum is a low-frequency tone whose
frequency is 60 Hz or 120 Hz, or a mix of both. In Europe, AC power is 50 Hz, but the same principle applies. Buzz is also related to the AC
power frequency, but it has substantial high-frequency content. Whine to me is any frequency other than those related to AC power—for
example, the sound of a car engine revving at a high RPM.
I’m also amazed when people confuse the words “front” and “rear” when talking about the orientation of their listening rooms. To me, the
front wall of a room is the wall you face while listening. It’s common for people to call that the rear wall because it’s behind their speakers. But
it’s the front wall!
Finally, many people confuse phase with polarity. I see this often in audio magazines and even on the front panel labels of audio gear. As
explained earlier, phase shift is related to frequency and time. Any phase shift that’s applied to audio will delay different frequencies by
different amounts. Polarity is much simpler, and it is either positive or negative at all frequencies. One example of polarity reversal is swapping
the wires that connect to a loudspeaker. With the wires connected one way, the speaker cone pushes outward when a positive voltage is applied
to the speaker’s Plus terminal. When reversed, a positive voltage instead causes the cone to pull inward. When you reverse the polarity of audio,
all frequencies are inverted. So a “phase” switch on a mixing console or outboard mic preamp doesn’t really affect the phase, but rather simply
reverses the polarity.
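The distinction between polarity and phase can be demonstrated with two sine waves. This sketch compares a polarity reversal with a fixed half-millisecond time delay, which is only one simple way to create phase shift and is used here because the arithmetic is easy to follow. The 48 KHz sample rate is a hypothetical choice that makes the delay a whole number of samples.

```python
import math

SAMPLE_RATE = 48000  # hypothetical rate chosen so the delay below is a whole number of samples


def sine(freq_hz, n, delay_samples=0):
    """n samples of a sine wave, optionally delayed by delay_samples."""
    return [math.sin(2 * math.pi * freq_hz * (i - delay_samples) / SAMPLE_RATE)
            for i in range(n)]


def invert_polarity(samples):
    """Flip the sign of every sample, which inverts all frequencies at once."""
    return [-s for s in samples]


def max_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


N = 480
DELAY = 24  # 0.5 ms: half a cycle at 1 KHz, but a full cycle at 2 KHz

one_k, two_k = sine(1000, N), sine(2000, N)
print("1 KHz: delayed vs. inverted  ", f"{max_difference(sine(1000, N, DELAY), invert_polarity(one_k)):.6f}")
print("2 KHz: delayed vs. inverted  ", f"{max_difference(sine(2000, N, DELAY), invert_polarity(two_k)):.6f}")
print("2 KHz: delayed vs. unchanged ", f"{max_difference(sine(2000, N, DELAY), two_k):.6f}")
```

The delay happens to match a polarity reversal at 1 KHz, where it amounts to 180 degrees, but at 2 KHz the same delay is a full 360 degrees and leaves the wave looking untouched. A polarity reversal inverts both tones equally.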
The Null Test
The last topic I’ll address in this chapter is the null test, which is an important concept for audio testing. Earlier I mentioned the forum poster
who believed that digital recording somehow misses capturing certain aspects of audio. Other claims might seem difficult to prove or disprove,
such as whether two competent speaker wires can sound different, or the common claim that the sound of wires or solid state electronics change
over time, known as break-in. Audio comparisons are often done using blind tests. With a blind test, one person switches playback between two
audio sources, while another person tries to identify which source is playing by listening alone, without watching.
Blind tests are extremely useful, but they’re not always conclusive. For example, if you blind test two CD decks playing the same CD, there
may be slightly different amounts of distortion. But the distortion of both CD players could be too soft to hear even if it can be measured. So no
matter how often you repeat the test, the result is the same as a coin toss, even if one CD player really does have less distortion. Another
potential fault of blind testing is it applies only to the person being tested. Just because you can’t hear a difference in sound quality doesn’t
mean that nobody can. A proper blind test will test many people many times each, but that still can’t prove conclusively that nobody can hear a
difference. Further, some people believe that blind tests are fundamentally flawed because they put stress on the person being tested, preventing
him or her from noticing real differences they could have heard if only they were more relaxed. In my opinion, even if a difference is real but so
small that you can’t hear it when switching back and forth a few times, how important is that difference, really?
I’ll have more to say about blind testing in Chapter 3 and null tests in Chapter 22. But the key point for now is that a null test is absolute and
100 percent conclusive. The premise of a null test is to subtract two audio signals to see what remains. If nothing remains, then the signals are by
definition identical. If someone claims playing Wave files from one hard drive sounds different than playing them from another hard drive, a
null test will tell you for certain whether or not that’s true.
Subtracting is done by reversing the polarity of one source, then mixing it with the other. If the result is total silence when viewed on a
wide-range VU meter that displays down to total silence (also called digital black), then you can be confident that both sources are identical. Further,
if a residual difference signal does remain, the residual level shows the extent of the difference. You can also assess the nature of a residual
difference either by ear or with an FFT analysis. For example, if one source has a slight low-frequency roll-off, the residual after nulling will
contain only low frequencies. And if one source adds a strong third harmonic distortion component to a sine wave test tone, then the difference
signal will contain only that added content. So when that forum fellow claimed that digital recording somehow misses certain aspects of audio, a
null test can easily disprove that claim. Of course, whether or not this proof will be accepted is another matter!
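The subtraction at the heart of a null test takes only a few lines. The sketch below is a simplified illustration that assumes both signals are already perfectly time-aligned lists of samples at the same level. Real recordings usually need careful alignment first. The test signals and the 0.001 distortion amount are hypothetical.

```python
import math


def null_residual_db(a, b):
    """Peak level of (a - b) in dB relative to full scale; -inf means a perfect null."""
    residual = [x + (-y) for x, y in zip(a, b)]  # reverse b's polarity, then mix
    peak = max(abs(r) for r in residual)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


SAMPLE_RATE = 48000
N = 480

reference = [0.5 * math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]
exact_copy = list(reference)
# A second version with a small amount of third harmonic (3 KHz) added.
distorted = [s + 0.001 * math.sin(2 * math.pi * 3000 * i / SAMPLE_RATE)
             for i, s in enumerate(reference)]

print("identical copy residual:", null_residual_db(exact_copy, reference), "dB")
print(f"distorted copy residual: {null_residual_db(distorted, reference):.1f} dB")
```

The identical copy nulls to total silence, while the distorted copy leaves a residual whose level matches the distortion that was added and whose content is only the 3 KHz component.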
This chapter explains the basic units of measurement for audio, which apply equally for both acoustic sounds in the air and for signal voltages
that pass through audio equipment. Although the decibel always describes a ratio di erence between two signal levels, it’s common for decibels
to state an absolute volume level using an implied reference. Decibels are useful for audio because they can express a very large range of volume
levels using relatively small numbers. You also saw that both decibels and frequencies are usually assessed using logarithmic relationships.
Phase shift and time delay are important properties of audio, whether created intentionally in electronic circuits or when caused by re ections
in a room. Indeed, phase shift is the basis for all lters and equalizers, including comb lters. Although phase shift is often blamed in the
popular press for various audio ills, phase shift in usual amounts is never audible. What’s really heard is the resultant change in frequency
response when the original and delayed sounds are combined. Further, using the ndings of Fourier, we know that all sound comprises one or
more sine waves, and likewise all sound can be analyzed and understood fully using a Fourier analysis.
This chapter also explains the harmonic series, which is common to all simple waveforms, and also describes the harmonic and
intermodulation distortion spectrum added by audio equipment. Simple waveforms that are perfectly symmetrical contain only odd-numbered
harmonics, while asymmetrical waves contain both odd and even harmonics. Further, where harmonic distortion adds frequencies related to the
source, IM distortion is more audibly damaging because it contains sum and difference frequencies. Resonance is another important property of
audio because it’s useful and needed for many musical instruments, but it’s not usually wanted in audio gear and listening rooms.
I also mentioned several terminology pet peeves, and I listed better wording for some commonly misused subjective audio terms. Finally, you
saw that the null test is absolute because it shows all differences between two audio signals, including distortion or other artifacts you may not
have thought to look for when measuring audio gear.
You may know that doubling the power of an audio signal gives an increase of 3 dB rather than 6 dB shown above for voltages. When the voltage is doubled, twice as much current is also drawn by the connected device such as a loudspeaker. Since
both the voltage and the current are then twice as large, the amount of power consumed actually quadruples. Hence, doubling the voltage gives a 6 dB increase in power.
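The arithmetic in this note can be checked with a short Python sketch. It assumes the usual decibel definitions of 20 times the base-10 logarithm of a voltage ratio and 10 times the base-10 logarithm of a power ratio; the function names are invented for illustration.

```python
import math

def db_from_voltage_ratio(ratio):
    """Decibels for a ratio of two voltages."""
    return 20.0 * math.log10(ratio)

def db_from_power_ratio(ratio):
    """Decibels for a ratio of two powers."""
    return 10.0 * math.log10(ratio)

print(round(db_from_voltage_ratio(2), 2))  # 6.02: doubling the voltage
print(round(db_from_power_ratio(2), 2))    # 3.01: doubling the power
print(round(db_from_power_ratio(4), 2))    # 6.02: quadrupling the power
```

Doubling the voltage and quadrupling the power give the same decibel figure, which is the point of the note.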
Chapter e1
MIDI Basics
MIDI Internal Details
MIDI stands for Musical Instrument Digital Interface. It was developed by a consortium of musical product manufacturers to create a common
language and protocol for musical software and hardware to communicate with each other. Before MIDI, the keyboard of one synthesizer brand
could not send note data to another brand of synthesizer, nor was there a standard way to connect sound modules to the early computers of the
day. I recall a session in 1982 when composer Jay Chattaway recorded the music for the movie Vigilante at my studio. The music was hard
hitting and created entirely with early hardware synthesizers and samplers. For the week-long sessions Jay brought to my studio a very elaborate
synthesizer setup, including a custom computer that played everything at once. It was a very complex and expensive system that took many
hours to connect and get working.
MIDI changed all that. The original MIDI Specification 1.0 was published in 1982, and while it still carries the 1.0 designation, there have
since been many additions and refinements. It’s amazing how well this standard has held up for so long. You can connect a 1983 synthesizer to a
modern digital audio workstation (DAW), and it will play music. Indeed, the endurance of MIDI is a testament to the power of standards. Even
though the founding developers of the MIDI Manufacturers Association (MMA) competed directly for sales of musical instrument products, they
were able to agree on a standard that benefited them all. If only politicians would work together as amicably toward the common good.
As old as MIDI may be, it’s still a valuable tool because it lets composers experiment for hours on end without paying musicians. MIDI data
are much smaller than the audio they create because it’s mostly 2- and 3-byte instructions. Compare this to CD-quality stereo audio that occupies
176,400 bytes per second. When I write a piece of music, I always make a MIDI mockup in SONAR first, even for pieces that will eventually be
played by a full orchestra (heck—especially if they will be played by a real orchestra). This way I know exactly how all the parts will sound and
fit together, and avoid being embarrassed on the first day of rehearsal. Imagine how much more productive Bach and Haydn might have been if
they had access to the creative tools we enjoy today.
MIDI is also a great tool for practicing your instrument or just for making music to play along with for fun. Google will find MIDI files for
almost any popular or classical music you want. You can load a MIDI file into your DAW program, mute the track containing your instrument,
and play along. Further, a MIDI backing band never makes a mistake or plays out of tune, and it never complains about your taste in music.
MIDI Hardware
The original MIDI communications spec was a hardware protocol that established the data format and connector types. A five-pin female DIN
connector is used on the MIDI hardware, with corresponding male connectors at each end of the connecting cables. (DIN stands for Deutsche
Industrie-Normen, a German standards organization.) Although MIDI was originally intended as a way for disparate products to communicate
with one another, it’s since been adapted by other industries. For example, MIDI is used to control concert and stage lighting systems and to
synchronize digital and analog tape recorders to computer DAW programs via SMPTE as described in Chapter 6. MIDI is also used to send
performance data from a hardware control surface to manipulate volume and panning and plug-in parameters in DAW software.
Figure e1.1 shows the standard arrangement of three MIDI jacks present on most keyboard synthesizers. The MIDI In jack accepts data from a
computer or other controlling device to play the keyboard’s own built-in sounds, and the MIDI Out sends data as you press keys or move the
mod wheel, and so forth. The MIDI Thru port echoes whatever is received at the MIDI In jack to pass along data meant for other synthesizers.
For example, a complex live performance synth rig might contain four or even more keyboard synths and sound modules, each playing a
different set of sounds. So a master keyboard could send all the data to the first module in the chain, which passes that data on to all the others
using their Thru connectors.
Figure e1.1
MIDI hardware uses a 5-pin DIN connector. Most MIDI devices include three ports for input, output, and pass-through (Thru) for chaining multiple devices.
MIDI Channels and Data
A single MIDI wire can send performance and other data to more than one synthesizer, or it can play several different voices at once within a
single synth. In order to properly route data on a single wire to different synthesizers or voices, most MIDI data include a channel number. This
number ranges between 0 and 15, representing channels 1 through 16, with the channel number embedded within the command data. For
example, a note-on message to play middle C (note 60) with a velocity of 85 on Channel 3 sends three bytes of data one after the other, as
shown in Table e1.1. MIDI channel numbers are zero-based, so the 2 in 92 means the note will play through Channel 3.
Table e1.1. The MIDI Note-On Command.
Binary     Hex  Meaning
1001 0010  92   9 = Note-on command, 2 = Channel 3
0011 1100  3C   3C Hex = 60 = Note number
0101 0101  55   55 Hex = 85 = Velocity
There are many types of MIDI data (note-on, note-off, pitch bend, sustain pedal up or down, and so forth), and most comprise three bytes as
shown in Table e1.1. The first byte is treated as two half-bytes, sometimes called “nybbles” (or “nibbles”), each holding a range of 0 through 15
(decimal, or 0-F Hex). The higher (leftmost) nybble contains the command, and the lower nybble holds the channel number the message is
intended for. The next two bytes contain the actual data, which usually ranges from 0 through 127 (7F Hex).
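To make the byte layout concrete, here is a minimal Python sketch that packs the command and channel nybbles into the first byte and appends the two data bytes. The helper names `note_on` and `split_status` are hypothetical, and the channel argument uses the 1 through 16 numbering shown to users.

```python
NOTE_ON = 0x9  # command nybble for note-on, as in Table e1.1

def note_on(channel, note, velocity):
    """Build a three-byte note-on message; channel is 1-16 as users see it."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity must be 0-127")
    status = (NOTE_ON << 4) | (channel - 1)  # high nybble command, low nybble channel
    return bytes([status, note, velocity])

def split_status(status):
    """Return (command nybble, user-facing channel 1-16) from a status byte."""
    return status >> 4, (status & 0x0F) + 1

msg = note_on(channel=3, note=60, velocity=85)
print(" ".join(f"{b:02X}" for b in msg))  # 92 3C 55
print(split_status(msg[0]))               # (9, 3)
```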
Some MIDI messages contain only two bytes when the data portion can be expressed in one byte. For example, after touch requires only a
single byte, so the first byte is the command and channel number, and the second byte is the after touch data value. Likewise, a program change
that selects a different voice for playback requires only two bytes. The program change command and channel number occupy one byte, and the
new voice number is the second byte. Some MIDI messages require two bytes for each piece of data to express values larger than 127. The pitch
bend wheel is one example, because 127 steps can’t express enough in-between values to give adequate pitch resolution. So pitch bend data are
sent as the command with channel number in one byte, followed by the lower value data byte, then the higher value data byte.
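The two-byte pitch bend format can be sketched in Python as follows. This assumes details from the MIDI 1.0 specification that are not spelled out above: the pitch bend command nybble is E hex, each data byte carries seven bits, and the combined 14-bit value runs from 0 to 16383 with 8192 as the centered (no bend) position. The function names are invented for this example.

```python
PITCH_BEND = 0xE  # assumed command nybble for pitch bend (MIDI 1.0 spec)

def pitch_bend(channel, value):
    """Build a pitch bend message; value is 0..16383, with 8192 as center."""
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= value <= 0x3FFF:
        raise ValueError("value must be 0-16383")
    lsb = value & 0x7F         # lower seven bits, sent first
    msb = (value >> 7) & 0x7F  # upper seven bits, sent second
    return bytes([(PITCH_BEND << 4) | (channel - 1), lsb, msb])

def decode_pitch_bend(message):
    """Recombine the two data bytes into the 14-bit value."""
    return (message[2] << 7) | message[1]

centered = pitch_bend(1, 8192)
print(" ".join(f"{b:02X}" for b in centered))  # E0 00 40
```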
Every note sent by a MIDI keyboard includes a channel number, which usually defaults to Channel 1 unless you program the keyboard
otherwise. If one keyboard is to play more than one voice, or control more than one physical sound module, you’ll tell it to transmit on different
channels when you want to play the other voices or access other sound modules. Some keyboards have a split feature that sends bass notes out
through one channel and higher notes out another. Likewise, each receiving device must be set to respond to the specific channels to avoid
accidentally playing more than one voice or sound module at a time. Of course, if each synthesizer you’ll play contains its own keyboard, this
complication goes away.
Receiving devices can also be set to omni mode, in which case they’ll respond to all incoming data no matter what channels it arrives on. This
is common when recording to a MIDI track in a DAW program because it avoids those embarrassing head-scratching moments when you press a
key and don’t understand why you hear nothing. In omni mode, the incoming channel is ignored, so the receiving synthesizer will respond no
matter which channel is specified. However, omni mode may not be honored when recording or playing a drum set through MIDI because many
MIDI drum synths respond only to Channel 10.
Note that the term “channel” can be misleading, compared to TV and radio channels that are sent over different frequencies all at once. With
MIDI data, the same wires or internal computer data paths are used for all channels. A header portion of the data identifies the channel number
so the receiving device or program knows whether to honor or ignore that data. If you’re familiar with data networking, the channel is similar to
an IP address, only shorter.
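The way a receiver decides whether to honor or ignore a message can be sketched in a few lines of Python. This is only an illustration of the channel and omni mode logic described in this section, not how any particular synthesizer is implemented; `should_respond` is a hypothetical name, and the status byte range for channel messages (80 through EF hex) is taken from the MIDI 1.0 specification.

```python
def should_respond(status, receive_channel, omni=False):
    """Decide whether a receiver set to receive_channel (1-16) honors a
    channel message with the given status byte."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel message status byte")
    if omni:
        return True  # omni mode ignores the incoming channel
    return (status & 0x0F) + 1 == receive_channel

print(should_respond(0x92, 3))             # True: message is on Channel 3
print(should_respond(0x92, 1))             # False: receiver listens on Channel 1
print(should_respond(0x92, 1, omni=True))  # True: omni mode responds anyway
```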
MIDI Data Transmission
The original hardware MIDI standard uses a serial protocol, similar to the serial ports on older personal computers used to connect modems and
early printers. Newer devices use USB, which is also a type of serial communication. Serial connections send data sequentially, one bit after the
other, through a single pair of wires. So if you play a chord with eight notes at once, the notes are not sent all at the same time, nor are they
received at the same time by the connected equipment.
Early (before USB) MIDI hardware uses a speed of 31,250 bits per second (called the baud rate), which sends about three bytes of data per
millisecond. But most MIDI messages comprise at least three bytes each, so the data for that eight-note chord will be spread out over a span of
about eight milliseconds by the time it arrives at the receiving synthesizer or sound module. For a solo piano performance, small delays like this
are usually acceptable. But when sending MIDI data for a complete piece of music over a single MIDI cable, the accumulated delays for all the
voices can be objectionable.
Imagine what happens on the downbeat of a typical pop tune as a new section begins. It’s not uncommon for 20 or more notes to play all at
once: the kick drum, a cymbal crash, five to ten piano notes, a bass note, another five to ten organ notes, and maybe all six notes of a full guitar
chord. Now, that 8-millisecond time span has expanded to 20 or even 30 milliseconds, and that will surely be noticed. The effect of hearing
notes staggered over time is sometimes called flamming, after the “flam” drum technique where two hits are played in rapid succession.
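The timing figures above can be reproduced with a small Python calculation. It assumes each byte occupies ten bits on the wire (eight data bits plus one start bit and one stop bit, per the MIDI 1.0 hardware spec) and that every note is sent as a full three-byte message, ignoring bandwidth-saving tricks such as running status. The function name is made up for this sketch.

```python
BAUD = 31_250               # bits per second for hardware MIDI
BITS_PER_BYTE_ON_WIRE = 10  # assumption: 8 data bits + start bit + stop bit

def transmit_time_ms(num_messages, bytes_per_message=3):
    """Time in milliseconds to send the given number of messages serially."""
    total_bits = num_messages * bytes_per_message * BITS_PER_BYTE_ON_WIRE
    return 1000.0 * total_bits / BAUD

for notes in (8, 20, 30):
    print(notes, "notes:", round(transmit_time_ms(notes), 1), "ms")
```

Under these assumptions the eight-note chord takes a little under 8 milliseconds, and 20 to 30 simultaneous notes take roughly 19 to 29 milliseconds.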
Today, with software synthesizers and samplers that run within a DAW, MIDI messages are passed around inside the computer as fast as the
computer can process them, so serial hardware delays are no longer a problem. But you still may occasionally want to move time-critical MIDI
data between hardware synthesizers or between a hardware synth and a computer. I learned a great trick in the 1990s, when I wanted to copy
all the songs in my Yamaha SY77 synthesizer to a MIDI sequencer program to edit the music more efficiently on my computer.
The SY77 is an all-in-one workstation that includes a 16-voice synthesizer using both samples and FM synthesis, several audio effects, plus
built-in sequencing software to create and edit songs that play all 16 voices at once. Rather than play a song in real time on the SY77 while
capturing its MIDI output on the computer, I set the SY77’s tempo to the slowest allowed. I think that was 20 beats per minute (BPM). So the
data for a song whose tempo is 120 BPM are sent from the synthesizer at a rate equivalent to six times faster than normal, thus reducing greatly
the time offset between notes when played back on a computer at the correct tempo.
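A quick Python sketch shows why the slow-tempo transfer helps. It assumes the receiving sequencer stores note positions in tempo-relative clock pulses, so a fixed serial delay captured at the slow tempo shrinks in proportion when the song plays back at its real tempo. The function name and the 7.68 ms example delay are illustrative only.

```python
def effective_offset_ms(wire_delay_ms, transfer_bpm, playback_bpm):
    """Scale a serial delay captured at transfer_bpm to its size at playback_bpm."""
    return wire_delay_ms * transfer_bpm / playback_bpm

# An eight-note chord spread over about 7.68 ms on the wire, captured at 20 BPM:
print(round(effective_offset_ms(7.68, 20, 120), 2))  # 1.28 ms at 120 BPM
```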
General MIDI
As valuable as MIDI was in the early days, the original 1.0 version didn’t include standardized voice names and their corresponding program
numbers. The computer could tell a synthesizer to switch to Patch number 14, but one model synth might play a grand piano, while the same
patch number was a tuba on another brand. To solve this, in 1991 the MIDI spec was expanded to add General MIDI (GM), which defines a large
number of voice names and their equivalent patch numbers. The standard General MIDI voice names and patch numbers are listed in Table e1.2.
Note that some instrument makers consider these numbers as ranging from 1 to 128. So the tenth patch in the list will always be a glockenspiel,
but it might be listed as Patch 10 instead of 9.
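The off-by-one numbering is easy to get wrong, so here is a hedged Python sketch of a program change message that accepts either convention. It assumes the program change command nybble is C hex, as given in the MIDI 1.0 specification; the function name and the `one_based` flag are invented for illustration.

```python
PROGRAM_CHANGE = 0xC  # assumed command nybble for program change (MIDI 1.0 spec)

def program_change(channel, patch, one_based=False):
    """Build a two-byte program change. Set one_based=True when the patch
    number comes from a list that counts from 1 to 128."""
    wire_patch = patch - 1 if one_based else patch
    if not 1 <= channel <= 16:
        raise ValueError("channel must be 1-16")
    if not 0 <= wire_patch <= 127:
        raise ValueError("patch out of range")
    return bytes([(PROGRAM_CHANGE << 4) | (channel - 1), wire_patch])

# Both calls select the tenth patch in the list (the glockenspiel):
print(program_change(1, 9) == program_change(1, 10, one_based=True))  # True
```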
Table e1.2. Standard MIDI Patch Definitions.
Of course, how different sound modules respond to the same note data varies, as do the basic sound qualities of each instrument. So a purely
MIDI composition that sounds perfect when played through one synthesizer or sound module might sound very different when played through
another brand or model, even if the intended instruments are correct. A soft drum hit might now be too soft or too loud, and a piano that
sounded bright and clear when played at a medium velocity through one sound module might now sound muffled or brash at the same velocity
on another sound module.
Fortunately, many modern samplers contain several different-sounding instruments within each type and offer more than one type of drum set
to choose from. So Patch 000 in Bank 0 will play the default grand piano, while the same patch in Banks 1 and 2 contains additional pianos that
sound different. Indeed, this is yet another terrific feature of MIDI: Unlike audio Wave files, MIDI data is easy to edit. All modern MIDI-capable
DAW software lets you select a range of notes and either set new velocities or scale the existing velocities by some percentage. Scaling is useful
to retain the variance within notes, where a musical phrase might get louder, then softer again. Or you could set all of the notes in a passage to
the same velocity, not unlike applying limiting to an audio file.
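The two velocity edits described above might look like this in Python. This is a sketch of the idea rather than any DAW's actual code; the function names are hypothetical, and results are clamped to 1 through 127 because a note-on with velocity 0 is conventionally treated as a note-off.

```python
def scale_velocities(velocities, percent):
    """Scale each velocity by a percentage, keeping the relative dynamics."""
    return [max(1, min(127, round(v * percent / 100))) for v in velocities]

def set_velocities(velocities, value):
    """Force every note to the same velocity, much like limiting audio."""
    return [value for _ in velocities]

phrase = [40, 80, 120]  # made-up velocities for a phrase that gets louder
print(scale_velocities(phrase, 120))  # [48, 96, 127]
print(set_velocities(phrase, 90))     # [90, 90, 90]
```

Note that scaling upward eventually runs into the 127 ceiling, which flattens the loudest notes.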
In addition to defining standard patch names and numbers, General MIDI also established a standard set of note names and numbers for all of
the drum sounds in a drum set. Table e1.3 shows the standard GM drum assignments that specify which drum sounds will play when those notes
are struck on the keyboard or sent from a sequencer program via MIDI. This also helps to ensure compatibility between products from different
vendors. As with GM voices, the basic tonal character of a given drum set, and how it responds to different velocities, can vary quite a lot.
Further, some drum sets are programmed to cut off the sound when you release the key, while others continue sounds to completion even if the
note length played is very short. But with GM, at least you know you’ll get the correct instrument sound, if not the exact timbre.
Table e1.3. Standard GM Drum Assignments.
In addition to standards for voice names and numbers and drum sounds, MIDI also defines a standard set of continuous controller (CC)
numbers. The standard continuous controllers are shown in Table e1.4, though not all are truly continuous. For example, the sustain pedal, CC
#64, can be only On or Off. So any value of 64 or greater is considered as pushing the pedal down, and any value below 64 releases the pedal.
These standard controllers are recognized by most modern software and hardware synthesizers, and some synthesizers recognize additional CC
commands specific to features of that particular model.
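The on/off interpretation of the sustain pedal reduces to a simple threshold test, sketched here in Python with a made-up function name:

```python
SUSTAIN_PEDAL_CC = 64  # controller number for the sustain pedal

def sustain_is_down(cc_value):
    """Interpret a CC #64 data value: 64 or greater means pedal down."""
    if not 0 <= cc_value <= 127:
        raise ValueError("controller values are 0-127")
    return cc_value >= 64

print([sustain_is_down(v) for v in (0, 63, 64, 127)])  # [False, False, True, True]
```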
Table e1.4. Standard MIDI Continuous Controller Assignments.
0        Bank select
1        Modulation wheel
2        Breath controller
4        Foot controller
5        Portamento time
6        Data entry
7        Main volume
32–63    LSB for controllers 0–31, when two bytes are needed
64       Sustain pedal
66       Sostenuto pedal
67       Soft pedal
98       Nonregistered parameter number LSB
99       Nonregistered parameter number MSB
100      Registered parameter number LSB
101      Registered parameter number MSB
102–119  Undefined
121      Reset all controllers
122      Local control
123      All notes Off
124      Omni mode Off
125      Omni mode On
126      Mono mode On
127      Poly mode On
Standard MIDI Files
Besides all the various data types, the MIDI standard also defines how MIDI computer files are organized. There are two basic types of MIDI file:
Type 0 and Type 1. Type 0 MIDI files are an older format not used much anymore, though you’ll occasionally find Type 0 files on the Internet
when looking for songs to play along with. A Type 0 MIDI file doesn’t distinguish between instrument tracks, lumping all of the data together
onto a single track. In a Type 0 file, the only thing that distinguishes the data for one instrument from another is the channel number.
Fortunately, most modern MIDI software expands Type 0 files to multiple tracks automatically when you open them.
Type 1 MIDI files can contain multiple tracks, and that’s the preferred format when saving a song as a MIDI file. Regardless, both MIDI file
types contain every note and its channel number, velocity, and also a time stamp that specifies when the notes are to start and stop. Other MIDI
data can also be embedded, such as on-the-fly program (voice) changes, continuous controller values, song tempo, and so forth. Those also have
a time stamp to specify when the data should be sent.
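Expanding a Type 0 file into separate tracks amounts to sorting events by channel number. The Python sketch below shows the idea on an in-memory list of events; it does not parse real MIDI files, and the tuple layout `(tick, status, data1, data2)` and the function name are assumptions made for this example.

```python
from collections import defaultdict

def split_by_channel(events):
    """Group time-stamped channel events into one list per channel (1-16).
    Each event is a hypothetical (tick, status, data1, data2) tuple."""
    tracks = defaultdict(list)
    for event in events:
        status = event[1]
        channel = (status & 0x0F) + 1  # low nybble holds the zero-based channel
        tracks[channel].append(event)
    return dict(tracks)

# A made-up single-track sequence: a piano note on Channel 1, a drum hit on Channel 10.
single_track = [
    (0, 0x90, 60, 100),   # note-on, Channel 1
    (0, 0x99, 36, 110),   # note-on, Channel 10
    (240, 0x80, 60, 0),   # note-off, Channel 1
]
tracks = split_by_channel(single_track)
print(sorted(tracks))  # [1, 10]
```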
MIDI Clock Resolution
Many DAW sequencer programs let you specify the time resolution of MIDI data, specified as some number of pulses per quarter note,
abbreviated PPQ. These pulses are sometimes called clock ticks. Either way, the PPQ resolution is usually set somewhere in the Options menu of
the software and typically ranges from 96 PPQ through 960 PPQ. Since this MIDI time resolution is related to the length of a quarter note, which
in turn depends on the current song tempo, it’s not an absolute number of milliseconds. But it can be viewed as a form of quantization because
notes you play between the clock pulses are moved in time to align with the nearest pulse. To put into perspective the amount of time
resolution you can expect, Table e1.5 lists the equivalent number of milliseconds per clock pulse for different PPQ values at the common song
tempo of 120 beats per minute.
Table e1.5. Pulse Spacing for MIDI Resolutions.
PPQ Time Between Pulses
96 5.2 ms
240 2.1 ms
480 1.0 ms
960 0.5 ms
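The values in Table e1.5 follow from dividing the length of a quarter note by the number of pulses. A minimal Python sketch, with a function name chosen for this example:

```python
def ms_per_pulse(ppq, bpm=120):
    """Milliseconds between clock pulses at the given resolution and tempo."""
    return 60_000.0 / (bpm * ppq)  # 60,000 ms per minute / pulses per minute

for ppq in (96, 240, 480, 960):
    print(ppq, round(ms_per_pulse(ppq), 1))
```

Changing the `bpm` argument shows how the same PPQ setting gives coarser timing at slower tempos and finer timing at faster ones.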
Older hardware synthesizers often use a resolution of 96 PPQ, which at 5 ms is accurate enough for most applications. I use 240 PPQ for my
MIDI projects because the divisions are easy to remember when I have to enter or edit note lengths and start times manually, and 2 ms is more
than enough time resolution. To my thinking, using 960 PPQ is akin to recording audio at a sample rate of 192 kHz; it might seem like it should
be more accurate, but in truth the improved time resolution is not likely audible. Even good musicians are unable to play with a timing accuracy
better than about 20 to 30 milliseconds.1 As a fun exercise to put this into proper perspective, try tapping your finger along with a 50 Hz square
wave, and see how accurately you can hit every tenth cycle. Table e1.6 shows the duration in MIDI clock pulses of the most common note
lengths at a resolution of 240 PPQ. If you ever enter or edit MIDI note data manually, it’s handy to know the number of clock pulses for the
standard note lengths. Even at “only” 240 PPQ, time increments can be as fine as 1/60 of a 1/16 note. It seems unlikely to me that any type of
music would truly benefit from a higher resolution.
Table e1.6. Duration of Common Note Lengths at 240 PPQ.
Note Length             Number of Pulses
Whole note              960
Half note               480
Half note triplet       320
Quarter note            240
Quarter note triplet    160
Eighth note             120
Eighth note triplet     80
Sixteenth note          60
Sixteenth note triplet  40
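The pulse counts for each note length can be derived from the PPQ setting rather than memorized. The Python sketch below assumes a triplet note lasts two-thirds as long as the plain note of the same name; the function name and the `denominator` convention (1 for a whole note, 4 for a quarter note, and so on) are choices made for this example.

```python
from fractions import Fraction

def note_pulses(denominator, ppq=240, triplet=False):
    """Clock pulses for a note length: denominator 1 = whole, 2 = half,
    4 = quarter, 8 = eighth, 16 = sixteenth."""
    length = Fraction(4 * ppq, denominator)  # a whole note is four quarter notes
    if triplet:
        length *= Fraction(2, 3)             # three triplet notes fill the time of two
    if length.denominator != 1:
        raise ValueError("not a whole number of pulses at this PPQ")
    return int(length)

print(note_pulses(4))                 # 240: quarter note
print(note_pulses(16))                # 60: sixteenth note
print(note_pulses(16, triplet=True))  # 40: sixteenth note triplet
```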
MIDI Minutiae
The Bank Select command is used when selecting patches on synthesizers that contain more than one bank for sounds. You won’t usually need to
deal with this data directly, but you may have to choose from among several available bank select methods in your software, depending on the
brand of synthesizer you are controlling.
Most DAW and MIDI sequencer programs send an All Notes Off command out all 16 channels every time you press Stop to cut off any notes
that might be still sounding. But sometimes you may get “stuck notes” anyway, so it pays to learn where the All Notes Off button is located in
your MIDI software and hardware.
Besides the standard continuous controllers, General MIDI also defines registered parameter numbers (RPNs) and nonregistered parameter
numbers (NRPNs). There are a few standard registered parameters, such as the data sequence that adjusts the pitch bend range to other than the
usual plus/minus two musical half-steps. NRPNs are used to control other aspects of a synthesizer’s behavior that are unique to a certain brand
or model and thus don’t fall under the standard CC definitions. The specific sequence of data needed to enable a nonstandard feature on a given
synthesizer should be listed in the owner’s manual.
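As an example of a registered parameter, the pitch bend range is commonly set with a short run of controller messages. The Python sketch below follows my understanding of the MIDI 1.0 specification: controllers 101 and 100 select registered parameter 0 (pitch bend sensitivity), and controller 6 (data entry) then supplies the range in half-steps. Treat the controller numbers and the B hex command nybble as assumptions to verify against your synthesizer's manual; the function names are hypothetical.

```python
CONTROL_CHANGE = 0xB  # assumed command nybble for controller messages

def control_change(channel, controller, value):
    """Build a three-byte controller message; channel is 1-16."""
    return bytes([(CONTROL_CHANGE << 4) | (channel - 1), controller, value])

def set_pitch_bend_range(channel, half_steps):
    """Return the message sequence that sets the pitch bend range."""
    return [
        control_change(channel, 101, 0),         # RPN MSB = 0
        control_change(channel, 100, 0),         # RPN LSB = 0 selects bend range
        control_change(channel, 6, half_steps),  # data entry: range in half-steps
    ]

for message in set_pitch_bend_range(1, 12):
    print(" ".join(f"{b:02X}" for b in message))
```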
Earlier I mentioned that some MIDI data require two bytes of data in order to express a sufficiently large range of values such as pitch bend. In
that case, two bytes are sent one after the other, with the lower byte value first. When two bytes are treated as a single larger value, they’re
called the least significant byte (LSB) and most significant byte (MSB).
You can buy hardware devices to split one MIDI signal to two or more outputs or to merge two or more inputs to a single MIDI signal. A MIDI
splitter is easy to build; it simply echoes the incoming data through multiple outputs. But a MIDI merger is much more complex because it has
to interpret all the incoming messages and combine them sequentially without splitting up data that span multiple bytes. For example, if one
input receives a three-byte note-on command at the same time another input receives a two-byte command to switch from a sax to a flute voice,
the merger must output one complete command before beginning to send the other. Otherwise, the data would be scrambled with a command
and channel number from one device, followed by data associated with another command from a different device.
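The key rule for a merger, never interleave the bytes of two messages, can be illustrated with a simplified Python sketch. It assumes each input has already been parsed into complete messages tagged with an arrival time, which is the hard part a real hardware merger must do on the raw byte streams. The function name and data layout are invented for this example.

```python
def merge(messages_a, messages_b):
    """Merge two lists of (arrival_time, message_bytes) into one byte stream,
    emitting each message whole and in arrival order."""
    # sorted() is stable, so when times tie, input A's message goes first.
    ordered = sorted(messages_a + messages_b, key=lambda item: item[0])
    stream = bytearray()
    for _, message in ordered:
        stream += message  # always append the complete message
    return bytes(stream)

input_a = [(0.0, bytes([0x90, 60, 100]))]  # three-byte note-on
input_b = [(0.0, bytes([0xC1, 73]))]       # two-byte program change
print(" ".join(f"{b:02X}" for b in merge(input_a, input_b)))  # 90 3C 64 C1 49
```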
Finally, MIDI allows transferring blocks of data of any size for any purpose using a method called Sysex, which stands for system exclusive.
Sysex is useful because it can store and recall patch information for older pre–General MIDI synthesizers that use a proprietary format. I’ve also
used Sysex many times to back up custom settings for MIDI devices to avoid having to enter them all over again if the internal battery fails.
Playing and Editing MIDI
Most MIDI-capable DAW programs work more or less the same, or at least have the same basic feature set. Again, I’ll use SONAR for my
examples because it’s a full-featured program that I’m most familiar with. A program that records and edits MIDI data is often called a
sequencer, though the lines have become blurred over the years now that many DAW programs can handle MIDI tracks as well as Wave file
audio. Chapter 7 showed how audio clips can be looped, split, and slip-edited, and MIDI data can be manipulated in the same way. You can’t
apply cross-fades to MIDI data, but you can manipulate it in many more ways than audio files.
Because MIDI is data rather than actual audio, it’s easy to change note pitches, adjust their length and velocity, and make subtle changes in
phrasing by varying their start times. There are also MIDI plug-in effects that work in a similar way as audio effects. The MIDI arpeggiator shown
in Chapter 14 operates as a plug-in, but there are also compressors that work by varying MIDI note-on velocity, echo and transpose (pitch shift)
effects, and even chord analyzers that tell you what chord and its inversion is playing at a given moment.
MIDI tracks in SONAR include a key offset field to transpose notes higher or lower, without actually changing the data. This is useful when
writing music for transposing instruments such as the clarinet and French horn. You can write and edit your music in the natural key for that
instrument, yet have it play at normal concert pitch. But most MIDI sequencer software also offers destructive editing to change note data
permanently. For example, you can quantize notes to start at uniform 1/8 or 1/16 note boundaries. This can often improve musical timing, and
it is especially helpful for musicians who are not expert players. Then again, real music always varies at least a little, so quantizing can equally
rob a performance of its human qualities. Most sequencer programs let you specify the amount of quantization to apply to bring the start times
of errant notes nearer to the closest time boundary, rather than forcing them to an exactly uniform start time. Many sequencers also offer a
humanizing feature that intentionally moves note start times away from “the grid” to sound less robotic.
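Partial quantization can be sketched in Python as moving each note some fraction of the way toward the nearest grid line. This is one plausible way to implement the feature, not necessarily how SONAR or any other sequencer does it; the function name and the `strength` parameter (1.0 for full quantization, 0.5 for halfway) are assumptions.

```python
def quantize(start_ticks, grid, strength=1.0):
    """Move note start times toward the nearest multiple of grid (in clock
    pulses). strength=1.0 snaps fully; smaller values move part of the way."""
    result = []
    for tick in start_ticks:
        nearest = round(tick / grid) * grid
        result.append(round(tick + (nearest - tick) * strength))
    return result

played = [4, 58, 126]                    # made-up start times at 240 PPQ
print(quantize(played, 60))              # [0, 60, 120]: snapped to 1/16 notes
print(quantize(played, 60, strength=0.5))  # [2, 59, 123]: moved halfway
```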
Another important MIDI feature is being able to record a performance at half speed or even slower, which again is useful for people who are
not accomplished players. If you set up the metronome in your software to play every eighth note instead of every quarter note, it’s easier to
keep an even tempo at very slow speeds. I’m not much of a keyboard player, but I have a good sense of timing and dynamics control. So I often
play solos at full speed on the MIDI keyboard, paying attention to phrasing and how hard I hit the keys, but without worrying about the actual
notes I play. After stabbing at a passage “with feeling,” I can go back later and fix all the wrong notes. I find that this results in a more musical
performance than step-entering notes one by one with a mouse or playing very slowly, which can lose the context and feel of the music.
Figure e1.2 shows the Piano Roll view in SONAR, and other software brands offer a similar type of screen for entering and editing MIDI note
data. Every aspect of a note can be edited, including its start time, length, and velocity. Controller data can also be entered and edited in the
areas at the bottom of the window. The screen can be zoomed horizontally and vertically to see as much or as little detail as needed.
Figure e1.2
The Piano Roll window lets you enter and edit every property of MIDI notes and controller data.
When you need to see and work with an even finer level of detail, the Event List shown in Figure e1.3 displays every piece of MIDI data
contained in the sequence. This includes not just musical notes and controllers, but also tempo changes and RPN and NRPN data. If your
synthesizer requires a special sequence of bytes to enable a special feature, this is where you’ll enter that data.
Figure e1.3
The Event List view offers even more detailed editing and data entry, showing every aspect of a MIDI project, including tempo changes, pitch bends, and other non-note
Some musicians are more comfortable working with musical notes rather than computer data, and most sequencing software offers a Staff
View similar to that shown in Figure e1.4. This type of screen is sometimes called Notation View. As with the Piano Roll view, you can edit
notes to new pitches and start times, and even insert song lyrics and common dynamics symbols. The music display ability of most MIDI
sequencers falls short of dedicated notation software, but many sequencer programs are capable of creating perfectly useable printed music and
lead sheets. Some even include standard guitar symbols for all the popular chord types.
Figure e1.4
Most sequencers let you enter and manipulate MIDI data as musical notes in a Staff View, rather than as little bars on a grid.
The “midi_editing” video shows an overview of MIDI editing basics, using the solo cello and piano section from my Tele-Vision music video as
a demo. The piano track is entirely MIDI, which I created partly by playing notes on a keyboard and partly by entering and copying notes one at
a time with a mouse. But the cello is my live performance; it’s not a sampled cello!
This chapter covers MIDI internal details in depth, including hardware protocols, and data formats and their use of channels. The standard
General MIDI instruments and drum notes were listed, along with miscellaneous MIDI tidbits such as MIDI file types, nonregistered parameters,
and using Sysex to back up custom settings on MIDI hardware. Finally, a short video tutorial shows the basics of editing MIDI notes in a
sequencer program.
Movement-Related Feedback and Temporal Accuracy in Clarinet Performance, by Caroline Palmer, Erik Koopmans, Janeen D. Loehr, and Christine Carter. McGill University, 2009.
Chapter 2
Audio Fidelity, Measurements, and Myths
“Science is not a democracy that can be voted on with the popular opinion.”
—Earl R. Geddes, audio researcher
In this chapter I explain how to assess the fidelity of audio devices and address what can and cannot be measured. Obviously, there’s no metric
for personal preference, such as intentional coloration from equalization choices or the amount of artificial reverb added to recordings as an
effect. Nor can we measure the quality of a musical composition or performance. While it’s easy to tell (by ear or with a frequency meter) if a
singer is out of tune, we can’t simply proclaim such a performance to be bad. Musicians sometimes slide into notes from a higher or lower pitch,
and some musical styles intentionally take liberties with intonation for artistic effect. So while you may not be able to “measure” Beethoven’s
Symphony #5 to learn why many people enjoy hearing it performed, you can absolutely measure and assess the fidelity of audio equipment
used to play a recording of that symphony. The science of audio and the art of music are not in opposition, nor are they mutually exclusive.
High Fidelity Defined
By definition, “high fidelity” means the faithfulness of a copy to its source. However, some types of audio degradation can sound pleasing,
hence the popularity of analog tape recorders, gear containing tubes and transformers, and vinyl records. As with assessing the quality of music
or a performance, a preference for intentional audio degradation cannot be quantified in absolute terms, so I won’t even try. All I can do is
explain and demonstrate the coloration added by various types of audio gear and let you decide if you like the effect or not. Indeed, the same
coloration that’s pleasing to many people for one type of music may be deemed unacceptable for others. For example, the production goal for
most classical (and jazz or big band) music is to capture and reproduce the original performance as cleanly and accurately as possible. But many
types of rock and pop music benefit from intentional distortion ranging from subtle to extreme.
“The Allnic Audio’s bottom end was deep, but its definition and rhythmic snap were a bit looser than the others. However, the bass sustain, where the instrumental textures
reside, was very, very good. The Parasound seemed to have a ‘crispy’ lift in the top octaves. The Ypsilon’s sound was even more transparent, silky, and airy, with a decay that
seemed to intoxicatingly hang in the air before effervescing and fading out.”
—Michael Fremer, comparing phonograph preamplifiers in the March 2011 issue of Stereophile magazine
Perusing the popular hi-fi press, you might conclude that the above review excerpt presents a reasonable way to assess and describe the
quality of audio equipment. It is not. Such flowery prose might be fun to read, but it’s totally meaningless because none of those adjectives can
be defined in a way that means the same thing to everyone. What is rhythmic snap? What is a “crispy” lift? And how does sound hang in the air
and effervesce? In truth, only four parameters are needed to define everything that affects the fidelity of audio equipment: noise, frequency
response, distortion, and time-based errors. Note that these are really parameter categories that each contain several subsets. Let’s look at these
categories in turn.
The Four Parameters
Noise is the background hiss you hear when you raise the volume on a hi-fi receiver or microphone preamp. You can usually hear it clearly
during quiet passages when playing cassette tapes. A close relative is dynamic range, which defines the span in decibels (dB) between the
residual background hiss and the loudest level available short of gross distortion. CDs and DVDs have a very large dynamic range, so if you hear
noise while playing a CD, it’s from the original master analog tape, it was added as a by-product during production, or it was present in the
room and picked up by the microphones when the recording was made.
Subsets of noise are AC power-related hum and buzz, vinyl record clicks and pops, between-station radio noises, electronic crackling, tape
modulation noise, left-right channel bleed-through (cross-talk), doors and windows that rattle and buzz when playing music loudly, and the
triboelectric cable effect. Tape modulation noise is specific to analog tape recorders, so you’re unlikely to hear it outside of a recording studio.
Modulation noise comes and goes with the music, so it is usually drowned out by the music itself. You can often hear it on recordings that are
not bright sounding, such as a bass solo, as each note is accompanied by a “pfft” sound that disappears between the notes. The triboelectric effect
is sometimes called “handling noise” because it happens when handling poor-quality cables. The sound is similar to the rumble you get when
handling a microphone. This defect is rare today, thanks to the higher-quality insulation materials used by wire manufacturers.
Frequency response describes how uniformly an audio device responds to various frequencies. Errors are heard as too much or too little bass,
midrange, or treble. For most people, the audible range extends from about 20 Hz at the low end to slightly less than 20 KHz at the high end.
Some youngsters can hear higher than 20 KHz, though many senior citizens cannot hear much past 12 KHz. Some audiophiles believe it’s
important for audio equipment to pass frequencies far beyond 20 KHz, but in truth there’s no need to reproduce ultrasonic content because
nobody will hear it or be affected by it. Subsets of frequency response are physical microphonics (mechanical resonance), electronic ringing and
oscillation, and acoustic resonance. Resonance and ringing will be covered in more detail later in this and other chapters.
Distortion is a layman’s word for the more technical term nonlinearity, and it adds new frequency components that were not present in the
original source. In an audio device, nonlinearity occurs when a circuit amplifies some voltages more or less than others, as shown in Figure 2.1.
This nonlinearity can result in a flattening of waveform peaks, as at the left, or a level shift near the point where signal voltages pass from plus
to minus through zero, as at the right. Wave peak compression occurs when electrical circuits and loudspeaker drivers are pushed to levels near
their maximum limits.
Figure 2.1:
Two types of nonlinearity: peak compression at the top and/or bottom of a wave (left), and crossover distortion that affects electrical signals as they pass through zero
volts (right).
Some circuits compress the tops and bottoms equally, which yields mainly odd-numbered harmonics—3rd, 5th, 7th, and so forth—while other
circuit types flatten the top more than the bottom, or vice versa. Distortion that’s not symmetrical creates both odd and even harmonics—2nd,
3rd, 4th, 5th, 6th, and so on. Crossover distortion (shown in Figure 2.1) is also common, and it’s specific to certain power amplifier designs.
Note that some people consider any change to an audio signal as a type of distortion, including frequency response errors and phase shift. My
own preference is to reserve the term “distortion” only when nonlinearity creates new frequencies not present in the original.
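The link between clipping symmetry and harmonic content can be checked numerically. The following is only a minimal sketch, assuming an idealized hard clipper (real circuits compress peaks more gradually) and clip thresholds chosen arbitrarily for illustration. The helper names `harmonic_levels` and `clip` are hypothetical, not taken from any particular analysis tool.

```python
import math

def harmonic_levels(samples, cycles, max_harmonic=6):
    """Amplitude of harmonics 1..max_harmonic, measured with a single-bin DFT.

    `samples` must contain exactly `cycles` whole periods of the fundamental.
    """
    n = len(samples)
    levels = {}
    for h in range(1, max_harmonic + 1):
        k = h * cycles  # DFT bin that holds the h-th harmonic
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        levels[h] = 2 * math.hypot(re, im) / n
    return levels

def clip(x, top, bottom):
    """Idealized hard limiter: confine x to the range [bottom, top]."""
    return max(bottom, min(top, x))

N, CYCLES = 4096, 8
sine = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]

symmetric = [clip(s, 0.7, -0.7) for s in sine]   # tops and bottoms flattened equally
asymmetric = [clip(s, 0.7, -1.0) for s in sine]  # only the tops flattened

for name, wave in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    levels = harmonic_levels(wave, CYCLES)
    print(name, {h: round(a, 4) for h, a in levels.items()})
```

With these assumptions, the symmetrically clipped wave shows energy only at the odd harmonics, while the wave clipped on one side shows both even and odd harmonics.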
When music passes through a device that adds distortion, new frequencies are created that may or may not be pleasing to hear. The design
goal for most audio equipment is that all distortion be so low in level that it can’t be heard. However, some recording engineers and audiophiles
like the sound of certain types of distortion, such as that added by vinyl records, transformers, or tube-based electronics, and there’s nothing
wrong with that. My own preference is for gear to be audibly transparent, and I’ll explain my reasons shortly.
The two basic types of distortion are harmonic and intermodulation, and both are almost always present together. Harmonic distortion adds
new frequencies that are musically related to the source. Ignoring its own inherent overtones, if an electric bass plays an A note whose
fundamental frequency is 110 Hz, harmonic distortion will add new frequencies at 220 Hz, 330 Hz, 440 Hz, and subsequent multiples of 110 Hz.
Some audio devices add more even harmonics than odd, or vice versa, but the basic concept is the same. In layman’s terms, harmonic distortion
adds a thick or buzzy quality to music, depending on which specific frequencies are added. The notes created by most musical instruments
include harmonics, so a device whose distortion adds more harmonics merely changes the instrument’s character by some amount. Electric guitar
players use harmonic distortion—often lots of it—to turn a guitar’s inherent plink-plink sound into a singing tone that has a lot of power and
sustain.
Intermodulation distortion (IMD) requires two or more frequencies to be present, and it’s far more damaging audibly than harmonic distortion
because it creates new sum and difference frequencies that aren’t always related musically to the original frequencies. For example, if you play a
two-note A major chord containing an A at 440 Hz and a C# at 277 Hz through a device that adds IM distortion, new frequencies are created at
the sum and difference frequencies:
440 Hz + 277 Hz = 717 Hz
440 Hz − 277 Hz = 163 Hz
717 Hz is about halfway between an F and F# note, and 163 Hz is slightly below an E note. Neither of these is related musically to A or C#,
nor are they even standard note pitches. Therefore, even in relatively small amounts, intermodulation distortion adds a dissonant quality that can
be unpleasant to hear. Again, both harmonic and intermodulation distortion are caused by the same nonlinearity and thus are almost always
present together. What’s more, when IM distortion is added to notes that already contain harmonics, which is typical for all musical instruments,
sum and difference frequencies related to all of the harmonics are created, as well as for the fundamental frequencies.
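The arithmetic for the A and C# example is easy to reproduce. This is only an illustrative sketch, assuming equal temperament with A tuned to 440 Hz. The function names `im_products` and `nearest_note` are hypothetical helpers introduced here.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def im_products(f1, f2):
    """Simplest intermodulation products: the sum and the difference."""
    return f1 + f2, abs(f1 - f2)

def nearest_note(freq_hz, a4=440.0):
    """Return (note name, offset in cents) for the closest equal-tempered pitch."""
    semitones_from_a4 = 12 * math.log2(freq_hz / a4)
    nearest = round(semitones_from_a4)
    cents = 100 * (semitones_from_a4 - nearest)
    name = NOTE_NAMES[(nearest + 9) % 12]  # A sits at index 9 in NOTE_NAMES
    return name, cents

for product in im_products(440, 277):
    name, cents = nearest_note(product)
    print(f"{product} Hz is {cents:+.0f} cents from the nearest {name}")
```

An offset near 50 cents means the pitch falls about halfway between two adjacent notes, which is the case for the 717 Hz sum tone.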
Another type of distortion is called aliasing, and it’s unique to digital audio. Like IM distortion, aliasing creates new sum and difference
frequencies not harmonically related to the original frequencies, so it can be unpleasant and irritating to hear if it’s loud enough. Fortunately, in
all modern digital gear, aliasing is so low in level that it’s rarely if ever audible. Aliasing artifacts are sometimes called “birdies” because
difference frequencies that fall in the 5–10 KHz range change pitch in step with the music, which sounds a little like birds chirping. An audio file
letting you hear what aliasing sounds like is in Chapter 3.
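To see where an alias lands, the usual folding rule can be written in a few lines. This sketch assumes an ideal sampler with no anti-aliasing filter in front of it, which is not how real converters are built. The function name `alias_frequency` is a hypothetical helper.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling with no anti-alias filter.

    Tones below half the sample rate pass unchanged; higher tones fold back
    into the range from 0 to sample_rate_hz / 2.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

for tone in (1000, 25000, 30000, 35000):
    print(f"{tone} Hz sampled at 44100 Hz appears at {alias_frequency(tone, 44100)} Hz")
```

Under this rule, as an ultrasonic input rises in frequency its alias falls, so the artifact bears no harmonic relation to the source.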
Transient intermodulation distortion (TIM) is a specific type of distortion that appears only in the presence of transients—sounds that increase
quickly in volume such as snare drums, wood blocks, claves, or other percussive instruments. This type of distortion may not show up in a
standard distortion test using static sine waves, but it’s revealed easily on an oscilloscope connected to the device’s output when using an
impulse-type test signal such as a pulse wave. TIM will also show up as a residual in a null test when passing transient material. Negative
feedback is applied in amplifiers to reduce distortion by sending a portion of the output back to the input with the polarity reversed. TIM occurs
when stray circuit capacitance delays the feedback, preventing it from getting back to the input quickly enough to counter a very rapid change in
input level. In that case the output can distort briefly. However, modern amplifier designs include a low-pass filter at the input to limit transients
to the audible range, which effectively solves this problem.
Time-based errors are those that affect pitch and tempo. When playing an LP record whose hole is not perfectly centered, you’ll hear the pitch
rise and fall with each revolution. This is called wow. The pitch instability of analog tape recorders is called flutter. Unlike the slow, once per
revolution pitch change of wow, flutter is much faster and adds a warbling effect. Digital recorders and sound cards have a type of timing error
called jitter, but the pitch deviations are so rapid they instead manifest as added noise. With all modern digital audio gear, jitter is so soft
compared to the music that it’s almost always inaudible. The last type of time-based error is phase shift, but this too is inaudible, even in
relatively large amounts, unless the amount of phase shift is different in the left and right channels. In that case the result can be an unnaturally
wide sound whose location is difficult to identify.
Room acoustics could be considered an additional audio parameter, but it really isn’t. When strong enough, acoustic reflections from nearby
boundaries create the comb filtered frequency response described in Chapter 1. This happens when reflected sound waves combine in the air
with the original sound and with other reflections, enhancing some frequencies while canceling others. Room reflections also create audible
echoes, reverb, and resonance. In an acoustics context, resonance is often called modal ringing at bass frequencies, or flutter echo at midrange
and treble frequencies. But all of these are time-based phenomena that occur outside the equipment, so they don’t warrant their own category.
Another aspect of equipment quality is channel imbalance, where the left and right channels are amplified by different amounts. I consider
this to be a “manufacturing defect” caused by an internal trimmer resistor that’s set incorrectly, or one or more fixed resistors that are out of
tolerance. But this isn’t really an audio parameter either, because the audio quality is not affected, only its volume level.
The preceding four parameter categories encompass everything that affects the fidelity of audio equipment. If a device’s noise and distortion
are too soft to hear, with a response that’s sufficiently uniform over the full range of audible frequencies, and all time-based errors are too small
to hear, then that device is considered audibly transparent to music and other sound passing through it. In this context, a device that is
transparent means you will not hear a change in quality after audio has passed through it, even if small differences could be measured. For this
reason, when describing audible coloration, it makes sense to use only words that represent what is actually affected. It makes no sense to say a
power amplifier possesses “a pleasant bloom” or has a “forward” sound when “2 dB boost at 5 KHz” is much more accurate and leaves no room
for misinterpretation.
Chapter 1 explained the concept of resonance, which encompasses both frequency and time-based effects. Resonance is not so much a
parameter as it is a property, but it’s worth repeating here. Resonance mostly affects mechanical transducers—loudspeakers and microphones—
that, being mechanical devices, must physically vibrate. Resonance adds a boost at some frequency and also continues a sound’s duration over
time after the source has stopped. Resonance in electrical circuits generally affects only one frequency, but resonances in rooms occur at multiple
frequencies related to the spacing between opposing surfaces. These topics will be examined in more depth in the sections that cover transducers
and room acoustics.
When assessing frequency response and distortion, the finest loudspeakers in the world are far worse than even a budget electronic device.
However, clarity and stereo imaging are greatly affected by room acoustics. Any room you put the speakers in will exaggerate their response
errors further, and reflections that are not absorbed will reduce clarity. Without question, the room you listen in has much more effect on sound
quality than any electronic device. However, the main point is that measuring these four basic parameters is the correct way to assess the quality
of amplifiers, preamps, sound cards, loudspeakers, microphones, and every other type of audio equipment. Of course, to make an informed
decision, you need all of the relevant specs, which leads us to the following.
Lies, Damn Lies, and Audio Gear Specs
Jonathan: “You lied first.”
Jack: “No, you lied to me first.”
Jonathan: “Yes, I lied to you first, but you had no knowledge I was lying. So as far as you knew, you lied to me first.”
—Bounty hunter Jack Walsh (Robert De Niro) arguing with white-collar criminal Jonathan Mardukas (Charles Grodin) in the movie Midnight Run
When it comes to audio fidelity, the four standard parameter categories can assess any type of audio gear. Although published product specs
could tell us everything needed to evaluate a device’s transparency, many specs are incomplete, misleading, and sometimes even fraudulent. This
doesn’t mean specs cannot tell us everything needed to determine transparency—we just need all of the data. However, getting complete specs
from audio manufacturers is another matter. Often you’ll see the frequency response given but without a plus/minus dB range. Or a power amp
spec will state harmonic distortion at 1 KHz, but not at higher or lower frequencies where the distortion might be much worse. Or an amplifier’s
maximum output power is given, but its distortion was spec’d at a much lower level such as 1 watt.
Lately I’ve seen a dumbing down of published gear reviews, even by contributors in pro audio magazines, who, in my opinion, have a
responsibility to their readers to aim higher than they often do. For example, it’s common for a review to mention a loudspeaker’s woofer size
but not state its low-frequency response, which is, of course, what really matters. Audio magazine reviews often include impressive-looking
graphs that imply science but are lacking when you know what the graphs actually mean. Much irrelevant data is presented, while important
specs are omitted. For example, the phase response of a loudspeaker might be shown but not its distortion or off-axis frequency response, which
are far more important. I recall a hi-fi magazine review of a very expensive tube preamplifier so poorly designed that it verged on
self-oscillation (a high-pitched squealing sound). The reviewer even acknowledged the defect, which was clearly visible in the accompanying
frequency response graph. Yet he summarized by saying, “Impressive, and very highly recommended.” The misguided loyalty of some audio
magazines is a huge problem in my opinion.
Even when important data are included, they are sometimes graphed at low resolution to hide the true performance. For example, a common
technique when displaying frequency response graphs is to apply smoothing, also called averaging. Smoothing reduces the frequency resolution
of a graph, and it’s justified in some situations. But for loudspeakers you really do want to know the full extent of the peaks and nulls. Another
trick is to format a graph using large, vertical divisions. So a frequency response line may look reasonably straight, implying a uniform response,
yet a closer examination shows that each vertical division represents a substantial dB deviation.
The graphs in Figures 2.2 through 2.4 were all derived from the same data but are presented with different display settings. For this test I
measured the response of a single loudspeaker in a fairly large room with a precision microphone about a foot away. Which version looks more
like what speaker makers publish?
Figure 2.2:
This graph shows the loudspeaker response as measured, with no smoothing.
Figure 2.3:
This graph shows the exact same data but with third-octave smoothing applied.
Figure 2.4:
This graph shows the same smoothed data as in Figure 2.3, but at 20 dB per vertical division instead of 5 dB, making the speaker’s response appear even flatter.
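How much a narrow null can be hidden by third-octave smoothing is easy to demonstrate with made-up data. The sketch below is a simplified illustration, not the algorithm used by any particular measurement program: it averages dB values directly, whereas real tools may average power or use a shaped window. The response data and the function name `smooth_fractional_octave` are hypothetical.

```python
def smooth_fractional_octave(freqs_hz, levels_db, fraction=3):
    """Average each point with its neighbors within +/- 1/(2*fraction) octave."""
    ratio = 2 ** (1 / (2 * fraction))  # half the window width, as a frequency ratio
    smoothed = []
    for f in freqs_hz:
        window = [lv for g, lv in zip(freqs_hz, levels_db) if f / ratio <= g <= f * ratio]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Made-up response: perfectly flat except for one narrow 20 dB null at 400 Hz.
freqs = [100 * 2 ** (i / 20) for i in range(81)]  # 20 points per octave, 100 Hz to 1.6 KHz
levels = [0.0] * len(freqs)
NULL_INDEX = 40                                   # 100 * 2**(40/20) = 400 Hz
levels[NULL_INDEX] = -20.0

smoothed = smooth_fractional_octave(freqs, levels)
print(f"raw null: {levels[NULL_INDEX]:.1f} dB, smoothed: {smoothed[NULL_INDEX]:.1f} dB")
```

With this spacing the window spans seven data points, so the 20 dB null shrinks to under 3 dB on the smoothed curve.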
Test Equipment
“Empirical evidence trumps theory every time.”
Noise measurements are fairly simple to perform using a sensitive voltmeter, though the voltmeter must have a flat frequency response over
the entire audible range. Many budget models are not accurate above 5 or 10 KHz. To measure its inherent noise, an amplifier or other device is
powered on but with no input signal present; then the residual voltage is measured at its output. Usually a resistor or short circuit is connected
to the device’s input to more closely resemble a typical audio source. Otherwise, additional hiss or hum might get into the input and be
amplified, unfairly biasing the result. Most power amplifiers include a volume control, so you also need to know where that was set when the
noise was measured. For example, if the volume control is typically halfway up when the amplifier is used but was turned way down during the
noise test, that could make the amplifier seem quieter than it really is.
Although it’s simple to measure the amount of noise added by an audio device, what’s measured doesn’t necessarily correlate to its audibility.
Our ears are less sensitive to very low and very high frequencies when compared to the midrange, and we’re especially sensitive to frequencies
in the treble range around 2 to 3 KHz. To compensate for this, many audio measurements employ a concept known as weighting. This
intentionally reduces the contribution of frequencies where our ears are less sensitive. The most common curve is A-weighting, as shown in
Figure 2.5.
Figure 2.5:
A-weighting intentionally reduces the contribution of low and very high frequencies, so noise measurements will correspond more closely to their audibility.
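For readers who want numbers rather than a curve, the A-weighting response can be computed from the analytic formula commonly given for it in the sound level meter standards. This is a sketch under that assumption; the constants below are the usual published values, and the function name `a_weighting_db` is a hypothetical helper.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz, normalized to about 0 dB at 1 KHz."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00  # the +2.00 dB offset sets 1 KHz to roughly 0 dB

for f in (20, 100, 1000, 2500, 10000, 20000):
    print(f"{f:>6} Hz: {a_weighting_db(f):+6.1f} dB")
```

The result is close to 0 dB at 1 KHz and roughly 19 dB down at 100 Hz, with a slight boost in the 2 to 3 KHz region where hearing is most sensitive.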
In the old days before computers were common and affordable, harmonic distortion was measured with a dedicated analyzer. A distortion
analyzer sends a high-quality sine wave, containing only the single desired frequency with minimal harmonics and noise, through the device
being tested. Then a notch filter is inserted between the device’s output and a voltmeter. Notch filters are designed to remove a very narrow
band of frequencies, so what’s left are the distortion and noise generated by the device being tested. Figure 2.6 shows the basic method, and an
old-school Hewlett-Packard distortion analyzer is shown in Figure 2.7.
Figure 2.6:
To measure a device’s harmonic distortion, a pure sine wave is sent through the device at a typical volume level. Then a notch filter removes that frequency. Anything that
remains are the distortion and noise of the device being tested.
Figure 2.7:
The Hewlett-Packard Model 334A Distortion Analyzer.
Photo courtesy of Joe Bucher.
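The notch-filter method has a simple software counterpart: estimate the fundamental, subtract it, and measure what is left. The sketch below assumes a test signal that contains a whole number of cycles and expresses the residual relative to the total signal. It illustrates the idea only and is not a model of the Hewlett-Packard analyzer. The function names are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def thd_plus_noise(samples, cycles):
    """Ratio of everything except the fundamental to the total signal (both RMS)."""
    n = len(samples)
    w = 2 * math.pi * cycles / n
    # Correlate with sine and cosine to find the fundamental's amplitude and phase.
    a = 2 / n * sum(s * math.sin(w * i) for i, s in enumerate(samples))
    b = 2 / n * sum(s * math.cos(w * i) for i, s in enumerate(samples))
    # Software "notch": subtract the fundamental, leaving distortion and noise.
    residual = [s - a * math.sin(w * i) - b * math.cos(w * i) for i, s in enumerate(samples)]
    return rms(residual) / rms(samples)

N, CYCLES = 4800, 100
w = 2 * math.pi * CYCLES / N
# Simulated device output: a sine wave plus a third harmonic at 1 percent amplitude.
distorted = [math.sin(w * i) + 0.01 * math.sin(3 * w * i) for i in range(N)]
print(f"THD+N: {100 * thd_plus_noise(distorted, CYCLES):.2f}%")
```

For this simulated signal the result comes out very close to 1 percent, matching the amount of third harmonic that was added.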
Intermodulation distortion is measured using two test tones instead of only one, and there are two standard methods. One method sends 60 Hz
and 7 KHz tones through the device being tested, with the 60 Hz sine wave being four times louder than the 7 KHz sine wave. The analyzer then
measures the level of the 7,060 Hz and 6,940 Hz sum and difference frequencies that were added by the device. Another method uses 19 KHz
and 20 KHz at equal volume levels, measuring the amplitude of the 1 KHz difference tone that’s generated.
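The 60 Hz and 7 KHz test can be simulated to show where the sidebands come from. This is a rough sketch under stated assumptions: the "device" is a hypothetical stand-in that adds a small squared term, and the 0.01 coefficient is arbitrary. Real equipment has more complicated nonlinearity, and a real analyzer reports its result differently.

```python
import math

SAMPLE_RATE = 48000        # one second of samples, so whole-Hz tones fit exactly
F_LOW, F_HIGH = 60, 7000   # the two test tones, mixed 4:1 in amplitude

def device(x, k2=0.01):
    """Hypothetical device under test: k2 * x^2 stands in for a mild nonlinearity."""
    return x + k2 * x * x

def level_at(samples, freq_hz):
    """Amplitude of one frequency component, found by correlating with sine and cosine."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / len(samples)

times = [i / SAMPLE_RATE for i in range(SAMPLE_RATE)]
test_signal = [
    0.8 * math.sin(2 * math.pi * F_LOW * t) + 0.2 * math.sin(2 * math.pi * F_HIGH * t)
    for t in times
]
output = [device(s) for s in test_signal]

carrier = level_at(output, F_HIGH)
for f in (F_HIGH - F_LOW, F_HIGH + F_LOW):
    sideband = level_at(output, f)
    print(f"{f} Hz: {100 * sideband / carrier:.2f}% of the 7 KHz tone")
```

The input contains nothing at 6,940 Hz or 7,060 Hz, so any energy found there at the output was created by the device.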
Modern audio analyzers like the Audio Precision APx525 shown in Figure 2.8 are very sophisticated and can measure more than just
frequency response, noise, and distortion. They are also immune to human hearing foibles such as masking,1 and they can measure noise,
distortion, and other artifacts reliably down to extremely low levels, far softer than anyone could possibly hear.
Figure 2.8:
The Audio Precision Model APx525 Audio Analyzer.
Photo courtesy of Audio Precision.
Professional audio analyzers are very expensive, but it’s possible to do many useful tests using only a Windows or Mac computer with a
decent-quality sound card and suitable software. I use the FFT feature in Sony’s Sound Forge audio editing program to analyze frequency
response, noise, and distortion. For example, when I wanted to measure the distortion of an inexpensive sound card, I created a pure 1 KHz sine
wave test signal in Sound Forge. I sent the tone out of the computer through a high-quality sound card having known low distortion, then back
into the budget sound card, which recorded the 1 KHz tone. The result is shown in Figure 2.9. Other test methods you can do yourself with a
computer and sound card are described in Chapter 22. As you can see in Figure 2.9, a small amount of high-frequency distortion and noise above
2 KHz was added by the sound card’s input stage. But the added artifacts are all more than 100 dB softer than the sine wave and so are very
unlikely to be audible.
Figure 2.9:
This FFT screen shows the distortion and noise added by a consumer-grade sound card when recording a 1 KHz sine wave.
Low distortion at 1 KHz is easy to achieve, but 30 Hz is a different story, especially with gear containing transformers. Harmonic distortion
above 10 KHz matters less because the added harmonics are higher than the 20 KHz limit of most people’s hearing. However, if the distortion is
high enough, audible IM difference frequencies below 20 KHz can result. Sadly, many vendors publish only THD measured at 1 KHz, often at a
level well below maximum output. This ignores that distortion in power amplifiers and gear containing transformers usually increases with
rising output level and at lower frequencies.
The convention these days is to lump harmonic distortion, noise, and hum together into a single THD+Noise spec and express it as either a
percentage or some number of dB below the device’s maximum output level. For example, if an amplifier adds 1 percent distortion, that
amount can be stated as 40 dB below the original signal. A-weighting is usually applied because it improves the measurement, and this is not
unfair. There’s nothing wrong with combining noise and distortion into a single figure either when their sum is safely below the threshold of
audibility. But when distortion artifacts are loud enough to be audible, it can be useful to know their specific makeup. For example, artifacts at
very low frequencies are less objectionable than those at higher frequencies, and harmonics added at frequencies around 2 to 3 KHz are
especially noticeable compared to harmonics at other frequencies. Again, this is why A-weighting is usually applied to noise and distortion
measurements and why using weighting is not unreasonable.
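Converting between the percentage and dB forms of a distortion spec is a one-line calculation each way. The sketch below assumes the percentage is a ratio of voltages (amplitudes), so the factor is 20 rather than 10. The function names are hypothetical.

```python
import math

def percent_to_db(percent):
    """Express a distortion percentage as dB relative to the signal."""
    return 20 * math.log10(percent / 100)

def db_to_percent(db):
    """Express a level in dB relative to the signal as a percentage."""
    return 100 * 10 ** (db / 20)

for pct in (1.0, 0.1, 0.01):
    print(f"{pct}% distortion is {percent_to_db(pct):.0f} dB relative to the signal")
print(f"-100 dB corresponds to {db_to_percent(-100):.4f}%")
```

Each factor of ten in percentage corresponds to 20 dB, so artifacts that are 100 dB below the signal amount to 0.001 percent.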
Audio Transparency
As we have seen, the main reason to measure audio gear is to learn if a device’s quality is high enough to sound transparent. All transparent
devices by definition sound the same because they don’t change the sound enough to be noticed even when listening carefully. But devices that
add an audible amount of distortion can sound different, even when the total measured amount is the same. A-weighting helps relate what’s
measured to what we hear, but some types of distortion are inherently more objectionable (or pleasing) than others. For example, harmonic
distortion is “musical,” whereas IM distortion is not. But what if you prefer the sound of audio gear that is intentionally colored?
In the 1960s, when I became interested in recording, ads for most gear in audio magazines touted their flat response and low distortion. Back
then, before the advent of multilayer printed circuit boards, high-performance op-amps, and other electronic components, quality equipment
was mostly handmade and very expensive. In those days design engineers did their best to minimize the distortion from analog tape, vacuum
tubes, and transformers. Indeed, many recordings made in the 1960s and 1970s still sound excellent even by today’s standards. But most audio
gear is now mass-produced in Asia using modern manufacturing methods, and very high quality is available at prices even nonprofessionals can
easily afford.
Many aspiring recording engineers today appreciate some of the great recordings from the mid-twentieth century. But when they are unable to
make their own amateur efforts sound as good, they wrongly assume they need the same gear that was used back then. Of course, the real reason
so many old recordings sound wonderful is because they were made by very good recording engineers in great (often very large) studios having
excellent acoustics. That some of those old recordings still sound so clear today is in spite of the poorer-quality recording gear available back
then, not because of it!
Somewhere along the way, production techniques for popular music began incorporating intentional distortion and often extreme EQ as
creative tools. Whereas in the past, gear vendors bragged about the flat response and low distortion of their products, in later years we started to
see ads for gear claiming to possess a unique character, or color. Some audio hardware and software plug-ins claim to possess a color similar to
specific models of vintage gear used on famous old recordings. Understand that “color” is simply a skewed frequency response and/or added
distortion; these are easy to achieve with either software or hardware, and in my opinion need not demand a premium price. For example,
distortion similar to that of vacuum tubes can be created using a few resistors and a diode, or a simple software algorithm.
The key point is that adding color in the form of distortion and EQ is proper and valuable when recording and mixing. During the creative
process, anything goes, and if it sounds good, then it is good. But in a playback system the goal must be for transparency—whether a recording
studio’s monitors or a consumer playback system. In a studio setting the recording and mixing engineers need accurate monitoring to know how
the recording really sounds, including any coloration they added intentionally. With a consumer playback system you want to hear exactly what
the producers and mix engineers heard; you’ll hear their artistic intent only if your own system adds no further coloration of its own.
Common Audio Myths
“I thought cables didn’t matter, so I tried running my system without them. Huge difference!”
—Posted in a hi-fi audio forum
Now that we understand what to measure and how, and know that a null test can prove our measurements valid, let’s use that knowledge to
bust some common audio myths. The earliest audio myth I can recall is the benefit of fancy wire for connecting loudspeakers, and it’s still going
strong. Some vendors claim their wire sounds better than normal wire, and, of course, it’s more expensive than normal wire. In truth, the most
important property of speaker wire is resistance, which is directly related to its thickness. The wire’s resistance must be small to pass the
high-current signals a power amplifier delivers, and this is exactly analogous to a large pipe letting more water flow through it than a small pipe. For
short distances—say, up to 5 or 10 feet—16-gauge wire of any type is adequate, though thicker wire is required for longer lengths. When heavier
gauges are needed—either for longer runs or when connecting high-power amplifiers and speakers—Romex wire typically used for AC power
wiring is a fine choice for loudspeakers.
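A quick estimate shows why 16-gauge wire is adequate for short runs. The sketch below is only an approximation under stated assumptions: plain copper at room temperature, the standard AWG diameter formula, and a speaker treated as a purely resistive 8-ohm load (real speaker impedance varies with frequency). The function names and the 8-ohm figure are illustrative choices, not values from the text.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm-meters at room temperature (approximate)
METERS_PER_FOOT = 0.3048

def awg_diameter_m(gauge):
    """Conductor diameter in meters from the standard AWG formula."""
    return 0.127e-3 * 92 ** ((36 - gauge) / 39)

def round_trip_resistance(gauge, length_ft):
    """Resistance in ohms of both conductors over a run of length_ft."""
    area = math.pi * (awg_diameter_m(gauge) / 2) ** 2
    length_m = 2 * length_ft * METERS_PER_FOOT  # the current goes out and back
    return COPPER_RESISTIVITY * length_m / area

def level_loss_db(wire_ohms, speaker_ohms=8.0):
    """Level change at the speaker caused by the wire's series resistance."""
    return 20 * math.log10(speaker_ohms / (speaker_ohms + wire_ohms))

for gauge, feet in ((16, 10), (16, 50), (12, 50)):
    r = round_trip_resistance(gauge, feet)
    print(f"{gauge}-gauge, {feet} ft: {r:.3f} ohm, {level_loss_db(r):.2f} dB")
```

Under these assumptions, 10 feet of 16-gauge wire adds roughly 0.08 ohm and costs less than 0.1 dB, while longer runs show why a heavier gauge becomes worthwhile.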
The three other wire parameters are inductance, capacitance, and skin effect, and these will be explained in more detail in the section of this
book that covers electronics. But these parameters are not important with usual cable lengths at audio frequencies, especially when connecting
speakers to a power amplifier. Low-capacitance wire can be important in some cases, such as between a phonograph cartridge or
high-impedance microphone and a preamp. But high-quality, low-capacitance wire is available for pennies per foot. Unscientific and even impossible
claims for wire products are common because wire is a low-tech device that’s simple to manufacture, and the profit margin for manufacturers
and retailers is very high. I could devote this entire section to debunking wire claims, but instead I’ll just summarize that any audio (or video)
cable that costs more than a few dollars per foot is a poor value.
Bi-wiring is a more recent myth, and it’s a pretend relative to bi-amping, which is legitimate. No single-speaker driver can reproduce the
entire range of audible frequencies, so speaker makers use two or three drivers—woofers and tweeters—to handle each range. Bi-amping splits
the audio into low/high- or low/mid/high-frequency ranges, and each range goes to a separate amplifier that in turn powers each speaker
driver. This avoids passive crossovers that lose some of their power as heat and usually add distortion. Bi-wiring uses two separate speaker
wires, but they’re both connected to the same power amplifier that then feeds a passive crossover!
A related myth is cable elevators—small devices that prevent your speaker wires from touching the floor. Like so many audiophile “tweak”
products, the claim that cable elevators improve sound quality by avoiding damaging static electricity and mechanical vibration is unfounded. If
vibration really affected electricity as it passed through wires, airplanes—with their miles of wire subject to shock and extreme vibration—would
fall from the sky daily. Indeed, it would be trivial for vendors to prove that audio passing through wire is affected by vibration, thus establishing
a real need for their products. To my knowledge, no vibration-control product vendor has ever done that.
Even less likely to improve sound than replacement speaker wire is after-market AC power cords and most other power “conditioner”
products. The sales claims seem reasonable: Noise and static can get into your gear through the power line and degrade sound quality. In severe
cases it’s possible for power-related clicks and buzzes to get into your system, but those are easily noticed. The suggestion that power products
subtly increase “clarity and presence” is plain fraud. Indeed, every competent circuit designer knows how to filter out power line noise, and such
protection is routinely added to all professional and consumer audio products. Buying a six-foot replacement power cord ignores the other
hundred-odd feet of regular AC wiring between the wall outlet and power pole. Likewise for replacement AC outlets, and even more so for
replacement AC outlet cover plates that claim to improve audio quality. Again, this would be easy for vendors to prove with hard science, but
they never do. Power conditioner vendors sometimes show an oscilloscope display of the power line noise before and after adding their product.
But they never show the change at the output of the connected equipment, which, of course, is what really matters.
The last wire myth I’ll mention is the notion that boutique USB and HDMI cables avoid or reduce the degradation of audio (and video)
compared to standard wires. The signals these wires pass are digital, not analog, so the usual wire properties that can lose high frequencies don’t
apply except in extreme cases. For the most part, digital data either arrive at the other end intact or don’t. And many digital connections employ
some type of error checking to verify the integrity of the received data. So generally, if you hear any sound at all through a digital cable, you can
be confident that nothing was lost or changed along the way.
Among devoted audiophiles, one of the most hotly debated topics is the notion that reproducing ultrasonic frequencies is necessary for high
fidelity reproduction. But no human can hear much past 20 KHz, and few microphones respond to frequencies beyond that. Even fewer
loudspeakers can reproduce those high frequencies. If recording and reproducing ultrasonic frequencies were free, there’d be little reason to
object. But in this digital age, storing frequencies higher than necessary wastes memory, media space, and bandwidth. The DVD format
accommodates frequencies up to 96 KHz, but then lossy data compression, which is audibly degrading, is needed to make it fit! Record
companies and equipment manufacturers were thrilled when we replaced all our old LPs and cassettes with CDs back in the 1980s and 1990s.
Now, with newer “high-resolution” audio formats, they’re trying hard to get us to buy all the same titles again, and new devices to play them,
with the false promise of fidelity that exceeds CDs.
Another myth is the benefit of mechanical isolation. The claims have a remote basis in science but are exaggerated to suggest relevance where
none is justified. If you ever owned a turntable, you know how sensitive it is to mechanical vibration. Unless you walk lightly, the record might
skip, and if you turn up the volume too high, you may hear a low-frequency feedback howl. A turntable is a mechanical device that relies on
physical contact between the needle and the record’s surface. But CDs and DVDs work on an entirely different principle that’s mostly immune to
mechanical vibration. As a CD or DVD spins, the data are read into a memory buffer, and from there they’re sent to your receiver or headphones.
The next few seconds of music is already present in the player’s buffer, so if the transport is jostled enough to make the CD mis-track, the player
sends its data stream from the buffer until the drive finds its place again. For this reason, large buffers were common on CD players sold to
joggers before MP3 players took over.
Mechanical isolation is not useful for most other electronic gear either. However, mechanical isolation with loudspeakers is valid because
they’re mechanical devices that vibrate as they work. When a speaker rests on a tabletop, the table may vibrate in sympathy and resonate.
Electronic devices that contain vacuum tubes can also be sensitive to vibration because tubes can become microphonic. If you tap a tube with a
pencil while the amplifier is turned on, you might hear a noise similar to tapping a microphone. But microphonic tubes are excited mainly by
sound waves in the air that strike the tube. Placing a tube amplifier on a cushion reduces only vibrations that arrive from the floor.
Vinyl records and vacuum tube equipment are very popular with devoted audiophiles who believe these old-school technologies more
faithfully reproduce subtle nuance. There’s no question that LPs and tubes sound different from CDs and solid state gear. But are they really
better? The answer is, not in any way you could possibly assess fidelity. Common to both formats is much higher distortion. LPs in particular
have more inherent noise and a poorer high-frequency response, especially when playing the inner grooves. I’m convinced that some people
prefer tubes and vinyl because the distortion they add sounds pleasing to them. In the audio press this is often called euphonic distortion. Adding
small amounts of distortion can make a recording sound more cohesive, for lack of a better word. Distortion can seem to increase clarity, too,
because of the added high-frequency overtones. Recording engineers sometimes add distortion intentionally to imitate the sound of tubes and
analog tape, and I’ve done this myself many times. Simply copying a song to a cassette tape and back adds a slight thickening that can be
pleasing if the instrumentation is sparse. But clearly this is an effect, no matter how pleasing, and not higher fidelity.
Other common audio myths involve very small devices that claim to improve room acoustics. You can pay a hundred dollars each for small
pieces of exotic wood the size and shape of hockey pucks. Other common but too-small acoustic products are metal bowls that look like Sake
cups and thin plastic dots the size and thickness of a silver dollar. Sellers of these devices suggest you put them in various places around your
room to improve its acoustics. But with acoustics, what matters is covering a sufficient percentage of the room’s surface. Real acoustic treatment
must be large to work well, and that’s not always conducive to a domestic setting. Some people want very much to believe that something small
and unobtrusive can solve their bad acoustics, without upsetting the decor. Sadly, such products simply do not work. Worse, an acoustic device
that purports to be a “resonator” can only add unwanted artifacts, assuming it really is large enough to have an audible effect. There’s a type of
bass trap called a Helmholtz resonator, but that works as an absorber rather than adding the sound of resonance into the room.
Another myth is that the sound of vinyl records and CDs can be improved by applying a demagnetizer. There’s no reason to believe that the
vinyl used for LP records could be affected by magnetism. Even if plastic could be magnetized, there’s no reason to believe that would affect the
way a diamond needle traces the record’s grooves. A change in sound quality after demagnetizing a CD is even less likely because CDs are made
from plastic and aluminum, and they store digital data! For the most part, digital audio either works or it doesn’t. Although digital audio might
possibly be degraded when error checking is not employed, degradation is never due to a CD becoming “magnetized.”
As an audio professional I know that $1,000 can buy a very high-quality power amplifier. So it makes no sense to pay, say, $17,000 for an
amplifier that is no better and may in fact be worse. However, some myths are more like urban legends: No products are sold, but they’re still a
waste of time. For example, one early hi-fi myth claims you can improve the sound of a CD by painting its edge with a green felt marker. (Yes, it
must be green.) A related myth is that cables and electronic devices must be “broken in” for some period of time before they achieve their final
highest fidelity. Speaker and headphone drivers can change slightly over time due to material relaxation. But aside from a manufacturing defect,
the idea that wire or solid state circuits change audibly over time makes no sense and has never been proven. This myth becomes a scam when a
vendor says that for best results you must break in the product for 90 days. Why 90 days? Because most credit card companies protect your right
to a refund for only 60 days.
The Stacking Myth
The last audio myth I’ll debunk is called stacking. The premise is that audio gear such as a microphone preamp or sound card might measure
well and sound clean with a single source, but when many separate tracks are recorded through that same preamp or sound card and later
mixed together, the degradation “stacks” and becomes more objectionable. In this sense, stacking means the devices are used in parallel, versus
sending one source through multiple devices in series with the output of one sent to the input of the next. Stacking theory also presumes that
when many tracks are recorded through a device having a nonflat frequency response, such as a microphone’s presence boost, the effect of that
skewed response accumulates in the final mix more than for each separate track. However, this type of accumulated coloration is easy to
disprove, as shown in Figure 2.10.
Figure 2.10:
If a microphone or preamp has a skewed frequency response, shown here as a 4 dB boost at 1 KHz, the net response is the same no matter how many microphones or
preamps are used. And whatever frequency response error the microphone or preamp may have, it can be countered with equalization.
As an extreme example, let’s say the preamp used for every track of a recording has a 4 dB boost at 1 KHz. The result is the same as using a
flat preamp and adding an equalizer with 4 dB boost on the output of the mixer. Of course, no competent preamp has a frequency response
nearly that skewed. Even modest gear is usually flat within 1 dB from 20 Hz to 20 KHz. But even if a preamp did have such a severe response
error—whether pleasing or not—it could be countered exactly using an opposite equalizer setting. So no matter how many tracks are mixed,
only 4 dB of EQ cut would be needed to counter the response of the preamp.
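The reason the response error does not accumulate is that a fixed frequency response is a linear operation, so filtering each track and then mixing gives the same result as mixing first and filtering once. The short sketch below illustrates this numerically. The two-tap filter is only a hypothetical stand-in for a preamp with a skewed response, and the test frequencies are arbitrary assumptions, not values from the figure.

```python
import math

def colored_preamp(samples, coeffs=(1.0, 0.5)):
    """Hypothetical linear 'preamp': a short FIR filter standing in
    for any fixed, nonflat frequency response."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

def sine(freq_hz, n_samples=512, rate=48000):
    """Generate a sine wave as a list of samples."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]

def mix(tracks):
    """Sum the tracks sample by sample, as a clean mixer would."""
    return [sum(values) for values in zip(*tracks)]

# Three arbitrary test tones, one per track (assumed values).
tracks = [sine(220.0), sine(1000.0), sine(3500.0)]

per_track_then_mix = mix([colored_preamp(t) for t in tracks])
mix_then_once = colored_preamp(mix(tracks))

# The largest sample difference is only floating-point rounding error.
max_diff = max(abs(a - b) for a, b in zip(per_track_then_mix, mix_then_once))
print(max_diff)
```

Because the two results match to within rounding error, one corrective equalizer on the mix undoes the coloration no matter how many tracks passed through the device.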
Now let’s consider distortion and noise—the other two audio parameters that affect the sound of a preamp or converter. Artifacts and other
coloration from gear used in parallel do not add the same as when the devices are connected in series. When connected in series, it is far more
damaging because noise and coloration accumulate. Related, some people believe that two pieces of gear might sound and measure exactly the
same, but it’s easier or faster to get a good sounding mix if all the tracks had been recorded through one device versus the other. In truth,
recording multiple tracks repeatedly through the same device and then mixing those tracks together later actually reduces distortion compared to
mixing the tracks first and going through the device only once. Even then, any difference between stacking or not is audible only if the device’s
distortion is loud enough to hear in the first place.
As we learned earlier, where harmonic distortion adds new harmonically related frequencies, IM distortion creates sum and difference
frequencies and thus is more dissonant and audibly damaging. Further, whenever harmonic distortion is added by a device, IM distortion is also
added. Both are caused by the same nonlinearity and so are inseparable except in special contrived circuits.
Let’s say you have three tracks, each with a different frequency sine wave. (Yes, music is more complex than three sine waves, but this more
easily explains the concept.) For this example we’ll assume the recording medium adds some amount of distortion, but the mixing process is
perfectly clean and is not part of the equation. When each sine wave is recorded on its own track, some amount of harmonic distortion is added.
But no IM distortion is added by the recorder because only one frequency is present on each track. So when the recorder’s tracks are mixed
cleanly, the result is three sine waves, each with its own harmonically related distortion frequencies added. This is shown in Figure 2.11, using
the three notes of an A major chord as the source frequencies. For simplicity, only the first two added harmonics are listed for each tone.
Figure 2.11:
Recording multiple single-frequency sources onto separate recorder tracks adds new distortion products created within the recorder, but only at frequencies
harmonically related to each source.
Compare that to mixing the three sine waves together cleanly and then recording that mix onto a single track that adds distortion. Now the
recorder’s nonlinearity adds not only harmonic distortion to each of the three fundamental pitches but also adds IM sum and difference
frequencies because the three sources are present together when recorded. This is shown in Figure 2.12.
Figure 2.12:
Recording audio sources onto a single recorder track after they’re combined adds harmonics related to each source and adds sum and difference frequencies related to
all of the sources.
So by separating out sources across multiple recorder tracks—or converters or preamps or any other devices that might contribute audible
distortion—the result is always cleaner than when mixing the sources together first. Note that the difference between THD and IMD distortion
amounts is purely a function of the device’s nonlinearity. With transparent gear the added IM products are not audible anyway—hence the proof
that audible stacking is a myth when using high-quality gear. And even when gear is not transparent, stacking can only reduce distortion, which
is the opposite of what’s claimed.
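To make the comparison concrete, the following sketch lists the new frequencies each approach creates. The chord frequencies are assumed equal-tempered values for A, C#, and E (the figures may use other numbers), and only the second and third harmonics plus simple pairwise sum and difference products are considered, which is a simplification of what a real nonlinear device produces.

```python
from itertools import combinations

# A major triad in equal temperament (A4, C#5, E5), rounded to 0.01 Hz.
# These specific frequencies are an assumption for illustration.
CHORD_HZ = [440.00, 554.37, 659.26]

def harmonics(freq, count=2):
    """First `count` harmonics above the fundamental (2f, 3f, ...)."""
    return [round(freq * n, 2) for n in range(2, 2 + count)]

def second_order_im(freqs):
    """Sum and difference products for every pair of source frequencies."""
    products = []
    for f1, f2 in combinations(freqs, 2):
        products.append(round(f1 + f2, 2))
        products.append(round(abs(f1 - f2), 2))
    return sorted(products)

# Each tone recorded alone: only harmonics of that tone are added.
separate_tracks = sorted(h for f in CHORD_HZ for h in harmonics(f))

# Tones mixed first, then recorded: the same harmonics plus IM products.
single_track = sorted(separate_tracks + second_order_im(CHORD_HZ))

print("Separate tracks:", separate_tracks)
print("Mixed first:    ", single_track)
```

The second list contains everything in the first plus six extra components that are not harmonically related to any single note, which is why recording the sources separately can only be cleaner.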
This brings us to coherence. Noise and distortion on separate tracks do not add coherently. If you record the same mono guitar part on two
analog tape tracks at once, when played back, the signals combine to give 6 dB more output. But the tape noise is different on each track and so
rises only 3 dB. This is the same as using a tape track that’s twice as wide, or the difference between 8 tracks on half-inch tape versus 8 tracks on
one-inch tape. Figure 2.13 shows this in context, where recording the same source to two tracks at once yields a 3 dB improvement in the signal
to noise ratio.
Figure 2.13:
Coherent signals add by 6 dB, but noise is random and increases only 3 dB.
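The 6 dB and 3 dB figures follow from how levels combine: identical signals add in amplitude, while uncorrelated noise adds in power. Here is a minimal sketch of that arithmetic; the function names are my own and are used only for illustration.

```python
import math

def coherent_gain_db(n_tracks):
    """Identical (coherent) signals add in amplitude: 20 * log10(N)."""
    return 20 * math.log10(n_tracks)

def incoherent_gain_db(n_tracks):
    """Uncorrelated noise adds in power: 10 * log10(N)."""
    return 10 * math.log10(n_tracks)

signal_db = coherent_gain_db(2)    # the guitar part on two tracks
noise_db = incoherent_gain_db(2)   # the tape noise on two tracks
snr_improvement_db = signal_db - noise_db

print(round(signal_db, 2), round(noise_db, 2), round(snr_improvement_db, 2))
# prints: 6.02 3.01 3.01
```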
The same thing happens with distortion. The distortion added by a preamp or converter on a bass track has different content than the
distortion added to a vocal track. So when you combine them cleanly in a mixer, the relative distortion for each track remains the same. Thus,
there is no “stacking” accumulation for distortion either. If you record a DI bass track through a preamp having 1 percent distortion on one track
and then record a grand piano through the same preamp to another track, the mixed result will have the same 1 percent distortion from each
One key to identifying many audio myths is the high prices charged. Another is the lack of any supporting data. It’s one thing for a vendor to
claim improved sound, but quite another to prove it. If one brand of speaker wire really is better than all the others, it can be easily proven
using the standard four parameters. When a vendor offers flowery wording instead of test data or says only “Just listen,” that’s a pretty good sign
that the claims are probably not truthful. I imagine some vendors actually believe their own claims! But that’s irrelevant. What really matters is
that you know how to separate truth from fiction.
Many of the myths I’ve described do have a factual basis in science, but the effects are so infinitesimal that they can’t possibly be audible. I
often see “subjectivists” proclaim that science has not yet found a way to identify and measure things they are certain they can hear, such as a
change in sound after a solid state power amp has warmed up for half an hour. I’ve also heard people state that audio gear can measure good
but sound bad, or vice versa. But if a device measures good yet sounds bad—and sounding bad is confirmed by a proper blind test—then clearly
the wrong things were measured. This is very different from the belief that what is heard as sounding bad (or good) can’t be measured at all.
In truth it’s quite the other way around. We can easily measure digital jitter that’s 120 dB below the music, which is a typical amount and is
about 1,000 times softer than could be audible. It’s the same for distortion, frequency response, and noise, especially when you factor in the ear’s
susceptibility to the masking effect. Many audiophiles truly believe they hear a change in quality when science and logic suggest that no audible
difference should exist. But this is easy to disprove: If there were more to audio than the four basic parameters, it would have been revealed by
now as a residual in a null test. Hewlett-Packard distortion analyzers going back to the mid-twentieth century use nulling to remove the test
signal and reveal any artifacts that remain. The beauty of nulling is that it reveals all differences between two signals, including distortion or
other artifacts you might not have thought to look for.
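The idea behind a null test can be sketched in a few lines: subtract the original signal from the device’s output and measure what is left. This is only an illustration under simplifying assumptions. It presumes the two signals are already matched exactly in level and timing, and the “device” here is a hypothetical one that adds a small amount of third-order nonlinearity.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def null_residual_db(reference, device_output):
    """Level of (device_output - reference) relative to the reference, in dB.
    Returns -inf when the two signals cancel perfectly."""
    residual = [d - r for r, d in zip(reference, device_output)]
    residual_rms = rms(residual)
    if residual_rms == 0.0:
        return float("-inf")
    return 20 * math.log10(residual_rms / rms(reference))

rate = 48000
# One second of a 1 KHz sine wave as the test signal.
reference = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]

# Hypothetical device under test: a slight cubic nonlinearity (assumed).
device_output = [s + 0.001 * s ** 3 for s in reference]

print(null_residual_db(reference, reference))          # identical signals null completely
print(round(null_residual_db(reference, device_output), 1))
```

Whatever the device adds or changes, whether or not you thought to look for it, shows up in the residual.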
The Big Picture
Keeping what truly matters in perspective, it makes little sense to obsess over microscopic amounts of distortion in a preamp or computer sound
card, when most loudspeakers have at least ten times more distortion. Figure 2.14 shows the first five individual components measured from a
loudspeaker playing a 50 Hz tone. When you add them up, the total THD is 6.14 percent, and this doesn’t include the IM sum and difference
products that would also be present had there been two or more source frequencies, as is typical for music.
Figure 2.14:
This graph shows the fundamental plus first four distortion frequencies measured from a loudspeaker playing a single 50 Hz tone.
Courtesy of
Midrange and treble speaker drivers often have less distortion than woofers, mostly because woofer cones have to move much farther to create
similar volume levels. But even high-quality tweeters playing at moderate volumes typically have more distortion than modern electronic
This chapter explained the four parameter categories that define everything affecting audio fidelity, as well as important specs that vendors
sometimes hide, and ways vendors skew data in published graphs to appear more favorable. We also busted a number of common audio myths
and learned the correct terminology to define fidelity.
The quality of audio gear can be measured to a much higher resolution than human ears can hear, and those measurements are more accurate
and reliable than hearing. Although transparency can be defined and determined conclusively through measuring, color is more difficult to
quantify because it involves preference, which cannot be defined. Likewise, degradation caused by lossy MP3-type audio compression is difficult
to measure because it doesn’t lend itself to traditional fidelity tests. Even with bit-rates high enough to not audibly harm the music, a null test
will always reveal residual artifacts. In that case, blind tests—requiring many trials with many test subjects—are the only way to assess how
objectionable the lossy compression is for a given bit-rate. But there’s no magic, and everything that audibly affects electronic gear can be easily
Ultimately, many of these are consumerist issues, and people have a right to spend their money however they choose. If Donald Trump wants
to pay $6,000 for an AC power cord, that’s his choice and nobody can say he’s wrong. Further, paying more for real value is justified. Features,
reliability, build quality, good components, convenience and usability, and even appearance all demand a price. If I’m an engineer at Universal
Studios recording major film scores, which can cost hundreds of dollars per minute just for the orchestra musicians, I will not buy the cheapest
brand that could break down at the worst time, no matter how clean it sounds.
Further, even if a device is audibly transparent, that doesn’t mean it’s “good enough,” and so recording engineers and consumers won’t benefit
from even higher performance. Audio typically passes through many devices in its long journey from the studio microphones to your
loudspeakers, and what we ultimately hear is the sum of degradation from all of the devices combined. This means not just distortion and noise,
but also frequency response errors. When audio passes through five devices in a row that each have a modest 1 dB loss at 20 Hz, the net response
is a 5 dB reduction at 20 Hz.
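This follows from the way gains combine in series: the linear gain ratios multiply, which is the same as adding their decibel values. A brief sketch of the arithmetic, using helper names of my own choosing:

```python
import math

def db_to_ratio(db):
    """Convert a level change in dB to a linear voltage ratio."""
    return 10 ** (db / 20)

def ratio_to_db(ratio):
    """Convert a linear voltage ratio to dB."""
    return 20 * math.log10(ratio)

# Five devices in a row, each 1 dB down at 20 Hz.
losses_db = [-1.0] * 5

net_ratio = 1.0
for loss in losses_db:
    net_ratio *= db_to_ratio(loss)   # gains in series multiply

net_db = ratio_to_db(net_ratio)
print(round(net_db, 2))  # prints: -5.0
```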
The goal of this chapter is to explain what affects audio fidelity, to what degree of audibility, and why. But one important question remains:
Why do people sometimes believe they hear a change in audio quality—for example, after replacing one competent wire with another—even
when measurements prove there is no audible difference? This long-standing mystery will be explored fully in Chapter 3.
The masking effect refers to the ear’s inability to hear a soft sound in the presence of a louder sound. For example, you won’t hear your wristwatch ticking at a loud rock concert, even if you hold it right next to your ear. Masking is strongest when
both the loud and soft sounds contain similar frequencies, and this is described more fully in Chapter 3.
Lossy compression is applied to audio data to reduce their size for storage or transmission. Lossy methods allow for a substantial size reduction—MP3 files are typically compressed about 10 to 1—but the original content is not restored exactly
upon playback. When using a sufficiently high bit-rate, the small loss in quality is usually acceptable and may not even be audible. Contrast this to less effective but lossless compression that reduces the size of computer files, where the data
must not change. Lossy compression is also used with JPG images and video files to reduce their size, though the specific methods differ from those that reduce the size of audio data.
Chapter e2
“Try to learn something about everything and everything about something.”
—Thomas Henry Huxley
When I entered the world of audio and recording in the 1960s, a studio owner had to know how to align tape recorders and swap circuit
boards, if not repair them directly. My, how things have changed! Today, instead of aligning tape heads and repairing electronics, you have to
know how to optimize your computer to handle as many tracks and plug-ins as possible, back up hard drives to protect your operating system
and data files, and install drivers for a new sound card. Many musicians and studios also have their own websites, separate from social
networking sites, which require yet another set of skills to maintain. Indeed, an audio expert today must also be a computer expert. I’ll use
Windows for the following explanations and examples, because that’s what I’m most familiar with, but the concepts apply equally to Mac and
Linux computers.
For most of us today, audio maintenance includes knowing how to properly set up and organize a personal computer. I’ve owned computers
since the Apple ][, and I’ve used every version of DOS and Windows since then. My audio computer runs all day long, yet it never crashes or
misbehaves. And that’s not because I’m lucky. Further, I also use my audio computer for other tasks like emailing, creating websites, image
scanning and graphics design, finances, and computer programming, with no problems. Contrary to popular opinion, there’s no inherent reason
why a computer that records audio cannot be used for other things, too. The key is being organized: Install only programs you really need,
defragment your hard drives, disable unneeded background tasks, and back up faithfully in case of disaster. By being organized and keeping your
system clean, your computer will run better, and you’ll be able to work faster, smarter, and more safely.
Divide and Conquer
One of the most important ways to keep your computer organized is to divide today’s extremely large hard drives into separate, smaller
partitions. Even entry-level computers come with hard drives that would have seemed impossibly large just a few years ago. But storing all of
your files on one huge drive is like tossing all your personal correspondence and tax records for the past 20 years into one enormous shoebox.
As each year passes, it becomes more difficult to find anything. I can’t tell you how many times I’ve heard, “Help! I downloaded a file yesterday,
but now I can’t find it.” Figure e2.1 shows an old hard drive I took apart just for fun.
Figure e2.1
This 500 MB SCSI hard drive has five platters, and it cost a fortune when new. Now it’s just a really cool paperweight.
Photo by Jay Munro.
A disk drive is much like an office filing cabinet. For example, a four-drawer filing cabinet is equivalent to a drive with four partitions, or
virtual drives. The cabinet is divided rather than one enormous drawer, which would be clumsy and difficult to manage. Inside each drawer is a
series of hanging folders, which are the same as the folders in a hard drive’s root directory. And within each hanging folder are manila folders
that contain both loose papers as well as other manila folders, equivalent to disk files and folders, respectively. Besides dividing hard drives into
partitions, it’s equally important to organize your folders and files in a logical manner. The “partitions_folders” video shows how I partitioned
and organized the hard drives on my previous computer that I replaced while working on this book.
Besides helping to organize your data into logical groups, partitioning a drive has many other benefits. Perhaps most important, it’s a lot
easier to back up your operating system and programs separately from data files. When everything is on one drive, it’s difficult to know which
files have changed since the last time you backed up. There are thousands of operating system (OS) files stored in various places on a system
drive, and their names are not obvious. Further, if an errant program installation or driver update damages Windows such that it won’t even
start, you’ll be hard-pressed to recover your backup files at all unless you have an original installation DVD to first restore the OS. But most
important, if you restore an entire drive from a previous backup, your audio files and other data will be lost in the process because older files
will overwrite the newer current versions.
The solution is to divide your hard drives based on the types of files each partition will store. Storing files in separate partitions also reduces
file fragmentation. As files are saved, deleted, and resaved, they become split up into many small pieces in various places around the drive.
When you play those files later, the drive works harder as it navigates to different areas of the drive’s magnetic platters to gather up all the
pieces. With computers used for audio and video projects, drive fragmentation limits the number of tracks you can record and play at once. Even
with modern fast computers, if a drive is badly fragmented, you might not be able to record even one or two tracks without dropouts.
Most computers include defragmenting software that consolidates all of the files into contiguous sectors on the disk. Dividing a hard drive lets
you defragment partitions independently and only those that actually need it. For example, my SoundFont instrument sample files rarely change,
so they don’t become fragmented. But if they were on the same partition as my audio project files, it would take much longer to defragment the
partition because many gigabytes of SoundFont files would be shuffled around, adding hours to the process. By keeping files that
don’t change separate from files that change often, defragmenting is more efficient. Several small partitions can be defragmented much more
quickly than one large partition, even when the total amount of disk space is the same.
Current versions of Mac OS include partitioning software, but the partitioning tools in Windows 7 are too limited to be useful. So for Windows
computers, you’ll do better using EASEUS Partition Master, a freeware program shown in Figure e2.2. This screen shows both physical hard
drives in my computer, including all of the partitions on Disk 1. Disk 2 has a single partition I use to back up files from the first drive. I also
have four large external USB drives for additional backup, but they weren’t connected when I made this figure. The various operations are
invoked from the menu choices at the left of the screen. I’ll also mention Partition Magic, another excellent program. It’s not free, but it’s
affordable, and it has even more features than Partition Master.
Figure e2.2
Partition Master lets you easily divide a large hard drive into smaller partitions.
As you can see, a 108 GB Drive C: partition holds only Windows and programs. Another 40 GB D: partition holds what I call “small data”—
word processor files, my income and expenses database, client web pages and graphics, family photos, and so forth. The 100 GB E: partition
holds all of my current audio projects and final mixes, and the large 1 TB F: partition is for video projects. Another 30 GB G: partition holds
what I call “static data”—drivers and programs I’ve downloaded, SoundFont instrument sample sets, and other files that rarely change. The
10 GB H: partition holds temporary files I create or download, then delete, as well as temporary Internet files saved by my web browsers, and
temporary storage for programs like Sound Forge and WinZip that save, then delete large files while they work. I also made an I: partition in the
older FAT32 format needed for a DOS program that I wrote years ago to process my website log files.
It’s not necessary to divide a hard drive into this many partitions, though you should at least create a data partition that’s separate from the OS
and your programs. It’s also a good idea to make the partitions much larger than the amount of data you intend to store in them, because
defragmenting goes faster when there’s plenty of free space on the drive.
I generally partition the main drive when a computer is first purchased, before installing any programs. Otherwise, once you’ve divided a
single large drive into smaller partitions, you’ll move data files that had been on Drive C: to other partitions. For Windows computers, most of
these files are in subfolders under My Documents. Fortunately, most programs let you specify other locations, which you’ll need when you set up
folders in a separate data partition. It’s also useful to move your temporary Internet files from C: to another partition, because they, too, become
quickly fragmented. There’s no need to actually move these files. Just change their location in your web browser’s settings, and then after
restarting your computer, you can delete any abandoned files left on the C: drive. Do the same for your email program to keep its files on a data
drive that’s backed up often.
Again, the whole point of moving data files off the OS drive is to let you back up the OS and software programs separately from your data.
This way you can easily back up your data daily, which goes quickly because you copy only what’s new or changed. The OS partition changes
only when you install or remove programs, or change system settings, so you can back that up only occasionally by making an image backup of
the partition to a separate internal or external hard drive. Without all of your large audio and video files and temporary Internet files, an image
backup also goes quickly and takes up much less space.
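Copying only what’s new or changed is simple enough to sketch in a short script. The example below is a rough illustration, not how any particular backup program works: the function name is hypothetical, and it decides that a file has changed by comparing size and modification time, which is a common but not foolproof test.

```python
import shutil
from pathlib import Path

def backup_changed(source_dir, backup_dir):
    """Copy files that are missing from the backup, or whose size or
    modification time differs from the backup copy. Returns the list
    of files written. (Hypothetical helper for illustration.)"""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        dst = backup_dir / src.relative_to(source_dir)
        if dst.exists():
            s, d = src.stat(), dst.stat()
            # Unchanged: same size and the source is not newer.
            if s.st_size == d.st_size and int(s.st_mtime) <= int(d.st_mtime):
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the timestamp
        copied.append(dst)
    return copied
```

You might call this with a data partition as the source and a folder on a second drive as the destination. Run it twice in a row and the second pass copies nothing, which is why a daily data backup goes so quickly.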
Back Up Your Data
There’s a saying in the computer industry that data do not truly exist unless they reside in three places. The next time you’re about to power
down your computer, ask yourself, “If I turn on my computer tomorrow and the hard drive is dead, what will I do?” If backing up seems like a
nuisance, then devise a strategy that makes it easy so you’ll do it. I use an excellent program called SyncBack, shown in Figure e2.3, but there are
others ranging from freeware to inexpensive.
Figure e2.3
SyncBack lets you easily copy all of your data to one or more backup drives.
As you can see, I created several different backup profiles for each group of data. This lets me alternate backing up to multiple drives. Imagine
you’re in some program—an audio editor, word processor, or any program that uses files—and you save the file. But unknown to you, a
software or hardware glitch caused the file to be corrupted. If you back up that corrupted file, it overwrites your only good backup copy, leaving
you with two corrupted files. Admittedly such file corruption is rare, but it happens. It’s happened to me a few times, and it will probably
happen to you eventually, too. The solution is to reload the file after saving to verify it’s good before overwriting your only backup. I did that
constantly while writing this book. If I opened an earlier chapter to add or clarify something, I’d save and then reopen it again right away to
verify it was okay. Most programs have a “recently used” file list, so it takes only a few seconds to reopen a file to verify its integrity.
It’s also a good idea to alternate backups to separate drives. One day you back up to Drive 1, the next day to Drive 2, and then back to Drive 1
the third day. I rotate between five backup drives: a second drive inside my computer, a network server in my basement, a 16 GB thumb drive,
and two external USB drives I turn on only when needed. I also have two more external drives that I leave with friends and trade every few
months. This way, even if my house is robbed or burns down, I won’t lose everything. While writing this book, I backed up the entire project to
DVDs once a week in case I needed to refer to an older version of something. I also bought a second 16 GB thumb drive and carried it with me
whenever I left the house. There are “cloud” storage services that let you store data remotely on their web server, and many are inexpensive. I
use Microsoft’s SkyDrive, a free service included with Hotmail accounts. Again, you don’t have to be as compulsive as I am, but not backing up is
an invitation to lose every audio project you’ve ever done, not to mention your financial data and priceless photos.
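The idea of alternating backup drives from one day to the next can be sketched in a few lines of Python. This is only an illustration: the function name pick_backup_drive and the drive labels are hypothetical, and using the calendar day number to choose the drive is an assumed scheme, not a feature of any backup program named here.

```python
from datetime import date


def pick_backup_drive(day: date, drives: list[str]) -> str:
    """Choose a drive by cycling through the list, one step per calendar day."""
    # toordinal() counts days, so consecutive dates land on consecutive drives.
    return drives[day.toordinal() % len(drives)]


if __name__ == "__main__":
    drives = ["Drive 1", "Drive 2"]  # hypothetical labels
    print(pick_backup_drive(date.today(), drives))
```

With two drives, any given day and the day after always map to different drives, so a file corrupted today cannot overwrite both backup copies at once.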
Backing up your emails can be easy or difficult, depending on how you manage your email. As a businessperson, I keep all incoming and
outgoing emails on my hard drive for at least three months, so I can refer to past correspondence. I use Thunderbird, a free email client that runs
on my computer. I have it set to delete incoming emails stored on the remote server after three days. This way, if my hard drive dies tomorrow, I
can go online and retrieve my recent emails to not miss replying to something important. If you use a free web-based email account such as
Hotmail, Yahoo, or Gmail, I suggest downloading your emails regularly to your local data drive. A free email service owes you nothing, and
that’s all you can expect. However, most free services are excellent, and they are often a better choice than an email account with your local
Internet provider. If you change Internet service providers, you’ll lose your email address, and if you forget to notify any of your correspondents,
they won’t be able to contact you.
Optimizing Performance
“Software expands to fill the available memory, and software is getting slower more rapidly than hardware gets faster.”
—Niklaus Wirth, inventor of the Pascal programming language
The most important reason to optimize a computer is to let it use more of its computing power to process your audio (and video) data and
plug-ins rather than waste resources on background tasks such as constantly verifying your Wi-Fi network’s signal strength. Another is to reduce
latency, which is the time it takes audio to get into and out of the computer. Modern computers are very fast and don’t require all the tweaks
used in years past to maximize performance. But still, there’s no point in having programs you don’t need running in the background slowing
down your computer and wasting memory. On Windows computers, most programs that run in the background are listed under Services, which
you get to from the Control Panel. Others are listed in the Startup tab of msconfig, shown in Figure e2.4, which you can run manually by typing
its name in the Start .. Run menu.
Figure e2.4
The Startup tab of msconfig lists programs that run automatically every time your computer starts. Entries at the left that aren’t checked are disabled.
Don’t just disable everything you don’t recognize! Many needed background programs and services have names that aren’t intuitive. But many
are recognizable, and the descriptions in the Services list tell what they do. For example, many CD and DVD writing programs run in the
background, waiting for a blank disk to be inserted. You don’t need that running all the time and wasting memory. Rather than give a list of
services that aren’t usually needed, and that will surely change with the next version of Windows, I suggest searching the web for “disable
Windows services” to find websites that list the current details. Another useful resource is Mike Rivers’ Latency article.
I also suggest disabling file indexing, which runs in the background and creates a database of the contents of every file on a hard drive to make
searching go faster. This is a useful feature for an office computer, but it risks slowing down your drive for audio and video projects. Right-click
a drive letter in Windows Explorer, then select Properties, and uncheck the box on the General tab.
Write caching is another “feature” that should be disabled for all removable drives because it delays writing your data. If you copy a large file
and then unplug the drive, the data may not have been copied yet. Again, right-click the drive letter in Windows Explorer, then select the
Hardware tab. Highlight a removable drive, select Properties, then check Quick Removal to disable write caching on the Policies tab. If there is
no Policies tab, then that drive is not at risk.
Windows offers a System Restore feature that takes snapshots of the operating system and programs every few days, or when you tell it to,
letting you undo an improper or unintended change. But System Restore doesn’t always work, and it’s yet another service that runs in the
background, monitoring your files and potentially taxing your hard drive. (However, System Restore can be set to take snapshots only when you
tell it to.) Making an image backup to an external drive after major changes is much more reliable. Current versions of Windows include an
image backup utility, and for older computers, Norton Ghost and Acronis True Image are excellent and affordable.
Practice Safe Computing
As explained earlier, I use one computer for everything. I understand that some people prefer to keep their audio computer off the Internet, but
it’s not really needed if you practice “safe computing,” which includes keeping a current OS image backup. Further, if your audio computer is
not connected to the Internet, it’s difficult to install or update software that “phones home” to verify legitimate ownership. I use Firefox for most
web browsing, but with no plug-ins or extensions installed, and all ActiveX and Java scripting disabled. I can’t view online videos or Flash
content with this computer, but neither can it be infected with malware. I can click any link in a web forum that looks interesting and know that
my computer won’t get a virus. When I want to see a YouTube video or visit another site I’m certain is safe, I use Internet Explorer with a
Medium security setting. This dual-browser method also avoids the need for antivirus software that runs in the background, wasting computer
I also have a network router that serves as a hardware firewall to block outside access to all the computers on my home network. Even if you
have only one computer that connects directly to the Internet, buy a wired—not wireless—router that has this feature (many do). This type of
firewall uses a method called network address translation, or NAT, to hide your computer’s IP address from the outside world. This ensures that
nothing is allowed into your computer from an external computer unless a prior request to view a web page or download a file originated from
your computer. A hardware router is more secure than a software firewall, and it doesn’t use any computer resources.
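The rule that inbound traffic is allowed only when it answers a request that began inside your network can be modeled with a toy Python class. This is a deliberately simplified sketch, not how any real router is implemented. The class name ToyNatFirewall and its methods are hypothetical, and a real NAT device also rewrites addresses and ports and expires old entries.

```python
class ToyNatFirewall:
    """Toy model: remember outbound requests, admit only matching replies."""

    def __init__(self) -> None:
        # Each entry is (local_port, remote_host, remote_port).
        self._pending: set[tuple[int, str, int]] = set()

    def outbound(self, local_port: int, remote_host: str, remote_port: int) -> None:
        """Record that a computer on the inside started a conversation."""
        self._pending.add((local_port, remote_host, remote_port))

    def allow_inbound(self, local_port: int, remote_host: str, remote_port: int) -> bool:
        """Admit a packet only if it matches an earlier outbound request."""
        return (local_port, remote_host, remote_port) in self._pending
```

An unsolicited packet from an outside computer matches no recorded request, so it is dropped without your computer ever seeing it.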
An uninterruptible power supply (UPS) is also mandatory to protect your computer and data. If the power goes out while a program is
writing to a hard drive, the hard drive is almost sure to be corrupted. Even if the power is reliable where you live, a UPS is still a good
investment. You only have to lose an important project once to understand the value of a UPS. An expensive large-capacity UPS isn’t necessary;
it needs to provide power only long enough to shut down properly.
Finally, a computer is not a sacred device that must be left powered on all the time. It’s just an appliance! I suggest turning it off (not standby)
when you’re not going to use it for an hour or more. It’s also a good idea to reboot before important sessions, especially if you’ve just done a lot
of audio or video editing or web surfing.
If It Ain’t Broke, Don’t Fix It
I never update software unless something doesn’t work right or I truly need a new feature. I know people who update to every new version
when it’s first released and update their OS every time a new security patch arrives, which is almost daily. I can’t count how many times I’ve
seen web forum posts where someone updated something and it made their computer worse. Many computer professionals avoid any software
version number ending with “.0,” instead waiting for the inevitable subsequent version that actually works as advertised. Many programs check
for updates every time you power on your computer, and Windows does the same. The first thing I do after installing Windows or new programs
is disable automatic updates if present. The second thing I do, when applicable, is tell the program to save its temporary files to my Temp
partition to avoid cluttering and fragmenting the C: drive. Sometimes updates are desirable or necessary, and setting a Windows Restore point
manually or making an image backup before installing a new program or upgrade is a good idea in case the installation doesn’t go as planned.
Avoid Email Viruses
I always check my email online in a web browser to delete spam and other unwanted emails before downloading to my computer. As
mentioned, I save all of my emails in and out for three months, so I prefer to avoid cluttering my data drive with unwanted emails. My web
email client is set to display emails in plain text rather than as HTML, because an HTML email lets spammers know your email address is valid when an
embedded image links back to their website. If that’s too much effort, the following guidelines will help you to avoid email viruses.
Rule #1: Never open a file attachment you receive by email, even if it comes from your best friend. Rather, save it to disk first, then open it
from the appropriate program. What looks like an innocent link or photo file may install malware on your computer. Many viruses propagate
by sending themselves to everyone in the infected person’s address book, so the virus arrives in an email from a friend rather than a stranger. I
receive virus files by email several times per year. The last time was from a good friend. Of course, I didn’t open the attached file, but he had no
idea his computer sent the email!
JPG and GIF images are always safe to open, but a virus can be disguised as an image file. By default, Windows hides extensions for common
file types such as .exe, .jpg, .pdf, and so forth. So the sneaky bastards that create viruses often name them hotbabe.jpg.exe or joke.doc.exe,
because most people will see only hotbabe.jpg or joke.doc, without the real .exe extension at the end. Fortunately, this is easy to fix: From
Windows Explorer, go to the Tools menu, then select Folder Options and click the View tab. Find Hide extensions for known file types and
make sure it’s not checked. But even with extensions revealed, you still must be careful. If an attached file has a very long name, the extension
may be hidden if there’s not enough room to display the full name.
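The double-extension trick is easy to check for mechanically. The following Python sketch flags names such as hotbabe.jpg.exe. The function name looks_disguised and the RISKY set are hypothetical, and the list of executable extensions is illustrative rather than complete.

```python
from pathlib import PurePath

# Illustrative, not exhaustive: extensions that Windows treats as runnable.
RISKY = {".exe", ".com", ".vbs", ".pif", ".bat"}


def looks_disguised(filename: str) -> bool:
    """True if a runnable extension hides behind an innocent-looking one."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    # "hotbabe.jpg.exe" yields [".jpg", ".exe"]: two suffixes, the last one risky.
    return len(suffixes) >= 2 and suffixes[-1] in RISKY


if __name__ == "__main__":
    for name in ("hotbabe.jpg.exe", "joke.doc.exe", "hotbabe.jpg", "setup.exe"):
        print(name, looks_disguised(name))
```

A plain setup.exe is not flagged because it makes no attempt to hide what it is. The check targets only names that pretend to be something else.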
Most viruses are executable programs having an .exe file extension, but they can also use .com, .vbs, .pif, .bat, and probably others. Although
.doc (Microsoft Word) and .xls (Microsoft Excel) files are usually safe, viruses can be hidden inside them in the form of self-running macros. In
practice, very few people need to use Word’s macros feature, so you should set Word and Excel to warn you whenever such a macro is about to
In the long run, the best way to avoid receiving viruses by email is minimizing the number of people who know your email address. More and
more web forums and online product registration forms require an email address. But unless you really want to let them contact you (perhaps to
be notified of product updates), use a phony email address like [email protected]. Or set up a free email account just for registrations, since
most web forums require you to follow up to their email before you’re accepted.
Finally, never click a link in a spam or other unwanted email that offers “to be removed from our mailing list.” That’s just a trick; the real
purpose is getting you to confirm that they sent to a valid address. Once they know they found a “live” one, your email address is worth more
when they sell it to other spammers.
Bits ’n’ Bytes
“Today’s production equipment is IT based and cannot be operated without a passing knowledge of computing, although it seems that it can be operated without a passing
knowledge of audio.”
—John Watkinson
A complete explanation of computer internals is beyond the scope of this book, but some of the basics are worth understanding to better
appreciate how audio software works. There are two types of computer memory: random access memory (RAM) and read-only memory (ROM).
RAM is the memory used by software to store data you’re working on, such as text documents and emails as you write them, and MIDI and audio
clips as you work in a digital audio workstation (DAW). When you power down your computer, whatever data are in RAM are lost unless you
saved them to a disk drive. ROM is permanent, and it’s used for a computer’s BIOS (the basic input/output system) to store the low-level code that
accesses hard drives and other hardware needed to start your computer before the OS is loaded from disk into RAM. Video cards and sound cards
also contain their own ROM chips for the same purpose. Some types of ROM, called flash memory, can be written to for software updates. Flash
memory is also used in modern electronic devices instead of batteries to save user settings when the power is off. But most ROM is permanent
and cannot be changed.
As explained in Chapter 8, the smallest unit of digital memory is one bit, which holds a single One or Zero value. One byte (also called an
octet) contains eight bits, one word contains two bytes or 16 bits, and a double-word contains four bytes or 32 bits. When audio is recorded at 24
bits, three bytes are used for each sample. There’s also the nybble, which is four bits or half a byte, though it’s not used much today because most
memory chips are organized into groups of bytes or words. These data sizes are shown in Table e2.1. The binary numbers in the second column
are just examples to show the size of each data type.
Table e2.1. Digital Memory Units.
Unit
1 Bit
1 Nybble
1 Byte           1011 0110
1 Word           1011 0110 1010 0011
1 Double Word    1011 0110 1010 0011 1010 0110 0001 0011
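The relationship between bit depth and bytes per sample can be worked through in a short Python sketch. The function names are hypothetical, and the 44,100 Hz stereo figures in the demo are assumed example values chosen only to show the arithmetic.

```python
def bytes_per_sample(bit_depth: int) -> int:
    """Whole bytes needed to hold one sample (8 bits per byte, rounded up)."""
    return (bit_depth + 7) // 8


def bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate * bytes_per_sample(bit_depth) * channels


if __name__ == "__main__":
    # Assumed example: stereo audio at 44,100 samples per second.
    print(bytes_per_sample(24))            # three bytes for a 24-bit sample
    print(bytes_per_second(44100, 24, 2))  # bytes per second for 24-bit stereo
```

The same function shows that a 16-bit sample occupies exactly one word as defined above.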
Both RAM and ROM memory chips are organized in powers of 2 for efficiency. So 1 kilobyte of memory actually contains 2^10 = 1,024
memory locations rather than only 1,000. Therefore:
1 KB = 1,024 bytes
1 MB = 1,024 KB
1 GB = 1,024 MB
1 TB = 1,024 GB
Technically, 1 MB is one megabyte or 1,048,576 bytes; 1 GB is one gigabyte or 1,073,741,824 bytes; and 1 TB is one terabyte or
1,099,511,627,776 bytes. But not everyone uses this method—especially companies that sell hard drives—so sometimes 1 GB really means only
1,000,000,000 bytes. A mix of formats is also used, where 1 MB=1,048,576 bytes, but 1 GB=1,000 MB rather than 1,024 MB. The great thing
about standards is there are so many of them.
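The difference between the two conventions is simple arithmetic, sketched here in Python. The constant names and the function advertised_gb_to_binary_gb are hypothetical, and the 500 GB drive is an assumed example.

```python
KB = 2 ** 10   # 1,024 bytes
MB = 2 ** 20   # 1,048,576 bytes
GB = 2 ** 30   # 1,073,741,824 bytes
TB = 2 ** 40   # 1,099,511,627,776 bytes


def advertised_gb_to_binary_gb(advertised_gb: float) -> float:
    """Convert a size quoted in billions of bytes to 1,024-based gigabytes."""
    return advertised_gb * 1_000_000_000 / GB


if __name__ == "__main__":
    # Assumed example: a drive sold as "500 GB" using the 1,000-based meaning.
    print(round(advertised_gb_to_binary_gb(500), 1))
```

This gap explains why a new drive can appear smaller in the operating system than the capacity printed on its box.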
Computer Programming
Many people think that computer programming is a complex science that requires advanced math skills. Nothing could be further from the
truth. Now, some types of programming require high-level math, such as coding an equalizer plug-in or a Fast Fourier Transform, but most
programming is based on simple IF/THEN and AND/OR logic:
IF the mouse is clicked, AND it’s currently positioned over the Solo button, THEN mute all of the other tracks that are playing.
In this case, Mute is activated by storing either a One or a Zero in a memory location set aside to hold the current state of each track. So when
you open a project with 20 tracks, your DAW program sets aside at least 20 bits of memory just to hold the current Mute state of each track.
Then when you press Play, the program checks each location to find out whether it is to include that track in the playback. If a track is not
muted, then the program reads the Wave file that’s active at that time in the project and sends it through each plug-in and output bus. This is
very logical, and computer programming certainly requires being organized, but it’s not necessarily as complicated as many believe. Most types
of computer programming require little math beyond addition, subtraction, multiplication, and simple IF/THEN logic tests.
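The Solo example translates almost word for word into code. This Python sketch is an illustration only, not taken from any real DAW. The function names handle_click and tracks_to_play are hypothetical, and a list of True/False values stands in for the One and Zero mute flags.

```python
def handle_click(clicked: bool, over_solo_button: bool,
                 track_index: int, muted: list[bool]) -> list[bool]:
    """IF the mouse is clicked AND it is over Solo, THEN mute the other tracks."""
    if clicked and over_solo_button:
        # Every track except the soloed one gets its mute flag set.
        return [i != track_index for i in range(len(muted))]
    return list(muted)  # otherwise the mute states are unchanged


def tracks_to_play(muted: list[bool]) -> list[int]:
    """On Play, check each flag and keep only the tracks that are not muted."""
    return [i for i, is_muted in enumerate(muted) if not is_muted]


if __name__ == "__main__":
    muted = [False] * 4  # a hypothetical four-track project, nothing muted
    muted = handle_click(True, True, 2, muted)
    print(tracks_to_play(muted))
```

Nothing here goes beyond a comparison and a loop, which is the point: the logic is orderly rather than mathematical.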
Computer programming is also an art as much as a science. Yes, you need to know how memory is organized and other science facts, but there
are many ways to accomplish the same functionality, and some algorithms are far more elegant and efficient than others. The code that sorts
your address book alphabetically might occupy 500 bytes or 5,000 bytes. It might take one millisecond to sort 100 names or be able to sort
1,000 names in just a few microseconds. This is not unlike electronic circuit design, where the goal is low distortion, a flat response, and low
noise, while drawing as little power as possible to avoid wasted energy and excess heat. There are a dozen ways to design a mic preamp, and
some are decidedly better than others. The same is true for programming.
A high-level language lets you use English words like IF and THEN to instruct a computer without having to deal with the extreme detail and
minutiae needed to talk to a computer in its native machine language. The classic first example taught in programming classes is called Hello
World, and such programs go back to the 1960s when the BASIC language was invented:
10 CLS : REM clear the screen
20 LOCATE 10, 15 : REM put the cursor at the 10th row, in the 15th column
30 PRINT "Hello World!" : REM print the quoted text at the current cursor position
In early versions of BASIC, each command was numbered so the computer would know in what order to do each operation. Lines were often
numbered by tens as shown to allow inserting other commands later if needed. The REM to the right of each command stands for remark, letting
programmers add comments to their code to remind themselves or other programmers what each command does. A program this simple is self-
evident, but complex programs can be a nightmare to understand and modify a year later if they’re not well documented—especially for
someone other than the original programmer. This style of programming was great back when computers were text oriented. Today, modern
operating systems require more setup than simple LOCATE and PRINT, but the underlying concepts are the same. For those who are curious, the
Liberty Basic (Windows) source code for the ModeCalc and Frequency-Distance programs included with this book is in the same Zip files that
contain the executable programs.
Website Programming
Social networking sites are a great way for musicians to tell the world about themselves and let fans download their tunes, and YouTube will
gladly host all of your videos for free. But any musicians or studio operators who are serious about their business will also have a real website.
Owning your own site is more prestigious than a Facebook page, and it avoids your visitors having to endure unwanted ads and pop-ups,
including ads for your competitors. Again, this chapter can’t include everything about web design, but I can cover the basics of what studio
owners and musicians need to design and maintain their own sites. Many web hosting companies include templates that simplify making a site
look the way you want, but you should understand how to customize the content.
The basic language of website programming is HTML, which stands for HyperText Markup Language. HyperText refers to highlighted text
that, when clicked with a mouse, calls up other related content on the same page or a different page. This is not limited to websites, and the first
hypertext programs were like PDF files, where, for example, clicking a word in a book’s index automatically takes you to the entry on that page.
With websites, links to additional content are typically underlined, though they don’t have to be. Here is the complete HTML code for a very
simple web page:
<html>
<head>
<title>Doug’s Home Page</title>
</head>
<body>
<p>Welcome to my web site!</p>
</body>
</html>
Most of this code is needed just to conform to HTML standards to identify the header and body portions and the page title. One line then
displays a simple welcome message.
All of the visible page content is between the <body> and </body> markers, called tags. All HTML tags are enclosed in <brackets> that
end with the same word preceded by a slash inside other </brackets> to mark the end of the section. For example, <p> marks the start of a
paragraph, and </p> marks the end. HTML allows extensive formatting to create tables with rows and columns, thumbnail pictures that
enlarge when clicked on, and much more. A page containing many text and graphic elements can quickly become very complex.
The good news is you rarely need to deal with HTML code. Web design software lets you write text, control its font color and size, embed
images and links to MP3 files, and so forth, as well as edit the underlying HTML code directly. As you type, the software adds all the HTML code
and tags automatically. But it’s useful to understand the basics of HTML, if only to check for unwanted bloat on your pages. For example, if you
highlight a section of text as italics, the web design software places <i> and </i> tags around that text. Making text bold instead surrounds it
with <b> and </b>. But sometimes after editing a page and making many changes, you can end up with empty tags from previous edits that
are no longer needed:
<p>From this web page you can download <i></i> all of my tunes and follow the links to <b></b> my videos on YouTube.</p>
A few bytes of superfluous data are not usually a big deal, but it can make your pages load slowly if there are many such empty tags,
especially if someone is viewing your site on a cell phone. My own approach is to keep websites as simple as possible. This way everyone will
see the pages as you intend, even if they’re using an older browser. HTML can do the same things in different ways, and some are more efficient
than others. For example, you can make text larger or smaller by specifying a font size by number or by enclosing it within <big></big> and
<small></small> tags. I prefer using tags rather than specific size numbers not only because it creates less code, but it also lets people more
easily size the text to suit their own eyes and screen resolution. On Windows PCs, most browsers let you use the mouse wheel with the Ctrl key
to scale the font size larger or smaller. But some browsers won’t let you scale the text when font sizes are specified by number.
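Leftover empty tags like the ones described above can be found and removed automatically. This Python sketch uses a regular expression as one assumed approach. The function name strip_empty_tags is hypothetical, and the pattern handles only empty italic and bold pairs, not arbitrary HTML.

```python
import re

# An opening <i> or <b>, optional whitespace, then the matching closing tag.
# The trailing \s? removes one following space so no double space is left behind.
EMPTY_TAG = re.compile(r"<(i|b)>\s*</\1>\s?")


def strip_empty_tags(html: str) -> str:
    """Remove empty <i></i> and <b></b> pairs left over from earlier edits."""
    return EMPTY_TAG.sub("", html)


if __name__ == "__main__":
    page = ("<p>From this web page you can download <i></i> all of my tunes "
            "and follow the links to <b></b> my videos on YouTube.</p>")
    print(strip_empty_tags(page))
```

Tags that actually contain text are left alone. Nested empty tags would need the function applied more than once, and a real HTML parser is a safer choice for anything more complex.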
There are many books and online resources that can teach you web design and HTML, so I won’t belabor that here. However, just to give a
taste of how this works, later I’ll show code to embed audio and video files on your site, as well as create a custom error page that looks more
professional than the generic “Page not found” most sites display. It’s easier to embed videos from YouTube into pages on your site than to add
code manually for a media player, but again, that risks displaying unwanted ads, including ads for your competitors. Also, most browsers can
display the source code for web pages you’re viewing. So if you see something interesting and want to learn how it’s done, use View Source to
see the underlying HTML.
Image Files
Web pages can display three types of image files: JPG, GIF, and PNG. The most popular format is JPG, which is great for photos, but not so great
for screenshots or logos that include text. JPG files use lossy compression to reduce their size to 1/10 normal or even smaller, much like MP3
files are much smaller than Wave files. But JPG files create artifacts that look like a cluster of small colored dots near sharp edges. These are less
visible in a photograph, but they show clearly on images containing text or lines. GIF files are also compressed to reduce their size, but they use
a lossless method that restores the original image exactly. However, GIF files are limited to 256 distinct colors, which makes them less suitable
for photos. PNG files are a newer format supported by modern browsers. They are both lossless and high quality, but they tend to be larger than
the other types. I suggest JPG for photographs, and try both GIF and PNG for line art to see which comes out smaller for a given image. Most
photo editor programs let you set the amount of compression when saving JPG files, so you can balance picture quality against file size. Again,
the point is to keep file sizes reasonable so your web pages load faster.
Custom Error Page with Redirect
If you click a link for a page that no longer exists, or mistype a web address, most websites display a generic “Error 404” page. Your
site will look more professional if you make your own custom error page that has the same look as the rest of the site. You can even send users
to your home page automatically after giving them time to read your message. After you create a custom page, change your website’s control
panel to specify the new Error 404 page instead of the default. Then add this line just above the </head> tag that identifies the end of the
page’s header, using your own site address:
<meta http-equiv="refresh" content="6; URL=http://www.y">
The “6” tells the browser to stay on the current page for six seconds, before going to your site’s home page. Of course, you can change the
duration or send users to a specific page.
Hundreds of useful and free programming examples are available on the web, showing simple JavaScript code to do all sorts of cool and
useful stuff. You can even download complete web programs, such as software that adds a search capability to your site. However, integrating a
complex add-on requires a deeper knowledge of web programming. It’s easier to add code to let Google search your site, but, again, that will
display ads, which is unprofessional. Further, if you make changes or add new pages, you can’t control when Google will update its search
index. So it could be days or weeks before Google will find recent content in a site search. When I wanted to add searching to the websites I
manage, I bought Zoom Search, a highly capable program from Wrensoft. With one click, it indexes the entire site, and with another, it uploads
the index files.
Embedded Audio and Video
Embedding an MP3 file into a web page is pretty simple, requiring a single line with this link:
<p>Click <a href="mytune.mp3">HERE</a> to download the tune.</p>
This line assumes the MP3 file is in the same web directory as the page containing the link. If the file is somewhere else, or on another site,
you’ll add the complete address:
Click <a href =" tune.mp3">HERE</a> to download the tune.
Most people have a media player that will launch automatically when they click a link to an MP3 file. Some players start playing
immediately, then stream the audio in the background while continuing to play. Other players wait until the entire file is downloaded, then start
playback. MP3 files are relatively small, so your site visitors probably won’t have to wait very long. But video files can be very large, so it’s
better to embed your own player software that you know will start playing immediately. I use a freeware Flash player, which plays FLV format
videos. This player is highly compatible with a wide range of browsers, and it requires uploading only two support files to your site. Both files
are included in the “” file that goes with this book, along with a sample web page and Flash video.
I suggest you start with the included HTM file, which is stripped down to the minimum, and add to that to make it look the way you want.
However, you’ll need to edit the page title, the name of your video file, and the height and width of the embedded video. This line specifies the
display size for the video, and you’ll change only the numbers 480 (width) and 228 (height):
var s1 = new SWFObject('player.swf','ply','480','228','9','#ffffff');
This video is small—one of the SONAR editing demos from Chapter 8—and its actual display size is 480 pixels wide by only 208 pixels high.
But you need to add 20 to the height to accommodate the player’s Play, Volume, and other controls at the bottom. The other line you’ll edit
contains the name of the video file itself:
s1.addParam('flashvars','file=sonar_cross-fade.flv');
As with MP3 files, if the video is not in the same directory as the web page, you’ll need to include the full web address. Once you verify that
the page displays as intended on your local hard drive, copy it to your website, along with the FLV video and the player’s two support files.
Finally, unless your video editing software can render to FLV files directly, you need a way to convert your video to the Flash format. I use the
AVS Video Converter mentioned in online Bonus Chapter 3, Video Production. This affordable program converts between many different video
file types and can also extract video files from a DVD and save them in popular formats.
This chapter covers hard drive partitioning and backup and explains the basics of optimizing performance by disabling unneeded programs that
run in the background. A computer whose hard drive is well organized and defragmented and runs only the programs you really need will be
faster and more reliable than when first purchased. Further, by keeping operating system and program files separate from your personal data,
you can back up easily and quickly. Even though most people view backing up as a burden, it doesn’t have to be. A good backup program
makes it easy to protect your audio projects and other files, and when it’s easy, you’re more likely to do it regularly. Likewise, making an
occasional image backup of your system drive protects against errant programs and outright hard drive failure.
Computer viruses and other malware are increasingly common, but protecting yourself is not difficult, and it requires mostly common sense
plus a few basic guidelines. If you buy a hardware firewall, don’t open email attachments directly, and use two browsers with one set for high
security, you won’t need antivirus software that slows down your computer, reducing the number of tracks you can play at once. Likewise, a UPS
is a wise investment that avoids losing changes made to the project you’re currently working on, or possibly losing all of the data on the hard
drive if the power goes out while the drive is being written to.
This chapter also explains a bit about computer and website programming, just to show that they’re not as mysterious and complicated as
many believe. Although modern web design software makes it easy to create web pages without learning to write HTML code, understanding the
basics is useful—and even fun!
Chapter 3
Hearing, Perception, and Artifact Audibility
“I wasn’t present in 1951 when the Pultec equalizer was designed, but I suspect the engineers were aiming for a circuit that affects the audio as little as possible beyond the
response changes being asked of it. I’m quite sure they were not aiming for a ‘vintage’ sound. The desire for ‘warmth’ and a ‘tube sound’ came many years later, as a new
generation of engineers tried to understand why some old-school recordings sound so good. Failing to understand the importance of good mic technique in a good-sounding
room coupled with good engineering, they assumed (wrongly IMO) that it must be the gear that was used. Personally, I want everything in my recording chain to be absolutely
clean. If I decide I want the sound of tubes, I’ll add that as an effect later.”
—Ethan, posting in an audio forum
“I agree with this in every respect.”
—George Massenburg, famous recording engineer, and designer of the first parametric equalizer, response to Ethan’s comment
Chapter 2 listed a number of common audio myths, such as special speaker wire that claims to sound better than common wire having very
similar electrical properties. It’s also a myth that vibration isolation devices placed under solid state electronics or wires improve the sound by
avoiding resonance because those components are mostly immune to vibration. The same goes for too small acoustic treatments that supposedly
improve clarity and stereo imaging but can’t work simply because they don’t cover enough surface area to affect a room’s frequency response or
decay time. Indeed, if you visit audiophile web forums, you’ll see posters who claim all sorts of improvements to their audio systems after
applying various “tweak” products or procedures. Some of these tweaks are like folk medicine, such as taping a small piece of quartz crystal to
the top of a speaker cabinet, though others are sold as commercial products. Besides fancy wire and isolation devices, “power” products claiming
to cleanse the AC mains feeding your audio devices are also popular. Another tweak product is “mod” (modification) services, where sellers
replace existing resistors, capacitors, and integrated circuits with supposedly higher-quality components. Others cryogenically treat (deep-freeze)
wires, fuses, circuit boards, and even entire amplifiers and power supplies for a fee.
It’s easy to prove through measuring and null tests whether sound passing through a CD player or other device changed after applying a tweak
or mod, or after being “broken in” for some period of time, as is often claimed. Yet even when a difference cannot be measured, or defies the
laws of physics, some people still insist they can hear an improvement. Beliefs, expectation bias, and the placebo effect are very strong. When
people argue about these things on the Internet, they’re called “religious” arguments, because opinions seem based more on faith than facts and
logic. I’ve even heard people argue against blind tests, claiming they stress the listener and “break the mood,” thus invalidating the results. Blind
testing is an important tool used by all branches of science, and it’s equally necessary for assessing audio equipment. As explained in Chapter 1,
even if blind listening did hide subtle differences that might be audible in special situations, is a difference that is so small that you can’t hear it
when switching between two sources really that important in the grand scheme of things?
This chapter uses logic and audio examples to explain what types of quality changes, and added artifacts, can be heard and at what volume
levels. It also addresses the fallibility of human hearing and perception, which are closely related. Before we get to the frailty of hearing, let’s
first examine the thresholds for audibility. If a musical overtone or background sound is so soft that you can just barely hear it, you might think
you hear it when it’s not present, and vice versa. So it’s important to learn at what levels we can hear various sounds in the presence of other
sounds to help distinguish what’s real from what’s imagined.
Some people are more sensitive to soft sounds and subtle musical details than others. One reason is the high-frequency hearing loss that
occurs with age. Most teenagers can hear frequencies up to 18 KHz or higher, but once we reach middle age, it’s common not to hear much
higher than 14 KHz. So if an audio circuit has a small but real loss at the very highest audio frequencies, some people will notice the loss, while
others won’t. Likewise, distortion or artifacts containing only very high frequencies may be audible to some listeners but not others. Further, for
frequencies we can hear, learning to identify subtle detail can be improved through ear training. This varies from person to person—not only
due to physical attributes such as age, but also with hearing acuity that improves with practice.
Even though I had been a musician for more than 30 years before I started playing the cello, after practicing for a few months, I realized that
my sense of fine pitch discrimination had improved noticeably. I could tell when music was out of tune by a very small amount, whether
hearing myself play or someone else. We can also learn to identify artifacts, such as the swishy, swirly sound of lossy MP3 compression and
digital noise reduction. It helps to hear an extreme case first, such as orchestral music at a low bit-rate like 32 kilobits per second. Then, once
you know what to listen for, you’re able to pick out that artifact at much lower levels.
I created two examples of low bit-rate encoding to show the effect: cymbal.wav is a mono file containing a cymbal strike, and
cymbal_compressed.mp3 is the same file after applying lossy MP3 compression at a very low bit-rate. music.wav and music_compressed.mp3 are
similar, but they play music instead of just one cymbal. Encoding music at low bit-rates also discards the highest frequencies, so you’ll notice that
the compressed MP3 files are not as bright sounding as the originals. But you’ll still hear the hollow effect clearly, as various midrange
frequencies are removed aggressively by the encoding process. Note that the compression process for encoding MP3 files is totally different from
compression used to even out volume changes. Only the names are the same.
It’s impossible for me to tell someone else what they can and cannot hear, so I won’t even try. At the time of this writing I’m 63 years old, and
I can hear well up to about 14 KHz. I have two different audio systems: one based around modest but professional grade gear in my large home
studio and one a 5.1 surround system in my living room home theater. I consider both systems to be very high quality, and both rooms are very
well treated acoustically, but I don’t own the most expensive gear in the world either. So when discussing audibility issues in audio forums, it’s
common for someone to claim, “Your system is not revealing enough, old man, so of course you can’t hear the difference.” Therefore, the best
way to show what does and does not affect audio fidelity is with examples you’ll listen to on your own system. You can play the sound clips
whenever you want, as often as you want, and never feel pressured to “perform” in front of someone else. Of course, for the A/B-type
comparison clips, you must be honest with yourself because you know which version is playing.
All of the audibility examples for this chapter were created entirely in software to avoid passing the audio signals through electronics that
could potentially mask (or add) subtle sounds. Obviously, the music used for these examples was recorded and sent through microphones and
preamps and other electronic devices. But for assessing the audibility of changes and additions to the sound, all the processing was done using
high-resolution software that’s cleaner and more accurate than any audio hardware.
Fletcher-Munson and the Masking Effect
The masking effect influences the audibility of artifacts. Masking is an important principle because it affects how well we can hear one sound in
the presence of another sound. If you’re standing next to a jackhammer, you won’t hear someone talking softly ten feet away. Masking is
strongest when the loud and soft sounds have similar frequency ranges. So when playing an old Led Zeppelin cassette, you might hear the tape
hiss during a bass solo but not when the cymbals are prominent. Likewise, you’ll easily hear low-frequency AC power line hum when only a
tambourine is playing, but maybe not during a bass or timpani solo.
Low-frequency hum in an audio system is the same volume whether the music is playing or not. So when you stop the CD, you can more
easily hear the hum because the music no longer masks the sound. Some artifacts like tape modulation noise and digital jitter occur only while
the music plays. So unless they’re fairly loud, they won’t be audible at all. Note that masking affects our ears only. Spectrum analyzers and other
test gear can easily identify any frequency in the presence of any other frequency, even when one is 100 dB or more lower in level than the
other. In fact, this is the basis for lossy MP3-type compression, where musical data that are deemed inaudible due to masking are removed,
reducing the file size.
When I first became interested in learning at what level distortion and other unwanted sounds are audible, I devised some experiments that
evolved into the example clips that accompany this book. For one test I created a 100 Hz sine wave in Sony Sound Forge, then mixed in a 3 KHz
tone at various levels below the 100 Hz tone. I picked those two frequencies because they’re very far apart and so minimize masking. Research
performed by Fletcher-Munson shows that our hearing is most sensitive around 2 to 3 KHz. So using these two frequencies biases the test in favor
of being able to hear soft artifacts. Further, I inserted the 3 KHz tone as a series of pulses that turn on and off once per second, making it even
easier to spot.
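A test signal of this kind can be sketched in a few lines of Python. This is only an illustration and not the procedure used in Sound Forge: the 44.1 KHz sample rate, the 0.5 amplitude, the half-second on and half-second off gating, and the function names are all assumptions.

```python
import math

SAMPLE_RATE = 44100  # assumed CD-quality sample rate


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def two_tone_test(seconds, probe_db, low_hz=100.0, probe_hz=3000.0):
    """Return samples of a steady low tone plus a pulsed probe tone.

    probe_db is the probe level relative to the low tone (for example -60).
    The probe is on for the first half of every second and off for the rest,
    which is an assumed duty cycle.
    """
    low_amp = 0.5  # leave headroom below full scale
    probe_amp = low_amp * db_to_gain(probe_db)
    samples = []
    for n in range(int(seconds * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        gate = 1.0 if (t % 1.0) < 0.5 else 0.0  # turns on and off once per second
        low = low_amp * math.sin(2.0 * math.pi * low_hz * t)
        probe = gate * probe_amp * math.sin(2.0 * math.pi * probe_hz * t)
        samples.append(low + probe)
    return samples
```

Calling two_tone_test(10, -60) gives ten seconds with the 3 KHz probe 60 dB below the 100 Hz tone. The hard on/off gate can click slightly, so a short fade at each edge would be a sensible refinement.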
Figure 3.1 shows the Equal Loudness curves of hearing sensitivity versus frequency as determined by Harvey Fletcher and W. A. Munson in the
1930s. More recent research has produced similar results. You can see that the response of our ears becomes closer to flat at louder volumes,
which is why music sounds fuller and brighter when played more loudly. But even at high volume levels, our hearing favors treble frequencies.
Note that this graph shows how loud sounds must be at various frequencies to be perceived as the same volume. So frequencies where our ears
are more sensitive show as lower SPLs. In other words, when compared to a 4 KHz tone at 43 dB SPL, a 30 Hz tone must be at 83 dB SPL to
sound the same volume. This is why many consumer receivers include a loudness switch, which automatically boosts the bass at low volumes.
Some receivers also boost the treble to a lesser extent.
Figure 3.1:
The Fletcher-Munson Equal Loudness Curves show the volume that different frequencies must be to sound equally loud.
Graph reproduced by permission of Eddy Brixen.
For this test I played the combined tones quite loudly, listening through both loudspeakers and earphones. When the 3 KHz tone was 40 dB
softer than the 100 Hz tone, it was easy to hear it pulse on and off. At −60 dB it was very soft, but I could still hear it. At −80 dB I was unable
to hear the tone at all. Even if someone can just barely pick out artifacts at such low levels, it’s difficult to argue that distortion this soft is
audibly damaging or destroys the listening experience. But again, the accompanying example files let you determine your own audibility
thresholds through any system and loudspeakers you choose. The four files for this example are named “100hz_and_3khz_at_−40.wav” through
Note that when running tests that play high-frequency sine waves through loudspeakers, it helps to move your head slightly while you listen.
This avoids missing a high-frequency tone that’s present and audible but in an acoustic null. Even when a room has acoustic treatment, deep
nulls are often present at high frequencies every few inches, especially when playing the same mono source through two loudspeakers at once.
You can hear this easily by playing a 3 KHz tone by itself, then moving your head a few inches in any direction. That file is named
Distortion and Noise
Everyone reading this book understands what distortion is and why minimizing distortion is an important goal for high fidelity, even if it’s
sometimes useful as an effect. But there are several types of distortion and many causes of distortion. Further, audio gear can create audible
artifacts that are not distortion. For example, there’s hiss, hum and buzz, ringing in crossover filters, and digital aliasing. The rattle of a buzzing
window or rack cabinet at high playback volumes is also an audible artifact, though it occurs outside of the equipment. Indeed, nothing is more
revealing of rattles in a room than playing a loud sine wave sweep starting at a very low frequency.
Chapters 1 and 2 explained that distortion is the addition of new frequency components not present in the source. Distortion is usually
specified as a percentage, but it can also be expressed as some number of dB below the original sound. The relationship between dB and percent
is very simple, where each factor of 10 changes the volume level by 20 dB:
10 percent distortion = −20 dB
1 percent distortion = −40 dB
0.1 percent distortion = −60 dB
0.01 percent distortion = −80 dB
0.001 percent distortion = −100 dB
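The conversion behind this table is 20 times the base-10 logarithm of the amplitude ratio. A minimal sketch in Python, with hypothetical function names, reproduces the values listed:

```python
import math


def percent_to_db(percent):
    """Convert distortion as a percentage of the signal to dB below it."""
    return 20.0 * math.log10(percent / 100.0)


def db_to_percent(db):
    """Convert a level in dB relative to the signal to a percentage."""
    return 100.0 * 10.0 ** (db / 20.0)


for pct in (10, 1, 0.1, 0.01, 0.001):
    print(f"{pct:g} percent distortion = {percent_to_db(pct):.0f} dB")
```

The same functions handle values between the decades, such as percent_to_db(0.5) for half a percent.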
Many things create distortion: audio transformers whose distortion rises at high signal levels, improper gain-staging in a mixer, incorrectly
biased vacuum tubes, slew rate limiting in op-amps, and, of course, analog tape, especially when recording at levels above 0 VU. As was
explained, intermodulation distortion is more damaging than harmonic distortion because it creates new frequencies that are not necessarily
related musically to the source frequencies. Digital aliasing—which is rarely if ever audible with modern converters—is similar to IMD because it
too creates new frequencies not musically related to the source. Therefore, IMD and aliasing can usually be heard at lower levels than harmonic
distortion simply because the artifact frequencies do not blend as well with the source frequencies.
Many types of distortion and noise can be added by audio gear, and all four of the audio parameters listed in Chapter 2 are important. But
what matters most is their magnitude, because that alone determines how audible they are. If the sum of all distortion is 80 dB or more below
the music, it’s unlikely to be heard while the music plays. Further, some types of distortion are masked more by the music. For example, let’s say
the third harmonic of a low note played on a Fender bass is 10 dB softer than the fundamental. An amplifier that adds 0.1 percent third
harmonic distortion will increase the note’s own harmonic content by a very small amount. Since 0.1 percent is the same as −60 dB, the 0.03 dB
increase after adding distortion at −60 dB to a natural harmonic at −10 dB is so small it can’t possibly be heard. Not to mention that most
loudspeakers add 10 to 100 times more distortion than any competent amplifier.
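The 0.03 dB figure can be checked with a little arithmetic. The sketch below assumes the worst case, where the added distortion is exactly in phase with the natural harmonic so the two amplitudes add directly. The function names are hypothetical.

```python
import math


def db_to_amp(db):
    """Convert dB relative to the fundamental into a linear amplitude."""
    return 10.0 ** (db / 20.0)


def harmonic_increase_db(natural_db, added_db):
    """Rise in a harmonic's level when in-phase distortion is added to it.

    Both arguments are levels relative to the fundamental, for example
    -10 for the natural harmonic and -60 for the added distortion.
    """
    combined = db_to_amp(natural_db) + db_to_amp(added_db)  # worst case: in phase
    return 20.0 * math.log10(combined) - natural_db


print(f"{harmonic_increase_db(-10.0, -60.0):.3f} dB")
```

If the added component were not in phase with the natural harmonic, the increase would be smaller still.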
Some people insist that because of masking, the amount of harmonic distortion is irrelevant, and all that matters is the nature of the distortion.
If an amplifier adds 0.1 percent distortion at the 3rd harmonic, the distortion is not only much softer than the original sound, but its frequency is
also 1.5 octaves away. Some types of distortion are more trebly, adding a “buzzy” sound quality whose components are even farther away from
the source frequencies. So with trebly distortion added to an A bass note at 110 Hz, the harmonics many octaves away will be more audible than
harmonics nearer to the fundamental. But again, once artifacts are −80 dB or softer, their spectrum doesn’t matter simply because they’re too
soft to hear.
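The distance between a fundamental and its harmonics, measured in octaves, is the base-2 logarithm of the harmonic number. The following sketch, with a hypothetical function name, lists the harmonics of the 110 Hz A note and shows that the 3rd harmonic sits roughly 1.5 octaves up:

```python
import math


def harmonic_table(fundamental_hz, count):
    """Return (harmonic number, frequency in Hz, octaves above fundamental)."""
    return [(n, n * fundamental_hz, math.log2(n)) for n in range(2, count + 1)]


for n, hz, octaves in harmonic_table(110.0, 10):
    print(f"harmonic {n:2d}: {hz:6.0f} Hz, {octaves:.2f} octaves above 110 Hz")
```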
“It’s amazing to me that nobody ever complained about analog recording like they do about digital recording. I’m doing a project right now completely in Pro Tools 24
bit/48 kHz. The musicians were great, everything sounds great, so that’s all I care about. The TDM buss doesn’t sound thin to me. What is a thin TDM buss supposed to sound
like? I’ve done a dozen albums completely in Pro Tools, including three Grammy-winning Bela Fleck albums.”
—Roger Nichols, famous recording engineer and an early proponent of digital recording
One artifact that’s often cited as detrimental to audio clarity is jitter, a timing error specific to digital audio. The sample rate for CD-quality
audio is 44.1 KHz, which means that 44,100 times per second a new data sample is either recorded or played back by the sound card. If the
timing between samples varies, the result is added noise or artifacts similar to noise or IM distortion. The more the timing between samples
deviates, the louder these artifacts will be. Modern sound cards use a highly stable quartz crystal to control the flow of input and output data. So
even if the sample rate is not precisely 44.1 KHz, it is very close and also varies very little from one sample to the next.
Note that there are two types of frequency deviation. One is simply a change in frequency, where the sample rate might be 44,102 Hz instead
of precisely 44,100 Hz. There’s also clock drift, which is a slow deviation in clock frequency over long periods of time. This is different from
jitter, which is a difference in timing from one sample to the next. So the time between one pair of samples might be 1/44,100 of a second, but
the next sample is sent a very slightly different interval later. Therefore, jitter artifacts are caused by a change in timing between adjacent samples.
With modern digital devices, jitter is typically 100 dB or more below the music, even for inexpensive consumer-grade gear. In my experience
that’s too soft to be audible. Indeed, this is softer than the noise floor of a CD. Even though jitter is a timing issue, it manifests either as noise or
as FM side bands added to the music, similar to IM distortion. Depending on the cause and nature of the jitter, the side bands may be
harmonically related or unrelated to the music. The spectral content can also vary. All of these could affect how audible the jitter will be when
music is playing because of the masking effect.
In fairness, some people believe there’s more to jitter than just added artifacts, such as a narrowing of the stereo image, though I’ve never seen
compelling proof. Similarly, an artifact called truncation distortion occurs when reducing 24-bit audio files to 16 bits if dither is not applied, and
some people believe this too can affect things like fullness and stereo imaging. However, fullness is a change in frequency response that’s easily
verified. And good imaging, in my opinion, is related more to room acoustics and avoiding early reflections than low-level distortion or
microscopic timing errors.
One obstacle to devising a meaningful test of the audibility of jitter and some other artifacts is creating them artificially in controlled amounts.
Real jitter occurs at extremely high frequencies. For example, one nanosecond of jitter equates to a frequency of 1 GHz. Yes, GHz—that is not a
typo. The same goes for distortion, which adds new frequencies not present in the original material. It’s easy to generate a controlled amount of
distortion with one sine wave but impossible with real music that contains many different frequencies at constantly changing volume levels.
Audibility Testing
Since many people don’t have the tools needed to prepare a proper test, I created a series of CD-quality Wave files to demonstrate the audibility
of artifacts at different levels below the music. Rather than try to artificially generate jitter and the many different types of distortion, I created a
nasty-sounding treble-heavy noise and added that at various levels equally to both the left and right channels. The spectrum of the noise is shown
in Figure 3.2. Since this noise has a lot of treble content at frequencies where our ears are most sensitive, this biases the test in favor of those
who believe very soft artifacts such as jitter are audible. This noise should be at least as noticeable as distortion or jitter that occurs naturally, if
not more audible. So if you play the example file containing noise at −70 dB and can’t hear the noise, it’s unlikely that naturally occurring jitter
at the same volume or softer will be audible to you.
Figure 3.2:
This noise signal has a lot of energy at frequencies where our hearing is most sensitive.
To make the noise even more obvious (again favoring those who believe very soft artifacts matter), the noise pulses on and off rather than
remaining steady throughout the music. In all of the example files, the noise pulse is about 3/4 second long and restarts every 2 seconds. The first
pulse starts 2 seconds into each file and lasts for 3/4 second. The next pulse starts 4 seconds in, and so forth.
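The pulse timing described above can be expressed as a simple gate. This is a sketch only: the function names are hypothetical, and the example files were not necessarily built this way.

```python
def noise_gate(t, first_start=2.0, period=2.0, length=0.75):
    """Return True when the noise burst is on at time t (in seconds).

    Bursts start at 2 s, 4 s, 6 s, and so on, and each lasts 3/4 second.
    """
    if t < first_start:
        return False
    return ((t - first_start) % period) < length


def mix_sample(music, noise, t, noise_db):
    """Add one gated noise sample to one music sample.

    noise_db is the noise level in dB relative to full scale, for example -40.
    """
    gain = 10.0 ** (noise_db / 20.0) if noise_gate(t) else 0.0
    return music + gain * noise
```

For a whole file, mix_sample would be applied to every sample with t equal to the sample index divided by the sample rate.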
• The “noise.wav” file is the noise burst by itself, so you can hear it in isolation and know what to listen for when the music is playing. The level
is at −20 dB rather than 0 because it sounds really irritating. I don’t want you to lunge for the volume control when you play it at a normal
volume level!
• The “concerto–40.wav” file is a gentle passage from my cello concerto with the noise mixed in at −40 dB. Since this passage is very soft,
mostly around −25 and peaking at −15 dB, the noise is only 15 to 25 dB below the music. Everyone will easily hear where the noise starts
and stops.
• The files “concerto–50.wav,” “concerto–60.wav,” and “concerto–70.wav” are similar, with the noise mixed in at −50, −60, and −70 dB,
respectively. In the −70 dB version the noise is 45 to 55 dB below the music. Note that a slight noise occurs naturally in this piece at around
8 seconds in. I believe it’s the sound of a musician turning a page of music during the recording. The noise is in the original recording, and at
this low volume it just happens to sound like my intentional noise.
• The file “men_at_work_1–40.wav” is a section from one of my pop tunes, Men At Work, with the noise mixed in at −40 dB. I had planned to
create other versions with the noise at ever-softer levels as above, but it’s barely audible (if at all) even at this relatively high level, so I didn’t
• The “men_at_work_2–40.wav” file is a different section from the same pop tune that’s more gentle sounding, which potentially makes the noise
at −40 dB a little easier to notice.
It’s worth mentioning one situation where jitter can be so severe that it really does border on being audible. In February 2009, the British
magazine Hi-Fi News published an article showing that audio from HDMI connections often has much higher jitter levels than when output via
S/PDIF. HDMI is a protocol used by digital televisions, consumer receivers, and Blu-ray disk players. It’s popular with content providers because
it supports copy protection, and it’s convenient for consumers because it carries both audio and video over a single wire. Modern receivers that
handle HDMI can switch both the audio and video automatically when you choose to watch TV or a DVD.
The magazine’s report measured the jitter from four different receiver models, and in every case the jitter was worse through the HDMI
connection. The best receiver tested was a Pioneer, with 37 picoseconds (ps) of jitter from its S/PDIF output and 50 ps through HDMI. But the
HDMI audio from all the other receivers was far worse than from their S/PDIF outputs, ranging from 3,770 ps to 7,660 ps. Compare that to 183
to 560 ps for the same receivers when using S/PDIF, which is certainly too soft to hear. To relate jitter timing with audibility, 10,000
picoseconds (10 nanoseconds) is about equal to the −96 dB noise floor of 16-bit audio at 1 KHz. But the noise produced at 10 KHz from that
same amount of jitter is about −78 dB, which is potentially audible in certain situations. Again, this higher-than-usual amount of jitter applies
only to HDMI audio as used by consumer equipment at the time of this writing. It is not typical for computer sound cards and outboard
Some people believe that correlated artifacts such as added harmonics or IM products are more audible than uncorrelated artifacts like
random noise. Jitter can be either correlated or not, depending on its cause. But if the jitter noise is more than 100 dB below the music, which is
always the case except for HDMI audio, it’s unlikely to be audible regardless of its spectrum or correlation to the music.
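One way to relate a jitter amount to an artifact level is the standard small-angle approximation for phase modulation. The sketch below assumes sinusoidal jitter specified as a peak-to-peak time and estimates the level of each resulting sideband relative to the signal. This model is an assumption, since the measurement method behind the figures quoted above is not given here, but for 10 nanoseconds it lands close to the −96 dB figure at 1 KHz and within a few dB of the 10 KHz figure.

```python
import math


def jitter_sideband_db(signal_hz, jitter_pp_seconds):
    """Approximate level of each jitter sideband in dB relative to the signal.

    Assumes sinusoidal jitter with the given peak-to-peak amplitude and a
    small modulation index, so each sideband is about half the index.
    """
    peak_jitter = jitter_pp_seconds / 2.0
    beta = 2.0 * math.pi * signal_hz * peak_jitter  # phase deviation in radians
    return 20.0 * math.log10(beta / 2.0)


for hz in (1000.0, 10000.0):
    print(f"{hz:.0f} Hz with 10 ns jitter: {jitter_sideband_db(hz, 10e-9):.1f} dB")
```

Under this model the artifact level rises 20 dB for every tenfold increase in signal frequency, which is why the same jitter matters more at 10 KHz than at 1 KHz.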
It’s clear to me that the burden of proof is on those who believe jitter is audible. This subject has been discussed endlessly in audio forums,
and someone will inevitably insist audibility tests such as those presented here are not conclusive. So I always ask them to make their own
example files using any musical source and any distortion or other soft artifact they believe best makes their case and then post it for everyone to
hear. Nobody has ever risen to the challenge.
Dither and Truncation Distortion
Conventional audio wisdom says that dither is required to eliminate truncation distortion whenever you reduce the bit-depth of an audio file,
and it’s most often used when reducing a 24-bit mix file to 16 bits for putting onto a CD. Dither is a very soft noise whose level is at the lowest
bit—around −90 dB when reducing digital audio to 16 bits. Most people would have a hard time hearing noise that’s 60 dB below the music,
since the music masks the noise. Yet if you ask a dozen audio recording engineers if dither is necessary when going from 24 to 16 bits, every one
of them will say yes. Even the manual for Sony Sound Forge claims dither is important: “If you want to burn a 24-bit audio file to an audio CD,
dithering will produce a cleaner signal than a simple bit-depth conversion.”
Some engineers even argue over which type of dither is best, claiming this algorithm is more airy or full sounding than that one, and so forth.
But just because everyone believes this, does that make it true? To be clear, using dither is never a bad thing, and it can reduce distortion on soft
material recorded at very low levels. So I never argue against using dither! But I’ve never heard dither make any difference when applied to
typical pop music recorded at sensible levels. Not using dither is never the reason an amateur’s mixes sound bad.
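The difference between truncating and dithering can be sketched on individual samples. The code below is a minimal illustration using plain triangular (TPDF) dither with no noise shaping, which is simpler than the dither options found in commercial editors. The function names are hypothetical.

```python
import random


def truncate_24_to_16(sample24):
    """Reduce a signed 24-bit sample to 16 bits by dropping the low 8 bits."""
    return sample24 >> 8  # arithmetic shift, so negative values round down


def dither_24_to_16(sample24, rng=random):
    """Reduce a signed 24-bit sample to 16 bits using triangular dither.

    The dither spans plus or minus one 16-bit step and is added before
    rounding, so low-level detail is turned into noise instead of distortion.
    """
    lsb = 256  # one 16-bit step expressed in 24-bit units
    noise = (rng.random() - rng.random()) * lsb  # triangular distribution
    value = round((sample24 + noise) / lsb)
    return max(-32768, min(32767, value))  # clamp to the 16-bit range
```

A 24-bit value of 128 is half of one 16-bit step. Truncation always returns 0 for it, while the dithered results average out to about 0.5 over many samples, which is how dither preserves information below the lowest bit.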
To put this to the test, I created a set of eight files containing both truncated and dithered versions of the same sections of my pop tune
Lullaby. These are the exact steps I followed to create the files named “lullaby_a.wav” through “lullaby_h.wav”: I started by rendering Lullaby
from SONAR at 24 bits, then extracted four short sections. I alternately dithered and truncated each section down to 16 bits and renamed the files
to hide their identity. So “lullaby a/b” are the same part of the tune, with one dithered and the other truncated. The same was done for the file
pairs “c/d,” “e/f,” and “g/h.” Your mission is to identify which file in each pair is dithered and which is truncated. The dithering was done in
Sony Sound Forge using high-pass triangular dither with high-pass contour noise shaping. You’re welcome to send me your guesses using the
email link on my home page. When I have enough reader submissions, I’ll post the results on my website.
Hearing Below the Noise Floor
It’s well known that we can hear music and speech in the presence of noise, even if the noise is louder than the source. I’ve seen estimates that
we can hear music or speech when it’s 10 dB below the noise floor, which in my experience seems about right. Of course, the spectral content of
the noise and program content affects how much the noise masks the program, and disparate frequencies will mask less. So I was surprised
when a famous audio design engineer claimed in an audio forum that he can hear artifacts 40 dB below the noise floor of analog tape while
music plays. To test this for myself—and for you—I created this series of CD-quality files you can play through your own system:
• The file “tones_and_noise.wav” contains pink noise and a pair of test tones mixed at equal levels. The test tones portion contains both 100 Hz
and 3 KHz at the same time to be more obvious to hear when they turn on and off.
• The “tones–10.wav” file is the same pink noise and test tones, but with the tones 10 dB below the noise. It’s still easy to hear where the tones
start and stop in this file.
• The “tones–20.wav” and “tones–30.wav” files are similar, but with the tones 20 and 30 dB below the noise, respectively. I can just barely hear
the tones in the −20 file, and it seems unlikely anyone could hear them in the −30 version.
• For a more realistic test using typical program material, I also created files using speech and pop music mixed at low levels under pink noise.
The file “speech_and_noise.wav” contains these two signals at equal levels, and it’s easy to understand what is said.
• The “speech–10.wav” and “speech–20.wav” files are similar, but with the speech 10 and 20 dB below the noise, respectively. When the speech
is 10 dB below the noise, you can hear that someone is talking, but it’s difficult to pick out what’s being said. When the speech is 20 dB lower
than the noise, it’s all but inaudible.
• The last two files mix pop music softly under the noise. The files “music_10db_below_noise.wav” and “music_20db_below_noise.wav” are
self-descriptive. It’s not difficult to hear that music is playing in the −10 dB version, but can you hear it when it’s 20 dB below the noise floor?
I’m confident that these test files bust the myth that anyone can hear music or speech that’s 40 dB below a typical noise floor. However, there
are many types of noise. The audibility of program content softer than noise depends directly on the frequencies present in the noise versus the
frequencies present in the program. Noise containing mostly high frequencies will not mask low-frequency sounds, and vice versa.
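Placing a signal a given number of dB below noise comes down to matching RMS levels and then applying a gain. The sketch below shows the idea with hypothetical function names. Note that it uses white Gaussian noise as a stand-in, because generating true pink noise takes more code, so it does not reproduce the spectrum of the example files.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_below(noise, program, db_below):
    """Return the program scaled so its RMS is db_below dB under the noise RMS."""
    target = rms(noise) * 10.0 ** (-db_below / 20.0)
    gain = target / rms(program)
    return [gain * p for p in program]


def level_db(a, b):
    """Level of signal a relative to signal b, in dB, based on RMS."""
    return 20.0 * math.log10(rms(a) / rms(b))


fs = 44100
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.1) for _ in range(fs)]  # white noise stand-in
tones = [math.sin(2 * math.pi * 100 * n / fs) + math.sin(2 * math.pi * 3000 * n / fs)
         for n in range(fs)]
quiet = scale_below(noise, tones, 20.0)
mix = [n + t for n, t in zip(noise, quiet)]
print(f"tones are {level_db(quiet, noise):.1f} dB relative to the noise")
```

RMS is only one way to compare levels. Peak levels or a weighted measurement would give somewhat different numbers for the same files.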
Frequency Response Changes
Chapter 1 explained that applying a broad low-Q boost or cut with an equalizer is more audible than a narrow boost or cut simply because a
low Q affects more total frequencies. Some people believe that very narrow peaks and nulls are not audibly damaging, especially nulls. While a
narrow bandwidth affects a smaller range of frequencies, and thus less overall energy than a wide bandwidth, EQ changes using very narrow
bandwidths can still be audible. What matters is if the frequencies being boosted or cut align with frequencies present in the program. The
graphs in Figure 3.3 show an equalizer set to cut 165 Hz by 10 dB with a Q of 2 and 24.
Figure 3.3:
Cutting frequencies with an equalizer having a low Q (top) affects more total sound energy than the same amount of cut with a high Q (bottom).
I believe the notion that narrow EQ cuts are not damaging arises from a 1981 paper1 by Roland Bücklein in the Journal of the Audio
Engineering Society, describing his tests of boost and cut audibility at various bandwidths. Some of the tests used speech and white noise, while
others used music. White noise contains all frequencies in equal amounts, so a wide bandwidth boost adds more energy than a narrow boost, and
it is more audible. The same is true for broad cuts that reduce more content than narrow cuts.
But for the music tests, the frequencies boosted and cut in Mr. Bücklein’s experiments did not align with the frequencies in the music being
played. Instead, he used the standard third-octave frequencies listed in Table 1.3 from Chapter 1, and none of those align exactly with any of the
standard music note frequencies in Table 1.2. Music consists mostly of single tones and harmonics that are also single tones, so the correlation
between the frequencies changed with EQ and the frequencies present in the music is very important. An extremely narrow bandwidth may miss
a particular frequency of interest, but a boost or cut of all the frequencies in a third-octave band is bound to change the music audibly. If a mix
engineer needs to reduce the level of a particular frequency—for example, a single bass note that’s 10 dB louder than other notes—he’d use a
parametric equalizer with a high Q to zero in on that frequency to avoid affecting other nearby notes.
To illustrate the potential audibility of very narrow boosts and cuts, I created a series of three Wave files. The first clip, “men_at_work.wav,” is
an excerpt of the tune as I mixed it, but it is reduced in volume to allow adding EQ boost without distorting. The second,
“men_at_work_boost.wav,” is the same clip with 10 dB of very narrow EQ boost (Q=24) applied at 165 Hz. The third,
“men_at_work_cut.wav,” is the original clip but with a 10 dB very narrow cut (Q=24) at 165 Hz. I chose 165 Hz because that’s an E note, which
is the key of the tune. So in these examples the narrow boost and cut are very obvious because they correspond to notes the bass plays.
This is directly related to the folly of expecting EQ to improve room acoustics at low frequencies. One problem with using EQ for acoustics is
it’s not possible to counter deep nulls. Nulls of 20 to 30 dB, or even deeper, are common, and you’ll blow up your speakers trying to raise such
nulls enough to achieve a flat response. I’ve seen EQ proponents claim that nulls are not a problem because they’re so narrow, and they often
cite the same Bücklein article! However, the frequency response in a room can change drastically over very small distances, even at low
frequencies. Therefore, a deep null at one ear may be less deep at the other ear, so the total volume heard through both ears is not reduced as
much as at only one ear. But not all nulls are so highly localized. Hopefully, these example files show clearly that even very narrow nulls can be
damaging when they align with notes in the music.
Even though very few people can hear frequencies above 20 KHz, many believe it’s important for audio equipment to reproduce frequencies
even higher than that to maintain clarity. I’ve never seen compelling evidence that a frequency response beyond what humans can hear is
audible or useful. It’s true that good amplifier designs generally have a frequency response that extends well beyond the limits of hearing, and
the lack of an extended response can be a giveaway that an amplifier is deficient in other areas. If for no other reason, though there certainly are
other reasons, an amplifier’s effective cutoff frequency (defined as the point where its output has dropped by 3 dB) must be high enough that
the loss at 20 KHz is well under 1 dB. So it’s common for the −3 dB point of good-quality amplifiers to be 50 KHz or even higher.
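As a rough check on these numbers, the sketch below assumes the high-frequency rolloff behaves like a single-pole (6 dB per octave) low-pass filter. That is an assumption made for illustration, not a description of any particular amplifier, and the function name is invented here.

```python
import math

def loss_db(freq_hz, cutoff_hz):
    """Attenuation in dB of a single-pole low-pass filter at freq_hz.

    cutoff_hz is the -3 dB point. Real amplifiers may roll off more steeply.
    """
    return 10.0 * math.log10(1.0 + (freq_hz / cutoff_hz) ** 2)

for cutoff in (20_000, 50_000, 100_000):
    print(f"-3 dB point at {cutoff} Hz: {loss_db(20_000, cutoff):.2f} dB down at 20,000 Hz")
```

With this model, a cutoff at exactly 20 KHz costs the full 3 dB at the top of the audible range, while moving the cutoff to 50 KHz brings the loss at 20 KHz under 1 dB.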
With microphones and speakers, their cutoff frequency can be accompanied by a resonant peak, which can add ringing as well as a level boost
at that frequency. Therefore, designing a transducer to respond beyond 20 KHz is useful because it pushes any inherent resonance past audibility.
This is one important feature of condenser microphones that use a tiny (less than ½-inch) diaphragm designed for acoustic measuring. By
pushing the microphone’s self-resonance to 25 KHz or even higher, its response can be very flat with no ringing in the audible range below
20 KHz.
It’s easy to determine, once and for all, if a response beyond 20 KHz is noticeable to you. All you need is a sweepable low-pass filter. You
start with the filter set to well beyond 20 KHz, play the source material of your choice, and sweep the filter downward until you can hear a
change. Then read the frequency noted on the filter’s dial. I’ve used a set of keys jingling in front of a high-quality, small-diaphragm condenser
mike, but a percussion instrument with extended high-frequency content such as a tambourine works well, too. Most people don’t have access to
suitable audio test gear, but you can do this with common audio editing software. Record a source having content beyond 20 KHz using a sample
rate of 88.2 or 96 KHz, then sweep a filter plug-in as described above. I suggest you verify that ultrasonic frequencies are present using an FFT or
Real Time Analyzer plug-in to be sure your test is valid.
So you won’t have to do it, I recorded the file “tambourine.wav” at a sample rate of 96 KHz through a precision DPA microphone. As you can
see in Figure 3.4, this file contains energy beyond 35 KHz, so it’s a perfect source for such tests. It’s only 7 seconds long, so set it to loop
continuously in your audio editor program as you experiment with a plug-in EQ filter.
Figure 3.4:
This 96 KHz recording of a tambourine has content beyond 30 KHz, so it’s a great test signal for assessing your own high-frequency hearing.
Years ago there was a widely publicized anecdote describing one channel in a Neve console that was audibly different than other channels,
and the problem was traced to an oscillation at 54 KHz. I’m sure that channel sounded different, but it wasn’t because Rupert Neve or Beatles
engineer Geoff Emerick was hearing 54 KHz. When an audio circuit oscillates, it creates hiss and “spitty” sounds and IM distortion in the audible
range. So obviously that’s what Geoff heard, not the actual 54 KHz oscillation frequency. Further, no professional studio monitor speakers I’m
aware of can reproduce 54 KHz anyway.
There was also a study by Tsutomu Oohashi that’s often cited by audiophiles as proof that we can hear or otherwise perceive ultrasonic
content. The problem with this study is they used one loudspeaker to play many high-frequency components at once, so IM distortion in the
tweeters created difference frequencies within the audible range. When the Oohashi experiment was repeated by Shogo Kiryu and Kaoru
Ashihara using six separate speakers,2 none of the test subjects were able to distinguish the ultrasonic content. This is from their summary:
When the stimulus was divided into six bands of frequencies and presented through six loudspeakers in order to reduce intermodulation distortions, no subject could detect any
ultrasounds. It was concluded that addition of ultrasounds might affect sound impression by means of some nonlinear interaction that might occur in the loudspeakers.
I’ve also seen claims proving the audibility of ultrasonic content where a 15 KHz sine wave is played, then switched to a square wave.
Proponents believe that the quality change heard proves the audibility of ultrasonic frequencies. But this doesn’t take into account that
loudspeakers and power amplifiers can be nonlinear at those high frequencies, thereby affecting the audible spectrum. Further, most hardware
generators used to create test tones output a fixed peak level. When the peak (not average) levels are the same, a square wave has 2 dB more
energy at the fundamental frequency than a sine wave. So, of course, the waves could sound different.
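The 2 dB figure follows from the Fourier series of a square wave, whose fundamental has 4/π times the amplitude of a sine wave with the same peak level. As a sketch, the code below confirms this numerically by projecting one cycle of each waveform onto a sine at the fundamental frequency. The helper names are invented for this example.

```python
import math

def fundamental_amplitude(wave, steps=100_000):
    """Amplitude of the first harmonic of one cycle of wave(phase).

    Projects the waveform onto a sine at the fundamental frequency,
    sampling at the midpoint of each step.
    """
    total = 0.0
    for i in range(steps):
        phase = 2.0 * math.pi * (i + 0.5) / steps
        total += wave(phase) * math.sin(phase)
    return 2.0 * total / steps

def sine(phase):
    return math.sin(phase)

def square(phase):
    # Same peak level (1.0) as the sine wave above.
    return 1.0 if math.sin(phase) >= 0.0 else -1.0

ratio = fundamental_amplitude(square) / fundamental_amplitude(sine)
print(f"Square wave fundamental is {20.0 * math.log10(ratio):.2f} dB above the sine")
```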
Finally, it’s worth mentioning that few microphones, and even fewer loudspeakers, can handle frequencies much higher than 20 KHz. Aside from
tiny-diaphragm condenser microphones meant for acoustic testing, the response of most microphones and speakers is down several dB by
20 KHz if not lower.
Chapter 1 explained the concepts of resonance and ringing, which occur in both mechanical and electronic devices. Whenever a peaking boost is
added with an equalizer, some amount of ringing is also added. This sustains the boosted frequency after the original sound has stopped. The
higher the Q of the EQ boost, the longer the ringing occurs. However, the source must contain energy at the frequency being boosted in order for
ringing to be added. So depending on its loudness, ringing is another potentially audible artifact. Note that ringing is different from distortion
and noise that add new frequency components. Rather, ringing merely sustains existing frequencies.
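The link between Q and ringing time can be sketched with an idealized second-order resonator, whose ringing envelope decays as exp(−π·f·t/Q). This model, the function name, and the 60 dB threshold are all assumptions chosen for illustration. An actual equalizer's ringing also depends on its filter design and the amount of boost, so the absolute times are only indicative.

```python
import math

def ring_time_ms(center_hz, q, decay_db=60.0):
    """Time in milliseconds for an idealized resonance to decay by decay_db.

    Assumes the envelope falls as exp(-pi * center_hz * t / q). This is a
    simplified model, not the response of any specific equalizer.
    """
    seconds = decay_db * math.log(10.0) * q / (20.0 * math.pi * center_hz)
    return 1000.0 * seconds

for q in (6, 24):
    print(f"Q = {q:>2} at 300 Hz: about {ring_time_ms(300.0, q):.0f} ms to fall by 60 dB")
```

The useful takeaway is the proportionality: in this model, quadrupling the Q quadruples the time the boosted frequency keeps sounding.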
To illustrate the concept of EQ adding ringing, I created a Wave file containing a single impulse, then applied 18 dB of EQ boost at 300 Hz
with two different Q settings. A single impulse contains a wide range of frequencies whose amplitudes depend on the duration and rise times of
the impulse. Figure 3.5 shows an impulse about 3 milliseconds long that I drew manually into Sony Sound Forge using the pencil tool. I set the
time scale at the top of each screen to show seconds, so 0.010 on the timeline means that marker is at 10 milliseconds, and 0.500 is half a
second.
Figure 3.5:
This short impulse contains all the frequencies between DC and the 22 KHz limit of a 44.1 KHz sample rate.
I then copied the impulse twice, half a second apart, and applied EQ boost to each copy. If you play the audio file “impulse_ringing.wav,”
you’ll hear the sound switch from a click to a partial tone, then to a more sustained tone. Figure 3.6 shows the entire file in context, and Figures
3.7 and 3.8 show close-ups of the impulses after ringing was added, to see more clearly how they were extended. Although these examples show
how ringing is added to a single impulse, the same thing happens when EQ is applied to audio containing music or speech. Any frequency
present in the source that aligns with the frequency being boosted is affected the same way: amplified and also sustained.
Figure 3.6:
This wave file contains three brief impulses in a row. The second version has 18 dB of EQ boost applied at 300 Hz with a Q of 6, and the third applied the same boost
with a Q of 24, making it sustain longer.
Figure 3.7:
Zooming in to see the wave cycles more clearly shows that a Q of 6 brings out the 300 Hz component of the impulse and sustains it for about 40 milliseconds after the
impulse ends.
Figure 3.8:
When the Q is increased to 24, the impulse now continues to ring for about 130 milliseconds.
Earlier I explained that ringing doesn’t add new frequencies, and it can only sustain frequencies that already exist. In truth, ringing can add
new frequencies, which could be seen as a type of distortion or the addition of new content. This can occur in a room that has a strong resonance
at a frequency near, but not exactly at, a musical note frequency. If a room has a strong resonance at 107 Hz and you play an A bass note at
110 Hz, that note will sound flat in the room. How flat the note sounds depends on the strength and Q of the room’s resonance.
The same thing happens with EQ boost when the boosted frequency is near to a frequency present in the source. Figure 3.9 shows an FFT of
the example file “claves_original.wav.” The most prominent peak near the left of the screen corresponds to the primary pitch of 853 Hz for these
particular claves. For those not familiar with Latin percussion instruments, a pair of claves is shown in Figure 3.10.
Figure 3.9:
This is the spectrum of a pair of claves whose fundamental pitch is 853 Hz.
Figure 3.10:
Claves are a popular percussion instrument used in Salsa and other types of Latin music. When struck together, they produce a distinctive pitched sound that’s heard
easily, even over loud music.
After adding 18 dB of high-Q boost at 900 Hz, one musical half-step higher, the original 853 Hz is still present, but a new much louder
component is added at 900 Hz. This file is named “claves_boost.wav,” with its spectrum shown in Figure 3.11. Playing the two audio files side by
side, you can hear the prominent higher pitch in the EQ’d version. This boost also makes the tone of the claves sound more pure than the
original, with less of a wood click sound.
Figure 3.11:
After adding a high Q boost at a frequency higher than the natural pitch of the claves, a new tone is created at the frequency that was boosted.
The reason a high-Q boost can add a new frequency component is called sympathetic resonance, whereby one vibrating object causes another
nearby object to also vibrate. This is why playing low-frequency tones loudly in a room can make your windows rattle, though the trebly
buzzing sound you hear is merely a by-product. What’s really happening is the low-frequency tones vibrate walls that have a similar resonant
frequency, and that vibration then causes any loose window sills and frames to buzz.
Aliasing is a type of artifact specific to digital recording, and it’s similar to IM distortion because new sum and difference frequencies are created.
In this case, one of the source frequencies is the sample rate of the sound card. I haven’t heard this problem in many years, but it could happen
with old converters having poor input filters or that were made before the advent of oversampling. Converter filters and oversampling are
explained in Chapter 8, but it’s worth hearing what aliasing sounds like now.
The file “quartet.wav” is a short section of a string quartet, and “quartet_aliasing.wav” is the same file after adding artificial aliasing using a
plug-in. Aliasing artifacts are sometimes called “birdies” because difference frequencies that fall in the range around 5–10 KHz come and go in
step with the music, which sounds a bit like birds chirping. For these examples I made the aliasing louder than would ever occur even with poor
converters, just to make it easier to identify.
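To see where an aliased tone lands, the sketch below folds an input frequency around the sample rate, which is the difference-frequency behavior described above. It assumes an idealized converter with no input filter at all, and the function name is invented for this example.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone appears after sampling with no input filter.

    Tones below half the sample rate pass unchanged. Higher tones fold back
    to the difference between the tone and the nearest multiple of the
    sample rate.
    """
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

for tone in (10_000, 30_000, 38_000):
    print(f"{tone} Hz sampled at 44,100 Hz appears at {alias_frequency(tone, 44_100)} Hz")
```

A 38 KHz component, for instance, folds down to 6.1 KHz, which falls squarely in the range where the chirping artifacts are heard.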
Phase Shift
Phase shift is another frequent target of blame, accused by the audio press repeatedly of “smearing” clarity and damaging stereo imaging. To
disprove this myth, I created the file “phase_shift.wav,” which contains the same two bars from one of my pop tunes twice in a row. The first
version is as I mixed it, and the second is after applying 4 stages (360 degrees) of phase shift at 1,225 Hz. This is more phase shift than occurs
normally in audio gear, and it’s in the midrange where our ears are very sensitive. Phase shift in audio gear usually happens only at the
frequency extremes (below 20 Hz or above 20 KHz) due to coupling capacitors at the low end or a natural roll-off at the high end caused by
various factors.
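One way to picture four stages adding up to 360 degrees is to assume each stage is a first-order all-pass filter tuned to 1,225 Hz. The text does not say which filter type was used, so that is an assumption, as are the function names. Such a stage shifts phase by 90 degrees at its corner frequency while leaving the level untouched at every frequency.

```python
import math

def allpass_response(freq_hz, corner_hz, stages=4):
    """Complex frequency response of cascaded first-order all-pass stages."""
    s = 1j * freq_hz / corner_hz  # normalized frequency variable
    one_stage = (1 - s) / (1 + s)
    return one_stage ** stages

def phase_lag_degrees(freq_hz, corner_hz, stages=4):
    """Total (unwrapped) phase lag: each stage contributes 2 * atan(f / fc)."""
    return stages * math.degrees(2.0 * math.atan(freq_hz / corner_hz))

for f in (100, 1_225, 10_000):
    gain_db = 20.0 * math.log10(abs(allpass_response(f, 1_225)))
    lag = phase_lag_degrees(f, 1_225)
    print(f"{f:>6} Hz: level change {abs(gain_db):.2f} dB, phase lag {lag:.0f} degrees")
```

The level change is zero everywhere, so any audible difference in such a test would have to come from the phase shift alone.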
In my experience, the only time phase shift is audible in normal amounts is either when it’s changing or when its amount or frequency is
different in the left and right channels. With digital signal processing (DSP), it’s possible to generate thousands of degrees of phase shift, and
eventually that will be audible because some frequencies are delayed half a second or more relative to other frequencies. But this never happens
with normal analog gear, and certainly not with wire, as I’ve seen claimed. As explained previously, sometimes when boosting high frequencies
with an equalizer, you’ll hear the hollow sound of a phase shifter effect. But what you’re hearing is comb filtering already present in the source
that’s now brought out by the EQ. You’re not hearing the phase shift itself.
I learned in the 1970s that phase shift by itself is relatively benign, when I built a phase shifter outboard effect unit. As explained in Chapter 1,
phaser effects work by shifting the phase of a signal, then combining the original source with the shifted version, thus yielding a series of peaks
and nulls in the frequency response. When testing this unit, I listened to the phase-shifted output only. While the Shift knob was turned, it was
easy to hear a change in the apparent “depth” of the track being processed. But as soon as I stopped turning the knob, the sound settled in and
the static phase shift was inaudible.
Absolute Polarity
Another common belief is that absolute polarity is audible. While nobody would argue that it’s okay to reverse the polarity of one channel of a
stereo pair, I’ve never been able to determine that reversing the polarity of a mono source—or both channels if stereo—is audible. Admittedly, it
would seem that absolute polarity could be audible—for example, with a kick drum. But in practice, changing the absolute polarity has never
been audible to me.
You can test this for yourself easily enough: If your DAW software or console offers a polarity-reverse switch, listen to a steadily repeating kick
or snare drum hit, then flip the switch. It’s not a valid test to have a drummer play in the studio hitting the drum repeatedly while you listen in
the control room, because every drum hit is slightly different. The only truly scientific way to compare absolute polarity is to audition a looped
drum sample to guarantee that every hit is identical.
Years ago my friend Mike Rivers sent me a Wave file that shows absolute polarity can sometimes be audible. The “polarity_sawtooth.wav” file
is 4 seconds long and contains a 20 Hz sawtooth waveform that reverses polarity halfway through. Although you can indeed hear a slight change
in low-end fullness after the transition point, I’m convinced that the cause is nonlinearity in the playback speakers. When I do the test using a
50 Hz sawtooth waveform, there’s no change in timbre. However, as Mike explained to me, it really doesn’t matter why the tone changes, just
that it does. And I cannot disagree.
To test this with more typical sources, I created two additional test files. The “polarity_kick.wav” file contains the same kick drum pattern
twice in a row, with the polarity of the second pattern reversed. I suggest you play this example a number of times in a row to see if you reliably hear a
difference. The “polarity_voice.wav” file is me speaking because some people believe that absolute polarity is more audible on voices. I don’t
hear any difference at all. However, I have very good loudspeakers in a room with proper acoustic treatment. As just explained, if your
loudspeakers (or earphones) don’t handle low frequencies symmetrically, that can create a difference. That is, if the loudspeaker diaphragm
pushes out differently than it pulls in, that will account for the sound changing with different polarities.
Ears Are Not Linear!
Even though few people can hear frequencies above 20 KHz, we can sometimes be affected by those frequencies. This is not because of bone
conduction or other means of perception, as is sometimes claimed. It’s because our ears are nonlinear, especially at loud volumes. As explained
in the previous chapters, whenever an audio pathway is nonlinear, sum and difference frequencies are created.
One of my favorite examples derives from when I played percussion in a local symphony. A glockenspiel is similar to a vibraphone but
smaller, and the metal bars are higher pitched and have no resonating tubes underneath. A glockenspiel can also play very loud! I noticed while
playing a chord of two high notes that I heard difference frequencies as overtones when those notes aliased down into the audible range. That is,
I’d hear low tones even though both of the notes were very high pitched. But the distortion was created entirely within my ears.
It’s not that my ears are defective; anyone can hear this. But you have to be very close to the glockenspiel in order for the volume to be loud
enough to distort your ears. I’ve never noticed this from out in the audience. You can even hear IM products from a single note because the
various ultrasonic overtones of one note can create audible sum and difference frequencies. Unlike square waves, the overtones from bell-type
instruments are not necessarily harmonically related. It’s possible for a very high note to contain ultrasonic overtones closer together than the
fundamental frequency. If one harmonic is 22 KHz and another is 25 KHz, the result is the perception of a 3 KHz tone.
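The 22 KHz and 25 KHz example can be reproduced numerically. The sketch below passes the two tones through a mild quadratic nonlinearity and measures the level at 3 KHz before and after. The nonlinearity is a hypothetical stand-in chosen for simplicity, not a model of the ear, and the function names are invented for this example.

```python
import math

SAMPLE_RATE = 96_000
NUM_SAMPLES = 9_600  # 0.1 second, so every tone here completes whole cycles

def tone_amplitude(samples, freq_hz):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT."""
    re = im = 0.0
    for n, x in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / SAMPLE_RATE
        re += x * math.cos(angle)
        im += x * math.sin(angle)
    return 2.0 * math.hypot(re, im) / len(samples)

def two_tones(f1_hz, f2_hz):
    """Sum of two equal-level sine waves, both above the audible range here."""
    return [math.sin(2.0 * math.pi * f1_hz * n / SAMPLE_RATE)
            + math.sin(2.0 * math.pi * f2_hz * n / SAMPLE_RATE)
            for n in range(NUM_SAMPLES)]

clean = two_tones(22_000, 25_000)
# Hypothetical mild nonlinearity: add 10 percent of the squared signal.
distorted = [x + 0.1 * x * x for x in clean]

print(f"3 KHz level through a linear path:    {tone_amplitude(clean, 3_000):.4f}")
print(f"3 KHz level through a nonlinear path: {tone_amplitude(distorted, 3_000):.4f}")
```

The linear path contains nothing at 3 KHz, while the nonlinear path produces a clear difference tone there even though neither input is audible on its own.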
So it’s possible to be influenced by ultrasonic content, but only because the IM tones are generated inside your ear. Further, I don’t want to
hear that when I’m listening to a recording of my favorite music. If a recording filters out ultrasonic content that would have created inharmonic
distortion inside my ears at loud volumes, I consider that a feature!
Blind Testing
“High-end audio lost its credibility during the 1980s, when it flatly refused to submit to the kind of basic honesty controls (double-blind testing, for example) that had
legitimized every other serious scientific endeavor since Pascal. [This refusal] is a source of endless derisive amusement among rational people and of perpetual embarrassment
for me, because I am associated by so many people with the mess my disciples made of spreading my gospel.”
—J. Gordon Holt, audio journalist and founder of Stereophile magazine
Informal self-tests are a great way to learn what you can and cannot hear when the differences are large. I don’t think anyone would miss
hearing a difference between AM and FM radio, or between a cassette recording and one made using a modern digital recorder. But when
differences are very small, it’s easy to think you hear a difference even when none exists. Blind testing is the gold standard for all branches of
science, especially when evaluating perception that is subjective. Blind tests are used to assess the effectiveness of pain medications, as well as
for food and wine tasting comparisons. If a test subject knows what he’s eating or drinking, that knowledge can and does affect his perception.
The two types of blind tests are single-blind and double-blind. With a single-blind test, the person being tested does not know which device
he’s hearing or which brand of soda she’s drinking, but the person administering the test knows. So it’s possible for a tester to cue the test subject
without meaning to. Double-blind tests solve this because nobody knows which product was which until after the tests are done and the results
are tabulated.
Even though measuring audio gear is usually conclusive, blind tests are still useful for several reasons. One reason is the listener is part of the
test, and not all listeners hear the same. Since the specific makeup of distortion and other artifacts affects how audible they are, it can be difficult
to pin down exact numbers relating dB level to audibility. A blind test can tell exactly how loud a particular sound must be before you’ll hear it
on your system. Further, blind testing is the only practical way to assess the damage caused by lossy MP3-type compression. That process doesn’t
lend itself to traditional static fidelity measurement because the frequency response changes from moment to moment with the music. Blind
testing can also be used to satisfy those who believe that “science” hasn’t yet found a way to measure what they’re certain they can hear. At least,
you’d think participating in a blind test should satisfy them!
As mentioned earlier, some people argue against blind tests because they believe these tests put the listener on the spot, making it more
difficult to hear differences that really do exist. This can be avoided with an ABX test. ABX testing was codeveloped by Arny Krueger in the
1980s, and it lets people test themselves as often as they want, over a period as long as they want, in the comfort of their own listening
environment. The original ABX tester was a hardware device that played one of two audio sources at random each time you pressed the button.
The person being tested must identify whether the “X” currently playing is either source A or source B. After running the same test, say, ten
times, you’d know with some certainty whether you really can reliably identify a difference. These days greatly improved ABX testers are
available as software, and Arny’s own freeware test program is shown in Figure 3.12.
Figure 3.12:
The freeware ABX Comparator lets you test yourself to determine whether you can reliably identify a difference between two audio sources.
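How much certainty a run of ABX trials gives can be estimated with the binomial distribution. The sketch below computes the odds of reaching a given score by pure guessing, assuming each trial is an independent 50/50 choice. The function name is invented for this example, and the code is not taken from any ABX program.

```python
from math import comb

def chance_probability(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for score in (7, 9, 10):
    odds = chance_probability(score, 10)
    print(f"{score} or more correct out of 10 by guessing: {odds:.1%}")
```

Scoring 7 out of 10 happens by luck more than one time in six, so it says little. Scoring 9 or better by luck happens only about once in a hundred runs.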
Another useful self-test that’s very reliable is simply closing your eyes while switching between two sources with software. When I want to test
myself blind, I set up two parallel tracks in SONAR and assign the Mute switches for those tracks to the same Mute Group while the Mute
switches are in opposite states. That is, one track plays while the other is muted, and vice versa. Each time the button is clicked, the tracks
switch. This lets me change smoothly from one track to the other without interruption or clicks. I put the mouse cursor over either track’s Mute
button, close my eyes, then click a bunch of times at random without paying attention to how many times I clicked. This way, I don’t know
which version will play first. Then I press the space bar to start playback, still with my eyes closed, and listen carefully to see if I can really tell
which source is which as I switch back and forth. When I open my eyes, I can see which track is currently playing.
Whether you’re using a single-blind, double-blind, or ABX test, it’s important to understand a few basic requirements. First, the volume of both
sources must be matched exactly, to within 0.1 dB. When all else is equal, people generally pick the louder (or brighter) version as sounding
better, unless, of course, it was already too loud or bright. Indeed, people sometimes report a difference even in an “A/A” test, where both
sources are the same. And just because something sounds “better,” it’s not necessarily higher fidelity. Boosting the treble and bass often makes
music sound better, but that’s certainly not more faithful to the original source material.
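Checking a level match is simple arithmetic on the sample values. The sketch below measures the difference between two clips in dB and scales one to match the other. It assumes RMS level is an adequate measure, which is a simplification, and the function names are invented for this example.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_difference_db(reference, other):
    """Level of `other` relative to `reference`, in dB."""
    return 20.0 * math.log10(rms(other) / rms(reference))

def match_level(reference, other):
    """Return a copy of `other` scaled to the same RMS level as `reference`."""
    gain = rms(reference) / rms(other)
    return [s * gain for s in other]

# Example: the second clip is only 5 percent higher in amplitude.
clip_a = [math.sin(2.0 * math.pi * i / 100.0) for i in range(1_000)]
clip_b = [1.05 * s for s in clip_a]
print(f"Before matching: {level_difference_db(clip_a, clip_b):+.2f} dB")
print(f"After matching:  {abs(level_difference_db(clip_a, match_level(clip_a, clip_b))):.2f} dB")
```

Even a 5 percent amplitude difference is about 0.4 dB, which is four times the 0.1 dB tolerance.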
It’s also important to test using the same musical performance. A common mistake I see is comparing microphones or preamps by recording
someone playing a guitar part with one device, then switching to the other device and performing again. The same subtle details we listen for
when comparing gear also change from one performance to another—for example, a bell-like attack of a guitar note or a certain sheen on a
brushed cymbal. Nobody can play or sing exactly the same way twice or remain perfectly stationary. So that’s not a valid way to test
microphones, preamps, or anything else. Even if you could sing or play the same, a change in microphone position of even half an inch is
enough to make a real difference in the frequency response the microphone captures.
One solution is a technique known as re-amping. Rather than recording live performances that will surely vary, you instead record a single
performance, then play that recording through a loudspeaker. Figure 3.13 shows a re-amp test setup in my home studio using a JBL 4430
monitor and an audio-technica 4033 microphone. This is a great way to audition preamps and microphone wires, though for comparing
microphones it’s critical that each microphone be in exactly the same place as all the others. Re-amping is described in more detail in Chapter 7.
Figure 3.13:
When comparing microphones and preamps, using a loudspeaker to play the same source repeatedly is more valid than expecting a musician to sing or play exactly the
same several times in a row.
In 2008, I did a comparative test of ten small diaphragm measuring microphones and needed to ensure that each microphone was in the exact
same spot in front of the speaker playing sweep tones. A full explanation of that test and the results are in Chapter 20, and you’ll need the same
precision placement if you intend to compare microphones.
It can be difficult to prove or disprove issues like those presented here because human auditory perception is so fragile and our memory is so
short. With A/B testing, where you switch between two versions to audition the difference, it’s mandatory that the switch be performed very
quickly. If it takes you 15 minutes to hook up a replacement amplifier or switch signal wires, it will be very difficult to tell if there really was a
difference, compared to switching between them instantly.
Finally, it’s important to understand that, logically, a blind test cannot prove that added artifacts or changes to the frequency response are not
audible. All we can hope to prove is that the specific people being tested were or were not able to discern a difference with sufficient statistical
validity. This is why blind tests typically use large groups of people tested many times each. But blind tests are still very useful, regardless of
whether or not a difference can be measured. If a large number of trained listeners are unable to hear a difference in a proper test between, say,
a converter using its own clock versus an outboard clock, there’s a pretty good chance you won’t be able to hear a difference either.
Psychoacoustic Effects
Psychoacoustics is the field that studies human perception of sound, which is different from the physical properties of sound. For example,
playing music very loudly makes it sound sharp in pitch because the eardrum and its supporting muscles tighten, raising their resonant
frequencies. This can be a real problem for singers and other musicians when using earphones at loud volumes in a recording studio. There are
many examples of audio perception illusions on the Internet, not unlike optical illusions, so I won’t repeat them here. But it’s worth mentioning
a few key points about hearing perception.
One important principle is the Haas Effect, which is closely related to the Precedence Effect. When two versions of the same sound arrive at
your ears within about 20 milliseconds of each other, the result is a slight thickening of the sound rather than the perception of a separate echo.
A comb filtered frequency response also occurs, as explained earlier. Further, if the two sounds arrive from different directions, the sound will
seem to come from whichever source arrives first. This location illusion occurs even if the latter sound is as much as 10 dB louder than the first
sound. You won’t perceive the delayed sound as an echo unless it arrives at least 25 milliseconds later.3 For this reason, PA systems in
auditoriums and other large venues often use delay lines to delay sound from the speakers farther from the stage. As long as people in the rear
hear sound first from the speakers close to the stage, that’s where the sound will seem to come from. Even though the speakers farther back are
closer and louder, they won’t draw attention to themselves.
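The delay needed for a rear speaker follows from the speed of sound. The sketch below is a minimal estimate, assuming sound travels about 343 meters per second and adding a small extra offset so the stage sound arrives first. Both that offset and the function name are illustrative assumptions, not values from any installation guide.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed value for air at about 20 degrees C

def delay_ms(distance_from_stage_speakers_m, extra_ms=10.0):
    """Delay for a rear speaker so sound from the stage speakers arrives first.

    extra_ms is a hypothetical safety offset. It should stay within the
    roughly 20 millisecond window in which no separate echo is perceived.
    """
    travel_ms = 1000.0 * distance_from_stage_speakers_m / SPEED_OF_SOUND_M_PER_S
    return travel_ms + extra_ms

for distance in (15.0, 30.0):
    print(f"Speaker {distance:.0f} m behind the stage speakers: delay about {delay_ms(distance):.0f} ms")
```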
In the 1970s, a product called the Aphex Aural Exciter was introduced for use by recording studios. This device claimed to improve clarity and
detail in a way that can’t be accomplished by boosting the treble with an equalizer. At the time, Aphex refused to sell the device, instead renting
it to studios for use on albums. The company charged a per-minute fee based on song length. The Aural Exciter was used on many important
albums of the time, most notably Fleetwood Mac’s Rumours. The device’s operation was shrouded in mystery, and at the time Aphex claimed
phase shift was at the heart of its magic. Sound familiar? I recall a review in Recording Engineer/Producer Magazine at the time that included a
graph showing phase shift versus frequency, which, of course, was a ruse. Eventually it was revealed that what the device really does is add a
small amount of trebly distortion, above 5 KHz only.
In principle, this is a very clever idea! For program material that has no extended high frequencies at all, using distortion to synthesize new
treble content can add sparkle that can’t be achieved any other way. Even when high frequencies are present, applying substantial EQ boost will
increase the background hiss, too, which is a real problem when using analog tape. I consider adding trebly distortion a psychoacoustic process
because it seems to make music sound clearer and more detailed, even though in truth clarity is reduced by the added distortion. To illustrate the
effect, I created a patch in SONAR that mimics the Aural Exciter, shown in Figure 3.14.
Figure 3.14:
To emulate an Aphex Aural Exciter, the same stereo mix is placed on two tracks, with high-frequency distortion added to the second track at a very low level.
I put the same tune on two tracks, then inserted a high-pass filter set for 5 KHz at 12 dB per octave on the second track. The Sonitus EQ I use
offers only 6 dB per octave, so I used two bands to get a steeper slope. The EQ was followed by a distortion plug-in. You can hear short excerpts
in the files “light_pop.wav” and “light_pop_distorted.wav.” Track 2 is mixed in at a very low level—around 20 dB softer—to add only a subtle
amount of the effect. But it really does make the high-hat snap and stand out more clearly.
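The same signal flow can be sketched in a few lines of code. This is only an approximation under stated assumptions: the one-pole filter, the tanh soft clipper, and the drive value are stand-ins I chose for illustration, not the Sonitus EQ or the distortion plug-in used in the SONAR patch.

```python
import math


def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter, roughly 6 dB per octave."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def exciter(samples, sample_rate, cutoff_hz=5000.0, drive=10.0, mix_db=-20.0):
    """Add a small amount of distorted high-frequency content to the dry signal."""
    # Two cascaded 6 dB/octave stages give roughly 12 dB per octave.
    highs = one_pole_highpass(samples, cutoff_hz, sample_rate)
    highs = one_pole_highpass(highs, cutoff_hz, sample_rate)
    # tanh soft clipping is an assumed stand-in for the distortion plug-in.
    distorted = [math.tanh(drive * s) for s in highs]
    # Mix the distorted track about 20 dB below the dry track.
    gain = 10.0 ** (mix_db / 20.0)
    return [dry + gain * wet for dry, wet in zip(samples, distorted)]
```

Low-frequency content passes through nearly untouched because the high-pass stages remove it before the distortion, so only the treble gains new harmonics.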
Some years after the Aphex unit was released, BBE came out with a competing product called the Sonic Maximizer. This unit also claims to
increase clarity via phase shift, though when I tried one, it sounded more like a limiter that affects only treble frequencies. What bothered me at
the time was BBE’s claim that clarity is increased by applying phase shift. According to their early literature, the Sonic Maximizer counters the
“damaging phase shift” inherent in all loudspeaker crossovers. But speakers use wildly different crossover frequencies, and some speakers are
two-way designs with a woofer and tweeter, while others have midrange drivers with three or even four bands. It’s impossible for phase shift of
one amount and frequency to do what’s claimed for all speakers. So even if the BBE device does add phase shift, that’s not the effect you actually hear.
I’m convinced that the seeming increase in clarity after adding small amounts of distortion is the real reason some people prefer the sound of
LP records and analog tape. But while small amounts of distortion might be subjectively pleasing when added to some types of music, it’s
certainly not higher fidelity. I find it surprising and even amusing that recording engineers who are highly vocal about preferring analog tape
and tubes are often highly critical of digital converters, claiming that none are transparent enough.
I’ve also noticed many times that music sounds better when accompanied by visuals. In the 1960s, Joshua White used colored oil and other
liquids over plastic film to project psychedelic patterns onto large screens at live rock concerts. MTV launched their cable channel in the 1980s,
and music videos became an instant hit. Today, both classical and pop music concerts on DVD and Blu-ray are hugely popular. I’ve theorized
that music sounds better with visuals because our perception is split between the two senses, and the “criticism” part of each sense is reduced
slightly. Merely closing your eyes while listening to music can change the way it sounds.
Finally, no discussion of hearing and perception would be complete without a mention of mixing music after drinking alcohol or taking drugs.
Alcohol suppresses hearing sensitivity, forcing you to raise the playback level to achieve the sensation of a satisfying volume. In the long term
this can damage your hearing because your ears are physically harmed by the louder volume, even if your senses perceive a softer level. The loss
of hearing sensitivity after drinking ranges from 2 to 7 dB,4 depending on frequency and sex (male versus female). In my experience, alcohol
also dulls the senses, making it more difficult to discern fine detail. Aspirin also affects high-frequency perception and is best avoided when mixing.
Marijuana tends to have the opposite effect, often making your hearing more sensitive such that soft instruments and details become clearer.
But this isn’t useful either because important elements in a mix can end up too soft if you hear them too clearly. Further, different strains of
marijuana vary quite a lot, from making you sleepy to fidgety—versus alcohol, which has the same basic effect whether it’s beer or wine, or clear
or brown liquor. So while you can probably learn to mix under the influence of drugs or alcohol, there are risks involved. If nothing else, it will
surely take you longer to complete a mix!
The interaction between our hearing “hardware” and brain “software” that interprets sounds is complex and not fully understood. We
recognize familiar voices by their fundamental pitches, inflection, timing, vocal tract formants,5 and the frequency ratio of the fundamental pitch
to those formants. This ear/brain interaction also lets us easily tell whether a song on the radio is an original or a remake, often after only the
first few notes. But it’s important to distinguish between perception, which varies over time and from one person to the next, versus fidelity,
which sends the exact same acoustic waves from your speakers each time you press Play.
Placebo Effect and Expectation Bias
“Everyone understands and accepts that the placebo effect is real, but for some reason audiophiles think it never happens to them.”
Audiophile magazines and web forums are filled with anecdotes about perceived improvements after applying various tweaks. People often
think they heard an improvement after replacing a wire or precisely leveling their CD player. But it’s more likely that they simply became more
familiar with the music after repeated playing and noticed more details. What many listeners overlook is that human hearing is fragile and short-term.
If you play a piece of music, then spend five minutes replacing your speaker wires and listen again, it’s very difficult to recall the earlier
playback’s tonality. Is that violin section really brighter now, or does it just seem that way? Every time you play a recording you might hear
details you missed previously.
According to former DTS chief scientist James Johnston,6 hearing memory is valid for about 250 milliseconds. James (he prefers JJ) also
explains that we cannot focus on everything in a piece of music all at once; on one playing we might notice the snare drum but ignore the
rhythm guitar, and so forth. This makes it very difficult to know if subtle differences are real or imagined. If you play the same section of music
five times in a row, the sound reaching your ears will not change unless you move your head, but the way you hear and perceive each playing
can and will vary.
Psychological factors like expectation and fatigue are equally important. If I brag to a friend how great my home theater sounds and that
person comes for a visit, it always sounds worse to me while we’re both listening. I recall a forum post where a fellow recounted recording
several guitar tracks through various amp simulators—hardware or software that emulates the sound of an amplifier and loudspeaker. He said
that many of his guitar player friends hate amp sims with a passion, so when he played them his tracks, he told them they were real guitar amps
and asked them to pick which they liked best. After they stated their preferences, he told them all the tracks were recorded through amp sims.
They were furious. It seems to me they should have thanked him for the education about bias.
I’ve found that mood is everything, both when assessing the quality of audio gear and when enjoying music. When you feel good, music
sounds better. When you feel lousy, music and audio quality both sound poor. Further, our ears easily acclimate to bad sound such as resonances
or a poor tonal balance, especially if you keep raising the volume to make a mix you’re working on sound clearer and more powerful. But the
same mix will probably sound bad tomorrow.
“Argument from authority is a common logical fallacy. Just because someone is an expert in one field does not make them an expert in other
fields, no matter how famous they may be. And even in their own field, experts are sometimes wrong.”
It’s not uncommon for professional mixing engineers who are very good at their craft to believe they also understand the science, even when
they don’t. Knowing how to turn the knobs to make music sound pleasing is very different from understanding how audio works at a technical
level. The same applies to musicians. I love guitarist Eric Johnson, and nobody is more expert than he when it comes to playing the guitar. But
he is clearly mistaken with this belief, from a 2011 magazine interview:
“I once read a story about a guitar player who didn’t like his instrument. Rather than changing pickups or configuration, he decided to just will it into sounding different as he
practiced. And after months and months of playing, that guitar did in fact sound totally different. I believe that story.”
I’m sure the guitar did sound different after the guy practiced for many months. But any change in sound was obviously from his improved
playing, not a physical change to the guitar due to wishful thinking. This is not an uncommon belief. I know a luthier who truly believes she can
make her violins sound better just by willing them to do so.
There’s also the issue of having spent a lot of money on something, so you expect it to be better for that reason alone. If I just paid $4,500 for
a high-end CD player and someone told me I could have gotten the exact same sound from a $50 CD Walkman®, I’d be in denial, too. It’s also
important to consider the source of any claim, though someone’s financial interest in a product doesn’t mean the claims are necessarily untrue.
But sometimes the sound reaching your ears really does change, if not for the reasons we think. Which brings us to the following.
When Subjectivists Are (Almost) Correct
Frequency response in a room is highly position dependent. If you move your head even one inch, you can measure a real change in the
response at mid and high frequencies. Some of this is due to loudspeaker beaming, where different frequencies radiate unequally in different
directions. But in rooms without acoustic treatment, the main culprit is comb filtering caused by reflections.
Chapter 2 busted a number of audio myths, yet audiophiles and even recording engineers sometimes report hearing things that defy what is
known about the science of audio. For example, some people claim to hear a difference between electrically similar speaker cables. When
pressed, they often say they believe they can hear things that science has not yet learned how to measure. But modern audio test equipment can
measure everything known to affect sound quality over a range exceeding 100 dB, and it’s difficult, if not impossible, to hear artifacts only 80 dB
below the music while it is playing. In my experience, the top 20 to 30 dB matters the most, even if softer sounds can be heard.
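To put those decibel figures in perspective, here is a small sketch that converts between dB and linear amplitude ratios using the standard 20·log10 relationship for amplitude. The function names are mine, chosen only for illustration.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    for level_db in (-20, -30, -80, -100):
        ratio = db_to_amplitude_ratio(level_db)
        print(f"{level_db} dB -> amplitude ratio {ratio:.6f}")
```

An artifact 80 dB below the music has one ten-thousandth of the music's amplitude, which helps explain why it is so hard to hear while the music is playing.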
Certainly some of these reports can be attributed to the placebo effect and expectation bias. If you know that a $4 cable has been replaced
with a cable costing $1,000, it’s not unreasonable to expect the sound to improve with the more expensive model. This applies to any expensive
audio components. After all, how could a $15,000 power amplifier not sound better than one that costs only $150? Yet tests show repeatedly
that most modern gear has a frequency response that’s acceptably flat—within a fraction of a dB—over the audible range, with noise, distortion,
and all other artifacts below the known threshold of audibility. (This excludes products based on lossy compression such as MP3 players and
satellite radio receivers.) So what else could account for these perceived differences?
Through my research in room acoustics, I believe acoustic comb filtering is the most plausible explanation for many of the differences people
claim to hear with cables, power conditioners, isolation devices, low-jitter external clocks, ultra-high sample rates, replacement power cords and
fuses, and so forth. As explained in Chapter 1, comb filtering occurs when direct sound from a loudspeaker combines in the air with reflections
off the walls, floor, ceiling, or other nearby objects. Comb filtering can also occur without reflections, when sound from one speaker arrives at
your ear slightly sooner than sound from the other speaker.
Figure 3.15 shows a simplification of reflections in a small listening room, as viewed from above. The direct sound from the loudspeakers
reaches your ears first, followed quickly by first reflections off the nearby side walls. First reflections are sometimes called early reflections
because in most rooms they arrive within the 20-millisecond Haas time gap. Soon after, secondary reflections arrive at your ears from the
opposite speakers, and these can be either early or late, depending on how long after the direct sound they arrive. Other early first reflections
also arrive after bouncing off the floor and ceiling, but those are omitted here for clarity. Finally, late reflections arrive after bouncing off the rear
wall behind you. In truth, reflections from the rear wall can be early or late, depending on how far the wall is behind you. If it’s closer than
about 10 feet, the reflections will be early. Again, this illustration is simplified, and it omits reflections such as those from the rear wall that
travel to the front wall and back again to your ears.
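The 10-foot figure for the rear wall follows from simple arithmetic, sketched below. I assume a speed of sound of about 1,130 feet per second and a simplified geometry in which sound passes your head, travels straight to the rear wall, and returns, so the extra path is twice the wall distance. Real rooms involve angled paths, so treat the numbers as estimates.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed approximate speed of sound in air
HAAS_WINDOW_MS = 20.0             # the 20-millisecond window cited in the text


def reflection_delay_ms(direct_path_ft, reflected_path_ft):
    """Arrival delay of a reflection relative to the direct sound."""
    extra_path_ft = reflected_path_ft - direct_path_ft
    return extra_path_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0


def rear_wall_delay_ms(wall_distance_behind_listener_ft):
    """Simplified case: the extra path is out to the rear wall and back."""
    return 2.0 * wall_distance_behind_listener_ft / SPEED_OF_SOUND_FT_PER_S * 1000.0


def is_early(delay_ms):
    """True if the reflection arrives inside the Haas window."""
    return delay_ms < HAAS_WINDOW_MS


if __name__ == "__main__":
    for distance_ft in (5, 10, 15):
        delay = rear_wall_delay_ms(distance_ft)
        label = "early" if is_early(delay) else "late"
        print(f"rear wall {distance_ft} ft behind: {delay:.1f} ms ({label})")
```

With the wall 10 feet behind you, the round trip adds about 17.7 milliseconds, which is just inside the 20-millisecond window.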
Figure 3.15:
In a typical small room, direct sound from the speakers arrives first, followed by the first reflections, then the second, and finally any late reflections.
The thin panel on the right-side wall is where an absorber should go to reduce the strength of reflections off that wall. Sound at most
frequencies doesn’t travel in a straight line like a laser beam as shown here. So while we often refer to placing absorbers at reflection points, in
practice you’ll treat a larger area. How much area requires covering with absorption depends on the distance between your ears and the
loudspeakers, with greater distances needing more coverage.
While testing acoustic treatment, I had occasion to measure the frequency response in an untreated room at very high resolution. This room is
typical of the size you’ll find in many homes—16 by 11½ by 8 feet high. The loudspeakers used were Mackie HR824 active monitors, with a
Carver Sunfire subwoofer. The frequency response measurements were made with the R+D software and an Earthworks precision microphone.
Besides measuring the response at the listening position, I also measured at a location 4 inches away. This is less distance than the space
between an adult’s ears. At the time I was testing bass traps, so I considered only the low-frequency response, which showed a surprising change
for such a small span.
Conventional wisdom holds that the bass response in a room cannot change much over small distances because the wavelengths are very long.
For example, a 40 Hz sound wave is more than 28 feet long, so you might think a distance of 14 feet is needed to create a null 180 degrees
away. Yet you can see in Figure 3.16 that the large peak at 42 Hz varies by 3 dB for these two nearby locations, and there’s still a 1 dB difference
even as low as 27 Hz. The reason the frequency response changes so much even at such low frequencies is because many reflections, each having
different arrival times and phase offsets, combine at different volume levels at each point in the room. In small rooms, the reflections are strong
because all the reflecting boundaries are nearby, further increasing the contribution from each reflection. Also, nulls tend to occupy a relatively
narrow physical space, which is why the nulls on either side of the 92 Hz marker have very different depths. Indeed, the null at 71 Hz in one
location becomes a peak at the other.
Figure 3.16:
This graph shows the low-frequency response in a room 16 by 11½ by 8 feet at two locations 4 inches apart. Even over such a small physical span, the response changes
substantially at many frequencies.
I also examined the same data over the entire audible range, and that graph is shown in Figure 3.17. The two responses are so totally different
that you’d never guess this is the same room with the same loudspeakers. The cause of these large response differences is comb filtering. Peaks
and deep nulls occur at predictable quarter-wavelength distances, and at higher frequencies it takes very little distance to go from a peak to a
null. At 7 KHz, one-quarter wavelength is less than half an inch! At high frequencies, reflections from a nearby coffee table or leather seat back
can be significant.
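The wavelength figures quoted here are easy to check. The sketch below assumes a speed of sound of about 1,130 feet per second, which is my rounding for typical room temperature.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # assumed approximate speed of sound in air


def wavelength_ft(freq_hz):
    """Wavelength in feet for a given frequency."""
    return SPEED_OF_SOUND_FT_PER_S / freq_hz


def quarter_wavelength_in(freq_hz):
    """One-quarter wavelength in inches."""
    return wavelength_ft(freq_hz) * 12.0 / 4.0


if __name__ == "__main__":
    print(f"40 Hz wavelength: {wavelength_ft(40.0):.2f} ft")
    print(f"7 KHz quarter wavelength: {quarter_wavelength_in(7000.0):.2f} in")
```

With that assumed speed, a 40 Hz wave comes out at 28.25 feet and a quarter wavelength at 7 KHz at roughly 0.48 inch, in line with the figures in the text.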
Figure 3.17:
This graph shows the full-range response from the same measurements in Figure 3.16. At high frequencies the response difference 4 inches apart is even more
substantial than below 200 Hz.
Because of comb filtering, moving even a tiny distance changes the response by a very large amount at mid and high frequencies. This is
especially true in small rooms having no acoustic treatment at nearby reflection points. The response at any given cubic inch location in a room
is the sum of the direct sound from the speakers, plus many competing reflections arriving from many different directions. So unless you strap
yourself into a chair and clamp your head in a vise, there’s no way the sound will not change while you listen.
Even when absorbers are placed strategically at reflection points, loudspeaker beaming and lobing7 contribute to response changes with
location. One speaker design goal is a flat response not only directly in front of the speaker but also off-axis. But it’s impossible to achieve the
exact same response at every angle, so that’s another factor.
We don’t usually notice these changes when moving around because each ear receives a different response. So what we perceive is more of an
average. A null in one ear is likely not present or as deep in the other ear. Since all rooms have this property, we’re accustomed to hearing these
changes and don’t always notice them. However, the change in response over distance is very real, and it’s definitely audible if you listen
carefully. If you cover one ear, it’s even easier to notice, because frequencies missing in one ear are not present in the other ear. You can also
hear comb filtering more clearly if you record sound from a loudspeaker near a reflecting boundary with only one microphone, then play that
back in mono through both speakers.
I’m convinced that comb filtering is at the root of people reporting a change in the sound of cables and electronics, even when no change is
likely (at least when they’re not using headphones). If someone listens to her system using one pair of cables, then gets up and switches cables
and sits down again, the frequency response heard is sure to be very different because it’s impossible to sit down again in exactly the same place.
So the sound really did change, but probably not because the cables sound different.
With audio and music, frequencies in the range around 2 to 3 KHz are rather harsh sounding. Other frequencies are more full sounding (50 to
200 Hz), and yet others have a pleasant “open” quality (above 5 KHz). So if you listen in a location that emphasizes harsh frequencies, then
change a cable and listen again in a place where comb filtering suppresses that harshness, it’s not unreasonable to believe the new cable is
responsible for the difference. Likewise, exchanging a CD player or power amplifier might seem to affect the music’s fullness, even though the
change in low-frequency response was due entirely to positioning.
Large variations in frequency response can also occur even in rooms that are well treated acoustically, though bass traps reduce the variation at
low frequencies. Figure 3.18 shows the response I measured at seven locations one inch apart, left-to-right. These measurements were made in
my large, well-treated home recording studio, not the same room as the graphs in Figures 3.16 and 3.17.
Figure 3.18:
This graph shows the full-range response at seven locations one inch apart in a large room well treated with broadband absorption and bass traps. With bass traps, the
response at low frequencies varies much less, but the mid and high frequencies still change drastically over very small distances.
Since this graph displays seven curves at once, I used third octave averaging to show each response more clearly. Without averaging there is
too much detail, making it difficult to see the “forest for the trees,” so to speak. Besides a fair amount of bass trapping, the listening position in
this room is free of strong early reflections, so variations in the response at higher frequencies are due mainly to other factors.
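For readers curious how third octave averaging works, here is a minimal sketch. Each point is replaced by the average of all measured points within one-sixth of an octave on either side, giving a window one-third of an octave wide. Averaging the dB values directly is my simplifying assumption; measurement programs may average power instead, and this is not the algorithm of any specific product.

```python
def fractional_octave_smooth(freqs_hz, levels_db, fraction=3):
    """Smooth a response curve with a sliding window 1/fraction octave wide.

    freqs_hz and levels_db are parallel lists of measured points.
    """
    half_band = 2.0 ** (1.0 / (2.0 * fraction))  # one-sixth octave each side for fraction=3
    smoothed = []
    for center in freqs_hz:
        low, high = center / half_band, center * half_band
        band = [level for freq, level in zip(freqs_hz, levels_db) if low <= freq <= high]
        smoothed.append(sum(band) / len(band))
    return smoothed
```

Narrow peaks and nulls that alternate within a third of an octave largely cancel in the average, which is why the smoothed curves are easier to compare but hide fine detail.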
Since the listening position has little reflected energy, one likely cause of the differences is loudspeaker beaming, which alters the response at
different angles in front of the speaker drivers. Another is comb filtering due to the different arrival times from the left and right speakers. The
test signal was played through both speakers at the same time, and the measuring microphone was moved in one-inch increments from the
center toward the right. As the microphone was moved to the right, sound from the right speaker arrived earlier and earlier compared to the left
speaker, and the phase differences also caused comb filtering. So even in a well-treated room, the response at higher frequencies can vary a large
amount over small distances. This further confirms that comb filtering is the main culprit, even when a room has sufficient acoustic treatment.
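To see why small arrival-time differences matter so much, the sketch below models two copies of the same signal arriving with a small path-length difference and reports where the nulls fall. The path difference is a hypothetical input (it is not the same as the distance the microphone was moved, which depends on the room geometry), and the speed of sound is my assumed value.

```python
import cmath
import math

SPEED_OF_SOUND_IN_PER_S = 1130.0 * 12.0  # assumed speed of sound, in inches per second


def comb_response_db(freq_hz, path_difference_in, relative_level=1.0):
    """Level (dB) of a signal summed with a delayed copy of itself."""
    delay_s = path_difference_in / SPEED_OF_SOUND_IN_PER_S
    total = 1.0 + relative_level * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    # Floor the magnitude so a perfect null does not produce log10(0).
    return 20.0 * math.log10(max(abs(total), 1e-12))


def null_frequencies_hz(path_difference_in, max_hz=20000.0):
    """Frequencies where the delayed copy arrives 180 degrees out of phase."""
    delay_s = path_difference_in / SPEED_OF_SOUND_IN_PER_S
    nulls = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        nulls.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return nulls


if __name__ == "__main__":
    for difference_in in (1.0, 2.0, 4.0):
        first_null = null_frequencies_hz(difference_in)[0]
        print(f"{difference_in} in path difference: first null near {first_null:.0f} Hz")
```

Under these assumptions, a path difference of only one inch puts the first null near 6,780 Hz, and larger differences move the nulls lower and pack more of them into the audible range.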
It’s clear that bass traps make the low-frequency response more consistent across small distances. You can’t see that here specifically, because
the low frequencies are also averaged in third octaves, which reduces the display resolution. But the seven responses below 200 Hz when viewed
at high resolution are similar to the third-octave averaged graphs, so I didn’t make a separate graph just for the bass range.
There are other reasons people might think the sound changed even when no change is likely. One is the short-term nature of auditory
memory, as mentioned. Crawling around on the floor to change speaker wires will also likely raise your blood pressure, which can affect
perception. We also hear differently early in the day versus later when we’re tired, and even just listening for a while can change our perception.
Does that solid state amplifier really sound different after warming up for half an hour, or is it our perception that changed? Add a couple of
cocktails to the mix, and then you really can’t tell for sure what you’re hearing!
Some people like the sound of certain artifacts such as small amounts of distortion. Preferring the sound of a device that’s not audibly
transparent is not the same as imagining a change when one doesn’t exist, but it’s not unrelated. This may be why some people prefer the sound
of vinyl records and tube equipment. Everyone agrees that vinyl and tubes sound different from CDs and solid state gear. It’s also easy to measure
differences in the artifacts each adds. So then the question is, why do some people consider added distortion to be an improvement in sound
quality? In a 2006 article for Sound On Sound magazine, I wrote about the positive effects of adding subtle distortion intentionally to a recording:
Years ago I did a mix in my DAW [Digital Audio Workstation] and made a cassette copy for a friend. I noticed the cassette sounded better—more “cohesive” for lack of a better
word—and I liked the effect. A few times I even copied a cassette back into the computer, used a noise reducer program to remove the hiss, then put the result onto a CD. I knew
it was an effect, not higher fidelity or the superiority of analog tape, but I had to admit I liked the effect.
Over the years I’ve read many arguments in audio newsgroups and web forums. Science-minded folks insist everything that affects audio
fidelity can be measured, and things like replacement AC wall outlets cannot possibly affect the sound, no matter what subjectivist “tweakers”
claim. The subjectivists argue they’re certain they hear a difference and insist that the gear-head objectivists are simply measuring the wrong things.
It now appears that both sides have a point. Some things really are too insignificant to be audible, and sometimes the wrong things are
measured, such as THD without regard to IMD. The room you listen in has far more influence on what you hear than any device in the signal
path, including speakers in most cases. It seems reasonable that the one thing neither camp usually considers—acoustic comb filtering—is an
important factor.
I’ve visited the homes of several audiophile reviewers—the same people who write the flowery prose about loudspeaker imaging, and power
amplifier presence, and turntable preamps whose sound is “forward” at high frequencies. Sadly, few reviewers have any acoustic treatment at all
in their rooms. One professional reviewer I visited had several mid/high-frequency absorbers in his listening room, but none were in the right
place! When I moved the panels and showed him how much the stereo imaging improved, he was thankful but not the least bit embarrassed.
Think about that.
As you can see, there are many reasons sound quality can seem to change even when it didn’t, or when the change is real but due only to
room acoustics. It’s also possible for real differences to be masked by the listening environment or by a deficiency elsewhere in the playback
path other than the component being tested. So after all of these explanations, it’s worth mentioning that sometimes replacing one competent
cable with another really does change the sound. This can happen when connectors are unplugged, then reseated. RCA connectors in particular
tend to corrode over time, so removing a cable and plugging it in again can improve the sound because the contacts were scraped clean. But this
happens even when using the same cable.
When I was a teenager in the 1960s, I loved the Twilight Zone and Outer Limits TV shows. In later years The X-Files was very popular. The
appeal of the unknown and mysterious is as old as time. Many people want to believe there’s more “out there” than meets the eye. Some of us
like to believe there’s life on other planets. And we’re very entertained by stories about unusual ideas and events. So it only follows that people
want to believe there’s some mysterious force that prevents a double blind test from revealing subtle differences that can be appreciated only
over time. Likewise, the belief that “science” has not yet found a way to measure what they are sure they can hear is appealing to some people.
To my mind, this is the same as believing in the supernatural. There’s also some arrogance in that position: “I don’t need no stupid ‘science’
because I know what I hear.”
As we have seen, what affects the audibility of distortion and other artifacts is their magnitude. If the sum of all artifacts is too soft to hear,
then their specific spectrum is irrelevant. The masking effect coupled with the Fletcher-Munson curves often reduces the artifact audibility further
compared to isolated sounds at disparate frequencies. But sometimes the sound really did change, due to comb filtering caused by acoustic
reflections in the room.
Like the Emperor’s New Clothes, some people let themselves be conned into believing that a higher truth exists, even if they cannot personally
hear it. It’s common to see someone in an audio forum ask if this or that will make a difference in sound quality. When others reply, “Why don’t
you try it for yourself?,” it’s common for the original poster to reply that he fears his hearing isn’t good enough to tell the difference, and he
doesn’t want to be embarrassed when others hear the failings in his recording. There’s no disputing that hearing can improve with practice, and
you can learn to recognize different types of artifacts. But that’s not the same as imagining something that doesn’t exist at all. And, logically
speaking, just because a large number of people believe something does not alone make it true.
Once we understand what really affects audio fidelity, it seems pointless to fret over tiny amounts of distortion in an A/D converter or
microphone preamp when most loudspeakers have at least ten times more distortion. And compared to listening rooms that really do have an
obvious effect on fidelity, any difference between two competent audio devices is far less important. In the end, the purpose of audio is to
present the music. A slamming mix of a great tune played by excellent musicians will sound terrific even on AM radio or a low-bitrate MP3 file.
What makes a mix sound great has much more to do with musicianship and overall frequency balance than a lack of low-level artifacts. Room
buzzes and rattles are often even worse. If you slowly sweep a loud sine wave starting around 30 or 40 Hz, you will likely hear rattles at various
frequencies as windows and other furnishings resonate sympathetically. Those rattles are present and audible when excited by certain bass notes,
but they’re often masked by the midrange and treble content in the music.
Ultimately, people on both sides of the “subjectivist versus objectivist” debate want the same thing: to learn the truth. As I see it, the main
difference between these two camps is what evidence they consider acceptable. Hopefully, the audio examples presented in this chapter helped
you determine for yourself at what level artifacts such as truncation distortion and jitter are a problem. And just because something is very soft
doesn’t mean it’s not audible or detrimental. My intent is simply to put everything that affects audio quality in the proper perspective.
http :// 91
http ://
How much later an echo must arrive to be heard as a separate sound depends on the nature of the sound. A brief transient such as a snare drum rim shot can be heard as an echo if it arrives only a few milliseconds after the direct sound. But with
sounds that change slowly over time, such as a slow, sustained violin section, a delay might not be noticed until it arrives 50 or even 80 milliseconds later.
http :// mc/articles/PMC2 0318 8 6/
Formants are part of the complex acoustic filtering that occurs in our vocal tracts, and it’s described in Chapter 10.
http :// ress/p r_p ap ers.cfm
Beaming and lobing refer to the dispersion pattern of loudspeakers, where various frequencies are sent in varying strengths at different angles. This will be explained further in Chapter 16.
Chapter e3
Video Production
With so many bands and solo performers using videos to promote their music, video production has become an important skill for musicians
and audio producers to acquire. While this section can’t explain video cameras and editing techniques as deeply as a dedicated book, I’ll cover
the basics for creating music videos and concert videos. I’ll use as examples my one-person music videos A Cello Rondo and Tele-Vision, a song
from a live concert I produced for a friend’s band, as well as a cello concerto with full orchestra that I worked on as a camera operator and
advisor. It will help if you watch those first:
Ethan Winer – A Cello Rondo music video: 9Q
Ethan Winer – Tele-Vision music video:
Rob Carlson – Folk Music in the Nude music video:
Allison Eldredge – Dvorak Cello Concerto fragment:
Because video editing is as much visual as intellectual, I created three tutorial videos to better show the principles. The video vegas_basics
gives an overview of video editing using Sony Vegas Video, and vegas_rondo and vegas_tele-vision show many specific editing techniques I used
to create those two music videos. I use Vegas, but most other professional video programs have similar features and work the same way. So the
basic techniques and specific examples that follow can be applied to whatever software you prefer.
Video Production Basics
Most modern video editing software works similarly to an audio DAW program, with multiple tracks containing video and audio clips. Video
plug-ins are used to change the appearance of video clips in the same manner as audio plug-ins modify audio. And just like a DAW, when a
project is finished it’s rendered by the software to a media file, ready for viewing, streaming online, or burning to a DVD.
Modern video production software uses a paradigm called nonlinear editing, which means you can jump to any arbitrary place on the
timeline to view and edit the clips. This is in contrast to the older style of video editing using tape that runs linearly from start to finish, where
editing is performed by copying from one tape deck to another. This is not unlike the difference between using an audio DAW program and an
analog tape recorder with a hardware mixing console.
Most music videos are shot as overdubs, where the musicians mime their performance while listening to existing audio. The music is first
recorded and mixed to everyone’s satisfaction; then the band or solo performer pretends to sing and play while the cameras roll. When
overdubbing this way, the music is played loudly in the video studio or wherever the video is being shot so the performers can hear their part
clearly and maintain the proper tempo. The cameras also record audio, usually through their built-in microphones, but that audio is used only to
align the video track to the “real” audio later during editing.
Often, a band will mime the same song several times in a row so the camera operators can capture different performances and camera angles
to choose from when editing the footage later. If the budget allows having four or more manned cameras, each camera can focus on a different
player or show different angles of the entire band during a single performance. But many videos are done on a low budget with minimal
equipment, and even a single camera operator can suffice if the band performs the same song several times. Shooting multiple takes lets a single
camera operator focus on one player at a time during each performance. When editing, video of the guitar player can be featured during the
guitar solo, and so forth. Live video can be cool, with the occasional blurred frame as the camera swings wildly from one player to another, but
editing from well-shot clips is more efficient and usually gives more professional-looking results.
Of course, it’s possible to video a group during a live performance. In that case, the audio is usually recorded separately off the venue’s mixing
board either as a stereo mix or with each instrument and microphone recorded to a separate track to create a more polished mix later in a more
controlled environment. Again, each camera’s audio track is used only when editing the video later to synchronize the video tracks with the
master audio. This is the method I use when recording my friend’s band playing live.
Figure e3.1 shows the main editing screen of Vegas Video. The video and audio tracks are at the top, and at the bottom are a preview of built-in video effects on the left, the Trimmer window where video clips are auditioned before being added to the project, and the audio output
section at right. Several other windows and views not shown here are also available, such as an Explorer to find and import files, and a list of
the project’s current media. The audio window at lower right can also be expanded to show the surround mixer if appropriate, and so forth.
Figure e3.1
Sony Vegas Video offers an unlimited number of tracks for both video and audio, and every track can have as many video or audio plug-in effects as needed.
Unlike an audio DAW where multiple tracks are mixed together in some proportion to create a final mix, video tracks usually appear only
one at a time or are cross-faded briefly during a transition from one camera to another. Therefore, video tracks require establishing a priority to
specify which track is visible when multiple clips are present at the same time. As with an audio DAW, tracks are numbered starting at 1 from
top to bottom. The convention is for upper tracks to have priority over lower tracks below. So if you want to add a text title that overlays a
video clip, the track containing the text must be above the clip’s track. Otherwise, the clip will block the text.
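The priority rule can be sketched in a few lines of Python. This is only an illustration of the idea that the topmost track wins, not how Vegas or any other editor is implemented. The function name visible_clip, the clip tuples, and the clip names are all hypothetical, and every clip is assumed to be fully opaque.

```python
def visible_clip(tracks, time):
    """Return the name of the clip shown at `time`, or None if nothing plays.

    `tracks` is ordered top to bottom (Track 1 first). Each track is a list
    of (start, end, name) clips with times in seconds. The first track that
    has a clip covering `time` wins, so upper tracks hide lower ones.
    Assumption: clips are fully opaque, with no fades or transparency.
    """
    for clips in tracks:
        for start, end, name in clips:
            if start <= time < end:
                return name
    return None


# Hypothetical two-track project: a close-up above the full-stage camera.
tracks = [
    [(2.0, 6.0, "singer close-up")],      # Track 1 (top, highest priority)
    [(0.0, 10.0, "full-stage camera")],   # Track 2
]

print(visible_clip(tracks, 1.0))  # only Track 2 has a clip here
print(visible_clip(tracks, 3.0))  # both tracks have clips, so Track 1 wins
```

A text title behaves a little differently because its background is transparent, so the clip below shows through everywhere except behind the letters.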
Video editing software also has a Preview window for watching the video as you work. My computer setup has two monitors, and Vegas
allows the preview window to be moved to the second monitor for a full-screen view of the work in progress. This is much more convenient
than trying to see enough detail on a small window within the program’s main screen. Video editing and processing take a lot of computer
horsepower, memory, and disk space. Unless your computer is very fast or you’re working with only a few video tracks and plug-in effects, your
computer may not be able to keep up with playback at the highest resolution in real time. Vegas lets you specify the preview quality so you can
watch more tracks in low resolution or fewer tracks at a higher quality. Another option is to render a short section of the project to memory.
Rendering a short section may take a few minutes to complete, but it lets you preview your work at full quality. You can also render all or part
of the video to a file.
Live Concert Example
The project in Figure e3.1 shows a live performance of my friend’s band that I shot in a medium-size theater using four cameras. This song is
very funny, describing a concert the band performed at a nudist folk festival! You can see each camera’s audio track underneath its video clip,
and the height of every track can be adjusted to see or hide detail as needed. In Figure e3.1, the video tracks are opened wide enough to see
their contents, and the camera audio tracks are narrower.
For this show, three cameras were at the back of the room, with a fourth camera to the left in front of the stage. The audio in the bottom track
was recorded during the show by taking a feed from the venue’s mixing board to the line input of a Zoom H2 portable audio recorder. I used the
H2 because it runs off internal batteries to avoid any chance of hum due to a ground loop. This is the audio heard when watching the video.
Three cameras on tripods were in the rear of the room. One was unmanned, set up in the middle of the rear wall and zoomed out to take in
the entire stage. This is a common technique, where a single static camera captures the full scene, serving as a backup you can switch to when
editing later in case none of the other camera shots were useful at a given moment. A second camera was off to the right, also unmanned, set up
to focus on Vinnie, who plays the fiddle, mandolin, and acoustic guitar. Having a single camera on the second most important player in the
group ensures that this camera angle will always be available when needed to capture an instrumental solo or funny spoken comment between songs.
The third camera was also tripod-mounted near the center of the back wall. I operated this camera manually, panning and zooming on
whatever seemed important at the moment. Most of the time, that camera was pointed at the lead singer, Rob, though I switched to the piano
player or drummer during their solos. The fourth camera was also manned, placed on a tripod off to one side in the front of the room to get a
totally different angle for more variety. Both of us running the cameras focused on whoever was the current featured player, free to pan or zoom
quickly if needed to catch something important. Any blurry shots that occurred while our cameras panned were replaced during editing with the
other manned camera or one of the unmanned cameras.
Speaking of blurry footage, it’s best to use a tripod if possible, especially when zooming way in to get a close-up of something far away.
Handheld camera shots are generally too shaky for professional results, unless you have a Steadicam-type stabilizer. These are very expensive for
good models, and they’re still not as stable as a good tripod resting on solid ground. Unlike still photography, where you position the tripod
once and shoot, a video tripod also needs to move smoothly as you pan left and right or up and down. The best type of video tripod has a fluid
head that can move smoothly, without jerking in small steps. Again, good ones tend to be expensive, though some models, such as those from
Manfrotto, are relatively affordable. Another option is the monopod, which is basically a single pole you rest on the ground. This gives more
stability than holding the camera in your hands unsupported, but it’s still not as good as a real tripod with a fluid head.
Many bands are too poor to own enough high-quality video cameras to do a proper shoot in high definition, which was the case for this video.
I own a nice high-definition (HD) Sony video camera, and my friend who ran the fourth camera is a video pro who owns three really nice Sony
professional cameras. But the two other cameras were both standard definition (SD), one regular and one wide-screen. This video project is SD,
not HD, so I used my HD camera for the static full-stage view. That let me zoom in a little to add motion when editing, without losing too much
quality. If you zoom way in on a camera track after the fact when editing, the enlarged image becomes soft and grainy. But you can usually zoom
up to twice the normal size without too much degradation, and zooming even more is acceptable when using a camera having twice the
resolution of the end product.
Color Correction
One of the problems with using disparate cameras is the quality changes from shot to shot when switching between cameras. This includes not
just the overall resolution, but the color also shifts unless the camera’s white balance is set before shooting. Modern digital still-image and video
cameras can do this automatically: You place a large piece of white paper or cardboard on the set, using the same lighting that will be present
during the shoot. Then you zoom the camera way in so the white object fills the frame and press a button that tells the camera “this” is what
white should look like. The camera then adjusts its internal color balance to match. But even then, different cameras can respond differently,
requiring you to apply color correction later during editing.
The Secondary Color Corrector plug-in shown in Figure e3.2 can shift the overall hue of a video clip, or its range can be limited to shift only a
single color or range of colors. This is done using the color wheel near the upper left of the screen by dragging the dot in the center of the wheel
toward the outside edge. The closer the dot is toward one edge, the more the hue is shifted toward that color.
Figure e3.2
The Secondary Color Corrector plug-in lets you shift the overall color balance of a video clip or alter just one color or range of colors, leaving other colors unchanged.
The color corrector also has controls for overall brightness, saturation (color intensity), and gamma. If you want a clip to be black and white
for special effect, reduce the saturation to zero. If you want to bring out the colors and make them more vibrant, increase the saturation. The
gamma adjustment is particularly useful because it lets you increase the overall brightness of a clip, but only for those portions that are dim.
Where a brightness control makes everything in the frame lighter, gamma adjusts only the dark parts, leaving brighter portions alone. This
avoids bright areas becoming washed out, as can happen when increasing the overall brightness. This is not unlike audio compression that raises
the volume of soft elements without distorting sounds that are already loud enough.
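The difference between brightness and gamma can be shown with a little arithmetic. The sketch below assumes pixel levels normalized from 0.0 (black) to 1.0 (white) and the common power-law form of a gamma curve. The formula the Vegas plug-in actually uses is not given here, so both function names and the chosen values are hypothetical.

```python
def apply_gamma(level, gamma):
    """Map a normalized pixel level (0.0 to 1.0) through a power-law curve.

    With gamma above 1.0, dark levels are lifted a lot and bright levels
    only a little, and pure white (1.0) stays at 1.0.
    """
    return level ** (1.0 / gamma)


def apply_brightness(level, offset):
    """Add the same offset to every level, clipping to the 0.0-1.0 range."""
    return min(1.0, max(0.0, level + offset))


dim, bright = 0.1, 0.9  # one dark pixel and one bright pixel

print(round(apply_gamma(dim, 2.0), 2), round(apply_gamma(bright, 2.0), 2))
print(round(apply_brightness(dim, 0.2), 2), round(apply_brightness(bright, 0.2), 2))
```

With these example values the gamma curve lifts the dim pixel from 0.1 to about 0.32 while the bright pixel moves only from 0.9 to about 0.95. The brightness offset pushes the bright pixel all the way to 1.0, which is the washed-out result described above.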
Figure e3.3 shows a scene from a YouTube video I made of my pinball machines before and after increasing the gamma. You can see that the
lights on the play field are the same brightness in both shots, but all of the dim portions of the frame were made much brighter in the lower shot.
Figure e3.3
The gamma adjustment lets you make dim areas brighter, without making bright portions even brighter. The clip at the top is as shot, and the bottom is after increasing
the gamma. If the overall brightness was increased, the play field lights would have become washed out.
It’s common for recording engineers to check an audio mix on several systems and in the car, and likewise you should check your videos on
several monitors or burn a DVD to see it on several TVs, including the one you watch the most. Look for too dark or washed-out images or areas,
and for too much or too little contrast. In particular, verify that skin colors are correct. If you plan to do a lot of video editing, you can buy a
device to calibrate your video monitor. This attaches to the front face of the display during setup, and the included software tells you what
monitor settings to change to achieve a standard brightness and contrast with correct colors. There are also software-only methods that use
colored plastic sheets you look through while adjusting the red, green, and blue levels.
Synchronizing Video to Audio
Continuing on with the live concert example in Figure e3.1, each of the four cameras’ files was placed on its own track. When you add a video
file to a project, both its video and audio are imported and placed onto adjacent tracks, so there are really two tracks associated with each
camera file. By putting each camera on its own video track, you can add the color corrector or other corrective plug-ins to modify only that
track. Vegas also lets you apply video plug-ins to individual clips when needed.
Importing video clips from older tape-based video cameras requires using a video capture utility that runs in real time as you play the tape in
your camera. This program is usually included with the video editing software, and the camera connects to your computer through a FireWire
port. Newer cameras save video clips as files directly to an internal memory card, and the files can be transferred via USB, FireWire, or a card
reader much more quickly than real-time playback. A camera that uses memory cards not only transfers video more quickly, but it’s also more
reliable by avoiding the drop-outs that sometimes occur with videotape.
The first step, after importing all of the camera files, is to synchronize each camera to the master audio track by sliding the camera clips left or
right on the timeline. The camera’s video and audio portions move together unless you specifically ungroup them, so aligning the camera’s audio
to the master audio also aligns the video. Figure e3.4 shows a close-up of one camera’s video and stereo audio tracks, with the master stereo
audio track underneath. The audio at the bottom was recorded from the mixing console, and the camera’s audio was recorded through its built-in microphone. Once the tracks are visually aligned (zoom way in as needed), you should listen to both audio tracks at once to verify they’re in
sync with no echo. Then you can mute the camera’s audio or even delete that track.
Figure e3.4
To synchronize a video track to the master audio, place the master audio track directly underneath the camera’s audio track, then slide the camera clip left or right until
both audio tracks are aligned.
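Aligning by eye and ear is the method described here. The same offset could in principle be estimated numerically by cross-correlating the two audio tracks, which is a standard signal-processing technique and not part of the workflow in this chapter. The sketch below is a brute-force version in plain Python that is only practical for short excerpts. The function name best_lag and the tiny sample lists are hypothetical.

```python
def best_lag(master, camera, max_lag):
    """Estimate how many samples the camera audio lags the master audio.

    Tries every lag from -max_lag to +max_lag and scores each one by
    summing master[i] * camera[i + lag]. The highest score marks the lag
    where the two waveforms line up best. A positive result means the
    camera heard the sound that many samples late, so its clip should
    slide left (earlier) by that amount.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, m in enumerate(master):
            j = i + lag
            if 0 <= j < len(camera):
                score += m * camera[j]
        if score > best_score:
            best, best_score = lag, score
    return best


# Hypothetical test signal: the camera copy is the master delayed 3 samples.
master = [0, 0, 1, -1, 0.5, 0, 0, 0, 0, 0]
camera = [0, 0, 0] + master[:-3]

print(best_lag(master, camera, 5))
```

Real camera audio also contains room sound and noise, so a numerical estimate like this would still need to be confirmed by listening to both tracks together.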
Once all of the camera tracks are aligned, you can watch each track one at a time all the way through, making notes of where each camera
should be shown in the video. Often, the first shot in a music video will be the full-stage camera so viewers get a sense of the venue. Then you’ll
decide when to switch to the other cameras, depending on who should be featured at that moment. It’s also common to zoom in (or out) slowly
over time, which adds motion to maintain interest. Ideally, this zooming will be done by the camera operator, but it can also be done during
editing as long as you don’t zoom in too much.
Panning and Zooming
Generally, when running a video camera, you’ll avoid zooming in or panning across quickly. Too much fast motion is distracting to viewers, and
it can create video artifacts at the low bit-rates often used for online videos. It’s also recommended to avoid “trombone” shots that zoom in and
then zoom out again soon after. Do either one or the other, but not both over a short period of time. Of course, it’s okay to zoom in fast to
capture something important, but consider switching to another camera when editing to hide the zoom. Understand these are merely suggestions
based on established practice and common taste. Art is art, so do whatever you think looks good.
Speaking of zooming, I’m always amused by ads for inexpensive video cameras that claim an impressive amount of digital zoom capability.
What matters with video cameras is their optical zoom, which tells how much the lens itself can zoom in to magnify the subject while shooting.
Digital zooming just means the camera enlarges part of the captured image electronically rather than with the lens, and this type of zooming always degrades
quality. For example, repeating every pixel to make an image twice as large makes angled lines and edges look jagged. However, digital
zooming can have higher quality than you’d get from simply repeating pixels to make an image larger. When done properly, digital zooming
creates new pixels having in-between values, depending on the image content. A good digital zoom algorithm will actually create new colors
and shades that better match the surrounding pixels on both sides, giving a smoother appearance. But still, digital zooming always compromises
quality compared to optical zooming, especially for large zoom amounts.
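To make the two approaches concrete, the sketch below enlarges a single row of pixel values both ways. Plain linear interpolation is used as the simplest example of creating in-between values. Real zoom algorithms work in two dimensions and are usually more elaborate, and the function names here are hypothetical.

```python
def upscale_repeat(row, factor):
    """Enlarge a row of pixel values by repeating each pixel `factor` times."""
    return [p for p in row for _ in range(factor)]


def upscale_linear(row, factor):
    """Enlarge a row by linear interpolation between neighboring pixels."""
    n_out = len(row) * factor
    if len(row) == 1:
        return [row[0]] * n_out
    out = []
    for k in range(n_out):
        # Position of output pixel k measured in source-pixel units.
        pos = k * (len(row) - 1) / (n_out - 1)
        left = int(pos)
        right = min(left + 1, len(row) - 1)
        frac = pos - left
        out.append(row[left] * (1 - frac) + row[right] * frac)
    return out


edge = [0, 90]  # a hard edge from black to a lighter gray

print(upscale_repeat(edge, 2))  # the edge stays abrupt, just wider
print(upscale_linear(edge, 2))  # new in-between values soften the step
```

Repetition keeps only the original two values, while interpolation adds intermediate values near 30 and 60. In both cases no detail is added that was not in the original pixels, which is why optical zoom still looks better.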
Video Transitions
Switching between cameras during editing can be either sudden or with a cross-fade from one to the other. You create a transition by dragging
one camera’s clip to overlap another, and the transition occurs over the duration of the overlap. This is much like a cross-fade in an audio DAW
program, and the two clips can be on the same track or separate tracks. If both clips are on separate tracks, as I usually do, you’ll use fade-in and
fade-out envelopes on both tracks, rather than have one clip physically overlap the other. Either way, to create a fast cross-fade, the clips will
overlap for half a second or so, or the overlap can extend for several seconds to create a slow transition. The second clip on Track 6 of Figure
e3.1 shows a fade-out envelope, which creates a cross-fade to the subsequent clip below on Track 8. Since Track 6 has a higher priority and
hides Track 8, there’s no need to apply a corresponding fade-in on Track 8; Track 6 simply fades out to gradually reveal Track 8 over a period
of about one and a half seconds. You can also specify the fade curve, which controls how the cross-fade changes over time.
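A fade-out on the upper track amounts to a changing opacity value. As a rough sketch, assuming a straight-line fade curve and a single pixel value per track, the mix at any moment could be computed as follows. The function names and numbers are hypothetical, and other fade curves would replace the straight-line formula.

```python
def fade_out_opacity(t, fade_start, fade_length):
    """Opacity of the upper clip: 1.0 before the fade, 0.0 once it ends.

    Assumes a linear fade curve. Times are in seconds.
    """
    if t <= fade_start:
        return 1.0
    if t >= fade_start + fade_length:
        return 0.0
    return 1.0 - (t - fade_start) / fade_length


def composite(upper_pixel, lower_pixel, opacity):
    """Blend the upper track over the lower track at the given opacity."""
    return upper_pixel * opacity + lower_pixel * (1.0 - opacity)


# A 1.5-second fade starting 10 seconds into the timeline.
for t in (10.0, 10.75, 11.5):
    opacity = fade_out_opacity(t, 10.0, 1.5)
    print(t, opacity, composite(200, 100, opacity))
```

Because the lower track is simply revealed as the upper one becomes transparent, only the upper track needs an envelope, which matches the single fade-out described above.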
Pop music videos are usually fast-paced, often switching quickly from one camera angle to the next, or applying a transition effect between
clips. Besides the standard cross-fade, most video editor software includes a number of transition effects such as Iris, Barn Door, and various other
wipe and color flash effects. For example, an Iris transition opens or closes a round window to reveal the subsequent clip as in Figure e3.5.
Many other transition types are available, and Vegas lets you audition all the types and their variations in a preview window. The included
video “vegas_television” shows how that works.
Figure e3.5
Most video programs offer a large number of transition types, including the Iris shown here that opens a round window over time to expose the subsequent track. In this
example, the singer’s track transitions to the mandolin player.
I prefer to cut or cross-fade from one camera to another a second or two before something happens, such as the start of a guitar solo. This
gives the viewer a chance to prepare for what’s coming and already be focused on the player in the new perspective before the solo begins. But
sometimes one solo starts before the previous one has ended, or a solo starts while the lead singer is still singing. In that case, you can use a slow
cross-fade over a second or two to partially show both performers at the same time. Or you can put one camera in its own smaller window on
the screen to show both cameras at the same time.
When I do these live videos for my friend, both camera operators focus on whatever seems important to us at the moment. But sometimes
neither of us is pointing our camera at what’s most important. Maybe we’ll both think a piano solo is coming and aim there, but it was actually
the guitar player’s turn. So by the time we focus our cameras on the guitar player, he’s already five seconds into the solo. This is where the fallback
camera that captures the entire stage is useful. When editing, I’ll switch to the full-stage camera a second or two before the guitar solo
starts, then slowly pan and zoom that camera’s track toward the guitarist in anticipation of the solo. As mentioned, you can usually zoom a clip
up to 200 percent (double size) before the quality degrades enough to be objectionable. So I’ll do a slow zoom over a few seconds that doesn’t
enlarge the frame too much and at the same time pan toward the soloist to imply what’s coming. Then I finally switch to one of the manned
cameras once it’s pointing at the featured player. This is shown in Figure e3.6 in the next section to anticipate a piano solo.
Figure e3.6
Key frames indicate points on the timeline where something is to change. This can be a pan and zoom as shown here or changes in color, brightness, text size, or literally
any other property of a video clip or plug-in effect.
The venue where this live concert was shot is not huge, with about 300 seats. Since I recorded the audio from the house mixing board, the
sound was very clean and dry—in fact, too dry to properly convey the feel of a live concert. To give more of a live sound, I added the audio
recorded by the two cameras in the rear—one each panned hard left and right—mixed in very softly to add just a touch of ambience. This also
increased the overall width of the sound field, because instruments and voices from the board audio were mostly panned near the center.
Key Frames
One of the most powerful features of nonlinear video editing is key frames. These are points along the timeline where changes occur, such as the
start and end points of a zoom or pan. Figure e3.6 shows the Pan/Crop window that pans and zooms video clips. The top image shows the full
frame, which is displayed initially at the start of the clip. The large hollow “F” in the middle of the frame is a reference showing the frame’s size
and orientation. Video software can rotate images as well as size and pan them, so the “F” lets you see all three frame properties at once.
You can see three markers in the clip’s timeline at the bottom marked Position: one at the far left, another at the four seconds marker, and the
last at eight seconds. In this case, the full frame is displayed for the first four seconds of the clip because the first two key frames are set the same. The
lower image shows that a smaller window is displayed at the eight seconds mark. Since a smaller portion of the clip is framed, that area
becomes zoomed in to fill the entire screen. When this clip plays, Vegas creates all the in-between zoom levels to transition from a full frame to
the zoomed-in portion automatically. The other timeline area marked “Mask” is disabled here, but it can be used to show or hide selected
portions of the screen. The video “vegas_basics” explains how masks are used.
As you can see, key frames are a very powerful concept because you need only define the start and end conditions, and the software does
whatever is needed to get from one state to the next automatically. If you want something to change more quickly, simply slide the destination
key frame to the left along the timeline to arrive there earlier. Again, key frames can be applied to anything the software is capable of varying,
including every parameter of a video plug-in.
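The in-between values can be pictured as simple interpolation between neighboring key frames. The sketch below assumes straight-line interpolation and a hypothetical crop width of 720 pixels that narrows to 360 for a 2x zoom. Vegas may offer other interpolation curves, and the function name value_at is made up for this example.

```python
def value_at(key_frames, t):
    """Return the interpolated property value at time `t` (seconds).

    `key_frames` is a list of (time, value) pairs sorted by time. Before
    the first key frame or after the last, the nearest value is held.
    Between two key frames the value changes along a straight line.
    """
    if t <= key_frames[0][0]:
        return key_frames[0][1]
    for (t0, v0), (t1, v1) in zip(key_frames, key_frames[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return key_frames[-1][1]


# Crop width in pixels: full frame until 4 seconds, then zoom in by 8 seconds.
crop_width = [(0, 720), (4, 720), (8, 360)]

for t in (2, 4, 6, 8):
    print(t, value_at(crop_width, t))
```

Moving the last key frame from 8 seconds to 6 seconds would make the same zoom finish sooner, which is the effect of sliding the destination key frame to the left.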
Most video runs at 30 frames per second (FPS), which is derived from the 60 Hz AC power line frequency. In most US localities, the frequency
of commercial power is very stable, so this is a convenient yet accurate timing reference for the frame rate. The timeline is divided into hours,
minutes, seconds, and frames, with a new frame every 1/30 of a second. The PAL video format used in Europe divides each second into 25
frames, because AC power there is 50 Hz. NTSC video uses a frame rate of 29.97 Hz (the explanation is complicated, but the simple version is it
allowed making TV sets cheaper).
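The hours, minutes, seconds, and frames layout is easy to compute from a running frame count. This sketch assumes a whole-number frame rate such as 30 or 25. It deliberately ignores the drop-frame counting used with 29.97 FPS video, and the function name is hypothetical.

```python
def frames_to_timecode(frame_number, fps=30):
    """Convert a frame count to an HH:MM:SS:FF timeline position.

    Assumes an integer frame rate, so every second holds exactly `fps`
    frames. Drop-frame timecode for 29.97 FPS is not handled.
    """
    seconds, frames = divmod(frame_number, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(frames_to_timecode(1815, fps=30))  # 30 frames per second
print(frames_to_timecode(1815, fps=25))  # PAL, 25 frames per second
```

The same frame number lands at a later timeline position at 25 FPS than at 30 FPS because each second holds fewer frames.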
Blu-ray disks run at 24 frames per second, so when creating those, you have to shoot at 24 FPS, or your software will convert the data when it
burns the disk. The process is similar to audio sample rate conversion, dropping or repeating frames as needed. Many professional HD video
cameras can also shoot at 60 FPS. This is not so much to capture a higher resolution but to achieve smoother slow motion when that effect is
used. If you slow a 30 FPS clip to half speed, each frame is repeated once, which gives a jerky effect. If you shoot at 60 FPS, you can slow it
down to half speed and still have 30 unique frames per second.
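The frame repetition is easy to see by listing which source frame each output frame shows. The sketch below assumes the simplest possible approach, where the software picks the nearest earlier source frame with no blending. The function name is hypothetical.

```python
def source_frame_indices(n_output, speed, source_fps, output_fps=30):
    """List the source frame shown for each of the first n_output frames.

    `speed` is the playback speed (0.5 means half speed). Each output frame
    maps to a source time, and the frame at or just before it is used.
    """
    return [int(k * speed * source_fps / output_fps) for k in range(n_output)]


print(source_frame_indices(6, 0.5, source_fps=30))  # 30 FPS footage at half speed
print(source_frame_indices(6, 0.5, source_fps=60))  # 60 FPS footage at half speed
```

With 30 FPS footage every source frame appears twice in a row, while 60 FPS footage supplies a new frame for every output frame.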
Orchestra Example
Pop music videos often contain many quick transitions from one camera to another, sometimes with various video effects applied. But for
classical music, a gentler approach is usually better, especially when the music is at a slow tempo. In the orchestra example linked at the start of
this chapter, you’ll see that most of the cross-fades from one camera to another are relatively slow, and with slower pieces, camera cross-fades
can span four seconds or even longer. You’ll also notice that many of the camera shots constantly zoom in (or out) slowly on the players and
soloist, as described earlier. This video was shot in high definition using four cameras, with full 5.1 surround sound, though the YouTube clip is
standard resolution and plain stereo.
My friend and professional videographer Mark Weiss was at the front of the balcony at the far right, and I was also in the balcony but at the
far left. This way Mark could capture the faces of players on the left side of the stage and the cello soloist’s left side. My position let me do the
same for players on the right of the stage and zoom in on the cellist’s right side. Another camera operator was on the main floor near the front to
the right of the audience. A fourth unmanned camera was placed high on a ledge at the side of the stage pointing down toward the conductor.
Having a dedicated camera lets Mark switch to the conductor’s face occasionally during editing, which is not possible from in front where the
camera operators were stationed. The three of us have shot many videos for this orchestra, though for some videos the third camera operator was
on the stage itself, off to one side, to get better close-ups of the players.
Figure e3.7, taken from the stage looking up toward the balcony at the rear of the hall, shows the surround microphone rig Mark built for
these orchestra videos. This consists of a metal frame to which five microphone shock-mount holders attach. It’s hung from thin steel wires 18
feet in the air, centered above the third row of the audience. The microphones are connected by long cables to a laptop computer with a
multichannel FireWire interface in a room at one side of the stage.
Figure e3.7
Five microphones are placed high up over the audience. Three of the mics point forward and down for the left, center, and right main channels, and two more mics
point toward the rear left and right to capture a surround sound field from the back of the hall.
Cello Rondo and Tele-Vision Examples
The demo “vegas_rondo” shows many of the editing techniques I used to create that video, but it’s worth mentioning a few additional points
here. The opening title text “A Cello Rondo” fades in, then zooms slightly over time, using two key frames: one where the text is to start
zooming and another where it stops at the final larger size. Likewise, the text “By Ethan Winer” uses key frames to slide onto the screen from left
to right, then “bounces” left and right as the text settles into its final position.
There are three ways to apply key frames to change the size of on-screen text. If you use the Scaling adjustment in the text object itself, or
resize the frame in the Pan/Crop window, each size is generated at the highest resolution when the video is rendered. You can also size and
move video clips using a track’s Track Motion setting, but that resizes the text after it’s been generated, which lowers the resolution and can
soften the edges.
As mentioned, the order of tracks determines their priority for overlapping video clips, with tracks at the top showing in front of lower tracks.
Figure e3.8 shows part of the video where two players are on-screen at the same time. In this case, the track for the player on the left is above
the track for the player on the right, which in turn is above the track holding the textured background. The result looks natural, as if one player’s
bow is actually behind the other player’s arm, with both players in front of the background.
Figure e3.8
In most video editing software, lower-numbered tracks appear in front of higher-numbered tracks. Here, the player on the left is on Track 7, the player on the right is on
Track 11, and the background is on Track 24.
Figure e3.9 shows a portion of the video with nine separate elements: five cellists, my cat Bear, a halo over Bear’s head that’s automated using
key frames to follow his head movement, a white spotlight that sweeps across the screen via key frames, and a static photo of my cello used for
the background. I used a green screen to create what looks like a single performance from all of these video elements, letting me keep just the
players and strip out the wall behind me. A green screen lets you remove the background behind the subject and “float” the subject on top of a
new background. This is explained further in the “vegas_rondo” demo video.
Figure e3.9
Each of the nine elements in this scene is sized and positioned individually, with the halo and spotlight programmed to move via key frames.
Vegas includes a large number of effects plug-ins, including a very useful “noise” generator. This creates many different types of texture patterns,
not just snow, as you’d see on a weak TV station, which is what video noise really looks like. The Vegas noise patterns include wood grain,
clouds, flames, camouflage, lightning, and many others. I used every one of those for my Cello Rondo video, but I wanted something more
sophisticated for Tele-Vision. Many companies sell animated backgrounds you can add royalty-free to your projects, and I chose a product called
Production Blox from 12 Inch Design. These backgrounds are affordable and far more sophisticated than anything I could have created myself
using the tools built into Vegas.
One goal of a music video is to add interest by doing more than is possible with only audio. In a live concert video you can switch cameras to
change angles and show different performers and use transitions and other special effects. When creating a video such as Tele-Vision that’s
compiled from many different green screen clips, you can also change the backgrounds. Not only can you switch between backgrounds, but you
can change the background’s appearance over time using key frames. The Production Blox backgrounds are already animated, but I spent a lot of
time varying the stock Vegas backgrounds, such as making a checkerboard pattern change color and rotate, and animating cloud patterns. I even
created an entire animated disco dance floor from scratch. A big part of audio mixing is sound design, and likewise an important part of video
production is thinking up clever ways for things to change over time to maintain the viewer’s interest.
Time-Lapse Video
Although not directly related to music videos, a common special effect is time-lapse video, where several minutes or even hours elapse in just a
few seconds. If you need to speed up a clip by only a modest amount, Vegas lets you add a Velocity envelope to a video clip to change its
playback speed. You can increase playback speed as much as 300 percent or slow it to a full stop. You can even set the Velocity to a negative
value to play a clip backward, as shown in the “vegas_rondo” demo. If 300 percent isn’t fast enough, you can increase the speed further by Ctrl-
dragging the right edge of a clip. Ctrl-dragging its right edge to the left compresses the clip to play up to four times faster. You can also Ctrl-drag
to the right to stretch it out, slowing down playback to one-fourth the original speed. By setting the Velocity to 300 percent and Ctrl-dragging
fully to the left, you can increase playback speed as much as 12 times.
If that’s still not fast enough, Vegas lets you export frames from a video clip to a sequence of image files. You specify the start and end points
of the clip to export and how much time to skip between each frame saved to disk. For one of my other music videos I wanted to speed up a
clip by 60 to 1, where each minute passes in one second. So when exporting the image sequence, I kept one frame for every 59 that were
discarded. Then I imported the series of still images into Vegas, which treats them as a new unified video clip. Rather than describe the steps
here, I created a YouTube tutorial to show the effect and describe the procedure in detail.
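The frame-skipping arithmetic can be sketched in a few lines of Python. This is only an illustration of the idea, not how Vegas works internally; the function name and the 30 frames-per-second rate are assumptions made for the example.

```python
def frames_to_keep(total_frames: int, speedup: int) -> list[int]:
    """Return the indices of the frames kept when speeding up by `speedup` to 1."""
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    # Keep one frame, discard the next (speedup - 1) frames, and repeat.
    return list(range(0, total_frames, speedup))


FRAMES_PER_SECOND = 30  # assumed frame rate for this example

# One minute of source video sped up 60 to 1.
kept = frames_to_keep(total_frames=60 * FRAMES_PER_SECOND, speedup=60)
print(len(kept), "frames kept; first three indices:", kept[:3])
```

With these numbers, one minute of source video leaves 30 frames, which is one second of playback.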
Media File Formats
Audio files can be saved as either original Wave or AIFF files, or in a lossy-compressed format such as MP3 or AAC. Lossy compression discards
content deemed to be inaudible, or at least less important, thus making the file smaller. Raw video files are huge, so they’re almost always
reduced in size using lossy compression. An uncompressed high-definition AVI file occupies nearly 250 MB of disk space per second! Clearly,
compression is needed if you hope to put a video longer than 18 seconds onto a 4.7 GB recordable DVD.
Where lossy audio compression removes content that’s too soft to be heard, video compression instead writes only what changes from one
frame to the next, rather than saving entire frames. For example, with a newscaster in front of a static background, most of the changes occur in
just a small part of the screen where the speaker’s mouth changes. The rest of the screen stays the same and doesn’t need to be repeated at every
frame. This is a simplification, but that’s the basic idea. Video in North America and many other parts of the world runs at 30 frames per second,
so not having to save every frame in its entirety can reduce the file size significantly.
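As a toy illustration of storing only what changes, the sketch below treats a frame as a flat list of pixel values and records just the pixels that differ from the previous frame. Real video codecs are far more elaborate and are lossy, whereas this example is lossless. The names and pixel values here are made up for the demonstration.

```python
Frame = list[int]  # one brightness value per pixel, for illustration only


def encode_delta(previous: Frame, current: Frame) -> dict[int, int]:
    """Record only the pixels that differ from the previous frame."""
    return {i: new for i, (old, new) in enumerate(zip(previous, current)) if old != new}


def decode_delta(previous: Frame, delta: dict[int, int]) -> Frame:
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = previous.copy()
    for index, value in delta.items():
        frame[index] = value
    return frame


background = [10] * 100          # static backdrop behind the newscaster
frame1 = background.copy()
frame2 = background.copy()
frame2[40:43] = [200, 210, 220]  # only the small "mouth" region changes

delta = encode_delta(frame1, frame2)
rebuilt = decode_delta(frame1, delta)
print(len(delta), "of", len(frame2), "pixels stored; rebuilt correctly:", rebuilt == frame2)
```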
As with audio, lossy video compression is expressed as the resulting bit-rate for the file or data stream. Most commercial DVDs play at a bit-rate of 8 megabits per second (Mbps), though high-definition video on a Blu-ray disk can be up to 50 Mbps. One byte of data holds eight bits, so
each second of an 8 Mbps compressed DVD video occupies 1 MB of disk space. Therefore, a single-layer recordable DVD holds about 78 minutes
at 8 Mbps. You can reduce the bit-rate when rendering a project to store a longer video or use dual-layer DVDs. Note that the specified bit-rate is
for the combined video and audio content, so the actual video bit-rate is slightly less. You can also specify the compressed audio bit-rate when
rendering videos to balance audio quality versus file size.
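The disk-space figures above are easy to check. The following sketch uses only the numbers already quoted (250 MB per second uncompressed, a 4.7 GB disc, 8 Mbps); the function names are mine, and 1 GB is treated as 1,000 MB for simplicity.

```python
DVD_CAPACITY_MB = 4700         # single-layer recordable DVD, 4.7 GB
UNCOMPRESSED_MB_PER_SEC = 250  # figure quoted for uncompressed high-definition AVI


def mbps_to_mb_per_second(bit_rate_mbps: float) -> float:
    """Convert megabits per second to megabytes per second (eight bits per byte)."""
    return bit_rate_mbps / 8


def seconds_that_fit(capacity_mb: float, mb_per_second: float) -> float:
    """How many seconds of video fit in the given capacity."""
    return capacity_mb / mb_per_second


uncompressed_s = seconds_that_fit(DVD_CAPACITY_MB, UNCOMPRESSED_MB_PER_SEC)
dvd_minutes = seconds_that_fit(DVD_CAPACITY_MB, mbps_to_mb_per_second(8)) / 60
print(f"Uncompressed: {uncompressed_s:.1f} seconds; at 8 Mbps: {dvd_minutes:.0f} minutes")
```

Halving the bit-rate to 4 Mbps doubles the playing time, which is the trade-off you make when rendering a longer project.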
As with audio files, variable bit rate (VBR) encoding is also an option for compressed video. VBR changes the bit-rate from moment to
moment, depending on the current demands of the video stream. A static photo that remains on screen for five seconds can get away with a
much lower bit-rate than a fast action scene in a movie or a close-up of a basketball player rushing across the court or a shot where the camera
pans quickly across a crowd. Since lossy video compression encodes what changes from frame to frame, motion is the main factor that increases
the size of a file. With VBR compression, the maximum bit-rate is used only when needed for scenes that include a lot of motion, so VBR video
files are usually smaller than constant bit-rate (CBR) files.
The file format for DVDs that play in a consumer DVD player is MPEG2, where MPEG stands for Moving Picture Experts Group, the standards
outfit that developed the format. If you plan to put your video onto a DVD, this is the format you should export to. Vegas includes a render
template for this format that is optimized for use with its companion program DVD Architect. Video that will be uploaded to a website can use
other file formats, but don’t use a format so new or exotic that users must update their player software before they can watch your video.
Windows Media Video (WMV) is a popular format, as are Flash (FLV) and the newer MP4, which works well for uploading to YouTube. But the
popularity of video file formats comes and goes, and new formats are always being developed. What works best for YouTube today might be
different next year or even next week.
I usually render my videos as MP4 files at a high bit-rate so I can watch them on my TV in high definition, and then I make a smaller Flash
version to put on my websites. I use the excellent and affordable AVS Video Converter software to convert between formats. There are other
such programs, including those that claim to be freeware, though some are “annoyware” that add their branding on top of your video until you
buy the program. Also, this is one category of program that’s a frequent target for malware. Video conversion and DVD extraction are popular
software searches, and unscrupulous hackers take advantage of that. Beware!
Lighting
Entire books have been written about lighting, and I can cover only the basics here. The single most important advice I can offer is to get
halogen lights that are as bright as possible. Newer LED lights are also available; they don’t run nearly as hot as 1 kilowatt of halogen lighting,
and they use less electricity. But at this time they’re quite expensive and are a good investment only if you do a lot of video work. Halogen
lamps produce a very pure white light, so colors will be truer than when using incandescent or fluorescent bulbs. As with most things, you can
spend a little or a lot. Inexpensive halogen “shop” lights work very well, though professional lights have better stands that offer a wide range of
adjustment for both height and angle. Some pro lights also offer two brightness settings. Regardless of which type of halogen light you get, buy
spare bulbs and bring them with you when doing remote shoots.
When lighting a video shot in a home setting, it’s best to point the lights up toward the ceiling or at a nearby wall. This diffuses the light and
avoids strong shadows with sharp edges. Direct light always creates shadows unless you have a lot of lights placed all around the subject, with
each light filling in shadows created by all the other lights. Placing lights to avoid problems with interaction is a bit like placing microphones.
Figure 18.9 from Chapter 18 shows a product photo I took in a friend’s apartment. I used two professional 650-watt halogen lights on stands,
with the lights several feet away from the subject, raised to about two feet below the ceiling and pointing up. With video, having an additional
light behind the subject is also common, often set to point at a person’s head to highlight his or her hair, which adds depth to the scene. Watch
almost any TV drama or movie, and you’ll notice that many of the actors have a separate light coming from one side or behind, pointed at their
Earlier I mentioned that modern cameras include automatic white balance, which is a huge convenience for hobbyists who don’t have the time
or resources to become camera experts and learn how to set everything properly manually. To get the best results, however, it’s important to use
the same type of lights throughout on a set rather than mix halogen, incandescent, and fluorescent lights. Each bulb type has a different color
temperature, which affects the hue the camera captures. If the colors of some parts of a room or stage vary compared to others, the colors will
shift as the camera pans to take in different subjects. Don’t be afraid to turn ordinary room lights off when setting up the lighting for your video
Summary
This chapter explains the basics of video production, including cameras, editing, media file formats, and lighting. Besides the four example music
videos, three demo videos let you see video editing in action in a way that’s not possible to convey in words alone. Modern video software
works in much the same way as audio DAW programs, using multiple tracks to organize video and audio clips. And as with audio, video plug-ins can be used to change the appearance of the clips or to add special effects. But unlike audio mixing, video tracks are usually shown one at a
time, with tracks at the top of the list hiding the tracks below.
Most music videos are performed as overdubs, where the players mime their parts while listening to an existing audio mix. If you don’t have
enough cameras available to capture as many angles and close-ups as you’d like, you can use one or two cameras and have the band perform a
song several times in a row. Then for each performance, the cameras will feature one or two different players. It’s common to dedicate a single
unmanned camera to take in the entire scene to serve as a backup in case all the other camera shots turn out flawed at a particular moment.
However, when using disparate cameras, the quality and color can change from shot to shot, especially if the cameras vary in quality. Thankfully,
many cameras can set their white balance automatically before shooting to even out color differences. Besides being able to adjust color after the
fact, a gamma adjustment lets you increase the overall brightness of a clip, without washing out sections that are already bright enough.
Once you’ve imported all of the video files from each camera, they need to be synchronized with the final audio track. After that’s done, the
camera audio is no longer needed, so you can mute or delete those tracks. Switching between cameras during editing can be abrupt or with a
cross-fade, and most video software includes a number of transition effects. When editing video, one of the most powerful and useful features is
key frames that establish the start and end times over which a change occurs, and the software creates all the in-between points automatically.
Key frames can vary anything the software is capable of, including video plug-in parameters.
This chapter also explained that video files are always reduced in size using lossy compression, though the resulting degradation is not usually
a problem unless you compress to a very low bit-rate. Proper lighting is equally important. Using halogen lamps that are bright and white and
are diffused by bouncing the light off a wall or ceiling is a good first step to achieving a professional look. But no matter how good you think
your video looks, it’s useful to verify the quality and color balance on more than one monitor or TV set.
Chapter 4
Gozintas and Gozoutas
Years ago, in the 1960s through 1980s, I designed audio circuits both professionally and as a hobby. One day I went to the local surplus
electronics store to buy connectors for some project or other. This place was huge and sold all sorts of used electronics devices, wire, and
component parts. The two guys who owned the shop, John and Irv, were colorful characters, but they loved what they did and offered great
deals. They sold some World War II–era military surplus, but they also had more recent goodies such as used oscilloscopes and other test gear. I
once got a great deal on a dozen high-quality audio transformers. When I told John the type of connectors I needed, he said, “Oh, you want some
gozintas.” I must have looked puzzled for a few seconds, then we both laughed when John saw that I understood. Hence the title of this chapter.
Audio Signals
Audio wiring involves three different issues: signal levels, source and destination impedance, and connector types. We’ll consider audio signals
first, starting with devices that output very small voltages, then progress to higher levels. In the following descriptions, the term passive refers to
microphones and musical instrument pickups that have no built-in preamplifier or other electronics.
In most cases, voltage from a dynamic microphone or phono cartridge is created via magnetic induction: A coil of wire (or strip of metal) is
placed in close proximity to one or more magnets. When either the coil or magnet moves, a small voltage is generated. The more the coil or
magnet moves while remaining in close proximity, the larger the voltage that’s created. The playback head in an analog tape recorder is also an
electromagnetic device; in that case the head contains the coil, and the tape itself serves as the moving magnet. A varying voltage proportional to
the tape’s magnetism and travel speed is generated in the head’s coil as the tape passes by. Piezo guitar and violin pickups work on a different
principle: Flexing or squeezing a thin crystal or ceramic plate generates a voltage proportional to the torque that’s applied.
Passive magnetic devices such as dynamic and ribbon microphones output extremely small voltages, typically just a few millivolts
(thousandths of a volt). The general term for these very small voltages is microphone level. Other passive devices that output very small signals
include phonograph cartridges and the playback heads in analog tape recorders. The output level from a typical magnetic guitar or bass pickup
is also small, though not as small as the signal from most low-impedance dynamic microphones.
Passive ribbon microphones output even smaller levels, as you can see in Table 4.1. Of course, the exact output voltage at any moment
depends on the loudness of sounds reaching the microphone or how hard you strum the strings on a guitar. The number of turns of wire in the
coil also affects its output voltage and, simultaneously, its output impedance. For example, the coil in a low-impedance microphone has fewer
turns of wire than a guitar pickup, which is high impedance. Some inexpensive dynamic microphones are high impedance, using a built-in
audio step-up transformer to increase the output voltage and impedance. Note that low impedance and high impedance are often abbreviated as
low-Z and high-Z, respectively.
Table 4.1: Output Levels for Passive Transducers.
Transducer                        Output Level
Ribbon microphone                 0.1 millivolts
Moving coil phono cartridge       0.15 millivolts
Analog tape playback head         2 millivolts
Moving magnet phono cartridge     5 millivolts
150-ohm dynamic microphone        10 millivolts
Fender Precision bass pickup      150 millivolts
Humbucker guitar pickup           200 millivolts
Piezo guitar pickup               0.5 volts
Because the voltages these devices generate are so small, a preamplifier is needed to raise the signals enough to drive a line-level input on a
tape recorder or power amplifier. When dealing with very low signal levels, it’s best to place the preamp near the source. Using a short
shielded wire reduces the chance of picking up AC power mains hum or interference from radio stations and nearby cell phones. Depending on
the output impedance of the voltage source, using short wires can also minimize signal loss at high frequencies due to cable capacitance that
accumulates over longer distances.
Line-level signals are typically around 1 volt, though again the exact level depends on the volume, which changes from moment to moment.
There are also two line-level standards: −10 and +4. Chapter 1 explained that signal levels are often expressed as decibels relative to a
standard reference. In this case, −10 is actually −10 dBV, where −10 dB is relative to 1 volt. Professional audio gear handles +4 signals, or
+4 dBm, which means the nominal level is 4 dB above 1 milliwatt. Some professional gear includes a switch on the rear of the unit to
accommodate both standards.
To offer adequate headroom for brief louder bursts of sound, audio gear capable of +4 levels must be able to output up to +18 dBm and
even higher. Such gear can also drive 600-ohm loads, which requires a fair amount of output current. These days, most pro devices don’t need to
drive 600-ohm loads, which originated in the early days of telephone systems. But vintage 600-ohm equipment is still in use, so pro gear usually
has that capability. Consumer audio uses a level roughly 12 dB lower, with no need to drive a low-impedance input, hence the −10 label. Driving a +4
signal into 600 ohms requires a more substantial power supply than sending −10 dBV to a high-impedance load. So the lower −10 level is
used with home audio equipment mainly to reduce manufacturing costs.
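To see what these labels mean in volts, here is a small sketch that converts both standards. It assumes the +4 dBm level is delivered into a 600-ohm load, as described above; the function names are mine. Because −10 dBV and +4 dBm use different references (1 volt versus 1 milliwatt), the gap between them comes out a little under 12 dB rather than the 14 dB you get by simply subtracting the two labels.

```python
import math


def dbv_to_volts(dbv: float) -> float:
    """Convert dBV (decibels relative to 1 volt) to volts."""
    return 10 ** (dbv / 20)


def dbm_to_volts(dbm: float, load_ohms: float = 600.0) -> float:
    """Convert dBm (decibels relative to 1 milliwatt) to volts across a load."""
    watts = 0.001 * 10 ** (dbm / 10)
    return math.sqrt(watts * load_ohms)


consumer_volts = dbv_to_volts(-10)   # nominal consumer line level
pro_volts = dbm_to_volts(4)          # nominal professional line level
headroom_volts = dbm_to_volts(18)    # level needed for +18 dBm peaks
gap_db = 20 * math.log10(pro_volts / consumer_volts)

print(f"-10 dBV = {consumer_volts:.3f} V, +4 dBm = {pro_volts:.3f} V")
print(f"+18 dBm = {headroom_volts:.2f} V, gap = {gap_db:.1f} dB")
```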
Speaker-level signals are much larger than line levels. Indeed, a hefty power amplifier can put out enough voltage to give you a painful shock!
For example, a power amplifier sending 500 watts into an 8-ohm speaker outputs more than 60 volts. Because loudspeakers also draw a
relatively large amount of current, the wiring and connectors are much more substantial than for microphone and line-level signals. And that
brings us to audio wiring.
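The 60-volt figure follows from the power formula P = V²/R. As a quick check, this sketch computes the voltage and current for the 500-watt example, treating the speaker as a plain 8-ohm resistor, which is a simplification since a real speaker's impedance varies with frequency.

```python
import math


def amplifier_output(power_watts: float, load_ohms: float) -> tuple[float, float]:
    """Return (volts, amps) needed to deliver the given power into a resistive load."""
    volts = math.sqrt(power_watts * load_ohms)  # from P = V^2 / R
    amps = volts / load_ohms                    # Ohm's law
    return volts, amps


volts, amps = amplifier_output(500, 8)
print(f"{volts:.1f} volts at {amps:.1f} amps")
```

Nearly 8 amps is thousands of times the current in a line-level connection, which is why speaker wire and connectors have to be so much heavier.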
Audio Wiring
Microphone and line-level signals generally use the same types of wire: one or two conductors with an outer shield. The cord used for an electric
guitar carries one signal, so this type of wire uses a single conductor surrounded by a braided or wrapped shield. Wire that isn’t handled and
flexed frequently often has a simpler but less sturdy foil shield. Stereo and balanced signals require two conductors plus a shield. Several types of
signal wires are shown in Figure 4.1.
Figure 4.1:
Most audio cable has one or two insulated signal wires surrounded by a metal shield to avoid picking up hum and radio interference. From left to right: two-conductor
with a foil shield and bare drain wire, single conductor with a wrapped shield, standard coaxial cable (“coax”) with a braided shield, and unshielded twisted pair.
Low-level audio cables generally use relatively thin 22- or 24-gauge copper wire because they pass only small amounts of current, and most
use stranded conductors rather than a single solid copper core. Wire made from many twisted thinner strands is more expensive to manufacture,
but the cable can be flexed many times without breaking, and it also handles better because it’s less stiff. Copper conductors are often tin-plated
to avoid tarnishing, which can make them difficult to solder after a few years. So even if a bare wire appears silver colored, it’s still copper
Wire that has two active conductors is used for two very different situations: stereo unbalanced and mono balanced. As you likely know,
electricity requires two conductors, with one acting as a return path. A stereo wire used to connect a portable MP3 player to a home receiver has
two conductors, plus a third return wire that serves both channels, as shown in Figure 4.2. The return can be either a plain wire or a surrounding
shield. Most such wires use a ⅛-inch (or ¼-inch) male phone plug that has a tip, ring, and sleeve, abbreviated TRS. The other end could have
either two ¼-inch phone plugs or two RCA connectors, as shown in Figure 4.2. By convention, the tip carries the left channel, the right channel
is the ring, and the sleeve carries the common ground connection for both channels.
Figure 4.2:
With two-conductor shielded wire used for consumer stereos, each conductor carries the active, or “hot” signal for one channel, and the shield carries the return signal
for both the left and right channels.
Two-conductor shielded wire is also used to carry a single balanced channel, as shown in Figure 4.3. Here, the two center conductors carry the
signal, and neither signal voltage is referenced to the grounded shield. In this case, the shield serves only to reduce hum and radio interference
getting to the active wires within. When used for balanced microphone and line-level signals, XLR connectors are often used, though ¼-inch TRS
phone plugs are also common. In that case, the tip is considered the positive connection, and the ring is negative.
Figure 4.3:
When two-conductor shielded wire is used for balanced microphone and line-level signals, the signal voltage is carried by the two active conductors only.
Another type of audio cable contains four active conductors plus a shield. This can be used to transmit balanced stereo signals down one cable
or can be arranged in a “star quad” configuration that offers slightly greater hum rejection when used for one balanced channel. Microphone and
line-level signal wires are always shielded, but earphone and speaker wires don’t need a shield because the signals are larger, and the output
impedance of power amplifiers is very low. With large signal voltages, any hum or other interference that arrives through the air is very small in
comparison. Further, the driving power amplifier’s low output impedance acts as a short circuit to airborne signals reaching the wire. A low
output impedance also reduces cross-talk between the left and right channels for the same reason. When wires carrying audio signals are in close
proximity, one channel can leak into the other via both inductive and capacitive coupling, especially if they’re twisted together, as is common.
Such coupling acts like both a capacitor and a transformer, passing signals from one wire to the other. An amplifier’s low output impedance
reduces this effect. The value of audio gear having a low output impedance is an important concept for other reasons, too, as you’ll see later in
this chapter.
Balanced wiring that has two active conductors plus a shield is used mainly to reject hum. Even when wires are shielded, strong AC power
fields can still find their way to the inner conductors. When you consider the tiny voltages that passive microphones produce, even a few
microvolts (millionths of a volt) of hum can be a real problem. By using two wires whose voltage difference contains the desired signal, any hum
that gets through the shield is impressed equally onto both wires. A differential input circuit considers only the voltage difference between the
two signal wires, so it is mostly unaffected by any hum or interference that’s common to both wires. The same principle applies to unshielded
twisted pair wires. If hum or radio frequency interference (RFI) reaches the unshielded wires, it’s rejected by a differential input because both
wires contain the same hum or interference. Adding a shield reduces hum and RFI further, but for noncritical applications, plain twisted wiring
is often adequate.
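A short simulation shows why the difference between the two wires is free of hum. This is an idealized sketch: it assumes the hum lands on both conductors in exactly equal amounts and that the input subtracts perfectly, which real circuits only approximate. The sample rate, frequencies, and voltages are arbitrary values chosen for the example.

```python
import math

SAMPLE_RATE = 48_000  # samples per second, chosen arbitrarily for this demo


def sine(freq_hz: float, amplitude: float, count: int) -> list[float]:
    """Generate `count` samples of a sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(count)]


count = 4800
signal = sine(1000, 0.005, count)  # 5 mV microphone signal
hum = sine(60, 0.002, count)       # 2 mV of hum induced equally on both wires

# The desired signal appears with opposite polarity on the two conductors;
# the hum is added identically to both.
hot = [s / 2 + h for s, h in zip(signal, hum)]
cold = [-s / 2 + h for s, h in zip(signal, hum)]

# A differential input responds only to the difference between the wires.
received = [a - b for a, b in zip(hot, cold)]
worst_error = max(abs(r - s) for r, s in zip(received, signal))
print(f"Largest deviation from the original signal: {worst_error:.2e} volts")
```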
Humbucking guitar pickups are similar in concept to using balanced wiring to reject hum arriving through the air. A guitar pickup is
especially prone to hum pickup because its coil acts as a highly efficient antenna at the 60 Hz mains frequency. A humbucking pickup is built
from two separate coils, each with its own magnets. The two coils are wired in series, but with the polarity of one coil reversed. When hum in
the air reaches both coils, the two signals cancel each other out. To avoid canceling the desired signal from the guitar strings, the magnets in each
coil are oriented in opposite directions. That is, where one coil has the north pole at the top closer to the strings, the other coil has the north
pole at the bottom. This way the desired signal comes out twice as strong as with only one coil, while the hum is canceled. This is why
humbucking pickups are usually louder than single coil pickups. Very clever!
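The sign bookkeeping is easy to lose track of, so here it is as a tiny sketch. It models each coil's output as a simple sum of string signal and hum at one instant, which ignores everything else a real pickup does. The function name and voltages are illustrative only.

```python
def humbucker_output(string_signal: float, hum: float) -> float:
    """Combine two idealized coils wired in series with one coil's polarity reversed."""
    coil_a = string_signal + hum
    # Coil B's magnet is flipped, so it senses the string with opposite polarity.
    # Airborne hum does not depend on the magnet, so it arrives unchanged.
    coil_b = -string_signal + hum
    # Reversing coil B's wiring inverts everything it produces.
    return coil_a + (-coil_b)


output = humbucker_output(string_signal=0.1, hum=0.03)
print(f"Output: {output:.3f} volts from a 0.100 volt string signal")
```

The hum terms cancel while the string terms add, so the output is twice the single-coil signal, or about 6 dB louder.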
Another important reason to use balanced wiring is to avoid ground loops between two pieces of equipment. When a single conductor
shielded cable carries audio, the grounded shield serves as one of the two conductors. In a bedroom studio, all of the gear is likely plugged into a
single AC outlet or power strip, so each device has the same ground connection. But in larger installations it’s possible for the ground potential
(voltage) to vary slightly at different outlets. In theory, a ground should always be zero volts. But voltage losses due to resistance of the power
wires in the walls prevent a ground connection from being exactly zero volts. So ground at one outlet may be 2 millivolts but 7 millivolts at
another, creating a hum signal of 5 millivolts, which is only 46 dB below a nominal line level of one volt. With balanced wiring, disconnecting the
grounded shield at the receiving end of the wire avoids this problem, while still passing the desired signal present in the two hot wires.
As mentioned, even a few microvolts of hum can be a real problem at microphone signal levels. Using a balanced input with balanced wiring
avoids hum because the desired signal is the difference between the two active conductors, unrelated to the ground voltage. So balanced wiring
avoids hum pickup through the air like an antenna and also hum caused by a difference in the ground voltages at the sending and receiving
equipment. The spec for how well an input rejects hum common to both signal wires is called its common mode rejection ratio, or CMRR,
expressed as some number of dB below the difference voltage that contains the desired signal.
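The ground-loop numbers above, and the effect of CMRR, can be worked through as follows. The 60 dB CMRR is a hypothetical figure picked for illustration, not a spec from any particular device, and the sketch assumes the input has unity gain for the desired signal.

```python
import math


def db_ratio(volts: float, reference_volts: float) -> float:
    """Express a voltage in dB relative to a reference voltage."""
    return 20 * math.log10(volts / reference_volts)


def residual_after_cmrr(common_mode_volts: float, cmrr_db: float) -> float:
    """Common-mode voltage remaining after a balanced input's rejection."""
    return common_mode_volts / 10 ** (cmrr_db / 20)


ground_a, ground_b = 0.002, 0.007   # 2 mV and 7 mV at the two outlets
hum_volts = abs(ground_b - ground_a)
hum_db = db_ratio(hum_volts, 1.0)   # relative to a 1 volt line level

HYPOTHETICAL_CMRR_DB = 60           # assumed value for this example
residual_volts = residual_after_cmrr(hum_volts, HYPOTHETICAL_CMRR_DB)
residual_db = db_ratio(residual_volts, 1.0)

print(f"Unbalanced: {hum_volts * 1000:.0f} mV of hum, {hum_db:.0f} dB re 1 V")
print(f"Balanced:   {residual_volts * 1e6:.0f} microvolts, {residual_db:.0f} dB re 1 V")
```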
Another factor with audio wiring is its capacitance, which becomes progressively like a short circuit between the conductors at higher
frequencies. A signal source such as a passive guitar pickup has a high output impedance, which limits the amount of current it can provide. It’s
equivalent to placing a large value resistor in series with the pickup’s output. So for guitar and bass pickups, the capacitance of the connecting
wire is very important, unless the guitar contains active electronics to better drive its output signal to the amplifier.
Later sections will cover capacitance in more detail, but briefly for now, a capacitor is similar to a battery. When you put a battery into a
charging station, the power supply provides current to charge the battery. The more current that’s available, the faster the battery will charge. It’s
not practical to charge a battery very quickly, because the battery would draw too much current from the power supply and overheat. So a
resistor is wired in series to limit the amount of current that can flow. The larger the resistor, the less current that can pass through, and the
longer the battery takes to charge. The same thing happens when a high output impedance is coupled with wire having a large amount of
capacitance. The available output current from the pickup is limited by its high impedance, so the wire’s inherent capacitance can’t be charged
quickly enough to follow rapid voltage changes, and in turn the high-frequency response suffers.
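A rough way to put numbers on this is to treat the source impedance and the cable capacitance as a simple RC low-pass filter, whose −3 dB frequency is 1/(2πRC). This is a simplified first-order model: a real guitar pickup is also inductive, so its actual response is more complicated. The impedances, the 30 pF per foot cable capacitance, and the 20-foot length below are all hypothetical values chosen for illustration.

```python
import math


def rc_cutoff_hz(source_ohms: float, cable_pf_per_foot: float, length_feet: float) -> float:
    """-3 dB frequency of a source resistance driving a cable's total capacitance."""
    capacitance_farads = cable_pf_per_foot * length_feet * 1e-12
    return 1 / (2 * math.pi * source_ohms * capacitance_farads)


# Hypothetical values: 20 feet of cable at 30 pF per foot (600 pF total).
high_z_cutoff = rc_cutoff_hz(source_ohms=100_000, cable_pf_per_foot=30, length_feet=20)
low_z_cutoff = rc_cutoff_hz(source_ohms=150, cable_pf_per_foot=30, length_feet=20)

print(f"100 kilohm source: response is down 3 dB at {high_z_cutoff:,.0f} Hz")
print(f"150 ohm source:    response is down 3 dB at {low_z_cutoff:,.0f} Hz")
```

With these assumed values the high-impedance source starts losing treble within the audio range, while the low-impedance source is unaffected until far above it.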
This is another reason shielded wire is not usually used for loudspeakers. Adding a shield can only increase the capacitance between each
conductor and the grounded shield. The amplifier then has to work harder at high frequencies to charge the cable’s capacitance at each wave
cycle. Low-capacitance wire is important for long lengths with digital audio, too, because the frequencies are at least twice as high as the audio
signals represented. That is, achieving a response out to 20 KHz with digital audio requires passing frequencies equal to the sample rate of
44.1 KHz or higher. A digital signal traveling through a wire is actually an analog square wave, so cable capacitance can affect the signal by
rounding the steep edges of the waves, blurring the transition between ones and zeros.
Another type of digital audio cable uses fiber-optics technology, which sends the signals as light waves, so it is immune to capacitance and
other electrical effects. The S/PDIF format was developed jointly by Sony and Philips (the S/P part of the name), and DIF stands for digital
interface. Most people pronounce it as “spid-iff.” S/PDIF uses either 75-ohm coaxial cable with a BNC or RCA connector or fiber-optic cable
with TOSLINK (Toshiba Link) connectors. Fiber-optic cables are sometimes called light pipes because light is transmitted down the cable rather
than electrical voltages. One huge advantage of using fiber-optics versus copper wire is there’s no possibility of ground loops or picking up
airborne hum or radio interference. Another advantage is a single cable can transmit many audio channels at once. However, the downside is
optical cables are limited to a length of about 20 feet unless a repeater is used. S/PDIF is popular for both professional and home audio systems,
and many consumer devices include connectors for both coax and fiber-optic cables.
Speaker wires are rarely shielded, but the conductors must be thick to handle the high currents required. For short runs (less than 10 feet) that
carry up to a hundred watts or so, 16-gauge wire is usually adequate. I generally use “zip cord” lamp wire for short, low-power applications. But
for longer runs, the conductors must be thicker, depending on the length and amount of current the wire will carry. Romex used for AC power
wiring is a good choice for high-powered speaker applications up to 50 feet or even longer. Romex is commonly available in 14- to 10-gauge
and even thicker, where lower-gauge numbers represent thicker conductors.
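To get a feel for how gauge and length interact, the sketch below estimates the share of amplifier power lost in the wire. It uses the standard American Wire Gauge diameter formula and the resistivity of copper at room temperature, and it models the speaker as a plain 8-ohm resistor. The function names are mine, and the results are ballpark figures only.

```python
import math

COPPER_RESISTIVITY = 1.68e-8  # ohm-metres at about 20 degrees C
FEET_TO_METRES = 0.3048
INCHES_TO_METRES = 0.0254


def awg_diameter_m(gauge: int) -> float:
    """Conductor diameter in metres from the standard AWG formula."""
    inches = 0.005 * 92 ** ((36 - gauge) / 39)
    return inches * INCHES_TO_METRES


def round_trip_resistance(gauge: int, run_feet: float) -> float:
    """Resistance of both conductors (out and back) for a given run length."""
    area = math.pi * (awg_diameter_m(gauge) / 2) ** 2
    length = 2 * run_feet * FEET_TO_METRES
    return COPPER_RESISTIVITY * length / area


def fraction_lost(gauge: int, run_feet: float, speaker_ohms: float = 8.0) -> float:
    """Share of the amplifier's output power dissipated in the wire."""
    wire_ohms = round_trip_resistance(gauge, run_feet)
    return wire_ohms / (wire_ohms + speaker_ohms)


for gauge, feet in [(16, 10), (16, 50), (10, 50)]:
    loss = fraction_lost(gauge, feet)
    print(f"{gauge}-gauge, {feet} feet: {loss:.1%} of the power is lost in the wire")
```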
I’ve purposely omitted AC power wiring and connectors because they vary around the world. Plus, AC wires and connectors are mostly self-evident and not complicated. Likewise, there’s little point in including FireWire, USB, and CAT5 wiring and connectors as used for audio
because a computer or other digital device handles the signals at both ends, and users do not determine the specific voltages or frequencies. For
the most part, a digital connection either works or it doesn’t.
Audio Connectors
There are many types of audio connectors, but I’ll cover only the common types. For example, some devices use proprietary connectors that
carry many channels at once over 25 or even more separate pins and sockets. For most connector styles, the “male” and “female” designations
are self-evident. Male connectors are also called plugs, though it’s not clear to me why female connectors are called jacks.
The ¼-inch phone plug and corresponding jack is used for electric guitars, but it’s sometimes used with loudspeakers at lower power levels.
This type of connector comes in one- and two-conductor versions having only a tip, or a tip and a ring. As with two-conductor wire, the two-conductor phone plug and jack shown in Figure 4.4 are used for both stereo unbalanced and mono balanced signals. They’re also used for
unbalanced insert points in mixing consoles, to allow routing a mixer channel to outboard gear and back. In that case, the tip is generally used
for the channel’s output, and the ring is the return path back into the mixer. But you should check the owner’s manual for your mixer to be sure.
Figure 4.4:
The ¼-inch stereo phone plug (left) and jack (right) shown here are meant to be soldered to the end of a shielded wire. The plastic-coated paper sleeve at the left
prevents the plug’s soldered connections from touching the grounded metal outer sleeve.
Phone plugs are also available in 3.5 mm (⅛-inch) and 2.5 mm sizes for miniature applications. The ⅛-inch stereo plug in Figure 4.5 was
wired as a stereo adapter, with separate ¼-inch left and right channel phone plugs at the other end to connect an MP3 player to a professional
mixer. The smaller 2.5 mm type (not shown) is commonly used with cell phones and is also available as mono or stereo. In fact, some 2.5 mm
plugs and jacks have three hot conductors, with two for the left and right earphones, plus a third for a cell phone headset’s microphone.
Figure 4.5:
I wired up this ⅛-inch stereo plug as an audio adapter to connect an MP3 player to my home studio system.
Another common audio connector is the RCA plug (left) and jack (right), shown in Figure 4.6. These connectors are also meant to be soldered
to wire ends. RCA connectors are mono only and are used mostly with consumer audio equipment because they’re not reliable enough for
professional use compared to better connector types. In fact, when the RCA connector was invented in the 1940s, it was meant for use only
inside television sets, so a technician could remove modules for servicing without having to unsolder anything. RCA connectors were never
intended for general purpose use! But they caught on and were adopted by the consumer audio industry anyway because they’re so inexpensive.
RCA connectors are also called phono connectors because they’re commonly used with phonograph turntables, as opposed to ¼-inch phone
connectors that are similar to the ¼-inch plugs and jacks used in early telephone systems.
Figure 4.6:
RCA plugs and jacks here are mono only, though some jacks include a switch that opens when a plug is inserted fully.
For completeness, panel-mounted ¼-inch and RCA jacks are shown in Figure 4.7. Note the switch contact on the ¼-inch jack at the right. This
is the thin, flat blade with a small dimple that touches the curved tip contact. When a plug is inserted, the connection between the switch blade
and tip contact opens. Some RCA connectors have a similar switch that’s normally connected to the active tip contact but disconnects when a
plug is inserted.
Figure 4.7:
Panel-mounted jacks like these are commonly used for audio equipment. From left to right: stereo ¼-inch jack with switch contacts for each channel, dual RCA jacks for
left and right inputs or outputs, and ¼-inch mono jack with a switch.
Whoever first thought to add a switch to a connector was a genius, because it opens up many possibilities. One common use is to disconnect a
loudspeaker automatically when an earphone is plugged in, as shown in the schematic diagram in Figure 4.8. This shows only a mono source
and speaker, but the same principle can be used with stereo jacks like the one at the left in Figure 4.7. That’s why that phone jack has five
solder points: one for the common ground, one each for the active left and right channel conductors, plus one each for the left and right switch
contacts.
Figure 4.8:
When nothing is plugged into the earphone jack, the amplifier’s output passes through the jack’s switch to the loudspeaker. But when an earphone is plugged in, the
switch opens, disconnecting the loudspeaker, and only the earphone receives the signal.
Another useful switch arrangement sends a mono signal to both stereo channels when plugging into only the left channel input jack. This is
common with audio mixers that have separate inputs for the left and right channels, as shown in Figure 4.9. Note that switches coupled to jacks
are not limited to simple one-point contacts as shown here. One or more switches can be physically attached to a connector to engage several
unrelated circuits at once when a plug is inserted.
Figure 4.9:
When a plug is inserted into only the left channel input, the signal passes through the right channel jack’s switch into the mixer, sending the mono signal to both the left
and right inputs. But when a plug is inserted into the right channel, the left channel’s signal is interrupted, and the right channel then goes to the mixer’s right input.
Another clever arrangement uses a ¼-inch stereo phone jack to automatically turn on battery-powered electronics inside an electric guitar or
bass only when the signal wire is plugged in. Rather than require a separate power switch, this method instead uses the ring contact as a ground
return for the battery, as shown in Figure 4.10. A standard ¼-inch phone plug has a solid metal barrel, so the grounded barrel touches the ring
contact when it’s plugged in. As far as I know, I was the first person to do this, back in the 1960s, when I designed and built fuzz tones and other
gadgets into my friends’ electric guitars. Today this is a common feature, not only for electric guitars but also for tuners and metronomes and
other devices that have a ¼-inch audio output phone jack.
Figure 4.10:
When a mono guitar cord is plugged in, the battery’s negative terminal is grounded, completing the battery’s power connection to the circuit.
Speaking of batteries, you can quickly test a 9-volt battery by touching both its terminals at once to your tongue. If the battery is fresh and fully
charged, you’ll get a mild shock that’s unpleasant but not painful. Do this once when you buy a new battery to learn how a new battery feels.
There are also ¼-inch plugs available with built-in switches. These are used with guitar cords, with the switch set to short out the audio until
the plug is fully inserted into the guitar. This avoids the loud hum that otherwise occurs when a live guitar cord is plugged or unplugged. In this
case, the switch is mounted in the outer metal barrel, activated by a small protruding plunger.
It’s worth mentioning two variants of the ¼-inch phone plugs and jacks that are used in some studio patch bays: the long frame, which is also
¼ inch, and the bantam TT (tiny telephone), which is 0.173 inch (4.4 mm) in diameter. Be aware that the tip of the ¼-inch long frame plug is
smaller than the usual phone plug, so plugging a regular phone plug into a long frame jack can stretch the jack’s tip contact if left in for an
extended period of time. Do this only when needed in an emergency.
The last type of low-voltage audio connector we’ll consider is the XLR. In the early days of audio these were called Cannon connectors, named
for the company that invented them. Cannon called this type of connector their X series, and then later added a latch (the “L” part of the name)
so a plug won’t pull out by accident. Today, XLR connectors are produced by several manufacturers, such as those in Figure 4.11 that are made
by Neutrik. The most common type of XLR connector has three pins or sockets for plus, minus, and ground. But XLR connectors having four,
five, and even six contacts are also available. By convention, pin 2 is plus, pin 3 is minus, and pin 1 is ground.
Figure 4.11:
XLR connectors are commonly used for both microphone and line-level signals, and most have three pins for plus, minus, and ground.
The standard today for XLR connector wiring is EIA (Electronic Industries Alliance) RS-297-A, which defines pin 2 as plus, or hot. But some
older gear treats pin 3 as plus and pin 2 as minus. With most gear it doesn’t really matter which pin is plus internally as long as the audio
arrives at the same numbered output pins with the same polarity. But it can matter with microphones, especially when different models are in
close proximity on a single source such as a drum set.
No discussion of XLR connectors would be complete without a mention of Neutrik’s fabulous combination XLR/phone jacks shown in Figure
4.12. Besides accepting standard 3-pin XLR plugs, these connectors also handle ¼-inch phone plugs. The Neutrik website lists 24 different
versions, with both mono and stereo phone jacks, and many varied switch combinations. The PreSonus FireBOX in Figure 4.12 uses these combo
connectors to accept either microphones or electric guitars and other passive instruments through the same input jacks. When a microphone is
plugged in, the preamp presents an appropriate input impedance and suitable gain range. But when a phone plug is inserted, switches built into
the connector change both the impedance and gain range to suit instruments with passive pickups.
Figure 4.12:
These Neutrik combo XLR connectors also accept ¼-inch phone plugs, making them ideal for small devices like this PreSonus sound card that has little room to spare on
its front panel.
Finally we get to loudspeaker connectors. The banana jacks shown in Figure 4.13 have been a staple for many years because they can handle
enough current to pass hundreds of watts. When used with matching banana plugs, their sturdy design and stiff spring contacts ensure a reliable
connection that won’t easily pull out by accident. But banana jacks also accept bare wire, which is common, and secure enough for home stereo
Figure 4.13:
Banana jacks (also called binding posts) are commonly used for both professional and consumer loudspeaker connections.
Photo courtesy of
More recently, the speakON connector shown in Figure 4.14, also developed by Neutrik, has become the standard for high-power professional
loudspeaker applications. SpeakON connectors can handle very large currents, and they feature a locking mechanism that’s reliable enough for
professional use.
Figure 4.14:
SpeakON connectors are ideal for connecting power amplifiers to loudspeakers in professional applications.
Photos courtesy of Neutrik (UK) Ltd.
Patch Panels
Patch panels—also called patch bays—are the center of every hardware-based recording studio. Originally developed for telephone systems,
where calls were routed manually by operators from one phone directly to another, today patch panels allow connecting outboard audio gear in
any conceivable combination. The basic premise is for every piece of audio equipment in the studio to have its input and output connected to a
patch panel that’s centrally located, usually in a nearby rack cabinet. This way, short patch cords can be used to connect the various devices,
rather than running long wires across the room to reach a distant equipment rack.
In the old days, patch panels were hardwired to each piece of equipment, which required a lot of soldering for a major installation. When I
built a large professional recording studio in the 1970s, it took several days to solder the wires to connect all the outboard gear, and console
inputs and outputs, to five 48-jack patch panels. Back then, pro audio gear often used screw-down terminal strips for input and output
connections, so each piece of equipment required soldering one end of a wire to the patch panel, and soldering terminals at the other end to
attach to each device’s terminal strip. Audio gear has come a long way since then!
Today, modern patch panels have matching pairs of jacks on the front and rear, so no soldering is required unless you want to wire your own
cables to the custom lengths needed behind the rack. A typical modern patch panel is shown in Figure 4.15. However, the downside is you have
to buy premade wires for every input and output of each device you want to connect. And with twice as many connections, there’s more chance
a loose plug will drop out or cause distortion.
Figure 4.15:
This Neutrik patch panel has 24 pairs of stereo ¼-inch phone jacks. Each pair can serve as a left and right stereo input or output, or as an input and output for one
balanced or unbalanced mono channel.
Patch panels are often wired in a configuration known as normaled, or optionally half-normaled. When jacks are normaled, that means they
have a default “normal” connection even when nothing is plugged in. For example, it’s common to patch a console’s Reverb Send to an equalizer
(EQ) before it goes on to the reverb unit, perhaps to reduce low frequencies that could muddy the sound or just for general tone shaping. This is
shown in Figure 4.16. If you want to also insert a compressor before or after the equalizer, you’ll simply patch that in manually. But nothing
needs to be patched to have only the EQ in the path.
Figure 4.16:
This shows normaled patch bay connections, where the output of a console goes to an outboard equalizer, then on to a hardware reverb unit. Plugging a patch cord into
any of the input or output jacks breaks the normaled connection. The ground wires are omitted for clarity.
As you can see, the console’s Reverb Send goes to the patch panel for ready access, and the switches for both the Send output and EQ input are
connected. With nothing plugged into either jack, the Send goes to the equalizer. Likewise, the EQ’s output is normaled to the reverb unit’s
input. Plugging anything into any of the jacks interrupts the normaled connections, letting you freely create other patch arrangements. But what
if you want to route the Reverb Send to two places at the same time—say, to the EQ input and also somewhere else? This is the purpose of the
half-normaled switching shown in Figure 4.17.
Figure 4.17:
With a half-normaled connection, only the input jack switches are used.
With half-normaling, the normaled connection is broken only if you plug into a device’s input. Plugging into a device’s output leaves the
normaled path intact, so you can patch that output to another piece of gear at the same time, or perhaps back into a console input. When one
output is sent to more than one input, that’s called a mult, short for multiple destinations. When I owned a professional recording studio, I wired
several groups of four jacks together to be able to send any device to as many as three destinations. This type of mult does not use a jack’s
built-in switches. It simply connects the active conductors of three or more jacks together, so you can send one output signal to multiple destinations
as needed using patch cords. You could even create line-level pads in a patch panel to reduce overly loud signals by wiring resistors to the jacks
instead of connecting them directly. Creating resistor pads is explained in Chapter 21.
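The difference between normaled and half-normaled switching described above can be summarized as a small truth table. The Python sketch below is only a model of the behavior in Figures 4.16 and 4.17; the function name and the two mode strings are made up for illustration.

```python
def normal_path_intact(mode, plug_in_output, plug_in_input):
    """Return True if the default (normaled) connection is still made.

    mode is "normaled" or "half-normaled" (hypothetical labels); the two
    flags say whether a patch cord is plugged into the output jack or
    the input jack of the pair.
    """
    if mode == "normaled":
        # Both jacks' switches are wired, so a plug in either one breaks the path.
        return not (plug_in_output or plug_in_input)
    if mode == "half-normaled":
        # Only the input jack's switch is wired, so only a plug there breaks the path.
        return not plug_in_input
    raise ValueError("mode must be 'normaled' or 'half-normaled'")


for mode in ("normaled", "half-normaled"):
    for out_plug in (False, True):
        for in_plug in (False, True):
            state = "intact" if normal_path_intact(mode, out_plug, in_plug) else "broken"
            print(f"{mode:14} output plug={out_plug!s:5} input plug={in_plug!s:5} -> {state}")
```

The one row that differs between the two modes is a plug in the output jack only: a normaled pair breaks the default path, while a half-normaled pair keeps it, which is what lets you tap the same output for a second destination.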
The last wiring topic I’ll address is wire labels. These are simple strips of adhesive tape with preprinted numbers, as shown in Figure 4.18. In
a complex installation comprising dozens or even hundreds of cables, it can be a nightmare to track down which wire goes where when
something stops working. By applying labels to every wire during installation, you always know what they’re connected to at both ends. You
should also create a printed list of each source and destination with its wire number. Another option is to use write-on labels instead of simple
numbers, using a permanent marker to write the name of the gear connected at the other end.
Figure 4.18:
Stick-on wire labels may be low tech, but they’re incredibly useful.
Impedance
The main difference between impedance and simple resistance is that impedance implies a frequency component known as reactance. Whereas
a resistor has the same resistance at all audio frequencies, the impedance of a capacitor, inductor, loudspeaker, or power amplifier’s output
varies with frequency. Sometimes this is just what you want, as when using capacitors to create the filters shown in Chapter 1. But just as often,
an impedance that changes with frequency is not useful, such as when it harms the frequency response. Note that the term impedance is often
used even when it doesn’t change with frequency. For example, the input impedance of a preamp that doesn’t have an input transformer is
simple resistance, having the same ohms value at all audio frequencies.
As explained earlier, signals from high-impedance outputs such as passive guitar pickups require low-capacitance wire to avoid high-frequency
loss. A passive pickup can output only a small amount of current, and that can’t charge a wire’s inherent capacitance quickly enough
to convey high frequencies. Therefore, using long wires having high capacitance attenuates higher frequencies. Wire capacitance is cumulative, so
longer wires have higher capacitance. This is why guitar cords benefit from low capacitance and why it’s rare to find premade cords longer than
about 20 feet. A piezo pickup has an extremely high output impedance, so it benefits from even shorter wires. Preamps designed specifically for
use with piezo pickups are common, and they’re typically placed close to the instrument. The preamp can then drive a much longer wire that
goes on to the amplifier.
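To get a feel for how wire length affects high frequencies, here is a rough Python sketch that treats the source as a plain resistance driving the wire’s capacitance, which forms a simple RC low-pass filter with a −3 dB point at 1 / (2 * pi * R * C). The 10 K source impedance and 30 pF per foot are assumed values picked only for illustration, and a real pickup is inductive rather than purely resistive, so treat the numbers as ballpark figures.

```python
import math

def cable_cutoff_hz(source_ohms, pf_per_foot, feet):
    """-3 dB frequency of a resistive source driving a cable's capacitance."""
    capacitance_farads = pf_per_foot * feet * 1e-12  # picofarads to farads
    return 1.0 / (2.0 * math.pi * source_ohms * capacitance_farads)

SOURCE_OHMS = 10_000  # assumed source impedance, not a measured value
PF_PER_FOOT = 30      # assumed cable capacitance per foot

for feet in (10, 20, 50):
    cutoff = cable_cutoff_hz(SOURCE_OHMS, PF_PER_FOOT, feet)
    print(f"{feet:3d} feet: -3 dB at about {cutoff:,.0f} Hz")
```

Because capacitance adds up with length, doubling the wire length halves the cutoff frequency, and a higher source impedance lowers it further.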
Line-level inputs typically have an impedance of around 10 K (10,000 ohms), though some are as high as 100 K, while yet others are as low as
600 ohms. Inputs meant to accept passive guitar and bass pickups are usually 1 meg (1 million ohms) or higher. The need for a high input
impedance with guitar pickups is related to the need for low-capacitance wire, but it’s not exactly the same. Figure 4.19 shows the electrical
model of a typical guitar pickup, including its inherent series inductance. When combined with the amplifier’s input impedance, the inductance
rolls off high frequencies at a rate of 6 dB per octave. Hopefully, the amplifier’s input impedance is high enough that the roll-off starts past
20 KHz, but the wire’s capacitance adds a second reactive pole, so the rate is potentially 12 dB per octave. The dB per octave slope at which
high frequencies are reduced depends on the particular values of the pickup’s inductance, the wire’s capacitance, and the amplifier’s input
impedance. So using high-capacitance wire or too low an input impedance, or both, can roll off high frequencies.
Figure 4.19:
A passive guitar pickup has a large inherent inductance because it’s basically a coil of wire with many turns. This inductance is effectively in series with the circuit, so it
interacts with the capacitance of the wire and with the input impedance of whatever device it’s plugged into.
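The 6 dB per octave roll-off caused by a pickup’s series inductance working into the input impedance can be estimated with the standard corner frequency for a series inductor feeding a resistive load, R / (2 * pi * L). The sketch below assumes a 3-henry pickup, which is an illustrative guess rather than a figure from this chapter, and it ignores the pickup’s own resistance and the wire’s capacitance.

```python
import math

def pickup_corner_hz(inductance_henries, load_ohms):
    """Corner frequency where a series inductance into a resistive load is down 3 dB."""
    return load_ohms / (2.0 * math.pi * inductance_henries)

PICKUP_HENRIES = 3.0  # assumed pickup inductance, for illustration only

for load_ohms in (10_000, 100_000, 1_000_000):
    corner = pickup_corner_hz(PICKUP_HENRIES, load_ohms)
    print(f"{load_ohms:>9,} ohm input: roll-off starts near {corner:,.0f} Hz")
```

With these assumed values, a 1 meg input puts the corner well above 20 KHz, while a 10 K line input would start rolling off the highs only a few hundred Hz up.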
Piezo pickups work best when driving even higher impedances, as well as low capacitance. A cable from a 150-ohm microphone can run to
100 feet or even longer without excessive loss at high frequencies, but a wire connected to a piezo pickup should be 10 feet or less. Unlike a
magnetic pickup that has an inherent series inductance, a piezo pickup acts more like a capacitor. So when driving too low an impedance, it’s
the low frequencies that are reduced. Between their high output impedance that loses high frequencies to wire capacitance but loses low
frequencies with a low-load impedance, piezo pickups tend to have an exaggerated midrange that often requires equalization to sound
Most modern audio gear has a very low output impedance and fairly high input impedance. Besides reducing the effect of wire capacitance,
designing circuits with a low output impedance also allows one device output to feed multiple inputs. Since most audio gear has a high input
impedance, you can typically send one output to many inputs without affecting the frequency response or increasing distortion. As mentioned
earlier, in the old days professional audio gear had input and output impedances of 600 ohms, based on telephone systems of the day. This
arrangement is known as impedance matching, and it was done to maximize power transfer between devices. The modern method where a low
output impedance drives a high input impedance is better because it reduces noise and potentially improves frequency response, as well as
letting one audio device feed several others without degradation.
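A quick way to see why a low output impedance can feed several high-impedance inputs is to treat the output and input impedances as a voltage divider. The Python sketch below compares an old-style matched 600-ohm connection with a modern output driving one and then three 10 K inputs in parallel. The 100-ohm output impedance is an assumed value, and the function names are hypothetical.

```python
import math

def parallel(*ohms):
    """Combined resistance of several resistances wired in parallel."""
    return 1.0 / sum(1.0 / r for r in ohms)

def loading_loss_db(output_ohms, input_ohms):
    """Level change caused by the voltage divider of output and input impedance."""
    return 20.0 * math.log10(input_ohms / (output_ohms + input_ohms))

print(f"600 ohms into 600 ohms:     {loading_loss_db(600, 600):6.2f} dB")
print(f"100 ohms into one 10 K:     {loading_loss_db(100, 10_000):6.2f} dB")
three_inputs = parallel(10_000, 10_000, 10_000)
print(f"100 ohms into three 10 K:   {loading_loss_db(100, three_inputs):6.2f} dB")
```

The matched connection loses about 6 dB, while the low-to-high connection loses only a small fraction of a dB even with three inputs attached.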
Earlier I mentioned wiring a mult from a group of jacks in a patch panel. Using a “Y” splitter to send one output to several inputs is useful
and common. But you must never use “Y” wiring to combine two outputs directly together. For that you need a mixer, or at least some resistors
to act as a passive mixer. If two low-impedance outputs are connected together, each output acts as a short circuit to the other. At the minimum
this will increase distortion, but with a circuit having inadequate protection, this can possibly damage its output stage. The actual output
impedance of most audio circuits is less than 1 ohm, but typically a small resistor (10 ohms to 1 K) is wired internally in series with the output
connector to protect against a short circuit or improper connection as described here.
The same thing happens if you plug a mono ¼-inch phone plug into a stereo output such as a headphone jack. This shorts out the right
channel connected to the ring contact, which may or may not strain the output circuit. Again, well-designed circuits include a small series resistor
for protection, but it’s possible for a short circuit on one channel to increase distortion on the other because most devices have a single power
supply for both channels. It’s the same when you connect a balanced output to an unbalanced input with mono phone plugs. If you use stereo
phone plugs having a tip and ring at both ends, the ring is simply unconnected at one end and no harm is done. But inserting a mono plug into
a balanced output will short out the negative output at the ring. This isn’t a problem with older audio gear that has an output transformer, and
in fact grounding the transformer’s negative output is needed to complete the circuit. But many modern devices use two separate amplifiers to
drive the plus and minus outputs. This is also why batteries that power portable devices are never wired in parallel. Unless both batteries have
precisely the same voltage, which never happens, each battery shorts out the other battery, quickly draining both and potentially causing damage
or even an explosion.
The last impedance issue I’ll address is 70-volt loudspeaker systems. These have been around for decades and are used for very large
multispeaker installations, such as stadiums where wires between the power amplifiers and speakers can run to thousands of feet. As explained
earlier, when long wires are used to connect loudspeakers, the wires must be thick to avoid voltage loss through the wire’s resistance. The
formula for this is very simple, where the asterisk (*) represents multiplication:
Volts = Amperes * Ohms
In this case, volts is the loss when some amount of current (amperes) flows through some amount of resistance (ohms). With long speaker
runs, the resistance can’t easily be reduced because that requires using very thick wires, which can get expensive. So 70-volt speaker systems
instead use transformers to reduce the amount of current needed to deliver however much power is required to drive the speakers to an
adequate volume. The formula for power is equally simple:
Watts = Volts * Amperes
As you can see in this formula, a given amount of power can be provided either as a large voltage with a small current, or vice versa. So if the
system designer determines that each loudspeaker needs 50 watts to be heard properly over the roar of a cheering crowd, that number of watts
can be achieved by sending 20 volts to a typical 8-ohm speaker that in turn draws about 2.5 amps. But using a 70-volt system as shown in Figure
4.20 reduces the current needed for the same amount of power to only 0.7 amps. When all else is equal, the cost of wire is related to the amount
of copper needed to achieve a given thickness, or gauge. So a 70-volt system can yield a substantial savings, even when you factor in the cost of
transformers, which are typically inexpensive units meant only for voice frequencies.
Figure 4.20:
A 70-volt speaker system can drive very long wires with little loss because each loudspeaker draws little current, so less voltage is lost across the wire’s resistance.
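The arithmetic behind the 70-volt example can be checked with a short Python sketch. It uses the two relationships described in the text, volts = amperes * ohms and watts = volts * amperes. The 5-ohm wire resistance is an assumed figure chosen only for illustration, and the function names are hypothetical.

```python
def current_amps(watts, volts):
    """Current needed to deliver a given power at a given voltage (watts = volts * amperes)."""
    return watts / volts

def wire_drop_volts(amps, wire_ohms):
    """Voltage lost across the wire's resistance (volts = amperes * ohms)."""
    return amps * wire_ohms

SPEAKER_WATTS = 50.0  # power per loudspeaker, from the example in the text
WIRE_OHMS = 5.0       # assumed total resistance of a long wire run

for line_volts in (20.0, 70.0):
    amps = current_amps(SPEAKER_WATTS, line_volts)
    drop = wire_drop_volts(amps, WIRE_OHMS)
    print(f"{line_volts:4.0f} volts: {amps:.2f} amps, {drop:.1f} volts lost in the wire")
```

For the same 50 watts, the 70-volt line needs roughly 0.7 amps instead of 2.5, so the voltage lost in the same wire is less than a third as large.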
By the way, this same principle is used to send AC power over long distances. Sending 120 or 240 volts over hundreds of miles would require
extremely thick wires, so for long distances, AC power is sent at 110,000 volts or even higher. Once the power gets from the generator to your
town, it passes through transformers that reduce it to a few thousand volts as it travels over utility poles. Then it’s reduced again by smaller
transformers mounted high up on the poles before entering your home. Large AC power transformers are very expensive, but not as expensive as
hundreds of miles of wire thick enough to pass the high current required for lower voltages.
Finally, since my goal for every chapter is to bust at least one audio myth, I’ll address the value of using gold in audio connectors. Gold is an
excellent conductor, and it’s commonly used for circuit board edge connectors and other critical connections. But gold is very expensive, so it’s
always applied as a thin plating onto another more common metal. Although gold has a fairly low resistance, its real value for electronics is that
it doesn’t tarnish over time. If you connect a gold-plated plug to a gold-plated socket, the connection will be solid and reliable for many years.
Lesser materials that tarnish can not only become intermittent, but the connection points can potentially become diodes or batteries due to
oxidization, which in turn creates distortion. Low-level audio signals such as from phono cartridges and analog tape playback heads are
especially prone to distortion caused by oxidized connections.
So using gold for audio connectors is a Good Thing. However, gold is most beneficial when used for both connectors! If you buy expensive
wires with gold-plated RCA connectors, plugging them into an audio device that has tin or nickel connectors loses any advantage of gold.
Further, gold plating is very thin—typically measured in microns (1 millionth of a meter, or around 39 millionths of an inch)—so repeatedly
plugging and unplugging gold connectors can eventually wear off the plating to expose whatever metal is underneath. Many other less expensive
materials are used to create highly reliable connections, including nickel, brass, and bronze, and these work just as well as gold in most audio
applications. Indeed, fancy RCA connectors made from exotic metals are sometimes called “audio jewelry” because they look nice, even if they
have little practical value.
Summary
This chapter describes several passive audio sources such as microphones, guitar pickups, and analog tape heads, along with their typical voltage
levels. Passive microphones and guitar pickups output very small voltages, requiring a preamplifier to feed devices having line-level inputs.
Where most consumer audio equipment uses a nominal level of −10 dBV, professional audio gear operates at a level of +4 dBm. Besides using
a higher signal level to overcome noise and hum, most pro audio gear can also drive low-impedance inputs and longer cable lengths without
increasing distortion and cross-talk, or losing high frequencies.
You also learned that most audio wiring uses shielded cables to reduce airborne hum and radio interference. Using balanced wiring, with two
conductors for each channel, further reduces hum and radio interference, as well as avoiding hum caused by ground loops. Although wire having
too much capacitance can roll off high frequencies, this is mostly a problem when long runs are used with passive guitar pickups that have a
high output impedance, because guitar pickups don’t provide enough current to charge the wire’s inherent capacitance quickly enough.
This chapter further explains the various types of connectors used for audio and presents several clever techniques that use the switches built
into phone jacks. Further, switches built into some patch panels allow normaled and half-normaled connections, which avoids having to create
frequently used connections manually every time with patch cords.
Finally, this chapter explains the basics of impedance. High-impedance magnetic and piezo pickups work best when driving high-impedance
inputs to maintain a flat frequency response. Further, most audio gear is designed with a low output impedance and high input impedance,
allowing one device output to feed several inputs at once. However, when a system needs to send high-powered audio (or AC power) over long
distances, a high impedance is better because larger voltages require less current to transmit the same amount of power.
Part 2
Analog and Digital Recording, Processing, and Methods
This section explains how audio hardware and software devices work, and how to use them. Audio systems are exactly analogous to plumbing,
because they both have inputs and outputs, with something flowing through them. The plumbing in a house works in much the same way as
audio passing through a mixing console. Where a mixer has a preamp gain trimmer, pan pot, and master volume, the pipes under your sink
have a safety shutoff valve in series with the sink’s faucet valve. Sliding the faucet handle left and right “pans” the balance between full hot and
full cold, or any setting in between. The concept of audio as plumbing applies equally to both analog and digital devices. Of course, a mixing
console offers many more ways to route and alter sound, versus plumbing, which can only heat the water and maybe add a filter to remove
To continue the analogy, voltage is like water pressure, and current is the same as water flow expressed as gallons per minute. The voltage at
an AC power outlet is always present, but it’s not consumed until you plug something in and turn it on. The pressure at a water valve behaves
the same, passing nothing when it’s shut off. Then when you open the valve, pressure drives the water down a pipe, just as voltage propels
current down a wire. If you can see how the plumbing in a large apartment building is a collection of many identical smaller systems, you’ll
understand that a complex multichannel audio mixing console is configured similarly from many individual channels.
Over the years I’ve noticed that the best mix engineers also take on creative roles, including that of a producer and sometimes even musical
arranger. As you’ll note in the chapters that follow, mixing music is often about sound design: The same tools and processes are used both for
solving specific audio problems and for artistic effect to make recordings sound “better” than reality. So while the main intent of this book is not
so much to teach production techniques, I’ll explain in detail how the various audio processing tools work, how they’re used, and why.
Many of the examples in this section refer to software processors and devices, but the concepts apply equally to analog hardware. Most audio
software is modeled after analog hardware anyway, although some software does things that would be very complicated to implement as analog
hardware. For example, it’s very difficult if not impossible to create after-the-fact noise reduction, linear phase equalizers, and 96 dB per octave
filters in analog hardware. Further, a lot of audio hardware these days is really a computer running digital audio software, such as hardware
reverb units that are computer-based and do all their processing digitally.
Chapter 5
Mixers, Buses, Routing, and Summing
Large mixers are exactly the same as small ones, but with more channels and buses, so at first they can seem complicated. Let’s start by
examining a single channel from a hypothetical mixing console. Although they’re called mixing consoles, most also include mic preamps and
signal routing appropriate for recording. Figure 5.1 shows the front panel layout of a typical channel, with its corresponding block diagram
showing the signal flow presented in Figure 5.2.
Figure 5.1:
This shows the front panel of a typical mixer channel, with knobs and switches for preamp gain, equalizers, auxiliary sends, and the master fader.
Figure 5.2:
This block diagram shows the signal flow within the mixer channel in Figure 5.1.
A block diagram such as Figure 5.2 shows the general signal flow through an audio device. This is different from a schematic diagram that
shows component-level detail including resistors and capacitors, though the switches and variable resistors are as they’d appear in a schematic.
In this diagram, arrows show the direction of signal flow. The 48-volt phantom power supply at the upper right in Figure 5.2 isn’t part of the
audio signal path, but it provides power to the microphone connected to that channel when engaged. Most large consoles include individual
switches to turn phantom powering on and o
for each channel, though many smaller mixers use a single switch to send power to all of the
preamps at once. Phantom power is explained in more detail in Chapter 15.
A microphone is plugged into the XLR input jack on the rear of the console, which then goes to the preamp. The Trim control adjusts the
amount of gain, or amplification, the preamp provides. Most preamps can accommodate a wide range of input levels, from soft sounds picked
up by a low-output microphone, through loud sounds captured by a condenser microphone that has a high output level. The gain of a typical
preamp is adjustable over a range of 10 dB to 60 dB, though some microphones such as passive ribbons output very small signals, so preamps
may offer 70 dB of gain or even more to accommodate those mics.
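To put those gain figures in perspective, a voltage gain in decibels converts to a plain multiplier as 10 raised to the power of dB/20. The short Python sketch below is only an illustration of that arithmetic; the function name is my own and is not taken from any product or library.

```python
def db_to_voltage_ratio(gain_db):
    """Convert a voltage gain in decibels to a plain multiplier."""
    return 10 ** (gain_db / 20)


# Gain settings mentioned in the text: the usual minimum, the usual maximum,
# and the extra gain some preamps offer for low-output ribbon microphones.
for gain_db in (10, 60, 70):
    ratio = db_to_voltage_ratio(gain_db)
    print(f"{gain_db} dB of gain multiplies the signal voltage by about {ratio:,.1f}")
```

So 60 dB of gain makes the microphone's output voltage 1,000 times larger, which is why even a small amount of preamp noise matters at high gain settings.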
Letting users control the preamp’s gain over a wide range minimizes noise and distortion. The preamps in some recording consoles from years
past couldn’t accept very high input levels without distorting, so engineers would sometimes add a pad between the microphone and preamp.
This is a simple barrel-shaped device with XLR connectors at each end, plus a few resistors inside, to reduce the level from a source such as a
loud kick drum when the microphone is placed very close. These days many preamps have a minimum gain low enough to accept very loud
mic- or even line-level signals without distorting, or they include a pad that can be engaged when needed. Further, many condenser microphones
include a built-in pad, as shown in Figure 5.3, to avoid overloading their own built-in preamps. So you can use that pad when recording very
loud sources.
Figure 5.3:
This audio-technica 4033 microphone has two switches—one for a 10 dB pad and another for a low-cut filter.
Continuing on with Figure 5.2, the signal from the preamp goes through a 100 Hz high-pass filter that can be engaged when needed, though I
prefer to think of this as a low-cut filter. Either way, reducing low frequencies is very common, so most consoles (and many outboard preamps)
routinely include such filters. As you can see in Figure 5.3, it’s common for microphones to include a low-cut filter, too. This type of filter is not
as flexible as an equalizer because it has a single fixed frequency and a fixed roll-off slope that’s often only 6 dB per octave. If you need a
different frequency or slope, you’ll have to use an equalizer. But still, simple switched filters like these are often adequate to reduce unwanted
rumble or other low-frequency content.
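To see what a 6 dB per octave slope means in practice, the sketch below computes the response of an idealized first-order high-pass filter. I'm assuming the 100 Hz figure is the filter's −3 dB point and that the filter behaves like a textbook single-pole design; a real console's filter may differ, and the function name is hypothetical.

```python
import math


def first_order_highpass_db(freq_hz, cutoff_hz=100.0):
    """Gain in dB of an ideal first-order (6 dB per octave) high-pass filter."""
    ratio = freq_hz / cutoff_hz
    magnitude = ratio / math.sqrt(1.0 + ratio * ratio)
    return 20.0 * math.log10(magnitude)


# Step down in octaves from well above the cutoff to well below it.
for freq_hz in (400, 200, 100, 50, 25, 12.5):
    print(f"{freq_hz:>6} Hz: {first_order_highpass_db(freq_hz):6.1f} dB")
```

The response is down about 3 dB at the cutoff frequency, and well below the cutoff each halving of frequency loses close to another 6 dB. That gentle slope is why a switched low-cut filter reduces rumble but can't remove it entirely.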
After the low-cut filter, the signal passes through an equalizer. The three-band EQ shown in this hypothetical console channel is typical. While
not as flexible or “surgical” as a fully parametric equalizer, it’s usually adequate for massaging audio enough to capture a decent recording. The
EQ shown here has fixed frequencies for the low and high shelving controls, with a midrange frequency that can be swept over a wide range.
Solo, Mute, and Channel Routing
Next in line are the Solo and Mute switches. The Mute switch is normally closed, so pressing the button on the front panel opens the circuit,
which mutes the channel. The Solo switch also mutes the mixer channels, but it’s part of a larger network of switches. When engaged, a Solo
switch mutes all of the channels other than the current channel. So if you think you hear a rattle on the snare drum, for example, solo’ing the
snare mic lets you hear only that channel to verify. A solo system is actually quite complex because all of the switches are linked together
electronically. Some solo systems disable all Aux sends when activated, though I prefer to hear whatever reverb or echo is active on that track.
More elaborate consoles let you solo either way.
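The logic that links the Mute and Solo switches can be sketched in a few lines of Python. This is only a model of the behavior described above, not how any particular console is wired. The dictionary keys are my own invention, and I'm assuming that a muted channel stays silent even if it is also soloed, which not every console does.

```python
def audible_channels(channels):
    """Return the names of the channels that reach the mix.

    Each channel is a dict with 'name', 'mute', and 'solo' keys.
    Assumption: Mute wins over Solo when both are engaged.
    """
    any_solo = any(ch["solo"] for ch in channels)
    audible = []
    for ch in channels:
        if ch["mute"]:
            continue  # a muted channel is always silent
        if any_solo and not ch["solo"]:
            continue  # someone else is soloed, so this channel is cut
        audible.append(ch["name"])
    return audible


channels = [
    {"name": "kick", "mute": False, "solo": False},
    {"name": "snare", "mute": False, "solo": True},
    {"name": "overheads", "mute": True, "solo": False},
]
print(audible_channels(channels))
```

Notice that each channel's result depends on the Solo switches of every other channel, which is why a solo system has to link all of the channels together.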
Following the Mute switch is the Master Fader, or master volume control for the channel. This is typically a slide control rather than a round
knob. A slider lets you make volume changes more precisely: the longer the fader, the easier it is to make fine adjustments. Volume sliders also
let you control several adjacent channels at once using different fingers on each hand. After the Master Fader, the signal goes to the pan pot,
which sends the single channel to both the left and right outputs in any proportion from full left to full right. By the way, the “pan” in pan pot
stands for panorama, so the full term (which nobody uses) is panorama potentiometer.
It’s worth mentioning that there are two types of large format consoles: those meant for live sound where all of the inputs are sent to left and
right main output channels and those meant for multitrack recording. The console shown in Figures 5.1 and 5.2 is the simpler live sound type. A
mixing console meant for multitrack recording will have a pan pot on every channel and a bank of switches to let the engineer send that channel
to any single track or stereo pair of tracks on a multitrack recorder. A typical Track Assign switch matrix is shown in Figure 5.4.
Figure 5.4:
This switch matrix lets you send a single channel to any one output or to a pair of outputs for recording the channel as part of a stereo mix. The dashed line indicates that
both switch sections are physically connected to change together. For clarity, only eight outputs are shown here, but much larger switch matrices are common, such as 16, 24, or even 32
In Figure 5.4, the left and right channels can be assigned to outputs 1 and 2, or 3 and 4, or 5 and 6, and so forth, as stereo pairs. There are
many such arrangements, often using banks of push buttons, but this simplified drawing shows the general idea. When using a console with a
multitrack recorder having only 16 or 24 tracks, you may need to record more microphones at once than the number of available tracks. It’s not
uncommon to use eight microphones or more just for a drum set. In that case you’ll premix some of the microphones to fewer tracks when
recording. One common setup records the snare and kick drum microphones to separate tracks, with all of the other drum mics mixed together
in stereo. When a group of microphones is premixed to stereo, the result sent to the recorder is called a sub-mix. Of course, when microphones
are mixed together while recording, there’s no chance to change the balance between them later during mix-down, nor can you change the EQ of
some mics without affecting all of the others. Other effects such as reverb or compression must also be applied in the same amount to the entire
Buses and Routing
The last section in Figure 5.2 is three Aux Send groups, each having a Pre/Post switch. An Aux output—short for auxiliary output—is an
alternate parallel output that’s active at the same time as the main output. It’s typically used to send some amount of that channel to a reverb or
echo unit, which then is added back into the main left and right outputs during mixdown. Most audio effects, such as equalizers and
compressors, are patched in series with a channel’s signal. It makes little sense to mix a bass or guitar track with an equalized version of itself, as
that just dilutes the effect and potentially causes unwanted comb filtering due to slight delays as audio passes through the devices. But reverb and
echo add new content to a track, so both the original sound and its reverb are typically present together. Chapter 7 explores further the
difference between inserting effects onto a track versus adding them to a bus.
Most of the time, when you add reverb to a channel, you’ll want the relative amount of reverb to remain constant as you raise or lower the
volume for that channel. This is the purpose of the Pre/Post switch. Here, Pre and Post refer to before and after the channel’s master volume
control. When set to Post, the signal sent to the reverb unit follows the volume setting for that channel. So when you make the acoustic guitar
louder, its reverb gets louder, too. But you may want the amount of reverb to stay the same even as the main volume changes—for example, if
you want to make an instrument seem to fade into the distance. As you lower the volume, the instrument sounds farther and farther away
because the main signal gets softer, and eventually all that remains is its reverb. You can hear this effect at the end of Pleasant Valley Sunday by
The Monkees from 1967 as the entire song fades away, leaving only reverb.
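The difference between the Pre and Post settings comes down to whether the channel fader's gain is included in what the Aux bus receives. Here's a minimal Python sketch of that idea using linear gains, where 1.0 is unity. The function and parameter names are hypothetical, and a real console has more stages in the path than this.

```python
def aux_send_gain(send_level, fader_level, post_fader):
    """Linear gain of the signal that reaches the Aux bus.

    send_level and fader_level are linear gains (1.0 = unity).
    Post-fader sends follow the channel fader; pre-fader sends ignore it.
    """
    if post_fader:
        return send_level * fader_level
    return send_level


# Pull the channel fader down and watch what each kind of send does.
for fader_level in (1.0, 0.5, 0.1, 0.0):
    post = aux_send_gain(0.3, fader_level, post_fader=True)
    pre = aux_send_gain(0.3, fader_level, post_fader=False)
    print(f"fader {fader_level:.1f}: post send {post:.2f}, pre send {pre:.2f}")
```

With the send set to Post, the reverb feed drops along with the fader, so the wet and dry balance stays the same. With the send set to Pre, the reverb feed stays put as the dry signal fades, leaving only the reverb at the end.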
Another important use for Aux sends, with the Pre switch setting, is to create a monitor mix for the musicians to hear through their earphones
while recording or adding overdubs. If you create a decent mix in the control room, where all of the instruments can be heard clearly, that same
mix is often adequate for the musicians to hear while recording. But sometimes a drummer wants to hear less of himself and more of the bass
player, or vice versa. In that case, an Aux bus can be configured as an entirely separate mix, where the Aux Send level of each track is unrelated
to the channel’s master volume control. You’ll set each channel’s Aux Send #1 to approximately the same volume as the main channel volume,
but with more or less drums or bass as requested by the performers. A flexible console will let you route any Aux bus output to the monitor
speakers, so you can hear the mix you’re creating.
The mixing console used for these examples has three separate Aux buses, so up to three different sets of parallel mixes can be created. Or you
could use one Aux bus for a reverb unit whose output goes to the main mix you hear in the control room, with the other two Aux buses set up
as monitor mixes. Sophisticated mixing consoles allow you to configure many different groupings of channels and buses. Although it’s not shown
in these simplified block diagrams, when an Aux bus is used for reverb, the output of the reverb unit usually comes back into the console
through the same-numbered Aux Return input on the rear of the console. An Aux Return always includes a knob to control the incoming volume
level, and most also include a pan pot. However, when used to create a monitor mix, the output of an Aux Bus goes to the power amplifier that
drives the studio headphones, rather than to a reverb or echo unit. In that case the corresponding Aux Return is not used, or it could be used as
an extra general purpose stereo input.
Most large-format mixers have only mono input channels, but some have stereo inputs that use a single volume fader to control the left and
right channels together. In that case, the pan pots are configured to handle stereo sources. Many consoles have both mono and stereo buses, or
stereo buses that can be used with mono sources. For example, reverb buses are often set up with a mono send and stereo return. Many reverb
units create a stereo effect, generating different left and right side reflection patterns from a mono input source.
Console Automation
Another feature common to sophisticated mixing consoles is automation and scene recall. A complex mix often requires many fader moves
during the course of a tune. For example, it’s common to “ride” the lead vocal track to be sure every word is clearly heard. Or parts of a guitar
solo may end up a bit too loud, requiring the mix engineer to lower the volume for just a few notes here and there. There are many schemes to
incorporate automation into mixing consoles. One popular method uses faders that have small motors inside, as well as sensors that know where
the fader is currently positioned. Another method uses a voltage controlled amplifier (VCA) rather than a traditional passive slider with an
added motor. In that case, a pair of LED lights indicates if the fader’s current physical position is louder or softer than its actual volume as set by
the VCA.
Scene recall is similar, letting you set up several elaborate mixes and routing schemes, then recall them exactly with a single button push.
Modern digital consoles used for live sound and TV production often include this feature. For example, late-night talk shows often have a house
band plus a musical guest act with its own instruments. During rehearsal the mix engineer can store one scene for the house band and another
for the guest band. It’s then simple to switch all of the microphones, and their console settings, from one setup to the other.
When combined with a built-in computer, fader moves that you make while mixing can be recorded by the system, then later replayed
automatically. So on one pass through the mix you might ride the vocal level, and then on subsequent playbacks the fader will recreate those
fader changes. Next you’ll manually control the volume for the guitar track, and so forth until the mix is complete. If the faders contain motors,
the sliders will move as the console’s computer replays the automation data. As you can imagine, volume faders that contain motors and
position sensors are much more expensive than simple passive faders that control the volume.
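At its core, fader automation is just a list of timed level changes that the computer stores on one pass and looks up on the next. The Python sketch below shows one simple way to model that. The class and method names are my own, and I'm assuming the level simply holds at the last recorded value between moves, whereas a real console samples the fader far more often and may smooth between points.

```python
from bisect import bisect_right


class FaderAutomation:
    """Record timed fader moves, then look up the level at any playback time."""

    def __init__(self, initial_db=0.0):
        self.times = [0.0]
        self.levels_db = [initial_db]

    def record(self, time_s, level_db):
        """Store one fader move. Moves are assumed to arrive in time order."""
        self.times.append(time_s)
        self.levels_db.append(level_db)

    def level_at(self, time_s):
        """Return the most recently recorded level at or before time_s."""
        index = bisect_right(self.times, time_s) - 1
        return self.levels_db[max(index, 0)]


# Ride a vocal: start at -6 dB, push it up for one phrase, then bring it back.
vocal = FaderAutomation(initial_db=-6.0)
vocal.record(12.0, -3.0)
vocal.record(15.5, -6.0)

for time_s in (5.0, 13.0, 20.0):
    print(f"at {time_s:5.1f} s the vocal fader is at {vocal.level_at(time_s):.1f} dB")
```

On playback, the console only has to ask for the level at the current time and apply it, either by driving a VCA or by moving a motorized fader to match.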
In the 1970s, I built my own 16-track console, shown in Figure 5.5. This was a huge project that took two years to design and another nine
months to build. Back then console automation was new, and rare, and very expensive. So I came up with a clever arrangement that was almost
as good as real automation but cost only a few dollars per channel. Rather than try to automate a volume knob with motors, I used two volume
controls for each channel, with a switch that selected one or the other. So if the guitar volume needed to be raised during a solo, then lowered
again after, I’d set normal and loud levels for each volume control. Then during mixing I’d simply flip the switch when needed, rather than hunt
to find the right levels, or mark the fader with grease pencil to identify both positions.
Figure 5.5:
Ethan Winer built this 16-channel console in the 1970s for a professional recording studio he designed and ran in East Norwalk, Connecticut. Besides this console, Ethan
also designed and built much of the outboard gear you see in the rack.
One final point about mixing consoles is that they’re set up differently when recording versus when mixing. Some consoles contain two
entirely different sections, with one section dedicated to each purpose. That’s the design I used for my console in Figure 5.5. But most modern
consoles can be switched between recording and mixing modes, a concept developed by David Harrison of Harrison Consoles, and popularized
by Grover “Jeep” Harned who developed his company’s MCI consoles in the early 1970s. When a single 24-channel console can handle 24
microphone inputs, and also 24 tape recorder outputs, it can be half the width of a console that uses separate sections for each purpose.
When recording, each input comes from a microphone, or optionally a direct feed from an electric bass or electronic keyboard. The channel
outputs are then routed to any combination of recorder tracks, depending on the console’s design. Once recording is complete, the console is
switched to mixdown mode. At that point all of the channel inputs come from each recorder track, and the outputs (typically plain stereo) are
panned left and right as they’re sent on to the power amplifier and monitor speakers. Of course, inputs can also be switched individually to
record mode for recording overdubs.
Other Console Features
All recording consoles contain microphone preamps, basic routing, and Aux buses, and many also include a polarity switch for each input
channel. But some include a fully parametric EQ on each input channel and Aux bus, which is more flexible than the simple three-band type
shown in Figure 5.1. Some high-end mixing consoles even include a compressor on every channel, and optionally on the Aux buses as well.
Many large-format consoles also include built-in patch panels that connect key input and output points in the mixer to outboard gear in racks.
This is more convenient than having the patch panels in a rack off to the side or behind you.
Almost all consoles include some sort of metering for every channel and bus, and these need to be calibrated to read the same as the VU
meters on the connected recording device. This way you can look at the meters in front of you on the console, rather than have to look over to
the analog or digital recorder’s meters when setting record levels.
Most larger consoles include a built-in microphone and talk-back system for communicating with performers. Unlike home studios that
usually have only one room, larger studios have a control room where the engineer works, plus a separate acoustically isolated live room where
the musicians perform. The recording engineer can easily hear the musicians talk, because one or more microphones are active in the live room.
But the performers can’t hear the engineer unless there’s a microphone in the control room and a loudspeaker out in the live room. A talk-back
system adds a button on the console that does two things when pressed: It engages a microphone in the control room (often built into the
console), sending it out to loudspeakers in the live room or the earphone monitor mix, and it lowers or mutes the control room monitors to
avoid a feedback loop.
Some consoles also include a stereo phase meter to monitor mono compatibility to ensure that important mix elements remain audible if a
stereo mix is reduced to mono. Most people listen to music in stereo, but mono compatibility is still important for music that will be heard over
AM radio, a mono television, or “music on hold” through a telephone. Figure 5.6 shows a typical software phase correlation meter. With this
type of display, the shape of the graph pattern indicates mono compatibility. A purely mono source, where both the left and right channels are
identical, displays as a vertical pattern similar to what’s shown here. The pattern tilts left or right when one channel is louder than the other. But
if the pattern expands horizontally, that indicates some of the content common to both channels is out of phase. Such content will therefore
become softer or even silent if the stereo channels are summed to mono.
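The same information can be boiled down to a single number, the correlation between the left and right channels, which ranges from +1 for identical channels to −1 for channels that are identical but opposite in polarity. The sketch below is my own illustration of that calculation, not the method used by the meter in Figure 5.6, which draws a pattern rather than a number.

```python
import math


def phase_correlation(left, right):
    """Normalized correlation between two equal-length channels, from -1 to +1."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy else 0.0


# A short test tone, plus a copy with its polarity reversed.
tone = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
inverted = [-sample for sample in tone]

print("identical channels:", phase_correlation(tone, tone))
print("one channel inverted:", phase_correlation(tone, inverted))

# Summing the out-of-phase pair to mono cancels the tone completely.
mono_sum = [l + r for l, r in zip(tone, inverted)]
print("peak level of mono sum:", max(abs(sample) for sample in mono_sum))
```

A reading near +1 means the mix will survive a mono fold-down, while readings that hover near or below zero warn that some content will get softer or vanish.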
Figure 5.6:
A phase correlation meter lets you assess mono compatibility for stereo program material.
Some consoles include 5.1 surround panning with six or more separate outputs, and this will be described shortly. Finally, some modern
digital mixers also serve as a computer “sound card,” connecting to a computer via FireWire or USB. This is very convenient for folks who record
to a computer because it combines both elements into a single hardware unit.
Digital Audio Workstation Software and Mixing
Modern audio production software serves as both a multitrack recorder and mixer, modeled to include both functions, and many software
versions are even more sophisticated. Software is certainly more affordable than hardware for a given number of channels and other features.
Most modern DAW (digital audio workstation) software includes not only a sophisticated mixing console and multitrack recorder, but also a
complete set of effects, including EQ, reverb, compressors, and more. Many DAWs include software instruments such as synthesizers, electronic
drum machines with built-in patterns, and MIDI sample players. Some people consider “DAW” to mean a computer that runs audio software,
while others use the term more loosely to mean the entire system, including software, or just the software itself. To my way of thinking, a
workstation is the complete system of hardware and software, though it’s equally proper to consider DAW software and DAW computers
Figure 5.7 shows a flowchart for one channel in Cakewalk’s SONAR, the DAW software I use. As you can see, the signal flow is basically the
same as a large format mixing console with full automation. It includes faders for Input Trim, Channel Volume and Pan, as many Pre- or
Postfader Aux Sends and Returns as you’ll ever need, plus an unlimited number of plug-in effects. Besides automation for all volume and pan
levels, SONAR also lets you automate every parameter of every track and bus plug-in effect. So you can turn a reverb on or off at any point in
the tune, change EQ frequencies and boost/cut amounts, a flanger effect’s sweep rate, and anything else for which a control knob is available on
the plug-in. Further, the playback meters can be set pre- or post-fader, with a wide range of dB scales and response times. The meters can also
display RMS and peak levels at the same time and optionally hold the peaks as described in Chapter 1. Other DAW programs from other
vendors provide similar features.
Figure 5.7:
This flowchart shows the internal structure of one channel in Cakewalk’s SONAR program. As you can see, modern DAW software is functionally equivalent to a complete
recording studio.
As you can see, SONAR follows the inline console model described earlier, where most of the controls for each channel are switched between
recording and mix-down mode as required. When recording, the input comes from either a physical hardware input—a computer sound card or
outboard A/D converter—or from a software synthesizer controlled by a MIDI track. This is shown at the upper right of Figure 5.7. In practice,
it’s not usually necessary to record the output from a software synthesizer because it creates its sounds in real time when you press Play. In other
words, when a synthesizer is inserted onto an audio track, that track plays the synthesizer’s audio output.
With regular audio tracks, the source is a Wave file on the hard drive. A Wave file on a track in a DAW program is often called a clip, and it
might contain only a small portion of a larger audio file. Regardless of the source, after passing through all of the volume, pan, and effects stages,
and the Aux sends, the output of each channel can be routed to either a stereo or 5.1 surround output bus for playback. The output bus then
goes to a physical hardware output, which is either a computer sound card or outboard D/A converter.
As with hardware mixers, tracks and channels in modern DAW programs can be either mono or stereo, and all relevant volume and pan
controls behave appropriately for either automatically. In SONAR, if you select a single sound card input as the record source, the track is
automatically set to mono and records a mono Wave file. If you select a stereo pair of inputs, recording will be in stereo. Newly created tracks
default to stereo, but if you import a mono Wave file to that track, it switches to mono automatically. Likewise, importing a stereo Wave file
into a mono track automatically changes the track to stereo.
The Pan Law
One final but important aspect of hardware and software mixers is called the pan law or pan rule. This defines how the volume level of a mono
source changes when panned from left to right, through center, in a stereo mix. As you will see, this is a surprisingly complex topic for what
might seem like a simple process. Technically, pan law is the premise that says in order to keep the perceived volume constant, the volume
must be reduced by some amount when the pan pot is centered. Pan rule is the specific implementation applied in a mixer or DAW program.
When a mono source is panned fully left or fully right, it emits from one speaker at a level determined by the volume control. But when that
same source is panned to the middle of the stereo field by centering the pan pot, it plays through both the left and right speakers. So now the
sound is twice as loud. Obviously, it’s a nuisance to have to adjust the volume every time you change the panning for a track, so hardware and
software mixers automatically lower the volume a little when the pan pot is centered. For positions other than full left or full right, a “curve” is
applied to the in-between settings to keep the volume constant no matter where the pan pot is set.
The problem is deciding how much to reduce the volume when the pan pot is centered. Some consoles and DAW programs lower the volume
by 3 dB for each side when centered, since that sends half the acoustic power through each loudspeaker. So far so good. But if a stereo mix is
summed to mono electrically, as happens with AM radios and other mono playback devices, any instruments and vocals that are panned to the
center become 3 dB louder relative to sounds panned fully left or right. That’s not so good! If the pan rule instead reduces the level of centered sounds by 6 dB, they may seem soft when
played in the control room, but at least the level will not change when summed to mono. Many DAW programs such as SONAR let you choose
from several pan rules because there’s no one best setting. Letting users change the pan rule is more difficult with hardware mixers, so some
console manufacturers split the difference, reducing the level by 4.5 dB when centered.
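The trade-off between these pan rules is easy to check with a little arithmetic. The Python sketch below uses one common family of pan curves, a sine/cosine shape raised to a power that sets the drop at center. That curve shape is my assumption for illustration only; mixers and DAW programs use a variety of curves. I'm also assuming the acoustic sum from two speakers adds as power, and that the mono fold-down is a simple left-plus-right sum.

```python
import math


def pan_gains(pan, center_drop_db=3.0):
    """Left and right linear gains for a pan position from -1 (left) to +1 (right).

    Raising the sine/cosine curve to a power sets the level drop at center.
    """
    angle = (pan + 1.0) * math.pi / 4.0  # 0 at full left, pi/2 at full right
    exponent = center_drop_db / (20.0 * math.log10(math.sqrt(2.0)))
    return math.cos(angle) ** exponent, math.sin(angle) ** exponent


def to_db(ratio):
    """Convert a linear voltage ratio to decibels."""
    return 20.0 * math.log10(ratio)


# Compare the three pan rules from the text for a source panned to center.
for drop_db in (3.0, 4.5, 6.0):
    left, right = pan_gains(0.0, drop_db)
    acoustic = 10.0 * math.log10(left ** 2 + right ** 2)  # power sum of two speakers
    mono = to_db(left + right)                            # electrical sum to mono
    print(f"{drop_db} dB rule: two-speaker power sum {acoustic:+.2f} dB, "
          f"mono sum {mono:+.2f} dB")
```

Under these assumptions the 3 dB rule keeps the two-speaker power constant but comes out about 3 dB hot when summed to mono, the 6 dB rule does the opposite, and the 4.5 dB rule lands about 1.5 dB off in each direction.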
But wait, there’s more. Believe it or not, the ideal pan rule also depends on the quality of your monitoring environment. When you play a
mono source in a room with no acoustic treatment, you hear a combination of the direct sound from both speakers, plus reflections from nearby
room surfaces. Both speakers play the same sound, so their sum is coherent and plays 6 dB louder than just one speaker. But the reflections are
probably not coherent due to asymmetry in the room and other factors. So when a mono sound common to both the left and right speakers sums
acoustically in the air, the combined SPL also depends on the strength and coherence of the room reflections.
When the listener is in a reflection-free zone (RFZ), a pan rule of −6 dB does not reduce the volume for centered sounds as much as when a
room has untamed reflections. This is similar to the difference between doubling the level of noise versus music, explained in Chapter 2 under
“The Stacking Myth.” That section showed that music rises by 6 dB when doubled, but noise rises only 3 dB. The same thing happens with
reflections in a room. The reflections you hear are likely different on the left and right sides, but the music played by both speakers is the same.
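You can verify the 6 dB versus 3 dB difference numerically. The sketch below adds a signal to an exact copy of itself, then adds it to a second unrelated signal of the same level, and compares the rise in RMS level. Random noise stands in for both the music and the reflections here, which is a simplification on my part, and the sample count and seed are arbitrary.

```python
import math
import random


def rms_db(samples):
    """RMS level of a list of samples, in dB relative to a level of 1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


rng = random.Random(1)  # fixed seed so the run is repeatable
count = 20000
source_a = [rng.gauss(0.0, 1.0) for _ in range(count)]
source_b = [rng.gauss(0.0, 1.0) for _ in range(count)]  # unrelated to source_a

coherent_sum = [a + a for a in source_a]                      # same signal twice
incoherent_sum = [a + b for a, b in zip(source_a, source_b)]  # two different signals

coherent_rise = rms_db(coherent_sum) - rms_db(source_a)
incoherent_rise = rms_db(incoherent_sum) - rms_db(source_a)
print(f"identical signals summed: {coherent_rise:+.2f} dB")
print(f"unrelated signals summed: {incoherent_rise:+.2f} dB")
```

The identical pair rises by 6 dB, while the unrelated pair rises by only about 3 dB. The more of what you hear at the listening position is unrelated reflections rather than direct sound, the closer the combined level moves toward the smaller figure.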
I suspect that pan rule implementation could account for reports of perceived quality differences between various DAW programs. If you
import a group of Wave files into a DAW that uses one pan rule, then import the same series of files to another DAW that uses a different pan
rule, the mixes will sound different even if all the track volumes and pans are set exactly the same in both programs.
Connecting a Digital Audio Workstation to a Mixer
Modern DAW software contains a complete mixing console with full automation of every parameter, so project studios can usually get by with
only a compact “utility” mixer, or even no hardware mixer at all. Most compact mixers offer microphone preamps for recording and a master
volume control for the monitor speakers during playback and mixing. Many studios have a few other audio devices around, such as a synthesizer
or maybe a set of electronic drums, and those can also be connected to the same small mixer.
One of the most frequent questions I see in audio forums asks how to connect a mixer to a computer-based recording setup to be able to
record basic tracks first, then add overdubs later. Many people have a small mixer such as a Mackie or similar. The Mackie manual shows several
setups for combining musical instruments and other sources to play them through loudspeakers. But they ignore what may be the most common
setup of all: connecting the mixer to a computer DAW. I’ll use the Mackie 1402-VLZ3 for the examples that follow, but the concept applies to
any mixer that provides insert points for each preamp output.
With most DAW setups, it’s best to record each instrument and microphone on a separate track. This gives the most flexibility when mixing,
letting you change the volume and equalization separately for each sound source, add more or less reverb to just that instrument, and so forth. I
prefer doing all the mixing within the DAW program using the software’s volume, pan, and plug-in effects, rather than sending individual tracks
back out to a hardware mixer. Mixing in a DAW is more flexible because it allows the mix to be automated and recalled exactly. This also
ensures that when you export the final mix to a stereo Wave file, it will sound exactly the same as what you heard while mixing, without regard
to the hardware mixer’s settings. Rendering a mix from software directly to a Wave file also usually goes faster than playing a mix in real time
while recording to tape or to another Wave file.
Inputs and Outputs
Most small mixers contain two independent sections: an input section and a mixer section. The input section contains the XLR and ¼-inch input
connectors, plus the preamps that raise the input signals to line level suitable for recording. The mixer section then combines all of the
preamplified inputs into one stereo mix that can be played through loudspeakers or headphones. It’s important to understand the difference
between the input channels, whose outputs are available independently, and the combined stereo mix. You’ll record each microphone or
instrument through the mixer’s preamps to a separate input of your sound card, and you also need to monitor those same inputs to hear yourself
through earphones as you play or sing. Further, when adding tracks as overdubs to a song in progress, you need to hear the tracks that were
already recorded as well.
The first step is to route each microphone or instrument from the mixer to a separate input on the sound card. If your sound card has only one
stereo input, you can record only one stereo source at a time, or optionally two mono sources. (Of course, you can overdub any number of
additional mono or stereo tracks later.) A multichannel interface lets you simultaneously record as many separate tracks as the interface has
inputs. However, it’s usually best to play all of the tracks through one stereo output, even if your interface offers more outputs. This way, your
mix is controlled entirely by the settings in your DAW program, independent of the volume knobs on the hardware mixer.
Figure 5.8 shows the input section of a Mackie 1402-VLZ3 mixer. You can connect either XLR microphone or ¼-inch instrument cables, which
go through the mixer’s preamps and then on to the mixer section that combines all of the inputs for monitoring. The channel inserts section is on
the mixer’s rear panel, as shown in Figure 5.9. That’s where you’ll connect each of the mixer’s preamp outputs to one input of your sound card.
Figure 5.8:
The Mackie 1402-VLZ3 mixer has six inputs that accept either XLR microphones or ¼-inch phone plugs.
Drawing courtesy of Mackie.
Figure 5.9:
The channel inserts on the mixer’s rear panel let you send its preamp outputs to a sound card or external interface, without interrupting monitoring within the mixer.
Drawing courtesy of Mackie.
The original purpose of an insert point was to insert an external hardware effects unit, such as a compressor, into one channel of the mixer.
With a DAW, you can do that with plug-ins later. But the insert point can also serve as a mult splitter to send the preamplified microphone out
to the sound card, while also sending it to the rest of the mixer so you can hear yourself while recording. When an output is taken from an insert
point this way, it’s called a direct out, because the signal comes directly from the preamp’s output, before passing through the mixer’s volume
and tone control circuits.
With nothing plugged into an insert jack, the output of the preamp goes through the channel volume, pan, and EQ controls on the front panel
and is then combined in the mixer section with all of the other channels. When you insert a ¼-inch plug only partway into the jack, the output
of the preamp is sent to the inserted plug and also continues on to the rest of the mixer. This jack switch arrangement is shown in Figure 5.10,
and it’s the key to recording each input to a separate track.
Figure 5.10:
The mixer’s rear panel insert point lets you tap into a channel to send its preamp output to a sound card, without interrupting the signal flow within the mixer.
Unfortunately, these jacks are sometimes too loose to make a reliable connection when the plug is not fully seated. Or the jack switch may
wear over time, so you have to wiggle the plug occasionally to get the signal back. The best solution is Radio Shack’s adapter 274–1520. This
inexpensive gadget can be inserted fully into the mixer’s insert jack, yet it retains the connection needed from the preamp to the rest of the
mixer. You may not find this adapter in a Radio Shack store because it’s specialized, but you can order it online by
entering the part number in the Search field. Of course, you could wire up your own custom insert cables using ¼-inch stereo phone plugs that
can be inserted fully, with the tip and ring connected.
Since the preamp output still goes to the mixer section, you can control how loudly you hear that source through your speakers with that
channel’s volume control. And since each preamp output goes directly to the sound card, the recording for that track contains only that one
channel. Had you recorded from the mixer’s main stereo output, the new recording would include all of the tracks already recorded, as well as
any live inputs being recorded.
Figure 5.11 shows how each channel’s insert point output goes to one input of the sound card or external interface. The sound card’s main
stereo output then goes to one of the mixer’s stereo inputs so you can hear your DAW program’s playback. With this arrangement you control
the volume you hear for each track being recorded with its channel volume slider and set the playback volume of everything else using the
stereo channel’s slider. In this example, mixer channels 1 through 6 control how loudly you hear each source being recorded, and stereo input
pair 13 and 14 controls how loudly you hear the tracks that were already recorded.
Figure 5.11:
Each channel insert goes to one input on the sound card, and the sound card’s output comes back into the mixer on a stereo channel for monitoring.
Drawing courtesy of Mackie.
Setting Record Levels
Each of the mixer’s six mic/line mono input channels has two volume controls: the preamp gain knob (also called Trim) and the channel’s main
volume slider. Both affect the volume you hear through the loudspeakers, but only the preamp gain changes the level sent to your sound card.
Therefore, when recording you’ll first set the preamp gain control for a suitable recording level in your software, then adjust the channel output
level for a comfortable volume through your loudspeakers or headphones. Since the channel volume and equalizer are not in the path from
preamp to sound card, you can freely change them without affecting the recording.
Monitoring with Effects
It’s useful and even inspiring to hear a little reverb when recording yourself singing or playing, and also to get a sense of how your performance
will sound in the final mix. Many DAW programs let you monitor with reverb and other software effects while recording, but I don’t recommend
that. One problem with monitoring through DAW software with effects applied is the inherent delay as audio passes through the program and
plug-ins, especially with slower computers. Even a small delay when hearing yourself can throw off your timing while singing or playing an
instrument.
Another problem is that using plug-in effects taxes the computer more than recording alone. Modern computers can handle many tracks with
effects all at once, but with an older computer you may end up with gaps in the recorded audio, or the program might even stop recording. Most
small mixers like the 1402-VLZ3 used for these examples have Aux buses to patch in a hardware reverb. Any reverb you apply on an Aux bus
affects only the monitoring path and is not recorded. Therefore, you can hear yourself sing (using earphones) with all the glory of a huge
auditorium, yet defer the amount of reverb actually added to that track until the final mix. Better, since reverb on an Aux bus is not recorded, an
inexpensive unit is adequate. I have an old Lexicon Reflex I got from a friend when he upgraded his studio. It’s not a great reverb by modern
standards, but it’s plenty adequate for adding reverb to earphones while recording.
Likewise, I recommend recording without EQ or compression effects. When recording to analog tape, the added tape hiss is always a problem.
In the old days, it was common to add treble boost or compression while recording if you knew those would be needed later during mixdown.
But modern digital recording, even at 16 bits, has a very low inherent noise level, so recording with effects is not needed. More important, it’s
a lot easier to experiment or change your mind later if the tracks are recorded dry with no processing. It’s difficult to undo equalization, unless
you write down exactly what you did, and it’s just about impossible to reverse the effects of compression. Indeed, one of the greatest features of
DAW recording is the ability to defer all balance and tone decisions until mixdown. This feature is lost if you commit effects by adding them
permanently to the tracks while recording. The only exception is when an effect is integral to the sound, such as a phaser, fuzz tone, wah-wah, or
echo effect with an electric guitar or synthesizer. In that case, musicians really do need to hear the effect while recording, because it influences
how they play. Most musicians for whom effects are integral to their sound use foot pedal “stomp” boxes rather than software plug-ins.
Some people prefer to commit to a final sound when recording, and there’s nothing wrong with that. But you can easily do this using plug-ins
in the DAW software. Just patch the effects you want into the playback path when recording, and you have “committed” to that sound. But you
can still change it later if you want. Even professional mix engineers change their minds during the course of a production. Often, the EQ needed
for a track changes as a song progresses, and overdubs are added. For example, a full-bodied acoustic guitar might sound great during initial
tracking, but once the electric bass is added, the guitar’s low end may need to be reduced to avoid masking the bass and losing definition.
Deferring all tone decisions until the final mix offers the most flexibility, with no downside.
The Windows Mixer
Most professional multichannel interfaces include a software control panel to set sample rates, input and output levels, internal routing, and so
forth. But many consumer grade sound cards rely on the Windows mixer for level setting and input selection. If you create music entirely within
the computer, or just listen to music and don’t record at all, a regular sound card can be perfectly adequate. You launch the Windows mixer
shown in Figure 5.12 by double-clicking the small loudspeaker icon at the lower right of the screen. If that’s not showing, you can get to it from
the Windows control panel, then Hardware and Sound. Depending on your version of Windows, the wording may be slightly different.
Figure 5.12:
The Windows mixer record panel lets you choose which of several inputs to record from.
Once the Windows mixer is showing, click the Options menu, then Properties. Next, under Adjust volume for, select Recording. Figure 5.13
shows the Playback source choices, but the Recording screen is very similar and works the same way. If you have a typical consumer sound card,
you should select Line-In as the recording source, and be sure to set the input level to maximum.
Figure 5.13:
The Windows mixer properties panel lets you choose which inputs and outputs to show in the mixer, with separate check boxes for the Record and Playback sections.
Note that even when a playback channel’s slider isn’t showing, that source is still active and can add hiss and other noise.
The Windows mixer Play Control panel in Figure 5.14 adjusts the mix of sources that play through the sound card’s Line Output jack. The
Record Control panel lets you select only one input from which to record, but playback can be from several sources at the same time. Be aware
that the Windows mixer is probably hiding some input and output level controls. Yet those sources can contribute hiss from an unused input, or
even add noise generated by a hard drive or video card. For example, you might hear scratching sounds as you move your mouse or as things
change on the screen. Therefore, you should select all of the available playback sources in the Properties screen of Figure 5.13 to make them
visible. Then mute or turn down all the sources you don’t need, such as Auxiliary, Microphone, and so forth. You can hide them again afterward
if you don’t want to see them.
Figure 5.14:
The Windows mixer playback panel lets you set the volume level for each available playback source. All of the outputs can play at the same time, though you probably
don’t want that.
Most Creative Labs SoundBlaster sound cards have a “What U Hear” input source that records the same mix defined in the Play Control panel.
If you have a SoundBlaster card, do not select “What U Hear” when recording, because that also records the tracks you’re playing along to. It can
also add hiss to the recording because it includes the MIDI synthesizer, CD Audio, and all other playback sources that aren’t muted. The main use
for “What U Hear” is to capture audio that’s streaming from a website when the site doesn’t allow you to download it as a file.
Related Digital Audio Workstation Advice
In Figure 5.11, the sound card’s main stereo output comes back into the mixer on stereo channels 13–14. This provides a separate volume
control for the DAW’s output, which can be adjusted independently from the volume of each input being recorded. But you may be using all of
the mixer’s stereo inputs for other sources like a CD player, cassette deck, and so forth. Or maybe you have a smaller mixer that has fewer stereo
inputs. In that case you can connect the sound card’s stereo output to the mixer’s second Aux Return, if available, or even to the Tape Input.
Even though the Windows mixer has record volume controls for each input source, it’s important to set these to maximum and adjust the
record level using the preamp gain setting on your hardware mixer. The same is true for the software control panel that’s included with more
expensive interfaces. Software volume controls lower the volume digitally after the sound card’s A/D converter. So if the level from your mixer’s
preamp is too high and overloads the sound card’s input, reducing the software volume control just lowers the recorded volume, yet the signal
remains distorted.
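To see why a software volume control cannot rescue an overloaded input, here is a minimal Python sketch. It assumes an idealized converter that simply hard-clips at full scale (±1.0). The names `record` and `flat_top_count` are hypothetical, made up for illustration rather than taken from any driver or DAW.

```python
import math

def record(peak_level, software_gain, n=48):
    """Simulate one sine cycle through an idealized A/D converter,
    followed by a digital (software) volume control."""
    samples = []
    for i in range(n):
        x = peak_level * math.sin(2 * math.pi * i / n)
        clipped = max(-1.0, min(1.0, x))         # converter hard-clips at full scale
        samples.append(clipped * software_gain)  # software gain is applied after conversion
    return samples

def flat_top_count(samples):
    """Count samples stuck at the waveform's peak value, a crude sign of clipping."""
    peak = max(abs(s) for s in samples)
    return sum(1 for s in samples if abs(abs(s) - peak) < 1e-9)

hot = record(peak_level=2.0, software_gain=0.5)    # preamp too hot, software turned down
clean = record(peak_level=0.5, software_gain=1.0)  # preamp gain reduced instead

print(max(hot), flat_top_count(hot))      # same peak level, but many flat-topped samples
print(max(clean), flat_top_count(clean))  # only the true positive and negative peaks
```

Both takes end up with the same recorded peak level, but only the one where the preamp gain was reduced keeps its sine shape.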
People who have used analog tape recorders but are new to digital recording tend to set record levels too high. With open reel tape and
cassettes, it’s important to record as hot as possible to overcome tape hiss. But analog tape is more forgiving of high levels than digital systems.
Analog tape distortion rises gradually as the signal level increases and becomes objectionable only when the recorded level is very high. Digital
recorders, on the other hand, are extremely clean right up to the point of gross distortion. Therefore, I recommend aiming for an average record
level around −10 dBFS or even lower to avoid distortion ruining a recording of a great performance. The noise floor of 16-bit audio is at least
20 to 30 dB softer than that of the finest professional analog tape recorders, and the inherent noise of 24-bit recording is even lower.
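As a rough check on how much room digital recording leaves, the theoretical signal-to-noise ratio of an ideal converter can be estimated from its bit depth. This sketch assumes the textbook formula for a full-scale sine wave against undithered quantization noise (about 6.02 dB per bit plus 1.76 dB). Real converters fall somewhat short of it, and the function name is hypothetical.

```python
import math

def ideal_snr_db(bits):
    """Theoretical SNR of an ideal N-bit converter for a full-scale sine wave."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    snr = ideal_snr_db(bits)
    # A signal recorded 10 dB below full scale sits 10 dB closer to the noise floor.
    print(f"{bits}-bit: {snr:.1f} dB ideal, {snr - 10:.1f} dB when recorded 10 dB down")
```

Even after giving up 10 dB for safety, the 16-bit figure remains well above what the text describes for analog tape.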
Often when you connect an audio device like a mixer to a computer using analog cables, a ground loop is created between the computer and
the mixer that causes hum. Sometimes you can avoid this by plugging both the mixer and computer into the same physical AC power outlet or
power strip. If that doesn’t solve the problem, a good solution is to place audio isolation transformers in series with every connection between
the two devices. High-quality audio transformers can be expensive, but I’ve had decent results with the EBTECH Hum Eliminator. This device is
available in both two- and eight-channel versions, and at less than $30 per channel, it’s reasonably priced for what it is. However, if the highest
audio quality is important, and you’re willing to pay upwards of $100 per channel, consider better-quality transformers such as those from
Jensen Transformers and other premium manufacturers. Inexpensive transformers can be okay at low signal levels, but often their distortion
increases unacceptably at higher levels. Low-quality transformers are also more likely to roll off and distort at the lowest and highest frequencies.
5.1 Surround Sound Basics
Most music is mixed to stereo, but there are many situations where 5.1 surround is useful. For example, music for movie sound tracks is often
mixed in surround. Most modern DAW software can create 5.1 surround mixes, including Cakewalk SONAR used for these examples. In fact,
surround sound is not limited to 5.1 channels, and some surround systems support 7.1 or even more channels. But let’s first consider how
surround audio is configured using 5.1 channels.
Figure 5.15 shows a typical surround music playback system having three main loudspeakers in the front, marked L, C, and R, for Left, Center,
and Right. The rear surround speakers are LS and RS, for Left Surround and Right Surround. The subwoofer is shown here in the front left corner,
but subwoofers are typically placed wherever they yield the flattest bass response. Surround speaker placement is explained more fully in
Chapter 17.
Figure 5.15:
A 5.1 surround system comprises five full-range loudspeakers plus a subwoofer.
In a 5.1 surround system, the left and right main front speakers are also used for regular stereo when listening to CDs and MP3 files, or when
watching a concert video recorded in stereo. For true surround material, the center speaker is generally used for dialog, and is often called the
dialog channel because it anchors the dialog at that location. This is an important concept for both movies and TV shows recorded in surround.
In a control room, the engineer sits in the exact center, where the stereo concept of a “phantom image” works very well. But more than one
person often watches movies in a home theater, yet only one person can sit in the middle.
Unless you’re sitting in the exact center of the room left-to-right, voices panned equally to both the left and right speakers will seem to come
from whichever speaker is closer to you. When an actor is in the middle of the screen talking, this usually sounds unnatural. And if the actor
walks across the screen while talking, the sound you hear won’t track the actor’s position on screen unless you’re sitting in the center. To solve
this problem, the 5.1 surround standard adds a center channel speaker. Early quadraphonic playback from the 1970s included rear surround
speakers for ambience, so listeners could feel like they were in a larger virtual space. But it didn’t include the all-important center channel that
anchors voices or other sounds in the center of the sound field.
Surround systems also use bass management to route low frequencies to a subwoofer. The “.1” channel contains only low-frequency sound
effects such as earthquake rumbles and explosions. The subwoofer channel is called “.1” because its range is limited to bass frequencies only.
That channel always goes directly to the subwoofer. But surround receivers also route low frequencies present in the five other channels away
from those speakers to the subwoofer. Very low frequencies are not perceived as coming from any particular direction, so having everything
below 80 or 100 Hz come from one subwoofer doesn’t affect stereo imaging or placement. In a surround system that’s set up properly, you
should never notice the subwoofer playing or be able to tell where it’s located.
Another important advantage of bass management is that it takes much of the load off the five main speakers. Bass frequencies are the most taxing
for any loudspeaker to reproduce, so most speakers can play much louder and with less distortion when they don’t have to reproduce the lowest
two octaves. Further, speakers that don’t have to reproduce very low frequencies are physically smaller and generally less expensive than
speakers that can play down to 40 Hz or even lower. The standard for bass management specifies a crossover frequency of 80 Hz, but
frequencies slightly lower or higher are also used. However, the crossover should never be set too high, or placement and imaging can be
affected. This is a problem with surround systems that use satellite speakers that are too small, with a subwoofer crossover at 200 Hz or higher. You can
tell that some of the sound is coming from the subwoofer, which can be distracting.
Professional monitor controllers are available to route 5.1 mixes from a DAW program to surround speakers, including handling bass
management, but you can do the same thing with an inexpensive consumer type receiver. My home theater system is based on a Pioneer receiver
that is full-featured and sounds excellent, yet cost very little. However, a receiver used for surround monitoring must have separate analog inputs
for all six channels. All consumer receivers accept stereo and multichannel audio through a digital input, but not all include six separate analog
inputs. You’ll also need a sound card or external interface with at least six analog outputs. Figure 4.12 in Chapter 4 shows the Presonus FireBOX
interface I use to mix surround music in my living room home theater. The FireBOX uses a FireWire interface to connect to a computer and has
two analog inputs plus six separate analog outputs that connect to the receiver’s analog inputs. Other computer sound cards with similar features
are available with FireWire or USB interfaces.
In order to monitor 5.1 surround mixes, you need to tell your DAW program where to send each surround bus output channel. As mentioned,
the Presonus FireBOX connected to my laptop computer has six discrete outputs, which in turn go to separate analog inputs on my receiver. This
basic setup is the same when using a professional monitor controller instead of a receiver. Figure 5.16 shows SONAR’s setup screen for assigning
surround buses to physical outputs, and this method is typical for other DAW programs having surround capability. Note the check box to
monitor with bass management. This lets you hear mixes through your playback system exactly as they’ll sound when mastered to a surround
format such as Dolby Digital or DTS, when played back from a DVD, Blu-ray, or other multichannel medium.
Figure 5.16:
The Surround tab under SONAR’s Project Options lets you specify which sound card outputs receive each of the 5.1 surround buses for monitoring.
Figure 5.17 shows the surround panner SONAR adds to each audio track sent to a surround bus, and other DAW programs use a similar
arrangement. Rather than offer left, right, and in-between positions, a surround panner lets you place mono or stereo tracks anywhere within the
surround sound field. You can also send some amount of the track to the LFE channel, though that’s not usually recommended for music-only
mixes.
Figure 5.17:
The surround panner in SONAR is much more complex than a typical stereo pan pot, letting you place sources anywhere within the surround sound field.
Most consumer receivers offer various “enhancement” modes to create faux surround from stereo sources or to enhance surround sources with
additional ambience and reverb. It’s important to disable such enhancement modes when mixing surround music because the effects you hear are
not really present in the mix but are added artificially inside the receiver.
“I’m relatively new to studio production/mixing. …I’m finding my final mixes need more separation and space, and I’ve been reading up on out-of-the-box analog summing. What
options are going to be best and most cost-effective?”
—Part of a letter to a pro audio magazine
“You’re in luck because there are a lot of options in this category. Check out summing boxes from [long list of hardware vendors].”
—Reply from the magazine’s technical editor
In my opinion, the above exchange is a great example of the failure of modern audio journalism. Too many audio publishers fail to
understand that their loyalty must be to their readers, not their advertisers. When you serve the interest of your readers, you will sell more
magazines. And when you have many subscribers, the advertisers will surely follow. When I worked in the electronics field in the 1970s and
1980s, I read all of the magazines that serve professional design engineers. With every new product announcement or review, these magazines
included comparisons with similar products already available. You almost never see that today in audio magazines, and reviews are almost
always glowing, criticizing products only for superficial problems. This is the answer I would have given that reader:
Separation and space in a mix are directly related to frequencies in one track masking similar frequencies from instruments in other tracks. This is mostly solved using EQ;
knowing what frequencies to adjust, and by how much, comes only with experience and lots of practice. Obviously, reverb and ambience effects influence the spaciousness of a
mix, but that’s outside the domain of outboard summing boxes, which can only add distortion and alter the frequency response.
You mentioned that you’re fairly new to audio production, and this is most likely the real reason your mixes lack space and separation. I suggest you study professional mixes of
music you enjoy. Listen carefully to how each instrument and vocal track meshes with all the other tracks. When mixing your projects, if an instrument sounds clear when solo’d
but poorly defined in the full mix, try to identify other tracks that, when muted, restore the clarity. I’m sure you’ll find that with more experience your mixes will improve. I
think it’s false hope to expect any hardware device to magically make your mixes come together and sound more professional.
Summing is merely combining sounds, and it’s very low tech. Indeed, summing is the simplest audio process of all, adding either voltages in
analog equipment, or numbers in a DAW program or digital mixer. Let’s take a closer look.
The simplest analog summing mixer is built using only resistors, as shown schematically in Figure 5.18. Only eight inputs are shown in this
example, and only one mono output. But the concept can be expanded to any number of inputs and to include left and right input and output
pairs for stereo. Although this mixer contains only resistors, it actually works pretty well. Of course, there’s no volume control for each channel,
or master volume, nor are there pan pots for stereo. Adding those would require active (powered) circuitry to avoid interaction among the input
channels.
Figure 5.18:
A perfectly serviceable analog summing mixer can be built using one 10 K resistor for each input channel. Because it uses only resistors, this summing circuit adds
virtually no noise or distortion.
With simple passive mixers like this, changing a resistor’s value to adjust the volume or pan for one channel affects the volume and panning of
all the other channels. But even without volume and pan controls, a passive mixer like this loses some signal level on each channel. The more
channels you mix together, the softer each becomes at the output. When mixing two channels, each channel is reduced by 6 dB; you lose another
6 dB each time the number of channels is doubled. So mixing four channels lowers them all by 12 dB, and eight channels as shown here loses
18 dB. When mixing 16 channels, each channel is 24 dB softer at the output. Commercial summing boxes typically add an amplifier stage with
an appropriate amount of gain at the output to restore the signal levels back to normal.
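The 6 dB per doubling figure follows from treating the resistors as a voltage divider. Here is a minimal sketch, assuming every source has negligible output impedance and nothing loads the mixer's output. The function name and the 10 K default are illustrative only.

```python
import math

def passive_sum_loss_db(channels, r=10_000.0):
    """Loss per channel in an unloaded passive mixer with one resistor of
    value r per input, assuming sources with negligible output impedance."""
    if channels < 2:
        return 0.0  # a single input into an unloaded output loses nothing
    # Seen from one input, the other (channels - 1) resistors sit in parallel to ground.
    shunt = r / (channels - 1)
    gain = shunt / (r + shunt)  # voltage divider; works out to 1 / channels
    return -20 * math.log10(gain)

for n in (2, 4, 8, 16):
    print(n, "channels:", round(passive_sum_loss_db(n), 1), "dB")
```

Because the divider ratio works out to one over the number of channels, the resistor value itself drops out of the result under these assumptions.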
The most significant thing that happens when you sum tracks is psychoacoustic. A track that sounded clear all by itself may now be masked by
another instrument having a similar frequency range. An electric bass track where every note can be distinguished clearly when solo’d might turn
into a rumbling mush after you add in a chunky-sounding rhythm guitar or piano. I’m convinced this is the real reason people wrongly blame
“summing” or “stacking” for a lack of clarity in their mixes. The same masking effect happens whether the tracks are mixed with an analog
circuit or an equivalent series of sample numbers are added in a DAW program or digital hardware mixer. It seems some people prefer to blame
“digital” when they’re unable to get instruments to sit well together in a mix.
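To make the point about digital summing concrete, here is a minimal sketch of what adding sample numbers means. The track data and the function name `sum_tracks` are invented for illustration, and a real DAW also applies each track's fader and pan gains before adding.

```python
def sum_tracks(*tracks):
    """Mix equal-length tracks by adding their sample values position by position."""
    return [sum(samples) for samples in zip(*tracks)]

bass = [0.10, 0.20, -0.10, -0.20]    # made-up sample values
guitar = [0.05, -0.05, 0.05, -0.05]  # made-up sample values

mix = sum_tracks(bass, guitar)
print([round(m, 2) for m in mix])
```

Nothing in this operation can blur or smear one track with another. Each output sample is just the arithmetic sum of the input samples at that instant.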
Gain Staging
Large mixing consoles are complex, allowing flexible routing for audio after it passes through the microphone preamps. It’s important that
signal levels remain at a reasonable level as they pass through every stage of an analog mixer. If a source is too soft at any point in the chain,
noise from the console’s circuitry can be heard; if too loud, you risk distortion. The process of keeping signal levels reasonable throughout a
mixing console is called gain staging, and it’s the responsibility of the recording engineer.
Gain staging is always a concern with analog mixers and other hardware, but it matters much less with DAW software. Analog circuits process
voltages passing through them, and all circuits add some amount of noise and distortion, especially when signal levels are very low or very high.
But modern digital processing uses floating point calculations implemented in software, which can handle a huge range of signal levels, more
than 1,000 dB. So gain staging within digital software is rarely a problem because it has a minuscule effect on audio quality. Most modern digital
software and hardware process audio using 32-bit floating point values, though some use 64 bits for even higher resolution. This is explained more
fully in Chapter 8.
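The "more than 1,000 dB" figure can be sanity-checked from the limits of the 32-bit floating point format. This sketch assumes IEEE 754 single precision and compares the largest finite value with the smallest normal value. The constant names are hypothetical.

```python
import math

# IEEE 754 single precision (32-bit float) limits, written out explicitly
# because Python's own float type is 64-bit.
FLOAT32_MAX = (2 - 2 ** -23) * 2.0 ** 127  # largest finite value, about 3.4e38
FLOAT32_MIN_NORMAL = 2.0 ** -126           # smallest normal value, about 1.2e-38

range_db = 20 * math.log10(FLOAT32_MAX / FLOAT32_MIN_NORMAL)
print(round(range_db), "dB")
```

Under these assumptions the range comes out to roughly 1,500 dB, which is why a fader set far too low or too high inside a DAW rarely costs any audio quality.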
Microphone Preamplifiers
These days a lot of fuss is made over microphone preamps, with various magical properties attributed to this model or that. It wasn’t so long ago
that most recordings were made using whatever preamps were available in the mixing console. Many older recordings made using only the stock
preamps in a good-quality console still sound excellent by modern standards. Some people consider mic pres the most important part of the
signal path, because any quality loss in a preamp affects everything in the chain that follows. In truth, fidelity can be harmed at any point in the
signal path.
With passive dynamic and ribbon microphones, the preamp is indeed the first link in a chain of audio devices. But active microphones have
their own preamp inside, so with those microphones the preamp in a mixing console or outboard unit isn’t really the first device. Indeed, what
really matters with preamps is the same as what matters for every audio device: frequency response, distortion, and noise. Most circuit designers
aim to make their preamps transparent. If a preamp is transparent, it will add no coloration of its own, which is certainly my preference.
It’s not that I don’t consider preamps important, because they obviously are. But keep in mind that any two transparent preamps will sound
exactly the same, by definition. If the response is within 0.1 dB from 20 Hz to 20 KHz, and the sum of all distortion and noise is 80 dB or more
below the signal, a preamp circuit will not alter the sound audibly. A lot of preamps meet those criteria and therefore sound alike by definition.
Yet you’ll find people who insist all of their preamps sound different anyway. I trust test gear and null tests because they’re 100 percent reliable
and repeatable versus sighted anecdotal opinions, which are not usually repeatable and are certainly less reliable. However, some preamps color
the sound intentionally, and that’s a different issue.
Please understand that “specs” are more complicated than may seem from this simplistic explanation. For example, distortion often increases
at higher levels. So two preamps may have the same distortion when outputting a −10 dBm signal but be very different at +20 dBm. Frequency
response can also change with level or, more accurately, with the amount of gain. So a preamp may be very flat when its trim is set for 20 dB
gain, but not so flat with 60 or 70 dB gain. In the larger picture, specs do indeed tell you everything needed about every circuit, as long as you
measure all the parameters at different signal levels. This is especially true for preamps that aim for a clean, uncolored sound. You have to verify
that they’re transparent at all signal levels and gain settings. But while specs can accurately predict that a preamp is audibly transparent, it’s
much more difficult to look at the specs for an intentionally colored device and divine how its 4% THD or other coloration will actually sound
when used on various instruments and voices.
Preamp Input Impedance
While we’re on the subject of microphones and preamps, it’s worth mentioning preamp input impedance. Most microphones have an output
impedance of around 150 ohms, letting them drive long cables without high-frequency loss. Most preamps have an input impedance much
higher than 150 ohms, which loads the microphone less to avoid losing some of the signal. Mic pres usually have an input impedance of
between 1 K and 10 K, with most around 2 K. Ideally, the input impedance of a preamp will be at least five to ten times the microphone’s
output impedance, to avoid signal loss and possibly increased distortion. If a microphone having an output impedance of 150 ohms is plugged
into a preamp whose input impedance is 150 ohms, the microphone will be 6 dB softer than if it were sent into a high-impedance input.
Further, active microphones that have a built-in preamp may suffer from higher distortion as they work harder to drive the low impedance.
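The 6 dB figure comes from the voltage divider formed by the two impedances. Here is a minimal sketch, assuming both impedances behave as pure resistances (real microphones and transformers are more complicated). The function name is hypothetical.

```python
import math

def loading_loss_db(mic_output_ohms, preamp_input_ohms):
    """Level change from the voltage divider formed by the microphone's
    output impedance and the preamp's input impedance (resistive model)."""
    ratio = preamp_input_ohms / (mic_output_ohms + preamp_input_ohms)
    return 20 * math.log10(ratio)

for z_in in (150, 750, 1_500, 2_000, 10_000):
    print(f"{z_in:>6} ohm input: {loading_loss_db(150, z_in):6.2f} dB")
```

With a 150-ohm microphone, an input of five to ten times that value loses well under 2 dB in this model, while a matched 150-ohm input loses about 6 dB.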
A recent trend among some preamp makers lets you vary the input impedance yourself, allegedly to fine tune the sound. One popular model
lets you choose between 300 and 1,200 ohms. Another has a potentiometer that lets you adjust the input impedance continuously from
100 ohms to 10 K. In truth, lowering the input impedance of a preamp mainly rolls off low frequencies and might also increase distortion. What
happens as a preamp’s input impedance is lowered depends on the microphone’s design—whether it’s active, how much current the output
stage can provide, and whether it has a transformer. This is yet another way that gear vendors try to up-sell us on the value of subtle distortion.
Preamp Noise
One important spec for microphone preamps that really can vary between different models is equivalent input noise, abbreviated EIN. All
electronic circuits generate some amount of noise. Even a resistor, which is a passive device, generates an amount of noise that can be calculated
based on its resistance in ohms and the ambient temperature. A complete explanation of the many different sources and types of circuit noise is
beyond the scope of this book, so I’ll hit only the high points.
The noise that’s always present to affect low-level circuits such as mic pres is called thermal noise, or sometimes Johnson noise after J. B.
Johnson, who discovered it in the 1920s. This type of noise exists at temperatures above absolute zero, rising with increasing temperature, and
it’s caused by the random thermal motion of electrons in a conductor. Assuming a room temperature of 70°F (about 21°C), the theoretical lowest noise possible for a
150-ohm source is around −131 dBu when considering only the audible range. The actual spec for EIN should always include the bandwidth being considered. For
example, the low-noise NE5534 op-amp used in some microphone and phonograph preamplifiers has an EIN voltage spec’d as follows:
EIN = 3.5 nV/√Hz
In other words, the amount of noise is 3.5 nV (nanovolts) times the square root of the bandwidth in Hz. So for the range 20 Hz through
20 KHz:
3.5 nV × √(20,000 − 20) ≈ 3.5 nV × 141.4 ≈ 494.7 nV, or about 0.49 microvolts
From this you can estimate the actual signal to noise ratio you’ll get from a preamp based on a given microphone output voltage. For
example, if a microphone outputs 4.94 millivolts at some SPL level, which is reasonable for a low-impedance dynamic microphone, the signal
to noise ratio will be 10,000 to 1, or 80 dB.
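The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the noise density is flat across the band, and the function names are made up for the example.

```python
import math

def noise_volts(density_nv_per_root_hz: float, f_low_hz: float, f_high_hz: float) -> float:
    """Total RMS noise voltage for a flat noise density over a bandwidth."""
    bandwidth_hz = f_high_hz - f_low_hz
    return density_nv_per_root_hz * 1e-9 * math.sqrt(bandwidth_hz)

def snr_db(signal_volts: float, noise: float) -> float:
    """Signal to noise ratio in dB for two RMS voltages."""
    return 20 * math.log10(signal_volts / noise)

noise = noise_volts(3.5, 20, 20_000)   # 3.5 nV per root Hz, 20 Hz to 20 KHz
print(f"Noise voltage: {noise * 1e6:.3f} microvolts")
print(f"SNR for a 4.94 mV microphone signal: {snr_db(4.94e-3, noise):.1f} dB")
```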
In the interest of completeness, the approximate noise in nV per root Hz for a resistive source at room temperature is calculated as follows:
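One standard way to express this relationship is the Johnson-Nyquist formula, where the noise density equals the square root of 4 × k × T × R, with k being Boltzmann's constant, T the temperature in kelvins, and R the resistance in ohms. At room temperature this works out to roughly 0.13 nV per root Hz times the square root of the resistance. The sketch below uses that textbook formula; the function names and the 294 K default are assumptions for the example.

```python
import math

BOLTZMANN = 1.380649e-23  # joules per kelvin

def thermal_noise_density_nv(resistance_ohms: float, temp_kelvin: float = 294.0) -> float:
    """Thermal noise density of a resistor in nV per root Hz."""
    return math.sqrt(4 * BOLTZMANN * temp_kelvin * resistance_ohms) * 1e9

def volts_to_dbu(volts: float) -> float:
    """Convert RMS volts to dBu, where 0 dBu is about 0.7746 volts."""
    return 20 * math.log10(volts / 0.7746)

density = thermal_noise_density_nv(150)                 # 150-ohm source
total = density * 1e-9 * math.sqrt(20_000 - 20)         # 20 Hz to 20 KHz
print(f"Density: {density:.2f} nV per root Hz")
print(f"Total over the audio band: {volts_to_dbu(total):.1f} dBu")
```

For a 150-ohm source this lands very close to the −131 dBu theoretical limit mentioned earlier.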
A preamp’s EIN is typically measured by wiring a 150-ohm metal film resistor across its input terminals, setting the preamp gain to 60 dB,
then measuring the noise at the preamp’s output. Metal film resistors are used because they are more precise than most other types, and
150 ohms is used because that’s the output impedance of most microphones. Given the 60 dB of gain, the EIN of the preamp is simply 60 dB
below the noise measured at its output. If you’re reviewing preamp specs for a proposed purchase, try to verify that the EIN was in fact
measured using a 150-ohm input source. If a manufacturer instead applied a short circuit to the input, that dishonestly biases the test to yield
less noise than you’ll actually realize in use. However, it’s proper to filter out frequencies above and below the audible range using A-weighting
to exclude noise that’s present but won’t be heard. EIN is often spec’d as some number of nanovolts per root Hz:
nV/√Hz
However, in this case, the slash (/) means “per” and not “divided by.” In fact, you multiply the noise by the square root of the bandwidth in Hz, as shown in
the earlier formulas. It’s also worth mentioning that the EIN for a preamp actually increases at lower gain settings. In practice this isn’t a
problem because a lower gain setting is used with higher input voltages, which in turn increases the overall signal to noise ratio.
All of that said, transformerless preamps having an EIN within 2 dB of the theoretical limit for a low-noise resistor are common and
inexpensive. It’s the last dB or so that’s elusive and often expensive to obtain!
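The measurement described above reduces to simple subtraction in dB. In this sketch the −69.5 dBu output reading is a made-up example value, not a figure from any real preamp, and the −131 dBu reference is the theoretical limit quoted earlier.

```python
def ein_dbu(output_noise_dbu: float, gain_db: float = 60.0) -> float:
    """Equivalent input noise: the output noise referred back to the input."""
    return output_noise_dbu - gain_db

def db_above_limit(ein: float, theoretical_dbu: float = -131.0) -> float:
    """How far a measured EIN sits above the theoretical resistor noise."""
    return ein - theoretical_dbu

measured_output = -69.5                      # hypothetical meter reading in dBu
ein = ein_dbu(measured_output)               # referred to the input
print(f"EIN: {ein} dBu, {db_above_limit(ein)} dB above the theoretical limit")
```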
Clean and Flat Is Where It’s At
My preference is to record most things (though not fuzz guitars) clean and flat. You can add a little grit later to taste when mixing, but you can’t
take it away if you added too much when recording. To my way of thinking, any preamp that is not clean and flat is colored, and I’d rather add
color later when I can hear all of the parts in context. A coloration that sounds good in isolation may sound bad in the mix, and vice versa. So
my personal preference is to defer all such artistic choices until making the final mix. If I decide I want the sound of tape or tubes, I’ll add that
later as an effect.
Not to go off on a rant, but one big problem with vacuum tubes is they’re not stable. So their sound changes over time as they age, and
eventually they need to be replaced. Few tube circuits are as clean as modern solid state versions, and tubes can also become microphonic. When
that happens, the tube resonates, and it sounds like someone is tapping a microphone. This resonance is especially noticeable if the tube is close
to a loud source such as a bass amp. Further, tube power amplifiers require a specific amount of DC bias voltage to avoid drawing more current
than the tube can handle. An amplifier’s bias is adjusted using an internal variable resistor, but the optimum bias amount drifts over time as the
tube ages and needs to be adjusted occasionally. Setting a tube’s bias is not a task most end users are capable of doing correctly. Tubes also have
a high output impedance, so most tube-based power amps include an output transformer that further clouds the sound. Finally, with some
modern tube gear, the tube is tacked on for marketing purposes only and is operated at less than its optimum power supply voltage.
This point was made exquisitely by Fletcher¹ in a forum post about the value of tube preamps and other tube gear. Fletcher said, “You guys
need to understand that the people building ‘tube’ stuff back in the day were going for the highest possible fidelity attainable. They were going
for the lowest distortion possible, they were trying to get the stuff to sound ‘neutral.’ They were not going for the ‘toob’ sound; they were trying
to get away from the toob sound.”
On this point I completely agree with Fletcher. In the 1950s and 1960s, the electronic and chemical engineers at Ampex and Scully, and 3M
and BASF, were aiming for a sound as clean and transparent as possible from analog recorders. They were not aiming for a “tape” sound! This
was also true of Rupert Neve and other big-name console designers of the day. The transformers they used were a compromise because they
couldn’t design circuits to be quiet enough without them. Today, transformers have been replaced with modern op-amps whose inputs are very
quiet and are inherently balanced to reject hum. If boosting frequencies with a vintage Neve equalizer distorts and rings due to its inductors,
that’s a failing of the circuit design and available components, not an intended feature.
“No listener gives a damn which microphone preamp you used.”
—Craig Anderton, audio journalist and magazine editor
Indeed, the problems I hear with most amateur productions have nothing to do with which preamps were used and everything to do with
musical arrangement, EQ choices, and room acoustics. If a mix sounds cluttered with a harsh midrange, it’s not because they didn’t use vintage
preamps. In my opinion, this fascination with the past is misguided. People hear old recordings that sound great and wrongly assume they need
the same preamps and compressors and other vintage gear to get that sound. Every day in audio forums I see a dozen new threads with
“recording chain” in the title, as a newbie asks what mics and other gear were used to record some favorite song or other. This ignores that the
tone of a performance is due mainly to the person playing or singing and the quality of their instrument or voice. I once saw a forum thread
asking, “How can I get that Queen layered vocal sound?” I’ll tell you how: Capture a super-clean recording of people who can sing like the guys
in Queen!
This chapter explained that audio is similar to household plumbing by showing the layout and signal routing for both large-format and compact
mixers, including mute and solo, Aux buses, and automation. A complex system such as a mixing console can be more easily understood by
viewing it as a collection of smaller, simpler modules and signal paths. We also covered DAW software, including how to connect a DAW
computer to a small-format hardware mixer to add overdubs without also recording previous tracks. One big advantage of the method shown is
that you can hear yourself with reverb and EQ or other effects as you record, without including those effects in the recording.
The basics of surround mixing were explained, along with an explanation of summing and gain staging in both analog mixers and digital
software and hardware. I couldn’t resist including a short rant about the value of vintage mic pres and tube gear. I also explained why I prefer to
capture audio sources as cleanly as possible, without distortion or other coloration, because it makes more sense to defer intentional color until
mixing, when you can hear all the parts in context.
¹ Fletcher is a colorful character who founded Mercenary Audio, a pro audio reseller based in Foxboro, Massachusetts. He’s known in audio forums for his strongly worded opinions, often peppered with salty language.
Chapter 6
Recording Devices and Methods
Recording Hardware
The two basic types of recording systems are analog and digital. In the context of recording, analog usually refers to old-style tape recorders, but
it also includes phonograph records, which are more a means of distribution than a recording medium. However, you’ll occasionally hear about
an audiophile record label capturing a session live in stereo to a record-cutting lathe. Although analog tape is still favored by some recording
engineers, considering only the percentage of users, it was surpassed by digital recording many years ago. The main reasons digital recording
prevails today are the high costs of both analog recording hardware and blank tape, as well as the superior fidelity and features of modern
digital. Compared to editing analog tape with a razor blade, manipulating audio in a digital system is far easier and vastly more powerful.
Digital editing also lets you undo anything you don’t like or accidentally ruin.
Several different types of digital recording systems are available including stand-alone hard disk recorders, computer digital audio workstation
(DAW) setups, multitrack and two-channel digital audio tape (DAT) recorders, portable recorders that write to solid state memory cards, and
stand-alone CD recorders. The most popular of these by far is the computer DAW, for many reasons. Computers powerful enough to record and
mix a hundred tracks or more are amazingly affordable these days, and they’ll only become less expensive and more powerful in the future.
DAW software can run on a regular home or business computer, rather than require custom hardware that’s expensive to produce for the
relatively few consumers of recording gear. Another big advantage of DAW software is you can upgrade it when new features are added, and you
can easily expand a system with purchased (or freeware) plug-ins. With a hard disk recorder or portable device, if you outgrow it, your only
recourse is to buy a newer model.
It’s easy to prove using the parameters defined in Chapter 2 that modern digital recording is far more accurate than analog tape. Indeed,
modern digital beats analog tape and vinyl in every way one could possibly assess fidelity. For example, digital EQ adds less distortion and noise
than analog hardware, it’s precisely repeatable, and in a stereo EQ the left-right channels always match perfectly. But it’s important to point out
that perfect fidelity is not everyone’s goal. Indeed, recording engineers who favor analog tape prefer it because they like the coloration it adds.
They’re willing to put up with the higher cost, additional maintenance, and poorer editing abilities in exchange for a sound quality they believe
is not attainable any other way. And there’s nothing wrong with that. However, I believe that digital processing can emulate analog tape
coloration convincingly when that effect is desired, for much less cost and inconvenience.
Although this chapter focuses mainly on digital recording, for completeness I won’t ignore analog tape because its history is interesting and
also educational from a “how audio works” perspective. The engineering needed to achieve acceptable fidelity with analog tape is clever and
elaborate, so let’s start there.
Analog Tape Recording
When I was 17 in 1966, I built my first studio in my parents’ basement, starting with a Sony quarter-track open reel tape recorder with sound-on-sound capability. This let me record one (mono) track onto ¼-inch tape, then copy that to the other track while also mixing a new source
through the microphone or line input. There was only one chance to get the balance correct, and after only a few passes, the tape noise and
distortion became quite objectionable. A year later I bought a second Sony stereo tape deck and a four-track record/play head. At $100 for just
the tape head, this was a big investment for a teenager in 1967! I mounted the new tape head in the first recorder, replacing the existing record
head, and ran wires from the second deck to the extra track windings on the new head, thus making a four-track recorder. This let me record and
play four tracks all at once or separately, though I used this setup mostly to record and overdub instruments one by one to build a complete
performance by myself.
Analog recording uses thin plastic tape, usually 1 or 1.5 mil (0.001 inch) thick, that’s been coated with a magnetic material called slurry. This
is a gooey paste containing tiny particles of iron oxide that’s applied to the plastic film. Each particle can be magnetized separately from the
others, which is how audio signals are stored. The amount of magnetization, and how it varies in time, is analogous to the audio signal being
recorded—hence the term analog tape. The recording head consists of a metal core or pole piece, wound with a coil of wire. As the tape passes
by the record tape head, audio applied to the coil magnetizes the core in direct proportion to the amount of voltage and its polarity. The varying
magnetism in the head then transfers to the slurry on the tape in a pattern that changes over time in both amplitude and frequency.
The highest frequency that can be recorded depends on the size of the iron oxide particles, with smaller particles accommodating higher
frequencies. The high-frequency limit is also affected by how fast the tape travels as it passes by the head and the size of the gap in the pole
piece. If the voltage from a microphone being recorded changes suddenly from one level to another, or from positive to negative, separate
particles are needed to capture each new level or polarity. Most professional recorders operate at a tape speed of either 15 or 30 inches per
second (IPS), which is sufficient to capture the highest audible frequencies. Consumer tape recorders usually play at either 7½ or 3¾ IPS, though
speeds as low as 1⅞ and even 15⁄16 IPS are used for low-fidelity applications such as cassettes and dictating machines. Magnetic transference
also requires close proximity, so the tape must be kept very close to the record head to capture high frequencies. As you can see, particle size,
intimate contact between the electromagnet heads and iron oxide, tape speed, and even the head dimensions are all interrelated, and they all
conspire to limit the highest frequencies that can be recorded.
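One way to see why tape speed matters is to compute the physical length of one recorded cycle on the tape, which is simply the tape speed divided by the frequency. The following sketch shows only that arithmetic; the function name is made up for the example, and it says nothing about the particle size or head gap of any specific machine.

```python
def recorded_wavelength_mils(tape_speed_ips: float, frequency_hz: float) -> float:
    """Length of one recorded cycle on the tape, in mils (thousandths of an inch)."""
    return tape_speed_ips / frequency_hz * 1000.0

# Length of one 20 KHz cycle at several common tape speeds.
for speed in (30, 15, 7.5, 3.75, 1.875):
    length = recorded_wavelength_mils(speed, 20_000)
    print(f"{speed:>6} IPS: one 20 KHz cycle spans {length:.3f} mils of tape")
```

At 15 IPS a 20 KHz cycle occupies less than one thousandth of an inch, and every halving of the tape speed halves that length again, so the particles and head gap must resolve ever smaller details.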
Tape Bias
Tape magnetization is not a linear process, so simply applying an audio signal to a tape head will yield a distorted recording. Until the signal to
the record head reaches a certain minimum level, the tape retains less than the corresponding amount of magnetization. As the level to the tape
head increases, the tape particles retain more of the magnetization, and thus better correspond to the audio signal. This is shown as the transfer
curve in Figure 6.1. Tape nonlinearity is similar to the nonlinearity caused by crossover distortion described earlier in Chapter 2.
Figure 6.1:
At low levels where the audio passes through zero from minus to plus, or vice versa, the tape retains less magnetism than was applied. Then at higher levels it remains
fairly linear until the applied magnetism approaches the tape’s saturation point. At that point, applying more magnetism again results in less being retained by the tape.
Figure 6.1 labels the “crossover” area near zero as the nonlinear region, but the plus and minus extremes are also nonlinear as the tape’s
magnetization approaches saturation. At that point the tape literally cannot accept any more magnetization. This is similar to a wet sponge that’s
fully drenched and can’t hold any more liquid.
To avoid tape’s inherent crossover distortion, analog tape recorders apply a bias signal to the record head while recording. The bias signal is a
very high frequency, typically 50 KHz or higher. Since the bias frequency is above the audible range, it won’t be heard when the tape is played.
But it still supplies the minimum level needed to exceed the magnetization threshold, thus shifting the audio into a range where tape
magnetization is more linear. Tape bias also reduces background hiss, and its purity matters, too; a bias oscillator that outputs a low-distortion
sine wave yields less noise.
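The linearizing effect of bias can be illustrated with a deliberately crude model. The sketch below assumes the tape behaves like a simple "dead zone" that retains nothing below a threshold, which is a cartoon of the crossover region in Figure 6.1 and not a model of real magnetic hysteresis. The threshold, bias amplitude, and function names are all hypothetical.

```python
import math

DEAD_ZONE = 0.2  # hypothetical level below which the toy tape retains nothing

def tape(x: float) -> float:
    """Toy transfer curve: no output until |x| exceeds DEAD_ZONE."""
    if abs(x) <= DEAD_ZONE:
        return 0.0
    return x - math.copysign(DEAD_ZONE, x)

def tape_with_bias(x: float, bias_amplitude: float = 1.0, steps: int = 1000) -> float:
    """Average the toy curve over one full cycle of an added bias sine wave.

    Averaging stands in for the bias being inaudible on playback, so only
    the slowly changing audio portion of the result remains.
    """
    total = 0.0
    for k in range(steps):
        phase = 2 * math.pi * (k + 0.5) / steps
        total += tape(x + bias_amplitude * math.sin(phase))
    return total / steps

# Compare the effective gain (output divided by input) at soft and loud levels.
for level in (0.05, 0.1, 0.5):
    print(f"input {level}: gain without bias {tape(level) / level:.3f}, "
          f"with bias {tape_with_bias(level) / level:.3f}")
```

Without bias the toy tape ignores soft signals entirely and passes loud ones at reduced gain. With bias, the gain is nearly the same at every level, which is what "more linear" means here.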
Tape Pre-Emphasis and De-Emphasis
Applying a high-frequency bias signal reduces distortion significantly, but it improves tape’s inherently poor signal to noise ratio only slightly.
To solve that, clever engineers devised a method called pre-emphasis and de-emphasis. Pre-emphasis simply boosts the treble when recording,
then reduces it by a corresponding amount during playback. Tape hiss is most noticeable at high frequencies, so this is a simple and elegant
solution. There are two standards for pre-emphasis: NAB used in North America and CCIR/DIN used in Europe. NAB stands for the National
Association of Broadcasters, and CCIR is the group Comité Consultatif International des Radiocommunications. DIN refers to the German
standards group Deutsche Industrie-Norm. The two methods are similar but with slightly different EQ curves.
Chapter 1 explained the response time limitation of early-style VU meters and how you must record percussion instruments at a lower level
than the meter shows to account for the meter’s sluggish response. Because analog tape recorders use pre-emphasis, similar vigilance is required
with instruments that have a lot of high-frequency content. Even if an instrument doesn’t contain strong transients that come and go before the
meter can respond fully, distortion can still occur before the meter reaches 0 VU. This is because the output level meter in a mixing console
shows the volume before the recorder has applied pre-emphasis, boosting the treble. So when recording a tambourine or shaker, you’ll typically
set the volume so the VU meter reads no higher than about −10. Further, different tape speeds apply different amounts of pre-emphasis. Only
through experience will you know how to estimate record levels to avoid distortion.
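The trade-off between hiss reduction and high-frequency headroom can be shown with simple dB bookkeeping. In this sketch the 10 dB of treble boost and the −60 dB hiss level are illustrative numbers only; the actual amounts depend on the NAB or CCIR curve, the frequency, and the tape speed.

```python
def through_tape(signal_db: float, hiss_db: float, emphasis_db: float):
    """Return (signal_out, hiss_out, level_on_tape) in dB at a treble frequency."""
    level_on_tape = signal_db + emphasis_db     # treble is boosted before reaching the tape
    signal_out = level_on_tape - emphasis_db    # de-emphasis restores the original level
    hiss_out = hiss_db - emphasis_db            # hiss is added by the tape, so it is only cut
    return signal_out, hiss_out, level_on_tape

# A treble-heavy source metered at -10, with hypothetical tape hiss at -60.
signal_out, hiss_out, on_tape = through_tape(-10, -60, 10)
print(f"signal {signal_out} dB, hiss {hiss_out} dB, level hitting the tape {on_tape} dB")
```

The signal comes back unchanged and the hiss drops by the amount of the boost, but the level hitting the tape is higher than the meter showed, which is why treble-heavy instruments need a conservative record level.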
I’ll also mention that analog tape does not have a hard clipping point as do digital audio and most electronic circuits. So here, too, experience
—and perhaps a taste for intentional subtle distortion—will be your guide when setting levels using traditional VU meters. Most analog tape
recorders are calibrated such that 0 VU is about 10 to 12 dB below the onset of gross distortion. However, there’s no single standard, and users
are free to calibrate their machines such that 0 VU corresponds to different levels of tape magnetization, measured in nanowebers per meter
(nWb/m). In the 1950s, Ampex defined “standard operating level” such that zero on the VU meter corresponds to a magnetic intensity of
185 nWb/m. Modern tape can accept more magnetization before saturating than earlier formulations, so elevated levels of 250 or 320 nWb/m
became the norm. These levels are often stated as a number of dB relative to the earlier standard, such as “plus 3” or “plus 6.”
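The dB figures follow from the usual 20 × log10 ratio, treating magnetization like a voltage. This sketch assumes the 185 nWb/m Ampex level as the reference, and the function name is invented for the example.

```python
import math

def db_re_185(fluxivity_nwb_per_m: float, reference: float = 185.0) -> float:
    """Operating level in dB relative to the 185 nWb/m reference."""
    return 20 * math.log10(fluxivity_nwb_per_m / reference)

for level in (185, 250, 320):
    print(f"{level} nWb/m is {db_re_185(level):+.1f} dB relative to 185 nWb/m")
```

The common labels are rounded: by this arithmetic 250 nWb/m is about 2.6 dB above the reference, and 320 nWb/m is a little under 5 dB above it.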
Note that a similar scheme is used with phonograph records to reduce their inherent hiss and scratch noises. In that case, high frequencies are
boosted following the RIAA curve when the record master is cut, and de-emphasis built into every phono preamp reverses that boost when a
record is played. RIAA stands for Recording Industry Association of America. Phono equalization also reduces low frequencies when recording
and raises them when the record is played. However, low frequencies are reduced for a different reason: Pressing LPs with a flat low-frequency
response will make the groove too wide, limiting the length of music that can fit on one side. Indeed, vinyl records have many foibles, including
requiring mono bass that’s equal in both channels.
Professional tape recorders contain three tape heads that erase, record, and play, in that order. As the tape moves, it first reaches the erase head,
which erases any previous content. The record head then magnetizes the tape with the audio being recorded. The play head outputs the recorded
audio, which can be the audio just recorded a moment earlier by the record head. Some consumer tape recorders have only two heads, with one
head used both to record and play back and the other to erase the tape. One important reason for separate record and play heads is to verify the
result in real time while recording. That is, you listen to the playback while recording rather than the audio sent into the recorder. This also
simplifies calibrating the recorder’s bias and pre-emphasis: You can adjust those in real time rather than record tones, play them to verify, record
again, play again, and so forth.
Because of the physical distance between the record and play heads, there’s always a slight delay after audio is recorded before it’s played by
the play head. When new tracks are recorded as overdubs, adding more tracks to a tune in progress, the time displacement causes the new tracks
to be offset in time. That is, the performer hears the prerecorded backing tracks as they come from the play head, but the record head is a few
inches earlier. So when you play all of the tracks together, the newer tracks are out of sync, playing slightly later than the original backing tracks.
To avoid this delay, the record head is temporarily used for playback while overdubbing new material. This process is called Selective
Synchronization, or Sel-Sync for short. The size and shape of a record head is optimized for recording rather than playback, so its frequency
response may not be as good as a head optimized for playback. This is not a problem because it affects only what performers hear while
recording. When the finished tape is played back as intended using the play head, the response is again normal.
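The size of the delay that Sel-Sync avoids is simply the distance between the heads divided by the tape speed. The 2-inch spacing used below is an assumed example value, since the actual spacing varies by machine.

```python
def head_delay_ms(head_spacing_inches: float, tape_speed_ips: float) -> float:
    """Time for a point on the tape to travel from the record head to the play head."""
    return head_spacing_inches / tape_speed_ips * 1000.0

spacing = 2.0  # hypothetical distance between record and play heads, in inches
for speed in (30, 15, 7.5):
    print(f"{speed:>4} IPS: {head_delay_ms(spacing, speed):.0f} ms delay")
```

Even at 30 IPS the offset is tens of milliseconds, which is far too much for overdubbed parts to stay in time with the backing tracks.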
Tape Noise Reduction
To further reduce the background hiss from analog tape, engineers devised a variety of noise reduction schemes. The two most popular noise
reduction systems were developed by Dolby and dbx, both using a method called companding, named for the compression and expansion these
devices use. Companding applies a volume compressor to the audio while recording; then on playback the audio passes through an expander
that reverses the process. The compressor raises soft passages so they won’t be drowned out by tape hiss, and then the expander lowers the
volume by a corresponding amount during playback. This restores the audio to its original level while reducing the tape hiss.
Dolby originally offered two different systems: Dolby A for professional recording studios and Dolby B for consumer use with cassette tapes.
Dolby A divides the audio into four frequency bands that are compressed separately, and then the bands are combined again before recording to
tape. Note that Dolby acts on low-level signals only, raising them to stay above the tape hiss. At playback the process is reversed. The dbx system
is broadband, operating over the full range of volume levels, but it adds pre-emphasis to compress high frequencies more than low frequencies.
As long as the compressor and expander portions of such systems are calibrated properly, playback expansion exactly reverses the compression
applied when recording. But since Dolby A splits the audio into four bands, precise and frequent alignment by studio operators is required.
Further, frequency response errors in the recorder or added distortion prevents the expansion from exactly mirroring the compression.
Unfortunately, Dolby and dbx type companding is only half a solution because they don’t really reduce the underlying noise level. It just
seems that way. Companding merely raises the level of soft music to keep it from being dominated by noise. If a recorder has a signal to noise
ratio of 50 dB, companding can’t reduce the noise lower than 50 dB below the music. However, while the inherent signal to noise ratio remains
the same, for most music the improvement is noticeable and most welcome. I mention this limitation of tape noise reduction because it’s related
to the examples in Chapter 3 that show the audibility of soft artifacts in the presence of louder music.
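This limitation can be seen with an idealized broadband compander. The sketch below assumes a 2:1 compressor and matching 1:2 expander working in dB, a tape noise floor 50 dB below maximum level, and an expander whose gain follows the music rather than the hiss. It ignores attack and release times, frequency bands, and everything else that real Dolby and dbx units do.

```python
def snr_plain(signal_db: float, tape_noise_db: float = -50.0) -> float:
    """Signal to noise ratio with no noise reduction."""
    return signal_db - tape_noise_db

def snr_companded(signal_db: float, tape_noise_db: float = -50.0) -> float:
    """Signal to noise ratio with an idealized 2:1 / 1:2 compander."""
    on_tape_db = signal_db / 2                        # compressor halves the level in dB
    expander_gain_db = on_tape_db                     # expander doubles it: adds on_tape_db again
    signal_out_db = on_tape_db + expander_gain_db     # back to the original level
    noise_out_db = tape_noise_db + expander_gain_db   # hiss receives the same gain as the music
    return signal_out_db - noise_out_db

for level in (0, -20, -40):
    print(f"music at {level:>4} dB: plain SNR {snr_plain(level):.0f} dB, "
          f"companded SNR {snr_companded(level):.0f} dB")
```

Soft passages benefit greatly, but when the music is at full level, the hiss is still only 50 dB below it, with or without companding.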
Years ago, dbx and Burwen made open-ended systems that reduce tape noise after the fact. The dbx device uses a dynamic low-pass filter
whose cutoff frequency changes in response to the music. Burwen’s unit was more sophisticated, manipulating separate low-pass and high-pass
cutoff frequencies. Such processing would be trivial to implement today as a plug-in, though there are even more effective ways to reduce noise
after the fact digitally. Chapter 13 explains software noise reduction in detail.
Tape Pre-Distortion
Unlike electronic circuits that are usually clean up to the point of hard clipping, analog tape distortion creeps up slowly, reducing dynamic
range and softening transients. As shown in Figure 6.1, the transfer curve of analog tape slowly flattens as the recorded level increases past the
tape’s linear region. To combat this, the tape linearizer circuit was developed. This clever design was included in recorders made by Nagra,
Scully, and MCI, and it reduces distortion by applying an equal but opposite nonlinearity while recording. If calibrated carefully, distortion of 3
percent can be reduced to less than 1 percent for a given signal level. Where tape compresses the highest positive and negative waveform peaks,
pre-distortion intentionally exaggerates those peaks, yielding less waveform flattening on playback.
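The idea of an "equal but opposite" nonlinearity can be sketched by assuming a simple mathematical model for tape saturation. Here the tape is modeled as a tanh curve, purely as an assumption for illustration, so the matching pre-distortion is its inverse, atanh. Real linearizer circuits only approximate the inverse of the real tape curve, which is why they reduce distortion rather than eliminate it.

```python
import math

def tape(x: float) -> float:
    """Assumed soft-saturation model: peaks are compressed as level rises."""
    return math.tanh(x)

def predistort(x: float) -> float:
    """Inverse of the assumed tape curve; exaggerates peaks. Valid for |x| < 1."""
    return math.atanh(x)

for level in (0.2, 0.5, 0.8):
    print(f"input {level}: tape alone gives {tape(level):.3f}, "
          f"pre-distorted gives {tape(predistort(level)):.3f}")
```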
Some of these explanations are simplified; an entire book could be devoted just to the various engineering tricks needed to achieve acceptably
high fidelity with analog tape. Analog recorders are very “tweaky,” and many things affect their high- and low-frequency responses, distortion,
and noise levels. This brings us to the following.
The Failings of Analog Tape
“Coloration from analog tape and vinyl records is often revered, but for some reason color added by an A/D/A converter is never acceptable.”
My intent here is not to bash analog recording as much as explain the facts of audio fidelity and convenience. From my perspective, preferring
digital recording over analog tape is a no-brainer for many reasons. Analog recorders are expensive, and none are currently manufactured.
Unless you can troubleshoot electronics at the circuit level yourself, finding someone knowledgeable to service them is difficult outside of a
metropolitan area. Further, finding replacement parts from a dwindling supply is a problem that will only become worse over time. Will any
analog recorders even be around to play your archived tapes 20 years from now?
Then there’s the high cost of blank tape, which is also becoming difficult to find. Many engineers prefer recording at 30 IPS, which uses twice
as much tape as 15 IPS. Tape heads also wear unevenly over time, eventually requiring downtime while they’re sent out for expensive
relapping, unless you purchase a second set of heads as a backup.
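The running time of a reel is its length divided by the tape speed. The sketch below assumes a 2,500-foot reel, a common length for professional reels, as an example value.

```python
def reel_minutes(tape_length_feet: float, tape_speed_ips: float) -> float:
    """Recording time in minutes for a reel of tape at a given speed."""
    inches = tape_length_feet * 12
    return inches / tape_speed_ips / 60

reel_length = 2_500  # feet, assumed for this example
for speed in (30, 15, 7.5):
    print(f"{speed:>4} IPS: {reel_minutes(reel_length, speed):.1f} minutes per reel")
```

Doubling the speed from 15 to 30 IPS halves the recording time from about 33 minutes to about 17, so the same session consumes twice as many reels.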
Tape has a relatively high level of background noise, so if you record at too low a level (say, below −15 or −20 VU), the hiss is
objectionable. But tape also distorts at higher levels, becoming obnoxious once the level gets much above +3 or so. This requires either carefully
watching the recorded levels and adjusting them manually as the recording progresses or using a limiter to do that for you automatically. Of
course, if you’re recording 24 tracks at once, you need 24 limiters!
To operate optimally, analog tape recorders need constant electrical and mechanical alignment. The heads must be positioned and angled
precisely left-right, top-bottom, and front-back, not to mention the many adjustments required in the record and playback electronics. If the pre-emphasis and de-emphasis do not match exactly, the frequency response will suffer. Even if they match perfectly within the recorder, they must
also match the NAB or CCIR standard. Otherwise, tapes recorded in your studio will sound wrong in other studios, and vice versa. Because the
frequency response of analog tape is relatively poor, it’s common to record 0 VU tones at several frequencies so other studios can calibrate their
machines to match yours, which takes time. Further, the ideal amount of bias applied when recording should be adjusted to match each reel of
blank tape. So after recording for 15 to 60 minutes, depending on tape speed, the session comes to a halt before the next reel of blank tape can
be used.
Analog tape heads need to be cleaned frequently and occasionally demagnetized. Tape also wears out with repeated use, gradually losing high
frequencies, then eventually developing dropouts after the wear is severe enough that the slurry sheds and falls off. An elaborate pop tune can
span many tracks and acquire many overdubs during the course of a large production. Every time you play the tape for the performer to practice
to or to record along with, the tape wears a little more until eventually its sound is unacceptable. Tape is also fragile, and more than once I’ve
seen an errant recorder snap the tape when its motors or brakes acted too quickly.
When doing overdubs with a tape recorder, you’ll often record a part, then rewind to the same place if the performer needs to try again.
Getting back to the same place is a time-consuming nuisance for the engineer and an inspiration killer for the artist. MCI developed an autolocator for their recorders, which helped get to the same place on the tape. But it wasn’t entirely accurate, or reliable, and you still had to wait
while rewinding.
Overdubs are often done using a method called punching in, where portions of an otherwise good performance are recorded over parts
deemed unacceptable. For example, if the bass player recorded a good take except for a short section in the middle of the tune, just that one
part can be rerecorded, replacing the original performance. So you play the tape, including the parts already recorded successfully on the current
track, then press Record a moment before the new part is to be recorded over the old one. But if you press Record too early, you’ll erase the tail
end of the good portion. And if you don’t punch out quickly enough, you’ll overwrite the next section that didn’t need replacing. Doing tight
punch-ins, where only a brief pause separates the existing and replaced passages, is a highly stressful part of any recording session.
In this day of computer DAWs, recording engineers can slice and dice music in numerous ways. If you like the background vocal section in the
first chorus, it’s easy to copy it to the second chorus rather than require the singers to record the same parts again. This is impossible to do with
analog tape directly, though clever engineers would sometimes copy those tracks to another recorder, then copy them back again to the master
tape at the new place. The downside is the quality suffers with every copy generation. Indeed, this is another important limitation of analog
tape: There’s no way to make a backup safety copy whose quality isn’t degraded from the original. Editing all of the tracks at once on a
multitrack tape is possible using a demagnetized razor blade and splicing tape, but it’s risky. One mistake, and your entire project is ruined.
There is no Undo with analog tape recorders.
It’s possible to synchronize two or more recorders to obtain more than 16 or 24 total tracks using a method called time-code. The most
common system is SMPTE, for Society of Motion Picture and Television Engineers. This method uses a phase modulated sine wave to store data,
and it sounds like an old-fashioned computer modem. SMPTE data identify the current location on the tape in hours, minutes, seconds, and
video frames. The original purpose of time-code was to synchronize audio and video recorders, but it can also synchronize two audio recorders,
or a recorder and computer DAW. Because this tone is loud and obnoxious, most engineers recorded SMPTE on an outer track—either Track 1 or
Track 24 for a 24-track machine—leaving the adjacent Track 2 or Track 23 empty. Or that track could be used for something like bass that won’t
be harmed by adding a high-cut EQ to filter out tones that leak through. Time-code is usually recorded at a level around −10 VU to further
reduce leaking into adjacent tracks. Indeed, cross-talk is yet another limitation of analog tape recording.
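To make the hours, minutes, seconds, and frames addressing concrete, here is a minimal Python sketch that converts between an absolute frame count and a time-code string. The function names and the default of 30 frames per second are assumptions chosen for illustration, and the sketch handles only simple non-drop-frame counting. It does not model how the data are actually encoded on tape.

```python
def frames_to_timecode(total_frames: int, fps: int = 30) -> str:
    """Format an absolute frame count as HH:MM:SS:FF (non-drop-frame)."""
    if total_frames < 0:
        raise ValueError("frame count must not be negative")
    frames = total_frames % fps            # frames within the current second
    total_seconds = total_frames // fps
    seconds = total_seconds % 60
    minutes = (total_seconds // 60) % 60
    hours = total_seconds // 3600
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode: str, fps: int = 30) -> int:
    """Parse HH:MM:SS:FF back into an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


if __name__ == "__main__":
    position = timecode_to_frames("01:00:01:15")
    print(position, frames_to_timecode(position))
```

Two machines that agree on the same frame count are at the same place in the tune, which is all a synchronizer needs to know in order to chase and lock.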
Even when an analog tape recorder is operating optimally, its fidelity is still poor compared to modern digital recording. Besides relatively
high levels of distortion and noise, analog tape also suffers from print through. This is an echo effect caused by adjacent layers of tape on the
reel partially magnetizing each other due to their close proximity. Depending on which way the tape is wound when stored, the louder part of
the echo occurs either before or after the original sound. Most engineers store analog tape “tails out” so the echoes come after the original sound,
which is usually less noticeable. But it’s still there. Tapes that are stored for many years have more print through than tapes stored for only a few
days or weeks.
Another failing of analog tape is flutter, a rapid speed change that imparts a warbling sound, in addition to long-term speed variations. Flutter
is usually low enough that you won’t hear it on a professional-quality recorder that’s well maintained. But long-term speed changes can make
the pitch of music vary between the beginning and end of a reel. However, a different type of flutter, called scrape flutter, is audible and
disturbing. This occurs when a short section of tape vibrates at a high frequency because there are no supporting rollers between the play and
record heads, or some other part of the tape path.
Finally, some types of analog tape have aged badly, most noticeably certain types and batches produced by Ampex in the 1970s. When those
tapes have been stored for many years, the binder becomes soft, causing oxide to deposit on heads and tape guides in a gooey mess, and layers
of tape on the reel can stick together. Ampex came up with a solution that works well most of the time: baking the tape reel at a temperature of
about 120 degrees for a few hours. But this is a risky procedure, and the tape can end up destroyed. Digital recording solves every single one of
these problems. In all fairness, however, some recording engineers are willing to overlook all of these failings in exchange for what they
perceive as a sound quality that’s more pleasing than digital recording.
Digital Recording
Chapter 8 explores digital audio principles in detail, so this section provides only a brief overview. Digital recording comprises two primary
devices: a converter and a storage medium. The first is usually called an A/D/A converter because it converts analog voltages to a series of digital
numbers when recording, then does the reverse when playing back. A/D means analog-to-digital, and D/A is digital-to-analog, and most
professional outboard converters have both A/D and D/A sections. All computer sound cards also do both, and many can do both at the same
time, which is needed when overdubbing. Sound cards that can record and play at once are known as full-duplex. When using a computer to
record digital audio, the storage medium is a hard drive inside the computer; an external drive attached through a USB, FireWire, or SATA port;
or solid state memory.
When recording, the A/D converter measures the voltage of the incoming audio at regular intervals (44,100 times per second for a 44.1 KHz
sample rate) and converts each voltage snapshot to an equivalent number. These numbers are either 16 or 24 bits in size, with more bits
yielding a lower noise floor. During playback the D/A section converts the sequence of numbers back to the analog voltages that are eventually
sent to your loudspeakers.
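The sampling step can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input voltage is normalized so that ±1.0 represents full scale, the names `quantize` and `sample_sine` are invented for the example, and a real converter also includes filtering and other stages that are not modeled here.

```python
import math

SAMPLE_RATE = 44_100  # snapshots per second for a 44.1 KHz sample rate


def quantize(voltage: float, bits: int = 16) -> int:
    """Convert a normalized voltage (-1.0 to +1.0) to a signed integer."""
    max_code = 2 ** (bits - 1) - 1              # 32767 for 16 bits
    clipped = max(-1.0, min(1.0, voltage))      # anything beyond full scale clips
    return round(clipped * max_code)


def sample_sine(freq_hz: float, n_samples: int, bits: int = 16) -> list[int]:
    """Take n_samples snapshots of a full-scale sine wave."""
    return [
        quantize(math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE), bits)
        for n in range(n_samples)
    ]


if __name__ == "__main__":
    # The first few snapshots of a 1 KHz tone, as 16-bit numbers.
    print(sample_sine(1000, 8))
```

Playback is the reverse: each stored number is scaled back to a voltage at the same rate of 44,100 values per second.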
Digital recording is sometimes accused of being “sterile” and “cold sounding” by audiophiles, recording engineers, and the audio press.
Modern digital audio certainly doesn’t add coloration like analog tape, but it’s much more accurate. In my opinion, the goal of a recording
medium is to record faithfully whatever source you give it. Once you have everything sounding exactly as you want through the console and
monitor speakers while recording, when you play back the recording, it should sound exactly the same. This is exactly what modern digital
recording does. It may not add a “warm” coloration like analog tape, but it certainly isn’t cold. Indeed, competent digital recording has no sound
of its own at all.
Table 6.1 compares performance specs for a professional Studer analog recorder, plus three digital converters ranging from a $25
SoundBlaster X-Fi consumer sound card through a high-performance LavryBlue model. If transparency is the goal for a recording medium, and I
think it should be, even the $25 sound card beats the Studer by a very large margin.
Table 6.1: Comparison of Audio Fidelity Specs for Three Devices.
In the Box Versus Out of the Box
In this case “the box” refers to a computer. As mentioned earlier, I prefer to do all recording and mixing inside a computer, or In the Box (ITB). I
don’t even use the Console View in SONAR, which emulates the appearance and controls of a hardware mixer. Everything needed is already
available in the Track View, leaving more of the video display available for editing and viewing plug-in settings. But a computer can also be
used with external hardware that’s Out Of the Box, or OTB. In that case a computer runs the DAW program, but the tracks (or subgroups of
tracks) are routed through separate D/A converter outputs to an analog mixing console where the actual mixing takes place.
Proponents of OTB mixing believe that analog console summing is superior to the summing math used by DAW software. And again, some
people are willing to pay much more for their preferred method. This includes the dollar cost of a mixer, as well as the convenience cost when
you have to exactly recreate all the mixer settings each time you work on a previous project. DAW software can save every aspect of a mix,
including volume and pan automation changes and automation for every parameter of every plug-in. You can open a project next week or next
year, and when you press Play, it will sound exactly the same. You can also render a final mix in one step, which usually happens faster than
playing a tune in real time while saving to a Wave file or other recording device.
Using a hardware mixer with outboard effects requires making detailed notes of every setting. Even then it can be difficult to create an
identical mix. Many controls on hardware mixers use variable knobs rather than switches that can be set precisely. If you make a note that a pan
knob is at 1 o’clock, maybe you’ll set it exactly the same and maybe you won’t. Electronics also drift over time. Perhaps your converter’s input or
output level knob shifted by half a dB since you made the last mix. Or maybe you changed the console’s input level for one channel three
months ago when you had to patch in some odd piece of gear but failed to put it back exactly. When working entirely ITB, recall is complete
and precise.
There are many other advantages of working ITB besides perfect recall. One big feature for me is being able to buy a plug-in effect once and
use it on as many tracks as needed. Further, plug-ins are perfectly repeatable if you enter the same parameter settings, and the left and right
channels of a stereo plug-in always match exactly. Plug-ins also have less noise and distortion than analog outboard gear, limited only by the
math precision of the DAW software. You can also upgrade software more easily and for less cost than hardware, assuming the hardware can be
upgraded at all. Even if a hardware company offers an upgrade, the unit likely must be returned to the factory. Upgrades for software fixes are often
free, and new versions are usually cheaper than buying the program all over again as with hardware. Plug-ins never break down, nor do their
switches and pots become noisy or intermittent over time. However, if a software company goes out of business and the software doesn’t work
on your next computer’s operating system, you’re totally out of luck with no recourse.
Record Levels
It’s difficult to determine the optimum recording level when using analog tape because it depends on many factors. Recording at low levels
yields more tape hiss, but recording louder increases distortion. And as explained previously, instruments that create strong transients or have a
lot of high-frequency content such as cymbals and tambourines can distort even when the VU meter shows a relatively low level. This is due to
both the VU meter’s slow response time and pre-emphasis that boosts high frequencies inside the recorder but is not reflected by the console’s
VU meters. So when recording to analog tape, it’s common practice to include sufficient headroom. This is the difference in decibels between
the average and peak volume levels you can record without objectionable distortion.
Digital recording avoids the need for extra headroom just to accommodate the recording medium. An A/D/A converter is perfectly clean right
up to 0 dB Full Scale. Some people believe that digital recording sounds better when recorded at lower levels, such as −15 dBFS or even softer,
but this is easy to disprove by measuring. If recording really is cleaner with less distortion at lower levels, it’s likely due to poor gain-staging
elsewhere in the analog portion of the system. Table 6.2 compares typical THD+Noise at 1 KHz versus input level for a LavryBlue M·AD-824 converter.
Table 6.2: Distortion versus Signal Level.
Signal Level    THD+Noise Level    Equivalent Distortion
−1 dBFS         −98 dBFS
−3 dBFS         −102 dBFS
−10 dBFS        −109 dBFS
−20 dBFS        −112 dBFS
As you can see, this converter adds less distortion at high levels rather than more, as some people believe. This makes perfect sense because a
higher input level keeps the signal that much louder than the converter’s inherent noise and distortion. However, this doesn’t mean that you
should aim to record everything as close to 0 dBFS as possible. Many singers and musicians will be louder when they actually record than when
they rehearsed while you set the record level.
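The relationship between the two level columns in Table 6.2 can be worked out with simple decibel arithmetic. The Python sketch below subtracts the signal level from the THD+Noise level to get distortion relative to the signal, then converts that ratio to a percentage. The percentages it prints are derived here from the two columns as an illustration. They are an assumption about what an equivalent distortion figure would mean, not values quoted from the manufacturer.

```python
def relative_distortion_db(signal_dbfs: float, thdn_dbfs: float) -> float:
    """Distortion plus noise expressed in dB relative to the signal itself."""
    return thdn_dbfs - signal_dbfs


def db_to_percent(db: float) -> float:
    """Convert an amplitude ratio in dB to a percentage (-40 dB is 1 percent)."""
    return 100.0 * 10 ** (db / 20.0)


# (signal level, THD+Noise level) pairs from Table 6.2, both in dBFS.
TABLE_6_2 = [(-1, -98), (-3, -102), (-10, -109), (-20, -112)]

if __name__ == "__main__":
    for signal, thdn in TABLE_6_2:
        rel = relative_distortion_db(signal, thdn)
        print(f"{signal:>4} dBFS: {rel} dB relative, {db_to_percent(rel):.4f}%")
```

By this arithmetic the residual sits 97 to 99 dB below the signal at the three higher levels but only 92 dB below it at −20 dBFS, which is consistent with the point that recording softer does not lower relative distortion.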
Even if you record using “only” 16 bits, the background noise of digital audio is 96 dB below the maximum level. This is more than 20 dB
quieter than the best analog recorders. Further, even in a professional studio that’s well isolated from outside sounds and has quiet heating and
air conditioning, the ambient acoustic noise floor is more often the limiting factor than the digital recording medium. I usually aim for record
levels to peak around −10 dBFS. The resulting noise floor is still very low, yet this leaves enough headroom in case an enthusiastic performer
gets carried away. If you often record amateur musicians, you could record even lower than −10, or you could add a limiter to the recording
chain to be sure the level never exceeds 0 dBFS.
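The 96 dB figure follows from the number of bits, since each added bit doubles the number of available levels and so adds about 6 dB. The short Python sketch below shows the arithmetic, along with how much range remains between the peaks and the theoretical noise floor when peaks are kept below full scale. The function names are invented for this example, and the calculation ignores dither and real-world converter noise.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical range between full scale and the smallest step, in dB."""
    return 20 * math.log10(2 ** bits)


def range_below_peak_db(bits: int, peak_dbfs: float) -> float:
    """Distance from the recorded peak level down to the theoretical floor."""
    return dynamic_range_db(bits) + peak_dbfs   # peak_dbfs is zero or negative


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB total, "
              f"{range_below_peak_db(bits, -10):.1f} dB below a -10 dBFS peak")
```

Even with peaks at −10 dBFS, a 16-bit recording keeps roughly 86 dB between the peaks and the floor.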
As we have seen, the traditional notion of headroom is irrelevant with modern digital recording, other than to accommodate the performers
and the limits of your analog inputs and outputs. My home studio is configured as shown in Figure 5.11, where each channel insert of a Mackie
mixer goes to one input of an M-Audio Delta 66 sound card. The Delta 66 offers three operating levels that can be set independently for the
input and output sections: +4 dBu, −10 dBV, and an in-between level M-Audio calls “consumer.” I have the input sensitivity set to +4 dBu.
This requires more output from the mixer’s preamps, which in turn puts the recorded signal that much louder above the analog noise floor. At
this setting the sound card reaches 0 dBFS with an input level of +5 dBu. This is perfectly adequate to raise the signal level well above the
mixer’s noise floor, but it never gets anywhere close to the clipping point of either the mixer or sound card.
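To see what these operating levels mean as actual voltages, the Python sketch below converts dBu and dBV to volts RMS. It relies on the standard reference values: 0 dBu is about 0.7746 volts and 0 dBV is exactly 1 volt. The function names are invented for this example.

```python
import math

DBU_REFERENCE_VOLTS = math.sqrt(0.6)   # about 0.7746 V RMS (1 mW into 600 ohms)
DBV_REFERENCE_VOLTS = 1.0


def dbu_to_volts(dbu: float) -> float:
    """Convert a level in dBu to volts RMS."""
    return DBU_REFERENCE_VOLTS * 10 ** (dbu / 20.0)


def dbv_to_volts(dbv: float) -> float:
    """Convert a level in dBV to volts RMS."""
    return DBV_REFERENCE_VOLTS * 10 ** (dbv / 20.0)


def ratio_db(volts_a: float, volts_b: float) -> float:
    """Difference between two voltages, expressed in dB."""
    return 20 * math.log10(volts_a / volts_b)


if __name__ == "__main__":
    pro = dbu_to_volts(4)
    consumer = dbv_to_volts(-10)
    print(f"+4 dBu = {pro:.3f} V, -10 dBV = {consumer:.3f} V, "
          f"difference = {ratio_db(pro, consumer):.1f} dB")
    print(f"+5 dBu = {dbu_to_volts(5):.3f} V")
```

So the +4 dBu setting expects a signal nearly 12 dB hotter than the −10 dBV setting, and the +5 dBu level at which this card reaches 0 dBFS corresponds to about 1.4 volts RMS.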
Recording Methods
Before you hit Record, it’s important to verify the quality of whatever you’re recording. Listen carefully to all musical sources in the room to
identify problems such as rattling drum hardware, squeaky kick drum pedals, mechanical and electrical buzzing from guitar amps, and so forth.
It’s easy to miss small flaws in the excitement of a session, especially if you’re both the engineer and bass player. It’s much better to fix problems
now rather than curse them later. The same applies to the quality of the source itself. If a drum is badly tuned and sounds lame in the room, it
will sound just as lame after it’s recorded. If a guitar is out of tune when recording, it will still be out of tune when you play it back later.
When recording others, I suggest that you record every performance, including warm-ups. Some musicians start off great but get worse with
each take either from nervousness or exhaustion. In fact, there’s nothing wrong with a little white lie while they’re rehearsing. Tell them to take
their time and let you know when they’re ready for you to record, but record them anyway. It may end up being their best take of the session.
Related, when recording inexperienced musicians, I always make a point of telling them there’s nothing to be nervous about and that every
mistake can be rerecorded or fixed. Often, the best and most successful recording engineers are “people persons” who have a calm demeanor
and know how to make their clients feel comfortable.
As mentioned in Chapter 5, adding reverb to a performer’s earphones is always a good idea because they sound better to themselves, which in
turn helps them to play or sing with more confidence. Whether recording analog or digital, ITB or OTB, it’s easy to add reverb or EQ or any
other effects to a performer’s cue mix without recording and committing to those effects. This is not just for amateurs either. I always add reverb
to the earphones when I’m recording myself and other professional musicians.
Pop bands that have a drummer generally prefer to let the drummer set the tempo, which is as it should be. In classical music a conductor
does the same. But some pop music projects are built from the ground up track by track. In that case a metronome, or click track, is helpful to
establish the tempo for all of the overdubs to come. The tempo can change if needed over the course of a tune. That usually requires entering
tempo change data manually into your DAW software, though some programs have a tap tempo feature that lets you click a mouse button to
establish the pace. Once the tempo has been set, the DAW software automatically plays a click sound while each new track is being recorded.
Most DAWs also let you pick the metronome sounds that play. I generally use high- and low-pitched click sounds, where the higher pitch on
beat 1 is louder than the lower-pitched click on beats 2 through 4. Many people prefer to hear open and closed hi-hat samples instead of a
click sound. Keep the click volume as soft as possible but still audible. If clicks are played loudly, a microphone may pick up the sound,
especially if the performer is wearing open-back earphones.
My own pop music productions tend toward long and extravagant, with many different sections. I generally write music as I go, making a
mock-up of the piece using MIDI samples that will be replaced later with live playing. Once a tune is complete and I’m satisfied with the
arrangement, I’ll go through the entire piece carefully, deciding on the final tempos for each section, which I enter manually into SONAR’s
Tempo View. Changes might be small or drastic, depending on the music and context. Even subtle changes of just a few beats per minute (BPM)
can add a nice ebb and flow. Tempos can also change gradually over time rather than jumping from one to another. In classical music such tempo changes are
called accelerando and rallentando—progressively speeding up or slowing down, respectively. Click tracks are also used with analog recorders,
with the clicks recorded to a dedicated track in real time before any other tracks are added.
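The way a DAW derives click times from tempo data can be sketched in Python. This is only an illustration: the tempo map format (a list of bar counts and BPM values), the function name `click_times`, and the fixed four beats per bar are all assumptions made for the example. It handles tempos that jump from one section to the next, not gradual ramps.

```python
def click_times(tempo_map, beats_per_bar=4):
    """Return (time_in_seconds, is_downbeat) for every click.

    tempo_map is a list of (number_of_bars, bpm) sections played in order.
    """
    clicks = []
    elapsed = 0.0
    for bars, bpm in tempo_map:
        seconds_per_beat = 60.0 / bpm
        for _ in range(bars):
            for beat in range(beats_per_bar):
                # Beat 1 of each bar gets the louder, higher-pitched click.
                clicks.append((elapsed, beat == 0))
                elapsed += seconds_per_beat
    return clicks


if __name__ == "__main__":
    # Eight bars at 120 BPM, then a slightly slower four-bar section.
    for time, downbeat in click_times([(8, 120), (4, 116)])[:6]:
        print(f"{time:7.3f} s  {'HIGH' if downbeat else 'low'}")
```

At 120 BPM each beat lasts half a second, so dropping to 116 BPM stretches every beat by only about 17 milliseconds, which shows how subtle a change of a few BPM really is.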
Specific Advice on Digital Audio Workstations
In the days of analog multitrack tape, we used track sheets—printed paper forms containing a written description of what’s on each track,
typically stored along with the tape in its cardboard box. DAW software lets you assign meaningful track names that show on the screen, and you
can even move tracks up or down to group related tracks logically for convenience. Many DAW programs also let you assign track colors to
better identify the tracks in a group. So you could make all the drum tracks orange, acoustic guitars blue, or whatever you prefer.
Recording templates are a useful way to begin and organize your sessions. This is simply an empty project having no recorded audio but with
all of the inputs and outputs assigned. If you add a few Aux buses with plug-ins such as reverb already patched in, you’ll be ready to record an
entire band at a moment’s notice. I suggest you add everything to the template you could possibly need, including MIDI software synthesizers for
an organ, drum machine, and whatever else you might want. You can easily delete any tracks that aren’t needed or keep the tracks but hide
them from view if your DAW software has that feature. Then each time you start a new project by loading the template, use Save As
immediately to save the new project with the proper name. You could also set the template file to Read Only to avoid accidentally overwriting it.
SONAR has a feature called Per-Project Folders, which stores all of the files for a tune into a single folder on the hard drive. Many other DAWs
can do the same. I recommend this method of organizing your projects because you’ll know exactly what files need to be backed up and where
they are on the drive. If the Wave files for a tune are scattered in various locations, you may not remember to include them all in a backup. Also,
if you import Wave files from another project or use Wave files from a sample library, I suggest you copy them to the current song’s folder. This
way, everything needed to open the project is available in case you have to restore the project to another hard drive after a crash or if you bring
the project to another studio.
Most DAW software creates an “image” file for every Wave file in the project. This is similar to a GIF image graphics file, and it holds a
picture of the waveform your DAW displays while you work on a tune. There’s no need to back up these files because the software recreates
them as needed. I have SONAR set to store all image files in a dedicated folder just so I won’t waste time and disk space backing them up.
Earlier I mentioned a record method called punching in, where you can replace part of an otherwise acceptable performance with a new
version by hitting Record on the fly as the tape plays. Most DAW software lets you work that way, but to me punching in is so 1980s. In my
opinion, it’s much better to record to successive tracks repeatedly. All DAW software lets you define a Start and End region where recording will
begin and end automatically. This makes it easy for a musician to get “in the zone” while recording the same part again and again until he’s
satisfied. The main advantage of this method is there’s no risk of accidentally overwriting a good performance. Rather, the software records each
successive take to a new track until you press Stop. Some DAWs can record multiple takes—called layers, lanes, or virtual tracks—within a single
track. Once you’re satisfied that you have a good take in there somewhere, it’s easy to edit the tracks or layers to make a single performance
from all the various pieces. Editing multiple takes down to a single composite performance is called comping and is discussed more fully in
Chapter 7.
Copy Protection
My intent with this book is to present the facts about audio more than my personal opinions, but I’ll make an exception for what I consider an
important issue with all software: copy protection. There are many types of copy protection, and only criminals refuse to acknowledge that
software companies deserve to be paid for every copy of their programs that’s used. But often the people who su er most from the
inconvenience of copy protection are honest consumers who paid full price for the programs they use.
Copy protection comes in many forms, and it is an attempt by manufacturers to justifiably limit the use of their software to people who
actually buy it. The simplest form of protection requires you to enter a serial number when the program is first installed. In practice this protects
very little, since anyone can lend the installation disks to a friend along with the serial number. All this does is minimize the chance that
someone will upload the program to a web torrent for strangers to retrieve. Since they’d have to include the serial number, that number could
be used to identify them—unless they never registered the program in the first place.
A more severe form of copy protection uses a device called a dongle; the most popular system currently is the iLok. Years ago dongles were
included for free with the software, but the iLok must be purchased separately, in addition to the software it protects. Old-style dongles plugged
into a computer’s parallel or serial port, though these days a USB port is standard. If the dongle is not detected, the program refuses to run. USB
ports are a big improvement over the old days. I recall years ago visiting a local music store that had many different protected programs
installed on their main demo computer. There were half a dozen dongles connected to the computer in a chain, sticking a foot out the back. One
day someone bumped into the computer, and it toppled over, snapping off all the dongles. Ouch.
Another protection method, called challenge/response, requires you to phone or email the manufacturer, or fill in a form online, when the
program is installed. After you enter your name and address and the program’s serial number, you receive a second code number that’s needed
along with the main serial number before the software will work. I remember well one Saturday a few years ago when I was having a problem
with Sony Vegas, the video editing program I use. I decided to reinstall Vegas to see if the problem would go away. After I reinstalled Vegas, it
“phoned home” to verify my ownership. Unfortunately, Sony’s website was down. The program that had worked just minutes earlier now
refused to open at all. And being a weekend, nobody at Sony was available to help me by telephone. I lost an entire weekend that I could have
been working on my project.
It would be difficult to condemn copy protection if it protected publishers without harming legitimate users. Unfortunately, it often does harm
legitimate users and rarely thwarts software pirates. Some older protection schemes interfere with disk optimizers, requiring you to uninstall all
of the programs each time you defragment your hard disk and then reinstall them all again after. I defragment my hard drive regularly when I’m
working on a large audio or video project. Having to uninstall and reinstall a dozen programs and plug-ins every time would be a terrible
nuisance! Admittedly, this is less of a problem today now that hard disks are cheap and project files are usually kept on a separate drive that can
be defragmented independently.
Any copy protection scheme that requires intervention from the publisher has the potential to cause you disaster. Suppose you’re working on
a project and your hard disk fails. So you go to the local office supply store and buy another, only to learn that you already used up your two
allowable installations. Even the seemingly benign method of phoning the vendor for an authorization number is a burden if you’re working on
a weekend and can’t reach them. Or the dongle could simply stop working. You’re in the middle of a project with a client paying $150 per
hour, but you’re totally hosed because even with overnight shipping, the new dongle won’t arrive until tomorrow.
The ultimate disaster is when a software vendor goes out of business. In that case you can forget about ever getting a replacement dongle or
new challenge/response code. I have thousands of hours invested in my music and video programs. This includes not only the time spent
creating my audio tracks, MIDI sequences, printed scores, and video edits, but also the time it took me to learn those programs. This is one
important reason I chose SONAR: It uses a serial number, plus a challenge/response number you need to obtain only once. Once you have both
numbers, they’ll work on subsequent installations, even to another computer.
Microphone Types and Methods
Microphones are at the heart of almost every recording. Just as important is the acoustic environment in which you use them. Chapter 15
explains the inner workings of microphones, so this section addresses mainly how to use them. I’ll admit up front that I’m not a big fan of
dynamic microphones generally, though many pros love them. I prefer condenser microphones that have a flat response, with either a cardioid
or omnidirectional pickup pattern as appropriate, to capture a clear sound that’s faithful to the source. Few dynamic microphones have a
response as flat as condenser models, and even fewer have a response that extends to the highest frequencies. However, you don’t need a
response out to 20 KHz to capture a good tom or kick drum sound, and any decent dynamic mic is fine for that.
Unlike audio electronic gear that’s usually very flat, the frequency response of microphones can vary wildly. Some have an intentional
“presence” boost in the high-mid or treble range. All directional microphones also have a proximity effect, which boosts low frequencies when
the microphone is placed close to a sound source. This is why many cardioid mics include a built-in low-cut filter. Conventional wisdom says
you should choose a microphone whose frequency response complements the source you’re recording. Again, I’d rather use a flat microphone
and add EQ to taste later when mixing, when I can hear all of the parts in their final context. But this is just my opinion, and some pros may
disagree. And that’s fine.
Microphones are categorized by how they convert acoustic sound waves into electrical voltages and also by their pickup patterns. Dynamic
microphones use a thin plastic diaphragm attached to a coil of wire surrounding a permanent magnet. Ribbon microphones are similar and are
in fact also considered dynamic because they use electromagnetism to generate a voltage. But ribbon mics use a single thin metal ribbon
suspended in the field of a permanent magnet; the ribbon serves as both the diaphragm and the coil. Condenser microphones also have a plastic
diaphragm, coated with an extremely thin layer of metal. As sound waves displace the diaphragm, it moves closer to or farther from a fixed metal
plate. Together, the diaphragm and fixed plate form a capacitor whose capacitance changes as the diaphragm moves in and out, nearer to or
farther from the fixed plate. A special preamp built into the microphone converts the varying capacitance to a corresponding electrical voltage.
Note that many condenser microphones require a DC polarizing voltage to operate. Condenser mics that don’t require external polarization are
called electret condensers, named for the property that lets them hold a permanent DC charge.
A third way to categorize microphones is by the size of their diaphragms. Most microphones are either large diaphragm or small
diaphragm, typically an inch or larger in diameter, or half an inch or smaller. There are also microphones that I call “tiny diaphragm,” about ¼
inch or less in diameter. When all else is equal, mics that have a large diaphragm give a better signal to noise ratio because they capture more of
the acoustic sound wave and output a higher voltage. Conversely, microphones having a small diaphragm often have a flatter high-frequency
response because a diaphragm having a lower mass can vibrate more quickly. They’re also flatter because their diaphragm size is smaller than
one wavelength at very high frequencies. Small-diaphragm microphones are prized for their flat, extended response, making them a popular
choice for acoustic guitars and other instruments having substantial high-frequency content. Note that there’s no rule for categorizing
microphones by their diaphragm size. The large, small, and tiny size names are merely how I describe them.
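The comparison between diaphragm size and wavelength is easy to check with a little arithmetic. The Python sketch below assumes a speed of sound of about 343 meters per second, a typical room-temperature figure, and uses the one inch, half inch, and quarter inch sizes described above. The function names are invented for this example.

```python
SPEED_OF_SOUND_M_PER_S = 343.0   # approximate value in air at room temperature
MM_PER_INCH = 25.4


def wavelength_mm(frequency_hz: float) -> float:
    """Wavelength in millimeters of a sound wave at the given frequency."""
    return SPEED_OF_SOUND_M_PER_S / frequency_hz * 1000.0


def one_wavelength_frequency_hz(diameter_mm: float) -> float:
    """Frequency at which one wavelength equals the diaphragm diameter."""
    return SPEED_OF_SOUND_M_PER_S / (diameter_mm / 1000.0)


if __name__ == "__main__":
    print(f"Wavelength at 20 KHz: {wavelength_mm(20_000):.1f} mm")
    for name, inches in (("large", 1.0), ("small", 0.5), ("tiny", 0.25)):
        diameter = inches * MM_PER_INCH
        print(f"{name:>5} ({diameter:.1f} mm): one wavelength at "
              f"{one_wavelength_frequency_hz(diameter) / 1000:.1f} KHz")
```

Under these assumptions a one-inch diaphragm is as large as a full wavelength at about 13.5 KHz, well inside the audible range, while a half-inch diaphragm stays smaller than one wavelength up to roughly 27 KHz.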
As mentioned, active microphones—all condenser types and some newer ribbons—contain a built-in preamp. Therefore, less gain is needed in
your mixer or outboard preamp with those types compared to passive dynamic microphones. Some people are concerned about the potential
interaction between a microphone and its preamp, but with active microphones, this interaction has already occurred inside the mic.
I’ll also mention wireless microphones, which are popular for recording live performances. These are used mostly by singers because it frees
them from the constraint of a mic wire and gives animated performers one less thing to worry about tripping over. Wireless guitar and bass
transmitters are also available, sending the electrical signal from the instrument to an amplifier elsewhere on the stage. In-ear monitors (IEMs)
are equally popular with performers at live shows, and these can be either wired or wireless. Using a wireless microphone or guitar transmitter
with a wireless IEM system gives performers a huge degree of freedom. But don’t forget to recharge all those batteries before every performance!
Micing Techniques
The first decision you need to make when choosing where to place microphones is whether you want a close-up dry sound or more ambience
and room tone. Room tone works best in large rooms, say, at least four or five thousand cubic feet or larger. The room tone in a bedroom is
generally very poor, having a small-room sound that’s often boxy and hollow due to comb filtering and early echoes from nearby boundaries. In
small rooms I suggest placing microphones as close to the source as is practical. You can easily add ambience and reverb later when mixing. An
external reverb will have a better sound quality than the real ambience from a small room, even if your only reverb is an inexpensive plug-in.
At the risk of being too obvious, the farther away you place a microphone, the more room tone you’ll capture. I remember my first visit to
Studio A at Criteria Recording Studios, being surprised at how reverberant the live room sounded. But the natural reverb has a very neutral
quality, thanks to the large size of the room and the presence of both absorption and diffusion. So engineers there can control the ratio of direct
to ambient sound entirely with mic distance when recording. For the rest of us, when in doubt, record closer and dryer because it’s easy to add
reverb to taste later when you can hear everything in context.
Another basic decision is whether to record as mono or stereo. I know professional recording engineers who always record single instruments
using only one microphone, but most record in stereo at least occasionally. For pop music projects, recording each instrument in mono makes a
lot of sense. Using two microphones, with each panned fully left and right, can make an instrument sound too large and dominating. That might
be just what you want for a solo act where one person plays piano or guitar and sings. And obviously a classical recording of a solo piano or
violin benefits from a full-sounding stereo field. But a trumpet or clarinet overdub on a pop tune is often best recorded in mono, then panned
and EQ’d to not interfere with the other instruments.
Another option is to record in stereo with two microphones but not pan them fully left and right when mixing. For example, one microphone
might be panned fully left, with the other panned to the center or slightly left of center. A string or brass section might be more suitable in stereo
to convey the width of the section rather than sound like five players are all standing in the same place, which is unnatural. There are no hard
and fast rules for this, and the requirements vary depending on the session. You could always hedge your bets by recording in stereo, then decide
whether to use one or both microphones when mixing. Even if you know you’ll use only one microphone in the mix, recording in stereo lets you
decide later which of the two mic placements sounds better. Ideally, a true stereo recording will use a matched pair of microphones having
identical frequency responses. When that’s not possible, both mics should at least be the same model.
As mentioned, microphones have different directional properties. There are three basic pickup patterns, with a few minor variations. The
most popular pattern is probably cardioid, named for its resemblance to a stylized heart shape, as shown in Figure 6.2.
Figure 6.2:
The cardioid microphone pickup pattern is named for its heart-like shape when graphed. Sound arriving from the front is typically 5 dB louder than sound arriving from
either side and 20 dB or more louder than sound coming from the rear.
Another basic pickup pattern is called Figure 8, or bidirectional, as shown in Figure 6.3. This pattern is typical for ribbon microphones due to
their open design that exposes both sides of the ribbon. Many large diaphragm condenser microphones also offer Figure 8 as one of several
pickup patterns.
Figure 6.3:
A Figure 8 microphone captures sound equally from the front and rear, rejecting completely sound arriving from the sides.
The third basic pattern is omnidirectional, often shortened to omni. In theory, an omni microphone doesn’t favor any direction, responding
equally to sound arriving from every angle. In practice, most omni mics slightly favor sound from the front at the highest frequencies. Some
microphones offer more than one pickup pattern. For example, the Neumann U 87 has a three-position switch to select cardioid,
omnidirectional, or Figure 8. Other microphones omit a switch but are sold with optional capsules having different patterns that screw onto the
mic’s body. Ribbon microphones are usually Figure 8, but most nonswitchable dynamic and condenser mics are either cardioid or
One great thing about a Figure 8 pattern is it rejects sounds arriving from both sides completely at all frequencies. The pickup pattern of
cardioid microphones generally varies with frequency, having poorer rejection from the rear and sides at the lowest frequencies. This is an
important consideration with mic placement because musical instruments and other sounds that are picked up off-axis by a cardioid mic will
sound muddy compared to the brighter (flatter) sound of sources captured on-axis. Many professional recording engineers believe it’s better to
have more leakage that sounds clear rather than less leakage that sounds muffled.
I’ll also mention a fourth pattern called supercardioid, which is even more directional than cardioid. The supercardioid pattern is common
with shotgun microphones, those very long tube-shaped mics attached to a pole that news sound crews use for interviews on the street. The main
advantage of a supercardioid pattern is it lets you pinpoint a sound source from farther away, which helps when recording in noisy or
reverberant environments.
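The pickup patterns described above can be approximated with a single idealized formula, gain = a + (1 − a)·cos(angle). The following is a minimal sketch, not a model of any real microphone: the function name and the blend values of `a` are textbook idealizations chosen for illustration. Note that an ideal cardioid comes out about 6 dB down at the sides, while real microphones vary with frequency and construction, which is why measured figures such as the "typically 5 dB" quoted for Figure 6.2 differ somewhat.

```python
import math

# Idealized first-order pickup patterns: gain = a + (1 - a) * cos(angle).
# The blend values below are textbook approximations (an assumption),
# not measurements of any particular microphone.
PATTERNS = {
    "omni": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "figure8": 0.0,
}

def pattern_gain_db(pattern, angle_degrees):
    """Return the idealized pickup level in dB relative to on-axis (0 degrees)."""
    a = PATTERNS[pattern]
    gain = abs(a + (1.0 - a) * math.cos(math.radians(angle_degrees)))
    if gain < 1e-9:
        # A perfect null, such as the sides of a Figure 8.
        return float("-inf")
    return 20.0 * math.log10(gain)

if __name__ == "__main__":
    for name in PATTERNS:
        side = pattern_gain_db(name, 90)
        rear = pattern_gain_db(name, 180)
        print(f"{name:14s} side: {side:7.2f} dB   rear: {rear:7.2f} dB")
```

In this idealized model the Figure 8 has a perfect null at 90 degrees and full sensitivity at the rear, while the cardioid's null is directly behind it.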
There are a number of common stereo micing arrangements. One is called X/Y, where a pair of cardioid microphones is placed with their tips
adjacent but pointing in opposite directions. Figure 6.4 shows a pair of AKG C-451 cardioid mics set up in an X/Y pattern to capture a solo
cellist in stereo in my home studio. One big advantage of X/Y placement is it avoids comb filtering due to phase cancellation if the recording is
played in mono. No matter which direction a sound arrives from, it reaches both microphones at the same time.
Figure 6.4:
This pair of AKG microphones is arranged in an X/Y pattern, with the tips together but pointing left and right.
A popular X/Y variant used when recording orchestras or other large groups is the Blumlein Pair, named for its inventor Alan Blumlein. In this
case, two Figure 8 microphones are used instead of cardioids, and they’re angled at exactly 90 degrees. Since the Blumlein method captures
sound from both the front and rear, you’d use it when you want to capture more of the room sound.
Another popular stereo arrangement is spaced omni or spaced cardioid, using two microphones spaced some distance apart. For solo
instruments in a studio setting, the microphones can be as close as a few feet, as shown in Figure 6.5. For larger ensembles, a wider spacing is
appropriate, as shown in Figure 6.6. You’ll sometimes see mics spaced only five or six inches apart to emulate the space between our ears for
recordings meant to be heard through earphones.
Figure 6.5:
These audio-technica AT4033 cardioid microphones are spaced about three feet apart, about two feet in front of the performer.
Figure 6.6:
For this recording of an eight-piece string section, the AT4033 microphones were placed farther apart to better convey the width of the section. Only one mic is visible at
the top of the photo; the other is at the left outside the frame.
The strings session in Figure 6.6 was for a recording that emulated an orchestra, so the sound was mic’d to be intentionally large and wide. For
this session there were two first violins, two second violins, two violas, and two cellos. Each passage was recorded three times to further give the
illusion of a full orchestra. Wide spacing with distant microphones doesn’t usually work well in small rooms, but my studio is 33 feet front to
back, 18 feet wide, with a ceiling that peaks at 12 feet in the center. You can hear this recording in the example file “string_section.wav.”
A variation on the spaced omni method is called the Decca Tree, developed in the 1950s by recording engineers at Decca records. This method
spaces the left and right omni microphones about six feet apart, but also adds a third omni microphone in the middle about three feet forward
of the outer mics. This method is popular for recording classical orchestras and movie sound tracks, with the mic array placed high above the
conductor’s head. When mixing, the outer mics are panned hard left and right, and the center mic is panned to the middle. A Decca Tree takes
advantage of specially chosen omni mics that are somewhat directional at high frequencies. The original version used Neumann M50 mics, and
this arrangement doesn’t work as well with other omni mics.
The last stereo microphone arrangement I’ll mention is called Mid/Side, and it’s based on a very different principle than the other micing
methods. Two microphones are used—one is usually cardioid and the other is always a Figure 8. The cardioid mic points forward, and the
Figure 8 is placed adjacent directly above or below, with its null facing forward. With this arrangement, the cardioid mic captures the center
information, and the Figure 8 captures both the left and right side ambience only. Individually, the mid mic, when the array is properly placed,
will give a good mono picture of the source, and the side mic will sound odd and hollow. When mixed together, the cardioid microphone is
panned to the middle, and the Figure 8 is split left and right, with the polarity of one side reversed. Both the microphone placement and
electrical setup are shown in Figure 6.7.
Figure 6.7:
Mid/Side stereo recording uses two microphones. One is typically a cardioid type facing toward the sound source, and the other is always a Figure 8 facing away from the
source to capture sound from both sides only.
The main selling point of Mid/Side micing is that you can adjust the width of the stereo field after the recording is made by raising or
lowering the volume of the Mid microphone. Mid/Side micing also avoids phase cancellation when the stereo mix is played in mono.
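The Mid/Side mixing setup described above amounts to a sum-and-difference calculation: left is Mid plus Side, and right is Mid minus Side (the polarity-reversed copy). The sketch below illustrates this on plain lists of sample values. The function names and the `mid_gain` parameter are hypothetical, introduced only for this example.

```python
def decode_mid_side(mid, side, mid_gain=1.0):
    """Convert Mid and Side sample lists to (left, right) sample lists.

    mid_gain scales the Mid microphone: raising it narrows the stereo
    image, lowering it widens the image.
    """
    left = [mid_gain * m + s for m, s in zip(mid, side)]
    right = [mid_gain * m - s for m, s in zip(mid, side)]  # Side polarity reversed
    return left, right

def to_mono(left, right):
    """Sum left and right to mono, halving to keep the original Mid level."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

if __name__ == "__main__":
    mid = [0.5, -0.25, 0.0]
    side = [0.1, 0.2, -0.3]
    left, right = decode_mid_side(mid, side)
    print("left: ", left)
    print("right:", right)
    print("mono: ", to_mono(left, right))
```

Summing left and right makes the Side signal cancel exactly, leaving only the Mid microphone, which is why a Mid/Side recording collapses to mono without phase cancellation.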
The 3-to-1 Rule
As mentioned earlier, when sounds leak into microphones meant for other performers, the result is excess ambience and echoes. But leakage
also causes comb filtering if the delay time is less than about 20 milliseconds and the unwanted sound is fairly loud compared to the desired
sound. A general rule for mic placement is to keep unwanted leakage at least 10 dB below the desired sound. Chapter 1 showed the relationship
between dB volume levels and acoustic distance, with the level falling off by 6 dB each time the distance is doubled. Obtaining a
10 dB level reduction requires a difference of about three times the distance, as shown in Figure 6.8.
Figure 6.8:
When separate microphones are used to capture different performers, the 3-to-1 Rule requires the unwanted source to be at least three times farther from the microphone
than the intended source to avoid comb filtering. This assumes that both instruments are sounding at the same acoustic volume and are also mixed at equal levels.
When the volumes of the acoustic sources are the same, and the gains of both preamps are equal, this yields peaks of about 2 dB and nulls
about 3 dB deep for the unwanted signal as captured by the main microphone. The same happens for intended sound leaking into the other
microphone. Leakage doesn’t always sound bad, but minimizing comb filtering can only help. Of course, if one source is louder than the other,
or the preamps use different amounts of gain to compensate, the peak and null amounts will vary. The same goes for instruments that are not
mixed at the same level for artistic reasons, such as when a rhythm acoustic guitar is mixed at a lower level than a lead acoustic guitar. It should
be noted that the 3-to-1 Rule is often quoted, but that ratio applies only to omni mics on sources that are the same volume. Since recording
setups vary wildly, it may be best to forget rules like this and just fix any problems when you hear them.
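The numbers behind the 3-to-1 Rule can be checked with a few lines of arithmetic. This is a rough sketch under the same assumptions the text states (equally loud sources, equal gains, and simple inverse-distance falloff with no room reflections). The function names are made up for this example.

```python
import math

def level_drop_db(distance_ratio):
    """Level reduction in dB when a source is distance_ratio times farther away."""
    return 20.0 * math.log10(distance_ratio)

def comb_peak_and_null_db(leak_db_below):
    """Peak and null depth in dB when leakage sits leak_db_below dB under the main signal.

    Peaks occur where the two signals add in phase, nulls where they
    are 180 degrees apart. leak_db_below must be greater than zero.
    """
    leak = 10.0 ** (-leak_db_below / 20.0)
    peak = 20.0 * math.log10(1.0 + leak)
    null = 20.0 * math.log10(1.0 - leak)
    return peak, null

if __name__ == "__main__":
    print(f"2x distance: {level_drop_db(2):.2f} dB down")
    print(f"3x distance: {level_drop_db(3):.2f} dB down")
    peak, null = comb_peak_and_null_db(10.0)
    print(f"Leakage 10 dB down: peaks {peak:+.2f} dB, nulls {null:+.2f} dB")
```

Tripling the distance gives roughly 9.5 dB of reduction, close to the 10 dB target, and leakage 10 dB down produces peaks of a little over 2 dB and nulls of a little over 3 dB.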
Microphone Placement
In a large room it almost doesn’t matter where you put the performers as long as they’re not too close to reflecting walls. But for the small
rooms many people record in today, it’s difficult to not be near a wall or low ceiling. A good general rule is placing the musicians and
microphones at least 10 feet away from any reflecting surface. As explained in Chapter 3, reflections arriving within about 20 milliseconds of the
direct sound are more audibly damaging than reflections that arrive after 20 milliseconds.
Sound travels at a speed of about 1 foot per millisecond. So after a round trip to a boundary 10 feet away and back, the reflections arrive 20
milliseconds later. Reflections that arrive “late” do not create severe comb filtering, partly because they’re softer due to distance. When recording
in a small room, the best solution is to put absorbers or diffusers on nearby surfaces, including the ceiling. Chapter 19 explains these treatments
in much more detail, so for now I’ll focus mainly on practical considerations.
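The round-trip arithmetic above is easy to reproduce for other distances. The sketch below uses the text's rule of thumb of 1 foot per millisecond by default (a more exact figure is about 1.13 feet per millisecond at room temperature) and makes the simplifying assumption that the source and microphone are at the same spot, so the extra path is twice the distance to the boundary. The function names are hypothetical.

```python
def reflection_delay_ms(distance_to_boundary_ft, feet_per_ms=1.0):
    """Delay of a reflection after a round trip to a boundary and back."""
    return 2.0 * distance_to_boundary_ft / feet_per_ms

def first_null_hz(delay_ms):
    """Lowest comb-filter null: where the delay equals half a cycle."""
    return 1000.0 / (2.0 * delay_ms)

if __name__ == "__main__":
    for feet in (2, 5, 10):
        delay = reflection_delay_ms(feet)
        print(f"Boundary {feet:2d} ft away: reflection arrives {delay:4.1f} ms late, "
              f"first null near {first_null_hz(delay):5.1f} Hz")
```

A boundary 10 feet away gives the 20 millisecond delay mentioned in the text, while a wall only 2 feet away puts the reflection well inside the range where comb filtering is most damaging.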
The most important factors affecting the quality of sound captured by a microphone are its distance from the source and its directional pattern
and angle to the source. Directional microphones can be aimed toward one instrument and away from others playing at the same time. If you
mainly record yourself playing one instrument at a time, you don’t have to worry so much about angles and off-axis rejection. However, where
the mic is placed, and how close, is still important. All directional mics have a proximity effect that boosts bass frequencies when placed close to
the source, though this is easily countered later with EQ when mixing. Or you can use an omnidirectional microphone to avoid proximity effect.
Close-micing is common for most pop music, but even orchestra recordings will use a “spot mic” placed close to a soft instrument, such as a
Ideally, you’ll have an assistant who can move microphones around while you listen through speakers in another totally isolated room, though
few home recordists have that luxury. One solution is to put on earphones and listen to how the sound changes as you move the mics around
while the performer plays. However, it helps to play the earphones loudly to drown out any acoustic leakage into the phones that can influence
what you hear and in turn affect your placement choice. You could also make a quick trial recording to confirm everything sounds correct before
recording for real. All that said, I admit that I rarely do any of these things. If you have a decent-sounding room and decent microphones, just
place the mics a reasonable distance away from the instrument or singer and you’ll usually get great results. But what exactly is a “reasonable”
For most sources, the best microphone placement is fairly close to the source. Guitar and bass amps are often mic’d from a few inches away
and rarely more than a foot. You might place a second mic farther away in a good-sounding room, but the main mic is usually very close.
Tambourines and shakers and other hand percussion sound best mic’d closer than a foot away, or you’ll get too much room tone if the room is
small. I generally try to keep singers a foot away from the mic. Being closer can give a more intimate sound but risks popping Ps and Bs and
sibilance on S and T sounds. Many recording engineers use a pop filter with singers to block the air blasts that cause popping, though personally
I’ve never used one. I find it better, and easier, to either put the microphone off to the side or above the singer, pointing down. This way the mic
can still be close enough to get a clear, dry sound, but any air blasts from the singer’s mouth go forward, away from the mic or underneath it.
A wind screen is a common microphone accessory, typically made of open cell foam, which passes sound waves but blocks blasts of air. Wind
screens can serve effectively as a pop filter, but their main purpose is to minimize the rumbling sound of wind noise when recording outdoors.
Wind screens always affect the high-frequency response at least a little, so they shouldn’t be used unless absolutely necessary.
Microphone shock mounts are almost always useful and needed. Figure 5.3 shows one of my audio-technica 4033 mics with its included shock
mount. This is typically a metal ring that holds the microphone, which in turn is supported by rubber bands to decouple the microphone from
its stand. Unless your floor is made of solid cement, it most likely flexes and resonates at least a little as people walk or tap their feet. Without a
shock mount, low-frequency rumble can travel mechanically from the floor through the stand into the microphone. Worse, the frequencies are so
low that many small monitor speakers won’t reproduce them. This is a good reason to always check your mixes through earphones, even if
briefly just for a spot check.
Drums also benefit from close micing, though it depends on the musical style. A big band sounds most authentic when mic’d from farther
away, and even pop recordings sometimes have a stereo pair of mics over the set several feet away. In a large room with a high ceiling, more
distant placement can sound great. In a small room with a low ceiling, that’s not usually an option. One additional complication with drums is
there are so many different sound sources near to each other. So no matter what you do, every microphone picks up at least some of the sound
from another nearby drum or cymbal. Using cardioid and Figure 8 microphones and paying attention to their direction and pickup angles helps
to capture more of the desired source.
A common technique with snare drums is to place one microphone above the drum, pointing down at the drum head, and a second
microphone below the drum, facing up to capture more of the snare sound. When using two mics this way, you must reverse the polarity of one
of them to avoid canceling the low end when both mics are mixed together. Usually the polarity of the bottom microphone is reversed to avoid
cancellation between the top snare mic and other mics on the kit that face the same way.
It’s common to see cardioid microphones suggested for overhead drum mics in a room with a low ceiling to avoid picking up reflections from
a low ceiling, with their resultant comb filtering. I’ve seen this recommended for piano mics, too, when microphones are under the piano’s open
lid, which also gives strong reflections. But this doesn’t really work as intended, as proven by the “comb_filtering” video in Chapter 1. A
microphone picks up whatever sound exists at its location. When comb filtering is caused by a reflection, the peaks and nulls occur acoustically
in the air. So pointing a microphone one way or another is mostly irrelevant, because the sound in the air has already been influenced by the
reflections. For this reason, it’s common when recording a grand piano to remove the top lid completely, rather than prop it up on the stick.
Some large instruments such as the cello and double bass radiate different frequency ranges in different directions. Likewise, an acoustic guitar
sounds very different if a microphone is placed close to the bridge versus close to the sound hole. In both cases, there’s no one good place to put
a single mic close up. For best results, you need to be several feet away to capture all of the frequencies, which is not always possible. When you
must use close-micing on these sources, omni microphones often give the best results. Earthworks sells tiny omni microphones for close-micing
grand pianos, and DPA offers tiny supercardioid microphones that clip onto the bridge of a violin or string bass, or a trumpet or sax bell. These
microphones can sound surprisingly good given their very close proximity.
DI=Direct Injection
Acoustic sources obviously need to be recorded with microphones, but electronic instruments such as synthesizers, electric pianos, and electric
basses are often recorded directly from their output jack. The main advantage of recording instruments electrically is the sound is clearer because
there’s no contribution from the room. Electric basses benefit further because few bass amps are flat and clean down to the lowest frequencies.
Further, most small rooms degrade the sound much more at low frequencies compared to frequencies above a few hundred Hz. Of course,
sometimes you want the sound of a bass going through an amplifier. Even then, it’s common to record a bass using a microphone and also
record it direct at the same time. If nothing else, this gives more flexibility later when mixing. When an electric instrument is recorded
simultaneously through a microphone and directly, you may need to delay the direct sound when mixing to avoid phase cancellation. The
farther the microphone was from the bass or keyboard amplifier’s loudspeaker, the more you need to delay the direct sound.
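As a starting point for lining up the direct and mic'd tracks, the delay can be estimated from the microphone's distance to the loudspeaker. This is only a sketch: the speed of sound used here (about 1,130 feet per second at room temperature) and the function name are assumptions for illustration, and the final setting is best confirmed by ear or by comparing waveforms.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # approximate value at room temperature

def di_delay(mic_distance_ft, sample_rate_hz=44100):
    """Estimate the delay to apply to the direct track.

    Returns (milliseconds, whole samples) for a microphone placed
    mic_distance_ft from the amplifier's loudspeaker.
    """
    seconds = mic_distance_ft / SPEED_OF_SOUND_FT_PER_S
    return seconds * 1000.0, round(seconds * sample_rate_hz)

if __name__ == "__main__":
    for feet in (0.5, 1.0, 3.0):
        ms, samples = di_delay(feet)
        print(f"Mic {feet:3.1f} ft from speaker: delay DI by about {ms:.2f} ms ({samples} samples)")
```

At a 44.1 kHz sample rate, each foot of microphone distance works out to roughly 39 samples.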
Any device that has a line-level output can plug directly into a console or sound card input without needing a preamp. This works fine when
the instrument is close to whatever you plug it into. But many keyboards have unbalanced outputs, and some line inputs are unbalanced, so hum
pickup through the connecting cable is a concern. For an electric piano in the control room, using an unbalanced wire 10 to 20 feet long is
usually fine. But if the player is in a separate live room with the other performers or on stage 100 feet away from the mixing console, you’ll
need a DI box. This converts the electric piano or synthesizer output to a low-impedance balanced signal that’s sent to a mic input of a console
or outboard preamp.
DI boxes are available in two basic types: active and passive. The earliest DI boxes were passive, containing only a transformer. The
transformer’s high-Z input connects to a ¼-inch phone jack, and its output is typically a male XLR connected to a microphone input through a
standard mic cable. It’s not practical to build a transformer having a very high-input impedance, at least not as high as 1 MΩ, which is best for
an electric bass with a passive pickup. Active direct boxes with very high input impedances first appeared in the 1980s. Some models could
even be phantom powered, avoiding the need for batteries.
Every DI box includes a ground lift switch to avoid ground loops between a local bass amplifier, if connected, and the distant input it feeds.
Some models have a polarity reverse switch, and some include an input pad to accommodate higher-output line-level sources. For synthesizers
and electric pianos having a line-level output, a passive DI such as the model shown in Figure 6.9 is fine. But for a passive electric bass or an
old-style Fender Rhodes electric piano with passive magnetic pickups and no preamp, an active DI box having a very high-impedance input is a
better choice.
Figure 6.9:
These Switchcraft SC800 and SC900 passive DI boxes contain high-quality Jensen transformers that convert a high-impedance source to a low-impedance output suitable
for feeding a mic preamp. These models include a pass-through jack in case you want to send the instrument to a keyboard or bass amplifier at the same time.
Photos courtesy of Switchcraft, Inc.
Chapter 4 explained that low-impedance outputs can drive long wires without losing high frequencies due to capacitance. In truth, it’s a little
more complicated than that. Two factors determine what happens when an output drives a length of wire: A low output impedance minimizes
airborne signals such as hum getting through the shield to the signal wires inside because it acts like a short circuit to those signals. But an output
can have a low impedance, yet not be able to provide enough current to charge the capacitance of a long wire. In that case the high-frequency
response will suffer, and distortion can also increase. This is rarely a problem with power amplifiers, but it could be a concern when an
electronic keyboard has to drive a very long wire. So this is another reason a DI box is needed with long wires, even if the keyboard’s line output
has a very low impedance.
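The simple part of this relationship, output impedance working against cable capacitance, behaves like a one-pole low-pass filter with a corner frequency of 1/(2πRC). The sketch below covers only that part. It does not model the current limiting described above, and the figure of 30 picofarads per foot is an assumed typical value for illustration, not a specification for any particular cable.

```python
import math

def cable_cutoff_hz(output_impedance_ohms, cable_length_ft, pf_per_ft=30.0):
    """Corner (-3 dB) frequency of the low-pass formed by an output and its cable.

    pf_per_ft is the cable capacitance per foot in picofarads
    (30 pF/ft is an assumed ballpark value).
    """
    capacitance_farads = cable_length_ft * pf_per_ft * 1e-12
    return 1.0 / (2.0 * math.pi * output_impedance_ohms * capacitance_farads)

if __name__ == "__main__":
    for ohms in (150, 1000, 10000):
        hz = cable_cutoff_hz(ohms, 100)
        print(f"{ohms:6d} ohm output into 100 ft of cable: -3 dB at about {hz / 1000:.1f} kHz")
```

With these assumed values, a 10,000 ohm source loses high frequencies well inside the audible range over 100 feet, while a 150 ohm source does not.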
Additional Recording Considerations
In Chapter 5 I recommend recording without EQ or compression or other effects, unless the effect is integral to a musician’s sound and affects
how they play. Modern digital recording is extremely clean, so there’s no audio quality benefit from committing to an effect while recording.
Indeed, I’d rather defer all taste decisions until mix-down when I can hear everything in context. In particular, the reverb units built into many
guitar amps are generally poor quality. So unless a guitar player requires the particular reverb sound from his or her amplifier, I prefer they turn
off built-in reverb.
It’s common to use baffles—often called gobos, for go-between—to avoid picking up sounds from instruments other than the intended source a
microphone is aimed at. Gobos serve two related purposes: They prevent the sound of other instruments from getting into the microphone, and
they can reduce the strength of room reflections reaching the mic. Figure 6.10 shows a RealTraps MiniGobo in a typical application, placed
behind a microphone aimed at my Fender guitar amp.
Figure 6.10:
This small gobo helps prevent the sound of other instruments in the room from reaching the microphone at the guitar amp.
Gobos are never a perfect solution because sound can go over the top or around the sides. Low frequencies are especially difficult to contain
with baffles, though they can still help a lot. Many gobos are built with one side absorbing and the other side reflecting, so you can choose which
type of surface to put near the performer and microphone. My own preference is for both sides to absorb, because I rarely want a reflecting
surface near a microphone. But some players feel more comfortable when hearing some reflected sound, rather than playing into a dead wall.
Having a choice available allows for experimenting, which is always a good thing when placing microphones.
Most gobos are much larger than the small model shown around my guitar amp. Figure 6.11 shows a RealTraps GoboTrap that’s six feet high
by four feet wide. This gobo includes large wheels so it’s easy to move around, but the wheel base is lower in the center to prevent sound from
going under the gobo.
Figure 6.11:
This wheeled gobo is large enough to block much of the sound from an entire drum set.
Sound leakage from other instruments is not always a bad thing, and it can add a nice “live-sound” feel to a studio recording when the
instruments are close-mic’d. What matters most is the quality of the leakage—mainly its frequency balance—and how long it takes the delayed
sound to arrive at the unintended microphones. I remember learning a valuable lesson when I first started recording professionally years ago. I
set up a rock band in my studio’s large live room, putting the players far apart from one another, thinking that would reduce the amount of
leakage picked up by the mics. But putting people far apart from each other made the inevitable leakage arrive very late. So instead of less
added ambience, the sound was dominated by obvious echoes! Sound travels all over an enclosed space, even a large space. Lesson learned: Set
up the band with everyone close to one another, as they’d be when performing live. It also helps the band when they can see and hear one
Some directional microphones have a nonflat response off-axis, so sound arriving from the rear or sides can sound muddy with less highs.
Also, many cardioid microphones are directional at most frequencies but become less directional below a few hundred Hz. I suggest you verify
the off-axis response versus frequency for microphones you buy (or rent).
Vocals and spoken narration are almost always recorded with a totally dry sound. A wraparound baffle called a carrel is shown in Figure 6.12.
These are used for library study desks and telephone call centers, and they’re equally popular with voice-over artists who record at home. Its
three large panels wrap all the way around the microphone, creating a very dry environment that also blocks ambient noise from being
Figure 6.12:
The RealTraps Carrel lets voice-over talent capture a clear, dry recording in an overly ambient environment.
If an artist prefers not to use earphones while overdubbing vocals in the control room, you can play the speakers softly. It’s possible to reduce
leakage into a microphone further by reversing the polarity of one loudspeaker while recording, assuming your mixer hardware or software
offers a way to do that. Leakage will be minimum when the sound picked up from both speakers is the same, which occurs when the microphone
is exactly centered left and right. You can adjust the pan pot slightly while playing the track and watching the record level meter to find where
rejection is greatest. Rolling off the lowest and highest frequencies playing through the speakers helps further because most cardioid
microphones reject less at the frequency extremes. That said, I prefer to avoid all leakage when possible because that gives the most flexibility
when mixing. Even if the leakage is soft, it will become louder between words and phrases if the track is sent through a compressor.
My final mic-related tip is really an admission. I’m the laziest person in the world, and I can’t be bothered putting microphones away after
every session. But I want to protect them from dust settling on the capsules, which over time can dull high frequencies. So I leave the mics on
their stands and place plastic sandwich bags over them when they’re not in use.
Advanced Recording Techniques
Low-budget productions often rely on one musician to play an entire section one part at a time, and this is commonly done with string, brass,
and woodwind players. If the parts are all different for harmony, having one person play everything can sound convincing. But if your aim is to
create the sound of a large unison section with one person playing the same part several times, comb filtering often results, creating an unnatural
hollow sound. Fortunately, there are several ways to avoid this.
One approach has the performer play different instruments for each overdub. This is especially effective with stringed instruments because
violins and cellos all sound different from one another. Comb filtering relies on two or more sound sources having the same frequencies present
in equal levels. So the more each instrument’s frequency output varies for a given note, the weaker the comb filtering will be. Few musicians
own three or four different instruments, but even using two helps a lot.
Another approach puts the player in different parts of the room for each take, while leaving the microphones in the same place. The idea is to
simulate a real section, where players are typically a few feet apart from one another. If you combine this with the player using different
instruments for each performance, the result is identical to recording a real section comprised of separate players. A few years ago I recorded a
friend playing two unison cello parts for a pop tune. For the first take he played his cello, and for the second he moved his chair three feet to
one side and played my cello. The result sounded exactly like two cellists, rather than one part that was double-tracked.
If you’re using two or more players to create a larger section comprising multiple parts, you can use a variation on this method where the
players exchange parts while recording each pass. For the first recording, one musician plays the lowest part, while the other covers a higher
harmony. Then for the next recording pass, the performers rotate parts, so unisons are avoided by having different physical instruments, and
musicians with slightly different phrasing play the same part. This works equally well when doubling or tripling a small number of background
singers to get the sound of a larger group. Instead of singing the same part each time, each person sings a different part on subsequent overdubs.
Everyone knows the “chipmunk effect,” popularized by the 1958 recording The Chipmunk Song by David Seville (real name: Ross Bagdasarian).
In fact, this effect is much older than that, going back at least to 1939, when it was used on the voices of the Munchkins in The Wizard of Oz.
The basic premise is to record a voice or musical instrument at a given tape speed, then increase the speed when you play back the recording. In
the old days, most tape recorders had fixed speeds at 2-to-1 multiples. So you could, for example, record at 7.5 inches per second (IPS) and play
back at 15 IPS. If you’re overdubbing to an existing backing track, the backing plays an octave lower and at half speed while recording. Then
when the tape is played back at the original speed, the overdub sounds an octave higher.
Doubling the speed creates a pretty severe effect. For example, some words in the Chipmunks songs can be difficult to understand. Such a
severe pitch change also sounds unnatural, though obviously that’s the point with the Chipmunks. But much smaller amounts of pitch change can
work to great advantage. A friend of mine is an amateur singer, and he often shifts the pitch of his lead vocals up one semitone. That small
amount of change makes a huge improvement to the quality of his voice! This also works the other way, playing a backing track at a faster
speed while you sing or play. The result when played back is a lower pitch with a corresponding change in timbre. Think of Tony the Tiger, the
cartoon mascot for Kellogg’s Frosted Flakes; the Jolly Green Giant selling vegetables; or the castle guards, also from The Wizard of Oz.
Years ago in the 1970s, I recorded myself singing all four parts to Bach Chorale #8, one of the finest examples of four-part harmony writing
you’ll ever hear. I’m not much of a singer, and my range is very limited. So to sing the soprano and alto parts, I used Vari-Speed to raise the
pitch of my voice, and for the bass part, I lowered the pitch. In those days, the only way to change the tape speed by an amount less than 2-to-1
was to vary the frequency of the 60 Hz AC power that drives the tape recorder’s capstan motor. So I bought a 100-watt Bogen PA system power
amp and built a sine wave oscillator whose output frequency could be varied between 30 and 120 Hz.
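As a rough illustration of what such an oscillator has to deliver, the sketch below converts a pitch shift in semitones to a drive frequency. It assumes the capstan speed is directly proportional to the AC frequency, which is what the approach above relies on; the function name and the 60 Hz default are illustrative only.

```python
def capstan_frequency(semitones, mains_hz=60.0):
    """AC frequency that runs the tape faster or slower by the given number of semitones."""
    return mains_hz * 2 ** (semitones / 12)

print(capstan_frequency(12))             # 120.0, tape runs at double speed
print(capstan_frequency(-12))            # 30.0, tape runs at half speed
print(round(capstan_frequency(1), 2))    # 63.57, one semitone fast
```

Under that assumption, a 30 to 120 Hz range covers one octave in either direction, and a single semitone needs a change of only a few hertz.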
Chapter 4 explained the principle of 70-volt speaker systems, using power amplifiers that have a special output transformer with a 70-volt
tap. My Bogen tube amp’s transformer included a tap for 70-volt speakers, but it also had a second tap to output 115 volts. I connected my
home-made sine wave oscillator to the amplifier’s input and wired up a connector to mate with the capstan motor of my Ampex AG-440 4-track
tape recorder. I also added a second connector at the power amp’s output for a voltmeter so I could set the output to exactly 115 volts to avoid
damaging the capstan motor.
Since small pitch changes shift the timbre of a voice or instrument, Vari-Speed can also be used to avoid comb filtering when overdubbing unison parts.
Voices and string instruments both contain multiple strong resonances that increase the level of select frequencies. So shifting the pitch up or
down one or two musical half-steps is similar to playing a different violin or using a different singer. Be aware that Vari-Speed also changes the
rate of your vibrato. The more you intend to raise the pitch of a performance, the slower your vibrato must be while singing or playing to avoid
sounding unnatural. Also be aware that reducing the speed while recording to ultimately raise the pitch requires sustaining notes longer. When I
recorded the four-part Bach Chorale mentioned above, I had a heck of a time with the high soprano part. The slower I played the backing track,
to better hit the highest notes, the harder it was to hold the notes because they had to sustain longer. By the time I reduced the speed enough to
hit the highest notes, I’d run out of breath trying to hold them for as long as needed!
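The trade-off described above can be estimated before recording. This is a small sketch with hypothetical names; it assumes the performance is recorded against a slowed backing track and later sped up by the same factor.

```python
def slowed_recording_plan(semitones_up, note_seconds, vibrato_hz):
    """What to perform while recording so the sped-up result lands on the intended values."""
    factor = 2 ** (semitones_up / 12)             # speed-up applied on final playback
    return {
        "backing_speed": 1 / factor,               # fraction of normal speed while recording
        "hold_seconds": note_seconds * factor,     # how long each note must be held
        "vibrato_hz": vibrato_hz / factor,         # vibrato rate to sing or play
    }

# Raising a part a full octave: a 4-second note with 6 Hz vibrato in the final mix
print(slowed_recording_plan(12, 4.0, 6.0))
# {'backing_speed': 0.5, 'hold_seconds': 8.0, 'vibrato_hz': 3.0}

# A more modest two-semitone shift
print(slowed_recording_plan(2, 4.0, 6.0))
```

The octave case shows why the soprano part was so hard to sustain: every note has to be held twice as long while recording.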
Fortunately, having to hack your own Vari-Speed electronically is ancient history, and today’s audio software greatly simplifies the process.
The computer audio equivalent of Vari-Speed is sample-rate conversion. Most audio editing software does this, letting you choose the amount of
pitch shift in musical half-steps and cents, rather than have to calculate Hz and IPS tape speeds. When I recorded the audio for my Tele-Vision
music video in 2007, I recorded myself singing several background vocal tracks using Vari-Speed. My aim was to change the timbre slightly for
each performance to sound like several people. I first exported a mix of the backing track and loaded it into Sound Forge. From that I created
four more versions, saving each separately: one up a half-step, another up two half-steps, one down a half-step, and another down two
half-steps. Then I loaded all five backing tracks into a new SONAR project and solo’d them one by one as I sang along. Next, I loaded each vocal
track into Sound Forge and shifted them up or down as needed to get back to the correct key and tempo. After saving all the vocal tracks as new
Wave files, I loaded them into the original SONAR project. Whew! Fortunately, some DAW programs can vary playback speed on the fly, in
which case all those extra steps aren’t needed.
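To show why sample-rate conversion behaves like Vari-Speed, here is a minimal Python sketch that resamples a list of samples so pitch and duration change together. It uses plain linear interpolation as a stand-in; real audio editors use higher-quality resampling filters, and the function name is hypothetical.

```python
import math

def varispeed(samples, semitones):
    """Resample by linear interpolation so pitch and duration change together."""
    ratio = 2 ** (semitones / 12)          # playback speed multiplier
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio                    # fractional read position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out

# One second of a 440 Hz tone at a 44.1 kHz sample rate
rate = 44100
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
up = varispeed(tone, 12)
print(len(tone), len(up))   # 44100 22050
```

Shifting up an octave halves the length, just as doubling the tape speed halves the running time. Shifting a backing track one way and the recorded vocal the other way by the same amount returns the vocal to the original key and tempo.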
This chapter covers a lot of territory, including comparing common recording devices such as analog tape recorders and computer DAW software
for both fidelity and convenience, as well as cost. Analog tape recorders are complex and require clever engineering to obtain acceptably high
fidelity. The techniques involved include tape bias, pre- and de-emphasis, several popular noise reduction schemes, pre-distortion to counter analog tape’s high
inherent distortion, and Sel-Sync to allow overdubbing.
The basics of digital recording were presented, including sample rate and bit depth. I also busted a few digital audio myths, such as the belief
that recording at low levels reduces distortion and showing why headroom doesn’t really apply because digital systems are perfectly clean right
up to the onset of gross distortion. I also compared mixing ITB versus OTB, explained how to use a click track, and mentioned the SMPTE
system that synchronizes two or more audio or video hardware recorders.
I explained a bit about folder organization with DAW projects, along with the value of setting up a recording template. We also considered
why recording overdubs to separate tracks or lanes in a DAW program is superior to the old-school method of punching in, which risks
overwriting a previous acceptable recording.
You learned that the quality of the musical source and the acoustics of the room matter more than anything else, and when recording
amateurs, it helps to put them at ease. Giving performers a good cue mix that lets them hear everything clearly, hopefully with a touch of reverb,
also improves confidence. I couldn’t resist injecting my personal feelings about copy protection, though I agree that software piracy is a real
Microphone types and patterns were described in detail, along with typical mic placement and other recording techniques. Several popular
methods for stereo micing were shown, including Mid/Side, which lets you adjust the stereo width after the fact when mixing. You also learned
that a large room offers more flexibility for mic and instrument placement, versus small rooms where the best choice is to close-mic everything
with the performers and microphones far away from reflecting boundaries. In a small room, absorption and diffusion can reduce the strength of
reflections, and gobos can minimize leakage in large and small rooms alike. However, a DI box avoids mic placement and room acoustic issues
when applicable and also avoids leakage and other problems common to electric bass and keyboard amplifiers.
Finally, you learned some clever tricks to make a single performer sound like a section, using different instruments, different room placement,
and even different recording speeds. This led to a bit of history about Vari-Speed and an explanation of how to achieve the same effect using
modern audio software.
Chapter 7
Mixing Devices and Methods
In the earliest days of recording, “mixing” was performed by placing a group of performers around a single microphone, with some standing
closer or farther than others to balance their volumes. If the sax player had a solo, he’d move closer to the microphone, then move back again
afterward. All of this was recorded live in mono through one microphone to either a tape recorder or a record-cutting lathe. If anyone flubbed
his or her part, the whole group had to start again from the beginning.
As analog recording progressed to multiple microphones, then to multitrack recorders in the 1950s, mixing evolved to the point where a
complex mix often required two or more people to control all the volume changes needed over the course of a song. The mixing engineer
would set up the basic mix and maybe ride the vocal level, while some of the musicians pitched in to vary other track faders at the direction of
the engineer or producer. If only one person was available to do a complex multitrack mix, another option was to mix the song in sections to a
stereo recorder, then join the mixed tape portions together with splicing tape to create the final version.
It’s worth mentioning that despite his enormous contribution to recording technology, Les Paul didn’t actually invent multitrack recording as is
widely believed. Movie studios were placing multiple audio tracks onto optical film as early as the 1930s. When The Wizard of Oz was released
in 1939, they used separate sound track elements for the orchestra music, spoken dialog, and sound effects. However, Les Paul certainly
popularized multitracking, and he had a huge hand in refining the technology. His inventiveness brought the process of multitrack recording and
overdubbing as we know it today to the forefront of music recording.
This chapter presents an overview of mixing devices and methods, along with relevant tips and concepts. Subsequent chapters explain audio
processing in detail, with examples that apply equally to both hardware devices and plug-ins.
Volume Automation
In the 1970s, clever console manufacturers came up with various automation systems to record volume fader moves and replay them later
automatically. You enable automation recording for a channel, then raise or lower the level of a vocal or guitar solo while the tune plays. Then
on subsequent playbacks, the automation reproduces those level changes, and the mix engineer can perform other volume changes in real time
that are captured by the fader automation. Eventually, all of the needed changes are captured and stored, letting the engineer sit back and watch
the faders move automatically while the song plays and the mix is recorded to a stereo recorder.
Some automation systems use motors installed into physical faders to move them up and down, though others employ a voltage controlled
amplifier (VCA) in each channel. When a VCA is used, audio doesn’t pass through the fader. Rather, the fader sends a changing DC voltage to the
VCA, which in turn varies the volume of the audio passing through it. Further, when a channel’s volume is controlled by a voltage, it’s then
possible for one fader to control the level of several tracks at once. Multiple channels can be raised and lowered in groups, either with the same
volume or with some amount of dB offset. So if you have eight drum tracks, each at a different volume, they can be raised and lowered together
while maintaining the same relative balance between them.
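Because levels in dB add, a group fader only needs to apply the same offset to every channel it controls. The sketch below illustrates that idea with made-up channel names and levels; it is not modeled on any particular console.

```python
def apply_group(channel_levels_db, group_offset_db):
    """Shift every channel in a group by the same number of dB."""
    return {name: level + group_offset_db for name, level in channel_levels_db.items()}

drums = {"kick": -6.0, "snare": -8.0, "overheads": -12.0}
louder = apply_group(drums, 3.0)
print(louder)   # {'kick': -3.0, 'snare': -5.0, 'overheads': -9.0}

# The relative balance between tracks is unchanged
print(louder["kick"] - louder["snare"] == drums["kick"] - drums["snare"])   # True
```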
There are several advantages of mechanical “flying faders,” as they are often called. One benefit is the audio signal goes through a traditional
passive volume control, which doesn’t add noise or distortion. When you adjust the volume up or down manually, sensors in the fader send
“position data” that are recorded for later playback. Then when the automation is replayed, a motor drives the fader up and down, replicating
the engineer’s moves. Another advantage of moving faders is you can see the current level for that channel just by looking at the fader’s position.
Moving faders are still used on large-format studio consoles, but the moving parts make them expensive to build, and they require maintenance
to continue working smoothly.
VCA systems are mechanically simpler and therefore less expensive to manufacture, though early versions added a small amount of noise and
distortion. With a VCA system, there’s no need for the faders to move; each fader merely adjusts the level of a DC control voltage sent to that
channel’s VCA. The downside is you can’t easily determine the current volume for a channel by looking at the fader. If you decide to raise the
rhythm guitar by 2 dB, its current playback volume may not match the fader’s current physical position. If the track is playing at a level of
−15 dB, but the fader happens to be at −20 dB because it was moved when the automation was not in Record mode, “raising” it by 2 dB
actually lowers the level by 3 dB to −18.
To solve this problem, a pair of LED lights is placed next to each slider. If the physical slider is currently set louder than the VCA’s actual
playback volume, the upper LED lights up. If the slider is lower, the lower LED is lit. This way you can adjust the fader position until neither
light is on, at which point the fader position and actual playback volume are in sync. Any changes you then record by moving the fader will be
as intended.
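The LED logic and the level mismatch it guards against can be sketched as follows. The function name and the 0.25 dB tolerance are assumptions made for illustration; the dB figures are the ones used in the example above.

```python
def null_indicator(fader_db, playback_db, tolerance_db=0.25):
    """Return which LED would light: 'upper', 'lower', or None when in sync."""
    if fader_db - playback_db > tolerance_db:
        return "upper"      # physical fader is set louder than the VCA level
    if playback_db - fader_db > tolerance_db:
        return "lower"      # physical fader is set lower than the VCA level
    return None

# Track playing at -15 dB while the fader sits at -20 dB
print(null_indicator(-20.0, -15.0))      # lower
new_level = -20.0 + 2.0                  # "raising" the unsynced fader by 2 dB
print(new_level - (-15.0))               # -3.0, a net 3 dB drop
print(null_indicator(-15.0, -15.0))      # None
```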
Early mono and stereo editing was done either by cutting the tape with a demagnetized razor blade to discard the unwanted portions or by
playing a tape while copying only the desired parts to another recorder. When I owned a large professional studio in the early 1980s, a big part
of our business was recording educational and training tapes for corporate customers, as well as narration and voice-over work for radio and TV
ads. Corporate sessions typically ran for three or four hours, employing one or more voice-over actors. The sessions were eventually edited down
to a length of 30 to 60 minutes and duplicated onto cassettes for distribution.
I remember well the tedious burden of editing ¼-inch tape by hand with a razor blade and splicing tape to remove all the gaffs and coughs or
to replace a decent section with one deemed even better by the producer/customer. While recording I’d make detailed notes on my copy of the
script, marking how many times a passage was repeated in subsequent takes. The talent would read until they goofed, then back up a bit and
resume from the start of the sentence or paragraph as needed. By making careful notes while recording, I’d know when editing later how many
times a particular section had been read. If my notes said a sentence was read four times, I could quickly skip the first three takes when editing
later and remove those from the tape.
You have to be very careful when cutting tape because there’s no Undo. Recording narration for big-name clients surely paid well, but it was a
huge pain and often boring. A typical session for an hour-long program requires hundreds of razor blade edits! And every single cut must be
made at the precise point on the tape. The standard method is to grab both tape reels with your hands and rock the tape slowly back and forth,
listening for a space between words where the cut would be made. When editing tape this way, you don’t hear the words as they normally
sound but as a low-pitched roar that’s barely intelligible. Unlike modern digital editing, you couldn’t see the waveforms to guide you either. And
you certainly couldn’t edit precisely enough to cut on waveform zero crossings!
Splicing blocks have a long slot that holds the tape steady while you cut and another slot that’s perpendicular that guides the razor blade.
Figure 7.1 shows a professional quality splicing block made from machined aluminum. To minimize clicks and pops, most splicing blocks
include a second slot that’s angled to spread the transition over time so one section fades out while the other fades in. This is not unlike
cross-fades in a DAW program, where both sections play briefly during the transition. The block in Figure 7.1 has only one angled blade slot, but
some models have a second angle that’s less severe. Which angle is best depends on the program material and the tape speed. If the gap between
words where you’ll cut is very short, you can’t use a long cross-fade. Stereo material is usually cut perpendicular to the tape so both channels
will switch to the new piece of tape together.
Figure 7.1:
This splicing block has two razor blade slots to switch from one tape piece to the other either suddenly or gradually over a few milliseconds to avoid clicks and pops.
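The DAW equivalent of an angled splice can be sketched in a few lines. This is a minimal illustration using linear fade ramps, which is an assumption; DAW programs typically offer several fade curve shapes, and the function name is hypothetical.

```python
def crossfade(outgoing, incoming, fade_len):
    """Join two sample lists, overlapping fade_len samples with linear ramps.

    fade_len must not exceed the length of either list.
    """
    if fade_len <= 0:
        return outgoing + incoming          # a butt splice with no overlap
    head = outgoing[:-fade_len]
    tail = incoming[fade_len:]
    start = len(outgoing) - fade_len
    overlap = []
    for i in range(fade_len):
        gain_in = (i + 1) / (fade_len + 1)  # rises toward 1
        gain_out = 1 - gain_in              # falls toward 0
        overlap.append(outgoing[start + i] * gain_out + incoming[i] * gain_in)
    return head + overlap + tail

print(crossfade([1.0] * 6, [0.0] * 6, 3))
# [1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0, 0.0]
```

A longer fade_len corresponds to a shallower cutting angle: the transition is smoother, but it needs a wider gap between words.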
Today’s digital software is vastly easier to use than a splicing block. Editing goes much faster, and there are as many levels of Undo as you’ll
ever need. Life is good. I’ll return to audio editing, including specific techniques with video demonstrations, later in this chapter.
Basic Music Mixing Strategies
The basic concepts of mixing music are the same whether you use a separate recorder and mixing console or work entirely within a computer.
The general goals are clarity and separation so all the parts sound distinct, and being able to play the mix at realistic volume levels without
sounding shrill or tubby. Most mixing engineers also aim to make the music sound better than real—and bigger than life—using various audio
processors. These tools include compression, equalization, small- and large-room reverbs, and special effects such as stereo phasers,
auto-panners, or echo panned opposite the main sound. Note that many of the same tools are used both for audio correction and for sound design to
make things sound more pleasing.
In my experience, anyone can recognize an excellent mix. What separates the enthusiastic listener from a skilled professional mixing engineer
is the engineer has the talent and expertise to know what to change to make a mix sound great. A beginner can easily hear that his mix is
lacking but may not know the specific steps needed to improve it. By the way, this also applies to video. I spent many hours fiddling with the
brightness, contrast, color, tint, and other settings when I first bought a projector for my home theater. It was obvious that the image didn’t look
great, but I couldn’t tell what to change. Eventually I asked a professional video engineer friend to help me. In less than ten minutes, he had the
picture looking bright and clear, with perfect color and nothing washed out or lost in a black background.
Most pop music is mixed by adjusting volume levels and also placing each track somewhere left-to-right in the stereo field with a pan pot. If
an instrument or section was recorded in stereo with two microphones onto separate tracks, you’ll adjust the panning for both tracks. I usually
build a pop mix by starting with the rhythm section: balance and pan the drums, then add the bass, then the rhythm instruments, and finally
bring in the lead vocal or main melody if it’s an instrumental. Then I’ll go back and tweak as needed when I can hear everything in context. I
suspect most mix engineers work the same way, though I know that some prefer to start with the vocal and backing chords. Either method
works, and you don’t have to begin each mix session the same way every time.
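For readers curious how a pan pot divides a signal between two speakers, here is a sketch of one common approach, a constant-power sine/cosine pan law. The text above does not say which pan law any given console or DAW uses, so treat both the law and the function name as assumptions.

```python
import math

def pan_gains(pan):
    """pan runs from -1.0 (full left) to +1.0 (full right); returns (left, right) gains."""
    angle = (pan + 1) * math.pi / 4        # 0 at full left, pi/2 at full right
    return math.cos(angle), math.sin(angle)

print(pan_gains(-1.0))                               # (1.0, 0.0)
print([round(g, 3) for g in pan_gains(0.0)])         # [0.707, 0.707]
```

With this law a centered track is sent to both channels about 3 dB down, so the total power stays roughly constant as the track moves across the stereo field.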
Be Organized
Whether you use an analog console or a computer digital audio workstation (DAW), one of the most important skills a recording engineer must
develop is being organized. Multiple buses and inserts can quickly get out of hand, and it’s easy to end up with tracks in a DAW going through
more equalizers or compressors than intended. Perhaps you forgot that an EQ is patched into a Send bus, so you add another EQ to the bus’s
Return rather than tweak the EQ that’s already there. Having fewer devices in the signal path generally yields a cleaner sound.
I also find it useful to keep related tracks next to each other, such as the MIDI track that drives a software synthesizer on an audio track. By
keeping them adjacent, you’ll never have to hunt through dozens of tracks later to make a change. Likewise, if you record a DAW project that
will be mixed by someone else, your organization will be appreciated. For example, if there are two tracks for the bass, one recorded direct and
the other through a microphone, it’s easy for someone else (or even you) to miss that later when mixing. So you’ll wonder why you can still hear
the bass even when it’s muted. Many DAWs let you link Solo and Mute buttons, so clicking one activates the other, too, which helps when one
instrument occupies two or more tracks.
Other aspects of mix organization include giving DAW tracks sensible names, storing session files logically to simplify backing up, and
carefully noting everything on each track of a master analog tape reel. Even something as simple as noting if a reel of tape is stored tails out or
in will save time and avoid exasperation when you or someone else has to load the tape to make another mix later.
Monitor Volume
When mixing, playback volume greatly affects what you hear and how you apply EQ. Volume level especially affects the balance of bass
instruments due to the Fletcher-Munson loudness curves described in Chapter 3. At low levels, the bass may sound weak, but the same mix
played loudly might seem bass-heavy. A playback volume of 85 dB SPL is standard for the film industry; it’s loud enough to hear everything
clearly, but not so loud you risk damaging your ears when working for a long period of time. It helps to play your mix at a consistent volume
level throughout a session, but it’s also useful to listen louder and softer once in a while to avoid some elements being lost at low volume, or
overbearing when played loud. If you don’t already own an SPL meter, now is the time to buy one. It doesn’t have to be a fancy model meant
for use by professional acousticians either.
Reference Mixes
Many mix engineers have a few favorite commercial tracks they play as a reference when mixing. While this can help to keep your own mix in
perspective, it may not be as useful as you think. The main problem with reference mixes is they work best when their music is in the same key
as the tune you’re working on. Low frequencies are the most difficult to get right in a mix, even when working in an accurate room. In a typical
home studio, peaks and nulls and room resonances dominate what you hear at bass frequencies. But the room frequencies are fixed, and they
may or may not align with the key of the music. So if your tune is in the key of A, but the reference mix is in a different key, the resonances in
your room will interact differently with each tune. Further, a kick drum’s tuning also varies from one song or one CD to another. Unless the
fundamental pitch of the reference tune’s kick drum is similar to the kick drum on the song you’re mixing, it will be difficult or impossible to
dial in the same tone using EQ.
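To see how fixed room resonances can line up with one key and not another, the sketch below compares the first few axial modes of a single room dimension with the bass notes of two keys. It uses the standard axial-mode formula f = n × c / (2 × L) and equal-tempered tuning from A = 55 Hz. The 16-foot room length and the function names are hypothetical, and a real room has three dimensions and many more modes than this.

```python
SPEED_OF_SOUND = 1130.0   # feet per second, approximate at room temperature

def axial_modes(length_ft, count=3):
    """Resonant frequencies in Hz between two parallel walls length_ft apart."""
    return [n * SPEED_OF_SOUND / (2 * length_ft) for n in range(1, count + 1)]

def note_hz(semitones_from_a, a_hz=55.0):
    """Equal-tempered frequency a given number of semitones above low A."""
    return a_hz * 2 ** (semitones_from_a / 12)

print(axial_modes(16.0))          # [35.3125, 70.625, 105.9375]
print(note_hz(0))                 # 55.0, the A root
print(round(note_hz(4), 1))       # 69.3, the C-sharp root
```

In this made-up room the 70.6 Hz mode sits almost on top of the C-sharp root but well away from A, so a tune in C-sharp would excite it far more than a tune in A.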
“If there’s any rule in mixing, it’s that there are no rules.”
Well okay, that’s not really true. Most of the decisions a mix engineer makes are artistic, and only a few rules apply in art. But there are also
technical reasons for doing things one way or another. For example, bass instruments are almost always panned to the center. Bass frequencies
contain most of the energy in pop music, so panning the bass and kick drum to the center shares the load equally through both the left and right
speakers. This minimizes distortion and lets listeners play your mix at louder volumes. A mono low end is also needed for vinyl records to
prevent the needle from jumping out of the groove. However, vinyl mastering engineers sum bass frequencies to mono as part of their
Panning the bass and kick to the center also makes the song’s foundation sound more solid. It’s the same for the lead vocal, which should be
the center of attention. But for other instruments and voices, skillful panning can improve clarity and separation by routing tracks having
competing frequencies to different loudspeakers. As explained in earlier chapters, the masking effect makes it difficult to hear one instrument in
the presence of another that contains similar frequencies. If a song has two instruments with a similar tone quality, panning one full left and the
other full right lets listeners hear both more clearly.
Surround sound offers the mix engineer even more choices for placing competing sources. One of my pet peeves is mix engineers who don’t
understand the purpose of the center channel. As explained in Chapter 5, when two or more people listen to a surround music mix—or watch a
late-night TV talk show broadcast in surround sound—only one person can sit in the middle of the couch. So unless you’re the lucky person,
anything panned equally to both the left and right speakers will seem to come from whichever speaker is closer. This can be distracting, and it
sounds unnatural. In my opinion, surround mixes that pan a lead singer or TV announcer to the left and right channels only, or to all three front
speakers equally, are missing the intent and value of a surround system. The center channel is meant to anchor sounds to come from that
location. So if a talk show host is in the middle of the TV screen delivering his monologue, his voice should emit mainly from the center
speaker. The same goes for the lead singer in a music video.
Panning can also be used to create a nice sense of width, making a mix sound larger than life. For example, it’s common to pan doubled
rhythm guitars fully left and fully right. With heavy-metal music, the rhythm guitarist often plays the same part twice. This is superior to copying
or delaying a single performance, which can sound artificial. Two real performances are never identical, and the slight differences in timing and
intonation make the sound wider in a more natural way. For country music it’s common to use two hard-panned rhythm acoustic guitars, with
one played on a regular guitar and the other on a guitar using Nashville Tuning. This tuning replaces the lower four strings of a guitar with
thinner strings tuned an octave higher than normal. Or you can use a capo for the second part, playing different inversions of the same chords
higher up on the neck. If your song doesn’t have two guitars, the same technique works well with a guitar and electric piano, or whatever two
“chords” instruments are used in the arrangement.
Getting the Bass Right
One of the most common problems I hear is making the bass instrument too loud. You should be able to play a mix and crank the playback
volume up very high, without the speakers breaking up and distorting or having the music turn to mush. One good tip for checking the bass
instrument’s volume level is to simply mute the track while listening to the mix. The difference between bass and no bass should not be huge
but enough to hear that the low end is no longer present or as solid sounding.
Avoid Too Much Reverb
Another common mistake is adding too much reverb. Unless you’re aiming for a special effect, reverb should usually be subtle to slight. Again,
there’s no rule, and convention and tastes change over time. Years ago, in the 1950s, lead vocals were often drenched with reverb. But regardless
of the genre, too much reverb can make a mix sound muddy, especially when applied to most of the tracks. You can check for too much reverb
the same way you check for too much bass: mute the reverb and confirm that the mix doesn’t change drastically. By the way, adding too much
reverb is common when mixes are made in a room without acoustic treatment. Untamed early reflections can cloud the sound you hear, making
it difficult to tell that excess reverb in the mix is making the sound even cloudier.
There are two basic types of artificial reverb, and I generally use both in my own pop music productions. The first is what usually comes to
mind when we think of reverb: a Hall or Plate type preset having a long decay time, reminiscent of a real concert hall, or an old-school
hardware plate reverb unit. The other type does not give a huge sound, but it adds a small-room ambience that decays fairly quickly. When
added to a track, the player seems to be right there in the room with you. Typical preset names for this type of ambience reverb are Stage or
Besides the usual reason to add reverb—to make an instrument or voice sound larger, or make a choir sound more ethereal or distant—reverb
can also be used to hide certain recording gaffs. Years ago, when recording my own cello concerto, I did many separate sessions in my home
studio, recording all the various orchestra sections one by one. At each session, everyone wore earphones as they played along with my MIDI
backing track. To be sure the players didn’t rush or slow down, I played a fairly loud click track through their earphones. Then after each session
I’d replace the sampled MIDI parts for that section with the real players I just recorded.
At one of the violin section sessions, I left my own earphones on a chair, not far from one of the microphones. Most of the time the click
sound picked up by the microphone was not audible. But one passage had a series of short pizzicato notes with pauses between, and the click
was clearly audible in the spaces between notes. It required hundreds of tiny edits to duck the volume every time a click sounded with no notes
to hide it. And you could hear the sudden drop to total silence as the plucked notes were cut off a little early to hide the click. Applying an
ambience type reverb to the track hid those dropouts completely by extending the notes slightly. Had the reverb been applied before editing, the
holes would be audible because the reverb would have been silenced, too.
Verify Your Mixes
Besides listening at soft and loud volumes, mixes should also be checked in mono to be sure nothing disappears or takes on a hollow comb
filtered sound. Surround mixes should be verified in plain stereo for the same reason. It’s also useful to play your mixes through small home or
car type loudspeakers. When playing through your main speakers, listen not only for too much bass, or bass distortion, but also for shrillness that
hurts your ears at high volumes. Again, as a mix is played louder and louder, it should get clearer and fuller but never muddy or painful. The
bass content in a mix should depend mostly on the volume of the main bass instrument and the kick drum. Often one or the other, but not both,
carries most of the low-end weight in a mix. If both the bass and kick drum are full sounding, it can be difficult to hear either clearly.
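The mono check mentioned above can be illustrated numerically. In this sketch a test tone appears identically in both channels, and then with the polarity flipped in one channel, which is the extreme case of something that “disappears” in mono. The helper names are made up for the example.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mono_sum(left, right):
    """Fold a stereo pair down to mono by averaging the channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]

rate = 44100
left = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
in_phase = left[:]                   # identical in both channels
inverted = [-s for s in left]        # polarity flipped in one channel

print(round(rms(mono_sum(left, in_phase)), 3))   # 0.707
print(rms(mono_sum(left, inverted)))             # 0.0
```

Real mixes rarely cancel completely, but partial cancellation from small timing differences between channels is what produces the hollow sound in mono.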
Thin Your Tracks
Many mix engineers roll off low frequencies on all tracks other than the bass and kick. Many instruments have more low-frequency content than
you want in a mix, especially if they were close-mic’d using a directional microphone that boosts bass due to the proximity effect. Using a
low-cut filter to thin out tracks that shouldn’t contribute bass energy goes a long way toward achieving clarity in a mix. Instruments might seem
painfully thin when soloed, yet still sound perfectly fine in the context of a complete mix.
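As an illustration of what a low-cut filter does, here is a minimal first-order high-pass filter in Python. It is a sketch under simple assumptions: a gentle 6 dB per octave slope and a 100 Hz cutoff chosen only for the example. The low-cut filters in consoles and plug-ins are often steeper.

```python
import math

def low_cut(samples, cutoff_hz, sample_rate):
    """First-order (6 dB per octave) high-pass filter."""
    rc = 1 / (2 * math.pi * cutoff_hz)
    dt = 1 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]] if samples else []
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

rate = 44100
for freq in (40, 1000):
    tone = [math.sin(2 * math.pi * freq * n / rate) for n in range(rate)]
    ratio = rms(low_cut(tone, 100, rate)) / rms(tone)
    print(freq, round(ratio, 2))    # output level relative to input level
```

The 40 Hz tone comes out at well under half its original level, while the 1 kHz tone passes almost unchanged.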
Distance and Depth
The two factors that determine front-to-back depth in a mix are high-frequency content and reverb. If you want to bring a track forward to
feature it, boost the high end slightly with an equalizer and minimize the amount of reverb. Conversely, to push something farther back in a mix,
reduce the highs and add reverb. Adding small room ambience–type reverb also helps make a track more present sounding, though don’t overdo
it. As with all effects, it’s easy to add too much because you get used to the sound as a mix progresses. Again, mute the reverb once in a while to
confirm you haven’t added so much that it dominates the mix.
Bus versus Insert
All hardware consoles and DAW programs let you apply plug-in effects either individually to one track or on an Aux bus that can be shared by
all tracks. As mentioned in Chapter 5, effects that do something to the audio are usually inserted onto the track, while effects that add new
content to the audio typically go on a bus. For example, EQ, wah-wah, chorus, and compression effects all modify the audio passing through
them and should therefore be inserted onto a track. It rarely makes sense to mix the dry and processed portions together, and many such effects
include a Wet/Dry mix adjustment anyway to vary their strength. On the other hand, reverb and echo add new content—the echoes—and so are
better placed on an Aux bus.
Further, reverb plug-ins require a lot of computer calculation to generate the large number of echoes needed to create a realistic reverb effect.
So adding separate reverb plug-ins to many tracks is wasteful, and it limits the total number of tracks you can play all at once before the
computer bogs down. Moreover, if you use only one or two reverbs for all the tracks, the result will be more coherent and sound like it was
recorded in a real room rather than in many different acoustic spaces. Not that there’s anything wrong with a combination of acoustic spaces if
that’s the sound you’re after.
Pre and Post, Mute and Solo
As mentioned in Chapter 5, when an Aux Bus Effects Send is set to Post, the Send level follows the track’s main volume control. This is usually
what you want: As you raise or lower the track volume, the amount of reverb or echo remains the same proportionally. When an Aux Bus is set
to Pre, the Send level is independent of the main volume control for that track. So in most cases, when mixing, you’ll use Post where the Send
level follows the volume control, and also includes any inserted effects such as EQ. One situation where you might want to use Pre is to send a
track to an effects bus or subgroup before EQ or compression or other effects are added. Another use for Pre when mixing is to implement
parallel compression. This special technique mixes a dry signal with a compressed version, and is described more fully in Chapter 9.
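The difference between Pre and Post sends can be sketched with a little dB arithmetic. The Python functions below are a simplified model that considers only the track fader and the Send level; they ignore inserted effects, and the function names and example levels are assumptions made for illustration, not how any particular DAW is implemented.

```python
def send_level_db(fader_db, send_db, post_fader=True):
    """Level reaching the Aux bus, in dB relative to the unfaded track signal."""
    if post_fader:
        # Post: the Send is fed after the fader, so the fader setting is included.
        return send_db + fader_db
    # Pre: the Send is fed before the fader, so the fader has no influence.
    return send_db

def effect_vs_track_db(fader_db, send_db, post_fader=True):
    """How far the send signal sits above or below the track's own level."""
    return send_level_db(fader_db, send_db, post_fader) - fader_db

if __name__ == "__main__":
    for fader in (0.0, -10.0):
        print("fader", fader,
              "post:", effect_vs_track_db(fader, -12.0, True),
              "pre:", effect_vs_track_db(fader, -12.0, False))
```

In this model a Post send stays the same number of dB below the track no matter where the fader is set, while a Pre send becomes relatively louder as the fader comes down.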
Mute and especially Solo are very useful to hear details and find problems. I often solo tracks to listen for and clean up small noises that aren’t
part of the performance, such as a singer’s breath intakes. But don’t go overboard and kill every little breath sound, as that can sound unnatural.
The amount of trimming needed also depends on how much compression you use. Compressing a voice track always brings up the breath
sounds. I also use Solo to listen for thumps or other low-frequency sounds that detract from the overall clarity. If you do this carefully for every
track, you’ll often find the end result sounds cleaner.
Room Tone
Even when recording in a very quiet environment, there’s always some amount of room tone in the background. This is usually a rumble or vent
noise from the air conditioning system, but preamp hiss can also add to the background noise. When editing narration and other projects where
music will not mask the voice, it’s useful to record a bit of room tone you can insert later as needed to hide edits. For example, you may need to
insert silence here and there between words to improve the pace and timing of a voice-over narration. If you insert total silence, the sudden loss
of background ambience will be obvious. Using a section of room tone sounds more natural, and the edits won’t be noticed.
Before the voice-over talent begins reading, ask them to sit quietly for five to ten seconds while you record a section of background silence. If
the recorded room tone is too short, you may have to loop (repeat) it a few times in a row, and that could possibly be noticed. Recording at
least five seconds of silence is also useful if you need to apply software noise reduction to the track later. With this type of processing, the
software “learns” what the background silence sounds like, so it can remove only that sound from a music or voice track. Having a longer section
of the background room tone helps the software to do a better job. Software noise reduction is explained in Chapter 13.
I used a similar technique when editing a live concert video for a friend. One of his concerts drew a huge crowd, with hundreds of people
laughing at the jokes and clapping enthusiastically after every song. But a later concert drew fewer people, and the smaller audience was
obvious and less convincing. When editing the later concert video, I copied several different sections of laughing and applause from the earlier
video and added them to the audience tracks. Since both concerts were in the same venue with similar microphone placements, it was
impossible to tell that the weak applause had been supplemented.
Perception Is Fleeting
“We’ve all done stuff that sounds great at one moment, then we listen later and say, ‘What was that cheesy sound?’ Or vice versa. I’ll be doing something at the moment and I’ll
question whether it works, then I’ll listen to it later and note how the performance was really smoking and the sound really worked. Artistic judgment and opinion—it’s infinite,
and it could vary even in one individual from moment to moment.”
—Chick Corea, July 2011 Keyboard magazine interview
I couldn’t agree more with Chick. Understanding that our hearing and perception are fleeting has been a recurring theme in this book, and it
affects listeners and professional recording engineers alike. Everyone who mixes music, whether for fun or professionally, has made a mix that
sounds stellar at the time but less impressive the next day. Even over the course of one session, perception can change drastically. It’s common to
raise the playback volume louder and louder to keep the mix sounding clear and full, when what you’re really doing is trying to overcome a
lousy sound.
I wish I had a great tip that works every time. Sadly, I don’t. But try to avoid cranking the playback levels ever higher. Keep the volume loud
enough to hear everything clearly, about 85 dB SPL, and occasionally play the mix softer and louder before returning to the normal level. If your
mix sounds poor when listening softly, then that’s the best volume level to listen at when deciding what needs improving.
Be Creative!
A good mix engineer is part producer, part sound designer. If you can learn to think outside the box, you’re halfway there. When I was nearly
done mixing the music for my Cello Rondo video in 2006, I invited my friend Peter Moshay to listen and offer his advice. Peter is a professional
recording and mixing engineer with an amazing client list that includes Mariah Carey, Hall & Oates, Paula Abdul, and Barry Manilow, among
other famous names. At one point early in the tune, the lead cello plays a sweeping ascending line that culminates in a high note. One of Peter’s
suggestions was to increase the reverb during the ascending line, ending up drenched 100 percent in huge reverb by the high note. Change is
good, and it adds interest. The included “sonar_rondo” and “sonar_tele-vision” videos show some of the sound design and other editing I did on
the mixes for those two music videos.
It’s up to the composer and musicians to make musical changes, but a good mix engineer will make, or at least suggest, ideas to vary the audio,
for example, adding severe midrange EQ to create a “telephone” effect. But don’t overdo it either. An effect is interesting only if it happens
once or twice. Once in the 1970s, I sat in on a mixing session at a large professional New York City studio. The tune was typical of that era, with
a single lead singer and backing rock band. At one point you could almost see a lightbulb go on over the mix engineer’s head as he patched an
Eventide Harmonizer onto the lead vocal and dialed in a harmony a minor third higher. At a few key places—only—he switched in the
Harmonizer. Everyone present agreed that the result sounded very much like Grand Funk Railroad, a popular band at the time.
Many effects are most compelling when they’re extreme, but, again, avoid the temptation to overdo it. It might be extreme EQ, or extreme
echo, or anything else you can think up. Always be thinking of clever and interesting things to try. Another great example is the flanging effect
that happens briefly only a few times in Killer Queen by Queen on the line “Dynamite with a laser beam.” Or the chugging sounds in Funky
Town by Lipps, Inc., the cash register sounds in Money by Pink Floyd, and the car horns in Expressway to Your Heart by the Soul Survivors. Even
the bassoon and oboe parts on I Got You Babe by Sonny and Cher could be considered a “sound effect” because that instrumentation is not
expected in a pop tune. The same applies for wind chimes and synth arpeggios and other ear candy that comes and goes occasionally.
When I was nearly done mixing my pop tune Lullaby, I spent several days thinking up sound effects that would add interest to an otherwise
slow and placid tune. At one point you can hear the sound of a cat meowing, but I slowed it way down to sound ominous and added reverb to
push it far back into the distance. Another place I added a formant filter plug-in to give a human voice quality to a fuzz guitar line. Elsewhere, I
snapped the strings of my acoustic guitar against the fingerboard and added repeating echo with the last repeat building to an extreme runaway
echo effect before suddenly dying out. I added several other sound effects, only once each, and at an appropriate level to not draw attention
away from the music.
I also used a lot of sound effects in an instrumental tune called Men At Work, an original soundtrack I created to accompany a video tour of
my company’s factory. I created a number of original samples to give an “industrial” feel to the music and sprinkled them around the song at
appropriate places. I sampled a stapler, blasts of air from a pneumatic rivet gun at our factory, large and small saw blades being struck with a
screwdriver, and my favorite: the motor that zooms the lens of my digital camera. The motor sample was much smaller sounding than I wanted,
so I used Vari-speed pitch shifting to drop the sound nearly an octave. The result sounds much like robot motion effects you’d hear in a movie.
In all cases the microphone was very close to the sound source.
If a band is receptive to your input, you might also suggest musical arrangement ideas such as moving a rhythm guitar or piano part up an
octave to sound cleaner and clash less with other parts. Or for a full-on banging rock performance with no dynamics, try muting various
instruments now and again in the mix so they come and go rather than stay the same throughout the entire tune. Or suggest a key change at a
critical point in the song. Not the usual boring half-step up, but maybe shifting to a more distant key for the chorus, then back again for the
In the Box Versus Out of the Box—Yes, Again
It’s no secret that I’m a huge fan of mixing entirely in the box. I might be an old man, but I gladly embrace change when it’s for the better. To
my mind, modern software mixing is vastly superior to the old methods that were necessary during the early years of multitrack recording and
mixing. One of the most significant and useful features of DAW mixing is envelopes and nodes. The first time I used a DAW having a modern
implementation of track envelopes, I thought, “This is so much better than trying to ride volume levels with a fader.” When you’re riding levels
manually as a song plays, by the time you realize that something is too soft or too loud, the volume should have changed half a second earlier.
A control surface is a modern way to emulate the old methods. I understand that some people prefer not to “mix with a mouse,” though I
personally can’t imagine mixing any other way. A control surface interfaces with DAW software via MIDI and provides sliders and other physical
controls that adjust volume levels and other parameters in the DAW. All modern DAW software can record automation, so using a control surface
lets a $400 program work exactly the same as a $200,000 automated mixing console. But I don’t have space on my desk for yet more hardware,
and I don’t want more things in the way that require touching. I can add an envelope to a track with just a few mouse clicks, so I don’t see how
recording automation using a control surface, and having to first specify what parameters to automate, could be easier or more efficient. But
everyone has his or her own preference, as it should be.
I once watched a mix session where the engineer used a control surface to adjust the volume levels in ProTools. Due to a Wave file render
error, one of the tracks dropped suddenly in level by 20 dB. I watched with amusement as the mix engineer tried in vain repeatedly to counter
the instantaneous volume reduction in real time using a physical fader. I had to contain myself not to scream, “Just use your mouse to draw a
volume change!” But I understand and respect those who prefer physical controllers to software envelopes. That said, let’s move on to the basic
operation of DAW software, including some tips and related advice.
Using Digital Audio Workstation Software
The basic premise of DAW software is that each instrument or voice is recorded onto a separate track, which in turn is stored in a Wave file. This
paradigm mimics a tape recorder and analog mixing console, but with everything virtualized inside a computer. Modern software includes not
only a recorder and mixing console but even virtual outboard gear such as EQ and reverb and other effects. Once a song is complete and sounds
as you like, you render or export or bounce the final mix to a new Wave file. All three terms are commonly used, and they all mean the same
thing. Most DAW software can render a mix much faster than it takes to play the song in real time, and of course you never have to wait for a
tape to rewind.
SONAR’s Track View shown in Figure 7.2 is the main display—command central—where you add tracks and adjust their parameters, insert
buses, add plug-ins to individual clips or entire tracks, create and edit clip and track envelopes, and so forth. There are other views, including
the MIDI Piano Roll, Console View, Tempo Map, and Event List. For now we’ll consider only the main Track View.
Figure 7.2:
SONAR’s Track View is where most of the action takes place.
I have SONAR configured to rewind to where it started when I stop playback. The other option is to function like a tape recorder, where
playing resumes at the point you last stopped. This is definitely personal preference, but I find that rewinding automatically makes mixing
sessions go much quicker. I can play a section, tweak an EQ or whatever, then play the exact same section again immediately to assess the
All of the tracks in a project are numbered sequentially, though you can move tracks up and down to place related tracks adjacent, such as
multiple background vocal or drum tracks. When a track is moved up or down, its track number changes automatically to reflect its current
position. Some people use a DAW program as if it were a tape recorder. In that case each track contains one Wave file that extends from the
beginning to the end of the song. But tracks can also contain one or more clips: Wave files or portions of Wave files that are only as long as
needed. It makes no sense for a guitar solo that occurs only once in a song to have its Wave file extend for the entire length of the song. That just
wastes disk space and taxes the computer more, as silence in the unused portion is mixed with the rest of the tune. You can see many separate
clips on the various tracks in Figure 7.2.
All of the tracks in Figure 7.2 are minimized, meaning they show only the track name and number, volume, and Mute, Solo, and Record
switches in a narrow horizontal strip. When a track is opened, all of the other parameters are also displayed in the Track Controls section shown
in Figure 7.3. This includes input and output hardware devices, buses, and inserted plug-in effects or synthesizers. This bass track is opened fully
to see all the settings. Recorded input comes from the Left channel of my Delta 66 sound card, and the track’s output goes to a bus I set up just
for the bass. That bus then goes to the main stereo output bus to be heard and included in a final mix. There are three plug-ins on this track: an
EQ, a compressor, then another EQ. If any Aux Send buses had been added to the track, they would show below the Bass Bus.
Figure 7.3:
The Track Controls section lets you adjust every property of a track, including bus assignments and plug-in effects.
Every track parameter is available in the Track Controls section, and an entire mix can be done using only these controls. SONAR also offers a
Console View that emulates a physical mixing console, but I never use that because it takes over the entire screen and is not needed. Some
people have two computer display screens, so they put the Console on the second screen. I have two video monitors too, but I use my second
monitor for the MIDI Piano Roll display. Again, there’s nothing you can do in the Console View that can’t be done just as easily in Track View.
I tend to jump around from track to track as I work on a project, and it gets tiring having to open up tracks to see their controls, then close
them again to keep the Track View at a reasonable size on the screen. To solve this, SONAR has the Inspector shown in Figure 7.4. When the
Inspector is enabled, it shows all of the parameters for whatever track is currently selected. If the track contains audio, the Inspector shows
audio-related controls such as Aux buses and audio plug-ins and software synthesizers. If the track holds MIDI data, controls relevant for MIDI are
displayed instead.
Figure 7.4:
The Inspector shows all parameters and settings for a track, without having to open up the track. Just click on any track’s number, and the Inspector switches to that track.
SONAR lets you add envelopes that apply to an entire track or to just one clip. Track envelopes can control the track’s volume, pan position,
mute (on/off), plus any parameter of any plug-in on that track. Figure 7.5 shows a volume envelope on the bass track from Figure 7.3. You can
see three places where I added node groups to raise bass notes that were a little too soft and another place I used nodes to mute a note I disliked
but didn’t want to delete destructively. To create a node, you simply double-click at the appropriate place on the envelope. Nodes can be slid
left and right, as well as up or down. If you click on a line segment between two nodes, both nodes at each end of the line are selected so the
nodes go up and down together. You can also select other groups of nodes to adjust many all at once. Nodes are often set to raise or lower the
volume by a fixed amount for some duration, as with the middle two node groups. But they can also fade up or down as shown at the left and
Figure 7.5:
Track Clips are displayed as waveforms, and this is where you add envelopes and nodes to adjust the volume, pan, or any parameter of an inserted plug-in. The lines
represent the envelope, and the small dots are the nodes.
One limitation with volume envelopes is they set absolute levels, rather than relative. Imagine you spent an hour tweaking a vocal track so
every word can be heard clearly, but then you decide later the entire track should be a little louder or softer. It’s a nuisance to have to adjust all
those envelopes again in dozens of places. The good news is most DAW software offers a second volume control that can scale all of your
envelope automation changes up or down. In SONAR this is done using the track’s Trim control, and most other programs offer something
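
To make the idea of envelopes, nodes, and a separate trim control concrete, here is a minimal Python sketch. It assumes an envelope is a list of (time, level in dB) nodes joined by straight lines, and that the trim is simply a dB offset added on top. Both are assumptions made for illustration; the node values are invented, and this is not a description of how SONAR works internally.

```python
def envelope_db(nodes, t):
    """Level in dB at time t, given (time_seconds, level_db) nodes sorted by time."""
    if t <= nodes[0][0]:
        return nodes[0][1]
    for (t0, db0), (t1, db1) in zip(nodes, nodes[1:]):
        if t <= t1:
            # Straight-line interpolation between the two surrounding nodes.
            return db0 + (db1 - db0) * (t - t0) / (t1 - t0)
    return nodes[-1][1]

def track_gain(nodes, t, trim_db=0.0):
    """Linear gain at time t: the envelope sets absolute levels, trim offsets them all."""
    return 10.0 ** ((envelope_db(nodes, t) + trim_db) / 20.0)

if __name__ == "__main__":
    # Hypothetical envelope: raise one soft note by 3 dB, with short ramps on each side.
    nodes = [(0.0, 0.0), (1.0, 0.0), (1.5, 3.0), (3.0, 3.0), (3.5, 0.0)]
    for t in (0.5, 1.25, 2.0, 4.0):
        print(t, envelope_db(nodes, t), round(track_gain(nodes, t, trim_db=-2.0), 3))
```

Changing `trim_db` moves the whole track up or down without touching any node, which is the convenience a second volume control provides.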
Slip-Editing and Cross-Fading
Two of the most powerful features of DAW software are slip-editing and cross-fading between clips. As with most DAW programs, SONAR lets
you trim the start and end points of an audio clip and overlap two clips with the volumes automatically cross-fading in and out. Besides splicing
together pieces of different takes to create a single best performance, clip editing can be used to create stutter edits, where a single short section
of a track or entire mix repeats rapidly for effect.
The video “sonar_envelopes” shows a clip-level volume envelope lowered from its starting point of +0.5 dB gain down to −2.5 dB reduction.
Then two nodes are added by double-clicking on the envelope. The trailing portion of the envelope is lowered further to −11.5 dB, and then
that portion is slid to the left a bit to start fading the volume earlier. Clips can be faded in or out easily this way, without having to create an
envelope or nodes. In this example the fade-out is shifted left to start earlier, and then a fade-in is applied. Finally, the clip is shortened using a
method called slip-editing. This is a great way to trim the beginning and end of a clip to eliminate any noises on the track before or after a
performance. Slip-editing lets you edit the track’s Wave file nondestructively, so you can extend the clip again later restoring its original length if
In the “sonar_cross-fade” video, I copy a clip by holding Ctrl while dragging the clip to the right. Then when I slide the copy to the left,
overlapping the original clip, SONAR applies a cross-fade automatically, so one clip fades out as the other fades in. Besides the obvious use to
cross-fade smoothly between two different sounds, this also helps avoid clicks and pops that might happen when adjacent clips suddenly stop
and start. As you can see, the start and end points that control the duration of the cross-fade region are easily changed.
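A cross-fade over an overlap region can be sketched in a few lines of Python. The example below offers both a linear fade and an equal-power fade; which curve a given DAW applies by default is not stated here, so treat both shapes, and the function names, as assumptions for illustration.

```python
import math

def fade_gains(position, equal_power=False):
    """Return (fade_out_gain, fade_in_gain) for a position from 0.0 to 1.0."""
    if equal_power:
        angle = position * math.pi / 2.0
        return math.cos(angle), math.sin(angle)
    return 1.0 - position, position

def crossfade(outgoing, incoming, equal_power=False):
    """Mix the overlapping samples of two clips so one fades out as the other fades in."""
    if len(outgoing) != len(incoming):
        raise ValueError("overlap regions must be the same length")
    n = len(outgoing)
    mixed = []
    for i in range(n):
        position = i / (n - 1) if n > 1 else 1.0
        gain_out, gain_in = fade_gains(position, equal_power)
        mixed.append(outgoing[i] * gain_out + incoming[i] * gain_in)
    return mixed

if __name__ == "__main__":
    print(crossfade([1.0] * 5, [0.0] * 5))
```

Because the overlap starts entirely on the outgoing clip and ends entirely on the incoming one, there is no abrupt jump in level at either end of the region.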
Track Lanes
Most DAWs offer a loop record mode that records to the same region repeatedly until you press Stop. A region is defined by Start Time and End
Time markers. Each newly recorded take creates either a new track or a new lane within one track. I usually set SONAR to create lanes within
one track for simplicity and to save screen space. Once I’m satisfied with one of the takes or I know I have enough good material to create a
composite take from all the bits and pieces, it will end up as a single track anyway.
Figure 7.6 shows a track opened up to reveal its three lanes. Note that each lane has Solo and Mute buttons, which simplifies auditioning
them after recording. After deleting the bad takes and using slip-editing to arrange the remaining good takes, a single step called Bounce to Clip
then combines all the pieces to a new clip with its own new Wave file. All of the original recorded clips can then be safely deleted to avoid
wasting space on the hard drive.
Figure 7.6:
Track Lanes let you group related Wave files within a single track.
In this example, all three lanes are related and meant to play at once. The top lane is a Wave file of a gunshot ricochet from a sound effects
CD. The second lane is the same gunshot sound, but shifted down five musical half-steps in Sound Forge using Vari-Speed type pitch shifting to
be fuller sounding. You can see that the clip is also longer because of the pitch change. The bottom lane is the same clip yet again, but shifted
down 7 semitones. With all three clips playing at once, the gunshot sounds larger and less wimpy than the original clip playing alone. You can
see in Figure 7.6 that this track also goes to a Send bus with an echo that sounds on the right side only. This further animates the sound, which is
mono on the CD it came from, making it seem to travel left to right more effectively than panning the track using pan automation.
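The reason the pitch-shifted clips are longer follows from the arithmetic of Vari-Speed shifting, where pitch and playback speed change together. The Python sketch below uses the standard equal-tempered relationship of a factor of 2 per 12 semitones; the function names are made up for this example.

```python
def varispeed_ratio(semitones):
    """Playback speed ratio for a pitch shift in semitones (negative shifts down)."""
    return 2.0 ** (semitones / 12.0)

def length_factor(semitones):
    """How much longer (or shorter) the clip becomes after Vari-Speed shifting."""
    return 1.0 / varispeed_ratio(semitones)

if __name__ == "__main__":
    for shift in (-5, -7, -12):
        print(shift, "semitones: speed x", round(varispeed_ratio(shift), 3),
              "length x", round(length_factor(shift), 3))
```

Shifting down five semitones makes a clip about a third longer, and seven semitones makes it about half again as long, which matches the progressively longer clips in the lower lanes.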
Normalizing is usually done to a final mix file after it’s been rendered. This process raises the volume of the entire track such that the loudest
portion is just below the maximum allowable level. There’s no technical reason to normalize individual track Wave files, but I sometimes do
that for consistency to keep related tracks at roughly the same volume. Depending on the track, I may open a copy of the clip’s Wave file in
Sound Forge, apply software noise reduction if needed, then normalize the level to −1 dBFS. On large projects having hundreds of audio clips, I
often rename the files to shorter versions than SONAR assigned. For example, SONAR typically names files as [Project Name, Track Name,
Rec(32).wav], which is much longer and more cumbersome than needed. So I might change that to Fuzz Guitar.wav or Tambourine.wav or
There’s no audio quality reason that track Wave files shouldn’t be normalized to 0 dBFS, but I recommend against that for final mixes that will
be put on a CD. Some older CD players distort when the level gets within a few tenths of a dB of full scale, so I normalize to −1 to be sure that
won’t happen. If your mix will be sent out for mastering, there’s no need to normalize at all because the engineer will handle the final level
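
Peak normalizing is simple enough to sketch in Python. The function below scales a list of floating-point samples (where 1.0 represents full scale, an assumption of this example) so the loudest peak lands at a target level such as −1 dBFS. It illustrates the arithmetic only and is not the code any particular audio editor uses.

```python
def normalize_peak(samples, target_dbfs=-1.0):
    """Scale samples so the largest absolute value sits at target_dbfs."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    target_linear = 10.0 ** (target_dbfs / 20.0)  # -1 dBFS is about 0.891
    gain = target_linear / peak
    return [s * gain for s in samples]

if __name__ == "__main__":
    quiet_clip = [0.1, -0.5, 0.25, 0.0]
    print(normalize_peak(quiet_clip))
```

Every sample is multiplied by the same gain, so the relative dynamics of the file are unchanged; only the overall level moves.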
Editing and Comping
One of the greatest features of modern DAW recording is that editing and mixing are totally nondestructive unless you go out of your way to
alter the track’s Wave file. With most DAW programs, Undo works only within a single edit session, but nondestructive editing lets you change
your mind about any aspect of the mix at any time in the future. If you later notice a cross-fade that’s flawed, you can reopen the project and
slide the clips around to fix it. If you discover the bass is too loud when hearing your current masterpiece on a friend’s expensive hi-fi, you can
go back and lower it. Every setting in a DAW project can be changed whenever you want—next week, next month, or next year.
The only time I apply destructive editing to a track’s Wave file is to apply software noise reduction or to trim excess when I’m certain I need
only a small portion of a much larger file. Otherwise, I use slip-editing to trim tracks to play only the parts I want.
This book is not about SONAR, but most DAW programs are very similar in how they manage editing and comping. So I’ll show how I comp a
single performance from multiple takes in SONAR; the steps you’ll use in other programs will be very similar. The concepts are certainly the
same. Here, comping means creating a composite performance from one or more separate clips.
Figure 7.7 shows three different recorded takes of a conga drum overdub after trimming them to keep only the best parts. Before the tracks
were trimmed, all three clips extended over the full length of the recorded section. I applied cross-fades from one clip to the next manually by
sliding each clip’s fade-in and fade-out times so they overlap. Once the clips are trimmed as shown, they could all be moved into a single lane.
In that case SONAR would cross-fade between them as shown in the “sonar_cross-fade” video. But leaving them in separate lanes makes it easier
to adjust timings and cross-fades later if needed.
Figure 7.7:
Clips in different Track Lanes can be easily trimmed to keep only the best parts of multiple takes. Once comping is complete, you can either combine all the clips to a
single new clip or leave them separate as shown here if you might want to change them later.
Another common DAW feature is the ability to loop clips, and this conga overdub is typical of the type of material that is looped. For
example, I could combine the three clips spanning eight bars in Figure 7.7 into one clip, then enable looped playback on the result clip. Once
looping is enabled for a clip, it can be repeated as many times as you like using slip-editing. This technique was first popularized by Sony’s Acid
program, and clips that can be looped this way are called Acidized Wave files. Another popular looped format is REX files, which stands for
Recycle EXport. This is an older format developed by the Swedish company Propellerhead for their ReCycle software. This file type is still used,
though probably less so than the more modern Acidized Wave files.
Rendering the Mix
Figure 7.8 shows the main portion of the Bounce dialog in SONAR. When exporting a mix you can include, or not, various aspects of the mix as
shown in the check boxes along the right side. You can export as mono or stereo at any supported bit depth and sample rate, and several dither
options are available, including none. Note that a stereo mix rendered to a mono Wave file can potentially exceed 0 dBFS (digital zero) if both
channels have common content and their levels are near maximum. In that case the result Wave file will be distorted. SONAR simply mixes the
two channels together and sends the sum to the mono file.
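The overload risk is easy to demonstrate with numbers. In the Python sketch below, samples are floats where 1.0 represents 0 dBFS, and two identical channels are simply added. The optional `pad_db` attenuation is an assumption added for this example to show one way to leave headroom; it is not something the text says SONAR applies.

```python
import math

def mix_to_mono(left, right, pad_db=0.0):
    """Sum left and right into one channel, optionally attenuating by pad_db."""
    gain = 10.0 ** (pad_db / 20.0)
    return [(l + r) * gain for l, r in zip(left, right)]

def peak_dbfs(samples):
    """Peak level in dBFS, where a sample value of 1.0 equals 0 dBFS."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0.0 else 20.0 * math.log10(peak)

if __name__ == "__main__":
    # Identical content in both channels, peaking a little below full scale.
    left = [0.9, -0.9, 0.5]
    right = [0.9, -0.9, 0.5]
    print("straight sum:", round(peak_dbfs(mix_to_mono(left, right)), 2), "dBFS")
    print("with 6 dB pad:", round(peak_dbfs(mix_to_mono(left, right, -6.02)), 2), "dBFS")
```

Identical content in both channels doubles in amplitude when summed, a rise of about 6 dB, so a stereo mix that peaks near full scale needs roughly that much headroom before it is rendered to mono.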
Figure 7.8:
The Bounce dialog in SONAR is where you specify what to export, at what bit depth and sample rate, and various other choices. Other DAW programs offer similar
Who’s on First?
One of the most common questions I see asked in audio forums is if it’s better to equalize before compressing or vice versa. In many cases you’ll
have one EQ before compressing and a second EQ after. Let’s take a closer look.
If a track has excessive bass content that needs to be filtered, you should do that before the compressor. Otherwise rumbles and footsteps, or
just excessive low-frequency energy, will trigger the compressor to lower the volume unnecessarily. If you compress an unfiltered track, the
compressor lowers and raises the volume as it attempts to keep the levels even, but those volume changes are not appropriate and will likely
detract from the sound. The same applies for other frequencies that you know will be removed with EQ, such as excess sibilance on a vocal track
or a drum resonance you plan to notch out. Therefore, you should do any such corrective EQ before compressing.
However, if you boost desirable frequencies before compressing, the compression tends to counter that boost. As you apply more and more EQ
boost, the compressor keeps lowering the volume, reducing that boost. In fact, this is how many de-essers work: They sense the amount of high-frequency
content in the sibilance range, then reduce either the overall volume or just the high frequencies, depending on the particular de-esser’s
design. So when you’re using EQ to change the basic tone of a track, that’s best done after compressing. Again, there are few rules with
art, and I encourage you to experiment. My intent is merely to explain the logic and theory behind mixing decisions that have a basis in science
or that are sensible most of the time.
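The way compression counters an EQ boost can be shown with the static gain curve of an idealized compressor. The Python sketch below is deliberately simplified: it assumes the boost raises the overall signal level by the full amount, ignores attack and release times, and uses an invented threshold of −10 dB. Real compressors respond to the broadband level, so a narrow EQ boost usually has a smaller effect than this.

```python
def compressor_output_db(input_db, threshold_db, ratio):
    """Static curve of an idealized hard-knee compressor with no makeup gain."""
    if input_db <= threshold_db:
        return input_db
    return threshold_db + (input_db - threshold_db) / ratio

def surviving_boost_db(level_db, boost_db, threshold_db, ratio):
    """How much of a pre-compressor boost remains at the compressor's output."""
    before = compressor_output_db(level_db, threshold_db, ratio)
    after = compressor_output_db(level_db + boost_db, threshold_db, ratio)
    return after - before

if __name__ == "__main__":
    # A 6 dB boost applied to a signal sitting right at the threshold, 10:1 ratio.
    print(round(surviving_boost_db(-10.0, 6.0, -10.0, 10.0), 2), "dB survives")
```

Under these assumptions only about a tenth of the boost survives a 10:1 ratio, whereas the same boost applied after the compressor arrives intact.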
Figure 7.3 shows the Track Controls for a bass track from one of my projects, with three plug-ins inserted: an EQ, a compressor, then another
EQ. The first EQ in the chain applies a gentle 6 dB per octave roll-off below 60 Hz to reduce the overall low-frequency content. This is followed
by a compressor having a fairly aggressive 10:1 ratio, which is then followed by the EQ that actually alters the tone of the bass with a slight
boost at 175 Hz.
One situation where compressing first usually makes sense is when the signal chain includes a severe distortion effect. Distortion tends to bring
up the noise floor quite a bit because of the high gain it applies, so compressing after distortion raises the noise even further. Another time you’ll
want to compress first is with an echo effect whose repeating echoes decay over time. When a compressor follows an echo effect, it can raise the
level of the echoes instead of letting them fade away evenly as intended.
Time Alignment
Track clips are often moved forward or back in time. One situation is when micing a bass amp while also recording direct, as mentioned in
Chapter 6. In that case you’ll record to two adjacent tracks, zoom way in to see the waveform details, then nudge the mic’d track to the left a bit
until the waveforms are perfectly aligned. Most DAW software has a snap option that slides clips by a fixed amount of one beat or a whole bar.
So you’ll need to disable this feature in order to slide a clip by a tiny amount. SONAR can either snap by whole or partial beats and bars or snap
to beat or bar boundaries. I find “snap by” more useful because musical parts often start on an upbeat. Moving a clip by bar or beat increments
rather than to bar or beat start times preserves the musical timing. For example, if I move a hand claps clip to start one bar earlier, I want the
clip to shift by exactly one bar, even if it started before or after the beat.
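The nudge amount for a mic'd track can also be estimated numerically rather than by eye. The following is only a sketch of one common approach (brute-force cross-correlation), not the manual method described above; the function name `best_lag` and the test data are hypothetical.

```python
def best_lag(direct, mic, max_lag):
    """Return the lag (in samples) by which `mic` trails `direct`.

    Brute-force cross-correlation: try each candidate lag and keep the one
    where the two tracks line up best (largest sum of products).
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Compare direct[i] with mic[i + lag] over the overlapping region.
        n = min(len(direct), len(mic) - lag)
        score = sum(direct[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best


# Hypothetical test data: a short decaying burst, and the same burst
# arriving 5 samples later on the mic'd track.
direct = [0.0, 1.0, -0.8, 0.6, -0.4, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic = [0.0] * 5 + direct[:-5]

lag = best_lag(direct, mic, max_lag=8)
print(lag, "samples =", round(lag / 44100 * 1000, 3), "ms at 44.1 KHz")
```

The mic'd clip would then be nudged left by that many samples. On real recordings the two signals differ in tone, so treat the result as a starting point and confirm it by looking at the waveforms.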
You can also slice clips and slide the pieces around to correct timing errors or improve musical phrasing. This is often done as part of the
comping process, and this too requires disabling the snap feature so you can move the clips forward or back in time by very small amounts.
I’ve seen people create fake double-tracking by copying a mono clip to another track, shifted slightly later in time to the right, with the two
tracks panned left and right. But this is inefficient compared to simply adding a delay effect and panning that to the opposite side.
Editing Music
Music editing applies to both mono and stereo Wave file clips on a DAW track and finished stereo mixes. Most track editing can be done
nondestructively using slip-edits and cross-fades described earlier. Hard edits, where the start or end point of a clip turns on or off
rather than fading in or out, are usually performed at waveform zero crossings to avoid a click sound. If a clip begins when the wave is not at
zero, the sudden jump in level when it starts is equivalent to adding a pulse wave to the audio. In truth, when splicing between two clips, it’s
not necessary to cut at a zero crossing. What really matters is avoiding a discontinuity of the waveform at the splice point.
Figure 7.9 shows a waveform zoomed in enough to see the individual cycles, with the cursor at a zero crossing. Besides zooming way in
horizontally, SONAR also lets you zoom the wave’s displayed level vertically to better see the zero crossings on portions of audio that are very
soft. To split a clip, you’ll put the cursor exactly at the zero crossing, where the waveform passes through the center line, then hit the “S” key or
whatever method your software uses. If the clip is a stereo Wave file, it’s likely that the left and right channels will not pass through zero at the
same point in time. In that case you may also have to apply a fade-in or fade-out, or cross-fade when joining two clips, to avoid a click. Clip
fade-ins and fade-outs are also useful to avoid a sudden change in background ambience. Even if a splice doesn’t make a click, background hiss
or rumble is more noticeable when it starts or stops suddenly.
Figure 7.9:
Zooming way in on an audio file lets you find and cut at zero-crossings to avoid clicks and pops.
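To make the idea of a zero crossing concrete, here is a minimal sketch that finds the crossing nearest a cursor position in a list of sample values. The function name and the sample data are made up for illustration, and real editors may define the crossing point slightly differently.

```python
def nearest_zero_crossing(samples, cursor):
    """Return the index nearest `cursor` where the waveform crosses zero.

    A crossing is reported at index i when samples[i - 1] and samples[i]
    have opposite signs, or when samples[i] is exactly zero.
    Returns None if the clip never crosses zero.
    """
    crossings = [
        i for i in range(1, len(samples))
        if samples[i] == 0 or samples[i - 1] * samples[i] < 0
    ]
    if not crossings:
        return None
    return min(crossings, key=lambda i: abs(i - cursor))


clip = [3, 5, 2, -1, -4, -2, 1, 4]       # hypothetical sample values
print(nearest_zero_crossing(clip, 5))    # 6
print(nearest_zero_crossing(clip, 1))    # 3
```

For a stereo file you would run this on each channel separately. When the two results disagree, that is the case where a short fade is needed instead.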
Another common music editing task is reducing the length of an entire piece or excerpting one short section from a longer song. Modern audio
editing software makes this vastly easier than in the old days of splicing blocks. When editing music you’ll usually make your cuts on a musical
beat and often where a new bar begins. The video “music_editing” shows basic music editing in Sound Forge—in this case, repeating parts of a
tune to make a longer “club” version. The voice-over narration describes the editing steps, so there’s no need to explain further here.
It’s good practice to verify edits using earphones to hear very soft details such as clicks or part of something important being cut off.
Earphones are also useful because they usually have a better response, extending to lower frequencies than most loudspeakers. This helps you to
hear footsteps or rumble sounds that might otherwise be missed. But earphones are not usually recommended for mixing music because you can
hear everything too clearly. This risks making important elements such as the lead vocal too soft in the mix. Mixes made on earphones also tend
to get too little reverb, because we hear reverb more clearly when music is played directly into our ears than when it’s added to natural room ambience.
Editing Narration
Editing narration is typically more tedious and detailed than editing music, if only because most narration sessions require dozens if not
hundreds of separate edits. Every professional engineer has his or her own preferred method of working, so I’ll just explain how I do it. Most of
the voice-over editing I do is after recording myself narrating educational videos for YouTube or for this book. If I were recording others, I’d have
a copy of the script and make detailed notes each time a mistake required a retake. This way I know exactly how many takes to skip before
getting to the one I’ll actually use. It’s difficult to do that when recording myself, so I just read until I make a mistake, pause half a second, then
start again. It’s not that difficult to sort out later when editing.
After recording, I make a backup copy to another hard drive, then load the original Wave file into Sound Forge. Editing in Sound Forge is
destructive, though you could do it with slip-edits in a DAW. The first step is to find coughs, thumps, and other noises that are louder than the
voice, then mute or lower them so they’re not the loudest part of the file. Then you can normalize the file without those sounds restricting how
loud the voice can be made. If noise reduction is warranted, apply that next. Then apply EQ or compression if needed. Noise reduction software
requires a consistent background level, so that should always be done before compressing.
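The reason for muting loud noises before normalizing can be shown with a few numbers. This is a simplified sketch that treats audio as a list of values between -1.0 and +1.0; the `normalize` function and the sample values are hypothetical, not what Sound Forge does internally.

```python
def normalize(samples, target_peak=1.0):
    """Scale samples so the loudest one just reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)          # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical narration: the voice peaks at 0.4, but a cough hits 0.9.
take = [0.1, -0.4, 0.3, 0.9, -0.2]
with_cough = normalize(take)

take[3] = 0.0                         # mute the cough first
without_cough = normalize(take)

print(max(abs(s) for s in with_cough[:3]))     # voice peak with cough present
print(max(abs(s) for s in without_cough[:3]))  # voice peak after muting
```

With the cough left in, the voice ends up well below full level because the cough sets the gain. After muting it, the voice itself reaches the target peak.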
Voice recording should be a simple process. If it’s done well, you probably won’t even need EQ or compression or other processing other than
noise reduction if the background noise is objectionable. Of course, there’s nothing wrong with using EQ or compression when needed. Most of
the voice-overs I recorded for the videos in this chapter were captured by a Zoom H2 portable recorder resting on a small box on my desk about
15 inches from my mouth. I usually record narration into Sound Forge, but I was already using that program as the object of the video. The H2
records in stereo, so I kept only the left channel, which reduced the file size to half. Then I normalized the file and edited it in Sound Forge. No
EQ or compression was used. I prefer narration in mono, because that keeps the sound solidly focused in the center. I find the width of a stereo
voice-over to be distracting, because the imaging changes as the person speaking moves around slightly.
Besides editing out bad takes, coughs, chair squeaks, and other unwanted sounds, you may also need to remove popping “P” and sibilant “S”
sounds. Pops and sibilance are easily fixed by highlighting just that part of the word, then reducing the volume 10 to 15 dB. The “voice_editing”
video shows the steps I used to clean up part of a live comedy show I recorded for a friend. Most of the editing techniques shown in this video
also apply to single tracks in a multitrack music project, using destructive editing to clean up extraneous noises. Again, earphones are very useful
to hear soft noises like page turns and lip smacks that might be missed unless your speakers are playing very loudly.
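For reference, a 10 to 15 dB reduction corresponds to multiplying the selected samples by roughly 0.32 to 0.18. The sketch below shows the arithmetic; `reduce_region` is a hypothetical helper, and an actual editor may also smooth the edges of the selection.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def reduce_region(samples, start, end, cut_db):
    """Return a copy of samples with samples[start:end] lowered by cut_db."""
    gain = db_to_gain(-cut_db)
    return [s * gain if start <= i < end else s
            for i, s in enumerate(samples)]


print(round(db_to_gain(-10), 3))   # 0.316
print(round(db_to_gain(-15), 3))   # 0.178
```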
Although re-amping is a recording process, it’s usually done during mix-down. Re-amping was mentioned in Chapter 3 as a way to fairly
compare microphones and preamps, but its main purpose is as an effect when mixing. The name comes from the process where a guitar track
that had been recorded direct through a DI is played back into a guitar amplifier. The amplifier is adjusted for the desired tone, then rerecorded
with a microphone. This lets you try different amplifiers, different mics and placements, and different acoustic spaces while mixing. Re-amping
can also be done using a high-quality speaker and microphone to add genuine room ambience to a MIDI piano or other instrument that was
recorded direct. This can be done in the same room as you’re mixing if you mute the monitor speakers while recording, but in larger studios it’s
more common for the loudspeaker to be in a separate live room to capture a more pleasing acoustic character. That also lets you capture a
mono source using stereo microphones.
It’s equally common to send a bass or synth track through a guitar or bass amp to impart the color of that particular amplifier. I’ve done this
to make sampled guitars sound more convincing. For a dry (or sampled) lead guitar, you can even run the playback signal through a fuzz-tone or
other guitar-type effect. Since guitar and bass amps are generally recorded using close-micing, this can be done in the control room for
Re-recording a clean-sounding track to add ambience is pretty easy: You simply send a line-level output from your mixer to a power amp that
drives the loudspeaker, then record the speaker as you would any other “live” source. If you have powered speakers, you can send the line-level
audio directly. But sending a line-level signal through a guitar amp requires a resistor pad or a transformer-based direct box. Guitar and bass
amps expect a low-level signal from a passive instrument, so sending them a line-level output will cause distortion. Even if you turn the Send
volume way down on your mixer to avoid overdriving the guitar amp, the result is likely to be very noisy. Using a pad or DI lowers the line-level output and its noise to a level more suitable for an instrument amplifier. A suitable 20 dB pad can be made from two resistors, as shown in
Figure 7.10.
Figure 7.10:
This 20 dB pad accepts a line-level input signal and reduces it to instrument level suitable for sending to a guitar or bass amplifier.
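A two-resistor pad is a voltage divider, so its loss follows from the ratio of the two resistors. The sketch below uses example values of 9 k and 1 k, which are assumptions chosen because they give exactly 20 dB; they are not taken from Figure 7.10. It also ignores the source and load impedances, which is reasonable when feeding a high-impedance instrument input.

```python
import math


def pad_loss_db(r_series, r_shunt):
    """Loss of an unloaded two-resistor voltage divider, in dB."""
    ratio = r_shunt / (r_series + r_shunt)   # Vout / Vin
    return -20 * math.log10(ratio)


# Hypothetical values, not taken from Figure 7.10: 9 k in series, 1 k to ground.
print(round(pad_loss_db(9_000, 1_000), 2))   # 20.0
```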
Backward Audio
Backward audio is a cool effect, often used with cymbals, reverb, a guitar solo, or even part of an entire mix. The basic premise is to reverse the
audio data so it plays from the end to the beginning. Where many musical (and other) sounds start suddenly, then decay to silence over time,
backward audio grows and swells to a climax that ends suddenly. SONAR can do this automatically by choosing Audio..Reverse from the Process
menu. Sound Forge also offers a Reverse option on its Process menu. Many other DAW and audio editor programs have this feature as well.
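Reversing audio data amounts to reading the samples in the opposite order. The sketch below is a guess at the general idea rather than a description of what SONAR or Sound Forge do internally. For stereo data stored as interleaved left and right values, each left-right pair has to stay together, or the channels would swap.

```python
def reverse_audio(samples, channels=1):
    """Reverse audio in time while keeping each frame's channel order.

    `samples` is a flat list; for stereo it is interleaved L, R, L, R, ...
    """
    frames = [samples[i:i + channels] for i in range(0, len(samples), channels)]
    return [s for frame in reversed(frames) for s in frame]


mono = [1, 2, 3, 4]
stereo = [10, -10, 20, -20, 30, -30]      # (L, R) pairs
print(reverse_audio(mono))                # [4, 3, 2, 1]
print(reverse_audio(stereo, channels=2))  # [30, -30, 20, -20, 10, -10]
```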
In the old days of analog tape, you’d create a backward guitar solo by putting the tape onto the recorder upside down before recording the
overdub onto a free track. Since the tape is upside down, the track numbers are reversed, too. So track 1 that had been on top is now track 8, 16,
or 24 at the bottom, depending on the recorder. If you’re not careful with the reversed track order, you can accidentally overwrite something
important on another track! It helps to make a reversed audition mix before the session for the guitarist to practice along with, since the chord
changes play in reverse order, too.
Creating a backward reverb effect requires playing a portion of the song backward while recording the Reverb Return to a separate audio
track. Then you put the track or entire mix back to normal, leaving only the reverb reversed. Another method is to record only the reverb while
the song plays normally, then reverse the reverb that was recorded and mix that back in with the song. This can be done with a section of the
entire song or just the reverb for one track as a special effect.
Earlier I mentioned some of the requirements for preparing audio that will be cut to a vinyl record. For example, bass frequencies must be
summed to mono to prevent skipping, and high-frequency pre-emphasis boosts those frequencies quite a lot to overcome the inevitable record
scratches and background noise. If a master recording already contains a lot of high-frequency content, there’s a risk the record will have
noticeable distortion or excess sibilance on vocals and cymbals. Indeed, a loud recording with strong high-frequency content can blow out the
expensive vinyl cutting head.
For this reason, a limiter that affects only high frequencies is often part of a record-cutting lathe’s signal chain. In one of his Mix Magazine
columns, Eddie Ciletti opined that part of the appeal of vinyl could be due to the protective high-frequency limiting, which can add a pleasing
sheen to the music. A friend of mine is a well-known mastering engineer, and he confirmed that high-frequency limiting is sometimes used even
on CD masters for the same effect.
In the early days of mastering, these specialist engineers also had a say in how the songs were sequenced on each side of an LP. Mastering for
vinyl requires a trade-off between volume level and music length; the louder you cut the music, the less time will fit on a side. Further, the outer
grooves of an LP record have higher fidelity than the inner grooves, simply because the grooves pass by more quickly. For a given rotation speed
in RPM, the linear speed in inches per second is greater when playing the outer grooves of a record than the inner grooves near the end. So it’s
common to put louder, brighter tunes at the beginning of a side and softer, mellower songs on the inner grooves where the poorer high-frequency response may not be noticed. It’s also best if both sides of an LP are approximately the same length. This was equally important when
cassettes were popular to avoid wasting tape if one side was longer than the other.
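The difference in groove speed is easy to estimate, because the linear speed is the groove's circumference multiplied by the number of revolutions per second. The radii below are assumed round figures for a 12-inch LP, used only for illustration.

```python
import math


def groove_speed(radius_inches, rpm):
    """Linear speed of the groove under the stylus, in inches per second."""
    circumference = 2 * math.pi * radius_inches
    return circumference * rpm / 60


RPM = 100 / 3                      # 33 1/3 RPM
# Assumed radii for a 12-inch LP; actual values vary from record to record.
outer = groove_speed(5.75, RPM)
inner = groove_speed(2.4, RPM)
print(round(outer, 1), round(inner, 1))   # 20.1 8.4
```

Under these assumptions the outer grooves pass the stylus more than twice as fast as the innermost ones, so the same high frequency is spread over a longer stretch of vinyl.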
As you can see, preparing a master tape for cutting to vinyl requires a lot of experience and care. What began as a technical necessity in the
early days of recording has evolved into an art form. The most successful early mastering engineers went beyond merely protecting their record-
cutting hardware, and some became sought after for their ability to make recordings sound better than the mix-down tapes they received. Besides
sequencing songs for better presentation and to more fully utilize the medium, mastering engineers would apply EQ and compression, and
sometimes reverb. Today, mastering engineers use all of those tools, and more, to improve the sound of recordings. A good mastering engineer
works in an excellent sounding room with a flat response, along with high-quality full-range speakers that play to the lowest frequencies.
Because of the physical limitations of LP records and cutting lathes, the lacquer master used to make the stamped duplicates is sometimes cut
at half speed. When a master tape is played at half speed, all of the frequencies are lowered one octave. So 20 KHz is now only 10 KHz, which is
easier for the cutter head to handle. While cutting, the lathe spins at 16⅔ RPM instead of 33⅓ RPM. The cutter head doesn’t need to respond as
quickly when creating the grooves, which in turn improves high-frequency response, transient response, and high-frequency headroom.
Save Your Butt
When working on audio and video projects, every few hours I use Save As to save a new version. The first time I save a project, at the very
beginning, I name it “[Project Name] 001.” Next time I use 002, and so forth. If I work on something for a few weeks or months, I can easily get
up to 040 or even higher. While recording and mixing the music for my Tele-Vision video, which took the better part of one year, the last
SONAR version was 137. And while assembling and editing the video project in Vegas, I got up to version 143.
Audio and video project files are small, especially compared to the recorded data, so this doesn’t waste much disk space or take extra time to
back up. And being able to go back to where you were yesterday or last week is priceless. At one point while working on Tele-Vision in
SONAR, before replacing all the MIDI with live audio, I realized part of the MIDI bass track had become corrupted a month earlier. It took about
ten minutes to find a previous version where the bass was still correct, and I simply copied that part of the track into the current version.
I don’t use the auto-save feature many DAW (and video) editors offer as an option, because I often try something I’m not sure I’ll like. I’d
rather decide when to save a project than let the program do that automatically every ten minutes or whatever. I need to know I can
Undo, or just close and quit, rather than save my edits. Some programs clear the Undo buffer after every Save, though others let you undo and
save again to restore the earlier version. Also, I often call up an existing project for a quick test of something unrelated, rather than start a new
project that requires naming it and creating a new folder. In that case I know I don’t want to save my experiment. And if for some reason I do
decide to save it, I can use Save As to create a new project file in a separate folder.
This chapter explores the early history of mixing and automation to explain how we arrived at the hardware and methods used today. The
earliest mixes were balanced acoustically by placing the performers nearer or farther away from a single microphone. Eventually this evolved to
using more than one microphone, then to multitrack recording that puts each voice or instrument onto a separate track to defer mixing decisions
and allow overdubs. The inherent background noise of modern digital recording is so low that effects such as EQ and compression can also be
deferred, without the risk of increasing noise or distortion later.
Early automated consoles were very expensive, but today even entry-level DAW software offers full automation of volume and pan, plus every
parameter of every plug-in. Mix changes can be programmed using a control surface having faders and switches or by creating envelopes and
nodes that are adjusted manually using a mouse.
In the days of analog tape, the only way to edit music and speech was destructively by literally cutting the tape with a razor blade, then
rejoining the pieces with adhesive splicing tape. Today the process is vastly simpler, and there’s also Undo, which lets you try an edit without
risk if you’re not sure it will work.
Mixing is a large and complex subject, but the basic goals for music mixing are clarity and separation, letting the instruments and voices be
heard clearly without sounding harsh or tubby at loud volumes. Bass in particular is the most difficult part of a mix to get right, requiring
accurate speakers in an accurate room. Muting the bass instrument and overall reverb occasionally helps to keep their level in perspective.
Adding low-cut filters to most nonbass tracks is common, and it prevents muddy mixes. Panning instruments intelligently left-right in the stereo
field also helps to improve clarity by avoiding competing frequencies coming from the same speaker.
Besides panning instruments and voices left-right to create space, high-frequency content and reverb affect front-back depth. Adding treble
brings sounds forward in a mix, and small- and large-room reverb can help further to define a virtual space. But a good mix engineer brings
more to the table than just a good mix. The best mix engineers contribute creative ideas and sometimes even suggest changes or additions to the
musical arrangement.
This chapter also covers the basics of DAW editing, including comping a single performance from multiple takes, using slip-edits and
cross-fades, normalizing, as well as explaining the best order of inserted effects such as EQ and compression. This chapter also explains how I organize
and name my own projects to simplify backing up. If a picture is worth a thousand words, a video is worth at least a dozen pictures. Therefore,
two short videos show the basics of editing music and speech, including the importance of cutting on waveform zero crossings.
Finally, I explained a bit about the history of mastering. What began in the mid-1900s as a necessity to overcome the limitations of the vinyl
medium has evolved into an art form in its own right. A talented mastering engineer can make a good mix sound even better using
EQ, compression, and other sound shaping tools.
Chapter 8
Digital Audio Basics
Analog audio comprises electrical signals that change over time to represent acoustic sounds. These voltages can be manipulated in various ways,
then converted back to sound and played through a loudspeaker. Digital audio takes this one step further, using a series of numbers to represent
analog voltages. The process of converting an analog voltage to equivalent numbers is called digitization, or sampling, and a device that does this
is called an analog to digital converter, or A/D. Once the audio voltages are converted to numbers, those numbers can be manipulated in many
useful ways and stored in computer memory or on a hard drive. Eventually the numbers must be converted back to a changing analog voltage in
order to hear the audio through a loudspeaker. This job is performed by a digital to analog converter, or D/A.
All computer sound cards contain at least two A/D converters and two D/A converters, with one pair each for recording and playing back the
left and right channels. I often use the term A/D/A converter for such devices, because most modern converters and sound cards handle both A/D
and D/A functions. Many professional converters handle more than two channels, and some are designed to go in an outboard equipment rack
rather than inside a computer.
Sampling Theory
A number of people contributed to the evolution of modern sampling theory as we know it today, but history has recognized two people as
contributing the most. Back in the 1920s, the telegraph was an important method of real-time communication. Scientists back then aimed to
increase the bandwidth of those early systems to allow sending more messages at once over the wires. The concept of digital sampling was
considered as early as the 1840s, but it wasn’t until the 1900s that scientists refined their theories enough to prove them mathematically. In a
1925 article, Harry Nyquist showed that the analog bandwidth of a telegraph line limits the fastest rate at which Morse code pulses could be
sent. His subsequent article a few years later clarified further, proving that the fastest pulse rate allowable is equal to twice the available
bandwidth. Another important pioneer of modern sampling is Claude Shannon, who in 1949 consolidated many of the theories as they’re
understood and employed today.
Note that the term sampling as used here is unrelated to the common practice of sampling portions of a commercial recording or sampling
musical instrument notes and phrases for later playback by a MIDI-controlled hardware or software synthesizer. Whereas musical sampling
records a “sample” of someone singing or playing an instrument, digital audio sampling as described here takes many individual “snapshot”
samples of an analog voltage at regularly timed intervals. Snapshot is an appropriate term for this process because it’s very similar to the way a
moving picture comprises a sequence of still images. That is, a movie camera captures a new still image at regularly timed intervals. Other
similarities between moving pictures and digitized audio will become evident shortly.
Figure 8.1 shows the three stages of audio as it passes through an A/D/A converter operating at a sample rate of 44.1 KHz. The A/D converter
samples the input voltage at regular intervals (in this case once every 1/44,100 second) and converts those voltages to equivalent numbers for
storage. The D/A converter then converts the numbers back to the original analog voltage for playback. In practice, the numbers would be much
larger than those shown here, unless the audio was extremely soft.
Figure 8.1:
When an analog signal enters an A/D converter (top), the converter measures the input voltage at regular intervals. Whatever the voltage is at that moment is captured and
held briefly as a number (center) for storage in computer memory or a hard drive. When those numbers are later sent out through the D/A converter (bottom), they’re converted back to
the original analog voltages.
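The "snapshot" idea can be sketched in a few lines. The code below stands in for an A/D converter by evaluating a sine wave at evenly spaced instants; the 1 KHz test tone and the function name are assumptions made for the example.

```python
import math

SAMPLE_RATE = 44_100          # samples per second, as in Figure 8.1


def sample_sine(freq_hz, num_samples, amplitude=1.0):
    """Measure a sine wave's voltage once every 1/SAMPLE_RATE second."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(num_samples)]


snapshots = sample_sine(1_000, 5)
print(f"interval = {1 / SAMPLE_RATE * 1e6:.2f} microseconds")
print([round(v, 3) for v in snapshots])
```

Each value in the list is the voltage at one sampling instant, about 22.68 microseconds after the previous one.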
Note that the equivalent digitized numbers are not the same as the analog input voltages. Rather, the incoming voltage is scaled by a volume
control to span a numeric range from −32,768 through +32,767 (for a 16-bit system). The volume control can be in the A/D converter or the
preceding preamp or mixer. If the input volume is set too low, the largest number captured might be only 5,000 instead of 32,767. Likewise, if
the volume is set too high, the incoming audio might result in numbers larger than 32,767. If the input signal is small and the loudest parts
never create large numbers, the result after digitizing is a poor signal to noise ratio. And if the input voltage exceeds the largest number the
converter can accommodate, the result is clipping distortion. By the way, these numbers are a decimal representation of the largest possible
binary values that can be stored in a 16-bit word (2^15 plus 1 bit to designate positive or negative).
Most A/D/A converters deal with whole numbers only, also called integers. If the input voltage falls between two whole numbers when
sampled, it’s stored as the nearest available whole number. This process is known as quantization, though it’s also called rounding because the
voltage is rounded up or down to the nearest available integer value. When the sampled numbers don’t exactly match the incoming voltage, the
audio waveform is altered slightly, and the result is a small amount of added distortion. In a 16-bit system, the disparity between the actual
voltage and the nearest available integer is very small. In most cases the difference won’t matter unless the input voltage is extremely small.
When recording at 24 bits, the disparity between the input voltage and its stored numeric sample value is even smaller and less consequential.
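Scaling, rounding, and clipping can be combined into one short sketch. It assumes the input has already been scaled so that full level is ±1.0, and it maps that range onto the integers by multiplying by the largest positive value. Actual converters do this in hardware and may scale slightly differently.

```python
def quantize(voltage, bits=16):
    """Map a voltage in the range -1.0..+1.0 to the nearest integer sample.

    Values beyond the integer range are clipped, which is the
    digital equivalent of overload distortion.
    """
    max_code = 2 ** (bits - 1) - 1        # +32,767 for 16 bits
    min_code = -(2 ** (bits - 1))         # -32,768 for 16 bits
    code = round(voltage * max_code)      # nearest whole number
    return max(min_code, min(max_code, code))


print(quantize(0.5))      # 16384
print(quantize(1.2))      # 32767 (clipped)
print(quantize(0.5, 24))  # 4194304
```

The rounding error is never more than half of one step. At 24 bits each step is 256 times smaller than at 16 bits, which is why the disparity shrinks.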
Sample Rate and Bit Depth
Two factors affect the potential quality of a digital audio system: its sample rate and bit depth. The sample rate defines how often the input
voltage is measured, or sampled, with faster rates allowing higher frequencies to be captured. The bit depth refers to the size of the numbers
used to store the digitized data, with larger numbers giving a lower noise floor. When audio is stored at CD quality, 16-bit numbers are used to
represent the incoming signal voltage—either positive or negative—and a new sample passes in or out of the converter 44,100 times per second.
The highest audio frequency that can be accommodated is one-half the sample rate. So in theory, sampling at 44,100 times per second should
allow recording frequencies as high as 22,050 Hz. But sampling requires a low-pass anti-aliasing filter at the A/D converter’s input to block
frequencies higher than half the sample rate from getting in. Since no filter has an infinitely steep cutoff slope, a safety margin is required. A
typical input filter must transition from passing fully at 20 KHz to blocking fully at 22 KHz and above, which requires a filter having a roll-off
slope greater than 80 dB per octave, if not steeper. If higher frequencies are allowed in, the result is aliasing distortion. This is similar to IM
distortion because it creates sum and difference artifacts related to the incoming audio frequencies and the sample rate.
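Two quick calculations illustrate these points. The first shows where an out-of-band frequency lands if no input filter blocks it. The second estimates the average filter slope needed across the 20 KHz to 22 KHz transition. The 96 dB attenuation target is an assumption (roughly the range of a 16-bit system); the slope needed depends on how much attenuation counts as "blocking fully."

```python
import math


def alias_frequency(freq_hz, sample_rate):
    """Frequency that freq_hz appears as after sampling with no input filter."""
    folded = freq_hz % sample_rate
    return min(folded, sample_rate - folded)


def required_slope(pass_hz, stop_hz, attenuation_db):
    """Average filter slope in dB per octave across the transition band."""
    octaves = math.log2(stop_hz / pass_hz)
    return attenuation_db / octaves


print(alias_frequency(25_000, 44_100))            # 19100
print(round(required_slope(20_000, 22_000, 96)))  # 698
```

A 25 KHz input sampled at 44.1 KHz shows up as 19.1 KHz, the difference between the sample rate and the input frequency, which is well within the audible range.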
The Reconstruction Filter
You may wonder how the sequence of discrete-level steps that are stored digitally becomes a continuous voltage again at the output of the D/A
converter. First, be assured that the output of a digital converter is indeed a continuous voltage, as shown at the bottom of Figure 8.1. Reports
that digital audio “loses information” either in time between the samples or in level between the available integer numbers are incorrect. This
assumption misses one of the basic components of every digital audio system: the reconstruction filter, also known as an anti-imaging filter.
When the D/A converter first outputs the voltage as steps shown in the middle of Figure 8.1, the steps really are present. But the next circuit in
the chain is a low-pass filter, which smooths the steps, restoring the audio to its original continuously varying voltage.
As we learned in Chapter 1, a low-pass filter passes frequencies below its cutoff frequency and blocks frequencies that are higher. Each sample
in this example changes from one value to the next 44,100 times per second, but the low-pass reconstruction filter has a cutoff frequency of
20 KHz. Therefore, the sudden change from one step to the next happens too quickly for the filter to let through. Chapter 4 explained that a
capacitor is similar to a battery and can be charged by applying a voltage. Like a battery, it takes a finite amount of time for a capacitor to
charge. It also takes time to discharge, or change from one charge level to another. So when a voltage goes through a low-pass filter, sudden step
changes are smoothed over and become a continuously varying voltage. This is exactly what’s needed at the output of a D/A converter.
The low-pass filter shown schematically in Figure 8.2 smooths any voltage changes that occur faster than the resistor and capacitor values
allow. The resistor limits the amount of current available to charge the capacitor, and the capacitor’s value determines how quickly its present
voltage can change based on the available current. Capacitor timing works the same whether charging or discharging, so it doesn’t matter if the
voltage step increases or decreases in level. Either way, the capacitor’s voltage can’t change faster than the resistor allows.
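The charging behavior of a simple resistor-capacitor filter can be simulated directly. This sketch models only a one-pole RC filter fed with stepped voltages. The part values are hypothetical, chosen to put the cutoff near 20 KHz, and a real reconstruction filter is far steeper.

```python
import math


def rc_cutoff_hz(resistance_ohms, capacitance_farads):
    """-3 dB cutoff frequency of a one-pole RC low-pass filter."""
    return 1 / (2 * math.pi * resistance_ohms * capacitance_farads)


def rc_lowpass(steps, resistance_ohms, capacitance_farads, dt):
    """Simulate the capacitor voltage for a stepped input.

    Each input value is held for dt seconds. During that time the capacitor
    charges (or discharges) exponentially toward the held input voltage.
    """
    tau = resistance_ohms * capacitance_farads      # RC time constant
    keep = math.exp(-dt / tau)                      # fraction of old voltage left
    v_cap, out = 0.0, []
    for v_in in steps:
        v_cap = v_in + (v_cap - v_in) * keep
        out.append(v_cap)
    return out


# Hypothetical parts: 1 k and 8 nF give a cutoff near 20 KHz.
R, C = 1_000, 8e-9
print(round(rc_cutoff_hz(R, C)))                    # 19894
smoothed = rc_lowpass([0, 1, 1, 1, 0, 0], R, C, dt=1 / 44_100)
print([round(v, 3) for v in smoothed])
```

The output never jumps instantly to the new step value. It approaches it along an exponential curve whether the step goes up or down, and a larger resistor or capacitor slows the approach.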
Figure 8.2:
The low-pass reconstruction filter inside every D/A converter removes both the vertical and horizontal steps from digitized audio as it plays. The filter at the output of a D/A converter is more complex than the simple one-pole low-pass filter shown here, but the concept is identical.
The filter in Figure 8.2 has a roll-off slope of only 6 dB per octave, but a real reconstruction filter must be much steeper. Like an A/D
converter’s anti-aliasing input filter, this output filter must also transition from passing fully at 20 KHz to blocking fully at 22 KHz and above.
The myth that digital audio contains steps or loses data between the samples may have started because audio editor programs show steps on the
waveforms when you zoom way in. But the graphic version of a waveform shown by audio software does not include the reconstruction filter,
and the displayed images are often reduced to only 8 bits of data (256 total vertical steps) for efficiency and to save disk space. It’s easy to prove
that digital audio does not contain steps at the output of a D/A converter. One way is to simply look at the waveform on an oscilloscope.
Another is to measure the converter’s distortion. If steps were present, they would manifest as distortion.
Earlier I mentioned that digital sampling is closely related to moving pictures on film. With digital audio, aliasing distortion occurs if
frequencies higher than half the sample rate are allowed into the A/D converter. The same thing happens with visual motion that occurs faster
than a moving picture’s frame rate can capture. A classic example is when the wagon wheels in an old Western movie seem to go backward. If
the wagon is slowing down, you’ll often see the wheels go forward, then backward, then forward again until the spokes are no longer moving
too quickly for the camera’s frame rate to capture. This effect is called aliasing whether it happens with digital audio or motion picture cameras.
The most common sampling rate for digital audio is 44.1 KHz, which supports frequencies as high as 20 KHz. This rate was chosen because
early digital audio was stored on video recorders, and 44,100 divides evenly by the number of scan lines used on video recorders in the 1970s.
Audio for modern video production typically uses a sample rate of 48 KHz, not so much because the fidelity is deemed better, but because that
number can be evenly divided by the common frame rates of 24, 25, and 30 frames per second. Having each video frame correspond to an
integer number of audio samples simplifies keeping the audio and video data synchronized. But other sample rates are commonly used for
audio, both higher and lower. Voice and other material meant for broadcast can be recorded at a sample rate of 32 KHz to reduce disk space and
bandwidth, and 22.05 KHz can be used for even more savings when a 10 KHz bandwidth is sufficient. Many consumer sound cards support even
lower sample rates.
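The frame-rate argument for 48 KHz is easy to check. This short Python sketch (the helper name `samples_per_frame` is hypothetical) computes the exact number of audio samples that span one video frame at each rate.

```python
from fractions import Fraction

def samples_per_frame(sample_rate_hz, frames_per_second):
    """Exact number of audio samples that span one video frame."""
    return Fraction(sample_rate_hz, frames_per_second)

for rate in (44_100, 48_000):
    for fps in (24, 25, 30):
        n = samples_per_frame(rate, fps)
        kind = "whole" if n.denominator == 1 else "fractional"
        print(f"{rate} Hz at {fps} fps: {float(n)} samples per frame ({kind})")
```

At 48 KHz every listed frame rate yields a whole number of samples, whereas 44.1 KHz leaves a fractional sample per frame at 24 frames per second.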
Higher rates are also used, mostly by people who believe that supersonic frequencies are audible or that a bandwidth higher than is actually
needed somehow improves the sound quality for frequencies we can hear. Common high sample rates are 88.2 KHz and 96 KHz, which are exact
multiples of 44.1 and 48 KHz. Using integer ratios makes it easier for a sample rate converter to reduce the faster rate later when the audio is
eventually put on a CD. Even higher sample rates are sometimes used (192 KHz and 384 KHz), though clearly there’s no audible benefit from
such overkill, and that just wastes bandwidth and hard drive space. Handling that much data throughput also makes a computer DAW work
harder, limiting the total number of tracks you can work with.
Earlier I mentioned that an A/D converter’s anti-aliasing input filter must be very steep to pass 20 KHz with no loss while blocking 22 KHz and
higher completely. Modern digital converters avoid the need for extremely sharp filters by using a technique known as oversampling. Instead of
sampling at the desired rate of 44.1 KHz, the converter takes a new snapshot much more often. Sampling at 64 times the desired eventual rate is
common, and this is referred to as 64x oversampling. With oversampling, the input filter’s roll-off slope doesn’t need to be nearly as steep,
which in turn avoids problems such as severe ringing at the cutoff frequency. After the A/D converter oversamples through a less aggressive
input filter, the higher sample rate is then divided back down using simple integer math as described earlier, before the numbers are sent on to
the computer.
Bit Depth
Although 16 bits offers a sufficiently low noise level for almost any audio recording task, many people use 24 bits because, again, they believe
the additional bits offer superior fidelity. Some programs can even record audio data as 32-bit floating point numbers. But the number of bits
used for digital audio doesn’t affect fidelity other than establishing the noise floor. The frequency response is not improved, nor is distortion
reduced. Claims that recording at 24 bits is cleaner or yields more resolution than 16 bits are simply wrong, other than a potentially lower noise
floor. I say potentially lower because the background acoustic noise of a microphone in a room is usually 30 to 40 dB louder than the inherent
noise floor of 16-bit audio.
To my way of thinking, the main reason to use 24 bits is because it lets you be less careful when setting record levels. The only time I record
at 24 bits is for orchestra concerts or other live events. When recording outside of a controlled studio environment, it’s better to leave plenty of
headroom now rather than be sorry later. In those situations, it’s good practice to set your record levels so the peaks never exceed −20 dBFS.
Then if something louder comes along, it will be captured cleanly.
Table 8.1 lists the number range and equivalent noise floor for the most commonly used bit depths. Before the advent of MP3-type lossy
compression, 8 bits was sometimes used with low-fidelity material to save disk space. These days lossy data compression is a much better choice
because the files are even smaller and the fidelity is far less compromised. I’ll also mention that the −144 dB noise floor of 24-bit audio is
purely theoretical, and no converter achieves a noise floor even close to that in practice. The best 24-bit systems achieve a noise floor equivalent
to about 21 bits, or −126 dB, and many are closer to −110 dB.
Table 8.1: Common Digital Audio Bit Depths.
Bit Depth     Numeric Range             Noise Floor
8 bits        −128–127                  −48 dB
16 bits       −32,768–32,767            −96 dB
24 bits       −8,388,608–8,388,607      −144 dB
32 bits FP    +/−3.4 * 10^38            Too low to matter
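The noise floor figures of −48, −96, and −144 dB follow from the rule of thumb that each bit is worth about 6 dB. Here is a minimal Python sketch of that arithmetic; the function name `noise_floor_db` is made up for this example, and the calculation covers only the simple theoretical figure, not the measured performance of any real converter.

```python
import math

def noise_floor_db(bits):
    """Theoretical noise floor of integer PCM, in dB relative to full scale."""
    return -20 * math.log10(2 ** bits)

for bits in (8, 16, 21, 24):
    print(f"{bits} bits: {noise_floor_db(bits):.1f} dB")
```

The 21-bit entry is included because that is roughly the equivalent resolution the best 24-bit systems are said to achieve.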
Note that with modern 24-bit converters, the noise floor set by the number of bits is lower than the noise floor of their analog components. So
the quantization distortion that results when a sample value is rounded to the nearest number is simply drowned out by the analog noise. In
other words, the distortion and noise floor of a 24-bit converter’s analog input and output stages dominate rather than the number of bits.
Pulse-Code Modulation versus Direct Stream Digital
The type of digital audio presented so far is called pulse-code modulation, abbreviated as PCM. But another method was developed by Sony
called direct stream digital, or DSD, which is used by SACD players. SACD is short for Super Audio CD, but this format never caught on well for
various reasons, including the high cost of players and insufficient interest by publishers to create content. With DSD, instead of periodically
converting a varying voltage to equivalent 16- or 24-bit numbers, a technique known as delta-sigma modulation outputs only a one or a zero.
Which number is generated depends on whether the currently sampled voltage is higher or lower than the previous sampled voltage,
respectively. This method requires a very high sample frequency: DSD uses a sample rate of 2.8224 MHz, which is 64 times higher than 44.1 KHz
used for CD audio.
In engineering math-speak, the Greek letter delta expresses a change, or difference, and sigma means summation. So delta-sigma implies a
sum of differences, which in this case is the stream of one and zero differences that express how the audio changes over time. DSD proponents
argue that delta-sigma sampling offers a wider bandwidth (100 KHz for SACD disks) and lower noise floor than standard PCM encoding, and the
ultra-high sample rate avoids the need for input and output filters. But the noise of DSD audio rises dramatically at frequencies above 20 KHz, so
filters are in fact needed to prevent that noise from polluting the analog output, possibly increasing IM distortion or even damaging your
tweeters. The filters must also be steep enough to suppress the noise without affecting the audible band.
This explanation is necessarily simplified, and the actual implementation of modern PCM and DSD involves some pretty heavy math. But this
explains the basic principles. The bottom line is that both PCM and DSD systems can sound excellent when executed properly.
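To make the one-bit idea concrete, here is a rough Python sketch of a textbook first-order delta-sigma loop. It is an illustration under stated assumptions, not the modulator used in any actual DSD product: the function name `delta_sigma_1bit` is hypothetical, the input is assumed to lie between −1 and +1, and real converters are generally understood to use more elaborate designs.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: input values in -1..+1, output bits 0/1."""
    integrator = 0.0   # running sum (sigma) of the differences (delta)
    feedback = 0.0     # the previous output bit, expressed as -1.0 or +1.0
    bits = []
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0
        bits.append(bit)
    return bits

DSD_RATE_HZ = 64 * 44_100            # 2,822,400 one-bit samples per second

bits = delta_sigma_1bit([0.5] * 1000)
density = sum(bits) / len(bits)      # fraction of ones in the stream
print(DSD_RATE_HZ, density * 2 - 1)  # the stream's average tracks the input level
```

The point of the sketch is that the density of ones in the stream follows the input voltage, so a low-pass filter applied to the stream recovers the audio.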
Digital Notation
Some people consider digital sample values as binary numbers, but that’s not necessarily true. Computers and hard drives do store data in binary
form, but the numbers are exactly the same as their decimal counterparts. For example, the decimal number 14 is written in binary as 1110, but
both numbers represent the same quantity 14. Binary is just a different way to state the same quantity. Binary notation is common with
computers because most memory chips and hard drives can store only one of two voltage states: on or off. So accommodating numbers as large
as 255 requires eight separate on-off memory locations, called bits.
For efficiency, data pathways and memory chips store data in multiples of eight bits. This organization is extended to hard drives, live data
streams, and most other places digital data is used. This is why audio data are also stored in multiples of 8 bits, such as 16 or 24. You could store
audio as 19 bits per sample, but that would leave 5 bits of data unused in each memory location, which is wasteful. It’s possible to store an odd
number of bits such that some of the bits spill over into adjacent memory locations. But that requires more computer code to implement, as
partial samples are split for storage, then gathered up and recombined later, which in turn takes longer to store and retrieve each sample.
Another common computer number notation is hexadecimal, often shortened to hex. This base-16 system is closely related to base-2 (binary), but the numbers
are more manageable by humans than a long string of binary digits. With hex notation, each digit holds a number between 0 and 15, for a total
of 16 values. The letters A through F are used to represent values between 10 and 15, with each digit holding four bits of on/off data, as shown
in Table 8.2. Computers use both binary and hexadecimal notation because the number of values each digit can hold is a power of 2, which
corresponds to the way memory chips are organized.
Table 8.2: Hexadecimal Notation.
Decimal Binary Hex
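The relationship among the three notations can be generated rather than memorized. This small Python sketch prints the sixteen values that fit in one hex digit; the helper name `notation_row` is made up for the example.

```python
def notation_row(value):
    """Return one number as (decimal, 4-bit binary, hex) strings."""
    return str(value), format(value, "04b"), format(value, "X")

print("Decimal  Binary  Hex")
for n in range(16):
    dec, binary, hexa = notation_row(n)
    print(f"{dec:>7}  {binary:>6}  {hexa:>3}")
```

Each hex digit stands for exactly four binary digits, which is why a 16-bit sample can be written as four hex digits.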
We won’t get into computer memory or binary and hex notation too deeply, but a few basics are worth mentioning. As stated, 8 binary bits
can store 256 different integer numbers, and 16 bits can store 65,536 different values. These numbers may be considered as any contiguous
range, and one common range for 16-bit computer data is 0 through 65,535. But that’s not useful for digital audio because audio contains both
positive and negative voltages. So digital audio is instead treated as numbers ranging from −32,768 through +32,767. This is the same number
of numbers but split in the middle to express both positive and negative values.
When the audio finally comes out of a loudspeaker, the speaker cone is directed to any one of 32,767 different possible forward positions, or
one of 32,768 positions when drawn inward. A value of zero leaves the speaker at its normal resting place, neither pushed forward nor pulled
into the speaker cabinet. By the way, when a range of numbers is considered to hold only positive values, we call them unsigned numbers. If the
same number of bits is used to store both positive and negative values, the numbers are considered to be signed because a plus or minus sign is
part of each value. With signed numbers, the most significant binary bit at the far left has a value of 1 to indicate negative, as shown in Table 8.3.
Table 8.3: Signed Numbers.
Decimal    Binary
32,767     0111 1111 1111 1111
32,766     0111 1111 1111 1110
32,765     0111 1111 1111 1101
...
3          0000 0000 0000 0011
2          0000 0000 0000 0010
1          0000 0000 0000 0001
0          0000 0000 0000 0000
−1         1111 1111 1111 1111
−2         1111 1111 1111 1110
−3         1111 1111 1111 1101
−4         1111 1111 1111 1100
...
−32,766    1000 0000 0000 0010
−32,767    1000 0000 0000 0001
−32,768    1000 0000 0000 0000
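The bit patterns in Table 8.3 follow the scheme known as two's complement, which is an assumption on my part based on the rows shown (for example, −32,768 as a one followed by fifteen zeros). A short Python sketch, with hypothetical function names, converts in both directions.

```python
def to_signed_16(bits):
    """Interpret a 16-bit pattern (0..65535) as a two's complement signed value."""
    return bits - 0x10000 if bits & 0x8000 else bits

def to_bits_16(value):
    """Return the 16-bit pattern for a signed value, grouped in fours."""
    pattern = format(value & 0xFFFF, "016b")
    return " ".join(pattern[i:i + 4] for i in range(0, 16, 4))

for v in (32_767, 1, 0, -1, -4, -32_768):
    print(f"{v:>7}  {to_bits_16(v)}")
```

Note that the far-left bit is 1 for every negative value, which is what lets hardware test the sign by looking at a single bit.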
Sample Rate and Bit Depth Conversion
Sometimes it’s necessary to convert audio data from one sample rate or bit depth to another. CD audio requires 44.1 KHz and 16 bits, so a Wave
file that was mixed down to any other format must be converted before it can be put onto a CD. Sample rate conversion is also used to apply
Vari-Speed type pitch shifting to digital audio. The same process is used by software and hardware music samplers to convert notes recorded at
one pitch to other nearby notes as directed by a MIDI keyboard or sequencer program.
Converting between sample rates that are exact multiples, such as 96 KHz and 48 KHz, is simple: The conversion software could simply
discard every other sample or repeat every sample. Converting between other integer-related sample rates is similar. For example, to convert
48 KHz down to 32 KHz, you’d discard every third sample, keeping two. And in the other direction you’d repeat every other sample so each pair
of samples becomes three. The actual processing used for modern sample rate conversion is more complicated than described here, requiring
interpolation between samples and additional filtering in the digital domain, but this explains the basic logic. However, converting between
unrelated sample rates, such as 96 KHz and 44.1 KHz, is more difficult and risks adding aliasing artifacts if done incorrectly.
One method finds the least common multiple of the two sample rates. This is typically a much higher frequency that’s evenly divisible by both the
source and target sample rates. So the audio is converted to that much higher rate using integer multiplication, then converted back down again
using a different integer as the divisor. Perhaps the simplest way to convert between unrelated sample rates is to play the audio from the analog
output of one digital system while recording it as analog to a second system at the desired sample rate. This works perfectly well, but it can
potentially degrade quality because the audio passes through extra A/D and D/A conversions.
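The integer arithmetic behind both kinds of conversion can be sketched in a few lines of Python. The function names here are hypothetical, and the sample-dropping routine deliberately omits the interpolation and filtering that a real converter needs, so treat it as an outline of the logic only.

```python
from math import gcd

def conversion_ratio(source_hz, target_hz):
    """Return (up, down, common_rate): multiply by up, then divide by down."""
    g = gcd(source_hz, target_hz)
    up, down = target_hz // g, source_hz // g
    return up, down, source_hz * up

def drop_every_third(samples):
    """Crude 3:2 reduction (such as 48 KHz to 32 KHz) with no filtering."""
    return [s for i, s in enumerate(samples) if i % 3 != 2]

print(conversion_ratio(96_000, 44_100))   # unrelated rates need a very high common rate
print(conversion_ratio(96_000, 48_000))   # exact multiples need only a factor of 2
print(drop_every_third([0, 1, 2, 3, 4, 5]))
```

For 96 KHz to 44.1 KHz the common rate works out to more than 14 MHz, which shows why converting between unrelated rates is so much more demanding than halving.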
In my opinion, it makes the most sense to record at whatever sample rate the target medium requires. If you know you’ll have to throw away
half the samples, or risk adding artifacts by using noninteger sample rate conversion, what’s the point of recording at 88.2 KHz or 96 KHz? Any
perceived advantage of recording at a higher sample rate is lost when the music goes onto a CD anyway, so all you’ve accomplished is wasting
disk space and reducing your total available track count. In fairness, there are some situations where a sample rate higher than 44.1 KHz is
useful. For example, if you record an LP, software used later to remove clicks and pops might be better able to separate those noises from the
Bit depth conversion is much simpler. To increase bit depth, new bits are added to each sample value at the lowest bit position, then assigned
a value of zero. To reduce bit depth, you simply discard the lowest bits, as shown in Tables 8.4 and 8.5. In Table 8.4, a 16-bit sample is
expanded to 24 bits by adding eight more lower bits and assigning them a value of zero. In Table 8.5, a 24-bit sample is reduced to 16 bits by
discarding the lower eight bits.
Table 8.4: Increasing Bit Depth.
Original 16-Bit Value    After Converting to 24 Bits
0010 0100 1110 0110      0010 0100 1110 0110 0000 0000
Table 8.5: Decreasing Bit Depth.
Original 24 Bits                 After Converting to 16 Bits
0010 0100 1110 0110 0011 1010    0010 0100 1110 0110
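In code, both conversions amount to shifting bits. This Python sketch reproduces the values from Tables 8.4 and 8.5; the function names are made up for the example.

```python
def expand_16_to_24(sample):
    """Append eight zero bits below a 16-bit sample value."""
    return sample << 8

def truncate_24_to_16(sample):
    """Discard the lowest eight bits of a 24-bit sample value."""
    return sample >> 8

wide = expand_16_to_24(0b0010_0100_1110_0110)
narrow = truncate_24_to_16(0b0010_0100_1110_0110_0011_1010)
print(format(wide, "024b"), format(narrow, "016b"))
```

Python's right shift rounds toward the next lower value for negative numbers too, so this sketch truncates rather than rounds.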
Dither and Jitter
When audio bit depth is reduced by discarding the lowest bits, the process is called truncation. In this case, the numbers are not even rounded to
the nearest value. They’re simply truncated to the nearest lower value, which again adds a small amount of distortion. As explained in Chapter
3, dither avoids truncation distortion and so is applied when reducing the bit-depth of audio data. Dither can also be added in A/D converters
when recording at 16 bits. Truncation distortion is not a problem on loud musical passages, but it can affect soft material because the smaller
numbers are changed by a larger amount relative to their size. Imagine you need to truncate a series of numbers to the next lower integer value.
So 229.8 becomes 229, which is less than half a percent off. But 1.8 gets changed to 1, which is an error of about 44 percent!
Dither is a very soft noise, having a volume level equal to the lowest bit. This is about 90 dB below the music when reducing 24-bit data to 16
bits for putting onto a CD. Since all noise is random, adding dither noise assigns a random value to the lowest bit, which is less noticeable than
the harmonics that would be added by truncation. In fact, adding dither noise retains some parts of the original, larger bit-depth, even if they fall
below the noise floor of the new, smaller bit-depth. Again, truncation distortion created when reducing 24-bit data to 16 bits affects only the
lowest bit, which is 90 dB below full scale, so it’s not likely to be heard except on very soft material when played back very loudly. But still,
dither is included free with most audio editor software, so it only makes sense to use it whenever you reduce the bit-depth.
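A small experiment shows how dither preserves information below the new lowest bit. This Python sketch is a simplification under stated assumptions: it uses plain uniform random noise one 16-bit step wide, ignores clipping at full scale, and uses a hypothetical function name. Commercial dither algorithms are typically more refined.

```python
import random

def reduce_24_to_16(sample, rng=None):
    """Drop the lowest 8 bits; if rng is given, add dither noise first."""
    if rng is not None:
        # Random noise spanning one 16-bit step (256 units at 24 bits).
        sample += rng.randrange(256)
    return sample >> 8

rng = random.Random(0)               # fixed seed so the sketch is repeatable
original = 0x24E63A                  # a 24-bit value whose low byte is nonzero
truncated = reduce_24_to_16(original)
dithered = [reduce_24_to_16(original, rng) for _ in range(20_000)]
average = sum(dithered) / len(dithered)
print(truncated, original / 256, average)
```

Plain truncation always returns the same lower value, while the dithered results average out very close to the original value divided by 256, fractional part included. That is the sense in which dither retains detail from the larger bit-depth.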
Another very soft artifact related to digital audio is jitter, which is a timing error between sample periods. Ideally, a D/A converter will output
its samples at a precise rate of one every 1/44,100 second. But in practice, the time between each sample varies slightly, on the order of 50 to
1,000 picoseconds (trillionths of a second). Timing errors that small aren’t perceived as pitch changes. Rather, they add a tiny amount of noise.
Like truncation distortion, it’s unlikely that anyone could ever hear jitter because it’s so much softer than the music, and it is also masked by the
music. But it’s a real phenomenon, and gear vendors are glad to pretend that jitter is an audible problem, hoping to scare you into buying their
latest low-jitter converters and related products.
In order to sample analog audio 44,100 times per second, or output digital data at that rate (or other rates), every digital converter contains an
oscillator circuit—called a clock—that runs at the desired frequency. Simple oscillators can be made using a few transistors, resistors, and
capacitors, but their frequency is neither precise nor stable. Most resistors and capacitors can vary from their stated value by 2 percent or more,
and their value also changes as the temperature rises and falls. High-precision components having a tighter tolerance and less temperature drift
are available for a much higher cost, but even 0.1 percent variance is unacceptable for a digital audio converter.
The most accurate and stable type of oscillator contains a quartz or ceramic crystal, so those are used in all converters, including budget sound
cards. When a crystal is excited by a voltage, it vibrates at a resonant frequency based on its size and mass. Crystals vibrate at frequencies much
higher than audio sample rates, so the oscillator actually runs at MHz frequencies, with its output divided down to the desired sample rate.
Crystal oscillators are also affected by the surrounding temperature, but much less so than oscillators made from resistors and capacitors. When
extreme precision is required, a crystal oscillator is placed inside a tiny oven. As long as the oven’s temperature is slightly warmer than the
surrounding air, the oven’s thermostat can keep the internal temperature constant. But that much stability is not needed for digital audio.
External Clocks
For a simple digital system using only one converter at a time, small variations in the clock frequency don’t matter. If a sample arrives a few
picoseconds earlier or later than expected, no harm is done. But when digital audio must be combined from several different devices at once,
which is common in larger recording and broadcast studios, all of the clocks must be synchronized. When one digital output is connected to one
digital input using either shielded wire or an optical cable, the receiving device reads the incoming data stream and locks itself to that timing.
But a digital mixer that receives data from several different devices at once can lock to only one device’s clock frequency. So even tiny timing
variations in the other sources will result in clicks and pops when their mistimed samples are occasionally dropped.
The solution is an external clock. This is a dedicated device that does only one thing: It outputs a stable frequency that can be sent to multiple
digital devices to ensure they all output their data in lock-step with one another. Most professional converters have a clock input jack for this
purpose, plus a switch that tells it to use either its internal clock or the external clock signal coming in through that jack. While an external clock
is needed to synchronize multiple digital devices, it’s a myth that using an external clock reduces jitter in a single D/A converter as is often
In truth, when a converter is slaved to an external clock, its jitter can only be made worse, due to the way clock circuits lock to external
signals. In a great example of audio journalism at its finest,1 Sound On Sound magazine measured the jitter of four different converters whose
prices ranged from affordable to very expensive. The jitter at each converter’s output was measured with the converter using its own internal
clock, and then again when locked to an external clock. Several different external clocks were used, also ranging greatly in price. Every single
converter performed more poorly when slaved to an external clock. So while it’s possible that using an external clock might change the sound
audibly, the only way to know for sure is with a proper blind listening test.
Digital Converter Internals
Figure 8.3 shows a simplified block diagram of an A/D converter. After the audio passes through an anti-aliasing input filter, the actual sampling
is performed by a Sample and Hold circuit that’s activated repeatedly for each sample. This circuit freezes the input voltage at that moment for
the length of the sample period, like taking a snapshot, so the rest of the A/D converter receives a single stable voltage to digitize. In this
example only four bits are shown, but the concept is the same for 16- and 24-bit converters. Again, this block diagram is simplified; modern
converters use more sophisticated methods to perform the same basic tasks shown here.
Figure 8.3:
The key components in an A/D converter are an anti-aliasing low-pass filter, a Sample and Hold circuit that takes the actual snapshots at a rate dictated by the clock, and
a series of voltage comparators that each output a zero or a one, depending on their input voltage.
After the Sample and Hold circuit stabilizes the incoming audio to a single nonchanging voltage, it’s fed to a series of voltage comparators.
Depending on the analog voltage at its input, the output of a comparator is either fully positive or fully negative to represent a binary one or
zero, respectively. As you can see, the input resistors are arranged so that each comparator farther down the chain receives half the voltage
(−6 dB) of the one above. Therefore, most of the input voltage goes to the top comparator that represents the most significant bit of data, and
subsequent comparators receive lower and lower voltages that represent ever less significant data bits. A comparator is an analog circuit, so other
circuits (not shown) convert the fully positive or negative voltage from each comparator to the off or on voltages that digital circuits require.
The internal operation of a Sample and Hold circuit is shown in Figure 8.4. Again, this is a simplified version showing only the basic concept.
The input voltage is constantly changing, but a switch controlled by the converter’s clock closes briefly, charging the capacitor to whatever
voltage is present at that moment. The switch then immediately opens to prevent further voltage changes, and the capacitor holds that voltage
long enough for the rest of the circuit to convert it to a single binary value. This process repeats continuously at a sample rate set by the clock.
Figure 8.4:
A Sample and Hold circuit uses an electronic switch to charge a capacitor to whatever input voltage is currently present, then holds that voltage until the next sample is
A D/A converter is much less complex than an A/D converter because it simply sums a series of identical binary one or zero voltages using an
appropriate weighting scheme. Digital bits are either on or off, and many logic circuits consider zero volts as off and 5 volts DC as on. Each bit in
digital audio represents a different amount of the analog voltage, but all the bits carry the same DC voltage when on. So the contribution of each
bit must be weighted such that the higher-order bits contribute more output voltage than the lower-order bits. In other words, the most
significant bit (MSB) dictates half of the analog output voltage, while the least significant bit (LSB) contributes a relatively tiny amount. One way
to reduce the contribution of lower bits is by doubling the resistor values for successively lower bits, as shown in Figure 8.5.
Figure 8.5:
Each bit of digitized audio represents half as much signal as the next higher bit, so doubling the resistor values contributes less and less from each lower bit.
The resistor network with doubling values in Figure 8.5 works in theory, but it’s difficult to implement in practice. First, the resistor values
must be very precise, especially the smaller-value resistors that contribute the most to the output voltage. If this 4-bit example were expanded to
16 bits, the largest resistor needed for the LSB would be 3,276,800 ohms, which is impractically large compared to 100 ohms used for the MSB.
If the value of the 100-ohm resistor was off by even 0.003 percent, that would affect the output voltage about as much as the normal contribution
from the lowest bit. Again, this type of error results in distortion. Even if you could find resistors as accurate as necessary, temperature changes
will affect their values significantly. Such disparate resistor values will also drift differently with temperature, adding even more distortion.
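The doubling-resistor scheme can be modeled with Ohm's law. This Python sketch is an idealized illustration, assuming each "on" bit drives 5 volts through its resistor into a perfect summing point; the function names and the treatment of the output as a summed current are my assumptions rather than details of the figure.

```python
def doubling_resistors(bits, msb_ohms=100):
    """Resistor for each bit, MSB first; each lower bit doubles the value."""
    return [msb_ohms * 2 ** i for i in range(bits)]

def dac_output(code, bits, v_on=5.0, msb_ohms=100):
    """Sum of the currents (amps) from every bit that is on, using I = V / R."""
    total = 0.0
    for i, ohms in enumerate(doubling_resistors(bits, msb_ohms)):
        bit_is_on = (code >> (bits - 1 - i)) & 1
        if bit_is_on:
            total += v_on / ohms
    return total

print(doubling_resistors(4))           # resistor values for a 4-bit network
print(doubling_resistors(16)[-1])      # LSB resistor when extended to 16 bits
print(dac_output(0b1000, 4), dac_output(0b0001, 4))
```

With four bits the MSB alone contributes eight times as much as the LSB alone, and extending the list to 16 bits produces the 3,276,800-ohm value quoted above.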
The R/2R resistor ladder shown in Figure 8.6 is more accurate and more stable than the doubling resistors circuit in Figure 8.5, and it requires
only two different resistor values, with one twice the value of the other. The resistors in an R/2R ladder must also have precise values and be
stable with temperature, but less so than for a network of doubling values. Resistor networks like this can be laser-trimmed automatically by
machine to a higher precision than if they had wildly different values. Further, if all the resistors are built onto a single substrate, their values
will change up and down together with temperature, keeping the relative contribution from each bit the same. It would be difficult or
impossible to manufacture a network of resistors on a single chip with values that differ by 32,768 to 1.
Figure 8.6:
An R/2R resistor ladder reduces the contribution from lower bits as required by 6 dB for each step, but uses only two different resistor values.
We’ve explored sample rate and bit depth, but you may have heard the term bit-rate used with digital audio. Bit-rate is simply the amount of
audio data that passes every second. CD-quality monophonic audio transmits 16 bits at a rate of 44,100 times per second, so its bit-rate is
16×44,100 = 705,600 bits per second (bps). The bit-rate for audio data is usually expressed as kilobits per second, or Kbps, so in this case
mono audio has a bit-rate of 705.6 Kbps. Stereo audio at CD quality is twice that, or 1,411.2 Kbps.
Bit-rate is more often used to express the quality of an audio file that’s been processed using lossy compression, such as an MP3 file. In that
case, bit-rate refers to the number of compressed bits that pass every second. If a music file is compressed to a bit-rate of 128 Kbps, every second
of music occupies 128,000 bits within the MP3 file. One byte contains eight bits, which comes to 16 KB of data being read and decoded per
second. So a three-minute song will be 16 KB×180 seconds=2,880 KB in size.
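The bit-rate and file-size arithmetic above is simple enough to put in two small functions. This Python sketch uses hypothetical function names and assumes 1 Kbps means 1,000 bits per second and 1 KB means 1,000 bytes, which matches the figures in the text.

```python
def pcm_bit_rate(bits, sample_rate_hz, channels=1):
    """Uncompressed bit-rate in bits per second."""
    return bits * sample_rate_hz * channels

def compressed_size_kb(bit_rate_kbps, seconds):
    """File size in KB, taking 1 Kbps as 1,000 bits/s and 1 KB as 1,000 bytes."""
    return bit_rate_kbps / 8 * seconds

print(pcm_bit_rate(16, 44_100), pcm_bit_rate(16, 44_100, channels=2))
print(compressed_size_kb(128, 180))
```

Comparing the two results shows the scale of the savings: a 128 Kbps file carries roughly one-eleventh of the data of CD-quality stereo.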
More modern MP3-type compression uses variable bit rate (VBR) encoding, which varies the bit-rate at any given moment based on the
demands of the music. A slow bass solo can probably be encoded with acceptably high fidelity at a bit-rate as low as 16 Kbps, but when the
drummer hits the cymbal as the rest of the band comes back in, you might need 192 Kbps or even higher to capture everything with no audible
quality loss. So with VBR encoding you won’t know the final file size until after it’s been encoded.
Digital Signal Processing
Digital signal processing, often shortened to DSP, refers to the manipulation of audio while it’s in the digital domain. Audio plug-ins in your
DAW program use DSP to change the frequency response or apply volume compression, or any other task required. “Native” DAW software uses
the computer’s CPU to do all the calculations, but dedicated DSP chips are used in hardware devices such as PCI “booster” cards, hardware
reverb units, surround sound enhancers, and other digital audio processors. A DSP chip is basically a specialized microprocessor that’s optimized
for processing digital audio.
Simple processes like EQ and compression require relatively few computations, and modern computers can handle 100 or more such plug-ins
at once. But reverb is much more intensive, requiring many calculations to create all the echoes needed, with a high-frequency response that falls
off over time as the echoes decay in order to sound natural. Some older audio software offers a “quality” choice that trades fidelity for improved
processing speed. This is a throwback to when computers were slow so you could audition an effect in real time at low quality, then use a
higher quality when rendering the final version. This is not usually needed today, except perhaps with reverb effects, so I always use the highest-quality mode when given an option.
Electricity may travel at the speed of light, but it still takes a finite amount of time to get digital audio into and out of a computer. The main
factor that determines how long this process takes is the converter’s buffer. This is an area of memory containing a group of digitized samples on
their way into the computer when recording or coming out of the computer when playing back. The size of the buffer and the sample rate
determine the amount of delay, which is called latency.
If a computer had to stop everything else it’s doing 44,100 times per second to retrieve each incoming sample after it’s been converted, there
wouldn’t be time for it to do much else. So as each incoming sample is read and converted to digital data, the A/D converter deposits it into an
area of buffer memory set aside just for this purpose. When the buffer has filled, the audio program retrieves all of the data in one quick
operation, then continues on to update the screen or read a Wave file from a hard drive or whatever else it was doing. The same process is used
when playing back audio. The DAW program deposits its outgoing data into a buffer, and then when the buffer is full, the sound card grabs all
the data at once and converts it back to a changing analog voltage.
Most modern sound cards and DAW programs use the ASIO protocol, which lets you control the size of the buffer. ASIO stands for audio
stream input/output, and it was developed by Steinberg GmbH, who also invented the popular VST plug-in format. Typical buffer sizes range
from 64 samples to 2,048 samples. If the buffer is very large, it might take 100 milliseconds or more to fill completely. If you’re playing a mix
and adjusting a track’s volume in real time, hearing the volume change 1/10 second later is not a big problem. But this is an unacceptable
amount of delay when playing a software synthesizer, since you’ll hear each note some time after pressing the key on your keyboard. With that
much delay it’s nearly impossible to play in time. But setting the buffer size too small results in dropouts because the computer is unable to
keep up with the frequent demands to read from and write to the buffer. So the key is finding the smallest buffer size that doesn’t impose audio
dropouts or clicks.
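The delay contributed by one buffer is simply its size divided by the sample rate. This Python sketch (the function name is made up) covers a single buffer in one direction only; the total delay you experience also includes the buffer on the other side and any delay inside the converters, which this calculation does not attempt to model.

```python
def buffer_latency_ms(buffer_samples, sample_rate_hz):
    """Time needed to fill (or empty) one buffer, in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000

for size in (64, 256, 1024, 2048):
    print(f"{size:>5} samples at 44.1 KHz: {buffer_latency_ms(size, 44_100):.1f} ms")
```

Doubling the sample rate halves the delay for the same buffer size, but it also doubles how often the computer must service the buffer.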
Another contributor to latency is the amount of time needed for audio plug-ins to process their effects. As mentioned before, some effects
require more calculations than others. For example, a volume change requires a single multiplication for each sample value, but an EQ requires
many operations. Further, a digital EQ filter requires delays with feedback loops that are longer at low frequencies. So that, too, adds to the
inherent delay.
The total latency in a DAW project depends on whichever plug-in’s process takes the longest to complete. Most modern DAWs include a
feature called automatic latency compensation, which determines the delay imposed by each individual plug-in and makes sure everything gets
output at the correct time. But some older programs require you to look up the latency for each plug-in effect or software synthesizer in its
owner’s manual and enter the numbers manually.
Modern computers can achieve latency delays as small as a few milliseconds, which is plenty fast for playing software instruments in real time.
If you consider that 1 millisecond of delay is about the same as you get from one foot of distance acoustically, 10 milliseconds delay is the same
as standing ten feet away from your guitar amp. But beware that DAW programs often misstate their latency. I’ve seen software report its latency
as 10 milliseconds, yet when playing a keyboard instrument it was obvious that the delay between pressing a key and hearing the note was
much longer.
Floating Point Math
Wave files store each sample as 16 or 24 bits of data, but most modern DAW software uses larger 32-bit floating point (FP) math for its internal
calculations. Manipulating larger numbers increases the accuracy of the calculations, which in turn reduces noise and distortion. As with analog
hardware, any change to the shape of an audio waveform adds distortion or noise. When you send audio through four different pieces of
outboard gear, the distortion you get at the end is the sum of distortions added by all four devices.
The same thing happens inside DAW software as your audio passes through many stages of numeric manipulation for volume changes, EQ,
compression, and other processing. As a DAW program processes a mix, it reads the 16- or 24-bit sample numbers from each track’s Wave file,
converts the samples to 32-bit FP format, then does all its calculations on the larger 32-bit numbers. When you play the mix through your sound
card or render it to a Wave file, only then are the 32-bit numbers reduced to 16 or 24 bits.
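The round trip described above can be sketched as follows. This assumes the common convention of scaling 16-bit samples by 32,768 so they land between −1.0 and 1.0. Actual DAWs may scale and round differently, and most add dither when reducing the bit depth, which this sketch omits.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def int16_to_float(sample):
    """Scale a 16-bit sample (-32768..32767) into the range -1.0..1.0."""
    return to_float32(sample / 32768.0)


def float_to_int16(value):
    """Scale back to 16 bits, rounding to nearest and clamping to the legal range."""
    scaled = round(value * 32768.0)
    return max(-32768, min(32767, scaled))


sample = 12345
as_float = int16_to_float(sample)
softer = to_float32(as_float * 0.7071)  # an example gain change done in floating point
print(sample, as_float, float_to_int16(softer))
```

Because every 16-bit integer fits exactly in a 32-bit float, converting to floating point and straight back loses nothing; rounding only enters when the processing in between produces values that fall between the 16-bit steps.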
Table 8.6 shows the internal format of a 32-bit floating point number, which has a dynamic range exceeding 1,000 dB, though most of that
range is not needed or used. There are 23 “M” bits to hold the mantissa, or basic value, plus eight more “E” bits to hold the exponent, or
multiplier. The use of a base number plus an exponent is what lets floating point numbers hold such a huge range of values. The actual value of
a floating point number is its mantissa times 2 to the exponent. The “S” bit indicates the number’s sign, either 1 or −1, to represent positive or
negative, respectively. Therefore:
$$\text{value} = S \times M \times 2^{E}$$
Table 8.6: A 32-Bit Floating Point Number.
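To see the three fields in practice, the sketch below pulls them out of a number. It assumes the standard IEEE 754 single-precision layout (one sign bit, then eight exponent bits, then 23 mantissa bits), which adds two details to the simplified description above: the sign is stored as a 0 or 1 bit, and the exponent is stored with an offset of 127 while the mantissa has an implied leading 1.

```python
import struct


def float32_fields(x):
    """Return (sign bit, 8-bit exponent field, 23-bit mantissa field) of x as a 32-bit float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def float32_value(sign, exponent, mantissa):
    """Rebuild a normal (non-zero, non-special) value from its three fields."""
    return (-1.0) ** sign * (1.0 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)


for number in (1.0, -0.5, 0.15625):
    print(number, float32_fields(number))
```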
Most DAW math adds a tiny amount of distortion and noise, simply because very few operations can be performed with perfect accuracy. For
example, increasing the gain of a track by exactly 6.02 dB multiplies the current sample value by two. This yields an even result number having
no remainder. Likewise, to reduce the volume by exactly 6.02 dB, you simply multiply the sample times 0.5. That might leave a remainder of
0.5, depending on whether the original sample value was odd or even. But most level changes do not result in a whole number. And that’s when
the accuracy of each math operation adds a tiny but real amount of distortion and noise.
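The arithmetic is easy to check. A level change in dB converts to a multiplier of 10 raised to the power dB/20, so 6.02 dB comes out very close to 2. The sample value 12,345 below is arbitrary, picked only because it is odd.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear multiplier."""
    return 10.0 ** (db / 20.0)


print(db_to_gain(6.02))            # very nearly 2
print(12345 * 0.5)                 # an odd sample leaves a remainder of 0.5
print(12345 * db_to_gain(-3.4))    # most gain changes leave a long fraction
```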
To learn how much degradation occurs with 32-bit FP math, I created a SONAR project that applies 30 sequential gain changes to a very clean
500 Hz sine wave I generated in Sony Sound Forge. The sine wave was imported to a track, which in turn was sent to an output bus. I then
added 14 more output buses, routing the audio through every bus in turn. Each bus has both an input and an output volume control, so I
changed both controls, forcing two math operations per bus.
To avoid the possibility of distortion from one gain change negating the distortion added by other changes, I set each complementary
raise/lower pair differently. That is, the first pair lowered, then raised the volume by 3.4 dB, and the next pair reversed the order, raising, then
lowering the volume by 5.2 dB. Other gain change pairs varied from 0.1 dB to 6.0 dB. After passing through 15 buses in series, each applying
different pairs of volume changes, I exported the result to a new Wave file. The FFT spectrums for both the original and processed sine waves
are shown in Figure 8.7.
Figure 8.7:
These FFT plots show the amount and spectrum of distortion and noise artifacts added by 32-bit DAW processing math. The top graph shows the original 500 Hz sine
wave, and the lower graph shows the result after applying 30 gain changes.
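A similar experiment can be approximated without a DAW. The sketch below is not the SONAR project described above; it is a simplified simulation that rounds to 32-bit floating point after every multiplication and then measures how far the result has drifted from the original. Apart from the 3.4 dB and 5.2 dB pairs, the gain amounts are made up to span 0.1 to 6.0 dB, and the final reduction to 16 bits is left out.

```python
import math
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def db_to_gain(db):
    return 10.0 ** (db / 20.0)


def run_gain_pairs(signal, pair_amounts_db):
    """Apply each amount as a complementary pair, rounding to float32 after every multiply."""
    out = [to_float32(s) for s in signal]
    for i, amount in enumerate(pair_amounts_db):
        first = -amount if i % 2 == 0 else amount  # alternate lower/raise and raise/lower
        for db in (first, -first):
            gain = db_to_gain(db)
            out = [to_float32(s * gain) for s in out]
    return out


def residual_db(reference, processed):
    """Level of (processed - reference) relative to the reference, in dB."""
    err = math.sqrt(sum((p - r) ** 2 for r, p in zip(reference, processed)))
    ref = math.sqrt(sum(r ** 2 for r in reference))
    return 20.0 * math.log10(max(err, 1e-30) / ref)


sample_rate = 44100
sine = [to_float32(0.5 * math.sin(2 * math.pi * 500 * n / sample_rate))
        for n in range(4410)]
amounts = [3.4, 5.2] + [0.1 + 5.9 * k / 12 for k in range(13)]  # 15 pairs, 30 changes
processed = run_gain_pairs(sine, amounts)
print(f"residual after 30 gain changes: {residual_db(sine, processed):.1f} dB")
```

The residual lumps together noise, distortion, and any tiny net gain error, so it is a cruder measure than an FFT plot, but it gives a feel for the scale of the errors involved.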
You can see that most of the added artifacts are noise. A pure frequency would show only a single spike at 500 Hz, but both FFT graphs have
many tiny spikes, with most below −120 dB. Distortion added to a 500 Hz tone creates harmonics at 1,000 Hz, 1,500 Hz, 2,000 Hz, and so forth
at 500 Hz multiples. But the main difference here is more noise in the processed version, with most of the spikes at frequencies other than those
Note that I didn’t even use 24 bits for the sine wave file, nor did I add dither when exporting at 16 bits, so this is a worst-case test of DAW
math distortion. I also ran the same test using SONAR’s 64-bit processing engine (not shown), and as expected the added artifacts were slightly
softer. But since all of the distortion components added when using “only” 32-bit math with 16-bit files are well below 100 dB, they’re already
way too soft for anyone to hear. So if mixes made using two different 32-bit DAWs really do sound different, it’s not due to the math used to
process and sum the tracks, unless the computer code is incorrect, which has happened. Further, as mentioned in Chapter 5, different pan rules
can change the sound of a mix, even when all of the track settings are otherwise identical.
As long as we’re in myth-busting mode, another persistent myth is that digital “overs” are always horrible sounding and must be avoided at all
costs. Unlike analog tape where distortion creeps up gradually as the level rises above 0 VU, digital systems have a hard limit above which
waveforms are clipped sharply. But is brief distortion a few dB above 0 dB Full Scale really as damaging as many people claim? The example
file “acoustic_chords.wav” contains four bars of a gentle acoustic guitar part that I normalized to a level of −1 dBFS. I then raised the volume in
Sound Forge by 3 dB, putting the loudest peaks 2 dB above hard clipping. I saved the file as “acoustic_chords_overload.wav,” then brought it
back into Sound Forge and reduced the volume to match the original. Do you hear a terrible harsh distortion that totally ruins the sound?
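The same sequence of steps can be sketched in code to see how much of a signal is actually affected. A decaying 220 Hz tone stands in for the guitar recording here, which is purely an assumption for illustration; the steps are normalizing to −1 dBFS, raising by 3 dB, clipping at full scale, and lowering by 3 dB again.

```python
import math


def db_to_gain(db):
    return 10.0 ** (db / 20.0)


def hard_clip(samples, limit=1.0):
    """Flatten anything beyond full scale, as a fixed-point file would."""
    return [max(-limit, min(limit, s)) for s in samples]


def peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(s) for s in samples))


sample_rate = 44100
# Stand-in test signal: one second of a decaying 220 Hz tone.
tone = [math.exp(-3.0 * n / sample_rate) * math.sin(2 * math.pi * 220 * n / sample_rate)
        for n in range(sample_rate)]
norm = db_to_gain(-1.0) / max(abs(s) for s in tone)
original = [s * norm for s in tone]                  # peaks at -1 dBFS

boosted = [s * db_to_gain(3.0) for s in original]    # peaks now 2 dB over full scale
clipped = hard_clip(boosted)
restored = [s * db_to_gain(-3.0) for s in clipped]   # level brought back down

num_clipped = sum(1 for s in boosted if abs(s) > 1.0)
print(f"{num_clipped} of {len(boosted)} samples were clipped")
print(f"peak after restoring the level: {peak_dbfs(restored):.2f} dBFS")
```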
Digital Audio Quality
Some people claim that analog recording is “wider” or clearer or more full-sounding than digital audio. However, it’s simple to prove by
measuring, null tests, and blind listening that modern digital audio is much closer to the original source than either analog tape or vinyl records.
To my way of thinking, stereo width is mainly a function of left-right channel differences, which are mostly related to microphone placement
and panning. Width can also be increased by adding time-based effects such as delay and reverb. Simply adding stereo hiss at a low level to a
mono track, as happens with analog tape, makes the sound seem wider because the hiss is different left and right. Further, the left-right
differences in the minute dropouts that are always present with analog tape can also seem to give a widening effect.
The fullness attributed to analog recording is a frequency response change that can be easily measured and confirmed, and even duplicated
with EQ when desired. So the common claim that digital audio is thin-sounding is easy to disprove. Analog tape adds a low-frequency boost
called head bump. The frequency and amount of boost depends on several factors, mostly the tape speed and record head geometry. A boost of
2 or 3 dB somewhere around 40 to 70 Hz is typical, and this could account for the perception that analog tape is fuller-sounding than digital
audio. But the response of modern digital audio is definitely more accurate. If you want the effect of analog tape’s head bump, use an equalizer!
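For readers curious what such an equalizer looks like in code, here is a minimal sketch of a peaking filter using the widely published Audio EQ Cookbook biquad formulas. The 3 dB boost at 60 Hz is taken from the typical range quoted above; the Q of 1.0 is an assumption, since the width of a real head bump varies from machine to machine.

```python
import cmath
import math


def peaking_eq_coeffs(sample_rate, center_hz, gain_db, q):
    """Biquad peaking-EQ coefficients (b0, b1, b2, a0, a1, a2), Audio EQ Cookbook form."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a, -2.0 * math.cos(w0), 1.0 - alpha * a)
    a_coeffs = (1.0 + alpha / a, -2.0 * math.cos(w0), 1.0 - alpha / a)
    return b + a_coeffs


def response_db(coeffs, sample_rate, freq_hz):
    """Gain of the filter at one frequency, in dB."""
    b0, b1, b2, a0, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)  # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20.0 * math.log10(abs(h))


head_bump = peaking_eq_coeffs(44100, 60.0, 3.0, 1.0)
for freq in (30, 60, 120, 1000):
    print(f"{freq:5d} Hz: {response_db(head_bump, 44100, freq):+.2f} dB")
```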
As mentioned in Chapter 7, the high-frequency limiting applied when cutting vinyl records can add a smoothness and pleasing sheen to the
music that some people confuse with higher fidelity. Likewise, small amounts of distortion added by analog tape and vinyl can be pleasing with
certain types of music. Further, some early CDs were duplicated from master tapes that had been optimized for cutting vinyl. But CDs don’t
suffer from the high-frequency loss that vinyl mastering EQ aims to overcome, so perhaps some early CDs really did sound harsh and shrill. This
is likely another contributor to the “digital is cold” myth. Again, it’s clear that analog tape and vinyl records do not have higher fidelity, or a
better frequency response, than competent digital. Nor do they capture some mysterious essence of music, or its emotional impact, in a way that
digital somehow misses. Emotion comes mainly from the performance, as well as the listener’s subjective perception and mood.
This chapter explains the basics of digital audio hardware and software. Assuming a competent implementation, the fidelity of digital audio is
dictated by the sample rate and bit depth, which in turn determine the highest frequency that can be recorded and the background noise floor,
respectively. Simplified block diagrams showed how A/D/A converters work, with the incoming voltage captured by a Sample and Hold circuit,
then converted by voltage comparators into equivalent 16- or 24-bit numbers that are stored in computer memory.
In order to avoid aliasing artifacts, the input signal must be filtered to prevent frequencies higher than half the sampling rate from getting into
the A/D converter. Similarly, the output of a D/A must be filtered to remove the tiny steps that would otherwise be present. Although in theory
these filters must have extremely steep slopes, modern A/D converters use oversampling to allow gentler filters, which add fewer artifacts.
When analog audio is sampled, the voltages are scaled by an input volume control to fill the available range of numbers. Samples that fall
between two numbers are rounded to the nearest available number, which adds a tiny amount of distortion. The 32-bit floating point math
audio software uses to process digital audio also increases distortion, but even after applying 30 gain changes to a clean sine wave, the distortion
added is simply too small to be audible.
You also learned the basics of digital numbering, including binary and hex notation. Although digital audio can be converted by software
algorithms to change its sample rate or bit depth, it’s usually better to record at the same resolution you’ll use to distribute the music. For most
projects, CD-quality audio at 44.1 KHz and 16 bits is plenty adequate. However, recording at a lower level using 24 bits for extra headroom
makes sense for live concerts or other situations where you’re not certain how loud a performance will be.
Several audio myths were busted along the way, including the belief that digital audio contains steps or loses information between the
samples, that external clocks can reduce jitter, and that digital overs always sound terrible.
Chapter 9
Dynamics Processors
A dynamics processor is a device that automatically varies the volume of audio passing through it. The most common examples are compressors
and limiters, which are basically the same, and noise gates and expanders, which are closely related. Where compressors reduce the dynamic
range, making volume levels more alike, expanders increase dynamics, making loud parts louder or soft parts even softer. There are also
multiband compressors and expanders that divide the audio into separate frequency bands, then process each band independently.
Compressors and Limiters
A compressor is an automatic level control that reduces the volume when the incoming audio gets too loud. Compressors were originally used to
prevent AM radio transmitters from distorting if the announcer got too close to the microphone and to keep the announcer’s volume more
consistent. Then some creative types discovered that a compressor can sound cool as an effect on voices, individual instrument tracks, and even
complete mixes. So a compressor can be used both as a tool to correct uneven volume levels and as an effect to subjectively improve the sound
quality. The main thing most people notice when applying compression to a track or mix is that the sound becomes louder and seems more “in
your face.”
Compressors and limiters were originally implemented in hardware, and most modern DAW programs include one or two plug-in versions.
The Sonitus plug-in compressor shown in Figure 9.1 has features typical of both software and hardware compressors, offering the same basic set
of controls.
Figure 9.1:
A compressor adjusts the volume of an instrument or complete mix automatically to keep the overall level more consistent.
The most fundamental setting for every compressor is its threshold—sometimes called the ceiling. When the incoming audio is softer than the
threshold level, the compressor does nothing and passes the audio with unity gain. So if you set the threshold to −12 dB, for example, the
compressor reduces the volume only when the input level exceeds that. When the input volume later falls below the threshold, the compressor
raises the volume back up again to unity gain with no attenuation.
The compression ratio dictates how much the volume is reduced, based on how much louder the audio is compared to the threshold. A ratio
of 1:1 does nothing, no matter how loud the audio gets. But if the ratio is 2:1, the volume is reduced half as much as the excess. So if the input
level is 2 dB above the threshold, the compressor reduces the level by 1 dB, and the output is only 1 dB louder rather than 2 dB louder, as at the
input. With a ratio of 10:1, the signal level must be 10 dB above the threshold for the output to increase by 1 dB. When a compressor is used
with a high ratio of 10:1 or greater, it’s generally considered a limiter, though technically a limiter has an infinite ratio; the output level never
exceeds the threshold level. Therefore, a compressor reduces the dynamic range difference between loud and soft parts, whereas a limiter
applies a hard limit to the volume to maintain a constant level. Practically speaking, the compression ratio is the only distinction between a
compressor and a limiter.
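The relationship between threshold, ratio, and output level can be written as a short function. This is a sketch of the static (steady-state) behavior only, with all levels in dB; it ignores attack and release times, and the function names are made up for illustration.

```python
import math


def compressor_output_db(input_db, threshold_db, ratio):
    """Static hard-knee curve. Pass ratio=math.inf for an ideal limiter."""
    if input_db <= threshold_db:
        return input_db  # below the threshold: unity gain
    return threshold_db + (input_db - threshold_db) / ratio


def gain_reduction_db(input_db, threshold_db, ratio):
    """How many dB the compressor takes away at this input level."""
    return input_db - compressor_output_db(input_db, threshold_db, ratio)


threshold = -12.0
for ratio in (1.0, 2.0, 10.0, math.inf):
    out = compressor_output_db(-2.0, threshold, ratio)
    print(f"ratio {ratio}:1, input 10 dB over threshold, output {out:.1f} dB")
```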
The gain reduction meter shows how much the volume is being reduced at any given moment. This meter is abbreviated as GR on the plug-in
screen in Figure 9.1. This is the main meter you’ll watch when adjusting a compressor, because it shows how much the compressor is doing. The
more compression that’s applied, the more the sound is affected. Notice that a gain reduction meter displays the opposite of a VU meter. The
normal resting position is at zero, and the displayed level goes down toward negative numbers to show how much gain reduction is being
applied. Some hardware compressors use a conventional VU meter to display the input or output level, but when switched to display gain
reduction, the needle rests at zero and goes backward toward −20 as it reduces the volume.
Internally, a compressor can only lower the volume. So the net result when applying a lot of compression is a softer output. Therefore, the
makeup gain control lets you raise the compressed audio back up to an acceptable level. The makeup gain is basically an output volume
control, and it typically offers a lot of volume boost when needed in order to counter a large amount of gain reduction.
When a compressor lowers the volume and raises it again later, the volume is changed over some period of time. The attack time controls
how quickly the volume is reduced once the input exceeds the threshold. If set too slow, a short burst of loud music could get through and
possibly cause distortion. So when using a compressor as a tool to prevent overload, you generally want a very fast attack time. But when used
on an electric bass to add a little punch, setting the attack time to a moderately slow 20–50 milliseconds lets a short burst of the attack through
before the volume is reduced. This adds a little extra definition to each note, while keeping the sustained portion of the note at a consistent level.
The release time determines how quickly the gain returns to unity when the input signal is no longer above the threshold. If the release time is
too fast, you’ll hear the volume as it goes up and down, an effect known as “pumping” or “breathing.” Sometimes this sound is desirable for
adding presence to vocals, drums, and other instruments, but often it is not wanted. The best setting depends on whether you’re using the
compressor as a tool to prevent overloading or as an effect to create a pleasing sound or add more sustain to an instrument. If you don’t want to
hear the compressor work, set the release time fairly long—half a second or longer. If you want the sound of “aggressive” compression, use a
shorter release time. Note that as the release time is shortened, distortion increases at low frequencies because the compressor begins to act on
individual wave cycles. This is often used by audio engineers as an intentional distortion effect.
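One common way to model attack and release is to smooth the desired gain reduction with a simple one-pole filter that uses a different time constant depending on whether the reduction is increasing or decreasing. The sketch below assumes that approach; commercial compressors use a variety of detector designs, so this is only one plausible implementation.

```python
import math


def smoothing_coeff(time_ms, sample_rate):
    """One-pole coefficient; the output covers about 63% of a step in time_ms."""
    return math.exp(-1.0 / (0.001 * time_ms * sample_rate))


def smooth_gain_reduction(target_gr_db, attack_ms, release_ms, sample_rate):
    """Smooth a per-sample target gain reduction (positive dB values)."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    current = 0.0
    smoothed = []
    for target in target_gr_db:
        # More reduction needed: use the attack time. Less: use the release time.
        coeff = attack if target > current else release
        current = coeff * current + (1.0 - coeff) * target
        smoothed.append(current)
    return smoothed


# 100 ms of signal calling for 6 dB of reduction, then 500 ms calling for none.
rate = 1000  # a low rate keeps the example small
target = [6.0] * 100 + [0.0] * 500
fast = smooth_gain_reduction(target, attack_ms=1, release_ms=100, sample_rate=rate)
slow = smooth_gain_reduction(target, attack_ms=30, release_ms=100, sample_rate=rate)
print(f"after 10 ms: fast attack {fast[10]:.2f} dB, slow attack {slow[10]:.2f} dB")
```

With the slower attack, far less gain reduction has been applied 10 ms into the note, which is what lets the initial burst through.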
Some compressors also have a knee setting, which affects only signals that are right around the threshold level. With a “hard knee” setting,
signals below the threshold are not compressed at all, and once they exceed the threshold, the gain is reduced by exactly the amount dictated by
the compression ratio. A “soft knee” setting works a bit differently. As the signal level approaches the threshold, it’s reduced in level slightly, at a
lower ratio than specified, and the amount of gain reduction increases gradually until the level crosses the threshold. The compressor does not
apply the full ratio until the level is slightly above the threshold. In practice, the difference between a hard and soft knee is subtle.
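A soft knee is often modeled by blending the two straight-line segments with a curve centered on the threshold. The sketch below uses a quadratic blend, which is a common textbook choice and not necessarily what any particular plug-in does; the knee width parameter is an assumption, since many compressors expose the knee only as a unitless amount.

```python
def soft_knee_output_db(input_db, threshold_db, ratio, knee_width_db):
    """Static curve with a quadratic knee centered on the threshold (width 0 = hard knee)."""
    over = input_db - threshold_db
    half = knee_width_db / 2.0
    if over <= -half:
        return input_db                      # well below the knee: unity gain
    if over >= half:
        return threshold_db + over / ratio   # well above the knee: full ratio
    # Inside the knee: gain reduction grows smoothly from zero to the full amount.
    return input_db + (1.0 / ratio - 1.0) * (over + half) ** 2 / (2.0 * knee_width_db)


for level in (-18.0, -15.0, -12.0, -9.0, -6.0):
    hard = soft_knee_output_db(level, -12.0, 4.0, 0.0)
    soft = soft_knee_output_db(level, -12.0, 4.0, 6.0)
    print(f"in {level:6.1f} dB   hard knee {hard:7.2f} dB   soft knee {soft:7.2f} dB")
```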
Besides serving as an automatic volume control to keep levels more consistent, a compressor can also make notes sustain longer. Increasing a
note’s sustain requires raising the volume of a note as it fades out. That is, making the trailing part of a note louder to counter its natural fadeout
makes the note sustain longer. To do this with a compressor, you’ll set the threshold low so the volume is reduced most of the time. Then as the
note fades out, the compressor reduces the volume less and less, which is the same as raising the volume as the note decays. For example, when
you play a note on an electric bass, the compressor immediately reduces the volume by, say, 10 dB because the start of the note exceeds the
threshold by 10 dB. You don’t hear the volume reducing because it happens so quickly. But as the note fades over time, the compressor applies
less gain reduction, making the note sustain longer. You can experiment with the release time to control the strength of the effect. Ideally, the
release time will be similar to the note’s natural decay time, or at least fast enough to counter the natural decay.
Using a Compressor
First, determine why, or even if, you need to compress. Every track or mix does not need compression! If you compress everything, music
becomes less dynamic and can be tiring to listen to. Start by setting the threshold to maximum—too high to do anything—with the attack time to
a fast setting such as a few milliseconds and the release time between half a second and one second or so. Then set the ratio to 5:1 or greater.
That’s the basic setup.
Let’s say you want to reduce occasional too loud passages on a vocal track or tame a few overly loud kick drum hits. While the track plays,
gradually lower the threshold until the gain reduction meter shows the level is reduced by 2–6 dB for those loud parts. How much gain
reduction you want depends on how much too loud those sections are. The lower you set the threshold, the more uniform the drum hits will be.
To add sustain to an electric bass track, use a higher ratio to get more compression generally, maybe 10:1. Then lower the threshold until the
gain reduction meter shows 6 to 10 dB reduction or even more most of the time. In all cases, you adjust the threshold to establish the amount of
compression, then use the makeup gain to restore the now-softer output level. Besides listening, watching the gain reduction meter is the key to
knowing how much you’re compressing. The “compression” video lets you hear compression applied in varying amounts to isolated instrument
tracks, as well as to a complete mix.
Common Pitfalls
One problem with compressors is they raise the noise floor and exaggerate soft artifacts such as clicks at a poor edit, breath intakes, and acoustic
guitar string squeaks. For a given amount of gain reduction applied—based on the incoming volume level, threshold, and compression ratio—
that’s how much the background noise will be increased whenever the audio is softer than the threshold. In a live sound setting compression also
increases the risk of feedback.
You can switch a compressor in and out of the audio path with its bypass switch, and this is a good way to hear how the sound is being
changed. But it’s important to match the processed and bypassed volume levels using the compressor’s makeup gain to more fairly judge the
strength of the effect. Otherwise, you might perceive whichever version is louder as being more solid and full-sounding. Some compressors raise
their makeup gain automatically as the threshold is lowered, making the comparison easier.
When compressing stereo audio, whether a stereo recording of a single instrument or a complete mix, the volume of both channels must be
raised and lowered by the same amount. Otherwise, if the level of only one channel rises above the threshold and the level is reduced on just
that one channel, the image will shift left or right. To avoid this problem, all stereo hardware compressors offer a link switch to change both
channels equally, using the volume of whichever channel is louder at the moment. Most plug-in compressors accommodate mono or stereo
sources automatically and change the volume of both channels together. Stereo hardware units contain two independent mono compressors.
Some are dedicated to stereo compression, but most allow you to link the channels for stereo sources when needed or use the two compressors
on unrelated mono sources.
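Stereo linking can be sketched as computing one gain value from whichever channel is louder and applying it to both. To keep the example short it works on instantaneous sample values with no attack or release smoothing, which a real compressor would certainly add.

```python
import math


def linked_gain_db(left, right, threshold_db, ratio):
    """Gain change in dB (zero or negative), based on the louder of the two channels."""
    peak = max(abs(left), abs(right))
    if peak <= 0.0:
        return 0.0
    level_db = 20.0 * math.log10(peak)
    if level_db <= threshold_db:
        return 0.0
    return (level_db - threshold_db) * (1.0 / ratio - 1.0)


def compress_stereo_linked(left_samples, right_samples, threshold_db, ratio):
    """Apply the same gain to both channels so the stereo image stays put."""
    out_left, out_right = [], []
    for l, r in zip(left_samples, right_samples):
        gain = 10.0 ** (linked_gain_db(l, r, threshold_db, ratio) / 20.0)
        out_left.append(l * gain)
        out_right.append(r * gain)
    return out_left, out_right


# The left channel is loud and the right is quiet; both get the same reduction.
left_out, right_out = compress_stereo_linked([0.9], [0.1], threshold_db=-12.0, ratio=4.0)
print(left_out[0], right_out[0], left_out[0] / right_out[0])
```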
Compressing a group of singers or other sources that are already mixed together responds differently than applying compression separately
when each musician is on a separate track. When a single audio source contains multiple sounds, whichever sound is loudest will dominate
because all the parts are reduced in level together. So if you’re recording a group of performers having varying skill levels, it’s better to put each
on his or her own track. This way you can optimize the compressor settings for each performer, applying more severe compression for the
people who are less consistent.
When compressing a complete mix, it’s reasonable to assume that large variations in the individual elements have already been dealt with. It’s
common to apply a few dB of compression to a full mix, sometimes referred to as “glue” that binds the tracks together. This is also known as
“bus compression,” named for the stereo output bus where it’s applied. It’s easy to go overboard with compression on a complete mix. Years
ago, when AM radio dominated the airways, radio stations would routinely apply heavy compression to every song they played. I recall the song
The Worst That Can Happen by The Brooklyn Bridge from 1968. Near the end of this tune, the electric bass plays a long, sustained note under a
brass fanfare. But the New York City AM stations applied so much compression that all you’d hear for seven seconds was that one bass note. The
brass section was barely audible!
Multiband Compressors
A multiband compressor is a group of compressors that operate independently on different frequency bands. This lets you tame peaks in one
band without affecting the level in other bands. For example, if you have a mix in which the kick drum is too loud, a single-band compressor
would decrease the entire track volume with every beat of the kick. By applying compression only to frequencies occupied by the kick, you can
control the kick volume without affecting the vocals, horns, or keyboards. A multiband compressor used this way also minimizes the pumping
and breathing effect you get from a normal compressor when the loudest part affects everything else. With a multiband compressor, the kick
drum or bass will not make the mid and high frequencies louder and softer as it works, nor will a loud treble-heavy instrument make bass and
midrange content softer.
Radio stations have used multiband compressors for many years to customize their sound and remain within legal modulation limits. Such
compressors help define a unique sound the station managers hope will set them apart from other stations. You define a frequency response
curve, and the compressor will enforce that curve even as the music’s own frequency balance changes. So if you play a song that’s thin-sounding,
then play a song with a lot of bass content, the frequency balance of both songs will be more alike. I’ve also used a multiband compressor on
mic’d bass tracks when the fullness of the part changed too much as different notes were played.
The Sonitus multiband compressor in Figure 9.2 divides the incoming audio into five separate bands, not unlike a loudspeaker crossover, and
its layout and operation are typical of such plug-ins. You control the crossover frequencies by sliding the vertical lines at the bottom left of the
screen left or right or by typing frequency numbers into the fields below each vertical line. Once you’ve defined the upper and lower frequency
bounds for each band, you can control every parameter of each band to tailor the sound in many ways. All of the compressors are then mixed
together again at the output.
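The split, process, and recombine structure can be illustrated with just two bands. This sketch is far simpler than a real multiband compressor: the crossover is a gentle one-pole filter, with the high band taken as whatever the lowpass leaves behind, and the band compressor acts on each sample instantly with no attack or release. It is meant only to show the signal flow.

```python
import math


def split_two_bands(samples, crossover_hz, sample_rate):
    """Split into (low, high). High is the remainder, so low + high rebuilds the input."""
    coeff = math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state = coeff * state + (1.0 - coeff) * x
        low.append(state)
        high.append(x - state)
    return low, high


def compress_band(band, threshold_db, ratio):
    """Instantaneous hard-knee compression of one band (no attack or release smoothing)."""
    out = []
    for s in band:
        level_db = 20.0 * math.log10(abs(s)) if s != 0.0 else -math.inf
        over = level_db - threshold_db
        gain_db = over * (1.0 / ratio - 1.0) if over > 0.0 else 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out


def two_band_compress(samples, crossover_hz, sample_rate, low_threshold_db, low_ratio):
    """Compress only the low band, then mix the bands together again."""
    low, high = split_two_bands(samples, crossover_hz, sample_rate)
    low = compress_band(low, low_threshold_db, low_ratio)
    return [l + h for l, h in zip(low, high)]


rate = 44100
kick_like = [0.9 * math.sin(2 * math.pi * 50 * n / rate) for n in range(4410)]
tamed = two_band_compress(kick_like, 150.0, rate, low_threshold_db=-20.0, low_ratio=10.0)
print(max(abs(s) for s in kick_like), max(abs(s) for s in tamed))
```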
Figure 9.2:
A multiband compressor divides audio into separate frequency bands, then compresses each band separately without regard to the volume levels in other bands.
As you can imagine, managing five bands of compression can be a handful. To make this easier, the Common display at the lower right of the
compressor screen shows all of the parameters currently set for each band. From top to bottom, the abbreviations stand for Ratio, Knee, Type
(normal or “vintage”), Makeup Gain, Attack Time, and Release Time. Clicking a number to the left of the Common label displays a screen with
sliders and data entry fields to adjust all the parameters for that band. The makeup gain for each band is adjusted by raising or lowering the top
portion of each band’s “trapezoid” at the lower left.
The threshold controls for each band are at the upper left of the screen, with a data entry field below each to type a dB value manually if you
prefer. Note the Solo and Bypass buttons for each band. Solo lets you easily hear the contribution from only the selected band or bands, and
Bypass disables gain reduction for a band.
Besides its usual job of enforcing a consistent level across multiple frequency bands, a multiband compressor can also serve as a vocal de-esser,
reduce popping P’s, and minimize acoustic guitar string squeaks (similar to the de-esser setting). If all of the bands but one are set to Bypass,
only the remaining band’s content is compressed. A popping P creates a sudden burst of energy below 80 Hz or so, so you’d use only the lowest
band and set its parameters appropriately including a high ratio for a large amount of gain reduction. Sibilance usually appears somewhere in
the range between 4 and 8 KHz, so you’d engage only Band 4 to compress that range.
Noise Gates and Expanders
A noise gate minimizes background noise such as hum or tape hiss by lowering the volume when the input audio falls below a threshold you
set. This is the opposite of a limiter that attenuates the signal when its level rises above the threshold. If a recording has a constant background
hum, or perhaps air noise from a heating system, you’d set the threshold to just above the level of the noise. So when a vocalist is not singing,
that track is muted automatically. But as soon as she begins to sing, the gate opens and passes both the music and the background noise.
Hopefully, the singing will help to mask the background noise.
One problem with gating is that completely muting a sound is usually objectionable. Most gates let you control the amount of attenuation
applied, rather than simply switching the audio fully on and off. This yields a more natural sound because the noise isn’t suddenly replaced by
total silence. The Sonitus gate shown in Figure 9.3 uses the label Depth for the attenuation amount. Other models might use a related term such
as Gain Reduction for this same parameter.
Figure 9.3:
A noise gate is the opposite of a compressor. When the incoming audio is softer than the threshold, the gate attenuates the sound.
The normal state for a gate is closed, which mutes the audio, and then it opens when the input signal rises above the threshold. The attack
time determines how long it takes for the volume to increase to normal and let the audio pass through. This is generally made as fast as possible
to avoid cutting off the first part of the music or voice when the gate opens. But with some types of noise, a very fast attack time can create an
audible click when the gate opens. The release time determines how quickly the volume is reduced after the signal level drops below the
threshold. If this is too slow, you’ll hear the background noise as it fades away. But if the release time is too fast, you risk a “chatter” effect when
valid sounds near to the threshold make the gate quickly open and close repeatedly. When a noise gate turns off and on frequently, that can also
draw attention to the noise.
It’s common to use a gate to reduce background noise, plus a compressor to even out volume levels. In that case you’ll add the gate first, then
the compressor. Otherwise, the compressor will bring up the background noise and keep varying its level, which makes finding appropriate gate
settings more difficult.
The Sonitus gate I use adds several other useful features: One is a hold time that allows a fast release, but without the risk of chatter; once the
gate opens, it stays open for the designated hold time regardless of the input level. This is similar to the ADSR envelope generator in a
synthesizer, where the note sustains for a specified duration. When a signal comes and goes quickly, the gate opens, holds for a minimum period
of time, then fades out over time based on the release setting. Another useful addition is high- and low-cut filters that affect only the trigger
signal but don’t filter the actual audio passing through the gate. This could be used to roll off the bass range to avoid opening the gate on a voice
microphone when a truck passes by outdoors.
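The interaction of threshold, depth, and hold time can be sketched as a small state machine. This version restarts the hold timer every time the level is above the threshold, which is one reasonable reading of the behavior described above; it also switches instantly between open and closed, leaving out the attack and release fades for brevity.

```python
def gate_gains(levels_db, threshold_db, depth_db, hold_samples):
    """Per-sample gate gain in dB: 0 when open, -depth_db when closed."""
    gains = []
    hold_left = 0
    for level in levels_db:
        if level >= threshold_db:
            hold_left = hold_samples   # signal present: open, and restart the hold timer
            gains.append(0.0)
        elif hold_left > 0:
            hold_left -= 1             # below threshold, but still holding open
            gains.append(0.0)
        else:
            gains.append(-depth_db)    # closed: attenuate by the depth amount
    return gains


# A signal that hovers around a -30 dB threshold, with and without a hold time.
levels = [-60, -10, -50, -50, -10, -50, -50, -50, -50]
print(gate_gains(levels, threshold_db=-30, depth_db=20, hold_samples=0))
print(gate_gains(levels, threshold_db=-30, depth_db=20, hold_samples=2))
```

Without a hold time the gate opens and closes twice in quick succession; with a hold of two samples it opens once and stays open across the brief dip.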
Even the fastest gate takes a finite amount of time to open, which can affect transients such as the attack of a snare drum. To avoid this, many
gates have a look-ahead feature that sees the audio before it arrives to open the gate in advance. The look-ahead setting delays the main audio
by the specified amount but doesn’t delay the gate’s internal level sensor. So when the sensor’s input level rises above the threshold, the gate
opens immediately, a millisecond or so before the sound gets to the gate’s audio input. This adds a small delay to the audio, so you should be
careful to use identical settings when look-ahead gating is used on related tracks, such as stereo overhead mics on a drum set, to avoid timing or
phase problems.
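Look-ahead amounts to running the audio through a short delay line while the level sensor watches the undelayed signal. In the sketch below the gate is a plain on/off switch with no fades, and the look-ahead is counted in samples instead of milliseconds; both are simplifications for illustration.

```python
def lookahead_gate(samples, threshold, lookahead_samples):
    """Return (gated audio, gate state per sample). The output is delayed by the look-ahead."""
    delayed = [0.0] * lookahead_samples + list(samples)    # audio path, delayed
    detector = list(samples) + [0.0] * lookahead_samples   # level sensor path, not delayed
    out, gate_open = [], []
    for n, x in enumerate(delayed):
        # Open if the sensor saw a loud sample now or within the last look-ahead period,
        # so the gate stays open until that sample has passed through the delay.
        recent = detector[max(0, n - lookahead_samples): n + 1]
        is_open = any(abs(d) >= threshold for d in recent)
        gate_open.append(is_open)
        out.append(x if is_open else 0.0)
    return out, gate_open


hit = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0]
audio, state = lookahead_gate(hit, threshold=0.4, lookahead_samples=2)
print(state.index(True), audio.index(1.0))  # gate opens before the transient arrives
```

The two-sample delay on the output is the same delay that needs to match across related tracks.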
Finally, this gate also offers a “punch mode” that can serve as a crude transient shaper effect by boosting the volume more than needed each
time the gate opens. This adds an extra emphasis, or punch as they call it, to an instrument. Transient shapers are described later in this chapter.
Noise Gate Tricks
Besides its intended use to mute background noise when a performer isn’t singing or playing, a noise gate can also be used to reduce reverb
decay times. If you recorded a track in an overly reverberant room, each time the musician stops singing or playing, the reverb can be heard as it
decays over several seconds. By carefully adjusting the threshold and release time, the gate can be directed to close more quickly than the reverb
decays on its own. This can make the reverb seem less intense, though the amount and decay time of the reverb aren’t changed while the music plays.
A noise gate can also be used as an effect. One classic example is gated reverb, which was very popular during the 1980s. The idea is to create
a huge wash of reverb that extends for some duration, then cuts off suddenly to silence. Typically, the output of a reverb unit is fed into a
compressor that counters the reverb’s normal decay, keeping it louder longer. The compressor’s output is then sent to a noise gate that shuts off
the audio suddenly when the reverb decays to whatever level is set by the threshold control.
An expander is similar to a noise gate, but it operates over a wider range of volume levels, rather than simply turning on and off based on a signal
threshold level. In essence, an expander is the opposite of a compressor. Expanders can also be used to reduce background noise levels but
without the sudden on-off transition a gate applies. For example, if you set the ratio to a fairly gentle 0.9:1, you won’t notice the expansion
much when the audio plays, but the background hiss will be reduced by 5 to 10 dB during silent parts. The Sonitus compressor plug-in I use also
serves as an expander, and some other compressors also perform both tasks. Instead of setting the ratio higher than 1:1 for compression, you use
a smaller ratio such as 0.8:1.
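To see where a figure in the 5 to 10 dB range can come from, here is a minimal sketch of a downward expander's static curve. It assumes one common convention, in which every 0.9 dB the input falls below the threshold becomes 1 dB at the output. The function name and the example levels are illustrative only, and a given plug-in may define its ratio differently.

```python
def expander_gain_db(input_db, threshold_db, ratio):
    """Gain change in dB applied by a downward expander (assumed convention).

    ratio is less than 1, as in the text (0.9:1 or 0.8:1).
    Signals at or above the threshold are left alone.
    """
    if input_db >= threshold_db:
        return 0.0
    below = threshold_db - input_db      # how far under the threshold
    expanded = below / ratio             # distance under threshold after expansion
    return -(expanded - below)           # extra attenuation, as a negative gain

# Hiss sitting 60 dB below a -10 dB threshold, expanded at 0.9:1
hiss_gain = expander_gain_db(-70.0, -10.0, 0.9)
```

Under these assumptions, hiss 60 dB below the threshold is lowered by a little under 7 dB at 0.9:1, and by about 15 dB at 0.8:1.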
When an expander is used to mimic a noise gate, its behavior is called downward expansion. That is, the volume level is lowered when the
input level falls below the threshold. This is similar to the old dbx and Burwen single-ended noise reduction devices mentioned in Chapter 6.
But an expander can also do the opposite: leaving low-level audio as is and raising the volume when the input exceeds the threshold. In that
case, the effect is called upward expansion, or just expansion. Which type of expansion you get depends on the design of the expander.
If you have a multiband compressor that also can expand like the Sonitus plug-in, you could do even more sophisticated noise reduction in
different frequency bands to lower the noise in bands that don’t currently have content. But more modern software noise reduction works even
better, so multiband gates and expanders are less necessary. However, hardware gates are still useful for live sound applications.
But …
As useful as compressors are, I use them only rarely. Rather, I use volume envelopes in SONAR as needed to raise soft vocal syllables or to lower
passages that are too loud. Programming volume changes manually is more effective than finding the right compressor settings, which are rarely
correct for the entire length of a tune. And, of course, you can change a volume envelope later if needed. Volume envelopes can also be set to
raise or lower the volume for a precise length of time, including extremely short periods to control popping P’s, and you can draw the attack
and release times. Indeed, the main reason to avoid compressors is to prevent pumping and breathing sounds. The only things I routinely
compress—always after recording, nondestructively—are acoustic guitar and electric bass if they need a little more sustain or presence as an effect.
Likewise, I avoid using gates, instead muting unwanted audio manually either by splitting track regions and slip-editing the clips, or by adding
volume envelopes. When you need to reduce constant background noise, software noise reduction is more effective than a broadband gate, and
adding envelopes is often easier and more direct than experimenting to find the best gate settings. Further, when an effect like a gate is patched
into an entire track, you have to play it all the way through each time you change a setting to be sure no portion of the track is being harmed.
Dynamics Processor Special Techniques
Dynamics processors can be used to perform some clever tricks in addition to their intended purpose. Some compressors and gates offer a side-chain input, which lets them vary the level of one audio stream in response to volume changes in another. When I owned a professional
recording studio in the 1970s and 1980s, besides offering recording services to outside clients, we also had our own in-house production
company. I recall one jingle session where the music was to be very funky. I created a patch that sent my direct electric bass track through a
Kepex, an early gate manufactured by Allison Research, with the side-chain input taken from the kick drum track. I played a steady stream of
1/16 notes on the bass that changed with the backing chords, but only those notes that coincided with the kick drum passed through the gate.
The result sounded like the band Tower of Power—incredibly tight, with the kick and bass playing together in perfect synchronization.
My studio also recorded a lot of voice-over sessions, including corporate training tapes. These projects typically employed one or more voice
actors, with music playing in the background. After recording and editing the ¼-inch voice tape as needed, a second pass would lay in the music
underneath. But you can’t just leave the music playing at some level. Usually such a production starts with music at a normal level, and then the
music is lowered just before the narration begins. During occasional pauses in the speaking, the music comes up a bit, then gets softer just before
the announcer resumes.
When I do projects like this today, I use a DAW with the mono voice on one track and the stereo music on another. Then I can easily reduce
the music’s volume half a second before the voice comes in and raise it gently over half a second or so during the pauses. But in the days of
analog tape where you couldn’t see the waveforms as we do today in audio software, you had to guess when the voice would reenter.
To automate this process, I devised a patch that used a delay and a compressor with a side-chain input. The music went through the
compressor, but its volume was controlled by the speaking voice sent into the side-chain. The voice was delayed half a second through a Lexicon
Prime Time into the final stereo mix, but it was not delayed into the compressor’s side-chain input. So when the announcer spoke, the music
would be lowered in anticipation of the voice. I used slow attack and decay times to make the volume changes sound more natural, as if a
person was gently riding the levels. Even today, with a long project requiring many dozens of volume changes, the time to set up a patch like
this in a DAW is more than offset by not having to program all the volume changes. And today, with total recall, once you have the setup in a
DAW template, you can use it again and again in the future. By the way, when a side-chain input is used this way, the process is called ducking
the music, and some plug-ins offer a preset already connected this way.
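The ducking patch can be sketched roughly as follows. This is a simplified illustration rather than the original hardware setup: the function name, the sample-based delay, and the one-pole smoothing coefficients are assumptions, and a real detector would smooth the voice level instead of using the raw rectified signal.

```python
def duck_music(music, voice, delay, duck_gain=0.5, threshold=0.05,
               attack=0.2, release=0.05):
    """Lower the music whenever the undelayed voice is active (hypothetical API).

    delay is in samples; attack and release are smoothing coefficients (0..1).
    """
    mix, gains = [], []
    gain = 1.0
    for n in range(len(music)):
        # Side-chain: the voice reaches the detector with no delay.
        side = abs(voice[n]) if n < len(voice) else 0.0
        target = duck_gain if side > threshold else 1.0
        coeff = attack if target < gain else release
        gain += (target - gain) * coeff      # glide toward the target gain
        # Mix path: the voice is heard `delay` samples later.
        idx = n - delay
        delayed_voice = voice[idx] if 0 <= idx < len(voice) else 0.0
        mix.append(music[n] * gain + delayed_voice)
        gains.append(gain)
    return mix, gains
```

Because the side-chain sees the voice before the listener does, the music is already on its way down when the narration is heard. The release needs to be slow enough that the music doesn't come back up while the delayed voice is still finishing.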
Finally, when compressing an entire mix, you can sometimes achieve a smoother leveling action by using a low compression ratio such as
1.2:1 with a very low threshold, possibly as low as −30 or even −40 dB. This changes the level very gradually, so you won’t hear the compressor working.
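The arithmetic behind such gentle settings can be checked with a minimal sketch of a compressor's static curve. It assumes the usual hard-knee definition of ratio, and the function name and example levels are illustrative only.

```python
def compressor_gain_db(input_db, threshold_db, ratio):
    """Static gain change in dB for a hard-knee compressor (assumed model)."""
    over = input_db - threshold_db
    if over <= 0:
        return 0.0                  # below threshold: no gain change
    return over / ratio - over      # negative value = gain reduction

# A -10 dB passage through a 1.2:1 compressor with a -40 dB threshold
loud_gain = compressor_gain_db(-10.0, -40.0, 1.2)
```

With these numbers a passage 30 dB above the threshold is lowered by only about 5 dB, and one 10 dB above the threshold by less than 2 dB, so nearly the whole mix is being leveled, but only slightly.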
Other Dynamics Processors
A transient shaper lets you increase or decrease the attack portion of an instrument at the start of each new note or chord. This acts on only the
first brief portion of the sound, raising or lowering the level for a limited amount of time, so the initial strum of a guitar chord can be brought
out more or less than the sustained portion that follows. It works the same way for piano chords and especially percussion instruments. But a
transient shaper needs a clearly defined attack. If you use it on a recording of sustained strings or a synthesizer pad that swells slowly, a transient
shaper won’t be able to do much simply because there are no attack transients to trigger its level changes.
The tremolo effect raises and lowers the volume in a steady repeating cycle, and it’s been around forever. Many early Fender guitar amplifiers
included this effect, including the Tremolux, which offered controls to vary the depth, or amount of the effect, as well as the speed. Note that
many people confuse the term tremolo, a cyclic volume change, with vibrato, which repeatedly raises and lowers the pitch. A tremolo circuit is
fairly simple to design and build into a guitar amp, but vibrato is much more difficult and usually relies on DSP. Likewise, a whammy bar—also
called a vibrato bar or pitch bend bar—on an electric guitar lowers the pitch when pressed. But this is not tremolo, so calling this a tremolo bar
is also incorrect.
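In software, tremolo is just the audio multiplied by a slowly varying gain. The sketch below assumes a sine-shaped low-frequency oscillator, and its function and parameter names are made up for illustration. Vibrato would instead have to vary the pitch, typically with a modulated delay, which is not shown here.

```python
import math

def tremolo(samples, sample_rate, rate_hz, depth):
    """Apply a cyclic volume change; depth 0..1 sets how far the level dips."""
    out = []
    for n, x in enumerate(samples):
        # LFO swings smoothly between 0 and 1 at the tremolo rate.
        lfo = 0.5 * (1.0 - math.cos(2.0 * math.pi * rate_hz * n / sample_rate))
        gain = 1.0 - depth * lfo     # full level down to (1 - depth)
        out.append(x * gain)
    return out
```

The speed control corresponds to rate_hz and the depth control to depth. The pitch of the audio is never touched.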
An auto-panner is similar to a tremolo effect, except it adjusts the volume for the stereo channels symmetrically to pan the sound left and
right. Early models simply swept the panning back and forth repeatedly at a fixed rate, but more modern plug-in versions can sync to the tempo
of a DAW project. Some even offer a tap tempo feature, where you click a button with the mouse along with the beat of the song to set the left-right pan rate. If you need an auto-panning effect for only a short section of a song, it might be simpler to just add a track envelope and draw in
the panning manually.
Another type of dynamics processor is the volume maximizer, for lack of a more universal term. Like a limiter, this type of processor reduces
the level of peaks that exceed a threshold. But it does so without attack and release times. Rather, it searches for the zero-crossing boundaries
before and after the peak and lowers the volume for just that overly loud portion of the wave. Brief peaks in a Wave file often prevent
normalizing from raising the volume of a track to its full potential. Figure 9.4 shows a portion of a song file whose peak level hovers mostly
below −6 dB. But looking at the left channel on top, you can see five short peaks that are louder than −6 dB. So if you normalize this file, those
peaks will prevent normalizing from raising the level a full 6 dB.
Figure 9.4:
In this waveform of a completed mix, a few peaks extend briefly above the −6 dB marker. By reducing only those peaks, normalizing can then raise the volume to a higher level.
A volume maximizer reduces the level of only the peaks, so if you reduce the peaks to −6 dB, or lower for an even more aggressive volume
increase, you can then normalize the file to be louder. Figure 9.5 shows a close-up of the loudest peak (see marker) after applying the Peak
Slammer plug-in with its threshold set to −6 dB.
Figure 9.5:
After reducing all the peaks to −6 dB, you can then normalize the file to a louder volume. The marker shows the loudest peak near the center of the screen in Figure 9.4.
This approach lets you raise the level of a mix quite a lot, but without the pumping effect you often get with aggressive limiting. Note that it’s
easy to do this type of processing manually, too, unless there are many peaks to deal with. Simply zoom in on each peak and lower the volume
manually. As long as the range you select for processing is bounded by zero crossings, as shown in Figure 9.6, the amount of distortion added
will be minimal. There’s no free lunch, however, and some distortion is added by the process. Your ears will tell you if you’ve added too much
in the pursuit of more loudness.
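The zero-crossing approach can be sketched as follows. This is an illustration of the general idea only, not the algorithm used by any particular plug-in. The function name is hypothetical, samples are assumed to be floats in the range −1 to 1, and the ceiling is a linear level (0.5 is roughly −6 dB).

```python
def reduce_peaks(samples, ceiling):
    """Scale down any half-cycle whose peak exceeds `ceiling` (linear level).

    Each half-cycle is the run of samples between two zero crossings, so the
    gain change starts and ends where the waveform is at or near zero.
    """
    out = list(samples)
    n = len(out)
    start = 0
    while start < n:
        # Find the end of the current half-cycle (run of same-sign samples).
        end = start + 1
        while end < n and (out[end] >= 0) == (out[start] >= 0):
            end += 1
        peak = max(abs(v) for v in out[start:end])
        if peak > ceiling:
            scale = ceiling / peak
            for i in range(start, end):
                out[i] *= scale
        start = end
    return out
```

Half-cycles that stay under the ceiling are returned untouched, which is why this kind of processing changes so little of the file.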
Figure 9.6:
The highlighted area shows the same loudest peak from Figure 9.4, in preparation for reducing its volume manually.
Parallel compression is not a separate processor type but rather a different way to use a regular compressor. If you feed a compressor from an
auxiliary or subgroup bus, you can control the blend between the normal and compressed versions using the Return level from that bus. This lets
you apply a fairly severe amount of compression to get a smooth, even sound, but without squashing all the transients, as happens when the
signal passes only through the compressor. This can be useful on drums, electric bass, and even complete mixes. On a bass track it can add punch
similar to a long attack time described earlier, letting some of the initial burst through to increase articulation.
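A rough sketch of the blend is shown below. To keep it short, the compressor here acts on each sample's level instantly with no attack or release, which a real compressor would not do. The function names and the return_level parameter are assumptions for illustration.

```python
import math

def compress_sample(x, threshold_db, ratio):
    """Instantaneous hard-knee compression of one sample (simplified model)."""
    if x == 0.0:
        return 0.0
    level_db = 20.0 * math.log10(abs(x))
    over = level_db - threshold_db
    if over <= 0:
        return x
    gain_db = over / ratio - over          # negative: gain reduction
    return x * 10.0 ** (gain_db / 20.0)

def parallel_compress(samples, threshold_db, ratio, return_level):
    """Mix the dry signal with a heavily compressed copy from a bus return."""
    return [x + return_level * compress_sample(x, threshold_db, ratio)
            for x in samples]
```

In this model, soft material is raised by the full amount of the return, while loud peaks are raised only slightly because the compressed copy contributes little there. The dry path keeps the transients intact.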
Compressor Internals
Common to every compressor are a level detector and gain reduction module, as shown in the block diagram of Figure 9.7. The level detector
converts the incoming audio to an equivalent DC voltage, which in turn controls the volume of the gain reduction module. The Attack and Decay
time controls are part of the level detector’s conversion of audio to DC. The Threshold setting is simply a volume control for the level detector,
whereby raising the audio volume into the detector lowers the threshold. That is, sending more signal into the detector tells it that the audio is
too loud at a lower level.
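In software, the level detector is often modeled as a rectifier followed by a smoothing filter with separate attack and release coefficients. The sketch below is one common way to write that, not a description of any specific product. The function name and the exponential coefficient formula are assumptions.

```python
import math

def level_detector(samples, sample_rate, attack_s, release_s):
    """Convert audio to a slowly varying control signal (envelope follower)."""
    a_att = math.exp(-1.0 / (attack_s * sample_rate))
    a_rel = math.exp(-1.0 / (release_s * sample_rate))
    env = 0.0
    out = []
    for x in samples:
        rect = abs(x)                           # rectify: audio to a positive level
        coeff = a_att if rect > env else a_rel  # rising uses attack, falling release
        env = coeff * env + (1.0 - coeff) * rect
        out.append(env)
    return out
```

The output plays the role of the DC control voltage. A gain reduction stage would compare it against the threshold to decide how much to turn the audio down.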
Figure 9.7:
A compressor comprises four key components: an input amplifier, a level detector, a gain reduction module, and an output amplifier.
There are a number of ways to create a gain reduction module; the main trade-offs are noise and distortion and the ability to control attack
and release times. Early compressors used a lightbulb and photoresistor to vary the signal level. The audio is amplified to drive the lightbulb—it
doesn’t even need to be converted to DC—and the photoresistor diverts some of the audio to ground, as shown in Figure 9.8. This method works
very well, with very low distortion, but it takes time for a lightbulb to turn on and off, so very fast attack and release times are not possible. In
fact, most compressors that use a lightbulb don’t even offer attack and release time settings. You might think that an incandescent bulb can turn
on and off quickly, but that’s only when the full voltage is applied. When used in a compressor, most of the time the audio is so soft that the
bulb barely glows. At those levels the response times are fairly long.
Figure 9.8:
When light strikes a photoresistor its resistance is lowered, shunting some of the audio signal to ground.
To get around the relatively long on and off times of incandescent bulbs, some early compressors used electroluminescent lights. These green,
flat panels, also used as night-lights that glow in the dark, respond much more quickly than incandescent bulbs. This design is commonly called
an “optical” or “opto” compressor. Another type of gain module uses a field effect transistor (FET) in place of the photoresistor, with a DC
voltage controlling the FET’s resistance. A more sophisticated type of gain reduction module is the voltage controlled amplifier (VCA). This is
the same device that’s often used for console automation as described in Chapter 7, where a DC voltage controls the volume level.
Time Constants
Compressor and gate attack and decay times are based on an electrical concept known as time constants. Everything in nature takes some time to
occur; nothing happens immediately in zero seconds. If you charge a capacitor through a resistor, it begins to charge quickly, and then the
voltage increases more and more slowly as it approaches the maximum. Since it takes a relatively long time to reach the eventual destination,
electrical engineers consider charge and discharge times based on the mathematical constant e, the base value for natural logarithms whose value
is approximately 2.718. The same concept applies to attack and decay times in audio gear, where resistors and capacitors are used to control
those parameters. A hardware compressor typically uses a fixed capacitor value, so the knobs for attack and release times control variable
resistors (also called potentiometers).
If a limiter is set for an attack time of one second, and the audio input level rises above the threshold enough for the gain to be reduced by
10 dB, it won’t be reduced by the full 10 dB in that one second. Rather, it is reduced by about 6 dB, and the full 10 dB reduction isn’t reached
until some time later. The same applies for release times. If the release time is also one second, once the input level falls below the threshold,
the volume is raised about 6 dB over that one-second period, and full level isn’t restored until later. Figure 9.9 shows the relationship between
actual voltages and time constants. As you can see, the stated time is shorter than the total time required to reach the final value.
Figure 9.9:
The actual attack and decay times in electrical circuits are longer than their stated values.
For the sake of completeness, the charge and discharge times of a simple resistor/capacitor network are based on the constant e as follows:
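The standard relationships for a resistor/capacitor network are V(t) = Vmax × (1 − e^(−t/RC)) when charging and V(t) = Vmax × e^(−t/RC) when discharging, where the product RC is the time constant. The short sketch below evaluates them. The function names are hypothetical, and applying the curve directly to gain reduction in dB is an assumption made to match the figures quoted earlier.

```python
import math

def charge_fraction(t, tau):
    """Fraction of the final value reached t seconds into charging."""
    return 1.0 - math.exp(-t / tau)

def discharge_fraction(t, tau):
    """Fraction of the starting value remaining t seconds into discharging."""
    return math.exp(-t / tau)

# One time constant: about 63 percent reached, about 37 percent remaining.
after_one_tau = charge_fraction(1.0, 1.0)
left_after_one_tau = discharge_fraction(1.0, 1.0)

# A limiter asked for 10 dB of reduction with a one-second attack time
# (assumes the reduction in dB follows the RC charging curve).
reduction_after_attack_db = 10.0 * charge_fraction(1.0, 1.0)
```

After one time constant the result is about 63 percent of the way there, which is where the figure of roughly 6 dB out of 10 dB comes from. It takes about five time constants to get within 1 percent of the final value.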
To illustrate this concept in more practical terms, I created a 1 kHz sine wave at three volume levels, then repeated that block and processed
the copy with a compressor having a high ratio. Figure 9.10 shows the Wave file containing both the original and processed versions. As you can
see, the volume in the second block isn’t reduced immediately after it jumps up to a level of −1, so a brief burst gets through before the level is
reduced to −8 dB. Then, after the input level drops down to −20, it takes some amount of time for the volume to be restored. Figure 9.11
shows the compressor screen with 100 ms attack and 500 ms release time settings. The ratio is set high at 20:1 to apply hard limiting, making it
easier to understand the expected gain changes.
Figure 9.10:
This Wave file contains a 1 kHz tone at three different volume levels for two seconds each. Then that block of three tones was repeated and compressed, to see how the
actual attack and decay times relate to the specified times.
Figure 9.11:
The compressor used for this test has an attack time of 100 milliseconds and a release time of 500 milliseconds.
Figure 9.12 shows a close-up of the attack portion to see how long the compressor took to reduce the volume. Figure 9.13 shows a close-up of
the release portion, and again the volume is not restored fully until after the specified release time.
Figure 9.12:
When the attack time is set for 100 milliseconds, the volume is reduced to only 37 percent in that period. The full amount of gain reduction isn’t reached until more
than twice that much time.
Figure 9.13:
Even though the release time is set to 500 milliseconds, the volume is restored to only 63 percent during that time.
This chapter explains how compressors came to be invented and describes the most common parameters used by hardware and software
versions. A compressor can be used as a tool to make volume levels more consistent, or as an effect to add power and fullness to a single track
or complete mix. It can also be used to counter an instrument’s normal decay to increase its apparent sustain. When both the attack and release
times are very quick, the resulting “aggressive” compression adds distortion at low frequencies that can be pleasing on some musical sources.
When using a compressor, the most important controls are the compression ratio and threshold. If the threshold is not set low enough, a
compressor will do nothing. Likewise, the ratio must be greater than 1:1 in order for the volume to be reduced. When adjusting a compressor,
you can verify how hard it’s working with the gain reduction meter. You also learned some common pitfalls of compression: Compression
always raises the noise floor, which in turn amplifies clicks from bad edits and other sounds such as breath noises and chair squeaks.
A multiband compressor lets you establish a frequency balance, which it then enforces by varying the volume levels independently in each
band. But as useful as compressors can be, sometimes it’s easier and more direct to simply adjust the volume manually using volume envelopes
in your DAW software. This is especially true if the corrections are “surgical” in nature, such as controlling a very brief event.
Noise gates are useful for muting tracks when the performer isn’t singing or playing, though as with compressors, it’s often better and easier to
just apply volume changes with envelopes. But gates are still useful for live sound mixing and can be used to create special effects such as gated reverb.
A compressor or gate that has a side-chain input lets you control the volume of one audio source based on the volume of another. This is used
to automatically reduce the level of background music when an announcer speaks, but it can also be applied in other creative ways.
Other dynamics processors include transient shapers, the tremolo effect, volume maximizers, and parallel compression. In particular, a volume
maximizer can make a mix louder, while avoiding the pumping sound you often get with compression. Finally, this chapter explains the internal
workings of compressors, including a block diagram and an explanation of time constants.
Chapter 10
Frequency Processors
The equalizer—often shortened to EQ—is the most common type of frequency processor. An equalizer can take many forms, from as simple as
the bass and treble knobs on your home or car stereo to multiband and parametric EQ sections on a mixing console or a graphic equalizer in a
PA system. Equalizers were first developed for use with telephone systems to help counter high-frequency losses due to the inherent capacitance
of very long wires. Today, EQ is an important tool that’s used to solve many specific problems, as well as being a key ingredient in a mix
engineer’s creative palette.
Classical composers, professional orchestrators, and big band arrangers use their knowledge of instrument frequencies and overtones to create
tonal textures and to prevent instruments from competing with one another for sonic space, which would lose clarity. Often mix engineers must
use EQ in a similar manner to compensate for poor musical arranging, ensuring that instruments and voices will be clearly heard. But EQ is just
as often used to enhance the sound of individual instruments and mixes to add sparkle or fullness, helping to make the music sound even better
than reality.
Equalizer Types
There are four basic types of equalizers. The simplest is the standard bass and treble shelving controls found on home hi-fi receivers, with boost
and cut frequency ranges similar to those shown in Figure 10.1. The most commonly used circuit for bass and treble tone controls was developed
in the 1950s by P. J. Baxandall, and that design remains popular to this day. Some receivers and amplifiers add a midrange control (often called
presence) to boost or cut the middle range at whatever center frequency and Q are deemed appropriate by the manufacturer.
Figure 10.1:
Treble and bass shelving controls boost or cut a broad range of frequencies, starting with minimal boost or cut in the midrange, and extending to the frequency
extremes. This graph simultaneously shows the maximum boost and cut available at both high and low frequencies to convey the range of frequencies affected.
A graphic equalizer divides the audible spectrum into five or more frequency bands and allows adjusting each band independently via a
boost/cut control. Common arrangements are 5 bands (each two octaves wide), 10 bands (one octave wide), 16 bands (2/3 octave), and 27 to 32
bands (1/3 octave). This type of EQ is called graphic because adjusting the front-panel sliders shows an approximate display of the resulting
frequency response. So instead of broad adjustments for treble, bass, and maybe midrange, a graphic EQ has independent control over the low
bass, mid-bass, high bass, low midrange, and so forth. Third-octave graphic EQs are often used with PA systems to improve the overall frequency
balance, though graphic equalizers are not often used in studios on individual tracks, or a mix bus, or when mastering. However, experimenting
with a ten-band unit such as the plug-in shown in Figure 10.2 is a good way to become familiar with the sound of the various frequency ranges.
Figure 10.2:
Graphic equalizers offer one boost or cut control for each available band.
Console equalizers typically have three or four bands, with switches or continuous knobs to vary the frequency and boost/cut amounts. Often
the highest- and lowest-frequency bands can also be switched between peaking and shelving. Professional consoles usually employ switches to
select between fixed frequencies, and some also use switches for the boost and cut amounts in 1 dB steps. While not as flexible as continuous
controls, switches let you note a frequency and recall it exactly at a later session.
The console equalizer in Figure 10.3 shows a typical model with rotary switches to set the frequencies in repeatable steps and continuously
variable resistors to select any boost or cut amount between −15 and +15 dB. Building a hardware EQ with rotary switches costs much more
than using variable resistors—also called potentiometers, or simply pots. A potentiometer is a relatively simple component, but a high-quality
rotary switch having ten or more positions is very expensive. Further, when a switch is used, expensive precision resistors are needed for each
available setting, plus all those resistors must be soldered to the switch, adding more to the cost.
Figure 10.3:
This typical console EQ has four bands with rotary switches to vary the frequencies and potentiometers to control the boost and cut amounts. The low and high bands
can be switched between shelving and peaking, and a low-cut filter is also offered.
The most powerful equalizer type is the parametric, named because it lets you vary every EQ parameter—frequency, boost or cut amount, and
bandwidth or Q. Many console EQs are “semi-parametric,” providing a selection of several fixed frequencies and either no bandwidth control or
a simple wide/narrow switch. A fully parametric equalizer allows adjustment of all parameters continuously rather than in fixed steps. This
makes it ideal for “surgical” correction, such as cutting a single tone from a ringing snare drum with minimal change at surrounding frequencies
or boosting a fundamental frequency to increase fullness. The downside is it’s more difficult to recall settings exactly later, though this, of course,
doesn’t apply with software plug-in equalizers. Figure 10.4 shows a typical single-channel hardware parametric equalizer, and the Sonitus EQ
plug-in I use is in Figure 10.5.
Figure 10.4:
The parametric EQ is the most powerful type because you can choose any frequency, bandwidth, and boost/cut amount for every band.
Figure 10.5:
A plug-in parametric EQ has the same features as a hardware model, but it can also save and recall every setting.
The Sonitus EQ offers six independent bands, and each band can be switched between peaking, high- or low-frequency shelving, high-pass,
and low-pass. The high-pass and low-pass filters roll off at 6 dB per octave, so you can obtain any slope by activating more than one band. For
example, to roll off low frequencies at 12 dB per octave starting at 100 Hz, you’d set Band 1 and Band 2 to high-pass, using the same 100 Hz
frequency for both bands. You can also control the steepness of the slope around the crossover frequency using the Q setting.
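The way slopes add when bands are stacked can be checked numerically. The sketch below assumes ideal first-order (6 dB per octave) high-pass sections with identical cutoffs. It says nothing about how any particular plug-in implements its filters, and the function name is made up for illustration.

```python
import math

def highpass_db(freq_hz, cutoff_hz, sections=1):
    """Response in dB of `sections` identical first-order high-pass filters."""
    r = freq_hz / cutoff_hz
    magnitude = r / math.sqrt(1.0 + r * r)   # one ideal first-order section
    return sections * 20.0 * math.log10(magnitude)

# Slope between 12.5 Hz and 25 Hz, well below a 100 Hz cutoff
one_band = highpass_db(25.0, 100.0, 1) - highpass_db(12.5, 100.0, 1)
two_bands = highpass_db(25.0, 100.0, 2) - highpass_db(12.5, 100.0, 2)
```

Well below the cutoff, one section falls at close to 6 dB per octave and two sections at close to 12 dB. Note that in this model the response at the cutoff frequency itself also doubles, from about −3 dB to about −6 dB.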
There’s also a hybrid equalizer called paragraphic. This type has multiple bands like a graphic EQ, but you can control the frequency of each
band. Some let you vary the Q as well. In that case, there’s no real difference between a paragraphic EQ and a parametric EQ having the same
number of frequency bands.
Finally, there are equalizers that can learn the sound of one audio source and change another source automatically to have a similar frequency
response. The first such automatic equalizer I’m aware of was Steinberg’s FreeFilter plug-in, which analyzes the frequency balance of a source
file, then applies EQ in third-octave bands to the file being EQ’d to match. Har-Bal (Harmonic Balance) is a more recent product, and it’s even
more sophisticated because it analyzes the audio at a higher resolution and also considers the difference between peak and average volume
levels in the source and destination. Both products let you limit the amount of EQ applied to avoid adding an unreasonable amount of boost or
cut if frequencies present in one source don’t exist at all in the other.
All Equalizers (Should) Sound the Same
Years ago, before parametric equalizers were common in even entry-level setups, most console equalizers offered a limited number of fixed
frequencies that were selected via switches. The Q was also fixed at whatever the designer felt was appropriate and “musical” sounding. So in
those days there were audible and measurable differences in the sound of various equalizer brands. One model might feature a certain choice of
fixed frequencies at a given Q, while others offered a different set of frequencies.
Console equalizers manufactured by British companies have had a reputation for better sound quality than those from other countries, though in my
view this is silly. Indeed, many veteran recording engineers will tell you that the equalizers in British consoles all sound different; some love the
sound of an SSL equalizer but hate the Trident, and vice versa. To my mind, this refutes the notion that “British EQ” is inherently superior. If
British console equalizers are so different, what aspect of their sound binds them together to produce a commonality called “British”? One
possible explanation is that peaking equalizers in some British recording consoles have a broader bandwidth than the EQs in some American
consoles, so more boost can be applied without making music sound unnatural. But now that parametric equalizers are no longer exotic or
expensive, all of that goes away. So whatever one might describe as the sound of “British EQ” can be duplicated exactly using any fully
parametric equalizer.
In my opinion, an equalizer should not have an inherent “sound” beyond the frequency response changes it applies. Different EQ circuits
might sound different as they approach clipping, but sensible engineers don’t normally operate at those levels. Some older equalizer designs use
inductors, which can ring and add audible distortion. However, most modern equalizer designs use op-amps and capacitors because of these
problems with real inductors, not to mention their high cost and susceptibility to picking up airborne hum.
Finally, I can’t resist dismissing the claim by some EQ manufacturers that their products sound better because they include an “air band” at
some ultrasonic frequency. In truth, what you’re really hearing is a change within the audible band due to the EQ’s finite bandwidth. Unless the
bandwidth is extremely narrow, which it obviously isn’t, applying a boost at 25 kHz also boosts to some extent the audible frequencies below 20 kHz.
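To put a rough number on this, the standard second-order analog peaking response (the prototype given in Robert Bristow-Johnson's Audio EQ Cookbook) can be evaluated at audible frequencies. The 6 dB boost and Q of 1 used below are arbitrary assumptions chosen for illustration, not the settings of any real “air band” product, and the function name is hypothetical.

```python
import math

def peaking_boost_db(freq_hz, center_hz, boost_db, q):
    """Gain in dB of an analog peaking EQ at freq_hz (textbook prototype)."""
    a = 10.0 ** (boost_db / 40.0)
    s = complex(0.0, freq_hz / center_hz)     # normalized frequency, s = j*w
    h = (s * s + s * (a / q) + 1.0) / (s * s + s / (a * q) + 1.0)
    return 20.0 * math.log10(abs(h))

# 6 dB boost centered at 25 kHz with Q = 1 (assumed values)
at_20k = peaking_boost_db(20000.0, 25000.0, 6.0, 1.0)
at_10k = peaking_boost_db(10000.0, 25000.0, 6.0, 1.0)
```

With these assumed settings, a boost centered at 25 kHz still raises 20 kHz by roughly 5 dB and 10 kHz by about 1 dB, so most of what is heard lies inside the audible band.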
Digital Equalizers
“Ultimately, skeptics and believers have the same goal: to separate fact from fiction. The main difference between these two camps is what evidence they deem acceptable.”
Contrary to what you might read in magazine ads, many parametric EQ plug-ins are exactly the same and use the same standard “textbook”
algorithms. Magazine reviews proclaiming this or that plug-in to be more “musical sounding” or to “add sweeter highs” than other models
further prove the points made in Chapter 3 about the frailty of our hearing perception. However, some digital equalizers add intentional
distortion or other noises to emulate analog designs, and those might sound different unless you disable the extra effects.
In 2009, the online audio forums exploded after someone posted their results comparing many EQ plug-ins ranging from freeware VSTs to
very expensive brands. The tests showed that most of the EQs nulled against each other to silence, as long as extra effects such as distortion or
“vintage mode” were disabled. As you know, when two signals null to silence, or very nearly so, then by definition they sound identical.
However, some equalizers that are otherwise identical may not sound the same, or null against each other, because there’s no industry standard
for how the Q of a peaking boost and cut is defined. Chapter 1 explained that the bandwidth of a filter is defined by the −3 dB points, but that’s
for plain filters. Unlike a band-pass filter that continues to roll off frequencies above and below its cutoff, an equalizer has a flat response at
frequencies other than those you boost or cut. And equalizers can boost or cut less than 3 dB, so in that case there’s no −3 dB point to reference.
In the comparison of many plug-in EQs, the author noted in his report that simply typing the same numbers into different equalizers didn’t
always give the same results. He had to vary the Q values to achieve the most complete null.
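To see why typing the same numbers into two equalizers may not give a complete null, here is a minimal Python sketch. It assumes both equalizers use one common textbook formulation of a peaking filter (the widely circulated “Audio EQ Cookbook” biquad), and that the second one simply interprets the Q setting differently, modeled here by a hypothetical Q of 1.4 instead of 1.0. Neither assumption comes from the report described above, and the function names are made up for illustration. Rather than subtracting audio files, the sketch compares the two frequency responses directly, which gives the level of the residual for a sine wave at each test frequency.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, fs=48000.0):
    """Coefficients for a peaking EQ using the common 'cookbook' formulas."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a

def response(coeffs, f, fs=48000.0):
    """Complex frequency response of a biquad at frequency f in Hz."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 evaluated on the unit circle
    z2 = z1 * z1
    return (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)

def null_depth_db(eq_a, eq_b, freqs, fs=48000.0):
    """Worst-case level of (A minus B) relative to A, in dB, over the test frequencies."""
    worst = 0.0
    for f in freqs:
        ha = response(eq_a, f, fs)
        hb = response(eq_b, f, fs)
        worst = max(worst, abs(ha - hb) / abs(ha))
    return -math.inf if worst == 0.0 else 20.0 * math.log10(worst)

# Third-octave test frequencies from 20 Hz to about 20 KHz.
freqs = [20.0 * 2.0 ** (k / 3.0) for k in range(31)]

reference = peaking_biquad(1000.0, 6.0, 1.0)
same_q = peaking_biquad(1000.0, 6.0, 1.0)
other_q = peaking_biquad(1000.0, 6.0, 1.4)   # hypothetical: same knob values, different Q convention

print(null_depth_db(reference, same_q, freqs))    # identical settings: a complete null
print(null_depth_db(reference, other_q, freqs))   # mismatched Q: a clearly audible residual
```

Both filters give exactly the requested boost at the center frequency, so the residual comes entirely from the skirts on either side. That is consistent with having to vary the Q values to achieve the most complete null.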
The same applies to many hardware equalizers: There are only so many ways to build a parametric EQ. Design engineers tend to use and
reuse the most elegant and efficient designs. Indeed, while analog circuit design (and computer programming) are both based on science, there’s
also an artistic component. An elegant hardware design accomplishes the most using the fewest number of parts, while drawing the least amount
of current from the power supply. For software, the goal is smaller code that’s also efficient to minimize taxing the CPU. The best analog and
digital designs are often in college textbooks or application guides published by component manufacturers, and any enterprising individual can
freely copy and manufacture a commercial product based on those designs. This is not to say that all hardware equalizers use identical circuitry,
but many models are similar.
Another common audio myth claims that good equalizers require less boost or cut to achieve a given effect, compared to “poorer” EQ designs.
This simply is not true. How much the sound of a track or mix is changed depends entirely on the amount of boost or cut applied, and the Q or
bandwidth. As explained, plug-in equalizers may interpret the Q you specify differently from one brand and model to another. But whatever the
Q really is, that is what determines how audible a given amount of boost or cut will be. The notion that linear phase equalizers achieve more of
an effect with less boost and cut than standard minimum phase types is also unfounded. Indeed, how audio responds to EQ depends entirely on
the frequencies it contains. I’ll have more to say about linear phase equalizers shortly.
EQ Techniques
“At the same time we said we wanted to try for a Grammy, we also said we didn’t want to use any EQ. That lasted about eight hours.”
—Ken Caillat, engineer on Fleetwood Mac’s Rumours album, interviewed in the October 1975 issue of Recording Engineer/Producer Magazine
The art of equalization is identifying what should be changed in order to improve the sound. Anyone can tell when a track or complete mix
sounds bad, but it takes talent and practice to know what EQ settings to apply. As with compressors, it’s important to match volume levels when
comparing the sound of EQ active versus bypassed. Most equalizers include both a volume control and a bypass switch, making such
comparisons easier. It’s difficult to explain in words how to best use an equalizer, but I can convey the basic concepts, and the equalization demo
video that accompanies this book lets you hear equalizers at work.
You should avoid the urge to boost low frequencies too much because small speakers and TV sets can’t reproduce such content and will just
distort when the music is played loudly. The same applies for extreme high frequencies. If your mix depends on a substantial boost at very low
or very high frequencies to sound right to you, it’s probably best to change something else. Often, it’s better to identify what’s harming the sound
and remove it rather than look for frequencies that sound good when boosted. For example, cutting excessive low frequencies better reveals the
important midrange frequencies that define the character of an instrument or voice.
Standard mixing practice adds a low-cut filter to thin out tracks that do not benefit from low frequencies. This avoids mud generally and
reduces frequency clashes with the bass or kick drum due to masking. Cutting lows or mids is often better than boosting highs to achieve clarity.
However, adding an overall high-frequency boost can be useful on some types of instruments, such as an acoustic guitar or snare drum. This is
usually done with a broad peaking boost around 4 KHz or higher, depending on the instrument. A shelving boost can work, too, but that always
adds maximum boost at the highest frequencies, which increases preamp hiss and other noise at frequencies that may not be present in the
The same applies for overall low-frequency boost to add fullness. It’s easy to add too much, and you risk raising muddy frequencies and other
low-level subsonic content. Further, if your loudspeakers don’t extend low enough, you’re not hearing everything in the track or mix. So always
verify a final mix with earphones, unless your speakers can play to below 40 Hz. Adding too much low end is a common problem with mixes
made on the popular Yamaha NS-10 loudspeakers. These speakers are 3 dB down at 100 Hz, and their response below that falls off quickly at
12 dB per octave. So if you boost 50 Hz to increase fullness while listening through NS-10s, by the time you hear the fullness you want, you’ve
added 12 dB more boost than you should have. Some engineers watch the speaker cone of the NS-10 to spot excess low-frequency content,
though using full-range speakers seems more sensible to me.
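The NS-10 arithmetic can be checked with a few lines of Python. This sketch assumes an idealized straight-line slope of 12 dB per octave below 100 Hz, which simplifies the real speaker’s response, and the function name is hypothetical.

```python
import math

def extra_loss_db(freq_hz, corner_hz=100.0, slope_db_per_octave=12.0):
    """Additional loss below the corner frequency, relative to the level at the corner."""
    octaves_below = max(0.0, math.log2(corner_hz / freq_hz))
    return slope_db_per_octave * octaves_below

print(extra_loss_db(50.0))   # one octave below 100 Hz
print(extra_loss_db(25.0))   # two octaves below 100 Hz
```

Because 50 Hz is one octave below 100 Hz, the speaker hides about 12 dB of whatever you add there, which is the excess boost described above.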
As mentioned in Chapter 7, adding high end helps to bring a track forward in the mix, and reducing high frequencies pushes it farther back.
This mimics the absorption of air that reduces high frequencies more than lows with distance. For example, an outdoor rock concert heard from
far away consists mostly of low frequencies. So when combined with an appropriate amount of artificial reverb, you can effectively make
instruments seem closer or farther away. A lead vocal should usually be up front and present sounding, while background vocals are usually
farther back.
The “equalization” video lets you hear EQ applied to a variety of sources including several isolated tracks, plus a complete mix of a tune by
my friend Ed Dzubak, a professional composer who specializes in soundtracks for TV shows and movies.
Boosting versus Cutting
“The instinctive way to use a parametric equalizer is to set it for some amount of boost, then twiddle the frequency until the instrument or mix sounds better. But often what’s
wrong with a track is frequency content that sounds bad. A better way to improve many tracks is to set the EQ to boost, then sweep the frequency to find the range that sounds
worst. Once you find that frequency, set the boost back to zero, wait a moment so your ears get used to the original timbre, then cut that range until it sounds better.”
—Ethan Winer, from a letter to Electronic Musician magazine, December 1993
One of the most frequently asked questions I see in audio forums is whether it’s better to boost or to cut. Replies often cite phase shift as a
reason why cutting is better, but Chapters 1 through 3 explained why that’s not true. All equalizers use phase shift, and the amount of phase shift
added by typical equalizers is not audible. So the real answer to which is better is both—or neither. Whether boosting or cutting is better
depends on the particular problem you’re trying to solve. If this wasn’t true, then all equalizers would offer only boost, or only cut.
With midrange EQ, a low Q (wide bandwidth) lets you make a large change in the sound quality with less boost or cut, and without making
the track sound nasal or otherwise affected. A high Q boost always adds resonance and ringing. Now, this might be useful to, for example, bring
out the low tone of a snare drum by zeroing in on that one frequency while applying a fair amount of boost. When I recorded the audio for my
Tele-Vision music video, one of the tracks was me playing a Djembe drum. I’m not much of a drummer, and my touch is not very good. So I
didn’t get enough of the sustained ringing tone that Djembes are known for. While mixing the track, I swept a parametric EQ with a high Q to
find the fundamental pitch of the drum, then boosted that frequency a few dB. This created a nice, round sound, as the dull thud of my inept
hand slaps became more like pure ringing tones. You can hear this in the “equalization” demo video.
Often when something doesn’t sound bright enough, the real culprit is a nasal or boxy sounding midrange. When you find and reduce that
midrange resonance by sweeping a boost to find what frequencies sound worst, the sound then becomes brighter and clearer, and often fuller by
comparison. Today, this is a common technique—often called “surgical” EQ—where a narrow cut is applied to reduce a bad-sounding resonance
or other artifact. Indeed, identifying and eliminating offensive resonances is one of the most important skills mixing and mastering engineers
must learn.
So it’s not that cutting is generally preferred to boosting, or vice versa, but rather identifying the real problem, and often this is a low-end
buildup due to having many tracks that were close-mic’d. Cutting boxy frequencies usually improves the sound more than boosting highs and
lows, but not all cuts need a narrow bandwidth. For example, a medium bandwidth cut around 300 Hz often works well to improve the sound
of a tom drum. This is also demonstrated in the equalization video. But sometimes a broad peaking boost, or a shelving boost, is exactly what’s
needed. I wish I could give you a list of simple rules to follow, but it just doesn’t work that way. However, I can tell you that cutting resonance is
best done with a narrow high Q setting, while boosting usually sounds better using a low Q or with a shelving curve.
Common EQ Frequencies
Table 10.1 lists some common instruments with frequencies at which boost or cut can be applied to cure various problems or to improve the
perceived sound quality. The indicated frequencies are necessarily approximate because every instrument sounds different, and different styles of
music call for different treatments. These equalization frequencies for various instruments are merely suggested starting points. The Comments
column gives cautions or observations based on experience. These should be taken as guidelines rather than prescriptions because every situation
is different, and mixing engineers often have different sonic goals. Additional advice includes the following:
• Your memory is shorter than you think; return to a flat setting now and then to remind yourself where you began.
• Make side-by-side comparisons against commercial releases of similar types of music to help judge the overall blend and tonality.
• You can alter the sound of an instrument only so much without losing its identity.
• Every instrument can’t be full, deep, bright, sparkly, and so forth all at once. Leave room for contrast.
• Take a break once in a while. Critical listening tends to numb your senses, especially if you listen for long periods of time at loud volumes. The
sound quality of a mix may seem very different to you the next day.
• Don’t be afraid to experiment or to try extreme settings when required.
Table 10.1: Boost or Cut Frequencies for Different Instruments.
Mixes That Sound Great Loud
—Liner note on the 1970 album Climbing by the band Mountain featuring Leslie West
This is a standard test for me when mixing: If I turn it up really loud, does it still sound good? The first recording I ever heard that sounded
fantastic when played very loud was The Yes Album by the band Yes in 1971. Not that it sounded bad at lower volumes! But for me, that was a
breakthrough record that set new standards for mixing pop music and showed how amazing a recording could sound.
One factor is the bass level. I don’t mean the bass instrument, but the amount of content below around 100 Hz. You simply have to make the
bass and kick a bit on the thin side to be able to play a mix loudly without sounding tubby or distorting the speakers. Then when the music is
played really loud, the fullness kicks in, as per Fletcher-Munson. If you listen to the 1978 recording of The War of the Worlds by Jeff Wayne, the
tonality is surprisingly thin, yet it sounds fantastic anyway. You also have to decide if the bass or the kick drum will provide most of the fullness
for the bottom end. There’s no firm rule, of course, but often with a good mix, the kick is thin-sounding while the bass is full, or the bass is on
the thin side and the kick is full with more thump and less click.
Another factor is the harshness range around 2 to 4 KHz. A good mix will be very controlled in this frequency range, again letting you turn up
the volume without sounding piercing or fatiguing. I watch a lot of concert DVDs, and the ones that sound best to me are never harsh in that
frequency range. You should be able to play a mix at realistic concert levels—say, 100 dB SPL—and it should sound clear and full but never hurt
your ears. You should also be able to play it at 75 dB SPL without losing low midrange “warmth” due to a less than ideal arrangement or
limited instrumentation. A five-piece rock band sounds great playing live at full stage volume, but a well-made recording of that band played at
a living room level sounds thin. Many successful recordings include parts you don’t necessarily hear but that still fill out the sound at low
Complementary EQ
Complementary EQ is an important technique that lets you single out an instrument or voice to be heard clearly above everything else in a mix.
The basic idea is to boost a range of frequencies for the track you want to feature and cut that same range by the same amount from other tracks
that have energy in that frequency range. This is sometimes referred to as “carving out space” for the track you want to feature. A typical
complementary boost and cut amount is 2 or 3 dB for each. Obviously, using too much boost and cut will affect the overall tonality. But modest
amounts can be applied with little change to the sound, while still making the featured track stand out clearly.
When doing this with spoken voice over a music bed, you’ll boost a midrange frequency band for the voice and cut the same range by the
same amount on the backing music. Depending on the character of the voice and music, experiment with frequencies between 600 Hz and
2 KHz, with a Q somewhere around 1 or 2. To feature one instrument or voice in a complete mix, it’s easiest to set up a bus for all the tracks
that will be cut. So you’ll send everything but the lead singer to a bus and add an EQ to that bus. Then EQ the vocal track with 2 dB of boost at
1.5 KHz or whatever, and apply an opposite cut to the bus EQ.
This also works very well with bass tracks. In that case you’d boost and cut somewhere between 100 and 600 Hz, depending on the type of
bass tone you want to bring out. Again, there’s no set formula because every song and every mix is different. But the concept is powerful, and it
works very well. As mentioned in Chapter 9, I used to do a lot of advertising voice-over recording. With complementary EQ, the music can be
made loud enough to be satisfying, yet every word spoken can be understood clearly without straining. Ad agencies like that.
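The effect of a complementary boost and cut can be put into numbers with a short sketch. It assumes a common textbook peaking filter (the “Audio EQ Cookbook” formulas) for both the vocal EQ and the bus EQ, with the 2 dB amount and 1.5 KHz frequency mentioned above and a Q of 1.5 picked from the suggested range. The function names are hypothetical, and a real equalizer may shape its curve somewhat differently.

```python
import cmath
import math

def peaking_gain_db(f, f0, gain_db, q, fs=48000.0):
    """Gain in dB at frequency f for a cookbook-style peaking filter centered at f0."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    z1 = cmath.exp(-2j * math.pi * f / fs)
    z2 = z1 * z1
    num = (1.0 + alpha * a_lin) - 2.0 * cos_w0 * z1 + (1.0 - alpha * a_lin) * z2
    den = (1.0 + alpha / a_lin) - 2.0 * cos_w0 * z1 + (1.0 - alpha / a_lin) * z2
    return 20.0 * math.log10(abs(num / den))

def vocal_advantage_db(f, f0=1500.0, amount_db=2.0, q=1.5):
    """How far the vocal is lifted relative to the backing bus at frequency f."""
    boost = peaking_gain_db(f, f0, +amount_db, q)   # EQ on the vocal track
    cut = peaking_gain_db(f, f0, -amount_db, q)     # opposite EQ on the bus
    return boost - cut

for f in (60.0, 500.0, 1500.0, 5000.0):
    print(f, round(vocal_advantage_db(f), 2))
```

At the center frequency the two modest 2 dB moves add up to a 4 dB advantage for the vocal, while far from that frequency both tracks are left almost untouched.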
Mid/Side Equalization
Another useful EQ technique is Mid/Side processing, which is directly related to the Mid/Side microphone technique described in Chapter 6. A
left and right stereo track is dissected into its equivalent mid and side components. These are equalized separately, then combined back to left
and right stereo channels. This is a common technique used by mastering engineers to bring out a lead vocal when they don’t have access to the
original multitrack recording. To bring out a lead vocal that’s panned to the center of a mix, you’d equalize just the mid component where the
lead vocal is more prominent. You could also use complementary EQ to cut the same midrange frequencies from the side portion if needed.
Since Mid/Side EQ lets you process the center and side information separately, you can also more easily EQ the bass and kick drum on a
finished mix, with less change to everything else. Likewise, boosting high frequencies on the side channels can make a mix seem wider, because
only the left and right sides are brightened. The concept of processing mid and side components separately dates back to vinyl disk mastering
where it was often necessary to compress the center material in a mix after centering the bass. The Fairchild 670 mastering limiter worked in
this manner. Extending the concept to equalization evolved as a contemporary mastering tool. Often adding 1–3 dB low-end boost only to the
sides adds a nice warmth to a mix without adding mud. Although Mid/Side equalization is mainly a mastering tool, you may find it useful on
individual stereo tracks—for example, to adjust the apparent width of a piano.
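The conversion between left/right and mid/side is simple sum and difference arithmetic, sketched below in Python. The scaling by one half is an assumption (other scalings are also in use), the function names are hypothetical, and a plain gain on the side signal stands in for whatever EQ you would actually apply.

```python
def encode_ms(left, right):
    """Split left/right samples into mid (sum) and side (difference) components."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def decode_ms(mid, side):
    """Combine mid and side components back into left and right channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def adjust_width(left, right, side_gain):
    """Process only the side component; here a simple gain stands in for an EQ."""
    mid, side = encode_ms(left, right)
    side = [s * side_gain for s in side]
    return decode_ms(mid, side)

left = [1.0, 0.5, -0.25]
right = [1.0, -0.5, 0.75]

print(decode_ms(*encode_ms(left, right)))   # round trip returns the original channels
print(adjust_width(left, right, 0.0))       # no side signal: both channels become mono
print(adjust_width(left, right, 1.5))       # more side signal: a wider image
```

Anything panned dead center, such as a lead vocal, is identical in both channels, so it lands entirely in the mid signal and is untouched by processing applied only to the side.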
Extreme EQ
“A mixer’s job is very simply to do whatever it takes to make a mix sound great. For me, there are no arbitrary lines not to cross.”
—Legendary mix engineer Charles Dye
Conventional recording wisdom says you should always aim to capture the best sound quality at the source, usually by placing the
microphones optimally, versus “Fix it in the mix.” Of course, nobody will argue with that. Poor sound due to comb filtering is especially difficult
to counter later with EQ because there are so many peaks and nulls to contend with. But mix engineers don’t usually have the luxury of
rerecording, and often you have to work with tracks as they are.
Some projects simply require extreme measures. When I was mixing my Cello Rondo music video, I had to turn 37 tracks of the same
instrument into a complete pop tune with bright-sounding percussion, a full-sounding bass track, and everything in between. Some of the effects
needed were so extreme, I showed the plug-in settings at the end of the video so viewers could see what was done. Not just extreme EQ, but also
extreme compression to turn a cello’s fast-decaying pizzicato into chords that sustain more like an electric guitar. My cello’s fundamental
resonance is around 95 Hz, and many of the tracks required severely cutting that frequency with EQ to avoid a muddy mess. The “sonar_rondo”
video shows many of the plug-in settings I used.
I recall a recording session I did for a pop band where extreme EQ was needed for a ham-fisted piano player who pounded out his entire part
in the muddy octave below middle C. But even good piano players can benefit from extreme EQ. Listen to the acoustic piano on some of the
harder-rocking early Elton John recordings, and you’ll notice the piano is quite thin-sounding. This is often needed in pop music to prevent the
piano from conflicting with the bass and drums.
Don’t be afraid to use extreme EQ when needed; again, much of music mixing is also sound design. It seems to me that for a lot of pop music,
getting good snare and kick drum sounds is at least 50 percent sound design via EQ and sometimes compression. As they say, if it sounds good, it
is good. In his educational DVD Mix It Like a Record, Charles Dye takes a perfectly competent multitrack recording of a pop tune and turns it
into a masterpiece using only plug-ins—a lot of them! Some of the EQ he applied could be considered over the top, but what really matters is
the result.
Linear Phase Equalizers
As mentioned in Chapter 1, most equalizers rely on phase shift, which is benign and necessary. Whether implemented in hardware or software,
standard equalizers use a design known as minimum phase, where an original signal is combined with a copy of itself after applying phase shift.
In this case, “minimum phase” describes the minimum amount of phase shift needed to achieve the required response. Equalizers simply won’t
work without phase shift. But many people indict phase shift as evil anyway. In a misguided effort to avoid phase shift, or perhaps for purely
marketing reasons, software developers created linear phase equalizers. These delay the audio the same amount for all frequencies, rather than
selectively by frequency via phase shift as with regular equalizers.
You’ll recall from Chapter 1 that comb filter effects can be created using either phase shift or time delay. A flanger effect uses time delay,
while phaser effects and some types of stereo synthesizers instead use phase shift. Equalizers can likewise be designed using either method to
alter the frequency response. A minimum phase filter emulates phase shift that occurs naturally, and any ringing that results from boosting
frequencies occurs after the signal. If you strike a resonant object such as a drum, ringing caused by its resonance occurs after you strike it, then
decays over time. But linear phase filters use pure delay instead of phase shift, and the delay as implemented causes ringing to occur before
audio events. This phenomenon is known as preringing, and it’s audibly more damaging than regular ringing because the ringing is not masked
by the event itself. This is especially a problem with transient sounds like a snare drum or hand claps.
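Preringing is easy to demonstrate numerically. The sketch below passes a single impulse through two boosts at 300 Hz: a conventional peaking biquad built from common textbook formulas, and a simple symmetric FIR filter, which is one way (assumed here, not necessarily how any commercial plug-in works) to obtain linear phase. The FIR’s bandwidth is set by its window length rather than by an explicit Q, so the two boosts are only roughly comparable, and all names are hypothetical.

```python
import math

def linear_phase_kernel(f0, fs, half_len, boost_db):
    """Symmetric FIR: a centered unit impulse plus a Hann-windowed cosine burst at f0."""
    extra = 10.0 ** (boost_db / 20.0) - 1.0
    kernel = []
    for n in range(2 * half_len + 1):
        k = n - half_len                                   # offset from the center tap
        window = 0.5 * (1.0 + math.cos(math.pi * k / (half_len + 1)))
        burst = 2.0 * extra * window * math.cos(2.0 * math.pi * f0 * k / fs) / (half_len + 1)
        kernel.append(burst + (1.0 if k == 0 else 0.0))
    return kernel

def peaking_biquad(f0, fs, gain_db, q):
    """Conventional (minimum phase) peaking EQ, normalized so that a0 = 1."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha / a_lin
    return ((1.0 + alpha * a_lin) / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha * a_lin) / a0,
            -2.0 * math.cos(w0) / a0, (1.0 - alpha / a_lin) / a0)

def run_biquad(signal, coeffs):
    """Direct form I biquad; the output depends only on present and past samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1, y2, y1 = x1, x, y1, y
    return out

def run_fir(signal, kernel):
    """Direct convolution; the output is truncated to the input length."""
    out = [0.0] * len(signal)
    for i, x in enumerate(signal):
        if x != 0.0:
            for j, h in enumerate(kernel):
                if i + j < len(out):
                    out[i + j] += x * h
    return out

def energy(samples):
    return sum(s * s for s in samples)

fs, event, half_len = 48000.0, 1000, 800
impulse = [0.0] * 3000
impulse[event] = 1.0

iir_out = run_biquad(impulse, peaking_biquad(300.0, fs, 18.0, 6.0))
fir_out = run_fir(impulse, linear_phase_kernel(300.0, fs, half_len, 18.0))
fir_event = event + half_len      # the FIR delays everything by half_len samples

print("conventional EQ, energy before the impulse:", energy(iir_out[:event]))
print("linear phase EQ, energy before the impulse:", energy(fir_out[:fir_event]))
print("linear phase EQ, energy after the impulse: ", energy(fir_out[fir_event + 1:]))
```

The conventional filter outputs nothing until the impulse arrives. The symmetric filter puts as much ringing energy ahead of the (delayed) impulse as behind it, which is the preringing described above.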
The file “impulse_ringing.wav” in Figures 3.5 and 3.6 from Chapter 3 shows an impulse wave alone, then after passing through a conventional
EQ that applied 18 dB of boost with a Q of 6, and again with a Q of 24. To show the effects of preringing and let you hear what it sounds like, I
created the “impulse_ringing_lp.wav” example using the LP64 linear phase equalizer bundled with SONAR. Figure 10.6 shows the wave file with
the original and EQ’d impulses, and a close-up of a processed portion is in Figure 10.7. This particular equalizer limits the Q to 20, so it’s not a
perfect comparison with the impulse wave in Chapter 3 that uses a Q of 24. Then again, you can see in Figure 10.6 that there’s little difference
between a Q of 6 and a Q of 20 with this linear phase equalizer anyway.
Figure 10.6:
This file contains the same impulse three times in a row. The second version used a linear phase equalizer to apply 18 dB of boost at 300 Hz with a Q of 6, and the
third is the same impulse but EQ’d with a Q of 20.
Figure 10.7:
Zooming in to see the wave cycles more clearly shows that a linear phase EQ adds ringing both after and before the impulse.
When you play the example file in Figure 10.6, you’ll hear that it sounds very different from the version in Chapter 3 that applied
conventional minimum phase EQ. To my ears, this version sounds more affected, and not in a good way. Nor is there much difference in the
waveform—both visually and audibly—whether the Q is set to 6 or 20.
Equalizer Internals
Figure 8.7 from Chapter 8 shows how it takes time for a capacitor to charge when fed a current whose amount is limited by a resistor. An
inductor, or coil of wire, is exactly the opposite of a capacitor: It charges immediately, then dissipates over time. This simple behavior is the
basis for all audio filters. As explained in Chapter 1, equalizers are built from a combination of high-pass, band-pass, and low-pass filters. Good
inductors are large, expensive, and susceptible to picking up hum, so most modern circuits use only capacitors. As you can see in Figure 10.8, a
capacitor can create either a low- or high-frequency roll-off. Adding active circuitry (usually op-amps) offers even more flexibility without
resorting to inductors.
Figure 10.8:
Both low-pass and high-pass filters can be created with only one resistor and one capacitor.
The low-pass filter at the top of Figure 10.8 reduces high frequencies because it takes time for the capacitor to charge, so rapid voltage
changes don’t pass through as readily. The high-pass filter at the bottom is exactly the opposite: High frequencies easily get through because the
voltage keeps changing so the capacitor never gets a chance to charge fully and stabilize. Note that the same turnover frequency formula applies
to both filter types. Here, the symbol π is pi, which is approximately 3.1416. Further, replacing the capacitors with inductors reverses the filter
types, turning the top filter into a high-pass and the bottom into a low-pass. Then the formula to find the cutoff frequency becomes
$$f = \frac{R}{2\pi L}$$
where R is the resistance in ohms and L is the inductance in henries.
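As a quick numeric check of the turnover frequency, this sketch assumes the standard first-order relations: 1/(2πRC) for the resistor and capacitor filters, and R/(2πL) for the inductor versions. The component values are hypothetical and chosen only to give round numbers.

```python
import math

def rc_cutoff_hz(resistance_ohms, capacitance_farads):
    """Turnover frequency of a one-resistor, one-capacitor filter (low-pass or high-pass)."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)

def rl_cutoff_hz(resistance_ohms, inductance_henries):
    """Turnover frequency when the capacitor is replaced by an inductor."""
    return resistance_ohms / (2.0 * math.pi * inductance_henries)

# Hypothetical parts: 10 kilohms with 0.01 microfarad, and 1 kilohm with 0.1 henry.
print(round(rc_cutoff_hz(10e3, 0.01e-6), 1))   # about 1.6 KHz
print(round(rl_cutoff_hz(1e3, 0.1), 1))        # the same frequency from an RL pair
```

Doubling either the resistance or the capacitance halves the RC turnover frequency, which is why a potentiometer can serve as the tuning control in a simple tone circuit.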
The passive version of the Baxandall treble and bass tone control circuit shown in Figure 10.9 incorporates both low-pass and high-pass filters.
In practice, this circuit would be modified to add an op-amp active stage, and that was an important part of Baxandall’s contribution. He is
credited with incorporating the symmetrical boost/cut controls within the feedback path of an active gain stage. A passive equalizer that allows
boost must reduce the overall volume level when set to flat. How much loss is incurred depends on the amount of available boost. An active
implementation eliminates that and also avoids other problems such as the frequency response being affected by the input impedance of the
next circuit in the chain.
Figure 10.9:
A Baxandall tone control is more complex than simple low-pass and high-pass filters, but the same basic concepts apply.
Looking at Figure 10.9, if the Bass knob is set to full boost, capacitor C1 is shorted out, creating a low-pass filter similar to the top of Figure
10.8. In this case, resistor R1 combines with capacitor C2 to reduce frequencies above the midrange. Resistor R2 limits the amount of bass boost
relative to midrange and higher frequencies. The Treble control at the right behaves exactly the opposite: When boosted, the capacitor C3
becomes the “C” at the bottom of Figure 10.8, and the Treble potentiometer serves as the “R” in that same figure.
Other Frequency Processors
Most of this chapter has been about equalizers, but there are several other types of frequency processors used by musicians and in recording
studios. These go by names suggestive of their functions, but at their heart they’re all variable filters, just like equalizers. One effect that’s very
popular with electric guitar players is the wah-wah pedal. This is simply a very high-Q band-pass filter whose frequency is swept up and down
with a variable resistor controlled by the foot pedal. When you step on the rear of the pedal, the center frequency shifts lower, and stepping near
the front sweeps the frequency higher, as shown in Figure 10.10.
Figure 10.10:
The wah-wah effect is created by sweeping a high-Q band-pass filter up and down.
Wah-wah effects are also available as plug-ins. Most wah plug-ins offer three basic modes of operation: cyclical, with the frequency sweeping
up and down automatically at a rate you choose; tempo-matched, so the repetitive sweeping follows the tempo of the song; and triggered, where
each upward sweep is initiated by a transient in the source track. The triggered mode is probably the most common and useful, because it
sounds more like a human player is controlling the sweep frequency manually. When using auto-trigger, most wah plug-ins also let you control
the up and down sweep times.
Chapter 9 clarified the common confusion between tremolo and vibrato. Tremolo is a cyclical volume change, whereas vibrato repeatedly
raises and lowers the pitch. To create vibrato in a plug-in, the audio samples are loaded into a memory buffer as usual, but the clock that
controls the output data stream sweeps continuously between faster and slower rates.
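Here is a minimal Python sketch of that idea. Instead of literally varying a clock, it reads the buffer at a position that drifts back and forth behind the write position, which raises and lowers the playback rate in the same way. Reading between samples with linear interpolation is my assumption for illustration, as are the function name and default values; real plug-ins may use better interpolation.

```python
import math

def vibrato(samples, fs, rate_hz=5.0, depth_samples=20.0):
    """Read the buffer through a delay that sweeps up and down at rate_hz."""
    base_delay = depth_samples + 1.0          # keeps the read position behind the newest sample
    out = []
    for n in range(len(samples)):
        delay = base_delay + depth_samples * math.sin(2.0 * math.pi * rate_hz * n / fs)
        pos = n - delay                       # fractional read position in the buffer
        if pos < 0.0:
            out.append(0.0)                   # nothing has been buffered yet
            continue
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        out.append((1.0 - frac) * samples[i] + frac * nxt)   # linear interpolation
    return out

fs = 48000.0
tone = [math.sin(2.0 * math.pi * 440.0 * n / fs) for n in range(4800)]
wobbly = vibrato(tone, fs)
print(len(wobbly), max(abs(v) for v in wobbly) <= 1.0)
```

While the delay is shrinking, the read position advances faster than real time and the pitch rises; while the delay is growing, the pitch falls. With these example settings the peak deviation works out to a little over 1 percent.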
Another useful frequency processor is the formant filter, which emulates the human vocal tract to add a vocal quality to synthesizers and other
instruments. When you mouth vowels such as “eee eye oh,” the resonances of different areas inside your mouth create simultaneous high-Q
acoustic band-pass filters. This is one of the ways we recognize people by their voices. The fundamental pitch of speech depends on the tension
of the vocal cords, but just as important is the complex filtering that occurs acoustically inside the mouth. Where a wah-wah effect comprises a
single narrow band-pass filter, a formant filter applies three or more high-Q filters that can be tuned independently. This is shown in Figure
10.11.
Figure 10.11:
A formant filter comprises three or more high-Q band-pass filters that emulate the vocal cavities inside a human mouth.
The relationship between center frequencies and how the amplitude of each frequency band changes over time is what makes each person’s
voice unique. The audio example file “formant_filter.wav” plays a few seconds of a single droning synthesizer note, with a formant filter set to
jump randomly to different frequencies. You’ll notice that it sounds sort of like a babbling robot because of the human voice quality. Carefully
controlling the filter frequencies enhances text-to-speech synthesizers, such as used by Stephen Hawking.
Vocal tuning, or pitch correction, is common in pop music, and sophisticated formant filtering has improved the effect considerably compared
to early plug-ins. Small changes in the fundamental pitch of a voice can pass unnoticed, but a change large enough to create a harmony part
benefits from applying formant filtering similar to that of the original voice. This reduces the “chipmunk effect” substantially, giving a less
processed sound.
The vocoder is another effect that can be used to impart a human voice quality onto music. This can be implemented in either hardware or
software using a bank of five or more band-pass filters. Having more filters offers higher resolution and creates a more realistic vocal effect. The
basic concept is to use a harmonically rich music source, known as the carrier, which provides the musical notes that are eventually heard. The
music carrier passes through a bank of band-pass filters connected in parallel, all tuned to different frequencies. Each filter’s audio output is then
routed through a VCA that’s controlled by the frequency content of the voice, called the modulator. A block diagram of an eight-band vocoder is
shown in Figure 10.12.
Figure 10.12:
Vocoders use two banks of band-pass filters: One bank evaluates frequencies present in the modulating source, usually a spoken voice, and the other lets the same
frequencies from the music “carrier” pass through at varying amplitudes.
As you can see, two parallel sets of filters are needed to implement a vocoder: One splits the voice into multiple frequency bands, and another
controls which equivalent bands of the music pass through and how loudly. So if at a given moment the modulating speech has some amount of
energy at 100 Hz, 350 Hz, and 1.2 KHz, any content in the music at those frequencies passes through to the output at the same relative volume
levels. You never actually hear the modulating voice coming from a vocoder. Rather, what you hear are corresponding frequency ranges in the
music as they are varied in level over time in step with the voice.
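The signal flow of a vocoder can be sketched in a few dozen lines of Python. This is only an illustration under several assumptions of mine: the band-pass filters use a common textbook biquad formula, the level detector is a rectifier followed by simple smoothing, a multiplication stands in for each VCA, and only four bands are used to keep the example short. All names and settings are hypothetical.

```python
import math

def bandpass_coeffs(f0, fs, q):
    """Band-pass biquad with unity gain at f0, normalized so that a0 = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)

def run_biquad(signal, coeffs):
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1, y2, y1 = x1, x, y1, y
    return out

def envelope(signal, fs, smoothing_hz=30.0):
    """Rectify, then smooth, to track how loud this band is over time."""
    coeff = math.exp(-2.0 * math.pi * smoothing_hz / fs)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def vocode(modulator, carrier, fs, centers, q=4.0):
    out = [0.0] * min(len(modulator), len(carrier))
    for f0 in centers:
        coeffs = bandpass_coeffs(f0, fs, q)
        env = envelope(run_biquad(modulator, coeffs), fs)   # analysis bank: voice energy in this band
        band = run_biquad(carrier, coeffs)                  # second bank: the same band of the music
        for n in range(len(out)):
            out[n] += band[n] * env[n]                      # the multiply plays the role of the VCA
    return out

fs, n_samples = 8000.0, 1600
centers = [250.0, 500.0, 1000.0, 2000.0]
carrier = [sum(math.sin(2.0 * math.pi * f * n / fs) for f in centers) for n in range(n_samples)]
voice = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(n_samples)]   # stand-in for speech

voiced = vocode(voice, carrier, fs, centers)
silent = vocode([0.0] * n_samples, carrier, fs, centers)
print(max(abs(v) for v in voiced) > 0.0, max(abs(v) for v in silent) == 0.0)
```

Note that the voice signal is never added to the output. It only sets the band levels, so when the modulator is silent, nothing from the carrier gets through.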
The last type of voice-related frequency processor we’ll examine is commonly referred to as a talk box. This is an electromechanical device
that actually routes the audio through your own mouth acoustically, rather than process it electronically as do formant filters and vocoders. The
talk box was popularized in the 1970s by Peter Frampton, though the first hit recordings using this effect were done in the mid-1960s by
Nashville steel guitarist Pete Drake. Drake got the idea from a trick used by steel guitarist Alvino Rey in the late 1930s, who used a microphone
that picked up his wife’s singing offstage to modulate the sound of the guitar. But that was an amplitude modulation effect, not frequency
modification. In the 1960s, radio stations often used a talk box in their own station ID jingles, and lately it’s becoming popular again.
To use a talk box, you send the audio from a loudspeaker, often driven by a guitar amplifier, through a length of plastic tubing, and insert the
tube a few inches inside your mouth. As you play an electric guitar or other musical source, the sound that comes back out of your mouth is
picked up by a microphone. Changes in your vocal cavity as you silently mouth the words filter the music. The only time you need to create
sounds with your vocal cords is for consonants such as “S” and “T” and “B.” With practice you can learn to speak those sounds at the right
volume to blend with the music that’s filtered through your mouth.
Today you can buy talk boxes ready made, but back in the 1960s, I had to build my own from a PA horn driver, a length of ½-inch-diameter
plastic medical hose, and a metal spray can cap with a drilled hole to couple the hose to the driver. I used a Fender Bandmaster guitar amplifier
to power the speaker driver.
Finally, a multiband compressor can also be used as a frequency splitter (or crossover) but without actually compressing, to allow processing
different frequency ranges independently. Audio journalist Craig Anderton has described this method to create multiband distortion, but that’s
only one possibility. In this case, distorting each band independently minimizes IM distortion between high and low frequencies, while still
creating a satisfying amount of harmonic distortion. You could do the same with a series of high-pass and low-pass filters, with each set to pass
only part of the range. For example, if you set a high-pass EQ for a 100 Hz cutoff and a low-pass to 400 Hz, the range between 100 Hz and
400 Hz will be processed. You could set up additional filter pairs to cover other frequency ranges.
The basic idea is to send a track (or complete mix) to several Aux buses all at once and put one instance of a multiband compressor or EQ
filter pair on each bus. If you want to process the audio in four bands, you’d set up four buses, use postfader to send the track to all four buses at
once, then send the track’s main output to a bus that’s muted so it doesn’t dilute the separate bus outputs. Each bus has a filter followed by a
distortion plug-in of some type, such as an amp-sim, as shown in Figure 10.13. As you increase the amount of distortion for each individual
amp-sim, only that frequency band is affected. The main volume control for the track also controls all four Aux sends at once; raising the volume
adjusts the amount of overdrive distortion for all of the bands. You could even distort only the low- and high-frequency bands, leaving the
midrange unaffected, or almost any other such combination. This type of frequency splitting and subsequent processing opens up many
interesting possibilities!
Figure 10.13: When audio is split into multiple-frequency bands, each band can be processed independently. In this example a track is sent to four different Aux buses at once, each with an EQ filter to pass only one band, in turn followed by an amp-sim or other device that adds distortion. The four buses then go on to another bus that serves as the main volume control for the track.
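The band-splitting and per-band distortion routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DAW or plug-in works: the one-pole filters stand in for the EQ filters on each bus, math.tanh stands in for an amp-sim, and the function names (one_pole_lowpass, split_bands, overdrive, multiband_distortion) are hypothetical.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Gentle 6 dB/octave low-pass; a stand-in for a real EQ filter."""
    coeff = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += coeff * (x - state)
        out.append(state)
    return out

def split_bands(samples, crossovers_hz, sample_rate):
    """Split into len(crossovers_hz) + 1 bands that sum back to the input.

    crossovers_hz must be in ascending order, e.g. [100, 400, 2000].
    """
    bands, remainder = [], list(samples)
    for cutoff in crossovers_hz:
        low = one_pole_lowpass(remainder, cutoff, sample_rate)
        bands.append(low)
        # Whatever the low-pass removed is passed on to the next band.
        remainder = [r - l for r, l in zip(remainder, low)]
    bands.append(remainder)
    return bands

def overdrive(samples, drive):
    """Soft clipper standing in for an amp-sim; drive is the gain before clipping."""
    return [math.tanh(drive * x) for x in samples]

def multiband_distortion(samples, crossovers_hz, drives, sample_rate=44100):
    """Distort each band on its own 'bus', then sum the buses."""
    bands = split_bands(samples, crossovers_hz, sample_rate)
    processed = [overdrive(band, drive) for band, drive in zip(bands, drives)]
    return [sum(values) for values in zip(*processed)]
```

Passing a different drive value for each band mimics turning up the distortion on only some of the buses, for example driving the lowest and highest bands harder while leaving the midrange nearly clean.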
This chapter covers common frequency processors, mostly equalizers, but also several methods for imparting a human voice quality onto musical
instruments. The simplest type of equalizer is the basic treble and bass tone control, and most modern implementations use a circuit developed
in the 1950s by P. J. Baxandall. Graphic equalizers expand on that by offering more bands, giving even more control over the tonal balance. The
two most common graphic equalizer types offer either 10 bands or about 30 bands. While graphic equalizers are useful for balancing PA
systems, they’re not as popular for studio use because their center frequencies and bandwidth can’t be varied.
Console equalizers generally have only three or four frequency bands, but each band can be adjusted over a wide range of frequencies.
Equalizers in high-end consoles often use rotary switches to select the frequencies and boost/cut amounts. Equalizers that use switches cost more
to manufacture, but their settings can be recalled exactly. The downside is that using switches limits the available frequencies to whatever the
manufacturer chose.
The most flexible equalizer type is the parametric, which lets users pick any arbitrary frequency and boost/cut amount. A parametric EQ also
lets you adjust the bandwidth, or Q, to affect a narrow or wide range of frequencies and everything in between. By being able to home in on a
very narrow frequency range, you can exaggerate, or minimize, resonances that may not align with the fixed frequencies of other equalizer types.
This chapter also explains a number of EQ techniques, such as improving overall clarity by thinning tracks that shouldn’t contribute low
frequencies to the mix. Other important EQ techniques include sweeping a parametric equalizer with boost to more easily find frequencies that
sound bad and should be cut. Parametric equalizers are especially useful to “carve out” competing frequencies to avoid clashes between
instruments. Complementary EQ takes the concept a step further, letting you feature one track to be heard clearly above the others, without
having to make the featured track substantially louder. A few simple schematics showed how equalizers work internally, using resistors and
capacitors to alter the frequency response.
We also busted a few EQ myths along the way, including the common claim that linear phase equalizers sound better than conventional
minimum phase types. In fact, the opposite is true, because linear phase equalizers add preringing, which is more noticeable and more
objectionable than normal ringing that occurs after the fact. Another common myth is that cutting is better than boosting because it avoids phase
shift. In truth, you’ll either cut or boost based on what actually benefits the source. You also learned that most equalizers work more or less the
same, no matter what advertisements and magazine reviews might claim. Further, every frequency curve has a finite bandwidth. So claims of
superiority for equalizers that have an “air band” letting you boost ultrasonic frequencies are unfounded; what you’re really hearing are changes
at frequencies that are audible.
Finally we looked at other popular frequency processors including wah-wah pedals, formant filters, vocoders, and talk boxes. All of these use
band-pass filters to change the sound, though a talk box is potentially the most realistic way to emulate the human voice because it uses the
acoustic resonances inside your mouth rather than emulate them with electronic circuits.
Chapter 11
Time Domain Processors
Everyone knows the “HELLO, Hello, hello” echo effect. In the early days of audio recording, this was created using a tape recorder having
separate heads for recording and playback. A tape recorder with separate record and play heads can play back while recording, and the
playback is delayed slightly from the original sound letting you mix the two together to get a single echo. The delay time depends on the tape
speed, as well as the distance between the record and play heads. A faster speed, or closer spacing, creates a shorter delay. A single echo having
a short delay time was a common effect on early pop music in the 1950s; you can hear it on recordings such as Great Balls of Fire by Jerry Lee
Lewis from 1957. This is often called slap-back echo because it imitates the sound of an acoustic reflection from a nearby room surface.
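The relationship between tape speed, head spacing, and delay time is simple division, sketched below. The function name and the example head spacing are hypothetical; actual head spacing varies from one tape recorder to another.

```python
def tape_delay_ms(head_spacing_inches, tape_speed_ips):
    """Delay between the record and play heads, in milliseconds.

    head_spacing_inches: distance between the record and play heads
    tape_speed_ips: tape speed in inches per second
    """
    return 1000.0 * head_spacing_inches / tape_speed_ips

# Assumed 2-inch head spacing, compared at two common tape speeds.
slow = tape_delay_ms(2.0, 7.5)   # slower tape, longer delay
fast = tape_delay_ms(2.0, 15.0)  # faster tape, shorter delay
```

Doubling the tape speed halves the delay, as does halving the distance between the heads.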
Creating multiple repeating echoes requires feedback, where the output from the tape recorder’s play head is mixed back into its input along
with the original source. As long as the amount of signal fed back to the input is softer than the original sound, the echo will eventually decay to
silence, or, more accurately, into a background of distortion and tape hiss. If the signal level fed back into the input is above unity gain, then the
echoes build over time and eventually become so loud the sound becomes horribly distorted. This is called runaway, and it can be a useful effect
if used sparingly.
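The difference between decaying echoes and runaway can be shown with a minimal feedback delay sketch. It assumes an ideal delay line with no tape hiss or saturation, so with feedback above unity the numbers simply keep growing instead of distorting. The function name feedback_echo is hypothetical.

```python
def feedback_echo(samples, delay_samples, feedback, tail_samples=0):
    """Repeating echo: out[n] = in[n] + feedback * out[n - delay_samples].

    feedback below 1.0 makes each repeat softer than the last;
    feedback above 1.0 makes each repeat louder (runaway).
    """
    dry = list(samples) + [0.0] * tail_samples
    out = []
    for n, x in enumerate(dry):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x + feedback * delayed)
    return out

# A single click followed by silence, with a three-sample delay.
click = [1.0] + [0.0] * 9
decaying = feedback_echo(click, 3, 0.5)  # repeats shrink: 1.0, 0.5, 0.25, ...
runaway = feedback_echo(click, 3, 1.2)   # repeats grow with every pass
```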
Eventually, manufacturers offered stand-alone tape echo units; the most popular early model was the Echoplex, introduced in 1959. I owned
one of these in the 1960s, and I recall well the hassle of replacing the special lubricated tape that frequently wore out. When I owned a large
professional recording studio in the 1970s, one of our most prized outboard effects was a Lexicon Prime Time, the first commercially successful
digital delay unit. Today many stand-alone hardware echo effect units are available, including inexpensive “stomp box” pedal models, and most
use digital technology for reliability, affordability, and high quality.
With old tape-based echo units, the sound would become grungier with each repeat. Each repeat also lost high frequencies, as the additional
generations took their toll on the response. However, losing high-end as the echoes decay isn’t necessarily harmful, because the same thing
happens when sound decays in a room. So this by-product more closely emulates the sound of echoes that occur in nature. Many modern
hardware echo boxes and software plug-ins can do the same, letting you roll off highs and sometimes lows, too, in the feedback loop to sound
more natural and more like older tape units.
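One way to get that progressive loss of highs is to put a low-pass filter inside the feedback loop, so every pass through the loop is filtered again. The sketch below uses a simple one-pole filter and a hypothetical damping parameter; real units and plug-ins use their own filter designs.

```python
def damped_echo(samples, delay_samples, feedback, damping, tail_samples=0):
    """Repeating echo with a one-pole low-pass filter in the feedback loop.

    damping: 0.0 leaves the repeats unfiltered; values closer to 1.0
    remove more high-frequency content on every pass through the loop.
    """
    dry = list(samples) + [0.0] * tail_samples
    out = []
    lowpass_state = 0.0
    for n, x in enumerate(dry):
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        # One-pole low-pass: each repeat is a smoothed copy of the previous one.
        lowpass_state = (1.0 - damping) * delayed + damping * lowpass_state
        out.append(x + feedback * lowpass_state)
    return out
```

Because each echo is fed back through the same filter, the first repeat is slightly duller than the source, the second duller still, and so on.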
Different delay times give a different type of effect. A very short single delay adds presence, by simulating the sound of an acoustic echo from a
nearby surface. Beware that when used to create a stereo effect by panning a short delay to the side opposite the main source, comb filtering will
result if the mix is played in mono. The cancellations will be most severe if the volumes of the original and delayed signals are similar. The
“echo_demo.wav” file plays the same short spoken fragment four times: First is the plain voice, then again with 30 milliseconds of mono delay
applied to add a presence effect, then with 15 milliseconds of delay panned opposite the original to add width, and finally with the original and
a very short (1.7 ms) delay summed to mono. The last example clearly shows the hollow sound of comb filtering.
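The frequencies that cancel when a signal is summed with a delayed copy of itself follow from the delay time: a null occurs wherever the delay equals an odd number of half-cycles. The sketch below assumes an ideal delay and a simple sum; the function names are hypothetical.

```python
import math

def comb_notch_frequencies(delay_ms, max_hz=20000.0):
    """Null frequencies when a signal is summed with a delayed copy.

    Nulls fall at odd multiples of 1 / (2 * delay).
    """
    delay_s = delay_ms / 1000.0
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches

def comb_gain_db(freq_hz, delay_ms, delayed_level=1.0):
    """Level change at freq_hz for direct + delayed_level * delayed signal."""
    phase = 2.0 * math.pi * freq_hz * delay_ms / 1000.0
    power = 1.0 + delayed_level ** 2 + 2.0 * delayed_level * math.cos(phase)
    magnitude = math.sqrt(max(power, 0.0))
    return 20.0 * math.log10(max(magnitude, 1e-12))

# For a 1.7 ms delay the first nulls land near 294 Hz, 882 Hz, and 1,471 Hz.
notches = comb_notch_frequencies(1.7)
```

Calling comb_gain_db with a delayed_level well below 1.0 gives much shallower dips at the same frequencies, which is why the cancellations are worst when the two signals are at similar volumes.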
You can add width and depth to a track using short single-echoes mixed in at a low level, maybe 10 to 15 dB below the main signal. I’ve had
good results panning instruments slightly off to one side of a stereo mix, then panning a short echo to the far left or right side. When done
carefully, this can make a track sound larger than life, as if it was recorded in a much larger space. Sound travels about one foot for every
millisecond (ms), so to make an echo sound like it’s coming from a wall 25 feet away, you’ll set the delay to about 50 ms—that is, 25 ms for the
sound to reach that far wall, then another 25 ms to get back to your ears. Again, the key is to mix the echo about 10 dB softer than the main
track so it doesn’t overwhelm or seem too obvious (unless, of course, that’s the sound you’re aiming for). The “echo_space.wav” file first plays a
sequence of timpani strikes dry, panned 20 percent to the left. Then you hear them again with 60 ms of delay mixed in to the right side about
10 dB softer. These timpani samples are already fairly reverberant and large-sounding, but the 60 ms delay further increases their perceived size.
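The arithmetic behind these settings can be sketched as follows, using the one-foot-per-millisecond rule of thumb from the text (the actual speed of sound is a little faster and varies with temperature). The function names and the mono mixing helper are hypothetical.

```python
def wall_echo_delay_ms(distance_feet, feet_per_ms=1.0):
    """Round-trip delay for a reflection from a wall distance_feet away."""
    return 2.0 * distance_feet / feet_per_ms

def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)

def add_single_echo(samples, delay_ms, level_db, sample_rate=44100):
    """Mix one delayed copy of the signal back in at level_db relative to the source."""
    delay = int(round(delay_ms * sample_rate / 1000.0))
    gain = db_to_gain(level_db)
    out = list(samples) + [0.0] * delay
    for n, x in enumerate(samples):
        out[n + delay] += gain * x
    return out

delay_for_25_feet = wall_echo_delay_ms(25.0)  # 25 ms out plus 25 ms back
echo_gain = db_to_gain(-10.0)                 # roughly one-third the amplitude
```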
By using echoes that are timed to match the tempo of the music, you can play live with yourself in harmony, as shown in the audio demo
“echo_harmony.wav.” You can add this type of echo after the fact, as was done here, but it’s easier to “play to the echo” if you hear it while
recording yourself. Hardware “loop samplers” can capture long sections of music live, then repeat them under control of a foot switch to create
backings to play along with. A cellist friend of mine uses an original Lexicon JamMan loop sampler to create elaborate backings on the fly,
playing along with them live. He’ll start with percussive taps on the cello body or strings to establish a basic rhythm, then add a bass line over
that, then play chords or arpeggios. All three continue to repeat while he plays lead melodies live along with the backing. Les Paul did this, too,
years ago, using a tape recorder hidden backstage.
The Sonitus Delay plug-in in Figure 11.1 can be inserted onto a mono or stereo track, or added to a bus if more than one track requires echo
with the same settings. Like many plug-in delay effects, the Sonitus accepts either a mono or stereo input and offers separate adjustments for the
left and right output channels. A mono input can therefore be made wider-sounding by using different delay times left and right. Or the Mix on
one side can be set to zero for no echo, but set to 100 percent on the other side.
Figure 11.1: The Sonitus Delay plug-in offers a number of useful features including high- and low-cut filters, cross-channel feedback, and the ability to sync automatically to the tempo of a song in a DAW host that supports tempo sync.
Notice the Crossfeed setting for each channel. This is similar to the Feedback amount, but it feeds some amount of echo back to the input of
the opposite channel, which also increases stereo width. If both the Crossfeed and Feedback are set to a high value, the effect goes into runaway,
as subsequent echoes become louder and louder. This plug-in also includes adjustable high- and low-cut filters for the delayed portion of the
audio to prevent the sound of the echoes from becoming muddy or overly bright.
The Link switch joins the controls for both channels, so changing any parameter for one channel changes the other equally. The Low and High
Frequency filters in the lower right affect only the delayed output to reduce excess low end in the echoes or to create the popular “telephone
echo” effect where the echo has only midrange with no highs or lows.
Finally, the delay time can be dialed in manually as some number of milliseconds, set to a specific tempo in beats per minute (BPM), or
synchronized to the host DAW so the timing follows the current song tempo. Regardless of which method is used to set the delay time, the Factor
amount lets you scale that from a fraction of the specified time through eight times the specified time. So if the song’s tempo is 120 BPM, you can use that as the base
delay time but make the left channel one-quarter as fast and the right side two times faster, or any other such musically related combination. You
can easily calculate the length in seconds of one quarter note at a given tempo with this simple formula:
Length of one quarter note (in seconds) = 60 / Tempo (in BPM)
So at a tempo of 120 BPM, one quarter note = 60/120 = 0.5 seconds.
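The same calculation is easy to script when you need delay times for several tempos or note values. In this sketch the factor argument is a plain multiplier applied to the quarter-note time; the function names are hypothetical and the sketch does not claim to match how any plug-in's Factor control is implemented.

```python
def quarter_note_seconds(bpm):
    """Length of one quarter note, in seconds, at the given tempo."""
    return 60.0 / bpm

def synced_delay_ms(bpm, factor=1.0):
    """Delay time in milliseconds: one quarter note scaled by factor.

    factor=0.5 gives an eighth note, factor=2.0 a half note, and so on.
    """
    return 1000.0 * quarter_note_seconds(bpm) * factor

base = synced_delay_ms(120.0)           # one quarter note at 120 BPM
left = synced_delay_ms(120.0, 0.25)     # one sixteenth note
right = synced_delay_ms(120.0, 2.0)     # one half note
```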
Echo is a terrific effect because it can be used sparingly to add a hint of space, or in extreme amounts for a psychedelic feel. It can also create
an effect known as automatic double tracking, where a single added echo sounds like a second person is singing. An early recording studio I
built in 1970 had separate rooms for recording and mixing in the upstairs of a large barn owned by a friend. We also had a 10 by 7-foot vocal
booth, with microphone and earphone jacks connected to the control room. Besides recording myself and occasional paying rock bands, the
studio was also a fun hangout. I remember fondly many late nights when a bunch of my friends and I would sit in the booth in the dark. A
microphone was on a short stand on the floor of the booth, connected to a repeating echo patch from a three-head tape recorder in the control
room. We’d all put on earphones, making noises and blabbering on for hours, listening to ourselves through the extreme echo. (Hey, it was the
Reverb is basically many echoes all sounding at once. This effect is denser than regular echo, so you don’t hear the individual repeats. When
recording with microphones in a reverberant space, you can control the amount of reverb picked up by varying the distance between the
microphone and sound source. When placed very close, the microphone picks up mostly the direct sound with little room tone. If the
microphone is farther away, then you get more of the room’s sound. The distance between the source and mic where the level of direct and
reverberant sound is equal is known as the critical distance. Modern recording methods often place microphones close to the source, well
forward of the critical distance, then add reverb as an effect later during mixdown when it can be heard in context to decide how much to add.
The earliest reverb effect used for audio recording was an actual room containing only a loudspeaker and one or two microphones. According
to audio manufacturer Universal Audio, Bill Putnam Sr. was the first person to add artificial room reverb to a pop music recording in 1947,
using the sound of a bathroom on the tune Peg o’ My Heart by The Harmonicats. To get the best sound quality, a live reverb room should be
highly reflective and also avoid favoring single resonant frequencies. Getting a sufficiently long reverb time is often done by painting the walls
and ceiling with shellac, but sometimes ceramic tiles are used for all of the room surfaces.
The advantage of a live room is that the reverb sounds natural because it is, after all, a real room. The best live reverb chambers have angled
walls to avoid the “boing” sound of repetitive flutter echo and unrelated dimensions that minimize a buildup of energy at some low frequencies
more than others. The downside of a live reverb chamber is the physical space required and the cost of construction because the room must be
well isolated to prevent outside sounds from being picked up by the microphones.
In 1957, the German audio company EMT released their model 140 plate reverb unit. This is a large wooden box containing a thin steel plate
stretched very tightly and suspended on springs inside a metal frame. A single loudspeaker driver is attached near the center of the plate, and
contact microphones at either end return a stereo signal. The sound quality of this early unit was remarkable, and its price tag was equally
remarkable. When I built my large professional recording studio in the 1970s, we paid more than $7,000 for our EMT 140. Adjusted for
inflation, that’s about $20,000 today! Being a mechanical device with speakers and microphones, it, too, was susceptible to picking up outside
sounds unless it was isolated. We put our EMT plate reverb in an unused storage closet near the studio offices, far away from the recording and
mixing rooms.
One very cool feature of the EMT plate is being able to vary its decay time. A large absorbing pad is placed very close to the steel plate, with a
hand crank to adjust the spacing. As the pad is moved closer to the vibrating plate, the decay time becomes shorter. An optional motor attached
instead of the hand crank lets you adjust the spacing remotely. A later model, the EMT 240, was much smaller and lighter than the original 140,
taking advantage of the higher mass and density of gold foil versus steel. An EMT 240 is only two by two feet, versus four by eight feet for the
earlier 140, and it’s much less sensitive to ambient sound. You won’t be surprised to learn that the EMT 240 was also incredibly expensive. The
cost of audio gear has certainly improved over the years.
Another mechanical reverb type is the spring reverb, which transmits vibration through long metal springs similar to screen door springs to
create the effect. A miniature loudspeaker voice coil at one end vibrates the springs, and an electromagnetic pickup at the other end converts the
vibrations back to an audio signal. This design was first used on Hammond electric organs and was quickly adapted to guitar amplifiers. Some
units used springs of different lengths to get a more varied sound quality, and some immersed the springs in oil. Similar to having two pickup
transducers on a plate reverb, using two or more springs can also create different left and right stereo outputs. Like a live room and plate reverb,
a spring reverb must also be isolated from ambient sound and mechanical vibration. Indeed, every guitar player knows the horrible—and loud—
sound that results from accidentally jarring a guitar amplifier with a built-in spring reverb unit.
The typical small spring reverbs in guitar amps have a poor sound quality, but not all spring units sound bad. The first professional reverb
unit I owned, in the mid 1970s, was made by AKG and cost $2,000 at the time. Rather than long coil springs, it used metal rods that were
twisted by a torsion arrangement. It wasn’t as good as the EMT plate I eventually bought, but it was vastly better than the cheap reverb units
built into guitar amps.
EMT also created the first digital reverb unit in 1976, which sold for the staggering price of $20,000. Since it was entirely electronic, the EMT
250 was the first reverb unit that didn’t require acoustic isolation. But it was susceptible to static electricity. I remember my visit to the 1976 AES
show in New York City, where I hoped to hear a demo. When my friends and I arrived at the distributor’s hotel suite, the unit had just died
moments before after someone walked across the carpet and touched it. And the distributor had only that one demo unit. D’oh!
Modern personal computers are powerful enough to process reverb digitally with a reasonably high quality, so as of this writing, dedicated
hardware units are becoming less necessary. Unlike equalizers, compressors, and most other digital processes, reverb has always been the last
frontier for achieving acceptable quality with plug-ins. A digital reverb algorithm demands thousands of calculations per second to create all the
echoes needed, and it must also roll off high frequencies as the echoes decay to better emulate what happens acoustically in real rooms. Because
of this complexity, digital reverbs are definitely not all the same. In fact, there are two fundamentally different approaches used to create digital reverb.
The most direct way to create reverb digitally is called algorithmic: A computer algorithm generates the required echoes using math to
calculate the delay, level, and frequency response of each individual echo. Algorithm is just a fancy term for a logical method to solve a
problem. The other method is called convolution: Instead of calculating the echoes one reflection at a time, a convolution reverb superimposes
an impulse that was recorded in a real room onto the audio being processed. The process is simple but very CPU-intensive.
Each digital audio sample of the original sound is multiplied by all of the samples of the impulse’s reverb as they decay over time. Let’s say
the recording of an impulse’s reverb was done at a sample rate of 44.1 KHz and extends for five seconds. This means there are 44,100 * 5 = 220,500
total impulse samples. The first sample of the music is multiplied by each of the impulse samples, one after the other, with the
result numbers placed into a sequential memory buffer. The next music sample is multiplied by the same impulse samples, and the results are added to
the sample numbers currently in the buffer, starting one position later. This process repeats for the entire duration of the music.
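The multiply-and-accumulate process described above is direct convolution, and a bare-bones version fits in a few lines. This sketch is written for clarity rather than speed, and the function names are hypothetical. Practical convolution reverbs typically use FFT-based fast convolution to reduce the workload, which is not shown here.

```python
def convolve_reverb(music, impulse):
    """Direct convolution: every music sample scales the whole impulse,
    and the scaled copies are summed into the buffer at that sample's offset."""
    out = [0.0] * (len(music) + len(impulse) - 1)
    for n, music_sample in enumerate(music):
        if music_sample == 0.0:
            continue  # silent samples contribute nothing
        for k, impulse_sample in enumerate(impulse):
            out[n + k] += music_sample * impulse_sample
    return out

def multiplies_per_second(sample_rate, impulse_seconds):
    """Multiplications needed for each second of audio with direct convolution."""
    impulse_samples = sample_rate * impulse_seconds
    return sample_rate * impulse_samples

# A two-sample 'music' signal and a three-sample 'impulse' as a toy example.
wet = convolve_reverb([1.0, 0.5], [1.0, 0.0, 0.25])

# The five-second impulse at 44.1 KHz from the text: about 9.7 billion
# multiplications for every second of music.
workload = multiplies_per_second(44100, 5)
```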
In theory, the type of impulse you’d record would be a very brief click sound that contains every audible frequency at once, similar to the
impulse in Figures 3.5 and 3.6 from Chapter 3, and Figures 10.6 and 10.7 from Chapter 10 showing equalizer ringing. A recording of such an
impulse contains all the information needed about the properties of the room at the location the microphone was placed. But in practice, real
acoustic impulses are not ideal for seeding convolution reverbs.
The main problem with recording a pure impulse is achieving a high enough signal to noise ratio. Realistic reverb must be clean until it
decays by 60 dB or even more, so an impulse recorded in a church or concert hall requires that the background noise level of the venue be at
least that soft, too. Early acoustic impulses were created by popping a balloon or shooting a starter pistol. A better, more modern method
records a slow sine wave sweep that includes the entire frequency range of interest. The slower the sweep progresses, the higher the signal to
noise ratio will be. Then digital signal processing (DSP) converts the sweep into an equivalent impulse using a process called deconvolution.
This same technique is used by most modern room measuring software to convert a recording of a swept sine wave into an impulse that the
software requires to assess the room’s properties.
Thankfully, developers of convolution reverbs have done the dirty work for us; all we have to do is record a sweep, and the software does
everything else. Figure 11.2 shows the Impulse Recovery portion of Sonic Foundry’s Acoustic Mirror reverb. The installation CD for this plug-in
includes several Wave files containing sweeps you’ll use to record and process your own impulse files, without needing a balloon or starter
pistol. Note that convolution is not limited to reverb and can be used for other digital effects such as applying the sound character of one audio
source onto another.
Figure 11.2:
The Acoustic Mirror Impulse Recovery processor lets you record your own convolution impulses.
Most reverb effects, whether hardware or software, accept a mono input and output two different-sounding left and right signals to add a
realistic sounding three-dimensional space to mono sources. Lexicon and other companies make surround sound digital reverbs that spread the
effect across five or even more outputs. Some hardware reverbs have left and right inputs, but often those are mixed together before creating the
stereo reverb effect. However, some reverb units have two separate processors for left and right inputs. Even when the channels are summed,
dual inputs are needed to maintain stereo when the device is used “in line” with the dry and reverb signals balanced internally. If you really
need separate left and right channel reverbs, with independent settings, this is easy to rig up in a DAW by adding two reverb plug-ins panned
hard left and right on separate stereo buses.
Because reverb is such a demanding process, and some digital models are decidedly inferior to others, I created a Wave file to help audition
reverb units. Even a cheap reverb unit can sound acceptable on sustained sources such as a string section or vocal. But when applied to
percussive sounds such as hand claps, the reverb’s flaws are more clearly revealed. The audio file “reverb_test.wav” contains a recording of
sampled claves struck several times in a row. The brief sound is very percussive, so it’s ideal for assessing reverb quality. I also used this file to
generate three additional audio demos so you can hear some typical reverb plug-ins.
The first demo uses an early reverb plug-in developed by Sonic Foundry (now Sony) to include in their Sound Forge audio editor program
back in the 1990s, when affordable computers had very limited processing power. I used the Concert Hall preset, and you can clearly hear the
individual echoes in the “reverb_sf-concert-hall.wav” file. To my ears this adds a gritty sound reminiscent of pebbles rolling around inside an
empty soup can. Again, on sustained sources it might sound okay, but the claves recording clearly shows the flaws.
The second example in “reverb_sonitus-large-hall.wav” uses the Sonitus Reverb plug-in, and the sound quality is obviously a great
improvement. Unlike Sonic Foundry’s early digital reverb, the Sonitus also creates stereo reverb from mono sources, and that helps make the
effect sound more realistic. The third example is also from Sonic Foundry: in this case, their Acoustic Mirror convolution reverb plug-in. You
can hear this reverb in the “reverb_acoustic-mirror-eastman.wav” file, and like the Sonitus it can create stereo from a mono source.
All of the reverbs were set with their dry (unprocessed) output at full level, with the reverb mixed in at −10 dB. The two Sony reverbs
include a quality setting, which is another throwback to the days of less powerful computers. Of course, I used the highest Quality setting, since
my modern computer can easily process the effect in real time. For the Sonitus reverb demo, I used the Large Hall preset. The Acoustic Mirror
demo uses an impulse recorded at the University of Wisconsin-Madison, taken 30 feet from the stage in the Eastman Organ Recital Hall. This is
just one of many impulses that are included with Acoustic Mirror.
Modern reverb plug-ins offer many ways to control the sound, beyond just adding reverb. The Sonitus Reverb shown in Figure 11.3 has
features similar to those of other high-quality plug-in reverbs, so I’ll use that for my explanations. From top to bottom: The Mute button at the
upper left of the display lets you hear only the reverb as it decays after being excited by a sound. Many DAW programs will continue the sound
from reverb and echo effects after you press Stop so you can better hear the effects in isolation. But if your DAW doesn’t do this, the Mute button
stops further processing while letting the reverb output continue. The volume slider to the right of this Mute button controls the overall input
level to the plug-in, affecting both the dry and processed sounds.
Figure 11.3:
The Sonitus Reverb provides many controls to tailor the sound.
Next are high- and low-cut filters in series with the reverb’s input. These are used to reduce bass energy that could make the reverb sound
muddy or excessive high end, which would sound harsh or unnatural when processed through reverb. Note that these filters affect only the sound
going into the reverb processor. They do not change the frequency response of direct sound that might be mixed into the output.
The Predelay setting delays the sound going into the reverb processor by up to 250 milliseconds. This is useful when emulating the sound of a
large space like a concert hall. When you hear music live in an auditorium or other large space, the first sound you hear is the direct sound from
the stage. Moments later, sounds that bounced off the walls and ceiling arrive at your ears, and over time those reflections continue to bounce
around the room, becoming more and more dense. So in a real room, the onset of reverberation is always delayed some amount of time, as
dictated by the room’s size. Predelay does the same to more closely emulate a real room.
The Room Size has no specific time or distance values; it merely scales the virtual length of the internal delay paths that create reverberation.
This is usually adjusted to match the reverb’s overall decay time. Specifying a small room size sounds more natural when the decay time (farther
down the screen) is set fairly short, while a larger room setting sounds better and more natural when used with longer decay times.
The Diffusion setting controls the density of the individual echoes. If you prefer to avoid the gravelly sound of older reverbs, where each
separate echo can be distinguished, set the Diffusion to maximum. Indeed, the only time I would use less than 100 percent diffusion is for special effects.
The Bass Multiplier, Crossover, Decay Time, and High Damping controls let you establish different decay times for bass, midrange, and treble
frequencies. The Decay Time sets the basic RT60 time, defined as how long it takes the reverb to decay by 60 dB. The Crossover splits the
processed audio into two bands. In conjunction with the Bass Multiplier, this allows relatively shorter (or longer) decay times for frequencies
below the crossover point. Using a shorter decay time for bass frequencies generally improves overall clarity. High Damping is similar, but it
shortens the decay time at higher frequencies. The sloped portion at the right of the decay time graphic in Figure 11.3 shows that frequencies
above 5 KHz are set to decay more quickly than the midrange.
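To see how an RT60 figure relates to what happens inside a reverb, consider a single recirculating delay loop, a common building block in textbook algorithmic reverbs. The sketch below computes the gain such a loop needs so its output falls 60 dB in the requested time. This is an assumption for illustration only: it does not describe the internals of the Sonitus Reverb, the function names are hypothetical, and treating the Bass Multiplier as a simple multiplier on the decay time is also an assumption.

```python
import math

def loop_gain_for_rt60(loop_delay_ms, rt60_seconds):
    """Gain per pass through a delay loop so the level drops 60 dB in rt60_seconds.

    Each pass takes loop_delay_ms, so there are rt60 / delay passes in total
    and each pass must contribute 60 dB divided by that number of passes.
    """
    passes = rt60_seconds / (loop_delay_ms / 1000.0)
    db_per_pass = -60.0 / passes
    return 10.0 ** (db_per_pass / 20.0)

def bass_decay_seconds(decay_time_seconds, bass_multiplier):
    """Assumed meaning of the Bass Multiplier: scales the decay time below the crossover."""
    return decay_time_seconds * bass_multiplier

# A 50 ms loop with a 1.9 second decay, and a shorter bass decay for clarity.
mid_gain = loop_gain_for_rt60(50.0, 1.9)
bass_gain = loop_gain_for_rt60(50.0, bass_decay_seconds(1.9, 0.5))
```

A shorter decay time always means a lower loop gain, so in this model a Bass Multiplier below 1.0 makes low frequencies die away faster than the midrange.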
The mixer portion of the screen at the bottom lets you control the balance between the original dry sound, the simulated early reflections
(E.R.) from room boundaries, and the overall reverb. Each slider also includes a corresponding Mute button to help you audition the balance
between these three elements.
The width control can be set from 0 (mono output) through 100% (normal stereo), to 200% (adds exaggerated width to the reverb). Finally,
when the Tail button is engaged, the reverb added to a source is allowed to extend beyond the original duration. For example, if you have a
Wave file that ends abruptly right after the music stops, any reverb that’s added will also stop at the end of the file. But when Tail is engaged,
the Wave file is actually made longer to accommodate the added final reverb decay.
As explained in Chapter 5, effects that do something to the audio are usually inserted onto a track, while effects that add new content should
go on a bus. Reverb and echo effects add new content (the echoes), so these are usually placed onto a bus. In that case, the reverb volume
should be set near 0 dB, and the output should be 100 percent “wet,” since the dry signal is already present in the main mix. Reverb plug-ins can
also be patched into individual tracks; in that case you’ll use the three level controls to adjust the amount of reverb relative to the dry sound.
In Chapter 7 I mentioned that I often add two buses to my DAW mixes, with each applying a different type of reverb. To add the sound of a
performer being right there in the room with you, I use a Stage type reverb preset. For the larger sound expected from “normal” reverb, I’ll use a
Plate or Hall type preset. The specific settings I use in many of my pop tunes for both reverb types are shown in Table 11.1. These are just
suggested starting points! I encourage you to experiment and develop your own personalized settings. Note that in all cases the Dry output is
muted because these settings are meant for reverb that’s placed on a bus.
Table 11.1: Stage and Normal Reverb Settings.
Parameter                  Basic Reverb    Ambience
Low Cut                    75 Hz           55 Hz
High Cut                   4 kHz           11 kHz
Predelay                   50 ms           35 ms
Room Size
Bass Multiplier
Crossover                  350 Hz          800 Hz
Decay Time                 1.9 seconds     0.4 seconds
High Damping               4.0 kHz         9.5 kHz
Early Reflections (E.R.)   −10.5 dB        0.0 dB
Reverb                     −3.0 dB         0.0 dB
Finally, I’ll offer a simple tip that can help to improve the sound of a budget digital reverb: If the reverb you use is not dense enough, and you
can hear the individual echoes, try using two reverb plug-ins with the various parameters set a little differently. This gives twice the echo density,
making grainy-sounding reverbs a little smoother and good reverbs even better. If your reverb plug-ins don’t generate a satisfying stereo image,
pan the returns for each reverb fully left and right. When panned hard left and right, the difference in each reverb’s character and frequency
content over time can add a nice widening effect.
Phasers and Flangers
The first time I heard the flanging effect was on the 1959 tune The Big Hurt by Toni Fisher. Flanging was featured on many subsequent pop
music recordings, including Itchycoo Park from 1967 by the Small Faces, Bold as Love by Jimi Hendrix in 1967, and Sky Pilot in 1968 by Eric
Burdon and The Animals, among others. Today it’s a very common effect. Before plug-ins, the flanging effect was created by playing the same
music on two different tape recorders at once. You’d press Play to start both playbacks together, then lightly drag your hand against one of the
tape reel flanges (side plates) to slow it down. Figure 11.4 shows an even better method that avoids the need for synchronization by using two
tape recorders set up as an inline effect.
Figure 11.4:
The flanging effect is created by sending audio through two recorders at once, then slowing the speed of one recorder slightly by dragging your thumb lightly against the
Chapter 1 showed how phase shift and time delay are used to create phaser and flanger effects, respectively. As explained there, phaser effects
use phase shift, and flangers use time delay, but that’s not what creates their characteristic sound. Rather, what you hear is the resulting comb
filtered frequency response.
Phaser and flanger effects often sweep the comb filter up and down at a steady rate adjustable from very slow to very fast. When the sweep
speed is very slow, this effect is called chorus because the constantly changing response sounds like two people are singing. Sweeping a comb
filter can also create a Leslie rotating speaker effect. If the effect generates a stereo output from a mono source, that can enhance the sound
further. Some plug-ins can even synchronize the filter sweep rate to a song’s tempo.
The Sonitus plug-in I use in Figure 11.5 wraps both effect types into one unit called Modulator, so you can select a flanger, one of several
phaser types, or tremolo. The Modulator plug-in also outputs different signals left and right to create a stereo effect from a mono source.
Figure 11.5:
The Sonitus Modulator plug-in bundles a flanger, tremolo, and several types of phaser effects into a single plug-in. All can be swept either manually or in sync with the
When the phase shift or delay time is swept up and down repeatedly, the Rate or Speed control sets the sweep speed. Most flangers and
phasers also include a Depth or Mix slider to set the strength of the effect. Unlike most plug-ins where 100 percent gives the strongest effect, the
Sonitus Modulator has a maximum effect when the Mix slider is set to 50 percent. This blends the source and shifted signals at equal volumes,
creating the deepest nulls and largest peaks.
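To see why an equal blend gives the deepest nulls, consider the frequency response of a dry signal mixed with a delayed copy. The Python sketch below assumes a simple linear crossfade, where the Mix value is the fraction of delayed signal and the dry signal gets the remainder. The function name and the 1 ms delay are illustrative assumptions and do not describe how any specific plug-in is built.

```python
import cmath
import math


def comb_gain_db(mix, delay_s, freq_hz):
    """Gain in dB at freq_hz for a dry signal blended with a delayed copy.

    mix is the fraction of delayed signal (0.0 to 1.0); the dry signal is
    weighted by (1.0 - mix). This crossfade law is an assumption.
    """
    h = (1.0 - mix) + mix * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    magnitude = abs(h)
    if magnitude < 1e-12:
        return float("-inf")  # complete cancellation
    return 20.0 * math.log10(magnitude)


delay_s = 0.001                        # 1 ms delay
first_null_hz = 1.0 / (2.0 * delay_s)  # 500 Hz: delayed copy arrives half a cycle late
first_peak_hz = 1.0 / delay_s          # 1 kHz: delayed copy arrives a full cycle late

for mix in (0.1, 0.25, 0.5):
    null_db = comb_gain_db(mix, delay_s, first_null_hz)
    peak_db = comb_gain_db(mix, delay_s, first_peak_hz)
    print(f"mix={mix:.2f}  null={null_db:.1f} dB  peak={peak_db:.1f} dB")
```

The null only becomes a complete cancellation when both signals are at equal level. At any other mix setting, part of the louder signal survives at the null frequencies.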
Many flangers and phasers include a feedback control to make the effect even stronger. Not unlike feedback on an echo effect, this sends part
of the output audio back into the input to be processed again. And also like an echo effect, applying too much feedback risks runaway, where
the sound becomes louder and louder and highly resonant. That might be exactly what you want. This particular plug-in also offers polarity
Invert switches for the main Mix and Feedback paths to create a different series of peak and null frequencies.
One of the failings of many digital flangers is an inability to “pass through zero” time, needed to set the lowest peak and null frequencies
above a few kHz. When two real tape recorders are used to create flanging as in Figure 11.4, the timing difference is zero when both machines
are in sync. But most plug-in flangers add a minimum delay time, which limits their sweep range. The Sonitus Modulator’s Tape button adds a
small delay to the incoming audio, so both the original and delayed version have the same minimum delay time, enabling the sound of real tape
flanging passing through zero time.
Back in the 1960s, I came up with a cool way to create flanging using a single tape recorder having separate record and playback heads and
Sel-Sync capability. This avoids the complexity of needing two separate tape machines to play at the same time or varying the speed of one
machine relative to the other. If you record a mono mix from one track to another or use a pair of tracks for stereo, the copy is delayed in time
compared to the original. If you then use Sel-Sync to play the copied track from the record head, the copy plays earlier putting the tracks back in
sync. Then all you have to do is push on the tape slightly with your finger or a pencil at a point between the two heads. This adds a slight delay
to the original track playing from the play head. When the two tracks or track pairs are mixed together, the result is flanging that “goes through
zero” when the tape is not pushed.
This chapter explains the basics of time-based audio effects. The most common time-based effects used in recording studios are echo and reverb,
but flanging and phasing also fall into this category because they use time delay or phase shift to achieve the sound of a comb filtered frequency response.
Echo is useful to add a subtle sense of space to tracks when the delays are short and soft, but it can also be used in extreme amounts. The
longer you set the delay time, the larger the perceived space becomes. Older analog tape echo units are prized for their vintage sound, but I
prefer modern digital models because their sound quality doesn’t degrade quickly when feedback is used to add many repeats. Modern plug-in
echo effects are very versatile, and most can be set to automatically synchronize their delay time to the tempo of a DAW project. Many plug-ins
also let you cross-feed the echoes between the left and right channels to add a nice sense of width.
Reverb is equally ubiquitous, and it’s probably safe to say that reverb is used in one form or another on more recordings than not. Early reverb
was created either by placing speakers and microphones in a real room or with mechanical devices such as springs and steel plates under
tension. Unfortunately, mechanical reverb units are susceptible to picking up external noise and vibration. Modern digital reverb avoids that,
though achieving a truly realistic effect is complex and often expensive. Most reverbs, whether mechanical or electronic, can synthesize a stereo
effect from mono sources by creating different reverb sounds for the left and right outputs.
The two basic types of digital reverb are algorithmic and convolution. The first calculates all the echoes using computer code, and the second
applies the characteristics of an impulse file recorded in a real room onto the audio being processed. Early reverb plug-ins often sounded grainy,
but newer models take advantage of the increased processing power of modern computers. A good reverb unit lets you adjust decay times
separately for low frequencies, and many include other features such as a low-cut filter to further reduce muddy reverberant bass. Reverb is
useful not only for pop music, but classical music also benefits when too little natural reverb was picked up by the microphones.
Finally, flanger and phaser effects are explained, including the various parameters available on most models. In the old days, flanging required
varying the speed of a tape recorder, but modern plug-ins wrap this into a single effect that’s simple to manage. Most flanger and phaser effects
include an automatic sweep mode, and most plug-in versions can be set to sweep at a rate related to the tempo of a DAW project. The strongest
effect is created when the original and delayed versions are mixed equally, though feedback can make the effect even stronger.
Chapter 12
Pitch and Time Manipulation Processors
Chapter 6 described what vari-speed is and how it works, implemented originally by changing the speed of an analog tape recorder’s capstan
motor while it records or plays back. Today this effect is much easier to achieve using digital signal processing built into audio editor software
and dedicated plug-ins.
There are two basic pitch-shifting methods: One alters the pitch and timing together, and the other changes them independently. Changing
pitch and timing together is the simpler method, and it also degrades the sound less. This is the way vari-speed works, where lowering the pitch
of music makes it play longer, and vice versa. Large changes in tape speed can affect the high-frequency response, though this doesn’t matter
much when vari-speed is used for extreme effect such as “chipmunks” or creating eerie sounds at half speed or slower. The other type of pitch
shifting, where the pitch and timing are adjusted independently, often creates clicks and gurgling artifacts as a byproduct.
Pitch Shifting Basics
Implementing vari-speed type pitch shifting digitally is relatively straightforward: The audio is simply played back at a faster or slower sample
rate than when it was recorded. Computer sound cards don’t actually vary their playback sample rate, so internally, digital vari-speed is created
via sample rate conversion, as explained in Chapter 8. You can effectively speed up digital audio by dropping samples at regular intervals or
slow it down by repeating samples. To shift a track up by a musical fifth, you’d discard every third sample, keeping two. And to drop the
frequency by an octave, you’d repeat every sample once. When the audio is later filtered, which is needed after sample rate conversion, the
dropped or repeated samples become part of the smoothed waveform, avoiding artifacts.
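The two examples just given can be sketched in a few lines of Python. This is only a bare illustration of dropping and repeating samples; the smoothing filter that real sample rate conversion requires is deliberately omitted, and the function names are hypothetical.

```python
import math


def drop_every_third(samples):
    """Keep two samples out of every three, a 3:2 speed-up (a musical fifth up)."""
    return [s for i, s in enumerate(samples) if i % 3 != 2]


def repeat_each(samples):
    """Play every sample twice, halving the speed (one octave down)."""
    out = []
    for s in samples:
        out.extend((s, s))
    return out


sample_rate = 48000
# One tenth of a second of a 440 Hz sine wave: 4,800 samples.
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(4800)]

fifth_up = drop_every_third(tone)
octave_down = repeat_each(tone)

print(len(tone), len(fifth_up), len(octave_down))
```

Played back at the same 48 kHz rate, the 4,800-sample tone becomes 3,200 samples when shifted up a fifth and 9,600 samples when shifted down an octave, which shows how pitch and duration change together with this method.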
The other type of pitch shifting adjusts the pitch while preserving the duration. It can also be used the other way to change the length of audio
while preserving the pitch. Changing only the pitch is needed when “tuning” vocals or other tracks that are flat or sharp. If the length is also
changed, the remainder of the track will then be slightly ahead or behind in time. Changing pitch and timing independently is much more
difficult to implement than vari-speed. This too uses resampling to alter the tuning, but it must also repeat or delete longer portions of the
audio. For example, if you raise the pitch of a track by a musical fifth, as above, the length of the audio is reduced by one-third. So small chunks
of audio must be repeated to recreate the original length. Likewise, lowering the pitch requires that small portions of the audio be repeatedly
The first commercial digital pitch shifter that didn’t change the timing was the Eventide H910 Harmonizer, introduced in 1974. Back then
people used it mainly as an effect, rather than to correct out-of-tune singers as is common today. As mentioned in Chapter 7, a Harmonizer can,
of course, be used to create harmonies. There’s no easy way to automate parameters on a hardware device as with modern plug-ins, so to correct
individual notes, you have to adjust the Pitch knob manually while the mix plays. Or you could process a copy onto another track, then mute the
original track in those places. This is obviously tedious compared to the luxury we enjoy today with DAW software.
One common use for a Harmonizer is to simulate the effect of double-tracked vocals. In the 1960s, it was common for singers to sing the same
part twice onto different tracks to enhance the vocal, making it more rich-sounding. This isn’t a great vocal effect for intimate ballads, but it’s
common with faster pop tunes. Double-tracking is also done with guitars and other instruments, not just voices. One difficulty with double-tracking is that the singer or musician has to perform exactly the same way twice. Small timing errors that might pass unnoticed on a single track
become much more obvious with two tracks in unison.
A Harmonizer offers a much easier way to create the effect of double-tracking from a single performance. The original and shifted versions are
mixed together, with both typically panned to the center. This is called automatic double-tracking, or ADT for short. You can also use a
Harmonizer to add a widening effect by panning the original and shifted versions to opposite sides. Either way, the Harmonizer is typically set to
raise (or lower) the pitch by 1 or 2 percent. A larger shift amount adds more of the effect, but eventually it sounds out of tune. One big
advantage of using a Harmonizer effect for ADT rather than simple delay is to avoid the hollow sound of static comb filtering. Since the effective
delay through a Harmonizer is constantly changing, the sound becomes more full rather than thin and hollow. Rolling off the high end a bit on
the Harmonizer’s output keeps the blended sound even smoother and less affected.
Figure 12.1 shows the basic concept of pitch shifting without changing duration. In the example at left that raises the pitch, the wave is first
converted to a higher sample rate, which also shortens its duration. Then the last two cycles are repeated to restore the original length. Of
course, a different amount of pitch shift requires repeating a different number of cycles. Lowering the pitch is similar, as shown at right in Figure
12.1. After converting the wave to a lower sample rate, the last two cycles are deleted to restore the original length. Time-stretching without
changing the pitch is essentially the same: The wave is resampled up or down to obtain the desired new length, then the pitch is shifted to
Figure 12.1:
To raise the pitch without also changing the duration, as shown at the left, the wave is up-sampled to shift the frequencies higher, then the last two cycles are repeated
to restore the original length. Lowering the pitch instead down-samples the wave to lower the frequencies, then the last two cycles are deleted.
When short sections of audio—say, a few dozen milliseconds—are either repeated or deleted, small glitches are added to the sound. These
glitches occur repeatedly as small sequential chunks of the audio are processed, which adds a gurgling quality. Applying very small amounts of
pitch change or time stretch usually doesn’t sound too bad. But once the amounts exceed 5 or 10 percent, the degradation is often audible.
Shifting the pitch upward, or making the audio shorter, deletes small portions of the waves. This is usually less noticeable than the opposite,
where portions of the audio are repeated.
DSP pitch shifting algorithms have improved over the years, and some newer programs and plug-ins can shift more than a few percent without
audible artifacts. Many programs also offer different shifting algorithms, each optimized for different types of source material. For example, the
Pitch Shift plug-in bundled with Sony Sound Forge, shown in Figure 12.2, offers 19 different modes. Six are meant for sustained music, three for
speech, seven more for solo instruments, and three just for drums. By experimenting with these modes, you can probably find one that doesn’t
degrade the quality too much.
Figure 12.2:
The Sony Pitch Shift plug-in can change the pitch either with or without preserving the duration. When Preserve Duration is enabled, the available pitch shift ranges
from one octave down to one octave up. But when pitch shifting is allowed to change the length, the available range is increased to plus or minus 50 semitones, or more than four octaves
in each direction.
The audio example “cork_pop.wav” is a recording I made of a cork being yanked from a half-empty wine bottle. If you listen carefully, you
can even hear the wine sloshing around after the cork is pulled. I then brought that file into Sound Forge and varied the pitch over a wide range
without preserving the duration and recorded the result as “cork_shifted.wav.” This certainly changes the character of the sound! Indeed, vari-speed type pitch shifting is a common way to create interesting effects from otherwise ordinary sounds. Lowering the pitch can make everyday
sounds seem huge and ominous, an effect a composer friend of mine uses often. In one of my favorite examples, he processed a creaking door
spring, slowing it way down to sound like something you’d expect to hear in a dungeon or haunted castle.
Some pitch shifters take this concept even further and adjust vocal formants independently of the pitch change. As explained in Chapter 10,
formants are the multiple resonances that form inside our mouths as we create various vowel sounds. So when the pitch of speech or singing is
shifted, the frequencies of those resonances are also shifted. A pitch shifter that can manipulate formants adjusts the resonances independently.
You can even use such software to shift only the formants, leaving the basic pitch and timing alone. This might be used to make a male singer
sound like a female, and vice versa, or make someone sound younger or older.
Both of the pitch shift methods described so far change the pitch by multiplying the frequencies to be higher or lower. Multiplying retains the
musical relationship between notes and between the harmonics within a single note. So if a track or mix is shifted up an octave, all of the note
frequencies and their harmonics are doubled, and the music remains in tune with itself. But a third type of pitch shifter instead adds a constant
frequency offset. This is called Bode shifting, or sideband modulation, or sometimes heterodyning. This type of frequency shifting is used in radio
receivers, but less often for audio.
If you apply sideband modulation to shift an A-440 note up a whole step to the B at 494 Hz, the fundamental pitch is shifted by 54 Hz as
expected, but all the harmonics are also shifted by the same amount. So the new B note’s second harmonic that would have become 988 Hz is
now only 934 Hz, which is nearer to an A# note. Other notes and their harmonics are also shifted by the same fixed offset, which puts them out
of tune. For this reason, sideband modulation is not often used to process music. However, it has other audio uses, such as disguising voices as in
spy movies or creating inharmonic bell-like sound effects from other audio sources.
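The difference between multiplying frequencies and adding a fixed offset is easy to check numerically. The short Python sketch below compares the two for the first three harmonics of A-440 and reports the nearest equal-tempered note for each result. The helper name nearest_note is hypothetical, and equal temperament with A at 440 Hz is assumed.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def nearest_note(freq_hz, a4_hz=440.0):
    """Return (note name, error in cents) for the equal-tempered note nearest freq_hz."""
    semitones_from_a4 = 12.0 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones_from_a4)
    cents = 100.0 * (semitones_from_a4 - nearest)
    name = NOTE_NAMES[(nearest + 9) % 12]  # A sits at index 9 in NOTE_NAMES
    return name, cents


ratio = 494.0 / 440.0    # conventional pitch shift: multiply every frequency
offset = 494.0 - 440.0   # sideband modulation: add 54 Hz to every frequency

for k in (1, 2, 3):
    harmonic = 440.0 * k
    scaled = harmonic * ratio
    shifted = harmonic + offset
    print(f"harmonic {k}: multiplied {scaled:.0f} Hz {nearest_note(scaled)[0]}, "
          f"offset {shifted:.0f} Hz {nearest_note(shifted)[0]}")
```

Multiplying keeps the second harmonic on a B at 988 Hz, while the fixed offset moves it to 934 Hz, which lands within a few cents of A#.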
Auto-Tune and Melodyne
One problem with correcting the pitch of singers or other musicians is they’re not usually consistently out of tune. If a singer was always 30 cents
flat through the entire song, fixing that would be trivial. But most singers who need correction vary some notes more than others, and, worse,
often the pitch changes over the course of a single held note. It’s very difficult to apply pitch correction to a moving target.
Auto-Tune, from the company Antares, was the first contemporary hardware device (soon followed by a plug-in version) that could
automatically track and correct pitch as it changes over time. If a singer swoops up into a note, or warbles, or goes flat only at the end, Auto-Tune can make the pitch consistent. It can also add vibrato to the corrected note. You can even enter a musical scale containing valid notes for
the song’s key, and Auto-Tune will coerce pitches to only those notes. This avoids shifting the pitch to a wrong note when a singer is so flat or
sharp that his or her pitch is closer to the next higher or lower note than to the intended note. So if a song is in the key of C, you’ll tell Auto-Tune to allow only notes in a C scale. You can also use a MIDI keyboard to enter the specific note you want at a given moment, regardless of
what was actually sung or played. Auto-Tune then applies whatever amount of pitch change is needed to obtain that note.
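The idea of restricting correction to the notes of a scale can be illustrated with a small Python sketch. This is not how Auto-Tune itself is implemented, which the text does not describe. It only shows the arithmetic of snapping a detected frequency to the nearest allowed note. The function name, the use of MIDI note numbers, and equal temperament with A at 440 Hz are all assumptions.

```python
import math

C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes C D E F G A B
CHROMATIC = set(range(12))        # every note allowed


def snap_to_scale(freq_hz, allowed=C_MAJOR, a4_hz=440.0):
    """Return (target frequency in Hz, shift in cents) for the nearest allowed note."""
    midi = 69.0 + 12.0 * math.log2(freq_hz / a4_hz)  # fractional MIDI note number
    candidates = [n for n in range(128) if n % 12 in allowed]
    target = min(candidates, key=lambda n: abs(n - midi))
    target_hz = a4_hz * 2.0 ** ((target - 69) / 12.0)
    cents = 100.0 * (target - midi)
    return target_hz, cents


# A singer aiming for E above middle C (MIDI note 64) but landing 60 cents flat.
sung_hz = 440.0 * 2.0 ** ((63.4 - 69) / 12.0)

print("C scale only:", snap_to_scale(sung_hz, C_MAJOR))
print("all notes:   ", snap_to_scale(sung_hz, CHROMATIC))
```

With every note allowed, the sung pitch is closer to D# and gets pulled down to the wrong note. Limiting the choices to a C scale pulls it up 60 cents to the intended E.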
An additional parameter lets you control how quickly the pitch correction responds. For example, if the correction speed is set to slow,
normal intended vibrato will not be “flattened out,” while long-term deviations will be corrected. But when the correction speed is set to the
fastest setting, changes in pitch can happen more quickly than would be possible for a human to sing. The result is a vocal quality reminiscent of
a synthesizer, and this effect was used famously on the song Believe by Cher. This use of Auto-Tune has become so ubiquitous that it’s commonly
referred to as the Cher effect. Coupled with MIDI keyboard input, this can make a singer sound even more like a synthesizer. It can also be used
to impart a musical pitch to nonpitched sounds such as normal speech. You can hear that effect applied with great success in the very funny
video series Auto-Tune the News on YouTube.
There’s no doubt that the development of pitch correction was a remarkable feat. But one important shortcoming of Auto-Tune is that it’s
monophonic and can detect and correct only one pitch at a time. That’s great for singers and solo instruments, but if one string is out of tune on
a rhythm guitar track, any correction applied to fix the wrong note also affects all the other notes. Celemony Melodyne, another pitch-shifting
program, solves this by breaking down a polyphonic source into individual notes that it displays in a grid. You can then change the one note
that’s out of tune without affecting other notes that don’t need correction. It can also fix the pitch of several sour notes independently so they’re
all in tune, plus related tricks such as changing a major chord to minor. Since you have access to individual notes on the grid, you can even
change the entire chord. In case it’s not obvious, this is an amazing technical achievement for audio software!
Acidized Wave Files
Besides correcting out-of-tune singers, independent pitch shift and time stretch are used with Acidized Wave files in audio looping programs. The
first program for constructing music by looping prerecorded musical fragments was Sony’s Acid, and it’s still popular. Acid looks much like a
regular DAW program, and in fact it is, but it also lets you combine disparate musical clips to create a complete song without actually
performing or recording anything yourself. Hundreds of third-party Acid libraries are available in every musical style imaginable, and some other
DAW programs can import and manipulate Acidized Wave files. You can piece together all of these elements to create a complete song much
faster than actually performing all the parts. If the prerecorded files you’ve chosen aren’t in the right key or tempo for your composition, Acid
will pitch-shift and time-stretch them automatically.
To create an Acid project, you first establish a tempo for the tune. As you import the various prerecorded loops onto different tracks, the
software adjusts their timing automatically to match the tempo you chose. You can also change the pitch of each loop, so a bass part in the key
of A can be played back in the key of C for four bars, then in the key of F for the next two bars, and so forth. As with all pitch/time-shifting
processes, if the current tempo or key is very different from the original recordings, artifacts may result, but for the most part this process works
very well. It’s a terrific way for singers to quickly create backing tracks for their original songs without having to hire musicians or a recording
studio. It’s also a great way to create songs to practice or jam with, and for karaoke.
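The amount of stretching and shifting a looping program must apply follows directly from the loop's original tempo and key. The Python sketch below shows that arithmetic only. The function names are hypothetical, the 100 and 120 BPM tempos are made-up examples, and choosing the smallest shift in either direction is an assumption rather than a description of what Acid does.

```python
def stretch_ratio(loop_bpm, project_bpm):
    """Factor applied to the loop's duration so it plays at the project tempo."""
    return loop_bpm / project_bpm


def semitone_shift(from_pitch_class, to_pitch_class):
    """Smallest shift in semitones (-6 to +5) between two pitch classes, C=0 through B=11."""
    return (to_pitch_class - from_pitch_class + 6) % 12 - 6


def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


A, C, F = 9, 0, 5

# A loop recorded at 100 BPM in the key of A, used in a 120 BPM project.
print("duration factor:", stretch_ratio(100.0, 120.0))
print("A to C:", semitone_shift(A, C), "semitones, ratio", pitch_ratio(semitone_shift(A, C)))
print("A to F:", semitone_shift(A, F), "semitones, ratio", pitch_ratio(semitone_shift(A, F)))
```

The further these factors are from 1.0, the harder the processing has to work, which is why artifacts become more likely when the tempo or key is very different from the original recording.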
Acidized files store extra data in the header section of the Wave file. This is a nonaudio portion of the file meant for copyright and other text
information about the file. In this case, acidizing a file stores data that tell programs like Acid where the musical transients are located to help
them change the pitch and time with less degradation. Another aspect of Acid-specific data is whether the loop’s musical key should change
along with chord changes in the song. If you add an organ backing to an Acid project, the organ parts need to change tempo and key along with
the song. But a drum loop should ignore chord changes and only follow the tempo. Otherwise the pitch of the drums and cymbals will shift, too.
This chapter explains the basics of pitch shifting and time stretching and shows how they’re implemented. Vari-speed type pitch shifting is a
simple process, using resampling to change the pitch and duration together. Changing the pitch and duration independently is much more
complex because individual wave cycles must be repeated or deleted. When the amount of change is large, glitches in the audio may result. But
modern implementations are very good, given the considerable manipulation applied.
Pitch shifting is useful for fixing a singer’s poor intonation, and it’s an equally valuable tool for sound designers. When used with extreme
settings, it can make a singer sound like a synthesizer. Modern implementations can adjust vocal formants independently from the pitch and
duration, and Melodyne goes even further by letting you tune or even change individual notes within an entire chord.
Finally, Acidized Wave files are used to create complete songs and backings, using loop libraries that contain short musical phrases that are
strung together. The beauty of Sony Acid, and programs like it, is that the loops can be played back in a key and tempo other than when they
were originally recorded.
Chapter 13
Other Audio Processors
Tape-Sims and Amp-Sims
Tape simulators and guitar amp simulators usually take the form of digital plug-ins that can be added to tracks or a bus in DAW software. There
are also hardware guitar amp simulators that use either digital processing or analog circuits to sound like natural amplifier distortion when
overdriven. Even hardware tape simulators are available, though the vast majority are plug-ins. The main difference between a tape-sim and an
amp-sim is the amount and character (spectrum) of the distortion that’s added and the labeling on the controls. Guitar players often require huge
amounts of distortion, whereas the distortion expected from an analog tape recorder is usually subtle.
My first experience with a tape simulator was the original Magneto plug-in from Steinberg. It worked pretty well, and certainly much better than
recording my DAW mixes onto cassettes and back again, as I had done a few times previously. The Ferox tape-sim in Figure 13.1 is typical of
modern plug-ins. Some models let you choose the virtual tape speed, pre-emphasis type, and other settings that vary the quality of the
added distortion using analog tape nomenclature. With Ferox, you instead enter the parameters directly.
Figure 13.1:
The Ferox tape simulator imitates the various types of audio degradation that analog tape recorders are known and prized for.
Image courtesy of ToneBoosters.
The audio example “tape-sim.wav” plays five bars of music written by my friend Ed Dzubak, first as he mixed it, then with the Ferox tape-sim
applied. I added only a small amount of the effect, but you can easily hear the added crunch on the soft tambourine tap near the end of bar 2.
To make this even clearer, at the end of the file, I repeated the tambourine part four times without, then again with, Ferox enabled.
Even though I own a nice Fender SideKick guitar amp, I still sometimes use an amp-sim. When I recorded the rhythm guitar parts for my Tele-Vision music video, I didn’t know how crunchy I’d want the sound to be in the final mix. It’s difficult to have too much distortion on a lead
electric guitar, but with a rhythm guitar, clarity can suffer if you add too much. So I recorded through my Fender amp with only a hint of
distortion, then added more with an amp-sim during mixdown when I could hear everything in context. The Alien Connections ReValver plug-in
bundled with SONAR Producer shown in Figure 13.2 is typical, and it includes several different modules you can patch in any order to alter the
sound in various ways.
Figure 13.2:
The Alien Connections ReValver guitar amp simulator offers overdrive, EQ, reverb, auto-wah, and several amplifier and speaker types.
The first hardware amp-sim I’m aware of was the original SansAmp from Tech 21, introduced in 1989, with newer models still in production.
This is basically a fuzz tone effect for guitar players, but it can add subtle amounts of distortion as well as extreme fuzz. Line 6 is another
popular maker of hardware amp-sims, and their POD line of digital effects includes simulations of many different guitar amplifier types and
In the audio file “amp-sim.wav,” you can hear a short recording of two clean guitar chords plain, then with the ReValver amp-sim plug-in
engaged. This example shows just one of the many amplifier character types that can be added. For lead guitars, or anything else, you can add
some very extreme fuzz effects.
Other Distortion Effects
Another type of distortion effect is bit-depth reduction, sometimes called bit crushing. These are always in the form of plug-ins that work in a
DAW host or other audio editor program. With this type of distortion, you specify the number of output data bits. Bit-reduction effects are used
mainly for intentionally low-fi productions, and in this application the bit-reduced audio is not dithered, adding truncation distortion as well as
reducing resolution.
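Bit-depth reduction without dither amounts to discarding the low-order bits of each sample. The Python sketch below is one minimal way to express that for floating-point samples. The function name is hypothetical, and rounding toward minus infinity is an assumption chosen to mimic dropping the lowest bits of a two's complement integer.

```python
import math


def bit_crush(samples, bits):
    """Truncate samples in the range -1.0 to 1.0 to the given bit depth, with no dither."""
    levels = 2 ** (bits - 1)  # steps per polarity, for example 32768 for 16 bits
    out = []
    for s in samples:
        q = math.floor(s * levels)            # discard everything below one step
        q = max(-levels, min(levels - 1, q))  # stay within the signed integer range
        out.append(q / levels)
    return out


sample_rate = 48000
sine = [0.9 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(480)]

for bits in (16, 8, 3):
    crushed = bit_crush(sine, bits)
    worst = max(abs(a - b) for a, b in zip(sine, crushed))
    print(f"{bits:2d} bits: {len(set(crushed))} distinct values, worst error {worst:.6f}")
```

At 3 bits only eight output values are possible, so the error is a large fraction of the signal itself, and because it is tied to the waveform rather than random, it is heard as distortion rather than hiss.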
I used the freeware Decimate plug-in to let viewers of my AES Audio Myths YouTube video hear how the audio quality of music degrades as
the bit-depth is reduced to below 16 bits. The video demo “other_effects” contains part of a cello track I recorded for a friend’s pop tune to
help you understand the relationship between what you hear and the number of bits used at that moment as shown on the screen. It starts at 16 bits, then
transitions smoothly down to only 3 bits by the end. Cellos are rich in harmonics, so on this track the effect becomes most noticeable when the
bit-depth is reduced to below 8 bits.
The Aphex Aural Exciter was already described in Chapter 3, and I explained there how to emulate the effect using EQ and distortion plug-ins.
It’s mentioned here again only because it adds a subtle trebly distortion to give the impression of added clarity, so it, too, falls under the category
of distortion effects.
Software Noise Reduction
Chapter 9 showed how to use a noise gate to reduce hiss and other constant background noises, but software noise reduction is more
sophisticated and much more effective. With this type of processing, you highlight a section of the Wave file containing only the noise, and then
the software “learns” that sound to know what to remove. This process is most applicable to noise that’s continuous throughout the recording
such as tape or preamp hiss, hum and buzz, and air conditioning rumble. Software noise reduction uses a series of many parallel noise gates,
each operating over a very narrow range of frequencies. The more individual gates that are used, the better this process works. Note that this is
unrelated to companding noise reduction such as Dolby and dbx that compress and expand the volume levels recorded to analog tape. Software
noise reduction works after the fact to remove noise that’s already present in a Wave file.
As mentioned in Chapter 9, a problem with conventional gates is they process all frequencies at once. If a bit of bass leaks into the
microphone pointing at a tambourine, the gate will open even if the tambourine is not playing at that moment. The same happens if high
frequencies leak into a microphone meant to pick up an instrument containing mostly low frequencies. By splitting the audio into different
bands, each gate opens only when frequencies in that band are present, leaving all the other bands muted. So if the noise is mostly trebly hiss,
and the audio contains mostly low frequencies, the higher-frequency gates never open. When set for a high enough resolution, noise reduction
software can effectively remove 60 Hz hum and its harmonics, but without touching the nearby musical notes at A# and B. Since hundreds or
even thousands of bands are typically employed, the threshold for each band is set automatically by the software to be just above the noise
present in that band. For this reason, software noise reduction is sometimes called adaptive noise reduction, because the software adapts itself to
the noise.
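The idea of many narrow-band gates with automatically learned thresholds can be sketched in Python as follows. This is a simplified illustration, not the method used by any commercial product: it processes a single 64-sample frame with a plain DFT, mutes gated bands completely, and uses a steady tone as a stand-in for hum. The function names and the margin parameter are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Split a frame into frequency bands (one complex value per band)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Rebuild the audio frame from its frequency bands."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def learn_noiseprint(noise_frame, margin=2.0):
    """Set each band's gate threshold just above the noise measured in that band."""
    return [margin * abs(c) for c in dft(noise_frame)]

def spectral_gate(frame, thresholds):
    """Mute every band whose level is at or below that band's threshold."""
    spectrum = dft(frame)
    gated = [c if abs(c) > th else 0.0 for c, th in zip(spectrum, thresholds)]
    return idft(gated)

# Hypothetical test signal: a loud low tone ("music") plus a quiet steady tone ("hum").
N = 64
music = [0.5 * math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
hum = [0.01 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
noisy = [m + h for m, h in zip(music, hum)]

cleaned = spectral_gate(noisy, learn_noiseprint(hum))
```

Only the bands where the hum lives stay closed, so the low tone passes through untouched. Real noise reduction software works on many overlapping frames and usually attenuates rather than fully mutes each band.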
Figure 13.3 shows the main screen of the original Sonic Foundry Noise Reduction plug-in, which I still use because it works very well. Other,
newer software works in a similar fashion. Although this software is contained in a plug-in, the process is fairly CPU-intensive, so it makes sense
to use it with a destructive audio editor that applies the noise reduction permanently to a Wave file. The first step is to highlight a section of the
file that contains only the noise to be removed. Then you call up the plug-in, check the box labeled Capture noiseprint, and click Preview. It’s
best to highlight at least a few seconds of the noise if possible. With a longer noise sample, the software can better set the thresholds for each
band. Be sure that only pure noise is highlighted and the section doesn’t contain the decaying sound of music. If any musical notes are present in
the sampled noise, their frequencies will be reduced along with the noise.
Figure 13.3:
The Sonic Foundry Noise Reduction plug-in comprises a large number of noise gates, each controlling a different narrow range of frequencies. The software analyzes the
noise, then sets all of the gate parameters automatically.
When you click Preview, you’ll hear the background noise sample play once without noise reduction, and then it repeats continuously with
the noise reduction applied while you adjust the various settings. This lets you hear right away how much the noise is being reduced. After you
stop playback, select the entire file using the Selection button at the bottom right of the screen. This tells the plug-in to process the entire file,
not just the noise-only portion you highlighted. When you finally click OK, the noise reduction is applied. Long files take a while to process,
especially if the FFT Size is set to a large value. Making the FFT size larger divides the audio into more individual bands, which reduces the
noise more effectively. Using fewer bands processes the audio faster, but using more bands does a better job. I don’t mind waiting, since what
really matters is how well it works.
As with pitch and time manipulation plug-ins, several different modes are available so you can find one that works best with your particular
program material and noise character. Note that one of the settings is how much to reduce the noise. This is equivalent to the Attenuation,
Depth, or Gain Reduction setting on a noise gate. Applying a large amount of noise reduction often leaves audible artifacts. So when the noise to
be removed is extreme, two passes that each reduce the noise by 10 dB usually give better results than a single pass with 20 dB of reduction.
Applying a lot of noise reduction sounds much like low bit-rate lossy MP3 type compression, because the technologies are very similar. Both
remove content in specific frequency bands when it’s below a certain threshold.
You can also tell the plug-in to audition only what it will remove, to assess how damaging the process will be. This is the check box labeled
Keep residual output in Figure 13.3. You’ll do that when using Preview to set the other parameters to ensure that you’re not removing too much
of the music along with the noise. If you hear bits of the music playing, you may be reducing the noise too aggressively.
Another type of noise reduction software is called click and pop removal, and this is designed to remove scratch sounds on vinyl records. If
you think about it, it would seem very difficult for software to identify clicks and pops without being falsely triggered by transient sounds that
occur naturally in music. The key is the software looks for a fast rise time and a subsequent fast decay. Real music never decays immediately, but
most clicks and pops end as quickly as they begin. Like Sonic Foundry’s Noise Reduction plug-in, their Click and Crackle Remover plug-in has a
check box to audition only what is removed to verify that too much of the music isn’t removed along with the clicks.
The last type of noise reduction I’ll describe is a way to reduce distortion after the fact, rather than hiss and other noises. Chapter 6 described
the tape linearizer circuits built into some analog tape recorders. In the 1970s I designed such a circuit to reduce distortion on my Otari two-track
professional recorder. The original article from Recording Engineer/Producer Magazine with complete schematics is on my website, and the
article also shows plans for a simple distortion analyzer. A tape linearizer applies equal but opposite distortion
during recording, overdriving the tape slightly to counter the compression and soft-clipping the tape will add. Using similar logic, distortion can
sometimes be reduced after the fact, using an equal but opposite nonlinearity. The Clipped Peak Restoration plug-in, also part of the Sonic
Foundry noise reduction suite, approximates the shape the original waveform might have been if the wave peaks had not been clipped.
Other Processors
One of the most useful freeware plug-ins I use is GPan from GSonic, shown in Figure 13.4. This simple plug-in lets you control the volume and
panning separately for each channel of a stereo track. I use this all the time on tracks and buses to narrow a too wide stereo recording or to keep
only one channel of a stereo track and pan it to the center or to mix both channels to mono at the same or different volume levels.
Figure 13.4:
The GSonic GPan plug-in lets you control the volume and panning for each channel of a stereo track independently.
Chapter 1 explained one way to create a stereo synthesizer, using phase shift to create opposite peak and null frequencies in the left and right
channels. But some designs are more sophisticated, and less fake-sounding, than simple comb filtering that differs left and right. By applying
different amounts of phase shift to the left and right channels, sounds can seem to come from locations wider than the speakers, or even from
other places in the room, including overhead. This popular effect can also make stereo material sound wider than usual. Several companies offer
widening plug-ins that accept either mono or stereo sources.
Vocal Removal
Another “effect” type is the vocal remover, sometimes called a center channel eliminator. I put “effect” in quotes because this is more of a
process than an effect, though plug-ins are available to do this automatically. A lead vocal is usually panned to the center of a stereo mix so its
level can be reduced by subtracting one channel from the other. This is similar to the Mid/Side processing described in Chapters 6 and 10. The
basic procedure is to reverse the polarity of one channel, then combine that with the other channel at an equal volume. Any content common to
both channels will be canceled, leaving only those parts of the stereo mix that are different on the left and right sides. Unfortunately, most vocal
removal methods reduce a stereo mix to mono because the two channels are combined to one. However, you could synthesize a stereo effect as
explained previously. Chapter 17 explains a different method using Dolby ProLogic available in most consumer receivers.
It’s impossible to completely remove a vocal or reduce its level without affecting other elements in the mix. Even though most vocals are
placed equally in the left and right channels, stereo reverb is often added to vocal tracks. So even if you could completely remove the vocal,
some or all of the reverb is likely to remain. If you plan to record yourself or someone else singing over the resultant track, the new vocal can
have its own reverb added, and you may be able to mix the new voice and reverb loud enough to mask the ghost reverb from the original track.
Another problem is that vocals are not the only thing panned to the center of a mix. Usually, the bass and kick drum are also in the middle, so
those are also canceled. You can avoid this by using EQ to roll off the bass on one channel before combining it with the other. If one channel has
less low-frequency content than the other, bass instruments will not completely cancel, though their tonality will be affected.
I’ve used Sound Forge to remove vocals in Wave files destructively, but processing the left and right channels on separate tracks of a DAW is
usually faster. This way you can more easily adjust the channel levels while the song plays to achieve the most complete cancellation. If your
DAW doesn’t offer a direct way to split one stereo track to two separate mono tracks, a plug-in such as GPan described earlier is needed. You’ll
put the same music on two tracks, then send the left channel (only) of one track panned to the center and do the same for the right channel on
the other track. Then reverse the polarity of one track while adjusting its volume as the song plays, listening for the most complete cancellation.
Finally, insert a low-cut EQ filter onto either track, and adjust the cutoff frequency to bring back the bass and kick drum.
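The core of the procedure, reversing the polarity of one channel and combining it with the other, comes down to a subtraction. The Python sketch below shows this on a made-up three-sample "mix"; the function name remove_center and the right_gain parameter (standing in for the volume adjustment you make by ear) are hypothetical, and the low-cut filter step is left out for brevity.

```python
def remove_center(left, right, right_gain=1.0):
    """Reverse the polarity of the right channel and mix it with the left (mono result)."""
    return [l - right_gain * r for l, r in zip(left, right)]

# Hypothetical three-sample "mix": the vocal is identical in both channels,
# the guitar is only on the left, and the keyboard is only on the right.
vocal = [0.5, -0.2, 0.1]
guitar = [0.1, 0.1, 0.0]
keys = [0.0, 0.2, -0.1]
left = [v + g for v, g in zip(vocal, guitar)]
right = [v + k for v, k in zip(vocal, keys)]

karaoke = remove_center(left, right)  # the vocal cancels; guitar minus keys remains
```

Anything that differs between the channels survives, which is why stereo reverb added to a centered vocal tends to remain after this process.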
Ring Modulators
The last audio effect I’ll describe is the ring modulator, which applies amplitude modulation (AM) to audio. This is a popular feature in many
synthesizers for creating outer space–type sounds and other interesting effects, but it can be implemented as a plug-in and applied to any audio
source. Amplitude modulation is a cyclical variation in amplitude, or volume. The simplest type of AM is tremolo, as described in Chapter 9.
Tremolo rates typically vary from less than 1 Hz up to 10 Hz or maybe 20 Hz. But when amplitude modulation becomes fast enough, it
transitions from sounding like tremolo into audible sum and difference frequencies, also called side bands. In fact, there’s no difference between
tremolo and ring modulation other than the speed of the modulation. By the way, the term “ring” derives from the circle arrangement of diodes
used to implement this in analog circuitry, similar to the full-wave diode bridge in Figure 21.13 in Chapter 21.
When the volume of one frequency is modulated by another, sum and difference frequencies result. But unlike IM distortion that adds sum and
difference components, the output of a ring modulator contains only the sum and difference frequencies. Even a basic tremolo does this, though
it’s difficult to tell by listening. If you pass a 500 Hz sine wave through a tremolo running at 3 Hz, it sounds like the 500 Hz tone varies in
volume at that rate. In truth, the output contains 497 Hz and 503 Hz. Musicians know this effect as beat frequencies, and they use the repeated
pulsing of two strings playing together to help tune their instruments. When the pulsing slows to a stop, both strings are then at the same pitch,
as shown in the video “beat_tones.”
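The 500 Hz and 3 Hz example can be checked numerically. The sketch below treats ring modulation as multiplying two sine waves sample by sample, then measures the level at 497, 500, and 503 Hz. The sample rate, the helper names, and the single-frequency DFT used for measuring are my own choices for illustration.

```python
import cmath
import math

RATE = 8000  # samples per second; one second of audio gives 1 Hz resolution

def ring_modulate(carrier, modulator):
    """Multiply two signals sample by sample."""
    return [c * m for c, m in zip(carrier, modulator)]

def level_at(signal, freq, rate=RATE):
    """Amplitude of the sine component at `freq`, from a single-frequency DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / rate) for i, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

tone = [math.sin(2 * math.pi * 500 * i / RATE) for i in range(RATE)]
slow = [math.sin(2 * math.pi * 3 * i / RATE) for i in range(RATE)]
output = ring_modulate(tone, slow)

sidebands = (level_at(output, 497), level_at(output, 503))  # each close to 0.5
carrier_level = level_at(output, 500)                       # close to 0
```

This follows from the identity that the product of two sines equals half the difference of two cosines, one at the difference frequency and one at the sum. If the modulator is offset so it never reverses polarity, as in a partial-depth tremolo, some of the original 500 Hz remains alongside the two side bands.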
To prove that tremolo generates sum and difference frequencies, and vice versa, I created the audio demo “beat_frequencies.wav” from two
similar frequencies. I started by generating a 200 Hz sine wave in Sound Forge lasting two seconds. Then I mixed in a second sine wave at
203 Hz at the same volume. The result sounds like a 201.5 Hz tone whose volume goes up and down at a 3 Hz rate via tremolo. And it is the
same: The two processes are by definition the same, whether you create the effect by mixing two sine waves together or modulate one sine
wave’s volume with another. So even though this demo sounds like tremolo, an FFT of the wave file shows that it contains the original two
Most ring modulator plug-ins use a sine wave for the modulating frequency, though some offer other waveforms such as triangle, sawtooth,
square, or even noise. In that case, the sum and difference frequencies generated are more complex because of the additional modulating
frequencies, which adds more overtones giving a richer sound. Note that frequency modulation (FM) also changes audibly with the modulation
speed, where the sound transitions from vibrato at slow rates to many added frequencies as the speed increases. Even with two pure sine waves,
FM creates a much more complex series of overtones than AM. This is the basis for FM synthesis and is described in more detail in Chapter 14.
To show the effect of ring modulation at a fast rate, which is how it’s normally used, the second section of the “other_effects” video applies AM
using the freeware Bode Frequency Shifter plug-in from Christian-W Budde. I programmed a track envelope in SONAR to sweep the modulating
frequency continuously upward from 0.01 Hz to 3.5 kHz, first on a clean electric guitar track, then again on a triangle recording that repeats four
times. You can hear the effect begin as stereo tremolo, varying the volume differently left and right. Notice that the fourth time the triangle plays,
the pitch seems to go down even though the ring modulation continues toward a higher frequency. This is the same as digital aliasing, which, as
explained in earlier chapters, also creates sum and difference frequencies. In this case, as the modulating speed increases, the difference between
the modulating frequency and frequencies present in the triangle becomes smaller. So the result is audible difference frequencies that become
lower as the modulation rate increases.
This chapter explains miscellaneous audio effects, including tape-sims, amp-sims, and bit-depth reduction. Although software noise reduction is
not an audio effect, it’s a valuable tool for audio engineers. Unlike a conventional noise gate, this process applies hundreds or even thousands of
gates simultaneously, with each controlling a different very narrow range of frequencies. Digital audio software can also remove clicks and pops
from LP records and even restore peaks that had been clipped to reduce distortion after the fact. Finally, this chapter demonstrates that the same
amplitude modulation used to create a tremolo effect can also generate interesting sounding sum and difference frequency aliasing effects when
the modulating frequency is very fast.
Chapter 14
A synthesizer is a device that creates sounds electronically rather than acoustically, using mechanical parts as with a piano, clarinet, or electric
guitar. The sound is literally synthesized, and electronic sound shaping is often available as well. The first synthesizer was designed in 1874 by
electrical engineer Elisha Gray. Gray is also credited with inventing the telephone, though Alexander Graham Bell eventually won the patent and
resulting fame. Gray’s synthesizer was constructed from a series of metal reeds that were excited by electromagnets and controlled by a two-octave
keyboard, but the first practical synthesizer I’m aware of was the Theremin, invented by Léon Theremin and patented in 1928.
Several modern versions of the Theremin are available from Moog Music, the company started by another important synthesizer pioneer,
Robert Moog. One of the current Moog models is the Etherwave Plus, shown in Figure 14.1. Unlike most modern synthesizers, the Theremin
plays only one note at a time, and it has no keyboard. Rather, moving your hand closer or farther away from an antenna changes the pitch of the
note, and proximity to a second antenna controls the volume in a similar fashion. By the way, synthesizers that can play only one note at a time
are called monophonic, versus polyphonic synthesizers that can play two or more notes at once. These are sometimes called monosynths or
polysynths for short.
Figure 14.1:
The Etherwave Plus from Moog Music is a modern recreation of the original Theremin from the 1920s.
Photo courtesy of Moog Music, Inc.
Controlling the pitch by waving your hand around lets you slide the pitch smoothly from one note to another, a musical effect called
portamento. Using antennas also lets you easily add vibrato that can be varied both in frequency and intensity just by moving your hand.
However, the Theremin is not an easy instrument to play because there are no “anchor points” for the notes. Unlike a guitar or piano with fixed
frets or keys, the Theremin is more like a violin or cello with no frets. The original Theremin was built with tube circuits, though modern
versions are of course solid state.
Analog versus Digital Synthesizers
Early commercial keyboard synthesizers were analog designs using transistor circuits for the audio oscillators that create the musical notes and
for the various modulators and filters that process the sounds from the oscillators. These days, most hardware synthesizers are digital, even when
they’re meant to mimic the features and sounds of earlier analog models. Modern digital synthesizers go far beyond simply playing back
recorded snippets of analog synths. Most use digital technology to generate the same types of sounds produced by analog synthesizers. In this
case the synthesizer is really a computer running DSP software, with a traditional synthesizer appearance and user interface. Many offer
additional capabilities that are not possible at all using analog techniques. There are many reasons for preferring digital technology to recreate
the sound of analog synthesizers: The pitch doesn’t drift out of tune as the circuits warm up, there’s less unwanted distortion and noise, and
patches—the arrangements of sound sources and their modifiers—can be stored and recalled exactly.
Another type of analog-style digital synthesizer is programmed entirely in software that runs either as a stand-alone computer program or as a
plug-in. You can also buy sample libraries that contain recordings of analog synthesizers to be played back on a keyboard. In my opinion, using
samples of analog synthesizers misses the point because the original parameters can’t be varied as the notes play. One of the joys of analog
synthesis is being able to change the filter frequency, volume, and other parameters in real time to make the music sound more interesting. With
samples of an analog synthesizer you can add additional processing, but you can’t vary the underlying parameters that defined the original sound
Additive versus Subtractive Synthesis
There are two basic types of synthesis: additive and subtractive. An additive synthesizer creates complex sounds by adding together individual
components—often sine waves that determine the strength of each individual harmonic (see Figure 14.2). A subtractive synthesizer instead starts
with a complex waveform containing many overtones, followed by a low-pass or other filter type that selectively removes frequencies. Both
methods have merit, though additive synthesis is more complicated because many individual components must be synthesized all at once and
mixed together. With subtractive synthesis, a single oscillator can create an audio wave containing many harmonics and a single filter can process
that complex sound in many ways.
Figure 14.2:
Additive synthesis creates a complex sound by adding different frequencies together. Varying the volume of each frequency affects the tone color, also called timbre.
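A minimal Python sketch of additive synthesis follows, assuming each component is a sine wave at a whole-number multiple of the fundamental. The function name additive, the harmonic levels, and the sample rate are illustrative choices, not values from any particular instrument.

```python
import math

def additive(fundamental, harmonic_levels, rate, num_samples):
    """Sum sine waves at the fundamental and its harmonics.

    harmonic_levels[0] is the volume of the fundamental, harmonic_levels[1]
    the volume of the second harmonic, and so on.
    """
    out = []
    for i in range(num_samples):
        t = i / rate
        out.append(sum(level * math.sin(2 * math.pi * fundamental * (h + 1) * t)
                       for h, level in enumerate(harmonic_levels)))
    return out

# Hypothetical usage: one cycle of a 100 Hz tone with progressively weaker
# second and third harmonics. Changing the list changes the timbre.
tone = additive(100, [1.0, 0.5, 0.25], rate=8000, num_samples=80)
```

Each entry in the list plays the same role as one volume control in the figure: raising or lowering it changes the strength of that one harmonic.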
The Hammond organ was arguably the first commercial additive synthesizer, though acoustic pipe organs have been around for many
centuries. A Hammond organ contains a long spinning rod with 91 adjacent metal gear-like “tone wheels” driven by an electric motor. Each gear
spins near a magnetic pickup that generates an electrical pulse as a tooth crosses its magnetic field. The rod spins at a constant rate, and each
tone wheel has a different number of teeth that excite its own pickup. So each wheel generates a different frequency, depending on how many
teeth pass by the pickup each second.
Nine harmonically related frequencies are available at once—each with its own volume control, called a drawbar—using an elaborate
switching mechanism attached to each key of the organ’s keyboard. The drawbars determine the volume of each harmonic, letting the player
create a large variety of timbres. You can adjust the drawbars while playing to vary the tone in real time. As with all organs, the sound starts
when the key is first pressed and stops when it’s released. Hammond organs have a feature called percussion that mimics the sound of a pipe
organ. When enabled, it plays a short burst of an upper harmonic adding emphasis to the start of each note.
The first commercially viable all-in-one subtractive synthesizer was the Minimoog, introduced in 1970, and it plays only one note at a time.
However, inventor Robert Moog sold modular synthesizers as early as 1964. Unlike modern self-contained synthesizers, a modular synthesizer is
a collection of separate modules meant to be connected together in various configurations using patch cords, hence the term patch to describe a
particular sound. So you might route the output of an audio oscillator to a voltage controlled amplifier (VCA) module to control the volume,
and the output of the VCA could go to a filter module to adjust the harmonic content. Few synthesizers today require patch cords, though the
term “patch” is still used to describe an arrangement of synthesizer components and settings that creates a particular sound.
The Minimoog simplified the process of sound design enormously by preconnecting the various modules in musically useful ways. Instead of
having to patch together several modules every time you want to create a sound, you can simply flip a switch or change a potentiometer setting.
Other monophonic analog synthesizers soon followed, including the ARP 2500 and 2600, and models from Buchla & Associates, E-MU Systems,
Sequential Circuits, and Electronic Music Labs (EML), among others. A modern version of the Minimoog is shown in Figure 14.3.
Figure 14.3:
The Minimoog was the first commercially successful subtractive synthesizer, bringing the joys of analog synthesis to the masses.
Photo courtesy of Moog Music, Inc.
Voltage Control
Another important concept pioneered by Moog was the use of voltage control to manipulate the various modules. Using a DC voltage to control
the frequency and volume of an oscillator, and the frequency of a filter, allows these properties to be automated. Unlike an organ that requires a
complex switching system to control many individual oscillators or tone wheels simultaneously, the keyboard of a voltage controlled synthesizer
needs only to send out a single voltage corresponding to the note that’s pressed. When an oscillator accepts a control voltage to set its pitch, it’s
called a voltage controlled oscillator, or VCO. Further, the output of one oscillator can control other aspects of the sound, such as driving a VCA
to modulate the volume. Or it can add vibrato to another oscillator by combining its output voltage with the voltage from the keyboard, or vary
a note’s timbre by manipulating a voltage controlled filter (VCF).
Music sounds more interesting when the note qualities change over time, and being able to automate sound parameters makes possible many
interesting performance variations. For example, a real violinist doesn’t usually play every note at full volume with extreme vibrato for the
entire note duration. Rather, a good player may start a note softly with a mellow tone, then slowly make the note louder and brighter while
adding vibrato, and then increase the vibrato intensity and speed. A synthesizer whose tone can be varied by control voltages can be
programmed to perform similar changes over time, adding warmth and a human quality to musical notes created entirely by a machine. Of
course, you can also turn the knobs while a note plays to change these parameters, and good synthesizer players do this routinely.
Sound Generators
Oscillators are the heart of every analog synthesizer, and they create the sounds you hear. An oscillator in a synthesizer is the electronic
equivalent of strings on a guitar or piano. But where a string vibrates the air to create sound directly, an oscillator creates a varying voltage that
eventually goes to a loudspeaker. Either way, the end result is sound vibration in the air you can hear.
The five basic waveforms used by most synthesizers are sine, triangle, sawtooth, square, and pulse. These were shown in Figure 1.22 and are
repeated here in Figure 14.4. As you learned in Chapter 1, fast movement at a wave’s rising or falling edges creates harmonic overtones. The
steeper the wave transition, the faster the speaker moves, and in turn the more high-frequency content produced.
Figure 14.4:
Most oscillators that generate musical pitches offer several different wave shapes, each having a characteristic sound quality, or timbre. The same wave shapes are also
used for oscillators that modulate the volume, pitch, and filter frequency.
A sine wave is not very interesting to hear, but it’s perfect for creating the eerie sound of a Theremin. Most patches created with subtractive
synthesis use a sawtooth, square, or pulse wave shape. These are all rich in harmonics, and each has a characteristic sound. Sawtooth and pulse
waves contain both even and odd harmonics, whereas square and triangle waves contain only odd harmonics. The classic sound Keith Emerson
used in the song Lucky Man from 1970 is based on a square wave.
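The harmonic content of these wave shapes can be measured directly. The sketch below builds one cycle of each waveform and reports the level of the first six harmonics. The 25 percent pulse width, the cycle length, and the helper name harmonic_level are assumptions made for illustration.

```python
import cmath
import math

N = 1000  # samples in one cycle of each waveform

square = [1.0 if i < N // 2 else -1.0 for i in range(N)]
pulse = [1.0 if i < N // 4 else -1.0 for i in range(N)]  # 25 percent pulse width
sawtooth = [2.0 * i / N - 1.0 for i in range(N)]
triangle = [1.0 - 4.0 * abs(i / N - 0.5) for i in range(N)]

def harmonic_level(cycle, k):
    """Amplitude of harmonic number k (1 = fundamental) in one cycle of a waveform."""
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / len(cycle)) for i, s in enumerate(cycle))
    return 2 * abs(acc) / len(cycle)

for name, cycle in (("square", square), ("pulse", pulse),
                    ("sawtooth", sawtooth), ("triangle", triangle)):
    levels = [round(harmonic_level(cycle, k), 3) for k in range(1, 7)]
    print(name, levels)
```

In the printed lists, the even-numbered entries for the square and triangle waves come out at essentially zero, while the sawtooth and this pulse wave show energy at the second harmonic.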
One important way to make a synthesizer sound more interesting is to use two or more oscillators playing in unison but set slightly out of
tune with each other. This is not unlike the difference between a solo violin and a violin section in an orchestra. No two violin players are ever
perfectly in tune, and the slight differences in pitch and timing enhance the sound.
When one oscillator of an analog-style synthesizer varies the pitch of another oscillator, the controlling oscillator usually operates at a low
frequency to add vibrato, while the main (audio) oscillator runs at higher frequencies to create the musical notes. Many analog synthesizers
contain dedicated low-frequency oscillators intended only to modulate the frequency of audio oscillators, or vary other sound attributes. As you
might expect, these are called low-frequency oscillators, or LFOs for short. Most LFOs can be switched between the various wave shapes shown
in Figure 14.4, and some can also output noise to create a random modulation.
An LFO creates a tremolo effect when used to modulate the volume, a vibrato effect when varying the pitch, or a wah-wah effect when
applied to a filter. If the modulating wave shape is sine or triangle, the pitch (or volume) glides up and down. A sawtooth wave instead gives a
swooping effect when modulating the pitch, and a square or pulse wave switches between two notes. When sent to a VCA for tremolo, square
and pulse waves turn the volume on and off, or switch between two volume levels.
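The way one LFO can drive either volume or pitch can be sketched in a few lines of Python. The function names, the depth parameter, and the mapping of LFO output to gain and to semitones are hypothetical choices for illustration, not the behavior of a specific synthesizer.

```python
import math

def lfo(shape, freq, t):
    """LFO output between -1 and +1 at time t (seconds) for the given wave shape."""
    phase = (freq * t) % 1.0
    if shape == "sine":
        return math.sin(2 * math.pi * phase)
    if shape == "triangle":
        return 4 * abs(phase - 0.5) - 1
    if shape == "sawtooth":
        return 2 * phase - 1
    if shape == "square":
        return 1.0 if phase < 0.5 else -1.0
    raise ValueError(f"unknown shape: {shape}")

def tremolo_gain(lfo_value, depth):
    """Volume multiplier: 1.0 at the LFO peak, falling to 1.0 - depth at its trough."""
    return 1.0 - depth * (1.0 - lfo_value) / 2.0

def vibrato_freq(base_freq, lfo_value, semitones):
    """Oscillator frequency shifted up or down by as much as `semitones`."""
    return base_freq * 2 ** (semitones * lfo_value / 12.0)

# A square LFO sent to the volume switches between two levels.
gains = {tremolo_gain(lfo("square", 1.0, i / 100), depth=0.5) for i in range(100)}
```

With a sine or triangle shape the same gain and frequency functions glide smoothly instead of switching, matching the behavior described above.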
One important way synthesizer players add interest is by varying the sound quality of notes over their duration. The simplest parameter to
vary is volume. This is handled by another common modulator, the ADSR, shown in Figure 14.5. An ADSR is an envelope generator, which is a
fancy way of saying it adjusts the volume or some other aspect of the sound over time. ADSR stands for Attack, Decay, Sustain, and Release.
Some synthesizers have additional parameters, such as an initial delay before the ADSR sequence begins, or more than one attack and decay
segment. Note that the Attack, Decay, and Release parameters control the length of each event, while the Sustain knob controls the sustained
volume level. The solid portion in Figure 14.5 shows an audio waveform zoomed out too far to see the individual cycles. In other words, this
shows a volume envelope of the sound.
Figure 14.5:
The ADSR is an envelope generator that controls the volume of a note over time, or the cutoff frequency of a filter.
The Attack time controls how long it takes for the volume to fade up initially. Once the volume reaches maximum, it then decays at a rate set
by the Decay setting. But rather than always decay back to zero volume, it instead fades to the Sustain volume level. This lets you create a short
note burst for articulation, then settle to a lower volume for the duration of the note. Only after you release the key on the keyboard does the
volume go back to zero at a rate determined by the Release setting.
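The four ADSR stages can be written as a small function that returns the envelope level at any moment. This sketch assumes straight-line segments and a level range of 0 to 1; real synthesizers often use curved segments. The function name adsr_level and the gate_time parameter (how long the key is held) are my own.

```python
def adsr_level(t, attack, decay, sustain, release, gate_time):
    """Envelope level (0..1) at time t for a key held down for gate_time seconds.

    attack, decay, and release are durations in seconds; sustain is a level.
    """
    def held(time):
        if time < attack:
            return time / attack                      # fade up to maximum
        if time < attack + decay:
            return 1.0 - (1.0 - sustain) * (time - attack) / decay
        return sustain                                # hold at the sustain level

    if t < 0:
        return 0.0
    if t < gate_time:
        return held(t)
    if release <= 0:
        return 0.0
    # After the key is released, fade from wherever the envelope was down to zero.
    return max(0.0, held(gate_time) * (1.0 - (t - gate_time) / release))

# Hypothetical settings: 0.1 s attack, 0.2 s decay, sustain at half volume,
# 0.3 s release, with the key held for one second.
envelope = [adsr_level(i / 100, 0.1, 0.2, 0.5, 0.3, 1.0) for i in range(150)]
```

Setting sustain to zero with short attack and decay times gives a brief burst that dies away even while the key is held.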
An ADSR lets you create very short staccato notes as used in the instrumental hit Popcorn from 1972 by Gershon Kingsley. For the lead synth
sound from that song, the attack is very fast, and the decay is almost as fast. In this case the note should not sustain, so the sustain level is set to
zero. By extending the decay time, notes can be made to sound somewhat like a guitar, then more like a piano. Notes can also continue after you
release the key by setting a longer release time. Early synthesizers played all notes at the same volume, with the same attack and sustain levels
used throughout a passage. But when players can control the volume of every note—for example, by striking the key harder or softer—the ADSR
levels are relative to the performed volume of the note.
An ADSR is often used to change the volume over time, but it can also control a filter to vary a note’s timbre as it sustains. When an ADSR
controls a filter to change the tone quality, an ADSR volume envelope can also be used, or it can be disabled. Many synth sounds sweep the filter
frequency either up or down only, though it can of course sweep both ways. In truth, an ADSR always sweeps the filter both up and down, but
one direction may be so fast that you don’t hear the sweep. Many synth patches use one of those two basic ADSR filter settings, where the attack
is short with a longer decay, or vice versa.
Classic Moog-type synthesizers use a low-pass filter with a fairly steep slope of 24 dB per octave, with the cutoff frequency controlled by an
ADSR (see Figure 14.6). This type of filter is offered in many analog synthesizers. The filter’s Initial Frequency knob sets its static cutoff
frequency before an ADSR or LFO is applied. The filter’s ADSR then sweeps the frequency up from there and back down over time, depending
on its settings. As when using an ADSR to vary the volume, you can do the same with a filter to create short staccato notes or long sustained
tones. With a synthesizer filter, the sweep direction changes the basic character of the sound. The filter’s Gain control determines the width of the
sweep range.
sweep range.
Figure 14.6:
The classic analog synthesizer filter is a low-pass at 24 dB per octave, with an adjustable resonant peak at the cutoff frequency.
Another important filter parameter is resonance, described in Chapter 1. When the resonance of a low-pass filter is increased, a narrow peak
forms at the cutoff frequency. This emphasizes that frequency and brings out single harmonics one at a time as it sweeps up or down. Many
analog-style synthesizers let you increase the resonance to the point of self-oscillation, not unlike echo effects that let you increase the feedback
above unity gain. Setting the resonance very high, just short of outright feedback, imparts a unique whistling character to the sound.
The last filter parameter we’ll consider is keyboard tracking. Normally, a synthesizer filter is either left at a static position or swept
automatically by its own ADSR as each note is played. Most analog synths also have an option for the filter to track the keyboard, so playing
higher notes moves the initial filter frequency higher in addition to any sweeping effect. For example, if you have a resonant filter boost that
brings out the third harmonic of a note, you’ll want the filter to track the notes as you play up and down the keyboard. So whatever note you
play, the filter emphasizes that same-numbered harmonic. Keyboard tracking also makes possible sound effects such as wind noise that can be
varied in pitch or even played as a melody. To do that you’ll use white or pink noise as the sound source, then increase the filter resonance, and
have the filter track the keyboard without influence from an ADSR.
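The arithmetic behind keyboard tracking is simple enough to sketch. The example below assumes equal temperament with A at 440 Hz and uses MIDI-style note numbers purely as a convenient way to name keys; the function names are invented for the illustration, and an analog synth does the equivalent with control voltages rather than numbers.

```python
def note_frequency(note_number):
    """Equal-tempered frequency in Hz, with note 69 (the A above middle C) at 440 Hz."""
    return 440.0 * 2.0 ** ((note_number - 69) / 12.0)

def tracked_cutoff(note_number, harmonic=3):
    """Filter frequency that stays on the same-numbered harmonic of whatever note is played."""
    return harmonic * note_frequency(note_number)

# Playing one octave higher doubles the note's pitch, so the tracked
# filter frequency doubles too and still lands on the third harmonic.
low = tracked_cutoff(57)    # A at 220 Hz
high = tracked_cutoff(69)   # A at 440 Hz
```

Without tracking, the filter frequency would stay fixed, so the resonant peak would emphasize a different harmonic on every key.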
MIDI Keyboards
Most hardware synthesizers include a piano-like keyboard to play the musical notes, though there are also smaller desktop and rack-mount
sound modules containing only the sound-generating electronics. Modern hardware synthesizers that include a keyboard also send MIDI data as
you play to control other synths or sound modules. Dedicated keyboard controllers that contain no sounds of their own are also available. A
keyboard controller usually includes many knobs to manipulate various functions, and these are typically used as a master controller for several
synthesizers in a larger system. Software synthesizers running within a digital audio workstation (DAW) program can also be controlled by a
keyboard controller or a conventional keyboard synthesizer with MIDI output.
Modern synthesizers send MIDI data each time a key is pressed or released. In that case, the data for each note-on and note-off message
includes the MIDI note number (0–127), the channel number (0–15), and the note-on volume, called velocity (0–127). MIDI note data and other
parameters are described more fully in the bonus MIDI chapter on the website for this book. Old-school analog synthesizer keyboards instead
output DC voltages that correspond to which note was struck, as well as trigger signals when notes are pressed and released.
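As a rough illustration of what travels down the MIDI cable, here is a sketch that assembles the three bytes of a note-on and a note-off message. The status values 0x90 and 0x80 come from the MIDI 1.0 specification rather than from this chapter, and the helper names are made up for the example.

```python
def _check(channel, note, velocity):
    # Channel is 0-15; note number and velocity are each 0-127.
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not 0 <= note <= 127 or not 0 <= velocity <= 127:
        raise ValueError("note and velocity must be 0-127")

def note_on(channel, note, velocity):
    """Three-byte note-on message: status byte, note number, velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note, velocity=0):
    """Three-byte note-off message for the same note."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])

# Middle C (note 60) played fairly hard on the first channel, then released.
pressed = note_on(0, 60, 100)
released = note_off(0, 60)
```

The channel number occupies the low four bits of the first byte, which is why only 16 channels are available.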
Most MIDI keyboards are velocity sensitive, sometimes called touch sensitive, and their output data include how loudly you play each note.
The standard MIDI term for how hard you strike a key is velocity, and that’s one factor that determines the volume of the notes you play. The
other factor is the overall volume setting, which is another standard MIDI data type. Internally, most MIDI keyboards contain two switches for
every key: One engages when you first press the key, and another senses when the key is fully depressed. By measuring how long it took for the
key to reach the end of its travel, the keyboard’s electronics can calculate how hard you struck the key and translate that into a MIDI velocity.
In addition to transmitting note-on velocity to indicate how loudly a note was played, some MIDI keyboards also include a feature called after
touch. After pressing a key to trigger a note, you can then press the key even harder to generate after touch data. Some synthesizers let you
program after touch to add a varying amount of vibrato or to make the note louder or brighter or vary other aspects of the note’s quality while
the note sustains. One of the coolest features of the Yamaha SY77 synthesizer I owned in the 1990s was the harmonica patch, which used after
touch to shift the pitch downward. While holding a note, pressing the key harder mimics a harmonica player’s reed-bending technique. Most
MIDI keyboards that respond to after touch do not distinguish which key is pressed harder, sending a single data value that affects all notes
currently sounding. This is called monophonic after touch. But some keyboards support polyphonic after touch, where each note of a chord can
be modified independently, depending on how much harder individual keys are pressed.
Beyond Presets
Modern synthesizers are very sophisticated, and most include tons of great preset patches programmed by professional sound designers. Sadly,
many musicians never venture beyond the presets and don’t bother to learn what all the knobs and buttons really do. But understanding how a
synthesizer works internally to make your own patches is very rewarding, and it’s also a lot of fun!
Figure 14.7 shows the block diagram of a basic old-school analog synthesizer that uses control voltages instead of MIDI data. It includes two
audio VCOs, a mixer to combine their audio outputs, a VCA to vary the volume automatically, and a VCF that changes its filter cutoff frequency
to shape the tone color. There are also two ADSR envelope generators, one each for the VCA and VCF, and an LFO to add either vibrato,
tremolo, or both.
Figure 14.7:
This block diagram shows the basic organization of a pre-MIDI monophonic subtractive analog synthesizer.
Note that there are two distinct signal paths: One is the audio you hear, and the other routes the control voltages that trigger the ADSRs and
otherwise modulate the sound. Every time a key is pressed, two different signals are generated. One is a DC voltage that corresponds to the note
pressed to set the pitch of the audio VCOs. Each key sends a unique voltage; a low note sends a small voltage, and a higher note sends a larger
voltage. In addition, pressing a key sends a separate note-on trigger signal telling the ADSRs to begin a new sequence of rising, falling, and
sustaining control voltages. When the key is finally released, a note-off signal tells the ADSR to initiate the Release phase and fade its output
voltage down to zero.
Also note the switches in this example synthesizer that let you choose whether the VCA and VCF are controlled by the ADSR or the LFO. When
either switch is set to ADSR, the VCA or VCF is controlled by the ADSR’s programmed voltage, which in turn varies the volume or sweeps the
filter over time. Setting either to LFO instead applies tremolo or vibrato, respectively. In practice, many synthesizers use volume knobs to
continuously adjust the contribution from both the ADSR and the LFO. This lets you have both modulators active at once in varying proportions.
I built my first home-made analog synthesizer around 1970 with a lot of help from my friend Leo Taylor, who at the time worked as an
engineer for Hewlett-Packard. This synth was modeled loosely after a Minimoog, except it had no keyboard—only two oscillators, a VCF, and a
VCA. Since I couldn’t afford a real Minimoog at the time, I had to build my own. In this case, poverty was the mother of invention.
In what was surely one of the earliest attempts by anyone to sync a computer with an analog tape recorder, Leo and I recorded a short tune
written by my friend Phil Cramer. We started by recording 60 Hz hum—Hz was called cycles per second or CPS back then—onto one track of my
Ampex AG-440 4-track half-inch professional recorder. That hum was then squared up (overdriven to clip and become a square wave) and
played from the recorder into the control port of a small H-P computer Leo borrowed from work. (In those days a “small” computer was the size
of a small refrigerator!)
Leo wrote a program in binary machine code to read the 60 Hz sync tone recorded on Track 1, entering it one instruction at a time on the
computer’s 16 front-panel toggle switches. There were no diskettes or hard drives back then either. The musical note data for the tune, a modern
three-part invention, were also entered via the computer’s toggle switches. It took us 11½ hours to program the computer, enter the song data,
and record each of the three tracks one at a time. You can hear the result in the audio file “1970_synth.mp3.”
In 1974, I built another synthesizer, shown in Figures 14.8 and 14.9. This one was much more ambitious than the first and included a 61-note
keyboard, four audio oscillators, two VCFs and two VCAs, an LFO, portamento, a ten-step sequencer, and circuitry to split the keyboard at any
point so both synthesizer “halves” could be played simultaneously for a whopping two-note polyphony. It took Leo Taylor and me two years to
design this synthesizer, and I spent another nine months building it in the evenings. Leo was the real brains, though I learned a lot about
electronics over those two years. By the end of this project, I had learned enough to design all the keyboard and LFO control circuits myself.
Fourteen separate plug-in circuit cards are used to hold each module, and each card was hand-wired manually using crimped wiring posts and
24-gauge buss wire with Teflon sleeving.
Figures 14.8 and 14.9:
Ethan built this two-note analog synthesizer in the 1970s. It contains all of the standard analog synth modules, plus a sequencer to play repeating arpeggio
type patterns.
Also during this period, Leo and I designed what may have been the first working guitar synthesizer based on a pitch-to-voltage converter. I
tried to sell the design to Electronic Music Labs (EML), a synthesizer manufacturer near me in Connecticut, but they were too short-sighted to see
the potential market. Today, guitar controllers for synthesizers are common!
The best way to learn about analog synthesizers is to see and hear at the same time. The video “analog_synthesizers” pulls together all of the
concepts explained so far, and it also shows the waveforms on a software oscilloscope to better relate visually to what you hear.
Alternate Controllers
In addition to a built-in keyboard for playing notes, many modern synthesizers and keyboard controllers also provide a pitch bend controller.
This usually takes the form of a knurled wheel that protrudes slightly from the left side of the keyboard, letting you glide the pitch between
nearby notes, create vibrato manually, or otherwise add expression to a performance. When you rock the wheel forward, the note’s pitch goes
higher, and pulling it back lowers the pitch. Some synthesizers use a joystick or lever instead of a wheel. Either way, when you let go of the
wheel or lever, a spring returns it to its center resting position, restoring the normal pitch. The standard pitch bend range is plus/minus two
semitones, or one musical whole step up or down, though most synthesizers let you change the range through a setup menu or via MIDI data.
Being able to set the bend range as much as one or even two octaves in either direction is common, though when the bend range is very large,
the pitch is more difficult to control because small movements have a larger effect.
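To see why a large bend range is harder to control, it helps to work through the numbers. The sketch below assumes the 14-bit pitch bend value defined by the MIDI 1.0 specification (0 to 16383, with 8192 as the spring-centered rest position); the function name and default range are chosen only for this example.

```python
def bend_ratio(bend_value, bend_range_semitones=2.0):
    """Frequency multiplier for a 14-bit pitch bend value (8192 = no bend)."""
    offset = (bend_value - 8192) / 8192.0          # -1.0 up to just under +1.0
    semitones = offset * bend_range_semitones
    return 2.0 ** (semitones / 12.0)

# The same small wheel movement (about 5 percent of full travel) with two ranges:
small_range = bend_ratio(8192 + 410, bend_range_semitones=2.0)
large_range = bend_ratio(8192 + 410, bend_range_semitones=24.0)
```

With the default range of two semitones that nudge shifts the pitch by a tenth of a semitone, but with a two-octave range the same nudge moves it more than a full semitone.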
Another common built-in controller is the modulation wheel, or mod wheel for short. Unlike the pitch bend wheel, a mod wheel stays where
you leave it rather than returning to zero when you let go. The most common use is for adding vibrato, but most synthesizers can use this wheel
to control other parameters, such as a filter’s initial cutoff frequency. Again, assigning the mod wheel to control a different parameter is done in
the synth’s setup menu or by sending MIDI data.
Most keyboard synthesizers also have a jack that accepts a foot pedal controller. This is typically used to hold notes after you release the keys,
like the sustain pedal on an acoustic piano. My Yamaha PFp-100 piano synthesizer includes three pedal jacks—one for a sustain pedal and two
others for the sostenuto and soft pedals that behave the same as their counterparts on a real grand piano. An additional jack accepts a
continuously variable foot controller to change the volume in real time while you play. Many synths let you reassign these pedals to control
other aspects of the sound if desired, and these pedals also transmit MIDI control data.
Some synthesizers include a built-in ribbon controller. This is a touch-sensitive strip that’s usually programmed to vary the pitch, but instead of
rocking a wheel or joystick, you simply glide your finger along the ribbon. If the ribbon is at least a foot long, it’s easier to adjust the pitch in
fine amounts than with a wheel. It can also be used to add vibrato in a more natural way than using a pitch bend wheel, especially for musicians
who play stringed instruments and are used to creating vibrato by rocking a finger back and forth. You can also play melodies by sliding a finger
along the ribbon.
Besides the traditional keyboard type we all know and love, other controllers are available for guitar and woodwind players. Most guitar
controllers use a special pickup to convert the notes played on an electric guitar to MIDI note data, and many also respond to bending the strings
by sending MIDI pitch bend data. Converting the analog signal from a guitar pickup to digital MIDI note data is surprisingly difficult for several
reasons. First, if more than one note is playing at the same time, it’s difficult for the sensing circuits to sort out which pitch is the main one, even
if the other notes are softer. To solve this, the pickups on many guitar-to-MIDI converters have six different sections, with a separate electrical
output for each string.
Another obstacle to converting a guitar’s electrical output to MIDI reliably is due to the harmonics of a vibrating string changing slightly in
pitch over time. A vibrating string produces a complex waveform that varies, depending on how hard and where along the string’s length it’s
plucked. When a string vibrates, it stretches slightly at each back and forth cycle of the fundamental frequency. So when a string is at a far
excursion for its fundamental frequency, the pitch of each harmonic rises due to the increased string tension. This causes the waveform to “roll
around” over time, rather than present a clean zero crossing to the detection circuit. Unless extra circuitry is added to correct this, the resulting
MIDI output will jump repeatedly between the correct note and a false note an octave or two higher.
The short demo video “vibrating_strings” uses a software oscilloscope to show and hear three different wave types: a static sawtooth, a plucked
note on an electric bass, and an electric guitar. An oscilloscope usually triggers the start of each horizontal sweep when the wave rises up from
zero. This is needed to present a stable waveform display. The sawtooth wave is static, so the oscilloscope can easily lock onto its pitch, and
likewise a frequency measuring circuit can easily read the time span between zero crossings. But the harmonics of the bass and guitar notes vary
as the notes sustain, and the additional zero crossings make pitch detection very difficult. The electric guitar wave has even more harmonics than
the bass, and you can see the waveform dancing wildly. When I built the guitar-to-MIDI converter mentioned earlier, I added a very steep
voltage-controlled low-pass filter to remove all the harmonics before the circuit that measures the frequency.
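The trouble with counting zero crossings can be demonstrated with a small sketch. The test wave here is deliberately artificial (a 100 Hz fundamental plus an optional strong second harmonic), and the function names, sample rate, and levels are assumptions made for the illustration rather than a model of a real string or of the circuit described above.

```python
import math

def rising_zero_crossings(samples):
    """Count the places where the wave rises from below zero to zero or above."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)

def make_wave(freq_hz, second_harmonic_level, rate=48000, seconds=0.1):
    """Fundamental plus an optional second harmonic (illustrative only)."""
    wave = []
    for i in range(int(rate * seconds)):
        t = i / rate
        fundamental = math.sin(2 * math.pi * freq_hz * t + 0.5)
        harmonic = second_harmonic_level * math.sin(2 * math.pi * 2 * freq_hz * t)
        wave.append(fundamental + harmonic)
    return wave

def estimated_pitch(samples, seconds=0.1):
    """Naive pitch estimate in Hz: upward zero crossings per second."""
    return rising_zero_crossings(samples) / seconds

clean = make_wave(100.0, 0.0)     # fundamental only
bright = make_wave(100.0, 1.5)    # harmonic louder than the fundamental
print(estimated_pitch(clean), estimated_pitch(bright))
```

The estimate for the clean wave should come out close to 100 Hz, while the wave with the loud harmonic reads roughly an octave too high because every cycle now crosses zero twice on the way up. Removing the harmonics with a low-pass filter before measuring, as described above, avoids the false reading.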
Another alternate MIDI input device is the breath controller, which you blow into like a saxophone. Unlike a keyboard or guitar controller, a
breath controller is not used to play notes. Rather, it sends volume or other MIDI sound-modifying data while a note sustains. Musicians who are
fluent with woodwind and brass instruments can use this effectively to vary the volume while they play notes on the keyboard. Some synth
patches also respond to a breath controller by varying the timbre. This helps to better mimic a real oboe or trumpet, where blowing harder
makes the note louder and also brighter sounding.
One of the most popular alternate MIDI controllers is the drum pad. These range from a simple pad you tap with your hand through elaborate
setups that look and play just like a real drum set. Modern MIDI drum sets include a full complement of drum types, as well as cymbals and a
hi-hat that’s controlled by a conventional foot pedal. Most MIDI drum sets are also touch sensitive, so the data they send varies by how hard you
strike the pads. Better models also trigger different sounds when you strike a cymbal near the center or hit the snare drum on the rim.
The last alternate input device isn’t really a hardware controller but rather uses the external audio input jack available on some synthesizers.
This lets you process a guitar or voice, or any other audio source, through the synth’s built-in filters and other processors. Of course, you can’t
apply LFO vibrato to external audio, but you could apply a filter or volume envelope using an ADSR that’s triggered each time you play a note
on the keyboard.
All of the synthesis methods described so far create and modify audio via oscillators, filters, and other electronic circuits. This truly is synthesis,
because every aspect of the sound and its character is created manually from scratch. Another popular type of synthesizer is the sampler, which
instead plays back recorded snippets, or samples, of real acoustic instruments and other sounds. This is sometimes called wavetable synthesis,
because the sound source is a series of PCM sample numbers representing a digital audio recording. In this case, the “table” is a block of memory
locations set aside to hold the data. Note the distinction between short recorded samples of musical instruments played by a human versus the
rapid stream of samples that comprise digital audio. Both are called “samples,” but instrument samples are typically several seconds long, where
each digital audio sample lasts only a tiny fraction of a second.
The earliest samplers were hardware devices that could record sounds from a microphone or line input, store them in memory, and
manipulate them in various ways with the playback controlled by a MIDI keyboard. The Synclavier, introduced in 1975, was the first
commercial digital sampler I’m aware of, but at $200,000 (and up), it was far too expensive for most musicians to afford. Every hardware
sampler is a digital device containing a computer with an operating system of some sort that runs digital audio software. There’s no fundamental
difference between stand-alone hardware samplers and sampler programs that run on a personal computer.
Early samplers had very limited resources compared to modern types. For example, the E-mu Emulator, first sold in 1981, offered only 128
kilobytes (!) of memory, with a sampling rate of 27.7 kHz at 8 bits. This can store only a few seconds of low-quality mono music. To squeeze
the most out of these early systems, a concept known as looping was devised, which repeats a portion of the recorded note for as long as the key
is held. Without looping, every sample Wave file you record would have to extend for as long as the longest note you ever intend to play. The
most important part of a musical note is the first half-second or so because it contains the attack portion that defines much of an instrument’s
basic sound character. Figure 14.10 shows a sampled electric bass note that’s been set up for looping in Sound Forge.
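The "few seconds" figure follows directly from the numbers given for the Emulator. Here is the arithmetic as a short sketch, assuming a kilobyte means 1,024 bytes and that all of the memory is available for sample data; the function name is invented for the example.

```python
def seconds_of_audio(memory_bytes, sample_rate_hz, bits_per_sample, channels=1):
    """How many seconds of uncompressed PCM audio fit in the given memory."""
    bytes_per_second = sample_rate_hz * (bits_per_sample / 8) * channels
    return memory_bytes / bytes_per_second

# 128 kilobytes at 27.7 kHz, 8 bits, mono.
emulator = seconds_of_audio(128 * 1024, 27_700, 8)

# The same memory filled with CD-quality audio (44.1 kHz, 16 bits, stereo).
cd_quality = seconds_of_audio(128 * 1024, 44_100, 16, channels=2)

print(round(emulator, 2), round(cd_quality, 2))
```

The first result is a little under five seconds, and the second is less than one second, which shows why looping was essential on early hardware.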
Figure 14.10:
This sampled bass note contains a looped region that repeats the last part of the Wave file continuously for as long as the note needs to sound.
When this Wave file is played in a sampler that recognizes embedded loop points, the entire file is played, and then the trailing portion
bounded by the Sustaining Loop markers repeats continuously while the synthesizer’s key remains pressed. When the key is released, the
sustained portion continues to loop until the Release portion of the ADSR completes. Figure 14.11 shows a close-up of just the looped portion.
Figure 14.11:
This close-up shows the looped portion of the bass note in Figure 14.10. When playback reaches the small upward cycle just before the end of the Wave
file, the sampler resumes playing at the beginning of the looped region. This short section of the file continues to repeat for as long as the key is held.
The start and end loop points are usually placed at zero crossings to avoid clicks. In truth, any arbitrary points on the waveform could be used
as long as they both match exactly to allow seamless repeating. The “sample_looping” video shows this Wave file being played in Sound Forge
with loop playback mode enabled. You can see and hear the note as it plays the main portion of the file, then remains in the looped portion
until I pressed Stop. Sound Forge includes a set of tools for auditioning and fine-tuning samples to find the best loop points.
Setting loop points is not too difficult for a simple waveform such as this mellow sounding electric bass. You establish where the loop begins
and ends, being careful that the splice makes a smooth transition back to the earlier part of the waveform. However, with complex waveforms,
such as a string or horn section, or a vocal group singing “Aah,” the waveform doesn’t repeat predictably within a single cycle. Further, with a
stereo Wave file, the ideal loop points for the left and right channels are likely to be different. If the loop ending point does not segue smoothly
back to the starting boundary, you’ll hear a repeated thump or click each time the loop repeats.
As you can see, looping complex stereo samples well is much more difficult than merely finding zero crossings on a waveform. Many
instrument notes decay over time, so you can’t just have a soft section repeat back to a louder part. Nor does repeating a single cycle work very
well for complex instruments and groups; this sounds unnatural because it doesn’t capture the “rolling harmonics” of plucked string instruments.
I’ve done a lot of sample looping over the years, and it’s not uncommon to spend 20 minutes finding the best loop point for a single difficult
stereo sample. However, I’ve had good success using an inexpensive program called Zero-X Seamless Looper, which greatly simplifies the
process of finding the best loop points.
Sometimes it’s simply not possible to loop a sustained portion of a Wave file without clicks or thumps in one channel. Sound Forge includes
the Crossfade Loop tool, shown in Figure 14.12, that applies a destructive cross-fade around the loop points. This modifies the Wave file and
“forces” a smooth transition when perfect loop points are simply not attainable. Note that embedded loop points are part of the standard
metadata stored in the header portion of Wave files, so when those files are imported into Vienna or any other sample creation program, the
loop points will be honored.
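The general idea of a destructive loop cross-fade can be sketched as follows. This is not Sound Forge’s actual algorithm, which isn’t documented here; it is one common approach with a linear fade, and the function and parameter names are hypothetical. The audio just before the loop end is gradually blended into the audio that leads up to the loop start, so the jump back to the start continues a waveform that is already in progress.

```python
def crossfade_loop(samples, loop_start, loop_end, fade_len):
    """Return a copy in which the last fade_len samples before loop_end
    (exclusive) fade into the audio leading up to loop_start."""
    if fade_len > loop_start or fade_len > loop_end - loop_start:
        raise ValueError("fade_len does not fit before or inside the loop")
    out = list(samples)
    for i in range(fade_len):
        w = (i + 1) / fade_len                 # ramps up to exactly 1.0
        tail = loop_end - fade_len + i         # sample near the end of the loop
        lead = loop_start - fade_len + i       # matching sample before loop start
        out[tail] = (1.0 - w) * samples[tail] + w * samples[lead]
    return out

# Tiny made-up example: a ramp of 20 samples with a loop from index 8 to 16.
ramp = [float(n) for n in range(20)]
smoothed = crossfade_loop(ramp, loop_start=8, loop_end=16, fade_len=4)
```

After processing, the final sample of the loop equals the sample that originally preceded the loop start, so playback flows from one into the other without a step. A stereo file would need the same treatment applied to both channels.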
Figure 14.12:
The Crossfade Loop tool in Sound Forge lets you force clean loop points by applying a cross-fade directly to the Wave file.
Also note in Figure 14.12 the Post-Loop section at the right of the screen. I mentioned earlier that a sampler plays the initial portion of the
recording when each note’s playback begins, and then the looped portion repeats for as long as the note is held, continuing until the ADSR’s
Release time is exhausted. But looped sample files can also incorporate a third section of the recorded sample, which is played instead of the
looped portion during the Release portion of the ADSR envelope. This is useful to add realism to a sampled piano, for example, because a real
piano makes a soft mechanical “clunk” sound as the key falls back into place. It’s also useful for other instruments such as woodwinds and brass;
a note that’s ended naturally by a musician sounds different from a sustaining note faded out artificially with a volume control.
Creating a complete high-quality sampled instrument set usually requires a large number of different Wave file recordings. The original
“standard” for creating sample sets recorded every third note the instrument is capable of playing, and then the sampler applies resampling-type
pitch shifting to create the in-between notes. If you’re sampling a violin whose lowest note is a G at 196 Hz, you’d record that G note, then the
Bb a minor third above, then the C# above that, and so forth up to the highest note you want to include. Today, most sample playback is done
on computers having gigabytes of memory, and even more gigabytes of hard drive space, so many modern commercial sample libraries include
recordings of every note in the instrument’s range.
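Here is a rough sketch of how a sampler might choose which recording to play and how far to shift it when only every third note was sampled. It assumes equal temperament and MIDI note numbers, with the violin’s low G at 196 Hz being note 55; the function names are made up, and a real sampler would also limit the choice to the highest note actually recorded.

```python
def nearest_root(note, lowest=55, step=3):
    """Closest recorded note, given recordings every `step` semitones from `lowest`."""
    index = max(0, round((note - lowest) / step))
    return lowest + index * step

def playback_ratio(note, lowest=55, step=3):
    """Return the recording to use and the speed ratio that shifts it to `note`."""
    root = nearest_root(note, lowest, step)
    return root, 2.0 ** ((note - root) / 12.0)

# G# (note 56) plays the low G recording slightly faster, while
# A (note 57) plays the Bb recording (note 58) slightly slower.
g_sharp = playback_ratio(56)
a_natural = playback_ratio(57)
```

With recordings three semitones apart, no note is ever shifted by more than one semitone, which keeps the resampling artifacts small.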
Further, a high-quality sample set requires recording the same notes played at different volume levels. A bassoon played softly has a very
different sound quality than when played loudly where the overtones are much more prominent. So you can’t just use a single loud recording,
then lower the volume to create soft notes. Many sample sets are recorded with the performer playing at two different volume levels, loud and
soft, but some include many more in-between levels. This, too, increases the amount of memory and hard drive space needed for a complete
sampled instrument. However, with clever programming, a single loud sample can sometimes work. The sampler simply lowers the frequency
of a low-pass filter at softer volumes to reduce the harmonic content. The softer you play, the lower the cutoff frequency is set to. This method
actually works pretty well for sampled pianos and is a good compromise when available memory is limited.
Software Synthesizers and Samplers
As mentioned, hardware and software samplers are more alike than different, though digital implementations of analog-type synthesizers are
very different internally than their analog counterparts, even if outwardly they look and respond the same. Software synthesizers operate in one
of two basic modes: as either a stand-alone computer program you play live using a MIDI keyboard or as a plug-in that receives data from a
MIDI track in a DAW program. Many software synths include both types, so you can use them either way.
A stand-alone program makes sense for playing live concerts, using a laptop computer as the “hardware” platform. But a plug-in version is
more practical when using a DAW to create complete productions by recording MIDI data. This way you can record the notes as you would
when recording any other type of instrument overdub but with the added benefit of being able to edit wrong notes and make other changes after
recording. Plug-in synthesizers can also be run through audio plug-ins in the DAW, such as EQ, compression, reverb, and echo. Of course, a stand-alone
synthesizer can be recorded into a DAW program as audio, though that loses much of the flexibility MIDI offers. Not only does it have the
ability to change notes and patch sounds, but a plug-in synth also lets you record and edit controller data afterward, such as pitch bend and filter
Sample Libraries
As mentioned, most commercial sample libraries contain a collection of Wave file recordings of various notes played by an instrument or section
or sung performances for sampled solo voices and choirs. These samples are often recorded at two or more volume levels, and they usually
contain a looped portion that repeats indefinitely for as long as the key is held. However, some sample libraries don’t use looping, with the
Wave files instead sustaining for 8 to 10 seconds or even longer. There are many different sample library formats, including AKAI, SoundFont,
GigaSampler, Kontakt, Roland, Kurzweil, and others. Describing the internal details of every sample format is beyond the intent of this book, so
I’ll give an overview of the process using the format I’m most familiar with, which is SoundFonts. Please understand that the same concepts
apply to all sample formats.
I group sampled instruments into two broad categories: percussive instruments and sustained instruments. To my way of thinking, percussive
instruments are those that create a sound with a single strike that decays naturally over time. This includes drums and bells but also the piano,
which, in fact, is officially classified as a percussion instrument. A guitar or plucked bass could also be considered percussive in this context
because once a note is struck, the sound eventually decays on its own. Sustained instruments include the violin and other bowed instruments, and
clarinets and trombones and other blown instruments. However, some instruments fall into both groups, such as the tambourine, triangle, and
maracas. These can be struck once for a percussion effect or shaken repeatedly to sustain indefinitely. The same applies to the tremolo plucking
style on a mandolin or a snare drum or timpani roll.
I divide musical instruments into these two categories because it affects how their samples are programmed inside a sample library. Purely
percussive instruments do not need to be looped and, if sampled well, are mostly indistinguishable from a “real” recording of a live
performance. However, piano notes interact differently when played live versus sampled. When played live with the sustain pedal pressed, a
struck piano string excites other strings that were not struck. But by and large, sampled pianos can sound very realistic. And while a continuously
shaken tambourine or triangle can be considered a sustained sound, the repeated striking is not usually difficult to loop.
Sampled sustained instruments usually sound less realistic than sampled percussive instruments. To my ears, nothing sounds worse than a
sampled saxophone solo. Short sax and brass note “stabs” can often sound acceptable, but sustained passages on a solo clarinet or cello almost
always sound fake to an experienced listener. How realistic a sampled performance sounds depends on the type of musical passage being
played, the quality of the samples and how well they were looped, and how well the sample reacts to MIDI expression controls. Real violas and
flutes can change their volume and timbre while notes sustain, and it’s very difficult to program such changes realistically when using samples.
Then again, background violins playing sustained whole notes can usually sound acceptable if their basic recorded tonality is pleasing.
Although samples and synthetic recreations of expressive instruments usually sound pretty poor, you can often get acceptable results by using
one recording of a real instrument, along with one or two sampled or synthesized versions. It’s best if the real instrument is a little louder in the
mix than the others to help further hide the synthetic elements. I’ve heard sampled brass sections sound pretty good when one real trumpet or
sax was prominent in the mix.
Creating Sample Libraries
Figure 14.13 shows the main screen of the Vienna SoundFont Studio, a program included for free with most SoundBlaster sound cards. Years
ago, early SoundBlaster cards had a reputation for mediocre sound quality, but modern versions are capable of very high fidelity. The Vienna
program requires a SoundBlaster to be present in the computer, but SoundFont files are a universal format usable by most modern software
samplers. So once you create a SoundFont bank, it can be played back by DAW software through whatever sound card is attached to your system.
Further, SoundFonts are based on standard Wave files, so their fidelity is limited only by the quality of the sample recordings they’re created from. I
have a SoundBlaster card in my computer, though I use it only to create and edit SoundFonts.
Figure 14.13:
The Vienna program that creates and edits sample sets in the SoundFont format is very comprehensive. Besides letting you organize complex groups of instruments into
banks and presets, the samples can be programmed to respond to the keyboard and other controllers in every way supported by the MIDI standard.
Vienna SoundFont Studio, shown in Figure 14.13, is a comprehensive sample management program that lets you import and loop Wave files,
organize them into banks and presets (patches), specify reverb and other standard MIDI effects, and define split points based on both a range of
note pitches and a range of note-on velocities. The MIDI standard allows up to 128 sound banks, each containing up to 128 patches, with all
16,384 patches available for playback at once. Of course, nobody needs that many patches in a single sample set. Further, the amount of
memory in your computer will surely limit you to fewer banks and patches, depending on the size of the Wave files used.
The main reason to allow so many banks and patches is for organization. For example, Bank 0 could contain three or four different French
horn sample sets, Bank 1 might hold a few different violin section patches, and so forth. But even that type of organization is wasteful of
memory, because you’ll load every instrument type even if a tune needs only one or two. I organize my own SoundFont collection by instrument
categories. Cellos.sf2 contains seven different cellos, Flutes.sf2 has six flutes, and so forth. This way I can load only the instruments I need, then
try different versions while the song plays to decide which sounds best for that particular song. I often use two different versions to create a
unison section, rather than have three parts all play the same violin patch. This sounds more natural, like different musicians playing different
instruments. It also helps to avoid comb filtering when the same note is played by several instruments at once, as is common with string sections.
Creating a custom sample set is reasonably straightforward, if tedious. The biggest hurdle is understanding how the banks and patches are
organized. The main Vienna screen in Figure 14.13 is divided into several sections. The upper left area displays a tree view of the currently
loaded SoundFont. In this file, Bank 0 contains five different concert harp patches, but only Patch 000 (Fluid Harp) is visible. This patch is
opened to show the keyboard zones, or key ranges, that will trigger each Wave file sample. The key ranges for each sample are set in the upper-right portion, though the key range display can be switched to instead show the velocity switch points. This is needed when multiple samples
are used to play the same note, with different samples triggered depending on how hard you strike the key. The lower section of the screen is
where the various sound modifying parameters are defined, such as ADSR values, filter Q, reverb, LFO rate and intensity, and so forth.
Key and Velocity Switching
As mentioned earlier, ideally a separate sample will be recorded for every note the sampled instrument can play. But that requires a huge
amount of memory to store all those samples. Therefore, it’s common to record samples at selected intervals, perhaps every two to four half-steps. The samples are then pitch-shifted up or down by the sampler during playback to produce the in-between notes. So instead of recording
all 47 notes on a modern concert harp, the sample recording of middle C could serve the range from two notes below middle C through two
notes above, and so forth. This particular sampled harp uses 14 separate stereo Wave files, with most used to play a range of only three or four
adjacent notes.
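The key-zone idea above can be sketched in a few lines of Python. This is only an illustration, not how Vienna or any particular sampler is implemented: the function names and the sample spacing are assumptions, and the speed ratio is the standard equal-tempered factor of 2^(1/12) per half-step.

```python
# Hypothetical sketch: choose the nearest recorded sample for a MIDI note,
# then compute the playback-speed ratio needed to reach that note.

def nearest_root(note, roots):
    """Return the recorded root note closest to the requested MIDI note.
    When two roots are equally close, the lower one is used."""
    return min(roots, key=lambda root: (abs(root - note), root))

def playback_ratio(note, root):
    """Equal-tempered speed ratio: each half-step is a factor of 2**(1/12)."""
    return 2.0 ** ((note - root) / 12.0)

# Assumed layout: one sample recorded every four half-steps (36, 40, ... 84),
# which happens to include middle C (MIDI note 60).
roots = list(range(36, 85, 4))

for note in (59, 60, 61, 62):
    root = nearest_root(note, roots)
    print(note, "uses sample", root, "at speed", round(playback_ratio(note, root), 4))
```

A ratio above 1.0 plays the sample faster (higher pitch), and a ratio below 1.0 plays it slower.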
If you try to cover too broad a range with a single sample, the notes at each extreme can sound unnatural. This is especially true with
instruments that have an inherent body resonance, such as violins and acoustic guitars. The primary pitch of the in-between notes is raised or
lowered during playback, but the resonant frequencies of the instrument’s body are also shifted. So a low trumpet note might sound more like it
was played on a trombone, or a high violin note could sound as if the violin is only five inches long. Another problem when sampling notes at
too few intervals is the sudden change in timbre that results when you play a scale as it crosses a sample boundary point. Two adjacent notes on
a real harp or cello usually sound similar, so playing an ascending scale flows smoothly from one note to the next and doesn’t suddenly change
tonality. But a note that’s been shifted up in pitch by several whole steps sounds very different from another note sampled an octave away that’s
now shifted down several whole steps. In that case, playing one note after the other yields a large change in tonality and sounds fake.
MIDI sample sets can contain as many or as few Wave files as you’d like for each patch. You define which samples will play for single notes or
note ranges and also which will play at different volumes of the same notes or note ranges, depending on the MIDI key velocity. Most MIDI data
range from 0 through 127, for a total range of 128 values. However, some systems consider the range to be 1 through 128. So note-on velocities
can range from 0 (silence) through 127 (maximum loudness). Again, most musical instruments have a different quality when played loudly
versus softly, so you might use one sample of a snare drum that was played softly for note velocities of 0 through 70 and then switch to another
sample of the same drum struck harder for higher velocities.
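As a rough illustration of velocity switching, the snare example above could be expressed as a small lookup table. The file names and the helper function here are hypothetical; only the 0 through 70 split comes from the text.

```python
# Hypothetical velocity-switch table: (lowest velocity, highest velocity, sample).
SNARE_LAYERS = [
    (0, 70, "snare_soft.wav"),    # played softly
    (71, 127, "snare_hard.wav"),  # struck harder
]

def pick_layer(velocity, layers=SNARE_LAYERS):
    """Return the sample whose velocity range contains the note-on velocity."""
    if not 0 <= velocity <= 127:
        raise ValueError("MIDI velocity must be 0 through 127")
    for low, high, sample in layers:
        if low <= velocity <= high:
            return sample
    raise LookupError("no layer covers velocity %d" % velocity)

print(pick_layer(45))
print(pick_layer(110))
```

Adding more layers is just a matter of adding rows, as long as the ranges don't overlap or leave gaps.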
You can also set an exclusive class for note groups so that playing one note automatically mutes another in the same group. The classic
example is when programming hi-hat samples. When a closed hi-hat sample is triggered, it should immediately turn off the open hi-hat sample
if that’s currently sounding. Otherwise, you’d have both samples playing at once, which sounds phony. To program one or more samples to mute
all the others that are related, assign them the same class number. The SoundFont format supports up to 127 exclusive class groups, though it’s
doubtful you’d ever need more than two or three.
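The hi-hat behavior can be sketched as follows. This is a simplified model rather than a real sampler engine: the sample names are made up, and treating class 0 as "no exclusive class" is an assumption of this sketch.

```python
# Hypothetical sample table: name -> exclusive class number.
# In this sketch, class 0 means the sample belongs to no exclusive class.
EXCLUSIVE_CLASS = {"hihat_open": 1, "hihat_closed": 1, "snare": 0}

def trigger(sample, sounding):
    """Start a sample, first muting anything that shares its exclusive class.
    Returns the new list of sounding samples."""
    group = EXCLUSIVE_CLASS[sample]
    if group != 0:
        sounding = [s for s in sounding if EXCLUSIVE_CLASS[s] != group]
    return sounding + [sample]

sounding = trigger("hihat_open", [])
sounding = trigger("snare", sounding)
sounding = trigger("hihat_closed", sounding)  # cuts off the open hi-hat
print(sounding)
```

The snare keeps ringing because it belongs to no class, while the closed hi-hat replaces the open one.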
I prefer to loop Wave files in Sound Forge because of its superior tools and its destructive cross-fade option when finding clean loop points is
not otherwise possible. But plain looping without cross-fades can be set up entirely in Vienna, as shown in Figure 14.14.
Figure 14.14:
Vienna’s Looping screen shows an overview of the Wave file at the top, as well as the transition between the loop end (left) and start (right) at the bottom.
Figure 14.15 shows the same information as the bottom portion of Figure 14.14 but zoomed in to better see the splice details. As you can see
in this close-up, sample number 16,753 has just descended through zero, and sample number 11,847 (earlier in the Wave file) resumes at the
same level and continues negative. If you click the Up and Down buttons next to the Local Loop End and Start values, the displays scroll
horizontally, helping you find the ideal boundaries. You can click the Play Loop button at any time to hear the looped portion of the Wave file
played back with the current loop points, using the Key Number field to specify which MIDI note triggers the sample.
Figure 14.15:
This close-up of the loop points from Figure 14.14 is zoomed in to show the individual cycles and samples, to help find the best transition.
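The manual search for matching loop points can be approximated in code. The sketch below is an assumption about one reasonable approach, not what Vienna or Sound Forge does internally: it looks for places where the waveform has just descended through zero, then picks an earlier one whose level best matches the level at the loop end.

```python
import math

def falling_zero_crossings(samples):
    """Indexes where the waveform has just descended through zero."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] >= 0.0 > samples[i]]

def best_loop_start(samples, loop_end):
    """Among earlier falling crossings, pick the one whose level is
    closest to the level at loop_end, so the splice is least audible."""
    candidates = [i for i in falling_zero_crossings(samples) if i < loop_end]
    if not candidates:
        raise ValueError("no earlier zero crossing to loop back to")
    return min(candidates, key=lambda i: abs(samples[i] - samples[loop_end]))

# Test signal (an assumption for the demo): a sine with a 100-sample period.
wave = [math.sin(2 * math.pi * n / 100 + 0.3) for n in range(1000)]
crossings = falling_zero_crossings(wave)
loop_end = crossings[-1]
loop_start = best_loop_start(wave, loop_end)
print("loop from", loop_start, "to", loop_end)
```

Real instrument recordings are never this periodic, which is why a cross-fade is sometimes needed when no clean splice point exists.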
Sampler Bank Architecture
Sample banks are organized in a tree-like structure, as shown at the upper left of the screen in Figure 14.13. The basic building block is the
samples that are imported as Wave files, and the banks and patches you call up in a DAW or sequencer program are built from those Wave files.
To create a new sample set, you’ll use File .. New, then browse to find and import the Wave files you recorded and optionally already looped.
You can import a single Wave file or an entire group of files at once, and Vienna automatically creates a new Bank 0 with a new Patch 000 that
contains all of those files. You can then specify the range of notes each sample will respond to, as well as the root note for that sample. For
example, middle C is MIDI note number 60, and the A note 1¼ octaves lower is 45. So if you have a sample of an A bass note at 110 Hz and
want it to play the range from G below through B above, you’ll specify the root note as 45, then slide the markers in the upper right of the
screen in Figure 14.13 to span only that range. The harp sample highlighted in Figure 14.13 is a C# note, and it’s set to play from the B below
C# to the D above. Stereo Wave files are handled as separate left and right samples, though for most sample sets both channels will use identical
settings other than their pan position.
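The relationship between MIDI note numbers and frequency used in the example above can be checked with the standard equal-temperament formula, assuming A above middle C (note 69) is tuned to 440 Hz. The function name is just for illustration.

```python
def note_to_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note, with note 69 tuned to a4_hz."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)

print(note_to_hz(45))            # the A bass note: 110.0 Hz
print(round(note_to_hz(60), 2))  # middle C

# Key range for the A sample: G below (43) through B above (47).
for note in range(43, 48):
    print(note, round(note_to_hz(note), 2), "Hz")
```

Note 45 is exactly two octaves below note 69, which is why it comes out at one quarter of 440 Hz.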
For samples that will be looped, the note can end by fading out while the looped portion repeats, or it can be set to resume playing after the looped
region to reproduce the natural sound of the note stopping. Of course, this assumes the sample recording includes the note’s natural decay when
the musician stopped and that the Wave file was set to play the end portion when the loop points were programmed. Once all of the samples
have been assigned to a patch and the note ranges (and optionally the velocity ranges) are set, you can define a large number of effects
parameters such as coarse and fine tuning, LFO vibrato values when the mod wheel is used, and envelope and filter ADSR settings. Figure 14.16
shows the ADSR section of Vienna and the little pop-up window that appears when you click to edit a parameter.
Figure 14.16:
This part of the screen lets you assign values for each portion of the ADSR, plus many other settings that determine how a sample is to be played.
Again, most sample formats include features that are the same as or similar to SoundFonts, though some of the nomenclature might be
different. But the basic concepts of looping Wave files and assigning root note numbers and note and velocity ranges per sample are the same, as
is organizing samples into patches and banks accessed by name in your DAW program.
FM Synthesis
FM synthesis was invented by John Chowning at Stanford University in the 1960s. The patent was later licensed to Yamaha, which produced the
DX7, the first commercially successful FM synthesizer. Today, FM synthesis is a popular staple available on many hardware and software
synthesizers. The harmonics of the square, pulse, and sawtooth waves used by an analog synthesizer mimic the overtone series of acoustic
instruments and most other sounds that occur in nature. FM synthesis can generate conventional harmonics, but it can also create inharmonic
overtones, yielding a sound reminiscent of bells and gongs. Coupled with ADSRs to vary the volume, or the frequency of a traditional analog
type filter, FM synthesizers can create many unusual and musically interesting sounds.
Earlier I mentioned that when one oscillator controls the frequency of another, the controlling oscillator typically runs at a low frequency to
create vibrato. Figure 14.17 shows a 200 Hz sine wave with frequency modulation applied at a slow rate to create vibrato. You can see the wave
cycles compress and expand as the wave’s frequency changes over time.
Figure 14.17:
Applying FM at a slow rate creates vibrato, which repeatedly sweeps the frequency higher and lower.
When the modulating frequency rises above 20 Hz or so, passing into the audible range, the sum and difference side bands added to the carrier
are perceived as changes in timbre rather than as vibrato. In truth, these side bands are always created, even at slow modulating frequencies, but
increasing the vibrato rate progressively raises their volume, making them more audible. Figure 14.18 shows the same 200 Hz sine wave but
with a modulation frequency of 100 Hz. Here you can see that the basic shape of the waveform has changed, which, of course, affects its tone
quality. So it’s no longer a pure tone containing only 200 Hz.
Figure 14.18:
When FM is applied at a fast rate, frequency shifts occur within single cycles of the carrier, creating interesting tone colors.
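For readers who want to experiment, here is a minimal sketch of two-oscillator FM using only the Python standard library. It uses the common formulation in which a sine modulator is added to the carrier's phase. The modulation index values and the function name are assumptions chosen for the demonstration, not settings taken from the figures.

```python
import math

def fm_sine(carrier_hz, mod_hz, index, num_samples, rate=44100):
    """Sine carrier whose phase is modulated by a sine modulator.
    index sets the modulation depth; 0 gives a pure carrier tone."""
    out = []
    for n in range(num_samples):
        t = n / rate
        phase = 2 * math.pi * carrier_hz * t
        out.append(math.sin(phase + index * math.sin(2 * math.pi * mod_hz * t)))
    return out

# Slow modulation: heard as vibrato on a 200 Hz tone.
vibrato = fm_sine(200, 5, index=4.0, num_samples=44100)

# Fast modulation: the same carrier, but now the timbre changes.
timbre = fm_sine(200, 100, index=1.5, num_samples=44100)

print(len(vibrato), len(timbre))
```

Writing either list to a Wave file and listening is the quickest way to hear the vibrato turn into a change of tone color as the modulator frequency rises.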
In FM-speak, the oscillator that creates the main tone you hear is called the carrier, while the oscillator that applies the vibrato is a modulator.
These oscillators are usually called operators when describing how a patch is constructed, and an arrangement of oscillators that modulate one
another in various ways is called an algorithm. When an ADSR is applied to the modulator’s volume, the tone color changes over time in a way
reminiscent of sweeping a filter. The “fm_synthesis” video shows simple vibrato added to a 200 Hz sine wave, with the vibrato frequency
increased from 1 Hz up into the audible range. Then the amount of modulation is decreased and increased several times in a row so you can
hear how that affects the tone color. You can see the vibrato plug-in’s knobs move in the video to better relate what you see and hear to what’s
actually happening.
Figure 14.19 shows the Operator Matrix View of Native Instruments’ FM7 plug-in synthesizer. Here, Oscillator B plays the notes you hear as
controlled by the MIDI keyboard, while Oscillator A modulates the frequency of Oscillator B. The modulation depth (amount) ranges from zero
to 100 and is set to 47 in this patch. At the same time, Oscillator D also plays notes controlled by the keyboard, with Oscillator C serving as a
modulator with a bit less depth. Audio oscillators B and D are mixed with 100 percent of B to 48 percent of D to create the complete sound.
Figure 14.19:
The Operator Matrix View of Native Instruments’ FM7 plug-in synthesizer lets you configure multiple carriers and modulators that sound at once.
Having two sets of oscillators creates a sound that’s even more complex than a single oscillator pair, because the ratio of oscillator frequencies
can be different for the two pairs. Different ADSR-type envelopes can be applied to all four oscillators to change the overall volume as usual and
also to change the modulation amount over time. Further, modulators can be integer-related to the carrier or not. An A note at 220 Hz
modulated by an A an octave below at 110 Hz sounds very different from an A-220 modulated by a C note at 130.8 Hz. A modulator oscillator
can also follow the keyboard to rise and fall in pitch as different notes are played, or it can remain at a constant frequency.
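One way to see why those two cases sound so different is to list where the side bands fall. With sine-wave FM, side bands appear at the carrier frequency plus and minus whole multiples of the modulator frequency, and any that land below zero fold back to the equivalent positive frequency. The helper below is a hypothetical sketch that lists only the frequencies, not their levels.

```python
def sideband_frequencies(carrier_hz, mod_hz, orders=3):
    """Carrier plus and minus whole multiples of the modulator frequency.
    Negative results are folded back to positive frequencies."""
    return sorted({round(abs(carrier_hz + k * mod_hz), 1)
                   for k in range(-orders, orders + 1)})

harmonic = sideband_frequencies(220, 110)
inharmonic = sideband_frequencies(220, 130.8)

print(harmonic)    # every entry is a whole multiple of 110 Hz
print(inharmonic)  # entries share no common fundamental
```

When every side band is a multiple of one frequency, the result resembles a conventional harmonic series. When they are not, the sound takes on the bell-like quality described earlier.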
FM synthesis is a very deep subject, and operator matrices are often much more complex than the simple algorithm shown in Figure 14.19.
Oscillators can modulate their own frequency, and one modulating oscillator can modulate another, which in turn modulates the audio oscillator
that creates the musical notes. The number of possible combinations is truly staggering. The best way to learn more about how FM algorithms
affect the sound quality is to explore the factory presets in whatever FM synthesizer you happen to own.
Physical Modeling
Physical modeling is a form of additive synthesis, but instead of adding and filtering multiple waveforms, modeled sounds are derived entirely
through mathematical calculations. The first commercial physical modeling synthesizer I’m aware of was Yamaha’s VL1 in 1994, which was soon
followed by the more affordable VL70M in 1996. Rather than simply sum a number of static sine waves, physical modeling uses complex
equations that mimic the sound sources, body resonances, and other attributes of acoustic instruments. I remember hearing the VL1 when it was
demonstrated at the 1995 New York AES show. What struck me most was the realism of the trumpet patch as sustained notes transitioned from
one to another.
Unlike samples that can play only one static note followed by another, physical modeling more realistically creates the sounds that real
instruments make in between successive notes. When a trumpet or clarinet plays a legato passage, the player’s breath continues uninterrupted for
the duration of the passage, even as the notes change. But sampled notes start with a new breath. Further, real instruments often create subtle
sounds as the notes transition from one to the other. Sampled strings behave similarly. Real violinists often play several notes in a row without
reversing the bow direction, and they sometimes slide from one note to another note, an effect called glissando. But with string samples, each
new note begins with a new bow stroke as the old note ends. This is acceptable for staccato passages, where each note is short and clearly
defined. But for flowing passages, samples of instruments such as the trombone and cello almost always sound artificial because real musicians
don’t play that way.
Physical modeling circumvents the limitations of sampled acoustic instruments by calculating a mathematical representation of the vibration of
a physical string or drum head and how it interacts with the instrument’s sound chamber and other sources of resonance. An accurate physical
model of a flute will have a different harmonic structure when overblown, just like a real flute, and a plucked string will go slightly sharp and
add a “splat” sound when played very hard. As you can imagine, physical modeling is a complex process that requires many extensive
computations. In my opinion, physical modeling is the final frontier of electronic synthesis, at least the type of synthesis that aims to recreate
the sound of real musical instruments.
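Commercial physical models are far too involved to show here, but the well-known Karplus-Strong plucked-string algorithm gives a taste of the idea in a few lines. To be clear, this is a different and much simpler technique than whatever the Yamaha instruments use, and the damping value is an arbitrary assumption. A burst of noise stands in for the pluck, and a short delay line with averaging stands in for the string losing its high frequencies as it rings.

```python
import random

def karplus_strong(frequency_hz, num_samples, rate=44100, damping=0.996, seed=0):
    """Very simple plucked-string model: noise circulating in a delay line
    that is averaged (low-pass filtered) and slightly damped on every pass."""
    rng = random.Random(seed)
    period = max(2, int(rate / frequency_hz))   # delay length sets the pitch
    delay = [rng.uniform(-1.0, 1.0) for _ in range(period)]  # the "pluck"
    out = []
    for n in range(num_samples):
        current = delay[n % period]
        following = delay[(n + 1) % period]
        out.append(current)
        # Averaging adjacent samples removes highs a little on each pass.
        delay[n % period] = damping * 0.5 * (current + following)
    return out

string = karplus_strong(110.0, 44100)
print(len(string))
```

Even this toy model behaves like a string in one important way: the tone starts bright and becomes duller as it decays, without any filter or envelope being programmed explicitly.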
Granular Synthesis
Granular synthesis joins together short sound fragments into a longer whole. The basic concept is a sequence of many very short sounds that are
strung together in various ways to create new tone qualities. Granular synthesis is not unlike musique concrète, a twentieth-century composition
style that was often created from a montage of disparate sounds by splicing together short pieces of analog recording tape. But with modern
digital implementations of granular synthesis, the sounds can be divided into even smaller segments, called grains. These typically range in
length from 1 to 50 milliseconds. Granular synthesis often employs sampled music as the sound source, but any sounds can be used, including
static square waves and the like, or even sounds of nature. The sounds can be played one after the other or morphed from one to another, or
several sounds can be played at once.
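A bare-bones version of the idea can be sketched as follows, assuming a mono source held as a list of samples. The grain length, the overlap, and the choice to shuffle the grain order are all arbitrary assumptions for this illustration. Each grain is faded in and out with a Hann window to avoid clicks where grains meet.

```python
import math
import random

def hann(length):
    """Smooth fade-in and fade-out window for one grain."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def granulate(source, grain_len, hop, seed=0):
    """Cut the source into grains, shuffle their order, then overlap-add
    them at intervals of hop samples."""
    if grain_len < 2 or len(source) < grain_len:
        raise ValueError("source must hold at least one grain of 2+ samples")
    rng = random.Random(seed)
    window = hann(grain_len)
    starts = list(range(0, len(source) - grain_len + 1, grain_len))
    rng.shuffle(starts)
    out = [0.0] * (hop * (len(starts) - 1) + grain_len)
    for slot, start in enumerate(starts):
        for i in range(grain_len):
            out[slot * hop + i] += source[start + i] * window[i]
    return out

# Demo source: a sine sweep. At 44.1 kHz, 441 samples is a 10 ms grain.
source = [math.sin(2 * math.pi * (200 + n / 20) * n / 44100) for n in range(4410)]
cloud = granulate(source, grain_len=441, hop=220)
print(len(cloud))
```

Changing the grain length, the hop size, or the rule used to order the grains produces very different textures from the same source.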
Many people use MIDI synthesizers in their DAW audio projects, or software samplers to augment or substitute for real instruments they may not
know how to play. Many of my own tunes include software synthesizers, and I use samplers all the time for drums, keyboards, and woodwinds.
One important attribute of software synthesizers is how much computing power they require. The number of notes playing simultaneously at
any moment usually affects this as well. The common term for synthesizers and other audio processes that require a lot of computer resources is
CPU intensive. Here, CPU refers to the central processing unit, the heart and brains of every personal computer. A software synthesizer that plays
a simple square wave with only volume changes over time is not very CPU intensive, compared to a complex patch having two sweeping filters,
three ADSR envelopes, plus five Wave file samples that cross-fade from one to the other as each note sustains. If you have a large number of
tracks, each with a complex patch playing several notes at once, at some point the computer’s CPU will not be able to perform all the
calculations quickly enough to play the song in real time.
As computers have become more powerful, overtaxing the CPU is less of a problem today. Then again, it’s a truism that software increases in
complexity to take advantage of more powerful hardware. The solution for synthesizers that are highly CPU intensive is to prerender the audio
to a new Wave file. Most DAWs can do this automatically. In SONAR this process is called freezing a synth track, and there are many available
options. Freezing a synthesizer track is not difficult even if your DAW doesn’t offer an automatic method. You solo the synth track, then export
the audio to a new Wave file. It doesn’t matter if the computer can play that synthesizer’s audio in real time. The render takes as long as it takes
to complete. Then you load the resulting Wave file to a new audio track and mute or disable the synthesizer’s MIDI and audio tracks so they
don’t require further computation.
Algorithmic Composition
Algorithmic composition isn’t a synthesis device or method, but it’s used exclusively with synthesizers, so it fits well in this chapter. The simplest
form of computer-generated performance is the arpeggiator. Early versions were designed as plug-ins for MIDI sequencer programs to create
arpeggios and other simple musical patterns, without the performer having to play all the notes. The idea is you’ll sustain a single chord, and the
arpeggiator plays the notes of that chord individually up or down over some number of octaves. The “arpeggiator.wav” demo file plays a few
bars from one of my pop tunes as I recorded it through a Fender Rhodes sample set, then again after adding the Arpeggiator MIDI plug-in shown
in Figure 14.20.
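The core of an arpeggiator is simple enough to sketch. The function below is a hypothetical illustration, not the Cakewalk plug-in's actual logic: it takes the MIDI notes of a held chord and returns them one at a time, repeated over the requested number of octaves.

```python
def arpeggiate(chord, octaves=2, direction="up"):
    """Spread the notes of a held chord over several octaves.
    chord is a list of MIDI note numbers; notes above 127 are dropped."""
    notes = sorted({note + 12 * octave
                    for octave in range(octaves)
                    for note in chord
                    if note + 12 * octave <= 127})
    if direction == "down":
        notes.reverse()
    return notes

c_major = [60, 64, 67]  # C, E, G starting at middle C
print(arpeggiate(c_major))
print(arpeggiate(c_major, direction="down"))
```

A real arpeggiator would also assign each note a start time and duration based on the tempo, which is where controls such as legato and swing come in.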
Figure 14.20:
The Cakewalk Arpeggiator offers a number of ways to create arpeggiated note patterns. It reads the chord to know what notes to play, then generates new notes at the
tempo, and within the note range, you specify. The Legato control determines how long the notes play for, ranging from choppy to very smooth. It can even add a swing feel to the
Another, more advanced type of “composing” MIDI plug-in is the drum pattern generator, based on the programmable drum machines that
became popular in the 1970s. These range from simple types useful for little more than a metronome, through programs that play sophisticated
patterns using high-quality sampled drum sets. But algorithmic composition can go far beyond simple one-instrument musical patterns. Indeed,
the concept of using computers to compose music has been around for many years.
Beyond arpeggios and drum patterns, programs are available that create complete MIDI backing tracks for songs in many different musical
styles. This is a useful way for songwriters to record their original tunes without having to hire musicians or a recording studio or to get
inspiration from an accompaniment pattern while trying out different lyrics. It’s also great for singers to make their own karaoke backing tracks
or for musicians who just want to create music to play along with for fun.
One of the most popular auto-arranger programs is Band-in-a-Box (BIAB) from PG Music, available for both Windows and Mac computers.
With this type of program, you enter the chords for your song, which can change as often as every eighth note, then choose a song style. BIAB
offers literally hundreds of styles to choose from, which the program then uses to generate bass, drum, keyboard, guitar, and other instrument
parts. Once all of the chords have been entered, you can try different style patterns, and you can also overdub melodies as either MIDI or Wave files.
Another popular auto-arranger program is SoundTrek’s Jammer, for Windows only, shown in Figure 14.21. Jammer includes fewer styles than
BIAB, but it offers more ways to customize and control the music it generates. Where BIAB is mostly pattern-based, applying set patterns onto
the chord changes you enter, Jammer uses intelligent algorithms to generate original performances. To my ears, the music Jammer creates seems
a little more hip, though Band-in-a-Box has a loyal following, too.
Figure 14.21:
SoundTrek’s Jammer is a complete song creation environment, and the music it generates can be surprisingly realistic.
Many keyboard hardware synthesizers also offer auto-arranger features, from inexpensive Casio models to the high-end Korg OASYS and M3
workstations. With these keyboards you play single notes in the bass range with your left hand, and the keyboard generates music in various
styles while you play melodies with your right hand. If you play only one note, the keyboard assumes a major chord at that note for the music it
creates. But you can play other chord types with your left hand to specify minor chords, seventh chords, and so forth. The music and sound
quality from some of these keyboards can be very good!
Notation Software
Another popular type of music-making computer program is notation software, used mainly by composers and musicians who need to create
printed music for performers to play. All notation programs can import standard MIDI files, so you can start a project in your DAW as a MIDI
sequence playing virtual instruments, then import the MIDI tracks into notation software to publish printed sheet music. Modern notation
software offers many of the features of a full MIDI sequencer, such as changing instrument sounds to experiment while composing, varying the
tempo, or trying a part an octave higher or lower. You can also change the printed (only) key automatically for trumpets, French horns, and
other transposing instruments.
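The transposition itself is simple arithmetic. A trumpet in B-flat sounds a whole step lower than written, so its part is printed two half-steps higher than concert pitch, and a French horn in F is printed seven half-steps higher. The table and function names below are hypothetical, meant only to show the idea using MIDI note numbers.

```python
# Half-steps added to concert pitch to get the written (printed) pitch.
WRITTEN_OFFSET = {
    "flute": 0,              # non-transposing
    "trumpet_in_b_flat": 2,  # written a major second above concert pitch
    "horn_in_f": 7,          # written a perfect fifth above concert pitch
}

def written_note(concert_note, instrument):
    """MIDI note number to print so the instrument sounds concert_note."""
    return concert_note + WRITTEN_OFFSET[instrument]

middle_c = 60
for name in WRITTEN_OFFSET:
    print(name, written_note(middle_c, name))
```

Only the printed notes change. The MIDI data that plays back through the sampler stays at concert pitch.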
Orchestra and big band music is often written as a full score using software samplers to play the various parts, and then the notation software
extracts each instrument part to a separate file for more refined formatting and printing. The flexibility offered by these programs is impressive,
and you can spend a lot of time making things look as perfect as you want. It took me about a month to typeset the score and 26 individual
orchestra parts for my cello concerto, and Figure 14.22 shows one page of the solo cello part. As you can see, all of the standard music notation
conventions and symbols are supported, including dynamic markings, fingerings, clef types, and natural and artificial harmonics, and every
element can be placed precisely on the page.
Figure 14.22:
Modern notation software can create a full orchestra score with publication quality, then extract all the individual parts automatically.
This chapter covers a lot of territory, including detailed explanations of popular synthesis methods. Along the way, I show the internal details of
analog synthesizers, including sound generators, modulators, and filters. A comparison of additive and subtractive synthesis methods is presented,
as well as an explanation of digital synthesis types including FM, physical modeling, and granular synthesis. In addition, you learned about Bob
Moog’s clever use of DC voltages to control oscillator and filter frequencies, and other synthesizer parameters.
This chapter also includes an in-depth explanation of MIDI keyboards and alternate control devices such as guitar and breath controllers.
Samplers are also covered, including an overview of sample libraries and the way they employ key and velocity switching to increase realism.
Software and hardware auto-accompaniment products are described, along with a brief overview of notation software.
Part 3
A transducer is a device that converts one form of energy to another. A lightbulb that converts electricity to light is a transducer, as are electric
and gasoline motors that convert electricity and combustion, respectively, to mechanical motion. Even a foam or fiberglass absorber used for
acoustic treatment could be considered a type of transducer, because it converts acoustic energy to heat. In that case, however, the energy is
discarded rather than reused in its new form.
These days most electronic devices have sufficiently high quality to pass audio with little or no noticeable degradation. But electromechanical
transducers (microphones, contact pickups, phonograph cartridges, loudspeakers, and earphones) are mechanical devices, and thus are
susceptible to frequency response errors and unwanted coloration from resonance, distortion, and other mechanical causes. For example, the
cone of a loudspeaker’s woofer needs to be large in order to move enough air to fill a room with bass you can feel in your chest, but it’s too
massive to move quickly enough to reproduce high frequencies efficiently and with broad dispersion.
Therefore, most loudspeakers contain separate drivers for the different frequency ranges, which is yet another cause of response errors when
sounds from multiple drivers combine in the air. At frequencies around the crossover point where two drivers produce the same sound, comb
filtering peaks and nulls result from phase differences between the drivers. Where most audio gear is flat within a fraction of a dB over the entire
audible range, with relatively low distortion, the frequency response of loudspeakers and microphones can vary by 5 to 10 dB or more, even
when measured in an anechoic test chamber to eliminate room effects. Transducers also add much more distortion than most electronic circuits.
A microphone doesn’t require separate woofers and tweeters, but its diaphragm can move only so far before bottoming out in the presence of
loud sounds. Further, as a microphone’s diaphragm approaches that physical limit, its distortion gradually increases. The diaphragm in a
dynamic microphone is attached to a coil of wire whose mass restricts how quickly it can vibrate, which in turn limits its high-frequency
response. All microphones have a resonant frequency, resulting in a peak at that frequency as well as ringing. As explained in Chapter 1, a mass
attached to a spring resonates at a frequency determined by a combination of the two properties. With a dynamic microphone, the resonant
frequency is determined by the mass of the coil and the springiness of the diaphragm suspension. In a sealed capsule design, the air trapped
inside can also act as a spring.
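For an idealized mass attached to a spring, with no damping, the resonant frequency is the square root of stiffness divided by mass, divided by 2π. The short sketch below applies that textbook formula. The stiffness and mass values are made-up numbers chosen only so the arithmetic is easy to follow, not measurements of any real microphone.

```python
import math

def resonant_frequency_hz(stiffness_n_per_m, mass_kg):
    """Resonant frequency of an ideal, undamped mass on a spring."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2 * math.pi)

# Hypothetical moving assembly: 1 gram of coil and diaphragm.
mass = 0.001
stiffness = (2 * math.pi * 100.0) ** 2 * mass  # chosen to resonate at 100 Hz

print(round(resonant_frequency_hz(stiffness, mass), 3))
print(round(resonant_frequency_hz(stiffness, 4 * mass), 3))  # heavier coil
```

Quadrupling the mass halves the resonant frequency, and a stiffer suspension raises it.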
The same resonance occurs with loudspeakers: They can continue to vibrate after the source stops, unless they’re mechanically or electrically
damped. Indeed, the design of transducers is always an engineering compromise between frequency response both on and off axis, power
handling capability (for speakers), SPL handling (for microphones), overall ruggedness, and, of course, the cost to manufacture.
Chapter 15
Microphones and Pickups
Chapter 6 explained microphone basics such as their pickup patterns and common placements, without getting overly technical. This chapter
delves more deeply into the various microphone types and how they work internally. Microphones respond to air pressure, converting pressure
changes into a varying voltage. When the changing voltage is sent to a loudspeaker, the speaker moves to recreate the original pressure changes
at the same rate and volume. If everything goes well, sound from the loudspeaker should resemble closely what was picked up by the microphone.
Most microphones employ a lightweight diaphragm that vibrates in response to pressure from the acoustic waves striking it. However, there
are also contact microphones that physically attach to an instrument such as an acoustic guitar or stand-up bass. In this design, sound waves
transfer directly from the instrument’s body to the pickup, rather than first passing through the air. Different parts of an acoustic instrument’s
body vibrate differently than others, so depending on where the pickup is attached, the fundamental pitch or a particular harmonic might be
stronger or weaker.
Another type of input transducer is the phonograph cartridge. Years ago, many phono cartridges were made from piezoelectric crystals that
generate voltage when twisted or bent. A lightweight thin cantilever—similar to a tiny see-saw—transfers motion from a needle in the record’s
groove applying pressure to the crystal. These days most cartridges are electromagnetic, using a coil of wire and magnet, though piezo pickups
are still used in inexpensive phonographs.
The more common electromagnetic phono cartridge employs a moving magnet design, where a tiny magnet placed near a coil moves in step
with the record grooves. An equivalent design instead moves the coil, while the magnet remains stationary. While more expensive, moving coil
cartridges usually have a better high-frequency response than moving magnet types. One reason is that a coil weighs less than a magnet and
so can vibrate more quickly. A moving coil design also has fewer turns of wire to minimize its weight, so its output impedance is lower and is
affected less by capacitance in the connecting wires. One important downside of moving coil designs is their very low output level due to having
so few turns of wire. Therefore, a moving coil pickup requires a special high-gain, low-noise preamp.
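To see why a lower output impedance is less affected by cable capacitance, treat the source as a simple resistance driving the cable as a first-order RC low-pass filter, whose corner frequency is 1 / (2πRC). This is only a sketch: the resistance and capacitance values are hypothetical, and a real cartridge is also inductive, so its actual behavior is more complex than this model.

```python
import math

def rc_corner_hz(source_ohms: float, cable_farads: float) -> float:
    """-3 dB corner of a first-order RC low-pass (resistive source assumed)."""
    return 1.0 / (2 * math.pi * source_ohms * cable_farads)

CABLE_CAPACITANCE = 300e-12  # hypothetical 300 pF of connecting wire

high_z = rc_corner_hz(50_000, CABLE_CAPACITANCE)  # hypothetical high-impedance source
low_z = rc_corner_hz(10, CABLE_CAPACITANCE)       # hypothetical low-impedance source

print(f"high-impedance source: {high_z / 1e3:.1f} kHz")
print(f"low-impedance source:  {low_z / 1e6:.1f} MHz")
```

In this simplified model the high-impedance source starts losing highs within the audio band, while the low-impedance source is unaffected until far above it.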
Microphone Types
The earliest microphones were made of carbon granules packed into a small metal cup with an acoustic diaphragm on top, as shown in Figure
15.1. Carbon passes electricity, though not as well as copper wire. When the granules are packed loosely, they have a higher resistance—and
pass less electricity—than when packed tightly. Voltage from a battery is sent through the metal diaphragm and cup assembly. When positive
wave pressure reaches the diaphragm, the carbon compresses slightly, which lowers its resistance, letting more electricity pass through to the
output. Negative wave pressure instead pulls the diaphragm away from the carbon, so it’s compressed less and the output voltage decreases.
Figure 15.1:
A carbon microphone generates electricity by modulating the DC voltage from a battery. A capacitor blocks the constant at-rest DC voltage from passing to the audio
output, so only voltage changes get through.
Carbon microphones were used years ago in telephones, and some may still be in service today. Technically, this is considered an active
microphone because the output voltage is derived from a DC power source. In old telephones, the voltage is supplied by the phone company
and comes down the same wires used for the voice audio. Many telephones today contain electronics for memory-dial and other modern
features, so they can more easily include a preamp suitable for use with dynamic or condenser microphones that are higher quality than the
older carbon types.
Other early microphones, called crystal mics, use piezoelectric elements. This is a thin wafer of crystal or ceramic material, sandwiched
between two thin metal plates that carry the output voltage. Like a piezo phono cartridge, a crystal microphone generates voltage when sound
pressure flexes the crystal. One advantage of piezo microphones (and phono cartridges) is their relatively high output voltage, so less gain is
needed in the preamp. However, the output impedance is quite high, requiring short, low-capacitance connecting wires.
Piezo mics often have a “peaky” response that emphasizes midrange frequencies, so they’re not used today for professional audio recording.
However, that type of response works well for communications applications, and crystal microphones were very popular with amateur radio
operators in the mid-twentieth century. I’ll also mention that crystal mics were an essential part of the Chicago amplified blues harmonica sound
that developed in the late 1940s and early 1950s. Home tape recorders became popular in that period, and most included a crystal mic. Because
of their high impedance, high output level, and ¼-inch phone plug, these mics were an obvious match for a guitar amplifier. The combination
of a convenient size and shape, a peaky response that complemented the harmonica, enough output to drive an amplifier into distortion, and the
amplifier’s built-in reverb formed the basis of a sound that’s still with us today. I won’t dwell further on older microphone types that are no
longer used professionally, though it’s useful to understand how these early microphones work and their history.
The mass and weight of a microphone’s diaphragm affects its high-frequency response, so dynamic microphones with an attached coil tend to
roll off well below 20 KHz. The diaphragms in condenser and ribbon mics are much lighter, and without the weight of the coil, they have a
better high-end response with less ringing. “Tiny” diaphragm condensers are better still, because their lighter diaphragms can respond to
frequencies beyond 20 KHz, which improves their response below 20 KHz. Ribbon diaphragms are also light, but their length limits their
response to less than 20 KHz.
Besides classifying microphones by how they create electricity from sound waves, they’re also categorized by how they respond to changes in
wave pressure. A pressure microphone responds to the absolute amount of air pressure reaching its diaphragm, which makes it omnidirectional.
Whether a wave arrives from the front, the rear, or the side, the wave’s pressure exerts the same positive or negative force on the microphone’s
diaphragm.
The other type is the pressure-gradient microphone. The diaphragm in a pressure-gradient mic is open on both sides, so it instead responds to
the difference in pressure reaching the front and back of the diaphragm. Therefore, these microphones are inherently directional; if the same
sound wave strikes both the front and rear of the diaphragm equally, the result is no physical movement and therefore no electrical output. The
classic example of a pressure-gradient pickup pattern is the figure 8, shown earlier in Figure 6.3.
Dynamic Microphones
Among modern designs, dynamic microphones are very popular because they’re sturdy and have an acceptable, if not always great, frequency
response. When Electro-Voice years ago introduced their 664 dynamic microphone, they showed off its ruggedness at product demos by using it
as a hammer, earning this mic the endearing name “Buchanan Hammer” for Buchanan, Michigan, where Electro-Voice was located. Dynamic
microphones generate electricity by placing a lightweight coil of very fine wire in a magnetic field, using a principle called electromagnetic
induction. This is shown in Figure 15.2.
Figure 15.2:
A dynamic microphone generates electricity by placing a coil of wire within a magnetic field. As the diaphragm pushes the coil back and forth through the field, a
corresponding output voltage is produced.
Note that the compliant diaphragm edge is shown to make the operating principle clear. In practice, many microphone diaphragms are more
like a drum head; the diaphragm stretches slightly, and the center is displaced by sound waves even though the edges are secured to the housing.
Also note that the round bar magnet shown is just one possible shape. The magnet for the microphone in Figure 15.3 is round, like a very thick
coin, with a hole in the center to accept the coil similar to Figure 16.3.
The further the coil moves through a magnetic field, the larger the output voltage. And the faster it moves, the faster the voltage changes.
Therefore, the output voltage of a dynamic microphone corresponds to the changing wave pressure striking its diaphragm, within the frequency
response and other mechanical limits of the moving parts. Figures 15.3 and 15.4 show a cheap dynamic microphone I took apart to reveal the
plastic diaphragm and attached tiny coil.
Figures 15.3 and 15.4
These photos show a dynamic microphone’s plastic diaphragm (top) and the attached coil of fine wire (bottom). This cheap mic gave its life for audio
Dynamic microphones having a sealed enclosure as in Figure 15.2 are omnidirectional, responding to sound arriving from all angles. A
pressure transducer responds to changes in atmospheric pressure, not airflow, so in theory the direction the sound arrives from doesn’t matter.
At any point in space, the barometric pressure is whatever it is. In practice, even the best omnidirectional microphones become slightly
directional at higher frequencies. At very high frequencies, the microphone’s body is larger than the acoustic wavelengths, so some sound from
the rear is blocked from reaching the front of the diaphragm. The diaphragm’s diameter is another factor, again mostly at very high frequencies
where the wavelengths are similar to the diameter. This increased directionality at higher frequencies applies to all omni microphones, not just
dynamic models.
Dynamic Directional Patterns
Achieving other directional patterns with dynamic microphones requires adding an acoustical delay for sound arriving at the rear. Figure 15.5
shows a simplification of the method Electro-Voice uses to create a cardioid response in their Variable-D series, such as the RE20 model. Sound
arriving from the front strikes the diaphragm, deflecting it as usual to generate an output voltage. Sound from the rear also arrives at the front of
the diaphragm, as well as entering the various port openings to pass through a labyrinth of baffles. At some frequencies, the baffles delay the
sound enough to put those waves in phase with sound going around the mic’s body to reach the front. Since the phase-shifted waves reaching the
rear of the diaphragm now have the same polarity as at the front, the front sound pressure is canceled. Again, this is a simplified model; in
practice, multiple sound paths are used to delay different frequencies by different amounts, extending the directionality over a wider range of
frequencies than a single delay path.
Figure 15.5:
Cardioid dynamic microphones employ an internal acoustic delay, so sound from the rear arrives in phase with the same sound reaching the front of the diaphragm.
Sound from the front also gets into the ports, but it’s delayed twice (once just to reach the ports farther back along the microphone’s body and
again through the baffles), so some frequencies are phase shifted 180 degrees, reinforcing the front sound rather than canceling it. In other
words, pressure on the diaphragm’s front and rear both move the diaphragm in the same direction, one pushing and the other pulling, which
increases the diaphragm’s deflection and microphone’s output. By using a series of spaced vents along the mic’s body rather than a single port,
sound arriving from the front is staggered over time. This helps minimize the proximity effect, an increased output at low frequencies for sources
close to the microphone.
Most modern cardioid dynamic microphones create the necessary acoustical phase shift with a low-pass filter based on a mass-spring system
built from a weighted fabric (the mass) and air trapped in the capsule behind the diaphragm (the spring). This is shown in Figure 15.6. As with
a labyrinth, it’s not possible for a single filter to create a uniform group delay1 over the entire range of audio frequencies, so rear rejection is less
effective at very low and very high frequencies. However, high frequencies arriving from the rear are blocked from reaching the front of the
diaphragm by the microphone’s body, so that helps maintain the rejection at higher frequencies where the acoustical delay is less effective.
Figure 15.6:
The common design for an acoustical phase shift network combines fabric covering a short tube with air trapped inside a sealed cavity. Together these create a mass-
spring low-pass filter. This delays sound arriving through the rear port, which provides enough phase shift to cancel the same sound at the front.
Since we can’t build an acoustic filter that has equal phase shift for all frequencies, directional response varies with frequency. Figure 15.7
shows the polar plot of a cardioid dynamic microphone’s response to sound arriving from different angles. This type of graph shows how the
response varies with frequency as well as angle of arrival. Polar response versus frequency is often plotted on the same graph using separate
trace lines, as shown here. At 1 KHz the phase shift works as expected, rejecting sound coming from the rear almost completely. But at 100 Hz,
sound from the rear is attenuated much less, partly because of the proximity effect of nearby rear-arriving sounds, which negates the cancellation.
Figure 15.7:
The cardioid pickup pattern is not uniform with frequency because the acoustic phase shift network attenuates high frequencies more than low frequencies.
The output from directional microphones also rises at low frequencies when the mic is closer than a few feet from the sound source. This
applies to all directional microphones, not just dynamics. Chapter 1 explained the Inverse Square Law, which describes how the intensity of
sound waves falls off with increasing distance. The same happens with microphones. When a singer is only a few inches from the front of a
microphone, the level of the direct sound increases due to the short distance, as expected. But higher frequencies must travel farther to reach the
rear ports that create the directional response, so they’re effectively farther away and thus are attenuated more than low frequencies.
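The distance arithmetic behind this can be sketched with the Inverse Square Law, under which sound pressure from a small source falls about 6 dB per doubling of distance. The code below compares the level at the front of the diaphragm with the level at a rear entry that is slightly farther away. The one-inch extra path is a hypothetical value, and this illustrates only the level imbalance between the two paths, not the full frequency dependence of the proximity effect.

```python
import math

def level_difference_db(front_distance: float, extra_path: float) -> float:
    """Level drop at the rear entry relative to the front, for a point source.

    Both arguments must use the same unit; only their ratio matters.
    """
    return 20 * math.log10((front_distance + extra_path) / front_distance)

EXTRA_PATH_IN = 1.0  # hypothetical extra distance to the rear ports, in inches

close = level_difference_db(2.0, EXTRA_PATH_IN)   # singer 2 inches away
far = level_difference_db(36.0, EXTRA_PATH_IN)    # source 3 feet away

print(f"source at 2 inches: rear path is {close:.2f} dB weaker")
print(f"source at 3 feet:   rear path is {far:.2f} dB weaker")
```

Up close the front and rear levels differ by several dB, while at a few feet they are nearly identical, so the imbalance matters only for nearby sources.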
Another directional pattern common with dynamic microphones is the supercardioid, which is similar to cardioid but with a slightly tighter
pickup pattern. This design also rejects sound from the rear less, shifting the maximum rejection from 180 degrees off-axis to 150 degrees, as
shown in Figure 15.8.
Figure 15.8:
A supercardioid pickup pattern is similar to cardioid but slightly more directional.
When even more directionality is required, a shotgun microphone is the best choice. These are popular for TV and film use because they reject
ambient noise arriving from other directions. Shotgun mics can use either dynamic or condenser capsules, and you’ll often see them on the end
of a long pole held by a boom operator.
“Some vendors want to have it both ways. I often see ads for microphones claiming their sound is ‘warm’ and ‘accurate’ in the same sentence.”
The frequency response of most dynamic microphones falls off at the highest frequencies due to the mass of the diaphragm and attached coil.
Their combined weight is simply too great to allow vibrating at frequencies much higher than 10 KHz. Many dynamic microphones also have a
peak in the response corresponding to a natural resonance within the capsule. Some people consider this peak to be beneficial, depending on
where it falls. For example, the Shure SM57 has a peak of around 6 dB between 5 and 6 KHz, so it’s popular for use with snare drums and
electric guitar amps that might benefit from such a “presence” boost. Of course, adding EQ to a mic having a flat response can give the same
result.
Ribbon Microphones
Ribbon microphones have been around since the 1930s. Technically, ribbon mics are classified as dynamic because they generate electricity via a
metal conductor and magnet, as shown in Figure 15.9. They’re constructed from a very thin strip of lightweight metal, which does double duty as
both the diaphragm and coil, suspended in a strong magnetic field.
Figure 15.9:
A ribbon microphone generates electricity using a metal strip inside a magnetic field. Unlike a dynamic microphone that contains many turns of wire, the single metal
strip in a ribbon microphone has a very low output voltage, with a correspondingly low output impedance.
The thin ribbons on early models were fragile and easily destroyed by a blast of air from a singer’s mouth or a loud kick drum in close
proximity. Some early models also had a limited high-frequency response. Modern versions are sturdier and can capture higher frequencies.
However, ribbon microphones have an extremely low output voltage because their electrical source is a single strip of metal rather than a coil of
wire having many turns. The low output voltage is converted to a useable level by a step-up transformer, and some modern ribbon mics contain
a preamp designed specifically to match the microphone. This improves the signal-to-noise ratio by raising the output level to be comparable to
a dynamic mic.
A typical ribbon mic transformer has a turns ratio somewhere between 20 to 1 and 45 to 1, to convert 0.75 ohms to 300 ohms, or 0.1 ohm to
200 ohms. The ratio between the number of turns on the transformer’s primary coil versus its secondary determines the change in voltage, but
the impedance changes by the square of the ratio. So a ratio of 45 to 1 increases the voltage 45 times but increases the impedance by a factor of
2,025 which is 45 squared. This brings the ribbon’s 0.1-ohm output impedance up to the 200 ohms expected by a mic preamp. Figure 15.10
shows the pieces that comprise a modern ribbon microphone’s capsule.
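The transformer arithmetic above can be checked with a few lines of code. This sketch assumes an ideal, lossless transformer; the function name is hypothetical, and the figures are the ones quoted in the text.

```python
import math

def step_up(turns_ratio: float, source_ohms: float) -> tuple[float, float]:
    """Return (voltage gain, output impedance) for an ideal step-up transformer.

    Voltage scales with the turns ratio; impedance scales with its square.
    """
    return turns_ratio, source_ohms * turns_ratio ** 2

gain_20, z_20 = step_up(20, 0.75)
gain_45, z_45 = step_up(45, 0.1)
gain_45_db = 20 * math.log10(gain_45)

print(f"20:1 with a 0.75-ohm ribbon -> {z_20:.1f} ohms")
print(f"45:1 with a 0.1-ohm ribbon  -> {z_45:.1f} ohms, {gain_45_db:.1f} dB of voltage gain")
```

The 45 to 1 case yields 202.5 ohms, which the text rounds to 200 ohms.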
Figure 15.10:
Ribbon microphones have come a long way since the 1930s! This photo shows the construction of a modern high-end ribbon mic that’s more rugged than earlier
models and also features an improved high-frequency response.
Photo courtesy of Royer Labs.
Most ribbon microphones inherently have a bidirectional pickup pattern, also called figure 8 due to the shape of the polar response when
plotted. Figure 6.3 from Chapter 6 showed a figure 8 pickup pattern, and Figure 15.11 shows why sound arriving from either side is rejected.
The concept is pretty simple: Sound pressure arriving from the side impinges equally on both sides of the exposed ribbon. The waves therefore
cancel, and the ribbon isn’t deflected.
Figure 15.11:
A ribbon microphone has a bidirectional pickup pattern because both sides of the ribbon are exposed to the air.
One important feature of ribbon mics is that their figure 8 pattern is uniform over the entire range of frequencies, varying only slightly due to
reflections inside the microphone’s case and grill. Where a cardioid mic’s pickup pattern varies substantially with frequency, you can place a
ribbon mic to reject sound sources from both sides, confident that all of the sound will be rejected, not just the midrange, and that any off-axis
sound picked up will not sound colored. Of course, reflections from a rejected source can bounce off room surfaces and find their way to the
front or rear of the mic.
A ribbon’s self-resonance is at a very low frequency, and the pressure gradient between the front and rear of the ribbon rises at 6 dB per
octave. This compensates for the ribbon’s inherent 6 dB per octave roll-off above resonance, resulting in a net flat frequency response. Ribbon
mics tend to have a uniform response that extends to high frequencies, with fewer ripples than many dynamic models. The ribbon diaphragm is
very thin with a low mass, and no coil is attached as with dynamic mics, so the ribbon can move quickly. However, the frequency response of a
classic “long ribbon” mic doesn’t extend quite as high as a good small diaphragm condenser mic. Modern designs use shorter ribbons, often
stiffened by vertical creases to vibrate more like a flat diaphragm.
Condenser Microphones
Condenser mics are the first choice of many recording professionals, favored for their extended high-frequency response. They use electrostatic
properties to generate the electric signal rather than electromagnetism as with dynamics and ribbons. The word condenser is an obsolete name
for the more modern term capacitor, but it’s still used by recording engineers to describe this type of microphone.
The diaphragm material for a dynamic microphone can be almost anything that’s light and compliant enough for the purpose because the
connection to the attached coil is purely mechanical. But the diaphragm in a condenser microphone must be capable of conducting electricity
because it forms one plate of a capacitor. The diaphragm in modern condenser mics is typically very thin Mylar film, with an even thinner layer
of gold applied to make it conductive. You can see in Figure 15.12 that some condenser microphones also feature switch-selectable pickup
patterns.
Figure 15.12:
Modern large-diaphragm multipattern condenser microphone capsules employ two gold-plated Mylar diaphragms placed back to back, with a rigid metal plate between
them. A switch changes the DC bias voltage on the diaphragms to vary the pickup pattern between omnidirectional, figure 8, and cardioid.
Photo courtesy of TELEFUNKEN Elektroakustik.
Condenser microphones are often categorized by the diameter of their diaphragm—large or small. A large diaphragm is generally an inch or
larger in diameter, and a small diaphragm is typically half an inch or less. Another type uses what I call a “tiny” diaphragm, often about 1/4
inch in diameter, though some are even smaller. Originally designed for measurement due to their extremely flat frequency response, tiny
diaphragm mics have been embraced by recording engineers since the 1980s.
Generally speaking, the smaller the diaphragm, the faster it can move and, in turn, the higher the frequency the microphone will respond to.
The downside of small diaphragms is they capture a smaller portion of the acoustic waves, so they have less output and thus a poorer signal to
noise ratio. On the other hand, receiving less of the wave means a small diaphragm mic can handle a louder acoustic volume before distorting.
The DPA 4090 shown in Figure 15.13 can handle SPLs as high as 134 dB before clipping, with an equivalent input noise of 23 dB A-weighted.
There are also extremely tiny condenser mics made by DPA and others that clip directly onto violins, trumpets, and other instruments.
Figure 15.13:
The diaphragm in this precision DPA 4090 omni condenser microphone is slightly smaller than 1/4 inch in diameter.
Condenser microphones require DC power for two purposes: A DC bias voltage is needed to charge the internal capacitor formed by the
diaphragm(s) and rigid back plate and to power the built-in preamp that all condenser microphones require. The capacitor must be charged
initially, because the voltage across the capacitor changes as the capacitance varies in response to sound waves deflecting the diaphragm. It’s the
change in voltage across the capacitor that eventually appears at the microphone’s output. Without an initial DC voltage to start with, changing
the capacitance does nothing. However, electret condenser microphones are charged permanently during manufacture, so this type of mic needs
voltage only to power the built-in preamp. Regardless of a condenser mic’s element type, the necessary voltage can come from either batteries or
an external power supply.
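Why the capsule needs an initial charge can be shown with the basic capacitor relationship V = Q / C. The sketch below assumes an ideal parallel-plate capacitor whose charge stays constant while the diaphragm moves; the plate area, gap, and deflection are hypothetical values chosen only for illustration.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, farads per meter

def capacitance(area_m2: float, gap_m: float) -> float:
    """Capacitance of an ideal parallel-plate capacitor with an air gap."""
    return EPSILON_0 * area_m2 / gap_m

AREA = 1e-4    # hypothetical 1 square centimeter diaphragm
GAP = 25e-6    # hypothetical 25 micron gap to the back plate
BIAS = 48.0    # volts used to charge the capsule

c_rest = capacitance(AREA, GAP)
charge = c_rest * BIAS                # fixed once the capsule is charged

# Sound pushes the diaphragm 0.25 microns closer to the back plate.
c_moved = capacitance(AREA, GAP - 0.25e-6)
delta_v = charge / c_moved - BIAS     # the audio signal

print(f"capacitance at rest: {c_rest * 1e12:.1f} pF")
print(f"voltage change: {delta_v:.2f} V")
print(f"with no bias: {0.0 / c_moved - 0.0:.2f} V")
```

With no stored charge the same movement produces no voltage at all, matching the point that changing the capacitance alone does nothing.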
The basic operation of a condenser microphone is shown in Figure 15.14. The bias voltage is shown as the schematic symbol for a
conventional battery, though it’s labeled 48 volts, since that’s the standard for phantom power. Some microphones can operate with as little as
15 volts, using an internal DC-to-DC converter to generate the higher voltage needed to polarize the capsule.
Figure 15.14:
The electrical output from a condenser microphone is derived from a DC voltage that changes as the diaphragm is deflected by sound waves.
The output voltage of a condenser microphone is fairly large, but its extremely high output impedance provides only an infinitesimal amount
of current. Therefore, the built-in preamp must have a very high input impedance to prevent loading the capsule’s output. A typical value is
10 MΩ (10 million ohms) or even higher. These preamps use either an FET transistor or a vacuum tube to achieve a suitably high input
impedance. Although “preamp” is the common term for the electronics in a condenser mic, it’s a bit of a misnomer. It’s really an active
impedance converter.
Unlike the transformer in a ribbon mic that raises the output impedance along with the voltage, a condenser capsule’s impedance must be
reduced to a value usable in the outside world. As explained in Chapters 2 and 4, a circuit that operates at an extremely high impedance is
susceptible to high-frequency losses due to wiring and other capacitance. So a condenser mic’s electronics are built into the microphone close to
the capsule. This also reduces the chance of picking up hum and radio signals, because the capsule and preamp are shielded inside the mic’s
metal enclosure.
Condenser Directional Patterns
A condenser microphone, like any other microphone built with a sealed back, responds to changes in atmospheric pressure and is inherently
omnidirectional. When sound is blocked from the rear, wave pressure arriving at the front of the mic deflects the diaphragm no matter which
angle it comes from. A cardioid pickup is created by drilling a pattern of holes in the back plate to allow sound waves to reach the diaphragm
through that path, similar to the rear vents in a dynamic microphone. If a second diaphragm is added on the other side of the back plate, other
patterns can be created by combining the outputs from both diaphragms.
Figure 15.15 shows how the three most common pickup patterns are selected by applying a positive, a negative, or zero voltage to the rear
diaphragm. It’s also possible to obtain in-between directional patterns by varying the DC bias voltage applied to the rear diaphragm, rather than
applying the full 48 volts using a switch.
Figure 15.15:
The pickup pattern of a pressure-gradient condenser microphone can be changed by switching the bias voltage applied to the rear diaphragm.
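The way two back-to-back diaphragms combine can be sketched with idealized polar equations. Here each diaphragm is modeled as a perfect cardioid, 0.5 × (1 + cos θ), and `rear_weight` is a hypothetical stand-in for the relative bias applied to the rear diaphragm: +1, 0, or −1. Real capsules only approximate these shapes, and which bias polarity gives which pattern depends on how a given microphone is wired.

```python
import math

def cardioid(theta_deg: float) -> float:
    """Ideal cardioid sensitivity for sound arriving at theta_deg."""
    return 0.5 * (1 + math.cos(math.radians(theta_deg)))

def combined(theta_deg: float, rear_weight: float) -> float:
    """Front cardioid plus a weighted rear-facing cardioid."""
    front = cardioid(theta_deg)
    rear = cardioid(theta_deg + 180)  # same capsule shape, facing backward
    return front + rear_weight * rear

for name, weight in (("omni", 1.0), ("cardioid", 0.0), ("figure 8", -1.0)):
    values = [round(combined(angle, weight), 2) for angle in (0, 90, 180)]
    print(f"{name:9s} at 0/90/180 degrees: {values}")
```

Adding the two outputs gives equal sensitivity at every angle, using the front alone gives a cardioid, and subtracting them leaves cos θ, which is the figure 8 with its nulls at the sides. Intermediate weights give the in-between patterns.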
Condenser microphones tend to have an excellent frequency response, thanks to the low mass of their diaphragms. Not only are the
diaphragms light enough to vibrate quickly, but their self-resonance can be better controlled. So like ribbon microphones, condenser
microphones tend to have a smooth response across the audible range, with minimal rippling. Condenser microphones that have a tiny
diaphragm ¼ inch or less are especially well suited to capturing very high frequencies. These are often used by acousticians to measure
loudspeaker and room response.
One downside of condenser microphones is they’re more fragile than dynamic mics. Another issue is the close proximity between the
diaphragm and the back plate, which makes them sensitive to humidity. In very humid conditions, a condenser microphone might make a
sputtering sound as the DC bias voltage arcs across the narrow gap, or the mic may not work at all.
Another type of condenser microphone uses the changing capacitance to modulate a radio frequency (RF) signal rather than generate audio
directly by varying a DC bias voltage. This is exactly the same as conventional FM radio, where the frequency of a radio oscillator, or carrier, is
modulated at an audio rate, then demodulated to reproduce the original sound. Sennheiser has been making RF condenser microphones this way
for many years, and one advantage is less influence from humidity because no polarizing voltage is used.
Other Microphone Types
Boundary microphones are omnidirectional and are meant to be mounted directly onto a reflecting surface, generally a wall of a recording
studio or other room, though they’re often used on the stage floor in theaters. They can also be placed on a large surface such as a conference
table or lectern. This type of mic is commonly called PZM, short for Pressure Zone Microphone, though that name is a trademark of Crown
International, the company that licensed and produced the first commercial version in 1980. Like Kleenex, the trademarked name has become
the generic name in the industry.
The main feature of a PZM is it avoids comb filtering due to reflections from nearby surfaces. Since the microphone element is aimed directly
toward a large surface and is placed very close to that surface, it receives only direct sound from the surface rather than a mix of direct and
reflected sound. Of course, audio sources can still sound distant and reverberant when they’re far away from the microphone due to other
reflections in the room.
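A quick calculation suggests why placing the element very close to the surface helps. When a reflection travels farther than the direct sound, the first cancellation falls at the frequency where the extra path equals half a wavelength. The sketch below assumes a reflection of equal strength with no polarity flip, and the path differences are hypothetical.

```python
SPEED_OF_SOUND_FT_S = 1130.0  # approximate speed of sound in air at room temperature

def first_notch_hz(path_difference_ft: float) -> float:
    """Lowest comb filter notch for a reflection delayed by the given extra path."""
    return SPEED_OF_SOUND_FT_S / (2 * path_difference_ft)

on_stand = first_notch_hz(2.0)      # hypothetical mic on a stand near a wall
on_boundary = first_notch_hz(0.01)  # hypothetical element about 1/8 inch from the surface

print(f"2-foot path difference:    first notch near {on_stand:.0f} Hz")
print(f"0.01-foot path difference: first notch near {on_boundary / 1000:.1f} kHz")
```

With the element nearly touching the surface, the first notch moves far above the audible range.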
A parabolic microphone combines a directional microphone capsule with a parabolic reflector to create a microphone that’s extremely
directional. The principle is the same as a TV satellite dish, and the dish diameters are even similar; sound waves travel much more slowly than
radio waves, so acoustic wavelengths are in turn much shorter than radio wavelengths. A dish that’s 18 inches in diameter is one wavelength at
750 Hz when used with a microphone, but it’s also one wavelength at 650 MHz when used as a radio antenna.
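The dish comparison follows from frequency = speed / wavelength. This sketch assumes round-number propagation speeds for sound in air and for radio waves.

```python
SPEED_OF_SOUND_M_S = 343.0          # approximate, in air at room temperature
SPEED_OF_LIGHT_M_S = 299_792_458.0  # radio waves travel at the speed of light

def frequency_hz(speed_m_s: float, wavelength_m: float) -> float:
    """Frequency whose wavelength equals the given length."""
    return speed_m_s / wavelength_m

dish_m = 18 * 0.0254  # 18 inches expressed in meters

f_sound = frequency_hz(SPEED_OF_SOUND_M_S, dish_m)
f_radio = frequency_hz(SPEED_OF_LIGHT_M_S, dish_m)

print(f"acoustic: {f_sound:.0f} Hz")
print(f"radio:    {f_radio / 1e6:.0f} MHz")
```

Both results land close to the 750 Hz and 650 MHz figures quoted above.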
Optical microphones work much like radar, but instead of radio waves, they send a laser beam toward a vibrating surface, then convert the
varying reflections into an audio signal. This type of microphone is not so useful in a recording studio, but it’s great for spying on your
neighbors. If you aim the mic at a closed window, sound from people talking on the other side of the window vibrates the glass ever so slightly.
The changing reflections are then decoded back to the original sound.
A USB microphone can be any basic type, but most are cardioid condenser or dynamic. What distinguishes a USB mic is its built-in preamp,
plus an A/D converter that shows as an audio sound card input in your recording software. The main advantage for home recordists and
podcasters is they’re simple to connect to a computer. They don’t need a preamp or mixer, just a computer with a USB port. Some USB mics
have a built-in earphone jack and small mixer so you can hear yourself while recording, and there are also stereo USB mics.
Phantom Power
Phantom power is a clever method of sending DC voltage to microphones that rely on external power. The most common is 48 volts; often the
phantom power switch on a mixer or preamp is labeled “48 V,” but phantom power as defined in the standard IEC-61938 covers 12, 24, and 48
volts. Rather than require two additional wires just for the power feed, the same two wires that send audio from the microphone to the preamp
are also used to send the 48 volts to the mics. Even better, microphones that do not need power can usually be connected safely and won’t be
harmed or otherwise affected.
Phantom power works with balanced microphones only, which is usually the case with mics that have an XLR output connector. It can also be
used to power active DI boxes. Phantom power is usually built into mixers or microphone preamps, though stand-alone units are available. Most
condenser microphones that use a vacuum tube need more than 48 volts to operate the preamp, so they come with their own power supply
rather than relying on phantom power. The block diagram of a phantom power system in Figure 15.16 is divided to show the preamp and
microphone portions.
Figure 15.16:
Phantom power sends 48 volts through matched resistors to both signal wires from the microphone’s output transformer. That voltage is then taken from the center tap
of the output transformer and used internally by the microphone. Mics that don’t need power are not affected as long as they use balanced wiring and their output transformer or voice
coil is not connected to the ground.
The key to phantom power is applying exactly the same voltage to both the plus and minus signal wires. Precision 1 percent (or better)
tolerance resistors are used to avoid upsetting the balanced connection. Since transformers pass audio frequencies but not DC, the 48 volts is
taken from a center tap on the output side of the mic’s output transformer. It’s also possible for a microphone or active DI to receive phantom
power even if it doesn’t have an output transformer. In that case, a corresponding pair of matched resistors inside the unit taps into both signal
wires to retrieve the voltage.
You should never connect an unbalanced microphone of any type to a preamp that provides phantom power. You probably won’t harm the
preamp because the 6.8 K resistors limit the amount of current that can be drawn, but the microphone might be damaged.
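To put a number on that current limit, here is a minimal Python sketch. It assumes an ideal 48 volt supply feeding two 6.8 K resistors, as described above. The function names and the 2 mA microphone draw are hypothetical, chosen only for illustration.

```python
# Sketch: how the phantom power feed resistors limit current.
# Assumes an ideal 48 V source and two matched 6.8 K feed resistors.

def phantom_short_circuit_ma(supply_volts=48.0, feed_ohms=6800.0):
    """Total current in mA if both signal wires are shorted to ground."""
    per_leg_amps = supply_volts / feed_ohms
    return 2 * per_leg_amps * 1000.0

def voltage_at_mic(supply_volts=48.0, feed_ohms=6800.0, mic_draw_ma=2.0):
    """Voltage left at the mic when it draws mic_draw_ma, shared equally by both legs."""
    parallel_ohms = feed_ohms / 2.0  # the two resistors act in parallel for DC
    return supply_volts - (mic_draw_ma / 1000.0) * parallel_ohms

print(round(phantom_short_circuit_ma(), 1))        # 14.1 (mA, worst case)
print(round(voltage_at_mic(mic_draw_ma=2.0), 1))   # 41.2 (volts at a mic drawing 2 mA)
```

Even with both wires shorted, the supply delivers only about 14 mA, which is why the preamp is unlikely to be harmed.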
There’s a long-standing myth that ribbon mics should never be connected to phantom power. Like many myths, this has some basis in fact:
The original RCA Model 44 microphone had a center tapped output transformer with the tap grounded to reduce hum. If this mic was connected
to a phantom power source, voltage was applied across the ribbon, causing it to tear. Once users figured that out, the center taps were
disconnected, solving the problem. Yet the “all ribbon mics are very sensitive and shouldn’t be connected to phantom power” myth continues.
It’s also possible to damage a balanced mic that doesn’t need power by passing its output through a conventional Tip/Ring/Sleeve-type patch
bay. When you insert a ¼-inch balanced plug into a balanced jack, it’s possible for the plug’s tip contact to briefly touch the jack’s ring and the
plug’s ring to touch the jack’s grounded sleeve at the same time. If that happens, the full 48 volts is sent back into the microphone’s coil or
ribbon. Using a ¼-inch patch panel for balanced microphones is not usually recommended, but some people do it anyway. If you use this type
of patch bay with microphones, be sure to turn off the phantom power supply every time before plugging or unplugging any microphone.
Microphone Specs
“I’ve certainly spent many hours with finicky artists trying different vocal mics, all of which sound remarkably similar, and all I have to say is that I felt it was a waste of time.”
—Alan Parsons, famous recording engineer/producer
All of the standard audio specs apply to microphones as well—frequency response, distortion, ringing, and noise. Noise is a factor only with
active microphones, those that contain a built-in preamp. Of course, a passive microphone that outputs a very small signal requires more gain
from the preamp, so the preamp’s noise can be a problem with soft sources. Microphone noise is often referred to as self-noise, and it’s usually
spec’d relative to an equivalent A-weighted ambient SPL. So if a given microphone is stated to have a self-noise of 15 dB SPL, the noise you’ll
get in practice is the same as placing a mic having no inherent noise in a room whose background noise is 15 dB SPL.
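Because self-noise is stated as an equivalent SPL, the signal-to-noise ratio for any source level is a simple subtraction. The sketch below assumes the common convention of quoting microphone signal-to-noise relative to 94 dB SPL (a sound pressure of 1 pascal). The function name is hypothetical.

```python
# Sketch: signal-to-noise ratio implied by a self-noise spec.
# Assumes the usual 94 dB SPL (1 pascal) reference level.

REFERENCE_SPL_DB = 94.0

def snr_from_self_noise(self_noise_db_spl, source_db_spl=REFERENCE_SPL_DB):
    """SNR in dB for a source at source_db_spl, given self-noise as equivalent SPL."""
    return source_db_spl - self_noise_db_spl

print(snr_from_self_noise(15.0))                      # 79.0 dB at the 94 dB SPL reference
print(snr_from_self_noise(15.0, source_db_spl=60.0))  # 45.0 dB for a soft 60 dB SPL source
```

The second line shows why self-noise matters mostly with soft sources.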
THD and IM distortion are also important microphone specs. Unlike electronic circuits that are usually very clean right up to the onset of gross
distortion, microphone distortion often creeps up slowly at higher SPLs. A mic’s diaphragm can flex only so far before it bottoms out in either
direction, not unlike a loudspeaker, but tension on the diaphragm increases before hitting a hard limit of excursion. Active microphones contain
electronics, so that’s another potential source of distortion. As mentioned in Chapter 5, many active microphones include a built-in attenuating
“pad” that can be switched on when recording drums and other loud sources. The pad connects between the mic’s capsule and internal preamp
to avoid overdriving the preamp. But eventually the diaphragm itself will distort. Further, even when using dynamic mics, some older preamps
don’t allow setting their gain low enough to avoid distortion when the input voltage is very high, so in that case you’ll need an external passive
Measuring Microphone Response
Measuring the frequency response of a microphone requires an anechoic chamber, as shown in Figure 15.17. Trying to measure the response of a
microphone (or loudspeaker) in a regular room doesn’t work because reflections from the room’s surfaces skew the response. You also need a
loudspeaker sound source with a response that’s very flat, or at least known. Many high-end microphones come with a printed response graph
measured at the factory for that specific mic. You can also send a microphone to a third-party calibration company, which will return it along
with a graph of the measured response.
Figure 15.17:
An anechoic chamber absorbs sound at all frequencies down to 100 Hz or even lower. This avoids reflections that skew the results when measuring the frequency
response of microphones and speakers. Note the steel mesh floor with additional absorption below to avoid floor reflections.
Photo courtesy of Orfield Labs.
It’s possible to measure a mic’s frequency response yourself, but it’s tricky. One way is to hoist the microphone and a known-flat loudspeaker
20 or 30 feet up in the air outdoors to avoid reflections from the ground. However, finding a loudspeaker that itself is accurate enough will be a
challenge, not to mention the mechanical logistics of such a test. You can also measure the response in a regular room, using a technique called
gating to ignore the reflected waves that arrive soon after the original direct sound. Most room measuring software includes an option to specify
the gate time, and this method is also used to measure loudspeakers. The problem is you need to set the input gate to turn off the audio only a
few milliseconds after the direct sound arrives at the mic, unless all of the room’s surfaces are far away. So unless you have a large room with a
very high ceiling and you put the microphone and speaker up on a tall ladder, you’ll need to set the gate time so short that low frequencies are
excluded from the measurement.
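The trade-off between gate time and low-frequency reach can be estimated with a little geometry. The Python sketch below is a rough approximation that assumes a single reflecting surface parallel to the line between the speaker and the mic, a speed of sound of 343 meters per second, and the rule of thumb that the gate must hold at least one full cycle of the lowest frequency measured. The function names and example distances are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value for air at room temperature

def reflection_delay_ms(spacing_m, surface_distance_m):
    """Delay in ms between the direct sound and the first reflection from one surface.

    The speaker and mic are spacing_m apart, both surface_distance_m from the surface.
    """
    direct_path = spacing_m
    reflected_path = 2.0 * math.hypot(spacing_m / 2.0, surface_distance_m)
    return (reflected_path - direct_path) / SPEED_OF_SOUND_M_S * 1000.0

def lowest_valid_hz(gate_ms):
    """Rule of thumb: the gate must be at least one cycle long."""
    return 1000.0 / gate_ms

# Speaker and mic 1 meter apart, 1.2 meters above the floor
gate = reflection_delay_ms(1.0, 1.2)
print(round(gate, 1), "ms gate, valid down to roughly", round(lowest_valid_hz(gate)), "Hz")

# The same pair raised 20 feet (about 6.1 meters) above the floor
gate = reflection_delay_ms(1.0, 6.096)
print(round(gate, 1), "ms gate, valid down to roughly", round(lowest_valid_hz(gate)), "Hz")
```

With these assumptions, the gate at typical mic stand height must close after less than 5 milliseconds, limiting the measurement to frequencies above about 200 Hz. Raising everything 20 feet extends that to around 30 Hz.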
I once saw a post in an audio forum by a student who used this method to measure a number of popular studio monitors. He borrowed a
motorized lift to raise each speaker 20 feet above the floor in his college’s auditorium and used an equally tall microphone stand to plac