Building Electro-optical Systems
Philip C. D. Hobbs
Electrooptical Innovations
Briarcliff Manor, New York
Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
Library of Congress Cataloging-in-Publication Data:
Hobbs, Philip C. D.
Building electro-optical systems : making it all work / Philip C.D. Hobbs.—2nd ed.
p. cm.—(Wiley series in pure and applied optics)
Includes bibliographical references and index.
ISBN 978-0-470-40229-0 (cloth)
1. Electrooptical devices–Design and construction. I. Title.
TA1750.H63 2008
621.381 045—dc22
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
In memory of my father,
Gerald H. D. Hobbs
John 6:40
We have a habit in writing articles published in scientific journals to make the work as
finished as possible, to cover up all the tracks, to not worry about the blind alleys or
describe how you had the wrong idea first, and so on. So there isn’t any place to publish,
in a dignified manner, what you actually did in order to get to do the work.
—Richard P. Feynman, Nobel Lecture, 1965
Contents

1 Basic Optical Calculations
Introduction / 1
Wave Propagation / 3
Calculating Wave Propagation in Real Life / 9
Detection / 33
Coherent Detection / 34
Interferometers / 36
Photon Budgets and Operating Specifications / 38
Signal Processing Strategy / 44

2 Sources and Illuminators
Introduction / 52
The Spectrum / 52
Radiometry / 54
Continuum Sources / 55
Interlude: Coherence / 59
More Sources / 63
Incoherent Line Sources / 68
Using Low Coherence Sources: Condensers
Lasers / 71
Gas Lasers / 72
Solid State Lasers / 73
Diode Lasers / 75
Laser Noise / 83
Diode Laser Coherence Control / 89

3 Optical Detection
Introduction / 91
Photodetection in Semiconductors / 92
Signal-to-Noise Ratios / 92
Detector Figures of Merit / 94
Quantum Detectors / 100
Quantum Detectors with Gain / 109
Thermal Detectors / 117
Image Intensifiers / 118
Silicon Array Sensors / 120
How Do I Know Which Noise Source Dominates? / 131
Hacks / 136

4 Lenses, Prisms, and Mirrors
Introduction / 145
Optical Materials / 145
Light Transmission / 149
Surface Quality / 150
Windows / 151
Pathologies of Optical Elements / 152
Fringes / 153
Mirrors / 158
Glass Prisms / 160
Prism Pathologies / 165
Lenses / 165
Complex Lenses / 171
Other Lens-like Devices / 175

5 Coatings, Filters, and Surface Finishes
Introduction / 180
Metal Mirrors / 182
Transmissive Optical Coatings / 184
Simple Coating Theory / 186
Absorptive Filters / 196
Beam Dumps and Baffles / 198
White Surfaces and Diffusers / 204

6 Polarization
Introduction / 208
Polarization of Light / 208
Interaction of Polarization with Materials / 211
Absorption Polarizers / 215
Brewster Polarizers / 216
Birefringent Polarizers / 217
Double-Refraction Polarizers / 218
TIR Polarizers / 221
Retarders / 223
Polarization Control / 226

7 Exotic Optical Components
Introduction / 233
Gratings / 233
Grating Pathologies / 236
Types of Gratings / 237
Resolution of Grating Instruments / 240
Fine Points of Gratings / 242
Holographic Optical Elements / 244
Retroreflective Materials / 245
Scanners / 246
Modulators / 254

8 Fiber Optics
Introduction / 262
Fiber Characteristics / 262
Fiber Theory / 266
Fiber Types / 272
Other Fiber Properties / 277
Working with Fibers / 281
Fiber Devices / 287
Diode Lasers and Fiber Optics / 292
Fiber Optic Sensors / 292
Intensity Sensors / 293
Spectrally Encoded Sensors / 295
Polarimetric Sensors / 298
Fiber Interferometers / 299
Two-Beam Fiber Interferometers / 300
Multiple-Beam Fiber Interferometers / 301
Phase and Polarization Stabilization / 305
Multiplexing and Smart Structures / 307
Fiber Sensor Hype / 307

9 Optical Systems
Introduction / 309
What Exactly Does a Lens Do? / 309
Diffraction / 319
Aberrations / 336
Representing Aberrations / 340
Optical Design Advice / 344
Practical Applications / 345
Illuminators / 349

10 Optical Measurements
Introduction / 354
Grass on the Empire State Building / 354
Detection Issues: When Exactly Is Background Bad? / 359
Measure the Right Thing / 364
Getting More Signal Photons / 366
Reducing the Background Fluctuations / 370
Optically Zero Background Measurements / 373
Electronically Zero Background Measurements / 376
Labeling Signal Photons / 380
Closure / 385

11 Designing Electro-Optical Systems
Introduction / 387
Do You Really Want to Do This? / 387
Very Basic Marketing / 393
Classes of Measurement / 396
Technical Taste / 398
Instrument Design / 402
Guiding Principles / 407
Design for Alignment / 410
Turning a Prototype into a Product / 413

12 Building Optical Systems
Introduction / 415
Build What You Designed / 416
Assembling Lab Systems / 416
Alignment and Testing / 421
Optical Assembly and Alignment Philosophy / 425
Collimating Beams / 426
Focusing / 428
Aligning Beams with Other Beams / 430
Advanced Tweaking / 434
Aligning Laser Systems / 439
Adhesives / 441
Cleaning / 443
Environmental Considerations / 446

13 Signal Processing
Introduction / 448
Analog Signal Processing Theory / 449
Modulation and Demodulation / 453
Amplifiers / 462
Departures from Linearity / 462
Noise and Interference / 467
Frequency Conversion / 483
Filtering / 487
Signal Detection / 498
Reducing Interference and Noise / 502
Data Acquisition and Control / 504

14 Electronic Building Blocks
Introduction / 509
Resistors / 510
Capacitors / 512
Transmission Lines / 522
Transmission Line Devices / 528
Diodes and Transistors / 530
Signal Processing Components / 539
Digitizers / 548
Analog Behavior of Digital Circuits / 558

15 Electronic Subsystem Design
Introduction / 560
Design Approaches / 560
Perfection / 569
Feedback Loops / 571
Signal Detectors / 577
Phase-Locked Loops / 587
Calibration / 590
Filters / 592
Other Stuff / 595
More Advanced Feedback Techniques / 597
Hints / 599
Linearizing / 601
Digital Control and Communication / 604
Miscellaneous Tricks / 607
Bulletproofing / 607

16 Electronic Construction Techniques
Introduction / 612
Circuit Strays / 612
Stray Coupling / 617
Ground Plane Construction / 618
Technical Noise and Interference / 621
Product Construction / 625
Getting Ready / 628
Prototyping / 629
Surface Mount Prototypes / 634
Prototyping Filters / 637
Tuning, or, You Can’t Optimize What You Can’t See / 639

17 Digital Postprocessing
Introduction / 644
Elementary Postprocessing / 645
Dead Time Correction / 650
Fourier Domain Techniques / 650
Power Spectrum Estimation / 666
Digital Filtering / 671
Deconvolution / 675
Resampling / 676
Fixing Space-Variant Instrument Functions / 678
Finite Precision Effects / 680
Pulling Data Out of Noise / 681
Phase Recovery Techniques / 685

18 Front Ends
Introduction / 688
Photodiode Front Ends / 690
Key Idea: Reduce the Swing Across Cd / 692
Transimpedance Amplifiers / 693
How to Go Faster / 714
Advanced Photodiode Front Ends / 721
Other Types of Front End / 731
Hints / 734

19 Bringing Up the System
Introduction / 738
Avoiding Catastrophe / 741
Debugging and Troubleshooting / 744
Getting Ready / 745
Indispensable Equipment / 748
Analog Electronic Troubleshooting / 749
Oscillations / 753
Other Common Problems / 755
Debugging and Troubleshooting Optical Subsystems / 758
Localizing the Problem / 762

Appendix: Good Books
Preface

You are fools, to say you learn from your mistakes. I learn from the mistakes of other men.
—Otto von Bismarck
This is a book of lore. Lore is an old word for wisdom and knowledge. While it often
refers to magic and epic poetry, what I mean by it is altogether more homely: a mixture
of rules of thumb, experience, bits of theory, and an indefinable feeling for the right way
to do things, a sort of technical taste. It is what makes the difference between analyzing
a design once completed and coming up with a good design to fit a particular purpose.
Course work and textbooks have lots of analysis but most contain no lore whatsoever.
One of the odd things about lore is that it lives in the fingers more than in the brain,
like piano playing. In writing this book, I have often run up against the difference between
how I do something and how I think I do it, or how I remember having done it. Since
it’s the actual lore of doing that is useful, I have where possible written or revised each
section when I was actually doing that task or consulting with someone who was. I hope
that this gives those sections a sense of immediacy and authenticity.
Lore is acquired slowly through experience and apprenticeship. Beginners pester experts,
who help fairly willingly, mostly because they’re kept humble by stepping in potholes
themselves. This mutual aid system works but is slow and unsystematic. As a beginner, I
once spent nearly six months trying to get a fancy laser interferometer to work properly, a
task that would now take about a week. The reason was a breakdown in the apprenticeship
system—everyone consulted said “Oh, that comes with practice”—perfectly true, and
by no means unsympathetic, but not too helpful. Conversations with many others in the
field indicate that this sort of thing is the rule and not the exception. Time, enthusiasm,
and confidence are far too precious to go wasting them like that.
This book is an attempt to provide a systematic and accessible presentation of the
practical lore of electro-optical instrument design and construction—to be the book I
needed as a graduate student. It is intended for graduate students at all levels, as well as
practicing scientists and engineers: anyone who has electro-optical systems to build and
could use some advice. Its applicability ranges from experimental apparatus to optical
disc players.
The range of topics covered here is enormously broad, and I wish I were master of
it all. Most of it was invented by others whose names I don’t know; it’s the lore of a
whole field, as filtered through one designer’s head. It’s mostly been learned by watching
and doing, or worked out with colleagues at a white board, rather than reading journal
articles, so there aren’t many references. For further reading, there is a list of 100 or so
good books in the Appendix that should fill in the gaps.
I hope that a book like this can erect bridges between subdisciplines, prevent common
mistakes, and help all those working on an instrument project to see it as a whole. So
much good stuff gets lost in the cracks between physics, electrical engineering, optical
engineering, and computer science, that a salvage attempt seemed justified. I apologize
to those whose work has been acknowledged inadequately or whose priority has been
overlooked, and hope that they can remember once needing a book like this.
Designing and constructing electro-optical instruments is without a doubt one of the most
interdisciplinary activities in engineering. It makes an absorbing and rewarding career,
with little danger of growing stale. On the other hand, the same interdisciplinary quality
means that instrument building is a bit scary and keeps us on our toes. The very broad
range of technologies involved means that at least one vital subsystem lies outside the
designer’s expertise, presenting a very real danger of major schedule slippage or outright
failure, which may not become apparent until very late in the project.
We in electro-optics rely on whatever subset of these technologies we are familiar with,
together with a combination of outside advice, collaboration, and purchased parts. Often,
there are many ways of reaching the goal of a robust, working system; then the problem
is where to start among a range of unfamiliar alternatives. It’s like the classic computer game ADVENT: ‘You are in a maze of twisty little passages, all
different.’ Some judicious advice (and perhaps a map left by a previous adventurer)
is welcome at such times, and that’s what this book is about, the lore of designing and
building electro-optical instruments that work.
To have confidence in an instrument design, we really need to be able to calculate its
performance ahead of time, without constructing an elaborate simulation. It is a nontrivial
matter, given the current fragmented state of the literature, to calculate what the resolution
and SNR of a measurement system will be before it is built. It’s not that there isn’t lots
of information on how to calculate the performance of each lens, circuit, or computer
program, but rather the complexity of the task and the very different ways in which the
results are expressed in the different fields encountered. For example, what is the effect
of fourth-order spherical aberration in the objective lens on the optimal band-setting filter
in the analog signal processor, and then on the signal-to-noise ratio of the ultimate digital
data set? Somebody on the project had better know that, and my aim is to make you that somebody.
The book is intended in the first instance for use by oppressed graduate students in
physics and electrical engineering, who have to get their apparatus working long enough
to take some data before they can graduate. When they do, they’ll find that real-world
design work has much the same harassed and overextended flavor, so in the second
instance, it’s intended for working electro-optical designers. It can be used as a text in
a combined lecture–laboratory course aimed at graduate students or fourth-year undergraduates, and as a self-teaching guide and professional reference by working designers.
The warm reception that the first edition received suggests that despite its faults it has
filled a real need. In this edition, everything has been revised, some previously over-terse
sections have been expanded, and more than 100 pages’ worth of new material has
been added. Component lists and electronic designs have been updated where needed.
Only a very few things have been dropped, owing to space constraints or component obsolescence.
Textbooks usually aim at a linear presentation of concepts, in which the stuff on page n
does not depend on your knowing pages n + 1 . . . N . This is very valuable pedagogically,
since the reader is initially unfamiliar with the material and usually will go through the
book thoroughly, once, under the guidance of a teacher who is presenting information
rapidly. Reference books are written for people who already have a grasp of the topic
but need to find more detail or remind themselves of things dimly remembered. Thus
they tend to treat topics in clumps, emphasizing completeness, and to be weak on overall
explanations and on connections between topics.
Those two styles work pretty well in some subject areas, but design lore is not one of
them. Its concepts aren’t branched like a tree, or packed like eggs in a crate, but rather are
interlinked like a fishnet or a sponge; thus a purely linear or clumped presentation of lore
is all but impossible without doing violence to it. Nonetheless, to be of any use, a lore
book must be highly accessible, both easy to work through sequentially and attractive to
leaf through many times.
Computer scientists use the concept of locality of reference—it’s a good thing if an
algorithm works mainly with data near each other in storage, since it saves cache misses
and page faults, but all the data have to be there, regardless. That’s the way I have tried
to organize this book: most of the lore on a particular topic is kept close together in the
book for conceptual unity and easy reference, but the topics are presented in a sufficiently
linear order that later chapters build mainly on earlier ones, and important connections are
noted in both forward and backward directions.† A certain amount of messiness results,
which (it is to be hoped) has been kept close to a minimum. This approach gives rise to
one minor oddity, which is that the same instruments are considered from different angles
in different chapters, so some flipping of pages is required to get the whole picture.
The book is organized into three sections: Optics; Electronics and Signal Processing; and Special Topics in Depth (Front Ends and Bringing Up the System). There is also Supplementary Material, available from the book’s websites, which comprises Chapter 20 on Thermal Control and chapter problems for the whole book.
The material is presented in varying levels of detail. The differences in the detail levels
reflect the amount of published lore and the measured density of deep potholes that people
fall into. For example, there are lots of potholes in optomechanical design, but weighty
books of relevant advice fill shelf after shelf. Anyway, mechanical problems aren’t usually
what cause instrument projects to fail—unexamined assumptions, inexperience, and plain
discouragement are. To get the job done, we talk instead about how to avoid common
mistakes while coming up with something simple that works reliably.
The one big exception to this general scheme is Chapter 1. It pulls in strands from
everywhere, to present the process and the rhythm of conceptual design, and so contains
things that many readers (especially beginners) may find unfamiliar. Don’t worry too
much about the technical aspects, because there’s more on all those things later in the
book, as well as pointers to other sources.
A complete instrument design course based on this book would probably have to wait for a first- or second-year graduate class. Undergraduate students with a good grasp of electromagnetism, physical optics, and Fourier transforms might benefit from a fourth-year course on optical instruments based selectively on the first ten chapters. To get the most out of such a course, the audience should be people with instruments of their own to build, either in a lab course, as a senior project, or as part of their graduate work. Because of the complicated, interdisciplinary nature of instrument building, the laboratory part of the course might best be done by teams working on an instrument project rather than individually, provided that each designer knows enough about everybody else’s part to be able to explain it.

† Because electro-optical lore is so interconnected, useful connections that are tangential to the discussion are relegated to footnotes. An occasional polemic is found there too.
Chapter Problems
Chapter problems for the book are available on the websites listed above.

Making complicated tasks intuitive is the true realm of lore—knowing the mathematical expression
for the fringe pattern of a defocused beam is less useful than knowing which way to turn
which knob to fix it. The most powerful method for gaining intuition is to use a combination of practical work and simple theoretical models that can be applied easily and
stay close to the real physics. Accordingly, the emphasis in the problems is on extracting
useful principles from theory and discussion.
Most of the problems have been taken from real design and scientific work, and so
tend to be open-ended. Most students will have had a lot of theoretical training, but
nowadays most will not have the skills of a Lightning Empiricist, a gimlet-eyed designer
who’s fast at mental rule-of-thumb calculations and who sanity checks everything by
reflex. Perhaps this book can help fix that.
A certain number of errors and misconceptions—hopefully minor—are bound to creep into a book of this type, size, and scope, unfortunately. I welcome your comments and corrections, large and small: errata and omissions will be made available on the book’s websites and will be incorporated in future printings. Send e-mail to
[email protected]
P. C. D. Hobbs
Briarcliff Manor, New York
Michaelmas (September 29), 2008
Acknowledgments

To acquire lore, one needs a big sandbox and long uninterrupted stretches of time to
spend there, absorbed in the play. I am forever grateful to my parents for providing that
sort of environment in my growing up, and for believing in me even when only the mess
was visible.
I learned most of this material through participating in the stimulating and supportive technical cultures of the places where I’ve been fortunate enough to study and to
work: the Edward L. Ginzton Laboratory at Stanford University, Stanford, California;
the Department of Physics and the Department of Geophysics & Astronomy at the University of British Columbia and Microtel Pacific Research (both in Vancouver BC) and
the IBM Thomas J. Watson Research Center at Yorktown Heights, New York. I owe a
special debt to IBM and to my managers there, Arthur Ciccolo, Frank Libsch, and John
Mackay, for supporting this project and for generously allowing me time and resources
to work on it.
I also wish to thank some of the many other gifted people who I have been privileged
to have as close colleagues, teachers, and friends, particularly J. Samuel Batchelder (who
first suggested I write this book), Donald M. DeCain, Kurt L. Haller, Gordon S. Kino,
the late Roger H. Koch, Brian A. Murray, Martin P. O’Boyle, Marc A. Taubenblatt,
Theodore G. van Kessel, and Robert H. Wolfe. Without them I’d still be stuck in one of
those potholes way back along the road.
Most of all, I wish to thank my wife, Maureen, and our offspring, Bronwen, Magdalen,
and Simon, for their patience and encouragement while I wrote and wrote.
P. C. D. H.
Basic Optical Calculations
An excellent plumber is infinitely more admirable than an incompetent philosopher. The
society which scorns excellence in plumbing because plumbing is a humble duty and tolerates
shoddiness in philosophy because it is an exalted activity will have neither good plumbing
nor good philosophy. Neither its pipes nor its theories will hold water.
—John W. Gardner†
1.1 Introduction

Okay, we’ve decided to build an electro-optical system. It’s going to be so great that
everybody who sees it will hate us. Now comes the fun part, the planning and designing,
and then the hard part, the building and testing. To design and plan, we have to have
some way of knowing how our brainchild ought to behave before it is built—that is, theory.

At the conceptual stage, the measurement principle is poorly understood and many
quite different schemes are suggesting themselves. To make sense of it, you need a white
board and a couple of smart and experienced colleagues to bounce ideas off, plus some
pointers on how to begin. The aim of this chapter is to equip you to do a conceptual
instrument design on the back of an envelope. It assumes some background in optics,
especially some acquaintance with plane waves and Fourier transforms.
The indispensable ingredients of a conceptual design are:

A measurement idea
Operational requirements (field of view, scan speed, spot size, sensitivity, etc.)
A photon budget
A rough optical design
A detection strategy
A signal processing strategy

† John W. Gardner, Excellence, Can We Be Equal and Excellent Too? Harper, New York, 1961, p. 86.
The best way to get them is through several sessions at that white board, with a lot
of thought and calculation in between. (It is amazing how many people think they’ve
finished with engineering calculations when they hand in their last exam, but that attitude
is sudden death in the instrument-building business.) Once you have these, you can make
a list of the main technical risks, in descending order of hairiness, and pretty soon you
have a plan for how to proceed. The size of these technical risks is important—they
can range from finding parts to violating laws of physics. Right at the beginning, we
must decide whether the measurement is even possible, which requires more imagination than analysis. The invention of two-photon Doppler-free spectroscopy is a good example.
Example 1.1: Two-Photon Doppler-Free Spectroscopy.† Gas-phase spectroscopy is
limited at low pressure by the random thermal motions of the gas molecules, and
at high pressures by their collisions. Molecules with different speeds along the laser
beam axis experience different Doppler shifts, so that their absorption features occur at
different frequencies in the lab frame, leading to Doppler broadening. A two-photon
transition involves the absorption of two photons, whose energies must sum to the
transition energy. The absorption (and any resulting fluorescence) can be modulated
by chopping the excitation beam. By using two excitation beams going in opposite
directions, some events will involve absorption of one photon from each beam,
which can occur only when both beams are unblocked by their choppers. If the
modulation frequencies of the two beams are different, this part of the signal will
appear at the sum and difference of the chopping frequencies. If a molecule has
nonzero speed v along the beam, then to leading order in v/c, it will see each beam shifted by

Δνi = −ki · v/(2π).    (1.1)
Since the two beams have oppositely directed k vectors, one will be upshifted and the
other downshifted by the same amount; the sum of the photon energies, h(ν1 + ν2 ), is
unshifted. Thus these mixing components are present only when the laser is tuned exactly
to the rest-frame resonance frequency—they show no first-order Doppler broadening. An
apparently inherent physical limit is circumvented with an extra chopper and a mirror or
two; such is the power of a good measurement idea.
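Faith in this kind of trick comes faster with a numerical check. The sketch below is mine, not the book’s: it multiplies two sinusoidally chopped beam intensities, as the two-photon rate effectively does, and projects the product onto candidate frequencies. The chopping frequencies and sample rate are arbitrary illustrative choices.

```python
import math

def tone_amplitude(samples, dt, f):
    # Amplitude of the cosine component of `samples` at frequency f;
    # assumes an integer number of cycles fits in the record.
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * f * k * dt) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * f * k * dt) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

f1, f2 = 110.0, 130.0      # hypothetical chopping frequencies, Hz
fs = 4096.0                # sample rate, comfortably above 2*(f1 + f2)
dt = 1.0 / fs
n = int(fs)                # one second of data = integer cycles of every tone

# Each beam's intensity is chopped: I = (1 + cos(2*pi*f*t))/2.  The
# two-photon event rate is proportional to the product I1*I2.
rate = [((1 + math.cos(2 * math.pi * f1 * k * dt)) / 2)
        * ((1 + math.cos(2 * math.pi * f2 * k * dt)) / 2)
        for k in range(n)]

print(round(tone_amplitude(rate, dt, f1 + f2), 4))  # 0.125: sum frequency
print(round(tone_amplitude(rate, dt, f2 - f1), 4))  # 0.125: difference frequency
print(round(tone_amplitude(rate, dt, 97.0), 4))     # 0.0: nothing off the mixing lines
```

The components at f1 ± f2 appear only because the rate depends on the product of the two intensities; either beam chopped alone contributes nothing at those frequencies.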
Once the idea is in hand, analysis is needed, to decide between competing alternatives.
Such feasibility calculations are at the heart of electro-optical systems lore, and their most
important fruit is a highly trained intuition, so we’ll do lots of examples. The places
where assumptions break down are usually the most instructive, so we’ll also spend
some time discussing some of the seedier areas of optical theory, in the hope of finding
out where the unexamined assumptions lurk. We’ll begin with wave propagation.

† This example is adapted from L. S. Vasilenko, V. P. Chebotaev, and A. V. Shishaev, JETP Lett. 3 (English translation), 161 (1970).
1.2 Wave Propagation

1.2.1 Maxwell’s Equations and Plane Waves
Any self-respecting book on optics is obliged to include Maxwell’s equations, which
are the complete relativistic description of the electromagnetic field in vacuo and are
the basis on which all optical instruments function. They are also useful for printing on
T-shirts. Fortunately, this is a book of practical lore, and since Maxwell’s equations are
almost never used in real design work, we don’t need to exhibit them. The most basic
equation that is actually useful is the vector wave equation,
∇²E − (1/c²) ∂²E/∂t² = 0,    (1.2)
where c is the speed of light. Most of the time we will be calculating a monochromatic
field, or at least a superposition of monochromatic fields. We can then separate out
the time dependence as exp(−iωt) and write k = ω/c, leaving the vector Helmholtz equation,

(∇² + k²)E = 0.    (1.3)
Its simplest solution is a vector plane wave,
E(x) = E0 exp(ik · x),    (1.4)
where the two fixed vectors are E0 , the electric field vector, which may be complex, and
k is the wave vector, whose magnitude k = |k| = ω/c is called the propagation constant.
If E0 is real, the field is said to be linearly polarized along E0; if its real and imaginary
parts are perpendicular and the same size, so that the instantaneous E rotates without changing length, the
field is circularly polarized ; otherwise, it’s elliptically polarized (see Section 1.2.8).
Power flows parallel to k in an isotropic medium, but need not in an anisotropic one, so
it is separately defined as the Poynting vector S = E × H (see Sections 4.6.1 and 6.3.2).
In the complex notation, the one-cycle average Poynting vector is ⟨S⟩ = (1/2) Re[E × H∗].
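The polarization classification just given is easy to mechanize. The sketch below is an illustrative toy of mine, not anything from the book: it takes a complex E0 as a 3-vector and applies the criteria above, with “circular” meaning real and imaginary parts perpendicular and equal in magnitude.

```python
import math

def classify_polarization(E0, tol=1e-9):
    # Classify a complex field vector E0 (3-tuple of complex numbers)
    # as 'linear', 'circular', or 'elliptical'.  tol is a loose toy
    # tolerance, not a metrologically meaningful one.
    Er = [z.real for z in E0]
    Ei = [z.imag for z in E0]
    dot = sum(a * b for a, b in zip(Er, Ei))
    # |Er x Ei|^2 via Lagrange's identity: zero iff Re and Im are parallel.
    cross_sq = sum(a * a for a in Er) * sum(b * b for b in Ei) - dot * dot
    mag_r = math.sqrt(sum(a * a for a in Er))
    mag_i = math.sqrt(sum(b * b for b in Ei))
    if cross_sq < tol:                # Re and Im parallel (or one zero)
        return "linear"
    if abs(dot) < tol and abs(mag_r - mag_i) < tol:   # perpendicular, equal size
        return "circular"
    return "elliptical"

print(classify_polarization((1 + 0j, 0j, 0j)))   # linear, along x
print(classify_polarization((1 + 0j, 1j, 0j)))   # circular: x + i*y
print(classify_polarization((2 + 0j, 1j, 0j)))   # elliptical
```

The test is invariant to an overall phase factor exp(iφ), as it should be, since multiplying E0 by a unit complex number only shifts the time origin.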
1.2.2 Plane Waves in Material Media
So far, we have only considered propagation in vacuo. Electromagnetics in material media
is enormously complicated on a microscopic scale, since there are ∼10²² scatterers/cm³.
Fortunately, for most purposes their effects are well approximated by mean field theory,
which smears out all those scatterers into a jelly that looks a lot like vacuum except for
a change in the propagation velocity, the E/H ratio, and some loss. A plane wave
entering a material medium via a plane surface remains a plane wave, with different k
and E0 .
In a medium, light travels at a speed v = c/n. The constant n, the refractive index,
is given by n = √(μr εr), where μr and εr are the relative magnetic permeability and
dielectric constant of the material at the optical frequency, respectively. Since μr is nearly
always 1 in optics, n = √εr. In addition, the material will change the wave impedance,
Z = E/H = √(μ/ε) = (120π)√(μr/εr) ohms. The analogy between wave impedance and
transmission line impedance is a fruitful one.
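In numbers, for a lossless nonmagnetic glass the arithmetic looks like the sketch below (a hand calculation in Python; the index value 1.52 is just a typical crown-glass round number, not a tabulated constant):

```python
import math

Z0 = 120 * math.pi  # wave impedance of free space, about 377 ohms

def index_and_impedance(eps_r, mu_r=1.0):
    # Refractive index and wave impedance for a lossless medium.
    n = math.sqrt(mu_r * eps_r)
    Z = Z0 * math.sqrt(mu_r / eps_r)
    return n, Z

# Crown-glass-like medium: n ~ 1.52, so eps_r ~ n**2 ~ 2.31.
n, Z = index_and_impedance(1.52 ** 2)
print(round(n, 3))   # 1.52
print(round(Z, 1))   # 248.0: the E/H ratio drops by the factor n
```

As the transmission-line analogy suggests, an index step behaves like an impedance step, which is why uncoated glass surfaces reflect.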
In absorbing media, the refractive index is complex.† Assuming the medium is linear
and time invariant, the temporal frequency cannot change, so k is different in the medium;
the new k is kn = nk0 , where k0 is the vacuum value. We usually drop the subscripts, so
k is taken to be in the medium under consideration.
There are one or two fine points deserving attention here. One is that n is not constant with ω, a phenomenon known as dispersion. Besides making light of different
colors behave differently, this leads to distortion of a time-modulated signal. The carrier
wave propagates at the phase velocity vp = ω/k, but the envelope instead propagates,
approximately unchanged in shape, at the group velocity vg , given by
vg = ∂ω/∂k.    (1.5)
Since the carrier propagates at the phase velocity v, as an optical pulse goes along
its carrier “slips cycles” with respect to its envelope; that’s worth remembering if you
build interferometers. The group velocity approximation (1.5) holds for short propagation
distances only, that is, when the difference Δt in the transit times of different frequency
components is much less than the pulse width τ. In the opposite limit, where Δt ≫ τ,
the output is a time Fourier transform of the input pulse.‡
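The two velocities are easy to evaluate for a real glass. The sketch below uses the published Sellmeier dispersion coefficients for Schott BK7 and a centered finite difference for ∂ω/∂k; the wavelength step is an arbitrary choice, and the function names are mine.

```python
import math

C_LIGHT = 299792458.0  # m/s

def n_bk7(lam_um):
    # Sellmeier index for Schott BK7; wavelength in micrometers.
    B = (1.03961212, 0.231792344, 1.01046945)
    C = (0.00600069867, 0.0200179144, 103.560653)
    l2 = lam_um ** 2
    return math.sqrt(1 + sum(b * l2 / (l2 - c) for b, c in zip(B, C)))

def velocities(lam_um, dlam=1e-4):
    # Phase velocity vp = w/k and group velocity vg = dw/dk,
    # the latter by a centered difference on k(w) = n(w) * w / c.
    def k_of(lam):
        w = 2 * math.pi * C_LIGHT / (lam * 1e-6)
        return n_bk7(lam) * w / C_LIGHT, w
    k0, w0 = k_of(lam_um)
    kp, wp = k_of(lam_um - dlam)   # shorter wavelength -> higher frequency
    km, wm = k_of(lam_um + dlam)
    return w0 / k0, (wp - wm) / (kp - km)

vp, vg = velocities(0.5893)   # sodium D line
print(round(vp / C_LIGHT, 4))  # 0.6593, i.e. 1/n with n ~ 1.517
print(vg < vp)                 # True: normal dispersion, envelope lags carrier
```

The gap between vp and vg is exactly the cycle-slipping effect mentioned above: over a long enough path, the carrier walks right through the envelope.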
The other fine point is that εr is in general a tensor quantity; there is no guarantee
that the response of the material to an applied electric field is the same in all directions.
In this book we’re concerned only with the linear properties of optical systems, so a
tensor is the most general local relationship possible. The resulting dependence of n
on polarization leads to all sorts of interesting things, such as birefringence and beam
walkoff . There are in addition strange effects such as optical activity (also known as
circular birefringence), where the plane of polarization of a beam rotates slowly as it
propagates. We’ll talk more about these weird wonders in Chapters 4 and 6.
Aside: The Other Kind of Polarization. The dielectric constant r expresses the
readiness with which the charges in the medium respond to the applied electric field; it
is given by εr = 1 + 4πχ, where χ is the electric susceptibility (zero for vacuum); the
electric polarization P is ε0 χ E. This is a different usage from the usual optical meaning
of polarization, and it’s worth keeping the two distinct in your mind.
1.2.3 Phase Matching
The two basic properties of any wave field are amplitude and phase. At any point in
space-time, a monochromatic wave has a unique phase, which is just a number specifying
how many cycles have gone by since time t = 0. Since it’s based on counting, phase
is invariant to everything—it’s a Lorentz scalar, so it doesn’t depend on your frame
of reference or anything else, which turns out to be a very powerful property. The
requirement for phase matching at optical interfaces is the physical basis of geometrical
optics. A plane wave has the unique property of being translationally invariant, meaning
that if you move from one point to another, the only thing that changes is an additive
phase shift (equivalent to a pure time delay). In particular, at a planar interface, moving
the reference point within the plane cannot change the phase relationship between the
fields on either side.
† The complex refractive index ñ is often quoted as n + ik, where n and k are real and positive, but it is conceptually
simpler to leave n complex, because the Fresnel formulas and Snell’s law still work with absorbing media.
‡ The impulse response of a linearly dispersive medium is a chirp, and the Fourier transform can be computed
as the convolution of a function with a chirp. This turns out to be important in digital signal processing, where
it leads to the chirp-Z transform.
1.2.4 Refraction, Snell’s Law, and the Fresnel Coefficients
If a plane wave encounters a plane interface between two semi-infinite slabs of index n1
and n2 , as shown in Figure 1.1, the light is partially reflected and partially transmitted—a
standard problem in undergraduate electromagnetics classes. We expect the fields to
consist of an incident and a reflected plane wave on the input side and a single transmitted
plane wave on the output side. Phase matching at the interface requires that the tangential
k vectors of all the waves be the same, which reproduces the law of reflection for the
reflected component and Snell’s law for the transmitted one:
n1 sin θ1 = n2 sin θ2 .
If there are m parallel planar interfaces, k∥ is the same in all the layers, so since (in
the j th layer) kj = nj k0 , we can use the phase matching condition to get k⊥ in the j th
layer:

k⊥j = √(nj² k0² − k∥²).
This is important in the theory of optical coatings. The continuity conditions on tangential
E and perpendicular D across the boundary give the Fresnel formulas for the field
amplitudes:

E(refl)/E(inc) ≡ rp12 = [n2 cos θ1 − n1 √(1 − ((n1/n2) sin θ1)²)] / [n2 cos θ1 + n1 √(1 − ((n1/n2) sin θ1)²)] = tan(θ1 − θ2)/tan(θ1 + θ2)   (1.8)

E(trans)/E(inc) ≡ tp12 = 2n1 cos θ1 / [n2 cos θ1 + n1 √(1 − ((n1/n2) sin θ1)²)] = 2 sin θ2 cos θ1 / [sin(θ1 + θ2) cos(θ1 − θ2)]   (1.9)

for light linearly polarized (i.e., E lying) in the plane of incidence. This plane is defined
by the surface normal n̂ (unrelated to n) and kinc . In Figure 1.1, it is the plane of
Figure 1.1. Refraction and reflection of a plane wave at a plane dielectric boundary. The angle of
refraction θ2 is given by Snell’s law.
the page. For light linearly polarized perpendicular to the plane of incidence, these
become

E(refl)/E(inc) ≡ rs12 = [n1 cos θ1 − n2 √(1 − ((n1/n2) sin θ1)²)] / [n1 cos θ1 + n2 √(1 − ((n1/n2) sin θ1)²)] = −sin(θ1 − θ2)/sin(θ1 + θ2)   (1.10)

E(trans)/E(inc) ≡ ts12 = 2n1 cos θ1 / [n1 cos θ1 + n2 √(1 − ((n1/n2) sin θ1)²)] = 2 sin θ2 cos θ1 / sin(θ1 + θ2).   (1.11)
The two polarizations are known as p and s, respectively. As a mnemonic, s polarization means that E sticks out of the plane of incidence.† The quantities r and t are
the reflection and transmission coefficients, respectively. These Fresnel coefficients act
on the amplitudes of the fields.‡ The reflected and transmitted power ratios R and T ,
given by

R = |r|²  and  T = [n2 cos θ2/(n1 cos θ1)] |t|²,   (1.12)
are known as the reflectance and transmittance, respectively.
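These formulas are easy to check numerically. The sketch below (n1, n2, and the incidence angle are assumed illustrative values) evaluates the Fresnel amplitude coefficients and verifies that R + T = 1 at a lossless interface; using a complex square root keeps the same code valid past the critical angle and for absorbing media, as the footnote on complex n suggests:

```python
# Fresnel amplitude coefficients and the power ratios R, T.
# cmath.sqrt makes cos(theta2) complex beyond the critical angle, so the
# same expressions keep working there. n1, n2, theta are assumed values.
import cmath, math

def fresnel(n1, n2, theta1):
    s = (n1 / n2) * math.sin(theta1)
    c1 = math.cos(theta1)
    c2 = cmath.sqrt(1 - s * s)          # cos(theta2) via Snell's law
    rp = (n2 * c1 - n1 * c2) / (n2 * c1 + n1 * c2)
    tp = 2 * n1 * c1 / (n2 * c1 + n1 * c2)
    rs = (n1 * c1 - n2 * c2) / (n1 * c1 + n2 * c2)
    ts = 2 * n1 * c1 / (n1 * c1 + n2 * c2)
    return rp, tp, rs, ts, c2

n1, n2, th = 1.0, 1.5, math.radians(30.0)
rp, tp, rs, ts, c2 = fresnel(n1, n2, th)
# Reflectance plus transmittance must be 1 for lossless media:
for r, t in ((rp, tp), (rs, ts)):
    R = abs(r) ** 2
    T = (n2 * c2.real / (n1 * math.cos(th))) * abs(t) ** 2
    print(R, T, R + T)
```

At θ1 = arctan(n2/n1) the same routine returns rp ≈ 0, which is Brewster’s angle of the next section.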
The Fresnel coefficients have fairly simple symmetry properties; if the wave going
from n1 to n2 sees coefficients r12 and t12 , a wave coming in the opposite direction sees
r21 and t21 , where
rp21 = −rp12 ,   tp21 = [n2 cos θ2/(n1 cos θ1)] tp12 ;
rs21 = −rs12 ,   ts21 = [n2 cos θ2/(n1 cos θ1)] ts12 .   (1.13)
The symmetry expressions for t21 are more complicated because they have to take account
of energy conservation between the two media.
1.2.5 Brewster’s Angle
Especially sharp-eyed readers may have spotted the fact that if θ1 + θ2 = π/2, the denominator of (1.8) goes to infinity, so rp = 0. At that angle, sin θ2 = cos θ1 , so from Snell’s
law, tan θ1 = n2 /n1 . This special value of θi is called Brewster’s angle θB . Note that
the transmitted angle is π/2 − θB , which is Brewster’s angle for going from n2 into n1 .
Brewster angle incidence with very pure p-polarized light is the best existing technique
for reducing reflections from flat surfaces, a perennial concern of instrument designers
(see Section 4.7.3).
Laser tube windows are always at Brewster’s angle to reduce the round-trip loss
through the cavity. The loss in the s polarization due to four high angle quartz–air
surfaces is nearly 40% in each direction. Regeneration in the cavity greatly magnifies
this gain difference, which is why the laser output is highly polarized. Brewster angle
† The s is actually short for senkrecht, which is German for perpendicular. The two polarizations are also called
TE and TM, for transverse electric and transverse magnetic, that is, which field is sticking out of the plane of
incidence. This nomenclature is more common in waveguide theory.
‡ There is another sign convention commonly used for the p-polarized case, where the incident and reflected E
fields are taken in opposite directions, yielding a confusing sign change in (1.8). We adopt the one that makes
rp = rs at normal incidence.
incidence is also commonly used in spectroscopic sample cells, Littrow prisms, and other
high accuracy applications using linearly polarized, collimated beams.
Ideally, a smooth optical surface oriented at θB to the incoming p-polarized beam
would reflect nothing at all, but this is not the case with real surfaces. Roughness and
optical anisotropy make it impossible to make every single region encounter the light
beam at the same angle or with the same refractive index, so there are always residual
reflections even at θB . Surface layers also prevent complete canceling of the reflected
wave, because the two will in general have different Brewster’s angles and because of
the phase delay between the reflections from the top and bottom of the layer. Below the
critical angle, dielectric reflections always have phase 0 or π , so there’s no way to tip
the surface to get rid of a phase-shifted reflection.
Aside: Fossil Nomenclature. When Malus discovered polarization (in 1808) by looking at reflections from a dielectric, he quite reasonably identified the plane of polarization
with the plane of incidence. This conflicts with our modern tendency to fix on the E field
in considering polarization. There are still some people who follow Malus’s convention,
so watch out when you read their papers.
1.2.6 Total Internal Reflection
If n1 > n2 , there exists an angle θC , the critical angle, where Snell’s law predicts that
sin θ2 = 1, so θ2 = π/2: grazing incidence. It is given by
θC = arcsin(n2 /n1 ).
Beyond there, the surds in the Fresnel formulas (1.8)–(1.11) become imaginary, so T
vanishes and r sits somewhere on the unit circle (the reflectivity is 1 and the elements
of E become complex).
This total internal reflection (TIR) is familiar to anyone who has held a glass of water,
or looked up at the surface while underwater. It is widely used in reflecting prisms. There
are two things to remember when using TIR: the reflection phase is massively polarization
dependent and the fields extend beyond the surface, even though there is no propagating
wave there. A TIR surface must thus be kept very clean, and at least a few wavelengths
away from any other surface.
By putting another surface sufficiently close by, it is possible to couple light via the
evanescent field, a phenomenon called frustrated TIR or, more poetically, evanescent
coupling. This is the optical analogue of quantum mechanical tunneling.
The reflection phase represents a relative time delay of the propagating wave. The
s-polarized wave is delayed more, because it has a larger amplitude in the evanescent
region, which requires more of a phase slip between the incident and reflected waves
(remember the continuity conditions). This sort of physical reasoning is helpful in keeping
sign conventions straight, although it is not infallible. The phase shift δ between s and
p polarizations is†

δ = δs − δp = −2 arctan[cos θi √(sin² θi − (n2/n1)²) / sin² θi ].

† M. Born and E. Wolf, Principles of Optics, 6th ed. (corrected). Pergamon, Oxford, 1983, pp. 47–51.
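The critical angle and the s–p phase shift are quick to evaluate. This sketch uses glass-to-air values (n1 = 1.5, n2 = 1.0, assumed for illustration) and implements the Born & Wolf expression for δ quoted above:

```python
# Critical angle and s-p relative phase shift on total internal reflection,
# following the Born & Wolf formula quoted in the text. Glass-to-air indices
# are assumed illustrative values.
import math

n1, n2 = 1.5, 1.0
theta_c = math.asin(n2 / n1)       # critical angle, about 41.8 degrees
print(math.degrees(theta_c))

def delta_sp(theta_i, n1, n2):
    # valid only beyond the critical angle, where the surd is real
    s2 = math.sin(theta_i) ** 2
    surd = math.sqrt(s2 - (n2 / n1) ** 2)
    return -2 * math.atan(math.cos(theta_i) * surd / s2)

print(math.degrees(delta_sp(math.radians(55.0), n1, n2)))
```

Scanning θi from θC to 90° shows δ vanishing at both endpoints with an extremum in between, which is what makes Fresnel-rhomb retarders workable.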
1.2.7 Goos–Hänchen Shift
The angle-dependent phase shift on TIR functions much as dispersion does in the time
domain, delaying different components differently. Dispersion causes the envelope of a
pulse to propagate at the group velocity, which is different from the phase velocity. In the
same way, the phase shift on TIR causes the envelope of a reflected beam to be shifted
slightly in space from the incident beam, the Goos–Hänchen shift. It is less well known
than the group velocity effect, mainly because the effect doesn’t build up as it goes
the way dispersion effects do, so the shift is small under normal circumstances, though
large enough to cause image aberrations on reflection from TIR surfaces. The place it
does become important is in multimode fiber optics, where a ray undergoes many, many
reflections and the shift accordingly adds up.
The variation in the Goos–Hänchen shift comes from the speeding up of the wave
that sticks out the most into the low index material. It is responsible for the apparently
paradoxical behavior of optical fiber modes near cutoff (see Section 8.3.1). We expect
high angle modes to slow down due to the decrease of kz with angle—they spend more of
their time bouncing back and forth instead of traveling down the fiber axis. In fact, they
do slow down with increasing angle at first, but then speed up again as they near cutoff,
when the wave sticks farther and farther out into the low index material. To leading
order, the effect is the same as if the wave bounced off an imaginary surface one decay
length into the low index material. Note that this does not contradict the last section;
the phase shift is a delay, but the Goos–Hänchen shift makes the mode propagation
anomalously fast.
1.2.8 Circular and Elliptical Polarization
What happens when the p and s components get out of phase with each other? The
exponential notation, remember, is just a calculating convenience; real physical quantities
always give real numbers. The instantaneous E-field strength is
E² = [Re{Ex e^{iωt}}]² + [Re{Ey e^{i(ωt+φ)}}]².
A linearly polarized monochromatic wave has an E that varies sinusoidally, passing
through zero twice each cycle. When E has complex coefficients, the p and s components
oscillate out of phase with one another. If the two are the same size and a quarter cycle
apart, the real (i.e., physical) part of the E vector will spin through 2π once per cycle,
without changing its length, like a screw thread. Its endpoint will traverse a circle, so
that this is known as circular polarization. Like screws, there is right and left circular
polarization, but unlike screws, the names are backwards.
If the two components are not exactly equal in magnitude, or are not exactly π /2
radians apart, the vector will still rotate, but will change in length as it goes round,
tracing an ellipse. This more general case is elliptical polarization. Circular and linear
polarizations are special cases of elliptical polarization. Elliptical polarization can also
be right or left handed.
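The rotating-vector picture can be verified directly from the instantaneous field expression above. This sketch (amplitudes and phases are assumed illustrative values, and the e^{iωt} sign convention of the text is used) traces the physical E vector over one cycle:

```python
# Trace the physical (real-part) E vector over one optical cycle.
# Equal amplitudes with a quarter-cycle phase difference give a constant-length
# rotating vector (circular polarization); unequal amplitudes trace an ellipse.
import cmath, math

def field_trace(Ex, Ey, phi, steps=8):
    mags = []
    for m in range(steps):
        wt = 2 * math.pi * m / steps
        ex = (Ex * cmath.exp(1j * wt)).real          # p component
        ey = (Ey * cmath.exp(1j * (wt + phi))).real  # s component, phase phi
        mags.append(math.hypot(ex, ey))
    return mags

print(field_trace(1.0, 1.0, math.pi / 2))   # circular: length never changes
print(field_trace(1.0, 0.5, math.pi / 2))   # elliptical: length oscillates
```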
1.2.9 Optical Loss
In a lossless medium, the E and H fields of a propagating wave are exactly in phase
with each other. Any phase difference between them is due to absorption or gain in the
material. A material with dielectric constant ε′ − iε″ has a loss tangent δ = ε″/ε′. In
such a material, H lags E in phase by (1/2) arctan δ.
1.3 Calculating Wave Propagation in Real Life
A real optical system is too complicated to be readily described in terms of vector fields,
so being practical folk, we look for an appropriate sleazy approximation. We know
that in a homogeneous and time-invariant medium, all propagating light waves can be
decomposed into their plane wave spectra; at each (temporal) frequency, there will be a
unique set of vector plane waves that combine to produce the observed field distributions.
The approximations we will use are four:
1. Scalar Optics: Replace vector field addition with scalar addition.
2. Paraxial Propagation: Use an approximate propagator (the Huyghens integral) to
calculate wave propagation, limiting us to beams of small cone angles.
3. Fourier Optics: Use a simplistic model for how individual spatial Fourier components on a surface couple to the plane wave components of the incident light.
4. Ray Optics: Ignore diffraction by using an asymptotic theory valid as λ → 0.
Analytical models, within their realm of applicability, are so much superior to numerical models in intuitive insight and predictive power that it is worth sacrificing significant
amounts of accuracy to get one. Numerical models have their place—but the output is
just a pile of special cases, so don’t use them as a crutch to avoid hard thinking.
1.3.1 Scalar Optics
If we shine an ideal laser beam (one perfectly collimated and having a rectangular
amplitude profile) through a perfect lens and examine the resulting fields near focus, the
result is a complete mess. There are nonzero field components along the propagation
axis, odd wiggly phase shifts, and so on, due entirely to the wave and vector nature of
the fields themselves. The mess becomes worse very rapidly as the sine of the cone angle
θ of the beam approaches unity. The effect is aggravated by the fact that no one really
knows what a lens does, in sufficient detail to describe it accurately analytically—a real
system is a real mess.
For most optical systems, we don’t have to worry about that, because empirically it
doesn’t affect real measurements much. Instead, we use scalar optics. Scalar optics is
based on the replacement of the six components of the true vector electromagnetic field
by a single number, usually thought of as being the electric field component along the
(fixed) polarization axis. In an isotropic medium, the vector wave equation admits plane
wave solutions whose electric, magnetic, and propagation vectors are constant in space,
so that the field components can be considered separately, which leads to the scalar
Helmholtz equation,
(∇² + k²)E = 0.   (1.17)
(This rationale is nothing but a fig leaf, of course.) Any solution of (1.17) can be decomposed in a Fourier function space of plane waves, which are identified with one Cartesian
Figure 1.2. Scalar addition is a good approximation to vector addition except near high-NA foci.
component of the vector field. A scalar plane wave traveling along a direction k has the
form

ψ(x, t) = e^{i(k·x−ωt)},   (1.18)
where the vector k has length k = 2π/λ. Conceptually, the true vector field can be built
up from three sets of these.
The only difficulty with this in free space is that the field vectors of the plane waves
are perpendicular to their propagation axes, so that for large numerical aperture,† where
the propagation axes of the various components are very different, the vector addition
of fields near a focus is not too well approximated by a scalar addition. Far from focus,
this is not much of a worry, because the components separate spatially, as shown in
Figure 1.2.
Aside: Plane Waves and δ-Functions. This separation is not entirely obvious from
a plane wave viewpoint, but remember that plane waves are δ-functions in k-space; that
makes them just as singular in their way as δ-functions. They aren’t always an aid to
intuition. Of course, it is not free space propagation that provides useful data, but the
interaction of light with matter; boundaries of physical interest will in general mix the
different polarization components. Such mathematical and practical objections are swept
under the rug.‡
† The numerical aperture (NA) of a beam is given by NA = n sin θ, where n is the refractive index of the
medium and θ is the half-angle of the beam cone. By Snell’s law, the numerical aperture of a beam crossing
an interface between two media at normal incidence remains the same.
‡ The actual electromagnetic boundary conditions at surfaces of interest are very complicated, and usually poorly
understood, so that most of the time the inaccuracies we commit by approximating the boundary conditions
are smaller than our ignorance.
Polarization and finite bandwidth effects are usually put in by hand . This means that
we keep track of the polarization state of the beam separately and follow it through the
various optical elements by bookkeeping; for frequency-dependent elements, we keep the
frequency as an independent variable and integrate over it at the end. Such a procedure is
inelegant and mathematically unjustified, but (as we shall see) it works well in practice,
even in regimes such as high numerical aperture and highly asymmetric illumination,
in which we would expect it to fail. Everyone in the optical systems design field uses
it, and the newcomer would be well advised to follow this wholesome tradition unless
driven from it by unusual requirements (and even then, to put up a fight).
1.3.2 Paraxial Propagation
Discussions of beam propagation, pupil functions, optical and coherent transfer functions,
and point spread functions take place with reference to the plane wave basis set. There are
an infinite variety of such basis sets for decomposition of solutions of the scalar Helmholtz
equation, nearly none of which are actually useful. Plane waves are one exception, and
the Gauss–Legendre beams are another—or would be if they quite qualified. Gaussian
beams (as they are usually called) don’t even satisfy the true scalar Helmholtz equation,
because their phase fronts are paraboloidal rather than spherical, and because they extend
to infinity in both real- and k- (spatial frequency) space.
Instead, they satisfy the slowly varying envelope equation, also known as the paraxial
wave equation. First, we construct a field as a product of a plane wave e^{ikz} times an
envelope function Ψ(x) that varies slowly on the scale of a wavelength. This field is
plugged into the scalar Helmholtz equation, the product rule for the Laplacian operator
is invoked, and the subdominant term ∂²Ψ/∂z² is discarded, leaving a Schrödinger-type
equation for the envelope Ψ, the paraxial wave equation:

∂²Ψ/∂x² + ∂²Ψ/∂y² + 2ik ∂Ψ/∂z = 0.   (1.19)
A general solution to this equation for all (x, y, z) is given by the Huyghens integral,

Ψ(x, y, z) = −(i/λ) ∫∫_P [Ψ(x′, y′, z′)/(z − z′)] exp{ik[(x − x′)² + (y − y′)²] / [2(z − z′)]} dx′ dy′,   (1.20)

where P is the x′y′ plane. In diffraction theory (1.20) is also known as the Fresnel
approximation. The Huyghens integral is an example of a propagator, an integral operator that uses the field values on a surface to predict those in the entire space. It is slightly
inconvenient to lose the explicit phase dependence on z, but that can be recovered at the
end by calculating the phase of an axial ray (one traveling right down the axis of the
system) and adding it in. The Huyghens kernel depends only on x−x and so is a convolution (see Section 1.3.8), leading naturally to a Fourier space (k-space) interpretation.
In k-space, (1.20) is

Ψ(x, y, z) = ∫∫_P U(u, v) e^{i(2π/λ)(ux+vy)} e^{−i(2πz/λ)[(u² + v²)/2]} du dv,   (1.21)

where P is the uv plane and U is the plane wave spectrum of Ψ at z = 0, which is
given by

U(u, v) = ∫∫ Ψ(x, y, 0) e^{−i(2π/λ)(ux+vy)} (dx/λ)(dy/λ).   (1.22)
The quantities u and v are the direction cosines in the x and y directions, respectively,
and are related to the spatial frequencies kx and ky by the relations u = kx /k, v = ky /k.
What we’re doing here is taking the field apart into plane waves, propagating each wave
through a distance z by multiplying by exp(ikz z), and putting them back together to get
the field distribution at the new plane. The (u, v) coordinates of each component describe
its propagation direction. This is a perfectly general procedure.
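The decompose–propagate–reassemble procedure maps directly onto the FFT. The one-dimensional sketch below (waist size, wavelength, and grid are assumed illustrative values) propagates a Gaussian envelope one Rayleigh range using the paraxial k-space phase factor exp[−i(2πz/λ)(u²/2)] and checks that the 1/e² radius grows by √2, as Gaussian beam theory predicts:

```python
# Paraxial k-space propagation: FFT the field into plane-wave components,
# multiply each by exp(-i*(2*pi*z/lam)*u^2/2), and inverse FFT.
# Waist, wavelength, and grid parameters are assumed illustrative values.
import numpy as np

lam = 633e-9
w0 = 1e-3                            # 1 mm waist radius (assumed)
N, L = 1024, 16e-3                   # samples and window width (assumed)
x = (np.arange(N) - N / 2) * (L / N)
psi = np.exp(-x**2 / w0**2)          # envelope at the waist, z = 0

def propagate(psi, z):
    fx = np.fft.fftfreq(N, d=L / N)  # spatial frequency, cycles/m
    u = lam * fx                     # direction cosine u = kx/k
    H = np.exp(-1j * (2 * np.pi * z / lam) * u**2 / 2)
    return np.fft.ifft(np.fft.fft(psi) * H)

zR = np.pi * w0**2 / lam             # Rayleigh range
out = propagate(psi, zR)
# 1/e^2 intensity radius from the second moment of |psi|^2:
w_z = 2 * np.sqrt(np.sum(x**2 * np.abs(out)**2) / np.sum(np.abs(out)**2))
print(w_z / (w0 * np.sqrt(2)))       # should be close to 1
```

The same skeleton handles the higher-NA propagators of Chapter 9 by swapping in their k-space transfer functions.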
Aside: Use k-Space. The real-space propagator (1.20) isn’t too ugly, but the
Rayleigh–Sommerfeld and Kirchhoff propagators we will exhibit in Section 9.3.2 are
not easy to use in their real-space form. The procedure of splitting the field apart into
plane waves, propagating them, and reassembling the new field is applicable to all these
propagators, because in k-space they differ only slightly (in their obliquity factors, of
which more later). This is really the right way to go for hand calculations.
It is actually easier to spot what we’re doing with the more complicated propagators, because the exp(ikz z) appears explicitly. The Huyghens propagator ignores the
exp(ikz) factor and uses an approximation for exp[iz(kz − k)], which obscures what’s
really happening.
1.3.3 Gaussian Beams
The Gauss–Legendre beams are particular solutions to the paraxial wave equation. The
general form of a zero-order Gauss–Legendre beam that travels along the z axis in the
positive direction and whose phase fronts are planar at z = 0 is
Ψ(x, y, z, t) = √(2/π) [1/w(z)] exp{iφ(z) − (x² + y²)[1/w²(z) − ik/(2R(z))]},   (1.23)

where R(z), φ(z), w(z), zR , and w0 are given in Table 1.1. (Remember that the scalar field
E is the envelope Ψ multiplied by the plane wave “carrier” e^{i(kz−ωt)}.) These parameters
depend only on the beam waist radius w0 and the wavelength λ of the light in the
medium.
The envelope function is complex, which means that it modulates both the amplitude
and the phase φ of the associated plane wave. This gives rise to the curved wavefronts
(surfaces of constant phase) of focused beams, and also to the less well-known variations
in ∂φ/∂z with focal position, the Gouy phase shift.
Gaussian beams reproduce the ordinary properties of laser beams of small to moderate
numerical aperture. They also form a complete set of basis functions, which means that
any solution of (1.19) can be described as a sum of Gaussian beams. This useful property
should not be allowed to go to the user’s head; Gaussian beams have a well-defined axis,
and so can only represent beams with the same axis. The number of terms required for
a given accuracy and the size of the coefficients both explode as the beam axis departs
from that of the eigenfunctions.
TABLE 1.1. TEM00 Gaussian Beam Parameters (Beam Waist at z = 0)

Central intensity:                                  I0 = 2P/(π w²)
Central intensity at the waist:                     I0W = 2P/(π w0²)
Total power:                                        P = π w² I0/2
Beam waist radius (power density on axis = I0/e²):  w0 = λ/(π NA)
1/e² power density radius:                          w(z) = w0 [1 + (z/zR)²]^{1/2}
3 dB power density radius vs. 1/e² radius:          w1/2(z) = 0.5887 w(z)
Radius within which I > given Ith:                  r = (w/√2) ln^{1/2}(I0/Ith)
Power included inside r ≤ w:                        1 − e⁻² ≈ 86.5%
99% included power radius:                          r99 = 1.517 w
Fourier transform pair:                             exp[−π(r/λ)²] ⊃ exp(−π sin² θ)
Separation of variables:                            exp[−π(r/λ)²] = exp[−π(x/λ)²] exp[−π(y/λ)²]
Numerical aperture (1/e² points in k-space):        NA = λ/(π w0)
Radius of curvature of phase fronts:                R(z) = z + zR²/z
Rayleigh range (axial intensity 50% of peak):       zR = π w0²/λ = λ/[π(NA)²]
Displacement of waist from geometric focus:         Δz ≈ −zR²/f
Envelope phase shift:                               φ(z) = tan⁻¹(z/zR)
Equivalent projected solid angle:                   Ω_eq = π(NA)² = λ²/(π w0²)
In this book, as in most of practical electro-optical instrument design, only this
lowest-order mode, called TEM00 , is needed. This mode describes the field distribution
of a good quality laser beam, such as that from a HeNe or circularized diode laser.
At large z, the Gaussian beam looks like a spherical wave with a Gaussian cutoff
in u and v, but for small z, it appears to be a collimated beam. The distance, called
zR or the Rayleigh range, over which the beam stays approximately collimated goes as
1/(NA)²—the beam waist goes as 1/NA and the angular width as NA. At z = ±zR ,
the 1/e² beam diameter has increased by a factor of √2, so that the central intensity has
dropped by half.
The Gaussian beam is a paraxial animal: it’s hard to make good ones of high NA. Its
extreme smoothness makes it exquisitely sensitive to vignetting, which of course becomes
inevitable as sin θ approaches 1, and the slowly varying envelope approximation itself
breaks down as the numerical aperture increases (see Example 9.8).
There are a variety of parameters of Gaussian beams which are frequently of use,
some of which are summarized in Table 1.1; P is the total power in watts, I is the
intensity in W/m2 , w is the 1/e2 intensity radius, w0 is the beam waist radius, zR is the
Rayleigh range, and NA is measured at the 1/e2 intensity points in k-space. Of interest
in applications is the envelope phase, which shows a ±π/4 phase shift (beyond the plane
wave’s exp(ikz)) over the full depth of focus (twice the Rayleigh range), so that in a
focused beam it is not a good assumption that the phase is simply exp(ikz). This phase
is exploited in phase contrast systems such as the Smartt interferometer.
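The Table 1.1 bookkeeping is convenient to wrap in a small function. The sketch below uses the standard TEM00 relations; the waist radius, wavelength, and observation distance are assumed, HeNe-ish illustrative values:

```python
# TEM00 Gaussian beam parameters from the waist radius and wavelength,
# using the standard relations collected in Table 1.1. Input numbers are
# assumed illustrative values (HeNe-ish: w0 = 0.25 mm, lambda = 633 nm).
import math

def gaussian_params(w0, lam, z):
    NA = lam / (math.pi * w0)               # 1/e^2 numerical aperture
    zR = math.pi * w0**2 / lam              # Rayleigh range
    w = w0 * math.sqrt(1 + (z / zR)**2)     # 1/e^2 radius at z
    R = z + zR**2 / z if z != 0 else float("inf")  # phase-front curvature
    phi = math.atan(z / zR)                 # Gouy (envelope) phase
    return NA, zR, w, R, phi

NA, zR, w, R, phi = gaussian_params(250e-6, 633e-9, 0.31)
print(NA, zR, w, R, math.degrees(phi))
```

Near z = zR the printout shows the ±π/4 envelope phase and the √2 spot growth discussed above.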
Aside: Gaussian Beams and Lenses. When a Gaussian beam passes through a lens, it
is transformed into a different Gaussian beam. For the most part, ray optics is sufficient
to predict the position of the beam waist and the numerical aperture, from which the
waist radius and Rayleigh range can be predicted. There are some useful invariants of
this process: for example, a Gaussian beam whose waist scans back and forth by b waist
radii will be transformed into another beam whose waist scans b times the new waist
radius. A position c times the Rayleigh range from the waist will image to a point c
times the new Rayleigh range from the new waist. A corollary is that the number of
resolvable spots, that is, the scan range divided by the spot diameter, is also invariant.
These invariants, which are not limited to the Gaussian case, allow one to juggle spot
sizes, focal positions, and scan angles freely, without having to follow them laboriously
through the whole optical system.
1.3.4 The Debye Approximation, Fresnel Zones, and Fresnel Number
The plane wave decomposition of a given disturbance can be calculated from (1.22) or
its higher-NA brethren in Section 9.3.6, and those work regardless of where we put the
observation plane. When discussing the NA of a lens, however, we usually use a much
simpler method: draw rays representing the edges of the beam and set NA = n sin θ .
This sensible approach, the Debye approximation, obviously requires the beam to be
well represented by geometric optics, because otherwise we can’t draw the rays—it
breaks down if you put the aperture near a focus, for instance. We can crispen this up
considerably via the Fresnel construction.
In a spherical wave, the surfaces of constant phase are equally spaced concentric hemispheres, so on a plane, the lines of constant phase are concentric circles, corresponding
to annular cones, as shown in Figure 1.3. Drawing these circles at multiples of π radians
divides the plane into annular regions of positive and negative field contributions, called
Figure 1.3. The Fresnel zone construction with f = 10 μm and λ = 0.5 μm. For a plane wave,
taking the phase on axis as 0, alternating rings produce positive and negative field contributions at
f , so blocking alternate ones (or inverting their phases with a λ/2 coating) produces a focus at f .
For a converging spherical wave, all zones produce positive contributions at the focus.
Fresnel zones. The zones are not equally spaced; for a beam whose axis is along ẑ and
whose focus is at z = 0, the angular zone boundaries in the far field are at

θn = cos⁻¹[1/(1 + (2n + 1)λ/(4f ))]   (1.24)

(the equation for the N th zone center is the same, except with 2N instead of (2n + 1)).
The Fresnel number N is the number of these zones that are illuminated. This
number largely determines the character of the beam in the vicinity of the reference
point—whether it is dominated by diffraction or by geometric optics. The Debye approximation is valid in the geometric limit, that is, N ≫ 1.
Taking r to be the radius of the illuminated circle and applying a couple of trigonometric identities to (1.24) gets us a quadratic equation for N , the number of annular zone
centers falling inside r. Assuming that N λ ≪ f , this simplifies into

N = r²/(λf ),   (1.25)

which due to its simplicity is the usual definition of Fresnel number.
In small Fresnel-number situations, the focus is displaced toward the lens from its
geometric position and diffraction is important everywhere, not just at the focus. The
number of resolvable spots seen through an aperture of radius r is N/2.
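As a sanity check with assumed illustrative numbers, the small-angle Fresnel number N = r²/(λf) and the resolvable-spot count N/2 are one-liners:

```python
# Fresnel number N = r^2/(lambda*f) and resolvable spots N/2.
# lam, f, r are assumed illustrative values, not from the text.
lam = 0.5e-6     # wavelength, m
f = 10e-3        # distance to the reference point, m
r = 1e-3         # radius of the illuminated circle, m

N = r**2 / (lam * f)
print(N, N / 2)  # N >> 1 here: safely in the geometric (Debye) regime
```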
Example 1.2: Gaussian Beams and Diffraction. For small numerical apertures, the
position of the beam waist does not coincide with the geometric focus, but is closer. This
somewhat counterintuitive fact can be illustrated by considering a beam 40λ in radius,
with a lens of 104 λ focal length placed at its waist. The lens changes the Gaussian beam
parameters, as we can calculate. Geometrically, the incoming beam is collimated, so
the focus is 104 λ away, but in reality the Rayleigh range of the beam is 40(π /4) spot
diameters, or 1260λ. This is only 1/8 of the geometric focal length, so the lens makes
only a small perturbation on the normal diffractive behavior of the original beam. At the
geometric focus, N = 402 /104 = 0.16, so the total phase change due to the lens is only
π/6 across the beam waist.
1.3.5 Ray Optics
We all know that the way you check a board for warpage is by sighting along it, because
light in a homogeneous medium travels in straight lines. The departures of light from
straight-line propagation arise from nonuniformities in the medium (as in mirages) and
from diffraction. Most of the time these are both small effects and light can be well
described by rays, thought of as vanishingly thin pencil beams whose position and direction are both well defined—the usual mild weirdness exhibited by asymptotic theories.
Ray optics does not require the paraxial approximation, or even scalar waves.
In the absence of diffraction (i.e., as λ → 0), the direction of propagation of a
light beam in an isotropic medium is parallel to the gradient of the phase† ∇φ (see
† M.
Born and E. Wolf, Principles of Optics, 6th ed. (corrected). Pergamon, Oxford, 1983, p. 112.
Section 9.2.3). This means that a beam whose phase fronts are curved is either converging or diverging (see Section 9.2.2) and that rays can be identified with the normals to the
phase fronts. Rays are the basis of elementary imaging calculations, as in the following
Example 1.3: Imaging with a Camera Lens. As a simple example of the use of ray
optics, consider using a 35 mm camera to take a head-and-shoulders portrait of a friend.
For portraits, the most pleasing perspective occurs with a camera-to-subject distance of
a few feet, 4 feet (1.3 m) being about optimal. What focal length lens is required?
The film frame is 24 by 36 mm in size and the outline of a human head and shoulders
is about 400 by 500 mm. Thus the desired magnification is 24/400, or 0.06. The rules of
thin-lens optics are:
1. Rays passing through the center of the lens are undeviated.
2. Rays entering parallel to the axis pass through the focus.
3. The locus of ray bending is the plane of the center of the lens.
All of these rules can be fixed up for the thick-lens case (see Section 4.11.2). The similar
triangles in Figure 1.4 show that the magnification M is

M = B/A = f/s_o = s_i/f,     (1.26)

and elementary manipulation of the geometric identities shown,

tan a = (A + B)/d_o = B/f = A/s_o,
tan b = (A + B)/d_i = A/f = B/s_i,

yields

s_o s_i = f².     (1.27)

Figure 1.4. Portraiture with a 35 mm camera.
Using (1.26) and (1.27), we find that

f = M d_o/(1 + M) = (0.06 × 1300 mm)/1.06,

or 73.5 mm. Since the optimum distance is only a rough concept, we can say that a
portrait lens for 35 mm photography should have a focal length of around 70 to 80 mm.
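The arithmetic can be checked in a couple of lines; here is a quick sketch (the variable names are mine):

```python
# Focal length for the head-and-shoulders portrait of Example 1.3,
# using the thin-lens relations M = f/s_o = s_i/f and d_o = s_o + f.
d_o = 1300.0                      # subject distance from the lens, mm
M = 24.0 / 400.0                  # frame height / subject height = 0.06

f = M * d_o / (1 + M)             # from M = f/(d_o - f)
print(f"f = {f:.1f} mm")          # about 73.6 mm

# Cross-check with the Newtonian form s_o * s_i = f**2:
s_o = f / M                       # object distance from front focal point
s_i = M * f                       # image distance from back focal point
assert abs(s_o * s_i - f**2) < 1e-9
```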
We can associate a phase with each ray, by calculating the phase shift of a plane wave
traversing the same path. In doing this, we have to ignore surface curvature, in order that
the wave remain plane. A shorthand for this is to add up the distances the ray traverses
in each medium (e.g., air or glass) and multiply by the appropriate values of k.
1.3.6 Lenses
In the wave picture, an ideal lens of focal length f transforms a plane wave eik(ux+vy) into
a converging spherical wave, whose center of curvature is at (uf, vf ). It does so by inserting a spatially dependent phase delay, due to propagation through different thicknesses
of glass. In the paraxial picture, this corresponds to a real-space multiplication by
L(x, y; f) = exp[−iπ(x² + y²)/(λf)].     (1.30)
Example 1.4: A Lens as a Fourier Transformer. As an example of how to use the
Huyghens propagator with lenses, consider a general field E(x, y, −f) a distance f
behind a lens whose focal length is also f . The operators must be applied in the order
the fields encounter them; here, the order is free-space propagation through f , followed
by the lens’s quadratic phase delay (1.30) and another free-space propagation through f .
Using (1.20) twice, the field becomes

E(x, y, +f) = (1/(iλf)) ∫∫ dx′ dy′ e^{−i(2π/λf)(xx′ + yy′)} E(x′, y′, −f),
which is a pure scaled Fourier transform. Thus a lens performs a Fourier transform
between two planes at z = ±f . If we put two such lenses a distance 2f apart, as shown
in Figure 1.5, then the fields at the input plane are reproduced at the output plane, with
a Fourier transform plane in between. The image is inverted, because we’ve applied two
forward transforms instead of a forward (−i) followed by a reverse (+i) transform, so
(x, y) → (−x, −y).
If we put some partially transmitting mask at the transform plane, we are blocking
some Fourier components of E, while allowing others to pass. Mathematically, we are
multiplying the Fourier transform of E by the amplitude transmission coefficient of
the mask, which is the same as convolving it with the Fourier transform of the mask,
appropriately scaled. This operation is called spatial filtering and is widely used.
The Fourier transforming property of lenses is extremely useful in both theory and
applications. Perhaps surprisingly, it is not limited to the paraxial case, as we will
see later.

Figure 1.5. Spatial filtering.
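The arrangement of Figure 1.5 is easy to mimic numerically: transform, mask, transform back. A minimal numpy sketch (the square object, mask radius, and grid size are arbitrary choices of mine), which also checks the image inversion produced by two forward transforms:

```python
import numpy as np

# A 4f spatial filter in miniature: FFT to the transform plane, apply a
# mask, inverse FFT back.
n = 256
x = np.arange(n) - n // 2
X, Y = np.meshgrid(x, x)
field = ((np.abs(X) < 20) & (np.abs(Y) < 20)).astype(complex)  # square aperture

F = np.fft.fftshift(np.fft.fft2(field))     # field at the transform plane
mask = (X**2 + Y**2 < 10**2)                # pass only low spatial frequencies
filtered = np.fft.ifft2(np.fft.ifftshift(F * mask))

# Blocking Fourier components can only remove energy (Parseval):
assert (np.abs(filtered)**2).sum() < (np.abs(field)**2).sum()

# Two *forward* transforms give an inverted image, g(x) -> g(-x):
g = np.zeros(8); g[1] = 1.0
gg = np.fft.fft(np.fft.fft(g)).real / 8
assert abs(gg[7] - 1.0) < 1e-12
```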
1.3.7 Aperture, Field Angle, and Stops
Not every ray that enters an optical system will make it out the other side. At most
locations in a system, there is no sharp boundary. At a given point in space, light going
in some directions will make it and that going in other directions will not; similarly
for a given angle, there may be part of the system where it can pass and part where
it cannot. However, each ray that fails to make it will wind up hitting some opaque
surface. The surface that most limits the spatial field of a ray parallel to the axis is
called the field stop and that which limits the angular acceptance of a point on the axis
most, the aperture stop. At these surfaces, the boundary between blocked and transmitted
components is sharp. These locations are shown in Figure 1.6. It is common to put the
aperture stop at a Fourier transform plane, since then all points on the object are viewed
from the same range of angles. Optical systems image a volume into a volume, not just
a plane into a plane, so the stops can't always be at the exact image and transform
planes.

Figure 1.6. Definitions of aperture and field angle.
Aside: Vignetting. Aperture and field are defined in terms of axial points and axial rays.
There’s no guarantee that the aperture and field are exactly the same for other points and
other directions. Rays that get occluded somewhere other than the field or aperture stops
are said to have been vignetted . Vignetting isn’t always bad—it’s commonly used to get
rid of badly aberrated rays, which would degrade the image if they weren’t intercepted.
When a laser beam hits the edge of an aperture, it is also loosely termed vignetting,
even when it does happen at one of the stops.
1.3.8 Fourier Transform Relations
Fourier transforms crop up all the time in imaging theory. They are a common source
of frustration. We forget the transform of a common function or can’t figure out how
to scale it correctly, and what was a tool becomes a roadblock. This is a pity, because
Fourier transforms are both powerful and intuitive, once you have memorized a couple
of basic facts and a few theorems. In an effort to reduce this confusion, here are a few
things to remember. Following Bracewell, we put the factors of 2π in the exponents and
write g ⊃ G and G = Fg for “g has transform G:”
G(f) = ∫_{−∞}^{∞} g(x) e^{−i2πfx} dx,     g(x) = ∫_{−∞}^{∞} G(f) e^{i2πfx} df.
A side benefit of this is that we work exclusively in units of cycles. One pitfall is
that for a wave traveling in the positive direction, the x and t terms in the exponent
have opposite signs, so it is easy to get mixed up about forward and inverse transforms.
Physicists and electrical engineers typically use opposite sign conventions.
Useful Functions. The Heaviside unit step function U (x) is 0 for x < 0 and 1 for
x > 0. The derivative of U(x) is the Dirac δ-function, δ(x). Sinc and jinc functions
come up in connection with uniform beams: sinc(x) = sin(πx)/(πx) and
jinc(x) = J₁(2πx)/(πx). The even and odd impulse pairs, [δ(x − ½) + δ(x + ½)]/2
and [δ(x + ½) − δ(x − ½)]/2, have transforms cos(πf) and i sin(πf), respectively.
Conjugacy. Conjugate variables are those that appear multiplied together in the kernel
of the Fourier transform, such as time in seconds and frequency in hertz. In optical Fourier
transforms, the conjugate variables are x/λ and u, which is as before the direction cosine
of the plane wave component on the given surface, that is, u = kx /k.
Convolution. A convolution is the mathematical description of what a filter does in
the real time or space domain, namely, a moving average. If g(x) is a given data stream
and h(x) is the impulse response of a filter (e.g., a Butterworth lowpass electrical filter,
with x standing for time):
h(x) ∗ g(x) = ∫_{−∞}^{∞} h(ξ)g(x − ξ) dξ = ∫_{−∞}^{∞} g(ξ)h(x − ξ) dξ.     (1.34)
The second integral in (1.34) is obtained by the transformation ξ → x − ξ; it shows
that convolution is commutative: g ∗ h = h ∗ g. Convolution in the time domain is multiplication in the frequency domain,
F(h ∗ g) = H G,
where capitals denote transforms, for example, G(f ) = F(g(x)). This makes things
clearer: since multiplication is commutative, convolution must be too. A lot of imaging
operations involve convolutions between a point spread function (impulse response) and
the sample surface reflection coefficient (coherent case) or reflectance (incoherent case).
The Huyghens propagator is also a convolution. Note that one of the functions is flipped
horizontally before the two are shifted, multiplied, and integrated. This apparently trivial
point in fact has deep consequences for the phase information, as we’ll see in a moment.
The convolution theorem is also very useful for finding the transform of a function
that looks like what you want, by cobbling together transforms that you know (see
Example 1.5).
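The convolution theorem is easy to check numerically with the DFT, where the convolution is circular; a sketch (the test data and sizes are arbitrary):

```python
import numpy as np

# Verify F(h*g) = HG on a discrete (circular) convolution.
rng = np.random.default_rng(0)
N = 64
g = rng.standard_normal(N)
h = rng.standard_normal(N)

# Circular convolution straight from the definition...
conv = np.array([sum(h[k] * g[(m - k) % N] for k in range(N))
                 for m in range(N)])

# ...and via the convolution theorem:
conv_ft = np.fft.ifft(np.fft.fft(h) * np.fft.fft(g)).real
assert np.allclose(conv, conv_ft)

# Commutativity h*g = g*h is multiplication's commutativity in disguise:
conv_swapped = np.fft.ifft(np.fft.fft(g) * np.fft.fft(h)).real
assert np.allclose(conv, conv_swapped)
```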
Symmetry. By a change of variable in the Fourier integral, you can show that g(−x) ⊃
G(−f ), g ∗ (x) ⊃ G∗ (−f ), g ∗ (−x) ⊃ G∗ (f ), and (if g is real) G(−f ) = G∗ (f ).
Correlation and Power Spectrum. The cross-correlation g ⋆ h between functions g
and h is the convolution of g(x) and h∗(−x): g ⋆ h = g(x) ∗ h∗(−x) ⊃ GH∗. This can
also be shown by a change of variables.
An important species of correlation is the autocorrelation, g ⋆ g, whose transform is
GG∗ = |G|2 , the power spectrum. The autocorrelation always achieves its maximum
value at zero (this is an elementary consequence of the Schwarz inequality) and all phase
information about the Fourier components of g is lost.
Equivalent Width. We often talk about the width of a function or its transform. There
are lots of different widths in common use: 3 dB width, 1/e² width, the Rayleigh criterion, and so on. When we come to make precise statements about the relative widths
of functions and their transforms, we talk in terms of equivalent width or sometimes
autocorrelation width. The equivalent width of a function is

w_e(g) = (1/g(0)) ∫_{−∞}^{∞} g(x′) dx′ = G(0)/g(0).
It is obvious from this that if g and G are nonzero at the origin, the equivalent width of
a function is the reciprocal of that of its transform. It is this relationship that allows us
to say airily that a 10-wavelength-wide aperture has an angular spectrum 0.1 rad wide.
Functions having most of their energy far from zero are not well described by an
equivalent width. For example, if we move the same aperture out to x = 200λ, it will
have a very large equivalent width (since g(0) is very small), even though the aperture
itself hasn’t actually gotten any wider. Such a function is best described either by quoting
its autocorrelation width, which is the equivalent width of the autocorrelation g ⋆ g, or
by shifting it to the origin. (We commonly remove the tilt from a measured wavefront,
which is equivalent to a lateral shift of the focus to the origin.) Autocorrelations always
achieve their maximum values at zero. Since the transform of the autocorrelation is the
power spectrum, the autocorrelation width is the reciprocal of the equivalent width of
the power spectrum.
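The reciprocal-width relation can be checked numerically for a Gaussian, which transforms into a Gaussian in the cycles convention: exp[−π(x/τ)²] ⊃ τ exp[−π(fτ)²]. (The width τ and the grid are arbitrary choices of mine.)

```python
import numpy as np

# Equivalent width w_e(g) = (integral of g)/g(0) = G(0)/g(0), so
# w_e(g) * w_e(G) = 1 when both are nonzero at the origin.
tau = 1.7                                   # arbitrary width parameter
x = np.linspace(-20.0, 20.0, 400001)
dx = x[1] - x[0]

g = np.exp(-np.pi * (x / tau)**2)           # g(0) = 1
we_g = g.sum() * dx / 1.0                   # should equal tau

G = tau * np.exp(-np.pi * (x * tau)**2)     # G(0) = tau
we_G = G.sum() * dx / tau                   # should equal 1/tau

assert abs(we_g - tau) < 1e-6
assert abs(we_g * we_G - 1.0) < 1e-6
```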
Shifting. Given g(x), shifting the function to the right by x₀ corresponds to subtracting x₀ from the argument. If g(x) is represented as a sum of sinusoids, shifting it
this way will phase shift a component at frequency f by f x₀ cycles:

F(g(x − x₀)) = e^{−i2πf x₀} G(f).
Scaling. A feature 10λ wide has a transform 0.1 rad wide (in the small-angle approximation). Making it a times narrower in one dimension without changing its amplitude
makes the transform a times wider in the same direction and a times smaller in height,
without changing anything in the perpendicular direction:
g(at) ⊃ (1/|a|) G(f/a).
You can check this by noting that the value of the transform at the origin is just the
integral over all space of the function.
Integrating and Differentiating. For differentiable functions, if g ⊃ G, then

dg/dx ⊃ i2πf G,     (1.39)

which is easily verified by integrating by parts. If g is absolutely integrable, then

∫ g dx ⊃ G(f)/(i2πf) + Kδ(f),
where K is an arbitrary integration constant. It follows from (1.39) that the derivative of
a convolution is given by
(h ∗ g)′ = h ∗ g′ = h′ ∗ g ⊃ i2πf GH.
Power Theorem. If we compute the central value of the cross-correlation of g and h,
we get the odd-looking but very useful power theorem:
∫_{−∞}^{∞} g(x) h∗(x) dx = ∫_{−∞}^{∞} G(f) H∗(f) df
(e.g., think of g as voltage and h as current). With the choice g = h, this becomes
Rayleigh’s theorem,†
∫_{−∞}^{∞} |g(x)|² dx = ∫_{−∞}^{∞} |G(f)|² df,
which says that the function and its transform have equal energy. This is physically
obvious when it comes to lenses, of course.
† The same relationship in Fourier series is Parseval's theorem.
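The discrete analog is easy to verify with the DFT, which carries a 1/N on one side in numpy's convention (the test data are arbitrary):

```python
import numpy as np

# Discrete power theorem: sum g h* = (1/N) sum G H*.
rng = np.random.default_rng(1)
N = 128
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)
G, H = np.fft.fft(g), np.fft.fft(h)

lhs = np.sum(g * np.conj(h))
rhs = np.sum(G * np.conj(H)) / N
assert np.allclose(lhs, rhs)

# Rayleigh's theorem is the special case g = h (equal energies):
assert np.allclose(np.sum(np.abs(g)**2), np.sum(np.abs(G)**2) / N)
```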
Asymptotic Behavior. Finite energy transforms have to fall off eventually at high
frequencies, and it is useful to know how they behave as f → ∞. A good rule of thumb
is that if the nth derivative of the function leads to delta functions, the transform will
die off as 1/f n . You can see this by repeatedly using the formula for the transform of a
derivative until you reach delta functions, whose transforms are asymptotically constant
in amplitude.
Transform Pairs. Figure 1.7 is a short gallery of Fourier transform pairs.
Example 1.5: Cobbling Together Transforms. In analyzing systems, we often need a
function with certain given properties, but don’t care too much about its exact identity, as
long as it is easy to work with and we don’t have to work too hard to find its transform.
Figure 1.7. A pictorial gallery of Fourier transform pairs. Bracewell has lots more.

For example, we might need a function with a flat top, decreasing smoothly to zero on
both sides, to represent a time gating operation of width t_g followed by a filter. The
gross behavior of the operation does not depend strongly on minor departures from ideal
filtering, so it is reasonable to model this function as the convolution of rect(t/tg ) with
a Gaussian:
m(t) = exp[−π(t/τ)²] ∗ rect(t/t_g),     (1.44)

whose transform is

M(f) = τ t_g e^{−π(fτ)²} sinc(f t_g).
One can write m(t) as the difference of two error functions, but the nice algebraic
properties of convolutions make the decomposed form (1.44) more useful.
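The difference-of-erfs form can be checked against the convolution integral done by brute-force quadrature; a sketch (the values of τ and t_g are arbitrary test choices, and the closed form below is my own evaluation of the integral):

```python
import numpy as np
from math import erf, sqrt, pi

# m(t) = exp[-pi(t/tau)^2] * rect(t/tg) as a difference of error
# functions, versus direct numerical convolution.
tau, tg = 0.7, 2.0

def m_closed(t):
    return (tau / 2.0) * (erf(sqrt(pi) * (t + tg / 2) / tau)
                          - erf(sqrt(pi) * (t - tg / 2) / tau))

def m_quad(t, n=200000):
    # rect limits the integral to |t'| < tg/2
    tp = np.linspace(-tg / 2, tg / 2, n)
    y = np.exp(-pi * ((t - tp) / tau)**2)
    return y.sum() * (tg / (n - 1))

for t in (0.0, 0.5, 1.3):
    assert abs(m_closed(t) - m_quad(t)) < 1e-4
```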
1.3.9 Fourier Imaging
We have seen that a lens performs a Fourier transform between its front and back focal
planes, and that in k-space, the propagation operator involves Fourier decomposing the
beam, phase shifting the components, and reassembling them. There is thus a deep
connection between the imaging action of lenses and Fourier transforms. Calculating
the behavior of an imaging system is a matter of constructing an integral operator for
the system by cascading a series of lenses and free-space propagators, then simplifying.
Nobody actually does it that way, because it can easily run to 20th-order integrals. In a
system without aberrations, we can just use ray optics to get the imaging properties, such
as the focal position and numerical aperture, and then use at most three double integrals
to get the actual fields, as in Example 1.4.
Most of the time, we are discussing imaging of objects that are not self-luminous,
so that they must be externally illuminated. Usually, we accept the restriction to thin
objects —ones where multiple scattering can be ignored and the surface does not go in
and out of focus with lateral position. The reasoning goes as follows: we assume that our
incoming light has some simple form Ein (x, y), such as a plane wave. We imagine that
this plane wave encounters a surface that has an amplitude reflection coefficient ρ(x, y),
which may depend on position, but not on the angle of incidence, so that the outgoing
wave is
Eout (x, y) = Ein (x, y)ρ(x, y),
and then we apply the Huyghens integral to Eout .
Small changes in height (within the depth of focus) are modeled as changes in the
phase of the reflection coefficient. Since different plane wave components have different
values of kz , we apply a weighted average of the kz values over the pupil function. The
breakdown of this procedure due to the differences in kz z becoming comparable to a
cycle gives rise to the limits of the depth of focus of the beam. We ignore the possibility
that the height of the surface might be multiple-valued (e.g., a cliff or overhang) and any
geometric shadowing.
A very convenient feature of this model, the one that gives it its name, is the simple
way we can predict the angular spectrum of the scattered light from the Fourier transform
of the sample’s complex reflection ρ(x, y). The outgoing wave in real space is the product
ρEin , so in Fourier space,
Eout (u, v) = Ein (u, v) ∗ P (u, v)
where ρ(x/λ, y/λ) ⊃ P . The real power of this is that Eout (u, v) is also the angular
spectrum of the outgoing field, so that we can predict the scattering behavior of a thin
sample with any illumination we like.
If Ein is a plane wave, Ein (x, y) = exp[i2π(uin x + vin y)/λ], then its transform is
very simple: Ein(u, v) = δ(u − uin)δ(v − vin). Convolution with a shifted delta function
performs a shift, so

Eout(u, v) = P(u − uin, v − vin).
The angular spectrum of Eout is the spatial frequency spectrum of the sample, shifted
by the spatial frequency of the illumination—the spatial frequencies add. In an imaging
system, the spatial frequency cutoff occurs when an incoming wave with the largest
positive u is scattered into the largest negative u the imaging lens can accept.† Since
u2 + v 2 ≤ (NA)2 , if the NAs of the illumination and the collecting lenses are equal, the
highest spatial frequency an imaging system can accept is 2 NA/λ.
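As a quick worked number (the NA and wavelength here are my own illustration, not from the text):

```python
# Highest spatial frequency for matched illumination and collection NAs:
# f_max = 2*NA/lambda.
NA = 0.65
lam_um = 0.532                       # wavelength in micrometers

f_cutoff = 2 * NA / lam_um           # cycles per micrometer
period_nm = 1e3 / f_cutoff           # finest resolvable grating period
print(f"cutoff = {f_cutoff:.2f} cycles/um, period = {period_nm:.0f} nm")
```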
The conceptual deficiencies of this procedure are considerable, even with thin objects.
It works fine for large holes punched in a thin plane screen, but for more complicated
objects, such as transparent screens containing phase objects (e.g., microscope slides),
screens with small features, or nearly anything viewed in reflection, the approximations
become somewhat scalier. The conceptual problem arises right at the beginning, when
we assume that we know a priori the outgoing field distributions at the boundary.
There is no real material that, even when uniform, really has reflection or transmission coefficients independent of angle and polarization at optical frequencies, and the
situation is only made worse by material nonuniformity and topography. This and the
scalar approximation are the most problematic assumptions of Fourier optics; paraxial propagation is a convenience in calculations and not a fundamental limitation (see
Section 9.3.5).
1.3.10 The Pupil
As anyone who has ever been frustrated by an out-of-focus movie knows, the image
plane of an optical system is rather special and easily missed. Some other special places
in an optical system are less well known. The most important of these is the pupil ,
which is an image of the aperture stop. If we look into the optical system from the
object side, we see the entrance pupil . Looking from the image side, we see the exit
pupil . By moving from side to side, we can locate the position in space of a pupil
by how it moves in response. (This is the same way we tell how far away anything is.)
There's nothing magical about pupils, although they are talked about in terms that
may confuse newcomers—they really are just places in an optical system, which can be
imaged where you want them and otherwise manipulated just as a focal plane can.
The aperture stop is usually put at the Fourier transform plane, to avoid nonuniform
vignetting. The field distribution at a pupil then is the Fourier transform of that at the
object or an image, appropriately scaled and with an obliquity correction. Equivalently,
the field function at the transform plane is a scaled replica of the far-field diffraction
† There’s
nothing special about the choice of axes, so the limiting resolution might be different along y or at
other angles.
pattern of the object, as derived using the Huyghens integral (1.20). In an imaging
system, the propagator is a convolution in space, so the imaging properties are controlled
by the illumination pattern and detector sensitivity function at the transform plane. Since
the transform plane is usually at the pupil, these are loosely called pupil functions, and are
two-dimensional versions of the complex frequency response of an electronic system.†
(They are still called pupil functions even when the transform plane is not at the pupil.
Laziness is the father of invention.)
Aside: Perspective. The center of the entrance pupil (or really, of the Fourier transform
plane in the object space) is the center of perspective. If you’re trying to make a panoramic
view using an image mosaic, you’ll want both foreground and background objects to have
the same perspective—because otherwise, the positions of the joints between mosaic
elements would have to be different depending on the distance. You can accomplish this
by rotating the camera around its center of perspective.
1.3.11 Connecting Wave and Ray Optics: ABCD Matrices
This section could be subtitled “How to combine optical elements without drowning in
multiple integrals.” In an optical system consisting of lenses, mirrors, and free-space
propagation, it is possible to model the paraxial imaging properties by means of very
simple transformation matrices, one for each element or air space, which are multiplied
together to form a combined operator that models the entire system. Here we shall discuss
the 2 × 2 case, appropriate for axially symmetric systems or for systems of cylindrical
lenses whose axes are aligned. Generalization to 4 × 4 matrices is straightforward but
more laborious.
In the small-angle approximation (where sin θ ≈ θ ), a ray at height x above the optical
axis and propagating at an angle θ measured counterclockwise from the optical axis is
represented by a column vector (x, θ)^T, and it transforms as

⎛x′⎞   ⎛a  b⎞⎛x⎞
⎝θ′⎠ = ⎝c  d⎠⎝θ⎠,

where the matrix abcd is the ordered product of the ABCD matrices of the individual
elements. Let’s do an example to see how this works.
Example 1.6: Deriving the ABCD Matrix for a Thin Lens. In the ray tracing section,
we saw that a thin lens brings all rays entering parallel to the axis through its focus,
and that a ray passing through the center of the lens is undeviated. We can use these
facts to derive the ABCD
matrix for a thin lens, as shown in Figure 1.8. The undeviated central ray, (0, θ )T is
unchanged, so element B must be zero and element D must be 1. The ray parallel to the
axis, (1, 0)T , remains at the same height immediately following the lens, so that element
† The analogy depends on the Debye approximation, so the exponential in/exponential out property of linear
systems doesn’t hold as accurately in Fourier optics as in most circuits, but it’s still pretty good if the Fresnel
number is high.
Figure 1.8. Action of a lens, for deriving its ABCD matrix.
TABLE 1.2. ABCD Matrices for Common Operations

Free-space propagation through distance z:
⎛1  z⎞
⎝0  1⎠

Thin lens of focal length f:
⎛  1    0⎞
⎝−1/f   1⎠

Magnification by M:
⎛M    0 ⎞
⎝0   1/M⎠

Fourier transform:
⎛ 0  1⎞
⎝−1  0⎠
A is also 1. However, it is bent so as to cross the axis at f , so element C must be −1/f .
Thus a thin lens has an ABCD matrix given in Table 1.2.
Optical layouts conventionally have light going from left to right, whereas matrix
multiplication goes right to left. Thus we have to write the matrix product backwards:
the ABCD matrix of the first element encountered by the beam goes at the right, with
subsequent operations left-multiplying it in succession, as in Figure 1.9.
It is straightforward to extend this formalism to small deviations from given angles
of incidence, for example, oblique reflection from a spherical mirror; when doing that,
however, excellent drawings are required to avoid confusion about just what is going on.
Example 1.7: Portraiture Calculation Using ABCD Matrices. Example 1.3 demonstrated how to find elementary imaging parameters such as magnification and focal
length rapidly using the thin-lens rules on rays passing through the center of the lens
and rays passing through the focus. Let us follow the path of a more general paraxial
ray using ABCD matrices. We note first that the light from the object propagates through
do = 1300 mm of free space, then a thin lens of focal length f = 73.5 mm, and finally
another free-space propagation through a distance di . The column vector representing the
Figure 1.9. Imaging geometry with ray vectors and ABCD matrices: rays (h, θ)^T are successively
multiplied by ABCD matrices corresponding to free space d_o, a lens of focal length f, and free
space d_i. For the imaging condition, 1/d_o + 1/d_i = 1/f, which makes the last three equalities
hold.

ray must be acted on by the matrix operators (written in reverse order as already noted):
⎛1  d_i⎞⎛  1   0⎞⎛1  d_o⎞   ⎛1 − d_i/f   d_o + d_i − d_o d_i/f⎞
⎝0   1 ⎠⎝−1/f  1⎠⎝0   1 ⎠ = ⎝  −1/f           1 − d_o/f       ⎠

                            ⎛1 − 0.0136 d_i   1300 − 16.69 d_i⎞
                          = ⎝   −0.0136            −16.69     ⎠.     (1.50)
Comparing the algebraic form of the matrix product in (1.50) to the prototypes in
Table 1.2, it is apparent that the combination of a lens plus free space on either side
behaves as a Fourier transformer (scaled by a magnification of −f ) when do = di = f .
Furthermore, the imaging condition demands that all rays leaving an object point coincide
at the same image point; this means that b, the (1, 2) element of the matrix, must be
zero, which reproduces (1.27). These sorts of considerations are very valuable for more
complex systems, where the thin-lens ray picture is cumbersome.
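Example 1.7 can be redone in a few lines of matrix arithmetic; this sketch (the helper names are mine) solves for d_i from the thin-lens equation and confirms that the b element vanishes at the image plane:

```python
import numpy as np

f, d_o = 73.5, 1300.0                      # mm, from Example 1.3

def prop(z):
    return np.array([[1.0, z], [0.0, 1.0]])

def lens(f):
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

d_i = 1.0 / (1.0 / f - 1.0 / d_o)          # thin-lens equation
system = prop(d_i) @ lens(f) @ prop(d_o)   # rightmost matrix acts first

assert abs(system[0, 1]) < 1e-6            # b = 0: imaging condition
M = system[0, 0]                           # transverse magnification
assert abs(M + d_i / d_o) < 1e-9           # M = -d_i/d_o, inverted image
print(f"d_i = {d_i:.1f} mm, M = {M:.4f}")
```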
Useful as these are, matrix multiplication is not sufficiently powerful to model such
elementary operations as the addition of a thin prism of angle φ and index n, which
requires adding an angle θ = (n − 1)φ. These matrix operators ignore wave effects
and are completely unable to cope with absorbing or partially scattering objects such as
beamsplitters and diffraction gratings. While these can of course be put in by hand, a
more general operator algebra is desirable, which would take account of the wave nature
of the light and model light beams more faithfully.
Nazarathy and Shamir† have produced a suitable operator algebra for Fourier optics.
The key simplification in use is that they have published a multiplication table for these
operators, which allows easy algebraic simplification of what are otherwise horrible
high order multiple integrals. These transformations are in principle easy to automate
and could be packaged as an add-on to symbolic math packages. This algebra takes
advantage of the fact that commonly encountered objects such as lenses, gratings, mirrors,
prisms, and transparencies can be modeled as operator multiplication in the complex field
representation, whereas (as we have seen earlier) many cannot be so modeled in the ray
representation.

Another way of coming at this is to use ABCD matrices for the operator algebra,
and then convert the final result to a Huyghens integral. In the paraxial picture, an
axisymmetric, unaberrated, unvignetted optical system consisting of lenses and free space
can be expressed as a single ABCD matrix, and any ABCD matrix with d ≠ 0 can be
decomposed into a magnification followed by a lens followed by free space:

⎛a  b⎞   ⎛1  z⎞⎛  1    0⎞⎛M    0 ⎞
⎝c  d⎠ = ⎝0  1⎠⎝−1/f   1⎠⎝0   1/M⎠,

where M = 1/d, z = b/d, and f = −1/(cd).
Element a does not appear because that degree of freedom is used up to ensure that
the determinant of the matrix is unity, as required by the conservation of phase space
volume. A magnification by M corresponds to the integral operator

E(x, y) = (1/M) ∫∫ dx′ dy′ E(x′, y′) δ(x′ − x/M) δ(y′ − y/M).     (1.53)
Identifying these matrix operators with the corresponding paraxial integral operators
(1.20), (1.30), and (1.53), we can construct the equivalent integral operator to a general
ABCD matrix with d ≠ 0:
E(x, y) = (1/(iλzM)) ∫∫ dx′ dy′ E(x′/M, y′/M) exp[−iπ(x′² + y′²)/(λf)]
          × exp{iπ[(x − x′)² + (y − y′)²]/(λz)},

with M = 1/d, z = b/d, and f = −1/(cd) as before.
† M. Nazarathy and J. Shamir, First-order optics—a canonical operator representing lossless systems. J. Opt. Soc. Am. 72, 356–364 (March 1982).
This transformation is simple, and it can save a lot of ugly integrals. The special case
where d = 0 corresponds to a lens, followed by a scaled Fourier transform:

⎛a  b⎞   ⎛ 0   −M⎞⎛  1    0⎞
⎝c  0⎠ = ⎝1/M   0⎠⎝−1/f   1⎠.

The constraint that c = −1/b keeps the determinant 1, as before. Here the parameters
are M = −b and f = −b/a, so that the equivalent integral in the wave picture is

E(x, y) = (iM/λ) ∫∫ dx′ dy′ E(Mx′, My′) exp{i(2π/λ)(xx′ + yy′) − iπM²(x′² + y′²)/(λf)}.
These two equivalences allow wave and ray descriptions of the same optical system to
be freely interchanged, which is very convenient in calculations.
The problem of offsets, both in position and angle, can be dealt with by using the
augmented vectors [x, θ, 1]^T and 3 × 3 matrices. A general element producing a transformation abcd and then adding an offset [Δx, Δθ]^T is then

⎛a  b  Δx⎞
⎜c  d  Δθ⎟.
⎝0  0   1⎠
This is especially useful in getting some idea of the allowable tolerances for wedge
angle, tilt, and decentration; a lens of focal length f, decentered by a distance d, adds an
angular offset Δθ = d/f. Similarly, a window of thickness t and index n, tilted by a small
angle α off the normal, looks like a free-space propagation of t/n with a spatial offset
of Δx = αt(n − 1)/n. In the integral representation, offsets are modeled as convolutions
with shifted δ-functions; a shift of ξ is a convolution with δ(x − ξ ).
Aside: Complex ABCD Matrices and Diffraction. Siegman† shows that a Gaussian
amplitude apodization can be modeled using the ABCD matrix for a thin lens with an
imaginary focal length. This isn’t magic, it’s just that a thin lens is a multiplication by an
imaginary parabolic exponential, I(x) = exp[−iπx²/(λf)], so a Gaussian of 1/e² radius
w, A(x) = exp(−x²/w²), might be said mathematically to be a lens of focal length
iπw²/λ.‡ Thus by making the first (rightmost) ABCD matrix a Gaussian aperture,

⎛    1      0⎞
⎝iλ/(πw²)   1⎠,
you can carry the beam radius right through the ABCD calculation, including converting
it to a Helmholtz integral. This makes it simple to find the beam waist, for instance,
and if you’re building interferometers with very small diameter beams, allows you to
calculate the phase front matching at the beam combiner.
† A. E. Siegman, Lasers. University Science Books, Mill Valley, CA, 1986, pp. 786–797.
‡ Note that we're using field amplitudes and not intensity here.
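In practice this bookkeeping is usually carried with the complex beam parameter q, which transforms through an ABCD matrix by the usual bilinear rule q′ = (aq + b)/(cq + d). A sketch under one common sign convention (the wavelength and waist are arbitrary illustrations):

```python
import numpy as np

lam = 633e-9                                # wavelength, m (HeNe)
w0 = 1e-3                                   # 1/e^2 waist radius, m
zR = np.pi * w0**2 / lam                    # Rayleigh range

# Complex beam parameter at the waist (one common sign convention):
q = -1j * zR

def through(q, m):
    # bilinear ABCD transformation of the complex beam parameter
    a, b, c, d = m[0, 0], m[0, 1], m[1, 0], m[1, 1]
    return (a * q + b) / (c * q + d)

z = 50.0                                    # 50 m of free space
q2 = through(q, np.array([[1.0, z], [0.0, 1.0]]))

# Spot size from the imaginary part of 1/q:
w = np.sqrt(lam / (np.pi * abs((1 / q2).imag)))
w_expected = w0 * np.sqrt(1 + (z / zR)**2)
assert abs(w - w_expected) / w_expected < 1e-12
print(f"w(50 m) = {w * 1e3:.2f} mm")
```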
1.3.12 Source Angular Distribution: Isotropic and Lambertian Sources
A light source (such as the Sun) whose output is independent of direction is said to be
isotropic; there’s no special direction. Light inside an integrating sphere (Section 5.7.7)
is also isotropically distributed because there’s no special direction. When there’s a
surface involved, though, things change, on account of obliquity. If you shine a light on
a perfectly matte-finished surface, its surface looks equally bright no matter what angle
you look from. If you tilt it, it gets foreshortened by the perspective, but if you looked
at it from a distance through a drinking straw, you wouldn’t be able to tell from its
brightness whether it was tilted or not. A surface that passes the drinking-straw test is
said to be Lambertian.
If you replace your eye with a photodiode, each drinking-straw patch of surface
contributes the same amount of photocurrent. As the angle increases, the patches get
longer like evening shadows, so cos θ times fewer patches will fit on the surface of the
source. Another way to put this is that the total projected area of the source goes down
like the cosine of the angle of incidence, so the detected photocurrent will be multiplied
by the obliquity factor cos θ . Obliquity factors come in just about everywhere—often
disguised as n̂ · ∇ψ —and sometimes they seem mysterious, but all that’s really going
on is that shadows get longer in the evening.
1.3.13 Solid Angle
Waves expand as they propagate, but as the ray model predicts, a given wave’s angular
spread is asymptotically constant as R → ∞. A plane angle is measured between two
straight lines, so its measure doesn’t depend on how far out you go. A cone has the
same property in three dimensions, leading to a natural generalization, solid angle. The
measure of a plane angle is the arc length cut out by the angle on the unit circle, so we
define the solid angle of a cone to be the area it cuts out of the unit sphere. (Note that
the cone need not be circular in cross section, or convex in outline, or even be a single
glob—it just needs to have a shape that’s independent of distance from the vertex.) This
area is of course

Ω = ∫∫ sin θ dθ dφ,
where θ is the polar angle (measured from the surface normal) and φ is the azimuth
(angle in the horizon plane). In optics, we’re normally calculating the flux in or out of a
surface, so we have to worry about obliquity. Once again, obliquity is nothing deep or
difficult to understand—when a beam of light hits a surface at an angle θ off normal, the
illuminated patch is stretched out by sec θ , just as afternoon shadows of vertical objects
lengthen as tan θ . Mathematically, the outward flux through each bit of surface dA is
P · dA. It simplifies matters if we fold the obliquity into the quoted solid angle, so we
usually work with the projected solid angle Ω′, where
Ω′ = ∫∫ sin θ cos θ dθ dφ.
To crispen this idea up, consider a circular angular pattern of half-angle ψ around the
surface normal, that is, one that covers the angular disc θ < ψ, 0 ≤ φ < 2π . Its solid
angle is Ω = 2π(1 − cos ψ) = π(ψ² − ψ⁴/12 + · · ·) and its projected solid angle is
Ω′ = π sin² ψ = π(ψ² − ψ⁴/3 + · · ·). Conveniently, if n = 1 then Ω′ = π(NA)², which
is a useful and easily remembered rule.
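As a quick numerical check of these formulas, the exact and projected solid angles of a circular cone are easy to compute (a minimal sketch; the function names are ours):

```python
import math

def solid_angle(psi):
    """Solid angle of a circular cone of half-angle psi (radians)."""
    return 2 * math.pi * (1 - math.cos(psi))

def projected_solid_angle(psi):
    """Same cone with the obliquity factor cos(theta) folded in."""
    return math.pi * math.sin(psi) ** 2

# Small angles: both approach pi*psi^2 (i.e., pi*(NA)^2 for n = 1)
print(solid_angle(0.01), projected_solid_angle(0.01))
# Hemisphere (psi = pi/2): 2*pi sr vs. the Lambertian pi sr
print(solid_angle(math.pi / 2), projected_solid_angle(math.pi / 2))
```

The hemisphere case shows why a Lambertian surface emits into π steradians rather than 2π.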
A Lambertian surface (one that has no preferred direction) emits into π steradians (the
projected solid angle of a hemisphere). Optics folk tend to be loose about the distinction
between Ω and Ω′, but it isn't hard to keep straight—if the emitter or receiver is a surface, there's obliquity to worry about, so use Ω′; if not (e.g., as in gas spectroscopy) use Ω. In the usual low-NA situations, the two are equivalent for practical purposes.
There are two cautions to keep in mind: first, be careful if the surface isn’t flat—it’s
the angle between the local surface normal and the light rays that matters. Second, both
solid angle and obliquity are far-field concepts, so near a focus we have to use the plane
wave decomposition of the field to get the right answer.
1.3.14 Étendue: How Much Light Can I Get?
The first thing that an optical system has to be able to do is transmit light. Apart from
solar telescopes, electro-optical systems are limited at least some of the time by how
much light they can emit, collect, or detect. Figuring out how much you have and how
much more you can get is the aim of radiometry. In Section 1.3.11, we saw that a given
light beam can be focused into a smaller area, but only at the price of increasing its
numerical aperture. Since sin θ cannot exceed unity, a given beam cannot be focused
arbitrarily tightly. Some beams can be focused better than others; for example, a beam
from an incandescent bulb cannot be focused as tightly as one from a laser. The difference
is in their degree of spatial coherence.
The spatial coherence of a beam is a measure of how well its different components
(Fourier or real-space) stay in phase with each other. This is revealed by how deep
the interference fringes are when different components are made to interfere with one
another, as in Young’s slit experiment (there’s more on this in Section 2.5.4). The theory
of imaging with partially coherent light is discussed by Goodman, Born and Wolf, and
others and is beyond the scope of this book. As a practical matter, we usually want spatial
coherence low enough to eliminate fringes in a full-field (i.e., not scanning) imaging
system and high enough not to limit our ability to focus it on our area of interest. The
coherence area of an optical field gives an idea of how far apart the slits can be and still
have interference.
Conversely, one important attribute of an optical system is how well it can cope with
low coherence sources. To transmit the most light from such sources, the system needs
both a large area and a large angular acceptance. The figure of merit for this attribute is
called the étendue and is given by
E = n²AΩ′,
where A is the clear area and n is the refractive index of the medium in which the
projected solid angle Ω′ is measured. It's usually just written AΩ′, which assumes that
n = 1, but we'll carry the n along explicitly. For on-axis circular pupils (the usual
case), E = Aπ(NA)². This quantity is invariant under magnification, which increases A
while decreasing Ω′ proportionately, and under refraction. Étendue is a purely geometric
property, which explicitly neglects the transmittance of the optical system. This is fine
as long as this is reasonably uniform up to the edges of A and Ω′. It is less useful with systems whose transmittance is a strong function of angle, such as high-index dielectric interfaces. The useful étendue is not preserved on passing through a succession of such
elements, so the transmittance must be expressed as a function of position and angle,
and carried along mathematically. Étendue is related to the statistical mechanics notion
of phase space volume, and the conservation of étendue is the optical analogue of the
conservation of phase space volume by adiabatic processes. (The ABCD matrices of
Section 1.3.11 all have determinant 1, which is the paraxial version of this.)
With a given low coherence source, any two lossless optical systems with the same
étendue will pass the same total optical power, if the source is matched to their characteristics with an appropriate magnification. Any mismatch will reduce the power actually
transmitted. A corollary is that the étendue of a system stays the same if you send the
light back through the other way. The étendue of an optical system cannot be larger than
that of its poorest component, and can easily be worse due to mismatch. This is worth
keeping in mind, for example, in the choice of polarizing prisms; types relying on total
internal reflection (such as the Glan–Taylor) have much smaller acceptance angles than
those relying on double refraction (such as Wollastons), so a bigger prism can have a
smaller étendue.
Example 1.8: Étendue and Mismatch. Consider coupling sunlight into a 100×, 0.95
NA microscope objective (f = 2 mm, FOV diameter = 100 μm). If we shine
sunlight (9 mrad angular diameter) in the pointy end, we get an effective n2 A of
π (0.005 cm)2 [π(4.5 mrad)2 ] = 5 × 10−9 cm2 · sr. If we turn it around, the étendue is
unaltered, but we get the 6 mm diameter back element instead. The angular acceptance
on the exit pupil is a few degrees, so we don’t lose anything, and the effective n2 A
goes up by a factor of 3600 to 1.8 × 10−5 cm2 · sr—and the source is still mismatched.
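Example 1.8's numbers can be verified in a few lines (a sketch; areas in cm², solid angles in sr, n = 1 as in the example):

```python
import math

A_front = math.pi * 0.005 ** 2           # cm^2: 100 um FOV diameter
omega_sun = math.pi * 4.5e-3 ** 2        # sr: 9 mrad angular diameter
E_front = A_front * omega_sun            # ~5e-9 cm^2*sr, pointy end
A_back = math.pi * 0.3 ** 2              # cm^2: 6 mm back element
E_back = A_back * omega_sun              # ~1.8e-5 cm^2*sr, turned around
print(E_front, E_back, E_back / E_front) # area ratio (6 mm/0.1 mm)^2 = 3600
```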
1.3.15 What Is “Resolution”?
The classical definitions of Rayleigh and Sparrow specify that two point-like objects
of equal brightness are resolved if their images are separated by a defined multiple of
the diameters of their diffraction discs. This definition is reasonably adequate for photographic detection, where the photon statistics do not significantly reduce the precision of
the measurement.
With modern detectors, it is impossible to specify the resolution of an optical system
when signal-to-noise considerations are absent. For example, for a two-point object, one
can model the image as the sum of two Airy patterns, whose locations and intensities are
parameters. By fitting the model to the observed data, the positions and intensities of the
two sources can be extracted. With a high enough signal-to-noise ratio and a sufficiently
accurate knowledge of the exact imaging characteristics of our systems, there is no clear
limit to the two-point resolution of an optical system, as defined in this way. Optical
lithography is another example where the “resolution limit” has repeatedly turned out not
to be where it was expected, largely on account of the very high contrast of photoresist
and, recently, phase shift masks and computational mask design.
What we really mean by resolution is the ability to look at an object and see what is
there, in an unambiguous way that does not depend on our choice of model. This model-independent imaging property does degrade roughly in line with Rayleigh and Sparrow,
but it is a much more complicated notion than simple two-point resolution. Most of the
disagreement surrounding the subject of resolution is rooted here.
To calculate what an instrument will detect, we need to know how to model the operation
of a photodetector. Fortunately, this is relatively simple to do, providing that the detector
is reasonably uniform across its sensitive area. From a physical point of view, all detectors
convert optical energy into electrical energy, and do it in a square-law fashion—the
electrical power is proportional to the square of the optical power, with a short time
average through the detector’s impulse response. Throughout the rest of this chapter, we
will normalize the scalar field function ψ so that the (paraxial) power function ψψ ∗ has
units of watts per square meter.
A general square-law detector with an input beam ψ(x) and a responsivity R will
produce an output signal S given by
S(t) = R ∫ ⟨ψ∗(x, t) n̂ · ∇ψ(x, t)/k⟩ d²x,
which for small NA is
S(t) = R ∫ ⟨|ψ(x, t)|²⟩ d²x,    (1.63)
where angle brackets denote time averaging through the temporal response of the detector
and the integral is over the active surface of the detector. The gradient ∇ψ is parallel
to the local direction of propagation (see Section 9.2.3) and the dot product supplies the
obliquity factor, as we saw in Section 1.3.13. If the detector is seriously nonuniform, the
responsivity becomes a function of x, so R(x) must be put under the integral sign.
The square-law behavior of detectors has many powerful consequences. The first is
that all phase information is lost; if we want to see phase variations, we must convert
them to amplitude variations before the light is detected. Furthermore, provided that no
light is lost in the intervening optical components (watching vignetting especially), the
detector can in principle be placed anywhere in the receiving part of the optical system,
because the time averaged power will be the same at all positions by conservation of
energy. This has great practical importance, because we may need to use a small detector
in one situation, to minimize dark current or ambient light sensitivity, and a large one in
another, to prevent saturation and attendant nonlinearity due to high peak power levels.
The small detector can be put near focus and the large one far away. This freedom applies
mathematically as well; provided once again that no additional vignetting occurs, (1.63)
can be applied at an image, a pupil, or anywhere convenient.† This becomes very useful
in interferometers.
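The position independence of the detected power is easy to check numerically: propagation from a pupil to the focal plane is (up to scaling) a Fourier transform, and a unitary transform preserves the integral of |ψ|². A sketch using NumPy's orthonormal FFT:

```python
import numpy as np

x = np.linspace(-5, 5, 1024, endpoint=False)
psi_pupil = np.exp(-x**2) * np.exp(0.3j * x**2)  # Gaussian with some defocus
psi_focus = np.fft.fft(psi_pupil, norm="ortho")  # unitary, so power-preserving
P_pupil = np.sum(np.abs(psi_pupil)**2)
P_focus = np.sum(np.abs(psi_focus)**2)
print(P_pupil, P_focus)  # equal, by Parseval's theorem
```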
As in all interactions of light with matter, the surface properties of the detector and their
variation with position, polarization, and angle of incidence are important. Fortunately,
detector manufacturers endeavor to make their products as easy to use as possible, so
that the worst nonuniformities are eliminated, and in addition, by the time the light gets
to the detector, its numerical aperture is usually reduced sufficiently that obliquity factors
and dependence on overall polarization are not too serious. As usual, they can be put
in by hand if needed, so we'll continue to use the scalar model and neglect these other effects.
There are a fair number of head-scratchers associated with square-law detection. We'll talk more about it in Section 3.3.
† This is exact and not a Debye approximation.
1.5.1 Interference
An interferometer is nothing more than a device that overlaps two beams on one detector,
coherently, rather than combining the resulting photocurrents afterwards, incoherently.
Coherent addition allows optical phase shifts between the beams to give rise to signal
changes. In many applications the two beams are different in strength and the weaker one
carries the signal information. Consequently, they are often referred to as the signal and
local oscillator (LO) beams, by analogy with superheterodyne radios. Coherent detection
gives the optical fields the chance to add and subtract before the square law is applied,
so that the resulting photocurrent is
i(t) = R ⟨∫ |ψLO(x)e^−i(ωLO t+φLO(x,t)) + ψS(x)e^−i(ωS t+φS(x,t))|² dA⟩
= iLO + iS + iAC,
assuming that the beams are polarized identically (if there are signal beam components in
the orthogonal polarization state, they add in intensity, or equivalently in i). The individual
terms are
iLO = R ∫ d²x ψLO ψ∗LO,
iS = R ∫ d²x ψS ψ∗S,
iAC = 2R Re ∫ d²x ⟨ψLO ψ∗S⟩ = 2R Re{exp(−iΔωt) ∫ |ψLO(x)||ψS(x)| exp(iΔφ(x, t)) dA}.
The first two terms, iLO and iS , are the photocurrents the two beams would generate if
each were alone. The remaining portion is the interference term. It contains information
about the relative phases of the optical beams as a function of position. The interference
term can be positive or negative, and if the two beams are at different optical frequencies
it will be an AC disturbance at their difference frequency Δω. If the two beams are
superposed exactly and have the same shape (i.e., the same relative intensity distributions,
focus, and aberrations), ψLO and ψS differ only by a common factor, so the interference
term becomes
iAC = 2√(iLO iS) cos(Δωt + φ).
Aside: Fringe Visibility. Looking at the light intensity on the detector (or on a sheet of
paper), we can see a pattern of light and dark fringes if the light is sufficiently coherent.
These fringes are not necessarily nice looking. For laser beams of equal strength, they
will go from twice the average intensity to zero; for less coherent sources, the fringes
will be a fainter modulation on a constant background. The contrast of the fringes is
expressed by their visibility V,
V = (Imax − Imin)/(Imax + Imin),
which we’ll come back to in Section 2.5.4 in the context of coherence theory.
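For two fully coherent, identically polarized beams, V follows directly from the two DC photocurrents (a minimal sketch; the function name is ours):

```python
import math

def visibility(i_lo, i_s):
    """Fringe visibility of two fully coherent, co-polarized beams,
    from Imax,min = iLO + iS +/- 2*sqrt(iLO*iS)."""
    return 2 * math.sqrt(i_lo * i_s) / (i_lo + i_s)

print(visibility(1.0, 1.0))   # equal beams: V = 1, fringes go to zero
print(visibility(1.0, 1e-4))  # weak signal: V ~ 2*sqrt(iS/iLO) ~ 0.02
```

The weak-signal case is why even a faint signal beam produces measurable fringes against a strong LO.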
1.5.2 Coherent Detection and Shot Noise: The Rule of One
Application of coherent detection to improve the signal-to-noise ratio is covered in
Section 3.11.7. There are three key observations to be made here: coherent detection
is extremely selective, preserves phase information, and provides noiseless signal amplification.† These three properties give it its power. If the two beams are exactly in phase
across the entire detector, the amplitude of the interference term is twice the square root
of the product of the two DC terms:
iAC (peak) = 2 iLO iS .
If iS is much weaker than iLO , this effectively represents a large amplification of ψS .
The amplification is noiseless—the LO shot noise is
iNshot = √(2e iLO),
which is exactly the rms value of iAC when iS is 1 electron per second (the rms current is down by √2 from the peak because of the ensemble average over 2π phase). Thus with η = 1, a signal
beam of 1 photon/s is detectable at 1σ confidence in 1 s in a 1 Hz bandwidth, which is
a remarkable result—bright-field measurements can be made to the same sensitivity as
dark-field measurements.
This leads us to formulate the Shot Noise Rule of One: One coherently added photon
per One second gives an AC measurement with One sigma confidence in a One hertz
bandwidth. (See Sections 1.8.1 and 13.1 for more on AC versus DC measurements.)
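The Rule of One can be verified directly from the formulas above (a sketch; note that the result is independent of LO power):

```python
import math

e = 1.602e-19  # electron charge, C

def coherent_snr(n_s, i_lo, eta=1.0, bandwidth=1.0):
    """Amplitude SNR of the AC interference term against LO shot noise.
    n_s: detected signal photons/s; i_lo: LO photocurrent in amperes."""
    i_s = eta * e * n_s
    i_ac_rms = math.sqrt(2 * i_lo * i_s)          # rms of 2*sqrt(iLO*iS)*cos
    i_shot = math.sqrt(2 * e * i_lo * bandwidth)  # LO shot noise current
    return i_ac_rms / i_shot

print(coherent_snr(1.0, 1e-3))    # 1 photon/s: exactly 1 sigma in 1 Hz
print(coherent_snr(1.0, 1e-6))    # 1000x weaker LO: still 1 sigma
print(coherent_snr(100.0, 1e-3))  # SNR grows as sqrt(n_s): 10 sigma
```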
Aside: Photons Considered Harmful. Thinking in terms of photons is useful in noise
calculations but pernicious almost everywhere else—see Section 3.3.2.
1.5.3 Spatial Selectivity of Coherent Detection
If the phase relationship is not constant across the detector, fringes will form, so the
product ELO Es∗ will have positive and negative regions; this will reduce the magnitude
of the interference term. As the phase errors increase, the interference term will be
reduced more and more, until ultimately it averages out to nearly zero. This means that
a coherent detector exhibits gain only for signal beams that are closely matched to the
LO beam, giving the effect of a matched spatial filter plus a noiseless amplifier.
† A. V. Jelalian, Laser Radar Systems. Artech House, Boston, 1992, pp. 33–41.
Another way to look at this effect is to notionally put the detector at the Fourier transform plane, where the two initially uniform beams are transformed into focused spots.
A phase error that grows linearly across the beam (equally spaced fringes) corresponds
to an angular error, which in the transform plane means that the two focused spots are
not concentric. As the phase slope increases, the spots move apart, so that their overlap is greatly reduced. Ultimately, they are entirely separate and the interference term
drops to zero. Mathematically these two are equivalent, but physically they generally
are not.
If the detector is placed at a focused spot, the local photocurrent density can be so
large as to lead to pronounced nonlinearity; this is much less of a problem when the
beams fill a substantial fraction of the detector area. On the other hand, two spots that
do not overlap will not give rise to any interference term whatsoever, which is not in
general true of two broad beams exhibiting lots of interference fringes; even if the broad
beams are mathematically orthogonal, small variations in sensitivity across the detector
will prevent their interference pattern from averaging to exactly zero.
It is hard to say exactly how serious this effect is in a given case, as it depends strongly
on the details of the sensitivity variations. Sharp, strong variations (e.g., vignetting) will
give rise to the largest effects, while a smooth center-to-edge variation may do nothing at
all noticeable. If the application requires >40 dB (electrical) selectivity between beams at
different angles, consider changing focus to separate them laterally, or relying on baffles
or spatial filters as well as fringe averaging.
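The fringe-averaging selectivity can be illustrated with a one-dimensional sketch: for uniform beams on a detector of width w, a linear phase error that puts N fringes across the detector reduces the interference term as sinc(N). (The discretization here is ours.)

```python
import numpy as np

def ac_reduction(n_fringes, npts=100001):
    """Relative size of the interference term when n_fringes fringes of
    linear phase error span a uniform 1-D detector of unit width."""
    x = np.linspace(-0.5, 0.5, npts)  # detector coordinate in units of w
    return abs(np.mean(np.exp(2j * np.pi * n_fringes * x)))

print(ac_reduction(0))    # perfect match: 1
print(ac_reduction(0.5))  # half a fringe: 2/pi ~ 0.64
print(ac_reduction(3))    # three fringes: ~0 -- strong rejection
```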
1.6.1 Two-Beam Interferometers
Two-beam interferometers implement the scheme of Section 1.5 in the simplest way: by
splitting the beam into two with a partially reflecting mirror, running the two through
different paths, and recombining them. Figure 1.10 shows the heavy lifters of the interferometer world, the Michelson and Mach–Zehnder. Mach–Zehnders are more common
in technological applications, because the light goes in only one direction in each arm,
so it’s easier to prevent back-reflections into the laser. On the other hand, a Michelson
is the right choice when robust alignment is needed, because one or both mirrors can be
replaced by corner cubes (don’t tell anyone, but if the cubes are offset from the beam axis,
Figure 1.10. Workhorse two-beam interferometers: (a) Michelson and (b) Mach–Zehnder. The
compensator plate in (a) more or less eliminates the effects of dispersion, which will smear out
the white-light fringes otherwise, and also reduces the effect of finite aperture.
that’s really a skinny Mach–Zehnder). Example 1.12 shows an intensive use of a corner
cube type interferometer. Michelsons are a bit easier to align, because autocollimation
(sending the beam back on itself) is an easy criterion to use.
An interferometer is intrinsically a four-port device; light is steered between the output
ports by interference. If the two beams are perfectly coherent with one another, the output
powers PO+ and PO− from the two output ports are
PO± = P1 + P2 ± 2√(P1 P2) cos φ,
where P1 and P2 are the split beam powers and φ is the phase angle between them. The
sign difference comes from beam 1 being reflected and beam 2 transmitted going into
port + and vice versa for port −.
1.6.2 Multiple-Beam Interferometers: Fabry–Perots
If instead of splitting the light into multiple paths, we just take two partially reflecting
mirrors and put them next to each other, parallel, we get a Fabry–Perot (F-P) interferometer. The multiple reflections give Fabry–Perots much greater selectivity for a given
size, at the expense of far greater vulnerability to mirror errors and absorption. We’ll
go through the math in Section 5.4, but the upshot is that a plane-mirror F-P whose
mirrors have reflectance R and are spaced d apart in a medium of index n has a total
transmittance
TF-P = (1 − R)² / [(1 − R)² + 4R sin²(2πνnd cos θ/c)],
where θ is the angle of incidence of the beam on the mirrors (inside the medium). This
obviously consists of spikes at multiples of Δν = c/(2nd), the free spectral range (FSR). The FWHM of the peaks is FSR/F, where F is the finesse. Although F is nominally π√R/(1 − R), it is really a measured quantity, because mirror flatness errors are
usually the limiting factor. If the mirrors have rms error δ in waves, that limits the
finesse to
Fmax < 1/(2δ),
which is a pretty serious limitation most of the time—achieving a finesse of 100 requires
mirror accuracy and alignment precision of better than λ/200. Real F-Ps have a peak T
less than 1 (sometimes a lot less), and their total reflectance RF-P is less than 1 − TF-P .
Recently, fabrication precision and coating quality have advanced to the point where
finesse values of 2 × 105 or higher can be obtained, at a price.
Inside a F-P, the field is enhanced a great deal; the easiest way to calculate it is
to notice that the forward and backward propagating fields inside the cavity are nearly
equal, and that the transmitted power has to be T times the forward power. In a perfect
F-P, that means that if Pinc is coming in, the forward power inside is TF-P /(1 − R) · Pinc .
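A quick numerical sketch of a lossless plane-mirror F-P, using the standard Airy-function transmittance (example values are ours):

```python
import numpy as np

c = 299792458.0  # m/s

def t_fp(nu, R, n, d, theta=0.0):
    """Transmittance of a lossless plane-mirror Fabry-Perot at frequency nu."""
    delta = 2 * np.pi * nu * n * d * np.cos(theta) / c
    return (1 - R)**2 / ((1 - R)**2 + 4 * R * np.sin(delta)**2)

R, n, d = 0.97, 1.0, 5e-3                # 97% mirrors, 5 mm air spacing
fsr = c / (2 * n * d)                    # free spectral range, ~30 GHz
finesse = np.pi * np.sqrt(R) / (1 - R)   # ~103 (mirror errors ignored)
print(fsr / 1e9, finesse)
print(t_fp(fsr, R, n, d))                # on a peak: T = 1
print(t_fp(fsr / 2, R, n, d))            # between peaks: T ~ 2e-4
```

Note that a real finesse of 103 would already demand roughly λ/200 mirrors, per the limit above.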
1.6.3 Focused-Beam Resonators
Fabry–Perots can have variable or fixed spacing; a fixed F-P is called an etalon. Etalons
can be tuned over a small range by tipping them, but the finesse drops pretty fast when
you do that since the N th reflections start missing each other completely.
The highest finesse F-Ps are not plane-mirror devices, but rather more like laser resonators; as the finesse goes up, even small amounts of diffraction become an increasing
difficulty. They need careful matching of the incoming wave to the spherical wave cavity
mode, which is a big pain; fortunately, single-mode fiber coupled ones are available—buy
that kind if you possibly can. Otherwise, not only do you face critical matching problems, but the minor pointing instability of the laser will turn into noise that cannot
easily be removed. Fibers have their problems, but very high finesse focused F-Ps are
much worse.
Aside: Confocal Cavities and Instability. It might seem that the ideal optical resonator
would be a confocal cavity, where the two mirrors’ centers of curvature coincide at the
center of the cavity. This is not so, at least not for lasers. Such a cavity is a portion of
a single sphere, and there is no special direction in a sphere—any direction is as good
as any other, hence a confocal resonator has no stable axis. A tilt of ε radians in one
mirror produces a shift of the resonator axis of δθ ≈ ε[L/ΔL], where L is the distance
between the mirror vertices and ΔL is the distance between their foci—which goes to ∞
as ΔL → 0. The NA of the resonant mode depends on how far off confocal the cavity
is—resonant wavefronts will coincide with the cavity mirror surfaces, so NA = 0 for
planar mirrors, NA = 1 for confocal mirrors, and in between, the NA can be backed out
from the equation for R(z) in Table 1.1. (Should ΔL be positive or negative?)
1.7.1 Basis
Photons are like money: a certain number are needed for the job at hand, and they’re
easier to lose than to gain back. Thus the idea of a budget applies to photons as to
finances, but it is more complicated in that not all photons are useful—as though we
had to budget a mixture of green and purple dollars. A photon budget is an accounting
of where photons come from, where they go, and how many are expected to be left by
the time they are converted to electrons by the detector. Also like the other kind, people
sometimes don’t even expect to be on budget; they settle for “this was the best we could
do, but I’m not sure what was the problem.” In the author’s experience, it is possible to
achieve an SNR within 3 dB of budget almost always, and 1 dB most of the time. Don’t
give up, this theory stuff really works.
On the other hand, the budget must be realistic too. Don’t try to measure anything in
a bandwidth of less than 10 Hz, unless you have lots of cheap graduate students, and
remember that you need a decent signal-to-noise ratio to actually do anything. Sensitivity
limits are frequently given as noise equivalent power (NEP) or noise equivalent temperature difference (NET or NETD), meaning the amount of signal you need in order to
have a signal-to-noise ratio of 1, or equivalently a confidence level of 1 σ (68%). Don’t
let this convince you that an SNR of 1 is useful for anything, because it isn’t. Generally
for any reasonable measurement you need an SNR of at least 20 dB—even to do a single
go/no-go test with a reasonable false call probability, you’ll need at least 10 or 15 dB
(3–5 σ ). Tolerable images need at least 20 dB SNR; good ones, about 40 dB. Just how
large an SNR your measurement has to have is a matter of deepest concern, so it should
be one of the first things on the list. Don’t rely on a rule of thumb you don’t understand
fully, including this one. There’s lots more on this topic in Section 13.6.
Arriving at a photon budget and a set of operational specifications is an iterative
process, as the two are inseparably linked. As the concepts are simple, the subtlety
dwells in the actual execution; we will therefore work some real-life examples. The first
one will be a shot-noise-limited bright-field system; the second, a background-limited
dark-field system, and the third, an astronomical CCD camera. Before we start, you’ll
need to know how to combine noise sources (see Section 13.6.7) and to think in decibels.
Aside: Decibels. One skill every designer needs is effortless facility with decibels.
There are just two things to remember: first, decibels always measure power ratios,
never voltage. G(dB) = 10 log10 (P2 /P1 ) (we’ll drop the subscript from now on and use
“ln” for natural log). That formula with a 20 in it is a convenience which applies only
when the impedances are identical, for example, a change in the noise level at a single
test point. If you don’t remember this, you’ll start thinking that a step-up transformer
has gain. Second, you can do quick mental conversions by remembering that a factor of
two is 3 dB (since log10 (2) ≈ 0.3010), a factor of 10 is 10 dB, and ± 1 dB is 1.25× or
0.8× (since 100.1 ≈ 1.259). For example, if you’re looking at the output of an amplifier,
and the noise level changes from 10 mV to 77 mV, that’s about an 18 dB change: you
get 80 from 10 by multiplying by 10 and dividing by 1.25 (20 dB − 2 dB, remembering
that dB measure power), or by multiplying by 2 three times (6 + 6 + 6 = 18).
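The mental arithmetic above is easy to check (a sketch):

```python
import math

def db(power_ratio):
    """Decibels always measure power ratios."""
    return 10 * math.log10(power_ratio)

print(db(2))               # ~3.01 dB: a factor of 2 in power
print(db(10))              # 10 dB exactly
print(db(1.25))            # ~0.97 dB: the "1 dB is 1.25x" rule
# 10 mV -> 77 mV at a single test point: power goes as voltage squared
print(db((77 / 10) ** 2))  # ~17.7 dB, i.e. "about 18 dB"
```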
Example 1.9: Photon Budget for a Dual-Beam Absorption Measurement. One way of
compensating for variations in laser output is to use two beams, sending one through
the sample to a detector and the other one directly to a second detector for comparison.
A tunable laser (e.g., Ti:sapphire or diode) provides the light. The desired output is the
ratio of the instantaneous intensities of the two beams, uncontaminated by laser noise,
drift, and artifacts due to etalon fringes or atmospheric absorption. For small absorptions,
the beams can be adjusted to the same strength, and the ratio approximated by their
difference divided by a calibration value of average intensity,
nsig(λ, t)/ncomp(λ, t) ≈ 1 + [nsig(λ, t) − ncomp(λ, t)]/⟨ncomp(λ)⟩,
where nsig and ncomp are the photon flux (s−1 ) in the beams. The total electrical power
in the signal part is
Psig = [ηe(nsig − ncomp )]2 RL .
In the absence of other noise sources, shot noise will set the noise floor:
iNshot = e√(2η(ncomp + nsig)).
For 1 mW per beam at 800 nm and η = 1, this amounts to a dynamic range (largest
electrical signal power/noise electrical power) of 150 dB in 1 Hz, or a 1σ absorption of
3 parts in 108 . Real spectrometers based on simple subtraction are not this good, due
primarily to laser noise and etalon fringes. Laser noise comes in two flavors, intensity
and frequency; it’s treated in Section 2.13.
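The 150 dB figure can be reproduced from the shot noise formula (a sketch, with round-number constants):

```python
import math

h, c, e = 6.626e-34, 2.998e8, 1.602e-19
P, lam, eta = 1e-3, 800e-9, 1.0          # 1 mW per beam at 800 nm, eta = 1
n = P * lam / (h * c)                    # photons/s per beam, ~4e15
i_sig = eta * e * n                      # full-scale photocurrent, one beam
i_shot = e * math.sqrt(2 * eta * 2 * n)  # shot noise of both beams, 1 Hz
dyn_range_db = 10 * math.log10((i_sig / i_shot) ** 2)
print(dyn_range_db)                      # ~150 dB electrical in 1 Hz
print(i_shot / i_sig)                    # 1-sigma absorption, ~3e-8
```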
Laser noise cancelers use subtraction to eliminate intensity noise and actually reach
this shot noise measurement limit (see Sections 10.8.6 and 18.6.3). When frequency
noise is a problem (e.g., in high resolution spectroscopy) we have to stabilize the laser.
Frequency noise also couples with the variations in the instrument’s T versus λ to
produce differential intensity noise, which in general cannot be canceled well. If the
instrument’s optical transmittance is T and the laser has a (one-sided) FM power spectrum
S(fm ) (which we assume is confined to frequencies small compared to the scale of T ’s
variation), FM-AM conversion will contribute rms current noise iNfa :
iNfa(f) = nηe (dT/dν) √S(f),
If this noise source is made small, and the absorption can be externally modulated,
for example, by making the sample a molecular beam and chopping it mechanically,
the shot noise sensitivity limit can be reached fairly routinely. Note that this is not the
same as a measurement accuracy of this order; any variations in T or drifts in calibration
will result in a multiplicative error, which, while it goes to zero at zero signal, usually
dominates when signals are strong.
Example 1.10: Photon Budget for a Dark-Field Light Scattering System. Many systems (e.g., laser Doppler anemometers) detect small amounts of light scattered from
objects near the waist of a beam. Consider a Gaussian beam with P = 0.5 mW and
0.002 NA at 633 nm. From Table 1.1, the 3 dB beam radius at the waist is 70 μm, and the
central intensity is 2P π(NA)2 /λ2 = 3.1 × 104 W/m2 , which is 1.0 × 1023 photons/m2 /s.
A sufficiently small particle behaves as a dipole scatterer, so that the scattered light
intensity is zero along the polarization axis, and goes as the sine squared of the polar
angle θ . It will thus scatter light into 2π steradians, so if the total scattering cross section
of the particle is σtot, the averaged scattered flux through a solid angle Ω (placed near the maximum) will be F = 1.0 × 10²³ σtot Ω/(2π). (Larger particles exhibit significant angular structure in their scattering behavior, with a very pronounced lobe near the forward
direction.) A 1 cm diameter circular detector at a distance of 5 cm from the beam waist
will subtend a projected solid angle of Ω′ = π(NA)² = π(0.5/5)² ≈ 0.03 sr. A particle
crossing the beam waist will thus result in N = 5 × 1021 σtot photons/s.
If the detector is an AR-coated photodiode (η ≈ 0.9) with a load resistor RL = 10 MΩ, then in a time t (bandwidth 1/(2t) Hz), the rms Johnson noise current √(4kTB/R) is √(2kT/(10⁷t)), which for t = 1 s is 0.029 pA or 1.8 × 10⁵ electrons/s (which isn't too good).
From Section 13.6.15, we know that in order to achieve a false count rate R of 1 in
every 106 measurement times, we must set our threshold at approximately 5.1 times the
rms amplitude of the additive Gaussian noise (assuming that no other noise source is
present). Thus a particle should be detectable in a time t if σtot exceeds
σmin = (1.8 × 10⁵)(5.1)/(5 × 10²¹ η√t) = (1.8 × 10⁻¹⁶)/(η√t),
where η is the quantum efficiency. A particle moving at 1 m/s will cross the 140 μm
3-dB diameter of this beam in 140 μs, so to be detectable it will need a scattering cross
section of at least 1.6 × 10−14 m2 , corresponding to a polystyrene latex (PSL) sphere
(n = 1.51) of about 0.2 μm diameter.
Now let’s sanity-check our assumptions. For the circuit to respond this fast, the time
constant Cdet RL must be shorter than 100 μs or so, which limits the detector capacitance
to 10 pF, an unattainable value for such a large detector. Even fast detectors are generally
limited to capacitances of 50 pF/cm2 , so assuming the detector capacitance is actually
100 pF, we cannot measure pulses faster than 1 to 2 ms with our system as assumed. There
are circuit tricks that will help considerably (by as much as 600×, see Section 18.4.4),
but for now we’ll work within this limit. If we can accept this speed limitation, and
the accompanying ∼10× decrease in volumetric sampling rate, we can trade it in for
increased sensitivity; a particle whose transit time is 2 ms can be detected at σmin =
4 × 10−15 m2 . If the particle is much slower than this, its signal will start to get down
into the 1/f noise region, and non-Gaussian noise such as popcorn bursts will start to
be important. Already it is in danger from room lights, hum, and so on.
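The Johnson-noise-limited cross-section calculation can be sketched as follows (numbers from the example; the helper name is ours):

```python
import math

k_B, T, e = 1.381e-23, 300.0, 1.602e-19
R_L = 1e7     # 10 Mohm load resistor
rate = 5e21   # photons/s per m^2 of scattering cross section, from the text

def sigma_min(t, eta=0.9, threshold_sigma=5.1):
    """Smallest detectable cross section (m^2) for a transit time t (s),
    Johnson noise only, 5.1-sigma threshold for rare false counts."""
    i_johnson = math.sqrt(2 * k_B * T / (R_L * t))  # rms in bandwidth 1/(2t)
    noise_rate = i_johnson / e                      # equivalent electrons/s
    return threshold_sigma * noise_rate / (rate * eta)

print(sigma_min(140e-6))  # ~1.7e-14 m^2: roughly a 0.2 um PSL sphere
print(sigma_min(2e-3))    # ~4.6e-15 m^2 for the slower 2 ms transit
```

The 1/√t scaling is why accepting the slower circuit buys sensitivity.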
If the detector is a photon-counting photomultiplier tube (PMT, η ≈ 0.2), the noise is
dominated by the counting statistics, and the technical speed limitation is removed (see
Section 3.6.1). However, PMTs have a certain rate of spurious dark counts. Dark counts
generally obey Poisson statistics, so if we assume a mean rate Ndark = 200 Hz, then in a
140 μs measurement, the probability of at least one dark count is 200 × 140μs ≈ 0.028.
We are clearly in a different operating regime here, where the fluctuations in the dark
count are not well described by additive Gaussian noise as in the previous case. From
Section 13.6.16, the probability of a Poisson process of mean rate λ per second producing
exactly M counts in t seconds is
P (M) = (λt)M e−λt /M!
If we get 200 photons per second, then the probability that a second photon will
arrive within 140 μs of the first one is (0.028)e−0.028 ≈ 0.027, so we expect it to happen
200 × 0.027 ≈ 5.5 times a second. Two or more additional photons will arrive within
140 μs of the first one about 0.076 times per second, and three or more additional photons
only 0.0007 times per second, so if we require at least four counts for a valid event, the
false count rate will be one every 20 minutes or so, which is usually acceptable. The
limiting value σmin is then
σmin ≈ 4/(5 × 1021 η t) = 8 × 10−22/(η t),
which is about 3 × 10−17 m2 , nearly three orders of magnitude better than the photodiode,
and sufficient to detect a PSL sphere of 0.08 μm (alas for particle counters, the signal
goes as a6 , where a is the particle radius). We can use the same Poisson statistics to predict the probability of detection,
that is, P (≥ 4 photons) as a function of σtot .
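The Poisson bookkeeping above is worth verifying numerically; this sketch uses the 200 Hz dark count rate and 140 μs window assumed in the text:

```python
import math

def p_poisson(m, lam, t):
    """P(exactly m counts) for a Poisson process of mean rate lam in time t."""
    mu = lam * t
    return mu**m * math.exp(-mu) / math.factorial(m)

lam, t = 200.0, 140e-6    # dark count rate (Hz), transit-time window (s)

# Rate at which a second dark count follows within 140 us of a first one:
pairs_per_s = lam * p_poisson(1, lam, t)                        # ~5.5 /s

# Two or more additional counts within the window:
triples_per_s = lam * (1.0 - sum(p_poisson(m, lam, t) for m in range(2)))

# Three or more additional counts (i.e. 4-count false events):
false_events_per_s = lam * (1.0 - sum(p_poisson(m, lam, t) for m in range(3)))
minutes_between_false_counts = 1.0 / false_events_per_s / 60.0  # ~20 min
```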
Besides detector capacitance, such measurements are often limited by the shot noise
of light from the background, from imperfect beam dumps, or from molecular Rayleigh
scatter (as in the blue sky), so that our photon budget is not a sufficient theoretical basis
for the planned measurement. More detail is available in Chapter 10. In addition, we have
here assumed a very simple deterministic model for signal detection in noise; any event
whose nominal detected signal is greater than our threshold is assumed to be detected.
This assumption is unreliable for signals near the threshold, and is dealt with in a bit
more detail in Section 13.6.15. Finally, we assumed that our noise was limited by the
use of a boxcar averaging function of width t, equal to the 3 dB width of the pulse. Even
with a priori knowledge of the arrival time of the particle, this is not the optimal case;
if the particles are moving faster or slower than we anticipate, a fixed averaging window
may be far from optimal. This is the topic of Section 13.8.10.
Example 1.11: Photon Budget for an Astronomical CCD Camera. An astronomical
telescope gathers photons from sources whose brightness cannot be controlled. It also
gathers photons from Earth, due to aurora, meteors, cosmic ray bursts, scattered moonlight, and human artifacts such as street lights. Extraneous celestial sources such as
the zodiacal light and the Milky Way are also important. It represents an interesting
signal detection problem: the signals that it tries to pull out of random noise are themselves random noise. The inherent noisiness of the signal is of less consequence in
the optical and infrared regions than in the radio region. The brightness of sky objects
is quoted in logarithmic relative magnitudes, with respect to standard spectral filters.
A look in Allen’s Astrophysical Quantities (affectionately known as “AQ”) reveals that
in the “visible” (V ) filter band, centered at 540 nm in the green, a very bright star
such as Arcturus or α Centauri has a magnitude mV = 0, and that such an object produces a total flux at the top of the atmosphere of about 3.8 nW/m2 , of which about
80% makes it to the Earth’s surface. A first magnitude star (mV = 1.0) is 100 times
brighter than a sixth magnitude star, which is about the limit of naked-eye detection in a
dark location. A one-magnitude interval thus corresponds to a factor of 100^0.2 ≈ 2.512
in brightness.†
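Converting between magnitudes and fluxes comes up constantly; a sketch using the 3.8 nW/m2 zero-magnitude flux quoted above:

```python
def flux_from_magnitude(m_v, f0=3.8e-9):
    """V-band flux (W/m^2, top of atmosphere) of a star of magnitude m_v,
    using the m_V = 0 flux of about 3.8 nW/m^2 quoted in the text."""
    return f0 * 100.0 ** (-m_v / 5.0)

# One magnitude interval: a factor of 100**0.2 ~ 2.512 in brightness.
step = flux_from_magnitude(0.0) / flux_from_magnitude(1.0)

# Five magnitudes (1st vs. 6th): a factor of 100.
ratio_1_to_6 = flux_from_magnitude(1.0) / flux_from_magnitude(6.0)
```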
Even in a dark location, the night sky is not perfectly dark; its surface brightness in
the V band is about 400 nW/m2 /sr, which at 2.3 eV/photon is about 1 × 1012 photons/s/
m2 /sr. An extended object such as a galaxy will be imaged over several resolution elements of the detector, whereas a point-like object such as a star will ideally be imaged
onto a single detector element. Without adaptive optics, the turbulence of the atmosphere during the (long) measurement time limits the resolution to the size of a seeing
disc of diameter 0.25″ (arc seconds) on the best nights, at the best locations, with 1″
being more typical, and 3″–5″ being not uncommon in backyard observatories. Thus
with enough pixels, even a stellar object produces an extended image, and the SNR
will be limited by the fluctuations in the sky brightness in a single pixel. Each pixel
subtends the same solid angle Ω on the sky, so the mean photoelectron generation rate
per pixel is
ntot = ηAΩ(Jsky + Jstar ),
where A is the area of the telescope objective, η is the quantum efficiency, n is in
electrons/s, and J is the photon angular flux density in photons/s/m2/sr.
There are two classes of noise source in the measurement: counting statistics of the
sky photoelectrons and of the dark current, which go as (nt)1/2 , and the (fixed) readout
noise nRO . The electrical SNR thus is
SNR = (Jstar ΩηAt)2/[n2RO + ndark t + tηAΩ(Jstar + Jsky )].    (1.83)
CCD pixels have a fixed full well capacity B, ranging from about 5 × 104 electrons for
the ones used in a camcorder to 106 for a scientific device. Thus the maximum exposure
time for a perfect device is equal to eB/isky , which is on the order of 1 week, so that the
well capacity is not a limitation for dim objects.
† The magnitude scale goes back to the ancient Greeks—the numerical factor is weird because the log scale
was bolted on afterwards and tweaked to preserve the classical magnitudes while being reasonably memorable.
The noise-equivalent photon flux density
(NEJ) is the number of photons per second required to achieve an SNR of 1 (0 dB)
in the measurement time. Setting the SNR in (1.83) to 1 yields a quadratic equation for
Jstar , whose solution is
NEJ = {1 + √[1 + 4(n2RO + (ndark + ηAΩJsky )t)]}/(2ηAΩt).
A 100 cm telescope with a 40% central obstruction (due to the secondary mirror) has
an area A = 0.66 m2 , and an angular resolution of about 0.14 arc seconds. A commercially available V filter has a transmittance of about 0.72 at the nominal peak.† Assuming
that the telescope itself has an efficiency of 0.8, due primarily to absorption in uncoated
aluminum primary and secondary mirrors, and is used with a cooled, back-surface illuminated CCD with η = 0.85 and B = 1.8 × 105 , the end-to-end quantum efficiency of
the telescope system is η ≈ 0.5. A good cooled CCD has a dark current of around 1
electron per second per pixel, and rms readout noise of about 5 electrons. With this
equipment, the NEJ becomes
NEJ = {1 + √[1 + 4(25 + (1 + 3.3 × 1011 Ω)t)]}/(0.66 Ωt),
with Ω in steradians and t in seconds.
This is dominated by readout noise at short exposures. At longer times, dark current fluctuations dominate for very small values of Ω, but sky background fluctuations dominate
for larger Ω. The largest SNR improvement comes from increasing integration time, as
shown in Figure 1.11a–d which are pictures of the galaxy M100 taken by Dennis di
Cicco using an 11 inch Schmidt–Cassegrain telescope at f/6.2 and an SBIG ST-6 CCD camera.‡
With an exposure time of an hour, and a 3 × 3 arc second pixel size (favored by many
amateurs, whose CCD budgets don’t run to as many pixels as the professionals’), Jmin
is 2.2 × 109 , which corresponds to a surface brightness of about 28 magnitudes per square
arc second, which is how this is usually quoted in astronomical circles.
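The NEJ calculation is compact enough to script end to end. This sketch solves the SNR = 1 condition for the star flux, using the figures assumed in this example (η = 0.5 end to end, A = 0.66 m2, Jsky = 1 × 1012 photons/s/m2/sr, 1 e−/s dark current, 5 e− rms readout noise); it lands within about 10% of the Jmin quoted for a 1 hour exposure with 3 arc second pixels.

```python
import math

def nej(t, pix_arcsec, eta=0.5, area=0.66, j_sky=1e12, n_dark=1.0, n_ro=5.0):
    """Noise-equivalent star flux (photons/s/m^2/sr) giving SNR = 1 in time t,
    for a square pixel pix_arcsec on a side.  Default parameter values are
    the assumed figures of this example."""
    arcsec = math.radians(1.0 / 3600.0)           # one arc second in radians
    omega = (pix_arcsec * arcsec) ** 2            # pixel solid angle (sr)
    g = eta * area * omega                        # e-/s per (photon/s/m^2/sr)
    var = n_ro**2 + (n_dark + g * j_sky) * t      # noise variance (electrons^2)
    x = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * var))  # signal electrons for SNR = 1
    return x / (g * t)

j_min = nej(t=3600.0, pix_arcsec=3.0)             # ~2e9 photons/s/m^2/sr
```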
If we were merely users of this instrument, that would be that. However, we are the
designers, so we have complete flexibility to trade off the operating parameters. What
should Ω be? For detection of faint objects, we set Ω as large as is consistent with
adequate image resolution, so that the photoelectrons dominate the dark current. For
stellar objects, such as star clusters, we want Ω small, since if all the light from each
star is going on one pixel anyway, shrinking Ω reduces the sky light while keeping the
dark current and signal current the same. For a pixel size of 0.22 arc seconds, Jmin is
1.1 × 1011 , which means that with a 1 hour exposure on a really great seeing night,
we could detect a star of magnitude 25.6 with 3σ confidence. (We’ve left out the noise
contributed by the calibration process, which may be significant if there aren’t enough
calibration frames—see Section 3.9.19.)
† Optec Corp., Model PFE-1 Technical Manual .
‡ Michael V. Newberry, The signal to noise connection. CCD Astronomy, Summer 1994. (Copyright © 1994
by Sky Publishing). Reprinted with permission.
Figure 1.11. CCD image of M100: (a) 1 s integration, (b) 10 s integration, (c) 100 s integration,
and (d) 1000 s integration.
Once a photocurrent leaves the detector, the work of the optical system is done and the
maximum SNR has been fixed by the photon budget. The signal processing job is to
avoid screwing this up by needlessly rejecting or distorting the signal, on one hand, or
by admitting noise unnecessarily, on the other. Signal processing operates under physical
constraints, having to do with maximum allowable time delays, and practical ones such
as complexity and cost. Most of the time, the inherently highly parallel data coming
in as light is converted to a serial electrical channel, so that electrical wave filtering is the natural signal processing tool.
1.8.1 Analog Signal Processing
The first bit of signal processing is the detector front end amplifier, in which should
be included any summing or subtracting of photocurrents, for example, in a differential
position measurement using a split detector. As we saw in Example 1.10, a bad front
end can hobble the system by adding noise and having too narrow a bandwidth with a
given detector. Most designers are uncomfortable designing front ends, and many wind up
crippling themselves by buying a packaged solution that isn’t appropriate to the problem;
there’s detailed help available in Chapters 3 and 18.
In a baseband (near DC) measurement, the next part is usually filtering and then
digitizing (AC signals will usually need frequency conversion and detection too). The
filter should roll off by at least 6N dB (where N is the number of bits) at the Nyquist
limit of the digitizer (see Section 17.4.4) and should reject strong spurious signals
(e.g., 120 Hz from room lights) that would otherwise overwhelm subsequent stages or
require more bits in the digitizer. The bandwidth of the filter should be the same as
that of the signal. Although some of the signal is lost, maximum SNR occurs when the
filter’s frequency response is identical with the signal’s power spectrum. Filters wider
than that allow more noise in than necessary, and narrower ones clip off too much signal.
Some filters have much better time responses than others. Have a look at the Bessel and
equiripple group delay filters of Section 15.8.4. To get the best SNR with pulses, use a
matched filter (Section 13.8.10).
We need to be able to convert time-domain to frequency-domain specifications.
Remember that a 1 s averaging window corresponds to a bandwidth of 0.5 Hz at DC.
The reason is that in the analytic signal picture negative frequencies get folded over
into positive ones. However, that same 1 s window and the same folding, applied to an
AC measurement, gives 0.5 Hz both above and below the (positive frequency) carrier, a
total of 1 Hz. The result is that a baseband (near-DC) measurement has half the noise
bandwidth† of an AC measurement with the same temporal response. The resulting
factor of 3 dB may be confusing.
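The factor of 2 is easy to demonstrate with simulated white noise: an N-point boxcar average passes 1/N of the input noise power, i.e. a one-sided noise bandwidth of fs/(2N), so a 1 s average passes 0.5 Hz at baseband. A sketch (the sample rate and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
fs, n_avg = 1000.0, 1000              # 1 kHz sampling; 1000 points = 1 s boxcar
x = rng.standard_normal(10_000_000)   # white noise, unit variance

# Non-overlapping 1 s averages; each has variance ~1/n_avg for white noise.
y = x.reshape(-1, n_avg).mean(axis=1)
power_gain = y.var()                  # ~1e-3

# Unit-variance noise occupies the one-sided band 0..fs/2, so the
# equivalent noise bandwidth of the 1 s average is:
enbw = power_gain * (fs / 2.0)        # ~0.5 Hz
```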
The digitizing step requires care in making sure that the dynamic range is adequate;
an attractively priced 8 bit digitizer may dominate the noise budget of the whole instrument.
For signals at least a few ADUs‡ in size, an ideal digitizer contributes additive
quantization noise of 1/√12 ADU to the signal, but real A/Ds may have as much as
several ADUs of technical noise and artifacts (up to ∼100 ADUs for ADCs, see
Section 14.8.3), so check the data sheet. Converter performance will degrade by 1–3
bits’ worth between DC and the Nyquist frequency. Bits lost here cannot be recovered
by postprocessing, so be careful to include a realistic model of digitizer performance in
your conceptual design.§
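The 1/√12 figure comes from the quantization error of a busy signal being uniformly distributed over ±1/2 ADU; a quick simulation (illustrative values, not from the text) confirms it:

```python
import numpy as np

rng = np.random.default_rng(1)
signal = rng.uniform(0.0, 256.0, size=200_000)  # busy signal, many ADUs in size
quantized = np.round(signal)                    # ideal quantizer, 1 ADU steps
err = quantized - signal                        # uniform on (-0.5, +0.5) ADU

rms_error_adu = err.std()                       # ~1/sqrt(12) ~ 0.289 ADU
```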
1.8.2 Postprocessing Strategy
With the current abundance of computing power, most measurements will include a fair
bit of digital postprocessing. This may take the form of simple digital filtering to optimize
the SNR and impulse response of the system, or may be much more complicated, such as
digital phase-locked loops or maximum likelihood estimators of signal properties in the
presence of statistically nonstationary noise and interference. In general, the difference
between a simplistic postprocessing strategy and an optimal one is several decibels; this
† Noise bandwidth is the equivalent width of the power spectrum of the filter (see Section 13.2.5). If we put
white noise into our filter, the noise bandwidth is the output power divided by the input noise power spectral
density (in W/Hz), corrected for the passband insertion loss of the filter.
‡ An ADU (analog-to-digital converter unit) is the amount of signal required to cause a change of 1 in the least
significant bit of the converter.
§ High-end oscilloscopes are a partial exception—they overcome timing skew and slew-dependent nonlinearity
by calibrating the daylights out of themselves. They cost $100k and are only good to 6 bits.
may seem trivial, but remember that a 3 dB improvement in your detection strategy
can sometimes save you half your laser power or 2/3 of the weight of your collection
optics. (As an aside, calling it postprocessing doesn’t mean that it can’t be happening in
real time.)
1.8.3 Putting It All Together
Let’s catch our breath for a moment. We began this chapter with the aim of learning how
to do a complete feasibility calculation for an optical instrument, which we should now
be fully equipped to do. We have covered a lot of ground in a short time, so don’t be
discouraged if it takes a while for it all to slot together. It becomes easier with practice,
and standing around with one or two smart colleagues doing this stuff on a white board
is the most fun of any technical activity. To sum up, a conceptual design goes like this:
1. Write down what you know . Get a handle on the basic physics, figure out which
is the largest effect operating, and estimate how much smaller the next biggest
one is.
2. Think up a measurement principle. This may be routine, or may take a lot of
imagination. Use it to take a rough cut at your photon budget. From that, estimate,
for example, the laser power required, the size of the detection area needed, and
so on.
3. Simplify the analysis. Use limiting cases, but estimate where they will break down.
4. Make a very rough optical design. Choose the NAs, wavelength, working distances,
and so on. Check that it makes physical and economic sense.
5. Guess a detection and signal processing strategy. One might choose a large germanium photodiode operating at zero bias, followed by a somewhat noisy front
end amplifier, a Butterworth lowpass filter, and a fixed threshold. Watch out for
sources of systematic error and drift (e.g., etalon fringes or spectral changes in the source).
6. Make a detailed photon budget. See if your scheme will do the job, without unrealistically difficult steps.† If so, you have a measurement. If not, try step 5 again.
If no amount of background reduction, low noise circuitry, or signal processing
will help, go back to step 2 and think harder.
7. Check it for blunders. Do it all over again a different way, and using scaling
arguments to make sure the power laws are all correct. If you go ahead with this
instrument, a lot will be riding on the correctness of your calculation—don’t scrimp
on time and effort here. If you have one or two colleagues who are difficult to
convince, try it out on them, to see if they can poke any holes in your logic.
Remember that what will get you is not misplacing a factor of Boltzmann’s constant—
that’s large and obvious—but rather the factors of order 1 that you haven’t thought out
carefully. These are things like using the peak value when the rms is what’s relevant,
or forgetting that when you multiply two similar peaks together the result is √2 times
narrower, or assuming that the bandwidth of a 1 s averaging window is 1 Hz (see
† As in the light bulb spectrometer of Example 17.9.
Figure 1.12. The ISICL sensor is an alignment-insensitive scanning interferometer for finding
submicron particles in hostile places such as plasma chambers.
Section 13.2.5). A few of these taken together can put you off by factors of 10 or
more—very embarrassing.
Estimating and controlling systematic errors is one of the high arts of the designer,
since they don’t obey any nice theorems as random errors sometimes do. For now we’ll
just try to keep the optical system simple and the processing modest and linear. Getting
too far away from the measurement data is a ticket to perdition.
Let’s do an extended example that illustrates many of the concepts of this chapter.
Example 1.12: Conceptual Design of a Shot-Noise-Limited, Scanning Interferometer.
The ISICL (in situ coherent lidar) system† detects submicron (>0.25 μm) contaminant
particles in plasma chambers and other semiconductor processing equipment. As shown
in Figure 1.12, it works by scanning a weakly focused laser beam around inside the
chamber through a single window, and detecting the light backscattered from particles.
Backscatter operation is very difficult. Between the strong plasma glow and the stray
laser light bouncing off the back wall of the chamber, it must deal with very bright stray
light—thousands of times worse than that seen by an ordinary dark-field detector, and
around a million times brighter than a nominally detectable particle. Coherent detection
is probably the only feasible measurement strategy. With a coherent detector, a particle
crossing near the focus of the beam gives rise to a short (3 μs) tone burst, whose
duration is the transit time of the beam across the particle and whose carrier frequency
is the Doppler shift from its radial motion. These tone bursts are filtered and amplified,
then compared with a threshold to determine whether a particle event has occurred. Since
† P. C. D. Hobbs, ISICL: in situ coherent lidar for particle detection in semiconductor processing equipment.
Appl. Opt. 34(9), 1579–1590 (March 1995).
the phase of the tone burst is random, the highest peak may be positive or negative, so
bipolar thresholds are used.
The sensitivity of an instrument is expressed by the minimum number of photons it
requires in order to detect a particle, which is a function of the confidence level required.
A false alarm rate of 10−11 in the measurement time corresponds to about 1 false count
per day with bipolar thresholds in a 1 MHz bandwidth. From Section 13.6.15, the false
count rate depends only on the measurement bandwidth and the ratio α of the threshold
to the rms noise voltage. A false count rate of 10−11 per inverse bandwidth requires
α = 7.1.
In an AC measurement, the shot noise level is equivalent to a single coherently detected
noise photon in the measurement time, so it might seem that 7 scattered photons would be
enough. Coherent detectors detect the amplitude rather than the intensity of the received
light, however, so to get a signal peak 7.1 times the rms noise current, we actually need
(7.1)2 ≈ 50 photons per burst.
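The threshold arithmetic above can be sketched as follows, using the Gaussian-noise approximation that the false alarm rate per inverse bandwidth goes as exp(−α2/2) (consistent with the α = 7.1 figure quoted; the exact treatment is in Section 13.6.15):

```python
import math

def alpha_for_far(far_per_inverse_bw):
    """Threshold-to-rms ratio alpha for a given false alarm rate per inverse
    bandwidth, from the Gaussian-noise scaling FAR ~ exp(-alpha**2 / 2)."""
    return math.sqrt(-2.0 * math.log(far_per_inverse_bw))

alpha = alpha_for_far(1e-11)                   # ~7.1
photons_needed = alpha**2                      # amplitude detection: ~50/burst
false_counts_per_day = 1e-11 * 1e6 * 86400.0   # 1 MHz bandwidth: ~1 per day
```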
We have a first estimate of how many photons we need, so let’s look at how many
we expect to get. For a particle in the center of the sensitive region, the received power
is the product of the transmit beam flux density times the differential scattering cross
section ∂σ/∂Ω of the particle, times the detector projected solid angle Ωd . Working in
units of photons is convenient, because the relationship between photon counts and shot
noise means that electrical SNR is numerically equal to the received photon count per
measurement time. This works independently of the signal and filter bandwidth. Initially
we ignore the losses imposed by the matched filter.
For a Gaussian transmit beam of power P at wavelength λ, focused at a numerical
aperture NA, Table 1.1 gives the photon flux density at the beam waist as
J (P , λ, NA) = 2π(NA)2 P/(hcλ).
Assuming the scattered field is constant over the detector aperture, Table 1.1 predicts
that the effective detector solid angle is
Ωd = π(NA)2
and so the expected number of photons available per second is
n0 = [2π 2 (NA)4 P/(hcλ)] ∂σ/∂Ω.    (1.88)
Not all of these will be detected, due to imperfect efficiency of the optics and the
detector. The quantum efficiency η of the detector is the average number of detection
events per incident photon on the detector; it’s always between 0 and 1. (Photodetectors generally give one photoelectron per detection event, so ηn0 is the number of
photoelectrons before any amplification.) A good quality, antireflection coated silicon
photodiode can have η ≈ 0.95 in the red and near IR, but there are in addition front surface reflections from windows and other optical losses. A receiver whose end-to-end
efficiency is over 0.5 is respectable, and anything over 0.8 is very good. A value
of 0.9 can be achieved in systems without too many elements, but not cheaply. (We
should also multiply by the square of the Strehl ratio to account for aberrations; see
Example 9.6.)
The SNR can be traded off against measurement speed, by the choice of scanning
parameters and filter bandwidths. Narrowing the bandwidth improves the time-averaged
SNR but smears pulses out. In a pulsed measurement, the optimal filter is the one that
maximizes the received SNR at the peak of the pulse. For pulses in additive white
Gaussian noise, the optimum filter transfer function is the complex conjugate of the
received pulse spectrum (this filter is not necessarily the best in real life, but it’s an
excellent place to start—see Section 13.8.10). Such a matched filter imposes a 3 dB
signal loss on a Gaussian pulse.† However, the measurement detects threshold crossings, and for the weakest detectable signals,
the peaks just cross the threshold. Thus
this 3 dB is made up by the factor of √2 voltage gain from the peak-to-rms ratio, so
that the minimum detectable number of photons (in the deterministic approximation)
is still α 2 .
We can estimate the stray light in the following way. Assume a detector requiring
50 photons for reliable particle identification, operating in backscatter with a 100 mW
laser at 820 nm and NA = 0.008. Assume further (optimistically) that the back wall of
the chamber is a perfectly diffuse (Lambertian) reflector, so that the light is scattered
into π sr. The incident beam has 4 × 1017 photons/s, so that the backscattered stray light
has average brightness 1.3 × 1017 photons/s/sr; the detector solid angle is π(NA)2 ≈
0.0002 sr, so the total detected signal due to stray light is about 2.6 × 1013 photons per
second. Assuming that a particle takes 3 μs to yield 50 photons (1.7 × 107 photons/s),
the stray light is 106 times more intense than the signal from a nominally detectable
particle. What is worse, the signal from the back wall exhibits speckles, which move
rapidly as the beam is scanned, giving rise to large (order-unity) fluctuations about the
average stray light intensity. The size of the speckles (and hence the bandwidth of the
speckle noise) depends on the distance from the focus to the chamber wall and on the
surface finish of the wall. Peak background signals are generally much larger than this.
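Here is the same stray light budget as a sketch, with the assumed values (100 mW, 820 nm, NA = 0.008, Lambertian back wall) written out:

```python
import math

hc = 6.626e-34 * 2.998e8            # Planck constant times c (J*m)

p, lam, na = 0.1, 820e-9, 0.008     # laser power (W), wavelength (m), NA
n_inc = p * lam / hc                # incident photons/s, ~4e17
brightness = n_inc / math.pi        # Lambertian wall: photons/s/sr, ~1.3e17
omega_d = math.pi * na**2           # detector solid angle, ~2e-4 sr
n_stray = brightness * omega_d      # detected stray light, ~2.6e13 photons/s

n_signal = 50 / 3e-6                # 50 photons in a 3 us transit, ~1.7e7 /s
ratio = n_stray / n_signal          # ~1e6: stray vs. nominally detectable
```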
The Doppler frequency shift in the detected signal due to a particle traveling with
velocity v encountering incident light with wave vector ki and scattering it into a wave
with wave vector ks is
fd = v · (ks − ki )/(2π ).
For a system operating in backscatter, ks − ki ≈ −2ki . At 820 nm, a particle moving
axially at 50 cm/s will give rise to a tone burst whose carrier frequency is about 1.22 MHz.
This is the nominal maximum particle axial velocity for ISICL.
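As a quick check of the Doppler arithmetic (in backscatter the shift is fd ≈ 2v/λ):

```python
lam = 820e-9                 # laser wavelength (m)
v_axial = 0.5                # particle speed along the beam axis (m/s)
f_d = 2.0 * v_axial / lam    # backscatter Doppler shift, ~1.22 MHz
```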
The remaining engineering problems center on the choice of detection bandwidths
and the accurate estimation of the shot noise level in the presence of large signals due
to particles. In the present case, where the Doppler shift may be large compared to the
transit time bandwidth, the peak frequency of the received spectrum of a given pulse is
not known in advance. The optimal signal processing system depends on the range of
particle velocities expected. In quiet plasma chambers, where most particles orbit slowly
within well-defined traps, the maximum expected velocity may be as low as 5 cm/s,
† M. Skolnik, ed., Radar Handbook , 2nd ed. McGraw-Hill, New York, 1990, pp. 3.21–3.23.
whereas in an environment such as a rapidly stirred fluid tank or the roughing line of
a vacuum pump, the velocity range may be much greater. The scan speed of the beam
focus through the inspected volume is much higher (about 20 m/s).
With carrier frequencies ranging from 0 to 1.22 MHz, the Doppler bandwidth is much
larger than the transit time bandwidth (150 kHz for a 3 μs burst FWHM), so that it is
inefficient to perform the thresholding operation in a single band. In the present system,
four bands are used to cover the Doppler bandwidth. This is frequently best in low NA
systems like ISICL, where the focus is many wavelengths across.
In a thresholding operation, it is essential to set a high enough threshold that the sensor
does not report erroneously high particle counts, possibly resulting in needless down time
for the processing tool being monitored. At the same time, it is economically important
to use the available laser power as efficiently as possible; at the time, the laser used in
this sensor cost over $1200, so that (loosely speaking) too-high thresholds cost $400 per
decibel. The signal processing strategy is to set separate bipolar thresholds for each band,
using an automatic thresholding circuit. This circuit exploits the accurately known noise
amplitude statistics to servo on the false counts themselves and ignore signal pulses,
however large they may be. In radar applications, this is known as a CFAR (constant
false alarm rate) servo; the technologies employed are quite different, however, since a
radar system can look at a given target many times, and its noise is very non-Gaussian.
The ISICL false alarm rate (FAR) tracker can accurately servo the FAR at a level much
below the true count rate in most applications.
Figure 1.13 shows a typical tone burst from a 0.8 μm diameter PSL sphere, together
with cursors showing the predicted peak-to-peak voltage for a particle passing through
the center of the sensing volume. For a particle exactly in focus, the photon flux predicted
Figure 1.13. Measured tone burst caused by a 0.8 μm polystyrene latex (PSL) sphere crossing
the beam near focus. The horizontal cursors are the predicted peak-to-peak value of the output.
The bandwidth of the signal shown is several times wider than the actual measurement bandwidth,
which prevents distortion of the tone burst but increases the noise background.
by (1.88) is converted to a signal current iAC by (1.68), and to a voltage by multiplying by
the known current to voltage gain (transimpedance) of the front end and any subsequent
amplifiers. Because of the known relationship between the signal size and the shot noise
in a coherent detector, we can check the ratio of the rms noise to the signal easily as
well (the aberration contribution is calculated in Example 9.6). The measured system
parameters are NA = 0.0045, P = 90 mW, λ = 820 nm, η = 0.64. Taking into account
the addition of the noise and signal, the error is less than 20% (1.6 dB electrical),
indicating that the theory correctly predicts the signal size and SNR.
Sources and Illuminators
And God saw the light, that it was good: and God divided the light from the darkness.
—Genesis 1:4 (Authorized Version)
In order to make an optical measurement, we need a light source. In some systems, the
only source required is ambient illumination or the luminosity of the object itself. More
often, though, the instrument must supply the light. Instrument designers are less likely
to neglect the light source than the detector, but still it is often chosen without proper
consideration of the pain and suffering its deficiencies may cause one, or without regard
to the special attributes of an alternative.
This chapter deals with light sources and illumination systems, dwelling on their
strengths and weaknesses in applications, rather than on the physics of how they work.
We stick with the mainstream choices: gas, solid state, and diode lasers, tungsten bulbs,
arc lamps, and LEDs. There are other interesting sources, useful in special cases, many
of which are discussed in the OSA Handbook.
Dye laser catalogs usually have a nice chart showing the electromagnetic spectrum from
the infrared (IR), through the surprisingly narrow visible range, to the ultraviolet (UV),
short UV, and vacuum UV (VUV). Electrodynamics doesn’t alter its character between
regions, but the interaction of light and matter does change systematically with wavelength.
Visible Light
The visible part of the spectrum is conventionally taken to be the octave from 400 to
800 nm. Any measurement that can be done with visible light should be.† A visible light
† There
are rare exceptions, such as IR laser based intraocular measurements, where the use of visible light
would be very uncomfortable, or solar blind UV photomultipliers for seeing dim UV sources in normal daylight.
Building Electro-Optical Systems, Making It All Work, Second Edition, By Philip C. D. Hobbs
Copyright © 2009 John Wiley & Sons, Inc.
Figure 2.1. Relative response function of human cone cells versus wavelength. Note the almost
complete overlap of the red and green pigments, leading to good wavelength discrimination in the
yellow and less elsewhere. Note: The blue channel has been normalized to the same height as the
others, but it’s much smaller in reality.
system is enormously easier to align, use, and find parts for than any other kind. If lasers
are involved, it’s also somewhat safer for your eyes, because of the blink reflex and
because you can see the stray beams and block them all before they leave your setup.
The wavelength resolution of human color perception is wildly variable, due to the
pronounced overlap of the sensitivities of the retinal pigments, as shown in Figure 2.1.†
Ultraviolet

The UV begins at about 400 nm and extends to the border of X-rays at a few nanometers.
Near UV (320–400 nm) is generally safe and convenient, because lenses still work. Deep
UV (180–320 nm) is less safe, and lens materials are generally scarce there and may be
darkened by exposure. The most miserable region is the vacuum UV (below about 180 nm), where
O2 dissociates to produce ozone, and room air and everything else become strongly
absorbing. Beware of insidious skin and corneal burns from bright deep-UV sources.
Ultraviolet photons are energetic enough to cause ground state electronic transitions
in gases and solids, so every material known becomes strongly absorbing somewhere in
the UV. Finding optical materials becomes very difficult around 200 nm, another reason
† The blue channel is about 10 times less sensitive, so the light-adapted eye has its peak sensitivity at 550 nm,
with its 50% points at 510 and 610 nm.
for using vacuum systems. This same property makes solid state UV detectors difficult
to build; photocathodes are the usual choice in the VUV.
Infrared

The IR starts at about 800 nm, although since the human eye is nearly blind beyond 700,
the point at which light becomes invisible depends on its brightness. Out to about 2.5 μm
is the near infrared, defined roughly as the region in which glass lenses work and decent
photodiodes are available. From there to about 10 μm is the mid-IR, where radiation
from room temperature objects is still not a critical limitation, and beyond 10 μm almost
to 1 mm is the far-IR, where everything is a light bulb, including the detector itself.
The infrared contains the fundamental vibrational frequencies of molecular bonds
(e.g., C—H), so most of the interesting spectroscopy happens there. Infrared spectral
lines are often combinations of rotational and vibrational transitions, whose frequencies
add and subtract; it is more convenient to quote such systems in terms of frequency than
wavelength. The usual frequency unit is not hertz, but wave numbers,† that is, how many
waves fit in one centimeter. Fortunately, the speed of light is a round number to three
decimal places, so the rough conversion that 1 cm⁻¹ ≈ 30.0 GHz is convenient for rapid
calculation. Interesting spectroscopy happens between 100 cm⁻¹ and about 5000 cm⁻¹
(100 μm to 2 μm). Note the near-total failure of interesting spectroscopy and good
detectors to overlap. Lots of work has gone into that problem.
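The wavenumber bookkeeping above is easy to automate. Here is a minimal sketch of the conversions (function names are illustrative, not from the text):

```python
# Sketch: converting spectroscopists' wave numbers (cm^-1) to frequency
# and vacuum wavelength. Function names are illustrative.
C_CM = 2.99792458e10  # speed of light in cm/s

def wavenumber_to_hz(sigma_cm):
    """Wave numbers (cm^-1) to frequency in Hz: nu = c * sigma."""
    return C_CM * sigma_cm

def wavenumber_to_um(sigma_cm):
    """Wave numbers (cm^-1) to vacuum wavelength in micrometers."""
    return 1e4 / sigma_cm  # 1 cm = 1e4 um

# The rough rule that 1 cm^-1 is about 30.0 GHz:
print(wavenumber_to_hz(1) / 1e9)  # ~29.98 GHz
# The 100-5000 cm^-1 range quoted in the text:
print(wavenumber_to_um(100), wavenumber_to_um(5000))  # 100 um, 2 um
```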
In order to be able to choose between illuminators, we need some way to compare
them. Measurement and comparison of illumination conditions is the aim of radiometry.
Radiometry is an almost completely isolated subfield of optics. This is a bit strange, since
all designers need to know how much power crosses a given surface element into some
solid angle, in some frequency interval, and most of the time they do. It sounds simple.
It would be simple, too, if it weren’t for the names.
Radiometric nomenclature is important in fields such as remote sensing and architecture, but because it’s such a horrible mishmash, it remains a foreign language to most
instrument designers, engineers as well as physicists. For one thing, the names have no
mnemonic value of their own, and preexisting terms that everybody understands (intensity and brightness) have been redefined to mean something completely different; for
another, there’s an impenetrable thicket of redundant units in various subfields. Ideally,
technical language provides a concise method of communicating difficult concepts precisely, and radiometric and photometric nomenclature is an excellent example of how
not to do that.
We’ll eschew the footcandles per fortnight and stick with something the author can
cope with: L_ν̄(ν, θ), the power emitted per square meter per steradian per hertz.‡ This is
the most natural description of a low coherence source such as a light bulb. Its official
name is something out of Edgar Allan Poe—spectral radiance—and it has a welter of
† The official name for the inverse centimeter is the kayser, which nobody uses.
‡ The subscript Greek nu is easily confused with subscript v, which means something else, so we write it with an overbar, ν̄, instead.
other names and symbols. Everything else is an integral of the spectral radiance over one
or more of the quantities in the denominator: frequency, solid angle, area. You can find
charts and tables and explanations of this in several sources. One thing to remember is
that radiometrists like to do things in per-wavelength units instead of the per-frequency
units that most physicists and engineers tend to prefer, leading to mysterious factors of
ν² turning up on account of the chain rule.
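The chain-rule factor is |dν/dλ| = c/λ² = ν²/c, which a two-line sketch makes concrete (names here are illustrative):

```python
# Sketch of the chain rule connecting per-frequency and per-wavelength
# spectral densities: |d nu / d lambda| = c/lambda^2 = nu^2/c -- the
# "mysterious factors of nu^2". Function names are illustrative.
C = 2.99792458e8  # speed of light, m/s

def per_hz_to_per_m(L_nu, wavelength_m):
    """Convert L_nu (W/m^2/sr/Hz) to L_lambda (W/m^2/sr/m)."""
    return L_nu * C / wavelength_m**2

def per_hz_to_per_m_via_nu(L_nu, nu_hz):
    """Same conversion written with the nu^2 factor."""
    return L_nu * nu_hz**2 / C

lam = 550e-9
nu = C / lam
L_nu = 1.0e-12  # arbitrary spectral radiance, W/m^2/sr/Hz
print(per_hz_to_per_m(L_nu, lam))
print(per_hz_to_per_m_via_nu(L_nu, nu))  # same number
```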
Unless you need to communicate with diehard radiometrists, and don’t care that few
others will understand, stick with SI units written out in full. Use flux for total power
through a surface, flux density for power per unit area, and brightness or radiance for
flux density into unit solid angle. Watts per square meter per steradian per hertz are
widely understood, internationally standardized, and won’t get you into trouble, even if
you can’t tell a metrelambert from an apostilb.†
Photometry is similar, except squishier: the aim there is to compare illumination
conditions not in W/Hz, but by what a standardized human eye defines as brightness.
Since human eyes have different spectral sensitivities under different lighting conditions
(rods vs. cones), this is a bit fraught, so it is solved by fiat: there are two published
curves for the scotopic (dark adapted) and photopic (light adapted) response of the eye,
and you just multiply Lν by that and integrate over ν to get the visual brightness or
luminance, measured in lumens/m²/sr. The photopic curve peaks at 550 nm, and is down
50% at 510 and 610 nm. The unit of “photometric power” is the lumen, corresponding
roughly with the watt; for historical reasons 1 W at 555 nm (540 THz) is equivalent to
683 lumens, probably the largest prime number ever used for unit conversion.
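The radiometric-to-photometric integral can be sketched numerically. The Gaussian stand-in for the photopic curve below is a crude assumption (centered at 555 nm with roughly the 50% points quoted above); real photometry uses the tabulated CIE V(λ):

```python
import math

# Crude sketch: luminous flux from a spectrum, 683 * integral of
# Phi_lambda(lam) * V(lam) d lam. The Gaussian V(lambda) is an assumed
# stand-in for the CIE photopic curve, illustrative only.
KM = 683.0  # lm/W at the photopic peak

def v_photopic(lam_nm):
    """Gaussian approximation: peak 555 nm, ~100 nm FWHM."""
    sigma = 100.0 / (2 * math.sqrt(2 * math.log(2)))
    return math.exp(-0.5 * ((lam_nm - 555.0) / sigma) ** 2)

def lumens(spectrum, lam_lo=400.0, lam_hi=750.0, n=3500):
    """Trapezoid-rule integral; spectrum(lam) gives W/nm at lam in nm."""
    h = (lam_hi - lam_lo) / n
    total = 0.0
    for i in range(n + 1):
        lam = lam_lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * spectrum(lam) * v_photopic(lam)
    return KM * total * h

# 1 W spread over a 10 nm band at the peak comes out near 683 lm:
narrow = lambda lam: 0.1 if 550.0 <= lam <= 560.0 else 0.0
print(lumens(narrow))  # within a few percent of 683
```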
The reason we can’t completely ignore all this is that making accurate measurements
of optical power, flux, fluence, and so on is hard , which is why there are people who
specialize in it. A decent DVM costing $200 will measure voltage and current to an
accuracy of 0.1% or so over a dynamic range of 120 dB. Nothing remotely comparable
exists for optical power measurements: ±1% is quite respectable even for narrow signal
level ranges. The skill of radiometry is in making decent measurements, not in expressing
them in units.‡
A basic distinction between sources is whether their outputs are predominantly in a spectral continuum or in isolated spectral lines. Lasers and low pressure gas discharge sources
have narrow line spectra, high pressure discharge tubes (such as arc lamps and flashlamps)
produce a mix of broadened lines and continuum, and incandescent objects produce continuum only. Within each class, there are distinctions as to how wide the spectral features
are. The groupings are somewhat rough-and-ready, but nonetheless useful.
In practical applications, the other main distinction is spatial coherence—basically
how much of your source light you can corral into your optical system, and how small
a spot you can focus it into. Piping laser beams around is easy, because the beam is
so well behaved; all the light can be pointed in the same direction. Other sources are
not so tractable. Large diameter thermal sources cannot be focused down as tightly as
† They’re the same thing.
‡ One of the most heroic feats of radiometric calibration to date is the Cosmic Background Explorer (COBE) satellite—see Kogut et al., Astrophysical Journal 470(2), 653–673 (1996).
small diameter ones. Since we’re always trying either to get more photons or to reduce
the source power (to save cost, weight, and electric power), extremely low coherence
sources such as large arc lamps or tungsten bulbs present problems in illuminator design.
Black Body Radiators
Electromagnetic radiation from an object in thermal equilibrium with its surroundings
is called black body radiation. Since the closest thing to pure black body radiation is
generated experimentally by putting a small hole into a hot cavity, the name may seem
odd; however, it is the same as the radiation emitted by a purely theoretical object whose
coupling to the electromagnetic field is perfect—every photon hitting it is absorbed. More
subtly, by the second law of thermodynamics, it must have the maximum emittance as
well. If this were not so, a black body would spontaneously heat up or cool down in
isothermal surroundings, since it would absorb every photon incident but emit more or
fewer. Thus a black body forming one wall of an isothermal cavity must emit a radiation
spectrum exactly the same as that which fills the cavity. Since resonator effects in a large
cavity are slight, they can modify the black body’s spectrum only slightly (asymptotically,
not at all), so a black body radiates cavity radiation regardless of where it is placed. This
is a remarkable result.
Such an object would indeed appear black when cold, and many objects have emission spectra that are reasonably well approximated by that of a black body. Among these
approximate black bodies are the Sun and tungsten bulbs. The ratio of the total power
emitted by some real surface at a given wavelength to that predicted from an ideal black
body is known as its spectral emissivity, or emissivity for short.† It depends strongly on
surface composition, angle of incidence, finish, and coatings, but tends to vary slowly
with frequency. The ratio of total power absorbed to total power incident is the absorptivity, but this term is almost never used, since it is identical to the emissivity, by the
same thermodynamic argument we used before. Landau and Lifshitz, Statistical Physics
Part I , present an excellent discussion of this topic. We’re going to do everything in
per-frequency units, for conceptual unity and calculating convenience.
The power spectral density curve of black body radiation peaks at a frequency given
by the Wien displacement law,
hνpeak = 2.8214kB T ,
where, as usual, h is Planck’s constant, ν is frequency, kB is Boltzmann’s constant,‡ and
T is absolute temperature. It has a FWHM of about two octaves, in round figures. If the
spectral density is quoted in per-wavelength units, the peak is slightly shifted toward the
blue. (Why?) Although the position of the peak moves, black body curves from different
temperatures never cross; heating up a black body makes it glow more brightly at all
wavelengths (this is also obvious from the second law, if reflective colored filters are allowed).
The formulas for black body radiation tend to be a bit confusing, since different
authors quote formulas for different things: exitance from a semi-infinite black body,
† Emissivity is formally the average value over some spectral range, but since the emissivity changes slowly with frequency and is anyway a function of temperature, the distinction is not usually emphasized in real life.
‡ This is one of the few places in this book where Boltzmann’s constant k can be confused with the propagation constant k, so we need the inelegant subscript B.
cavity energy density per unit frequency, or per unit wavelength, and so on. The most
fundamental is the energy density per hertz of the cavity radiation,
e₀(ν, T) = 8πhn³ν³ / (c³{exp[hν/(k_B T)] − 1}),
where n is the refractive index of the (uniform and isotropic) medium filling the cavity.
There is no special direction inside the cavity, so the energy is equally distributed over
all propagation directions. The rate at which a component with wave vector k leaves the
cavity through a patch of area dA is ck·dA/k, where the vector dA is dA times the
outward directed unit surface normal n̂. Thus there is a cosine dependence on the angle
of incidence θ , and the spectral radiance leaving the cavity is
L_ν(ν, T) = (c/(4πn)) cos θ e₀(ν, T) = 2hn²ν³ cos θ / (c²{exp[hν/(k_B T)] − 1}),
which is the most useful all around black body formula. Integrated over all ν, this yields
the Stefan–Boltzmann formula for the total power emitted from unit surface area of a
black body into unit solid angle,
P_tot(T) = σn²T⁴ cos θ,
where σ is Stefan’s constant, approximately 1.8047 × 10⁻⁸ W/m²/sr/K⁴. A real object
cannot (as we saw) radiate any more than this, and most will radiate quite a bit less; not
everything is black, after all.
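The formulas above are easy to check numerically. This sketch evaluates the spectral radiance and the Wien peak for a hot tungsten filament (CODATA constants; the example temperature is an assumption):

```python
import math

# Numerical sketch of the black body formulas quoted above (SI units).
H = 6.62607015e-34   # Planck's constant
KB = 1.380649e-23    # Boltzmann's constant
C = 2.99792458e8     # speed of light

def spectral_radiance(nu, T, n=1.0, theta=0.0):
    """L_nu = 2 h n^2 nu^3 cos(theta) / (c^2 (exp(h nu/(kB T)) - 1)),
    in W/m^2/sr/Hz."""
    x = H * nu / (KB * T)
    return 2 * H * n**2 * nu**3 * math.cos(theta) / (C**2 * math.expm1(x))

def wien_peak_hz(T):
    """Per-hertz radiance peak, from h nu_peak = 2.8214 kB T."""
    return 2.8214 * KB * T / H

# A 3300 K filament peaks well into the near IR:
nu_pk = wien_peak_hz(3300.0)
print(nu_pk / 1e12, "THz, i.e.", 1e6 * C / nu_pk, "um")  # ~194 THz, ~1.5 um
```

Using `math.expm1` avoids loss of precision in the denominator at low frequencies, where hν ≪ kBT.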
The cosine dependence of black body radiation arose, remember, because there was
no special direction. A source with this cosine dependence is said to be Lambertian, and
broad area sources with Lambertian or near-Lambertian angular dependence are said to
be diffuse.
Radiance Conservation and the Second Law of Thermodynamics
In imaging with thermal light, it’s important to keep in mind a fundamental limitation:
no passive system can form an output whose surface radiance is higher than that of the
source. We’ve seen this as conservation of phase space volume. This can be shown
rigorously from Maxwell’s equations, and for paraxial optics it follows from the ABCD
matrix for an imaging system: the image-side NA goes as 1/M. For our purposes we’ll
use a thought experiment and the second law of thermodynamics.
Consider two reservoirs, initially at the same temperature, and well insulated except for
a sufficiently small heat engine (SSHE) running off any temperature difference from Rhot
to Rcold . Drill a hole in each reservoir’s insulation, and place a conceptual highly asymmetric optical system (CHAOS) in between (also suitably insulated). Add well-silvered
baffles so that only light that will make it through the CHAOS gets out of the reservoirs.
Call the A products looking into each end of the CHAOS A_hot and A_cold. Then
the net radiative energy transfer from hot to cold will be
Q̇ = (σ/π)(T_hot⁴ A_hot − T_cold⁴ A_cold),
assuming Lambertian radiation (which should be right, since it’s cavity radiation we’re dealing with). If A_hot < A_cold, the hot reservoir will spontaneously heat up until Q̇ = 0, which
will happen when T_hot/T_cold = (A_cold/A_hot)^0.25, and the SSHE will run forever as a
perpetual motion machine. If we allow the CHAOS to contain lossless reflective optical
filters and amend (2.5) appropriately, the same argument shows that radiance conservation
applies for each wavelength independently, not just in aggregate. If the two ends are
immersed in different media, then there’s a factor of n² that has to be added to Stefan’s
law on each end, and the math comes out the same: there’s no temperature drop created.
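The bookkeeping of the thought experiment can be checked with a few lines. Here A is the area-solid-angle product looking into each end, and σ is taken as the conventional Stefan–Boltzmann constant (consistent with the σ/π form of the flux equation); the numbers are illustrative:

```python
import math

# Sketch of the CHAOS thought-experiment bookkeeping:
# Qdot = (sigma/pi) (T_hot^4 A_hot - T_cold^4 A_cold).
# A is the area-solid-angle product of each end; values illustrative.
SIGMA = 5.670374419e-8  # conventional Stefan-Boltzmann constant, W/m^2/K^4

def qdot(T_hot, T_cold, A_hot, A_cold):
    return (SIGMA / math.pi) * (T_hot**4 * A_hot - T_cold**4 * A_cold)

# With A_hot < A_cold, the net flux at equal temperatures runs cold -> hot:
print(qdot(300.0, 300.0, 1.0, 2.0))  # negative

# It balances only when T_hot/T_cold = (A_cold/A_hot)**0.25 -- the
# perpetual-motion equilibrium the second law rules out:
T_hot = 300.0 * (2.0 / 1.0) ** 0.25
print(qdot(T_hot, 300.0, 1.0, 2.0))  # ~0
```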
Tungsten Bulbs
Tungsten bulbs are excellent optical sources for many purposes. They are quiet, stable,
cheap, broadband, and reasonably bright. Although their radiance (in W/m²/sr) is far
lower than a laser’s, they can put out a lot of photons cheaply. Their electrical to optical
conversion efficiency is excellent—75% or better—provided that the application does
not require throwing away major fractions of their output.
The primary problem with tungsten bulbs is their relatively short operating life, which
is limited by the evaporation of the tungsten filament, with the subsequent formation
of hot spots, leading to filament breakage. The rate of tungsten evaporation goes as
e^(−10500/T), while the brightness of the bulb is proportional to T⁴, which makes the
trade-off between lifetime and brightness pretty sharp; you really can’t push tungsten
beyond about 3300 K. This isn’t quite as bad as it sounds, because hot tungsten’s
emissivity is around 0.45 in the visible but only 0.15–0.2 in the IR, so that you get
proportionately more visible photons than you expect, and the bulb color looks about
100 K hotter than it is (maximum color temperature ≈ 3400 K).
The trade-off between lifetime and brightness can be exploited in appropriate circumstances: a tungsten bulb run at 10% higher than rated voltage will have three times
shorter life and about a third higher output. Alternately, running it at 10% lower than
rated voltage will give five times longer life at about 30% less output.
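The two data points above imply rough power laws, life ∝ Vᵃ and output ∝ Vᵇ; fitting exponents to them is a back-of-envelope exercise, not a datasheet model:

```python
import math

# Rule-of-thumb power laws implied by the numbers in the text:
# +10% voltage -> ~3x shorter life, ~1/3 more output;
# -10% voltage -> ~5x longer life, ~30% less output.
def exponent(ratio, v_ratio):
    """Solve ratio = v_ratio**a for the exponent a."""
    return math.log(ratio) / math.log(v_ratio)

a_up = exponent(1 / 3, 1.1)  # life exponent from the overvoltage point
a_dn = exponent(5, 0.9)      # ...and from the undervoltage point
b_up = exponent(4 / 3, 1.1)  # output exponent, overvoltage
b_dn = exponent(0.7, 0.9)    # output exponent, undervoltage

print(round(a_up), round(a_dn))  # roughly -12 and -15
print(round(b_up), round(b_dn))  # roughly 3 in both cases
```

The steepness of the life exponent (around −12 to −15) is the quantitative form of the sharp trade-off the text describes.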
Evaporated tungsten collects on the bulb envelope, which produces progressive dimming with time, as well as some spectral shift. The evaporation rate can be reduced by
reducing the bulb temperature or by increasing the pressure of the gas filling the envelope, which retards the diffusion of tungsten vapor away from its source. Thermal losses
and safety considerations put a practical limit on the gas pressure.
Tungsten–halogen bulbs offer a partial solution to the tungsten loss problem, through a
clever regenerative mechanism that keeps the envelope clean. A bit of iodine is introduced
into a small, very strong quartz envelope around the filament. Evaporating tungsten
combines with the iodine to form tungsten iodide, which has a low boiling point. The
envelope is hot enough to prevent tungsten iodide from condensing on it, so that the
halide stays in the gas phase until it encounters the filament and is redeposited as metallic
tungsten. Unfortunately, the redeposition does not take place selectively on the hot spots,
so the filament does not heal itself. The longer life of halogen bulbs is due to the high
gas pressure inside, which by slowing vapor diffusion does cause selective redeposition.
Running a bulb on AC helps its long-term stability, since electromigration of the
filament metal partly cancels out when the current changes direction frequently.
Temperature variations over a cycle give rise to intensity variations. These aren’t
that small—asymptotically, their power spectrum goes as 1/f² (1 pole). Square wave
AC drive helps by keeping the power dissipation constant over a cycle. Ripple in the
dissipated power will cause thermal forcing of mechanical oscillations of the filament.
You can’t readily simulate this, so be sure to test it if you’re planning anything fancy.
The other major problem with tungsten bulbs (as with other low coherence sources)
is how to keep the source radiance as high as possible as the light passes through the
bulb envelope and condenser. More on that soon.
Glow Bulbs and Globars
Occasionally one runs across these odd ceramic light bulbs. They provide nice wide,
uniform diffuse thermal emission in the IR, but are not very bright as the ceramic seldom
goes much above 1000 K. They are primarily useful in FTIR spectrometers. Good ones
have low thermal mass so you can do chopping measurements by pulsing the light source
(see Section 10.9.1).
In various places we’ve already encountered the idea of coherence, which is basically a
measure of how well different parts of an electromagnetic field stay in phase with each
other. It’s time to be a bit more systematic about what we mean by that. It’s a big subject,
though, and is explained well in J. W. Goodman’s Statistical Optics. We’re just going
to dip our toes here.†
Coherence theory is easiest to understand in the context of two famous interferometers:
Young’s slits (actually pinholes) for spatial coherence, and the Michelson interferometer
for temporal. Both of these experiments involve sampling the fields at different points
in space-time, and looking at the time-averaged interference term. The result of the
experiments is a fringe pattern; the visibility of the fringes expresses the degree of
coherence between the fields at the two points. This is an appealingly intuitive definition
of coherence, and what’s more, it is powerful enough for all optical purposes, provided
the averaging time is a small fraction of a cycle of the highest signal frequency we care about.
Thermal sources such as tungsten bulbs are said to be spatially incoherent; the phase
and amplitude of the light emitted from a point on the surface are independent of that
from points more than a wavelength or so away. As a result, the intensity of the light at
any point in the system can be calculated by finding the intensity due to a single arbitrary
source point, and integrating over the source, without regard for the interference of light
from different points on the source.
The light from a tungsten bulb is also temporally incoherent, meaning that the phase
and amplitude of light from the same source point at some time t is independent of what
it was at t − τ , for sufficiently large τ (several femtoseconds for a 3000 K bulb).
These properties may be substantially modified by spatial- and frequency-selective
filters, condensers, and even free-space propagation. For example, starlight is temporally
as incoherent as tungsten light, but spatially highly coherent, due to the small angular
size of the star as seen from Earth. A nearby star would have an angular size of a few
† This discussion assumes some familiarity with interferometers. Have a look at Section 1.6, or for more basic stuff, try Hecht and Zajac or Klein and Furtak.
nanoradians. The angular size of an incoherent (thermal) source determines its spatial
coherence; the divergence of the beam is equal to the angular size of the source. (Why?)
The basic quantity in coherence theory is the complex degree of coherence γ , which
is the normalized statistical cross-correlation of the optical field at two points (P1 , t) and
(P2 , t + τ ),
γ₁₂(τ) ≡ Γ₁₂(τ) / √(Γ₁₁(0) Γ₂₂(0)),
where Γ₁₂ is the usual ensemble-averaged statistical cross-correlation (see Section 13.5),
Γ₁₂(τ) ≡ ⟨ψ*(P₁, t) ψ(P₂, t + τ)⟩.
A pure temporal coherence discussion sets P1 = P2 and usually drops the subscripts.
Given a screen with unresolved pinholes at P1 and P2 , the fringe pattern at some point
x on the far side of the screen from the source is
I(x) = I₁(x) + I₂(x) + 2√(I₁(x) I₂(x)) Re γ₁₂((|x − P₁| − |x − P₂|)/c).
Temporal coherence can be increased by a narrowband filter, at the price of throwing
most of the light away; you can make fringes using light with a frequency range Δν if
the time delay Δτ between components is less than 1/Δν (if Δτ > 1/Δν you get fringes
in the spectrum instead). The envelope of the fringes in a Michelson interferometer will
decay as the path difference is increased, so that we can define coherence time more
rigorously as the equivalent width of the squared modulus of the fringe envelope,
τc ≡ ∫_{−∞}^{∞} |γ(τ)|² dτ.
The coherence length sounds like a spatial parameter but is in fact defined as cτc . The
real spatial parameter is the coherence area, Ac , the area over which a given field’s phase
stays highly correlated. It is analogously defined as the two-dimensional (2D) equivalent
width in the ξ η plane, where P2 − P1 = (ξ, η, 0).
Ac ≡ ∫∫ |γ₁₂(0)|² dξ dη.
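The equivalent-width definition of τc is easy to evaluate numerically. As an illustration (the Lorentzian lineshape is an assumed example, for which γ(τ) = exp(−πΔν|τ|) and the integral comes out to 1/(πΔν)):

```python
import math

# Numerical check of tau_c = integral of |gamma(tau)|^2 d tau, for an
# assumed Lorentzian line of FWHM dnu, where gamma(tau) =
# exp(-pi * dnu * |tau|) and the analytic answer is 1/(pi * dnu).
def tau_c(gamma, t_max, n=200000):
    """Brute-force equivalent-width integral over (-t_max, t_max)."""
    h = 2 * t_max / n
    return sum(abs(gamma(-t_max + i * h)) ** 2 for i in range(n + 1)) * h

dnu = 1e12  # 1 THz optical bandwidth
gamma = lambda t: math.exp(-math.pi * dnu * abs(t))

print(tau_c(gamma, 20 / (math.pi * dnu)))  # numeric
print(1 / (math.pi * dnu))                 # analytic, ~0.32 ps
```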
Young’s fringes also fall off in strength as the pinholes are moved apart, but measurements must be made in the plane of zero path length difference in order that temporal
coherence effects not enter. In practice, Δν must be small enough that at least a few
fringes are visible for us to be able to measure the fringe visibility as a function of
pinhole separation, but mathematically this can be swept under the rug.
Coherence theory predicts how these properties change and what their effects are. It
also provides a fairly rigorous conceptual framework in which to describe them. It is a
huge topic, which is really beyond our scope. It is treated well in Goodman and in Born
and Wolf.
Light scattered from a rough surface undergoes wholesale modification of its plane
wave spectrum. A single plane wave gets scattered into a range of angles that is characteristic of the surface. At each angle, the light from different locations interferes
together, producing strongly modulated, random-looking fringes called speckle. Speckle
is a three-dimensional (3D) interference pattern, and the characteristic size of the speckles
in radians gives the characteristic length scale of the roughness in wavelengths. Speckle
is a particular problem in laser-based full-field measurements and when using diffusers
(see Section 5.7.11). Although it’s usually a nuisance, speckle does contain information about the position and shape of the surface; this information can be extracted by
electronic speckle pattern interferometry (ESPI, frequently called TV interferometry).
When low coherence light is scattered from a rough surface, its angular spectrum is
modified in a fairly simple way; usually it just spreads out some more, and the scattered
radiance remains reasonably smooth in both position and angle. The lack of speckle is
a consequence of both spatial and temporal incoherence; in each propagation direction,
temporal incoherence (i.e., wide optical bandwidth with no special phase relationships, as
in short pulses) smears the fringes out in the space domain (due to the path differences),
and spatial incoherence makes the phase of the interference pattern at each wavelength
vary at optical frequencies, so that no fringe contrast can be observed in any reasonable averaging time.
Imaging Calculations with Partially Coherent Light
Besides the two limiting cases of fully coherent and incoherent sources, there’s a broad
range of partially coherent cases, where the source linewidth and angular subtense are big
enough to notice but too small to dominate. Most of the time, partially coherent imaging
is done by starting with an incoherent source such as a light bulb. There are lots of books
and papers telling how to predict the results of a partially coherent imaging system, but
for most of us, getting through that mathematical machinery is a big time investment.
Another way, less general but conceptually much easier, is to replace the incoherent
source by a collection of monochromatic point sources, calculate the (coherent) image
from each one, and then add all the resulting intensities by integrating over the source
distribution in space and frequency. Integrating the intensities expresses the fact that the
expectation value of the interference term between two incoherent sources (or two parts
of a single one) is 0. Even from a single point, light of two different frequencies produces
fringes that move at optical frequency, averaging to 0 (but see below). The interference
term in (1.67) is thus 0, and the total photocurrent from the detector is the sum of those
from each source point.
If your source is inherently partially coherent (e.g., starlight coming through a turbulent
atmosphere), this procedure breaks down and you have to use the big mathematical guns.
Gotcha: Coherence Fluctuations at Finite Bandwidth
One coherence effect that we’re going to run into again and again is intensity fluctuations
due to the interference of light with a delayed copy of itself. This effect is often overlooked, but it limits the SNR of measurements a lot of the time; it’s normally what sets
the ultimate SNR of laser-based fiber sensors, for example, and Section 19.1.1 has a very
sad story about what can happen when it bites. It’s true that the DC photocurrent from
an interferometer whose path difference is ≫ 1/Δν will be the sum of the intensities of
the two beams, but that doesn’t mean that the instantaneous noise current is just the sum
of their noises.
The autocorrelation is an ensemble- or long-time-averaged quantity, whereas we’re
actually detecting the instantaneous intensity, including the instantaneous interference
term, with only the short-time averaging due to our finite measurement bandwidth.
Whenever we have two beams adding together, they do interfere, because at any instant
the total energy arriving at the detector goes as the integral of E²; all that happens at
τ ≫ τ_C is that the DC component of the interference becomes small. So where does the
interference signal go?
Temporal coherence limits usually arise from the gradual building up of phase fluctuations as t grows. When this happens, the interference contribution from each point
doesn’t go away, it just gets turned into noise spread out over a wide bandwidth, about
2/τC . A light bulb has a bandwidth of two octaves, centered around 700 nm, so its temporal bandwidth is 600 THz and its coherence time is 2 fs. It takes a lot of noise power
to fill that up.
Furthermore, since these AC fluctuations are uncorrelated across the surface of an
incoherent source, they average to zero pretty well too if the source and detector are
big enough. On the other hand, if you build an interferometer using a spatially coherent
source of limited temporal coherence, it will give you fits, and this is true whether you
meant it to be an interferometer, or just have a stray fringe or two.
In a single-mode fiber measurement, for example, we get no spatial averaging whatever, so we have to cope with the whole interference term, changed into random noise,
thundering into our detector. If there is no delayed wave to interfere with, it’s no problem,
but since the effect is gigantic and the intrinsic SNR of optical measurements is so high,
it doesn’t take much delayed light to reduce the SNR catastrophically. The detection
problem for FM noise in a system with multiple paths of different delays is treated in
Section 15.5.12; here we’ll just wave our arms a bit.
Consider a Michelson interferometer using a well-collimated, 820 nm diode laser.
It is VHF modulated (see Section 2.14.3) so that Δν ≈ 1 THz (Δλ ≈ 2.2 nm), but
has a single transverse mode. Its spatial coherence is then nearly perfect. Recall from
Section 1.5 that the instantaneous intensity reaching the photodetector varies as I₁ + I₂ ±
2√(I₁I₂) cos φ (see Section 1.6), where φ is the instantaneous phase difference between
the two beams. Because φ is constant across the detector, spatial averaging of the fringes
does not reduce the fringe contrast in the detected signal. For Δt ≫ 1/Δν, the variance
of φ, ⟨|φ − ⟨φ⟩|²⟩ ≫ (2π)², so φ modulo 2π is a uniformly distributed random variable.
The noise probability density is therefore that of the cosine (peaked at the edges). The
noise current variance is then 2I₁I₂, spread out over a frequency range from 0 to about
Δν. The resulting 1 Hz noise current i_N as a fraction of the peak interference term is on
the order of
i_N(1 Hz)/√(i₁i₂) ∼ (Δν)^(−1/2),
which can easily dwarf the shot noise. Using our 1 THz wide diode laser example, two
1 mW beams at 820 nm will generate fluctuation noise on the order of 2 nW/Hz^(1/2)
(optical), so that the detected noise will be nearly 40 dB (electrical) greater than the shot
noise. This can be a pretty big shock when you’re expecting the interference signal to
go away completely. The same diode laser with the VHF modulation turned off might
have Δν ≈ 10 MHz, in which case the measurement will be much quieter for small
path differences, but much noisier for large ones; for Δt > 100 ns, i_N will be more like
0.4 μW/Hz^(1/2), an absolutely grotesque figure—the photocurrent noise will be 86 dB
over the shot noise.
Even for path differences well within 1/Δν, this effect can be very large; at 1% of the
coherence length (3 μm path difference here), the RMS phase wobble is on the order of
0.01 radian, which in the 1 THz instance would bring the fluctuation noise near but not
below the shot noise level. Note that this noise current grows as i, as the signal does, not
as √i as shot noise does; increasing the laser power will make it relatively worse. We see that even
if Δν is huge compared with the measurement bandwidth, these seemingly unimportant
fluctuations may be the dominant noise source. How’s that for an insidious gotcha?
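A back-of-envelope check of the diode-laser example lands in the same ballpark as the figures quoted above. The 0.5 A/W responsivity is an assumed value, not from the text:

```python
import math

# Sketch: coherence-fluctuation noise, ~sqrt(2*P1*P2/dnu) optical (the
# variance 2*I1*I2 spread over bandwidth dnu), compared with shot noise
# of the total photocurrent. R = 0.5 A/W is an assumed responsivity.
E = 1.602176634e-19  # electron charge, C
R = 0.5              # A/W, assumed photodiode responsivity at 820 nm

def excess_db(P1, P2, dnu):
    """Fluctuation noise above shot noise, electrical dB in 1 Hz."""
    p_fluct = math.sqrt(2 * P1 * P2 / dnu)     # W/sqrt(Hz), optical
    i_fluct = R * p_fluct                      # A/sqrt(Hz)
    i_shot = math.sqrt(2 * E * R * (P1 + P2))  # A/sqrt(Hz)
    return 20 * math.log10(i_fluct / i_shot)

print(excess_db(1e-3, 1e-3, 1e12))  # tens of dB for the 1 THz case
print(excess_db(1e-3, 1e-3, 1e7))   # far worse at 10 MHz linewidth
```

The 1 THz case comes out in the 30–40 dB range and the 10 MHz case near 80 dB, consistent in magnitude with the numbers in the text (the exact figures depend on the assumed responsivity).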
One final addendum: it is not unusual for a very small-amplitude side mode in a
diode laser to have very significant mode partition noise, and it gets out of phase with
the carrier really easily in the presence of dispersion.
Measuring Laser Noise in Practice
Okay, so there are lots of noise sources to worry about. How do we measure the actual
noise? In two steps: first, calibrate the detection setup; and second, measure the noise.
For calibration, take whatever setup you have and shine a battery-powered,
incandescent-bulb flashlight on it. Move the flashlight in and out to make the
photocurrent roughly the same as in your laser measurement. This will give you a
beautifully calibrated white noise source (the shot noise of the photocurrent) of spectral
density i_N(1 Hz) = √(2eI_DC), delivered right to your photodiode. Measuring the noise
level in the dark (circuit noise) and with the flashlight (circuit noise + shot noise) will
give you the gain versus frequency and noise versus frequency of your measurement
system. This will allow you to compare the noise of the laser to the shot noise.
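Sizing the flashlight calibration floor is a one-liner (the 1 mA photocurrent is an illustrative value):

```python
import math

# The flashlight trick: photocurrent I_DC delivers a white noise floor
# of spectral density sqrt(2 e I_DC) A/sqrt(Hz).
E = 1.602176634e-19  # electron charge, C

def shot_noise_density(i_dc):
    """One-sided shot noise current density, A/sqrt(Hz)."""
    return math.sqrt(2 * E * i_dc)

# 1 mA of flashlight photocurrent (illustrative):
print(shot_noise_density(1e-3))  # ~1.8e-11 A/sqrt(Hz), i.e. 18 pA/sqrt(Hz)
```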
To do the actual measurement, the best bet is a spectrum analyzer. You can do this
with a fancy digitizing scope, by taking the discrete Fourier transform of the output,
squaring it, and averaging many runs together. (In a DFT power spectrum with only
one run, the standard deviation is equal to the mean, so to get good results you have
to average quite a few spectra or take the RMS sum over many adjacent frequency
bins—see Section 17.5).
Alternatively, you can get a filter whose bandwidth coincides with your measurement
bandwidth, and just measure the total power that gets through it, using a sufficiently fast
RMS-DC converter (the author often uses an old HP3400A RMS voltmeter, which has
a 10 MHz bandwidth). The noise power is the time-averaged total power of the signal,
minus the DC power. Since the noise and the DC are uncorrelated, it doesn’t matter
whether you subtract the DC before or after taking the RMS. You can’t do a good job
with just a scope and no filter, because you won’t know the bandwidth accurately enough.
Really work at relating the laser noise to the shot noise, because if you don’t, you
won’t know how well or badly you’re doing.
Light-emitting diodes (LEDs) are fairly narrowband continuum sources, which are nowadays ubiquitous because of their low cost and good properties. Their lifetimes are so long
(hundreds of thousands of hours, without abuse) that you’ll probably never wear one out.
Their other properties are intermediate between those of tungsten bulbs and lasers; their
spectral width is about 5–20% of their center frequency (say, 50–100 nm full width for
a 600 nm LED) and their emitting areas are smallish (100 μm diameter or so). Their
moderately wide bandwidth means that their temporal coherence length is extremely
small, about 3–10 microns, which is often very useful where some spectral information
is needed but etalon fringes must be avoided.
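The l_c ≈ λ²/Δλ rule of thumb reproduces those numbers; here's a two-line Python check using the 600 nm center and 50–100 nm widths quoted above:

```python
def coherence_length_um(center_nm, fwhm_nm):
    """Rough temporal coherence length, l_c ~ lambda^2 / delta-lambda."""
    return center_nm ** 2 / fwhm_nm / 1000.0  # nm -> um

# A 600 nm LED that is 50-100 nm wide (figures from the text):
lc_wide = coherence_length_um(600.0, 100.0)    # 3.6 um
lc_narrow = coherence_length_um(600.0, 50.0)   # 7.2 um
```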
LEDs are available in a wide range of colors, from the near IR (900–1100 nm), where
they are used for remote control, infrared LANs, and multimode fiber connections, to the
ultraviolet (350 nm). Some LEDs have more than one spectral peak, so if this matters,
make sure you measure yours.
Red LEDs are the most efficient, but the other wavelengths are catching up fast.
Already the best LEDs are more efficient than tungsten bulbs, and their efficiencies
continue to climb. White LEDs are really a species of fluorescent bulb—a UV LED
excites a fluor in the package, and it’s the fluor that limits the coherence and lifetime.
There are a few good reasons for the historical inefficiency of LEDs. The efficiency of
generating photons from carriers is reasonably good, approaching 100% in direct bandgap
devices. Unfortunately, these photons are generated deep inside a highly absorbing semiconductor material with a refractive index of about 3.3. This hurts in two ways: the
semiconductor absorbs the light, and the critical angle for light exiting the semiconductor into air is arcsin(1/3.3), or about 18°, so that the solid angle through which photons
can exit is only 0.3 steradians, or about 2.3% of a sphere. Fresnel losses on reflection
limit the actual extraction efficiency of uniformly distributed light to below 2%. If the
substrate is absorbing, the totally internally reflected photons will mostly be absorbed
before getting another chance. The absorbing substrate problem is addressed by building heterojunction LEDs with transparent top layers (and newer ones have transparent
substrates as well, with only the junction region being absorbing). The total internal
reflection problem is helped by putting the LEDs in a plastic package (n ≈ 1.5), which
more than doubles the extraction efficiency, and by roughening the surfaces, which gives
the light lots of chances to escape.
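You can verify the escape-cone arithmetic directly. The sketch below ignores Fresnel losses, so it slightly overstates the extraction, but it reproduces the ~2.3% figure and shows why the plastic package more than doubles it:

```python
import math

def escape_fraction(n_die, n_outside):
    """Fraction of isotropic emission inside the total-internal-reflection
    escape cone: solid angle 2*pi*(1 - cos(theta_c)) over the full 4*pi."""
    theta_c = math.asin(n_outside / n_die)
    return (1.0 - math.cos(theta_c)) / 2.0

frac_air = escape_fraction(3.3, 1.0)    # ~2.3% straight into air
frac_epoxy = escape_fraction(3.3, 1.5)  # plastic package: >2x improvement
```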
An LED’s intermediate source size means that its spatial coherence is not very good
compared with a single-mode laser, but is still better than you can do with a tungsten
bulb. Again, this is usually a virtue except in interferometry.
Unlike nearly any other source, LEDs can have their intensities varied continuously
over a huge range (from 0 to their maximum rated power) with little or no spectral shift,
by merely changing their drive current. Spectral shifts remain small until maximum
power is approached, at which point most LEDs shift slightly to the red,† due to high
die temperatures (LED wavelengths tend to drift at about +100 ppm/K) and to high
level injection effects, which narrow the effective bandgap. This throttling capability is
of great value in extending the dynamic range of really low cost sensors.
Ordinary LEDs are fairly slow compared with laser diodes, being limited by carrier
lifetime rather than stimulated emission. You can modulate an LED at a few megahertz,
but garden variety ones don’t go faster than that. IR LEDs intended for communications
can go a bit faster, up to about 100 MHz.
† That is, toward longer wavelengths, not necessarily closer to 630 nm. This is really visible with yellow LEDs,
which turn orange at very high drive currents.
The main problem with LEDs is their packaging. For efficiency reasons, nearly all
LEDs come in molded plastic packages with cylindrical sides and very steep lenses on
the top surface. The optical quality is poor and the die centration erratic. Furthermore, the
wide angle of illumination of the sides leads to lots of internal reflections in the package.
Together, these defects produce far-field patterns that look like a cheap flashlight’s. The
plastic package also limits the LED’s power dissipation, because its thermal resistance
is very high, so that the heat has to travel down the leads. For applications where lower
spatial coherence is desirable, an integrating sphere will produce a nice Lambertian
distribution, and even a ground glass, 3M Magic Invisible Tape, or holographic diffuser
will help considerably. Some LEDs have glass powder mixed with the plastic, producing
a frosted effect with better uniformity but lower efficiency and wider angular spread.
If higher spatial coherence is needed, consider using a surface-mount LED with a
flat surface. Another way is to use a laser diode below threshold as a spatially coherent
LED. The spatial characteristics are lithographically defined and the windows are good,
so their beam qualities are excellent. Their coherence length is short and depends on how
hard you drive them (it gets longer as threshold is approached), but is generally shorter
than in multimode lasing operation, and of course very much shorter than in single-mode
operation. Laser diodes are more expensive and (below threshold) won’t produce as much
total power as a bright LED. Compared to the work of removing the encapsulant from
an LED or using LED chips and wire bonds, a diode laser below threshold is a bargain,
unless you need more than a few hundred microwatts of power.
Superluminescent Diodes
The laser-diode-below-threshold idea can be extended to higher gains by removing the
regeneration (e.g., by AR-coating one or both facets) or by gouging the back facet with
a scriber. The resulting device has stimulated emission but little or no regeneration, and
is called a superluminescent diode (SLD). Its coherence length is short, although not as
short as a LED’s, and it is a bit less sensitive to back-reflections than a diode laser. SLD
linewidths are a few percent of the center wavelength.
SLDs are far from immune to back-reflection, however; residual resonator effects
usually leave ugly peaks and valleys in their spectra, so the fringe visibility doesn’t
usually wash out to 0 as smoothly or as quickly as one would like. If you’re using SLDs
with fiber pigtails, make sure you measure the spectrum after the pigtail is attached, and
generally be suspicious.
Commercial SLDs (sometimes confusingly called SLEDs) are available with output
powers up to 10 mW or so. They’re made with AR coatings and have to be run at
very high current density due to the lack of regeneration. This makes them even more
sensitive to optical feedback than the homemade ones. A perpendicularly cleaved fiber
end is enough to make them lase, which (due to the high level of pumping) will usually
blow them up. Their modulation bandwidths are more like LEDs’ than laser diodes’
(∼10 MHz).
Amplified Spontaneous Emission (ASE) Devices
Optical amplifiers all have spontaneous emission. Leaving out the input signal and cranking up the pump power causes the spontaneous emission to be amplified like any other
signal, producing an ASE source. Commercial ones produce tens of milliwatts from
ytterbium (1000–1100 nm), neodymium (1060 nm), or erbium (1560 ± 40 nm), but are
expensive. Their center wavelengths can vary a lot depending on the fiber material and
the emission bands selected, and their linewidths are a few percent, becoming narrower
at higher gains. A better behaved ASE device is the frequency-shifted feedback laser,
with an acousto-optic cell in the cavity. These permit multipass gain while preventing
normal lasing, because the Nth pass gets shifted by 2N times the acoustic frequency.
High Pressure Arc Lamps
Thermal sources cannot exceed the brightness of a black body at the same temperature.
The evaporation rates of the most refractory metals limit filament temperatures to below
3500 K, so plasmas are the only way to get brighter thermal sources. There are several
types of plasma sources, divided into high and low pressure plasmas, either continuous
(arc lamps) or pulsed (flashlamps).
High pressure devices are much brighter than low pressure ones. Although they are
thermal sources, they are far from thermodynamic equilibrium, so their spectra are usually dominated by strong emission lines corresponding to electronic transitions in the
component gases. These lines are very broad (tens of nanometers typically) due to collisional broadening from the high pressure of hot gas. There is also a weaker black body
continuum from the plasma, and still weaker, lower temperature thermal radiation from
the hot electrodes.
Arc lamps are available with a variety of fill gases, ranging from hydrogen to xenon,
including especially mercury/argon and sodium/argon. Mercury arcs are the most efficient,
but produce most of their output in the UV (especially near 254 nm), and their visible
output is very green due to a strong line at 546 nm. Sodium vapor is nearly as efficient as
mercury, but is very orange (590 nm). Xenon arcs appear white to the eye, which makes
them popular for viewing applications. Hydrogen and deuterium lamps produce a strong
continuum in the UV, with D2 units going far into the vacuum UV, limited mainly by
the windows (for VUV operation, they are used windowless, with differential pumping).
The emission lines get broader and weaker, and the continuum correspondingly
stronger, as the gas pressure is increased; extremely high pressure arcs (200 atmospheres)
can make even mercury vapor look almost like a 6000 K light bulb.
The plasmas are inherently unstable, so the spectrum and power output fluctuate with
time in a 1/f fashion, with ±10% power variations being common. There are variations
caused by strong temperature gradients in the tube, causing big changes with time and
warmup. The pressure increases with temperature, which changes the broadening. The
position of the arc is unstable with time as well, at the level of 0.1 to a few millimeters,
which will cause the light from your condenser to wander around. In some bulb types,
the spot moves erratically at a speed of 50 m/s or faster, leading to lots of noise in the
kilohertz range. They are a great deal noisier than tungsten or LEDs, and their noise has
a spatially dependent character that makes it hard to correct for accurately.
Commercial arc lamps are divided into short arc and long arc styles. The length of the
arc partly governs the source size, and thus the spatial coherence of the light: you can do a
better job of collimating the light from a short arc lamp. They come in two main package
types: squat ceramic and metal cylinders with quartz windows and integral reflectors, and
long quartz tubes with a bulge in the middle. The ceramic type is much easier to use
but has a few idiosyncrasies. Most such lamps have an arc formed perpendicular to
the window, so that the cathode gets in the way of the beam. The package contains a
parabolic reflector, which roughly collimates the light reflected from it (typical angular
spreads are ±1° to ±3°, with the spread increasing as the bulb ages). Unreflected light
floods out into a wide angle; the illumination function is thus rather strange looking, with
a strong doughnut-shaped central lobe and a weaker diffuse component. On the other
hand, the arc and reflector come prealigned, which substantially simplifies changing the bulb.
EG&G sells ceramic arc and flashlamps that circumvent this problem, by arranging
the electrodes transversely (so that the current flow in the arc is parallel to the window
and the electrodes are not in the way), and using a spherical reflector with the arc near
its center of curvature. All the light therefore emanates from the arc, and both the strange
pupil functions are avoided. You still have to collimate the light yourself, however. One
choice will probably be better than the others in your application.
A flashlamp or flashtube is a pulsed arc lamp, usually filled with xenon or krypton, and
is used where very bright, hot, short-duration illumination is needed. Flashlamps can
reach source temperatures of over 10^4 K, making them instantaneously about 100 times
brighter than a tungsten source of the same size. When used at high power, their plasmas
are optically thick (i.e., highly absorbing† ), so that their radiation is dominated by the
continuum component. At lower pulse energies, the plasma is cooler (closer to 5000 K
than 10,000 K), and the line spectrum is more pronounced.
Flashlamps are powered by capacitor discharge; you charge up a special capacitor‡
that is connected to the lamp through a very small inductor (which may be just the leads).
When the capacitor is fully charged, you trigger the arc, either with a tickler coil, wound
round the cathode end of the tube and driven with a 10 kV pulse, or by momentarily
increasing the voltage across the tube with a pulse transformer.
Once the arc gets going, it discharges the capacitor in a time controlled by the
impedance of the arc, wiring, and capacitor parasitic inductance and resistance (including
dielectric losses), until the voltage drops below that needed to keep the arc going. This
variable excitation means that the light intensity is strongly peaked with time; unless you
do something special, there’s no flat top to the light pulse, regardless of its length. In fact,
the current can easily oscillate due to LC resonance in the capacitor and leads. Flashlamp
manufacturers publish data allowing you to design a critically damped network, so that
the total pulse width is minimized for a given tube, capacitor, and drive energy.
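As a rough illustration of the sizing arithmetic: the capacitor comes from the stored energy E = CV²/2, and a commonly quoted rule of thumb puts the critically damped pulse width near 3√(LC). The inductance value below is an invented example; a real design starts from the lamp's published impedance parameter and the manufacturer's damping tables.

```python
import math

# Capacitor from the stored energy, E = C*V^2/2, then the pulse width
# from the commonly quoted critical-damping rule t ~ 3*sqrt(L*C).
energy_j, v0 = 100.0, 1000.0           # 100 J stored at 1 kV (example)
c = 2.0 * energy_j / v0 ** 2           # -> 200 uF
l_loop = 5e-6                          # series inductance, H (assumed)
t_pulse = 3.0 * math.sqrt(l_loop * c)  # ~95 us, in the typical range
```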
The total duration is usually between 100 μs and 10 ms, although active controllers
(using big power MOSFETs) can make it as short as 1 μs. Pulse energy is limited by
explosion of the bulb and ablation of the electrodes. Getting long life (10^5–10^7 flashes)
requires operating the bulb a factor of 6–10 below the explosion energy. Tubes with
explosion energies between 5 J and 10 kJ are readily available.
Peak power is limited by how fast you can push that much energy into the bulb, and
average power is limited by cooling. Flashlamps can work at kilohertz repetition rates if
the pulses are weak enough. There is considerable variation in the pulses; a well-designed
flashlamp supply with a new bulb may achieve 0.1% rms pulse-to-pulse variation, but
† It is perfectly correct from a thermodynamic point of view to call the central plasma core of a 1 kJ flashlamp
“black.” Try it some time.
‡ Capacitors intended for rapid discharge have very low internal inductance and resistance, to improve their
discharge time constants and avoid dumping lots of heat into the capacitor itself.
typical ones are more like 1%. They have the other noise characteristics of regular arc
lamps, for example, strong spatial variations, wandering arcs, and lots of 1/f noise. Their
wall plug efficiency can be over 50% when integrated over all ν, but is more typically
Flash initiation depends on the generation of the first few ions in the gas, which
is inherently jittery compared with laser pulses, for example. Typical jitters are in the
hundreds of nanoseconds and are a strong function of the bulb design. Jitter can be
reduced with a “simmer” circuit, which maintains a low current arc between flashes, or
a pretrigger that initiates a weak arc slightly before the main pulse.
Flashlamps and arc lamps with concentric reflectors are intrinsically more resistant to
source noise from arc wander, because the reflector projects an inverted image back onto
the arc. If the arc moves to the left, the image moves to the right. For an optically thick
plasma, the effect can even help stability a little; if the left side gets hotter, its image
dumps heat into the right side.
Spark and Avalanche Sources
You can get 400 ps rise time from a spark in a mercury switch capsule.† If subnanosecond
speed is enough, and you don’t need narrow linewidth or high brightness, this can be a
convenient and cheap alternative to a pulsed laser. The sharp current pulse will couple
into everything in the neighborhood, so careful electrical shielding will be needed. Spark
sources have all the jitter and pulse-to-pulse variation problems of flashlamps, and since
the rapid breakdown is what we’re after, they can’t easily be fixed with simmer circuits
and so on.
Low Pressure Discharges
We are all familiar with low pressure gas discharge tubes, such as Geissler tubes (beloved
of undergraduate optics labs) and ordinary domestic fluorescent bulbs. Low pressure discharges produce a line spectrum by electrically exciting gas molecules and letting their
excited states decay by radiating photons. The positions of the spectral lines are characteristic of the molecular species doing the radiating, while the line strengths and linewidths
are more strongly affected by experimental conditions such as pumping strategy, temperature, pressure, and other gases present.
The gas fill is often a mixture of gases to improve arc characteristics and aid energy
transfer to the desired excited state of the radiating molecule, for example, argon in
mercury lamps and helium in neon bulbs. Argon is a particularly popular choice; besides
being cheap and inert, it has a low first ionization potential, so that arcs are easily struck.
The striking voltage depends on pressure, with a minimum at about 1 torr, rising at high
vacuum and at high pressure.‡
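The pressure dependence follows the classic Paschen law. The sketch below uses rough textbook constants for air and an assumed secondary-emission coefficient, so treat the numbers as qualitative (argon's minimum is lower, but the curve has the same shape):

```python
import math

A, B = 15.0, 365.0   # 1/(cm*torr) and V/(cm*torr): rough values for air
gamma = 0.01         # secondary-emission coefficient (assumed)

def striking_voltage(pd_torr_cm):
    """Paschen's law V(pd) for a uniform-field gap."""
    return (B * pd_torr_cm /
            (math.log(A * pd_torr_cm) - math.log(math.log(1.0 + 1.0 / gamma))))

# Minimum striking voltage near pd ~ 1 torr*cm, rising on both sides:
curve = {pd: striking_voltage(pd) for pd in (0.5, 1.0, 10.0, 100.0)}
```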
Their fairly narrow linewidths (moderate temporal coherence) and low spatial coherence make low pressure lamps useful for qualitative imaging interferometers. The most
† Q. A. Kerns, F. A. Kirsten, and G. C. Cox, Generator of nanosecond light pulse for phototube testing. Rev.
Sci. Instrum. 30, 31–36 (January 1959).
‡ M. J. Druyvesteyn and F. M. Penning, Rev. Mod. Phys. 12, 87 (1940).
common example is preliminary testing of optical surfaces via Newton’s rings and Fizeau
fringes (see Section 12.6.2).
The exceptions to the line spectrum rule are low pressure hydrogen and deuterium
lamps, which emit a bright continuum extending into the VUV. Deuterium lamps emit
almost no visible light at all, which is helpful in UV experiments because the filters
and gratings that would otherwise be needed are often highly absorbing. These arcs are
often run at extremely low pressure, especially deuterium, where the interior of the bulb
connects immediately with the vacuum system, and deuterium leakage is controlled only
by differential pumping.
Radiometry of Imaging Systems
We saw in Section 2.4.1 that the product of area and projected solid angle is invariant
under magnification and Fourier transforming, so that in the absence of vignetting or
loss, the source and image radiance are the same, and the average radiance is the same
in the pupil as well. Unfortunately, it is not easy to avoid vignetting when the source
radiates into 4π steradians.
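A quick numerical check of the invariance, using the projected solid angle π sin²θ for a circular cone (the 1 mm², NA 0.5 source is an arbitrary example):

```python
import math

def etendue(area_mm2, half_angle_deg):
    """Area times projected solid angle (pi * sin^2 of the cone half-angle)."""
    return area_mm2 * math.pi * math.sin(math.radians(half_angle_deg)) ** 2

# A 1 mm^2 source radiating into NA 0.5 (30 degree half-angle), imaged at
# 3x magnification: the area grows 9x while sin(theta) drops 3x, so the
# A*Omega product is unchanged.
src = etendue(1.0, 30.0)
img = etendue(9.0, math.degrees(math.asin(0.5 / 3.0)))
```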
It is a common error to assume that a higher power bulb will get you more photons
for your measurement. Thermodynamics dictates that no point in the optical system can
have a higher radiance than that of the source (i.e., the arc or filament). Since filament
and arc temperatures are lifetime-limited, this means that there is an absolute upper limit
to how bright you can make your illumination with thermal sources. A higher wattage
bulb is merely larger, not brighter. Accordingly, upgrading the bulb will produce a larger
illuminated area or aperture, but the same radiance. If that’s what you need, a higher
wattage bulb can help. If not, you’ll have to improve your condenser and collection
optics instead, or switch to laser light.
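Planck's law makes the point quantitatively. This sketch compares a filament with an arc at 550 nm; the two temperatures are typical round numbers picked for illustration, not data for any particular bulb:

```python
import math

def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance, W per (m^2 * sr * m of bandwidth)."""
    h, c, kb = 6.626e-34, 2.998e8, 1.381e-23
    x = h * c / (wavelength_m * kb * temp_k)
    return 2.0 * h * c ** 2 / wavelength_m ** 5 / math.expm1(x)

# A 6000 K arc vs. a 3400 K filament at 550 nm: roughly 30x the radiance,
# and nothing downstream of either source can exceed its own figure.
ratio = planck_radiance(550e-9, 6000.0) / planck_radiance(550e-9, 3400.0)
```

No amount of extra filament wattage closes a gap like that; only a hotter source (or a laser) can.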
The Condenser Problem
The challenge in illumination systems is to achieve the characteristics needed for good
measurements: constant angular distribution, stability in intensity and polarization, and
uniformity of brightness and spectrum across the field. Ideally it should also use the
available source photons reasonably efficiently. Optical systems using tungsten illumination are almost always extremely inefficient with photons, because being efficient is
tough. It isn’t usually worth the trouble, either, because the equipment is mains powered
and tungsten bulb photons are very cheap,† but other illuminators don’t have this luxury.
As shown in Figure 2.2, the simplest kind of condenser is an imaging system, basically
a lens corralling the source photons and pointing them toward the sample. Trying to get
more photons will involve increasing the solid angle of the light collected (i.e., the NA
of the condenser on the bulb side).
The envelopes of some kinds of bulbs (especially tungsten halogen) are serious impediments to illuminator design. If the optical quality of the bulb envelope is good enough,
it may be possible to use the reflector to put the image of the filament right next to the
real filament, which will improve the illuminated area and hence the efficiency.
† About $10^-23 each.
Figure 2.2. A typical condenser is a low quality imaging system with care taken in controlling
thermal loads.
On the other hand, an irregularly shaped bulb envelope, or one with a poorly designed
internal reflector, makes it nearly impossible to approach the thermodynamic limit for
light coupling. Most fall into this category.
Example 2.1: Fiber Illuminators. Getting the highest radiance from a tungsten bulb
may not always involve a condenser. For fiber bundle coupling, just image the filament
on the bundle, using an NA (on the bundle side of the lens) high enough to overfill the
fiber NA. The rest of the housing can just toss light back into the filament to reduce the
electrical power required. If the bundle has a very high NA (some are 0.5), it may not be
easy to get lenses like that. Figure 2.3 shows ways of dealing with this problem, using
an array of lenses, a Fresnel lens, or a fast spherical mirror. With the bulb just off to
one side of the center of curvature and the bundle at the other side, this does a good job
of filling the bundle NA and diameter. The mirror doesn’t have to be too good; there’s
no reason the image quality can’t be reasonable, since both source and bundle are near
the center of curvature. The mirror can have a cold mirror coating (which passes IR and
reflects visible light) to reduce the heat loading of the fibers. (Pretty soon we’ll just stick
a white LED at the business end and get rid of the bundle entirely.)
Figure 2.3. Condensers for fiber bundles: a simple lens or a spherical cold mirror.
Most people are better off buying condensers when they need them, since condenser
design is a somewhat specialized art, and combination lamp housings and condensers are
widely available commercially. If you design your own, take apart a couple of different
commercial ones to see the tricks they use. Be sure to remember the heat absorbing glass,
which is a tempered filter that dissipates the heat that would otherwise toast your optical
system or your sample.
Lasers have an immense literature, and come in a bewildering variety. We’ll concentrate
on the properties of the ones most commonly seen,† especially diode lasers. See Siegman‡
for much more theoretical and pedagogical detail.
Lasers rely on an external energy source, the pump, to provide a population inversion and hence laser gain. Oscillation requires a minimum pump power, known as the
threshold. How many times above threshold the pumping is determines how rapidly the
oscillation builds up, among many other parameters. For instruments, the distinctions
between laser types center on wavelength, tunability, power level, pumping source, pulse
width, and linewidth.
Mode Structure
A laser is a resonant cavity with a gain medium inside, and some way of coupling the
light in the cavity to the outside, usually a partially transparent cavity mirror, the output
coupler. The resonant condition requires that the cavity be carefully aligned, which is a
fiddly process if it’s user adjustable.
Cavities always have more than one mode, so that the laser’s spatial and temporal
behavior will be a superposition of cavity modes. Real lasers have complicated mode
properties, but we can do a good job by decomposing the fields in terms of the modes of
an unperturbed, empty resonator. These break down into the transverse modes, which are
well described by Gaussian beams, and longitudinal modes, which are like the resonances
of an organ pipe. All will have slightly different frequencies in general, constrained by
the limits of the gain curve of the laser medium. For our purposes, transverse modes
higher than TEM00 are unwanted, but fortunately they are usually easily suppressed, so
that most commercial lasers have a single transverse mode.
The longitudinal modes are defined by the requirement that the round-trip phase in
the laser resonator should be a multiple of 2π radians, so that multiple bounces will
give contributions that add in phase and so produce a large total field. Accordingly,
longitudinal modes are spaced by very nearly Δν = c/(2nℓ), where 2ℓ is the round-trip
distance (twice the cavity length ℓ) and n is as usual the refractive index of the material
filling the cavity. For a large frame gas laser, Δν is on the order of 200 MHz, but it ranges
up to ∼150 GHz for a cleaved cavity diode laser and 1 THz for a VCSEL. Diffraction
effects and the pulling of the resonant frequency by the gain slope of the laser medium and
coatings often prevent the longitudinal modes from being exact harmonics of each other,
† The most important neglected types are dye lasers. These were once very important for lab systems but have
been eclipsed lately, except for those few applications requiring wide tunability in the visible.
‡ Anthony E. Siegman, Lasers. University Science Books, Mill Valley, CA 1986.
which produces some noise and spurious signals in the detected photocurrent. Coupling
between the modes will make them lock to each other if these effects are small enough.
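The spacing formula Δν = c/(2nℓ) is easy to evaluate for the two cases mentioned; the cavity lengths below are typical guesses, not data for any particular laser:

```python
def mode_spacing_hz(cavity_length_m, n=1.0):
    """Longitudinal mode spacing c/(2*n*l) for a cavity of length l."""
    return 2.998e8 / (2.0 * n * cavity_length_m)

# Cavity lengths are illustrative round numbers:
hene = mode_spacing_hz(0.75)             # large-frame gas laser, ~200 MHz
diode = mode_spacing_hz(300e-6, n=3.5)   # cleaved-cavity diode, ~140 GHz
```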
Simplified laser theory suggests that a single longitudinal mode should always dominate, with a linewidth that narrows enormously for pumping rates well above threshold.
In practice, multiple longitudinal modes are more commonly found, and pumping harder
makes it worse. For interferometric applications, multiple modes are undesirable, because
they cause the coherence length to be much shorter than it needs to be and produce noise
and sensitivity variations with a periodicity of twice the cavity length. These applications
often need single longitudinal mode (single-frequency) lasers.
A laser may even oscillate in more than one spectral line at once; this is suppressed
by using a dispersing prism or grating as part of the cavity, so the cavity can only be
aligned for one line at a time.
The equal spacing of these modes means that their configuration should be periodic
with time, like an ideal piano string.
Relaxation Oscillation
Lasers work by running electrons through a loop consisting of three or four quantum
transitions. Each of these transitions has its own time constant, which as any circuits
person knows, can lead to instability due to accumulated phase shifts. Good lasers have
one dominant time constant, namely, the long lifetime of the upper state of the laser
transition. The other time constants lead to excess phase shift, which generally causes
ringing in the laser output and level populations whenever a sharp change in pumping
occurs, just like a feedback amplifier with too small a phase margin. Due to the historical
fact that lasers weren’t developed by circuits people, this ringing is miscalled relaxation
oscillation. It isn’t really oscillation, but it does limit the modulation response of the
laser, and it causes excess noise near the peak—generally about 1 GHz for cleaved
cavity diode lasers and more like 10 GHz for VCSELs. Diodes and solid state lasers
show much stronger relaxation oscillations than gas lasers in general.
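You can watch the ringing happen in a toy rate-equation model. All the parameters below are round numbers chosen for illustration (the upper-state lifetime is YAG-ish), not those of any real laser:

```python
# Toy two-equation rate model: step the pump on and watch the photon
# number overshoot and ring, like an underdamped feedback loop.
tau_s = 250e-6            # upper-state lifetime, s (YAG-ish)
tau_c = 10e-9             # cavity photon lifetime, s
g = 1e4                   # gain coefficient, arbitrary units
beta = 1e-4               # spontaneous-emission seeding fraction (assumed)
n_th = 1.0 / (g * tau_c)  # threshold inversion
pump = 2.0 * n_th / tau_s # pump the laser at twice threshold
dt = 1e-9
n = s = peak = 0.0        # inversion, photon number, biggest spike seen
for _ in range(400_000):  # simulate 400 us of turn-on by Euler stepping
    dn = pump - n / tau_s - g * n * s
    ds = g * n * s - s / tau_c + beta * n / tau_s
    n += dn * dt
    s += ds * dt
    peak = max(peak, s)
s_ss = (pump - n_th / tau_s) / (g * n_th)  # steady-state photon number
# The photon number spikes well past s_ss, then rings down; that
# overshoot-and-ring is what gets (mis)called relaxation oscillation.
```

The two very different time constants (τ_s and τ_c) are exactly the excess-phase-shift situation a circuits person would recognize from an underdamped feedback amplifier.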
Gas lasers cover the entire visible, UV, and infrared range in isolated lines with very
narrow tuning ranges. Typical examples roughly in decreasing order of usability are
helium–neon (HeNe, 632.8 nm), argon ion (488 and 514.5 nm), helium–cadmium (HeCd,
442 and 325 nm), carbon dioxide (CW and pulsed IR at 10.6 μm), nitrogen (pulsed,
337 nm), and excimer (pulsed deep UV). Except for HeNe and CO2, these are really suitable only for lab use or specialized applications such as laser light shows and medicine.
Gas lasers are big, bulky, and fairly noisy (0.5–2% intensity noise).
Diode lasers are very popular just now because they’re cheap, small, and mechanically
rugged. Don’t despise HeNe’s, however—they are much less sensitive to optical feedback, are very frequency stable, have long coherence lengths (300 m for single-frequency
units), are really well collimated, and work as soon as you plug them in. A HeNe is just
the right medicine for a lot of ills. Besides, they come with their own power supplies and
are totally invulnerable to electrostatic discharge (ESD) damage—if you don’t smash it,
it’ll always just work. Low gain gas lasers such as HeNe’s have no spatial side modes
to worry about, so there aren’t too many really weird sources of noise, apart from the
occasional baseband mode beat (see Section 2.13.7).
HeCd lasers are very noisy, and N2 lasers cannot be run CW due to bottlenecks in
their level transition rates (you get 10 ns pulses no matter what you do—an example of
relaxation oscillations that really oscillate). Excimer lasers are used for laser ablation of
materials and are also widely used for semiconductor lithography. Their spatial coherence
is very low; although their wavelength may be below 200 nm, you can’t focus their output
more tightly than a millimeter or so.
Some gas lasers can be made single-frequency, either by reducing the gain and carefully tuning the cavity length (HeNe, P < 1 mW), or by putting a Fabry–Perot etalon
(see Section 1.6.2) in the cavity (ion lasers). This is effective but painful, requiring
temperature stabilization of one or both resonators to keep the tuning sufficiently stable.
Gas lasers are usually pumped by running an electric arc through the gas fill in a
sealed tube with mirrors or Brewster windows fused to its ends. They are astonishingly
inefficient, due to short upper state lifetimes and lots of deexcitation paths. For example,
an Ar-ion laser is doing well to have a wall plug efficiency (laser emission/electrical
input) of 0.02%, which leads to big AC supplies and water or fan cooling. The main
exception is CO2 lasers, which can hit 25% wall plug efficiency. All those Teslaesque
features cause severe low frequency noise in most gas lasers, with the high power ones
being worse.
Grating tuned gas lasers such as ion and metal vapor units usually have a diffraction
grating or Littrow prism inside their cavities to select a single line, although multi-line
units do exist. Getting a laser aligned is sometimes a painful process; it’s discussed in
Section 12.9.8. Apart from HeNe’s and sealed CO2’s, gas lasers need significant amounts
of tender loving care; if you’re going to use one, learn how to clean and maintain the
mirrors and windows properly, and if there’s someone available who can show you how
to align it, ask.
Solid state lasers are based on electronic transitions in impurity sites in solid crystals or
glasses. Typical examples are Nd:YAG (1.06 μm), ruby (694 nm, pulsed), Alexandrite
(700–820 nm), Ti:sapphire (0.66–1.2 μm, femtosecond pulses possible), and color center
(0.8–4 μm, widely tunable).
Solid state lasers have better characteristics than gas lasers in general, and are much
more flexible. Their efficiencies are limited by the pumping strategy employed; diode
laser pumped Nd:YAGs have very respectable wall plug efficiencies (in the percents). The
host crystals change both the center wavelengths and the strengths of the emission bands,
so that the laser wavelength of a given ion will move by tens of nanometers depending
on the host. Ruby (Cr3+:Al2O3) was the first laser of all, and ruby units are still sold,
though you’ll probably never use one. Neodymium YAG (Nd3+ ion in yttrium aluminum
garnet) lasers are much more common, because they have really good performance over
a wide range of powers and pulse widths, from CW to the tens of picoseconds. The very
long upper state lifetime of a YAG laser (250 μs) makes it efficient, and also allows
it to store lots of energy. Powerful pulses can be achieved using Q-switching, where
the cavity resonance is spoilt while the upper state population builds up, and is then
rapidly restored (usually with an acousto-optic or electro-optic cell, but sometimes with
a bleachable dye in low energy units). The cavity finds itself way above threshold and
very rapidly emits a huge pulse, typically 1–10 ns wide.
A somewhat less well-behaved pulse can be obtained by letting the cavity build up a
big circulating power, and then using the AO or EO cell to allow this energy to exit the
cavity in one round-trip time, a technique called cavity dumping.
Pulsed YAGs are usually pumped with flashlamps, whose pulse width is a good
match to the upper state lifetime. Efficiency suffers because the pump bands (the spectral
regions in which pump light can be converted to laser light) are not well matched to
the flashlamp spectrum. CW YAGs are nowadays usually pumped with 808 nm diode
lasers. Single-longitudinal-mode diode-pumped YAGs are highly efficient and can be pretty
quiet, too—the best ones are 10–30 dB above the shot noise at moderate modulation
frequencies, and even closer above 20 MHz or so.
YAG lasers are often frequency doubled, yielding green light at 532 nm. These are
familiar as low quality green laser pointers, but with a bit more work, a diode-pumped,
frequency-doubled, single-frequency YAG laser becomes a very attractive source: intense, efficient, quiet, and visible. High peak power units can be tripled or quadrupled into the
UV. Doubling has to be done inside the cavity to get decent efficiency, and there are a
number of sources of potential instability in the process.
Other neodymium-doped crystals are Nd:YVO4 (yttrium vanadate) and Nd:YLF
(yttrium lithium fluoride). Both have better thermal behavior than YAG—they’re
birefringent, which reduces losses due to thermal stress-induced birefringence (just as in
PM fiber), and both have low dn/dT , which reduces thermal lensing and so improves
beam quality. Vanadate lasers are also easier to run on the weaker 914 and 1340 nm
lines. Neodymium-glass lasers can produce higher pulse energy than YAG, but the
very low thermal conductivity of glass compared with YAG or vanadate severely limits
the repetition rate, so that the average power is lower for the same size device. The
spectrum is shifted slightly and shows a lot of inhomogeneous broadening. The upper
state lifetime is much shorter and hence efficiency is lower.
Titanium–sapphire lasers are now the predominant choice for femtosecond pulses,
having taken over the honor from colliding-pulse modelocked (CPM) dye lasers.
Diode-pumped YAGs, usually called diode-pumped solid state (DPSS) lasers, are
potentially extremely quiet; their long upper state lifetime has the effect of filtering out
the wideband AM noise of the pump, and the pump laser can be power-controlled over a
wide bandwidth, so in the absence of mode jumps, mode-partition noise, thermal lensing,
and photorefractive instability, a single-frequency DPSS laser is a very well-behaved device.
Decent solid state lasers are not simple or cheap devices and must be pumped with
another light source, usually a flashlamp or diode laser. They typically run $5000 to
$50,000, and lots more for the really fancy ones. Sometimes there is no substitute, but
you’d better have a decent grant or be selling your system for a mint of money.
2.11.1 Modelocked Lasers, Optical Parametric Oscillators,
and Other Exotica
Like vacuum systems, laser design is an arcane and lore-rich subject on its own—see
Koechner’s Solid State Laser Engineering and Siegman’s Lasers if you need to know
more details. All there’s space for here is that the longitudinal modes of an ideal laser form
a complete basis set for representing any pulse shape you like—including a δ-function.
By forcing all the modes to oscillate in phase at times t0 , t0 + τ , t0 + 2τ, . . . , where τ is
the cavity round-trip time, many cavity modes combine to produce a train of narrow pulses
with period τ . The phase and amplitude shaping is done by a modulated attenuator
or by a saturable absorber, which attenuates bright peaks less than their dimmer wings,
thus narrowing the pulse on every bounce, until the limit set by loss and dispersion is reached.
Pulsed lasers are much harder to maintain than CW units in general, with the high
pulse power, low rep rate, lamp-pumped modelocked units being generally the worst, on
account of the thermal transients, vibration, coating damage, and general beating up that
the crystals take when they’re clobbered that hard. If you’re exclusively a CW person,
count your blessings—you’ve led a sheltered life.
Aside: Beam Quality. A complicated system such as a picosecond optical parametric
generator pumped by the third harmonic of a lamp-pumped YAG laser is not going
to have the beam quality of a TEM00 HeNe, no matter what. The good ones have
decent spots in the near field that turn into mildly swirly messes in the far field; bad
ones produce beams that resemble speckle patterns regardless of where you look. Small
changes in conditions can produce severe beam degradation, so make sure that whatever
source you have, you can measure its beam profile accurately and easily. Especially in
the IR, it is amazing how many people trying to make complicated measurements (e.g.,
sum-frequency generation spectroscopy) don’t have any idea of their beam quality. Many
kinds of nonlinear sources, especially OPOs, have beam profiles that are pulse-height
dependent, so you can’t normalize them with an energy meter no matter how hard
you try.
2.12 Diode Lasers
The champs in efficiency and cost effectiveness are diode lasers. Diode laser photons are
more expensive than tungsten photons, but cheap for lasers, and their output quality is
good. Linewidths of 10–100 MHz or so are typical for single frequency units. Wall plug
efficiencies can reach 40%, with slope efficiencies (∂Pout /∂Pin ) of 80% or even higher.
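The distinction between wall plug and slope efficiency is easy to see with a toy L–I model, Pout = s(I − Ith) above threshold. The threshold current, slope, and forward voltage below are hypothetical round numbers, chosen only to make the point.

```python
# Wall-plug vs. slope efficiency for a simple linear L-I model (toy numbers).
def p_out(i_ma, i_th_ma=20.0, s_mw_per_ma=1.0):
    """Optical output (mW) vs drive current (mA), linear above threshold."""
    return max(0.0, s_mw_per_ma * (i_ma - i_th_ma))

v_f = 2.0   # forward voltage (V), so electrical input P_in = V * I
i = 100.0   # operating current (mA)
wall_plug = p_out(i) / (v_f * i)                 # includes threshold waste
slope = (p_out(i + 1) - p_out(i)) / (v_f * 1.0)  # dP_out/dP_in at fixed V
print(f"wall-plug {wall_plug:.0%}, slope {slope:.0%}")
```

The slope efficiency always exceeds the wall plug efficiency, because the threshold current and junction drop are sunk costs that don't appear in the derivative.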
Most diode lasers are of the cleaved cavity (Fabry–Perot) type: the die is cleaved to
separate the lasers, and the cleavage planes form the cavity mirrors. They are usually
coated at the rear facet with a moderately high reflector. The front facet can be left
uncoated or may have a passivation layer. The Fresnel reflection on leaving the die
is sufficiently strong (40% or so) to sustain oscillation with the extremely high gains
available in the active region.
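The facet reflectance follows from the normal-incidence Fresnel formula, R = ((n − 1)/(n + 1))². The index n ≈ 3.5 below is an assumed, typical value for GaAs-family material; it gives R in the low 30% range, the same ballpark as the "40% or so" quoted above.

```python
# Normal-incidence Fresnel reflectance of an uncoated semiconductor facet.
def fresnel_r(n):
    return ((n - 1.0) / (n + 1.0)) ** 2

print(f"R = {fresnel_r(3.5):.2f}")  # ~0.31 for n ~ 3.5
```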
Current is confined to a very small active region by giving the surrounding material
a slightly higher bandgap, so that the forward voltage of the diode in the active region
is smaller, and it hogs all the current. The resulting spatial restriction of laser gain helps
guide the beam, and lasers relying on this are said to be gain guided . A better method
is index guiding, where the refractive index profile makes the active region a stable
waveguide as well as confining the current.
Due to this monolithic construction, diode lasers are mechanically extremely rugged,
although their associated collimators generally are not. Their packages are similar to
old-style metal transistor cans, for example, TO-3, 9 mm (TO-8), and 5.6 mm (TO-72)
for the smallest ones. Most have a monitor photodiode in the package, to sense the light
emitted from the rear facet of the diode for intensity control. The monitor is big enough
that it is not completely covered by the laser die, so that it will pick up light scattered
back into the diode package, which makes it somewhat unreliable in practice.
Diode lasers are available in narrow wavelength ranges centered on 380, 405, 635,
650–690, 750–790, 808, 830, 850, 915, 940, 980, 1310, 1480, and 1550 nm (plus a few
at oddball wavelengths like 1.06 and 1.95 μm), which are defined by their intended use.
The shortest-wavelength diodes widely available are 405–410 nm high power multimode
units (from 20 mW up to 100–200 mW, linewidth 1–2 THz) for Blu-Ray discs. At this
writing (late 2008) the cheapest way to get these is to cannibalize them from the drives,
because they cost $1000 apiece otherwise. There’s a big hole between violet (405 nm) and
red (633 nm), where there are currently no good laser diodes, so diode-pumped solid state
lasers are the usual choice. Lasers used in optical drives have lots of power—roughly
150–200 mW for 658 nm DVD burner diodes. Longer wavelengths, 2–3 μm, can be
obtained with quantum cascade lasers based on superlattices.
The market for diode lasers is dominated by optical storage and telecommunications,
so diode types come and go, and they’re expensive in ones and twos. CD/DVD player
lasers are the cheapest: 650 nm, 5–7 mW or less. These cost around $10 unless you’re
a DVD player manufacturer and get them for $0.50 each in 100,000 piece quantities.
Fancier ones, such as the SDL-5400 series of Figure 2.4, can give you >100 mW of
single frequency power. Multiple longitudinal mode diodes reach about 1 W, and multiple
transverse mode units (based usually on many apertures side by side) are approaching
1 kW CW. These bar-type diodes are best used as light bulbs, because their coherence
properties are poor.
Figure 2.4. Beam parameters of the Spectra Diode Labs (now JDSU) 5420 diode laser. (From SDL,
Inc., and reprinted with permission © 1994, 1997 SDL, Inc.)
The bad things about diode lasers are that they are extremely delicate electrically, that
they are inconvenient to drive and to collimate, that their tuning depends on everything
under the sun, and that their mode hops and instabilities will drive you nuts if you’re
unprepared or need better-than-ordinary performance.
The other major drawback with all these low cost devices is that they are tightly
tied to consumer applications in printers, DVD players, and so on. When the consumer
technology moves on, the lasers go away; if you’re using them in your instrument, and
you don’t have a big stock of them, you’re out of luck.
2.12.1 Visible Laser Diodes
Working in the infrared is more difficult than in the visible, and the available spatial
resolution is less for a given NA. Commercial visible laser diodes (VLDs) work near
670, 650, 630, and 405 nm, which is convenient for human eyes. High power VLDs are
used in CDR and DVDR drives, so they’ve become quite common. Unfortunately, VLDs
behave significantly worse than their IR brethren, with poorer tunability and frequency
stability and more mode hopping. If you need good noise performance from your diode
laser, look elsewhere.
Aside: Wavelength-Division Multiplexing (WDM) and Chirp. The intrinsic bandwidth of optical fiber is extremely wide—much wider than any foreseeable electronic
switching frequency. Thus the most convenient way to get more capacity from a
fiber-optic data link is by sending many independent signals down the same fiber, using
a grating device or a sequence of narrowband couplers to combine the signals at one
end and separate them at the far end. The wavelengths of the optical carriers conform
to a standard spacing, the so-called ITU grid. The grid frequencies are multiples of
100.0 GHz, from 184.8 to 201.1 THz, with 50 GHz offsets also available. Achieving
low crosstalk between channels requires that the laser tuning be very stable, both with
time and temperature and (which is more of a challenge) with fast modulation. Normally
the frequency of a semiconductor laser changes quite a bit—as much as a couple of
nanometers (∼250 GHz at 1.5 μm)† during a fast turn-on, which would scribble all
over the grid if it weren’t controlled. If the chirp is linear in time, it can be used with a
grating to compress the pulse, but most of the time chirp is just a nuisance. VCSELs,
even multimode ones, have much wider spectra but only a few modes; they have much
lower chirp than FP lasers, which have many modes for energy to slosh about in (see
Section 2.12.7).
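Both the grid arithmetic and the chirp conversion Δν = cΔλ/λ² are one-liners. In the sketch below, the 193.1 THz channel is chosen arbitrarily; the 2 nm chirp figure comes from the aside.

```python
# ITU grid channel wavelength, and a 2 nm chirp expressed in frequency.
c = 299792458.0  # m/s

f_channel = 193.1e12               # one grid frequency: 1931 x 100 GHz
lam = c / f_channel                # -> about 1552.5 nm
d_lam = 2e-9                       # ~2 nm turn-on chirp (from the text)
d_nu = c * d_lam / (1.55e-6) ** 2  # ~250 GHz at 1.55 um, as quoted
print(f"channel at {lam*1e9:.1f} nm, chirp ~{d_nu/1e9:.0f} GHz")
```

A 250 GHz excursion spans two or three 100 GHz channels, which is why uncontrolled chirp "scribbles all over the grid."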
2.12.2 Distributed Feedback and Distributed Bragg Reflector
Diode lasers need not use the cleaved ends of the die as mirrors. Distributed feedback
(DFB) lasers use an active waveguide with a grating superimposed on it, resulting in
very high selectivity. The similar distributed Bragg reflector (DBR) scheme uses separate
gratings in passive waveguide regions, which can be tuned separately with a second bias
†Paul Melman and W. John Carlsen, Interferometric measurement of time-varying longitudinal cavity modes in GaAs diode lasers. Appl. Opt. 20(15), 2694–2697 (1981).
current (some have more than one tunable grating segment). At one time, DFB lasers had
better tunability since the same mechanisms that tune the frequency also tune the gratings
to match, resulting in wide temperature and current tuning ranges with few or no mode
hops. DFB lasers are expensive and specialized, so they’re only available in the telecom
bands. Chirp is very damaging there, as we saw, so lots of work has gone into reducing
it; although you can tune modern DFB lasers with temperature, they hardly current-tune
at all. For wider current tuning, DBR lasers are superior, if a bit less convenient to drive.
2.12.3 Tuning Properties
Everything in the world tunes diode lasers: temperature, current, pressure, and any sort
of optical feedback (see Section 2.13.6). All of these effects can be controlled, but you
have to be aware of the need. Highly stable diode laser instruments need clever control
systems for temperature, current, and mode hop control, which we discuss in Chapter 20
and Section 15.9.1.
Among Fabry–Perot lasers, the 800 nm ones are the best behaved. They tune at rates
of around −0.015 to −0.08 cm−1 /mA and −0.1 cm−1 /K in a single mode (for a 5 mW
unit), and around −4 cm−1 /K on average due to mode jumps. They can be tuned through
1–2 cm−1 by current in between mode jumps, and much further via temperature.
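To convert these wavenumber tuning rates into more familiar units, note that 1 cm−1 corresponds to roughly 30 GHz, and that Δλ = λ²Δσ since σ = 1/λ. A short sketch:

```python
# Converting wavenumber tuning rates to GHz and to nm (at 800 nm).
c = 299792458.0  # m/s

def wavenumber_to_ghz(d_sigma_percm):
    # 1 cm^-1 corresponds to c * (100 m^-1) of frequency
    return d_sigma_percm * c * 100.0 / 1e9

def wavenumber_to_nm(d_sigma_percm, lam_nm):
    # d(lambda) = lambda^2 * d(sigma), since sigma = 1/lambda
    lam_cm = lam_nm * 1e-7
    return lam_cm ** 2 * d_sigma_percm * 1e7  # back to nm

print(f"-0.08 cm^-1/mA = {wavenumber_to_ghz(-0.08):.2f} GHz/mA")
print(f"1 cm^-1 at 800 nm = {wavenumber_to_nm(1.0, 800.0):.3f} nm")
```

So the 1–2 cm−1 continuous current-tuning range is only 0.06–0.13 nm, and the −4 cm−1/K average temperature rate, mode jumps included, is about −0.26 nm/K.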
Tuning red VLDs is a different story. A small bandgap difference between the active
area and its surroundings hurts the current confinement and makes VLDs very temperature
sensitive and generally unstable. It is not trivial to find a nice single-mode operating
region with a VLD, although it can be done in the lab if you don’t need more than
0.5 cm−1 of current tuning range and are prepared to hunt. Violet and UV lasers are
generally multimode. Stick with the 750–850 nm ones if you need to do anything fancy.
2.12.4 Mode Jumps
The tuning of single-mode diode lasers is discontinuous in both temperature and current,
as shown in Figure 2.5, and the discontinuities unfortunately move around slowly with
laser aging; even with perfect temperature and current control, your continuous tuning
range won’t stay the same. Another odd thing is that the tuning curve is multivalued: in
some regions, the same temperature and current can support oscillation at two different
frequencies. These two modes typically do not oscillate at the same time; it’s one or the
other, depending on history, a phenomenon called hysteresis.
The mode jumps move slowly downwards through the operating current range as you
increase the temperature; they move much more slowly than the tuning, unfortunately,
so you can’t always get to your desired wavelength without external cavity stabilization.
If you’re building instruments relying on tuning F-P diodes, you normally have to
ensure that you’re working at a point where ν is a single valued function of T and I , and
that takes some work. On the other hand, you only get DFB lasers at 1.3 and 1.55 μm,
and they are two orders of magnitude more expensive, so fixing up F-P diodes to tune
reliably is a worthwhile thing to do.
2.12.5 Regions of Stability
Besides the intrinsic Fabry–Perot mode structure, diode lasers in real optical systems
always have some feedback, which is liable to lead to instability, and certainly modifies
Figure 2.5. Tuning properties of an FP visible diode laser (Toshiba TOLD9211): (a) versus current
(0.0089 nm/mA, i.e., −0.197 cm−1/mA, at Tcase = 24.1 °C) and (b) versus temperature
(0.044–0.057 nm/°C between mode jumps).
the stable regions. If you vary the temperature and bias current to a diode laser, and
look at its noise, you find two-dimensional islands of stability surrounded by a sea of
mode hopping. There is some bistability at the edges, since the edges of the islands
sometimes overlap, as we saw (they remain separate if you consider wavelength as a
third dimension).
These islands can be mapped out fairly easily, and if you are using diode lasers in
instruments it is a good idea to do that. Use some simple software for the job, because
otherwise you’ll be at it quite awhile; since the noise goes up so much during mode
hopping, it is straightforward to design self-testing software for the instrument. The
islands change slowly with time. VLDs have much smaller islands than 800 nm diodes.
If you want the islands to be as stable as possible, temperature control the collimating
lens and the lens-to-laser spacing too. A collimator design that makes this easy is shown
in Example 20.7.
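Such mapping software amounts to a grid scan with a noise threshold. A minimal sketch, with the instrument replaced by a synthetic noise function—everything here is hypothetical scaffolding; in a real instrument, noise() would set the TEC and current, wait for settling, and read an RMS noise meter.

```python
# Sketch of mode-hop island mapping: scan (T, I), flag quiet operating points.
def map_islands(temps, currents, noise, threshold):
    """Return {(T, I): True/False}, True where the laser is quiet (stable)."""
    return {(t, i): noise(t, i) < threshold
            for t in temps for i in currents}

# Synthetic stand-in: quiet inside one elliptical island, noisy outside.
def fake_noise(t_c, i_ma):
    return 1.0 if ((t_c - 25) / 3) ** 2 + ((i_ma - 60) / 10) ** 2 < 1 else 30.0

grid = map_islands(range(20, 31), range(40, 81, 5), fake_noise, 10.0)
quiet = [p for p, ok in grid.items() if ok]
print(f"{len(quiet)} quiet points of {len(grid)}")
```

Since mode hopping raises the noise 20 dB or more, a crude threshold like this is usually enough to find the island boundaries automatically.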
2.12.6 Checking the Mode Structure
Use a fine pitch grating (e.g., 2400 lines/mm at 670 nm), aligned so that the first-order
beam exits near grazing incidence. If the laser is well collimated, the mode structure will
show up clearly. You get a bunch of dim dots at the far-off Fabry–Perot transmission
maxima, which are just filtered spontaneous emission, and a few bright ones that are
actually lasing. You can spot mode jumps right away with this trick, and sticking a
photodiode in just one of the bright spots will give you a visceral feel for just how bad
mode partition noise can be.
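The trick works because the grating's angular dispersion diverges near grazing exit, so the closely spaced diode modes land on visibly separate dots. The sketch below uses assumed cavity parameters (n = 3.5, L = 300 μm, not from the text):

```python
# Angular separation of adjacent F-P modes after a grazing-exit grating.
import math

lam = 670e-9                     # wavelength (m)
pitch = 1e-3 / 2400              # groove spacing for 2400 lines/mm
mode_sp = lam ** 2 / (2 * 3.5 * 300e-6)  # F-P mode spacing in wavelength

# Grating equation sin(i) + sin(d) = lam/pitch; put first order near grazing.
sin_d = 0.98
sin_i = lam / pitch - sin_d      # required incidence, ~39 degrees here
# Angular dispersion d(theta_d)/d(lambda) = 1/(pitch * cos(theta_d))
dtheta = mode_sp / (pitch * math.cos(math.asin(sin_d)))
print(f"modes {mode_sp*1e9:.2f} nm apart -> dots {dtheta*1e3:.1f} mrad apart")
```

A couple of milliradians is a couple of millimeters per meter of throw, so the dots resolve easily by eye.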
2.12.7 Vertical Cavity Surface-Emitting Lasers
Almost all lasers have cavities that are long and skinny. This means that they have a great
many longitudinal modes to jump around in, which causes noise problems. In addition,
they emit out the ends, which is okay for built-up lasers but is inconvenient for diodes.
Diode lasers are hard to test before dicing and can’t easily be made in two-dimensional
arrays. They also have that nasty beam asymmetry and astigmatism to deal with.
A partial solution to some of these problems is the vertical cavity surface-emitting
laser (VCSEL). A VCSEL is rather like a DFB laser built on end. The cavity mirrors
and active region are made by multilayer deposition, and the bias current flows vertically
right through the whole thing. The high reflectance of the mirrors allows the active region
to be very thin, so that the longitudinal mode spacing is very wide (1 THz). That and the
limited mirror bandwidth mean that a typical VCSEL has only one longitudinal mode.
The NA of a typical VCSEL is 0.3–0.4, and due to its approximate rotational symmetry,
its beam is roughly circular. That same symmetry means that its polarization usually
wanders all over the place.
VCSELs also have lots of transverse modes, and they’re nearly all multimode. It is
quite common for a VCSEL to jump between two or three spatial modes as the current
is increased, and wind up in some N = 6 mode that looks like a chrysanthemum (See
Figure 2.6.) They can be really quick (20 GHz modulation bandwidth), so they work
well for things like high speed communication via multimode fiber. There has been
some progress made in improving the polarization purity and mode structure, usually by
breaking the circular symmetry somehow, but VCSELs are far from a panacea.
Aside: VCSEL Pathologies. Even in datacom sorts of jobs, some VCSELs exhibit
weird turn-on artifacts at low repetition rates. The pump current has to flow through all
those layers, so there are lots of opportunities for trap states to form. You have to fill them
all before the laser will turn on, and at low rep rates, they’ll all have recombined before
the next pulse arrives. Repopulating them each time slows down the leading edge badly.
VCSELs also tend to have really horrible 1/f mode partition noise, which is worse
than in ordinary multimode lasers since the modes are randomly polarized. Here’s another
instance where a good quality polarizer at the laser can help a lot in a differential
measurement.
Figure 2.6. Angular beam width (a) and spectrum (b) of an 850 nm multimode VCSEL (ULM001×01, 20 °C)
versus bias current (I = 2, 4, 6, 8 mA; P = 0.17, 1.13, 2.12, 3.0 mW). (Reproduced by courtesy of
U-L-M Photonics GmbH.)
2.12.8 Modulation Behavior
You can modulate a laser diode fast enough for almost anything. Generic diodes can
be modulated strongly up to 1 GHz, and special ones such as 850 nm communications
VCSELs can go much faster, up to 20 GHz. The limiting factors are parasitic inductance
and capacitance, and the intrinsic speed of the laser mechanism due to transition rates
and the relaxation oscillation peak.
The modulation sensitivity increases toward the relaxation peak and drops off very
rapidly beyond it, but you don’t want to be working there anyway. The relaxation
frequency (and hence the maximum modulation frequency) generally increases as the
operating current goes up, up to somewhere near maximum output power, and then starts
to slow down slightly.
Like an RF power transistor, the modulating impedance of the diode is normally low,
which leads to vulnerability to parasitic inductance; in fact, wiring an RF power transistor
right across the diode is a good way to modulate a high power device, but watch out for
oscillations at 300 MHz to 3 GHz when you do this—it’s possible to have too much of
a good thing, and an anti-snivet resistor can come in very handy (see Section 19.7.4).
The main problem in modulating diode lasers is that the intensity and frequency
modulate together, so that it isn’t easy to get pure AM or pure FM from current tuning.
If you’re doing a spectroscopic measurement with a diode laser, you need some way of
suppressing the giant intensity slope superimposed on your data. Typical methods are
FM spectroscopy (Section 10.6) and noise canceler spectroscopy (see Section 10.8.6).
2.12.9 ESD Sensitivity
Diode lasers are so extremely sensitive to electrostatic discharge that ESD is the major
cause of reliability problems. ESD in reverse bias causes hot carrier damage to the
junction region, increasing the threshold, but forward bias ESD is much more spectacular:
because the laser radiation grows so fast, a carpet shock with a 1 ns rise time can generate
such a high peak laser power that the output facet of the laser gets blown right off. This
is usually easy to spot; it puts stripes in the laser output pattern and dramatically reduces
the output power.
Always use ground straps, and design protection into your diode laser mounts: use a
DIP reed relay to keep the anode and cathode shorted together when the laser is not in
use. A big bypass capacitor (1 μF) will absorb carpet shocks without letting the laser
blow up, although this severely constrains the AC modulation possibilities. One way
round this is to use a transformer with its secondary wired in series with the diode
to get fast modulation without ESD risk.
2.12.10 Difficulty in Collimating
Edge-emitting diode lasers emit radiation from a stripe-shaped aperture about 1 μm wide
by a few microns long. The aperture is small, so that light comes out into a large solid
angle, which is very asymmetrical: typical NAs are 0.09–0.15 in one plane and 0.3–0.5
in the other, roughly an elliptical Gaussian beam. This large NA means that the laser–lens
distance is very critical; the depth of focus is only a few microns, so that extreme stability
is required. Very thin layers of UV epoxy are a good way of affixing the lens and laser
together.† Stability against mode hopping requires significantly tighter control than focus
stability does.
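The criticality of the laser–lens spacing follows from the scalar depth-of-focus estimate δz ≈ ±λ/(2NA²). A quick check, assuming λ = 785 nm (the factor of 2 varies with the criterion used, so treat this as order-of-magnitude):

```python
# One-sided depth of focus, dz ~ lambda / (2 NA^2), in microns.
def depth_of_focus_um(lam_nm, na):
    return lam_nm * 1e-3 / (2 * na ** 2)

for na in (0.3, 0.5):
    print(f"NA {na}: +/- {depth_of_focus_um(785.0, na):.1f} um")
```

The fast axis (NA 0.3–0.5) gives a tolerance of a micron or two, which is why UV-epoxied lens mounts are attractive.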
The light is strongly polarized with E along the minor axis of the ellipse. The beam
from a gain guided laser is a complete mess. It has such large amounts of astigmatism
that getting decent collimation is extremely difficult (they’re getting rare now anyway).
Index guided lasers have some astigmatism, usually about 0.6 wave. This amount is big
enough that it needs correction, which is a pain, but not so large that correcting it is
unduly difficult if you have a measuring interferometer.
Astigmatism can be reduced with a weak cylindrical lens, but the commercially available ones are normally a poor match to your laser, unless you’re very lucky. Fortunately,
good collimation can be reliably obtained without the use of astigmatism correction optics
†See Example 20.7.
(0.95 Strehl ratio is often attainable).† The trick is to compensate for the astigmatism
with a small amount of defocus, so that instead of the wavefront being a cylinder of
0.6λ p-p, it is a saddle shape with a p-p error of 0.3λ. You can also use the collimator
slightly off axis, to get a bit of coma and astigmatism to knock the laser’s astigmatism
flatter still. This approach is unfortunately a bit fiddly; in a production system, it may
be better to use a more powerful laser and just chop off all but the center 20% with a
circular aperture.
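The benefit of balancing astigmatism with defocus can be estimated with the Maréchal approximation S ≈ exp[−(2πσ)²], where σ is the RMS wavefront error in waves. The sketch below is only indicative—Maréchal is rough at these error levels, and the 0.95 figure quoted above also needs the off-axis coma trick—but it shows the large gain from halving the p-p error:

```python
# Marechal estimate of Strehl ratio for raw vs. defocus-balanced astigmatism.
import math

def strehl(sigma_waves):
    return math.exp(-(2 * math.pi * sigma_waves) ** 2)

# Raw cylinder, 0.6 wave p-p: W = 0.6 rho^2 cos^2(theta)
#   = 0.3 rho^2 (defocus part) + 0.3 rho^2 cos(2 theta) (saddle part).
# Over the unit pupil, var(rho^2) = 1/12 and var(rho^2 cos 2theta) = 1/6.
sigma_raw = math.sqrt(0.3 ** 2 / 12 + 0.3 ** 2 / 6)  # ~0.15 wave RMS
# Refocusing removes the defocus part, leaving the 0.3 wave p-p saddle,
# i.e., W = 0.15 rho^2 cos(2 theta), whose RMS is 0.15/sqrt(6).
sigma_saddle = 0.15 / math.sqrt(6)
print(f"raw S ~ {strehl(sigma_raw):.2f}, balanced S ~ {strehl(sigma_saddle):.2f}")
```

The defocus balancing alone takes the Strehl ratio from well below the Rayleigh criterion to comfortably above it.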
The elliptical beam can be circularized with one or two prisms, where the beam
enters near grazing (producing a patch elongated into a circle) and leaves near normal,
so that the circular patch defines the refracted beam profile. Gratings can be used in the
same sort of way, with lower efficiency. If you’re using gratings, make sure you use
the deviation-canceling configuration in Figure 7.8, or your beam will wander around
with tuning.
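The expansion ratio of a single prism used this way is set by footprint geometry at the grazing entrance face: the illuminated patch is stretched by 1/cos θi going in and shrunk by only cos θt coming out, so the one-axis magnification is cos θt / cos θi. The 80° incidence and n = 1.5 below are assumed values:

```python
# One-axis beam expansion from a single anamorphic prism (entrance face).
import math

def prism_expansion(theta_i_deg, n=1.5):
    ti = math.radians(theta_i_deg)
    tt = math.asin(math.sin(ti) / n)  # Snell's law at the entrance face
    return math.cos(tt) / math.cos(ti)

print(f"80 deg incidence: x{prism_expansion(80.0):.1f}")
```

A single prism near 80° thus gives a 4:1 expansion in one axis, about right for circularizing a typical 1:3 to 1:4 elliptical diode beam.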
An anamorphic lens system,‡ in this case a telescope made of cylindrical lenses, can
circularize the beam as well as cancel the astigmatism, but these are limited by the
difficulty of fabricating good quality cylindrical lenses at low cost.
Blue Sky produces lasers with diffraction-limited circular output beams by using a
cylindrical microlens right near the laser die, inside the package, which in principle
makes a lot of things easier. These are not particularly cheap, but they’re good for lab
use. (You still have to collimate them yourself.)
2.12.11 Other Diode Laser Foibles
Diode lasers have a number of other funny properties. One is that dust and crud are
photophoretically attracted to regions of high light intensity, which unfortunately means
the diode laser facet or the center of the laser window. In a high power system, the beam
path will get dirty long before the rest of the system, so make sure to control outgassing
and dust inside your diode laser-based instruments. Interestingly, dry air turns out to be
much better than inert gas—having oxygen around gives the organic crud a chance to
oxidize before it builds up enough to cause damage.§
2.13 Laser Noise
Lasers exhibit noise in both intensity and frequency. Apart from ultrahigh resolution
spectroscopy, most measurements suffer more from intensity noise than frequency noise.
As we’ll see in Chapter 10, a lot of ingenuity has been expended on getting rid of the
effects of laser noise.
2.13.1 Intensity Noise
Intensity noise pollutes laser-based measurements in two ways. Most measurements have
a nonzero baseline, so that a fluctuation in the laser power causes a fluctuation in the
† The Strehl ratio is the best single measure of beam quality for instrument purposes, where the beams aren’t too
ugly. It cannot be greater than 1, and 0.8 corresponds roughly to Rayleigh’s λ/4 criterion for diffraction-limited
image quality. See Section 9.5.4 for more discussion of the aberration terms used here.
‡ That is, one with different magnifications in x and y.
§R. Jollay et al., Proc. SPIE 2714, 679–682 (1996).
background signal, which shows up as additive noise in the measurement (additive means
that this noise doesn’t depend on the signal strength). This additive noise reduces the sensitivity of the measurement and, since laser noise tends to be non-Gaussian in character,
may not average out well.
The other effect of laser intensity noise is to cause the signal itself to fluctuate in
strength. In nearly all laser-based measurements, the detected signal is proportional to
the laser power, so an intensity fluctuation produces a signal strength fluctuation. This is
called multiplicative noise, or noise intermodulation. For example, consider a single-beam
tunable diode laser spectrometer looking at a gas cell. The spectrum consists of absorption
lines on a smooth baseline. Intensity noise causes the baseline to fluctuate (additive noise),
but also causes the absorption peaks to fluctuate in magnitude (noise intermodulation).
(See Section 13.6.11.)
There are a variety of differential and ratiometric detection schemes to help with this
problem, of which laser noise cancellation is by far the most effective; it can provide as
much as 70 dB reduction in both additive laser noise and noise intermodulation, and gets
you down to the shot noise reliably, even with noisy lasers. It needs an extra beam plus
about $10 worth of simple electronics. If your measurement suffers from laser intensity
noise, have a look in Section 10.8.6.
2.13.2 Frequency Noise
An oscillator is composed of an amplifier plus a frequency determining device, usually a
resonator. The resonator attenuates signals outside its narrow passband but, more to the
point, exhibits a phase shift that varies rapidly with frequency. Since oscillation requires
that the round-trip phase be an exact multiple of 2π , changes of the resonator length
force the frequency to move. The frequency noise of the oscillator is determined by the
combination of the amplifier’s phase fluctuations and the phase slope of the resonator.
Lasers are a bit more complicated in that the resonator may exhibit phase fluctuations too,
as when fan vibrations or cooling water turbulence jiggles the mirrors of an argon ion
laser. Resonator noise is confined to low frequencies, so it can be dealt with separately
by mechanical means.
This picture of frequency noise suggests that lasers can be stabilized by using longer
cavities with higher Q, which is true; external cavity diode lasers have linewidths a
factor of 103 –104 narrower than when the diode’s F-P resonator is used. There are also
various active locking techniques such as Pound–Drever stabilization, which are beyond
our scope but are covered well in Ohtsu.
2.13.3 Mode Hopping
Most lasers have a large number of possible oscillation modes within the gain curve
of the medium. Ideally, one of these should dominate all others, but this is often not
the case, due to spatial hole burning.† Even when it is, the difference between adjacent
modes is often small enough that it takes very little perturbation to switch the laser from
one mode to another. This will happen during warmup, for example. Sometimes, though,
†Hole burning is the picturesque name given to the local reduction of laser gain near peaks of the standing
wave pattern in the gain medium. This reduces the gain of the strongest mode, without reducing that of the
others equally. See Siegman for more details.
there is no one stable mode of oscillation (usually due to spurious feedback of one kind
or another coupling the modes), leading to mode hopping.
Diode lasers are especially sensitive to mode hopping, because their cavities are
strongly coupled to the outside (reflectivities of about 40%, vs. 99.5% for a HeNe).
A spurious reflection on the order of 1 part in 106 can set a diode laser into mode hopping; this leads to a lot of flaky failures that come and go, making testing difficult. The
actual mechanism of mode hopping is a complicated interaction between the thermal,
plasma-optical, and current confinement behaviors of the diode; a change in the laser
tuning changes the power output, which changes the dissipation in the channel, which
changes the temperature, which changes the cavity length and index, which changes the
tuning, and so forth. Since the active region can be cooled very rapidly by the laser output itself, there is a strong coupling between die temperature, tuning, and power output
that leads to instability. Visible diode lasers are the worst for this. VCSELs are designed
with cavities so short that their free spectral range is wider than the natural linewidth, so
they don’t hop between longitudinal modes, but they frequently do between transverse modes.
Mode hopping is most obviously a frequency noise phenomenon, but it results in
strong (0.1–1%) intensity noise as well, because the gains of the laser system in adjacent
modes are not the same. Mode hopping causes irregular jumps and spikes in the laser
power, at rates of 100 kHz or so. Even in a pure intensity measurement, diode laser mode
hopping due to incidental feedback is very obnoxious and can render your measurement
extremely difficult. Mode hopping makes all your etalon fringes dance around, so that
the frequency noise gets converted to intensity noise as well.
2.13.4 Mode-Partition Noise
The total power output of a laser is limited by the pump power among other things, and
various saturation effects couple the intensities of the modes, so that the instantaneous
power of the laser varies less than that of the individual modes. The switching back and
forth of the laser power is called mode-partition noise. It is insidious, because it doesn’t
show up on a power meter but is silently at work screwing up your measurement and
producing seemingly irrational results; with a gas laser, it is easily possible for a spatial
filter, a knife edge, or even an iris diaphragm to cause the intensity noise to go up by
20 dB; a stray etalon fringe can do the same, since the different modes will see different
phase shifts and hence will be demodulated differently. It’s pretty puzzling if you don’t
know the secret.
The quietest lasers are single longitudinal mode units, followed by highly multimode
ones, and the worst usually have only a few (2–10) modes. The same is true of optical
fiber devices (see Section 8.4.3). It’s worth trying a quick experiment with a few-mode
diode laser and a grating—if you catch one mode with a photodiode, you’ll usually find
it much noisier than the entire beam, even in absolute terms.
2.13.5 Gotcha: Surface Near a Focus
Unless you use really effective—60 dB or more, optical—Faraday isolators to protect
the diode laser from feedback, make sure that there is no surface in your optical system
that coincides or even nearly coincides with a focus. Even if the specular reflection goes
off at a steep angle, and so misses the laser, there will be enough scatter to make the
laser mode hop. If you have a diode laser system that mode hops, and you’re sure you’ve
eliminated all the near-normal surfaces, look for this one. If it isn’t a surface near a focus,
it’s probably feedback right inside the collimator.
An insidious possibility you may need to watch for is that the focus in question may
not be a focus of the main beam, but of a converging stray reflection; a concave coated
surface will give rise to a converging beam 1% as strong as the main beam, and even
after two more bounces to get back into the laser, it can easily get to the 10⁻⁶ mode
hopping danger level. The ISICL sensor of Example 1.12 has this difficulty if it isn’t
mounted properly.
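The arithmetic of this feedback budget is worth a quick check. The sketch below uses hypothetical round numbers suggested by the text (roughly 1% power reflection per bounce off a coated surface):

```python
# Power budget for a stray converging reflection finding its way back into
# a diode laser. The 1% per-bounce reflectivity is a hypothetical round
# number for a typical coated surface, as in the text.

def feedback_fraction(reflectivities):
    """Fraction of the main beam power returned, given the power
    reflectivity of each bounce along the stray path."""
    frac = 1.0
    for r in reflectivities:
        frac *= r
    return frac

# One bounce off the concave coated surface, then two more bounces to get
# back into the laser:
frac = feedback_fraction([0.01, 0.01, 0.01])
print(f"returned fraction: {frac:.0e}")  # right at the 1e-6 danger level
```

Three apparently innocuous 1% bounces are all it takes to reach the mode hopping danger level.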
2.13.6 Pulling
The oscillation frequency of a laser has to be at a frequency where the round-trip phase
delay is an integral multiple of 2π radians. Normally, as the cavity length changes
slightly, or small amounts of contamination collect on the mirrors, the oscillation frequency changes so as to preserve the round-trip phase. However, if one of the mirrors
suddenly were to develop a time-dependent phase shift of its own, the oscillation frequency would have to respond to it by shifting, even if the cavity length were perfectly stable.
This is more or less what happens with pulling; some external influence, for example
a Fabry–Perot resonator such as an optical spectrum analyzer or merely an incidental
reflection, sends delayed light back to the output coupler of the laser. Interference between
the light inside the laser cavity and this spurious reflection causes the phase of the light
returned from the output coupler to change.
The resulting frequency shift is

Δν ≈ −Δφ / (∂φ/∂ν + 2πl/c),

where ∂φ/∂ν is the phase slope of the total reflection and l is the round-trip optical path of the cavity. Note that even if the spurious
reflection is weak, so that the total phase excursion is small (see Section 13.6.9), ∂φ/∂ν
can be made very large by using a sufficiently long delay. As a result, the stray reflection
can in principle take over the frequency determining role from the cavity. This happens
especially in diode lasers, where the cavity is short and the mirrors leaky; sometimes
their tuning behavior is determined more by the back-reflection from their collimator
than by the cavity mirror.
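A toy comparison of phase slopes shows how a weak, distant reflection can out-pull the cavity. The sketch below assumes the pulling denominator has the form ∂φ/∂ν + 2πl/c, and that a stray reflection of amplitude ratio ρ returning after delay τ contributes a phase slope of at most 2πρτ; the 1 mm round trip, 1 m stray distance, and 10⁻⁶ power reflection are hypothetical numbers, not from the text.

```python
import math

C = 3e8  # speed of light, m/s

def stray_phase_slope(rho, tau):
    """Worst-case phase slope (rad/Hz) of a stray reflection of field
    amplitude ratio rho returning after a delay tau."""
    return 2 * math.pi * rho * tau

def cavity_phase_slope(l_rt):
    """Phase slope (rad/Hz) of the laser's own round-trip optical path l_rt."""
    return 2 * math.pi * l_rt / C

# Diode laser with ~1 mm optical round trip; stray surface 1 m away
# (tau = 2 m / c); power reflection 1e-6, i.e. amplitude ratio 1e-3:
cavity = cavity_phase_slope(1e-3)
stray = stray_phase_slope(1e-3, 2.0 / C)
print(stray / cavity)  # ~2: even this tiny reflection can dominate the tuning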
2.13.7 Mode Beats
If you shine a HeNe laser onto a photodiode and examine the results on a spectrum
analyzer set to DC–10 MHz, you’ll probably see the occasional strong, narrow spur †
appear, sweep toward 0, and disappear when it gets to 1 MHz or so.‡ It may be as strong
as 0.1% of the total photocurrent. These odd objects are baseband mode beats. Lasers,
like all oscillators without automatic level control, operate in a strongly nonlinear region,
which means that all their (supposedly orthogonal) modes are in fact coupled together
† That is, spurious signal; see Section 13.5.
‡ If it doesn’t do it right away, put your hand on one end of the housing to cool it down a bit, and then look. A bit of bending caused by thermal gradients will unlock the modes.
with each other. Those mode beats are caused by one mode mixing with a third-order
intermodulation product of two others (Section 13.5.3 for more details). Since optics
people always have fancier names for things, this intermodulation is called four-wave
mixing. Small anharmonicities in the laser medium or cavity cause the two products to
differ in frequency by about 1 part in 10⁹, causing the few-hundred-kilohertz mode beats
we see. The disappearance at low frequency is caused by the modes jumping into lock
with each other.
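A quick sanity check on the quoted magnitude: 1 part in 10⁹ of a HeNe's optical frequency is indeed a few hundred kilohertz. (The anharmonicity fraction is the text's figure; the arithmetic is just below.)

```python
# Mode-beat frequency: anharmonicity of ~1 part in 1e9 of the optical
# frequency, per the text. HeNe wavelength 632.8 nm.
c = 3e8            # m/s
lam = 632.8e-9     # m
nu_optical = c / lam          # ~4.7e14 Hz
beat = nu_optical * 1e-9      # 1 part in 1e9
print(f"beat frequency ~ {beat/1e3:.0f} kHz")  # a few hundred kHz
```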
2.13.8 Power Supply Ripple and Pump Noise
Like other oscillators, lasers are sensitive to power supply ripple. The laser output depends
on how much gain is available, which depends on how hard it’s pumped, which depends
on the supply voltage. This is usually worst in big gas lasers, whose large power requirements make quiet supplies harder to build. The advent of good quality switching supplies, whose high operating frequencies make filtering much easier, has improved this problem, but it still persists. You’ll definitely see your power supply’s operating frequency
come through loud and clear. If it sits still, you can usually avoid it, but some supplies
change their frequency with load, and that makes it much harder to avoid. Make sure
you know this about your particular laser.
Diode lasers have very high electrical to optical conversion efficiencies, so the current
noise on the supply translates more or less directly into photon noise. Most diode laser
supplies use lousy voltage references, inadequately filtered, to define their output current,
which makes the lasers themselves noisier. For bright-field measurements, it is really
worthwhile to make sure that your diode laser’s bias supply is quieter than the shot
noise. This isn’t too hard to do—see Section 14.6.7.
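As a rough target for "quieter than the shot noise," the bias supply's current noise density should sit below the shot noise of the laser current itself. The 100 mA operating current below is a hypothetical example, not a figure from the text.

```python
import math

E_CHARGE = 1.602e-19  # C

def shot_noise_density(i_dc):
    """Shot noise current density, A/sqrt(Hz), of a DC current i_dc."""
    return math.sqrt(2 * E_CHARGE * i_dc)

# A diode laser running at 100 mA (hypothetical):
i_n = shot_noise_density(0.100)
print(f"supply must be quieter than ~{i_n*1e12:.0f} pA/sqrt(Hz)")
```

A couple of hundred picoamps per root hertz at 100 mA is a demanding but achievable target, which is the point of Section 14.6.7.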
Lasers that are pumped optically will also be modulated by the noise of the pump
source. Flashlamps are usually the prime contributors, but ion laser pumped dye lasers
also suffer from instability due to mode noise in the pump laser.† As noted in Section 2.11,
single longitudinal mode DPY lasers are very quiet.
2.13.9 Microphonics
The narrow linewidth of a laser comes not from the narrowness of the spectral line
doing the lasing, but rather from its being put in a Fabry–Perot interferometer a meter
long, and then subjected to regeneration, which narrows it further. Since the narrowness
comes from the cavity selectivity, anything that causes the cavity length to fluctuate
will produce frequency noise in the laser. This includes vibrations from ambient sound
(microphonics), cooling water turbulence, fans, and conducted vibrations from the table.
Lasers whose mirrors are firmly positioned (e.g., sealed HeNe units and especially diode
lasers) are much less prone to this problem.
Some types of lasers (e.g., medium power diodes and diode pumped YAGs) come in
styles with and without fans in the laser head. Avoid fans wherever possible.
† Martin C. Nuss, Ursula H. Keller, George T. Harvey, Michael S. Heutmaker, and Peter R. Smith, Amplitude noise reduction of 50 dB in colliding-pulse mode-locking dye lasers. Opt. Lett. 15(18), 1026–1028 (1990).
2.13.10 Frequency Noise
Frequency noise in an oscillator comes from the noise of the active element, restrained by
the selectivity of the resonator. The low frequency noise of the amplifier (and any noise
or instability in the resonator) gets translated up to become low modulation frequency
sidebands on the laser oscillation (see Section 15.9.4), and high frequency noise becomes
a more or less white phase noise background. A resonator run at its 10⁶th overtone, such
as a laser cavity, makes this a bit more complicated by introducing competition between
modes, and the complexities of the laser gain mechanism contribute intrinsic noise and
instability, but this picture is still basically right.
Laser frequency noise can be reduced by reducing the noise forcing: quieting down
the pumping, using a stable gain medium (e.g., Nd:YAG rather than N2 ) when possible,
or using a highly mechanically stable cavity. It can also be improved by using a more
selective resonator (a longer or lower loss one). An external cavity stabilized diode laser
uses both approaches.
There is a region of optical feedback in which temporal coherence collapses completely, and the laser output becomes chaotic. You’ll recognize it if it happens to you.
2.13.11 Spatial and Polarization Dependence of Noise, Wiggle Noise
Laser noise is not merely a well-behaved wandering around of the intensity or frequency
of the entire beam at once. There are important variations in the noise with position and
polarization; for example, vignetting the beam of a single frequency argon ion laser has
been known to increase its residual intensity noise (RIN) by an order of magnitude; a
weak side mode that was previously orthogonal to the main beam was made to interfere
by the vignetting, producing a huge noise increase. Diode lasers are especially prone to
this pathology for some reason, but all lasers have spatially dependent noise.
Interference with small amounts of laser light and spontaneous emission whose spatial
pattern is different causes the laser beam to wiggle back and forth ever so slightly,
a phenomenon called wiggle noise.† A gas laser with a second spatial mode that is
close to oscillating, or a “single-mode” fiber run at too short a wavelength (so there
are really two or three or five modes) are especially bad for this; you don’t know what
pointing instability is like until you’ve used a system like that. On the other hand, a truly
single-mode fiber with all the extraneous light stripped out is unsurpassed for pointing
stability (see Section 8.2.2). The angular size of the wiggle goes as the ratio of the
amplitudes of the laser mode and the spurious signal, that is, the square root of their
intensity ratio, so this isn’t as small an effect as you might think.
Polarization dependence is easier to understand; for example, lasers produce spontaneous emission, which is more or less unpolarized. Because it has a different dependence
on pump power, the modulation of the spontaneous emission by pump noise will be different. Because the laser radiation is usually highly polarized, the detected noise will
differ in character depending on the rotation of an analyzer.‡
† M. D. Levenson, W. H. Richardson, and S. H. Perlmutter, Stochastic noise in TEM00 laser beam position. Opt. Lett. 14(15), 779–781 (1989).
‡ An analyzer is just the same as a polarizer, but the name specifies that it is being used to select one polarization
for detection, rather than to prepare a single polarization state for subsequent use.
2.14.1 External Cavity Diode Lasers
The tuning and frequency stability of a Fabry–Perot diode laser are limited mainly by
the poor selectivity of its cavity. It is possible to use a diode as the gain medium in a
conventional laser cavity; this is done by antireflection coating the output facet to get
rid of etalon resonances in the diode, and using an external reflector and a grating for
selectivity. In these external cavity diode lasers (ECDLs), the external cavity takes over
the frequency determining function; because of its length and the good optical properties
of air, the linewidth is narrower and the tuning much more stable. Because the diode has
such high gain, the cavity needn’t be all that efficient, and typically fixed-tuned ECDLs
use a grating in Littrow as the cavity mirror, with the specular reflection being the laser
output. This makes the output beam alignment tuning-sensitive, so it is usually restricted
to fixed-tuned applications. Tunable ECDLs use two bounces off a fixed grating, with a
rotatable mirror as the tuning element; that makes the output beam pointing very stable.
Very fast-tuning ECDLs use polygon mirrors but these have amplitude spurs due to
Doppler shifts. ECDLs can also be made from uncoated diodes, but the two competing
cavities make it much harder to find a stable operating region, and the tuning is no longer
continuous as it is with the grating-tuned, AR-coated version. You’d think that the grating
feedback would help to prevent mode hops by sharply distinguishing the allowed modes,
but in practice it doesn’t really. Even fancy commercial ECDLs are highly sensitive to
feedback, so budget for a two-stage free-space Faraday isolator in addition to the laser.
2.14.2 Injection Locking and MOPA
Small diode lasers have better mode characteristics than large ones, just as small signal
transistors make better oscillators than power units. Larger lasers can have their facets
AR coated to eliminate regeneration, turning them into amplifiers instead. They can
be used as power amplifiers for the radiation from smaller ones, the so-called master
oscillator–power amplifier (MOPA) approach. The main drawback of MOPA, besides its
cost and complexity, is the large amount of spontaneous emission from the amplifier.
Alternatively, the power stage can be left to oscillate, but seeded by shining the master
oscillator’s output into it. Under the right conditions, the power stage will injection lock
to the seed. (This is an old microwave trick and also helps a lot with OPOs.) Injection
locking requires less pump power than MOPA, and the Fabry–Perot resonance of the
power stage filters out most of the spontaneous emission, but it’s just flakier. Its bad
behavior arises from the coupling between the two resonators, which are sensitive to
temperature, current, and the phases of the moon (microwave versions have amplifiers
with good input–output isolation, a luxury we’ll have more than one occasion to envy
before this book is finished). MOPA seems to be a much more reliable approach.
2.14.3 Strong UHF Modulation
Rather than increase the temporal coherence of a laser, sometimes it’s more helpful
to destroy it. Mode hops can be eliminated by the use of quenching, by analogy with
the superregenerative detectors of early radio. When gain was very expensive, positive
(regenerative) feedback offered an appealing if unstable method of getting more gain
for less money. Superregens work by coupling the input into an oscillator circuit that is
turned on and off at ultrasonic frequency by a second (quench) oscillator. The exponential
buildup of the oscillations produces a waveform proportional to the size of the input,
but many times larger, with the amplification and linearity controlled by the quench
frequency (see Terman, it’s beautiful). Lower quench rates give a logarithmic response.
Laser quenching isn’t as pretty, but it’s still useful. In a mode hopping laser, the situation is a bit more complicated, since the laser oscillation itself builds up very rapidly
(1 ns or faster) and it’s the mode hops we want to quench. Using large-signal UHF modulation to essentially turn the laser on and off at 300–500 MHz suppresses mode hopping
completely, at the cost of enormously increased linewidth. Commercial ICs are available,
or you can use an RF transistor in parallel with your diode laser. (Gallium nitride RF
FETs are especially good for this.) This trick was widely used in magneto-optical storage
applications and DVD players—there were even self-pulsating diode lasers that turned
themselves on and off at UHF rates. (Note that the linewidth is still fairly small compared
with a light bulb, so that interferometers built with these will have phase noise problems
unless their path differences are extremely small.)
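The closing caution can be made quantitative with a coherence-length estimate, l_c ≈ c/Δν. The 0.5 nm broadened linewidth assumed below is a hypothetical figure for a strongly quenched visible diode laser, not a number from the text.

```python
# Coherence length of a UHF-quenched diode laser (hypothetical linewidth).
c = 3e8          # m/s
lam = 650e-9     # m, visible diode laser
dlam = 0.5e-9    # m, assumed broadened linewidth

dnu = c * dlam / lam**2      # convert wavelength width to frequency width
l_coh = c / dnu              # rough coherence length
print(f"coherence length ~ {l_coh*1e3:.1f} mm")
```

Under these assumptions the coherence length is under a millimeter, so an interferometer built with such a source needs its path difference matched to well below that, as the text warns.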
Optical Detection
You can’t hit what you can’t see.
—Walter Johnson (American baseball player)
Electro-optical systems normally consist of an optical front end and an electronic and
software back end, with an optical detector occupying the uncomfortable place of honor
in between. The detector subsystem consists not only of the detector itself, but includes
any baffles, coolers, windows, or optical filters associated with the detector, as well as
amplifiers, electrical filters, and analog signal processing taking place in its vicinity. The
optical front end is normally the most expensive part of the system, but is the only
place where its photon throughput (and hence the maximum SNR) can be improved; the
electronic back end is where the filtering, digitizing, and postprocessing occur, and is
what produces the valuable output.
The guiding principle of detector system design is this: a photon, once lost, cannot
be recovered. This includes those lost due to poor coatings, inefficient detectors, or poor
matching of the optical characteristics of the incoming light to those of the detector, as
well as those that are needlessly swamped in technical noise due to a poor choice of
amplifier, load impedance, or circuit topology.
Once the measurement principle has been chosen, the choice of detector and the
design of the detector subsystem are usually the most critical tasks in engineering a high
performance electro-optical instrument; it is easy to get these badly wrong, and even serious errors are not always immediately obvious. Vigilant attention to every decibel there
will be repaid with a sensitive, stable, repeatable measurement system. Decibel-chasing
should not be limited to sensitivity alone, but should include stability as well; if efficiency is purchased at the price of instability, a measurement that was once merely slow
may become impossible. Careful attention must be paid to many second-order sources
of artifacts, such as ground loops, spectral response changes with temperature, etalon
fringes, parametric effects such as memory in photoconductors, and nonlinear effects
such as overload in photomultipliers or Debye shielding and lateral voltage drops in photodiodes. Achieving high stability is a somewhat subtle task and requires the cultivation
of a certain healthy paranoia about unconsidered effects. Nevertheless, it is quite possible
for a measurement system to achieve stabilities of 10⁻⁵ to 10⁻⁶ in 1 hour, even at DC,
if good design practices are followed.
Linearity is often the most important parameter after efficiency and stability. Many
types of photodetector are extremely linear, as well as time invariant; this good performance can only be degraded by the succeeding circuitry. Thus another useful maxim
is: if there’s one operation in your signal processing strategy that has to be really accurate, do it right at the detector. Examples of this sort of operation are subtracting two
signals, where very small phase shifts and amplitude imbalances matter a great deal, or
amplifying very small signals in a noisy environment.
Detectors differ also in their resistance to adverse conditions. Photodiodes tend to be
vulnerable to ultraviolet damage, photomultipliers to shock, APDs to overvoltage. (PDs
are bulletproof by comparison, of course.)
A semiconductor PN junction is the interface between regions of n-doping (excess electrons) and p-doping (excess holes). Electrical neutrality (zero E field) would require the
extra electrons to stay in the N region and the extra holes in the P region. However, the
electrons are in rapid motion, and their thermal diffusion flattens out the density gradient
of the free carriers. The mismatch between the bound charge (ions) and free charge (carriers) causes an E field to form in the junction region, even at zero bias. The magnitude
of E is just enough to cause a drift current equal and opposite to the diffusion currents.
Absorption of light causes the formation of electron–hole pairs, which are pulled apart
by the E field, yielding a current at the device terminals. Away from the junction, the E
field is shielded out by the free charge. Thus an electron–hole pair generated there will
usually recombine before it can be collected, which reduces the quantum efficiency of
the device. Applying a reverse bias pulls the free carriers back from the junction, forming
a depletion region with a large E field throughout. If the doping level in the depletion
region is low, applying a reverse bias can cause a very large change in the depletion
layer width (Figure 3.1a,b). This reduces device capacitance typically 7× (by effectively
separating the capacitor plates), and the extra depletion layer thickness improves quantum efficiency at long wavelengths, where many photons would otherwise be absorbed
outside the depletion region. In an avalanche photodiode (Figure 3.1c), the doping profile
is changed to make a separate high field region deep inside the device, in which electron
multiplication takes place. (See Sze for more.)
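The roughly 7× capacitance reduction follows directly from the parallel-plate formula C = ε₀ε_r·A/w as the depletion width w grows under reverse bias. The diode size and depletion widths below are hypothetical round numbers chosen to illustrate the quoted factor:

```python
import math

EPS0 = 8.854e-12   # F/m, vacuum permittivity
EPS_SI = 11.7      # relative permittivity of silicon

def junction_capacitance(area, width):
    """Parallel-plate estimate of the depletion-layer capacitance."""
    return EPS0 * EPS_SI * area / width

area = math.pi * (1.5e-3) ** 2    # 3 mm diameter photodiode (hypothetical)
c_low_bias = junction_capacitance(area, 2e-6)     # ~2 um depleted
c_full_bias = junction_capacitance(area, 14e-6)   # ~14 um, fully depleted
print(f"{c_low_bias*1e12:.0f} pF -> {c_full_bias*1e12:.0f} pF "
      f"({c_low_bias/c_full_bias:.0f}x reduction)")
```

Widening the depletion layer from 2 μm to 14 μm drops the capacitance sevenfold, which is why reverse-biased PIN diodes are so much faster than photovoltaic-mode ones.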
There is a significant amount of confusion in the electro-optics field about how to calculate
and quote signal-to-noise ratios, and power ratios in general. This arises from the fact
that detectors are square-law devices; the electrical power emerging from the detector
subsystem is proportional to the square of the optical power incident.
3.3.1 Square-Law Detectors
One way to look at this is that a photon of a given frequency ν carries an energy equal
to hν, so that the optical power is proportional to the number of photons incident on the
Figure 3.1. Photodetection in semiconductor diode: (a) PIN diode, zero bias; (b) PIN diode, high
bias (fully depleted); (c) avalanche photodiode; ν and π are very low-doped n and p regions, respectively.
detector. In a quantum detector, each photon gives rise to an electron–hole pair, so that
the electrical current is proportional to the photon flux. Electrical power is proportional
to i², however, so the electrical power goes as the square of the optical power. Since
signal-to-noise ratio (SNR) is defined in terms of power, rather than amplitude, the
signal-to-noise ratio on the electronic side is the square of that on the optical side. The
same is true of thermal detectors, since T ∝ Popt and V ∝ T .
An optical signal of 10⁶ photons per second has an RMS statistical noise of 10³
photons in a 1 second DC measurement (0.5 Hz bandwidth); since the optical power is
proportional to the photon flux, the signal-to-noise ratio on the optical side is 10³ in 1
second. On the electrical side, however, the signal power delivered to a load resistor R is
(10⁶e)²R, while the noise power is (10³e)²R, so that the signal-to-noise ratio on this side
is 10⁶ in the same 1 second measurement. The two ratios behave differently with bandwidth, as
well. In a 0.01 second measurement (B = 50 Hz), the optical signal-to-noise
ratio is √(0.01 × 10⁶) = 100; it goes as 1/√B. The electrical SNR is 100² = 10⁴ in 0.01
second; it goes as 1/B.
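This bookkeeping is easy to get wrong, so here it is as a sketch, using the same numbers as the text (10⁶ photons per second, Poisson statistics):

```python
import math

def optical_snr(rate, t):
    """Photon-counting SNR: mean/RMS = sqrt(N) for Poisson statistics."""
    n = rate * t
    return n / math.sqrt(n)

def electrical_snr(rate, t):
    """Electrical SNR is the square of the optical one (power goes as i^2)."""
    return optical_snr(rate, t) ** 2

rate = 1e6  # photons per second
print(optical_snr(rate, 1.0), electrical_snr(rate, 1.0))    # 1000 and 1e6
print(optical_snr(rate, 0.01), electrical_snr(rate, 0.01))  # 100 and 1e4
```

Widening the bandwidth by 100× (0.5 Hz to 50 Hz) costs 10× in optical SNR but 100× in electrical SNR, matching the 1/√B and 1/B scalings.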
The author uses the electrical signal-to-noise ratio exclusively, because he finds he
makes many fewer blunders that way. It is much easier to convert the signal and shot
noise into electrical terms than to convert load resistor Johnson noise, multiplication
noise, recombination noise, and so forth into optical terms, because the one conversion
is physical and the other is not. Test equipment such as spectrum analyzers, lock-in amplifiers, and A/D boards all work in volts and amps. By using the electrical signal-to-noise
ratio, it is unnecessary to mentally extract square roots all the time in order to have good
comparisons between different measurements. One drawback to this approach is that if
your colleagues are used to quoting SNR in optical terms, they may feel that you’re
grandstanding by using the electrical SNR. The only defense against this charge is to
state clearly and often which ratio is being quoted.
Aside: Where Does the Power Go? Square-law detection is a surprisingly deep subject, leading as it does to coherent detection and large dynamic ranges, but there is a
simpler aspect that people puzzle over too. It is this: If I send an optical signal of power
PO into a photodiode, the electrical power delivered to the load is PE = RL(R·PO)², where R is the responsivity; this is less than the optical power when the output voltage RL·R·PO is below the photon energy in electron volts (hν/e, at unit quantum efficiency), and greater when it is above. What’s going
on? Am I really wasting almost all my optical power when I’m stuck with small signals?
We need to distinguish between energy and information. A solar cell, whose job is to
turn optical into electrical energy, wastes a lot of the incident energy: even if its quantum
efficiency is 1, its power efficiency is eVF /(hν). However, if shot noise dominates,
the counting statistics and hence the information content are identical before and after
detection, so no information is lost or gained, even though the SNR is squared. The root
of the confusion here is that we get so accustomed to identifying electrical SNR with
information carrying capacity (correctly, as we’ll see in Section 17.11.1) that we tend to
forget that it’s really the detection statistics that are fundamental for measurements. This
is another reason to stick with electrical SNR throughout.
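The eVF/(hν) power efficiency bound is a one-liner to evaluate. The 0.6 V forward voltage and 550 nm illumination below are hypothetical silicon-ish numbers, not figures from the text:

```python
# Power efficiency ceiling of an ideal (unit-QE) solar cell: e*V_F / (h*nu).
# Working in eV units: photon energy in eV is 1.24 / lambda(um).

def photon_energy_ev(lam_um):
    return 1.24 / lam_um

v_forward = 0.6  # V (hypothetical)
eff = v_forward / photon_energy_ev(0.550)
print(f"power efficiency <= {eff:.0%} even at unit quantum efficiency")
```

Even with every photon collected, such a cell turns barely a quarter of the light into electrical energy; the rest is given up in thermalization and the junction drop, yet no measurement information is lost.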
3.3.2 Photons
The photon concept is central to the theory of quantum electrodynamics, the relativistic
quantum theory of electromagnetic fields. The interaction of light with matter is quantized; a very weak beam of light, which spreads out over a large area, nevertheless gets
absorbed one photon at a time, each one in a very small volume. This is easy to see
with an image intensifier. The mathematics of quantum field theory are the most accurate
way of predicting the interactions of light with matter that are known at present, and
that accuracy is very impressive in many cases. It is clear that this represents one of the
towering achievements of 20th century physics.
Now let’s come down to Earth and talk about instrument design. Photons are a very
important bookkeeping mechanism in calculating photocurrent shot noise, and they are
a useful aid in keeping straight the operations of acousto-optic cells. Beyond that, the
photon is the most misleading concept in electro-optics. The moment people start thinking
of photons flying around from place to place, they start finding mares’ nests and hens’
teeth. For our purposes, it’s only the interaction of light and detectors that’s quantized—in
every other way, light behaves as a wave, and obeys Maxwell’s equations to absurdly
high accuracy.†
Comparing detectors based on widely differing technologies, manufactured in different
sizes, and requiring different circuitry is often difficult. Various figures of merit have
been developed to make it easier; unfortunately, these frequently have only oblique
connections with the issues facing designers. A healthy scepticism should be maintained
about the value of a small difference in a figure of merit, and practical considerations
kept firmly in mind.
† Nobody knows what a photon is, for one thing: see Willis E. Lamb, Jr., “Anti-photon,” Appl. Phys. B 60, 77–84 (1995), and the supplementary issue of Optics & Photonics News from October 2003, which was devoted
to the subject. Each of the eminent contributors had a strong opinion about what a photon was, and no two of
them agreed.
3.4.1 Quantum Efficiency
A quantum detector is one in which absorbed photons directly create free carriers: a
photodiode, photoconductor, or photocathode. The quantum efficiency (QE) η of such
a detector is the ratio of the number of photodetection events to the number of photons incident, before any amplification takes place. (We usually get one carrier pair per
detection event.) It is the most basic measure of how good a particular quantum detector
is—the detector analogue of the transmittance of the optical system. It determines our
signal to shot noise ratio, which sets an upper bound on the SNR of the measurement.†
Detectors with gain can reduce the effects of circuit noise but are powerless to improve
the shot noise (and indeed contribute significant excess noise of their own). Close attention to quantum efficiency is an excellent defense against being led astray in the design
of a detector subsystem.
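In the shot-noise limit the electrical SNR scales linearly with η, which is why chasing quantum efficiency pays off decibel for decibel. A sketch, with an arbitrary example photon flux and bandwidth (not from the text):

```python
E_CHARGE = 1.602e-19  # C

def shot_limited_snr(eta, photon_rate, bandwidth):
    """Electrical SNR of a shot-noise-limited quantum detector:
    signal power ~ I^2, noise power ~ 2*e*I*B, so SNR = I/(2*e*B),
    with photocurrent I = eta * e * photon_rate."""
    i_photo = eta * E_CHARGE * photon_rate
    return i_photo / (2 * E_CHARGE * bandwidth)

# Doubling eta doubles the electrical SNR (a dB of QE is a dB of SNR):
snr_low = shot_limited_snr(0.3, 1e12, 1.0)
snr_high = shot_limited_snr(0.6, 1e12, 1.0)
print(snr_high / snr_low)  # 2.0
```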
3.4.2 Responsivity
The responsivity of a detector is the ratio of the output current to the optical power input,
and is quoted in amps per watt or cute substitutes such as milliamps per milliwatt. It is
the most appropriate figure of merit to use when considering the biasing of the detector
and the gains and dynamic ranges of subsequent amplifier stages, and also in determining
how much optical power will be needed to overcome Johnson noise in the electronics.
It is easily seen to be given by

R = ηMe/(hν) = ηMλ(μm)/1.24 A/W,

where η is the quantum efficiency and M is the multiplication gain (unity for photodiodes). For a detector of unit quantum efficiency and unit gain, the responsivity is the
reciprocal of the photon energy in electron volts; it ranges from 1 A/W at 1.24 μm to
0.32 A/W at 400 nm in the violet. The responsivity of real detectors is typically a strong
function of wavelength; not only are there fewer photons per joule at short wavelengths,
but the detectors themselves are selective.
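The responsivity formula is a one-liner to code up; the endpoints below reproduce the 1 A/W and 0.32 A/W figures quoted above.

```python
# Responsivity R = eta * M * lambda(um) / 1.24  [A/W]
# (equivalent to eta*M*e/(h*nu); 1.24 um is the wavelength of a 1 eV photon).

def responsivity(eta, lam_um, gain=1.0):
    return eta * gain * lam_um / 1.24

print(responsivity(1.0, 1.24))   # 1.0 A/W
print(responsivity(1.0, 0.400))  # ~0.32 A/W in the violet
```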
Responsivity is a reasonable basis for comparison of detectors without gain, such as
different types of photodiodes, but should be used with caution otherwise; for example,
a PMT with lower photocathode quantum efficiency but higher dynode gain may exhibit
a higher responsivity than a high QE device—perhaps yielding a better measurement in
very low light, where Johnson noise in the load resistor may dominate, but certainly a
worse one in brighter light, in which the amplified shot noise from the photocathode is
the major noise source.
The term responsivity is reused for a rather different parameter in photoconductors:
the change of terminal voltage produced by an incident optical signal, in V/W, with some
combination of bias current and bias resistor.
3.4.3 Noise-Equivalent Power (NEP)
The most important single parameter of a detected signal is its signal-to-noise ratio.
Because most commonly used optical detectors are very linear, their intrinsic noise is
† There are very special situations (“squeezed states”) in which this has to be qualified, but they aren’t of much practical use.
additive in nature: it remains constant regardless of the incident optical power (PMTs
and APDs are exceptions).
Detectors differ in their intrinsic gains and readout mechanisms, so that it is inappropriate to compare the noise performance of different detectors solely on the basis of their
output noise currents. It is common to convert the noise into a noise-equivalent (optical)
power, NEP. The NEP is defined as the optical signal power required to achieve a SNR
of 1 (0 dB) in the given bandwidth. The output noise is broadband, but not in general
flat, so that the SNR of your measurement depends on how fast your light is modulated.
Accordingly, the NEP is quoted in (optical) watts, at a certain wavelength λ, modulation
frequency f (see Section 13.3), and bandwidth B (usually 1 Hz)—NEP(λ, f ,B). NEP is
somewhat awkward to use, because a good detector has a low NEP; thus its reciprocal,
the detectivity, is often quoted instead.
In order to be able to compare the performance of detectors of different sizes, the
NEP is sometimes normalized to a detector area of 1 cm2 , and then is quoted in units of
W·cm−1 · Hz−1/2 (the somewhat peculiar unit is another example of optical vs. electrical
power units—this one is optical). (Thermal detectors do not in general exhibit this area
dependence, so their noise should not be normalized this way—see Section 3.10.7.)
Example 3.1: Silicon Photodiode NEP. Designing with visible and NIR photodiodes is
easy, at least near DC, because there are really only three noise sources to worry about:
Johnson noise from the load resistor, shot noise from leakage, and the shot noise of
signal and background. As a concrete example of a noise-equivalent power calculation,
consider a 3 mm diameter silicon PIN photodiode, Hamamatsu type S-1722, used with
820 nm diode laser illumination. This device has a quantum efficiency of about 60% at
820 nm, a room-temperature dark current of about 100 pA at 10 V of reverse bias, and
a shunt impedance of 10^10 Ω. Its responsivity is

R = ηλe/(hc) = 0.6 × (820/1240) A/W,

which is 0.39 A/W at 820 nm. The two contributors to the dark NEP (of the diode alone)
are the shot noise of the dark current and the Johnson noise of the shunt resistance RP ,
so that the total current noise is

iN² = 4kB T/RP + 2e iDC.
With these parameters, the second term dominates the first by a factor of nearly 20,
and the result is iN = 5.7 × 10−15 A/Hz1/2 . The shunt resistance is not in general linear,
so it is not safe to assume that RP = Vbias / ileak , although it’ll be the right order of
magnitude. To get from here to the NEP, we just multiply by the energy per photon of
hc/λ = 1.51 eV and divide by the quantum efficiency, since a notional signal photon
has only 0.6 probability of being detected, so that the optical power required to equal
the noise goes up by 1/0.6. The final result for this case is that the NEP is
≈ 1.6 × 10−14 W·Hz−1/2 .
With a load resistance of 500 kΩ, this device will be shot noise limited with a current
of 2kT/(eRL), or 100 nA, corresponding to an optical power of 260 nW. The shot noise
of the dark current is so small that it would take a 500 MΩ load resistor to make it
dominate. This is very typical.
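The arithmetic of Example 3.1 is easy to reproduce in a few lines; the sketch below uses the example's numbers, assuming T = 300 K (the small difference from the quoted 1.6 × 10−14 W·Hz−1/2 is rounding):

```python
from math import sqrt

KB = 1.381e-23   # Boltzmann constant, J/K
E = 1.602e-19    # electron charge, C
T = 300.0        # assumed room temperature, K

i_dark = 100e-12     # dark current, A
r_shunt = 1e10       # shunt resistance, ohms
eta = 0.6            # quantum efficiency at 820 nm
ev_photon = 1.51     # photon energy at 820 nm, eV

johnson = 4 * KB * T / r_shunt      # Johnson-noise term, A^2/Hz
shot = 2 * E * i_dark               # dark-current shot-noise term, A^2/Hz
i_n = sqrt(johnson + shot)          # total noise current, A/Hz^(1/2)
nep = i_n * ev_photon / eta         # optical watts per root hertz

print(shot / johnson)   # shot noise dominates by a factor of ~20
print(i_n)              # ~5.8e-15 A/Hz^(1/2)
print(nep)              # ~1.5e-14 W/Hz^(1/2)
```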
3.4.4 D∗
The most used figure of merit for infrared detectors is D ∗ , the specific detectivity, that
is, detectivity normalized to unit area. More specifically,

D∗ = (A · BW)^1/2 / NEP(f, BW),

where A is the detector area in cm², giving D∗ units of cm·Hz1/2/W.
This choice of normalization takes account of the area dependence of the noise, the
wavelength dependence of the responsivity, and the frequency dependence of the noise
and the sensitivity. (Consider a photoconductive detector with significant 1/f noise below
1 kHz, and a 3 dB cutoff at 1 MHz due to carrier lifetime.) This allows meaningful
comparisons of detector types that may not have exactly the same area, and accounts to
some degree for the behavior of detectors with gain, such as APDs and PMTs, whose
huge responsivities may mask the fact that their noise is multiplied at least as much as
their signals.
To choose a detector technology, we first prepare a photon budget, which estimates
how much light is available with how much noise, and see if we can meet our electrical
SNR target. Lots of common sense and sanity checking are required here, to make sure
all relevant noise sources are considered, including signal-dependent terms such as shot
and multiplication noise. The maximum NEP in the measurement bandwidth is equal
to the optical power divided by the square root of the target electrical SNR (optical vs.
electrical again). Once the detector area A has been chosen (normally the minimum size
required to collect all the signal photons), the minimum D∗ required is given by

D∗min = (A · B)^1/2 (SNR)^1/2 / Ps,

where Ps is the available signal power, B is the measurement bandwidth, and SNR is the electrical signal-to-noise target.
It must be emphasized that normalizing the response this way is intended to aid
comparisons of different technologies and does not necessarily help the designer choose
the right detector unit. In a background or Johnson noise limited system, if the signal beam
can be focused down onto a 100 μm detector, then choosing a 5 mm one will result in a
NEP needlessly multiplied by 50 for a given bandwidth, but D ∗ won’t change. Thermal
detectors’ NEP is controlled by their thermal mass and thermal conductivity, so they
often don’t scale with area this way, making D ∗ useless.
D ∗ is not very useful in the visible, because it is so high that the detector’s intrinsic
noise is seldom a limitation. Visible light measurements are typically limited by the
background and its noise, the shot noise of the signal, or the Johnson noise of the
detector load resistor. In a poorly designed front end, amplifier noise may dominate all
other sources, but this need not be. Chapter 18 discusses in detail how to optimize the
signal-to-noise ratio of the detector subsystem. In brief, because visible-light photodiodes
are such good current sources, the effects of Johnson noise can be reduced by increasing
the value of the load resistance RL . The signal power is proportional to RL , whereas the
Johnson noise power is constant, so the (electrical) SNR goes as RL (there are circuit
tricks to keep the bandwidth from vanishing in the process—see all of Chapter 18).
In the infrared, D ∗ appropriately includes the shot noise of the leakage current and
of the 300 K background, shunt resistance Johnson noise, and lattice G-R noise. D ∗
is sometimes quoted in such a way as to include the shot noise of the nonthermal
background (e.g., room lights), and even nonlinear effects such as multiplication noise in
APDs and PMTs. While this is a legitimate way of quoting the noise performance of a
measurement, it is unhelpful. Since it lumps all noise contributions into a single number,
it totally obscures the questions most important during design: Which is largest, by how
much, and what can be done about it? Furthermore, as discussed in the previous section,
nonoptical noise sources are important, and translating all noise into optical terms is
confusing and unphysical. Accordingly, all the D ∗ numbers quoted in this book are for
detectors operating in the dark.
Another problem with D ∗ is that an inefficient detector with very low dark current
may have a huge D ∗ but still be a very poor choice for actual use; for example, a 1
cm2 detector with a quantum efficiency of 10−6 , but with a dark current of one electron
per week, would have a D ∗ of about 1.6 × 1015 cm·Hz1/2 /W in the red, an apparently
stunning performance, but actually it’s useless. This illustration is somewhat whimsical,
but nonetheless illustrates an important point: NEP and D ∗ are not the whole story.
Example 3.2: Indium Arsenide. Consider the case of a 2 mm diameter InAs photodiode
(EG&G Judson type J12-5AP-R02M), operating over the wavelength range of 1.3–2.7
μm, at a temperature of 240 K and zero bias. This temperature was chosen to reduce
the otherwise large response nonuniformity and to improve Rsh . The device has a long
wavelength cutoff of about 3.5 μm. Its shunt resistance Rsh is 100 Ω, and the Johnson
noise of the shunt resistance dominates the noise. Its quantum efficiency η is about 0.6
across this wavelength range, set mainly by the front surface reflection from the detector,
which lacks an AR coating. The mean square noise current in a bandwidth B is given by

iN² = 4kT B / Rsh,

where the thermal background has been neglected. The Johnson noise current is 11.5
pA/Hz1/2 , equal to the shot noise of a photocurrent of 400 μA (about 0.5 mW of optical
power, far more than this detector would ever see). D∗ is given by

D∗ = (ηe/hν) · A^1/2 / (iN²)^1/2,

which at λ = 2 μm is about 1.5 × 1010 cm·Hz1/2 /W.
Infrared detection is where the concept of D ∗ is really crucial, because here the
intrinsic noise of the detector is far from negligible. To verify that the detector is Johnson
noise limited, we will estimate the shot noise of the background.
We assume that the planar detector is surrounded by a hemispherical black body at a
temperature T , and that background photons whose wavelength is shorter than 3.5 μm are
detected with η = 0.6, while all others are lost. Since hν ≫ kT, the thermal photons are
uncorrelated, and the thermal photocurrent exhibits full shot noise (see Section 3.10.2).
The photon flux per unit area, between ν and ν + dν, from this hemisphere is Mqν dν,
where

Mqν(T) = 2πν² / [c²(e^{hν/kT} − 1)].
Since the exponential in the denominator exceeds 106 over the wavelength region
of interest, the −1 can be neglected, and the result integrated analytically, giving the
approximate photocurrent due to the thermal background,

iT ≈ (2πηe/c²)(kT/h)³ ∫_{x0}^{∞} x² e^{−x} dx = (2πηe/c²)(kT/h)³ (x0² + 2x0 + 2) e^{−x0},

where x0 = hν0/kT and ν0 = c/(3.5 μm). After converting to current noise spectral density (see Section 3.10.2), this corresponds to a noise current iN in 1 Hz of 0.6 pA,
well below the Johnson noise. The limiting D ∗ if only this noise contributed would be
2.7 × 1011 cm·Hz1/2 /W.
Neglecting the 1 in the denominator causes less than a 1% error in the integrand
whenever hν/kT > ln(100), which applies for λ < 10.4 μm at 300 K, or λ < 40.6 μm
at 77 K.
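The background calculation in Example 3.2 can be checked numerically; the sketch below assumes a 300 K hemispherical background and evaluates the analytic approximation above for the 2 mm InAs diode:

```python
from math import sqrt, exp, pi

H = 6.626e-34   # Planck constant, J*s
C = 2.998e8     # speed of light, m/s
KB = 1.381e-23  # Boltzmann constant, J/K
E = 1.602e-19   # electron charge, C

T = 300.0                 # background temperature, K (assumed)
eta = 0.6                 # quantum efficiency
lam_cut = 3.5e-6          # long-wavelength cutoff, m
area = pi * (1e-3)**2     # 2 mm diameter detector, m^2

x0 = H * C / (lam_cut * KB * T)       # hc/(lambda_cut * kT), ~13.7
# i_T ~ (2 pi eta e / c^2)(kT/h)^3 (x0^2 + 2 x0 + 2) e^(-x0), per unit area
m_q = (2*pi*eta*E/C**2) * (KB*T/H)**3 * (x0**2 + 2*x0 + 2) * exp(-x0)
i_thermal = m_q * area                # background photocurrent, A
i_shot = sqrt(2 * E * i_thermal)      # its shot noise, A/Hz^(1/2)

print(i_thermal)   # ~1.3e-6 A
print(i_shot)      # ~0.6 pA/Hz^(1/2), well below the 11.5 pA Johnson noise
```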
3.4.5 Capacitance
The product of the load resistance and the detector capacitance usually dominates the high
frequency performance of the detector system (transit time and carrier lifetime effects
may also be important in some instances, especially in photoconductors). Bandwidth is
often the limiting factor in how large the load resistor can be, so capacitance partly
determines the sensitivity of the detector subsystem. The capacitance per unit area of the
detector is thus another important figure of merit.
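The tradeoff is just the single-pole RC rolloff, f3dB = 1/(2πRLCd); a quick sketch with purely illustrative values (a 50 pF diode is an assumption, not a figure from the text):

```python
from math import pi

def f_3db(r_load, c_detector):
    """RC-limited 3 dB bandwidth of a detector/load combination, Hz."""
    return 1.0 / (2 * pi * r_load * c_detector)

# Illustrative (hypothetical) values: a 50 pF photodiode.
c_d = 50e-12
for r_l in (1e3, 1e6):
    print(r_l, f_3db(r_l, c_d))
# A 1 Mohm load gives only ~3.2 kHz, whereas 1 kohm gives ~3.2 MHz:
# big load resistors (low Johnson noise) cost bandwidth directly.
```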
Reverse Bias. If the detector type allows use of reverse bias, this can reduce the
capacitance by as much as 7–10 times in some devices. High speed applications, which
require small load resistors to achieve small RC products, may benefit from detectors
with intrinsic gain; their gain allows them to overcome the high Johnson noise current
of the load. See Sections 18.4.4 and 18.5 for other ways to sidestep the problem.
In CCD and CID detectors, which are inherently integrating, capacitance per pixel
should be large, to increase the full well capacity. The maximum signal level depends
on how many electrons can be stored, so increasing the capacitance makes the statistical
fluctuations and the readout noise a smaller fraction of the maximum signal level.
3.4.6 Spectral Response
A detector with a high peak quantum efficiency isn’t much use if it is insensitive to some
wavelength you care about. On the other hand, low quantum efficiency can be very helpful
sometimes, as in solar blind UV photomultipliers and in GaP or other direct bandgap
detectors, whose quantum efficiency drops extremely rapidly at wavelengths longer than
cutoff. Spectral flatness means different things in different situations; a bolometer tends to
be flat in terms of resistance change per watt, irrespective of the photon energy, whereas
a photodiode is flat in terms of electrons per photon. Thermal band broadening gives
detectors large values of ∂η/∂T near their long-λ cutoff.
3.4.7 Spatial Uniformity
Detectors are not perfectly uniform in sensitivity. Silicon photodiodes typically have
1–10% variation in η across their active areas, due to coating nonuniformity and the
sheet resistance of the very thin top layer. Large-area diodes are usually worse, as one
might expect. Fringes due to windows and nonuniform passivation layers can make this
even worse. Beyond about 1 μm, you can even get etalon fringes from the back surface
of the silicon (especially in CCDs, which have no spatial averaging). Nonuniformity can
be a problem in situations such as interferometric detectors, where it can degrade the
angular selectivity of the measurement by preventing spatial interference fringes from
averaging to zero as they should, in position sensing applications, and in any measurement
calling for absolute radiometric accuracy. Some detectors are much better than others;
specifications may be available from the manufacturer, but there’s no substitute for your
own measurements. For the highest accuracy applications, homogenizing devices such
as integrating spheres provide a good solution; they are of course better for white light
sources than for lasers, due to speckle (see Section 5.7.11). With lasers, it’s usually best
to put the photodiode at the pupil, because small angular shifts generally cause much
less sensitivity variation.
3.5.1 Photodiodes and Their Relatives
Photodiodes are the most popular detectors for optical instruments. This is no accident;
they come in a huge variety of types, sizes, and characteristics, their performance is
excellent, and their cost is usually low.
A shallow PN junction is formed in a semiconductor. Light entering the chip is
absorbed, creating electron–hole pairs. Provided the absorption occurs in or near the
depletion region of the junction, the large electric field there (produced by the junction
itself and by any externally applied bias) separates the carriers rapidly. Thus they do not
recombine appreciably until the charge accumulation is sufficient to overcome the field
in the junction, either through forward conduction (as in an open-circuited solar cell) or
through Debye shielding in areas of high light intensity.
Ordinary PN junction devices are useful in low speed applications but tend to have
very large capacitances, especially at zero bias. PIN diodes have a thick layer of very
low-doped (intrinsic) semiconductor between the electrodes, which increases the depletion layer thickness, and so reduces the capacitance enormously, to the point where carrier
transit time can become the speed limitation. Moderate reverse bias can cause the whole
thickness (500 μm or so) of the device to be depleted.
Photodiodes tend to have constant, high quantum efficiency over broad bands;
AR-coated silicon photodiodes easily reach 90% η over the visible, and even with
no coating, often reach 60%. They roll off at long wavelengths as the semiconductor
becomes transparent, so that many of the photons pass completely through the junction
region, and at short wavelengths as light is absorbed too shallowly for the carriers to
even reach the junction before recombining. So-called blue- and UV-enhanced diodes
use very thin top layers to minimize this, at some cost in linearity. Another approach
is to use a Schottky barrier instead of a PN junction; a nearly transparent top electrode
of thin metal or transparent conductor such as indium tin oxide (ITO) can substitute for
the top semiconductor layer.
The quantized nature of the light-to-electricity conversion ensures that photodiodes
are extremely linear, which is one of their most important characteristics. Furthermore,
photodiodes made of silicon are extremely good current sources when operated at zero
or reverse bias; as we saw in Example 3.1, their intrinsic noise arises almost entirely
from the shot noise of their leakage current, which is itself low. The low intrinsic noise
and leakage are largely traceable to the fact that silicon has a relatively wide bandgap,
nearly 50 times the thermal voltage kT/e at room temperature, so that thermally generated
carriers and thermal radiation are not significant contributors to the noise in most cases.
Leakage can be reduced even further by cooling, although that is rarely necessary in
practice.
Nonlinearity in reverse-biased photodiodes comes mainly from excessive current
density. High photocurrent densities (>1 mA for a uniformly illuminated 3 mm diameter
unit, for example) cause lateral voltage drops and intensity-dependent local increases
in the carrier density. Both effects reduce the electric field in the junction, leading
to enhanced recombination and slow drift. In other words, if you hit it too hard, it’ll
go nonlinear and slow down on you. A very small illuminated spot will exhibit this
effect at surprisingly low photocurrents—microamps, even CW. See Section 3.5.4
for how much worse this gets with pulsed lasers. Don't focus the beam down on the
photodiode unless you have to.
Photodiodes used at zero bias are considerably less linear than reverse-biased ones,
because lateral voltage drops in the thin epitaxial layer cause the diode to be locally
forward biased, so that small forward currents flow and partly short out the photocurrent.
Radiometrists resort to 20 mm diameter cells for photocurrents of 100 or 200 μA, whereas
in reverse bias, the same detector is probably good to 10 mA. It takes a pretty big lateral
voltage drop to overcome 5–20 V of reverse bias.
Silicon is the most common material used for photodiodes, both because it is a very
good material and because silicon processing is a mature field. In general, silicon is best
from 0.2 to 1 μm, and InGaAs from 1 to 1.7 μm (new InGaAs devices reach 2.6 μm).
Germanium is also widely used for detectors out to 1.8 μm, but it is being eclipsed by
InGaAs, which has been greatly developed for fiber optic communications at 1.3 and
1.55 μm. Beyond there, infrared devices become more exotic and difficult to use. High
bandgap semiconductors such as GaP and GaN photodiodes ought to be the best kind
in the ultraviolet, but don’t seem to be as good as Si in practice. The one exception is
silicon carbide, which has the advantage of being nearly totally solar blind; SiC devices
have negligible leakage, work from 200 to 400 nm, and can reach η = 0.72 at 270 nm,
which is remarkably good.
3.5.2 Shunt Resistance
Low light applications, where you want to use big diodes and huge feedback resistors
(100 MΩ to 10 GΩ), are one place where cooling silicon photodiodes helps, for an
interesting reason. To reduce leakage, you run the photodiode at exactly zero bias. Here’s
the subtlety: the leakage goes to 0, all right, but the shunt resistance deteriorates very
badly at zero bias, because of the diode equation. In Section 14.6.1, we’ll see that a
diode’s zero-bias shunt resistance r0 decreases dramatically at high temperatures and low
bandgaps, which is why IR detectors are worse than visible ones, and why cooling them
helps. That resistance is a real resistance with real Johnson noise current, and furthermore
if it gets too small, it forces the transimpedance amplifier to run at a huge noise gain
(see Section 13.1), which multiplies the amplifier’s noise voltage, offset, and drift—bad
news for measurement accuracy. Fortunately, we’re very rarely in this situation.
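The noise gain penalty is easy to estimate: at low frequency a transimpedance amplifier's noise gain is roughly 1 + Rf/r0 (the standard op amp noninverting-gain result; the resistor values below are hypothetical):

```python
# Low-frequency noise gain of a transimpedance amplifier:
# the amplifier's input noise voltage, offset, and drift are
# multiplied by roughly (1 + Rf / r0), where r0 is the diode's
# zero-bias shunt resistance.

def noise_gain(r_feedback, r_shunt):
    return 1.0 + r_feedback / r_shunt

r_f = 1e8   # 100 Mohm feedback resistor (hypothetical low-light design)
print(noise_gain(r_f, 1e10))  # healthy silicon diode: ~1.01
print(noise_gain(r_f, 1e6))   # leaky or warm IR diode: ~101 -- bad news
```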
Aside: The Zero-Bias Heresy. It is a sad fact that almost all the photodetector circuits
published in the history of the world have been designed by circuits people who weren’t
also optics people. The author has no statistics to back this up, but he feels that it must
be so, because almost all show the photodiode being operated at zero bias, often with
great care being exerted to make the bias exactly zero. This will reduce the dark current
through the photodiode, all right, but that isn’t the problem we need to solve (see Section
18.2.1). Photodiode dark current is almost never the limiting factor in a visible or near-IR
measurement. Fixing this nonproblem costs you a factor of 5–7× in bandwidth (or the
same factor in high frequency SNR), as well as destroying the large-signal linearity,
which makes it an expensive blunder. Don’t do it.
3.5.3 Speed
For the highest speed applications, such as 40 Gb/s optical Ethernet, the speed of photodiodes becomes a serious issue. There are two effects that limit speed: transit time and
RC delays. The transit time is how long it takes a carrier to reach the terminals. It can
be reduced by making the diode thin, and maximizing the field in the junction by using
as high a bias as possible and making the diffusion-dominated p and n layers extremely
thin. This of course tends to increase the capacitance, so fast photodiodes tend to be very
small. Another problem is that the QE suffers, because at the 850 nm wavelength used in
fiber LANs, the absorption depth in the silicon is several microns; thus narrow-gap III–V
materials such as InP are commonly used in fast NIR applications, which costs extra.
One approach to fixing this is to use transverse devices, where a very small detector is
coupled to a guided wave—the light travels perpendicular to the current flow, so the
light path can be long and the current path short.
Aside: Plastic Packages. It appears that a few types of plastic-packaged photodiodes
are susceptible to QE changes due to triboelectric charging of their plastic packages. The
effect usually goes away with a bit of surface leakage (e.g., by breathing on the package)
or slight heating. If your accuracy requirements are high, you may want to test for this,
or stick with metal-can devices.
3.5.4 Photodiodes and Pulses
Now that femtosecond lasers are so popular, it’s worth giving a bit of thought to the
plight of the poor photodiode used to detect the pulses. Of course, the photodiode doesn’t
have the glamour job—or the easy one either. Consider a 100 fs laser with a 100 kHz
pulse repetition rate, and an average detected power of 2.5 mW. A 100 fs pulse of 25
nJ (assuming it's a Ti:sapphire around 800 nm) produces a peak photocurrent of 160,000
A, and even though the diode can’t possibly respond that rapidly, one wouldn’t expect
it to be terribly linear under such treatment, as indeed it isn’t. One very good solution
is to use a small (25 mm) integrating sphere on the diode (see Section 5.7.7). If it has
a reflectivity of 97%, then incoming photons will typically bounce around over about a
meter’s distance before being absorbed, which will broaden the pulse to 2.5 nanoseconds
or so. This is a better match to the capabilities of photodiodes. If your detector’s diameter
is larger than the input aperture’s, and larger than about 1/8 of the sphere’s diameter,
you’ll collect most of the photons. Make sure the first bounce of the beam in the sphere
isn’t in the field of view of the detector, or you may still have nonlinearity problems.
(Don’t ablate the paint.)
3.5.5 Phototransistors
A phototransistor looks like a silicon bipolar transistor with a photodiode connected
between base and collector. They have gain and are widely available at low cost, which
about exhausts their virtues. Although their gain makes smallish photocurrents conveniently large, they are slow, leaky, nonlinear, and very noisy. They come only in very
small sizes. Photodarlingtons have an additional gain stage and are even slower. These
devices are classical examples of why amplification is not always useful; except for the
very lowest performance applications, avoid them at all costs.
3.5.6 Prepackaged Combinations of Photodiodes
with Amplifiers and Digitizers
Several detector and op amp manufacturers build packaged combinations of photodiodes
with transimpedance amplifiers, current to frequency converters, “current amplifiers”
(Hamamatsu), or even Σ–Δ type A/D converters. These devices are intended for people
who are not comfortable designing detector circuits, and you pay in cost and performance for the convenience of using them despite frequent claims to the contrary by their
manufacturers. (Their data sheets very seldom mention noise performance, which is a
very bad sign.) The performance problems mostly arise from their use of fixed value
internal feedback resistors (often 1 MΩ), and failure to put in a cascode transistor to
reduce junction capacitance effects or a tee network to decrease the second-stage noise
contribution (see Section 18.4.12 for more details). They can be useful where there is
lots of light available and speed is not a limitation, or where skilled engineering time
is very limited; people commit much worse blunders than these all the time, and using
these devices at least limits the damage. These devices may improve in quality and
cost effectiveness in the future; at present, however, designers who are willing to do
a bit more work can usually achieve much better performance at a considerably lower
cost. The exception to this rule is some packaged APD/GaAs FET amplifier combinations, which provide a level of speed and noise performance that is not trivial to duplicate.
3.5.7 Split Detectors
It is often handy to be able to measure the position of a beam on a detector. If image
sensing is not absolutely required, the best way to do this is by using split detectors (bi-cells and quadrant cells) or lateral effect devices. A split detector is just that:
two or more detectors spaced very closely, often arranged as sectors of a circular
disc. Each detector has at least one lead of its own (often all the anodes or all the
cathodes are common, to save leads and ease fabrication). By combining the separate
photocurrents in various ways, such as subtracting them or computing their ratio, small
changes in the beam position can be measured very accurately (often to 0.1 nm or better
near null, at least at AC). Because their geometry is defined lithographically, they are
Figure 3.2. Split photodiode, showing the dependence of output signal on beam diameter and position.
very stable, and because each segment is operated as an independent photodiode, split
detectors have the same virtues of linearity, low noise, and high efficiency we expect
from single photodiodes. Split detectors are currently available in Si, Ge, InSb, and InAs.
The main problems with split detectors are that some of the light is lost by falling
into the kerf between the cells, and that the position sensitivity depends reciprocally on
the beam diameter (See Figure 3.2). This is physically obvious, since a full-scale change
results as the beam moves from being entirely on one side to entirely on the other. If the
position information is obtained by subtraction, the sensitivity will depend on the optical
power as well. Using analog division instead of subtraction helps.
Another approach is to use the cells in the open-circuit photovoltaic mode, where
the photocurrent is allowed to forward bias the cell (as in a solar cell), and subtract
the open-circuit voltages. These voltages depend logarithmically on the photocurrent, so
when they are subtracted, a ratiometric measurement results. The circuit responds slowly,
but this trick is good for alignment sensors (see Section 12.9.11).
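Since the ideal-diode open-circuit voltage is Voc ≈ (kT/e) ln(iph/is), subtracting two open-circuit voltages cancels the saturation current and leaves only the ratio of the photocurrents. A sketch of the idea, assuming ideal diode behavior and matched cells (the saturation current value is hypothetical and drops out anyway):

```python
from math import log

KT_OVER_E = 0.0257   # thermal voltage at room temperature, V

def v_open_circuit(i_photo, i_sat):
    """Ideal-diode open-circuit voltage for photocurrent i_photo."""
    return KT_OVER_E * log(i_photo / i_sat)

i_sat = 1e-11   # saturation current, A (hypothetical; cancels on subtraction)
v_diff = v_open_circuit(2e-6, i_sat) - v_open_circuit(1e-6, i_sat)
# The difference depends only on the ratio of the two photocurrents:
print(v_diff)                      # ~17.8 mV
print(KT_OVER_E * log(2.0))        # same number
```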
3.5.8 Lateral Effect Cells
Another position sensing device, which avoids the problems of split detectors, is the
lateral effect cell. Both 1D and 2D devices are available, in Si, Ge, InSb, and InAs.
They are single large photodiodes that use a thin, highly resistive layer for the top
electrode of the cell, each end of which has its own lead. The output leads are connected
to low impedance points (such as op amp summing junctions). The light beam appears
as a current source located somewhere on the surface, so that the photocurrent divides
itself between the output pins in proportion to the conductance of each path. Because the
conductance depends on the distance from the light beam to the output pin, the ratio of
the currents in each pin gives the location of the light source. Because the cell surface
is uniform, and the current division linear (i.e., two or more light sources shining on the
same cell produce the sum of the outputs each would produce by itself), the position
signal is much less sensitive to beam diameter, until the edges of the beam approach
the edges of the cell or the current density or lateral voltage drops get so high that
response nonlinearity sets in. Two-dimensional lateral effect cells come in two varieties:
pincushion, where the readouts are in the corners, and linear, where the anode is split
in one axis and the cathode in the other, so the anode sheet gives x and the cathode y.
The linear ones give position information that is accurate to 1% or so, with the best ones
achieving 0.1% over 80% of their aperture. Lateral effect cells tend to be much slower
than split detectors, since they have a fairly large series resistance (1–200 kΩ), and also
more difficult to use. The difficulty is that, because of the series resistance, even moderate
photocurrents can cause lateral voltage drops large enough to locally forward bias the
junction, leading to serious nonlinearity. The simplest way to avoid this problem is to
reverse bias the junction. There are also some inferior devices where the top electrode
has four long contacts, along the sides of a square, and the bottom has only one. These
superficially resemble linear lateral effect devices but differ in one crucial respect: the two
axes compete for current. If the illuminated spot is near the −x electrode, that electrode
will suck up almost all the photocurrent, leaving little to drive the y axis, which therefore
is rendered nearly indeterminate by noise and drifts.
The low-light performance of lateral effect cells is relatively poor, because the resistance of the silicon layer appears in parallel with the outputs. It thus contributes a large
amount of Johnson noise current, with a noise source connected between the two outputs of each axis. This is an obnoxious problem, because although these noise currents of
course sum to zero, they must be subtracted or divided instead in order to make a position
measurement; this causes them to augment instead of canceling. Unless the photocurrent
is sufficient to drop 2kT /e (50 mV at room temperature) across the cell resistance, the
measurement will be Johnson noise limited. It does not help much to increase the load
resistance, since the differential signal will always be loaded by the cell resistance, so
that the effective load for differential measurements will be the parallel combination
of the cell resistance and the external load. In general the lateral effect cell is a fairly
noisy, kilohertz-range device for sensing beam positions, but it is relatively immune to
beam-size effects that can plague split-cell measurements.
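The 2kT/e criterion translates directly into a minimum photocurrent for a given cell resistance; a sketch using the resistance range quoted above:

```python
KB = 1.381e-23   # Boltzmann constant, J/K
E = 1.602e-19    # electron charge, C
T = 300.0        # room temperature, K

def min_photocurrent(r_cell):
    """Photocurrent needed to drop 2kT/e across the cell resistance."""
    v_required = 2 * KB * T / E        # ~50 mV at room temperature
    return v_required / r_cell

for r in (1e3, 2e5):                   # 1 kohm to 200 kohm, per the text
    print(r, min_photocurrent(r))
# ~52 uA for a 1 kohm cell, ~260 nA for 200 kohm: below these levels
# the position measurement is Johnson noise limited.
```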
3.5.9 Position Sensing Detector Pathologies
Split detectors are more sensitive to etalon fringes than single element ones. A smooth
beam profile on a single element detector does a much better job of preserving the
orthogonality of misaligned beams, whose fringe patterns average to zero over the surface.
In a split cell, there’s a great big cliff in the middle of the fringe pattern, so a small phase
shift with temperature can cause an overwhelming amount of drift—all the more so since
the desired signal is a small difference between two large currents. Position-sensitive
detectors are also vulnerable to nonorthogonalities of the measurement directions caused
by sensitivity variations across the detector.
3.5.10 Other Position Sensing Detectors
For position sensing applications, lower cost solutions such as a few discrete diodes plus
a shadow mask should not be overlooked (Figure 3.3). For many purposes, they can be
very useful and are dramatically cheaper than the combination of a lens plus a position
sensing diode. When using mid- and far-infrared detectors, where split detectors and
lateral effect cells are harder to get, this may be the only feasible choice.

Figure 3.3. Three-dimensional shadow mask detector: (a) on-axis and oblique schematic views, showing the illuminated widths W/2 + d tan φ and W/2 − d tan φ, and (b) drawing of an actual device.
Example 3.3: Two Solar Cells and a Mask Versus a Bi-cell. As an example of a cheap
position sensing detector, consider two rectangular solar cells of width W and height h,
plus a thin mask at a height d, covering the right half of the left cell and the left half of
the right cell, as shown in Figure 3.3. Uniform illumination of intensity I comes in at
an incidence angle φ, illuminating W/2 + d tan φ of cell 1, and W/2 − d tan φ of cell
2. Obviously, if the angle is too large, one diode will be completely covered, and the
other one completely illuminated. For angles smaller than this, if the two photocurrents
are subtracted, the result is
i− = 2I R dh sin φ,
where I is the power per unit area, measured in a plane normal to the incoming beam.
If the difference is normalized by dividing by the sum of the two currents, the result is
inorm = 2d tan φ / W,
which is plotted in Figure 3.4, together with the angular uncertainty due to shot noise
alone. You can also divide instead of subtracting, which gives
i1/i2 = (W + 2d tan φ)/(W − 2d tan φ).
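These expressions can be checked numerically. A minimal sketch (the responsivity R = 0.3 A/W and the beam intensity are illustrative assumptions, with the 5 mm/3 mm geometry of Figure 3.4):

```python
import math

def shadow_mask_signals(phi, W, d, h, I, R):
    """Photocurrents of the two masked cells at incidence angle phi.

    I is power per unit area in a plane normal to the beam; R is the
    responsivity (A/W). Illuminated widths are W/2 +/- d*tan(phi).
    """
    w1 = W / 2 + d * math.tan(phi)   # illuminated width of cell 1
    w2 = W / 2 - d * math.tan(phi)   # illuminated width of cell 2
    # Intensity projected onto the detector plane is I*cos(phi)
    i1 = I * math.cos(phi) * R * w1 * h
    i2 = I * math.cos(phi) * R * w2 * h
    return i1, i2

# 5 mm cells, 3 mm mask spacing; 10 mW/cm^2 = 100 W/m^2 (assumed values)
W, d, h = 5e-3, 3e-3, 5e-3
I, R = 100.0, 0.3
phi = 0.2
i1, i2 = shadow_mask_signals(phi, W, d, h, I, R)
# The difference reduces to 2*I*R*d*h*sin(phi), as in the text
assert abs((i1 - i2) - 2 * I * R * d * h * math.sin(phi)) < 1e-15
# The normalized output is 2*d*tan(phi)/W, independent of I, R, and h
i_norm = (i1 - i2) / (i1 + i2)
assert abs(i_norm - 2 * d * math.tan(phi) / W) < 1e-12
```

The normalization step is what makes the sensor useful: intensity fluctuations cancel, leaving only the geometric factor 2d tan φ/W.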
It is apparent that the sensitivity of the shadow mask/multiple element detector can
be adjusted over a wide range by changing d, a desirable property. If the light comes
Figure 3.4. Output and RMS angular error of a shadow mask sensor, with 5 mm square detectors
and a 3 mm mask spacing. Popt = 10 mW/cm². (Axes: normalized output and RMS angular
uncertainty in nanoradians, versus angle of incidence in rad.)
from a wide spectrum of angles, it is sensible to integrate Eq. (3.11) over angle. If the
differential characteristic is not required, a single photodiode with a shadow mask can
be used instead, at an even greater cost saving.
If two axes are required, two such single-ended detectors can be used, with the edges
of their shadow masks mutually perpendicular. If the differential character is important,
then three or four detectors can be combined. If four are used, the 1D expressions are
approximately correct, except for the additional effect of obliquity in the other axis. For
the three-detector case, as shown in Figure 3.4, the X and Y directional signals are
given by
X = (i3 − i1)/(i1 + i2 + i3),
Y = (i3 + i1 − 2i2)/(i1 + i2 + i3),
respectively. This saves one photodiode, while preserving the differential property: light
coming in exactly on axis gives X = Y = 0; the third dimension comes from intensity or
(for a nearby point source) parallax, in which case the simple expressions given have to
be modified. The noise performance of shadow mask detectors, like that of quadrant cells,
is excellent. In a 1 Hz bandwidth, the shot noise limited RMS angular uncertainty is
where i0 is the response of one unobscured detector to the incident optical intensity. A
HeNe laser beam of 1 mW per detector will give i0 ≈ 300 μA. Assuming a mask spacing
of half the detector width, which results in a full scale range of ±π/4 radians, the 1 Hz
RMS angular uncertainty is around 50 nanoradians, or 0.01 arc second.
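The order of magnitude of that uncertainty can be sanity-checked with a toy shot-noise model (a sketch assuming only shot noise on the summed photocurrent and the small-angle slope 2d/W; the exact prefactor depends on terms omitted here):

```python
import math

e = 1.602e-19        # electron charge (C)
i0 = 300e-6          # photocurrent of one unobscured detector (A)
B = 1.0              # measurement bandwidth (Hz)
d_over_W = 0.5       # mask spacing of half the detector width

i_sum = 2 * i0                          # summed photocurrent near normal
i_shot = math.sqrt(2 * e * i_sum * B)   # RMS shot noise on the sum (A)
# Near phi = 0, the normalized output has slope d(inorm)/d(phi) = 2*(d/W),
# so the angular noise is the fractional current noise divided by the slope:
sigma_phi = (i_shot / i_sum) / (2 * d_over_W)
print(f"sigma_phi ~ {sigma_phi * 1e9:.0f} nrad")
```

This toy model gives a couple of tens of nanoradians, the same order as the ~50 nrad quoted above.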
The accuracy attainable by this technique is limited mainly by nonideal beam characteristics, such as amplitude nonuniformity, multiple scattering between mask and detectors, and diffraction of the beam at the edge of the mask.†
3.5.11 Infrared Photodiodes
Photoelectric detectors for the mid- and far-infrared region also exist, based on compound
semiconductors such as indium antimonide (InSb), indium arsenide (InAs), platinum
silicide (PtSi), and mercury cadmium telluride (HgCdTe, sometimes written MCT, and
pronounced “mercadtell”). Their characteristics are not as good as those of silicon and
InGaAs photodiodes, and they are much more expensive. Compound semiconductor
materials are more difficult to process, because small errors of stoichiometry appear
as huge dopant densities, and because their small markets dictate that their processing is
much less developed than that for silicon (GaAs and InGaAs are the exceptions). In the
mid-IR, the standard detector at present is HgCdTe.
Of the near-IR devices, Ge was king for a long time. It has the advantage of being an
element, so the stoichiometry is not a problem, but its poor leakage performance means
that it requires cooling in situations where compound semiconductor devices such as
InGaAs do not. In the past, InGaAs detectors were too expensive for widespread use,
but now that demand for detectors in fiber optics applications has funded their further
development, costs have fallen to a few dollars for a usable InGaAs detector at 1.5 μm.
Mid- and long-wavelength detectors, such as HgCdTe, PbS, PbSe, PtSi, and InSb,
frequently require cryogenic cooling, which is inconvenient and expensive ($2500 for a
single cooled detector element). InAs, with response out to 3.5 μm, is often useful with
only thermoelectric (−20 to −60 °C) cooling.
Unlike silicon detectors, the shunt resistance of an infrared diode can be very low,
as in Example 3.2; the Johnson noise of this low resistance is the dominant additive
noise source in IR devices that are not cryogenically cooled. Because of the low shunt
resistance, carriers generated far from the electrodes are sometimes lost, leading to very
large spatial nonuniformities of response (Figure 3.5 shows a 4× change in η with position).
3.5.12 Quantum Well Infrared Photodiodes
A promising new mid- and far-IR detector technology is based on quantum wells. These
are basically the famous square box potential well, whose energy levels can be tailored by
adjusting the depth and width of the well. Accordingly, the bandgap of the detector can
be as narrow as desired, without having to deal with the poor properties of narrow-gap
semiconductors. Thermionic emission is still a problem with these devices, so they must
be cooled, but their D∗ can be as high as 10¹² at 9 μm, a remarkable performance.
They can also be made in arrays. We’ll see more of these in the future; the technology
is driven by space-borne and military sensors.
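A D∗ figure converts directly to a noise-equivalent power via the standard definition NEP = √(A·Δf)/D∗, with A in cm² and D∗ in Jones (cm·√Hz/W). The 100 μm pixel below is an assumed illustration, not a figure from the text:

```python
import math

def nep_from_dstar(dstar_jones, area_cm2, bandwidth_hz=1.0):
    """Noise-equivalent power (W) from D* (cm*sqrt(Hz)/W), area, bandwidth."""
    return math.sqrt(area_cm2 * bandwidth_hz) / dstar_jones

# 100 um square pixel at D* = 1e12 Jones (the figure quoted at 9 um)
nep = nep_from_dstar(1e12, (0.01) ** 2)
print(f"NEP = {nep:.1e} W in 1 Hz")   # ~1e-14 W: about 5e5 photons/s at 9 um
```

That is, a single such pixel can sense a change of order 10⁻¹⁴ W in a 1 s measurement, which is why these numbers attract the sensor community.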
† Do the diffraction ripples at the leading and trailing edges of the shadow cause ripples in the XYZ outputs versus angular position? Why or why not?
Figure 3.5. Response nonuniformity and shunt resistance of a commercial InAs photodiode
(EG&G Judson J12-5AP-R02M) versus temperature. High lateral resistance and low shunt resistance lead
to poor uniformity at 300 K. (Courtesy of EG&G Judson Inc.)
3.6.1 Photomultipliers
At wavelengths shorter than 2 μm, thermally generated photons are very rare, so in principle a detector should be limited only by signal photon statistics (shot noise). However,
Johnson noise in load resistors, amplifier input noise, and other circuit noise typically
dominates shot noise at low photocurrents. That’s where photomultiplier tubes (PMTs)
come in. PMTs use electron multiplication, which we will see more of later, to amplify
the generated photoelectrons before they become mixed with the noise currents of the
electronics. Electron multiplication occurs when an electron hitting a surface causes it
to emit more than one secondary electron. The way this works in a PMT is as follows:
a photocathode is exposed to incoming light. By the photoelectric effect, some fraction
of the incoming photons cause the photocathode to emit photoelectrons from its surface.
These photoelectrons are electrostatically accelerated and focused onto another electrode,
the first dynode, which is coated with a material selected for high secondary electron
yield. The secondaries from the first dynode are accelerated and focused onto the second
dynode, and so on for 5 to 14 stages, before finally being collected by the anode. By
the end, each photoelectron has become a pulse 300 ps to several nanoseconds long,
containing perhaps 10⁵ to 10⁷ electrons, which is easily detectable above the Johnson
noise of the electronics.
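The multiplication compounds per stage, so the overall gain is roughly δ^N for N dynodes of secondary yield δ. A rough sketch (the per-dynode yield of 5 is the typical figure quoted later in this section; real yields vary with interdynode voltage, so this is order-of-magnitude only):

```python
# Overall PMT gain ~ delta**N for N dynodes of secondary-electron yield delta
delta = 5
for n_dynodes in (7, 10):
    print(f"{n_dynodes} dynodes: gain ~ {delta ** n_dynodes:.1e}")
# 7 stages give ~7.8e4 electrons per photoelectron and 10 stages ~9.8e6,
# spanning the 1e5-1e7 range quoted above.
```

The exponential dependence on δ is also why first-dynode yield variations dominate the multiplicative noise, as discussed below.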
Aside: Electron Affinity. When a photoelectron is generated inside a photocathode or
dynode, it must escape into the vacuum before it’s any use to us, and to do so it has
to overcome a potential barrier. The height of this barrier is a material property called
the work function W . The photoelectron is continually losing energy through collisions,
and the requirement that it arrive at the surface with at least W limits the photoelectron
yield. The work function is made up of two parts, the classical image potential, which
is the work required to separate the electron from its image charge in the metal surface,
and the electron affinity.† The image potential is always positive, but negative electron
affinity (NEA) materials exist, and their lower work function leads to improved electron
yield, at the expense of slower response.
3.6.2 PMT Circuit Considerations
This elegant scheme requires a certain amount of circuit support: a power supply of from
−500 to −2000 volts, with taps for the photocathode and all the dynodes. This is usually
provided by a powerful (1–2 W) high voltage supply and a multitap voltage divider
string made of high value (100–300 kΩ) resistors. The high supply power is almost all
dissipated in the resistors, which sets a practical lower limit on their values. The exact
bias voltages are not terribly critical (unlike APDs, for example).
Because of the high electron gain, the last few dynodes need a fair amount of bias
current, which is supplied only poorly by the resistor string. Bypass capacitors or Zener
diodes on the last couple of stages are a help, but nevertheless the nonlinearity of most
photomultiplier systems at high light intensity is dominated by voltage drops in the
dynode bias string. It is often convenient to change the supply voltage to vary the gain,
and having zeners on in the string makes this hard, because the voltage distribution on
the dynodes will change as the supply voltage is altered. Since the linearity depends
significantly on this distribution, zeners reduce the flexibility of the system.
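The divider-string numbers can be sketched as follows (the supply voltage, stage count, and the 1% anode-current rule of thumb are assumptions for illustration, not from the text):

```python
# Divider-string sketch: total resistance sets bleeder current and dissipation.
v_supply = 1500.0                  # overall PMT bias (V), assumed
n_stages = 10                      # cathode-to-anode stages, assumed
r_per_stage = 200e3                # within the 100-300 kOhm range above
r_total = n_stages * r_per_stage   # 2 MOhm
i_bleed = v_supply / r_total       # bleeder current through the string
p_diss = v_supply * i_bleed        # nearly all the supply power
print(f"bleeder {i_bleed * 1e3:.2f} mA, dissipation {p_diss:.2f} W")
# A common rule of thumb (an assumption here): keep the peak anode current
# below ~1% of i_bleed, or the last dynode voltages sag and the tube goes
# nonlinear -- exactly the problem the bypass capacitors and C-W biasing
# discussed here are meant to fix.
```

With these values the string draws 0.75 mA and dissipates about 1.1 W, consistent with the 1–2 W supplies mentioned above.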
The Cockcroft–Walton (C-W) generator is a many-section voltage multiplier based
on a diode–capacitor ladder structure. An N-stage C-W generator produces N nearly
equally spaced voltage taps, making it a natural fit for biasing PMT dynodes. Some
photomultiplier modules now include C-W multipliers rather than voltage dividers.‡ C-Ws
have the useful property that the lower taps have much lower impedance than the high
voltage end, a good match for the needs of PMTs. Besides compactness and low power,
this leads to enormously improved linearity in Cockcroft–Walton devices, so that the
resistor scheme is obsolescent for most purposes. Watch out for the modules with preamps
built in—they all cut off at 20 kHz to avoid the power supply ripple.
The most appropriate uses for resistor biasing nowadays are in applications where
linearity is not vital but power supply ripple is extremely objectionable, or in devices
where the photocathode cannot be run at a large negative bias, and DC coupling is not
needed. Running the photocathode near ground means that the anode and last dynodes
must be run at a high positive voltage, which significantly reduces the advantages of the
Cockcroft–Walton technique, since the high current electrodes are at the high impedance
end of the supply.
Applications requiring a grounded photocathode include scintillation detectors, where
the scintillator is an ionic crystal such as NaI, which must be operated at ground for
safety reasons. If the photocathode is not grounded, the large potential gradient across
the glass envelope of the PMT in an end-on tube can lead to electrophoretic motion
of ions toward the photocathode, which destroys the tube by photocathode corrosion.
Side-looking PMTs with opaque photocathodes (electrons are emitted from the same
side the light hits) are immune to this since the photocathode doesn’t touch the envelope.
† For insulators the electron affinity may be less than this, because an added electron goes into the conduction band, whereas a photoelectron comes from the valence band.
‡ This idea took a while to catch on—it was first published in 1960 (R. P. Rufer, Battery powered converter
runs multiplier phototube. Electronics 33(28), 51 (1960)).
The noise of PMTs is dominated by thermionic emission from the photocathode,
leading to dark current spikes, and by variations in the dynode gain (especially at the first
dynode), which leads to multiplicative noise. The average number of secondary electrons
is only 5 or so, although the NEA material GaP(Cs) can achieve 20–50. As we’d expect
from a low yield random process, the size of the pulse from a single photon event
varies within a 1σ range of ±√5/5 ≈ ±45%, although with larger photocurrents these
error bounds are greatly reduced through averaging. PMTs require very high insulation
resistances between their internal elements, but use volatile metals such as cesium and
antimony, which are prone to migrate at high temperatures; thus PMTs cannot be baked
out as thoroughly as most vacuum systems. There is always some residual gas (perhaps
10⁻⁶ torr, mainly water), which leads to artifacts known as ion events. An ion event
takes place when a positive ion is generated in the residual gas inside the PMT near
the photocathode. Because of its positive charge, it accelerates and hits the photocathode
hard enough to knock loose many electrons. This large pulse is amplified through the
dynode chain, producing a very large current pulse at the anode. Ion events are even
more common in old or poor quality tubes, or those operated near radioactive sources.
Related to ion events are afterpulses, which are secondary pulses that sometimes occur,
usually 20–100 ns after a photon is detected. These arise from photoemission inside the
PMT due to electron impact on a surface, or a generated ion that hits a dynode instead
of the photocathode. Afterpulses are a problem with fast photon counting setups; putting
in 100 ns of dead time after each photocount will more or less cure the problem. This
dead time reduces the integration period at higher light levels, so it has to be corrected
for (see Section 7.3.1).
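The correction mentioned is, in the standard nonparalyzable-counter model (an assumption here; Section 7.3.1 gives the author's treatment):

```python
def dead_time_correct(measured_rate, dead_time):
    """Nonparalyzable dead-time correction: true = measured/(1 - measured*tau).

    Standard counting-statistics result; measured_rate in counts/s,
    dead_time in seconds.
    """
    loss = measured_rate * dead_time
    if loss >= 1.0:
        raise ValueError("counter saturated")
    return measured_rate / (1.0 - loss)

# 5 Mcounts/s measured with 100 ns dead time: the counter is blind half
# the time, so the true rate is twice the measured one
print(dead_time_correct(5e6, 100e-9))
```

At low rates the correction is tiny (0.1% at 10 kcounts/s with 100 ns dead time), but it grows quickly as the count rate approaches 1/τ.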
PMTs get old and wear out, at a rate largely controlled by the anode current. They
exhibit strong (several percent) effects due to warmup, intensity and voltage hysteresis,
and other historical causes. Under good conditions, over its life a PMT can produce an
integrated anode charge of a few hundred coulombs per square centimeter of photocathode. In long-term dark storage, or in low current applications, the lifetime approaches a
constant value of a few years, limited by helium diffusion through the tube envelope and
by surface changes inside. Tubes with soda lime glass envelopes are the most vulnerable;
high partial pressures of helium can kill one of those in an hour.
PMTs can be quite fast; ordinary ones have rise and fall times of a few tens of
nanoseconds and the fastest are around 250 ps, with timing uncertainties of 40 ps or
thereabouts. Fall times are rather slower than rise times. In a high gain PMT, a single
photon can produce 10⁷ output electrons, which in an 8 ns wide pulse amounts to 200
μA. Such a large current can produce 10 mV signals even across a 50 Ω load, making
it feasible to count individual photons at high speed. The pulses are repeatable enough
in height that simple thresholding can easily distinguish a pulse corresponding to a
single detected photon from noise, and from multiphoton or ion events. Such photon
counting is an important application of PMTs, and photon rates of up to 30 MHz can
be accommodated with good accuracy with commercial gear (200 MHz with special
hardware). The combination of high gain and low dark count rate makes PMTs uniquely
suited to photon counting.
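The pulse arithmetic above checks out directly:

```python
e = 1.602e-19           # electron charge (C)
gain = 1e7              # electrons per photoelectron, high gain tube
t_pulse = 8e-9          # pulse width (s)
r_load = 50.0           # load resistance (ohms)

i_peak = gain * e / t_pulse    # charge/time: ~200 uA, as stated above
v_peak = i_peak * r_load       # ~10 mV across 50 ohms
print(f"{i_peak * 1e6:.0f} uA, {v_peak * 1e3:.0f} mV")
```

A 10 mV pulse is far above the Johnson noise of a 50 Ω system in a nanosecond-scale bandwidth, which is what makes simple threshold discrimination work.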
In photon counting, the photoelectrons arrive more or less one at a time. PMTs can
also be used in analog mode, where the average anode current is measured in much the
same way as with a photodiode. Of the two, photon counting is better behaved. This is
mainly because the gain of a PMT depends on everything you can think of. In the analog
mode, this directly affects the signal level, whereas in photon counting it merely changes
the pulse height, without making a pulse significantly more or less likely to be detected.
A photon counting PMT has a sensitivity similar to that of a cooled CCD. It has
no spatial resolution, but on the other hand you don’t have to wait for the end of the
integration time to see the data.
The amplification mechanism of PMTs is very clever and effective, and their optical
performance has recently improved by a factor of nearly 2. Conventional bialkali PMTs
tend to be narrowband, peaking strongly in the violet. PMTs are available in large sizes, up
to about 300 mm in stock devices, and up to 600 mm in custom devices. The combination
of huge area and low dark count rates is unique to PMTs.
Choosing a Photocathode Material. The choice of photocathode material depends
on the application; they are typically made of a mixture of antimony with alkali metals
or of a direct bandgap compound semiconductor with traces of cesium. Infrared units are
available too. The classic Ag-O-Cu S-1 material reaches 1.1 μm, but has extremely low
quantum efficiency (0.01–1%) and has a lot of dark current, often requiring cryogenic cooling.
NEA photocathodes work further into the IR than alkali metal ones, but have a much
slower response (1 ns rather than < 50 ps), making them less suitable for exotic applications such as streak cameras. The best InGaAs NEA photocathodes reach 1.7 μm, with
quantum efficiencies of around 20%, and the best enhanced bialkali and GaAsP ones
achieve about 45–50% peak quantum efficiency in the blue and near-UV.
How to Kill a PMT. Photomultipliers can be destroyed by exposure to daylight when
powered, and their dark current can be dramatically increased by such exposure even
when unpowered; several days of powered operation may be required to bring it back to normal.
Making Accurate Measurements. The photon counting mode is pretty trouble-free,
but it is not trivial to make accurate analog measurements of absolute light intensity.
For instance, the total gain typically varies ±10% with position on the photocathode.
Accuracies of around 1% result from ordinary care, but by taking sufficient pains to
control or calibrate small effects, accuracies and repeatabilities of 0.1% can be achieved.
The gain of a PMT is affected by many internal and external effects, including shock,
age, dynode fatigue due to high current operation, stray fields, and even changes in
geometry caused by gravity or acceleration. The efficiency and speed with which the first
dynode collects the primary photoelectrons depends on position, leading to variations
of ∼10–20% in sensitivity and a few percent in delay. Due to Fresnel reflection at
the photocathode surface, the quantum efficiency also depends on incidence angle and
(off-normal incidence) on polarization. Besides these parabolic-looking variations, the
sensitivity of some PC types shows ripples with position, which get much worse at
longer wavelengths, so it’s often helpful to put diffusers in front of PMTs.
The gain of some tubes can be reduced 40% by a 1 gauss magnetic field (the Earth’s
field is about 0.5 gauss), although others can work in 1 kilogauss. Mu-metal shields
improve this greatly, but mu-metal’s shielding properties are easily destroyed by shock
or bending. The lower energy electrons between the photocathode and first dynode are
most susceptible to magnetic steering, so mount the tube so the shield sticks out about
one diameter in front of the photocathode, to allow the fringing fields space to die off.
Static charges on the envelope or nearby grounded objects can cause electrons to
strike the tube envelope, generating spurious light. Photoemission from the gas and from
dynode surfaces, as well as Čerenkov light from cosmic rays and radioactive decay,
cause spurious counts. Light from these sources can be guided by the glass tube envelope
directly to the photocathode, where it will cause spurious counts. Graphite paint (DAG)
applied to the envelope and kept at cathode potential (via a 10 MΩ high voltage resistor
for safety) eliminates the light guiding and provides an electrostatic shield. High electric
fields inside the envelope can cause scintillation as well, so use DAG and really good
insulation (e.g., 4 mm of high quality silicone rubber, not 10 layers of PVC tape). Taking
care here can reduce the dark counts by a factor of 10. High humidity is a disaster for
PMT performance due to external leakage currents.
Being vacuum tubes, PMTs are easily destroyed by shock and may be microphonic
in high vibration environments; also, of course, they are expensive. PMT manufacturers
provide 200-odd page manuals on how to apply PMTs in measurements, and these are
full of lore. If you need PMTs, get a few of these application manuals.
3.6.3 Avalanche Photodiodes (APDs)
When a high electric field is applied to a semiconductor, free carriers can acquire enough
energy to excite other carriers through impact ionization. These newly generated carriers
can themselves create others, so that a chain reaction or avalanche results. At higher
fields still, carriers can be generated spontaneously by the field, and breakdown occurs.
When this mechanism is used to multiply the photocurrent in a photodiode, the result is
an APD. Reasonably stable multiplication gains (M) of up to 100 or so are possible by
controlling the bias voltage carefully at a value of about 90% of the breakdown voltage,
and compensating its strong temperature dependence.
Holes and electrons have different values of the ionization coefficient α (a normalized
cross section). One might think that the performance would be best when both contribute
equally, but in fact that’s the worst case. All the carriers have the same speed, so in
a pure electron avalanche, all the secondary electrons arrive at the same time as the
primary photoelectron, and the holes are spread out by the transit time τ . In a bipolar
avalanche, the holes cause ionizations, so that the avalanche spreads out in both directions
and bounces back and forth until it dies away due to statistical fluctuations in the rates.
This makes it become very slow and very noisy as the gain increases. The figure of
merit for this is k = αh /αe , the ratio of the ionization cross sections of the less ionizing
species (holes) to the more ionizing (electrons). The situation to avoid is k ≈ 1. In
silicon, k is very small, because holes cause almost no impact ionization, but in low
bandgap materials like InGaAs, k is around 0.3 at low voltage, rising to nearly 1 at high
voltage. Heterostructure APDs exist, in which the detection and multiplication are done
in different semiconductors; this gives the best of both worlds at the price of complexity
and expense. The noise is always worse at high gain; the noise power tends to increase as
M^(2+m), where the noise exponent m is 0.3–1.0, so that the SNR goes down as M^−0.3 to
M^−1. The exact exponent is device dependent, and manufacturers’ specifications should
be consulted—if you’re lucky enough to find a data sheet that has that level of detail.
(Optical detector manufacturers should be ashamed of themselves for the uniformly poor
quality of their data sheets.) The excess noise performance of APDs has been improving,
due to innovations such as separating the photodetector region from the multiplication
region, so this may become less of a problem in future.
The other major excess noise contributions come from uncertainties in the position of
the first ionization event of the avalanche. Simple designs where the multiplication occurs
in the absorption region have a 6 dB noise penalty, because some photons are absorbed
deep into the multiplication region, so that the available gain is much less (they also
have horrible variations of gain with wavelength, for the same reason). Newer designs
that separate the absorption and multiplication regions via control of the doping density
are much quieter and flatter with λ.
APDs exhibit a gain–bandwidth trade-off almost like an op amp’s. Due to the finite
value of k, the avalanche takes more time to build up and die away as M increases, so
the bandwidth tends to go as 1/M. Even at low gain, the time it takes carriers to transit
the (thick) multiplication zone limits the ultimate bandwidth. That’s why the quickest
photoreceivers (40 Gb/s or ∼30 GHz) use Schottky photodiodes and accept the reduced sensitivity.
The inherent SNR of the photocurrent generated by an APD is monotonically decreasing with gain; the signal quality gets worse and worse—so why on earth use them? The
great virtue of APDs shows itself in wide bandwidth systems. For speed, these systems
have to use low load impedances, and so are limited by the large additive Johnson noise.
When the additive noise dominates all other noise sources, the (electrical) signal-to-noise
ratio improves by M² until the excess noise from the APD becomes comparable to the
Johnson noise, at which point the SNR peaks and begins to deteriorate. Thus the operating M should be chosen so that the excess noise roughly equals the Johnson noise of the
amplifier (actually a bit higher since the falloff in SNR is very slow in the direction of
increasing M, so the peak is not exactly where the two noises are equal). Alternatively,
compared to a PIN diode, you can reduce the load resistance by a factor M², which can
help the bandwidth a lot. The price you pay is slightly lower SNR due to multiplication
noise, narrower optical bandwidth, extra cost, and uncertainty in the exact operating gain.
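The optimum-gain argument can be sketched with a toy SNR model (all component values below are illustrative assumptions; real excess noise follows the manufacturer's curves, as noted above):

```python
# Toy APD receiver model: signal power grows as M^2, multiplied shot noise
# as M^(2+m), and the Johnson noise of the load is fixed.
kT = 4.14e-21            # J, at 300 K
e = 1.602e-19            # electron charge (C)
B = 100e6                # bandwidth (Hz), assumed
R_L = 50.0               # load resistance (ohms), assumed
i_s = 10e-9              # primary photocurrent (A), assumed
m = 0.5                  # excess-noise exponent, within the 0.3-1.0 range

def snr(M):
    sig = (M * i_s) ** 2              # electrical signal power (arb. R_L)
    johnson = 4 * kT * B / R_L        # mean-square Johnson noise current
    shot = 2 * e * i_s * B * M ** (2 + m)   # multiplied shot noise
    return sig / (johnson + shot)

best = max(range(1, 201), key=snr)
print(f"optimum M ~ {best}, SNR gain {snr(best) / snr(1):.0f}x over M = 1")
```

As the text says, the optimum sits where the excess noise is somewhat above the Johnson noise (the peak is broad on the high-M side), and the SNR improvement over a unity-gain diode can be orders of magnitude when the receiver is Johnson-noise limited.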
The gain of an APD is a very strongly increasing function of bias voltage near breakdown, as shown in Figure 3.6 for a Hamamatsu S5343; a ±20 °C change will make a
nominal gain of 70 vary between 30 and 200. While this can be calibrated, and the bias
voltage or temperature controlled to keep it within closer bounds, we’re never going to
get even 1% accuracy over temperature with wild swings like that. This makes APDs
poorly suited to accurately calibrated jobs in analog mode.
Figure 3.6. Gain of a Hamamatsu S5343 Si avalanche photodiode versus bias voltage, for various
temperatures from −20 °C to 60 °C (typ. λ = 650 nm).
If you have matched APDs, either monolithically or by sorting, it is possible to stabilize the multiplied dark current (and hence the gain) of an illuminated APD by putting a
constant current into the dark diode and applying the resulting voltage to the illuminated
device. (You have to use a lowpass filter and a buffer, plus some thoughtfully chosen
safety limits to avoid blowing up expensive APDs.) This is twice as expensive, somewhat noisier, and vulnerable to gradients, but saves the trouble of running temperature
calibrations, which tend to be slow and hence expensive.
3.6.4 Photon Counting with APDs
APDs operated in the breakdown region (Geiger mode) are often used in medium-performance photon counting applications. The bias must be reduced after each event to
stop the avalanche, typically by putting a huge resistor (≈100 kΩ) between the device
and the bias supply, with the output taken across the diode. The resulting RC time constant makes the recovery slow (1 μs), but this does not affect their rise time, which can
easily be below 1 ns—20 ps has been reported.
Since all light pulses produce an output of the same size, pulse height discrimination
is impossible. Compared with compact photomultipliers, these devices are more rugged,
have higher quantum efficiency, and are available in larger diameters, but because of their
slowness they have no compelling advantage in signal detection performance. Circuit
improvements† can get them down to around 50 ns, which is somewhat better.
APDs emit a small amount of light when they break down, so if you have a multi-APD
setup, you can get optical crosstalk. It’s a good idea to reset all the APDs whenever any
of them fires. You can also get segmented APDs intended for counting bursts of multiple
photons in Geiger mode. In these devices, the active region is pixellated, but all the
segments (as many as 14,000) are wired in parallel, via integrated quench resistors.
These have higher dynamic range (since many segments can break down at once) but
no better timing characteristics. Pulses from different segments are reasonably uniform
in size, so that bursts of a few dozen photons can be counted with good pulse height
discrimination. Besides the ordinary dead time correction (see Section 7.3.1), there is a
small nonlinearity due to segments with multiple events. However, the main drawback
of APDs for photon counting is their very high dark count rate—something like 50
MHz/cm² for an APD (Hamamatsu S10362-33-100C 3 × 3 mm segmented APD) versus
30 Hz/cm² (Hamamatsu H9319-01/11 25 mm PMT module), more than six orders of
magnitude worse. This restricts photon counting APDs to very small areas.
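Scaling those dark-count densities by each device's active area shows why small area matters:

```python
import math

# Dark-count rates per unit area quoted above, scaled to each device's area
apd_rate_per_cm2 = 50e6               # Hz/cm^2, Geiger-mode segmented APD
pmt_rate_per_cm2 = 30.0               # Hz/cm^2, 25 mm PMT module

apd_area = 0.3 * 0.3                  # 3 mm x 3 mm, in cm^2
pmt_area = math.pi * (2.5 / 2) ** 2   # 25 mm diameter, in cm^2

apd_dark = apd_rate_per_cm2 * apd_area    # 4.5e6 Hz from a 3x3 mm chip
pmt_dark = pmt_rate_per_cm2 * pmt_area    # ~150 Hz from a 25 mm tube
print(f"APD: {apd_dark:.1e} Hz, PMT: {pmt_dark:.0f} Hz dark counts")
```

Even over its much larger photocathode, the PMT's total dark rate is more than four orders of magnitude below that of the small APD chip.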
APDs should be avoided if possible, since a PIN diode operating anywhere near the
shot noise limit will always be superior, as well as cheaper and much easier to use.
Unfortunately, we can’t always have all the light we’d like, and when we can’t use
coherent detection, APDs can be a big help in the nasty 10 pA to 1 μA region.
† A. Spinelli, L. M. Davis, and H. Dautet, Actively quenched single photon avalanche diode for high repetition rate time gated single photon counting. Rev. Sci. Instrum. 67, 1 (January 1996); A. Lacaita et al., Performance optimization of active quenching circuits for picosecond timing with single photon avalanche diodes. Rev. Sci. Instrum. 66, 4289–4295 (1995).
Figure 3.7. The vacuum APD (or hybrid PMT), an imaginative cross between APD and PMT;
the high energy of the photoelectron hitting the APD produces a very well-defined pulse.
3.6.5 Vacuum APDs
An imaginative hybrid between the PMT and APD has been developed, the vacuum APD
or hybrid photomultiplier (Figure 3.7). It consists of a photocathode and a silicon APD
sealed into opposite ends of a vacuum tube, with a high potential (5–10 kV) applied
between them. A photoelectron emitted by the photocathode slams into the APD surface
with enough energy to generate as many as 2000 carrier pairs. Combined with avalanche
multiplication, the overall gain is 10⁵–10⁶, easily enough for photon counting. Because
of the huge electron gain due to impact (equivalent to the first dynode gain in a PMT),
the pulse height is much better controlled than in either a PMT or a regular APD, and is
much larger than APD dark pulses, so the two are easily distinguished by thresholding.
You can also get these with regular PIN diodes as the anode, and those have the usual
PIN advantages of stability and low noise, at the expense of gain. Either type gives a tight
enough pulse height distribution that you can accurately compute the number of photons
in a burst, at least up to about 10 photons or so. This is important, because it extends
the usefulness of photon counting measurements to higher flux levels. The disadvantages
of VAPDs include high cost, photocathode inefficiency, and the requirement of two very
high voltage (+2 kV and −10 kV) power supplies. Still, these devices appear to be the
state of the art for nonimaging detectors with gain.
3.6.6 Photoconductors
True photoconductive detectors must be distinguished from photodiodes operated at
reverse bias, in the misnamed “photoconductive mode.” The two bear a superficial
resemblance, in that the same circuit is used for both, but their operating principles and
performance are quite distinct. A true photoconductor is a resistive chip or film made of
a substance whose conductivity changes when it is illuminated, due to the generation of
electron–hole pairs when photons are absorbed. If a bias current is applied, the change
in conductivity gives rise to a voltage change between the terminals.
Except in special circumstances, such as far-IR detection, where no good photodiodes
exist, photoconductors are a poor choice of detector.
An ideal photoconductor is 3 dB noisier than an ideal photodiode; in a photoconductor,
recombination becomes a Poisson process with the same variance as generation, so that
the noise power doubles. More than that, though, photoconductors exhibit a fundamental
trade-off between sensitivity and speed that photodiodes are free of. Once a carrier pair
is generated in a photoconductor, it contributes to the conductivity until it recombines.
Recombination can occur at the surface or in the bulk; surface recombination is usually
faster, so that at short wavelengths, where the carriers are generated near the surface, the
responsivity may drop.
A photoconductor exhibits gain equal to the ratio of the carrier lifetime τ to the transit
time τtr, since it basically gets to reuse the carriers M times, where

M = τ/τtr.

This can be increased by increasing the lifetime or shortening the transit time. This
relation appears very favorable, since it allows the possibility of useful gain with noise
lower than that of PMTs or APDs (since the multiplication contributes no additional
noise except for recombination). Unfortunately, the carrier lifetime limits the speed of
the device, and so it must be made short for practical devices.
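The speed–sensitivity trade-off can be put in numbers with a short sketch (the lifetime and transit time below are illustrative values, not taken from any particular device):

```python
import math

# Photoconductive gain M = tau / tau_tr: each carrier is "reused" M times
# before recombining. Illustrative values only -- real devices span many
# orders of magnitude in both parameters.
tau = 1e-3      # carrier lifetime, s (a slow CdS-like cell)
tau_tr = 1e-6   # transit time between contacts, s

M = tau / tau_tr                   # photoconductive gain
f_3dB = 1 / (2 * math.pi * tau)    # lifetime-limited bandwidth, Hz

print(f"gain M = {M:.0f}, ~3 dB bandwidth = {f_3dB:.0f} Hz")
```

Shortening τ to speed the device up reduces M in direct proportion, which is the trade-off described above.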
The only common photoconductors used in the visible are cadmium sulfide (CdS),
cadmium selenide (CdSe), and their offspring, CdSSe. Besides being very slow, these
devices have a sharply peaked spectral response that depends strongly on processing,
a memory effect that can lead to factor-of-5 changes in cell resistance due to previous
history, and a response time that depends on the illumination level; on the other hand,
they exhibit a truly gigantic photoconductive response, which is helpful in applications
such as night lights, where the amount of circuitry must be minimized. They are not
terribly useful in high performance applications.
Infrared photoconductors are a bit better, but their main attraction is that they exist
in a field with limited alternatives. They include HgCdTe, whose cutoff wavelength can
be tuned from 5 to 22 μm by adjusting the Hg/Cd ratio; InSb, useful to 5.5 μm; and
lead salts PbS and PbSe, which are very slow and should be avoided if possible. Most
of these require cryogenic cooling for good performance, which tends to cost thousands
of dollars per unit. Room temperature photoconductors are available out to 15 μm, but
are so insensitive that they are good only for detecting laser beams.
Far-infrared (>20 μm) photoconductors are also available; these are made from doped
silicon and germanium. Due to dopant solubility limitations, they are nearly transparent to
the radiation they are designed to detect, which seriously limits their quantum efficiencies;
they also require cooling to liquid helium temperatures, and cost $10,000 or more.
Designers who find themselves thinking that there must be an easier way should
consider pyroelectric and thermal detectors. Photoconductor performance varies widely;
consult the manufacturers’ data sheets.
Not all photodetectors are based on quantum effects; the other major class is thermal
devices and includes bolometers, thermocouples and thermopiles, pyroelectric detectors,
and mixed technology devices such as Golay cells. These devices are characterized by
room temperature operation, low sensitivity, broad wavelength response, and low speed
(except for some pyroelectrics).
A bolometer is a device whose resistance changes with temperature, such as a carbon thermistor or platinum film, and which has a very black surface. It is limited by
the Johnson noise of its resistance and (if care is not taken) by the noise of its bias
current and by thermal gradients. Increasing the bias current improves the sensitivity,
but self-heating effects set a limit. Bolometers are used in a bridge configuration with
a second, similar element shielded from incident radiation, to provide compensation for
ambient temperature variations.
Thermocouples and thermopiles were the first IR detectors, dating from the 1830s.
These devices do not require bias currents, as they generate their own output voltage from
the temperature difference between two junctions of dissimilar metals or semiconductors.
A thermopile is a series array of thermocouples, which gives a higher output voltage.
Bolometers and thermocouples are frequently used in spectrometers, where their room
temperature operation and wide wavelength range are important advantages. Superconducting bolometers exploit the very large temperature coefficient of a superconductor
near its transition temperature; they are extremely sensitive but obviously cumbersome.
Infrared bolometer arrays are becoming popular, in spite of poor sensitivity, because
room temperature operation makes them much easier to use than cooled quantum detectors; they are somewhat cheaper as well.
Pyroelectric detectors are basically capacitors made of lithium tantalate (LiTaO₃),
molecular crystals such as triglycine sulfate (TGS) and its derivatives,† or ferroelectric
plastics such as polyvinylidene difluoride (PVDF), which is basically fluorinated Saran
Wrap. By poling the material (causing a net dielectric polarization to be “frozen in”),
the dependence of the polarization on temperature can be converted to a surface voltage
change. High impedance AC amplifiers are used to detect these changes. Pyroelectric
(PE) detectors do not work at low frequency, so their inputs must be
modulated (e.g., with a chopper). Pyroelectrics work at room temperature but are not
as sensitive as cooled semiconductor detectors. Their response increases as their Curie
temperature (where they depole spontaneously) is approached. These devices have a very
wide range of speeds and sensitivities, from submicrosecond devices used with pulsed
lasers to pyroelectric vidicons that can see small (0.1–0.5 °C) variations around room
temperature. Circuit hacks can get very competitive sensitivity (∼0.13 K NET) in a
low resolution image sensor very cheaply (see Section 3.11.16 and Example 17.1).
3.8.1 Image Tubes
There are two kinds of spatially resolved image amplifiers. Image intensifier tubes use a
photocathode and a scintillator in a vacuum tube with an electrostatic lens in between, to
make a single photoelectron emitted from the photocathode produce many photons from
the scintillator (up to 2 × 10³ in one stage). These can be detected directly or can be run
into another image intensifier tube. Stacks of three tubes have been popular, requiring
† Philips used to use deuterated triglycine fluoroberyllate in their PE vidicons. To the author’s knowledge this stuff has the most jaw-cracking name of any material in practical use.
power supplies of 15–50 kV at low current, and yielding photon/photon gains of 5 × 10⁴
or so.
Image tubes are somewhat inconvenient to use, since electrostatic lenses have curved
focal surfaces, so that the tube faces must be convex. Controlling the field curvature
would require negative lenses, but Laplace’s equation forbids them, because the potential
would have to have a local maximum away from the boundary. This is a particularly inconvenient shape to match
to the focal surface of a lens, which also tends to be convex (in the other direction of
course). Even when a fiber optic face plate is used to flatten the focal surface, image
tubes have relatively poor resolution. The image intensifier tube is not used as widely
as it was 20 years ago, because of the advantages of microchannel plate (MCP) image
intensifiers.
3.8.2 Microchannel Plates
A microchannel plate is a slab of glass containing a close-packed hexagonal array of
small holes or channels, 5–15 μm in diameter, running from one face right through
to the other. It is typically made by fusing together a bundle of specially made optical
fibers, whose cores are of an easily etched glass. After fusing and slicing, the cores are
etched away, leaving channels. The channels are lined with a thin layer of an electron
multiplication material like that used in photomultiplier dynodes. A small current flows
along this layer, allowing the layer to replace both the dynodes and the bias string of a
PMT. A few kilovolts’ bias is applied across the thickness of the plate, so that a strong
potential gradient exists along the length of each channel.
An electron hitting the wall of the channel gives rise to a few secondary electrons,
which cascade down the length of the channel, hitting the walls and resulting in a large
multiplication factor from face to face. The length of the channels is typically 50 times
their diameter.
A MCP image intensifier has a photocathode near one face and a scintillator near
the other; due to the electron multiplication, a much higher photon–photon gain (>10⁴)
can be realized in a single stage with MCP, so multiple stages are usually unnecessary.
The spatial resolution of a MCP is much better than that of an image tube, since it is
defined by the geometry of the channels, rather than that of a rather fuzzy electrostatic
lens. Microchannel plates are vulnerable to ion events; a straight MCP channel is an
effective ion accelerator. Modern MCPs have their channels at an angle to the slab faces
or arranged in a chevron or curved (“J”) arrangement. This increases the gain and reduces
its spread, since it tends to force all electrons to hit the wall near the top of the channel. In
addition, tortuous channels help force any ions to hit the sides instead of being launched
into the photocathode.
MCPs are extremely fast; typical rise times for MCP electron multipliers are around
100 ps, and fall times 100–500 ps. The transit time spread is usually a few times less
than the rise time. They are also very linear for channel currents up to perhaps 10% of
the bias current per channel, at which point signal-dependent voltage drops along the
channel begin to modulate the gain. Their multiplication noise is rather worse than a
conventional dynode-chain multiplier’s, because the secondary electron yield depends
more on the trajectory of the electrons.
You can get MCPs with up to 64-segment anodes for time- and space-resolved photon
counting applications. Their time resolution is excellent (∼40 ps RMS) but they take a
lot of circuitry.
Due to their huge internal surface area and constricted channels, MCPs cannot be
baked out very well. Their interiors are thus much dirtier than those of PMTs, which
limits their lifetime to about 0.2–0.5 coulomb of anode charge per square centimeter,
about 1000 times shorter than that of a regular PMT.†
3.8.3 Streak Tubes
The high gain and fast shuttering of MCPs has one major disadvantage; all the photons
from all times are superimposed on one another, so that there is no way of discovering
the time evolution of the signal afterwards. One can use several MCPs, or make the signal
periodic and use stroboscopic sampling, but these are not always possible; fortunately,
there is a better way: the streak tube.
A streak tube is nothing more than an image tube with deflector plates in it, so that the
image can be steered about on the output phosphor. A one-dimensional (1D) line image
can be scanned across the output phosphor to produce a two-dimensional (2D) grey scale
picture of the time evolution of every point in the line image—much like a 2D optical
oscilloscope, but with a time resolution down to 1 ps. The moderate photon-to-photon
gain of a streak tube, 10–500, is useful, because a signal has to be pretty intense to
produce many photons in a picosecond. Fast-responding photocathodes such as S-20 are
de rigueur—an NEA photocathode will smear the time response out to 1 ns or even
slower. Electron storage materials can be used to make IR streak cameras. Streak cameras
tend to cost $100k or so, but they’re 30 times faster than the fastest digitizing scopes.
3.9.1 Charge-Coupled Devices
Silicon photodiodes are very good detectors in the visible and are made on the same
substrates as ICs; thus they are natural candidates for making imaging array detectors. In
operation, each array element operates as an isolated photosensor, accumulating charge in
a potential well for some integration time, after which it is read out. The classical silicon
array detector is the charge-coupled device (CCD). In CCDs, the readout mechanism is
destructive but quiet; the charge from each element is shifted out by a clever analog shift
register technique analogous to a worm screw feed, until it reaches a device lead or an
on-chip sense amplifier. CCDs usually have on-chip amplifiers, which have enough gain
that the subsequent stages are easy to design (one to a few microvolts per electron is typical).
The limits on CCD performance are set by photon efficiency, dark current, readout
noise, and charge transfer losses. Provided the CCD is not shifted too rapidly, transfer
efficiency will be 99.99% per stage at least. (This sounds great, but remember that this is
raised to the 512th or even 4096th power—in a good device, you can get 99.9999% by
slowing down a bit, which is dramatically better.) Nearly all the charge lost in transfer
winds up in the following pixel, so that a bit of postprocessing can help.
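The effect of raising the per-stage efficiency to such large powers is easy to underestimate; a quick check (stage counts and efficiencies as quoted above):

```python
# Net fraction of a charge packet surviving N transfers at a given
# per-stage charge transfer efficiency (CTE).
def net_efficiency(cte, n_transfers):
    return cte ** n_transfers

for cte in (0.9999, 0.999999):
    for n in (512, 4096):
        print(f"CTE {cte}: {n:4d} transfers -> {net_efficiency(cte, n):.4f} of charge arrives")
```

At 99.99% per stage, a 4096-transfer readout delivers only about 66% of each packet to the amplifier; at 99.9999% it delivers about 99.6%, which is why slowing down is so worthwhile.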
All but the most expensive CCDs have some bad pixels, which have anomalously
low sensitivity (dead pixels) or high dark current (hot pixels). Really cheap CCDs have
whole dead columns. Your processing strategy has to take these into account.
† Philips Photonics, Photomultiplier Tubes: Principles & Applications, 1994, p. 1–21.
3.9.2 Types of CCD
The two major classes of area-array CCDs are interline transfer (ILT) and frame transfer (FT). In an ILT CCD, the columns of sensor pixels are interdigitated with readout
columns. When the integration time is finished, the entire image is transferred into the
readout columns and is transferred out to be read. The areal efficiency of ILT devices is
poor, 25–30%, since the readout columns take up space in the focal plane. On the other
hand, the very fast lateral transfer makes these devices good shutters. The sensitive area
in ILT CCD pixels is often weirdly shaped, which makes its Fourier-domain properties less
desirable, and the small size and odd shapes lead to serious moiré effects when imaging
strong geometric patterns. Newer ILT CCDs having microlenses deposited on the top can
have fill factors nearly the same as FT devices, with the attendant FOV reduction.
A frame transfer CCD has no interdigitated columns. When it is read out, the columns
are shifted down into a shielded area array and read out from there. This keeps the fill
factor high but requires hundreds of times longer to get the last of the pixels shifted
into the dark array, so the shuttering performance is correspondingly worse. On the other
hand, the fill factor can be 100%, and the square, full-pitch pixels reduce the moiré
effects.
3.9.3 Efficiency and Spectral Response
Some CCDs are cheap, being produced in huge volumes for video cameras and other
applications (usually with many dead pixels). These have poor long wavelength response,
some cutting off at 600 nm or even shorter, although Sony makes some that work at
some level out to 1000 nm. Front-illuminated CCDs are also insensitive in the blue and
UV due to absorption in the films on top of the silicon. Antireflection coating helps somewhat.
Illuminating the CCD from the back helps a lot, because of the greater absorption
depth available and the absence of top surface obstacles. This requires thinning the die
drastically, which is extremely expensive and hurts yield; nevertheless if you have the
money, you can get CCDs whose fill factor is 100% and whose QE exceeds 85% after
AR coating.
Typical commodity black-and-white CCDs have η ≈ 60% and fill factors of 25%,
so that their overall efficiency is only 15% or thereabouts. These specifications must
be taken into account when designing a CCD detector subsystem. Some devices are
becoming available with microlenses on the chip, to gather more of the incoming light
into the active area, which helps a lot but causes spatial pattern problems, especially
etalon fringes with lasers (see Section 3.9.6).
3.9.4 Noise and Dark Current
The readout noise of a CCD is additive and can be very low: as low as 2 electrons per
pixel in a cooled device with a full well capacity of 10⁶ electrons, read out at 20 kHz
or so. Floating-gate readout devices have demonstrated sub-electron noise levels but are
not commonly available. Over the past 20 years or so, commercial CCD readout noise
has been stuck at the few-electron level.
There is a sharp trade-off between speed and readout noise; going faster requires
more bandwidth, and the noise power scales linearly with speed. It’s possible to achieve
sub-electron noise by sampling slowly enough, but then it takes forever to read out the
camera. More commonly, the readout noise of a decent camera is around 30 electrons
RMS. Achieving this noise level requires eliminating the kTC noise (see Example 13.3)
contributed by the reset operation, which is done by sampling the output level before and
after each charge readout, and subtracting the two voltages, a procedure called correlated
double sampling. (Interestingly, since no charge is dissipated in the shifting operations,
no kTC noise is introduced by shifting the charge packet into the readout cell.)
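A toy numerical model shows why the before/after subtraction kills the kTC noise (the voltages and noise levels here are made-up illustrative numbers):

```python
import random

random.seed(1)

def mean(xs):
    return sum(xs) / len(xs)

def rms(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Each reset freezes a random kTC offset onto the sense node. Correlated
# double sampling reads the node before and after the charge dump and
# subtracts, so the frozen offset cancels (noiseless-amplifier model).
def read_pixel_cds(signal_uV, ktc_rms_uV=50.0):
    reset = random.gauss(0.0, ktc_rms_uV)  # kTC noise frozen at reset
    sample1 = reset                        # before charge dump
    sample2 = reset + signal_uV            # after charge dump
    return sample2 - sample1               # kTC offset cancels

naive = [random.gauss(0.0, 50.0) + 100.0 for _ in range(10000)]  # single-sample reads
cds = [read_pixel_cds(100.0) for _ in range(10000)]

print(f"single sample: {rms(naive):5.1f} uV RMS noise")  # ~50 uV of kTC noise left in
print(f"with CDS:      {rms(cds):5.1f} uV RMS noise")    # kTC gone
```

In a real CCD the amplifier noise appears in both samples, so CDS costs √2 in amplifier noise while removing the much larger kTC term.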
Increasing the integration time increases the signal linearly, so the electrical SNR of
low light images goes as t², until the dark current noise (i_dark·t)^(1/2) equals the readout
noise, after which the SNR is shot noise limited and so increases only linearly with time.
The attainable (electrical) SNR is limited by photon statistics to a value equal to the full
well capacity in electrons, around 47 dB for the camcorder CCD to perhaps 60 dB for a
scientific device.
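The full-well SNR limit follows directly from Poisson statistics: with N electrons the signal power goes as N² and the shot noise power as N, so the electrical SNR is just N. A one-line check of the figures above:

```python
import math

# Shot-noise-limited electrical SNR of a full well: N^2 / N = N electrons.
for name, full_well in (("camcorder CCD", 5e4), ("scientific CCD", 1e6)):
    snr_db = 10 * math.log10(full_well)
    print(f"{name}: full well {full_well:.0e} e- -> max electrical SNR {snr_db:.0f} dB")
```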
A garden-variety camcorder CCD operated at room temperature has a dark current of
about 100 electrons per pixel in a 1/30 second integration time, and a well capacity on
the order of 5 × 10⁴ electrons. A cooled CCD, as used in astronomy, can achieve dark
currents far below 1 electron/s per pixel. These very low dark currents are achieved by
multiphase pinning (MPP), which eliminates the effects of surface states (MPP is also
called inverted mode). Since the dark current is thermally activated, it drops by a factor
of 2 every 10 °C. Cooling cannot be taken too far, however, due to the “freezing out” of
trap sites, which leads to serious charge transfer losses at low temperatures (−30 °C for
some devices, to −100 °C for others). Freeze-out is not subtle; you get severe streaks
along the transfer direction.
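The factor-of-2-per-10 °C rule makes it easy to estimate how much cooling a given measurement needs (the room temperature dark current below is an assumed illustrative value, not a quoted spec):

```python
# Thermally activated dark current, halving for every 10 C of cooling.
def dark_current(temp_C, i_room=100.0, t_room=20.0):
    """Dark current in e-/pixel/s, given an assumed i_room at t_room."""
    return i_room * 2.0 ** ((temp_C - t_room) / 10.0)

for t in (20, 0, -30, -60):
    print(f"{t:4d} C: {dark_current(t):9.4f} e-/pixel/s")
```

Cooling from +20 °C to −60 °C buys a factor of 2⁸ = 256, but as noted above, going much below the freeze-out temperature of a given device trades dark current for charge transfer losses.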
3.9.5 Bloom, Bleed, and Fringing
Prominent CCD pathologies are bloom, bleed, and fringing. Bloom is the picturesque
name given to the artifacts that result when some elements become full and spill over
into the readout circuitry or the substrate, causing spurious lines and a general leakage of
charge into adjoining elements. A badly bloomed image is full of bright lines and blobs.
Bloom is controlled by additional electrodes that extract the charge before it can migrate
far; antiblooming compromises linearity, fill factor, and full well capacity, so use it only
if you really need it. Note that bloom is not necessarily restricted to illuminated areas; by
dumping charge into the substrate, saturated pixels can send charge into storage areas as
well, a particular problem in scientific applications, when the readout time is sometimes
long compared to the integration time.
Fringing and bleed cause loss of resolution in CCDs operated at wavelengths near
the 1.1 μm absorption edge of silicon. The silicon becomes increasingly transparent, and
light can bounce back and forth between the front and back surfaces of the wafer, causing
nasty irregular fringes and scatter. This is not a trivial effect: at short wavelengths the
fringes tend to be 1% or below, but in the IR they can get up to 5% or even more, which
is very objectionable.
If we try to make the silicon thicker, so that more absorption occurs, lateral diffusion
of carriers in the field-free region (away from the depletion zone of the junction) causes
bleed, where carriers from one pixel diffuse into neighboring ones. The best solution to
this is to use a thick piece of high resistivity silicon, which can be depleted throughout
the volume (like a back-biased PIN diode). High resistivity CCDs have high quantum
efficiency even at 1000 nm. If you haven’t got one of these special devices, there’s not
a lot you can do about it.
3.9.6 Spatial Pattern
CCDs also have stable, accurate spatial patterns; subpixel position measurements of
images a few pixels in size can be done by interpolating. Carefully designed, back-illuminated CCDs have high fill factors, which makes their Fourier domain properties
simple, but front-surface devices are much iffier.
Fancier Fourier processing techniques should not assume that the spatial sensitivity
pattern of a CCD pixel looks like a nice rectangular box, even if the pixel is rectangular:
most of the time, it has sloping sides and a bit of dishing in the middle, so that it resembles
a fedora more closely than it does a top hat. Some types have serious asymmetry between
two halves of the pixel (e.g., when the sensitive area is L-shaped). These effects make
the optical transfer function of your CCD do things you might not expect; for example,
the zero in the optical transfer function of the CCD is not at 2π/(pixel width) but is
somewhat further out; irregular pixel shapes make it even worse because of their high
spatial harmonic content. The only defense against this in serious Fourier processing is to
oversample by a big factor (say, 4× to 8× the Nyquist limit), so that the big irregularities
in the OTF happen out where you don’t care about them, and you can correct for the
small ones that remain. Devices that are not efficiently AR coated will do unintuitive
things, because of reflectance changes with incidence angle. For an air–silicon interface,
the normal incidence reflectance is 0.3, whereas at 30° it’s 0.36 for s and 0.26 for p.
Measuring pixel sensitivities directly with a high NA flying spot will thus produce some
funny results due to this asymmetric pupil apodization.
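The reflectance figures just quoted follow from the Fresnel formulas; a short check, assuming n ≈ 3.5 for silicon in the visible (the index itself is not stated above) and neglecting absorption:

```python
import math

# Fresnel power reflectances for s and p polarization at a bare
# air-silicon interface, assuming a purely real index n ~ 3.5.
def reflectance(n, theta_i_deg):
    ti = math.radians(theta_i_deg)
    tt = math.asin(math.sin(ti) / n)  # Snell's law
    rs = (math.cos(ti) - n * math.cos(tt)) / (math.cos(ti) + n * math.cos(tt))
    rp = (n * math.cos(ti) - math.cos(tt)) / (n * math.cos(ti) + math.cos(tt))
    return rs * rs, rp * rp

Rs0, Rp0 = reflectance(3.5, 0.0)
Rs30, Rp30 = reflectance(3.5, 30.0)
print(f"normal incidence: R  = {Rs0:.2f}")                    # ~0.31
print(f"30 degrees:       Rs = {Rs30:.2f}, Rp = {Rp30:.2f}")  # 0.36 and 0.26
```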
At long wavelengths, bleed spreads incident light into adjacent pixels and causes the
OTF to be seriously wavelength dependent; how much this happens depends on the
absorption of the silicon.
CMOS image sensors and interline transfer CCDs with microlenses exhibit horrible
etalon fringes when used with temporally coherent sources.
3.9.7 Linearity
Unlike imaging tubes such as vidicons, CCDs are normally extremely linear, although
the antiblooming provisions of some devices can cause serious nonlinearity above about
half-scale. Infrared focal plane arrays are much less linear and usually require multiple
calibrations at different light intensities.
At low temperatures and long wavelengths, front-illuminated CCDs exhibit QE hysteresis due to trapping of charge at the interface between the bulk and epitaxial layers (this
is polished away in back-illuminated CCDs). The CCDs used on the Hubble, Galileo,
SXT, and Cassini space missions had QE variations as much as 10% depending on signal
level and operating temperature.†
Aside: CCD Data Sheets. From an instrument designer’s viewpoint, the worst thing
about CCDs is their data sheets. Op amps, microprocessors, diode lasers—all these have
reasonably standard spec sheets, but not CCDs. Designing a CCD system that will be
replicated more than a few times depends on the designer having a deep knowledge of
the details of each kind of CCD considered. The data sheets are also nearly all hopelessly
out of date. Consult the manufacturer of your devices for the latest specs—and don’t be
too surprised if they pay more attention to their camcorder-building customers than they
do to you.
† J. Janesick, posted to the CCD-world mailing list (∼tmca/CCD−world/), March 18,
3.9.8 Driving CCDs
CCDs take a lot of circuit support. This is somewhat specialized, so if you’re rolling
your own CCD drivers, you probably want a copy of Janesick. In general, CCDs are
forgiving of mildly ugly clock signals as long as they’re highly repeatable. People often
use transmission gates connected to well-filtered analog voltages to produce the funny
clock levels required.
3.9.9 Time Delay Integration (TDI) CCDs
Linear CCD sensors are used in line-scan cameras and in pushbroom-style remote sensing
satellites, where the spacecraft motion supplies the frame scan. The SNR can be improved
by time-delay integration (TDI), in which the linear array is replaced by a narrow area
array (perhaps 4096 × 64 pixels) and clocked in the narrow direction at the same rate that
the image crosses the detector, so that the same object point illuminates the same bucket
of electrons throughout. This requires accurate alignment and places severe constraints
on the geometric distortion of the imaging system, but a 64-deep TDI sensor can get you
a 36 dB signal increase.
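The 36 dB figure is just 20 log₁₀ of the stage count; how much of it shows up as SNR improvement depends on which noise dominates:

```python
import math

# TDI with N stages: signal charge grows by N (one readout at the end),
# so SNR against readout or fixed-pattern noise grows by N, while
# shot-noise-limited SNR grows only by sqrt(N).
n = 64
sig_db = 20 * math.log10(n)              # signal increase: ~36 dB
shot_db = 20 * math.log10(math.sqrt(n))  # shot-limited SNR gain: ~18 dB
print(f"{n}-stage TDI: signal +{sig_db:.1f} dB, shot-limited SNR +{shot_db:.1f} dB")
```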
Another TDI advantage is uniformity. Because each object point is imaged by (say)
64 pixels in turn, the fixed pattern noise tends to average out. There’s a bit more in
Section 10.5.3.
3.9.10 Charge-Multiplying CCDs
Because the charge transfer efficiency of a CCD is so high, and its dark current can
be made very low, its major noise source is its output amplifier. The noise can be
reduced to subelectron levels by slowing the readout clock and filtering or averaging
the output. The noise/speed trade-off prevents the use of CCDs for real-time imaging at
low light levels, as we’ve seen. Hynecek† and more recently Mackay et al.‡ have made
a clever electron-multiplying CCD that overcomes this trade-off almost completely, the
low-light-level CCD or LLLCCD (or L³CCD). This device can be used either as a
normal CCD of quantum efficiency η, or as the equivalent of a noiselessly intensified
CCD with a QE of η/2 and a readout speed of 20 MHz. The way it works is to take
an ordinary single-output CCD and add a few hundred extra transfer stages before the
readout amplifier. In the extension, one of the three readout phases is run at a much
higher voltage, enough that there is a small amount (≈1–2%) of electron multiplication
in each stage. Because there are many stages, reasonably well-controlled gains from 1
to several thousand are easily realized by changing the voltage in the one phase, so the
same device can go from sunlight to photon counting by changing one voltage. Since
the multiplication occurs inside the silicon, there is no dirty vacuum to limit its lifetime.
The complexity increase is much smaller than that of an MCP. Furthermore, blooming
† Jaroslav Hynecek, CCM—a new low-noise charge carrier multiplier suitable for detection of charge in small pixel CCD image sensors. IEEE Trans. Electron Devices 30, 694–699 (1992).
‡ Craig D. Mackay, Robert N. Tubbs, Ray Bell, David Burt, Paul Jerram, and Ian Moody, Sub-electron read
noise at MHz pixel rates. SPIE Proc. January 2001.
of the multiplication stages automatically protects against damage from bright lights that
would reduce an MCP to lava.
The LLLCCD needs cooling to get its dark current low enough to count photons. In
addition, its dynamic range is limited by the full well capacity of the multiplication stages,
and multiplication causes the shot noise to go up by 3 dB, which is actually amazingly
good. In a PMT, secondary emission is a Poisson process, so 512 dynode stages, each
with a secondary electron yield of 1.01, would produce a pulse height histogram that
looked like a pancake (its variance would be about 100 times the mean; see the chapter
problems in the Supplementary Material). Electron multiplication in the CCD involves
a Poisson process too, with one key difference: with 512 stages of 1.01 gain, the 1 is
deterministic and hence noiseless—only the 0.01 is Poissonian, so the variance is only
twice the mean, not 100 times. (Why?)
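The answer to the (Why?) can be checked by brute force. The sketch below (a toy branching-process model, not a device simulation) pushes one electron through 512 stages of mean gain 1.01 both ways and compares the gain-normalized variances:

```python
import numpy as np

rng = np.random.default_rng(0)
stages, p, trials = 512, 0.01, 20000

# EMCCD-style stage: each electron passes deterministically and is
# duplicated with probability p. PMT-style stage: fully Poisson yield
# with the same mean, 1 + p.
n_em = np.ones(trials, dtype=np.int64)
n_pmt = np.ones(trials, dtype=np.int64)
for _ in range(stages):
    n_em = n_em + rng.binomial(n_em, p)
    n_pmt = rng.poisson((1 + p) * n_pmt)

G = (1 + p) ** stages  # mean gain, about 163
print(f"EMCCD-style: normalized variance {n_em.var() / G**2:.2f}")  # ~1
print(f"PMT-style:   normalized variance {n_pmt.var() / G**2:.0f}") # ~100
```

Only the Bernoulli(0.01) part of each EMCCD stage is random, so the single-electron response has a gain-normalized variance near 1 (hence an output variance of about twice the mean for Poisson-distributed input), whereas making the whole stage yield Poisson inflates it a hundredfold, producing the pancake histogram.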
There is a slight linearity trade-off involved at very high gains—bright pixels see
a somewhat higher gain than dim ones, the opposite of the usual situation. The error
can be a factor of 2 in really bad situations, but as it’s monotonic, with care it can be
calibrated out. A significant advantage of L³CCDs is the ability to achieve high frame
rates, because the usual trade-off of readout noise versus frame rate has been greatly
improved by the amplification ahead of the readout amplifier. Because of their sensitivity,
flexibility, potentially low cost, and long life, electron-multiplying CCDs will probably
replace image intensifiers in applications where cooling is feasible and fast shuttering is
not required.
For applications such as astronomical spectroscopy, where the 3 dB SNR loss is too
great, L³CCDs can be operated in photon counting mode, provided the frame rates are
high enough that you don’t lose too many counts due to two photons hitting a given
pixel in the frame time. Because the device operational parameters don’t need to change
between analog multiplication and photon counting modes, you could in principle change
the voltage on the multiplying phase during readout, which would give different pixels
different gains, increasing the dynamic range.
3.9.11 Charge Injection Devices (CIDs)
CIDs are like CCDs only backwards: the well starts out full, and light removes charge
instead of adding it. This adds noise at low light levels but makes CIDs intrinsically
resistant to bloom, and so suitable for high contrast applications. Their quantum efficiencies are typically fairly low, around 30%. CIDs use multiplexers instead of shift
registers and can be read nondestructively, since the charge need not be removed from
the element during readout. CMOS imagers also use big multiplexers (see below). These
multiplexer-based technologies offer random access, so we can use different integration times for different pixels on the array. This means that an image with extremely
high contrast can be read adaptively, yielding the best of all worlds: rapid acquisition of
bright objects and long integration times to enhance detection of faint ones. Since CMOS
imagers don’t exhibit bloom either, CIDs are no longer so unique in that way.
3.9.12 Photodiode Arrays
Photodiode arrays (commonly known as “Reticons” for historical reasons) look like
CCDs but are actually read out via a big multiplexer instead of a bucket brigade, which
makes them much easier to control. They are competitive only in 1D arrays, as in OMA
spectrometers, but they work very well for that. Compared with CCDs, they are not
limited by charge transfer inefficiency, which suits them well to high contrast applications,
where the data will come under close scrutiny, as in spectroscopy. They are a bit less
sensitive than CCDs of the same pixel size, however, because of multiplexer noise and
charge injection. More insidiously, they are dramatically less linear than CCDs.
Photodiode arrays, unlike CCDs, generally have no bias applied to the diodes during
integration; thus they are somewhat nonlinear at large signals, because of the forward
conduction of the photodiodes. The forward voltage drops and the dark current increases
with temperature, so the linearity, full well capacity, and available integration time all
degrade more quickly with temperature than in a CCD. A 10 °C increase doubles the
dark current and reduces the charge capacity by about 25%.
3.9.13 CMOS Imagers
The CMOS imager is similar in performance to an interline transfer CCD but is manufactured on an ordinary CMOS process, as used in logic and memory ICs. It consists
of an array of pixels, each with its own buffer amplifier. The amplifiers are connected
to a gigantic crossbar multiplexer, as in a dynamic memory IC. Often there is a single
stage of CCD-style charge transfer between the pixel and the amplifier, to provide the
fast shuttering capability of an ILT CCD. The attraction of a CMOS imager is that lots of
the ancillary circuitry required for a CCD, such as clock generation, amplification, A/D
conversion, panning, zooming, and even image processing (e.g., halftoning), can be done
on the imager chip. The optical characteristics are not generally as good as CCDs; the
dark current is generally higher, and the linearity and accuracy less. The large number of
amplifiers all have slightly different offset voltages, so that there is a lot of fixed-pattern
noise in CMOS imagers that is not present in CCDs, which have many fewer amplifiers
(often only one). This makes their dim-light performance poor, but work is underway
to bring these devices to the capabilities of scientific CCDs. Being able to use imaging
sensors without dragging along six tons of ancillary hardware is a powerful and useful capability, but on the other hand the flexibility of custom controllers is sometimes missed.
CMOS imagers often exhibit severe fringing due to reflections between the fairly
dense metal wiring layers and the silicon surface, and microlenses make it worse. Effective optical path lengths are in the 10 μm range. This isn’t too terrible in wideband
applications, but it can make narrowband measurements and spectroscopy exciting. On
the other hand, since there are real FETs between each CMOS pixel and its neighbors,
CMOS imagers generally don’t bloom.
3.9.14 Video Cameras
Television is a vast wasteland.
—Newton N. Minow (then Chairman of the US Federal Communications Commission)
Video cameras use CCDs or CMOS imagers, but most of them are poorly suited to
precise measurements. They have black level and gain adjustments (usually automatic)
that often foul up measurements by changing when we want them to keep still. In order
to mimic the behavior of photographic film and vidicons, their response is deliberately
made nonlinear, sometimes with an adjustment for γ , the contrast exponent. There are
cameras made for instrumentation use, and they work well enough but tend to cost a lot
compared to the entertainment-grade ones. Since gain and black level adjustments are
done on the full frame, putting an intensity reference in the field of view will sometimes
allow you to fix it afterwards, but the γ problem is still there.
Cameras are useful in full field interferometric techniques (phase shifting, moiré, and
holographic) and in structured light measurements (see Section 10.5.9). On the other
hand, cameras are often overused by people whose primary strength is in software, and
who want to get the data into digital form as early as possible. The importance of getting
the detector subsystem right cannot be overemphasized, so this by itself is an inadequate
reason for using video.
Sometimes video is necessary, for example, in machine vision systems or where the
requirement for using commercially available hardware is more compelling than that for
optimizing performance. In the long run, however, it’s such a headache that it is vitally
important to make sure you really understand why you’re using video rather than one or
a few silicon photodiodes. Its cost and complexity rise very quickly once you get past
webcam quality, and even so the measurements are normally poor, due to the low SNR
of image sensors, the 8 bit limit of the A/D converters, and the generally poor fidelity
of the electronics used.
Color video is even worse than black and white. Color sensors are made by putting
arrays of different-colored filters on top of a monochrome sensor.† These filters are
generally arranged in groups of four, one pixel each of red and blue, and two of green.
In order to prevent bad moiré effects, color sensors have “defuzzing filters,” which are
thin walkoff plates (see Section 6.3.5) mounted over the CCD to smear out the image over
a four-pixel area. Of course, this reduces the resolution by a factor of 2 in each direction,
and equally of course (marketing departments being what they are), the quoted resolution
is that of the underlying sensor. When comparing color to monochrome, remember that
it takes four Marketing Megapixels™ to make one real monochrome megapixel. Avoid
color sensors for instrument use whenever possible.
† There are honorable exceptions, in which dichroic prisms are used to form RGB images on three separate CCD chips, but you'll probably never see one.
3.9.15 Extending the Wavelength Range: CCDs + Fluors
The UV performance of a front-illuminated CCD detector can be dramatically improved
by applying a thin coating of a fluorescent material to convert incident UV photons into
visible light before they are absorbed. The quantum efficiency of this approach is around
10% for many fluors, which is not as high as we'd like but a lot better than nothing,
which is what we'd get otherwise.
3.9.16 Electron Storage Materials
You can get near-IR sensor material for converting 0.8–1.6 μm light to visible. The
best is Q-42 phosphor from Lumitek Corp. (available as IR sensor cards from Lumitek
and Edmund Optics, since Kodak went out of the business). It uses rare-earth oxides
or sulfides in a calcium sulfide matrix and works by electron trapping: visible or UV
light excites the Ce2+, Er2+, or Sm3+ ions to a long-lived metastable state; when an IR photon
comes along, it is absorbed and a visible one is emitted. The fluorescence dies away very
rapidly (picoseconds to nanoseconds) when the IR stops, and the quantum efficiency
is near unity. This material could probably be used to extend silicon CCD detectors
out to 1.6 μm, in much the same way as the UV fluors. It would obviously have to be
pumped with a wavelength to which the CCD is insensitive, or refreshed during readout
like dynamic memory. For near-IR applications at high power density, you can also
get second-harmonic generating material, such as nitrofurazone (5-nitro-2-furaldehyde
semicarbazone, Aldrich Catalog #73340),† which can be mixed with paint. Water and
polar solvents deactivate it, but it survives well in anisole and NMP (N-methylpyrrolidone).
3.9.17 Infrared Array Detectors
Platinum silicide arrays are made on silicon substrates, but other infrared arrays are
odd hybrid devices, generally consisting of detector chips made of InGaAs (to 1.7 or
2.2 μm), InSb (to 5 μm), or HgCdTe (2.5–14 μm) bump-bonded to a Si readout chip.
Since the chip metal goes on the top, the chips are bonded face-to-face. Thus the light
comes in through the substrate of the photodetector chip, which is the origin of the
short-wavelength cutoff in most IR imaging arrays (e.g., the absorption edge at 850 nm
due to the CdZnTe substrates used for many HgCdTe detector arrays). Nowadays more
of these devices are being back-thinned, so it is possible to get InGaAs detectors with
response as far down as the ultraviolet (<400 nm). The long-wavelength edge of HgCdTe
is tunable from 2.5 μm to 14 μm by changing the alloy ratios, with the leakage becoming
worse for longer wavelength cutoff, as the bandgap decreases. Near-IR sensors such as
InGaAs as well as lower performance devices such as pyroelectrics and microbolometer
arrays can work at room temperature, but all the others require cryogenic cooling.
IR arrays are much less linear than silicon CCDs, and at the current state of the
art, their dark current and response nonuniformities are much worse. Thus calibrating
an IR array isn’t the simple matter it is with silicon CCDs (see Section 3.9.19). Even
with pixel-by-pixel correction for gain and offset, the detection performance of IR arrays
is usually limited by their residual fixed-pattern noise. Using calibrations taken at several radiance levels, and fitted with low-order polynomials or sums-of-exponentials, can
greatly (20–30 dB) improve matters, at least within the range measured. Platinum silicide
arrays have enormously better uniformity than InSb and HgCdTe arrays, so they often
offer effectively lower noise in the mid-IR even though their QE is very poor. Pixels
with bad 1/f noise are usually the limiting factor in the stability of the calibration; PtSi
calibrations are good for days, but InSb and especially HgCdTe arrays require recalibration on timescales of minutes to hours, if the spatial noise is going to be below the
temporal noise.‡
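The pixel-by-pixel polynomial correction described above can be sketched in a few lines of NumPy. This is an illustrative sketch only; the function names and the choice of fit order are assumptions, not any particular vendor's calibration procedure.

```python
import numpy as np

def fit_pixel_polys(cal_frames, radiances, order=2):
    """Fit a low-order polynomial per pixel mapping raw counts to known
    scene radiance, from calibration frames at several radiance levels."""
    raw = np.asarray(cal_frames, dtype=float)      # (n_levels, ny, nx)
    n, ny, nx = raw.shape
    flat = raw.reshape(n, ny * nx)
    coeffs = np.empty((order + 1, ny * nx))
    for p in range(ny * nx):                       # per-pixel least-squares fit
        coeffs[:, p] = np.polyfit(flat[:, p], radiances, order)
    return coeffs.reshape(order + 1, ny, nx)

def apply_correction(frame, coeffs):
    """Evaluate each pixel's polynomial at its raw count (Horner's rule;
    np.polyfit returns the highest power first)."""
    out = np.zeros_like(frame, dtype=float)
    for c in coeffs:
        out = out * frame + c
    return out
```

With calibrations at several radiance levels, a quadratic per pixel removes most of the gain and offset nonuniformity, at least within the measured range.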
3.9.18 Intensified Cameras
Electron multiplication is often helpful in low light situations, to overcome circuit noise. It
is natural to try applying it to imaging detectors such as vidicons and CCDs, to overcome
† Used as a topical antibiotic (Furacin), but also good for making 0.5–0.7 μm from 1.0–1.4 μm.
‡ Werner Gross, Thomas Hierl, and Max Schulz, Correctability and long-term stability of infrared focal plane arrays. Opt. Eng. 38(5), 862–869 (May 1999).
their dark current and readout noise in the same way. Just now, the dominant type of
intensified camera is the combination of an MCP image intensifier with a CCD sensor.
Older types, such as the image orthicon, the silicon intensifier target (SIT) vidicon, and
cameras based on image converter tubes and electron optics are now seldom used (see
Section 3.9.10).
An MCP camera consists of a microchannel plate image intensifier whose output is
coupled to a CCD imaging detector. These are often proximity focused; the phosphor
screen is close to the CCD, making an extremely compact, simple, and robust system.
MCPs have spatial resolutions of 10 μm or so, a good match to a CCD.
Such intensified cameras normally produce noisy output, because generally they can’t
improve the photon statistics of the incident light. You can’t just use long exposures,
because each primary may produce 2000 electrons in the CCD well, so you get only
a few dozen counts before saturating. Such noisy signals often need frame averaging
before they become useful, which limits the utility of intensified cameras. For instrument
use, they’re usually at a disadvantage compared with cooled CCDs using long integration
times. On the other hand, image intensifier cameras are very suitable when the images are
primarily intended to be viewed by eye; real-time imaging becomes more important then,
and spatial averaging in the human visual system can take the place of frame averaging.
Another reason for using MCP intensified cameras is their very fast time-gating capability; an MCP has a rise time of 100 ps or so, and its gain can be turned on and off
in a few nanoseconds, making it an excellent shutter as well as an amplifier. The two
methods for doing this are to use an avalanche transistor circuit to gate the whole MCP
bias voltage, or to use an MCP with a grid between photocathode and MCP array, which
requires only a few tens of volts. Cameras exist that will take a few frames at 10^7 frames/s.
Aside: Night Vision Goggles. Direct-view image intensifiers (e.g., night vision goggles) are helpful for a few other reasons. Photocathodes have about four times the QE of
rod cells, plus a much wider wavelength band, especially toward the IR where the sky
glow is brighter; they bring the cone cells into play, which have much higher resolution
and higher response speed; and the collection system can have a much larger étendue
than the eye, both in area and in solid angle.
3.9.19 Calibrating Image Sensors
Achieving imaging performance limited by counting statistics is a bit more of a challenge than this, because image sensors have characteristics that vary from pixel to pixel.
Each output amplifier will have its own characteristic bias voltage; each pixel will have
slightly different sensitivity and dark current. Separating out and correcting these effects
is nontrivial.
What we initially measure is a raw data frame R, but what we want is the true image
intensity I , corrected for gain and offset. Providing that the sensor is very linear (e.g., a
properly operated CCD), the best way is to take long- and short-exposure dark frames,
Dl (tl ) and Ds (ts ), and a flat field frame, F (tf ), using the real optical system aimed at a
featureless surface such as the dawn sky or an integrating sphere. A bias frame (just the
bias voltage) B can be constructed as
B = (ts · Dl − tl · Ds) / (ts − tl),

a normalized thermal frame (just the dark current in 1 s) as†

T = (Dl − B) / tl,

and a sensitivity frame as

S = F − B − tF · T,
where S is normalized to unity at the radiance of the featureless surface. Note that all
these frames should really be averaged across at least four separate exposures so that
the noise of the calibration frames does not dominate that of the data you’re going to
take with the system. Make sure that the long-exposure dark frames are at least as long
as your longest measurement, and that the flat fields are just below half-scale to guard
against nonlinearity. The long-exposure frames will be slightly corrupted by cosmic ray
events, so in the averaging operation, code in a check that throws out the highest value
in each pixel if it’s way out of line (e.g., four times the RMS noise).
When all this work is done, a properly normalized and calibrated true image I can be
computed from a raw image R(tR) of exposure time tR:

I = (R(tR) − B − tR · T) / S.
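Collected into one routine, the frame algebra above might be sketched as follows. All names are illustrative, the frames are floating-point arrays of identical shape, and normalizing S by its mean is one reasonable reading of "normalized to unity":

```python
import numpy as np

def calibrate(R, tR, Dl, tl, Ds, ts, F, tF):
    """Turn a raw frame R (exposure tR) into a calibrated image, using
    long/short dark frames Dl, Ds (exposures tl, ts) and a flat field F
    (exposure tF)."""
    B = (ts * Dl - tl * Ds) / (ts - tl)   # bias frame
    T = (Dl - B) / tl                     # dark current frame, counts per second
    S = F - B - tF * T                    # sensitivity frame
    S = S / S.mean()                      # normalize to unity
    return (R - B - tR * T) / S
```

In practice each of Dl, Ds, and F would itself be a cosmic-ray-rejected average of several exposures, as the text requires.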
This is a flexible and convenient calibration scheme, because you can make your
exposures any length you need to. You’ll need to experiment to determine how often it
needs to be repeated. Watch out especially for sensor temperature changes, which will
make these calibrations go all over the place. (See the problems for how many of each
kind of frame you need.) Above all, make sure that your sensor is always operating in
a highly linear regime: if your hot pixels saturate, you can’t do good dark frames; if
your flat fields are above half-scale and the antiblooming or MPP is on, you'll have the wrong flat field response.
The flat field is wavelength sensitive, so make sure you take the flat field frames with
a source of the same spectral characteristics as your typical data. Fringing in thinned,
back-illuminated CCDs makes the wavelength variation of the nonuniformity sharper
than you might expect; the normalized photoresponse nonuniformity (PRNU) of a good
back-illuminated CCD is 1% or so in the red, rising to 5–10% in the UV and as much
as 20% near the IR cutoff.
As we discussed earlier, IR arrays are much more difficult to calibrate, because they
are not as linear, are more temperature sensitive, and are less uniform to begin with.
3.9.20 Linearity Calibration
CCDs are usually pretty linear up to about half the full well capacity, and many are
very linear nearly to full well. On the other hand, it is much more comfortable knowing
than hoping. One good way to get a linearity calibration is to use a diffused LED
source (a few frosted LEDs all around the CCD, at some distance) that provides a nice
uniform illumination across the whole chip (a few percent is OK, you can remove that
† If you're using integer arithmetic, you'll want to normalize to some longer time interval to avoid significance loss.
mathematically). Start with the CCD reset and shift the image out normally. Flash the
LEDs once after each row is shifted out, and you get a nice intensity staircase signal that
will tell you a great deal about your linearity curve. This is much faster than doing many,
many full frame calibrations, and so can be used as an online linearity calibration. Note
that LEDs generally have a temperature coefficient of output power near −1%/◦ C, so for
decent accuracy you have to temperature compensate them or use them in a closed-loop
system with a photodiode.
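Reducing such a staircase frame to a linearity curve is straightforward; the sketch below (a hypothetical routine, assuming constant flash-to-flash energy) averages each row and reports the deviation from a straight-line fit:

```python
import numpy as np

def staircase_linearity(frame):
    """Given a frame in which row k accumulated k LED flashes, return the
    mean signal per row and its deviation from the best straight line
    (i.e., the integral nonlinearity measured by the staircase)."""
    flashes = np.arange(frame.shape[0], dtype=float)  # flashes per row
    level = frame.mean(axis=1)                        # average over columns
    slope, intercept = np.polyfit(flashes, level, 1)
    residual = level - (slope * flashes + intercept)
    return level, residual
```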
The most basic limit to the sensitivity of an optical measurement is set by the shot noise of
the signal photons. Once other noise sources have been reduced below this level, further
SNR improvements can come only from increasing the signal strength or narrowing
the bandwidth. There is a certain cachet to “shot noise limited” or “quantum limited”
measurements, which should not be allowed to obscure the fact that such measurements
can still be too noisy, and that most of them can still be improved.
The contributions of Johnson noise, signal and background shot noise, and thermal
fluctuations are easily calculated from parameters given in typical data sheets; the one
major imponderable is lattice (thermal) generation–recombination (G-R) noise in IR
photodetectors, which depends on the carrier lifetime, a number that is not always easily
available. In most cases, one must use a seat-of-the-pants method of estimating lattice G-R
noise, such as taking the published noise specification and subtracting all the other noise
sources, or rely on the detector manufacturer’s assertion that a detector is background
limited at given detector and background temperatures and field of view.
Formulas for noise contributions are often given in a form that explicitly includes the
modulation frequency response of the detector. This seems unnecessary. Apart from 1/f
noise, the detector noise has the same frequency response as the signal, and combining
different authors’ complicated formulas is cumbersome. The frequency response is a
matter of deep concern to the designer, who is unlikely to be misled by, for example,
someone describing shot noise as having a flat power spectrum.
The total noise contributed by a detector will depend on just how the detector is
coupled to the external circuitry. Calculating this requires some sort of circuit and noise
models of the detector. Determining the overall SNR of the signal emerging from the
detector subsystem is a major topic of Chapter 18. Table 3.1 is a good starting point for
figuring that out.
3.10.1 Source Noise
In many measurements based on externally applied illumination, source noise is the
dominant contributor. Great ingenuity is expended on reducing and avoiding it. In
laser-based measurements, especially bright-field ones such as interferometry, absorption
spectroscopy, and transient extinction, laser residual intensity noise (RIN) is frequently
the dominant contributor. It may easily be 60 dB above the shot noise level. Fortunately,
it is usually tractable; see Sections 10.6.2 and 10.8.6 for how to handle it. Laser frequency
noise must usually be treated by stabilizing the laser.
Noise from incoherent sources, especially arc lamps, is more difficult to deal with
since it is strongly dependent on position and angle, so that compensating for it by
TABLE 3.1. Which Noise Source Dominates?

Detector Type              | Noise Source                     | Dominates When(a)                                 | Noise Spectral Density
---------------------------|----------------------------------|---------------------------------------------------|------------------------
Si, Ge, InGaAs photodiodes | Photocurrent shot                | is RL > 2kT/e (50 mV @ 300 K)                     | iN = (2e is)^1/2
                           | Background shot                  | ib RL > 2kT/e                                     | iN = (2e ib)^1/2
                           | Johnson                          | (is + ib) RL < 2kT/e                              | iN = (4kT/RL)^1/2
IR photodiodes             | Photocurrent shot                | is (RL ∥ Rsh) > 2kT/e (50 mV @ 300 K)             | iN = (2e is)^1/2
                           | Photon (background)              | ib (RL ∥ Rsh) > 2kT/e                             | Eq. (3.10)
                           | Lattice generation/recombination | Only when reverse biased                          | Eq. (3.23)
                           | Rsh Johnson                      | Always unless BLIP when cooled (believe it when you see it) | iN = (4kT/Rsh)^1/2
IR photoconductors         | Photocurrent shot                | is G(RL ∥ Rsh) > kT/e (25 mV @ 300 K)             | iN = (4Ge is)^1/2 (recombination doubles variance)
                           | Photon (background)              | ib G(RL ∥ Rsh) > kT/e                             | Eq. (3.10)
                           | Lattice G-R                      | VDC^2 τμ/L^2 > 2kT/e                              | Eq. (3.23)
                           | Johnson                          | Always unless BLIP                                | iN = (4kT/Rsh)^1/2
Thermal detectors          | Thermal fluctuations             | Nearly always                                     | Eq. (3.24)
                           | Johnson                          | Almost never                                      | vN = (4kTR)^1/2
PMTs, APDs                 | Multiplied shot                  | Almost always                                     | iN = (2Me i)^1/2 (PMT); iN = (2M^(1+χ) e i)^1/2 (APD)
                           | Johnson                          | Only if M is too low                              | iN = (4kT/RL)^1/2

(a) Here i is the actual current from the device (after multiplication, if any).
comparing the measured signal in some way to a sample of the light from the source
may not be sufficiently accurate to control the effect. It is possible to stabilize arcs using
strong magnetic fields, but as this is awkward and expensive, it is seldom done; the
problem remains a difficult one. Incandescent bulbs are much quieter than arcs and are
often a good choice.
Source noise is multiplicative in character, since both signal and coherent background
are proportional to the source intensity; the multiplication of the source noise times the
coherent background level gives rise to the additive part of the source noise, while source
noise multiplying the signal puts noise sidebands on the desired signal, a particularly
obnoxious thing to do. See Chapter 2 for more on source noise and what to do about it.
3.10.2 Shot Noise
Shot noise is the easiest limit to calculate, but the hardest to improve upon. There is a
fair amount of confusion about shot noise, where it comes from, and where it applies.
The formula, which states that if the arrival of electrons at a given circuit point is a
Poisson process, the current i will have a noise current spectral density of

iN = (2ei)^1/2 A/Hz^1/2,
is perfectly correct. However, much of the time the current is not Poissonian, so that
the formula is inapplicable. The simplest case is in ordinary photodiodes, where all
photocurrents and leakage currents exhibit exactly full shot noise, regardless of quantum efficiency.
It is easy to make currents with full shot noise, much less than full shot noise, or much
more than full shot noise. A battery connected to a metal film resistor will exhibit much
less than full shot noise, because any charge density fluctuations are smoothed out by
electron scattering, which tends to reestablish the correlations and reduce the temperature
of the electrons; the shot noise power is reduced by a factor of L/ l, where L is the length
of the resistor and l is the mean free path for electron–electron scattering.† Since l ∼ 100
Å for disordered metals, and L ∼ 1 mm, the suppression is pretty strong.
An avalanche photodiode’s dark current will exhibit much more than full shot noise,
since each electron generated in the junction gives rise to a large pulse; even if all N
per second were exactly A volts tall, the RMS fluctuation in 1 s will be
√ arriving √
A N. This is M times larger than the shot noise corresponding to the average output
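That √M factor is easy to confirm with a quick Monte Carlo (the rates and gain below are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10000.0        # primary dark electrons per second
M = 100.0          # avalanche gain
bins = 20000       # number of one-second bins simulated

# Each primary makes a pulse of M electrons; primaries are Poissonian.
counts = rng.poisson(N, size=bins) * M
measured_rms = counts.std()

# Shot noise expected if the *average* output of N*M electrons/s
# were made of independent single electrons:
naive_rms = np.sqrt(N * M)

ratio = measured_rms / naive_rms   # close to sqrt(M) = 10
```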
Shot noise in an optical beam can be viewed as the interference term between a noiseless optical signal and the zero-point vacuum fluctuations of the electromagnetic field.
It affects both the amplitude and phase of a detected signal; by heroic preparation, in
very special circumstances, the noise can be redistributed slightly between two conjugate components (sine and cosine, or amplitude and phase), but the product of the two
components cannot be reduced. It is thus a fundamental feature of the interaction of light
with matter.
Thermal light, at least at frequencies where hν ≫ kT, is Poissonian to good accuracy;
at lower frequencies, where the occupation number of the quantum states is higher, there
is an additional term due to the Bose–Einstein statistics of thermal photons; this makes
the variance of the photon count N go up by a factor of 1 + 1/[exp(hν/kT ) − 1].‡ We
are nearly always in the high frequency limit in practice.§
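The size of that Bose–Einstein correction is easy to evaluate (a sketch; the wavelengths are chosen only for illustration):

```python
import math

h = 6.62607015e-34   # Planck constant, J*s
k = 1.380649e-23     # Boltzmann constant, J/K
c = 2.99792458e8     # speed of light, m/s

def variance_factor(wavelength_m, T=300.0):
    """Excess-variance factor 1 + 1/(exp(h*nu/kT) - 1) for thermal light."""
    nu = c / wavelength_m
    return 1.0 + 1.0 / math.expm1(h * nu / (k * T))

vis = variance_factor(500e-9)    # visible: indistinguishable from 1
far = variance_factor(100e-6)    # 100 um, far-IR: noticeably above 1
```

At 500 nm and 300 K the correction is of order 10^-42, which is why the high frequency limit is nearly always the relevant one.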
It is useful to concentrate on the statistics of the initially generated photocarriers,
before any gain is applied, because in all quantum detectors this current exhibits full
shot noise, and amplification does nothing whatever to improve the signal to shot noise
ratio. Like quantum efficiency, this ratio must be kept in mind; it is a vitally important
sanity check.
Carrier recombination is also Poissonian, so the shot noise variance is doubled. This double shot noise in photoconductors is sometimes called photocarrier
† There’s been a lively literature on the details of this for the last 25 years, and there’s a lot of interesting
physics there: for example, see R. Landauer, Ann. New York Acad. Sci . 755(1), 417–428 (1995).
For example, see E. L. Dereniak and G. D. Boreman, Infrared Detectors and Systems. Wiley, Hoboken, NJ,
1996, Section 5.2.2.
§ Hanbury Brown and Twiss demonstrated that these classical fluctuations could be used to measure the angular
diameter of hot thermal sources such as blue stars by cross-correlating the measured noise from two detectors
whose spacing was varied—a so-called intensity interferometer. See Robert Hanbury Brown, The Intensity
Interferometer. Halsted Press (Wiley), Hoboken, NJ, 1974. Hanbury Brown is one of the present author’s
technical heroes.
generation–recombination noise, but it passes the duck test† for shot noise. This
renaming leads to the highly misleading statement that photoconductors do not exhibit
shot noise.
A rule of thumb is that if the photocurrent from a photodiode is sufficient to drop
2kT /e (50 mV at room temperature) across the load resistor, the shot noise dominates
the Johnson noise; in a photoconductor, the required current is reduced by a factor of 2
because of the increased shot noise from recombination.
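The rule of thumb follows directly from equating the two spectral densities; a minimal numerical check (the 10 kΩ load is an arbitrary example):

```python
import math

k = 1.380649e-23       # Boltzmann constant, J/K
e = 1.602176634e-19    # electron charge, C
T = 300.0

def shot_noise(i):
    """Shot noise current density, A/Hz^0.5."""
    return math.sqrt(2 * e * i)

def johnson_noise(R):
    """Johnson noise current density of resistance R, A/Hz^0.5."""
    return math.sqrt(4 * k * T / R)

R = 10e3                         # 10 kilohm load
i = 2 * k * T / (e * R)          # photocurrent dropping 2kT/e across R
v_drop = i * R                   # about 0.052 V: the "50 mV" rule
ratio = shot_noise(i) / johnson_noise(R)   # unity at the crossover
```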
Comparison of shot noise due to signal and background is easier; because both photocarriers generated by signal and background photons exhibit shot noise, the signal shot
noise will dominate the background shot noise whenever the signal photocurrent is larger.
3.10.3 Background Fluctuations
In the absence of other modulation, the photon statistics of otherwise noiseless background light are the same as signal light. Background shot noise will dominate the
Johnson noise any time that the background photocurrent drops more than 2kT /e across
the load resistance for photodiodes, or kT /e for photoconductors.
In many instances, the background light is strongly modulated; examples include 120
Hz modulation in room light, and 15.75 kHz from television screens. Furthermore, in
many measurements, the coherent background is very important; for example, the ISICL
system (a short range coherent lidar) of Example 1.12 encounters unwanted reflections
from the plasma chamber walls that are 10^6 times stronger than the desired signal, and
which further exhibit strong modulation during scanning due to speckle. A combination of baffles, homodyne interferometry, and laser noise cancellation produce a stable
measurement even so.
3.10.4 Thermal Emission
In the mid- and far-IR, say, from 5 to 20 μm, room temperature or thermoelectrically
cooled quantum detectors are limited by the Johnson noise of their own shunt resistance,
while a cryogenically cooled unit is generally limited by the fluctuations in the background thermal radiation, the so-called BLIP condition.‡ For sufficiently strong signals,
the shot noise limit may be reached, but this gets increasingly difficult as the wavelength
gets longer, because the decreasing energy per photon makes the shot noise limited SNR
higher, and because the thermal background gets stronger. The key design goal while
using a BLIP detector is to reduce the detector area and field of view as much as possible,
while keeping all the signal photons, and not doing anything silly in the electronics to
add significant additional noise. Example 3.2 describes how to calculate the expected
noise from an IR photodiode.
3.10.5 Lattice Generation– Recombination Noise
Photoconductors exhibit noise due to random fluctuations in their number of carriers,
so-called generation–recombination noise. The last heading dealt with noise due to coupling to the fluctuations in the radiation field; noise also arises from coupling to the
† "If it looks like a duck and it quacks like a duck, it's a duck."
‡ BLIP originally stood for "background-limited infrared photoconductor" but has come to be applied to any detector whose performance is limited by the thermal background.
fluctuations of the lattice vibrations. Since this is basically noise in the conductivity, it
causes noise mainly in the bias (dark) current. Thus it does not strongly affect photodiodes, which are normally run without a bias current.
For a photoconductor with resistance R, made of a material with majority carrier
lifetime τ and mobility μ, with a DC current I flowing in it, the lattice G-R noise
voltage vl in bandwidth B is

vl = IR (4τB/N)^1/2 = (IR/L)(4eτμBR)^1/2,

where N is the total number of carriers and L is the length of the device.
Unfortunately, this calculation depends on parameters that are not readily extracted
from most data sheets, such as the lifetime of each carrier species, which is the majority,
and so on. In spectral regions (IR) where this is a significant noise source, it is usually
necessary to rely on the manufacturer’s assertion that a certain detector, at a given
temperature and field of view, is BLIP, and go from there. This is unsatisfactory, since it
makes it difficult to make trade-offs between, say, a cooled filter with significant passband
losses or no filter at all. The cooled filter will be better if the lattice G-R noise and Rsh
Johnson noise are low enough, whereas no filter is preferable if the signal loss is enough
to bring the detector into the G-R or Johnson limits. It is usually best to set a lower limit
on the noise using the Johnson noise of the published shunt resistance of the device, and
then quiz the manufacturer as to how low the field of view can go before the detector
ceases to be BLIP.
3.10.6 Multiplication Noise
APDs and photomultipliers exhibit multiplication noise, which appears as gain fluctuations. In a PMT, the variance of the electron gain at the first dynode is the principal
contributor to this noise, while in an APD, the effect is distributed. This noise source is
normally specified in the manufacturers’ data sheets and must not be overlooked when
designing detector subsystems, as it is often the dominant noise source toward the bright
end of the system’s dynamic range.
3.10.7 Temperature Fluctuations
A small detector weakly coupled to a thermal reservoir exhibits local fluctuations in
thermal energy density, which are sometimes described as temperature fluctuations. This
is a poor name, since the idea of temperature is well defined only in the limit of large
systems, but it is unfortunately entrenched. Thermal fluctuations depend on the thermal
mass of the detector, and on the thermal resistance between it and the reservoir, but
not directly on the area; this makes it one of the few intrinsic additive noise sources
for which D ∗ is an inappropriate measure. For a detector connected to a reservoir at
temperature T through a thermal conductance G, the RMS thermal noise power spectral
density is given by

P^2 = 4kT^2 G.
If the two surfaces are connected by thermal radiation, then for a small temperature
difference, the thermal conductance is given by the derivative of the Stefan–Boltzmann
formula (2.4), with an emissivity correction:
Grad = 4σT^3 · η1η2/(η1 + η2 − η1η2),

so the fluctuation due to the radiation thermal conductance is

P^2 = 16kσT^5 · η1η2/(η1 + η2 − η1η2).
For η = 1 and T = 300 K, the thermal conductance due to radiation is about 2 W/m2 /K,
which is equivalent to that of 1.3 cm of still air (0.025 W/m/K). Thus for well-insulated
thermal detectors, radiation can be an important source of both thermal forcing and
fluctuation noise. Low emissivity surfaces and cooling can help a lot.†
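For a feel for the numbers, the fluctuation formula above translates directly into a noise-equivalent power density (the conductance value below is illustrative, not from the text):

```python
import math

k = 1.380649e-23   # Boltzmann constant, J/K

def thermal_fluctuation_nep(T, G):
    """RMS thermal-fluctuation noise power density, W/Hz^0.5, for a
    detector tied to a reservoir at temperature T (K) through a thermal
    conductance G (W/K), from P^2 = 4*k*T^2*G."""
    return math.sqrt(4 * k * T**2 * G)

# e.g. a well-isolated bolometer with G = 1 uW/K at 300 K:
nep = thermal_fluctuation_nep(300.0, 1e-6)   # roughly 2e-12 W/Hz^0.5
```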
3.10.8 Electronic Noise
A good detector can easily be swamped in electronic noise from a poorly designed or
poorly matched preamp. It is vitally important to match the detector’s characteristics to
those of the amplifier if the best noise performance is to be obtained.
One common way in which amplifiers are misapplied is in connecting a simple transimpedance amp to a detector whose shunt capacitance is significant. This leads to a
large noise peak near the 3 dB cutoff of the measurement system, which (although data
sheets and application notes often describe it as inevitable) is easily avoided with a few
circuit tricks. See Section 18.4.4 for details.
3.10.9 Noise Statistics
It is not enough to know the RMS signal-to-noise ratio of a measurement; without
knowledge of the noise statistics, it is impossible to know what the effects of noise on
a given measurement will be. The noise sources listed in this section are Gaussian, with
the exception of most kinds of source noise and some kinds of electronic noise. Section
13.6.12 has a detailed discussion of this and other noise and signal detection issues.
3.11 HACKS
This section contains only optical hacks, but there are a number of circuit tricks listed in
Chapters 10, 15, and 18 as well, which should be considered when choosing a detection
strategy. It is important to keep the complete subsystem in mind during selection of a
detector element and detection strategy.
† See, for example, Lynn E. Garn, Fundamental noise limits of thermal detectors. J. Appl. Phys. 55(5), 1243–1250 (March 1, 1984).
3.11.1 Use an Optical Filter
If your measurement is limited by the noise of the background light, it can often be
improved by a filter. In mid- and far-infrared systems using cooled detectors, you usually
have to cool the filter too, because it emits radiation of its own in its stopbands. Interference filters may present less of a problem; they mostly reflect the stopband light, so
the detector may see a reflection of itself and its cold baffles in the out-of-band region.
Make sure you mount a room temperature interference filter with its interference coating
facing the detector, or else the colored glass backing will radiate IR into your detector.
This helps with narrow filters, which drift a long way when cooled.
3.11.2 Reduce the Field of View
In background-limited situations, the background level can often be reduced by limiting
the field of view (FOV) of the detector. Typical ways of doing this are descanning the
detector in a flying-spot measurement, or by using baffles and spatial filters to reject
photons not coming from the volume of interest. In the mid- to far-IR, the situation
is complicated by the thermal radiation of the baffles themselves, which must often
be cooled in order to afford a signal-to-noise improvement. For BLIP detectors, if the
background radiation is isotropic, the (electrical) noise power will scale approximately
as the solid angle of the field of view, which is a very worthwhile improvement. To
control stray background light and reduce the thermal load on these cold shields, the
inside should be black and the outside shiny.
3.11.3 Reduce the Detector Size
As the FOV is reduced, there will come a point at which the background ceases to
dominate other noise sources, so that further reductions are no help. If the shot noise of
the signal is the next-largest effect, then only the collection of more photons will improve
the measurement; most of the time, however, the next-largest effect will be Johnson or
lattice G-R noise, which scale as the detector area.
The spatial coherence of the incident light will set a minimum étendue n²AΩ for the
detector, but if this has not yet been reached, it is possible to focus the light more tightly
(larger FOV) on a smaller detector. This strategy has the advantage of reducing all the
noise sources, while keeping the signal strength constant; as a bonus, smaller detectors
tend to be faster and cheaper. The limits on this approach are set by the available detector
sizes, by working distance restrictions at higher NA, and by approaching the shot noise
level, which does not depend on detector area and FOV.
3.11.4 Tile with Detectors
Gain isn’t everything. To pull really weak optical signals out of significant amounts of
background light (too much for photon counting), consider using detectors as wallpaper. If your measurement is limited by background noise statistics, and the signal and
background have the same spatial, angular, and spectral distribution, then the tricks of
reducing detector size or FOV won’t help any more. There’s only one way forward:
collect all of the light.
As you increase the detection solid angle Ω, the background noise grows as √Ω but
the signal goes as Ω. Sometimes you can just line a box with photodetectors, such as
CCDs or solar cells, and improve your measurement statistics. In this sort of case, consider especially whether your detector really needs imaging optics. Would a nonimaging
concentrator or just putting the detector up close be better?
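The √Ω-versus-Ω scaling is easy to check numerically: the background photocurrent (and hence its shot-noise power) grows linearly with the collection solid angle, so the noise current grows only as its square root, while the signal current grows linearly. A toy calculation (illustrative numbers only, not from any real detector):

```python
import math

def background_limited_snr(omega, sig_per_sr=1.0, bg_per_sr=100.0):
    """Relative SNR when signal and background both fill solid angle omega (sr).

    Signal current grows as omega; the background shot-noise current grows as
    sqrt(background current), i.e. as sqrt(omega).  Units are arbitrary.
    """
    i_signal = sig_per_sr * omega
    i_noise = math.sqrt(bg_per_sr * omega)  # shot noise ~ sqrt(I_background)
    return i_signal / i_noise

# Doubling the collection solid angle buys sqrt(2) in SNR:
ratio = background_limited_snr(2.0) / background_limited_snr(1.0)
```

so collecting more of the light always helps, just more slowly than the signal itself grows.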
3.11.5 Cool the Detector
Cooling a silicon photodiode below room temperature doesn’t accomplish much, though
cooling does reduce the dark current of CCDs quite a bit. In the IR, where we’re stuck
with narrow bandgap materials, cooling helps detector noise in two ways. The main one
is that it reduces leakage by sharply cutting the rate of thermal carrier generation (in
photodiodes) or thermionic emission (in photocathodes); this effect is exponential in the
temperature. In a photodiode, this leads to an enormous increase in the shunt impedance
of the device, which reduces the Johnson noise current as well as the G-R noise.
The other way cooling helps is that the Johnson noise power of a resistor is proportional to its temperature, so that even with a fixed impedance, the noise current goes
down; this effect is only linear, and so contributes less to the overall noise reduction.
Transistor amplifiers running at room temperature can have noise temperatures as low as
30 K (see Section 18.5.3), so that it is usually unnecessary to cool the amplifier if the
detector is run at 77 K (liquid nitrogen) or above.
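The linear Johnson-noise benefit is easy to put numbers on: a resistor's noise current density is √(4kT/R), so cooling a fixed load from 300 K to 77 K improves the current noise by only √(300/77) ≈ 2. A quick check (the 1 MΩ load is a hypothetical value chosen for illustration):

```python
import math

K_B = 1.380649e-23  # Boltzmann's constant, J/K

def johnson_noise_current(T, R):
    """RMS Johnson noise current density of resistor R at temperature T, A/sqrt(Hz)."""
    return math.sqrt(4 * K_B * T / R)

i_300 = johnson_noise_current(300.0, 1e6)  # room temperature, 1 Mohm load
i_77 = johnson_noise_current(77.0, 1e6)    # liquid nitrogen
improvement = i_300 / i_77                 # only ~2x; the exponential reduction
                                           # in leakage is the real payoff
```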
Cooling schemes divide into thermoelectric (TE) and cryogenic. Neither is free, but
TE coolers (TECs) are much cheaper than cryogenic ones. Multistage TECs can achieve
trouble-free ΔT's of 130 °C, and single stage ones 60 °C, provided care is taken not to
“short-circuit” them with heavy wires or mounts. This is adequate for work at 3.5 μm
and shorter, or with strong signals at longer wavelengths.
Getting BLIP performance at λ ≳ 5 μm requires cryogenic cooling, which is much
more involved. For lab use, LN2 cooling is usually best, because simplest; in a field
instrument, where LN2 is hard to come by, some sort of mechanical cooler, such as a
Joule–Thomson or Stirling cycle device, will be needed. Both alternatives are expensive.
In the extreme IR (beyond 20 μm), choices are more limited: often the choice is
between a helium cooled extrinsic photoconductor such as Ge:Zn, or a room temperature
bolometer or pyroelectric detector.
3.11.6 Reduce the Duty Cycle
The signal-to-noise ratio of a detection scheme can also be improved by concentrating
the signal into a shorter time, as in a pulsed measurement with time-gated detection.
Assuming the average optical power remains constant, as the duty cycle† d decreases
the electrical SNR improves as 1/d, because the average electrical signal power goes as
1/d, and the average noise power is constant, because the noise bandwidth goes as 1/d
but the detection time goes as d. The limit to this is when the shot noise of the signal is
reached—see Sections 10.8.2, 13.8.10, and 15.5.6.
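A sketch of the bookkeeping (made-up units; the point is only the 1/d scaling):

```python
def electrical_snr(duty_cycle, p_avg=1.0, noise_power=1.0):
    """Average electrical SNR for a pulsed measurement with time-gated detection.

    At constant average optical power p_avg, the peak optical power is p_avg/d,
    so the electrical (current-squared) signal is (p_avg/d)**2 while the gate is
    open, and the gate is open a fraction d of the time.  The gated noise power
    is constant: the bandwidth grows as 1/d but the detection time shrinks as d.
    """
    peak_electrical = (p_avg / duty_cycle) ** 2
    avg_signal_power = peak_electrical * duty_cycle  # = p_avg**2 / d
    return avg_signal_power / noise_power

# Concentrating the same average power into 10% duty cycle buys 10x in SNR:
ratio = electrical_snr(0.1) / electrical_snr(1.0)
```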
3.11.7 Use Coherent Detection
By far the quietest and best-performing signal intensification scheme is coherent detection. It exploits the square-law properties of optical detectors to form the product of the
† Duty cycle is the fraction of the time the signal is active: a square wave has a 50% duty cycle.
signal beam with a brighter beam (often called the local oscillator (LO) beam, by analogy with a superheterodyne receiver). If the signal and LO beams have time-dependent
vector electric fields Es and ELO , respectively, the photocurrent is given by
    i(t) = R ∫∫ |ELO(t) + Es(t)|² dA

         = iLO + is + 2R Re ∫∫ ELO(t) · Es*(t) dA

         = (DC) + 2√(iLO is) ∫∫ W(t) cos θ(t) dA,          (3.27)
where R is the responsivity, is and iLO are the photocurrents generated by the signal and
LO beams alone, W is the ratio of the local value of |ELO Es | to its average, and θ is
the optical phase difference between the two beams as a function of position. If the two
beams are in phase, perfectly aligned, and in the same state of focus and polarization, the
integral evaluates to 1, so that the signal photocurrent sees a power gain of (iLO / is ). The
shot noise is dominated by the additive noise of the LO beam, but since the amplification
ratio is just equal to the ratio of the LO shot noise to the signal shot noise, the resulting
total signal to shot noise ratio is equal to that of the signal beam alone, even with an Es
equivalent to one photon in the measurement time—a remarkable and counterintuitive
result. This effect can overcome the Johnson noise of a small load resistor with only
a milliwatt or two of LO power. This remains so for arbitrarily weak signal beams, so
coherent detection offers an excellent way to escape the Johnson noise limit.
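That counterintuitive cancellation can be verified numerically from Eq. (3.27): the cross-term electrical power goes as 4·iLO·is, while the LO-dominated shot-noise power is proportional to iLO, so iLO drops out of the ratio. A sketch (arbitrary photocurrent units, perfect beam overlap assumed, and the shot-noise constant 2eB lumped into a single unit):

```python
import math

def coherent_snr(i_lo, i_s, two_e_b=1.0):
    """Signal-to-shot-noise power ratio of the coherent-detection cross term.

    Cross-term amplitude is 2*sqrt(i_lo*i_s); the shot-noise power is
    two_e_b*(i_lo + i_s), dominated by the LO when i_lo >> i_s.
    """
    signal_power = (2 * math.sqrt(i_lo * i_s)) ** 2   # = 4 * i_lo * i_s
    noise_power = two_e_b * (i_lo + i_s)
    return signal_power / noise_power

# With i_lo >> i_s the result depends only on i_s, not on the LO power:
weak = coherent_snr(1e6, 1e-3)
stronger_lo = coherent_snr(1e9, 1e-3)   # 1000x more LO, same SNR
```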
If θ is not 0 everywhere on the detector, the value of the integrals in Eq. (3.27) will
be reduced. Even slight differences in angle or focus between the two beams will give
rise to fringes, which will dramatically reduce the available amplification, and hence the
signal-to-noise ratio. This seriously restricts the field of view of a heterodyne system,
which may be undesirable. In some instances this restriction is very useful, as it allows
rejection of signals from undesired locations in the sample space.
When the light beams are at exactly the same frequency, this is called homodyne
detection, and when their frequencies differ, heterodyne. The SNR for heterodyne detection goes down by a factor of 2 because the signal power is averaged over all φ, and
the average value of cos²φ is 0.5. Another way of looking at this is that a heterodyne
detector receives noise from twice the bandwidth, since an upshifted optical beam gives
the same beat frequency as a downshifted one (see Section 13.7.2). Temporal incoherence
between the beams will spread the interference term out over a wide bandwidth, reducing
the gain available as well (see Section 2.5.3). Examples of the use of this technique are
heterodyne confocal microscopes, measuring interferometers, and coherent cw lidars.
3.11.8 Catch the Front Surface Reflection
You can get a signal-to-noise boost, in really tight spots, by arranging the first photodiode
at 45° to the incoming beam and putting another one normal to the reflected light, wiring
the two in parallel so that their photocurrents add (Figure 3.8). That way, a sufficiently
Figure 3.8. Catching the front-surface reflection from photodiodes. This trick improves detection
efficiency and can enormously reduce the effects of etalon fringes in the photodiode windows.
low NA beam has to make three bounces off a photodiode before it can escape. (Using
a smaller angle will increase the number of bounces but require bigger photodiodes.)
The front-surface reflection from an uncoated silicon photodiode is about 40%, so this
trick can result in a gain of almost 4 dB in electrical signal power (2–4 dB in SNR), a
highly worthwhile return from a small investment. With coated photodiodes, the gain will
be smaller, but even there, this is a simple and inexpensive way to pick up another several
tenths of a decibel in signal strength. Another benefit is that by collecting most of the
reflected light, you can greatly reduce the signal strength drifts caused by temperature
and wavelength sensitivity of etalon fringes in the photodiode windows. (This effect,
which often limits the attainable accuracy of CW optical measurements, is discussed in
Section 4.7.2.) Adding a third photodiode to bend the light path out of the page by 90°
eliminates polarization dependence completely, and the two extra bounces improve the
light trapping even more. This approach is used in high accuracy radiometry.
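The quoted "almost 4 dB" is easy to verify: one uncoated silicon photodiode absorbs 60% of the light, three bounces absorb 1 − 0.4³ ≈ 93.6%, and since electrical power goes as photocurrent squared, the gain is 20·log₁₀(0.936/0.6) ≈ 3.9 dB. A check:

```python
import math

def absorbed_fraction(reflectance, bounces):
    """Fraction of the incident light absorbed after the given number of bounces."""
    return 1 - reflectance ** bounces

single = absorbed_fraction(0.4, 1)          # 0.60: one uncoated Si photodiode
triple = absorbed_fraction(0.4, 3)          # 0.936: the 45-degree, 3-bounce trick
gain_db = 20 * math.log10(triple / single)  # electrical power gain, ~3.9 dB
```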
3.11.9 Watch Background Temperature
In mid- and far-infrared systems, the detector is normally the coldest thing, and so
(although its emissivity is very high) its thermal radiation is weak. A convex lens
surface facing the detector will reflect a demagnified image of the detector and its
surroundings. In an imaging system, this will result in a dark region surrounded by a
lighter annulus, the narcissus effect, from Ovid’s story of the boy who fell fatally in
love with his own reflection. Narcissus is actually a good effect—it becomes a problem
when the image of the detector moves, or when its magnification is less than 1, as in our example,
so that it has hot edges and a large spatial variation. Since the detector’s emission is
wideband, in general narcissus is not an etalon issue. Good baffles and careful attention
to silvering are needed to ensure that radiation from the rest of the optical system
doesn’t dominate the detected signal.
Besides narcissus and other instrumental emission, nonuniformity in background temperature can mask weak infrared sources, as skylight does stars. The chopping secondary
mirror of Example 10.5 is one way of fixing this.
3.11.10 Form Linear Combinations
Many measurements require addition and subtraction of photocurrents. This is best done
by wiring the detectors themselves in series (for subtraction) or parallel (addition). Doing
this ensures that both photocurrents see exactly the same circuit strays and amplifier gain
and phase shifts; this makes the addition or subtraction extremely accurate and stable,
without tweaks (see the subtraction trick of Section 18.6.1 and the differential laser noise
canceler of Section 18.6.5).† It is a bit confusing at first, but since the far ends of the
photodiodes are connected to very low impedance bias points (which are basically AC
ground), the series and parallel connections are equivalent for AC purposes; the noise
sources and capacitances appear in parallel in both cases.
3.11.11 Use Solar Cells at AC
One problem with good quality silicon photodiodes is their cost per unit area. A 5 mm
diameter photodiode can easily run $100, although some are available more cheaply
(down to $5). There are lots of applications in which more area is better, but cost is
a problem. If you have such an application, consider using solar cells. A 25 × 75 mm
amorphous silicon solar cell costs $5 in unit quantity, has a quantum efficiency of 0.5, and
responds well throughout the visible. It is very linear at high currents, and surprisingly
enough, if you use the cascode transistor trick (Section 18.4.4), you can get 3 dB cutoffs
up to 20 kHz or so. Some smaller cells work at 100 kHz. Because of leakage, you can’t
usually run much reverse bias, so if you’re using an NPN cascode transistor with its base
and collector at ground potential, bias the solar cell’s anode at −0.6 to −1 V. Besides
large capacitance and leakage, solar cells have serious nonuniformity—they often have
metal stripes across their faces, to reduce lateral voltage drops. On the other hand, for
photoelectrons per dollar, you can’t beat them.
3.11.12 Make Windowed Photodiodes into Windowless Ones
One good way of avoiding etalon fringes in photodiode windows is to use windowless
photodiodes. Many types of metal-can photodiodes can be used without windows, but
procuring such devices can be very difficult and expensive in small quantities. For laboratory and evaluation use, it is frequently convenient to remove the windows from ordinary
devices. The methods used most are filing or cutting using a lathe. These methods often
lead to metal chips or cutting oil being left behind on the die, possibly causing short
circuits, scratches, or 1/f noise and drift.
A much more convenient and safe method is to use a big ball-peen hammer, although
this may seem odd initially. Hold the diode in a vice by the leads, with the base resting
on top of the jaws, and tap the glass gently with the peen (the rounded side). It will turn
to powder, which can be removed by turning the diode over and tapping it against the
side of the vice. The protruding face of the ball makes the blow fall on the glass, but the
gentleness of its curvature ensures that it will be stopped by the metal rim of the case
before any glass dust is ground into the die.
Because the glass is clean and nonconductive, it does not lead to any long-term
degradation of the optical performance of the detector, and because any glass falling
† There’s
lots more on noise cancelers in Philip C. D. Hobbs, Ultrasensitive laser measurements without tears.
Appl. Opt . 36(4), 903– 920 (February 1, 1997).
on the die does so reasonably gently, no scratches result. The only problem with this
approach is that not all diodes are adequately passivated for windowless operation. The
passivation layer used in ordinary IC chips is usually a thick layer of silica glass produced
by a sol-gel process or by sputtering. Because the glass and the chip have very different
refractive indices (1.5 vs. 3.4 to 4), it is not so easy to AR coat a diode processed this
way, especially since the thickness of the passivation layer may be poorly controlled;
for best performance, it may be necessary to AR coat the die, passivate it, and then AR
coat the top of the passivation layer. Understandably, this is not often done, so that a
diode with excellent performance in a hermetically sealed package may degrade very
rapidly when the window is removed. The usual symptoms are a gradually increasing
dark current, together with rapidly growing 1/f noise and occasional popcorn bursts.
The otherwise good Hamamatsu S-1722 used in the example is in this class.
Aside: Hermetic Seals. Most ICs and other active devices are packaged in Novolac
epoxy. Lots of optoelectronic parts such as LEDs and photodiodes are encapsulated in
clear polycarbonate. CCD windows are often glued on with epoxy. All these plastics are
great, but there’s one thing they aren’t: hermetic. Water vapor diffuses readily through
plastic. The air inside a CCD package with an epoxied window will respond to humidity
changes outside with a time constant of a week or two; this can lead to condensation
inside the package in service, especially in cooled setups. If you’re building a system
with a cooled detector, insist on a glass-to-metal or frit-bonded seal, or else work without
a window and fight the dust instead.
3.11.13 Use an LED as a Photodetector
Direct bandgap devices such as GaAs diodes have very steep long-wavelength cutoffs,
which can reduce the need for short-pass filters. This can be used to good account in
highly cost-sensitive applications, at least when these detectors can be had at low cost.
Unfortunately, most such detectors are relatively expensive. One exception is ordinary
AlGaAs LEDs. These devices are inefficient as detectors; their quantum efficiency is
low, they have small areas, and the optical quality of their packages is extremely poor.
Nevertheless, their long-wavelength cutoff is very steep, and it can be selected to some
degree by choosing a device of the appropriate emission color and package tint. Where
spectral selectivity is needed and every nickel counts, they are sometimes just the thing.
3.11.14 Use an Immersion Lens
Although the minimum étendue cannot be reduced, remember that it contains a factor of
n². If you contact a hemisphere of index n to the photodiode, you can reduce its area by
n², thereby reducing the capacitance by the same factor and increasing the effective D*
too. This of course works only for n up to the refractive index of the photodiode material,
but this is 3.5 for Si and 4 for Ge. Plastic package photodiodes are a good candidate for
this, because their indices are similar to glass, so that UV epoxy or index oil can be used
easily. Thermoelectrically cooled HgCdTe devices really need this treatment.
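As a numerical illustration of the n² payoff (assuming the junction capacitance simply scales with area):

```python
def immersed_area_ratio(n_hemisphere):
    """Detector-area (and capacitance) reduction from an index-n contact hemisphere."""
    return 1.0 / n_hemisphere ** 2

glass = immersed_area_ratio(1.5)    # UV epoxy or index oil on plastic: ~2.25x smaller
silicon = immersed_area_ratio(3.5)  # matched all the way to silicon: 12.25x smaller
```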
3.11.15 Use a Nonimaging Concentrator
The immersion idea can be extended by using a nonimaging concentrator, as shown in
Figure 3.9. Repeated bounces off the sides of the cone cause the angle of incidence to
Figure 3.9. Nonimaging concentrator for improving photon collection with a small diode. The
simplest kind is the cone concentrator, shown along with its unfolded light path; unfolding is an
easy way to see which rays will make it and which won’t. Due to near-normal incidence, the
bottom of the cone will need silvering.
increase. As shown in the figure, TIR cannot be relied on near the bottom of the cone, so
it will probably have to be silvered. Don’t silver the whole cone unless you have to, since
TIR is more efficient. There are better ways to make a concentrator than this, for example,
the compound-parabolic concentrator, which can achieve the thermodynamic limit.
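For reference, the thermodynamic (étendue-conservation) limit mentioned above caps the concentration of any such device: in 3D, for input half-angle θ and exit medium index n, C_max = n²/sin²θ. A quick sketch:

```python
import math

def max_concentration(theta_in_deg, n_exit=1.0):
    """Thermodynamic limit on 3D concentration for input half-angle theta."""
    return n_exit ** 2 / math.sin(math.radians(theta_in_deg)) ** 2

# A +/-10 degree acceptance cone concentrated onto a detector in air:
c_air = max_concentration(10.0)            # ~33x
# Immersing the exit face (n = 3.5, as in the previous section) helps by n^2:
c_immersed = max_concentration(10.0, 3.5)
```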
3.11.16 Think Outside the Box
There are fashions in the detector business as elsewhere, and the received wisdom about
how to do things is always a mixture of actual engineering experience and “professional
judgment.” Sometimes it’s right and sometimes it’s wrong. For example, most solid state
thermal cameras are built in lithographically defined arrays much like CCDs or CMOS
imagers. They may be cryogenically cooled (InSb and HgCdTe) or run at room temperature (PZT, lithium tantalate, or microbolometers), but their basic physical outline is an
array of small pixels on a solid surface. This has significant effects on the performance
and economics of the devices—they tend to cost between $2000 and $40,000 and have
NEΔT of a bit below 0.1 K, in array sizes from 256 to 500,000. The small
pixel sizes require well-corrected lenses, which are very expensive in the infrared. Not all
applications absolutely need that many pixels, and for those uses, there’s an alternative
method as shown in Figure 3.10.
This sensor uses large pixels (3 × 5 mm) made of carbon ink applied to a 9 μm
film of PVDF by screen printing (T-shirt lithography). The film is freestanding in air,
leading to very low thermal conductance. Interestingly, a photon budget shows that the
best SNR is achieved by insulating the pixels (which makes them slow but sensitive) and
recovering the bandwidth by digital filtering afterwards, as we’ll do in Example 17.1.
The reason is that the insulation slows the sensor down by increasing the low frequency
sensitivity, rather than decreasing the high frequency sensitivity. The big pixels and low
resolution (96 pixels) mean that a simple molded polyethylene Fresnel lens works well.
The multiplexer is a little more of a problem, but it turns out that an array of ordinary
Figure 3.10. Footprints thermal infrared imager: (a) a cross-section view of the development
version shows its 50 mm diameter, 0.7 mm thick HDPE Fresnel lens, and 9 μm PVDF sensor film
freestanding in air; (b) a front view without the lens shows the 8 × 12 array of 3 × 5 mm carbon
ink pixels screen-printed on the PVDF.
display LEDs used as switches (see Section 14.6.1), one per pixel and driven by 5 V
logic lines, does an excellent job, leading to a sensor costing about $10, whose NEΔT
is about 0.13 K. There are some other signal processing aspects we’ll look at in Section
18.7.2 and Example 17.1.†
† For more details, see Philip C. D. Hobbs, A $10 thermal infrared sensor. Proc. SPIE 4563, 42–51 (2001) for the gory technical stuff, and Philip C. D. Hobbs, Footprints: a war story. Opt. Photonics News, pp. 32–37 (September 2003) (footprints/fpwaropn.pdf) for the war story.
Lenses, Prisms, and Mirrors
In theory, theory and practice are the same. In practice, they’re different.
Although lasers and nonlinear optics get all the attention, most of the real work in an
optical system consists of making the beam you’ve got into the one you want and routing
it where you want it. This prosaic but necessary work is done by lenses, prisms, and
mirrors (with an occasional cameo appearance by a grating). In this chapter, we discuss
the workings of these simple devices, beginning with what they’re made of and how they
work. The bulk of the chapter is a heuristic treatment of what they actually do, and the
surprisingly subtle business of how best to combine them so that they do what you want
them to.
Designing a complex lens is a highly specialized art using special tools, and is beyond
our scope. Using and combining lenses that others have designed, on the other hand, is
a very practical skill that everyone in optics should have.
Glass is remarkable stuff. It can be far more transparent than air and stronger than steel.
It is useful for almost every optical purpose from lenses to torsion springs.
By and large, glass is a trouble-free material as well. It comes in a very wide variety
of types. The main properties we care about in optical glass are its refractive index, its
dispersion (i.e., how much the index changes with wavelength), and its transmittance. For
historical reasons, these are specified with respect to certain spectral lines of common
elements, which were discovered and labeled by Joseph von Fraunhofer in the solar
spectrum. The index usually quoted is nd , at the d line of helium at 587.6 nm. Older
references use the slightly redder D lines of sodium, 589.3 ± 0.3 nm, but it doesn’t
matter much. The dispersion is quoted as NFC,

    NFC = [n(486.1 nm) − 1] / [n(656.3 nm) − 1].
Building Electro-Optical Systems, Making It All Work, Second Edition, By Philip C. D. Hobbs
Copyright © 2009 John Wiley & Sons, Inc.
"Normal Glass"
Crowns Flints
SF 11
BK 7
Immersion Oil
Fused Silica
Low Dispersion
High Dispersion
Figure 4.1. Refractive index nd versus reciprocal dispersive power Vd for Schott optical glasses
and common plastics.
The deep red C line at 656.3 nm is the Balmer α line of hydrogen, and the blue-green
F line at 486.1 nm is Balmer β. The quantity n − 1 governs the power of a glass–air
surface, so NFC is the ratio of the powers of a given surface at the F and C lines. The
classical way of quoting dispersion is the Abbe number V ,
    V = (nd − 1) / (nF − nC),
also called the reciprocal dispersive power (a big V means low dispersion). Figure 4.1
is a plot of the index versus dispersion for the optical glasses manufactured by Schott.
By and large, glass has a large coefficient of dispersion in the visible, which is highly
variable among different glass types, but it has low temperature coefficients of index and
of expansion.
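As a worked example, plugging catalog-style indices for BK7 into the definition of V recovers its familiar value near 64. (The three indices below are assumed, rounded values for illustration, not from any particular data sheet.)

```python
def abbe_number(n_d, n_f, n_c):
    """Reciprocal dispersive power V = (nd - 1)/(nF - nC); big V means low dispersion."""
    return (n_d - 1) / (n_f - n_c)

# Rounded indices for BK7 at the d (587.6 nm), F (486.1 nm), and C (656.3 nm) lines:
v_bk7 = abbe_number(1.5168, 1.5224, 1.5143)   # ~64, comfortably a crown (V > 50)
```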
Optical glasses are traditionally divided into two types: crowns, which have low
indices (1.5–1.6) and low dispersion, and flints, which have higher indices and dispersion.
The classical distinction was that anything whose V was over 50 was a crown, but the
two categories have become blurred over the years as new glass formulations have been
developed. The most common optical glass is BK7, a borosilicate crown glass with
n = 1.517. It is popular because it is inexpensive and works well. Glass prices span a
huge range—more than 100×.
Glass always has residual optical defects, such as bubbles and striae (long, thin regions
of slightly different optical properties). For a critical application such as laser windows,
choose a grade of glass whose striae and bubbles are guaranteed to be low enough.
For high quality applications, optical elements are often made of synthetic fused silica,
a very pure quartz glass made by a chemical vapor deposition process. Fused quartz, an
inferior material, is made by melting natural quartz sand. Fused silica comes in several
grades, differing in the density and type of their bubbles and striae, and in their OH
content. The O—H bond absorbs at 1.34 μm, 2.2 μm, and especially 2.7 μm. High-OH
fused silica is essentially opaque at 2.7 μm. Low-OH fused silica, such as the Infrasil
grade from Heraeus Amersil, can be pretty transparent there (depending on its thickness
of course).
Fused silica and many types of glass are chemically fairly inert, but different glasses
differ significantly. Hostile environments, such as continuous exposure to salt spray,
will weather the surface of the glass (and any coatings present), degrading its optical
performance. Severely weathered glass may appear to have a whitish crust on top. Fused
silica and crown glasses resist weathering quite well, but high index glasses (n ≈ 1.8–2)
are often environmentally delicate, as they have less quartz and more lead oxide and other
less inert materials. Some of these will darken and weather over time even in ordinary
lab conditions. The trade term for chemical inertness is stain resistance, and glasses are
specified for it. Corrosion is not always a disaster: Fraunhofer discovered that weathering
glass slightly increased its transparency—he correctly guessed that weathering the surface
produced a layer of lower refractive index, which reduced the surface reflection. Pickling
telescope lenses to improve their transmission was popular throughout the 19th century.
Temperature Coefficients of Optical Materials
The subject of temperature effects on optical elements is somewhat subtle. The temperature coefficients of expansion (CTE) and of refractive index (TCN) are both positive,
but CTE is specified in normalized (dimensionless) form, whereas TCN is just ∂n/∂T .
The time (phase) delay through a piece of dielectric is nℓ/c, where ℓ is the length of
the optical path. The normalized temperature coefficient, TCOPL, is

    TCOPL = (1/(nℓ)) ∂(nℓ)/∂T = (1/n) ∂n/∂T + CTE.
For most glass types, TCOPL is approximately 10⁻⁵/°C or a bit below. Fused silica
has a very low CTE—in the 5 × 10⁻⁷ range—but a big TCN, about 9 × 10⁻⁶, so with
n = 1.46, its TCOPL is 7 × 10⁻⁶. BK7, on the other hand, has a larger CTE, 8 × 10⁻⁶,
but a low TCN, only 1.6 × 10⁻⁶; its TCOPL is 9 × 10⁻⁶. There are a few glasses with
negative TCN, such as that used for molded optics by Corning.† The only common solid
with a negative TCN is magnesium fluoride (MgF2 ). Air’s TCN at constant pressure
is about −1 × 10−6 (see below). Glass used in solid state laser rods is often specially
formulated to achieve a zero TCOPL . Etalons sometimes use two materials with opposite
signs of TCOPL , their thicknesses chosen so as to athermalize the path length.
The definition of TCOPL here is for the phase delay inside the dielectric, which is
relevant for discussion of fringes. In discussions of the temperature coefficients of lenses
or of a mixture of dielectric and free space, another temperature coefficient is also useful,
that of the differential optical path length through the element,

    G = TCN + (n − 1)CTE.          (4.4)
† Mark A. Fitch, Molded optics: mating precision and mass production. Photonics Spectra, October 1991.
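Since TCOPL = TCN/n + CTE, the figures quoted above can be cross-checked in a couple of lines:

```python
def tc_opl(tcn, n, cte):
    """Normalized temperature coefficient of optical path length, per kelvin."""
    return tcn / n + cte

fused_silica = tc_opl(9e-6, 1.46, 5e-7)   # ~6.7e-6, quoted as 7e-6
bk7 = tc_opl(1.6e-6, 1.517, 8e-6)         # ~9.1e-6, quoted as 9e-6
```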
Air and Other Gases
The expression (4.4) is actually only an approximation, since it leaves out the effects of
the refractive index of air. Air’s index is nearly 1, but it has surprisingly large temperature and pressure coefficients. An ideal gas behaves strictly as a collection of isolated
molecules, whose molecular susceptibilities are constant. Thus the dielectric susceptibility of an ideal gas is strictly proportional to its density, which in turn can be predicted
from the ideal gas law. What this means is that χ (and hence εr − 1) is proportional to
pressure and inversely proportional to temperature,

    (εr − 1) ∝ P / T.

Since εr ≈ 1, a binomial expansion shows that

    TCN = ∂n/∂T = −(n − 1) / T.
For dry air at T = 288 K (15 °C) and P = 101.325 kPa (1 atm), n = 1.00028, so that
TCN ≈ −1.0 × 10⁻⁶/K and ∂n/∂P ≈ 2.8 × 10⁻⁶/kPa. Thus air's TCN is comparable
in magnitude to that of BK7. These are small numbers that aren’t usually much of
a concern, but they become very important in the design of interferometers, especially
Fabry–Perot etalons, and in the presence of temperature and pressure gradients. Humidity
is a second-order effect, because the water content of moist air is only a couple of
percent and the molecular susceptibilities of H2 O, N2 , and O2 are similar. Helium has a
susceptibility about 8 times less than air, with corresponding decreases in the temperature
coefficients. Note also that these are partial derivatives—TCN is quoted at constant P ,
and ∂n/∂P at constant T .
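The two coefficients quoted above follow directly from n − 1 ∝ P/T; a two-line check:

```python
n_minus_1 = 2.8e-4      # dry air at 288 K and 101.325 kPa: n = 1.00028
T = 288.0               # temperature, kelvin
P = 101.325             # pressure, kPa

tcn = -n_minus_1 / T    # dn/dT at constant P: ~ -1.0e-6 per kelvin
dn_dp = n_minus_1 / P   # dn/dP at constant T: ~ 2.8e-6 per kPa
```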
Optical Plastics
Plastic lenses have become very popular lately. They are lightweight, cheap (at least in
high volumes), and can be molded with a mounting surface integral with the lens element,
which helps a lot with assembly and alignment. Mass-produced aspheric lenses (perhaps
with a diffractive element on one side) make it possible to get good performance with
fewer elements.
On the other hand, plastic tends to be less uniform than glass, cannot readily be
cemented, and is harder to coat. Plastic also has much larger temperature coefficients of
expansion (≈150 ppm/°C) and of refractive index (≈100 ppm/°C) than glass. The most
popular plastic used is polymethyl methacrylate (PMMA), sold under the trade names
Lucite, Perspex, and Plexiglas. Others are polycarbonate (Lexan), cyclic olefin copolymer
(COC, sold as Zeonex and Topas), and CR39, used for eyeglasses.
Plastics don’t have the variety of optical properties found in glasses. Their indices
range from about 1.44 to 1.6, with some of the newest (and very expensive) ones reaching
1.7. They have a narrow but reasonable range of V , 30 to about 58, so that plastic
lenses can be achromatized. They have higher internal Rayleigh scatter due to the high
molecular weight of the polymer chains. They are less transparent in both the UV and IR
than most glasses and are more vulnerable to solvents and other environmental hazards.
UV exposure is especially damaging to some kinds of plastics, causing them to yellow
and craze. This is especially true of the polyethylene used in far-IR Fresnel lenses, as
used in automatic porch lights. Thermosets such as CR39 (nd = 1.50, V = 58) are about
the most durable optical plastics.
Alongside refractive index and dispersion, the transmission properties of a material govern its range of applicability. As a rule of thumb, a wide selection of good materials
is available between 300 nm and 3 μm; there is some choice between 200 nm and 15
μm; below 200 nm and beyond 15 μm, most materials are strongly absorbing, so we take
what we can get.
UV Materials
Optical materials don’t go very far into the UV. The absolute champion is lithium fluoride,
LiF, which dies at about 120 nm, right around the Lyman α line of hydrogen at 121.6 nm.
The fluorides of barium (BaF₂), magnesium (MgF₂), and strontium (SrF₂) are nearly as
good and are more practical materials—harder, easier to polish, and less vulnerable to
water. Water absorption can destroy the UV transmittance of LiF completely.
UV grade fused silica is useful down to 170 nm, but glass craps out at about
300–350 nm. Many types of materials are damaged by exposure to short wave UV
(below 320 nm or so); glass will darken, and plastics yellow and craze. Flashlamps
and arc lamps have strong UV emission, so this can easily happen even in visible-light
applications.
IR Materials
Optical glass and low-OH fused silica are useful out to 3 μm or so. Beyond there, the
choices diminish considerably; the best window materials are semiconductors like silicon
and germanium, both of which can be made transparent out to 15 μm or further. These
materials have high indices, 3.5 for Si and 4 for Ge. This is great for lenses, because
with a high index, large aperture lenses can be made with shallowly sloped surfaces,
so that aberrations are minimized. It is less helpful for windows, because Fresnel losses
are large, and the huge index mismatch makes AR coatings rather narrowband. These
materials are opaque in the visible, which is a pain because all your alignment has to
be done blind. (Don’t underestimate the difficulty of this if you haven’t tried it—you’ll
have a good deal less hair afterwards.)
There exist lower index materials with excellent IR transmission, but most of them
are toxic or water soluble. The best ones are diamond (if you can afford it, it goes from
230 nm to 40 μm with a weak interband absorption at 2.6–6.6 μm), zinc selenide (ZnSe),
arsenic trisulfide or triselenide glass (As₂S₃ and As₂Se₃), and sodium chloride (NaCl).
Good quality synthetic sapphire and calcium fluoride are also good if you can live with
their limitations (mild birefringence for sapphire and sensitivity to thermal shock for
CaF2 ). Others, such as thallium bromoiodide (KRS-5), are sufficiently toxic that only the
stout of heart and fastidious of touch should grind and polish them. These materials have
the enormous advantage of being transparent in at least part of the visible, which makes
using them a great deal easier.
In the far IR, some plastics such as high density polyethylene (HDPE) are reasonable
window materials. Their high molecular weight and polycrystalline morphology, which
make them highly scattering in the visible, are not a problem in the IR (Rayleigh scattering
goes as λ⁻⁴). Ordinary polyethylene or PVC (Saran) food wrap makes a good moisture
barrier to protect hygroscopic window materials from humidity. These films can be
wrapped around the delicate element without affecting its optical properties in the far IR
very much (although each type should be tested for absorption before use).
Unlike visible-light optical materials, most IR materials have low dispersion but have
huge temperature coefficients of index and expansion compared with glass (silicon's
dn/dT is ∼170 ppm/K, and while As₂S₃'s dn/dT is only about 10 ppm/K, its CTE is
25 ppm/K). Near- and mid-IR absorption depends on molecular vibrational modes.
It isn’t just the material that matters, but the surface quality too. Ray bending happens
at the surfaces, so they have to be accurate to a fraction of a wavelength to maintain
image quality. Total scattered light tends to go as [4π(rms roughness)/λ]², so optical
surfaces have to be smooth to ∼λ/1000. The figure error is the deviation of the surface
from the specified figure, without regard for small localized errors, which are divided
into scratches and digs (small craters or pits). Scratches are usually just that, but digs are
often the result of a bubble in the glass having intersected the surface of the element.
A commodity colored glass filter might have a scratch/dig specification of 80/60,
which is pretty poor. An indifferent surface polish is 60/40, a good one is 20/10, and
a laser quality one (good enough for use inside a laser cavity) is 10/5. The scratch/dig
specification largely determines the level of scatter that can occur at the surface and also
affects the laser damage threshold and the weathering properties of the glass. It tells
something about how numerous and how large the defects can be but is a subjective
visual check, not anything that can be easily converted to hard numbers.
Figure error and scattering from scratches and digs are not the only ways that manufacturing variations can mess up your wavefronts. Striae and bubbles in the glass can result
in significant wavefront errors, even in a perfectly ground and polished lens, and will
produce a certain amount of internal scattering. Rayleigh scattering from the dielectric
sets the lower limit.
Lenses are more forgiving of surface errors than mirrors are, because the rays are bent
through a smaller angle at the surface. Tipping a mirror by 1◦ deflects the reflected light
through 2◦ , regardless of incidence angle, whereas tipping a window or a weak lens has
almost no effect at all.
An alternative way of saying this is that a figure error of δ wavelengths produces
a phase error on the order of (n − 1)δ waves in a lens and 2δ waves in a mirror at
normal incidence. If the error is a tilt or a bend, even this reduced error tends to cancel
upon exiting the other side of the lens. If the mirror is operated at grazing incidence, to
reproduce the ray-bending effect of the lens, the surface roughness sensitivity is reduced
equivalently, because kz is smaller.
Aside: Surface Error Sensitivity. The effects of roughness or surface error in a high
index material such as germanium (n = 4.0) can be even worse than in a mirror, when
quoted in terms of wavelengths. Since these materials transmit only in the IR, however,
the actual sensitivity in waves per micron of surface error is not much different from
glass in the visible.
Small-scale roughness produces a phase front with random phase errors. Since
exp(iφ) ≈ 1 + iφ, this appears as additive noise on the wavefront and produces a
scattering pattern that is the Fourier transform of the surface error profile. In Section
13.6.9, we’ll see an exactly analogous case in which additive noise produces random
phase shifts in signals. For the optical case, compute the deviation from perfect specular
reflection. Anything that doesn’t wind up in the main beam is scatter, so in the thin
object approximation (Section 1.3.9) the total integrated scatter (TIS) is
TIS ≡ Pscat/Prefl = 1 − exp(−(2kz z)²),
where kz is the normal component of k. Quantities like z are implicitly filtered to spatial
frequencies below k, since evanescent modes don't reduce the specular reflection.
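To get a feel for the numbers, here is a small Python sketch (illustrative values only; normal incidence, where kz = 2π/λ) of the TIS expression above:

```python
import math

def total_integrated_scatter(rms_roughness, wavelength):
    # TIS = 1 - exp(-(2 * kz * z)^2); at normal incidence kz = 2*pi/wavelength,
    # so the exponent is (4*pi*z/lambda)^2, the rule of thumb quoted above.
    kz = 2 * math.pi / wavelength
    return 1.0 - math.exp(-((2 * kz * rms_roughness) ** 2))

# A surface smooth to ~lambda/1000 scatters only ~1.6e-4 of the reflected power:
tis = total_integrated_scatter(633e-9 / 1000, 633e-9)
```

For small roughness this reduces to [4π(rms roughness)/λ]², the rule of thumb given earlier.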
Lenses are often poorly specified for wavefront error, since most of the time simple
lenses would be limited by their aberrations even if their figures were perfect. Mirrors
are usually specified with a certain degree of flatness (λ/10 at 633 nm is typical).
Although windows are very simple optical elements, they are unusual in that their purpose
is not usually optical at all, but environmental: a window separates two volumes that are
incompatible in some way. Ordinary windows in a house or an airplane are examples, as
are windows in a vacuum chamber, laser tube, or spectrometer cell. Microscope slides
and cover slips are also windows. A window is the simplest lens and the simplest prism.
The trick with windows is to make sure that their environmental function does not
detract from the working of the optical system. A beam of light is affected by any surface
it encounters, so windows should be specified with the same sort of care as lenses and
mirrors. There is a large range of choice among windows: material, thickness, coatings,
wedge angle, and surface quality. For work in the visible, in a benign environment,
choose windows made of an optical crown glass such as BK7 or its relatives. In a
corrosive environment, quartz, fused silica, or sapphire are better choices. Filters sold
for cameras, for example, UV or video filters (not skylight or color-correcting ones), are
pretty flat, have good multilayer coatings for the visible, and are very cheap. Be sure
you buy the multilayer coated ones.
4.5.1 Leading Order Optical Effects
The leading order optical effects of windows are shown in Figure 4.2. Rays entering a
window are bent toward the surface normal, so images seen through a window appear
closer than they are; a window of thickness d and index n shifts the image a distance
z = d(1 − 1/n) but does not change the magnification. The window thus looks from
an imaging point of view like a negative free-space propagation. This effect on images is
opposite to its effect on the actual optical phase; because light is slower in the window,
the phase is delayed by the presence of the window, as though the propagation distance
had increased . These effects are very useful in imaging laser interferometers, where they
allow a nonzero path difference with identical magnification in the two arms—tuning
the laser slightly shifts the phase of the interference pattern.
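The two leading-order effects are one-liners to compute; this sketch (Python, with illustrative window parameters) gives the image shift and the extra optical phase:

```python
def image_shift(thickness, n):
    # Image appears closer by d*(1 - 1/n): a negative free-space propagation.
    return thickness * (1.0 - 1.0 / n)

def phase_delay_waves(thickness, n, wavelength):
    # Extra optical path (n - 1)*d, expressed in waves: the phase is *delayed*.
    return (n - 1.0) * thickness / wavelength

shift = image_shift(3e-3, 1.517)                      # ~1.02 mm for 3 mm of BK7
delay = phase_delay_waves(3e-3, 1.517, 632.8e-9)      # a couple of thousand waves
```

The opposite signs of the two effects are what make the interferometer trick work: the image position moves one way while the phase moves the other.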
Figure 4.2. The leading order effect of a window is a shift in image position and a phase delay.
4.5.2 Optical Flats
A window that has been polished very flat (λ/20 or so) is called an optical flat. Flats
are used as phase references (e.g., in Newton's rings and Fizeau wedges). Most of them
have a small built-in wedge angle of 30 arc minutes or so to reduce etalon effects, but
you can get them with two flat parallel surfaces.
It is normally possible to get lenses, windows, and mirrors of adequate surface quality
in a suitable material. The troubles we get into with optical elements don’t come so
much from random wavefront wiggles due to fabrication errors, but from aberrations,
front-surface reflections, and birefringence. Aberrations we’ll talk about beginning in
Section 9.2.2, but briefly they are higher-order corrections to the paraxial picture of what
optical elements do. As for the other two, let’s do birefringence first—it’s simpler.
4.6.1 Birefringence
(See Section 6.3.2 for more detail.) Birefringence in good quality lenses and windows
comes from material properties, as in sapphire or crystalline quartz, and from stress. The
stress can be externally applied as in pressure-vessel windows or lenses with poor mounts
(e.g., tight metal set screws). It can also be internal to the glass due to nonuniform cooling
from the melt or surface modification such as metalliding. The birefringence produced
by stress is n⊥ − n∥ with respect to the direction of the stress:
n⊥ − n∥ = K · S,
where S is the stress in N/m² (tensile is positive) and K is the piezo-optic coefficient
(or stress-optic coefficient). Most optical glasses have piezo-optic coefficients of around
+2 × 10⁻¹² m²/N, so that a compressive stress of 200 N on a 1 mm square area will
produce a Δn of around 0.0004. A few glasses have negative or near-zero K values, for
example, Schott SF57HHT, whose K is two orders of magnitude smaller (+2 × 10⁻¹⁴
m²/N). Note that the K value is wavelength dependent, which matters at these low levels.
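The worked number above is easy to reproduce; this sketch (Python, using the load and K value from the text) evaluates Δn = K·S:

```python
def stress_birefringence(K, force_newtons, area_m2):
    # delta_n = K * S, with the stress S in N/m^2 (tensile positive);
    # this just reports the magnitude of the index difference.
    stress = force_newtons / area_m2
    return K * stress

# 200 N compressive load on a 1 mm x 1 mm area of ordinary optical glass:
dn = stress_birefringence(2e-12, 200.0, 1e-3 * 1e-3)   # ~4e-4
```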
Stress birefringence comes up especially when we use tempered glass, in which the
residual stress is deliberately kept high in order to force surface cracks to remain closed.
Colored glass filters are usually tempered, so that it is dangerous to assume that the
polarization of your beam will remain constant going through a glass filter.
Good optical glass is carefully annealed to remove most of the residual stress.
Fine-annealed glass, the best commercial grade, has less than 12 nm of residual
birefringence in a 100 mm thickness. In a normal window of perhaps 3–5 mm thickness,
this is usually negligible, but in a large prism it may not be.
Intrinsic birefringence is encountered in windows made of sapphire and some other
crystalline materials. It is often vexing because these windows are usually chosen for a
good physical reason—sapphire has unusually high mechanical strength and unsurpassed
chemical and thermal inertness. Examples are flow cell windows in liquid particle counters, which may encounter liquids such as hydrofluoric acid (HF) solutions, which rapidly
destroy quartz and glass, or windows in the plasma chambers used for reactive ion etching. In such cases, we must be grateful that such windows even exist, and make the best
of their optical deficiencies.
Like most common birefringent optical materials, sapphire is uniaxial (two of its
indices are the same). Its birefringence is fairly small and negative (ne − no = −0.008),
and its index is around 1.8. The phase difference due to randomly oriented sapphire
amounts to a few dozen waves for a typical 3 mm thick window in the visible, which is
large enough to be very awkward. If we just need to get a beam in or out near normal
incidence, we can use so-called c-axis normal sapphire windows, where the optic axis is
normal to the surfaces, and light entering near normal incidence sees no birefringence.
If this is not possible, we must usually choose optical designs that are insensitive to
polarization, or compensate for the polarization error with appropriate wave plates or
other means. Should this be unreasonably difficult, it is often possible (though usually
painful) to use polarization diversity—doing the same measurement at several different
polarizations and choosing the ones that work best. Fortunately, it is now possible to
obtain quartz windows with a thin coating of transparent amorphous sapphire,† which
has the inertness of crystalline sapphire without its pronounced birefringence. There’s
more on these effects beginning with Section 6.3.1.
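A quick estimate shows why randomly oriented sapphire windows are so awkward; this sketch (Python; worst-case orientation, with the thickness and birefringence values from the text) counts the retardation in waves:

```python
def retardation_waves(delta_n, thickness, wavelength):
    # Worst-case phase difference between the two polarizations, in waves,
    # for light crossing a birefringent plate of the given thickness.
    return abs(delta_n) * thickness / wavelength

# A 3 mm sapphire window (|delta n| ~ 0.008) at 633 nm: a few dozen waves.
waves = retardation_waves(0.008, 3e-3, 633e-9)
```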
The good news about fringes is that they are very sensitive to many different physical
effects—displacement, frequency, temperature, air speed, and so on. Fringes are the basis
of a great many highly sensitive measurements, as we saw beginning in Section 1.6. The
bad news is that they are very sensitive to many different physical effects. The power of
interference fringes to turn small changes into big signals is not limited to the ones we
make on purpose, but extends to all the incidental fringes we create by putting things in
the way of light as it propagates.
4.7.1 Surface Reflections
All surfaces reflect light. An uncoated glass-to-air surface at normal incidence reflects
about 4% of the light reaching it, the proportion generally increasing for higher incidence
angles (depending on the polarization). This leads to problems with stray light, etalon
fringes, and multiple reflections.
† Research Electro-Optics, Inc.
4.7.2 Etalon Fringes
There are lots of different kinds of fringes, associated with the names of Fizeau, Newton,
Haidinger, Fabry and Perot, and so on. All are useful in their place, all will occur accidentally, and all have baleful effects on measurement accuracy, independent of nomenclature.
The usual informal term for these unloved stripes is etalon fringes. The name should
not be allowed to conjure up visions of beautiful uniform beams blinking on and off as
the length of a carefully aligned Fabry–Perot cavity changes by λ/2—the fringes we’re
talking about are not pretty, not uniform, and usually not visually obvious.
Fringes arise from the linear superposition of two fields. In Chapter 1, Eq. (1.68) shows
that for two beams having the same shape, and whose phase relationship is constant across
them, the amplitude of the interference term in the combined photocurrent is
iAC|peak = 2√(iLO isig),
where iLO and isig are the detected photocurrents corresponding to the two beams individually, and iAC is the amplitude of the interference photocurrent (half the peak-to-valley
value). In the case of a single window, etalon fringes arise from the interference of fields
reflected from its front and back surfaces. Their interference causes large modulations
in the total reflectance of the window, which vary extremely rapidly with wavelength
and temperature. If the two reflections are comparable in strength, the net reflectance
varies from twice the sum of the two (16% in the case of uncoated glass) to zero. (Since
T + R = 1 for lossless elements, the transmittance changes from 84% to 100%.) The
magnitude of this problem is not commonly realized, which is part of what gives optical
measurements their reputation as a black art.
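The two-beam arithmetic is worth having on hand. In this sketch (Python; two-beam interference only, higher-order bounces ignored), the net reflectance of a window swings between (√R₁ − √R₂)² and (√R₁ + √R₂)²:

```python
import math

def etalon_reflectance_extremes(R1, R2):
    # The fields add as amplitudes sqrt(R1) and sqrt(R2), so the net power
    # reflectance swings between (sqrt(R1)-sqrt(R2))^2 and (sqrt(R1)+sqrt(R2))^2.
    a, b = math.sqrt(R1), math.sqrt(R2)
    return (a - b) ** 2, (a + b) ** 2

# Two uncoated 4% glass surfaces: reflectance swings between 0 and 16%.
lo, hi = etalon_reflectance_extremes(0.04, 0.04)
```

Note how slowly the swing shrinks as one reflection weakens: the p-v change goes as the square root of the weaker reflectance, which is the point of the paragraph that follows.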
Since the size of the fringes depends on the amplitude of the stray reflections, it does
not decrease as rapidly as you might imagine with multiple reflections. A two-bounce
reflection, whose intensity is only 0.16% of the main beam, can cause a 3.2% p-v change
in the reflectance, and even a completely invisible five-bounce beam (i₅/i₀ ≈ 10⁻⁷) can
manage a p-v change of 6 parts in 10⁴ (with a long path length to boot, which multiplies
its sensitivity). In a more complicated system, the possibilities go up very rapidly; the
number of possible five-bounce beams goes as the fifth power of the number of elements.
Example 4.1: Polarizing Cube. Consider an ordinary 25 mm polarizing beamsplitter
cube, made of BK7 glass (nd = 1.517), with a broadband antireflection (BBAR) coating
of 1% reflectance. We’ll use it with a HeNe laser at 632.8 nm. As shown in Figure 4.3, if
the beam is aligned for maximum interference, the peak-to-valley transmission change is
about 4% due to etalon fringes. The cube is 60,000 wavelengths long (120,000 round trip),
so it goes from a peak to a valley over a wavelength change of 4 parts in 10⁶—0.0026 nm,
or 2 GHz in frequency. If the HeNe has a mode spacing of 1 GHz, then a mode jump
can produce a transmission change of as much as 2.8% from this effect alone. The temperature effects are large as well. BK7 has a temperature coefficient of expansion of
about +7 × 10⁻⁶/°C, and its TC of index is +1.6 × 10⁻⁶/°C. Thus the TC of optical
path length is 8 ppm/°C. At this level, a 1°C temperature change will cause a full cycle
fringe shift, and for small changes on the slope of the curve, the TC of transmittance is
π(8 × 10⁻⁶)(120,000)(4%) ≈ 12%/°C for one element alone. You see the problem—a
1 millidegree temperature shift will make your measurement drift by 120 ppm. Fortunately, we're rarely that well aligned, but being anywhere close is enough to cause big
effects.
Figure 4.3. Normal-incidence transmission of a polarizing cube for 633 nm HeNe laser light, as a
function of temperature.
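The numbers in Example 4.1 can be checked in a few lines; this sketch (Python; cube size, index, and temperature coefficients from the example, with the peak-to-valley spacing taken as half the etalon's free spectral range) reproduces them approximately:

```python
import math

c = 299_792_458.0                    # speed of light, m/s
n, L, lam = 1.517, 25e-3, 632.8e-9   # BK7 cube, HeNe wavelength

round_trip_waves = 2 * n * L / lam          # ~120,000 waves round trip
peak_to_valley_dnu = c / (2 * n * L) / 2    # half the FSR: ~2 GHz

# TC of optical path = CTE + (1/n)(dn/dT) ~ 8 ppm/degC for BK7
tc_opl = 7e-6 + 1.6e-6 / n
# Worst-case slope of transmittance vs. temperature for 4% p-v fringes:
slope_per_degC = math.pi * tc_opl * round_trip_waves * 0.04   # ~12%/degC
```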
4.7.3 Getting Rid of Fringes
Since we obviously can’t make decent narrowband measurements in the presence of these
fringes, we’ll just have to get rid of them. Two basic strategies are commonly used: get
rid of the fringes altogether, or cause them to smear out and average to some constant
value. There is a long list of approaches people use to do these things, because everybody
has to do them. None of them works that well, so a highly precise instrument usually
relies on a combination of them—wear a belt and suspenders.
Add Strategic Beam Dumps. This sounds like an arms reduction program, but really
it’s like emergency roof maintenance: put buckets in the living room to catch the leaks.
By calculating or finding experimentally where the front-surface reflections go, it is
possible to put an efficient beam dump to catch them. See Chapter 5 for more on beam
dumps—black paint is not usually good enough by itself. The main problem with this
is that if you don’t catch the stray beams before they hit other optical elements, they
multiply in numbers to the point that it is nearly impossible to catch all of them.
Cant the Window. Tipping the element so that the front- and back-surface reflections
are offset laterally from one another can be helpful. If the beam is narrow enough that
the two will miss each other completely, this is a tolerably complete solution, assuming
that there are no higher-order reflections that land on top of each other.
Apply Coatings. Front-surface reflections can be reduced by coating the glass. This is
less effective than we would wish, as we saw in the egregious polarizing beamsplitter
example above. A really good multilayer coating such as a V-coating can reduce Fresnel
reflections to the level of 0.25%, but rarely any further unless the optical materials
are specially chosen for the purpose. Such coatings are normally narrowband, but it’s
narrowband applications that really need them. V-coating the cube of Example 4.1 would
reduce the slope to a mere 3%/K.
Come in at Brewster’s Angle. Windows and prisms can be used at Brewster’s angle,
so that the p-polarized reflections go to zero. With highly polarized light and great care
in alignment, the reflections can be reduced to very low levels. To reduce the amplitude
reflection coefficient of an air–dielectric interface to less than ε requires an angular
accuracy of
Δθ₁ < 2n₁n₂ε/(n₂ − n₁),
so if we require ε = 10⁻³ (reflectivity = 10⁻⁶) for a glass–air interface, we find that
the incidence angle must be within 9 milliradians, or about 0.5 degree. Since the
s-polarization power reflectivity at Brewster incidence is
Rs|θB = [(n₂² − n₁²)/(n₂² + n₁²)]²,
which is larger than the normal-incidence value, the polarization must be really pure.
With fused silica, Rs = 13.8%, rising to 27.9% for a flint glass with n = 1.8.
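The Brewster-angle numbers are easy to verify directly from the Fresnel formulas; this sketch (Python; nonmagnetic media, angles from Snell's law) computes the s-polarized power reflectivity at Brewster incidence:

```python
import math

def brewster_s_reflectivity(n1, n2):
    # At Brewster incidence theta_B = atan(n2/n1), r_p = 0 but r_s is not;
    # evaluate the s-polarization Fresnel coefficient and return |r_s|^2.
    th1 = math.atan2(n2, n1)
    th2 = math.asin(n1 * math.sin(th1) / n2)
    rs = (n1 * math.cos(th1) - n2 * math.cos(th2)) / \
         (n1 * math.cos(th1) + n2 * math.cos(th2))
    return rs ** 2

Rs_flint = brewster_s_reflectivity(1.0, 1.8)   # ~27.9% for n = 1.8
```

The numerical result agrees with the closed form [(n₂² − n₁²)/(n₂² + n₁²)]² quoted above.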
The other advantage of Brewster incidence is that the residual surface reflection goes
off at a large angle, where it can be controlled easily. For a prism, it is more important
to control the internal reflection than the external one, because you usually can’t put
a strategically placed absorber inside an optical element. Thus if you have to choose,
AR coat the entrance face and put the exit face at Brewster’s angle. You will have to
accept the residual stress in the prism causing slight polarization funnies. Brewster angle
incidence is never quite what it promises to be.
Cement Elements Together. Elements whose refractive indices are similar can
be cemented together, which reduces the surface reflections. If the match is good, for
example a plano convex lens and a prism made of the same glass, or a colored glass
filter (n ≈ 1.51–1.55) and a BK7 window, the reflections can be reduced by factors of
100 or more. Another convenient feature is that when the two indices are closely similar,
Brewster’s angle is 45◦ , which is often a convenient incidence angle to work at.
Remember that the cement has to be index-matched as well! If the cement has an index
midway between the two glass elements, the reflectivities are reduced by an additional
factor of 2 on average, but anywhere in between is usually fine. Index oil can be used
in lab applications; it comes in standard index increments of 0.002 and can be mixed
or temperature controlled to better accuracy than that. Note its high dispersion (see
Figure 4.1) and high TCN.
Use Noncollimated Beams. By using a converging or diverging beam, all interfering
beams will be in different states of focus. This leads to lots of fringes across the field,
so that the average reflectance is stabilized. The average interference term drops only
polynomially with defocus, even with Gaussian beams, so more may be needed. With
careful design, a reflection can be eliminated by bringing it to a focus at the surface
of another element, and placing a dot of India ink or opaque there to catch it—baffle
design in its minimalist essence. Remember the astigmatism, spherical aberration, and
chromatic aberration caused by putting windows in nonparallel light. (Note: Parallel
includes collimated but is more general; it refers to any place where the image point is
at infinity.)
4.7.4 Smearing Fringes Out
Use Canted Windows and Low Spatial Coherence. Sometimes the beam is wide,
the element thin, or the allowable tilt small, so the reflections won’t miss each other
completely. Nonetheless, if the spatial coherence of the beam is low enough, canted
windows can still be useful. If the reflections are laterally separated by several times
λ/NA, where NA is the sine of the minimum divergence angle of the beam at the
given diameter, the fringe contrast will be substantially reduced. This works well for
narrowband thermal light such as a mercury line.
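A small-angle estimate helps when choosing the cant. In this sketch (Python; paraxial approximation, and the 2tθ/n offset formula is my assumption for a thin tilted window), the tilt is related to the lateral separation of the two reflections:

```python
def reflection_offset(thickness, tilt_rad, n):
    # Lateral separation between front- and back-surface reflections of a
    # window tilted by tilt_rad (small-angle approximation, ~2*t*theta/n).
    return 2.0 * thickness * tilt_rad / n

def tilt_for_offset(thickness, n, wavelength, NA, times=3.0):
    # Tilt needed so the reflections separate by several times lambda/NA.
    return times * (wavelength / NA) * n / (2.0 * thickness)

# 3 mm window, n = 1.5, 546 nm mercury line, beam NA = 0.01:
tilt = tilt_for_offset(3e-3, 1.5, 546e-9, 0.01)   # a couple of degrees
```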
Use Time Delays and Low Temporal Coherence. The same trick can be played
in time. Fringes can be smeared out by taking advantage of the limits of your beam's
temporal coherence, with a window several times c/(n Δν) thick, where Δν is the source
bandwidth, so that the different optical frequencies present give rise to fringes of different
phase, which average to near zero (for narrowband sources, this will not be your thin
delicate window). Do remember the coherence fluctuation problem too.
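As a rough sizing rule, this sketch (Python; the safety factor `times` and the example bandwidth are assumptions, with Δν the source bandwidth) gives the window thickness needed to smear the fringes:

```python
C = 299_792_458.0   # speed of light, m/s

def smearing_thickness(delta_nu_hz, n, times=3.0):
    # Window several times c/(n * delta_nu) thick, so the round-trip delay
    # exceeds the source's coherence time and the fringes wash out.
    return times * C / (n * delta_nu_hz)

# An LED with ~1 THz optical bandwidth in n = 1.5 glass: under a millimeter.
t = smearing_thickness(1e12, 1.5)
```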
Modulate the Phase. If the source is too highly coherent, etalon fringes can be
reduced by wobbling the window or the beam rapidly, as with rotating wedge prisms or
rotating diffusers, so that fringe motion is large and rapid compared to the measurement
time. Unless this is done really well, it is less effective than it looks. The strength of
the fringes depends on the time autocorrelation of the field at a delay corresponding to
the round-trip time. If the phases of the two are being varied continuously but slowly
compared to optical frequencies, what we get is fringes moving back and forth.
The kicker is that these fringes won’t in general average to zero. For instance, take a
triangle wave phase modulation of ±10.5 cycles, where the unmodulated fields exhibit
a bright fringe. Over a modulation cycle, the pattern will traverse 20 bright fringes and
21 dark ones, so that the average fringe amplitude is down by only a factor of 20. If
the amplitude changes by 5%, to 10.0 or 11.0 cycles, the average fringe amplitude is
0—assuming that your measurement time is an exact multiple of the modulation period.
Modulate the Frequency. Another way of applying phase modulation is to tune the
source wavelength rapidly (e.g., current tuned diode lasers). For modulation slow enough
that the entire apparatus sees nearly the same optical frequency, this is nearly the same
as the previous case, except that the range of phase shifts attainable is usually lower due
to the limited tuning range of inexpensive diode lasers.
If the tuning is fast with respect to the delay between the reflections, the two reflections
will be at a different frequency most of the time. Since the average frequencies of the
two are the same, if the laser is turned on continuously the two instantaneous frequencies
have to be the same twice per cycle of modulation (a stopped clock is right twice a day).
The autocorrelation thus falls off more slowly than you might hope as the modulation
amplitude increases, but nevertheless, this is a very useful trick, especially since by
adjusting the modulation frequency and phase, you can put an autocorrelation null at the
position of your worst reflection. It is especially good in reducing another etalon effect:
mode hopping from diode lasers used in situations where backscatter is unavoidable.
(Gating the laser can also improve matters sometimes.)
Put in a Wedge. Fringes are tamer if the two beams are not exactly collinear. Replacing
the parallel surfaces with wedged ones makes sure this will be the case, leading to fringes
of higher spatial frequency. These fringes will average out spatially if a large detector or
a spatial filter is used. If there is enough space available, or the angle can be made large, a
simple baffle will separate the two reflections completely. The key to this trick is to make
sure that there is no low-spatial frequency component in the fringe pattern. Interference
between two unvignetted Gaussian beams is an excellent example; the integral over the
detector of the interference term goes to zero faster than exponentially with angular
offset. Allow the beams to be vignetted, or use less smooth pupil functions (e.g., uniform), and
all bets are off. Since the pupil function and the vignetting both enter multiplicatively, they
work like a frequency mixer in a superhet radio, moving the energy of your harmless high
frequency fringe down toward 0, defeating the spatial averaging. Detector nonuniformity
is also a problem here. Nonetheless, if this strategy is carefully applied, rejection on the
order of 10⁴ (electrical) is easily possible.
4.7.5 Advice
You will note that all of these strategies require care, and that all will become less
effective very rapidly as the number of optical surfaces becomes larger, or their spacing
smaller. Keep your system as simple as possible and, in an instrument where high stability
is needed, be prepared to sacrifice aberration correction, which multiplies the number of
surfaces. It is usually preferable to make a measurement at half the spatial resolution
with 103 times better stability.
In a monochromatic system, eliminate all closely spaced, parallel planar surfaces,
cement things together a lot, and use mirrors rather than prisms for beam steering.
4.8.1 Plate Beamsplitters
These useful devices are just windows with an enhanced reflection coating on one side
and often an antireflection coating on the other. They were once made with a very thin
(10 nm or less) coating of metal on a glass plate, but such coatings are very lossy and
so are no longer widely used. Good beamsplitters use a half-wave stack coating instead.
Their problems are the same as those of windows, and the treatment is similar as well.
They are available in a wide range of splitting ratios, from 90:10 to 10:90, with a very
rough 50:50 being most common.
Beamsplitters are often used to allow illumination and viewing along the same optical
path, as in microscope vertical illuminators. In that case, the light encounters the beamsplitter twice, once in reflection and once in transmission. The optimal efficiency is only
25% and occurs with a 50:50 beamsplitter. This maximum is very flat; 60:40 and 40:60
both give you 24%, and even 80:20 gives you 16%. Thus the poor accuracy of the 50:50
beamsplitter is not much of a worry.
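The flatness of that maximum is easy to see numerically; this sketch (Python; lossless beamsplitter assumed) evaluates the illuminate-and-view efficiency R(1 − R):

```python
def double_pass_efficiency(R):
    # Light meets the beamsplitter twice, once in reflection (R) and once in
    # transmission (1 - R), so the round-trip efficiency is R * (1 - R).
    return R * (1.0 - R)

# The 25% maximum at 50:50 is very flat:
effs = {R: double_pass_efficiency(R) for R in (0.5, 0.6, 0.8)}
# 0.5 -> 0.25, 0.6 -> 0.24, 0.8 -> 0.16, as quoted in the text
```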
Beamsplitters are always polarization sensitive. Polarizing plate beamsplitters are
available at wavelengths of common high power pulsed lasers: 694 nm ruby and 1064 nm
Nd:YAG (the cement used in polarizing cubes is easily damaged by high peak powers).
These rely on multilayer λ/2 stacks oriented near Brewster's angle, so that one polarization is passed nearly completely and the other nearly completely reflected. In these
devices, the transmitted beam is much more thoroughly polarized than the reflected one.
A dielectric plate inserted in a converging wave produces significant spherical aberration and, for off-axis points, astigmatism and coma as well (see Section 9.4.1). Either
use the first-surface reflection for imaging, and illuminate in transmission, or put the
beamsplitter before the objective lens, where the NA is low.
4.8.2 Pellicles
A pellicle is a plate beamsplitter on a diet. It consists of a 2–5 μm thick membrane
(typically made of nitrocellulose) stretched across a very flat metal ring, sometimes
coated. A pellicle is sufficiently thin that (at least in transmission) the aberrations it
introduces are small enough to ignore. It is surprisingly robust mechanically, providing
nothing actually hits the membrane.
Pellicles reduce the drift due to etalon fringes by making the etalon very thin, so that
the fringe period is large, and its variation with temperature relatively slow. This works
well enough for moderately narrowband sources, such as mercury tubes; with lasers, it
may or may not, depending on your accuracy requirements. In fact, with broader band
sources, pellicles tend to be a nuisance, as their broad fringes make the transmission
nonuniform on the scale of thousands of wave numbers (hundreds of nanometers wavelength). Their main benefit in white-light systems is that they essentially eliminate ghost
images due to the two reflecting surfaces.
Pellicles are not very flat—1 wave/cm typically, far worse than a decent plate beamsplitter. What’s more, pellicles are very sensitive to vibration and air currents, which
make them deform. A deformed or vibrating pellicle will not reflect a faithful replica
of the incident wavefront; the transmitted beam is almost unaffected by the vibration
but still suffers from the nonuniformity. The reflection from a pellicle is strongly angle
dependent, varying from about 16% to 0% with angle and wavelength. Cleaning pellicles is difficult—you obviously can’t use compressed air or lens paper, but in addition,
nitrocellulose softens in ethanol. You can get away with detergent and deionized water
or with isopropanol. As with gratings, it’s best not to get pellicles dirty in the first place.
4.8.3 Flat Mirrors
Flat mirrors are conceptually the simplest optical elements available and often are the
simplest to use, as well. Excellent quality mirrors are available at low cost, for a wide
range of wavelengths, and from many suppliers. There are three main dangers in using
them: neglect, leading to underspecifying or careless mounting; worst-case design, which
although commendable in most fields, is something of a vice in optical systems, since
it leads to very expensive overspecification; and blunders such as thumb prints. Mirrors
are often more sensitive to contamination than lenses and more difficult to clean. Mirror
coatings are discussed in detail in Section 5.2.
Some situations require high surface quality in mirrors: interferometers, flying-spot
systems, and the best quality imaging systems are examples. Even there, however, there
are lots of places in the system where the best mirrors are not needed. Before the
two beams of an interferometer have been split, and especially after they have been
recombined, the precision required of the mirrors is less than that of the end mirrors
of the interferometer arms. Mirrors made of ordinary float glass (as used in domestic
windows) are flat to better than 1 wave per cm of diameter. These very inexpensive
mirrors are good enough to pipe light into detectors, for example. If there are more than
a few mirrors in use, the total photon efficiency starts to drop dramatically if ordinary
protected aluminum is used. Depending on how valuable your photons are, you may
be much better off buying more expensive mirrors (or at least more expensively coated
ones) if you can’t simplify your optical system.
4.9 Prisms
Glass prisms are used for dispersing light into its color spectrum, but most often for
beam bending and image erection, both of which usually involve internal reflections off
one or more faces of the prism. These reflections can be of two sorts: total internal
reflection (TIR), in which the light hits the face at an incidence angle greater than the
critical angle; or an ordinary reflection from a mirror-coated surface. Which of the two is
superior depends on the application. TIR prisms are often used because their efficiency
is superior to that of any mirror coating (provided the entrance and exit faces of the
prism are AR coated sufficiently well, and the TIR face is really clean). Mirror coatings
such as silver or aluminum are used in applications where the TIR condition is violated,
where the reflecting surface cannot conveniently be kept clean, or where the phase and
polarization shifts on total internal reflection are unacceptable. Some of the more common
types of glass prism are shown in Figure 4.4.
4.9.1 Right-Angle and Porro Prisms
Right angle prisms are used for bending a beam through roughly 90◦ as in Figure 4.4a.
Their performance is similar to a simple mirror oriented parallel to the hypotenuse of
the prism.

Figure 4.4. Types of glass prisms: (a) and (b) right angle, (c) Dove, (d) penta, and (e) Littrow.

Light enters normal to one face, bounces off the hypotenuse (either through
total internal reflection or by means of a mirror coating), and exits through the other
face. This arrangement is commonly used in microscopes, where prisms are more easily
aligned and cleaned than mirrors, and where the high efficiency and spectral flatness of
TIR reflectors or metallic silver coatings is important. Another advantage of prisms for
microscopes is that the bulky optic is mostly inside the optical path, whereas the thick
substrate of a mirror is entirely outside it. This makes a prism system mechanically more
compact, an important attribute of parts that must slide in and out of the optical path.
The other way of using a right angle prism is shown in Figure 4.4b, which is the
typical operating mode of the Porro prism (which is just a gussied-up right angle prism).
Here the beam is reflected through 180◦ in one axis and undeviated in the other. The 180◦
angle is constant irrespective of rotations of the prism about an axis coming out of the
page. This property is purely geometrical—the 180◦ is made up of two 90◦ reflections
that add; rotating the prism through an angle φ will decrease the effect of the first
reflection by φ while increasing the second one by exactly the same amount. Even the
surface refraction cancels out, since the light emerges through the hypotenuse at exactly
the same angle it entered at.
Porro prisms have a big enough incidence angle for TIR, so they are usually uncoated.
There is a polarization shift on TIR, since the s and p polarizations have different phase
shifts. Porro prisms are usually used in L-shaped pairs, one for up–down reversal and
one for left–right, so as to produce an erect image. Provided that the prisms are at right
angles to one another, s polarization for one bounce is p for the other, so the polarization
shift cancels out.
Right angle prisms have one major pathology, which is the one that runs through this
chapter (and much of the first part of this book in fact): severe etalon fringes due to the
coincidence of the incident light and the front-surface reflections. We all have a natural
desire for the beam path to be as simple as possible, without complicated alignment at
odd angles. Unfortunately, this desire leads to lots of collimated beams and perpendicular
surfaces, which makes for lots of etalon fringes. It is thus in direct conflict with our other
natural desire, namely, to have our gizmos work when they’re done.
4.9.2 Dove Prisms
The Dove prism is an image rotating prism, which also inverts the image left–right. It
is a cylinder of square cross section and has faces cut at Brewster’s angle to its central
axis. Light propagating along the axis is refracted at the faces, bounces off one side, and
exits through the other face, being refracted back to axial propagation in the process.
If the prism is rotated about its axis, the image rotates at twice the speed of the prism.
Interestingly, the polarization of the light does not rotate—it stays more or less the same
(see Section 6.2.4 for why). Because of the two symmetrical refractions, there is no
angular chromatic aberration, as in a simple prism, but there is lateral —different colors
emerge moving in parallel directions but offset laterally from each other.
4.9.3 Equilateral, Brewster, and Littrow Prisms
An equilateral prism is commonly used for spectral dispersion. For use with polarized
light, a Brewster prism, in which the light enters and leaves near Brewster's angle, is generally superior. A Littrow prism (Figure 4.4e) is one in which the light
enters at Brewster’s angle and is reflected at normal incidence from the second face of
the prism. Light of a certain wavelength thus returns along the path of the incident light,
with other wavelengths dispersed on one side or the other. Such a prism is nice because it
avoids having the beam direction pass through inconveniently arbitrary angles, and leads
to a compact light path with few bends. Such prisms are commonly used as cavity mirrors
in argon ion lasers. The laser can oscillate only at the wavelength at which the light path
retraces itself. Littrow prisms are not particularly vulnerable to etalon fringes, because
the front-surface reflections go off at large angles. The residual external reflection can
be got rid of easily with a beam dump, and the internal one controlled with a patch of
black wax or other index-matched absorber placed where it hits the prism surface.
There are several types of compound dispersing prisms, of which the Amici prism
is representative. It has alternating triangles of high and low dispersion glass cemented
together, oriented like the teeth of a bear trap with the low dispersion prisms forming
one row of teeth and the high dispersion ones the other. This allows multiplication of the
dispersing power without the beam having to go in circles. The cemented construction
allows the internal surfaces to work near grazing, for high dispersion, without the large
surface reflections and sensitivity to figure errors. Such high dispersion prisms have been
superseded almost entirely by diffraction gratings, except in oddball applications where
polarization sensitivity or overlap of grating orders is a problem and the linearity of the
dispersion is not critical.
4.9.4 Pentaprisms
A pentaprism (Figure 4.4(d)) is an image erecting prism that maintains a constant 90◦
deviation between incoming and outgoing rays, independent of their incidence angle. The
physical basis of this is two reflections in one plane, as in the Porro prism. The beam
undergoes two reflections from mirrors that are at an accurate 45◦ to one another (as
shown in Figure 4.4d). Unless the prism is made of a very high index material (n > 2.5 or
so), the steep incidence angle makes TIR operation impossible, so the reflecting faces are
normally silvered. The entrance and exit facets are both normal to the beam direction,
so pentaprisms produce etalon fringes but don’t exhibit much chromatic distortion in
parallel light. (See Section 4.10.)
4.9.5 Other Constant-Angle Prisms
There’s nothing special about 90◦ or 180◦ as far as constant deviation prisms are
concerned—the only requirement is to have two reflections from surfaces rigidly held
together. To make a 60◦ constant deviation, for example, you can use a 30-60-90 degree
prism. Send the beam in near normal to the short face. It will bounce off the hypotenuse
(by TIR) and the long side, then exit through the hypotenuse at normal incidence, deviated by 60◦ exactly. The incidence angle on the long side is only 30◦ , so it must be
silvered unless n > 2.0.
4.9.6 Wedges
A wedge prism is used for performing small deviations (up to perhaps 20◦ ) in the pointing
of a beam. Two such identical wedges mounted coaxially and independently rotatable
can be used to point a beam anywhere in a cone of 40◦ half-angle (except for a small
zone around 0◦ , caused by the inevitable slight mismatch between the angles of the two
prisms). This adjustment is considerably more compact and robust than a reflection from
two mirrors, but is somewhat less convenient to adjust and (like all refracting devices)
more prone to etalon fringes.
The tendency to produce fringes is better controlled in wedge prisms than in most
other refracting devices, since the surfaces are not parallel and the surface reflections can
often be isolated by applying black wax or other index-matched absorbing material in
strategic locations, or merely by making sure that none even of the high-order surface
reflections can reenter the main optical path.
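In the small-angle limit each wedge deviates the beam by about (n − 1)α, where α is the wedge angle, and the two deviations add like two-dimensional vectors as the wedges rotate. A rough sketch of the pointing geometry (function names and values are mine; paraxial approximation only):

```python
import math

def wedge_deviation(n: float, alpha_deg: float) -> float:
    """Small-angle deviation (degrees) of a thin wedge: delta ~ (n - 1) * alpha."""
    return (n - 1.0) * alpha_deg

def risley_pointing(delta_deg: float, phi1_deg: float, phi2_deg: float):
    """Net pointing (x, y, magnitude, all in degrees) of two identical wedges
    rotated to azimuths phi1 and phi2; the deviations add as 2-D vectors."""
    x = delta_deg * (math.cos(math.radians(phi1_deg)) + math.cos(math.radians(phi2_deg)))
    y = delta_deg * (math.sin(math.radians(phi1_deg)) + math.sin(math.radians(phi2_deg)))
    return x, y, math.hypot(x, y)

delta = wedge_deviation(1.517, 10.0)       # ~5.2 degrees per BK7-like wedge
print(risley_pointing(delta, 0.0, 0.0))    # aligned: maximum deviation, 2*delta
print(risley_pointing(delta, 0.0, 180.0))  # opposed: ideally zero deviation
```

The blind zone near 0° mentioned above shows up when the real wedge angles are not quite identical, so the opposed-wedge case does not cancel exactly.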
4.9.7 Roof Prisms
Most of the standard prism types are occasionally modified by replacing one face with
a roof —a pair of surfaces at 90◦ to one another. The effect is identical to cementing a
right angle prism, hypotenuse-first, on top of the standard prism: an additional left–right
inversion takes place. The most common type is the Amici roof prism, which is a right
angle prism with a roof. It has the useful property of reflecting a beam through 90◦
without inverting it left-to-right. In imaging applications, the roof prism must be made
very carefully, because the ridge of the roof appears right in the middle of the field of
view; any imperfections will be quite obvious.
4.9.8 Corner Reflectors and Cats’ Eyes
A corner reflector (aka corner cube or retroreflector) is a constant 180◦ deviation prism.
These useful devices come in two varieties: hollow ones, which are built up from three
flat mirrors accurately perpendicular to one another, and solid ones, which are ground
from a block of glass. They have the threefold symmetry of a cube about its body
diagonal, but since the beam comes in and out at opposite sides, the optical symmetry is
sixfold—looking into the corner cube, you see your own eye cut by six radial segments
like an equatorial slice of an orange.
Solid retroreflectors may use TIR or may be coated with metal. The hollow ones tend
to work better (stock items are available with 2 arc seconds tolerance, vs. 20 to 50 for
solid). Solid ones have poorer transmitted beam quality and suffer from etalon fringes
and multiple reflections; those that rely on TIR also cause large polarization shifts. On
the other hand, solid retroreflectors are considerably easier to clean, very rugged, and can
be used as a vernier adjustment of polarization by rotating them slightly. The polarization
changes where the beam crosses a segment boundary, and the shift is big enough to cause
weird apodizations in polarizing applications. The phase is also not continuous across
the boundary, which will mess up focused or interferometric measurements. The net of
all this is that corner cubes work great, but if you want to do anything fancy with the
returned beam, it has to fit completely within one of the six 60◦ orange segments. It
follows that the displacement of the returning beam axis has to be at least two beam
diameters or so.
A retroreflector successively inverts kx , ky , and kz of the incoming beam on each reflection. The amplitude profile is reflected through the axis of the cube, and kout = −kin . For
collimated beams, this is identical to the action of a cat’s eye —a lens with a mirror surface at its focus. It is different for converging or diverging beams, of course, since the back
focus of the lens arrangement is reimaged at the back focal plane, whereas the retroreflector looks like a free-space propagation, so that the light would converge or diverge
considerably before returning. This approximate equivalence is useful in building focused
beam interferometers such as the ISICL sensor of Example 1.12. With a lens in one arm
and a corner reflector in the other, no fine alignment is necessary, apart from collimation.
4.9.9 Beamsplitter Cubes
A beamsplitter cube works the same way as a plate beamsplitter, except that the reflective
coating is deposited on the hypotenuse of a right angle prism, and another one is cemented
on top of it, forming a cube with a reflective surface at its diagonal. They are most
commonly polarizing, so that one linear polarization is reflected and the other transmitted,
similarly to the polarizing type of plate beamsplitter.
Cube beamsplitters are very widely used, much more widely than their merits deserve.
The advantages of the cube, namely, no beam deviation and easy mounting, are nowhere
near sufficient to make up for the deficiency we saw in Example 4.1: severe etalon fringes
in both the transmitted and reflected beams. If you do laser-based measurements, these
infernal devices will make your life miserable.
On the other hand, with broadband sources and low spectral resolution, etalon fringes
are not normally a problem, so cubes are a good choice. Even with lasers a bit of tweaking
can help; if your beams are narrow, canting the cube slightly will help by making the
reflections miss each other laterally, but you can’t go too far or the polarization quality
degrades badly. For experiments where you want good polarization and don’t mind
tweakiness, a cube mounted on a three-axis tilt table (such as a small prism table)
can often be adjusted to give polarization purity of 1 part in 10⁵ or even better in the
transmitted beam.
Relaxing the requirement for no beam deviation can improve matters quite a lot more.
If the faces are polished at an angle of a few degrees to one another, much better control of
the reflections can be achieved. Such near-cubes are not catalog products, unfortunately,
but can be synthesized for lab purposes by removing the coating from one or two faces
and attaching wedge prisms with UV epoxy or index oil. There is only a very small
index discontinuity at the surface, so the repolished surface doesn’t have to be perfect,
making hand work feasible if you don’t have an optical shop.
Like all reflection type polarizers, polarizing cubes have problems with the polarization
purity of the reflected light; more on that appears in Chapter 6.
4.9.10 Fresnel Rhombs
In Section 1.2.6 we saw that beyond the critical angle, the reflection coefficients of
p- and s-polarized light both have magnitude 1 but have different phases, and that the
phases depend only on n and θi, as given by (1.15),

δ = δs − δp = −2 arctan[cos θi √(sin² θi − (n2/n1)²) / sin² θi]

To get a 90° retardation requires that n1/n2 > 1/(√2 − 1) = 2.41, which while not
impossible is inconveniently high. A Fresnel rhomb, shown in Figure 4.5, does the
trick by using two TIR reflections from a glass–air interface to produce an achromatic
quarter-wave retarder.
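The retardation-versus-angle behavior is easy to explore numerically. This sketch (function names are mine; a BK7-like index of 1.517 is assumed) evaluates the magnitude of the s−p phase difference on total internal reflection from the Section 1.2.6 formula, and scans for an incidence angle where a single bounce gives 45°, so that two bounces make a quarter wave:

```python
import math

def tir_retardation(theta_deg: float, n_rel: float) -> float:
    """Magnitude (degrees) of the s-p phase difference on total internal
    reflection. n_rel is the rare/dense index ratio (e.g. 1/1.517 for
    glass-air); valid only beyond the critical angle."""
    th = math.radians(theta_deg)
    root = math.sqrt(math.sin(th) ** 2 - n_rel ** 2)
    return math.degrees(2.0 * math.atan(math.cos(th) * root / math.sin(th) ** 2))

n_rel = 1.0 / 1.517                       # glass-air interface, BK7-like glass
best = min((abs(tir_retardation(t / 10.0, n_rel) - 45.0), t / 10.0)
           for t in range(420, 700))      # scan 42.0 to 69.9 degrees in 0.1 steps
print(best)                               # (error, angle closest to 45 deg/bounce)
```

For ordinary glass the maximum single-bounce retardation is only a little over 45°, which is why the rhomb needs two bounces and why its angles are not very adjustable.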
Figure 4.5. The Fresnel rhomb functions as an achromatic quarter-wave retarder.
Retarders in general are discussed in Section 6.9, but briefly, a quarter-wave retarder
can be used to change linear into circular polarization and back again. Most retarders
produce phase shifts by time delaying one polarization with respect to the other, so that the
retardation depends strongly on the wavelength. Fresnel rhombs do not, so that apart from
material dispersion, their phase retardation is constant with wavelength. This property
makes them unique. The two reflections make this another constant-deviation prism—0◦
this time. Two rhombs cemented together like a V make an achromatic half-wave retarder
with no beam deviation.
The retardation of a rhomb depends mildly on field angle, though less than that of
a birefringent retarder (since when one angle goes down the other goes up). The main
problem is the long path length in glass. A 10 mm aperture rhomb made of BK7 has a
path length in glass of (10 mm)(2 sec 54°) = 34 mm, so that material nonuniformities
produce retardation variations and phase wiggles across the aperture. Big rhombs are
thus less accurate than other retarders for narrowband applications, besides being heavy.
Glass prisms are pretty trouble-free devices. They share the normal problems of any
thick piece of dielectric, namely, residual birefringence, material nonuniformity, and
etalon fringes. In polychromatic measurements, chromatic effects are also important.
A thick piece of glass is not as uniform as an air space of the same size, so that the
waveform quality is poorer. Apart from polarization funnies, a prism has the same effect
on an image as a window whose thickness is the length of the optical path inside the
prism, unfolded (see Section 3.11.15). If the entrance and exit angles from the prism are
not equal, the equivalent window has a wedge angle as well. Reflecting prisms such as
pentaprisms and Fresnel rhombs can unfold to a very long path in glass. This means,
for example, that a pentaprism placed in a converging or diverging beam will introduce
large amounts of spherical aberration if care is not taken.
A big chunk of dispersive dielectric will cause lots of chromatic aberration if either
the entrance and exit angles are different or the incident light is not parallel (focused at
infinity).

The sign conventions used in lens design are simple, but tend to be hard to remember,
because they are completely arbitrary. Here are the four rules.
Sign Conventions in Lens Design
1. The object is at the extreme left of the drawing, and the image at the right (not so
for mirrors of course).
2. The radius of a curved surface is positive if it is convex toward the left.
3. Distances along the direction of ray propagation are positive. If the ray would have
to back up to get from the object to the lens or from the lens to the image, the
distance is negative. Both do and di are positive when the image is real (true also
for mirrors).
4. Phase increases with extra propagation distance; a ray that has to travel further
than it should to get to a reference surface has a more positive phase, and so a
positive aberration coefficient.
Glass lenses have been used for over a thousand years, since transparent glass became
available. The fact that useful lenses could be made in the early days of glassmaking
is an excellent indication of their forgiving qualities; for such precise artifacts, lenses
are remarkably hard to get wrong. The main danger to the beginner is getting the signs wrong.
A lens images one space (the object space) into another, the image space. Since light
propagation is time-reversal symmetric, lenses work fine backwards too; thus the choice
of which is the object and which the image is somewhat arbitrary, so the two are often
lumped together as conjugate points, or just conjugates. We covered the paraxial thin
lens case in Section 1.3; here we go into the more general case.
4.11.1 Thin Lenses
The simple lens, usually made of glass and having spherical surfaces, is the most useful basic optical component. Although they are not in themselves adequate as imaging
devices, except well into the infrared or at very low numerical aperture, they can be
built up into lens systems that perform remarkably well. The simplest approximation to
what a lens does is the thin-lens approximation, where the total thickness of a lens is
vanishingly small. How thin is thin in real life? The thickness of the lens has to be small
compared to the depth of focus of the beam you’re using,
so that not knowing just where the ray bending is really going on doesn't affect the results.
A thin lens is characterized by its focal length f or equivalently by its power P ,
which is 1/f . If its radii of curvature are r1 and r2 , it has a focal length f (in air) given
by the so-called lensmaker’s equation,
1/f = (n − 1)(1/r1 − 1/r2)    (4.14)
From the rules in the text box, both radii are measured from the right side of the
lens, so that for a biconvex lens r1 is positive and r2 negative; thus they both contribute
positive power in (4.14). The powers of thin lenses placed in contact add.
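As a quick check of the lensmaker's equation and the sign convention, here is a small sketch (the function name and example values are mine):

```python
def thin_lens_focal_length(n: float, r1: float, r2: float) -> float:
    """Lensmaker's equation for a thin lens in air: 1/f = (n-1)(1/r1 - 1/r2).
    Radii follow the sign convention above: positive if convex toward the left,
    so a biconvex lens has r1 > 0 and r2 < 0."""
    return 1.0 / ((n - 1.0) * (1.0 / r1 - 1.0 / r2))

# Symmetric biconvex BK7-like lens (n = 1.517), radii +103.4 mm / -103.4 mm
print(thin_lens_focal_length(1.517, 103.4, -103.4))   # ~100 mm
```

Note that both surfaces contribute positive power, exactly as the sign convention promises.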
Figure 4.6. A thick lens acts like a thin lens, but operating between the principal planes. In the
thin-lens limit, the principal planes collapse to the center of the lens. (Adapted from Kingslake.)
This approximation is convenient for initial layout of an optical system, but the effects
of thickness must be put in before the real optomechanical design is done; the condition
(4.13) is extremely stringent, requiring that a 1 mm thick lens of 10 mm diameter have
a focal length f ≫ 800 mm when used with a collimated red HeNe beam.
4.11.2 Thick Lenses
Fortunately this ferocious restriction can be got round easily; Gauss himself recognized
that a lens of finite thickness has imaging properties very similar to those of a thin lens,
except for the location error, and that this error can be eliminated by splicing in a small
section of hyperspace, as shown in Figure 4.6† .
The planes P1 and P2 are the principal planes of the lens and intersect the lens axis at
the principal points. Notionally, a paraxial ray coming from the left is undeviated until it
hits the first principal plane P1 . It is then magically translated to P2 at exactly the same
lateral position (height) and then bent as if a thin lens of the same focal length were at P2 .
The focal points F1 and F2 are the front and back foci, respectively. The axial distance
from the left vertex of the lens to F1 is the front focal distance or working distance, and
that from the right vertex to F2 is the back focal distance (also confusingly called the
back focus). These are what you’d measure with a caliper and are tabulated by the lens
manufacturer. The back focus is nearly always less than the focal length (BF < FL).
If the refractive indices of the media on both sides of the lens are the same, then
f1 = f2 . If not, the Lagrange invariant (see Section 9.2.9) can be used to show that
n1 /f1 = n2 /f2 .
The lensmaker’s equation can be generalized to the case of a single thick lens;
P = 1/f = (n − 1)[1/R1 − 1/R2 + t(n − 1)/(n R1 R2)]    (4.15)
where t is the thickness of the lens, measured from vertex to vertex. The front and back
focal distances are

WD = f1 [1 + t(n − 1)/(n R2)]

BF = f2 [1 − t(n − 1)/(n R1)]

and the separation 2δ between the principal planes is

2δ = t + BF + WD − 2f ≈ t(n − 1)/n

It is reasonable to take the center of the lens to be halfway between the front and
back foci.

† Rudolf Kingslake, Lens Design Fundamentals. Academic Press, Orlando, FL, 1978, p. 49.
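One way to sanity-check the thick-lens relations is to compose the two surface powers numerically. This sketch (names are mine) uses the standard composition P = P1 + P2 − (t/n)P1P2, which is algebraically the same as the thick-lens equation above, and verifies that the principal-plane separation of a symmetric biconvex lens comes out near t(n − 1)/n:

```python
def thick_lens(n, r1, r2, t):
    """Thick-lens properties in air (lengths in mm). Composes the two
    surface powers, then finds the vertex-to-focus distances and the
    principal-plane separation (hiatus)."""
    p1 = (n - 1.0) / r1                          # power of the first surface
    p2 = (1.0 - n) / r2                          # power of the second surface
    p = p1 + p2 - (t / n) * p1 * p2              # net power
    f = 1.0 / p
    bf = f * (1.0 - t * (n - 1.0) / (n * r1))    # back focal distance
    wd = f * (1.0 + t * (n - 1.0) / (n * r2))    # front focal (working) distance
    hiatus = t + bf + wd - 2.0 * f               # principal-plane separation, 2*delta
    return f, wd, bf, hiatus

f, wd, bf, hiatus = thick_lens(1.517, 103.4, -103.4, 4.0)
print(f, wd, bf, hiatus)   # hiatus should be close to t*(n-1)/n
```

For a 4 mm thick glass lens the hiatus is about t(n − 1)/n ≈ 1.4 mm, i.e., roughly a third of the thickness, which is the usual rule of thumb for n ≈ 1.5.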
The axial distance from the vertex of a surface to its rim is called the sagitta or sag
for short. The same term is used for an off-axis point, but there it is not the sag of the
surface but the sag of that point, so confusion seldom arises. For a spherical surface of
radius R and element diameter d,
sag = R[1 − √(1 − (d/2R)²)] ≈ d²/(8R)
Example 4.2: Biconvex Lens. Consider a 100 mm f /4 symmetrical biconvex lens
(25 mm diameter), made of BK7. From the lensmaker’s equation, R1 = 200(1.517 − 1)
= 103.4 mm. Over a 25 mm diameter, each surface will have a sag of about 25²/(412)
or 1.5 mm. If we leave 1 mm edge width, then t ≈ 4 mm. If we take that thickness,
then the thick-lens equation gives us (in units of meters and diopters)
10 diopters = 0.517 × (2/R − (0.004 × 0.517)/(1.517 R²))
where the second term is expected to be a small perturbation. We can either use the
quadratic formula or just use the approximate value of 0.1034 m we got before to plug
into the correction term; either way, we get R = 102.7 mm. Since the refractive index is
uncertain at the level of ± 0.002, and most of the time the tolerance on focal length is
a couple of percent, the iterative method works fine. Note also that we have discovered
a useful rule of thumb: for glass of n = 1.5, the surface radii are equal to f for an
equiconvex lens (f /2 and ∞ for a plano-convex).
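The iteration in Example 4.2 can be written out in a few lines (in millimeters; names are mine). For a symmetric biconvex lens with radii +R and −R the thick-lens power reduces to P = 2p − (t/n)p², where p = (n − 1)/R is the power of one surface:

```python
def thick_lens_power(n, r, t):
    """Power (1/mm) of a symmetric biconvex thick lens with radii +r and -r."""
    p_surf = (n - 1.0) / r                    # each surface has the same power
    return 2.0 * p_surf - (t / n) * p_surf ** 2

# Iterate on the radius until the thick lens has exactly f = 100 mm:
# start from the thin-lens value, then re-solve with the correction term.
n, t, target_p = 1.517, 4.0, 1.0 / 100.0      # index, thickness (mm), power (1/mm)
r = 2.0 * (n - 1.0) / target_p                # thin-lens starting guess, 103.4 mm
for _ in range(20):
    correction = (t / n) * ((n - 1.0) / r) ** 2
    r = 2.0 * (n - 1.0) / (target_p + correction)
print(r)   # converged radius, slightly below the thin-lens value
```

Because the correction term is less than 1% of the total power, the iteration converges in a step or two, which is why plugging in the thin-lens radius once is already good enough at the ±2% tolerance level.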
Note that the temperature coefficients of index and of expansion (both positive) fight
each other in (4.15); as T increases, the radii, thickness, and index all normally go up.
This is in contrast to what we saw earlier for the temperature coefficient of optical path
length. Since the net effect can be made positive or negative, it is possible to athermalize
even a single element lens, so its focal length is nearly constant with temperature. Of
course, the mounting and other mechanical parts must be considered in an athermalized
A thick lens can easily be put into the ABCD matrix formulation. Consider a thick
lens of focal length f whose principal planes are at ±δ. The ABCD matrix for this is
composed of a thin lens L(f ) with a (negative) free-space propagation operator Z(−δ)
on each side:
LT(f; δ) = Z(−δ) L(f) Z(−δ) =
⎡ 1 + δ/f    −(2δ + δ²/f) ⎤
⎣  −1/f        1 + δ/f    ⎦    (4.21)
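The operator product can be multiplied out numerically as a check. This sketch (plain 2 × 2 lists; names are mine) uses the usual ray-vector convention (height, angle):

```python
def matmul(a, b):
    """2x2 matrix product."""
    return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
            [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

def Z(d):
    """Free-space propagation through distance d."""
    return [[1.0, d], [0.0, 1.0]]

def L(f):
    """Thin lens of focal length f."""
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

def LT(f, delta):
    """Thick lens: a thin lens between two negative propagations of delta."""
    return matmul(Z(-delta), matmul(L(f), Z(-delta)))

m = LT(100.0, 0.7)   # f = 100 mm, principal-plane half-separation 0.7 mm
print(m)             # expect [[1 + d/f, -(2d + d^2/f)], [-1/f, 1 + d/f]]
```

Note that the lower-left element is still −1/f: splicing in the hyperspace sections changes the apparent vertex positions but not the power.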
One subtle but important point: you might think that the symmetry of the operator
form in (4.21) would mean that the lens can be put in backwards without any problem,
but that isn’t so. For an asymmetric lens, the front and back focal distances are different,
so putting the lens in a mount backwards will move the center of the lens, and so cause
a focus shift. It also changes the aberrations. This somewhat subtle effect leads to a
huge gotcha if the lens is nearly, but not quite, symmetric; the asymmetry may not be
immediately obvious, leading to blunders in assembly.
4.11.3 Fast Lenses
A lens with a short focal ratio (focal length/diameter) produces a bright image, due to
concentrating light from a large angular range. In photographic applications, this allows
a short exposure time, so that the lens is said to be fast. Fast lenses bend rays sharply,
for which they need highly curved (steep) surfaces. Unfortunately, aberrations increase
rapidly with increasing incidence angles of the rays; making a fast lens with good image
quality is challenging. A rule of thumb to minimize spherical aberration is to minimize
the maximum incidence angle of any ray on any surface. Thus when using a single
element lens to focus a beam, use a plano-convex one with its flat side toward the focus.
Use a double-convex lens for 1:1 imaging, and a meniscus lens in a converging beam.
Aside: Short and Fast. If you’re wondering why we bother with fast and short instead
of, say, large, it’s another one of those historical quirks. A rather slow lens whose focal
length is 8 times its diameter is said to be an f /8 lens, pronounced “eff eight.” The
aperture rings of camera lenses just say “8.” People who think this is 1/8 would say
that the ratio is small, while those who think it’s 8 would say it was large. Everybody
knows what fast and short mean—fast exposures and a short focal length for the lens
diameter. Increasing and reducing the aperture with a diaphragm are always known as
opening up and stopping down, also from camera lore; the detents on an aperture ring
are known as stops and go in integral powers of √2: 1.4, 2, 2.8, 4, 5.6, . . . . Doubling the
aperture would be “opening up 2 stops.” Don’t confuse this with the other use of stop,
as in field stop and aperture stop —this could get muddled, since when stopping down
the aperture, you’re adjusting the aperture stop.
4.11.4 Lens Bending
The lensmaker’s equation shows that the power of a lens can be distributed between the
two surfaces by increasing one radius while decreasing the other. This procedure is called
lens bending and leads to lenses of the sorts shown in Figure 4.7. Lenses of different
types have identical paraxial properties, but their aberrations differ when the focal ratio
is shorter. Lens bending is the most important degree of freedom in lens design.
Figure 4.7. Lens types: (a) double convex, (b) plano-convex, (c) positive meniscus, (d) double concave, (e) plano-concave, and (f) negative meniscus.
4.11.5 Dependence of Aberrations on Wavelength and Refractive Index
A single element 1-inch f /2 glass lens bent for minimum spherical aberration has about
10 waves RMS error at 588 nm. If it had an index of 4, that would be 1 wave; at 10.6
μm, it’s 0.05 waves—diffraction limited.† The time-delay differences between different
components do not change with wavelength, apart from dispersion, but as the wavelength gets longer, these differences become small compared to a wavelength, which is
what diffraction limited means. Another way of looking at it is that as λ increases, the
diffraction spot grows until it eventually dwarfs the geometric errors.
4.11.6 Aspheric Lenses
Because of the limitations of simple spherical lenses, it is natural to consider two possible ways of improving their performance: using them in combination, and relaxing the
seemingly astrological constraint of spherical surfaces. Aspheric lenses can indeed perform better than simple spheres, and in low precision applications such as condensers, or
large volume applications such as disposable cameras (where the lenses are made by a
sophisticated plastic-over-glass molding process), they can be an excellent solution. The
plastic-over-glass approach minimizes problems with the temperature coefficient and poor
transparency of the plastic. Aspheric lenses are also commonly made by hot-pressing a
few waves of asphericity into a glass preform (Corning) and by a sol-gel process based on
tetramethyl orthosilicate (TMOS), which can be turned into a gel consisting of pure silica
and then cured and baked to make it fully dense (Geltech). One-off custom aspheres are
difficult to make and so are too expensive for most purposes. Molded glass aspheres can
have good optical performance (e.g., a single element 4 mm 0.55 NA laser collimating
lens (Corning 350160) with λ/20 RMS wavefront error—a Strehl ratio of around 0.95).
It is not really that a particular asphere is so very awkward to fabricate, at least not
more so than another one; rather, what is at work is the strong tendency of any surface
being polished to become spherical. This tendency is largely responsible for the fact that
a small optical polishing shop producing components with surface accuracies measured
in nanometers usually looks rather lower tech than an auto garage. The precision comes
from the lens grinder’s skill, the ease of testing the particular property sought (i.e.,
focusing), and from the surface’s own seeming desire for sphericity.‡
Making aspheric lenses or mirrors requires resisting this desire, either by generating
the surface with computer numerically controlled (CNC) diamond machining, or by
† McGraw-Hill Encyclopedia of Lasers and Optical Technology, p. 530.
‡ Large optical shops nowadays have big surface generating machines that can polish many lenses at once.
nonuniform grinding and polishing, combined with iteration after iteration of polishing
and careful interferometric measurement using a precisely made null corrector. Both
procedures are expensive, and diamond machining has the additional disadvantage that
the tool marks left behind tend to scatter large amounts of light when the element is used
with visible light (it is much less of a problem in the IR).
4.11.7 Cylinder Lenses
The most common type of asphere is the cylindrical lens. These are widely available and
relatively inexpensive, but their optical quality is notoriously poor. Grinding a lens with
one accurately circular cross section and one accurately rectangular one is nontrivial.
Few applications of cylindrical lenses really require high accuracy, fortunately. Cylinders are often used as light buckets, directing light to a slit, as in a spectrograph, or to a
linear photodiode array. Their one common, moderate accuracy application is in anamorphic pairs for correcting gross astigmatism or distortion, as in diode laser collimators; a
better choice for this application is the combination of anamorphic prisms and controlled
misalignment of the collimator. Despite the nice catalog pictures, cylinder lenses are
lousy, so don’t design systems requiring accurate ones.
4.12.1 Achromats and Apochromats
For nearly all dielectric materials at nearly all wavelengths, the dispersion coefficients
are positive; that is, n increases as λ decreases. The only exceptions are deep within
absorption bands, where you won’t want to use the stuff anyway. Thus it is not possible
to color correct merely by sandwiching two plates of opposite dispersion.
On the other hand, you can color correct lenses by putting a positive lens of a high
dispersion material next to a negative lens of low dispersion material, or vice versa.
Let’s take the positive lens case. A powerful positive lens made from crown glass next
to a weaker negative lens made from flint glass produces a weakened but still positive
two-element lens. If the powers of the two elements are adjusted correctly, then as we
go to shorter λ, the increasing positive power is balanced by the increasing negative
power, so that the net power of the combination is nearly constant. With two elements,
the lens can be color corrected at two wavelengths and is called an achromat. Exploiting
the different shapes of the dispersion curves of different glasses, color correction can
be extended to three or more wavelengths, giving much lower average error over the
wavelength band of interest; such a lens is called an apochromat.
A side benefit of needing two lenses to get color correction is that the extra degrees
of freedom can be used to improve the monochromatic aberrations as well; a commercial
achromat has so much less spherical aberration that its wavefront error will generally be
10 or more times better than a single-element lens of the same diameter and focal length.
Example 4.3: Achromatic Doublet. Suppose we want to make an achromatic 200 mm
f /8 lens, corrected so that the F and C wavelengths come to a common focus. We’ll use
BK7 (nd = 1.51673, nF = 1.52224, nC = 1.51432, so NFC = 1.01539 and V = 65.24)
for the front element, and SF11 (nd = 1.78446, nF = 1.80645, nC = 1.77599, so NFC =
1.03925 and V = 25.75) for the rear. The crown glass is usually more durable than the
flint, so it is put on the exposed side unless there is a compelling reason not to. Real
lens designers do this by exact ray tracing. We’ll do it paraxially with the lensmaker’s equation:

Ptot = P1 + P2 = (n1 − 1)(1/R1 − 1/R2) + (n2 − 1)(1/R3 − 1/R4) = P1d (n1 − 1)/(n1d − 1) + P2d (n2 − 1)/(n2d − 1)
where R1 and R2 are the surface radii of the first element, and R3 and R4 are those of
the second element.
The achromatic condition requires PF = PC , which leads to

P1d /P2d = −V1 /V2 = −2.534
and hence P1d = 1.652P , P2d = −0.652P . It is slightly simpler to express this in terms
of the ratio of P1C /P2C and NFC . For this lens, we require a positive element 1.65 times
stronger than the combination, and a negative element 0.65 times as strong. We haven’t
specified anything about the bending of this lens, so we can put most of the power in
the buried surfaces; nevertheless, it is difficult to make a cemented achromat faster than
f /1.4 this way. It is possible to distribute the chromatic correction across the groups of
a more complicated design to achieve good color correction at short focal ratios.
4.12.2 Camera Lenses
Camera lenses are a wonderful and inexpensive resource for optical instrument builders.
For a hundred bucks or so, you get a very well corrected lens, preadjusted, mounted, and
tested, on its own focusing ring and with an aperture diaphragm. Mating bayonet mounts
are available to attach them to your system.
Bargains like that are not common in optics, so take advantage of it while it lasts.
Camera lenses tend to have lots of elements, so their stray light and etalon fringe performance is not as good as simpler designs. Use the slower, fixed focal length designs rather
than the superduper fast ones, the extreme wide angles, or the zooms; they have better
image quality and a lot fewer elements to scatter and make fringes. Ordinary camera
lenses are best for distant objects; the macrolenses are better for magnifications of 1:10
to 1:1. For higher magnifications, turn the lens around—for 1:1 to 10:1, use a macrolens
backwards, and for magnifications greater than 10×, use an ordinary lens backwards
(watch out for the soft coatings on the rear elements—they’re not as durable as the hard
front-surface coatings). For large magnifications, you can also use enlarger lenses, which
are somewhat cheaper.
Camera lenses are often described as having, say, 13 elements in 6 groups (about
typical for a zoom lens in 35 mm format). That means that there are 13 bits of glass, but
that some of them are cemented together, leaving only 6 × 2 = 12 air–glass surfaces.
Since the air–glass surfaces scatter more light, this is worth knowing; a 13-element lens
with 13 groups would have 26 surfaces, so its surface reflections would likely be much
worse. The etalon fringes in a lens like this would daunt the bravest, but even in a
white-light application with good coatings, around a quarter of the total light would be
bouncing around inside the lens barrel, much of it eventually arriving at the image plane
to reduce the contrast and introduce artifacts. The 13/6 lens would be about half that bad.
4.12.3 Microscope Objectives
A lot of optical instruments can be cobbled together from microscope parts. Microscopes
are high image quality systems built in modular fashion. Their optical trains must be
designed to support this use, and those are just the qualities we want in instrument
prototypes. Usually we just use the objectives, but sometimes an eyepiece or a trinocular
head is useful too—you can’t align what you can’t see (see Section 11.8).
Microscope objectives are specified by magnification and numerical aperture. To find
out what the actual focal length is, divide the magnification into the tube length, which is
nearly always 200 mm, or 160 mm for old designs. Thus a modern 20×, 0.5 NA objective
normally has a focal length of 10 mm. The working distance will be significantly less
than this, which is sometimes very inconvenient in instruments; we often have to get
other things in between the sample and the lens. Since this gets worse with shorter focal
lengths, and since the NA and not f controls the resolution, 20×, 0.5 NA is the most
all-round useful microscope objective for instruments.
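The focal-length rule of thumb is just division, but it is worth encoding to avoid mixing up old and new tube lengths (a minimal sketch; the function name is ours):

```python
# Focal length of a microscope objective from its magnification:
# f = tube length / magnification. Modern (infinity-corrected) designs
# assume a 200 mm tube; older designs assume 160 mm.

def objective_focal_length(magnification, tube_length_mm=200.0):
    return tube_length_mm / magnification

f_modern = objective_focal_length(20)        # modern 20x: 10 mm
f_old = objective_focal_length(20, 160.0)    # old-style 20x: 8 mm
print(f_modern, f_old)
```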
Long working distance objectives are available; the longest ones come from Mitutoyo
and are big enough to use as antiaircraft shells. They get rapidly more expensive as the
aperture and working distance increase.
Some high NA microscope lenses come with an adjustable correction for cover glass
thickness, which dials in a controllable amount of spherical aberration to correct for that
introduced by the cover slip. This can be useful in other situations as well.
Microscope lenses exhibit severe longitudinal chromatic aberration; different colors
come to focus at different depths. This is a trade-off based on the characteristics of the
human visual system, which has poorer spatial resolution in the blue, but is obnoxious
in some instrument applications, such as white light scanning microscopy. For such
applications, and for use in the UV or IR, where microscope lenses are tough to get, you
can use an all-mirror microscope objective, the Schwarzschild objective.
In choosing a microscope objective, know what it is you need. For applications not
requiring the highest quality, such as the condenser side of a spatial filter, use a commodity objective such as the cheap ones from American Optical, Newport, Swift, or several
others. For imaging, or on the collection side of a spatial filter, a good objective such as
a Nikon, Olympus, Reichert, Leitz, or Zeiss will work much better. Japanese objectives
tend to be nicely corrected in the objective itself, which makes them useful for other
purposes where you don’t want the microscope attached.
4.12.4 Infinity Correction
A lens designed to operate with its image at infinity is said to be infinity corrected . Most
modern microscope objectives are infinity corrected, because the resulting parallel light
exhibits no chromatic errors when being piped through bending prisms, and the system
aberrations do not depend strongly on where subsequent optical components are located.
These properties make infinity corrected lenses extremely useful in building instruments.
Camera lenses run backwards are another example of infinity correction—most of
them are designed to have the object at infinity, but by time reversal symmetry, this is
exactly equivalent to having the object at the film plane and the image at infinity. A pair
of camera lenses operated nose to nose makes a good transfer lens, for example, to image
the center of one scan mirror onto the center of another one, or to image an acousto-optic
cell at the pupil of a microscope objective to make a scanning microscope. Note that the
antireflection coating on the back of the lens is often much softer than the one on the front,
and so much more easily damaged in cleaning. Similarly, the glass itself is often more
delicate, and of course all the mechanical works are exposed to damage or corrosion.
4.12.5 Focusing Mirrors
Curved mirrors can do nearly anything lenses can but have a completely different set
of design trade-offs. A focusing mirror causes much larger ray bending than a lens of
the same focal length; this makes it much more sensitive to surface inaccuracies and
misalignment than a lens, but also can lead to a more compact optical system, due
to the opportunity for folding the light path. Folding leads to two problems, however;
obscuration, as mirrors partially shadow one another, and light leakage, as far off-axis
light can often get into the detector without having traversed the entire optical system.
Baffles can eliminate leakage, but eliminating obscuration requires the use of off-axis
aspheric mirrors, which, among other faults, are very difficult to align.
Mirrors exhibit no dispersion, so chromatic aberration is eliminated in all-mirror systems; the expected improvement in optical quality is not always realized, since it is much
easier to make multielement lens systems than multielement mirror systems. On the other
hand, for UV and IR use, it is delightfully easy to be able to focus the system with a
HeNe laser and be sure that it will be perfectly focused at 335 nm or 1.06 μm. Most
lenses have poorly corrected longitudinal chromatic aberration, so that different wavelengths come to focus at slightly different depths. Where spatial filtering is used with
broadband illumination (e.g., real-time confocal scanning optical microscopy, where a
disc full of pinholes is spun to make an array of moving spots), longitudinal chromatic
is extremely objectionable, so mirror systems make a lot of sense.
Because there are no transparent surfaces in an all-mirror system, there is no opportunity for etalon fringes to form, which can be a very important advantage.
Where there are fewer surfaces, there are fewer degrees of freedom to optimize, and
mirrors are much more seriously affected by surface errors than lenses are. For these
reasons, aspheric mirrors are much more common than aspheric lenses. Overall, mirrors
are wonderful in special situations such as working with invisible light, but there is
nothing like an optical system based on focusing mirrors to make you appreciate the
forgiving qualities of lenses.
Aside: Off-Axis Performance of Focusing Mirrors. Fast mirrors have amazingly bad
oblique aberrations, and off-axis ones are the worst. For example, a 25 mm diameter,
f /1, 90◦ off-axis paraboloid exhibits a spot size of approximately 30% of the off-axis
distance—if you go 100 microns from the axis, the spot grows from the diffraction limit
to 30 μm diameter. You really can’t run any sort of field angle at all with those things.
4.12.6 Anamorphic Systems
An anamorphic system is one whose magnifications in x and y differ. The main uses of
these are to correct for perspective distortion caused by oblique imaging, as in a picture
of the ground taken from the side of an aeroplane, and to circularize elliptical beams
from diode lasers. There are two main types: prism or grating systems, and telescopes
made from cylindrical lenses. When a beam encounters a surface, its edges define an
illuminated patch. If the beam comes in near grazing incidence, its illuminated patch will
be greatly elongated. The law of reflection guarantees that a reflected beam will have the
same shape as the incident one, but if the beam is refracted or diffracted at the surface,
this is no longer the case. On refraction, a beam entering near grazing incidence will
leave near the critical angle, elongated. Two prisms are often used together, with the
second one bending the beam back to its original propagation direction. On diffraction, a
beam entering near grazing can be made to leave near normal, so that a properly chosen
grating can substitute for a 90◦ folding mirror. This idea is used in commercial beam
circularizers based on anamorphic prisms.
Cylindrical telescopes can easily be made in any desired magnification, can be slightly
defocused in order to correct astigmatism, and do not offset the beam axis, but this is
as far as their advantages go. It is a great deal easier to make good prisms than good
cylindrical lenses, and the astigmatism correction can be done by mildly misaligning the
collimating lens.
4.12.7 Fringe Diagrams
Fringe diagrams are very nearly useless for good quality (< λ/4) optics. A good quality
optical component will have a surface accuracy of a small fraction of a wavelength. Such
a small deviation produces only very small irregularities in the spacing and direction of
the fuzzy fringes in the diagram. Localized errors are reasonably easily spotted, but
global ones, such as astigmatism, are very hard to see by eye, especially since the strong
visual asymmetry caused by the fringes running in one direction masks asymmetric
errors—you can see a λ/4 kink by sighting along the fringes, but good luck seeing a
λ/4 spacing error spread out over 20 periods. For evaluating the quality of an optical
component, use a phase shifting measuring interferometer if at all possible. Failing that,
a Foucault knife-edge test using a razor blade is good for qualitative work, for example,
examining a batch of lenses for bad units. If you really need to get quantitative data from
a hard copy fringe diagram, either scan it into a computer and digest it in software, or
use a parallel rule and a pencil to construct the axes of the fringes, and measure their
straightness and separation. It really would be much more useful if optics vendors would
ship their fringe diagrams on CD or make them available for downloading, but the author
is not hanging about waiting for the day.
There are other devices besides lenses and curved mirrors that can make beams converge
or diverge. These are based on refractive index gradients or on diffraction.
4.13.1 GRIN Lenses
Section 9.2.6 shows how a refractive index gradient causes light to bend. It turns out
that a circular cylinder whose index decreases parabolically moving out from the core
works like a lens. The ray bending happens continuously throughout the bulk, rather
Figure 4.8. A GRIN lens (2.2 periods shown) exhibits periodic foci along its length.
than happening abruptly at the surface, as shown in Figure 4.8. Such a device is called
a graded-index (GRIN) lens, or GRIN rod . Because the bending happens continuously,
a length of GRIN rod exhibits periodically spaced images down its axis, alternating
between erect and inverted. If it is cut to an even number of half-periods, a point on one
surface is imaged onto the other surface; if it is a quarter period shorter, the image is
at infinity. Thus making a GRIN lens of any desired focal length (up to a quarter of a
period) is just a matter of cutting the rod to the right length.
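The cut-to-length idea can be sketched paraxially. This example assumes the common parabolic profile n(r) = n0(1 − (gr)²/2), with pitch 2π/g and the frequently used paraxial result EFL = 1/(n0 g sin gL); conventions vary between vendors, and the rod numbers here are hypothetical.

```python
# Paraxial GRIN rod: pitch and effective focal length versus cut length,
# assuming n(r) = n0 * (1 - (g*r)**2 / 2) with gradient constant g.
import math

def grin_pitch(g):
    """Spatial period of the ray oscillation inside the rod."""
    return 2.0 * math.pi / g

def grin_efl(n0, g, length):
    """Effective focal length for a rod of the given length
    (common paraxial convention: EFL = 1/(n0*g*sin(g*L)))."""
    return 1.0 / (n0 * g * math.sin(g * length))

# Hypothetical rod: n0 = 1.6, pitch 10 mm (g = 2*pi/10 per mm).
g = 2.0 * math.pi / 10.0
print(grin_pitch(g))          # 10 mm
print(grin_efl(1.6, g, 2.5))  # quarter-pitch cut: collimates a surface point
```

At a quarter pitch (gL = π/2) the sine is 1 and the EFL is shortest; shorter cuts give longer focal lengths, which is the "any desired focal length up to a quarter period" statement above.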
At one time, GRIN lenses were quite poor—good enough for coupling light in and
out of fibers but not for imaging. Recently, they have been developed to the point where
their imaging properties are quite respectable. Fiber coupling is probably still the most
common application, but borescopes and endoscopes are now often made from long GRIN
rods instead of many sets of relay lenses. Besides simplicity and robustness, GRIN rods
avoid the accumulation of field curvature that plagues designs with cascaded relays.
Another approach to using index gradients is to fuse together slabs of optical glass of
slightly different index.† When a spherical surface is ground into the high index side, the
power of the margins of the lens is automatically weakened by the gradual decrease of
n. This weakening can be chosen so as to cancel the spherical aberrations of the surface,
and so aspheric-quality images can be obtained with a single spherical element.
Aside: Birefringence of GRIN Rods. GRIN rods are usually made by diffusing
dopants in from the outside of a plain glass rod. This results in an axially symmetric pattern of residual mechanical stress, so that GRIN rods are actually birefringent,
with a pattern not found in nature. This is sometimes important.
4.13.2 Fresnel Zone Plates, Diffractive Lenses, and Holographic Optical Elements
Recently, as a result of improvements in optical modeling software and in the molding
of plastic and sol-gel glass, it has become possible to fabricate not only aspheric lenses
but lenses with diffractive properties: for example, an aspheric lens with a phase grating
on the other surface (zone plates and general holographic elements are discussed in
Chapter 7). These diffractive lenses can have unique properties. Although the power
† Gradium glass, made by LightPath Technologies, Albuquerque, NM.
of the refractive surface decreases as the wavelength increases, that of the diffractive
element increases; thus it can be used to cancel the chromatic aberration of the refractive
surface, resulting in a novel element, an achromatic singlet. The possibilities inherent
in such a capability have only begun to be assimilated, and such elements should be
considered any time more than 10,000 units are needed. Before becoming too breathless
with excitement, however, remember the drawbacks: plastics have huge temperature
coefficients of index and of thermal expansion; getting good aberration correction over
a large field is very difficult with only two surfaces of a low index material; and you
absolutely must have very high diffraction efficiency, which is very difficult to maintain
over a wide bandwidth (e.g., the visible spectrum). Also see Section 7.9.9 for reasons to
keep the diffractive power small. The effective V number of a DOE can be found from
the grating equation: V = λd /(λF − λC ) = −3.452.
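The diffractive V number depends only on the wavelengths, not on any material, as a one-line check shows (wavelengths are the standard d, F, and C spectral lines):

```python
# Effective Abbe V number of a diffractive optical element (DOE).
# A diffractive surface's power scales linearly with wavelength, so
# V_doe = lambda_d / (lambda_F - lambda_C), independent of material.

lambda_d = 587.6  # nm, helium d line
lambda_F = 486.1  # nm, hydrogen F line
lambda_C = 656.3  # nm, hydrogen C line

V_doe = lambda_d / (lambda_F - lambda_C)
print(V_doe)  # about -3.45: huge dispersion, opposite in sign to glass
```

Compare V = 65 for BK7: the DOE's dispersion is both enormous and of opposite sign, which is why a weak diffractive surface can achromatize a refractive singlet.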
4.13.3 Fresnel Lenses
A mirror does all its work at the surface—all that stuff underneath is only there to
keep the surface accurately in place. You can make a lightweight mirror by removing
unnecessary material from the blank. This isn’t quite as easy to do with a lens, since
the material in the middle is there to preserve phase coherence by making all the rays
arrive with the same delay. For crude light bucket applications, however, an idea of
the same sort leads to the Fresnel lens. A Fresnel lens is a whole bunch of concentric
annular lenses, with the same focal length, as shown in Figure 4.9. Because of all the
sharp edges, there’s lots of scatter, and because of the loss of phase coherence, the image
quality is very poor.† Fresnel lenses can’t normally be coated, either. Thus their efficiency
is poor—as low as 50% in some cases.
On the other hand, they are light and compact, and nonimaging applications such as
condensers aren’t sensitive to their poor image quality. Projectors are the largest use of
Fresnel lenses, but you can make a solar furnace this way, too—up to 2500 times solar
concentration has been demonstrated. Off-axis Fresnel lenses are available, or can be
made easily from a big one with snips or a razor knife.
When using a fast Fresnel lens, make sure to put the side with the ridges toward
the more distant conjugate. Otherwise, the outer rings will exhibit TIR, and no light
will get through them. Really steep conventional lenses exhibit this behavior too. The
image quality of a Fresnel lens, nothing much to start with, gets dramatically worse with
increasing field angle. (This can often be helped by shifting the aperture stop well out
ahead of the Fresnel lens.)
4.13.4 Microlens Arrays
Lenses are sometimes used in arrays, to produce an array of very small, poor images.
Microlenses as small as 10 μm in diameter are often deposited on top of interline transfer
CCDs, to corral light into the small active area of each pixel. Somewhat larger lenses
can be used to measure wavefront tilt as a function of position (the Shack–Hartmann
technique), from which the wavefront can be reconstructed, more or less. (Even in thermal
light, where the etalon fringes aren’t so bad, there are never enough pixels per microlens
† Since there’s no point knocking oneself out in a lost cause, Fresnel lenses are also manufactured to very loose tolerances.
Figure 4.9. Fresnel lens.
to do a decent job of the reconstruction, unfortunately; the Shack–Hartmann measures
the local slope of the wavefront, so to reconstruct it you have to extrapolate, which is
always fraught with problems.)
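Since each lenslet reports only a local slope, the wavefront comes back from an integration. A one-dimensional illustrative sketch (the numbers and function name are hypothetical), which recovers the wavefront only up to an unknown piston term:

```python
# 1-D Shack-Hartmann reconstruction sketch: integrate the measured
# local slopes (cumulative sum of slope * lenslet pitch) to recover
# wavefront heights, up to an unknown constant (piston) term.

def reconstruct_wavefront(slopes, pitch):
    """Integrate slope samples into wavefront heights (piston = 0)."""
    heights = [0.0]
    for s in slopes:
        heights.append(heights[-1] + s * pitch)
    return heights

# A pure tilt: constant slope of 1e-4 rad across 0.2 mm lenslets.
w = reconstruct_wavefront([1e-4] * 5, pitch=0.2)
print(w)  # linear ramp, 2e-5 mm (20 nm) of height per lenslet
```

Real reconstructors work in two dimensions with least-squares fits, but the sparse sampling problem the text complains about is already visible here: anything happening between lenslets is invisible.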
Another interesting class of microlens array applications relies on the moiré pattern
between two microlens arrays of slightly different pitch, which can synthesize the equivalent of a single short-focus lens. Standard microlens products are available (e.g., from
WaveFront Sciences). They are typically 0.2 mm to a few millimeters in diameter, with
focal lengths of 1–100 mm. They are generally plano-convex singlets, of course, and
so their numerical apertures are limited even given their small Fresnel numbers. The
existence of standard products makes microlens arrays good candidates for building into
real systems.
4.13.5 Axicons
In addition to flat surfaces and spheres, there exists a class of conical prisms called axicons, as shown in Figure 4.10. They are generally made by single-point diamond turning,
because otherwise it’s difficult to get an accurate surface. Typical uses of an axicon are
Uniform Beam
Beam Region
Unstable Laser
QuasiUniform Beam
Figure 4.10. An axicon converts between filled and annular beams, or between a collimated beam
and a J0 Bessel beam.
sending a laser beam through a Schwarzschild (Cassegrain) microscope objective without
hitting the central obstruction, turning annular beams from unstable laser resonators into
uniform beams, and, less respectably, making J0 Bessel beams (misnamed “nondiffracting”) from uniform ones. Aligning axicons is very fiddly, especially in angle. The cone
beam dump of Section 5.6.10 is also an axicon.
Coatings, Filters, and Surface Finishes
Bash to fit, file to hide, and paint to cover.
An optical element is just a chunk of stuff put in the way of a light beam. Nearly all
the action happens right at the surface, which means that controlling the strength and
path of surface reflected and transmitted waves is most of optics. In this schematic view,
Chapter 4 is about controlling the path, and this one is about controlling the strength.
The jumping-off point for the discussion of coatings is a more detailed consideration
of the Fresnel formulas of Section 1.2.4. From there, we can develop a simple way of
calculating the behavior of an arbitrary plane-parallel coating stack.
Besides lenses and mirrors, optical systems use white surfaces, for diffusion, and black
ones, for stray light control.
Refraction and Reflection at an Interface
We saw in Section 1.2.4 that the Fresnel formulas (1.8)–(1.11) predict the amplitude
and phase of reflected and transmitted plane waves at planar interfaces. Here we’ll go
into a bit more detail about their behavior. Figures 5.1 and 5.2 show the magnitude
and phase of the reflection coefficients at planar interfaces between several pairs of
lossless dielectrics, as a function of incidence angle in the higher index medium. It isn’t
usually plotted this way, because it makes a complicated picture, but there’s some useful
physics here.
Below the critical angle θC , reflections at interfaces between lossless isotropic
dielectrics always have phase angles of 0 or π , so that linear polarization stays linear
after encountering such a surface. Above there, the phase goes all over the place, as you
can see; the polarization change on TIR depends on the phase difference δ between s
and p, so keep careful track of polarization when using TIR prisms. (This isn’t all bad;
in Section 4.9.10 we use this effect for polarization control with Fresnel rhombs.) Note
especially that rp goes negative between the Brewster angle θB and θC .
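The behavior described above (linear polarization preserved below θC, phase running all over the place above it, and rp changing sign between θB and θC) can be explored with a direct transcription of the Fresnel formulas. A sketch under the lossless-dielectric assumptions of this section; the function name is ours:

```python
# Fresnel reflection coefficients at a planar dielectric interface.
# cmath.sqrt makes cos(theta_t) imaginary beyond the critical angle,
# which automatically gives |r| = 1 (TIR) with a nontrivial phase.
import cmath, math

def fresnel_r(n1, n2, theta_i):
    """Return (r_s, r_p) for incidence angle theta_i (radians) in n1."""
    cos_i = math.cos(theta_i)
    sin_t = n1 * math.sin(theta_i) / n2          # Snell's law
    cos_t = cmath.sqrt(1.0 - sin_t**2)
    r_s = (n1 * cos_i - n2 * cos_t) / (n1 * cos_i + n2 * cos_t)
    r_p = (n2 * cos_i - n1 * cos_t) / (n2 * cos_i + n1 * cos_t)
    return r_s, r_p

n1, n2 = 1.517, 1.0                 # BK7 to air, as in Figures 5.1 and 5.2
theta_B = math.atan(n2 / n1)        # Brewster angle: r_p goes through zero
theta_C = math.asin(n2 / n1)        # critical angle: |r| = 1 beyond here
r_s, r_p = fresnel_r(n1, n2, theta_B)
print(math.degrees(theta_B), math.degrees(theta_C), abs(r_p))
```

For BK7-to-air, θB is about 33.4° and θC about 41.2°; evaluating at angles beyond θC shows |rs| = |rp| = 1 with differing phases, which is the polarization hazard of TIR prisms mentioned above.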
Figure 5.1. Modulus of the reflection coefficients |rp | and |rs | at a dielectric boundary, for several choices of material: air–BK7 (1–1.517), air–Si at 1300 nm (1–3.5), MgF2 –ZnS (1.38–2.3), and BK7–flint (1.517–1.80). The light is assumed to propagate from higher n to lower.
Figure 5.2. Phase of the reflection coefficients rp and rs at a dielectric boundary, for several choices of material: air–BK7 (1–1.517), air–Si at 1300 nm (1–3.5), MgF2 –ZnS (1.38–2.3), and BK7–flint (1.517–1.8). The light is assumed to propagate from higher n to lower.
5.2.1 Lossy Media
The most familiar mirror coatings are metals such as silver, aluminum, and gold. From
an optical point of view, the distinguishing feature of a metal is the very large imaginary
part of its refractive index, that is, its extremely high loss. It is slightly paradoxical at
first blush, but the highest reflectance coatings are made from the highest loss materials.
(See Figure 5.3)
Figure 5.3. Theoretical normal incidence reflectance of gold, silver, and aluminum mirrors as a
function of wavelength, for external and internal (nglass = 1.52) reflections.
The Fresnel formulas in terms of θi are also valid for absorbing media, even metals.
Aluminum has an index of about 0.76 + i5.5† for green mercury light (546 nm), so a
clean air–aluminum interface at normal incidence has a reflection coefficient of
r = (n2 − n1 )/(n2 + n1 ) = (−0.24 + i5.5)/(1.76 + i5.5) = 0.953∠160◦ ,
and so the reflectivity R = |r|^2 = 0.91. An internal reflection (e.g., an aluminized BK7
glass prism) has an even lower R of 0.87.
From a designer’s point of view, just bouncing a light beam once from an aluminum
mirror costs 0.8 dB in detected (electrical) signal power for a first surface mirror and
1.2 dB for Al–glass. You can’t do that too many times and still have a signal; this
is one of the reasons for the popularity of folding prisms based on TIR, especially in
complicated optical systems like microscopes. Since n1 is real, the reflection coefficient
is maximized for a given |n2 | when n2 is purely imaginary. Metals work a whole lot
better in the IR.
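The per-bounce losses quoted above follow directly from the complex-index Fresnel formula. A sketch using the aluminum index from the text (0.76 + i5.5 at 546 nm) and n = 1.52 for the glass; note that detected electrical power in a photodiode goes as the square of optical power, so each bounce costs 20 log10 R dB, not 10 log10 R:

```python
# Normal-incidence reflectance of an aluminum mirror, external and
# internal, from the Fresnel formula with a complex refractive index.
import math

def reflectance(n1, n2):
    """Normal-incidence power reflectivity |r|^2 between media n1, n2."""
    r = (n2 - n1) / (n2 + n1)
    return abs(r) ** 2

n_al = 0.76 + 5.5j               # aluminum at 546 nm (green mercury line)
R_ext = reflectance(1.0, n_al)   # first-surface (air-aluminum): ~0.91
R_int = reflectance(1.52, n_al)  # internal (glass-aluminum): ~0.87

print(round(R_ext, 2), round(R_int, 2))
print(20 * math.log10(R_ext), 20 * math.log10(R_int))  # dB in detected power
```

The 0.91 and 0.87 reflectivities correspond to the 0.8 dB and 1.2 dB electrical-power penalties quoted above.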
5.2.2 How Thick Does the Metal Have to Be?
We saw that metals make reasonable although not fabulous mirrors. Since we usually
don’t want to make mirrors from solid metal, we need to know how thick to make the
coating. The transmitted light typically has an initial amplitude |E′ | ≈ 0.2|E|, so we
can’t just ignore it—we have to make the coating thick enough to be opaque.
Because of the hugeness of Im{n}, the wave equation in metals behaves like the
diffusion (heat) equation, so that the electromagnetic fields are diffusive in character‡ ;
the amplitude of a plane wave in a metal dies off by exp(−2π ) per cycle. Also, |n| is so
large that k is directed essentially normal to the interface, regardless of incidence angle.
In order for the light transmitted through the film not to significantly reduce the
reflected power, we choose a film thickness d > λ/Im{n}, so that the light making a
round trip through the film is attenuated by at least e^(−4π), or 3 × 10^−6 . A thousand
angstroms of aluminum makes a good mirror in the visible, but 200 Å is getting a bit
see-through. The optical constants of metals change so rapidly with wavelength that the
thickness required can easily vary 2:1 across the visible, so light transmitted through
a thin metal layer is often strongly colored. Metallic neutral density filters are made
of chromium, rhodium, Inconel, or other metals whose properties are less wavelength dependent.
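The opacity criterion reduces to a one-line formula. A sketch using the aluminum numbers from the text; the function name is ours:

```python
# Minimum opaque thickness for a metal mirror coating: choose d so the
# round-trip attenuation through the film is at least exp(-4*pi), which
# works out to d > lambda / Im{n}.
import math

def opaque_thickness_nm(wavelength_nm, n_imag):
    """Thickness at which round-trip attenuation reaches exp(-4*pi)."""
    return wavelength_nm / n_imag

d_al = opaque_thickness_nm(546.0, 5.5)  # aluminum at the green mercury line
print(d_al)                     # about 99 nm: the "thousand angstroms" rule
print(math.exp(-4 * math.pi))   # round-trip attenuation: about 3e-6
```

For aluminum at 546 nm this gives about 99 nm, consistent with the thousand-angstrom rule of thumb above; at 200 Å the round-trip attenuation is far weaker, hence the see-through film.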
Aside: Free-Electron Metals. In the infrared, metals such as copper, silver, and gold
exhibit free-electron behavior. That is, their dielectric constants behave as though the
electrons were undamped harmonic oscillators. Since the harmonic oscillator equation
is of second order, the response to a sinusoidal E at a frequency above the resonance
is multiplied by two copies of j ω —in other words, these metals have negative real
dielectric constants. This leads to all sorts of useful effects such as surface plasmons,
which are true electromagnetic surface waves that may turn out to have lots of useful
technological applications.
† With our Fourier transform sign convention, absorbing media always have positive Im{n}. Why?
‡ “Diffusive” is used in its mathematical rather than optical sense here.
5.2.3 Designing Metal Films
Silver is the best metal coating in the visible but tarnishes so fast that it is nearly useless
for first-surface reflections. It works much better than aluminum (R = 0.96) for internal
reflectors (e.g., pentaprisms), where the back of the silver can be protected with a plated
layer of less active metal (e.g., copper, Inconel, or nickel) and a coat of paint. On the
other hand, if silver is protected from sulfides in the air, it lasts for months, and it can
be applied chemically in the lab, which is sometimes a very important property.
Gold is the best mirror coating in the NIR and the red (λ > 633 nm). It is often
convenient for lab use, as many laboratories have small gold sputtering chambers intended
for electron microscope samples, so the turnaround time is unbeatable. It doesn’t stick
all that well, though, so be careful with it. A very thin adhesion layer of chromium or
titanium makes gold stick to glass very well. Rhodium is a noble metal whose reflectivity
holds up well far into the UV (that’s the good news—the bad news is that R ≈ 0.8 in
the visible and 0.4 in the UV). Good metal films (e.g., Cu, Ag, Au, Al) in the IR are
essentially perfect conductors, so their efficiency is excellent.
Most optical surfaces are not bare glass or plastic, but are coated with one or more thin
layers of another material to modify their transmission and reflection properties. The
most common are antireflection (AR) and mirror coatings, but sometimes we want a
beamsplitter, a polarizer, or a filter, all of which can be made by appropriate coatings.†
One can treat coatings in different ways; because we’re after physical insight rather
than, say, efficient simulations, we’ll keep it simple and use the Fresnel formulas, assuming a plane-parallel geometry and homogeneous, isotropic films and substrates.
5.3.1 Dielectric Coating Materials
The theoretical performance of coatings is limited by the available materials and by the
difficulty of getting good coating morphology and adhesion. In the early days of coated
optics (the 1940s), the best available method for putting down dielectric coatings was
vapor phase reactions in air. This produced surprisingly good coatings, so good that Carl
Zeiss gave up on vacuum coaters completely for a while (see Anders).
At present, most coatings are deposited in a vacuum, by evaporation or sputtering.
These techniques work well, but almost all coatings are a bit less dense than the bulk
material, owing to the formation of small voids during deposition. These voids reduce the
refractive index slightly and, by adsorbing air and water, cause drift in n with temperature
and humidity. They also reduce the corrosion protection the coating affords. This is not
especially serious with substrates that form thin oxides with good properties (e.g., Al),
but is more of a problem with silver and copper, which do not.
Films are highly variable, depending on deposition conditions such as stoichiometry,
humidity, choice of substrate, temperature, pressure, and other things, and there is significant variation from run to run. The microstructure of the film may be quite different
from the bulk material’s; a good quality coating is amorphous, which will influence its
† The discussion of optical coatings is indebted to the admirable small monograph by Hugo Anders of Carl Zeiss, Oberkochen, Thin Films in Optics, Focal Press, London, 1967 (J. N. Davidson, tr.).
refractive indices and transmission bands significantly (especially for anisotropic materials). Besides porosity and microstructure, coatings often have stoichiometry errors that
can make n go up or down. For example, one maker quotes a range of 1.34–1.38 for
its MgF2 coatings at 550 nm. The idea of a table of optical constants of thin films is
thus something of an oxymoron, so don’t take the values in Table 5.1 too seriously.
Above all, don’t expect to get exactly the same refractive index and transparency range
as the bulk material. The high variability of the indices of the films, and the moderate
TABLE 5.1. Common Coating Materials

Dielectric materials:
  Cryolite (Na3 AlF6 ): lowest n among dense coatings; water soluble; soft
  Magnesium fluoride (MgF2 ): lowest index hard coating; popular
  Quartz (SiO2 )
  Silicon monoxide (SiO): nonstoichiometric; high index SiO absorbs in the blue
  Silicon nitride: absorbs strongly below 300 nm
  Sapphire (Al2 O3 )
  Titanium dioxide (TiO2 ): n varies between 2.2 and 2.7
  Zinc sulfide (ZnS)
  Lead fluoride
  Indium–tin oxide (ITO): electrically conductive; good ITO transmits >87% in the visible
  Silicon (Si)

Metals (n + ik, tabulated at increasing wavelengths from the UV through the IR):
  Gold (Au): 1.66 + i1.956, 1.24 + i1.80, 0.61 + i2.12, 0.31 + i2.88,
    0.16 + i3.80, 0.19 + i5.39, 0.27 + i7.07, 7.4 + i53.4
    (needs Cr or Ni adhesion layer; best metal for λ > 900 nm)
  Silver (Ag): 1.32 + i0.65, 0.17 + i1.95, 0.13 + i2.72, 0.12 + i3.45,
    0.15 + i4.74, 0.23 + i6.9, 0.65 + i12.2, 10.7 + i69
    (corrodes rapidly; best metal in the visible; much poorer below 400 nm;
    can be applied chemically in the lab)
  Aluminum (Al): 0.13 + i2.39, 0.49 + i4.86, 0.76 + i5.5, 1.83 + i8.31,
    2.80 + i8.45, 2.06 + i8.30, 1.13 + i11.2, 25.4 + i67.3
    (reasonably stable in dry air; best all-round metal; reflectance dips
    badly in the deep red and near IR)
difficulty of obtaining highly uniform films of the correct thickness, must influence the
way we design with thin films—a design needing three-figure accuracy in refractive
index and thickness will be totally unmanufacturable. Theoretical elegance must be sacrificed to the exigencies of coating manufacturing. (This is not a blanket dismissal of
fancy coatings—some highly multilayer coatings have been designed precisely to be
very tolerant of certain classes of coating errors.)
Beyond the electromagnetic properties of coatings, their mechanical ones, such as
adhesion, residual stress, and environmental sensitivity, must be considered. Not every
coating material sticks to every other, or to every substrate (a couple of nanometers of Cr,
Ti, or Ti3 N4 can help a lot with metals). Materials with a coefficient of thermal expansion
(CTE) very different from that of our glass will experience severe stress upon cooling,
which may make the film craze or delaminate. Surface preparation is vitally important
too—coating really is a bit of a black art. Detailed design of thin film coatings is beyond
our scope, because most instrument builders buy parts already coated; nevertheless, you
may easily require a custom coating occasionally and so need some understanding of the
difficulties involved.
Grinding through the algebra for multiple-layer coatings becomes less fun very rapidly.
What’s more, the formulas so obtained provide no insight and are useless in practice due
to their specificity. You can formulate the problem as a band-diagonal matrix equation
based on the matching of tangential E and perpendicular D at each interface, with a phase
delay of exp(±ikz z) for waves going in the positive and negative z direction, but there’s
an easier way: for a plane-parallel geometry, the Fresnel formulas can be cobbled together
to give us the full electromagnetic solution, assuming that all the materials are isotropic.
We’ll add up all the reflections and find the result, taking account of the propagation
phase delays. As usual, we take the z direction to be the surface normal directed from
layer j to layer j + 1.
A wave exp[i(k · x − ωt)] delayed by propagating through a layer of index nj and
thickness dj acquires a phase delay of kZj dj . The value of kZj depends on nj , so we
have to find it using the phase matching condition.† Phase matching states that k⊥ is
preserved across a boundary, as is necessary to preserve translational invariance. Thus
in the j th layer, kZj obeys
kZj ² = nj ² k0 ² − |k⊥ |².
All the forward waves in the j th layer have this kZ , and the reverse waves (i.e., those
reflected an odd number of times) have kZ = −kZj .
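The phase-matching bookkeeping is worth a quick numerical sketch (the indices and function name here are illustrative, not from the book): k⊥ is set once by the incidence angle in the incident medium, and kZ in each layer follows from it.

```python
# kZ in each layer via phase matching: k_perp is fixed by the incidence
# angle in the incident medium and is the same in every layer, so
# kZj = sqrt(nj^2 k0^2 - |k_perp|^2).
import cmath
import math

def kz_layers(n_list, n_inc, theta_i, lam):
    """kZ in each layer for incidence angle theta_i in medium n_inc."""
    k0 = 2 * math.pi / lam
    k_perp = n_inc * k0 * math.sin(theta_i)   # preserved across boundaries
    return [cmath.sqrt((nj * k0) ** 2 - k_perp ** 2) for nj in n_list]

# illustrative: air, MgF2, BK7 at 30 degrees, 514.5 nm
kzs = kz_layers([1.0, 1.38, 1.517], 1.0, math.radians(30), 514.5e-9)
```

Note that the higher-index layers have larger (hence more angle-insensitive) kZ, the effect invoked later in the discussion of angle tuning.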
A point of terminology: film thicknesses are universally specified in waves and not
nanometers: a half-wave film is one whose optical thickness is 1/2 wave, that is, dj =
λ/(2nj ). In coatings such as beamsplitters, intended to be used off-normal, a “half-wave”
coating is usually specified as dj = π/kZj , that is, thicker by a factor of sec θj , where
θj is the angle of incidence in medium j .
† We’re doing wave propagation here, so we’ll stick with the physicists’ sign convention, where a plane wave is exp[i(kx − ωt)], and a time delay τ contributes a factor of exp(+iωτ ).
5.4.1 Multilayer Coating Theory
Multilayer coatings appear much more complicated, and in fact the explicit formulas
for their reflection and transmission coefficients are ugly enough to give small children
nightmares.† We therefore proceed recursively, calculating the effect on an existing coating stack of adding another layer, using the total reflection coefficient r̂ of all lower
layers in place of the simple r from the Fresnel formula on the bottom side (see
Section 1.2.4).
First we need to restate the Fresnel formulas ((1.8)–(1.10)) in terms of k instead of
θi ; we especially must keep the index 12 for propagating from n1 into n2 , since as we
saw the Fresnel coefficients are not the same going in and coming out, and in dealing
with multiple bounces we’ll need to keep them straight:
rs12 = (kZ1 − kZ2 )/(kZ1 + kZ2 ),          ts12 = 2kZ1 /(kZ1 + kZ2 ),
rp12 = (n2 ² kZ1 − n1 ² kZ2 )/(n2 ² kZ1 + n1 ² kZ2 ),          tp12 = 2n1 n2 kZ1 /(n2 ² kZ1 + n1 ² kZ2 ).
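These four coefficients translate directly into code; the following is a sketch (function names are ours), with the 1 → 2 direction kept explicit since r12 ≠ r21:

```python
# Fresnel coefficients in k-space form, for propagation from layer 1 into
# layer 2. kZ values may be complex (absorbing layers).
def rs12(kz1, kz2):
    # s-polarization reflection
    return (kz1 - kz2) / (kz1 + kz2)

def ts12(kz1, kz2):
    # s-polarization transmission
    return 2 * kz1 / (kz1 + kz2)

def rp12(n1, n2, kz1, kz2):
    # p-polarization reflection
    return (n2**2 * kz1 - n1**2 * kz2) / (n2**2 * kz1 + n1**2 * kz2)

def tp12(n1, n2, kz1, kz2):
    # p-polarization transmission
    return 2 * n1 * n2 * kz1 / (n2**2 * kz1 + n1**2 * kz2)
```

Two useful consistency checks: tangential-field continuity demands 1 + rs12 = ts12, and for p polarization (n1/n2)(1 + rp12) = tp12; at normal incidence (kZ = n k0) both polarizations reduce to the familiar (n1 − n2)/(n1 + n2) magnitude.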
Referring to Figure 5.4, we observe that the multiple bounces form a geometric series
with common ratio r̂23 r21 exp(+i2kZ2 d2 ), so that the total reflection coefficient r̂12 can
be expressed as the total reflection coefficient r̂23 of the lower system with the additional
effect of the layer n2 , yielding
r̂12 = r12 + t12 t21 r̂23 /[exp(−i2kZ2 d2 ) − r̂23 r21 ].          (5.4)
[Figure: successive bounce amplitudes t12 t21 r̂23 W², t12 t21 r21 r̂23 ² W⁴, t12 t21 r21 ² r̂23 ³ W⁶, . . . , where W = exp(i kZ2 d2 ).]
Figure 5.4. Geometry of the plane-parallel coating problem, showing the geometric progression
of multiple bounces from the stack below.
† Besides, the amount of CPU time spent calculating them is minuscule compared with the time spent getting the coating recipe debugged.
For the bottom layer, we take r̂12 = r12 from the Fresnel formula, and for all subsequent ones we use the total reflection of all lower layers, obtained from repeated
application of (5.4), which gives us a recursive algorithm for calculating the total r̂ for
the stack. We can compute t the same way, but now we need to go from the top of the
coating stack toward the bottom, instead.
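The recursion can be sketched in a few lines of Python. This is a sketch under stated assumptions (normal incidence, so s and p coincide; lossless, isotropic layers); the function and layer-list representation are our own, not the book's:

```python
# Recursive stack reflectance via eq. (5.4): start at the bottom interface
# and fold in one layer at a time, bottom to top.
import cmath
import math

def stack_reflectance(n_inc, films, n_sub, lam):
    """|r|^2 of a coating stack at normal incidence.

    films: list of (refractive index, physical thickness) pairs, listed
    from the topmost film down to the one sitting on the substrate.
    """
    k0 = 2 * math.pi / lam
    indices = [n_inc] + [n for n, d in films] + [n_sub]

    def r(na, nb):   # Fresnel reflection, normal incidence (kZ = n*k0)
        return (na - nb) / (na + nb)

    def t(na, nb):   # Fresnel transmission, normal incidence
        return 2 * na / (na + nb)

    # start from the bare interface between the bottom film and the substrate
    r_hat = r(indices[-2], indices[-1])
    # fold in each layer above it, bottom to top, using eq. (5.4)
    for j in range(len(films) - 1, -1, -1):
        n_above = indices[j]
        n2, d2 = films[j]
        r_hat = r(n_above, n2) + t(n_above, n2) * t(n2, n_above) * r_hat / (
            cmath.exp(-2j * n2 * k0 * d2) - r_hat * r(n2, n_above))
    return abs(r_hat) ** 2

# usage: a quarter-wave of MgF2 (1.38) on BK7 (1.517) at 514.5 nm
lam_d = 514.5e-9
R_ar = stack_reflectance(1.0, [(1.38, lam_d / (4 * 1.38))], 1.517, lam_d)
```

At the design wavelength this reproduces the single-layer result of Example 5.1 below (about 1.3% residual reflectance for MgF2 on BK7), and with an empty film list it returns the bare Fresnel reflectance of the substrate.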
5.4.2 Lossless Coating Examples
Here are some typical examples of the uses of coatings. The goal is physical insight, not
detailed coating recipes, so we neglect dispersion, material absorption, adhesion problems,
and interface effects, for example, the 10–50 nm of Al2 O3 that grows immediately on
top of deposited aluminum, even in vacuum. Don’t take this to mean that these effects
are negligible in practice.
Example 5.1: Single-Layer AR Coating. For a single interface, the strength of the Fresnel reflection depends on n2 /n1 and θi . At normal incidence, we can get reflections of
the same strength and phase from both sides of a coating if n3 /n2 = n2 /n1 , that is, when
n2 is (n1 n3 )1/2 . By choosing the layer to be λ/(4n2 ) thick, the two reflected waves are
put π out of phase (there and back) and thus cancel perfectly. The problem is that low
index glass requires that n2 be around 1.2 or 1.25, and there aren’t any solid materials in
that range (people have used silica aerogels with some success in the red and IR). The
lowest index material that is hard and insoluble is MgF2 , at 1.38. This is an excellent
AR coating for high index glass, but it’s no great shakes with garden-variety borosilicate
such as BK7 (1.517), as you can see from Figure 5.5. (We’ll take 1.37 as its index, a
reasonable value closer to the center of the distribution than the bulk value of 1.38.)
Note the angle tuning; at normal incidence the coating is centered around 514.5 nm,
but at larger angles it shifts significantly to the blue. Higher index materials have more
constant kZ (since k⊥ ≤ k0 ). The coating is also polarization sensitive, which means that
Figure 5.5. Single layer MgF2 coating on BK7. Note the angle tuning and polarization dependence.
the polarization of your incoming beam will be changed somewhat by passing through
the coated surface. A number of such surfaces together (e.g., in a camera lens) can easily
cause serious polarization shifts with position and angle.
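The quarter-wave cancellation condition has a simple closed form worth checking numerically; this sketch (our own function name) uses the standard design-wavelength result R = [(n1 n3 − n2²)/(n1 n3 + n2²)]²:

```python
# Reflectance of a quarter-wave layer n2 between media n1 and n3 at the
# design wavelength, normal incidence.
import math

def quarter_wave_R(n1, n2, n3):
    num = n1 * n3 - n2**2
    den = n1 * n3 + n2**2
    return (num / den) ** 2

ideal = quarter_wave_R(1.0, math.sqrt(1.517), 1.517)  # n2 = sqrt(n1*n3): null
mgf2 = quarter_wave_R(1.0, 1.38, 1.517)               # MgF2 on BK7: ~1.3% left
```

The ideal index sqrt(1.517) ≈ 1.23 nulls the reflection exactly, while MgF2 at 1.38 leaves about a quarter of the uncoated 4.2% reflectance, which is the "no great shakes" residual of Figure 5.5.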
Example 5.2: Protected Aluminum Mirrors. First surface metal mirrors are too soft
to clean vigorously and (except for gold) are also vulnerable to corrosion. The usual
solution is to put a dielectric overcoat on top for protection. We saw that a glass–metal
interface was not as good a mirror as air–metal, so protected metal mirrors start out
with poorer performance even than Al–air. We can adjust the relative phases of the two
reflected waves by changing the thickness of the coating; if we make the layer λ/2 thick,
the reflections will add in phase. This effect is used to make hybrid mirrors, where the
coating partly offsets the poorer reflectance of an internal reflection from aluminum. The
most common coating choice is λ/2 of SiO over Al, the classical protected aluminum
coating of Figure 5.6. Over a relatively narrow bandwidth, the result can be as good
as a bare aluminum coating (the “internal reflection” curve is calculated for Al/SiO).
These mirrors are OK for simple systems, or ones in which you have photons to burn;
be careful how many bounces you use, though, or you may burn more than you can
spare. The coating is cheap, which is important, but that and physical robustness about
exhaust its virtues. For polarization-sensitive systems such as spectrometers, the fact that
a significant proportion of the reflectance comes from the thin film coating means that
protected aluminum mirrors are partially polarizing when used off normal. This is not so
in the IR, where metals are more or less perfectly conducting; there, a dielectric coating
of any thickness does not change R, which is always 1; protected gold coatings make
great IR mirrors.
5.4.3 Angle Tuning
Because of the variation in kZ with incidence angle, the tuning of coatings shifts to
shorter λ as θi increases, a phenomenon called angle tuning. It’s an easily calculated
effect that can be reduced by using high index materials and reducing the field angle.
Figure 5.6. The protected aluminum mirror: 0.5 wave at 520 nm of SiO (n = 1.7) over Al.
Tuning with angle is generally a minor nuisance, as in Example 5.1, because kZ is a
weak function of k⊥ at small angles. The difference between rp and rs and the increase
in |r| at high angles usually cause us worse problems. There are exceptions to this rule,
such as polarizing beamsplitters, in which selectivity degrades with angle, and sharp
interference filters, which can angle-tune your signal right out of the passband.
5.4.4 Examples of Multilayer Coatings
A good coater can put down layers of different materials without breaking vacuum, so
it is convenient as well as worthwhile to put down stacks of many layers. Excellent
antireflection coatings require several layers, and more complicated multilayer coatings
can have excellent performance as mirrors, filters, and beamsplitters. A good AR coating
can achieve 0.2% reflectance over a 50% bandwidth, with 0.4% being a typical guaranteed
spec. Coatings for use across the whole visible usually achieve <1%.
Most optics manufacturers have a variety of standard coatings available, which may
or may not be stock items. The price list will specify how much the coated elements cost,
but be aware that lead times are frequently rather long (several weeks is not unusual).
If you want different coatings on different surfaces, or a custom coating on parts you
supply, be prepared to pay for it: even apart from design time, you’ll be charged a setup
fee of perhaps $1200 for a coating run, plus a per-piece charge of around $100 per
surface, and will get no guarantee that your parts will survive.
Example 5.3: V-Coating. Single-layer AR coatings on plastic and low index glass don’t
work too well, because there are no materials available with n near 1.25. If we care about
a narrow range of wavelengths (e.g., in a laser system), a quarter-wave of MgF2 over
a quarter-wave of SiO (with n = 1.71) can fix this problem, as Figure 5.7 shows. Here
the coating’s reflectance is calculated at normal incidence, and at 30◦ in both s and p.
The coating angle-tunes to shorter λ with increasing θi , as we expect. Note how the p
Figure 5.7. Two-layer V-coating: quarter-wave MgF2 over quarter-wave SiO (1.70) on BK7.
Figure 5.8. BBAR coating.
polarization is more tolerant of angular shifts; the decrease in the rp values partially
compensates for the error in kZ .
Example 5.4: Simple Broadband AR Coating. For crown glass (1.46–1.65), a four-layer stack consisting of 0.50λ of SiO (1.60), 0.54λ MgF2 , 0.25λ SiO (1.60), and a
final 0.27λ of MgF2 on top makes quite a reasonable broadband AR (BBAR) coating;
Figure 5.8 shows the calculated results on BK7 (1.517) at normal incidence, and 30◦ s
and p.
Example 5.5: Enhanced Aluminum. The idea of metal with a dielectric overcoat can
be improved by using an LH pair over a metal surface: Figure 5.9 shows the result
of using a quarter-wave each of ZnS over MgF2 on top of the aluminum (as before,
the “internal reflection” curve is calculated for the index of the film adjacent to the
aluminum). Another LH pair on top produces an even better mirror, with R > 95% even
in the 800 nm dip. Enhanced aluminum mirrors are nearly as cheap as protected aluminum
and are the minimum quality you should consider for high sensitivity instruments. (See
Figure 5.9.)
Example 5.6: Quarter-Wave (HL)m H Stack. A sufficiently tall stack of λ/4 layers of
alternating high and low index materials makes an excellent mirror. It works better with
high index layers on both ends of the stack, so that the optical prescription is (HL)m H .
Figure 5.10 shows an 11-layer (m = 5) stack tuned to 514.5 nm. Note that it’s now
the p polarization that droops at high angles, since the rp values are dropping as θi
increases.
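The (HL)^m H mirror is easy to check with the layer-by-layer recursion of Section 5.4.1; this is a sketch at normal incidence, with assumed round-number indices (ZnS 2.3, MgF2 1.38, BK7 substrate 1.517):

```python
# Reflectance of an (HL)^m H quarter-wave stack at its design wavelength,
# normal incidence, via the bottom-to-top recursion of eq. (5.4).
import cmath
import math

def stack_R(n_inc, films, n_sub, lam):
    """|r|^2 of a film stack (listed top to bottom) at normal incidence."""
    k0 = 2 * math.pi / lam
    idx = [n_inc] + [n for n, d in films] + [n_sub]
    r = lambda a, b: (a - b) / (a + b)
    t = lambda a, b: 2 * a / (a + b)
    r_hat = r(idx[-2], idx[-1])
    for j in range(len(films) - 1, -1, -1):
        n1, (n2, d2) = idx[j], films[j]
        r_hat = r(n1, n2) + t(n1, n2) * t(n2, n1) * r_hat / (
            cmath.exp(-2j * n2 * k0 * d2) - r_hat * r(n2, n1))
    return abs(r_hat) ** 2

lam0 = 514.5e-9
nH, nL, m = 2.3, 1.38, 5                       # assumed ZnS / MgF2 indices
films = [(nH, lam0 / (4 * nH)), (nL, lam0 / (4 * nL))] * m
films.append((nH, lam0 / (4 * nH)))            # (HL)^5 H: 11 layers, H on both ends
R_mirror = stack_R(1.0, films, 1.517, lam0)    # > 99% at the design wavelength
```

At the design wavelength this agrees with the closed-form admittance result Y = nH^(2m+2)/(nL^(2m) ns), R = [(1 − Y)/(1 + Y)]², which also shows why the high-index layer belongs on both ends: each extra H layer multiplies Y by another factor of (nH/nL)².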
Example 5.7: Stagger-Tuned HL Stack. Although the HL stack high reflector becomes
broader as the number of layers increases, this happens only as √N, which is wasteful,
and even this eventually stops due to absorption in the coating. By putting two HL stacks,
Figure 5.9. Replacing the half-wave of SiO with quarter-waves each of ZnS over MgF2 yields a
substantially improved metal mirror for the visible, the enhanced aluminum coating.
Figure 5.10. Eleven-layer (HL)5 H stack, ZnS/MgF2 , centered at 514.5 nm.
tuned slightly differently, we can get a broader bandwidth with very high efficiency, as
we see in Figure 5.11. You usually use a spacer layer between them. This idea is called
stagger tuning, and it is broadly useful as a bandwidth-increasing device, in circuits as
well as optics.
Figure 5.11. Two (HL)5 H stacks (ZnS/MgF2 ), tuned to 463 and 600 nm, with a 0.21 wave L
spacer layer.
5.4.5 Polarizing Beamsplitters
Beamsplitters used to be made with thin metal films, and cheap ones still are. Inconel
is the most popular choice due to its spectral neutrality, but all such coatings are very
lossy—typically 40–50% gets absorbed, which is a lot, considering that even a lossless
beamsplitter wipes out 75% of our light if we go through and back. Modern beamsplitters are made from dielectric stacks for narrowband or polarizing applications (see
Figure 5.12), and from even thinner metal films with dielectric overcoats for wideband
applications. You typically lose 15–25% of your light in an enhanced metal beamsplitter,
and 5% or less in a good polarizing one.
Because rs and rp are so different at high incidence angles, the performance of coatings
changes drastically with polarization at high angles. This is how broadband polarizing
beamsplitter cubes are made: you come in at 45◦ to an (H L)m H stack, with enough
layers that Ts is very low† (11 layers (m = 5) of ZnS and cryolite will get you to 0.0001,
not counting absorption), for example, the one in Figure 5.11.
You choose nglass so that light coming in at 45◦ to the hypotenuse (i.e., normal to the
cube face) is refracted into the coating stack at Brewster’s angle for the HL interface; this
guarantees Brewster incidence at lower layers, because there are only two different indices
involved. Cementing another 45◦ prism to the top of the coating stack makes a cube,
where the transmitted light is undeviated and the reflected light comes off at 90◦ . This
makes a good broadband beamsplitter, whose main problem is the first-surface reflections
at the glass–coating and coating–cement interfaces. These pollute the reflected light
with as much as 5% of the p polarization (it varies with λ because the two p reflections
interfere). Adding extra AR coatings top and bottom can make a very good beamsplitter,‡
marred only by its narrow angular acceptance (and, of course, the horrendous etalon
† Remember that you have to rejigger the coating thicknesses a bit to make dj kZj equal to π/2 at each layer.
‡ Not even close to a Wollaston prism for p-polarization quality, of course, but lower in absorption and cheaper—pretty good for a chunk of glass with a few films on it.
Figure 5.12. Polarizing beamsplitter: A(H L)5 H A, where A is SiN (2.0), and H and L are ZnS
and cryolite. The AR layer A suppresses the reflection from the glass (1.66). Note the selectivity
reduction (to 20:1 from 100:1) due to coming in at only 1.5◦ off normal.
fringes due to the manufacturer’s insisting on choosing 45◦ , which we’ve alluded to in
Section 4.7.2).
You can also make wider angle, narrower band beamsplitters by working off Brewster’s angle, at a wavelength where rp has fallen way off but rs is still large. By careful
control of the sidelobes of the (H L)m H stack’s reflectance, you can make good beamsplitters for laser applications this way.
The polarization purity is still worse in the reflected light, only 25 or 50:1, whereas
in the transmitted light it can be 1000:1 in a narrowband device or 100:1 in a wideband one.
Aside: Unintentional Polarizing Beamsplitters. Some coatings are pretty good polarizers, including yours if you’re not careful. Polarization effects in AR coatings cause
major problems with lasers and in high accuracy applications.
5.4.6 Interference Filters
Two HL stacks separated by a spacer layer make a Fabry–Perot etalon, which has a
sharply peaked passband near where the spacer is an integral number of half-wavelengths
thick (phase shifts from the HL stack move the passbands around a bit). The bandwidth
and free spectral range can be traded off by changing the spacer thickness, from λ/2 up.
A complex structure composed of two of these etalons deposited on top of each other,
so that their passbands coincide at only one peak, makes an excellent filter with adjustable
parameters. As usual, stagger tuning can flatten out the peak and suppress the sidelobes,
yielding a flat-topped bandpass filter with steep edges and few artifacts.
Interference filters are fiddly to develop and need long coating runs that must be precisely controlled. Thus they tend to be available only for commonly desired wavelengths
(e.g., laser lines) and spectral bands of wide interest (e.g., Balmer α at 656 nm). They are
normally backed by colored glass, to suppress unwanted sidelobes, so that out-of-band
light hitting the top of the filter is mainly reflected, while that hitting the bottom is largely
absorbed. Sometimes it matters which way round you put the filter, for example, in the
infrared, where the absorbing glass radiates but the mirror coating doesn’t, and with high
powered light sources, which may overheat the filter.
The stopband rejection of interference filters isn’t always that great. Besides the occasional spurious peak, they sometimes have only 30 dB (optical) typical rejection. That
might not be too bad in a color compensating filter for a camera, where the passband is
wide and the rejection requirements modest. On the other hand, if you’re looking at solar
H α absorption at 656 nm with a 0.5 nm passband, you’re in trouble—30 dB rejection
means that each nanometer in the stopband will be attenuated by 30 dB, but there are
a lot more nanometers in the stopband than the passband, so the background light will
dominate. Make sure you calculate and measure the total out-of-band leakage in your
filters. A quick test is to cant the filter enough to angle-tune your desired signal into the
stopband, and see how much the signal level changes. This isn’t really precise, because
the L layers angle-tune more than the H , so the shape of the curve will change with
angle too.
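The out-of-band bookkeeping in the solar H α example is just an integral, sketched below with illustrative numbers (passband width, peak transmission, and stopband width are all assumed, for a spectrally flat background):

```python
# Compare in-band signal with leakage integrated over the whole stopband,
# per unit spectral radiance of a flat background. All numbers assumed.
passband_nm = 0.5
peak_transmission = 0.5        # assumed in-band transmission
stopband_nm = 300.0            # assumed width of the blocked region
stop_transmission = 1e-3       # 30 dB (optical) rejection

in_band = passband_nm * peak_transmission
out_of_band = stopband_nm * stop_transmission
ratio = out_of_band / in_band  # > 1 means the background dominates
```

Even with these fairly generous assumptions the integrated leakage exceeds the in-band signal, which is exactly why a 30 dB stopband spec can be useless for a 0.5 nm passband measurement.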
If you use interference filters in your instrument, be aware that they drift with temperature and time. They normally drift toward longer λ with increasing T , generally with
Δλ/λ ≈ 10–30 ppm/K; time variations are usually associated with hydration (or even
corrosion) of the coatings (see Section 12.13.2). Get detailed specs from the manufacturer.
5.4.7 Coating Problems
Coatings usually have a columnar morphology, which makes them porous, chemically
somewhat unstable, nonstoichiometric, and often having properties significantly different from the bulk material. Lots of work has gone into fixing these problems, but the
solutions are different for different coatings. Sometimes depositing at an angle of 30◦
or so, or using ion bombardment during deposition, can reduce porosity and produce a
fully dense coating. The columnar morphology can be exploited (e.g., by rotating the
substrate eccentrically at an angle to the deposition source), so as to make the columns
helical—that makes an optically active coating (see Section 6.3.6).
Optical filters are used to get rid of light we don’t want. (They always get rid of some
of the desired light as well, but not too much with a bit of luck.) Here we’ll talk about
absorbing materials and scattering from small particles.
5.5.1 Filter Glass
Glass makers can exploit the absorption characteristics of different materials to make
intentionally colored glass filters. Filter glass comes in a wide range of colors and characteristics, but the two most used are long pass and short pass, with the long pass glasses
having generally better performance. The coloring can come from one of two sources:
colloids, which are formed by a carefully controlled heat treatment process (struck or
colloidally colored glass), or by the formation of color centers due to ionic doping (ionically colored ). Ionically colored glass is a great deal more stable with time and thermal
history. Color centers are not easily bleached by optical dose either, so ionically colored glass is pretty stable all round. In addition, it can be annealed to eliminate stress
birefringence.
The data sheet for the filter glass usually tells how the color has been achieved. The
transmission spectrum of the glass does shift somewhat with time and exposure to the
air, so that your design should have a safety factor. It is usually a mistake to expose
glass filters to severe weathering; in a hostile environment, make the objective (outermost
element) out of something more robust (e.g., quartz or borosilicate crown glass).
Glass filters are often called upon to absorb large amounts of optical power, for
example, in color-correcting a tungsten bulb with a blue filter to approximate daylight.
Glass has poor thermal conductivity, so the temperature does not equilibrate rapidly; this
leads to large temperature gradients and consequently large tensile stress in the cooler
areas, to the point of shattering the filter. Filter glass therefore is usually tempered to
increase the amount of heat that can be dumped into it before it breaks. This is useful
but causes severe stress birefringence, which only gets worse with nonuniform heating
(see the Schott filter glass catalog). For large heat loads, consider sending the heat away
from the filter using a hot or cold mirror, or spreading the load by using a gentle filter
to absorb half the heat, then a dense one to absorb the rest.
In long pass filters, the band edge shifts toward the red as the temperature increases,
at a rate ranging from 0.02 nm/K for deep UV filters to 0.3 nm/K for NIR filters; it tends to
go as
dλc /dT ≈ (5 × 10⁻⁷ nm⁻¹ K⁻¹ )λc ².
This shift is linear for reasonable temperature excursions, and large enough (hundreds of
ppm/◦ C) to be very obnoxious. There is also a significant shift in passband absorption,
which tends to be very large proportionately, since the wings of the exponential are very
sensitive to slight changes in kT/e. These shifts are of course sensitive to field angle and
NA and so are generally difficult to compensate for in software. If you’re trying to do
accurate photometry with filters, control their temperature carefully.
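The empirical drift-rate formula above is worth a two-line check (the coefficient is assumed to carry units of nm⁻¹ K⁻¹ so the result comes out in nm/K; the cutoff wavelengths are illustrative):

```python
# Band-edge temperature drift of long-pass filter glass,
# d(lambda_c)/dT ~ (5e-7 nm^-1 K^-1) * lambda_c^2, lambda_c in nm.
def edge_drift(lam_c_nm):
    return 5e-7 * lam_c_nm**2

drift_uv = edge_drift(300)    # deep UV filter: ~0.05 nm/K
drift_nir = edge_drift(800)   # NIR filter: ~0.3 nm/K
```

These reproduce the quoted 0.02–0.3 nm/K range, i.e., hundreds of ppm per kelvin at the band edge.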
Filter glass is usually fluorescent, with a peak about 200 nm to the red of the absorption
edge. The usual way of fixing this is to use a series of filters of different cutoff wavelength
in series. Unfortunately, the order matters—putting them in the wrong order can cost
you a factor of 1000 in leakage. This can be a big effect if you’re looking for dim light
in a bright background (e.g., Raman spectroscopy), where it looks just like a light leak,
so it’ll have you chasing your tail—see Section 10.7.4.
Colored glass filters can have extremely high absorption in their stopbands and are
inexpensive; these virtues make up for the gradualness of their absorption versus wavelength compared to interference filters. Unfortunately, due to low sales Schott has reduced
the number of filter glasses in their catalog, so that finding a glass just right for your
application is significantly more difficult than it once was.
5.5.2 Internal and External Transmittance
Some of the light incident on a dielectric surface is reflected, so that even if the material
itself is completely lossless, not all the light hitting a dielectric plate makes it through. We
distinguish the two sources of loss by speaking of internal and external transmittance.
Internal transmittance excludes the Fresnel reflections at the surfaces, whereas the external
transmittance includes them. For purposes of definition, the filter is assumed thick enough
that interference effects and multiple reflections can be ignored.
One benefit of making this distinction is that the dependence of the internal transmittance on the thickness of the element is very simple; it follows Beer’s law ,
Tint (λ; d) = exp[−κ(λ)d],
which allows us to predict Tint of an arbitrary thickness from a single data point:
Tint (λ; d2 ) = [Tint (λ; d1 )]^(d2 /d1 ) .
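The thickness-scaling rule is a one-liner in practice; this sketch (our own function name, with an invented 10%-at-1-mm example glass) shows how strongly Tint moves with thickness:

```python
# Beer's-law scaling: one measured internal transmittance at thickness
# d_ref predicts the internal transmittance at any other thickness d_new.
def t_internal(t_ref, d_ref, d_new):
    return t_ref ** (d_new / d_ref)

# e.g. an illustrative glass with 10% internal transmittance at 1 mm
t_2mm = t_internal(0.10, 1.0, 2.0)    # doubling the thickness: 1%
t_half = t_internal(0.10, 1.0, 0.5)   # halving it: ~32%
```

A factor-of-2 thickness change moves a 10% transmittance by an order of magnitude in one direction and a factor of three in the other, which is why curve shapes change so much with thickness.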
Due to this very strong thickness dependence, the shape of the curve of Tint versus
λ changes with thickness; the width of absorption features increases and the valley
transmittance decreases as the element becomes thicker. Filter glass transmittance is
usually plotted in terms of the diabatie Θ(λ),
Θ(λ) = 1 − log10 log10 [1/Tint (λ; d0 )],
where Tint (λ; d0 ) is the internal transmittance of a filter with standard thickness d0 . A plot
of diabatie does not change shape with thickness, but merely moves up and down; a
common way of presenting the spectral characteristics of filter glass is to plot the diabatie
in black on red coordinate axes, then give you a red transparent sheet with the grid and
graduations on it. You line up the thickness scale so that the design thickness lines up
with the fiducial on the plot, and presto, a plot of internal transmittance versus wavelength
for your particular thickness (albeit on a weird vertical scale). Because numerical values
of diabatie don’t convey much, the scales are labeled with internal transmittance. Neutral
density filters are usually thought of in terms of their optical density D,
D(λ; d) = −log10[Text(λ; d)].
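To see why a diabatie plot merely slides vertically with thickness, note that doubling the thickness squares Tint, which subtracts exactly log10 2 from the diabatie regardless of the starting transmittance. A sketch (the sample values are illustrative):

```python
import math

def diabatie(T_int: float) -> float:
    """Diabatie of an internal transmittance: 1 - log10(log10(1/T_int))."""
    return 1.0 - math.log10(math.log10(1.0 / T_int))

def optical_density(T_ext: float) -> float:
    """Optical density D = -log10(T_ext); positive for attenuating filters."""
    return -math.log10(T_ext)

T0 = 0.30               # internal transmittance at the standard thickness
T_double = T0 ** 2      # same glass, twice as thick (Beer's law)
print(diabatie(T0) - diabatie(T_double))  # log10(2) ~ 0.301, whatever T0 is
print(optical_density(0.01))              # a filter passing 1%: D = 2.0
```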
5.5.3 Holographic Filters
Another class of filters is based on holograms. Unlike interference filters, these tend
to be available in bandstop types, perhaps 10–20 nm wide, with 40–80 dB (optical)
rejection at the stopband center. These devices angle-tune as coatings do, but because
of the depth of the null we’re talking about here, the restriction is stiffer: you
have to use these filters with normally incident collimated beams. A strong beam at
the stopband center produces weak surface scatter and stray junk that get through the
filter, leading to a doughnut of speckle around the original beam direction. Since they
are offset in angle, this is not a critical problem, but you have to put in a baffle after the filter.
5.5.4 Color Correcting Filters
In a tunable system, such as a monochromator driven by a tungsten source, it is often
tempting to use color compensation filters, which are fairly gentle colored-glass filters
intended to flatten the source spectrum by attenuating the red more than the blue. This
should be avoided if possible, for a number of reasons. An optical filter cannot improve
the flux of blue light, so that even in the blue, it will decrease the signal-to-noise ratio. The
filter response will never be accurately inverse to the instrument function, if for no other
reason than that the two change differently with time and temperature, so that a calibration
will be necessary anyway. Sometimes there are good technical reasons for using such
a filter, for example, a sample or detector that may be damaged by higher intensity in
the red, a CCD that blooms badly when saturated, or a digitizer whose dynamic range
is insufficient, but these are not as common as might be thought. A slightly more subtle
problem is that these filters are mostly designed for color correction with photographic
film and do not necessarily make the spectrum even approximately flat. The general
idea of whitening a signal to improve the SNR is more useful in signal processing—see
Sections 13.3.8 and 13.8.10.
Designing good beam dumps and baffles is a subtle business, which absolutely must be
part of the early stages of your instrument design. A common error is to think about
baffles last, when even trivial design changes are expensive, and then run for a pricey
high performance coating such as Martin Black to fix it.
It must be clearly understood from the outset that the stray light performance of
your system is controlled primarily by the geometry rather than by the quality of the
black coatings themselves. You can lose a factor of 10⁵ by allowing a large expanse of
black painted lens barrel, illuminated at grazing incidence, to appear in your detector field of view, and you’ll only gain back a factor of 10 or so by replacing the
black paint with a fancy and expensive black dendritic finish. The rules are pretty simple:
1. Everything is specular at grazing incidence, so don’t allow any grazing bounces to
hit your detector.
2. The more illuminated area your detector sees, the more stray light it will receive,
so keep the baffles and lens barrel out of the field of view as far as possible.
3. Multiple reflections from dark surfaces will rapidly eliminate stray light, so trap
it and then absorb it. Don’t allow any one-bounce paths to hit your detector (see
rule 2).
4. Sharp edges of baffles will diffract light, so adjust the relative apertures of the
baffles to catch it (i.e., later baffles should have slightly smaller inner diameters).
Instruments that search for extrasolar planets need about the best baffles going, which
has led to the development of band-limited baffles. The idea here is just that of data
windowing (see Section 17.4.9), in which a carefully chosen, gradual cutoff of the light
leads to greatly reduced diffraction rings and consequently to improved sensitivity at
small separations.†
5.6.1 What Is a Black Surface?
We call a surface black when it doesn’t reflect light. The Fresnel formulas predict
significant reflection from any discontinuity in ñ, in either its real or imaginary part.
Accordingly, a black surface has an absorption depth of many wavelengths, but much
less than its thickness, and is a good index match to the incoming wave. Black surfaces
in air are doomed from the outset by the size of the index mismatch at the surface, but
if the wave is coming in via a medium such as plastic or glass, the situation is much
less dire.
5.6.2 Black Paint
Because of the aforementioned index mismatch, flat black paint in air has a diffuse
reflectance of a few percent, which is blacker than TiO2 but nothing to write home
about. (Volume II, Chapter 3 of the OSA Handbook has a useful list of black finishes.)
Flat black is useful as a last ditch solution to a scattered light problem, where the
stray light is not highly directional. The biggest problem with it is that a major fraction
of the light will escape after only one bounce off the black surface, so that the ultimate
performance of a flat black paint baffle that is in the detector’s field of view is not that
much better than that of a single flat black surface. The next biggest is that near grazing
incidence, even flat black paint is a quite reasonable reflector. On the other hand, you
have to coat the inside of your optical system with something, and flat black is at least
less bad than the alternatives.
Internal reflection is a different matter; the improved index match and enforced
smoothness of the glass–paint interface improve the qualities of paint enormously (flat
black looks shiny from underneath). For example, garden-variety ultraflat black spray
paint (Krylon #1602) is a spectacularly good index match to fused quartz, very useful
for getting rid of internal reflections from unused areas of quartz prisms. Over the visible, the reflectance of such a quartz–Krylon interface is on the order of 0.01%, which is
very impressive for hardware-store spray paint. Remember that paint has environmental
limitations and tends to outgas and shed particles.
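The 0.01% figure is about what the normal-incidence Fresnel formula, R = [(n1 − n2)/(n1 + n2)]², predicts for a close index match. In this sketch the paint index of 1.43 is an illustrative guess, not a measured value for Krylon #1602:

```python
def normal_reflectance(n1: float, n2: float) -> float:
    """Fresnel power reflectance at normal incidence between two indices."""
    return ((n1 - n2) / (n1 + n2)) ** 2

print(normal_reflectance(1.46, 1.43))  # quartz-paint: ~1e-4, i.e. ~0.01%
print(normal_reflectance(1.46, 1.00))  # bare quartz-air: ~0.035, i.e. ~3.5%
```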
† See, for example, K. Balasubramanian, Appl. Opt. 47(2), 116 (2008).
5.6.3 India Ink
India ink is an aqueous suspension of very fine carbon particles. It is pretty black when
dry, but really black when liquid, especially if you can get rid of the first-surface
reflection—in the visible, the absorption length in India ink is less than 1 cm even
at a dilution of 1:10⁴.
5.6.4 Black Anodizing
Anodizing is a surface treatment for aluminum, which consists of electrochemically
oxidizing a clean aluminum surface to produce a porous Al2O3 (alumina or sapphire)
layer. The porosity is inherent—otherwise no current would flow after a very short
while, as sapphire is an excellent insulator. The resulting porous matrix is then filled
with something else, for example, aluminum hydroxide in the classical anodizing process,
or fluoropolymer in some proprietary systems such as Tufram and Polylube. The color
comes from dye that is part of the bath. Anodizing is less black than paint because of
the high index of sapphire, and in the IR it may not be black at all, since organic dyes
do not have the broad absorption spectrum of, say, carbon. Check before relying on its
IR performance.
5.6.5 Dendritic Finishes
Coatings made of closely spaced, shiny black peaks or ridges are better absorbers than
paint or anodizing. Lockheed Martin makes a dendritic black finish, Martin Black, based
on this principle, which is one of a whole range of “designer blacks” (the OSA Handbook
has a chapter on them). They’re useful but far from a complete solution and are very,
very expensive. Dendritic finishes tend to reflect a bit more near grazing incidence.
Recently, some dendrite-type blacks using oriented carbon nanotubes have got down to
about 0.05% reflectance in the visible, but that still won’t save a system with lousy geometry.
5.6.6 Black Appliques
There are also a variety of stick-on black finishes, of which the flocked sticky paper
sold by Edmund Optics deserves special mention. In the visible, it is comparable in
performance to an advanced black coating such as Martin, but costs about 100 times
less, and can be cut with scissors. It has a low damage threshold and probably outgasses
somewhat due to the adhesive. Because its blackness comes from organic dye, it is much
less impressive in the mid-IR, whereas Martin holds up quite well.
5.6.7 Black Plastic
Black plastic is optically similar to glossy black paint, although most of the time its surfaces are not as smooth on small scales. Like black anodizing, some types of black plastic
are near-IR transmitting—in the 1970s, some mysterious offset drifts in plastic-packaged
op amps were traced to photocurrents induced in the die by light getting through the
phenolic plastic. (Modern packages are made of Novolac epoxy, which is very opaque.)
If you’re in doubt, hold a big sheet of it up to an incandescent light and look through
it with an IR viewer (don’t use the sun unless you’ve made sure the plastic is at least
opaque enough to be safe for your eyes and your IR viewer).
5.6.8 Black Wax
Carbon black in grease or wax is very black indeed—Wood made his horn with lampblack
(candle soot), which was extremely effective as well as convenient. There are a number
of hard waxes that can be filled with carbon particles to make a very black material
of appropriate refractive index for mounting prisms, especially calcite ones, where the
mechanical weakness of wax avoids overstressing the soft crystals. (Apiezon W is a
common choice that doesn’t outgas.) It isn’t a complete solution, though, because black
wax doesn’t actually stick that well. Prisms are mounted in metal cells that fit the prisms
loosely, so the wax is thin and is never loaded in tension. Still, delamination due to
mechanical or thermal stress is the major cause of death in calcite prisms.
5.6.9 Black Glass
Various types of very dark colored glass are available, including beer bottles† and Schott
glasses. These can be UV-epoxied to the faces of prisms or windows in strategic positions,
to collect and dissipate stray reflections before they go anywhere. This approach can take
a lot more power than spray paint; a thick BK7 flat with a piece of black glass UV-epoxied
to the back makes a nice laser reflection attenuator.
Black glass used straight is less satisfactory, as its surface finish is often poor, causing
scatter. More subtly, if the laser beam power is high, the temperature gradients in the
glass will cause it to become locally convex in the region of high intensity, which will
defocus the reflected light. You don’t need a kilowatt class laser to see this; 50 mW CW
is easily enough. In transmission, this sort of effect is called thermal lensing.
5.6.10 Designing Beam Dumps and Light Traps
Assuming that you’ve designed the system sensibly, stray light will have to make at least
one bounce to get to the detector. Thus controlling stray light involves two steps: reducing
the amount of illuminated area in the field of view of the detector, and reducing the
illumination intensity there. All that stray light has to go somewhere, and that somewhere
is usually a baffle of some sort. We send unwanted beams into a beam dump and corral
ambient light and scatter into a light trap. The two look more or less the same.
The best way to design beam dumps is to use shiny black surfaces, where the surface
reflection is specular. A specular reflection can be directed onto another black surface,
and another and another. . . . With care, you can make the light take many bounces before
it can escape. The job is foremost to trap the light, and then to dispose of it.
5.6.11 Wood’s Horn
The original beam dump is Wood’s horn,‡ a gently curved and tapered tube of glass
coated outside with lampblack or black paint, shown in Figure 5.13a. It is a really good
design, which works well over a reasonable range of incidence angles. The gentle taper
traps the specular reflections, and due to the curved shape, most of the diffusely scattered
light also has to make multiple bounces before escaping. Because the beam rattles around
† Beer’s law is named after Dr. Beer.
‡ Named after Robert Williams Wood, debunker of N-rays (Nature 70, 530–531 (1904)), pioneer of grating spectroscopy, and author of How to Tell the Birds from the Flowers, among other stellar contributions.
Figure 5.13. Assorted beam dump and baffle designs: (a) Wood’s horn, (b) cone dump, (c) black
glass at Brewster’s angle, (d) knife-edge baffles, and (e) barbed baffles. Designs (a)–(c) use shiny
black surfaces, and (d) and (e) shiny or flat black.
between surfaces making some angle with each other, it tends to be sent back out after
some number of reflections, so the length of the horn has to be at least a few times its diameter.
5.6.12 Cone Dumps
A more convenient beam dump is the conical type, which fits optical breadboards and
erector sets such as Microbench. As shown in Figure 5.13b, the cone absorbs most of the
light and directs the rest into a simple trap arrangement that tends to confine the light.
Light has to make at least three bounces from shiny black surfaces to escape, and most
makes many more. These are easy to build in the lab if you have a lathe.
5.6.13 Black Glass at Brewster’s Angle
You can combine black glass with Brewster angle incidence to get rid of an unwanted
collimated beam (e.g., the residual pump laser beam in a photoacoustic measurement). As
shown in Figure 5.13c, the first piece of black glass at Brewster’s angle gets rid of one
polarization, and the second one, at Brewster’s angle for the other polarization (partly
reflected from the first one), completes the job. This approach can take more peak power
than Wood’s horn, but is restricted to well-collimated beams coming from one particular
direction, and requires attention to the control of surface scatter from the black glass.
5.6.14 Shiny Baffles
A barbed pattern, with the barbs pointing slightly toward the light, is probably the best
sort of baffle for long lens barrels and other narrow cylinders. Make the channels narrow
enough that a beam needs several reflections to escape. Coat the inside with shiny black
paint or make it from a shiny black material. The disadvantage of barbed patterns is that
the light will eventually be reflected back out, and that the number of reflections required
for this is a strong function of the angle of incidence.
5.6.15 Flat Black Baffles
In some situations, strong stray light can come from far off axis and may enter at any
angle, as with sunlight in outdoor applications, so this angular dependence is inconvenient. In cases like this, we may resort to flat black surfaces and just reduce the
illuminated area in the field of view. Optical design packages have stray light analysis
that relies on the bidirectional reflectance distribution function (BRDF), which predicts
the amount of light scattered into k2 from k1. Do yourself a favor and design decent baffles.
An example of a flat black surface approach that works well is the knife-edge baffle,
consisting of a series of black apertures lying normal to the axis of the optical system.
Knife edges are easy to fabricate, being planar structures. The inside diameters decrease
slightly coming toward the detector, so that ideally the earlier baffles in the series are
out of the detector’s field of view entirely, and thus light escaping from the flat black
surfaces must either go back out the objective or hit another baffle. You do need to make
the edges sharp, though, because if they’re too blunt you’ll have nice grazing incidence
reflectors sending light into your detector. Knife edge baffles are lightweight and highly
effective when properly designed.† Figure 5.13 shows some popular baffles and beam dumps.
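One simple way to get the slightly decreasing inner diameters of rule 4 is to put each vane tip just outside the envelope of the wanted rays. The sketch below is my own illustrative recipe, not a prescription from the text: it assumes an entrance aperture of radius a at z = 0 and a detector of radius b at z = L, so every wanted ray stays inside the cone r(z) = a(1 − z/L) + b(z/L):

```python
def vane_radii(a, b, L, z_positions, margin=0.5):
    """Inner radius for each knife-edge vane.

    All rays from an entrance aperture of radius `a` at z=0 to a detector
    of radius `b` at z=L stay inside r(z) = a*(1 - z/L) + b*(z/L), so each
    vane clears that envelope by `margin`. Units are whatever you use for
    a, b, L, and margin (e.g. mm).
    """
    return [a * (1 - z / L) + b * (z / L) + margin for z in z_positions]

# 100 mm barrel, 20 mm entrance-aperture radius, 5 mm detector radius:
print(vane_radii(20.0, 5.0, 100.0, [25.0, 50.0, 75.0]))
# radii shrink toward the detector: [16.75, 13.0, 9.25]
```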
5.6.16 Combinations
If the optical system is sufficiently long and narrow that the first few knife-edge baffles
are really out of the detector FOV, you can use shiny black surfaces there and apply
simple graphical ray tracing (starting from the detector) to estimate what the stray light
intensity will be. A ray hitting baffle n will be near enough to normal incidence that
it will rattle around between baffles n and n − 1 several times before exiting, which
will get rid of it very effectively. If baffle n is smaller in aperture, and none of baffle
n − 1 is visible to the detector, then a lot of the escaping light will exit the way it
came in, which helps. Subsequent baffles, which may be in the FOV, can be flat black if necessary.
† See, for example, A. Buffington, B. V. Jackson, and C. M. Korendyke, Wide-angle stray-light reduction for a spaceborne optical hemispherical imager. Appl. Opt. 35(34), 6669–6673 (1996).
5.7 White Surfaces

A white surface is one that scatters incident light of all colors efficiently. We usually
want diffuse white surfaces, whose scatter pattern is nearly Lambertian. White surfaces
are made by suspending small particles of high index, nearly lossless dielectric in a low
index medium, to scatter light as strongly and as often as possible.
5.7.1 Why Is It White?
When light enters a white surface, it is scattered in all directions; thus much of it does
a random walk in the material. When it encounters the surface, most of it will escape
(all of it, apart from Fresnel losses and TIR). Mathematicians call this phenomenon
gambler’s ruin—due to that boundary, eventually all your money diffuses out of your
pocket. Considering how many times it will be scattered, and the geometric dependence
of intensity on the number of scatterings, any absorption will be enormously enhanced;
the same metals we use for mirrors appear black in powder form. Some white surfaces
are better at depolarizing light than others, so measure yours carefully if it matters
to you.
5.7.2 Packed Powder Coatings
The best diffuse, high reflection surface is a packed powder of pure TiO2, BaSO4,
or MgO in air. Their reflectance is over 99% through most of the visible, and it is
highly Lambertian (TiO2’s reflectance drops off badly near 410 nm, and in commercial grades it
also tends to fluoresce). What’s more, unlike paint it is easily renewed if it gets dirty. It
is primarily useful as a reflectance standard, because the surface is easily disturbed and
really has to lie nearly horizontal. Barium sulfate’s claim to fame is that its reflectivity
is very constant with time and very flat with wavelength. Packed polytetrafluoroethylene
(PTFE) powder will stick to surfaces well enough to use it in integrating spheres and has
similar reflectance properties. Because of its lower refractive index, it needs a thicker
section (at least 6 mm or so) to reach its peak reflectance, but can achieve values above
0.996 in the visible, and maintains its properties into the UV and NIR.†
5.7.3 Barium Sulfate Paint
The most popular diffuse white coating is barium sulfate paint (available from Edmund
Optics as “Munsell white reflectance coating”). Paint is a collection of various particles
in a dielectric binder. There is a front-surface reflection from the binder, which makes
painted surfaces non-Lambertian, though they’re closer than most other things. Compared
with BaSO4 powder in air, the dielectric is lossier, and the index mismatch at the surfaces
is smaller, so the total reflectivity is also lower—about 98% in most of the visible and
NIR. Regular white paint is loaded with TiO2 and is closer to 90% in the visible. Barium
sulfate paint is especially good in places like the interior of integrating spheres, where
you need a nearly Lambertian reflector that’s very opaque in a fairly thin layer (1–2 mm),
and there’s a lot of nonhorizontal area to cover.
† Victor R. Weidner and Jack J. Hsia, Reflection properties of pressed polytetrafluoroethylene powder. J. Opt. Soc. Am. 71, 7 (July 1981).
5.7.4 Spectralon
Spectralon is a sintered PTFE sold by Labsphere, with properties similar to packed
PTFE powder. It can be machined into odd shapes and is stable and cleanable. (Avian
Technology sells similar stuff as Fluorilon-99W.) It is highly reflective and, although
not so Lambertian as fine powders, it is very convenient for applications needing high
efficiency diffuse reflectors that can absorb some punishment. The stuff is very expensive,
though, so don’t go milling an integrating sphere from a solid block of it.
5.7.5 Opal Glass
Opal glass is very inefficient at transmission (1%) but very Lambertian. It is used only for
those applications for which diffuse illumination is vital. It produces very small speckles
when used with lasers.
5.7.6 Magic Invisible Tape
Matte finish translucent tape is intended for mending torn papers, but it works pretty well
as a diffusing material for light duty use, for example, putting a piece on the entrance slit
of a spectrometer to fix spectral artifacts due to a weird pupil function, or to homogenize
the ugly beam patterns of LEDs and liquid light guides. It won’t take much power, and
it leaves a slight residue behind, but it lasts for years, so it’s just the right medicine.
5.7.7 Integrating Spheres
Light reflected from a white coating loses most of its directional information in a single
bounce. A closed cavity with sufficiently high reflectance can bounce the light dozens
of times before absorbing it, so that the illumination of a point on the wall becomes
Lambertian to high accuracy; this is the idea of an integrating sphere. There are two
main applications: measurement of optical power and building Lambertian light sources.
The photon efficiency of an integrating sphere is a great deal higher than that of opal
glass, so you can make a good light source by putting a small bulb inside the sphere,
screened by a small white shield so that no unscattered light can reach the exit hole.
The hole has to be fairly small, no more than 1/6 of the sphere diameter, to get the best
performance. The same homogenizing property makes integrating spheres the best optical
power sensors available; a photodiode in place of the bulb (still behind the shield) will see
almost the same total flux regardless of the incident angle of the optical beam, assuming
it isn’t vignetted by the aperture, and furthermore the spatial and angular variations of the
responsivity are homogenized out. Residual angular variation is at the 0.1% level unless
the ports are too large.
The probability that light escapes on any given bounce is the total area of the apertures
(projected on the sphere) divided by the area of the sphere, so the average number of
bounces before escape is the reciprocal of that ratio. Assuming the ratio is small, we can
treat it like reflection loss, so by the geometric series formula, the fraction of the input
light reaching the output port is

η = R·Aout / [4πr²(1 − R) + (Aout + Ain)R],    (5.10)
where R is the reflectance of the coating, r is the inside radius of the sphere, and Aout and
Ain are the areas of the output and input ports (actually the areas of their projections on
the sphere). This doesn’t take account of the few bounces it takes for the light to become
Lambertian inside the sphere, the effect of baffles inside the sphere, or the deviation
of the steady state illumination from a Lambertian condition due to the losses through
the ports, all of which are generally small effects. For a perfectly reflecting sphere with
equal sized ports, η = 0.5, and in real spheres, it is generally no more than 0.3 and is
commonly much lower.
Similarly, a δ-function light impulse will be spread out into a roughly exponential
pulse of time constant

τ = (4r/3c) / [(1 − R) + (Aout + Ain)R/(4πr²)],
which is on the order of 10–50 ns for most spheres.
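Taking the efficiency as η = R·Aout/[4πr²(1 − R) + (Aout + Ain)R] and the decay time as τ = (4r/3c)/[(1 − R) + (Aout + Ain)R/(4πr²)], where 4r/3 is the mean chord of a sphere, a quick calculator sketch (the sphere and port sizes are illustrative):

```python
import math

C = 2.998e8  # speed of light, m/s

def sphere_efficiency(R, r, A_out, A_in):
    """Fraction of injected light reaching the output port (first-order theory)."""
    return R * A_out / (4 * math.pi * r**2 * (1 - R) + (A_out + A_in) * R)

def sphere_time_constant(R, r, A_out, A_in):
    """Time constant of the roughly exponential impulse response, in seconds."""
    return (4 * r / (3 * C)) / ((1 - R) + (A_out + A_in) * R / (4 * math.pi * r**2))

# 50 mm radius sphere, R = 0.98 coating, two 10 mm radius ports (SI units):
A = math.pi * 0.010**2
print(sphere_efficiency(0.98, 0.050, A, A))      # ~0.25, under the 0.5 ideal
print(sphere_time_constant(0.98, 0.050, A, A))   # a few nanoseconds
```

Note the limiting case: for a lossless coating (R = 1) with equal ports, η is exactly Aout/(Aout + Ain) = 0.5, as the text states.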
From (5.10), we see that the efficiency of a sphere is a very strong function of the
coating reflectance, particularly if the coating is very good. Spheres made with the very
highest reflecting coatings are therefore somewhat less stable with time and conditions
than those made with ordinary white paint; on the other hand, they do a better job of
diffusing the light and waste fewer photons. Keep your integrating spheres clean, plug any
unused apertures with the white caps provided, and use them in their intended wavelength
interval. This sensitivity can also be used to advantage in multipass measurements: see
Section 10.6.5 for an example. The many bounces taken by a typical photon before it
is absorbed or lost unfold to quite a long optical path, as we saw, and this can be very
helpful in reconciling the demands of fast pulses to the capabilities of photodiodes, as in
Section 3.5.4.
5.7.8 Ping-Pong Balls
You can make a rough-and-ready integrating sphere from a ping-pong ball. Paint the
outside white, drill two holes at 120° from each other, and put a photodiode in one of
the holes. This is good enough to show some of the advantages of spheres, but not for
real measurements. (Ping-pong balls are also good scatterometers—see Section 9.8.7.)
5.7.9 Ground Glass
Ground glass is much more efficient at light transmission than opal glass but has a big
specular component as well as a diffuse component. Use it where the specular component
is not a critical problem, because besides being cheaper than opal glass, it is dramatically
more efficient (30–70% vs. 1%). Because of TIR, it matters which way round you put
the ground side; light crossing the ground surface from inside the glass is somewhat more
diffuse but dimmer. Objects whose light scattering occurs at a surface tend to produce
a constant spread of u and v, centered on the unscattered k vector. Thus the angular
spectrum is not independent of the incidence angle, which complicates diffuser design.
If you have something like a video projector with a short focal length lens, shining
on ground glass, it will scatter light only a bit around the original k vector, so at the
edges most of the light will still be going up and away rather than straight out, as you
would probably want. Software can correct for this at a single viewing angle, but not for
everyone in the room. This sounds like a job for a field lens (see Section 12.3.14)—the
big Fresnel lens of Figure 5.14 straightens out the light before it hits the ground glass,
which makes the brightness look much more uniform with viewing angle (though no
closer to Lambertian than before).
Figure 5.14. Ground glass and other mild diffusers tend to scatter light into a cone about the
incident ray direction as in (a). Adding a Fresnel field lens as in (b) can improve the apparent
5.7.10 Holographic Diffusers
A better controlled version of ground glass can be made with a holographic element,
the holographic diffuser. These are touted as being useful for laser beam shaping, but in
reality the strong speckle they produce limits their application to low coherence sources
such as LEDs. One very good application is to homogenize the output of fiber bundle
illuminators. Nowadays you can get holographic diffusers that are nearly as Lambertian
as opal glass, or have other angular patterns such as top-hat or square, without the
high losses of opal glass diffusers. Not all holographic diffusers have the other special
properties of opal glass (e.g., independence of illumination angle).
5.7.11 Diffusers and Speckle
A light source such as a HeNe laser, which is highly coherent in both space and time, is
very difficult to use with diffusers. Shining a HeNe into an integrating sphere produces
an optical field that is very Lambertian on a broad-area average, but that has very strong
small-scale structure called speckle. All rough surfaces illuminated with lasers produce
speckles that are a complicated function of position, but whose size is characteristic of
the material and of the size of the illuminated region. More diffuse materials produce
smaller speckles; the angular extent of the smallest ones is on the order of λ/d, where d
is the incoming beam diameter. Speckle consists of a mass of unorganized interference
fringes, caused by the coherent summation of fields from everywhere in the sphere. At
each point, these random reflections produce a certain optical amplitude and phase in
each polarization component, which vary all over the place. The best diffusers, such
as integrating spheres, produce speckles with characteristic size λ/2. Due to speckle
the relative standard deviation of the photocurrent will be on the order of
[(speckle area)/(PD area)]^(1/2), which isn’t that small, and any vibration will smear that out
into a huge noise PSD in the low baseband. Thus diffusers aren’t always the way to get
good measurements, at least with lasers. See Section 2.5.1 for more discussion.
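The speckle-averaging penalty is easy to estimate: a detector of area A_PD averaging speckles of area A_sp sees roughly N = A_PD/A_sp independent cells, so the relative standard deviation goes as N^(−1/2). A rough sketch (the detector size and wavelength are illustrative, and the speckle "area" is a crude λ/2-scale estimate):

```python
import math

def speckle_rsd(pd_area: float, speckle_area: float) -> float:
    """Relative std. dev. of photocurrent when a detector averages
    N = pd_area / speckle_area independent speckles: ~ N ** -0.5."""
    return math.sqrt(speckle_area / pd_area)

# 3 mm x 3 mm photodiode; lambda/2-scale speckle at 633 nm (areas in mm^2):
speckle_area = (0.633e-3 / 2) ** 2
print(speckle_rsd(9.0, speckle_area))  # ~1e-4, i.e. ~0.01% ripple
```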
6 Polarization

He flung himself on his horse and rode madly off in all directions.
—Stephen Leacock, Gertrude the Governess
Optical polarization is the main way the vector wave nature of light manifests itself in
practical problems. We’ve encountered plane waves, which are always perfectly polarized,
and the Fresnel formulas, which predict the intensity and polarization of plane waves
leaving a dielectric surface. Here we go into the measurement and manipulation of
polarization, and how not to get in trouble with it. Polarization components such as
retarders and Faraday rotators are mysterious to lots of people, but are actually fairly
simple devices unless you try to get deeply into the physics of how they do what they do.
Being practical folk, we’ll stick with their phenomenology and keep their inner workings
pretty well out of it.
The major uses of polarization components in optical systems are to control reflections,
as in sunglasses and fiber isolators, and to split and combine beams without the heavy
losses caused by ordinary beamsplitters.
Unpolarized Light
If you send thermal light through an analyzer,† twist the control ring as you may, the
same proportion of the light comes through. This remains true if you put any sort of
lossless polarization device ahead of it; a wave plate or a Faraday rotator doesn’t change
the polarization at all. Thus we say that thermal light is unpolarized . This is a poser,
because we know that any optical field can be decomposed into plane electromagnetic
† Analyzers and polarizers are physically identical, but an analyzer is thought of as detecting the polarization state produced by the polarizer—in communications terms, the analyzer is part of the receiving section, and the polarizer is part of the transmitting section.
waves. Since all such waves are perfectly polarized, how can thermal light be unpolarized?
The key is that we’re really measuring the time-averaged polarization rather than
the instantaneous polarization. The light at any point at any instant does in fact have a
well-defined E vector, because if it didn’t, its energy density would be 0. In an unpolarized field, though, the direction of E varies extremely rapidly with time, changing
completely in a few femtoseconds in the case of sunlight. Thinking in terms of modulation frequency (see Section 13.3), the polarization information is not concentrated at
baseband the way it is with lasers, but instead is smeared out over hundreds of terahertz
of bandwidth. It is spread so thin that even its low frequency fluctuations are hard to measure.
In k-space terms, the polarizations of different plane wave components are completely uncorrelated, for arbitrarily close spacings in K. This is in accord with the
entropy-maximizing tendency of thermal equilibrium—any correlation you could in principle use to make energy flow from cold to hot is always 0 in thermal equilibrium.
Highly Polarized Light
If we pass thermal light through a good quality polarizer, we get highly polarized thermal light. The plane wave components are still uncorrelated in phase but are now all
in the same polarization state. If such light does not encounter any dispersive birefringent elements, its polarization state may be modified but it will remain highly polarized. Its polarization can be changed achromatically with TIR elements such as Fresnel
rhombs, so that we can have thermal light with a well-defined circular or elliptical polarization.
Circular Polarization
We’ve encountered circular polarization before, but there’s one property that needs
emphasizing here, since so many useful polarization effects depend on it: the helicity
changes sign on reflection. Left-circular polarization becomes right circular on reflection,
and vice versa—E keeps going round the same way, but the propagation direction has
reversed, so the helicity has reversed too. This is also true of ordinary screw threads
viewed in a mirror, so it’s nothing too mysterious. Although linear polarization can be
modified on oblique reflection from a mirror (if E has a component along the surface
normal), circular polarization just switches helicity, over a very wide range of incidence
angles.† Since linear polarization can be expressed in terms of circular, this should strike
you as odd—there’s a subtlety here, called topological phase, that makes it all come out
right in the end.
An Often-Ignored Effect: Pancharatnam’s Topological Phase
When light traverses a nonplanar path, for example, in two-axis scanning, articulated
periscopes, or just piping beams around your optical system, its polarization will shift.
†If the reflection occurs at a dielectric interface (where rp ≠ rs), the polarization will become elliptical; at θB the ellipse degenerates into linear polarization, and beyond θB the helicity no longer reverses. (Why?)
For reflection off mirrors, this isn’t too hard to see: since E is perpendicular to k, a
mirror whose surface normal has a component along E will change E. Make sure that
you follow your polarization along through your optical system, or you may wind up
with a nasty surprise.
A much more subtle fact is that the same is true for any system where light travels in a nonplanar path (e.g., a fiber helix). Left- and right-circular polarizations have
different phase shifts through such a path, giving rise to exactly the same polarization
shift we get from following the mirrors; this effect is known as Pancharatnam’s topological phase† and is what accounts for the puzzling difference in the polarization behavior
of linear and circularly polarized light upon reflection that we alluded to earlier (the
corresponding effect in quantum mechanics is Berry’s phase, discovered nearly 30 years
after Pancharatnam’s almost-unnoticed work in electromagnetics). This sounds like some
weird quantum field effect, but you can measure it by using left- and right-hand circular
polarized light going opposite ways in a fiber interferometer.‡ These polarization shifts
are especially important in moving-mirror scanning systems, where the resulting large
polarization shift may be obnoxious.
It sounds very mysterious and everything, but really it’s just a consequence of spherical
trigonometry; the k vector is normal to a sphere, and E is tangent to the sphere throughout the motion; depending on how you rotate k around on the surface, E may wind
up pointing anywhere. Equivalently, 2 × 2 rotation matrices commute, but 3 × 3 ones don’t.
If you follow your k vector around a closed loop enclosing a solid angle Ω, the
relative phase of the right- and left-circular polarizations gets shifted by

δφ = ±2Ω.
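As a sanity check on this geometry, we can compute the solid angle enclosed by a closed k-vector path numerically. The sketch below (an assumed illustration, not from the text) takes a beam whose k visits +x, +y, and +z and returns—one octant of the sphere, Ω = π/2—and evaluates the resulting circular-polarization phase shift via the Van Oosterom–Strackee formula for the solid angle of a spherical triangle:

```python
import numpy as np

def solid_angle(a, b, c):
    # Van Oosterom-Strackee: solid angle of the spherical triangle whose
    # vertices are the unit vectors a, b, c
    num = np.abs(np.dot(a, np.cross(b, c)))
    den = 1 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return 2 * np.arctan2(num, den)

# k visits +x, +y, +z and closes the loop: one octant, Omega = pi/2
omega = solid_angle(np.array([1.0, 0, 0]),
                    np.array([0, 1.0, 0]),
                    np.array([0, 0, 1.0]))
dphi = 2 * omega   # relative phase of the two circular polarizations
print(omega / np.pi, dphi / np.pi)   # 0.5, 1.0
```

So a beam folded around one octant picks up a π relative phase between the two helicities—a 90° rotation of linear polarization, which is just what one finds by chasing E through three mutually perpendicular mirrors.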
Orthogonal Polarizations
We often describe two polarization states as orthogonal . For linear polarizations, it just
means perpendicular, but what about circular or elliptical ones? The idea of orthogonal
polarizations is that their interference term is 0, that is,
E1 · E2* = 0.
Two elliptical polarizations are thus orthogonal when their helicities are opposite, their
eccentricities equal, and their major axes perpendicular (i.e., opposite sense of rotation, same shape, axes crossed). It’s an important point, because as we’ll see when we
get to the Jones calculus in Section 6.10.2, lossless polarization devices do not mix
together orthogonal states—the states will change along the way but will remain orthogonal throughout. One example is a quarter-wave plate, which turns orthogonal circular
polarizations into orthogonal linear polarizations, but it remains true even for much less
well-behaved systems such as single-mode optical fibers.
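The orthogonality condition and the quarter-wave plate example are easy to check with Jones vectors (treated in Section 6.10.2). The following sketch uses an assumed sign convention for the two circular helicities and a quarter-wave plate with its fast axis at 45°:

```python
import numpy as np

# Jones vectors for the two circular helicities (sign convention assumed)
lcp = np.array([1, 1j]) / np.sqrt(2)
rcp = np.array([1, -1j]) / np.sqrt(2)

# orthogonality: the interference term E1 . E2* vanishes
assert abs(np.vdot(rcp, lcp)) < 1e-12

# quarter-wave plate, fast axis at 45 deg: a unitary (lossless) Jones matrix
qwp = 0.5 * np.array([[1 + 1j, 1 - 1j],
                      [1 - 1j, 1 + 1j]])

out1, out2 = qwp @ lcp, qwp @ rcp
# circular in, linear out (x and y respectively) -- and still orthogonal
print(np.round(out1, 3), np.round(out2, 3), abs(np.vdot(out2, out1)))
```

Since the Jones matrix of any lossless device is unitary, it preserves inner products, which is the general statement of the claim in the text.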
†S. Pancharatnam, Generalized theory of interference and its applications. Part 1. Coherent pencils. Proc. Indian Acad. Sci. A 44, 247–262 (1956).
‡ Erna M. Frins and Wolfgang Dultz, Direct observation of Berry’s topological phase by using an optical fiber
ring interferometer. Opt. Commun. 136, 354–356 (1997).
A polarizer allows light of one polarization to pass through it more or less unattenuated,
while absorbing or separating out the orthogonal polarization. Any effect that tends to
separate light of different polarization can be used: anisotropic conductivity, Fresnel
reflection, double refraction, walkoff, and the different critical angles for o- and e-rays
(related to double refraction, of course).
Polarizers are never perfectly selective, nor are they lossless; their two basic figures
of merit at a given wavelength are the loss in the allowed polarization and the open/shut
ratio of two identical polarizers (aligned versus crossed) measured with an unpolarized
source, which gives the polarization purity. The best ones achieve losses of 5% or less
and open/shut ratios of 10⁵ or even more.
The dielectric constant ε connects the electric field E with the electric displacement D,

D = εE.

For a linear material, ε is a tensor quantity (in isotropic materials the tensor is trivial,
just ε times the identity matrix).† (See also Section 4.6.1.) Tensors can be reduced to
diagonal form by choosing the right coordinate axes; the axes that diagonalize ε are
called the principal axes of the material; symmetry requires that they be orthogonal in
this case. (The refractive index also of course may depend on polarization but is not a
tensor, because it does not express a linear relationship.)
Some common birefringent optical materials are crystalline quartz, sapphire, calcite
(CaCO3 ), and stretched plastic films such as polyvinyl alcohol (PVA) or polyvinylidene
chloride (Saran Wrap). All these, along with most other common birefringent materials,
are uniaxial‡; two of their three indices are the same, εx = εy = ε⊥; light polarized in
the plane they define is an ordinary ray (o-ray), so called because it doesn’t do anything strange. The third index, which defines the optic axis, may be larger (positive
uniaxial) or smaller (negative uniaxial) than the o-ray index; if E has a component
along the optic axis direction, strange things occur, so that the beam is called an e-ray,
for “extraordinary.” Things get stranger and less relevant for absorbing birefringent
materials and for biaxial ones, so we’ll stick with the important case: lossless uniaxial materials.
Electromagnetic fields are transverse, which in a uniform medium means that for a
plane wave, E, H, and k are always mutually perpendicular, and that the Poynting vector
S always lies along k. (The Poynting vector generally defines the direction the energy
† Landau and Lifshitz, The Electrodynamics of Continuous Media, has a lucid treatment of wave propagation
in anisotropic media, which the following discussion draws from.
‡ Less symmetric materials may be biaxial, that is, have three different indices, and in really messy crystal
structures, these axes need not be constant with wavelength. Biaxial crystals exhibit some weird effects, such
as conical refraction (see Born and Wolf).
goes in; that is, it’s the propagation axis of the beam as measured with a white card and a ruler.)
Neither of these things is true in birefringent materials, where we have only the weaker
conditions that D, B, and k are mutually perpendicular, as are E, H, and S. For instance,
the actual index seen by the e-ray changes with angle, unless the light propagates in
the plane defined by the ordinary axes, for only then can E lie exactly along the optic
axis. The propagation vector k defines an ellipsoid (where x, y, and z are the principal axes):

(kx² + ky²)/n∥² + kz²/n⊥² = k0².    (6.4)
The refractive index n = k/k0 experienced by a given e-ray varies with its propagation
direction. The extreme values of ne are n⊥ (the o-ray index no) when k is along the optic
axis and n∥ when k is normal to the optic axis. There is a lot of sloppiness in the literature,
with n∥ often being referred to as ne, whereas ne really varies between n⊥ and n∥. Light
encountering the surface of a birefringent material is split apart into two linearly polarized
components going in different directions. Phase matching dictates that k⊥ is preserved
across the boundary. The o-ray behaves just as if the material were isotropic with an
index of n⊥ , so that’s easy—S is parallel to k.
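The e-ray index as a function of the angle θ between k and the optic axis follows directly from the k-ellipsoid. A short sketch, using nominal calcite indices near 589 nm (assumed values, not from the text):

```python
import numpy as np

def n_e(theta, n_o, n_par):
    # index seen by the e-ray when k makes angle theta with the optic axis;
    # follows from the k-surface: n(theta)^-2 = cos^2/n_o^2 + sin^2/n_par^2
    return 1.0 / np.sqrt(np.cos(theta)**2 / n_o**2 +
                         np.sin(theta)**2 / n_par**2)

n_o, n_par = 1.658, 1.486     # nominal calcite values near 589 nm (assumed)
print(n_e(0.0, n_o, n_par))        # -> n_o: k along the optic axis
print(n_e(np.pi / 2, n_o, n_par))  # -> n_par: k normal to the optic axis
```

The two printed values are the extremes quoted in the text: ne runs between n⊥ (= no) and n∥ as k swings away from the optic axis.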
Determining k and S for the extraordinary ray is straightforward. The direction and
magnitude of ke can be found from (6.4) and the phase matching condition. Once ke is
known, the direction of S can be found easily; it lies in the plane defined by k and the
optic axis, and the angles θk and θS separating the optic axis from ke and S obey
tan θS = (n⊥/n∥)² tan θk.
Remember, though, that the phase term is still exp(ik · x)—stick with this and don’t get
confused by trying to calculate propagation distance along S and multiplying by ne k0 or
something like that.
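Using the tangent relation between θS and θk, it's easy to see how big the e-ray walkoff gets. A sketch with nominal calcite indices (assumed values), scanning the wave-normal angle and finding the maximum angle between S and k:

```python
import numpy as np

def walkoff(theta_k, n_o, n_par):
    # angle between S and k for the e-ray; angles measured from the optic axis
    theta_s = np.arctan((n_o / n_par)**2 * np.tan(theta_k))
    return theta_s - theta_k

n_o, n_par = 1.658, 1.486     # nominal calcite values (assumed)
theta = np.linspace(0.01, np.pi / 2 - 0.01, 10000)
rho = walkoff(theta, n_o, n_par)
print(np.degrees(rho.max()))  # about 6.3 deg, near theta_k of 42 deg
```

A maximum walkoff of roughly 6° is why a calcite walkoff plate a few millimeters thick gives a usable beam displacement.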
If light travels directly down the optic axis, E has no component along it, so the
material appears to be isotropic. This is useful where the birefringence is obnoxious,
for example, “c-axis normal” sapphire windows used for their strength and chemical inertness.
Since the phase velocity of light in a material is c/n, the e- and o-rays propagate at
different phase velocities, so the two linear polarization components with k along z will
be phase shifted with respect to each other by an amount δ, where
δ = (ne − no )k0 z.
†Care is needed in identifying E × H with the local energy flux: Poynting’s theorem applies to the integral of
S·dA over a closed surface, or equivalently to the volume integral of ∇ · S. That means that S is nonunique
in much the same way as the magnetic vector potential—Poynting’s theorem still holds if we add to S the curl
of any arbitrary vector field. It usually works.
Unless the incoming beam is a pure e- or o-ray, this will change the resulting polarization (as we saw in Section 1.2.8). This phenomenon is called retardation and is the
basis for wave plates. Retardation is usually specified in nanometers, since it is a time
delay Δt that causes a phase shift δ = ω Δt, in contrast to a reflection phase as in a
Fresnel rhomb, which is almost wavelength independent. (In other words, retarders are
wavelength dependent even when the material has no dispersion.)
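Plugging numbers into δ = (ne − no)k0z shows why true zero-order wave plates are so thin. A sketch using the nominal birefringence of crystal quartz near 589 nm (assumed value):

```python
import numpy as np

lam = 589e-9        # vacuum wavelength, m
delta_n = 0.0091    # n_e - n_o for crystal quartz near 589 nm (assumed)
k0 = 2 * np.pi / lam

# delta = (n_e - n_o) * k0 * z; solve for the zero-order quarter-wave
# thickness, delta = pi/2
z = (np.pi / 2) / (delta_n * k0)
print(z * 1e6)      # roughly 16 um
```

A free-standing 16 μm plate is fragile, which is why multiorder and compound zero-order plates are common—and why their retardation is so wavelength and temperature sensitive.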
6.3.4 Double Refraction
An oblique beam entering such a material from an isotropic medium splits into two
beams, because the different refractive indices give different angles of refraction by
Snell’s law. This phenomenon is called double refraction (which is what birefringence
means, of course).
6.3.5 Walkoff

Besides double refraction, birefringent materials exhibit walkoff, as shown in Figure 6.1.
Although the k vector behaves normally in a birefringent material, the Poynting vector
does not; the energy propagation of the e-ray is not parallel to k, but lies in the plane
defined by k and the optic axis, somewhere between them, so that the e-ray seems
to walk off sideways. This weird effect arises because the Poynting vector is parallel
to E × H. The tensor character of ε prevents E from being perpendicular to k,† and the
Figure 6.1. Polarizers based on beam walkoff: (a) simple walkoff plate or beam displacer and
(b) the Savart plate, a symmetrical walkoff plate.
†Unless D is an eigenvector of ε, that is, the light is a pure o-ray or D lies along the optic axis.
cross-product relation then forces S to be not along k. This effect has nothing to do with
double refraction; instead, it’s a spatial analogue of the phase velocity/group velocity
distinction for a pulse of light. A general beam normally incident on a planar slab of
birefringent material will split apart into two beams going in different directions. Double
refraction can’t cause this directly, since at normal incidence no refraction occurs. Oblique
beams walk off as well, but the effect is less obvious then. Now you know why it’s called the
extraordinary ray.
Aside: Defuzzing Filters. Very thin walkoff plates, usually LiNbO3 , are often used
in CCD cameras to reduce the disturbing moiré patterns due to the way the pixels are
arranged in color cameras (see Section 3.9.14). Two walkoff plates mounted at 90◦ to
one another, with a λ/4 plate in between, split an image point into an array of four points,
thus multiplying the OTF of the camera by cos(u dx) cos(v dy), where dx and dy are the
shift distances. This apodization rolls off the OTF to zero at frequencies where the moiré
patterns are objectionable. (Sometimes quartz is used, but it has to be thicker, which
causes more aberration.)
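The OTF factor cos(u dx)cos(v dy) makes it easy to see what shift distance is needed. A sketch, assuming a pixel pitch of 5 μm: with the text's convention, a shift dx of half the pitch puts the first zero of the cosine exactly at the Nyquist angular frequency u = π/p (i.e., the two spots are separated by one full pitch):

```python
import numpy as np

p = 5e-6        # pixel pitch (assumed, 5 um)
dx = p / 2      # walkoff shift that nulls the OTF at Nyquist
u = np.pi / p   # Nyquist angular spatial frequency, rad/m

# the two-plate filter multiplies the OTF by cos(u*dx)*cos(v*dy);
# at Nyquist this factor vanishes, killing the moire-producing frequencies
print(np.cos(u * dx))
```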
6.3.6 Optical Activity
A birefringent material has different refractive indices for different linear polarizations.
A material that has some inherent helicity, such as a quartz crystal or a sugar solution,
has different indices for different circular polarizations (helical antennas respond more
strongly to one circular polarization than to the other). The different coupling between the
bound electrons and the radiation field gives rise to a slightly different index of refraction for the two helicities. As a linearly polarized wave propagates in such a medium,
E changes direction; if there is no birefringence, E describes a very slow helix as it
propagates. This is called optical activity or circular birefringence.
Noncentrosymmetric crystals such as α quartz and tellurium dioxide may exhibit both
optical activity and birefringence; this combination makes the polarization effects of a
random hunk of crystal hard to predict.
If you put a mirror on one side of a piece of isotropic but optically active material
(e.g., a cuvette of sugar water), the linear polarization that comes out is exactly the same
as the one that went in; the rotation undoes itself. This is because the helicity reverses
itself on reflection—each component crosses the material once as left circular and once
as right circular, so that their total delays are identical, and the original polarization
direction is restored.
The effects of optical activity are fairly weak but highly dispersive; for a 90◦ rotation
in α quartz, you need 2.7 mm at 486 nm and 12 mm at 760 nm; this is around 100
times weaker than the effect of birefringence, so it dominates only when the light is
propagating right down the optic axis. The dispersion is occasionally useful (e.g., in
separating laser lines with low loss), but since it’s an order of magnitude higher than a
zero-order wave plate’s, optical activity isn’t much use with wideband light except for
making pretty colors.
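The numbers just quoted give the rotatory power of α quartz directly, and show how strong the dispersion is:

```python
# rotatory power of alpha quartz from the figures in the text:
# 90 deg of rotation in 2.7 mm at 486 nm, and in 12 mm at 760 nm
rho_486 = 90 / 2.7   # deg/mm at 486 nm
rho_760 = 90 / 12    # deg/mm at 760 nm
print(rho_486, rho_760, rho_486 / rho_760)
```

A factor of more than 4 between 486 and 760 nm is what makes optical activity useful for separating laser lines, and useless for wideband retarder-style applications.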
Due to the columnar morphology of coatings (see Section 5.4.7), it is possible to
make artificial circular birefringent coatings by evaporation. The substrate is held at an
angle to the source and slowly rotated about the source–substrate axis, producing helical
columns that are highly optically active.
6.3.7 Faraday Effect
Another effect that leads to the slow rotation of E is the Faraday or magneto-optic
effect, which is often confused with optical activity because the effects are superficially very similar. Terbium-doped glasses and crystals such as terbium gallium garnet
(TGG), immersed in a magnetic field, rotate the polarization of light propagating parallel
to B by

θ = V B l,

where θ is the rotation angle, l is the path length, B is the axial magnetic field, and
V is a material property called the Verdet constant. The difference here is that there is
a special direction, defined by B. Heuristically, if you imagine that the application of
B starts a bunch of bound currents going around in circles, then what matters is not
the helicity but whether E is rotating the same way as the currents or not, because the
dielectric susceptibility will be different in the two cases.
The key point is that the rotation direction does not change on reflection. If we put our
mirror at one side of the magneto-optic material, E keeps going round the same way on
both passes, so the helicity change on reflection does not make the delays equal; Faraday
rotation doubles in two passes, instead of canceling—it is said to be nonreciprocal . This
property allows us to build optical isolators, which allow light to propagate one way but
not the other, and Faraday rotator mirrors, which help undo the polarization nastiness of
optical fibers.
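To get a feel for the magnitudes, here is a sketch of the isolator math, θ = VBl, assuming a nominal Verdet constant for TGG near 633 nm and a 1 T axial field (both assumed values—check your material's data):

```python
import numpy as np

V = 134.0   # |Verdet constant| of TGG near 633 nm, rad/(T*m) (assumed)
B = 1.0     # axial magnetic field, T (assumed)

# a Faraday isolator needs 45 deg of single-pass rotation
l = (np.pi / 4) / (V * B)
print(l * 1e3)   # length in mm, roughly 6 mm of TGG

# nonreciprocity: the return pass adds rather than cancels,
# so a double pass gives 90 deg of rotation
print(np.degrees(2 * V * B * l))
```

With 45° per pass, light returning through the rotator comes out rotated 90° and is dumped by the input polarizer—that's the whole trick of the isolator.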
6.4 Absorption Polarizers

Some materials exhibit polarization-selective absorption, as in Polaroid sunglasses. They
do it by anisotropic conductivity, which is what you’d expect given the close relationship
between conductivity and the imaginary part of n.
6.4.1 Film Polarizers
Film polarizers are made of anisotropically conductive polymer: stretched polyvinyl alcohol (PVA) doped with iodine. They work throughout the visible, but deteriorate in the
infrared, and are useless in the ultraviolet since PVA is strongly absorbing there. There
are several different kinds, for different wavelength intervals, but the good ones absorb
about 20–40% of the allowed polarization and have open/shut ratios of 10⁴. The selectivity of older types used to degrade significantly in the blue, but the newer ones are much
better. Their wavefront quality is relatively poor (about like window glass, 2λ/inch), and
they have a very low damage threshold, only 1 W/cm² or so.
In order not to be limited by the wavefront wiggles, put the polarizer near the image.
An image formed in thermal light has very small phase correlations between points
to begin with (since the phase of object points further than λ/NA apart is essentially
uncorrelated), so phase wiggles are not much of a worry at an image.
6.4.2 Wire Grid Polarizers
Wire grids, which despite their name are arrays of very thin, closely spaced, parallel
wires, function well in the mid- and far-infrared, but the difficulty of making the pitch
fine enough prevents their use in the visible—such a structure is obviously a diffraction
grating, so the pitch has to be fine enough that the first diffracted order is evanescent (see
Section 7.2). Their open/shut ratios are usually about 10², and they absorb or reflect about
50% of the allowed polarization. They also reflect some of the rejected polarization, but
how much depends on the metal and the geometry. Shorter-wavelength grids are usually
lithographically deposited, so you have to worry about substrate absorption as well.
6.4.3 Polarizing Glass
A development of the wire grid idea is the dichroic† glass Polarcor, made by Corning. It is an optical glass with small metallic silver inclusions. During manufacture, the
glass is stretched along one axis, which transforms the inclusions into small whiskers,
aligned with the axis of the stretch. These whiskers function like little dipole antennas
and are highly absorbing in a relatively narrow band. At present, Polarcor is best used
in the near-infrared (out to 1.6 μm) but is available down to 600 nm in the visible
(transmittance deteriorates somewhat toward short wavelengths). It has excellent transmission in one polarization (70–99%), and excellent extinction in the other (≈10⁻⁵), so
that its open/shut ratio is comparable to that of crystal polarizers. It has a wide (±30◦ )
acceptance angle and good optical quality—though there are a few more striae than in
ordinary optical glass, as one would expect. Polarcor’s laser damage threshold is lower
than calcite’s—if you hit it too hard, the silver grains melt and lose their shape. The
threshold is around 25 W/cm² CW, or 0.1 J/cm² pulsed.
6.5 Brewster Polarizers

At intermediate angles of incidence, reflections from dielectric surfaces are fairly strongly
polarized. At Brewster’s angle, Rp goes to 0, and for glass Rs ≈ 10% per surface. The
effect in transmission is not strong enough to qualify as a polarizer unless it’s used
intra-cavity, but can be enhanced by doing it many times. A pile of microscope slides
at 55◦ or so makes a moderately effective polarizer, and (as we saw in Section 5.4.4) a
(HL)m H stack of dielectric films can be highly effective.
6.5.1 Pile-of-Plates Polarizers
Assuming that the light is low enough in coherence that etalon fringes can be ignored,
m glass plates stacked together and oriented at θB will attenuate the s-polarized light by
a factor of 0.8^m, which for 31 plates amounts to 10⁻³, ideally with no loss at all in the p
polarization. This nice property is of course degraded as θi departs from θB , but it’s useful
over a reasonable angular range. The transmitted wavefront fidelity of a pile-of-plates
polarizer is poor because of accumulated surface error and multiple reflections of the
s-polarized beam between plates. The reflected beam is even worse; surface error affects
reflected light more than transmitted, and the multiple reflections are not superimposed.
The only real advantages are high power handling capability and ready availability of materials.
†The word dichroic has been given so many different meanings in optics that it’s now next to useless.
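The 0.8-per-plate rule of thumb comes straight from the Fresnel formulas. A sketch, assuming a glass index of 1.4 (for n = 1.5 the per-surface s-reflectance is closer to 15%):

```python
import numpy as np

n = 1.4                                    # assumed glass index
theta_b = np.arctan(n)                     # Brewster's angle
theta_t = np.arcsin(np.sin(theta_b) / n)   # refraction angle, Snell's law

# Fresnel s-reflectance at Brewster incidence, single surface
rs2 = (np.sin(theta_b - theta_t) / np.sin(theta_b + theta_t))**2
per_plate = (1 - rs2)**2                   # two surfaces per plate
print(rs2, per_plate)                      # ~0.105 per surface, ~0.80 per plate

# hence the 0.8^m rule: 31 plates leave about 1e-3 of the s light
print(per_plate**31)
```

At Brewster incidence θi + θt = 90°, so the sin(θi + θt) denominator is 1 and the s-reflectance is at its single-surface value while Rp vanishes.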
6.5.2 Multilayer Polarizers
Alternating layers of high and low refractive index can be made into an effective polarizer,
similar to the pile of plates but without the beam quality disadvantage. The similarity is
not perfect, because interference effects cannot be ignored in thin films, even for white light.
We saw this trick in Section 5.4.4 with polarizing cubes, but such a film stack can
also be deposited on a glass plate, forming a polarizing plate beamsplitter. Since there is
no optical cement, the damage threshold of these devices is high, making them a good
match for powerful pulsed lasers, such as ruby (694 nm) and Nd:YAG (1064 nm). They
are rarely used elsewhere, for three main reasons: Brewster incidence is very oblique,
so that the reflected light comes off at an odd angle; the angular alignment is critical
(as in all Brewster polarizers), and there is no obvious cue for rough alignment as there
is in polarizing cubes; and the large index discontinuity at the top surface of the film
reflects an appreciable amount of p-polarized light, making the polarization purity of the
reflected wave poor.
6.5.3 Polarizing Cubes
Next to film polarizers, the most common type of polarizer in lab drawers is the polarizing
beamsplitter cube, which we discussed at length in Sections 4.7.2 and 5.4.5. These are
superficially attractive devices that in the author’s experience cause more flaky optical
behavior than anything else, barring fiber.
6.6 Birefringent Polarizers

Birefringent materials can be used in various ways to make polarizers. The three main
classes use (best to worst) double refraction, walkoff, and TIR.
Crystal polarizers are usually made of calcite because of its high birefringence, good
optical properties, and reasonable cost. Quartz is sometimes used, but its optical activity
causes quartz prisms to exhibit weird polarization shifts versus wavelength, field angle,
and orientation—it matters which way round you use a quartz polarizer. None of these
crystal devices is cheap, so use them only where you need really good performance. The
CVI Laser catalog has an extensive discussion of polarizing prisms.
6.6.1 Walkoff Plates
A very simple polarizer or beam displacer based on beam walkoff (Section 6.3.5) can
be made from a thick plate of birefringent material whose optic axis is not parallel to
its faces, as in Figure 6.1. When a beam comes in near normal incidence, the o-ray
passes through such a plate nearly undeviated, whereas the e-ray walks off sideways.
The two are refracted parallel to the original incident beam when they leave the plate.
Walkoff plates are inexpensive, because single plane-parallel plates are easy to make,
and because no great precision is required in the orientation of the optic axis if only
the o-ray is to be kept. This technique is frequently used in optical isolators for fiber
applications, where cost is a problem and the angular acceptance is small. Note that
the optical path length seen by the two beams is very different, so using walkoff plates
as beamsplitters in white-light interferometers is difficult. The shift is a small fraction
of the length of the prism but works over a large range of incidence angles; thus the
étendue is small if you need the beams to be spatially separated, but large if overlap
is OK.
6.6.2 Savart Plates
The walkoff plate can be made more nearly symmetrical by putting two of them together
to make a Savart plate. These consist of two identical flat, square plates of quartz, calcite,
or LiNbO3 whose optic axes are oriented at 45◦ to the surface normal. The plates are
rotated 90◦ to each other and cemented together. (One plate’s optic axis lies in the plane
of the top edge and the other one’s in the plane of the side edge.)
An o-ray in the first plate turns into an e-ray in the second, and vice versa, so that the
two polarizations are offset by the same amount from the axis, in opposite directions,
and emerge parallel to their initial propagation direction. At normal incidence, they have
zero path difference and hence produce white-light fringes if they overlap.
Away from normal incidence, these are not zero path difference devices; since the e-ray
is polarized at a large angle from the optic axis, the path difference changes linearly with
angle, rather than quadratically as in a Wollaston prism. Section 19.1.1 has an example
where this seemingly obscure point caused a disaster.
6.7 Double-Refraction Polarizers

Double-refraction polarizers exploit the different index discontinuity seen by e- and o-rays
at an interface. Generally they have excellent performance, but like other refracting prisms
their deflection angles change with λ, and they anamorphically distort the beam to some extent.
6.7.1 Wollaston Prisms
A Wollaston prism consists of two wedges of calcite, with their optic axes oriented
as shown in Figure 6.2a (the diagram shows a Wollaston prism made from a positive
uniaxial crystal such as quartz). A beam entering near normal incidence is undeviated
until it encounters the buried interface. There, the e-ray will see the index go down at
the surface and so will be refracted away from the normal, whereas the o-ray will see
an index increase and be refracted toward the normal by nearly the same amount. (The
bending goes the other way for negative uniaxial crystals.) Both beams hit the output
facet at a large angle and so are refracted away from the axis.
Figure 6.2. Double-refraction polarizers have the best extinction and purity of any type. (a) Wollaston prisms have no etalon fringes. (b) Rochon prisms have one undeviated beam.
The result is a polarizing prism of unsurpassed performance: polarization ratios of
10⁻⁶, nearly symmetrical beam deviation, and, crucially, no internal back-reflection to
cause etalon fringes. The beams are not perfectly symmetrical because of the asymmetric
incidence on the buried interface. The phase shift between beams is linear in y, independent of x, and varies only quadratically in the field angle, since the optic axis lies in
the plane of the prism faces, making Wollastons good for interferometers. You can find
a lateral position where the OPD between the two beams is zero, and moving the prism
sideways makes a nice phase vernier. Quartz Wollastons have beam separations of 1◦ to
3.5◦ , while calcite ones are typically 10◦ to 20◦ , and special three-element calcite ones
can achieve 30◦ . Wollastons have excellent étendue on account of their wide angular acceptance.
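The beam separation scales with the birefringence and the wedge angle. A sketch using the commonly quoted small-angle estimate of 2|ne − no| tan α for the full divergence (an approximation, and the Δn values for quartz and calcite are assumed nominal ones):

```python
import numpy as np

def split_angle(delta_n, alpha):
    # small-angle estimate of the full divergence between the two
    # output beams of a Wollaston prism with wedge angle alpha
    return 2 * abs(delta_n) * np.tan(alpha)

# quartz (delta_n ~ +0.009) vs calcite (delta_n ~ -0.17), 15 deg wedges
for dn in (0.009, -0.17):
    print(np.degrees(split_angle(dn, np.radians(15))))
```

With the same wedge angle, calcite splits the beams nearly twenty times further than quartz, which is why quartz Wollastons cluster near 1° and calcite ones near 10°–20°.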
6.7.2 Rochon Prisms
Table 6.1 shows that the refractive indices of calcite are in the range of optical glass.
You can thus make a Wollaston-like prism by using one calcite wedge and one glass one
(with nglass ≈ no ), as shown in Figure 6.2. The difference is that one beam is undeviated,
as in a Glan–Thompson. Rochon prisms suffer from severe etalon fringes due to the
undeviated path, but the polarization purity is similar to a Wollaston, and because of
the larger angular acceptance of double refraction, you can use the Rochon tipped fairly
far to avoid the back-reflection. The undeviated beam is an o-ray and so has no major
chromatic problems.
Some Rochons (and the closely related Senarmont prisms) are made with the glass
wedge replaced by a chunk of calcite with its optic axis normal to the output facet, so
that both rays see nearly the o-ray index in the second wedge. It has the optical properties
of the Rochon without its cost advantage, but is better color-corrected and less likely to
delaminate due to differential thermal expansion.
Because of the variations of ne with incidence angle, a prism (like the Rochon) that
transmits the o-ray undeviated is probably superior in imaging applications, as it is easier
to correct the resulting aberrations, particularly astigmatism, anamorphic distortion, and
chromatic aberration.
Aside: Quartz Rochon Prisms. Occasionally a Rochon prism will surface that has
both wedges made of α quartz; unfortunately, optical activity in the output wedge will
TABLE 6.1. Properties of Common Birefringent Materials: n (= n⊥), n∥ − n⊥, and transmission range for magnesium fluoride (MgF2), α crystal quartz (SiO2), sapphire (Al2O3), potassium dihydrogen phosphate (KDP), ammonium dihydrogen phosphate (ADP), barium titanate (BaTiO3), lithium niobate (LiNbO3), calcite (CaCO3), and rutile (TiO2).
Figure 6.3. Wollaston prisms have a variety of uses: (a) beam splitting, (b) beam combining, (c)
heterodyne interferometry, and (d) solid Fourier transform spectrometer.
severely degrade the polarization purity—not only rotating it, but rotating light at different locations and wavelengths by different amounts.
6.7.3 Cobbling Wollastons
Wollaston prisms are usually used as etalon-fringe-free polarizing beamsplitters, but that
doesn’t exhaust their usefulness. They can of course be run backwards as beam combiners,
where two input beams are combined into a single output beam. More generally, a
Wollaston can be used to make two beams of orthogonal polarization cross outside the
prism, as shown in Figure 6.3.
Due to the slightly unequal angular deviations of the two beams, Wollaston prisms
have a path difference that depends on position; an interferometer can have its operating
point† adjusted by moving the prism back and forth slightly in the direction of the
wedges. Note that if you’re using the prism to combine beams of any significant NA,
this asymmetry makes the plane of the fringes not exactly perpendicular to the axis, so
that some in–out motion is needed to maintain the same average optical path in the two
beams as you move the prism sideways.
Wollastons are quite dispersive, so they aren’t especially useful for imaging in white
light unless the beam separation is very small, as in schlieren interferometry, which takes
advantage of the colored fringes formed by very small-angle Wollastons to make images
of weak phase objects such as air currents.
Example 6.1: Solid Fourier Transform Spectrometer. Interferometers based on Wollaston prisms have been used fairly widely. One interesting approach is static Fourier
transform interferometry, similar to an FTIR spectrometer (see Section 10.5.6) but with
no moving parts. The wide angular acceptance of Wollastons makes high étendue interferometers easy to build. The limiting factor in such interferometers is the birefringence
†The operating point of an interferometer, amplifier, or what have you is the nominally stable point of a nonlinear response about which the (supposedly small) signal excursions occur.
of the plates, which makes the transmission phase of an off-axis ray a peculiar function
of the angle and limits throughput. Using a positive uniaxial crystal (e.g., quartz) for the
splitting Wollaston and a negative uniaxial one (e.g., ammonium dihydrogen phosphate)
for the recombiner results in the birefringence canceling out, so that the full étendue is
available.†
6.7.4 Nomarski Wedges
A Nomarski wedge is a modified narrow-angle Wollaston prism. In the Wollaston, the
beams diverge from the middle of the prism, so that building an interference microscope
requires the microscope objective’s exit pupil to be outside the lens, whereas it’s usually
just inside. The Nomarski prism has the optic axis of one of the wedges oriented out
of the plane of the end face; thus the e-ray walks off sideways far enough that its exit
pupil (where the e- and o-rays cross and the white-light fringes are located) is outside
the prism. In a symmetrical system using two Nomarski wedges, the path difference
between the two beams is zero, as is required for achromatic differential interference
contrast (DIC) measurements.
6.7.5 Homemade Polarizing Prisms
The one serious drawback to birefringent prisms is that they’re expensive, especially
in large sizes. You can make an adjustable prism similar to a Wollaston of a few milliradians by bending a bar of plastic such as polycarbonate. Stress birefringence splits
the polarizations, and the bending causes the bar to become trapezoidal in cross section
(by the local strain times Poisson’s ratio) so that the two polarizations are refracted in
slightly different directions. This has been used in schlieren interferometers.‡ The material creeps, so these prisms aren’t very stable, and cast plastic isn’t too uniform, so they
have to be used near an image.
The third major class of birefringent polarizers is based on TIR at a thin layer between
two prisms of birefringent material. Because ne and no are different, the e- and o-ray
critical angles will be different as well, so that we can transmit one while totally reflecting
the other. In a sufficiently highly birefringent material, the difference is large enough to
be useful, although TIR polarizers always have a much narrower angular acceptance
than double-refraction ones. The exit angles are set by o-ray reflections, so they are
pretty well achromatic as long as the exit face is close to perpendicular to the beam (see
Section 6.8.1).
The small angular acceptance leads to small étendue, and there are some familiar drawbacks such as poor polarization purity in the reflected beam.§ A more subtle problem is
† D. Steers, B. A. Patterson, W. Sibbett, and M. J. Padgett, Wide field of view, ultracompact static Fourier
transform spectrometer. Rev. Sci. Instrum. 68(1), 30–33 (January 1997).
‡ S. R. Sanderson, Rev. Sci. Instrum. 76, 113703 (2005).
§ It’s a bit more complex than in a polarizing cube, because an oblique reflection at the TIR surface can mix
e- and o-rays. Real prisms are carefully cut to avoid this effect.
that since calcite has a negative birefringence, it’s the e-ray that is transmitted undeviated, and in imaging applications, its wavefronts are aberrated by the variation of ne with
angle (see the earlier Rochon prism discussion). All in all, TIR polarizers are inferior to
double-refraction types for most uses.
6.8.1 Refraction and Reflection at Birefringent Surfaces
When calculating the behavior of obliquely incident light at the surface of a birefringent
material, life gets somewhat exciting unless the optic axis is perpendicular to the plane of
incidence. When doing this sort of problem, remember that phase is phase is phase—you
calculate phase matching at the interface based on the k vector of the incoming beam,
period. The angle of reflection is a consequence of the fundamental physics involved,
namely, the phase matching condition, which remains in force.
For example, imagine making the calcite prism of Figure 6.4a out of MgF2 , so that
the e-ray is now the one reflected. Light coming in s-polarized is a pure o-ray, but the
p-polarized light starts out as a high index e-ray and winds up as a low index e-ray (if
the wedge angle were 45◦ it would wind up as an o-ray). Thus the value of k changes,
so the “law of reflection” is broken: θr ≠ θi .
6.8.2 Glan–Taylor
As shown in Figure 6.4a, we can make a TIR polarizer from a simple triangular calcite
prism, with the optic axis lying parallel to one edge of the entrance face, and with a
wedge angle α whose complement lies between the e- and o-ray critical angles. It has
calcite’s wide transmission range (220–2300 nm), and because there are no cemented
joints, its laser damage threshold is high, 100 W/cm2 or so. This simple device has some
serious disadvantages, too; as in the beamsplitter cube, the reflected polarization purity
is poor, but there are more. The transmitted beam exits near grazing, because the two
refractive indices are not very different; it is anamorphically compressed, which is usually
Figure 6.4. TIR polarizing prisms: (a) simple calcite prism (ne < no ); (b) quartz Glan–Taylor
(ne > no ) adds an air-spaced second prism to straighten out the transmitted beam; and (c) calcite
Glan–Thompson (ne < no ) uses cement with ne < n < no .
undesirable, and it also exhibits chromatic dispersion. The angle of incidence can range
only between the e- and o- ray critical angles, which limits the field severely.
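A quick numerical sketch shows how narrow that field really is; the calcite indices below are nominal visible-light handbook values (an assumption), so treat the exact numbers as illustrative.

```python
# Internal critical angles for the o- and e-rays in calcite: the TIR surface
# of the prism in Figure 6.4a only polarizes for internal incidence angles
# between these two, which is why the field is so limited.
import math

n_o, n_e = 1.658, 1.486   # calcite near 590 nm (negative uniaxial, ne < no); assumed values

theta_c_o = math.degrees(math.asin(1.0 / n_o))  # o-ray TIR onset
theta_c_e = math.degrees(math.asin(1.0 / n_e))  # e-ray TIR onset
window = theta_c_e - theta_c_o                  # usable angular window

print(f"o-ray critical angle: {theta_c_o:.1f} deg")   # ~37.1 deg
print(f"e-ray critical angle: {theta_c_e:.1f} deg")   # ~42.3 deg
print(f"angular window: {window:.1f} deg")
```

The window of roughly 5 degrees is in the high-index medium, and it shrinks further once you budget for wavelength and manufacturing tolerances.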
If we put two such prisms together, as in Figure 6.4b, we have the Glan–Taylor prism.
Here the first prism does the polarizing, and the second one undoes the beam deviation
and almost all the chromatic dispersion. The Glan–Taylor keeps the wide transmission
and high damage threshold of the calcite prism, but the stray light is quite a bit worse
due to multiple bounces in the air gap, where the reflection coefficient is high due to the
oblique incidence.
You don’t usually try to use the reflected beam from a Glan–Taylor for anything,
because its polarization is impure and changes with angle due to the birefringence of the
entrance prism. For laser applications, you can cut the prisms at Brewster’s angle for the
e-ray, which lies above the o-ray critical angle. The resulting Glan-laser prism reduces
the angular acceptance while improving the multiple reflections and stray light.
6.8.3 Glan–Thompson
The Glan–Thompson prism of Figure 6.4c is made by cementing two skinny calcite
wedges together, hypotenuse to hypotenuse, using cement with an index of about 1.52.
The superficial similarity to the Glan–Taylor is somewhat misleading; because nglue
is between the e- and o-ray indices, the e-ray cannot experience TIR, so the Glan–
Thompson has a much wider angular acceptance than the Glan–Taylor, even though it
is longer and skinnier.
As in the Glan–Taylor, the first prism does the polarizing, and the second straightens
out the transmitted beam, so that the transmitted beam is undeviated. Because the indices
of the calcite and the cement are not very different (no = 1.655, nglue ≈ 1.52), this
requires near-grazing incidence, making the Glan–Thompson prism rather long for its aperture.
The o-ray makes two or three TIR bounces in the entrance prism, so that’s a good place
for a beam dump; Glan–Thompson prisms are usually embedded in hard black wax (a
reasonable index match to the o-ray in calcite), so that only the undeviated beam emerges.
With a four-sided prism on the entrance side, the reflected ray can be allowed to leave
through its own facet, near normal incidence. This configuration is called a beamsplitting
Thompson prism and is quite a good device; the closer index match at the interface
makes the reflected polarization purer, and the reflected light doesn’t see any serious
birefringence since its polarization direction is unaltered. Nonetheless, Glan–Thompson
prisms share most of the disadvantages of polarizing cubes, including strong etalon fringes
and low damage threshold (≈1 W/cm2 ) due to the glue, and reflected light polarization
purity inferior to that of double-refraction polarizers.
The polarization of a monochromatic electromagnetic wave can be decomposed in terms
of two arbitrary orthonormal complex basis vectors. This means that, for example, a
linearly polarized light beam can be expressed as the sum of two orthogonal circularly polarized beams (of right and left helicity), and vice versa. The devices of this
section all use this property to apply different phase delays to different polarization states.
6.9.1 Wave Plates
Retarders or wave plates are the simplest application of birefringence. A uniaxial plate
of thickness d with its optic axis parallel to its faces will delay a normally incident
o-ray by
to = no d/c,
and similarly for the e-ray, so that the phases of the two are shifted by
δ = 2π(ne − no )d/λ.
Retarders based on this principle are called wave plates. When the retardation (ne − no )d is λ/4 for a given
wavelength, you have a quarter-wave plate, and when it’s λ/2, a half-wave plate; these
are the two most useful kinds. Note that in the absence of material dispersion, this
retardation is a pure time delay, so that the phase varies rapidly with λ. As we saw
in Chapter 4, there are also retarders based on the phase shift upon TIR whose phase
shift is nearly constant with λ. There are also achromatic wave plates made of multiple
polymer layers, whose phase shift is reasonably independent of λ. As usual, there is a
three-way trade-off between complexity (i.e., cost and yield), degree of achromatism,
and bandwidth.
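The delay and phase-shift relations above are easy to put numbers on; here the quartz birefringence is a nominal assumed value (ne − no ≈ 0.009 in the visible), so the thickness is illustrative rather than a datasheet figure.

```python
# Sketch of the wave-plate relations: delta = 2*pi*(ne - no)*d/lambda, and the
# thickness of a true zero-order quarter-wave plate in quartz.
import math

dn = 0.009            # quartz birefringence ne - no (assumed nominal value)
lam = 589e-9          # design wavelength, m

# Thickness giving a quarter wave of retardation (delta = pi/2):
d_quarter = lam / (4.0 * dn)
print(f"zero-order quarter-wave thickness: {d_quarter * 1e6:.1f} um")

# Retardation in waves for that thickness, as a sanity check:
delta_waves = dn * d_quarter / lam
print(f"retardation: {delta_waves:.3f} waves")  # 0.250
```

A plate a dozen-odd microns thick is unpleasantly fragile, which is the motivation for the multiorder and compound zero-order plates of Sections 6.9.5 and 6.9.6.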
6.9.2 Quarter-Wave Plates
Quarter-wave plates are good for converting linear to circular polarization and back.
Consider a λ/4 plate lying in the xy plane whose fast axis is x, with a plane wave
passing through it with k parallel to z. If E is at π /4, the e-ray and o-ray amplitudes will
be equal. These get out of phase by π /2 crossing the plate, so that the field exiting is
Eout = (E0 /√2)(x̂ cos ωt + ŷ sin ωt),
which is left-circular polarization (goes like a right-handed screw). Putting the slow axis
along x (or equivalently putting E at −π /4) makes right-circular polarization instead.
6.9.3 Half-Wave Plates
A half-wave plate delays one Cartesian component by half a cycle with respect to the
other, which reflects E through the fast axis. This is very useful where both the initial and
final polarization states are linear—you twist the wave plate until the polarization lines
up just right. Linear polarization stays linear, but with circular or elliptical polarization,
the helicity gets changed, so right and left circular are exchanged. Figure 6.5 shows how
this works.
Combining differently oriented retarders is most easily done with rotation matrices; a
retarder of βλ whose slow axis is at θ1 can be written
R(β, θ1 ) = ( cos θ1   −sin θ1 ) ( exp(i2πβ)   0 ) (  cos θ1    sin θ1 )
            ( sin θ1    cos θ1 ) (     0       1 ) ( −sin θ1    cos θ1 )
which we’ll come back to a bit later.
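The rotated-retarder product can be sketched in a few lines of plain Python (no libraries, 2 × 2 complex matrices as nested lists). As a check, a half-wave plate (β = 0.5) with its axis at 22.5° takes x-polarized light to linear polarization at 45°, i.e. it reflects E through the axis, as described above.

```python
# Jones matrix of a rotated retarder, built as rot(theta) * diag(exp(i*2*pi*beta), 1) * rot(-theta).
import cmath, math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rot(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def retarder(beta, theta1):
    """Jones matrix of a beta-wave retarder with slow axis at theta1 to x."""
    D = [[cmath.exp(1j * 2.0 * math.pi * beta), 0.0], [0.0, 1.0]]
    return matmul(matmul(rot(theta1), D), rot(-theta1))

def apply(M, E):
    return [M[0][0] * E[0] + M[0][1] * E[1], M[1][0] * E[0] + M[1][1] * E[1]]

hwp = retarder(0.5, math.radians(22.5))
Ex, Ey = apply(hwp, [1.0, 0.0])
# Up to an overall phase, (Ex, Ey) should be (1, 1)/sqrt(2):
print(abs(Ex), abs(Ey))  # both ~0.707
```

The same `retarder` function with β = 0.25 reproduces the quarter-wave behavior of Section 6.9.2.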
Figure 6.5. Retarders. Here two quarter-wave plates (n⊥ < n∥ ) turn a linearly polarized beam first
into circular, then into the orthogonal linear polarization, just as a half-wave plate would.
6.9.4 Full-Wave Plates
The retardation of a wave plate depends on its rotations about x and y, so that small
errors can be removed by tilting it. For a uniaxial material, the retardation is increased
by rotating it around the optic axis, since no and ne are unaltered and the path length is
increased; tipping the other way also increases it, because in almost all uniaxial materials,
|ne − no | declines more slowly than the path length grows.
A full-wave plate nominally does nothing at all, but in fact the dependence of the
retardation on the tipping of the plate makes it a sensitive polarization vernier control,
able to provide small amounts (<λ/8) of retardation to balance out other sources of
birefringence in the beam.
6.9.5 Multiorder Wave Plates
Much of the time, it is impractical to make the plate thin enough for such small retardations (a visible quarter-wave plate made of quartz would be only 20 μm thick). For
narrowband light propagating axially, it is sufficient that the retardation be an odd multiple of λ/2 for a half-wave plate or λ/4 for a quarter-wave plate. It is more practical to
make a 20.25λ plate (say), which has just the same effect.
Neglecting the change in ne with angle, the retardation goes approximately as the
secant of the incidence angle, so that a 20.25λ multiorder plate will have an étendue over
three orders of magnitude smaller than a 0.25λ zero-order one for a given retardation
tolerance, and chromatic and temperature shifts 80 times larger as well. This isn’t usually
a problem with laser beams used in the lab but is serious when you need to run a
significant aperture or work in field conditions. One good thing about multiorder wave
plates is that they can be made to work at more than one wavelength; for example, you
can get quarter-wave plates that work at 633 nm and 1064 nm. It’s easier to make them
when the wavelengths are significantly different. Such plates are often cut with the optic
axis slightly out of the plane of the end faces, to make the ratios come out just right with
a reasonable thickness. Small amounts of walkoff will result.
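The secant scaling quoted above is worth seeing in numbers; the sketch below uses the same neglect of the variation of ne with angle, so it overstates the accuracy slightly but gets the ratio right.

```python
# Tilt sensitivity of multiorder vs. zero-order wave plates, sec(theta) model
# (neglecting the change of ne with angle, as in the text).
import math

def retardation_error_waves(order_waves, tilt_deg):
    """Extra retardation (in waves) picked up on tilting the plate."""
    return order_waves * (1.0 / math.cos(math.radians(tilt_deg)) - 1.0)

tilt = 2.0  # degrees; an arbitrary example tilt
err_multi = retardation_error_waves(20.25, tilt)  # multiorder plate
err_zero = retardation_error_waves(0.25, tilt)    # true zero-order plate
ratio = err_multi / err_zero

print(f"multiorder: {err_multi:.5f} waves, zero-order: {err_zero:.6f} waves")
print(f"sensitivity ratio: {ratio:.0f}")  # 81
```

The ratio 20.25/0.25 = 81 is the same factor that multiplies the chromatic and temperature shifts.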
6.9.6 Zero-Order Wave Plates
The poor étendue of multiorder wave plates can be fixed by putting two of them together,
oriented at 90◦ to each other so that the aperture-dependent phase shifts largely cancel.
We might laminate a 20.38λ plate with a 19.88λ plate to make a zero-order 0.50λ plate.
6.9.7 Film and Mica
The cheapest retarders are made from very thin layers of mildly birefringent materials
such as PVA film or mica. These materials are easily prepared in very thin sheets, and
since only a small retardation is desired, only a little is needed. Unfortunately their
accuracy, uniformity of retardation across the field, and transmitted wavefront distortion
have historically been poor. On the other hand, because even the cheap ones are usually
zero-order devices, their dispersion and angular sensitivity are small. Polymer retarders
are enjoying a bit of a renaissance, because when carefully made (usually in two-layer
stacks) they can have unique properties, such as decent achromatism. In a low coherence
imaging system, you can get around their wavefront distortion by putting them near
a focus.
6.9.8 Circular Polarizers
The usual polarizer types resolve the incoming polarization state into two orthogonal
linear polarizations. This is not of course the only choice; since an arbitrary linear polarization can be built up out of right and left circularly polarized light with a suitable phase
shift between them, it follows that we can use these as basis vectors as well. Circular
polarizers are quite useful for controlling back-reflections (e.g., glare on glass panels).
They’re made by laminating a film polarizer to a polymer film quarter-wave plate, with
the fast axis of the plate at 45◦ to the polarization axis. One serious gotcha is that there’s
usually a retarder on only one side. If used in one direction, that is, with the polarizer
turned toward an unpolarized light source, this does result in a circularly polarized output and will indeed attenuate reflected light returned back through it; but it won’t pass a
circularly polarized beam through unchanged, nor will it work if it’s turned round. For
that you need two retarders, one on each side.
6.10.1 Basis Sets for Fully Polarized Light
We saw in Section 6.2.5 that light could be expressed in terms of Cartesian basis vectors,
the Jones vectors:
E⊥ = ( Ex ) = Ex x̂ + Ey ŷ.
     ( Ey )
A similar decomposition in terms of circularly polarized eigenstates is useful in discussing optical activity and Faraday rotation. A plane wave propagating toward positive
Z with complex electric field Ẽ⊥ can be decomposed as
Ẽ⊥ ≡ (ẼL /√2)(1, i)T + (ẼR /√2)(1, −i)T ,
TABLE 6.2. Jones Matrix Operators for Common Operations

Coordinate rotation of θ:              (  cos θ   sin θ )
                                       ( −sin θ   cos θ )

δ radian retarder, slow axis along x:  ( exp(iδ)   0 )
                                       (    0      1 )

Analyzer along x:                      ( 1   0 )
                                       ( 0   0 )

Analyzer at θ to x:                    ( cos2 θ        sin θ cos θ )
                                       ( sin θ cos θ   sin2 θ      )
where left- and right-circular components ẼL and ẼR are given by
ẼL = Ẽ⊥ · (1/√2)(1, −i)T and ẼR = Ẽ⊥ · (1/√2)(1, i)T .
Linearly polarized light has ẼL = exp(−i2φ)ẼR , where φ is the azimuthal angle of E measured from the x axis. Polarization gizmos can be represented as 2 × 2 matrix operators;
in Cartesian coordinates, the result is the Jones matrices shown in Table 6.2.
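The circular-basis decomposition above can be checked numerically in a few lines: project an arbitrary Jones vector onto the (1, ±i)/√2 basis with the plain (unconjugated) dot product of the defining equations, then rebuild it. The particular field values are arbitrary.

```python
# Round-trip check of the L/R circular decomposition of a Jones vector.
import math

s = 1.0 / math.sqrt(2.0)
E = [0.3 + 0.1j, -0.7 + 0.4j]   # arbitrary complex Jones vector

# Projections (plain dot product, no conjugation, per the defining equations):
EL = s * (E[0] - 1j * E[1])
ER = s * (E[0] + 1j * E[1])

# Reconstruction from the circular components:
E_rebuilt = [s * (EL + ER), s * (1j * EL - 1j * ER)]
print(E_rebuilt)  # equals E to round-off
```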
Like ABCD matrices, these are not powerful enough to model everything without
hand work; for example, treating reflections takes some care. We model optical activity
and Faraday rotation as coordinate rotations, but since one adds and the other cancels on
the way back, we have to be very careful about the bookkeeping; we’ll do an example
of this later when we consider the Faraday rotator mirror.
6.10.2 Partial Polarization and the Jones Matrix Calculus
Light can be partially polarized as well, most often by reflecting thermal light from a
dielectric. This polarization is often fairly strong, especially when the reflection takes
place near Brewster’s angle, as we know from the effectiveness of Polaroid sunglasses
in reducing glare from water, car paint, and glass. Rayleigh scattering also produces
partially polarized light; try your sunglasses on a clear sky—when you look at 90◦ to
the sun, the polarization is quite pronounced.
The vectors that are adequate for describing full polarization fail for partial polarization, which is an intrinsically more complicated situation. If the direction of k is
prespecified, we can express the polarization properties of a general narrowband beam
as a four-dimensional vector (the Stokes parameters, see Born and Wolf) or by a 2 × 2
coherency matrix .† As Goodman explains,‡ the coherency matrix formulation lets us follow the polarization state of our beam through the system by matrix multiplication of E⊥
by the accumulated operator matrices, written in reverse order, just the way we did ray
† The two are very closely related; the elements of the coherency matrix are linear combinations of the Stokes parameters.
‡ Joseph W. Goodman, Statistical Optics. Wiley, Hoboken, NJ, 1986.
TABLE 6.3. Coherency Matrices for Some Polarization States

Linear x:         E02 ( 1   0 )
                      ( 0   0 )

Linear y:         E02 ( 0   0 )
                      ( 0   1 )

Right circular:   (E02 /2) (  1   i )
                           ( −i   1 )

Left circular:    (E02 /2) ( 1   −i )
                           ( i    1 )

Unpolarized:      (E02 /2) ( 1   0 )
                           ( 0   1 )
tracing with the ABCD matrices, and it is easily connected to time-averaged polarization
measurements. The coherency matrix J is the time-averaged direct product ẼẼT∗ :
J = ⟨Ẽ ẼT ∗ ⟩ = ( ⟨Ẽx Ẽx∗ ⟩   ⟨Ẽx Ẽy∗ ⟩ )
                ( ⟨Ẽy Ẽx∗ ⟩   ⟨Ẽy Ẽy∗ ⟩ )
It’s easy to see from the definition that (up to a constant factor) Jxx and Jyy are
the real-valued flux densities you’d measure with an analyzer along x(I0◦ ) and y(I90◦ ),
respectively. The complex-valued Jxy is related to the flux density I45◦ that you get with
the analyzer at 45◦ , and the I′45◦ you get by adding a λ/4 plate with its slow axis along
y before the analyzer:
Jxy = I45◦ − (1/2)(I0◦ + I90◦ ) + i[I′45◦ − (1/2)(I0◦ + I90◦ )].
Table 6.3 has coherency matrices for some special cases.
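The relation between Jxy and the four analyzer measurements is easy to verify by simulation. The sketch below assumes a fully polarized beam and one particular λ/4 plate convention (slow axis along y multiplies Ẽy by exp(iπ/2)); with the opposite convention the sign of the imaginary part flips.

```python
# Simulate I(0), I(90), I(45), and I'(45) for a polarized beam and recover Jxy.
import math

E = [0.6, 0.8j]   # elliptical example: Ex real, Ey imaginary

def intensity_45(Ex, Ey):
    """Flux through an analyzer at 45 degrees."""
    a = (Ex + Ey) / math.sqrt(2.0)
    return abs(a) ** 2

I0 = abs(E[0]) ** 2                   # analyzer along x
I90 = abs(E[1]) ** 2                  # analyzer along y
I45 = intensity_45(E[0], E[1])
I45p = intensity_45(E[0], 1j * E[1])  # lambda/4 plate (slow axis y), then analyzer

Jxy_meas = (I45 - 0.5 * (I0 + I90)) + 1j * (I45p - 0.5 * (I0 + I90))
Jxy_direct = E[0] * E[1].conjugate()  # <Ex Ey*> for a deterministic field
print(Jxy_meas, Jxy_direct)  # should agree
```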
6.10.3 Polarization States
It is commonly held that when you superpose two beams, their Js add, but that assumes
that they are mutually incoherent, which is far from universally true. You’re much safer
sticking closer to the fields unless you know a priori that the waves you’re combining are mutually incoherent but still narrowband enough for the Jones matrix approach
to work.
A couple of mathematical reminders: a lossless operator L is unitary—all its eigenvalues are on the unit circle and LL† = I, that is, L† = L−1 , where the adjoint matrix
L† is the complex conjugate of the transpose, L† = (LT )∗ . These lists can be extended
straightforwardly by using the definition (6.15) and matrix multiplication. Remember that
although the operators multiply the fields (6.13) directly, applying a transformation to J
or L requires applying it from both sides; if E′⊥ = LE⊥ ,
J′ = ⟨E′ (E′ )∗T ⟩ = ⟨(LE)(L∗ E∗ )T ⟩ = LJL† .
It’s worth writing it out with explicit indices and summations a few times, if you’re
rusty—misplacing a dagger or commuting a couple of matrices somewhere will lead
you down the garden path otherwise.
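Writing it out numerically is a good way to avoid misplacing that dagger: the sketch below transforms a polarized beam two ways, once by operating on the field and forming J′ , and once by the two-sided product LJL† , and checks that they agree. The rotation operator and field values are arbitrary examples.

```python
# Check of the two-sided coherency-matrix transformation J' = L J L-dagger.
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    return [[A[0][0].conjugate(), A[1][0].conjugate()],
            [A[0][1].conjugate(), A[1][1].conjugate()]]

E = [0.6, 0.8j]
J = [[E[i] * E[j].conjugate() for j in range(2)] for i in range(2)]

t = math.radians(30.0)
L = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]  # a unitary (rotation) operator

# Transform the field, then form J'; compare with L J L-dagger:
Ep = [L[0][0] * E[0] + L[0][1] * E[1], L[1][0] * E[0] + L[1][1] * E[1]]
Jp_field = [[Ep[i] * Ep[j].conjugate() for j in range(2)] for i in range(2)]
Jp_matrix = matmul(matmul(L, J), dagger(L))

err = max(abs(Jp_field[i][j] - Jp_matrix[i][j])
          for i in range(2) for j in range(2))
print(f"max discrepancy: {err:.2e}")
```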
6.10.4 Polarization Compensators
Extending the zero-order quartz wave plate idea, we can make one plate variable in
thickness by making it from two very narrow wedges rather than a single plate, yielding
the Soleil compensator. Provided the two wedges are identical, the retardation is constant
across the field and can be adjusted around exactly 0 by sliding one of them, which is
a great help in wide-field and wideband applications, for example, looking at small
amounts of stress birefringence with a nulling technique. Exactly zero retardation is
important only in such cases; in narrowband, low NA systems, it’s enough to have the
retardation be 0 mod 2π , and these compensators are no better than a pair of quarter-wave
6.10.5 Circular Polarizing Film for Glare Control
Laminated circular polarizers of moderate performance can be made cheaply in very large
sizes, which makes them attractive as a glare-reduction device for instrument displays;
room light passing through the polarizer and then being reflected is absorbed on the
return trip. Before CRT screens were AR coated efficiently, circular polarizers were very
popular computer accessories, and they remain useful in similar situations.
6.10.6 Polarization Rotators
Optically active materials such as quartz or sugar solutions can be used for polarization
rotators. Those made from amorphous materials (e.g., Karo corn syrup) have no birefringence, and so linearly polarized light stays linearly polarized regardless of wavelength,
field angle, or what have you. They’re inconvenient to make, hard to adjust, and ten times
more dispersive than half-wave plates, so apart from special situations such as optical
diodes, they have few compelling advantages.
6.10.7 Depolarizers
It is impossible to reproduce the extremely rapid polarization changes of thermal light
when starting from a more coherent source such as a laser beam. Devices that purport
to depolarize light never do it well enough for that; they just produce a speckle pattern
varying more or less rapidly in time or space. If your experiment is slow enough, this
may suffice, but in fact it rarely does. Polarized light is something we just have to
live with.
There are two classes of depolarizers: wave plates whose retardation varies rapidly
across their faces (e.g., Cornu depolarizers), and moving diffusers, such as a disc of
ground glass spun rapidly on a shaft. A Cornu depolarizer can do a reasonable job on
wideband light, providing the 2π period of the polarization change is sufficiently rapid
and the spatial resolution sufficiently low.
Fixed depolarizers help to eliminate the mild polarization dependence of some optical instruments, for example, PMTs, grating spectrometers, and so on, when used with
broadband light that may be partially polarized. They do a good enough job for that.
The spinning ground glass technique often tried with laser beams is much less successful: all you get is a rapidly rotating speckle pattern, which causes a whole lot of
noise. Unlike the situation in Section 2.5.3, the order-1 changes in instantaneous intensity at any point are not smeared out over a bandwidth of hundreds of terahertz, but
concentrated in a few hundred kilohertz; the result is not pretty. The rotating speckle
pattern also produces undesired correlations in space. These correlations can be reduced
by using two discs rotating in opposite directions; if this is properly done, the speckles
no longer appear to rotate. Doing it really properly is not trivial, however, and anyway
the huge intensity noise remains. If you are forced to do your measurement this way,
be prepared to integrate for a long time; it is normally preferable to use polarization
flopping, where you do separate measurements in horizontal and vertical polarization
and combine the two, or use a rotating λ/2 plate and integrate for a whole number
of periods.
6.10.8 Faraday Rotators and Optical Isolators
Faraday rotators are straightforward applications of the Faraday effect, using a magneto-optic crystal such as YIG in a magnetically shielded enclosure with carefully placed,
stable permanent magnets inside providing a purely axial field in the crystal. The main
uses of Faraday rotators are in optical isolators and in transmit/receive duplexing when
the returned light is not circularly polarized, for one reason or another.
These devices, shown in Figure 6.6, all rely on nonreciprocal polarization rotation.
The simple isolator uses two walkoff plate polarizers, oriented at 45◦ to one another, and
a 45◦ Faraday rotator. Light making it through the first rotator gets its E rotated through
45◦ on the first pass, so that it is properly oriented to pass through the second polarizer
without loss. Light coming the other way has to make it through the second polarizer
and is then rotated 45◦ in the same direction, putting it at 90◦ to the first polarizer, so
that none gets through. Ideally the isolation would be perfect, but it is more typically
30 dB per isolator, with a loss of about 1 dB.
Figure 6.6. Two polarizers plus a 45◦ Faraday rotator make an optical isolator.
This simplest Faraday isolator requires fully polarized input light, but polarization-insensitive ones can also be made; since you have to use two different paths, it isn’t
trivial to preserve the input polarization in the process, unfortunately.
The most important uses of Faraday isolators are preventing feedback-induced instability in diode lasers and preventing high-finesse Fabry–Perot cavities from pulling
the frequency of sources being analyzed as the F-P scans.
It is possible to build a circulator, an M-port device where the input on port m goes
out port m + 1 (mod M). Circulators are common in microwave applications but rare in
optics. A related device is the optical diode, a 45◦ Faraday rotator plus a −45◦ optically
active cell, used in ring laser cavities; only one propagation direction is an eigenstate of
polarization, so that the ring lases in one direction only (the other one gets killed by the
Brewster windows).
6.10.9 Beam Separators
A polarizing beamsplitter such as a cube or Wollaston, plus a λ/4 plate, makes a beam
separator, very useful for separating the transmit and receive beams in an interferometer or
scanning system. The wave plate is aligned at 45◦ to the axes of the beamsplitter, as shown
in Figure 6.7. On the transmit side, p-polarized light passes through the prism and gets
turned into left-circular polarization. Specularly reflected light comes back right-circular,
and so the λ/4 plate turns it into s-polarized light in the cube, which is reflected. If
the components are lossless, the wave plate accurate, and the incoming light perfectly
p-polarized, the beam suffers no loss whatever in its round trip.
6.10.10 Lossless Interferometers
In Section 4.8.1, we saw that sending light on two passes through a nonpolarizing beamsplitter costs you a minimum of 75% of your light. That’s 93.75% of your detected
electrical power, representing an SNR degradation of 6 dB in the shot noise limit and
12 dB in the Johnson noise limit—and that’s in the best case, with a beamsplitter without
excess loss.
Figure 6.7. A polarizing beamsplitter plus a quarter-wave plate make a beam separator, able to
disentangle the transmit and receive beams of an interferometer or scanning system.
If we have a polarized source such as a laser, we can use a beam separator to split and
recombine the light as shown in Figure 1.12. The only problem is that the two beams
are orthogonally polarized and so don’t interfere. The usual solution to this is to put an
analyzer at 45◦ to the two beams, resulting in 100% interference but 6 dB detected signal
loss. However, by using a polarizing beamsplitter oriented at 45◦ , detecting the two pairs
of beams separately, and subtracting the resulting photocurrents, we get the equivalent
of 100% interference with no signal loss, as in the ISICL sensor of Example 1.12.
6.10.11 Faraday Rotator Mirrors and Polarization Insensitivity
As an application of the Jones matrix calculus, let’s look at the Faraday rotator mirror,
which is widely used in fiber optics. It consists of a 45◦ Faraday rotator in front of a
mirror, so that the light passes twice through the rotator, and so the total rotation is 90◦ .
We have to complex-conjugate the beam to represent the mirror, because the helicity
changes and in this model we’ve no way of expressing the propagation direction.
In operator representation, this is
E′ = Rπ/4 (Rπ/4 E)∗ = Rπ/2 E∗ ,

⇒ E′ · E∗ = ( Ex∗   Ey∗ ) (  0   1 ) ( Ex∗ ) = 0,
                          ( −1   0 ) ( Ey∗ )
that is, the light coming back is polarized orthogonally to the light coming in, regardless
of the incoming polarization —it works for circular and elliptical, as well as linear. It’s
obviously orthogonal if the polarization exiting the fiber is linear (courtesy of the Faraday
rotation) or circular (courtesy of the mirror). Elliptical polarizations have their helicity
inverted by the mirror, and their major axis rotated 90◦ by the Faraday rotator.
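The orthogonality claim above holds for any input state, which is quick to verify numerically: apply the conjugate-then-rotate model of the Faraday mirror to an arbitrary elliptical polarization and check that the inner product with the conjugated input vanishes.

```python
# The Faraday-mirror relation E' = R(pi/2) E* sends any polarization to the
# orthogonal one: E'.E* = 0 for arbitrary E.
import math

def rot(t):
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

E = [0.3 + 0.1j, -0.7 + 0.4j]   # arbitrary elliptical polarization
R = rot(math.pi / 2.0)

Ec = [E[0].conjugate(), E[1].conjugate()]          # mirror: conjugate the field
Ep = [R[0][0] * Ec[0] + R[0][1] * Ec[1],           # then rotate 90 degrees
      R[1][0] * Ec[0] + R[1][1] * Ec[1]]

inner = Ep[0] * Ec[0] + Ep[1] * Ec[1]   # E' . E*; zero for any input E
print(abs(inner))
```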
The power of this is that the polarization funnies encountered by the beam, that is,
birefringence and optical activity, are all unitary operations, so the incoming and outgoing
polarizations remain orthogonal everywhere, as long as they traverse the same path. That
means that our optical fiber can misbehave as much as it wants, in theory, and as long
as we’ve got a Faraday mirror at the other end, the round-trip light comes out polarized
orthogonally to the incoming light; if we send in vertically polarized light, it comes out
horizontally polarized, no matter how many waves of birefringence it encountered. This
doesn’t work quite as well as we’d like, because the accuracy requirements are very high
and it ignores scattering, multiple reflections, and transients. Nonetheless, we can build
more-or-less polarization-insensitive fiber interferometers this way.
A slightly more subtle benefit is that the propagation phase is polarization insensitive.
A lossless fiber has two orthogonal eigenmodes. If we decompose any incoming polarization into these modes, we find that the Faraday rotator mirror exchanges the components
in the two eigenmodes, so that the total round-trip phase delay is the sum of the one-way
delays of the eigenmodes. You do have to think about Pancharatnam’s phase, though
(see Section 6.2.4).
Exotic Optical Components
To know a thing well, know its limits. Only when pushed beyond its tolerance will its true
nature be seen.
—Frank Herbert, Dune
The mundane optical tasks of directing and transforming beams are mostly done with
lenses and mirrors, with the occasional polarizer or wave plate thrown in, and this
is enough for cameras and microscopes. They’re the workhorses: steady, dependable,
and reasonably docile, but not very fast. Building advanced instruments is more like
racing. The more exotic components like fibers, gratings, scanners, and modulators are
thoroughbreds—they’re good at what they’re good at, and aren’t much use otherwise.
(Watch your step in this part of the paddock, by the way.) Fibers are left until Chapter 8,
but there are lots of connections between there and here.
A diffraction grating is an optical surface with grooves in it, shaped and spaced so as to
disperse incident polychromatic light into a sharply defined spectrum. There are lots of
variations, but they’re all basically holograms—the grooves reproduce the interference
pattern between an incident monochromatic beam and the desired diffracted beam, and
so form an optical element that transforms the one into the other. Some gratings are
made holographically, and some are ruled mechanically, but the operating principle is
the same.
The most common type is the classical plane grating, a flat rectangular surface with
equally spaced grooves running parallel to one edge. Phase matching at the surface
governs their operation; as with a planar dielectric interface, this condition can be satisfied
over a broad continuous range of incident k vectors.
There are also Bragg gratings, where the grating structure runs throughout some volume, a more complex structure that is important in fiber devices, holograms, acousto-optic
cells, and some diode lasers, as we’ll see later in this chapter.
Figure 7.1. Plane diffraction grating.
Diffraction Orders
We can make this a bit easier to understand by looking at the special case of Figure 7.1.
A metallic plane grating lies in the xy plane, with G sinusoidal grooves per unit length,

h(x, y) = a sin(2πGx),

where a ≪ λ, and G, kx, and ky are small compared to k (i.e., a weak grating of low spatial
frequency). A plane wave exp(ikinc · x) hits the surface and gets its phase modulated by
the changing height of the surface. If the medium is isotropic, linear, and time invariant,
the modulus of k can’t change,† so we’ve got a pure spatial phase modulation, as in
Section 13.3.7. Thus the phase modulation produces mixing products (see Chapter 13)
with wave vectors kDm,

kDm = ki + m kG,    (7.2)

where kG = 2πG x̂ and m = …, −2, −1, 0, 1, 2, …. Equation (7.2) can also be derived
immediately from the phase matching condition: because the surface is periodic, the fields
have to be invariant (apart from an overall phase) if we shift it by an integer number of
cycles. Only a finite range of m produces propagating waves at the output—only those
whose kxm² < k² − ky²; that is,

−√(k² − ky²) − kxi ≤ 2πmG ≤ √(k² − ky²) − kxi.    (7.3)
Although we’ve kept ky in this formula, gratings are nearly always oriented to make
ky as small as possible, so that it enters only quadratically in θd . Since G is wavelength
independent, we can solve (7.2) (in the special case ky = 0) for λ, yielding the grating
equation,‡

λm = (sin β − sin α)/(mG),

where α is the angle of incidence and β the angle of diffraction.
† When we get to acousto-optic modulators, this won't be true anymore, and frequency shifts will occur.
‡ You sometimes see it used with the other sign convention, so that there is a plus sign: u = sin θd + sin θi;
in any event, specular reflection (m = 0) has u = 0.
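As a quick numerical check on these relations, here is a minimal sketch (the function name and the 2/(Gλ) search bound are my own, not from the text) that enumerates the propagating orders of Eq. (7.2) subject to the range (7.3), for the usual ky = 0 case:

```python
import math

def diffraction_orders(wavelength_m, lines_per_m, theta_i_deg):
    """Propagating diffraction orders of a plane grating (ky = 0).

    Phase matching, Eq. (7.2), gives sin(theta_d) = sin(theta_i) + m*G*lambda;
    an order propagates only while |sin(theta_d)| <= 1, which is the range (7.3).
    """
    sin_i = math.sin(math.radians(theta_i_deg))
    orders = {}
    # |m| can never exceed 2/(G*lambda), so scan that range and test each order
    m_max = int(2.0 / (lines_per_m * wavelength_m)) + 1
    for m in range(-m_max, m_max + 1):
        sin_d = sin_i + m * lines_per_m * wavelength_m
        if abs(sin_d) <= 1.0:
            orders[m] = math.degrees(math.asin(sin_d))
    return orders

# 1200 lines/mm grating, 500 nm light, 30 degrees incidence:
# only m = 0, -1, -2 propagate
orders = diffraction_orders(500e-9, 1200e3, 30.0)
```

For this grating, Gλ = 0.6, so m = +1 would need sin θd = 1.1 and is evanescent, while m = −1 and m = −2 emerge at about −5.7° and −44.4°.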
The illuminated patch on the grating is the same width for both beams, but because
of obliquity, the diffracted beam undergoes an anamorphic magnification in x of

M = cos β/cos α.
In general, the spectrum gets spread out in a cone, but in the ky = 0 case, it gets
spread out in a line, with only a slight curvature due to the inescapable finite range of ky
in real apparatus. If we send broadband light in at θi , we can select a narrow wavelength
band centered on λ by spatial filtering.
The nonlinear relation of θd to θi for a given wavelength means that classical plane
gratings cause aberrations if the light hitting them is not collimated in the x direction.
These aberrations reduce the resolution of the spectrum and cause spectral artifacts, so
we normally allow only a single kx and a limited range of ky .
Example 7.1: Czerny–Turner Monochromator. A monochromator is a narrowband tunable optical filter, based on the Fourier optics of gratings and slits. The classical design
is the Czerny–Turner, shown in Figure 7.2. Polychromatic light from the entrance slit
is (spatially) Fourier transformed by spherical mirror M1, so that every point in the slit
produces its own plane wave at each wavelength. These are then dispersed by the (rotatable) grating and transformed back by M2 to produce a set of images of the entrance slit
on the plane of the exit slit, of which only one is allowed to escape. Rotating the grating
moves the dispersed spectrum across the exit slit, so that a different λ emerges.
The spherical mirrors are used well off-axis, so there is a significant amount of coma
and astigmatism as well as spherical aberration, which must be accounted for in the
design. Note the anamorphic magnification and pupil shift in the figure; normally we use
mirrors that are somewhat larger than the grating to combat this.
In designing monochromators and spectrometers, we have to remember that most of
the light doesn’t make it out the exit slit, but bounces around inside the box, so we need
good baffles. Otherwise this stray light would bounce all over the place inside, and some
of it would eventually escape through the exit slit and contaminate the output spectrum.
There’s no place in Figure 7.2 to put a baffle that won’t obscure the optical path, so real
Czerny–Turners don’t have planar layouts. The mirrors are canted down a bit (into the
Figure 7.2. Czerny–Turner monochromator.
page), which depresses the grating position enough to fit a good baffle in over top, which
helps a lot.
Another thing to watch out for in grating instruments, especially adjustable ones, is
temperature-compensating the slit opening. Carelessness here, such as using brass slits
on steel screws and a long mechanical path, typically leads to throughput changes of
several percent per ◦ C.
The other main problem with spectrometers is that they’re all polarization sensitive.
The p-to-s diffraction efficiency ratio of a transmission grating is massively wavelength
dependent, and mirrors used off-axis can easily contribute several percent polarization
(see Example 5.2).
So far, a grating is a reasonable spatial analogy to a heterodyne mixer (see Section
13.7.1). The analogy can be pressed further, because the grating also has analogues of
LO phase noise (scattered light), LO spurs (periodic errors in the grating phase, giving
rise to ghosts), and spurs due to LO harmonics (multiple orders, leading to overlap). It
starts to break down when the 3D character of the intermodulation starts entering in;
spatial frequency differences can arise from shifts in ω or θi , but the resulting fields are
quite different.
Order Overlap
For any grating and any θi , the function λ(θd ) is multivalued, so that more than one
wavelength will make it through a monochromator at any given setting. Simple grating
spectrometers are limited to a 1-octave range in the first order, as shown in Figure 7.3,
and the limitation gets tighter at higher orders. The best way to reject light that would be
aliased is to use cross-dispersion: just put a second grating or prism at 90◦ to separate out
Figure 7.3. mth-order wavelength (in μm) as a function of sin β − sin α, for a grating with G = 1200 lines/mm; curves are labeled by order m.
the orders; the second grating’s dispersion limits the allowable slit length. Cross-dispersed
gratings are a good match to 2D detector arrays such as CCDs.
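The 1-octave limit follows directly from the grating equation: at fixed angles, mλm is constant, so each higher order aliases a submultiple wavelength onto the first-order light. A small illustrative sketch (the function name is hypothetical):

```python
def overlapping_wavelengths(lambda1_nm, max_order=4):
    """Wavelengths diffracted to the same angle as lambda1 in first order.

    From the grating equation, m * lambda_m is constant at fixed angles,
    so order m aliases lambda_m = lambda1 / m onto the first-order light.
    """
    return {m: lambda1_nm / m for m in range(1, max_order + 1)}

# 800 nm in first order is overlapped by 400 nm in second order and
# ~267 nm in third; a first-order instrument covering 400-800 nm thus
# spans exactly one octave before aliasing sets in.
alias = overlapping_wavelengths(800.0)
```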
Ghosts and Stray Light
If we consider a grating as a frequency mixer, it isn’t surprising that irregularities in the
fringe spacing get transformed into artifacts in the diffracted spectrum. Small-scale irregularities give rise to a smoothly varying diffuse background of scattered light, which sets
the maximum rejection ratio of a spectrometer. Low frequency variations in the grating
pitch, caused, for example, by diurnal temperature variation in the ruling engine during
a run, produce close-in sidebands on the diffraction spectrum just the way they would
in an RF mixer; these close-in artifacts are called Rowland ghosts. Ghosts occurring
further away are called Lyman ghosts and are even more objectionable, since it’s much
harder to connect them to their true source. As we saw, baffles help the stray light a
lot, but ghosts are more of a problem, since they follow the normal light path through
the exit slit. Both can be dramatically reduced by using another grating and a third slit,
to make a double monochromator , and for some very delicate measurements such as
Raman spectroscopy, people even use triple monochromators. It’s amazing that any light
makes it through all that, but it does if you do it properly, and you get double or triple
the linear dispersion, too.
Classical plane gratings are wonderful at dispersing different wavelengths, but bad at
almost everything else—they cost a lot, aberrate converging beams, treat s and p polarizations differently enough to be a bother but not enough to be useful, require throwing
away most of our light to get high spectral resolution; the list goes on and on. Lots
of different kinds of gratings have been developed to try to deal with some of these problems.
Nearly all gratings sold are replicas, made by casting a thin polymer layer (e.g., epoxy)
between the master grating and a glass blank (using a release agent to make sure it sticks
to the glass and not the master). Reflection gratings (the usual kind) are then metallized.
Aside: Grating Specifications. Since everything about gratings depends on k|| , their
properties are usually specified for use in the Littrow configuration, where kd = −ki
(i.e., the diffracted light retraces its path). This implies that ky = 0, and that there is no
anamorphic magnification of the beam, which simplifies things a lot, but isn’t necessarily
representative of what you should expect far from Littrow.
7.4.1 Reflection and Transmission Gratings
The essential function of a grating is to apply a very precise spatial phase modulation
to an incoming beam. This can be done by reflecting from a corrugated surface, or by
transmission through different thicknesses of material. Transmission gratings have the
advantages of lenses: compact multielement optical systems, lower sensitivity to flatness
errors, and less tendency for the elements to get in one another’s way. On the other hand,
multiple reflections inside the transmission grating structure give rise to strong artifacts,
making them unsuitable for high resolution measurements in general. With the exception
of hologon scanners, you should use reflection gratings for nearly everything.
Reflection gratings are usually supplied with very delicate bare aluminum coatings.
Touching a grating surface will round off the corners of the grooves and ruin the grating,
and anyone who tries to clean one with lens paper will only do it once. Pellicles can
be used to protect gratings from dust. Gold-coated gratings are useful, especially around
800 nm where the efficiency of aluminum is poor.
7.4.2 Ruled Gratings
Ruled gratings have nice triangular grooves, which allow high diffraction efficiency, but
since the process of cutting the grooves produces minute hummocks and irregularities
in the surface, they also have relatively high scatter. The increasing scatter limits ruled
gratings to a practical maximum of 1800 lines/mm.
Ruled gratings can be blazed by tipping the cutting point so that the grooves are
asymmetrical; by the array theorem of Fourier transforms,† a regular grating illuminated
with a plane wave will have an angular spectrum equal to the product of the angular
spectrum of a single grating line (the envelope) times the line spectrum from the Ш
function. Blazing is a way of making the peak of the envelope coincide with the diffracted
order, by tipping each grating line so that the specular reflection from that line is in the
diffracted direction. The gratings of Figure 7.4(a) and (b) are blazed.
A grating blazed at λB works well from about λB to 1.5λB , but falls off badly at
shorter wavelengths. Like most other grating parameters, the blaze wavelength is quoted
for Littrow incidence.
7.4.3 Holographic Gratings
Holographic gratings are frozen interference fringes, made by shining two really, really
good quality laser beams on a photoresist-coated substrate. The grating period can be
adjusted by changing the angle between the beams, or a transparent substrate can be
suspended in an aerial interference pattern with fringe frequency k, for example, from
a laser bouncing off a mirror, and tipped to change the spatial frequency k|| at the
surface. If the resist layer is thin and is developed after exposure (i.e., unexposed resist
is washed away), a surface grating results. This grating is then transferred to a durable
metal surface by plating it, attaching the plated surface to a stable support (e.g., a glass
blank), and then chemically or mechanically stripping the resist, leaving its image in the
metal. This metal submaster is then used to make replicas.
The grooves in a holographic grating are ideally sinusoidal in shape (although they can
be blazed by ion milling or evaporation of metal from oblique incidence, or by special
lithography techniques). The peak-to-peak phase modulation in the reflected wavefront
is then roughly

Δφ = 2kz d,
where d is the peak-to-valley groove height, and shadowing has been neglected.
Best overall efficiency with a sinusoidal modulation is obtained when the specular
order goes to 0, which happens with Δφ = 4.8 radians (in RF terms, the first Bessel
† We'll see it in Example 13.8 and Section 17.4.3 as convolution with a Ш function.
Figure 7.4. Diffraction gratings: (a) ruled, (b) replicated, and (c) holographic. (The panels label the layers: metal substrate, metal plating, evaporated metal, epoxy cast, and glass substrate.)
null at a modulation index of 2.405—see Section 13.3.7). Deep gratings are therefore
used for long wavelengths, and shallower ones for short. Holographic gratings have
less small-scale nonuniformity than ruled ones, so they exhibit less scatter and possibly
fewer ghosts. The diffraction efficiency of holographic gratings is less strongly peaked
than blazed ruled gratings, so they may be a better choice for weird uses.
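The specular-null condition gives a handy rule for choosing groove depth. A minimal sketch, assuming Littrow-style incidence and neglecting shadowing as in the text (the function name is mine):

```python
import math

def groove_depth_for_specular_null(wavelength_m, theta_i_deg=0.0):
    """Peak-to-valley depth d of a sinusoidal groove that nulls the specular
    order: the phase modulation index k_z * d must hit 2.405 (first zero of
    the Bessel function J0), i.e., peak-to-peak phase 2*k_z*d = 4.81 rad.
    """
    k_z = (2 * math.pi / wavelength_m) * math.cos(math.radians(theta_i_deg))
    return 2.405 / k_z

# At 500 nm and normal incidence, the groove depth comes out ~0.38 lambda,
# illustrating why deep gratings go with long wavelengths and vice versa.
d = groove_depth_for_specular_null(500e-9)
```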
7.4.4 Concave Gratings
From a diffraction point of view, there’s nothing special about a flat surface; since the
grooves embody the interference pattern between the incident and diffracted light, we
can sample the pattern anywhere we like. Concave gratings combine the collimating,
focusing, and diffracting functions in a single element. They are very expensive but are
worth it in the deep UV, where mirror coatings are very poor (R ≈ 0.20–0.35), so extra
bounces cost a lot of photons.
The trade-off is bad aberrations; the focusing mirror is used off-axis, and the diffracted
beam fails to obey the law of reflection on which mirror design is based. Concave mirror
spectrographs are therefore usually highly astigmatic. This is not as bad as it sounds,
since the image of a point in an astigmatic system is a line in the tangential plane (at
the sagittal focus) or in the sagittal plane (at the tangential focus) (see Section 9.4.3).
Providing the slits are at the tangential focus, the errors caused by astigmatism can be
kept small, since the astigmatism’s main effect is then to smear out the image along the
slit. Of course, in a real situation the tangential image is not a perfectly straight line, and
its curvature does degrade the spectral resolution somewhat.
There exist aberration-reduced holographic concave gratings, where the groove spacing
is intentionally made nonuniform in such a way as to nearly cancel the leading aberrations
over some relatively narrow range of θi and λ, which are great if you have the budget
for a custom master, or a catalog item happens to fit your needs.
7.4.5 Echelles
An echelle grating (shown in Figure 7.5) is a coarse-ruled grating used in a high order,
near grazing incidence; the scattering surface used is the short side of the groove instead
Figure 7.5. Echelle grating.
of the long side (like the risers of the stairs, rather than the treads). It is not unusual
for an echelle to be used in the 100th order. Echelles usually have between 30 and 500
lines per millimeter and a really big one (400 mm) can achieve a resolving power of
2W/λ ≈ 10⁶, a stunning performance. Because of the angular restriction, echelles are
usually used near Littrow.
Problems with echelles include expense and severe grating order overlap. The
expense is due to the difficulty in maintaining precision while ruling the coarse, deep
grooves required, and of course to the low manufacturing volumes; the overlap requires
cross-dispersion or a very well understood line spectrum.
7.5.1 Spectral Selectivity and Slits
The usefulness of grating instruments lies in their selectivity—their ability to treat the
different wavelengths differently. It’s no use dispersing the light at one θi if it instantly
remixes with other components at different θi , different ω, but the same θd (take a
grating outside on a cloudy day, and try seeing colors). Just how we want to treat each
wavelength varies; in a monochromator, we use a spherical mirror to image the angular
spectrum on a slit, in order to select only one component, whereas in a pulse compressor,
we just add a wavelength-dependent delay before recombining.
7.5.2 Angular Dispersion Factor
In computing the resolution of a grating instrument, we need to know the scale factor
between λ and θd, the angular dispersion D:

D = ∂β/∂λ = mG sec β = (sin β − sin α)/(λ cos β).
7.5.3 Diffraction Limit
The wavelength resolution of an ideal grating is limited by size, order, and operating wavelength. A uniformly illuminated square grating of side W has a sinc function
lineshape as a function of k∥,†

E(u, v) = E0m sinc[(W/λ)(u − ui − mGλ)] sinc[(W/λ)(v − vi)]

† The xy projection of k, that is, (kx, ky).
(with u = kx/k and v = ky/k as usual, and assuming that kG is along x̂). By smearing
out the angular spectrum, this effect sets a lower limit to the useful slit width in a
spectrometer. As we’ll see in great detail in Section 17.6, this sinc function is a nuisance,
having slowly decaying sidelobes in its transform. (Don’t worry about the sinc functions
due to the slits—the exit slit is at an image of the entrance slit, whereas those sidelobes
are in the pupil.)
In applications needing the cleanest peak shapes, it is sometimes worth apodizing the
incoming light so that its diffraction pattern decays before hitting the edge of the grating.
A carefully aligned linear fiber bundle can do a good job of this, provided the aperture
of the spectrometer is a bit larger than the fibers’.
The equivalent width of this sinc function is Δ(sin θd) = Δu = λ/W, or in angular terms,

Δβ = λ/(W cos β).
The theoretical resolving power R of a grating is the reciprocal of the
diffraction-limited fractional bandwidth:

R = λ/Δλ = mN = (sin β − sin α)W/λ,

where N is the number of grating lines illuminated. This gives a useful upper limit on R,

Rmax = 2W/λ,
with the limit achieved when sin θd = − sin θi = 1 (i.e., coming in at grazing incidence
and coming back along the incident path). The resolving power is a somewhat squishy
number, of course, because the definition of resolution is arbitrary, you never use slits as
narrow as the diffraction limit, the grating is never perfectly flat nor perfectly uniformly
ruled, and the effects of coma are a limiting factor anyway. Typical values of Rmax range
from 10⁴ to as high as 10⁶ for very large UV gratings.
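These resolving-power formulas are easy to sanity-check numerically; a minimal sketch (the function names are mine, and the sign convention assumes β and α measured as in the grating equation):

```python
import math

def resolving_power(W_m, wavelength_m, alpha_deg, beta_deg):
    """Theoretical resolving power R = (sin(beta) - sin(alpha)) * W / lambda,
    equal to m*N, the order times the number of lines illuminated."""
    return (math.sin(math.radians(beta_deg))
            - math.sin(math.radians(alpha_deg))) * W_m / wavelength_m

def resolving_power_max(W_m, wavelength_m):
    """Upper bound 2W/lambda, reached at grazing incidence with the
    diffracted beam retracing the incident path."""
    return 2 * W_m / wavelength_m

# A 400 mm echelle at 500 nm sits in the "stunning" 1e6 class
Rmax = resolving_power_max(0.4, 500e-9)
```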
7.5.4 Slit-Limited Resolution
We normally don’t try to achieve the diffraction-limited resolution, because it requires
extremely narrow slits, which are very fiddly and let through very little light. Thus in
most cases, it’s sensible to use ray optics to discuss spectrometer resolution.
A slit of width w used with a mirror of focal length f produces a beam with an angular
range of w/f radians. The grating and the second mirror will reimage the entrance slit on
the exit slit, with an angular magnification of 1/M = cos θi / cos θd . The spatial pattern
that results is the product of the two slit patterns, so the angular smear is the convolution
of that of the two slits, taking account of magnification,
A(β) = rect[f(β − β0)/wexit] ∗ rect[Mf(β − β0)/went],

which has equivalent width

Δθd,slit = (wexit + went/M)/f.
This can be translated into spectral resolution by dividing by ∂θd/∂λ = D:

Δλslit = λ(wexit cos β + went cos α)/[f (sin β − sin α)].
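Plugging numbers into the slit-limited bandwidth formula shows how quickly a modest monochromator reaches sub-nanometer resolution; a sketch with hypothetical function and parameter names:

```python
import math

def slit_limited_dlambda(wavelength_m, f_m, w_ent_m, w_exit_m,
                         alpha_deg, beta_deg):
    """Slit-limited bandwidth: the equivalent angular width
    (w_exit + w_ent/M)/f divided by the angular dispersion
    D = (sin(beta) - sin(alpha)) / (lambda * cos(beta))."""
    a = math.radians(alpha_deg)
    b = math.radians(beta_deg)
    return (wavelength_m * (w_exit_m * math.cos(b) + w_ent_m * math.cos(a))
            / (f_m * (math.sin(b) - math.sin(a))))

# f = 250 mm, matched 20 um slits, 500 nm light, 1200 l/mm-style angles:
# resolution comes out near 0.12 nm
dl = slit_limited_dlambda(500e-9, 0.25, 20e-6, 20e-6, -5.74, 30.0)
```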
7.5.5 Étendue
Neglecting diffraction, the étendue n²AΩ of a grating spectrometer is the product of the
entrance slit area wL and the projected solid angle of the grating as seen from the entrance
slit, which is approximately W²/f² at normal incidence. The oblique projection of the
grating area goes down by cos θi, and the anamorphic magnification M = cos θd/cos θi
changes the effective size of the exit slit, but those are both effects of order 1, so we'll
sweep them under the rug and say

n²AΩ ≈ wLW²/f²,

which is tiny; if f = 250 mm, a 25 mm grating with a 5 mm × 20 μm slit (4× the
diffraction limit at λ = 500 nm, about typical) gives n²AΩ ≈ 10⁻⁵ cm²·sr. Unfortunately, the
resolution goes as 1/w, so we are faced with a 1:1 trade-off of resolution versus photon
efficiency for a fixed grating size. The diffraction efficiency of the grating isn’t that great,
about 0.8 if we’re lucky, and we lose another 5% or so at each mirror. Furthermore, it
is only at the centre wavelength that the entrance slit is imaged at the exit slit; at other
wavelengths it is offset more and more until the overlap goes to 0, so the total efficiency is
reduced by almost half. A good ballpark figure is that the efficiency of a monochromator
is around 30%.
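The étendue estimate in the text is a one-liner to reproduce; this sketch (function name mine) uses the same numbers:

```python
def etendue_cm2sr(slit_w_cm, slit_L_cm, W_cm, f_cm):
    """Approximate etendue n^2*A*Omega = (slit area) * (grating solid angle),
    sweeping the cos(theta) obliquity factors under the rug as in the text."""
    return slit_w_cm * slit_L_cm * W_cm**2 / f_cm**2

# Text example: f = 25 cm, 25 mm grating, 5 mm x 20 um slit -> ~1e-5 cm^2 sr
E = etendue_cm2sr(20e-4, 0.5, 2.5, 25.0)
```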
7.6.1 Order Strengths
In general, accurately predicting the strengths of the various diffraction orders requires
a vector-field calculation using the true boundary conditions, but there are some general
features we can pull out by waving our arms.
Right away, we can see by conservation of energy that there are liable to be sharp
changes in the grating efficiency whenever some m enters or leaves the range (7.3),
whether by changing θi or λ. These are the so-called Wood’s anomalies, which are sharp
peaks and troughs in the diffraction efficiency curves. With some grating profiles, these
anomalies are extremely large and sharp. Figure 7.6 shows the calculated diffraction
efficiency of a ruled plane grating with a 9◦ blaze angle, used in Littrow, which displays
strong polarization sensitivity and Wood’s anomalies.
7.6.2 Polarization Dependence
The diffraction efficiency of a grating is usually a strong function of polarization, being
highest for light polarized across the grooves (i.e., p-polarized, when ky = 0). This is
intuitively reasonable when we consider the behavior of wire antennas—light polarized
Figure 7.6. Theoretical diffraction efficiency in Littrow of a 9° blazed grating. Note the strong
Wood's anomalies and polarization dependence. (Courtesy of The Richardson Grating Laboratory.
Note that the Richardson book interchanges p and s.)
along the wires (s) causes strong currents to flow in the wires, which therefore reflect it
(or absorb, if there’s a load attached). The current is not interrupted by a change in the
direction of the surface, because it’s flowing along the grooves; thus there is little light
scattered. On the other hand, light polarized across the grooves (p) causes currents that
have to go over the top of the grooves, leading to strong scatter. By and large, this is
how gratings behave, although there are lots of subtleties and exceptions.
7.6.3 Bragg Gratings
The detailed theory of free-space Bragg gratings is fairly involved because of multiple
scattering, but weak ones with constant fringe spacing can be treated by coupled-mode
theory, since there are only a few orders to worry about.
The main feature of Bragg gratings is that the phase matching condition has to apply
throughout the volume, rather than just at one plane. For a sinusoidal grating, this leads
to the Bragg condition,

kd − ki = ±kG.
It’s a very stiff condition, since as Figure 7.7 shows, kG is the base of the isosceles
triangle made up of ki and kd ; for a given wavelength and an infinite grating, there’s
only a single choice of ki that works. This ki selectivity is smeared out by the finite
depth of the grating, just as the angular spectrum is by the finite width of the grating.
Nonetheless, a deep Bragg grating with many fringes is a high-Q filter in k-space.
Bragg gratings can have diffraction efficiencies approaching unity, and in fact since
the diffracted wave meets the Bragg condition for being diffracted back into the incident wave, Bragg diffraction is a coupled-modes problem of the sort we’ll encounter in
Section 8.3.3.
Figure 7.7. Bragg grating. Since Δk = ±kG and |ki| = |kd|, only certain k values work.
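The isosceles-triangle picture turns into a one-line formula for the Bragg angle; a minimal sketch (function name mine), measuring the angle between the beam and the fringe planes:

```python
import math

def bragg_angle_deg(wavelength_m, fringe_spacing_m):
    """Bragg condition |k_G| = 2*k*sin(theta_B): the incident beam must make
    angle theta_B with the fringe planes, where sin(theta_B) =
    lambda / (2 * Lambda), Lambda being the fringe spacing."""
    s = wavelength_m / (2 * fringe_spacing_m)
    if s > 1.0:
        raise ValueError("no propagating solution: fringes too fine")
    return math.degrees(math.asin(s))

# 500 nm light on 1 um fringes: theta_B ~ 14.5 degrees, and nothing
# else works for an infinitely deep grating
tb = bragg_angle_deg(500e-9, 1e-6)
```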
We noted that all gratings are holograms, in the broad sense of embodying the interference pattern of the incident light with the diffracted light. The class of holograms is of
course much more general than just diffraction gratings. Fresnel zone plates are basically
holograms of lenses, for example, and with the development of computer-generated holograms we can conveniently make more general patterns, for example, holographic null
correctors for testing aspheric optics with spherical reference surfaces, and even beamsplitters producing several diffracted beams in different directions with widely differing powers.
It is somewhat ironic, but a consequence of their very generality, that holographic
elements tend to be application specific. You probably won’t find an off-the-shelf item
that does what you want, so holograms are used mostly in high volume applications such
as bar code scanners.
One exception is the holographic diffuser, whose scattering angles can range
from 1° to 60°. These work poorly with lasers due to speckle but are just the ticket for
situations where an ugly illumination function has to be spread out, for example, LEDs,
fiber bundles, and liquid light pipes.
Holograms function differently depending on the number of distinct phase or amplitude
levels available. Simple holograms have only two levels: zone plates made from annular
holes in chrome masks or from a single layer of photoresist. This presents some problems.
For one thing, a two-level zone plate functions as both a positive and negative lens, since
with only two levels, exp(ik · x) is indistinguishable from exp(−ik · x). Just as adding
bits helps a DAC to mimic an analog signal more accurately, so more levels of phase or
amplitude improves holographic optics. The most popular way to implement this idea is
binary optics: add more levels of lithography, with binary-weighted phase shifts, just as
in a DAC.
7.7.1 Combining Dispersing Elements
It is often convenient to combine two dispersive elements in one system, such that their
angular dispersions add or subtract. The usual rule is that if the diffraction direction tends
to straighten out, the dispersion effects tend to subtract, whereas if the beam is being sent
round and round in a circle, the effects add. This of course applies only for elements of
Small Effect
Additive Effect
Figure 7.8. Cobbling gratings to add or subtract the diffractive effects, while possibly changing
the beam shape anamorphically. Leaving the second grating near grazing multiplies the tuning
sensitivity of both diffractions.
the same sort; gratings bend shorter wavelengths less, whereas prisms bend them more.
A combination grating and prism could have additive effects even with a nearly straight
optical path, like an Amici prism (Section 4.9.3).
Gratings and prisms can be used to change the beam diameter anamorphically (i.e.,
with magnification different in x and y). The gratings in Figure 7.8 first expand and
then contract the beam. The width out of the page is of course unaltered. This anamorphic property is often useful with diode lasers, and Figure 7.8 shows how it can be
accomplished with and without tuning sensitivity of the angular deflection.
We’re all familiar with retroreflecting materials, used for bicycle reflectors, traffic signs,
and safety clothing. They have lots of possibilities in instrument design too, and so
everyone should know something about them. The basic idea is to use large numbers of
small, poor quality retroreflectors, so as to return incident light generally back toward
its source, with an angular spread of 0.5◦ to 5◦ in half-angle and (unlike larger corner
cubes) no significant lateral shift.
There are two main kinds: glass beads and corner cube arrays embossed in plastic (see
Section 4.9.8). A sphere in air makes a pretty good retroreflector if its index is chosen
so that incident light is focused on the opposite surface of the sphere. A ray at height
h, parallel to the axis, has an angle of incidence sin θi = h/R, and to focus on the back
surface, sin θr = h/(2R), so by Snell's law,

n = sin θi / sin θr = 2,
which can just about be done in glass. The angular spread can be adjusted via defocus,
by varying n. To prevent most of the light getting out the other side, aluminum-coated
beads are used, with the aluminum etched off the top surface after the coating is applied.
The embossed plastic stuff has a narrower acceptance angle than the spheres, because
away from the symmetry axis of the corner, more and more of the light fails to make
the three TIR bounces required for retroreflection. It’s also a bit of a puzzle to mount,
because you can’t touch the TIR surface or you’ll deactivate all the little cubes. The
standard solution is to use a quilted foam backing that touches only a few percent of the
area, which makes the retroreflection spatially nonuniform.
The figure of merit for a retroreflective material is its specific brightness, which is
measured in inverse steradians, although it’s quoted as lux per steradian per lux or
other cute units that amount to the same thing: if you put in 1 W/m2 , how much flux
is radiated per steradian at specified angular offsets from exact back-reflection? For a
retroreflector with an RMS beam spread half-angle of Δθ and photon efficiency η, the
specific brightness is

S = η/[π(Δθ)²].

For a Δθ of 0.5°, this is around 4000—a factor of 13,000 over a Lambertian reflector
(for which S = 1/π), assuming that η = 1. The best real materials are more like 900 or 1000, a factor
of 3000 or so over Lambertian. This is an enormous effect—a strip of tape in the right
place can get you a 70 dB (electrical) signal level increase, which is well worth the labor
and the compromises it takes to use these materials (they’re very speckly when used with
lasers, for example).
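The specific-brightness numbers quoted above are easy to reproduce; a sketch (function name mine) that also checks the 70 dB electrical claim for a material 3000× over Lambertian:

```python
import math

def specific_brightness(delta_theta_rad, eta=1.0):
    """Specific brightness S = eta / (pi * delta_theta^2), in 1/sr, for a
    retroreflector with RMS beam-spread half-angle delta_theta."""
    return eta / (math.pi * delta_theta_rad**2)

S = specific_brightness(math.radians(0.5))   # ~4200 /sr for 0.5 degrees
gain_over_lambertian = S * math.pi           # Lambertian has S = 1/pi
db_electrical = 20 * math.log10(3000)        # optical ratio -> electrical dB
```

The 20 log₁₀ (rather than 10 log₁₀) reflects that a photodetector's electrical power goes as the square of the optical power.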
The stuff made for use in signs and clothing isn’t that great; the specific brightness
is usually below 300, and 60 is not uncommon, but those numbers still represent big
signal increases in systems where most of the illumination is not collected. There does
exist material that gets into the 10³ range, but it isn't easy to find. You can also buy
just the beads,† made usually from barium titanate glass with an index of 1.9 or a bit
higher. They’re used for spraying onto traffic paint before it dries and may be helpful
for special situations where it’s inconvenient to use the made-up sheet material. Other
considerations include rain—glass bead retroreflector relies on refraction at the air–glass
boundary, and so doesn’t work well when it’s wet. Assuming the TIR surfaces remain
dry, the corner cube stuff is almost unaffected by rain.
The best retroreflective materials in the 3M catalog for instrument use are Scotchlite
2000X Very High Gain Sheeting (corner cubes) and Scotchlite 7610 High Gain Reflective
Sheeting (spheres). Both are available in tape rolls and can really make a difference to
your SNR, especially in a fiber-coupled instrument. The other main manufacturer of this
stuff, Reflexite, also has cube material tailored to produce a nearly fixed offset angle (not
180◦ ).
TIR film will mess up the polarization of your beam, as in Section 4.9.8. Reflexite
makes metallized corner cube material, which reduces this problem. Plastic retroreflector
is naturally of no use in the UV, but the high brightness glass bead stuff is quite good,
as it has no plastic overcoat on top of the spheres.
Scanning systems are one of the thorniest areas of electro-optical instrument building.
The whole point of scanning is to interrogate a huge AΩ volume sequentially with a
low-AΩ system. None of the available methods is as good as we'd like, and the cost
of commercial solutions is enough to make you gasp (how about $4000 for a 25 mm,
two-axis galvo scanner with analog driver cards and no power supply?).
† Suppliers include Potters Industries and Cataphote.
The difficult design issues, together with the very high cost of playing it safe, make
scanner design worth a considerable amount of attention. The main points to worry about
are scanner and encoder accuracy, rectilinearity, range, speed, jitter, aberration buildup,
field flatness, temperature drift, and power consumption (other than that, it’s easy). We’ll
start with mechanical scanners.
Before we begin, it is useful to have a bit of vocabulary to describe different scanner vices. Nonrepeatable motions, caused, for example, by out-of-round bearings, are
called jitter if they’re in the scan direction and wobble otherwise. Repeatable errors are
scan nonlinearity along the scan direction, and scan curvature out of it. Temperature
drift, which is sometimes very serious, comprises offset (zero) drift and gain drift. Scanning is often combined with signal averaging to reject noise and spurious signals; see
Section 10.9.2.
7.9.1 Galvos
Galvanometer scanners are electric motors that don’t turn very far (±45◦ , maximum) but
can accelerate very fast and move very accurately. They usually run on ball bearings,
but some use flexure bearings. Single- and double-axis galvos are available that provide
12 bit angular accuracy with settling times of several milliseconds. Small ones are much
faster than large ones, because the moment of inertia I goes as mr², which tends to grow
as r⁵. The angular accuracy of the encoders isn’t all that great over temperature, typically
10 arc seconds drift and 0.05% gain/◦ C, with some being much worse. If you really need
that 12 bits, you have to compensate for those in some way. Jitter typically runs 10–20
arc seconds, and wobble more like 5–10. Those are really what limit the resolution. The
torque from any motor tends to go as the volume—due to field and current limitations,
the available force per unit area is approximately constant, so the torque goes as the
surface area times the diameter. The moment of inertia grows faster than that, so big
galvos tend to be slow.
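The scaling argument can be put in numbers with a quick sketch (the ±45◦ range and 12 bit figure are from the text; the exponents come from the argument above):

```python
def lsb_arcsec(full_range_deg=90.0, bits=12):
    """Angular size of one encoder LSB: a 12 bit encoder over a +/-45 deg
    (90 deg total) range resolves ~79 arcsec per count, so 10-20 arcsec of
    drift and jitter eats a substantial fraction of an LSB."""
    return full_range_deg * 3600.0 / 2**bits

def accel_scaling(scale):
    """Relative angular acceleration when every galvo dimension scales by
    `scale`: torque ~ r^3 (constant force/area x area r^2 x lever arm r),
    moment of inertia ~ r^5, so acceleration ~ r^-2."""
    return scale**3 / scale**5

print(lsb_arcsec())        # ~79 arcsec per LSB
print(accel_scaling(2.0))  # doubling the size quarters the acceleration
```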
Resonant galvos shrink the mirror and the motor, and add a big torsion spring to
get their restoring force, which enormously reduces their moment of inertia. This makes
them quite a bit faster (500 Hz), but being high-Q resonant systems, they cannot be
controlled within a cycle; only the amplitude and phase of their sinusoidal oscillation can
be changed, and many cycles are required for settling afterwards. Thus resonant galvos
are good fast-scan devices, where they compete with rotating polygons and hologons;
the trade-off is adjustable angular range and sinusoidal scan versus uniform scan speed
through a fixed angular range.
7.9.2 Rotating Scanners
All reciprocating scanners have to slow down and turn around at the end of their travel,
which makes them relatively slow. What’s worse, their varying scan speed makes it
relatively difficult to take data points equally spaced in angle—it requires a continuously
varying clock frequency. This can be implemented with a lookup table in hardware, or
done by resampling the data afterwards (see Section 17.8). Since the dwell time on each
pixel is different, more lookup tables may be needed to take out the resulting gain error
and offset. All these lookups, whose contents depend on unit-to-unit differences such as
the details of loop damping and rotor imbalance, require an onerous calibration procedure
for high accuracy applications.
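The resampling alternative can be sketched as follows: data taken at a uniform pixel clock during a sinusoidal sweep are linearly re-interpolated onto a uniform angle grid. This is a minimal sketch; the function name and the half-period timing convention are illustrative, not from the text.

```python
import math

def resample_uniform_angle(samples, theta_pk, n_out):
    """Data sampled at uniform *time* during one half-period sweep of a
    sinusoidal scan, theta(t) = theta_pk*sin(w*t) with w*t running from
    -pi/2 to +pi/2, linearly re-interpolated to uniform *angle*."""
    n = len(samples)
    # angle at which each time sample was actually taken
    th = [theta_pk * math.sin(math.pi * (i / (n - 1) - 0.5)) for i in range(n)]
    out = []
    for k in range(n_out):
        target = -theta_pk + 2.0 * theta_pk * k / (n_out - 1)
        j = 0
        while j < n - 2 and th[j + 1] < target:
            j += 1
        frac = (target - th[j]) / (th[j + 1] - th[j])
        out.append(samples[j] + frac * (samples[j + 1] - samples[j]))
    return out
```

In a real instrument the same table of sample angles also supplies the per-pixel dwell-time (gain and offset) correction mentioned above.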
Nonconstant scan speed is particularly obnoxious when you’re using a continuous
frame scan, since it leads to hooks at the ends of the scan line. Large amounts of overscan
are necessary to combat it. A scanner with good acceleration can make sharper turns,
so less overscan is needed with high torque motors, low moment of inertia (i.e., small
mirrors), and slower scan rates. This is all manageable, but still a constant, fast-scan
speed (constant in m/s or rad/s depending on the application) would be useful for raster scanning.
One partial solution is a continuously rotating scanner, such as a polygon mirror or
holographic scanner.
7.9.3 Polygon Scanners
A polygon scanner is a spinning wheel with flat mirror facets on its periphery (Figure 7.9).
These may be oriented normal to the radius vector, so that the wheel is a polygonal cylinder, or angled like a cone. In order to get a straight-line scan with a cylindrical polygon,
the beam has to come in normal to the rotation axis, although other configurations exist
with tilted facets.
With the beam so aligned, rotating a mirror through θ /2 deviates the reflected light
by θ , so an n-facet polygon deflects light through an angular range θ of
θ = 720◦ /n,
although you can’t use all that range, since at some point the edge of the facet has to
cross your beam, leading to a dead time on each facet. A polygon rotating at constant
speed naturally produces a constant angular velocity (CAV) scan, which is great for
some things (e.g., lidar) but a problem for others (e.g., document scanning, where a
constant linear velocity is much more convenient). Polygons can go very fast; speeds
over 50,000 rpm can be achieved with a small solid beryllium scanner in a partial vacuum,
though not easily or cheaply. The ultimate limit is set by deformation and then failure
of the polygon itself. High end printers ($1M) use polygons with 10–12 facets running
at up to 40,000 rpm on air bearings (with 10 beams, that’s a 70 kHz line rate), but
in more pedestrian applications, keep below 7000 rpm, and remember that below 3000 rpm
things get much easier.

Figure 7.9. Polygon scanner.

Polygons cause no aberration of the scanned beam. They have a
unidirectional scan pattern, with a retrace interval as the edge between two facets crosses
the beam. Well-made polygons can have a facet-to-facet angular accuracy of a few arc
seconds, which is good enough for most applications. Cheap polygons are closer to an
arc minute, but cost only a few dollars.
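The facet geometry above reduces to two one-liners; the 10-facet, 10-beam case reproduces the printer line rate quoted above:

```python
def full_scan_range_deg(n_facets):
    """Optical scan range per facet: the mirror turns 360/n degrees per facet
    and the reflected beam deflects through twice that."""
    return 720.0 / n_facets

def line_rate_hz(rpm, n_facets, n_beams=1):
    """Scan lines per second from a spinning polygon."""
    return rpm / 60.0 * n_facets * n_beams

print(full_scan_range_deg(12))       # 60 deg, less the facet-edge dead time
print(line_rate_hz(40_000, 10, 10))  # ~6.7e4 lines/s, the "70 kHz" figure
```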
7.9.4 Polygon Drawbacks
Polygons have a number of drawbacks. Their moment of inertia is high, so that the
scan rate cannot be changed rapidly. We usually use them to get high speed, so that the
kinetic energy is high, which compounds the speed problem and adds wind resistance,
turbulence, and noise, ultimately forcing us to use a vacuum chamber.
The constant angular velocity of the beam means that it scans a flat surface at a
nonuniform rate, unless we do something about it. A subtler difficulty is that since
the rotation axis does not pass through the surface of the mirror, as it does with a
galvanometer, the scanned beam translates as well as pivoting during a line. Thus a
polygon-scanned system lacks a true pupil. You can get (very expensive) f -θ lenses,
which have just the right amount of built-in barrel distortion to convert a constant angular
velocity scanned beam into a constant linear velocity spot, and a flat field; they’re big
chunks of glass, used at very low aperture (e.g., a “250 mm f /16” lens whose front
element is 90 mm across). It is somewhat galling to have to use a $1200 lens with a $25
polygon.
The remaining trouble is their very high sensitivity to shaft wobble. A polygon accurate
to a few arc seconds is very much easier to get than a motor whose shaft is straight and
stable to that accuracy, especially at the end of its life. Air bearings are a good choice
for high speed polygons, but they don’t usually work as well at low speed.
7.9.5 Butterfly Scanners
In Section 4.9.4, we encountered the pentaprism, which is a constant deviation 90◦
prism that works by having two reflections; tipping the prism increases one angle while
decreasing the other, making the total constant. The same principle can be applied to
scanning, resulting in the butterfly scanner of Figure 7.10, which is a nearly complete
solution to the shaft-wobble problem; drawbacks are complexity, expense, probably worse
fixed facet-to-facet errors, and much higher air resistance, noise, and turbulence.
7.9.6 Correcting Rasters
Once the shaft wobble has been corrected, the scan is still not perfect. To get really good
rasters from any multisegment device, you really have to have software that knows which
segment you’re on, and dials in the appropriate offsets. While this requires calibration,
it’s not a difficult problem since it’s only line-to-line variation and can be done in the
instrument itself using a vertical bar test pattern. Furthermore, the dimensional stability
of the hologon or polygon means that it can be done once and forgotten about.
In-plane errors cause only timing trouble, which can be eliminated completely.
Out-of-plane errors are more obnoxious visually, causing obvious nonuniformity in
raster line spacing, and are more difficult to handle, requiring an additional deflection
element such as a Bragg cell.
Figure 7.10. The butterfly scanner works on the pentaprism principle.
7.9.7 Descanning
In order to reduce the required n²AΩ of the detection system (which makes it smaller,
faster, cheaper, and dramatically more resistant to background light), we usually need to
descan the received light. There’s an easy and a hard way to do this.
The easy way is to work in backscatter, and use the same scanner on the transmit and
receive sides of the instrument, for example, to interrogate a sample one point at a time.
If the scanner is at the pupil of the objective lens, light backscattered from the sample
will be recollimated, so that it comes back antiparallel to the transmit beam. The mirror
won’t have had time to move far during the time of flight, so by symmetry, both the
scan angle and any angular jitter are removed, leaving only a bit of Doppler shift.
The hard way is to use a second scanner, synchronized to the first. You have to do this
sometimes, for example, in a long path sensor where you’re sweeping a focused beam
back and forth without steering it, and can’t work in backscatter for some reason. This
really is a miserable way to make a living, so work hard to avoid it; a corner cube or
some retroreflecting tape will often let you off this hook.
Aside: Preobjective and Postobjective Scanning. Before scanning, our beam is
usually collimated, so its NA is very low. It seems a bit perverse to take something like
that, which we can focus very well with inexpensive lenses, scan it through perhaps 45◦ ,
and then try to focus it with a very expensive f -θ lens. Couldn’t we focus first and scan
afterward? If the NA is low enough and the working distance long enough, this is a good
possibility. The major drawback is that the field flatness and nonuniform scan speed (in m/s
on the sample surface) are uncorrected unless you do something fancy yourself. This
may not matter in your application, in which case this postobjective scan strategy will
work well. Just don’t try it with hologons (see Section 7.9.9). A hybrid scheme, where
the line scan is preobjective and the frame scan is postobjective, is also widely used.
7.9.8 Constant Linear Scan Speed
It isn’t that easy to achieve constant scan speed with a mechanical scanner. Rotating
a mirror at a constant speed θ̇/2 produces a reflected beam whose angular speed θ̇ is
constant; if the scanner is a distance h above a planar surface, the scan position x on the
surface is
x = h tan θ,
which is stretched out at large angles, just like late-afternoon shadows. Reciprocating
scanners such as galvanometers and voice coils slow down and stop at the ends of
their angular travel, so they tend to scan more slowly at the edges; if the beam scans
sinusoidally through ±θpk at radian frequency ω, then the scan speed v is
v(θ) = hω (θpk² − θ²)^(1/2) sec²θ,   (θ̇ > 0).
The slowdown of θ̇ at large θ approximately compensates the stretching out of x, so
that the resulting curves look like the Chebyshev filters of Section 15.8.3; choosing θpk =
40.6◦ produces a maximally flat response. Table 7.1 shows optimal scan parameters for
an equiripple error from ±0.01% to ±5%: tolerance, linear range θL , peak range θpk , duty
cycle (or scan efficiency), and the corresponding error with a CAV scan of ±θL . Note how
the duty cycle improves as the error band is relaxed, and how much lower the maximum
error is for the galvo versus the polygon, at least when used unaided. Usually we have
to accept more scan speed variation and compensate for it with slightly nonsinusoidal
motion (easiest), nonuniform pixel clock speed, resampling digitally afterwards, or (as
an expensive last resort) an f -θ lens.
If we need to focus the beam on the surface, it’s usually enough to put the galvo at
the pupil of a flat field lens, with due attention to the lens’s geometric distortion.
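A few lines of arithmetic verify the maximally flat claim; here v(θ) is the equation above, normalized to the center-of-scan speed, and θpk = 40.6◦ sits near the analytic flatness condition θpk = 1/√2 rad ≈ 40.5◦:

```python
import math

def rel_speed(theta_deg, theta_pk_deg):
    """Spot speed on the flat surface under a sinusoidal scan,
    v = h*w*sqrt(th_pk^2 - th^2)*sec(th)^2, normalized to v(0) = h*w*th_pk."""
    th = math.radians(theta_deg)
    th_pk = math.radians(theta_pk_deg)
    return math.sqrt(th_pk**2 - th**2) / (th_pk * math.cos(th)**2)

# The quadratic term cancels, so the error grows quartically from the center:
for th in (0, 10, 15, 20):
    print(th, rel_speed(th, 40.6))   # ~1.000, 0.999, 0.996, 0.985
```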
7.9.9 Hologons
A holographic scanner consists of a spinning surface containing holographic elements
(Figure 7.11). The most common type is the hologon, short for holographic polygon.
A hologon is a spinning glass disc containing radially segmented transmission gratings
(like orange segments), with the grating lines oriented tangentially.
Hologon scanners are best operated near minimum deflection, that is, when the incoming and outgoing beam make equal angles with the surface normal. Small amounts of
wobble in the shaft then cause only second-order angular deviations in the frame direction,
which is an important advantage of holographic scanners over simple polygon mirrors,
though butterfly scanners do even better. Beiser shows that for a scanner producing 90◦
TABLE 7.1. Approximating a Constant Linear Velocity Scan with a Sinusoidal Galvo:
Speed Tolerance (±%), θL (±◦ ), θpk (±◦ ), Duty Cycle (%), Constant AV Error (±%)
Figure 7.11. A hologon scanner is the diffractive analogue of a polygon. It isn’t as efficient but
has much lower jitter, weight, and wind resistance.
deviation (θi = θo = 45◦ ), a large shaft wobble of 0.1◦ (360 arc sec, or 1.75 mrad) produces a scan wobble in the frame direction of only 1.3 arc sec, an improvement of nearly
600:1 over a polygon, with even better performance for smaller wobbles.
The scan line produced by a hologon scanner is in general curved, because ky can’t
always be 0 when we’re rotating the grating in azimuth. By choosing θi = θd = 45◦ , the
scan can be made almost perfectly straight, which is how they are usually used. The
deflection is easily found by applying phase matching; if the grating is rotated through
an angle φshaft from the center of the facet, the change in kx of the diffracted beam is
equal to that of the grating, so in the middle of the scan line,

dθscan /dφshaft = (sin θi + sin θd )/cos θd ,

which is equal to 2 for the 45◦ –45◦ scanner. The angular scan speed θ̇ is also mildly
nonuniform, being slightly compressed at the ends of the travel (we saw that this is
actually an advantage in scanning flat surfaces). The effect is nowhere near as strong as
with galvos.
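A quick check of the numbers in this section (the magnification formula is the standard mid-scan hologon result; the wobble figures are the ones quoted from Beiser):

```python
import math

def scan_magnification(theta_i_deg, theta_d_deg):
    """dtheta_scan/dphi_shaft at mid-scan: (sin th_i + sin th_d)/cos th_d."""
    ti = math.radians(theta_i_deg)
    td = math.radians(theta_d_deg)
    return (math.sin(ti) + math.sin(td)) / math.cos(td)

print(scan_magnification(45, 45))  # ~2.0, as stated for the 45-45 scanner

# 0.1 deg (360 arcsec) of shaft wobble -> 1.3 arcsec of frame-direction error;
# a plain mirror would deliver 2 x 360 = 720 arcsec instead:
print(2 * 360 / 1.3)               # ~554, i.e. "nearly 600:1"
```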
As a practical matter, hologons are restricted to collimated beams. A focused
beam used with a collimated-beam hologon produces an astounding amount of
astigmatism—dozens of waves over a 45◦ scan angle, even with an NA of only 0.01.
Since they are holograms, it is possible to make a scanner that focuses as well as deflects
the light. It might seem that the resultant saving of a lens would be very valuable, but
in fact doing this throws away the major advantage of hologons, the insensitivity to
shaft wobble. Resist the temptation to be too fancy here, unless your performance specs
are modest (e.g., in hand-held bar code scanners). One possible exception would be
adding a few waves of optical power to correct for aberrations in a simplified scan lens,
because the wobble effect would then still be small. The angular accuracy of a hologon
can be as good as 10 arc seconds facet to facet, although 30 is easier and hence cheaper.
If the facets are made with slightly different values of G they will deflect the beam
at slightly different angles, so that an N -facet hologon by itself can perform an N line
raster scan, which allows a useful trade-off between scan speed and alignment accuracy.
(Doing this with a polygon would make it dynamically unbalanced.)
The diffraction efficiency of hologons is normally quite good—80–90%, but that isn’t
as good as a properly chosen mirror coating, so you’ll pay a decibel or two in detected
signal for using a hologon.
7.9.10 Fast and Cheap Scanners
If your scan range and accuracy requirements are modest, don’t forget the obvious candidates, for example, mounting a laser diode on a piezoelectric translator or a bimorph,
and letting the collimating lens do the work. Life doesn’t get much easier than that.
7.9.11 Dispersive Scanning
It is also possible to scan over a small range by using a tunable source (e.g., a diode laser)
with a dispersive element, such as the second compound grating device in Figure 7.8.
This is a very good technique for some purposes, because it is extremely fast (∼20
resolvable spots in 3 ns), and although you do have to avoid mode jumps and cope with
power variations, it presents few problems otherwise.
7.9.12 Raster Scanning
Raster scanning requires a 2D scanner or two 1D scanners in cascade. You can get
two-axis voice coil type scanners, which tip a single mirror about two axes; they behave
a bit like galvos but have only a few degrees’ scan range and aren’t as stable or repeatable,
because the mirror mount usually relies on a single wire in the middle for its location,
and the orthogonality of the tilts is not well controlled.
If we need to use two scanners, we must either accept a pupil that moves around
a good deal (several centimeters with most galvos), or use a relay lens to image the
center of one scanner on the center of the other. The usual approach is to use a small
fast scanner first, to do the line scan, and a large, slow one afterwards for the frame
scan, although the relay lens approach allows both scanners to be the same size. The
moving pupil problem is worst with a pure preobjective scan approach, but if you can
put the scan lens between the line and frame scanners, it gets a lot easier; in the ideal
case of pure postobjective scanning, you can use a simple lens for focusing, with perhaps
a weak toric field-flattening lens to correct for the different object distances at different
scan positions.
7.9.13 Mechanical Scanning
Another approach is to keep the optical system very still and move the sample, as is
done in some confocal microscopes and step-and-repeat photolithography tools. This is
slow and prone to mechanical jitter, due to the requirement to start and stop the motion
of a large mass quickly, and to the instabilities of translation stages. On the other hand,
your point-spread function is really constant with position, and there is no limit on the
number of resolvable spots. Another mechanical scanning method is to rotate or translate
the entire optical system, as in a periscope or an astronomical telescope, which scans
slowly to correct for the Earth’s rotation.
7.10 Modulators

Diode lasers are unique among optical sources in being highly directional and easily modulated at high speed. Unfortunately, most sources of interest are not like that, so we need
external modulators. Under this rubric lie a fairly motley collection of out-of-the-way
physical effects, all of which have serious drawbacks, not all widely known. Modulators
in general are troublesome devices if you need nice pure beams with uniform polarization,
no etalon fringes, and smooth amplitude profiles.
Most modulators are based on bilinear interactions† between two fields in some material, for example, the electro-optic effect, where the applied electrostatic field causes the
dielectric tensor to change, or the acousto-optic effect, where the optical wave diffracts
from a sinusoidal phase grating produced by the traveling acoustic wave.
7.10.1 Pockels and Kerr Cells
Optical modulators based on the linear (Pockels) or quadratic (Kerr ‡ ) electro-optic effects
are shown in Figure 7.12. These are among the fastest modulators of all, but they are a
genuine pain to use. Think of them as voltage-controlled wave plates.
KTP Crystal
(a) Longitudinal Pockels Cell
(b) Transverse Pockels Cell
(c) Kerr Cell
CS 2
Figure 7.12. Pockels and Kerr cells.
† A bilinear interaction is one that is linear in each of two independent variables; that is, it is expressible as
f (x, u) = g(x)h(u). An electronically variable attenuator is an example of a bilinear device if its gain is linear
in its control voltage.
‡ The electro-optic Kerr effect is not the same as the magneto-optic Kerr effect, which leads to polarization
rotation in linearly polarized light reflected from a magnetized medium.
Kerr cells are based on a quadratic electro-optic effect in isotropic materials (usually nasty inflammable organic liquids such as carbon disulfide or nitrobenzene). Their
quadratic characteristic makes them harder to use in applications needing linearity, of
course, but since the static birefringence is 0, they are pretty predictable. Kerr cells are
excited transversely by dunking capacitor plates into the liquid cell. They are normally
used singly, with bias voltages around 10 kV. The organic liquids are strongly absorbing
in the UV, so Kerr cells are generally limited to the visible and near IR. Getting decent
uniformity requires limiting the fringing fields, which (as we’ll see in Section 16.2.5)
means making the plate dimensions several times their separation.
The variable retardation of electro-optic cells can be used to modulate the polarization
and phase of a beam, as shown in Figure 7.13. Pockels cells are built from crystals such
as potassium dihydrogen phosphate (KDP) or lithium niobate, nasty birefringent things
whose dielectric tensor ε depends on the applied E. The dependence of ε on E is generally
complicated; a given applied E can change all the coefficients of ε. Since the material
is already anisotropic, the leading-order change is linear in applied field.
Pockels and Kerr cells are usually used as amplitude modulators, by following the
polarization modulator with an analyzer. They can also be used as phase modulators, by
aligning the polarization of the beam with one of the crystal axes, so that the polarization
remains constant but n varies. It’s hard to get this really right in a multielement cell,
(a) Polarization Modulator
(b) Phase Modulator
(c) Amplitude Modulator
Figure 7.13. E-O modulators: (a) polarization, (b) phase, and (c) amplitude.
because the optic axes of the elements may not be aligned well enough. Fancier things
can be done, for example, frequency shifters made by putting a rotating E field on
a crystal between crossed circular polarizers, but they tend to be rare compared with
phase, amplitude, and polarization modulation applications.
There are two main designs for Pockels cells. Because the applied E changes more
than one element of the dielectric tensor, the field can be applied longitudinally (parallel
to k) or transversely. Longitudinal cells usually have a ring electrode around the outside
of the face, which produces some polarization nonuniformity that limits their ultimate
extinction ratios to a few hundred, even with long thin cells. Transparent electrodes such
as indium–tin oxide are also used, but they’re pretty lossy, and not conductive enough
for the highest speed applications; for really fast stuff, the champion electrode is a low
pressure neon plasma, which improves the extinction to a few thousand, even while
improving the étendue.†
Transverse cells are simply metallized. The trade-off between the two styles is in
étendue and capacitance versus voltage; a longitudinal Pockels cell has a reasonably
large étendue (especially the ITO and neon plasma types) but requires high voltage,
whereas a transverse one has high capacitance and (because it is long and narrow) very
low étendue.
Since the effect is linear, the number of waves of birefringence in a longitudinal cell
is proportional to the line integral of E·ds along the ray path, that is, to the voltage
drop across the crystal. Since going from off to fully on requires a change in retardation
of one-half wave (why?), the figure of merit for a given electro-optic material is the
half-wave voltage Vπ , which—most inconveniently—is usually several thousand volts.‡
Because both polarizations get phase shifted in the same direction, the retardation is less
than the phase modulation, so phase modulators can work at somewhat lower voltages.
The Pockels effect is fast; a properly designed transverse cell such as a 40 Gb/s
telecom modulator can switch in 10 ps, and the intrinsic speed is even higher. The
problem in bulk optics is that with a longitudinal cell whose half-wave voltage is (say)
4 kV, to get a 1 ns edge we have to make the bias voltage change at a rate of 4,000,000
V/μs, which is an interesting challenge—if the 50 Ω connecting cable is 10 cm long,
it’ll take 80 A for 1 ns just to charge the cable. You can do that with a spark gap or a
series string of avalanche transistors, but only just; the usual method is a big thyratron
producing a 10 ns edge, running into a ferrite-loaded coaxial pulse forming network.§
All these are limited to shuttering applications since you can’t stop an avalanche once
it starts. (These techniques are discussed in Section 15.14.1.) Accordingly, people have
worked out ways to ease the voltage requirement. The main way is to take many thin
plates of electro-optic material dunked in index oil and drive them in parallel as in an
interdigitated capacitor, as shown in Figure 7.12. You can get these down into sub-400
V territory, which is a lot easier although still not trivial.
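The arithmetic behind these alarming drive requirements, using the numbers above (4 kV half-wave voltage, 1 ns edge, 10 cm of 50 Ω coax at roughly 100 pF/m):

```python
# While the edge propagates down it, a matched coax looks resistive, so the
# driver must source V/Z0 for the duration of the edge plus the cable delay.
V_pi = 4000.0      # half-wave voltage, volts
Z0 = 50.0          # cable impedance, ohms
t_edge = 1e-9      # desired switching edge, seconds

slew = V_pi / t_edge          # ~4e12 V/s = 4,000,000 V/us
I_cable = V_pi / Z0           # 80 A during the edge
C_cable = 0.10 * 100e-12      # 10 cm at ~100 pF/m -> 10 pF
print(slew, I_cable, C_cable * V_pi)   # note the ~40 nC of cable charge
```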
The optical path in a Pockels cell contains a great big chunk of birefringent material, so
it has a huge static retardation (many waves), and thus has a few percent nonuniformity
of retardation across its face, fairly poor wavefront fidelity, and a significant amount
of temperature drift. Many of the crystals used for Pockels cells are piezoelectric, and
so they may exhibit low frequency resonances; those made with biaxial crystals have
particularly nasty temperature dependence, since (unlike uniaxial crystals) the optic axes
can move around with temperature. For a device with as much retardation as a Pockels
cell, this can be a serious drawback. Low voltage devices have lots of etalon fringes too.
For high accuracy applications, longitudinal Pockels cells need a lot of babysitting, and
that confines them pretty much to lab applications.
† Pockels cell people enjoy suffering almost as much as femtosecond dye laser people used to.
‡ This is an example of the extreme linearity of optical materials—even if we get to pick an arbitrarily horrible
material, and orient it for maximum effect, we still need thousands of volts to make an order-unity difference
in the transmitted field.
§ Saturation in the ferrite causes the back end of the pulse to move faster than the front end, which leads to
shock formation, like the breaking of an ocean wave on a beach. You can get below 100 ps rise time for a
20 kV pulse in 50 Ω, but you have to really want to, and the EMI problems are, ahem, interesting.
7.10.2 House-Trained Pockels Cells: Resonant and Transverse
Many applications of modulators are relatively narrowband, so that we can stick the cell
in an electrical resonator to reduce the required voltage by a factor of Q. Cells like that
are available from 100 kHz up past 30 GHz, with operating voltages of 6–30 V.
Transverse cells tend to be long and thin because we win by a factor of L/d, which
allows operating voltages down to the tens-of-volts range, a much better match to ordinary
circuitry. This leads to higher capacitance, but since the electrical power goes as CV 2 ,
we win anyway, and traveling-wave structures can be used to make the device look
resistive when that becomes a problem.† The most serious limitation is their very small
étendue. Even for light going right down the axis, the beam often has to be so small
in diameter that diffraction limits the length of the cell. This is an especially serious
limitation in the infrared, where diffraction is worse and more retardation is required
to get a half-wave shift. Transverse modulators are most commonly found in integrated
optics and fiber-coupled applications, where they are entirely practical; a single-mode
waveguide made of electro-optic material needs very little étendue, the field confinement
of the waveguide avoids any length limitation due to diffraction, and nonuniformity
causes fewer problems since only one mode is involved. The really fast traveling-wave
integrated-optic Pockels cells used for telecom usually need about 100–200 mW of RF
power and have rise times as short as 12 ps or so. Telecom modulators are usually zero
chirp, that is, they produce little or no phase modulation, which otherwise shows up as
spurious FM sidebands. Chirp is one of the main limitations of directly modulated diode
lasers, so this matters. If you really need fast beam modulation, consider using one of
these and expanding the beam later.
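The diffraction limit on transverse cell length can be estimated from the Gaussian beam confocal parameter, b = 2πnw0²/λ (a sketch with assumed, illustrative numbers, not values from the text):

```python
import math

def max_cell_length(w0, lam, n=1.0):
    """Rough length limit for an unguided transverse cell: keep the crystal
    within one confocal parameter b = 2*pi*n*w0^2/lambda, inside which a
    Gaussian beam of 1/e^2 waist radius w0 stays roughly collimated."""
    return 2.0 * math.pi * n * w0**2 / lam

# Assumed numbers: 100 um waist, 1.55 um telecom wavelength, n = 1
print(max_cell_length(100e-6, 1.55e-6))  # ~0.04 m: a few centimetres at most
```

A waveguide removes this limit entirely, which is one reason the integrated-optic versions work so well.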
7.10.3 Liquid Crystal
Another class of electro-optic devices is based on liquid crystals (LCs). These odd materials are liquids made of large molecules, which show some large scale orientation effects
even in the liquid state. The physics of liquid crystals is very rich (read complicated).
A very slightly grooved surface (e.g., glass that has been wiped in one direction with a
cloth pad) can determine the orientation for fixed applications such as wave plates; an
applied voltage can change their alignment, which changes the resulting birefringence.
Because they rely on the physical motion of molecules, rather than electrons, all liquid
crystal modulators are slow (1 μs to 1 ms). You use them like big, slow, low voltage
Pockels cells, to modulate polarization, phase, or amplitude.
† Think of coaxial cable, which is 100 pF/m, but can handle gigahertz signals over many meters because of its
traveling-wave character.
They come in two basic types: the extremely slow, continuously variable nematic
ones, and the somewhat-faster, binary ferroelectric ones. One of the best things about
LC devices is their huge étendue; you can get 100 mm diameters with Ω ≈ 0.5 sr. They
are also nearly indestructible—their damage thresholds are so high they’re not easy to
measure.† Being liquids, they make intimate contact with the electrodes; because their
birefringence is so high, they can be very thin. This makes it easy to build spatially
addressable LC spatial light modulators (SLMs). Besides the familiar LCD displays,
SLMs are used to make shutters, masks, and low resolution computer-generated holograms, among other things.
Example 7.2: Phase Flopping Interferometers. One especially good use of LC modulators is in making zero-background imaging measurements by inverting the phase of
the signal but not the background, and frame subtracting. For example, many years ago a
colleague of the author’s, T. G. Van Kessel, built a very successful Nomarski interference
system for measuring the latent image in photoresist. (The image is latent between exposure and development.) It was a normal Nomarski-type metallurgical microscope (see
Example 10.2) with the addition of an early liquid crystal variable wave plate before the
analyzer, oriented parallel to the Nomarski axis (45◦ to the analyzer axis). Changing the
retardation from 0 to λ/2 on alternate video frames caused a π relative phase shift in
the combined beams; this inverted the Nomarski contrast but preserved the background
amplitude. Under frame subtraction, the weak phase contrast signals added and the strong
background canceled out, making an electronically zero background measurement (see
Section 10.8).
7.10.4 Acousto-optic Cells
The most common Bragg grating in bulk optics is the acousto-optic Bragg cell. We
encounter the piezo-optic effect in Section 8.5.6, where it causes stress birefringence
in lenses and prisms. Launching a strong acoustic plane wave in a material with a big
piezo-optic coefficient makes a moving Bragg grating. Typical frequencies are 40 MHz to
2 GHz, which produce acoustic wavelengths of 2–100 μm. If the interaction zone is too
skinny, phase matching perpendicular to kA is no longer a constraint, so we get many
weak diffraction orders, spaced at multiples of kA . This is the Raman–Nath regime,
shown in Figure 7.14.
That grating has some unique properties: the diffracted light gets frequency-shifted by
±fA depending on which direction it was diffracted. Also, the diffraction efficiency can
be smoothly varied from 0% to 80% or so merely by changing the RF power from 0 to
a watt or two (and more than that for some materials, e.g., glass).
The phase matching condition can be modified (and sometimes considerably relaxed)
by using a birefringent material. By a suitable choice of cut, the change in kdiff with
incidence angle or grating period can be compensated by that due to the change of n.
This trick is used all the time in acousto-optic devices.
Acoustic waves in solids are tensor waves, which include scalar (longitudinal) and
vector (transverse) waves, but more general shear waves can propagate too. Predicting
the effects of a given order and type of diffraction can be done by classical field theory,
but it is far easier and less blunder-prone to take a simplistic quantum view. We know
† That doesn't apply to the film polarizers on LC shutters, however.
Figure 7.14. Acousto-optic cells: Raman–Nath and Bragg regimes.
that a photon has energy, momentum, and angular momentum; well, so do phonons,
and they are all conserved during the interaction, on a quantum-by-quantum basis. A
photon that absorbs a phonon (the + or anti-Stokes branch) gets its frequency upshifted
(E = ℏω), and is bent along the acoustic propagation direction (p = ℏk)—the energies
and momenta add. If instead it emits one (by stimulated emission, the − or Stokes
branch), it’s downshifted and bent away from kacoustic . Similarly, conservation of angular
momentum means that a linearly polarized photon that absorbs or emits a shear phonon
has its polarization shifted—it goes from s to p or p to s. A second-order diffraction
gets twice the shift in each, because it involves emitting or absorbing two phonons, and
so on.
Acousto-optic cells are usually used with laser beams, because their aperture is so
small; the major use is as medium-speed amplitude modulators (DC to 10 MHz easily,
DC to 100 MHz if you really work at it—focused beams, fast materials, small crystals).
One exception is the acousto-optic tunable filter (AOTF), which achieves a wide field
angle (and hence a decent étendue) by noncritical phase matching, where the curve of
λ versus the phase-matched θi has a maximum, so the phase matching error is quadratic
in θi . These are available in both collinear and noncollinear designs. Narrowness of
the passband requires more waves of interaction zone, as we saw, so the angular acceptance goes down as the selectivity goes up; a typical cell with Δν/ν = 0.2% works
over ±5◦ (Ω ≈ 0.023 sr), a much wider range than a grating instrument with the same selectivity.
You can image through AOTFs, but it doesn’t work very well unless you’re careful.
The images are corrupted by ghosts and prism aberrations, and the ultimate spectral
selectivity is limited by a high background light level. Both of these problems are caused
by sinc function sidelobes due to a finite interaction length and the random phase matching
of all the light and acoustic energy bouncing around inside the crystal. Putting two in a
row is an effective but pricey solution to the ghost problem, and putting the AOTF at an
image instead of a pupil pretty well solves the aberration problem too.†
7.10.5 AO Deflectors
The same crystal-cutting tricks allow a reasonable range of output angles (±2–3◦ ) for
a 1-octave frequency shift, making a narrow-range but very fast scanner with high
† Dennis R. Suhre et al., Telecentric confocal optics for aberration correction of acousto-optical tunable filters. Appl. Optics 43(6), 1255–1260 (February 20, 2004).
diffraction efficiency, the acousto-optic deflector (AOD). In this case, we want θi to
be constant over a wide range of acoustic frequency, a condition called tangential phase
matching that can be met in TeO2 .
There are interesting tricks you can do with AODs. If you gently focus a beam at the
center of an AOD, it will pivot about its focus as f is swept. Recollimating produces a beam
that moves from side to side without steering, which is very useful for long path sensors
such as the extinction system of Section 10.8.3. Galvos work for this too, of course, but
since an AOD has no moving parts, you can get a much higher scan frequency, with
zero jitter and wobble.
Getting high resolution out of an AOD requires lots of fringes, just like any other
Bragg grating; the number of resolvable spots is equal to the transit time–bandwidth
product. Like Rayleigh and Sparrow resolution criteria, there’s a factor of order 1 in
front that we can argue about, depending on your beam profile and how much overlap
you allow between distinct spots.
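The transit time–bandwidth product is easy to put numbers on; the beam size, acoustic velocity, and sweep range below are assumed illustrative values:

```python
# Resolvable spots of an AOD ~ transit time x bandwidth (factor of
# order 1 omitted). All numbers here are assumed, not from the text.
v = 620.0     # m/s, slow shear wave in TeO2 (approximate)
D = 5e-3      # m, illuminated aperture
bw = 50e6     # Hz, one octave of a 50-100 MHz sweep

tau = D / v   # acoustic transit time across the beam
N = tau * bw  # number of resolvable spots

print(f"transit time {tau*1e6:.1f} us, about {N:.0f} spots")
```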
Aside: Acoustic Phase Delay. Bragg cells are often used in heterodyne interferometers, and it is sometimes important to remember that the acoustic propagation delay
translates into a huge phase shift. This acoustic phase shift gets impressed on the optical
phase, and so appears in the phase data. It is often far from negligible; if you’re using two
passes through an 80 MHz cell that 200λ delay has a phase sensitivity of 31 rad/MHz.
This is a nuisance in wide-range AOD measurements, or where it makes the oscillator
spurs show up in the data, but in fixed-frequency applications it can be useful—you
can stabilize the operating point of your system by using feedback to adjust the acoustic
frequency. Shear-wave TeO2 devices are the best overall in the visible. Optics people
used to birefringent materials with (δn)/n of a percent or less are usually surprised that
the slow shear wave in TeO2 goes at 600 m/s while the longitudinal wave goes at 4200.
A really big TeO2 cell can have 1000 resolvable spots in a single pass, though several
hundred is more typical.
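The aside's sensitivity number follows directly from φ = 2πf τ , with the delay expressed as the 200 acoustic wavelengths quoted:

```python
from math import pi

# Check of the aside's 31 rad/MHz figure for a double-passed 80 MHz cell.
f0 = 80e6                  # Hz, cell center frequency
tau = 200 / f0             # s, delay of 200 acoustic wavelengths (2.5 us)
sens = 2 * pi * tau * 1e6  # rad of optical phase per MHz, single pass

print(f"{sens:.1f} rad/MHz per pass, {2*sens:.0f} rad/MHz for two passes")
```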
While they’re the best of the fast bulk-optics modulators, AO cells have some major
drawbacks. Cells with small Bragg angles (longitudinal devices at lowish frequency) have
severe etalon fringes. AODs are less prone to these, because of the high angles, high
diffraction efficiency, and polarization shift. There is also usually some beam apodization
due to acoustic nonuniformity, and ripples in the diffraction efficiency in space and
frequency due to acoustic standing waves. The standing wave effect is eliminated in
shear wave devices by cutting the bottom of the cell at 5◦ ; because of the huge Δv, this
totally destroys the phase matching between the reflected acoustic wave and the light.
Some people say that AO cells have poor wavefront fidelity, but the author has never
had a moment’s problem with it. Scanning rapidly does introduce aberrations, however.
It takes some time for the wave to cross the beam, so a frequency ramp produces
astigmatism by sending rays at different x in different directions; a frequency staircase
with time allowed for settling avoids this problem. The polarization eigenstates of a shear
wave cell are also very slightly elliptical, which one occasionally needs to remember.
Overall, a slow shear wave AOD is a pretty trouble-free device.
7.10.6 Photoelastic Modulators
Besides a change in refractive index, the acousto-optic effect also induces stress birefringence in the crystal. Normally we don’t worry too much about this, especially with TeO2
devices, where the incoming beam has to be polarized in the right direction to get good
performance. Photoelastic modulators are acousto-optic devices that exploit this effect.
The basic idea is to run an AO cell at a very low frequency, 20–100 kHz, so that the
acoustic transducer basically just shakes the entire crystal back and forth, and tune the
frequency to the lowest order longitudinal vibration mode of the crystal—just like an
organ pipe. The acousto-optic effect leads to a more or less uniform phase modulation,
but the stress birefringence (the photoelastic effect) basically turns the crystal into an
oscillating wave plate, whose retardation can reach ±½ wave. Photoelastic modulators
thus act a bit like acoustic Pockels cells, only much slower. Their big advantage is greater
uniformity of the birefringence across their field.
7.10.7 Acousto-optic Laser Isolators
The acousto-optic effect is symmetrical, so reflected light propagating backwards along
the beam will be diffracted back into the laser. The returned first-order beam enters the
cell at f ± fA from its first pass, and winds up at f ± 2fA , because the sign of the optical
k vector has changed. The frequency shift is so small that the difference in deflection
angle is negligible, and the light goes straight back into the laser.
The laser is a Fabry–Perot resonator, though, and provided facoustic has been properly
chosen, and the cavity finesse is high (as in gas lasers), virtually none of that light will
make it back into the laser cavity to cause feedback problems. Even with diode lasers,
where the finesse is low, and a lot of the light does make it into the cavity, the beat
frequency 2fA is so much higher than the ∼100 kHz of mode hops that its effect is much
reduced. (Why doesn’t this work with the zero-order beam?)
Fiber Optics
All mankind in our age have split up into units, they all keep apart, each in his own groove;
each one holds aloof, hides himself, hides what he has from the rest, and he ends by being
repelled by others and repelling them.
—Elder Zosima, in The Brothers Karamazov by Fyodor Dostoevsky
We all know that fiber optics and lasers have revolutionized long-haul and high bandwidth
terrestrial communications. Fibers are unequaled at that job, there’s no doubt about it.
They’re also good for instruments, though the situation there is less clear.
Fiber optics really constitutes a different domain, a bit like k-space, where different
things are easier or harder than in bulk optics. There is some new physics available in
fibers, and a lot of old physics under a new aspect and in new operating regimes. The
trade-offs are different enough that they’re worth a considerable amount of study, so
we’ll concentrate on the characteristics of fibers in instruments. There are a lot of books
available on fiber communications and fiber sensors, but not a lot on why to choose fiber
or bulk for a particular application.
The thing to remember about using fiber is that it’s far harder than it looks. It’s
seductive; a fiber sensor can easily be made to work at some level, which makes it easy
to suppose that making that sensor robust enough for field use is straightforward—and
it isn’t straightforward at all.
An optical fiber is a thin cylinder of doped fused silica, with nonuniform n near the center,
covered with a protective plastic or metal jacket. It works as a very low loss dielectric
waveguide. The most common type is step-index fiber, where a central core has a uniform
index n1 , which abruptly changes to n2 where the annular cladding begins. The cladding
index is also constant out to the edge of the jacket. Most fibers use a germania (GeO2 )
doped silica core and a pure silica cladding. (See Table 8.1.)
Building Electro-Optical Systems, Making It All Work, Second Edition, By Philip C. D. Hobbs
Copyright © 2009 John Wiley & Sons, Inc.
TABLE 8.1. Fiber Parameters

Core radius a                 2.5–5 μm (single mode); 25–900 μm (multimode)
Core index                    n1
Cladding index                n2
Normalized index difference   Δ = (n1² − n2²)/(2n1²)        0.004–0.04 (usually small for single mode)
Mode volume                   V = a n1 k0 √(2Δ)             1.8–2.4 (single mode)
NA (multimode)                NA = √(n1² − n2²) = n1 √(2Δ)
Fracture strength             0.01–4 GPa
Étendue                       10⁻⁸ to 10⁻⁵ cm²·sr
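The table's formulas chain together straightforwardly; here is a sketch with assumed indices and core radius for a garden-variety step-index fiber:

```python
from math import pi, sqrt

# Assumed illustrative values for an ordinary single-mode fiber.
n1, n2 = 1.462, 1.457   # core and cladding indices
a = 4.1e-6              # m, core radius
lam = 1.31e-6           # m, operating wavelength
k0 = 2 * pi / lam

delta = (n1**2 - n2**2) / (2 * n1**2)   # normalized index difference
NA = n1 * sqrt(2 * delta)               # = sqrt(n1^2 - n2^2)
V = a * n1 * k0 * sqrt(2 * delta)       # mode volume (normalized frequency)

print(f"delta = {delta:.4f}, NA = {NA:.3f}, V = {V:.2f}")
```

With these numbers V comes out near 2.4, i.e. at the top of the good single-mode range in the table.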
Fiber Virtues
Fibers have lots of virtues, even besides those that all optical components share, such as
wide temporal bandwidths.
Cheapness. It’s nearly free (10 cents per meter in large quantity for single-mode communications fiber), and components are getting cheaper. Fiber connectors are amazingly
cheap for such precise artifacts; they’re only a few dollars, as long as you’re using 125 μm
OD fiber. You can thus bolt stuff together with fibers very quickly, which in principle
would allow checking out a measurement idea faster than with bulk optics, where everything has to be aligned manually. And components such as directional couplers are mass
produced, which makes them far cheaper than previously.
By putting the optical complexity at the far end, the business end of a fiber sensor can
be made very small and cheap, even cheap enough to be disposable, as in fiber catheters,
or sacrificial, as in fiber strain gauges cast into concrete structures or shrapnel velocity
sensors for ordnance development.
Good Transmission Properties. For long distance transmission, fibers are tops, as
already mentioned. They have very low loss (0.2–10 dB/km), exhibit no crosstalk, and
transmit over a reasonably wide wavelength range—250–2000 nm at least, and special
fibers go somewhat beyond both extremes.
EMI Immunity. Not being inductive or capacitive at ordinary frequencies, fibers are
totally immune to electromagnetic interference (EMI), ground loops, and pickup; you
can send a very weak signal a long way in a hostile environment. This is a key virtue
for distributed sensing and smart structures.
Versatility. Most things that you can do with bulk optics, you can do with fiber, at least
in principle. It provides some altogether new capabilities and more convenient ways of
doing the same things, or is lower in cost because of economies of scale (driven by fiber
transmission), simplified mounting, and reduced size and mass.
Ease of Use. Fibers are mechanically flexible and can bend light around in a pretzel
shape if you want. They couple well to many kinds of lasers. Once you’ve got the light
going in and out the ends, no more alignment is necessary. In connectorized systems,
the alignment is done by the manufacturer of the connector. Convenient collimators are
available inexpensively.
Environmental Robustness. Fibers, being made almost entirely of quartz, survive
high and low temperatures extremely well. When properly jacketed, they are immune
to dirt and most fluids, too, though they become much more vulnerable to breakage
when wet.
New Physics. The long interaction lengths possible with fiber allow us to base good
measurements on intrinsically very weak effects, such as the Faraday effect in silica.
We can make all-fiber sensors of completely new types, for example, multiplexed or
distributed sensors like OTDR and fiber Bragg grating types.
Ideal Properties of Fiber
Besides these comparative advantages, fiber systems have a few perfect characteristics,
which can be used to good effect.
Pointing Stability. Single-mode fibers have essentially perfect pointing stability,
because with due care, we can ensure that really only one mode reaches the end facet.
The mode pattern depends only on wavelength and material properties, so it is extremely
stable, merely changing very slightly in diameter and NA with temperature, due to
∂n/∂T (see Example 9.7).
There-and-Back Phase Symmetry. There is a broad class of cases where the
one-way phase delay is exactly the same in both directions in a fiber. The mode structure
doesn’t change when we replace kz with −kz , so the propagation velocity of a single
mode is unaffected (Figure 8.1). In the face of mode coupling, we have to be a bit
more careful. For the mode coupling to undo itself, the fiber has to be lossless and the
Figure 8.1. There-and-back polarization stability with Faraday rotator mirrors.
polarization has to come back in complex conjugated (i.e., with the same major axis and
helicity) from the way it went out (see Section 8.3.3 below).
Scatter will degrade the matching stochastically, and there are also three effects that
modify it deterministically: Pancharatnam’s topological phase (see Section 6.2.4) in nonplanar systems, the Sagnac effect in accelerated or rotating ones, and the Faraday effect
in magnetically excited ones.
Transient effects will of course arrive at different times in the two directions, depending
on where in the fiber they occur. Although silica is a very linear material, in a long fiber
path you may see nonlinear effects, for example, the optical Kerr effect that causes phase
offsets in Sagnac interferometers.
Two-Pass Polarization Stability. Polarization has two degrees of freedom per mode.
Light in a lossless single-mode fiber can thus be resolved into two orthogonal polarization
states, and given any one state φ1 , we can always construct a φ2 that is orthogonal to it.
Light launched into the fiber gets its polarization modified wholesale, due to bend
birefringence, topological phase, twist-induced optical activity, and other effects, but
throughout the process, φ1 and φ2 remain orthogonal, and so maintain their identity if
not their appearance. This is a consequence of the unitarity of the Jones matrix of a
lossless polarization element (see Section 6.10.2); an infinitesimal length of fiber can be
represented by a differential Jones matrix, and the whole fiber’s Jones matrix follows by
passing to the limit of many segments of very small length.
A Faraday rotator mirror (or FRM, see Section 6.10.11) turns any monochromatic
polarization state into its orthogonal partner, which retraces its path through the fiber but
remains orthogonal to the first pass. Thus the light reemerging from the input facet has a
stable polarization, which is orthogonal to the input light, and so can be separated from
it with a polarizing prism. Of course, really high polarization selectivity (e.g., 50 dB
optical) requires unfeasibly accurate fiber couplers, end faces, and FRMs.
Fiber Vices
Fiber has its own peculiar vices as well, some of which are crippling, and few of which
are obvious. Most stem from having a too accurately aligned system with a long path
length in glass that’s hard to get light into. Attentive readers may note a certain similarity
between the lists of virtues and vices—that’s where our discussion of fibers in instruments will concentrate.
Fiber Isn’t So Cheap in Real Life. That 10 cents per meter pricing requires you to
buy many kilometers of fiber, and it only applies to garden-variety single-mode telecom
fiber—not PM or multimode fiber. Fiber components and connectors are also not cheap
except at 1310 and 1550 nm and 125 μm OD.
Mechanical Fragility. It’s made of glass, after all, and is vulnerable to snagging,
kinking, and abrasion. Bending also causes loss.
Really Tiny Étendue. It’s so low that fiber doesn’t couple well to anything but
focused laser beams. The same effect is responsible for fiber’s horrible etalon fringe
problem—there’s nowhere for reflections to go except exactly back the way they came.
(Angle polishing helps a lot, but not enough.) Its power handling capacity is limited, too.
Mode Instability. Multimode fiber, especially if it has only a few modes, exhibits large
amounts of noise and drift in its illumination pattern due to mode coupling.
Etalon Fringes and FM–AM Conversion. Every optical system is an interferometer,
and fiber optic systems are several interferometers at once, each one demodulating phase
noise like a delay line discriminator. The too accurate alignment leads to severe etalon
fringes and laser feedback and (as we saw in Section 2.5.3) removes the spatial averaging
that helps so much to eliminate the FM–AM conversion from interference with delayed light.
Besides the obvious facet reflections, there are distributed effects that are often even
larger; multiple Rayleigh scattering within a long fiber causes serious FM–AM conversion too (see Section 13.5.6). This sounds like a trivial effect, but unfortunately it
is not; it is usually by far the dominant broadband AM noise source in a single-mode
fiber sensor when the fiber length is a significant fraction of both a coherence length
and an absorption length.† The physics is similar to the coherence fluctuation noise of
Sections 2.5.3 and 19.1.1.
Phase and Polarization Instability. The long path in glass makes fibers vulnerable
to piezo-optic shifts, temperature changes, vibration, and bending, so that every fiber
device has serious instabilities of phase and polarization with time.
Sensitivity to Everything Except EMI. Use of fibers for DC measurements is
extremely difficult, because apart from pure intensity instruments, every fiber sensor
is also a thermometer. For example, fiber strain sensors are plagued by temperature sensitivity, because 1 ◦ C of temperature change gives the same signal as a fairly large strain
(10 με), and separating the two effects well enough to get high sensitivity thus requires
very accurate, independent temperature measurements, and an accurate knowledge of the
response of the system to temperature alone.
In order to be able to use the good properties of fiber, and avoid being blown up by the
bad ones, we need to spend some time on their theoretical properties.
A step-index fiber can be modeled as a right-circular cylinder of radius a and index
n1 , surrounded by a practically infinite thickness of n2 . Most of us probably remember
how to solve the wave equation in a uniform medium with cylindrically symmetric
boundary conditions: the Laplacian operator separates, so we get a one-dimensional
wave equation in the axial direction, Bessel’s equation in the radial coordinate, and an
eigenvalue equation for the azimuthal coordinate, leading to solutions of the form
ψm(1) (r, φ, z) = Jm (κr)ei(mφ+kz z) ,
ψm(2) (r, φ, z) = Nm (κr)ei(mφ+kz z) ,
† A.
Yariv, et al., IEEE Trans. Lightwave Technol . 10(7), 978– 981 (1992).
where Jm and Nm are the Bessel functions of the first and second kind, respectively, and
kz2 + κ 2 = k 2 = n21 k02 .
From experience, we expect to apply boundary conditions and get a denumerable set
of modes for any given k. For a finite radius, only a finite number can propagate, as we’ll see.
For step-index fibers, we have two dielectrics, so we have to patch the fields together
at the boundary, using the phase matching condition in z (because of translational invariance) and continuity in r and φ of tangential E and H and of perpendicular D and B,
leading to an eigenvalue equation for the allowed values of κ. This requires a significant amount of bookkeeping and results in four families of modes—TE, TM, EH, and
HE. For large diameters and small index differences, these can be combined into nearly
transverse electromagnetic (TEM) forms, and the scalar approximation is valid; thus we
use the scalar patching conditions, that is, phase matching in z and continuity of ψ and
∂ψ/∂r. With a little bit of hand work, this yields the linearly polarized LP modes. While
these solutions are approximate, it’s much better in practical work to have an intuitive
approximation than an unintelligible exact result.
We normally confine our attention to those modes that are guided, so that the field
decays with r outside the core of the fiber. Such modes occur when kz is large enough
so that (8.2) forces κ to be imaginary in the cladding while being real in the core, which
occurs when
n2 k0 < kz < n1 k0 .
When κ is imaginary, the solutions are more conveniently written in terms of the
modified Bessel functions,
ψm(3) (r, φ, z) = Im (γ r)ei(mφ+kz z) ,
ψm(4) (r, φ, z) = Km (γ r)ei(mφ+kz z) ,
where kz2 − γ 2 = n22 k02 . These functions are both monotonic: Im (r) grows exponentially
and is regular at 0, while Km (r) decays exponentially but has a pole at 0. Because the
cladding is very thick and doesn’t extend to 0, only the Km (γ r) solution is physical,
and the patching conditions lead to an eigenvalue equation for κ, which sets the allowed values of kz .
This may seem a bit recherché, but it has a very important consequence: because
we have to patch Jm and d/dr(Jm ) together with a monotonically decaying function,
|Jm (κr)| must be decreasing at r = a. Thus the mth radial mode can only propagate when
d/dr(|Jm (κa)|) ≤ 0. Below the first zero of J1 (κa), only a single mode can propagate,
so the fiber is single mode when

λ > λc = 2π a n1 √(2Δ) / 2.405.
The light is not confined entirely to the core, as it is in a metal waveguide, but spreads
a considerable way into the cladding; the patching condition can always be met for the
lowest mode, so there is no long-wavelength mode cutoff of a fiber the way there is for
a metal waveguide. On the other hand, the guiding becomes weaker and weaker as more
and more of the light is in the cladding, so there is a practical minimum for the core
diameter. Guiding behavior is captured by the mode volume or normalized frequency V ,
which from Table 8.1 is
V = a n1 k0 √(2Δ).
Good guiding and reliable single-mode operation occurs for V from about 1.8 to 2.4.
The lowest order mode, LP01 , has a round top and exponentially decaying sides, so it
looks roughly Gaussian. The mode field radius w0 is defined similarly, as the 1/e² radius
of the best-fit Gaussian, and for single-mode fiber is given approximately by

w0 ≈ a ( 0.65 + 1.619 V^(−3/2) + 2.879 V^(−6) ).
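The best-fit approximation can be evaluated numerically; this sketch uses Marcuse's well-known fit (assumed here to be the approximation intended), with an assumed core radius:

```python
# Mode-field radius vs V using Marcuse's fit for the LP01 mode of a
# step-index fiber; core radius is an assumed illustrative value.
def mode_field_radius(a, V):
    return a * (0.65 + 1.619 * V**-1.5 + 2.879 * V**-6)

a = 4.1e-6   # m, assumed core radius
for V in (1.8, 2.0, 2.2, 2.4):
    print(f"V = {V:.1f}: w0 = {mode_field_radius(a, V)*1e6:.2f} um")
```

Note that w0 grows as V shrinks: weak guiding pushes more of the light out into the cladding, as the text says.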
As with Gaussian beams, we normally care about the details of only the lowest
order mode, because any device that relies on a multimode fiber having exactly predictable mode properties is doomed; a multimode fiber has such strong coupling between
modes that the mode amplitudes and phases change rapidly with propagation distance
and environmental perturbations. One exception is when we know we have exactly two
well-behaved modes, for example, in a polarization preserving fiber (see Section 8.4.4).
When we talk about a fiber being single mode, we are of course talking about the scalar
approximation, whereas light in the fiber has two degrees of freedom in polarization.
For perfect cylindrical symmetry and homogeneous, isotropic materials, the fiber has
rotational symmetry, which means that the two polarization states have exactly the same
kz ; that is, they are degenerate. Degenerate states don’t get out of phase with each other,
so even a very small coupling causes complete energy redistribution between them, as
we’ll see.
Mode Coupling
Many fiber devices are based on weak lossless (adiabatic) coupling between two waveguide modes, either the LP01 modes in two fibers brought together, as in a directional
coupler, or between two modes in a single fiber. By weak , we mean that the mode amplitudes change on a scale much longer than a wavelength, not that the energy transfer isn’t
large eventually. Under these conditions, we can treat mode coupling by first-order perturbation theory, which is a good break because for that we need only the zero-order
waveguide modes. Consider the situation of Figure 8.2: two modes, ψ1 and ψ2 , that are
orthonormal† except for a small coupling, which is constant in z and t. We write the
† That is, orthogonal and with the same normalization, so that unit amplitude in one mode corresponds to unit power.
Figure 8.2. Coupled modes.
mode amplitudes as A = (A1 , A2 )ᵀ. The first-order perturbation equation for A is then

dA/dz = cA,    c = ( c11  c12 ; c21  c22 ),

with |c12 |, |c21 | ≪ |kz1 |, |kz2 |. If the perturbation is lossless, then d/dz|A|² = 0. Rearranging a few indices leads to

c + (cᵀ)* = 0  ⇒  c = ( i kz1   c12 ; −c12*   i kz2 ),
that is, c is skew-hermitian,† and all its eigenvalues are therefore imaginary. We’d expect
the pure imaginary elements on the main diagonal, because that’s the behavior of the
uncoupled modes; the skew-hermitian symmetry of the off-diagonal elements makes sure
that the power leaving one mode winds up in the other.
This constant-coefficient system is solved by finding the eigenvalues iβj and eigenvectors φj of c, because in the eigenvector basis (the principal components basis), the
equations decouple, leading to solutions with a simple exponential z dependence,

φj (z) = φj (0) exp(iβj z),    βj = [ (kz1 + kz2 ) ± √( (kz1 − kz2 )² + 4|c12 |² ) ] / 2.
We’ll use primed quantities in this basis (e.g., A′ is the principal component representation of A, but it’s the same vector really). If the two waveguide modes have the
† A skew-hermitian or antihermitian matrix is i times a hermitian one, so with that modification we can apply
all the usual theorems about hermitian matrices—orthogonal eigenvectors, real eigenvalues, and so on.
same kz (e.g., the fiber coupler case, or polarization coupling in an ordinary single-mode
fiber), this reduces to

βj = kz ± |c12 |,

and the φj become

φ1 = (1/√2)(ψ1 + ψ2 ),    φ2 = (1/√2)(ψ1 − ψ2 ).
This isn’t too spectacular looking—just two quadrature pairs with slightly different
propagation constants. Now, however, let’s consider the usual case, where all the light
is initially in ψ1 = (1/√2)(φ1 + φ2 ), so A(0) = (1, 0)ᵀ and A′(0) = (1/√2, 1/√2)ᵀ.
In that case, we get a z dependence

A(z) = e^{i kz z} ( cos(|c12 | z), i sin(|c12 | z) )ᵀ,
so that after a distance π/(2|c12 |), all the energy has moved from ψ1 to ψ2 , and it then
transfers back and forth sinusoidally, forever. Thus regardless of the magnitude of c12 ,
we can pick a length for our coupler that will give us any desired output power ratio
from 0 to ∞.
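The sinusoidal transfer is easy to verify numerically: this sketch integrates the coupled-mode equations by RK4 with an assumed coupling constant, and checks that all the power has crossed over after the full-transfer length. (Since the common phase factor drops out of the powers, the shared kz is set to zero in the march.)

```python
from math import pi

# Numerical check of sinusoidal coupled-mode power transfer.
# kappa = |c12| is an assumed illustrative value.
kappa = 500.0                      # rad/m

def deriv(A):
    a1, a2 = A                     # dA1/dz = i*kappa*A2, dA2/dz = i*kappa*A1
    return (1j * kappa * a2, 1j * kappa * a1)

z_end = pi / (2 * kappa)           # full-transfer length pi/(2|c12|)
n = 2000
h = z_end / n
A = (1 + 0j, 0 + 0j)               # all light launched into mode 1

for _ in range(n):                 # classic RK4 march
    k1 = deriv(A)
    k2 = deriv(tuple(a + 0.5*h*k for a, k in zip(A, k1)))
    k3 = deriv(tuple(a + 0.5*h*k for a, k in zip(A, k2)))
    k4 = deriv(tuple(a + h*k for a, k in zip(A, k3)))
    A = tuple(a + h*(p + 2*q + 2*r + s)/6
              for a, p, q, r, s in zip(A, k1, k2, k3, k4))

p1, p2 = abs(A[0])**2, abs(A[1])**2
print(f"P1 = {p1:.6f}, P2 = {p2:.6f}")   # essentially 0 and 1
```

Stopping the same march at half this length gives a 3 dB (50/50) coupler, which is how the coupler-length design freedom in the text gets used in practice.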
If kz1 ≠ kz2 , we still get sines and cosines, but with offsets due to the maximum
coupling being less than 1; the power doesn’t finish moving from ψ 1 to ψ 2 before the
phase shift between modes exceeds π , and it starts moving back again. It’s useful to
define the beat length λb , which is the period of the spatial beat between modes,
λb = 2π / |kz1 − kz2 |.
Looking at the square root in (8.10), there are two distinct limits,

|c12 |² ≫ (kz1 − kz2 )²/4

(long beat length) and

|c12 |² ≪ (kz1 − kz2 )²/4

(short beat length), which govern how we do the binomial expansion of the square root
(see the problems in the Supplementary Material). In the long beat length limit, the
situation is very similar to the degenerate case, except that the maximum coupling is
reduced slightly. In the short beat length limit, though, the rapidly growing phase shift
between modes prevents much power from coupling at all, so the modes remain more
or less uncoupled.
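The reduced maximum coupling can be made quantitative with the standard two-mode result; the coupling and mismatch values below are assumed for illustration:

```python
# Maximum fraction of power that crosses over when the modes are
# mismatched (standard two-mode result); kappa = |c12|,
# delta = (kz1 - kz2)/2, both assumed illustrative values.
def max_transfer(kappa, delta):
    return kappa**2 / (kappa**2 + delta**2)

kappa = 100.0   # rad/m
for delta in (0.0, 100.0, 1000.0):
    print(f"delta = {delta:6.1f}: max transfer = {max_transfer(kappa, delta):.4f}")
```

Degenerate modes transfer completely; once the mismatch dwarfs the coupling, under 1% of the power ever makes it across, which is the short-beat-length limit above.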
Space-Variant Coupling
The situation is quite different when the coupling coefficient c12 is a function of z,
especially if we make it periodic near λb . If we do so, for example, by launching an
acoustic wave of that wavelength, making a photorefractive Bragg grating, or by pressing
a corrugated plate against the fiber, we can recover the strong coupling that we had lost.
If the coupling has spatial frequency 2kz , light going to positive z will get backscattered
strongly, as in a fiber Bragg grating. The coupled-mode theory becomes a bit more
complicated in this case, but is still quite manageable.
Because different order modes have different kz , they have different phase velocities,

vp = ω/kz ,

and because of the nonlinearity of the expression for kz in terms of k and κ, their group
velocities,

vg = dω/dkz ,
are different as well. This intermodal dispersion † severely limits the transmission bandwidth of step-index multimode fibers, as is well known. A simple ray optics model of a
ray bouncing off the core–cladding boundary at exactly θC gives an estimate of it,
Δv = (c/n1 ) [ 1 − 1/√(1 + (NA)²/n1² ) ] ≈ c(NA)²/(2n1³ ),
which for an NA of 0.2 predicts a velocity spread of over 1%—around 45 ns/km.
The exact modulation behavior of a given fiber depends on its length, manufacturing
variations, and the details of its mode spectrum. We can feel pretty safe as long as
the phase spread between modes is below a radian or so at the highest modulation
frequency, but we expect to be in big trouble when it gets to half a cycle. That leads to
a bandwidth–distance trade-off,
L × BW < n1 c / (π (NA)²),
which is pretty sharp; that 45 ns/km limits us to 3.5 MHz in a 1 km length. Bending
the fiber into a coil reduces this effect and in rectangular cross-section waveguides can
almost eliminate it; bending redistributes light between low- and high-angle modes, so
that most of the light will spend some time in each, reducing the transit-time spread.
(This is rather like the quenched mode coupling of Section 8.12.1.)
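The 45 ns/km and 3.5 MHz numbers can be reproduced from the ray estimate, taking the one-radian phase-spread criterion literally (n1 is an assumed typical silica index):

```python
from math import sqrt, pi

# Reproduce the step-index intermodal dispersion numbers in the text.
c = 2.998e8
n1, NA = 1.46, 0.2            # n1 assumed; NA = 0.2 as in the text

spread = 1 - 1/sqrt(1 + (NA/n1)**2)     # fractional velocity spread, ~1%
dt_per_km = spread * n1 * 1000 / c      # transit-time spread over 1 km
bw_lim = 1 / (2 * pi * dt_per_km)       # 1 rad phase-spread criterion

print(f"spread = {spread*100:.2f}%, {dt_per_km*1e9:.0f} ns/km, "
      f"BW = {bw_lim/1e6:.1f} MHz at 1 km")
```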
Gradient-index fiber works like a big long GRIN lens, ideally making rays at all
angles see the same propagation delay. Waveguide effects and manufacturing errors limit
† This is of course a different use of the word dispersion than elsewhere in optics, where it refers to variation of n with λ.
its intermodal dispersion to 0.5–2 ns/km, which is much better, but still pretty big.
Single-mode fibers don’t have this problem, which (along with much lower loss and
lower fiber cost) is why they are popular.
A single-mode fiber suffers waveguide dispersion † as well, because the fiber structure influences κ, so that kz is a nonlinear function of k, and vp slows down at long
wavelengths. However, the quartz also has normal material dispersion, with n increasing
at short wavelengths; for ordinary silica fiber, the two cancel out in the neighborhood
of 1.3 μm, the zero-dispersion wavelength. You can send a fast signal a long way
at 1.3 μm.
The loss minimum is at 1.55 μm, not quite the same as the dispersion minimum.
In order to save money on repeaters, people have designed dispersion-shifted fiber,
whose index profile is modified to reduce the waveguide dispersion enough to shift the
zero-dispersion wavelength to 1.55 μm. There is also dispersion-flattened fiber, which
is a compromise between the two and has low but not zero dispersion between 1.3 and
1.55 μm.
Single-Mode Optical Fibers
Single-mode fiber is pretty well behaved stuff on balance; it will carry light long distances
without doing serious violence to it, except for messing up its polarization, and it can be
coupled into and out of good quality laser beams without great difficulty (once you get
used to it). Once you get the light in, you can pipe it around corners, split it, combine it,
modulate it, attenuate it, and detect it much the way you would with RF signals in coax.
Due to the fiber’s polarization degeneracy, the polarization of the transmitted beam is
unstable, and because of its perfect spatial coherence, there are a lot of etalon fringes
that have to be controlled with angled fiber ends, which is very awkward. Most of the
single-mode fiber sensors we’ll encounter later are limited by these twin problems.
The étendue of a single-mode fiber is the smallest of any optical component’s—it’s
about the same as the n²AΩ product of a perfectly focused Gaussian beam, which is
λ²/2, on the order of 10⁻⁸ cm²·sr at 1.5 μm. A laser is the only source that can cope
with that.
Multimode Optical Fibers
Multimode fibers have a large V , which means that many modes can propagate; the
number goes as V 2 , and is on the order of 200 for a step-index fiber. For the same
value of V , a graded-index fiber will have half the number of modes of a step-index
fiber. (Why?) These modes couple into one another at the slightest provocation, and as
we saw, in step-index fibers they have very different propagation velocities, leading to
severe bandwidth degradation.
Graded-index multimode fibers are the most common type; making the index of the
core decrease smoothly from a broad peak on the axis brings the propagation velocities
of the modes very close to one another, and so reduces the intermodal dispersion to a
† The term “mode dispersion” should be avoided, because different writers use it to mean both intermodal dispersion and waveguide dispersion.
few nanoseconds per kilometer. With laser light, the output of the fiber is a speckly mess
that wiggles around like sunlight filtering through the trees on a windy day. It’s even
worse if the whole mode volume hasn’t been excited, as often happens when we couple
a laser into it with a low-NA lens.
On the other hand, multimode fiber is pretty stable with broadband illumination, if
we carefully excite all the modes, for example, by using a diffuse source such as an
integrating sphere, or homogenizing them by mandrel wrapping.
Multimode fiber is inherently lossier than single mode. High angle modes are weakly
guided and so are vulnerable to scatter from irregularities in the fiber, or to bending
and pressure. You see this as a gradual decrease of the measured NA of the fiber as its
length increases. This hits a steady state eventually, since low angle light gets coupled
into high angle modes, eventually spreading the loss evenly throughout the modes. The
high loss encountered by high angle modes makes the far field pattern of a long piece
of multimode fiber approximately Gaussian.
We often want to couple thermal light into multimode fibers, and so need to know
something about their étendue. It’s pretty easy to calculate for a step-index multimode
fiber, because the acceptance angle is more or less constant across the facet, so n²AΩ ≈
(πa²)(π(NA)²). Calculating the étendue of graded-index fiber is not so easy, because
high angle rays near the edge of the core aren’t well guided, but go into leaky modes.
We can intuitively see that the étendue will be significantly less, because high angle rays
can only enter near the fiber axis. A pure ray optics approach could calculate the NA
as a function of radius and integrate up to r = a, but we’ll content ourselves with arm
waving here; it turns out that GI fiber has half the modes of SI.
At any given wavelength, each mode is fully coherent, and so we can imagine making
a couple of multiple-exposure holograms that would transform each individual mode into
a different focused spot. Thus each mode should contribute about the same étendue as a
focused spot, that is, about λ2 /2. We thus expect the étendue of fiber with a given NA
to go as
n²AΩ ≈ Nλ²/2.
The wisdom here is that to pipe low spatial coherence light around, step-index multimode is really the way to go unless you need the fast temporal response.
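The mode-counting bookkeeping can be checked against the geometric étendue: Nλ²/2 with N ≈ V²/2 reproduces (πa²)(π(NA)²). A short sketch, with illustrative numbers (50 μm core, NA of 0.2, λ = 1.55 μm, all assumed rather than taken from the text):

```python
import math

a = 25e-4      # core radius, cm (50 um core; illustrative)
NA = 0.2
lam = 1.55e-4  # wavelength, cm

V = 2 * math.pi * a * NA / lam        # normalized frequency
N = V ** 2 / 2                        # rough step-index mode count

geometric = (math.pi * a ** 2) * (math.pi * NA ** 2)  # area times solid angle
modal = N * lam ** 2 / 2                              # lambda^2/2 per mode

print(f"V = {V:.1f}, N ~ {N:.0f}")
print(f"geometric {geometric:.3e} cm^2*sr, modal {modal:.3e} cm^2*sr")
```

The agreement is exact by construction, since N·λ²/2 = (2πa·NA/λ)²·λ²/4 = π²a²(NA)²; the point is that counting modes at λ²/2 apiece is the same bookkeeping as area times solid angle.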
Few-Mode Fiber
The worst case is fiber with only a few modes, for example, telecom fiber used in
the visible. There aren’t enough modes for their spatial pattern to average out, and
their coupling is a really serious source of noise because their propagation speeds are
very different; thus the relative phase delay grows rapidly with z, leading to FM–AM
conversion. Because the two modes will not in general have the same polarization,
phase drift leads to severe polarization instability. Furthermore, the pointing instability
of few-mode fiber is appallingly bad, with the beam wandering around by a degree or
more depending on the phase delay.
Polarization-Maintaining (PM) Fiber
Polarization-maintaining (PM) fiber is SMF intentionally made asymmetric in profile
to break the polarization degeneracy of circular-core fiber, and is seriously misnamed.
A built-in stress birefringence or (less often nowadays) an elliptical core makes the
two polarizations travel at slightly different speeds. The speed difference is fairly
slight; the beat length ranges from a few centimeters in ordinary PM fiber down to 1–2 mm in
high birefringence (HiBi) fiber. As we’ve just seen, a short beat length means that
the coupling of the two modes is incomplete. If we launch light into the fiber in
one polarization state and look at what comes out, we find that the polarization is
reasonably stable; regardless of mild abuse, bending, and so on, the polarization ratio
at the output is typically 20 dB. That’s enough to help, but far from enough to fix
the polarization noise and drift problem; a 20 dB polarization ratio means that the
polarization axis can wander ±6◦ , and you aren’t going to do a dark field measurement
with that.
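The ±6° figure follows directly from the extinction ratio; a one-liner makes the conversion explicit:

```python
import math

per_db = 20.0                           # polarization extinction ratio, dB
field_ratio = 10 ** (-per_db / 20)      # amplitude ratio of the unwanted polarization
wander = math.degrees(math.atan(field_ratio))
print(f"axis wander: +/-{wander:.1f} degrees")   # +/-5.7 degrees
```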
Furthermore, the short beat length means that the delay discriminator action is working
moderately hard at demodulating the FM noise of the laser; a PM fiber has lots of noise
if the orthogonal output polarizations are made to interfere, for example, by a misaligned
polarizer or an angled fiber end, and the mode coupling eventually does the same thing
to us.
The real reason that PM fiber is misnamed is that it doesn’t preserve any old input
polarization, just the two eigenmodes. Its birefringence drifts with temperature, and the
drift rate is proportional to the total retardation, so high birefringence PM fiber (1 mm beat
length) with equal amounts of light in both modes has around three orders of magnitude
more polarization drift with temperature than ordinary single-mode fiber.
We therefore use PM fiber with as pure a polarization mode as we can give it. Since
the output polarization purity is typically 20 dB, using PM fiber that way makes the
small-scale instability worse, but restricts it to a limited range of elliptical polarizations
about the desired mode.
The great virtue of PM fiber is thus to stabilize the operating point of our polarization-sensitive sensors well enough that a properly normalized AC measurement can pull good
data out without needing active stabilization, and that’s truly a major point gained—we
don’t have to ship a graduate student with each instrument.
Chalcogenide Fibers for IR Power Transmission
A silica fiber will transmit at least some light from 250 to 2000 nm, and fluoride fibers
reach 3000 nm, but the loss gets pretty large near the extremes. In the mid-IR, good fibers
can be made from chalcogenide glasses. The main types are sulfide glasses, As40 Sx Se60−x ,
which work from 1 to 6 μm, and telluride glasses, Gea Asb Sec Ted , for 3 to 11 μm.
These are flexible enough, and low enough in loss, to be an attractive way of piping
mid-IR power around, for example, CO2 light for heating or surgery. Even at their best,
chalcogenide fibers’ losses are between 0.2 and 1 dB per meter, thousands of times worse
than single-mode silica at 1.5 μm, but they can handle CW power levels of 100 kW/cm2
and pulsed levels of 200–400 MW/cm2 .
Fiber Bundles
The relatively low cost of fiber lends itself to some creative brute force solutions. For
example, say we want to be able to pipe light from a lamp and condenser around easily.
The étendue of a multimode fiber may be 10−5 cm2 ·sr, but we can use 103 of them in
a bundle, which starts to be quite respectable (and no longer so cheap). Bundles work
best with broadband illumination, since otherwise speckles due to mode shifting from
bending and stress will drive you crazy. They come in all sizes from two fibers to 20 mm
diameter or larger; it depends on how much you want to pay for material, and for all that
laying, cleaving, and polishing. They are always multimode because of the ratio of the
core and cladding areas, but cladding modes are important and may need to be removed
by potting in black wax.
The basic distinction is between image bundles, where each point on the output surface
connects to the corresponding point on the input, so that an image can be transmitted, and
random bundles, where no such care has been taken. Random bundles aren’t really very
random; they won’t homogenize a space-varying illumination function, for example. The
fibers just go any old way they happened to land. Now that LEDs are so good, random-bundle illumination is much less useful than it once was—we might as well mount the
LEDs at the business end and run wires instead.
Imaging bundles are further divided into fused and flexible types. A flexible bundle
has the fibers glued together only at the ends, leaving it free to bend. Fused bundles
are used mainly for manipulating images; a short bundle with one concave and one flat
surface makes a fiber-optic field flattener (also called a fiber-optic faceplate), which gives
image tubes a flat image plane. Two image tubes can be cascaded with a biconcave fiber
plate, which works better than any lens system and is dramatically smaller too. Tapered
bundles, made by softening the bundle and stretching one end, magnify the image by
the ratio of the face diameters. You can even twist the softened bundle, to make a fiber
image rotator.
Of course, the phase gets completely screwed up in the process of transmission—the
fibers are multimode, and no two are identical anyway. None of these image bundles
can produce an aerial image—if the image is out of focus on the entrance face, it’s out
of focus forever, just like a ground glass screen. That’s why the common term coherent
bundle is a misnomer. On the other hand, the bulk optics alternatives are either a long
succession of relay lenses or a long, specially designed GRIN rod. These are no picnic
to get right, on account of the buildup of field curvature in the lenses and the specialized
nature of the GRIN rod, but most of all because they can’t bend without fancy and
delicate articulated mirror mounts at each hinge.
Split Bundles
You can get split bundles that allow you to couple both a source and detector to a
remote head. This can be pretty useful sometimes; with a tungsten bulb on one end and a
spectrometer on the other, it allows decent low resolution spectroscopic measurements in
tight spaces, for example. Combining a split bundle with a light bulb, band-limiting filters,
and a monochromator is easy and very successful. If the monochromator is actually an
integrated grating spectrometer/OMA design that plugs into a PC, you can put a nice film
thickness measurement device together inexpensively that will do measurements under
water, in awkward places, and in the face of lots of vibration and shock.
Another application of bundles is in fiber optic slip rings. A modulated laser on a
rotating part is focused onto a surface having an annular ring of fibers. All the fibers
are gathered together into one photodiode at the other end. The focused spot radius is
about one fiber diameter, so that communications don’t drop out periodically as the spot
rotates. Communication the other way can be done with a much more powerful laser
feeding the entire bundle, with the photodiode on the rotating part, but this is seldom
done because of the laser power required.
Coupling into Bundles
We usually use a condenser to launch light into a bundle, as we saw in Example 2.1. The
output end is a bit dicier. Well-homogenized multimode fibers illuminated with white
light have nearly uniform output brightness across the core, but it narrows and dims for
long lengths and short wavelengths, due to the excess loss in the high angle modes.
The fibers themselves do not pack all that tightly (see the problems), so that the
illumination function is inherently nonuniform. The cleaved fiber ends have some nonzero
wedge angle, which is usually nonuniform among the fibers in a bundle, so the far field
pattern is a bunch of more-or-less overlapping cones rather than a single cone. This can
lead to artifacts in grating spectrometers, for example.
Many-fiber bundles are more predictable than few-fiber ones, because the holes in the
illumination pattern are smaller by comparison with the whole bundle. A fiber bundle
used to receive light from the apparatus subjects it to a very odd spatial filtering function,
which is usually poorly understood, as shown in Figure 8.3. Don’t do it if you don’t
have to, and if you do, think about putting the bundle far enough out of focus that the
pattern doesn’t cause artifacts. A 125 μm fiber has to be surprisingly far out of focus
for the modulation of the intensity to be below 1%, say.
Fiber bundles are more or less telecentric, since the angular pattern from each fiber
is more or less the same. You can get fiber ring lights, which go around the outside of a
lens, and provide a nice uniform illumination with an odd annular pupil function.
Liquid Light Guides
A variation on the fiber bundle is the liquid light guide, which is just a huge flexible
waveguide made from high index liquid in a low index plastic tube. They’re flexible but
have poor illumination patterns. These have been pretty well superseded by white LEDs
mounted where the light is needed.
Figure 8.3. Pupil function of a fiber bundle. (Figure labels: fibers laid irregularly; cone axes differ due to spread of cleave angles; intensity very nonuniform on end face; far field still nonuniform.)
Leaky Modes
Modes that are just allowed or just cut off are generically called leaky modes. They
get excited any time there’s a mode mismatch. Their losses are very high, of course,
which makes the overall fiber loss look anomalously high for the first bit of fiber after a
launcher, splice, or connector.
Cladding Modes
Even leakier are cladding modes, in which the jacket takes over the role of the
cladding and the whole fiber that of the core. You’ll get these nearly always, but
they’re especially troublesome with single-mode fiber because the étendue of the
highly multimode-fiber/jacket system is far larger than that of the core/cladding system.
Cladding modes can be eliminated in a short distance by stripping the fiber jacket and
putting a blob of index-matched black wax on the cladding, or by mandrel wrapping.
Fiber guiding occurs because there is no propagating mode in the cladding whose kz
phase matches with the light in the core. As soon as we bend the fiber into an arc,
though, that is no longer true; light going round the arc at a larger radius sees a longer
path length, so it can move faster and still stay phase matched. For any finite bend radius,
there is a distance into the (notionally infinite) cladding at which a propagating mode is
phase matched.
For long bend radii, that isn’t a problem, because the guided mode has died off more
or less completely before reaching that distance. As the bend gets tighter and tighter,
though, the phase matched distance gets closer and closer, until it starts overlapping
the propagating mode significantly and light starts getting lost. At that point, the loss
increases exponentially as the bend radius declines. You can easily see it by putting a
HeNe beam through a fiber and bending it with your thumb and forefinger.
For a given fiber, the guiding gets weaker as λ increases. The bend edge is the
longest wavelength for which guiding occurs, as a function of the radius of the bend.
You normally won’t get very close to it.
Bending and Mandrel Wrapping
Bending loss has its uses, especially in stripping unwanted cladding modes and filling
up the mode volume in a multimode fiber. It’s pretty safe to bend a fiber into a coil
600 fiber diameters across, and you can go as small as 200× briefly. For lab use, where
broken fibers are easily replaced, you can be much more daring; a 125 μm OD fiber
can be wrapped 10 times around a pencil to strip leaky modes and cladding modes, for
example. In an unattended system, stick to the 600× rule and use lots more turns, or use
the black wax trick for cladding mode stripping instead.
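For a standard 125 μm OD fiber, those rules of thumb translate to:

```python
fiber_od = 125e-6                 # fiber outside diameter, m
steady = 600 * fiber_od           # coil diameter that's safe indefinitely
brief = 200 * fiber_od            # tolerable briefly
print(f"steady {steady * 1e3:.0f} mm, brief {brief * 1e3:.0f} mm")   # 75 mm, 25 mm
```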
Bend Birefringence and Polarization Compensators
Wrapping a fiber around a mandrel makes it birefringent, with the fast axis normal to the
axis of symmetry. How birefringent a given piece of fiber becomes at a given wavelength
is usually a pretty well kept secret—you have to measure it. A rule of thumb is that the
retardation in meters of a single-mode fiber wrapped into one turn of diameter D is
δb ≈ K(d/D)²,
where d is the fiber OD and K ≈ 0.13 m at 633 nm for silica fiber; a 5/80 μm HeNe fiber looped into one turn
of 72 mm diameter makes a nice quarter-wave plate, and that in turn allows us to make
all-fiber polarization compensators.
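Assuming the one-turn retardation goes as δ ≈ K(d/D)² with K ≈ 0.13 m at 633 nm (a reconstruction of the rule of thumb, so treat the form as approximate), the quarter-wave example checks out:

```python
K = 0.13        # m, silica at 633 nm (rule-of-thumb constant)
d = 80e-6       # fiber OD, m (5/80 um HeNe fiber)
D = 72e-3       # loop diameter, m
lam = 633e-9    # wavelength, m

delta = K * (d / D) ** 2    # one-turn retardation, m
print(f"retardation {delta * 1e9:.0f} nm vs lambda/4 = {lam / 4 * 1e9:.0f} nm")
```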
Just as two quarter-wave plates can turn any fully polarized state into any other, so
two loops of fiber attached to flat discs, hinged so as to allow them to rotate through 180◦ ,
will allow us to control the polarization funnies of our fiber, at least until it starts moving
again. Three-paddle compensators (two λ/4 and one λ/2) are often used to provide some
tolerance for errors and wavelength shifts, because if your wave plates are not exactly
λ/4, there are some places you can’t get to—just the way you can touch your nose to
your shoulder and to your wrist, but not to your elbow. Since we’re usually interested
in linear polarization, after tweaking the two λ/4 paddles to get rid of ellipticity, we can
get any linear polarization by turning the λ/2 paddle.
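A Jones-calculus sketch (entirely illustrative; the input state and the crude grid search are made up) shows two λ/4 paddles bringing an elliptical state back to linear:

```python
import numpy as np

def qwp(theta):
    """Jones matrix of a quarter-wave plate with fast axis at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([1, 1j]) @ R.T

ein = np.array([1.0, 0.6j]) / np.sqrt(1.36)   # some elliptical input state

# Crude grid search over the two paddle angles; S3 = 0 means linear light.
best_s3 = np.inf
for t1 in np.radians(np.arange(0, 180, 2)):
    for t2 in np.radians(np.arange(0, 180, 2)):
        eout = qwp(t2) @ qwp(t1) @ ein
        s3 = 2 * np.imag(np.conj(eout[0]) * eout[1])   # circular content
        best_s3 = min(best_s3, abs(s3))

print(f"residual |S3| = {best_s3:.3f}")   # small: output is nearly linear
```

With a third λ/2 paddle you would then rotate the resulting linear state to the desired azimuth, as described above.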
Piezo-optical Effect and Pressure Birefringence
Squashing a fiber produces a piezo-optical shift in n, as we saw in Section 8.5.6. Quartz
isn’t very piezo-optic, but there’s a lot of it in your average fiber system—and even a
random effect grows with √L.
Twisting and Optical Activity
Analogously, if we twist a fiber about its own axis, it becomes optically active. The total
polarization rotation of a fiber twisted through ξ radians is
θ = gξ,
where g is about 0.15 for silica fiber at 633 nm.
Aside: Normal Propagation. When we discuss the propagation of light in circular core,
single-mode fibers under peculiar conditions (e.g., bending, twisting, and so on), we are
looking at the deviation of the light from the normal situation, that is, k and E being
constant in the lab frame. Twisting a fiber drags the polarization direction along with
the twist, but that’s solely a material property. Maxwell’s equations and the waveguide
properties per se aren’t altered, so without the material change, the light would continue
with the same polarization in the lab frame, completely oblivious to the change in the
guide. Things are a bit more complicated with PM fiber, because we have to look at the
twist or bend as a coupled-modes problem.
8.5.8 Fiber Loss Mechanisms
Silica fibers are amazingly low loss devices, far more transparent than air in some spectral
regions. Loss is dominated by Rayleigh scattering and microbending at short wavelengths,
although there is also strong electronic absorption deeper into the UV.
The Rayleigh scatter goes as λ−4 , so it’s very strong below 400 nm and almost nonexistent beyond 1.2 μm. There are overtones of the OH vibrational resonance at 2.73 μm,
which occur in bands near 950, 875, 825, and 725 nm, and molecular vibrational absorption beyond ∼2 μm. The sweet spot is in the 1–1.6 μm range, where the absorption,
Rayleigh scatter, and microbending are all weak, and the fiber can achieve losses of
0.15–0.5 dB/km. By comparison, a 1 m thick piece of window glass looks as green
as a beer bottle (try looking through a bathroom mirror edgeways sometime). The few
meters of silica fiber in the typical instrument will transmit light from 250 to 2000 nm
or thereabouts.
At 254 nm, silica fiber’s absorption goes up to around 1000 dB/km, and permanent darkening or solarization may start to occur due to UV damage. You can get
solarization-resistant fiber that works stably down to 180 nm (1.5 dB/m). Multimode
fiber is especially flaky in the UV, because the mode coupling is very strong. Low angle
modes get coupled into high angle modes, which are attenuated much more strongly.
Wiggling the fiber makes the amplitude go up and down, by as much as 1 dB/m at
254 nm.
Minor manufacturing errors also contribute to fiber loss; small-scale roughness at the
core–cladding interface (a form of microbending) is the major one, but there is also the
occasional bubble or inclusion. Microbending is worst at high Δn, just as high index lenses
are more vulnerable to surface errors. These intrinsic losses dominate at long lengths, but
in instruments we’re more often fighting splices, connectors, and larger scale bending.
8.5.9 Mechanical Properties
Glass is a brittle material, which is to say that cracks that are not held shut will eventually
propagate through it, causing failure. The time to failure goes as a very high negative
power of the tensile stress. At the crack tip, chemical bonds are being broken, which
takes energy. It takes a lot less energy if there’s water around to bond to the newly
created surface, so the time to failure is enormously longer for dry fiber versus wet.
It’s odd at first sight, but the strength of a fiber depends on its length. For a uniformly
loaded fiber, it’s the largest flaw in the entire length that causes failure, just like the weak
link in a chain. The statistics of how big the largest flaw is are sensitive to how long it
is, and of course very sensitive to the history of that piece of fiber.
The fracture strength of fibers thus varies all over the place, with some published
curves showing 0.1% probability of fracture per kilometer at stresses of only 50 kPa,
which is less than the stress on your spine from holding up your head. On the other hand,
a really good piece of fiber breaks at around 4 GPa, which is three times stronger than
the best steel.
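The weakest-link statistics can be sketched with a simple survival model: if one reference length survives a given stress with probability p₀, then L of them in series survive with probability p₀^(L/L₀). The numbers below are illustrative, not the published curves:

```python
def survival(p0, length_ratio):
    """Probability that length_ratio reference lengths all survive."""
    return p0 ** length_ratio

p_km = 0.999   # say, 0.1% fracture probability per km at some proof stress
for km in (1, 10, 100):
    print(f"{km:3d} km: {1 - survival(p_km, km):.2%} fracture probability")
```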
8.5.10 Fabry–Perot Effects
In Section 11.9.2, we’ll see that minor misalignment protects us from etalon fringes to a
considerable degree. There’s no way to misalign a single-mode fiber, so we would seem
to be in trouble; along with the polarization instability, etalon fringes are the leading
cause of trouble in laser-based fiber instruments. One thing that helps is to cut the ends
of the fibers at an angle, so that the reflection is outside the fiber NA, and so goes off into
the cladding to be absorbed. Fibers tend to cleave normal to the axis, but twisting them
while cleaving will produce an angled facet. Angled cleavers exist that work reasonably
repeatably, and you can get angled physical contact (APC) connectors and matching
fiber polishing kits. These will get you down to around 10−4 reflectance, which is still
poor in the scheme of things, but is pretty good for fiber. We’re often forced to choose
fragile APC connectors and expensive Faraday isolators, which significantly reduces the
attraction of fiber.
One gotcha is that the etalon fringes in fiber are polarization dependent, because
different polarizations see different delays. In a broadband system, this will decorrelate
the noise in the two polarization eigenstates.
8.5.11 Strain Effects
If you imagine stretching a hollow metal waveguide to 1 + ε times its original length,
it’s easy to see that the two main effects are a propagation distance increased by the
same factor, and a decreased propagation speed due to the slight narrowing of the guide
due to the stretch.†
Fiber behaves the same way, with the addition of the change in n due to strain. The
rate of change of phase with respect to stretching in a fiber is
dφ/dL = nk₀ξ,
where ξ is given by
ξ = 1 − (n²/2)[P₁₂ − μ(P₁₁ + P₁₂)] ≈ 0.78,
so that
dφ/dL ≈ 0.78nk₀.
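The 0.78 figure can be reproduced from handbook strain-optic coefficients for fused silica (the numerical values below are assumed, not given in the text):

```python
n = 1.458                # index of fused silica at visible wavelengths
p11, p12 = 0.121, 0.270  # Pockels (strain-optic) coefficients
mu = 0.17                # Poisson's ratio of fused silica

xi = 1 - (n ** 2 / 2) * (p12 - mu * (p11 + p12))
print(f"xi = {xi:.2f}")   # 0.78
```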
8.5.12 Temperature Coefficients
We saw in Section 4.2.2 that the temperature component of the optical path length is
(1/n)(∂n/∂T) + CTE.
The CTE of fused quartz is very small, about 5 × 10−7 /◦ C, but its TCN is around
+9 × 10−6 /◦ C, so that its TCOPL is 7 × 10−6 /◦ C.
Although the exact value is slightly modified by waveguide effects, this is the dominant
effect on the phase change in fiber devices with temperature, and also in the temperature
tuning of fiber Bragg gratings.
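The arithmetic is quick to verify:

```python
n = 1.46        # index of fused quartz
tcn = 9e-6      # dn/dT, per degC
cte = 5e-7      # per degC

tcopl = tcn / n + cte
print(f"TCOPL = {tcopl:.1e} /degC")   # about 7e-6
```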
If the fiber has a strong plastic cladding (e.g., 1 mm diameter nylon), the temperature
sensitivity will go up by an order of magnitude due to the huge CTE of the plastic
straining the fiber.
† Stretching things makes their dimensions perpendicular to the strain shrink. Poisson’s ratio μ tells how big the relative shrinkage is. For something like rubber or chewing gum, where the total volume stays constant, (1 + ε)(1 − με)² = 1 + O(ε²), so μ = 0.5. Glass has μ ≈ 0.4, and most metals, 0.33.
Figure 8.4. Demodulation of laser noise by etalon fringes and double Rayleigh scatter: E = Σₙ t₁t₂(r₁r₂)ⁿ exp(i·2nΔφ). This is generally what limits the SNR of fiber measurements.
8.5.13 Bad Company: Fibers and Laser Noise
We saw in Section 2.5.3 that multiple reflections can cause serious intensity noise. As
shown in Figure 8.4, even a short length of fiber can do it, and so can multiple Rayleigh
scattering even in a perfectly terminated fiber. This coherence effect is one of the main
SNR limitations of fiber sensors.†
If the scatterers are localized (e.g., facet reflections), you get fringes in the spectrum
from localized mirrors. Even tiny frequency fluctuations get turned into huge amplitude
variations due to the very steep slope of the fringes with frequency.
In the double Rayleigh case, the phase of all the little scatterers is random, so the
scattering contributions add in power. Either way, for fibers longer than c/(nΔν), you
basically get the whole interference term of the scattered light turning into noise with
a frequency spectrum that is the autocorrelation of the source spectrum. There’s the
usual 3 dB reduction in intensity noise, as in Section 13.6.9, since some of the interference term remains phase noise instead of all of it becoming amplitude noise. As
we saw in Section 2.5.3, this effect can become dominant with surprisingly short path differences.
The fringe phase is strongly dependent on fiber length and stress birefringence, so two
fibers fed from the same source will produce a lot of uncorrelated noise. This effect makes
double-beam noise reduction systems such as laser noise cancelers (see Section 10.8.6)
far less effective with fibers.
8.6.1 Getting Light In and Out
Getting light out of a fiber is relatively simple, because the light is already coming out
on its own; you just stick the facet at the back focus of a collimating lens, and you’re
done. The NA of the fiber makes 10× to 40× microscope objectives a good choice;
if you want good beam quality, use single-mode fiber, a good lens, and a good solid
† Amnon Yariv et al., Signal to noise considerations in fiber links with periodic or distributed optical amplification. Opt. Lett. 15, 1064 (1990).
Figure 8.5. Coupling to fibers.
mount—not some big six-axis stage. With multimode fiber, there’s no avoiding an ugly
beam, so don’t worry about the lens quality so much.
If you just want to dump the light, or shove it into a photodiode, you can dunk the
fiber end into some index matching material (oil, gel, or wax). If you get the index right,
the back reflection will be very small, and virtually none of the escaped light will bounce
back into the fiber—the low étendue works in our favor here.
Getting light into the fiber in the first place is much more difficult, because you
don’t have the light itself to guide you (see Figure 8.5.). The first thing you need is to
believe in reciprocity. Reciprocity says that an antenna used for receiving has the same
spatial pattern and the same gain as when it’s transmitting, or that a beam coupling into
a fiber has the same loss that you’d see if it were coming out of the fiber and being
spatial-filtered to look just like the incoming beam.
Thus it is necessary to mode-match to get the best coupling efficiency. Because the
beam coming out of the fiber is nearly Gaussian, with an NA specified by the manufacturer, you can just use the same NA on the launching side, and be sure of the best coupling efficiency.
If your coupling efficiency is below 80%, you haven’t got it quite right; measure the
NA of the output of the fiber (remember we’re talking about the 1/e2 diameter), and use
that on the input side. Your coupling efficiency should improve to the 80–90% range.
The coupling efficiency as a function of mode field diameter mismatch is roughly
ηc ≈ [2w₁w₂/(w₁² + w₂²)]².
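In code, the overlap formula looks like this (power coupling between two aligned Gaussian mode fields; the squared bracket is the power overlap integral):

```python
def eta(w1, w2):
    """Power coupling efficiency for mode-field radii w1, w2 (aligned beams)."""
    return (2 * w1 * w2 / (w1 ** 2 + w2 ** 2)) ** 2

print(f"{eta(5.0, 5.0):.2f}")   # matched waists: 1.00
print(f"{eta(5.0, 6.0):.2f}")   # 20% waist mismatch: 0.97
```

A 20% mode-field mismatch costs only a few percent, which is why getting into the 80–90% range is mostly a matter of matching the NA and focusing well.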
Focus error and aberrations do to coupling efficiency just what they’d do to an interferometer, and for the same reasons. There are formulas available to calculate how close
they should be, but basically you need to be within, say, 0.1 Rayleigh ranges of perfect
focus over temperature. This isn’t particularly onerous, since the Rayleigh range is tens
of microns.
The major aberration we have to deal with is astigmatism in diode lasers, and even
this isn’t normally such a big problem since the diode’s beam is oblong and we’re only
using the central bit, where the aberration hasn’t had time to get too large; the coupling
efficiency is dominated by the shape mismatch unless you circularize the beam. There
are formulas for calculating all these things, but nobody ever uses them in real life.
Aside: A Slightly More Rigorous Look at Reciprocity. An electromagnetic definition
of a reciprocal component is that its behavior is unchanged when we replace t with −t,
E with E∗ , and H with −H∗ . Consider shining a laser beam through a component, and
putting a lossless phase-conjugate mirror after the component, so as to send the beam
back through it the other way. The component is reciprocal if and only if the two-pass
beam is a phase-conjugated replica of the incoming beam, independent of polarization.
Lossless lenses, mirrors, wave plates, beamsplitters, and fibers are reciprocal; attenuators,
polarizers, and Faraday rotators are not. Loss can be put in by hand, so we often loosely
think of attenuators and polarizers as sort-of reciprocal, though a specific term for that
would be useful. Hand-waving reciprocity is a very useful working concept.
8.6.2 Launching into Fibers in the Lab
Those nice fiber collimators that Thor Labs and others sell are great for coming out of
fiber, but much less good for going in, because you can’t see what you’re doing during
alignment very well. If you don’t mind searching blindly for some time, you can put
one of these in a two-axis tilt mount and twiddle till you see something. Alternatively,
if yours are connectorized, you can start with a multimode fiber patch cord running into
an optical power meter. Find the points where the intensity falls off, and then adjust so
you’re halfway in between. Then when you put your single-mode patch cord on, you’ll
see some light right away. For visible light, you don’t need the power meter—provided
the total power is low enough to be eye-safe, just look at the end of the fiber.
Sometimes nobody makes collimators suitable for our wavelength or beam diameter.
The wavelength problem is due to chromatic aberration in the coupler lens, which we
can fix, and mistuned coatings, which we can’t. The power of lenses generally increases
toward shorter wavelength, so if your coupler’s center wavelength is too short (e.g., using
a 1310 nm collimator at 1500 nm), try a weak convex lens in front of it, and if it’s too
long, try a weak concave one.
For the fully manual approach, the first thing to start with is a 20×, 0.4 NA microscope
objective. That’s far higher than the NA of most fibers, but then most laser beams are
far smaller than the pupil of the microscope lens, so the actual NA you wind up with is
closer to 0.2 in most cases anyway. Using a 10×, 0.2 NA will require filling the whole
pupil—5 mm in diameter or more—and doesn’t leave any elbow room for the wings of
the Gaussian. Here are the steps (refer to Figure 8.6).
1. If the laser is visible, measure its power and adjust it with an ND filter until it’s
well below the Class II limit (200 μW is a good number); the most sensitive way
to do the early aiming is by looking into the fiber, and it’s nice to not hurt ourselves
in the process. (This applies in fault conditions too; for example, you bump the
power control knob or the monitor photodiode lead comes loose. If you don’t have
a really trustworthy power meter, don’t look into the fiber.)
2. Get the laser beam running horizontally, parallel to one edge of the optical table,
and at a convenient height (about 150 mm). Mark the wall where it hits.
3. Strip and cleave the fiber to get a clean end with 1 or 2 cm of stripped length, and
have a good look at it with a fiber microscope. Put it in a holder, either a ceramic
V-groove collet or a milled inside corner, and clamp or wax it in place.
4. Put the fiber on a three-axis mount with locking screws, and check with a ruler
that the axis of the fiber chuck is horizontal and that both it and the z translator
are running parallel to the table edge (±1° or so is OK). Check that the light beam
hits the fiber end.
Figure 8.6. Launching light into a fiber. The XYZ stage adjusts only offsets; the wobble plate adjusts only angle; everything sits on solid mounts on the optical table.
5. Put a 20× microscope objective on a sturdy mount, and move it until the center
of the output beam coincides with the spot on the wall. Look at the shadow of the
fiber in the light beam when it’s backed off—the shadow should stay centered and
not walk off sideways as z is adjusted.
6. Put an optical flat (or even a microscope slide, in a pinch—AR coated, preferably)
on an ordinary two-tilt mirror mount between the laser and the objective. (This is a
vernier adjustment, because the amount of tilt you can get isn’t too big, and since
it only tilts the output beam and doesn’t translate it, it should be orthogonal to the
xyz adjustments.)
7. Watch the way the light reflects from the fiber facet as you adjust z to get it
centered, a millimeter or so past focus. Light should be visible coming from the
far end of the fiber.
8. Twiddle the xyz knobs to maximize the light coming out of the fiber end. It probably
won’t be too impressive yet because we haven’t adjusted the angle carefully.
Adjust the optical flat tilts to maximize the transmitted power. Iterate the xyz and
angular adjustments until the coupling is maximized.
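The twiddle-and-iterate loop in the steps above is really just coordinate-wise hill climbing on the coupled power. Here is a toy sketch of that idea; the Gaussian coupling model and all names are mine, standing in for the real power-meter reading:

```python
import math

def coupling(x, y, z, tilt):
    """Toy coupling model: Gaussian falloff in each misalignment coordinate.
    In the lab this would be the power-meter reading, not a formula."""
    return math.exp(-(x**2 + y**2 + 4 * z**2 + 9 * tilt**2))

def optimize(pos, step=0.05, rounds=30):
    """Coordinate-wise 'twiddle': nudge each knob in turn, keep moves that help,
    and iterate until the coupling stops improving."""
    pos = list(pos)
    for _ in range(rounds):
        for i in range(len(pos)):
            for delta in (step, -step):
                trial = pos[:]
                trial[i] += delta
                if coupling(*trial) > coupling(*pos):
                    pos = trial
    return pos

# Start misaligned in all four knobs, then iterate to the peak
best = optimize([0.4, -0.3, 0.2, 0.1])
```

In practice the z and tilt axes interact with xy, which is why the procedure insists on iterating rather than adjusting each knob once.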
If you’re using ordinary SM fiber, you’re done. It’s a bit tougher with PM fiber,
because we have to keep the polarization pure going in, and we don’t in general know
the orientation of either end of the fiber. For PM fiber, add these steps.
PM Fiber. Before adjusting,
4b. Make sure the polarization going in is really linear.
4c. Put a rotatable analyzer at the output of the fiber.
5b. Use a zero-order quartz λ/2 plate instead of the flat; it’s thick enough to do both
jobs if we’re careful. You can use both if you like, or if your wave plate is too
skinny to displace the beam much without a huge tilt.
9. Iterate rotating the analyzer to minimize the transmitted light and rotating the
wave plate to minimize it some more.
10. Jiggle the fiber and look at how much the intensity jiggles. Iterate some more
until the jiggle is as small as it’s going to get.
11. If you’re using a diode laser, you can use minimum detected noise on a photodiode
as your criterion for polarization twiddling instead of minimum transmission.
12. Mark the axis orientation of the fiber at both ends for next time.
8.6.3 Waveguide-to-Waveguide Coupling
The easy way to couple two fibers is to use a connector splice. To avoid the resulting
broad etalon fringes, you can use microspheres, little spherical lenses that you can put
between two fibers. This isn’t too easy to do manually. GRIN rods are perhaps a bit
more useful; a GRIN rod of 1/4 period or a bit more will couple two fibers together very
nicely, and you can increase the NA by going closer to half a period, at the expense of
working distance. Stick the GRIN rod in place, position the first fiber so that light comes
out with the same NA that it went in (easy to see on a white card a little way away),
and then go through the steps in the previous section.
Connecting fibers to diode lasers and integrated optics devices is usually done in one
of four ways: microspheres, lensed (i.e., domed) fiber ends, GRIN rods, or proximity
(butting the fiber right up to the device). GRIN rods and proximity are the two easiest
ways if you’re rolling your own.
8.6.4 Connecting Single-Mode to Multimode Fiber
You can easily send light from a single-mode fiber into a multimode one, but not the
other way. In general, trying to get light from multimode into single mode will cost you
about 20 dB in signal, together with very large (order-1) variations due to bending and
temperature changes. This is because your average multimode fiber has about 50–100
modes, all of which are orthogonal. Because of their orthogonality, they can’t all couple
well into your fiber mode—in fact, if all N modes are illuminated equally with incoherent
phases, you can get at most 1/N of the light into your single-mode fiber.
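The 1/N bound above is simple arithmetic, but it's worth seeing how it reproduces the quoted 20 dB figure (the function name is mine):

```python
import math

def best_mm_to_sm_coupling_db(n_modes):
    """Upper bound on multimode-to-single-mode coupling: if all n_modes are
    equally and incoherently illuminated, at most 1/N of the light couples
    into the single fiber mode."""
    return 10 * math.log10(1.0 / n_modes)

loss_100 = best_mm_to_sm_coupling_db(100)  # -20 dB, matching the text
loss_50 = best_mm_to_sm_coupling_db(50)    # ~ -17 dB at the low end of the range
```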
8.6.5 Fibers and Pulses
One situation where fiber really does behave a lot like coaxial cable is when you’re
using short pulses with a very low repetition rate. A single pulse entering a complicated
fiber system will generate all sorts of smaller pulses due to reflections, which will rattle
round the system for some time. On the other hand, if the pulse is short compared
with the delays between different reflections, and the rep rate is low enough for them
all to die away before the next pulse, things can be pretty well behaved. The author
has a 20 Hz, 20 ps tunable laser system, which works fine with fibers as long as the
reflections are suppressed by time gating. Of course, the polarization instability problems remain.
8.6.6 Mounting Fibers in Instruments
In instruments, we have another set of problems, caused by dirt, vibration, shock, and
temperature drift, and mechanical instabilities due to lubricant flow and stick–slip energy
storage (you know, like the San Andreas Fault). It isn’t a problem inside the fiber, obviously, but it is for launching and collimating. The easiest and most expensive way round
this is to start with pigtailed components and just stick them together with connectors.
You can get armored fiber patch cords, whose steel gooseneck jackets enforce a minimum bend radius and can be bolted down, which helps a lot. The corrugated interior
causes microbending, though, so they aren’t as useful with multimode fiber, especially
in the UV.
One thing to remember is that connectors for weird fiber may be impossible to find
(and the price may make you wish you hadn’t found them if you do), so try to stick with
the 125 μm cladding diameter that is standard for communications fiber.
The other thing about using fibers in instruments is that you have to be very careful
in choosing a jacket material. Teflon and polyimide are the most common choices at
present, and they’re both good as long as they’re not thick enough to stretch the fiber.
We’ve already talked about a thick plastic jacket straining the fiber with temperature,
but there are other possible problems. Nylon and some other plastics are hygroscopic, so
that they swell in humid conditions—you don’t want that source of instability either.
8.6.7 Connectors
Fiber connectors used to be expensive and very miserable, but now they’re cheaper and
more mildly miserable (at least if your fiber has a 125 μm cladding OD; special sizes
are much more expensive, for example, $40 for an 80 μm ST connector versus $8 for
125 μm). This is of course a jaundiced view, but how else do you describe a splice that
is guaranteed to send a few tenths of a percent of the light right back into the source (or
0.01% for angled ones)? Even that 100 ppm is more than enough to make a diode laser
misbehave and will cause percent-level etalon fringes.
8.6.8 Splices
The lowest-reflection way to connect two fibers is fusion splicing, where two cleaved
fibers are butted and then melted together, but that isn’t too convenient. We more often
use epoxy splices, which come in kits and are easy to use. Fusion splices have reflections
of 1 ppm or so, which is a big help in reducing instability. There are also temporary
splices, mostly based on very accurate V-grooves in ceramic materials, with a variety of
clamping strategies. You stick the fibers in, usually with a little index gel, and engage
the clamp. Expect 0.1 dB coupling loss for a fusion splice and 0.3–0.5 dB for an epoxy splice.
Remember the egregious polarizing beamsplitter cube in Example 4.1, whose reflections were 1%, and whose length was only 25 mm—it had a temperature coefficient of
transmission of 12%/◦ C. A fiber patch cord is even worse, at least with highly coherent
sources, and you even get strong etalon fringes between the two fiber ends in the connector itself. As usual, you get strong fringes with single-mode fiber because there’s no
misalignment to help suppress them.
You’ll probably use FC or ST connectors, in either normal or physical contact (FC/PC,
ST/PC) varieties. The PC types use fibers with a slightly convex polish on the ends; they
have lower loss and back-reflections, but they are easily destroyed by dust on an end
facet. You can also get angled physical contact (APC) connectors, with a nominal 8◦
facet angle, and these help quite a bit with back-reflections, as long as the reflected wave
is all beyond the core/cladding critical angle, so that it all leaks out. APC connectors are
unforgiving of angular misalignment, so they come with keys to keep them straight. These
keys are the same kind used in PM fiber connectors, which can make life interesting if
you’re not careful.
You can’t mate an APC connector to another kind, because the light comes out at a
4◦ angle to the fiber axis, and because you can’t get the ends close enough together on
account of the angle. That 4◦ angle makes trouble in coupling light in and out, too.
Connector manufacturers sell inexpensive kits that help a lot with getting the end facet
polish right, so by all means buy one. You can clean fiber facets using a Q-tip and some
very clean ethanol, or (for quick and dirty use) some scotch tape.
8.6.9 Expanded-Beam Connectors
At the price of a couple of decibels’ loss per pair, you can use connectors with integral
collimating lenses. They are tolerant of dirt and other sorts of abuse, and you can shoot
the beam some distance in mid-air if you need to, which is a plus in instruments. The
lenses aren’t good enough to make a decent beam, unfortunately.
8.6.10 Cleaving Fibers
Cleaving fibers is easy but takes a bit of practice; most of us are much too rough at
first. Semiautomatic fiber cleavers exist, but careful hand work is just as good. Strip the
fiber back a couple of inches, wet it, and nick it ever so gently with a V-point diamond
scriber in the middle of the stripped section (don’t buy the cheap carbide scribers). Bend
the fiber gently at the nick until it breaks.
Another technique is to strip it back a bit further, wet and nick it as before, and bend
it back in a hairpin loop between thumb and forefinger. Pull gently on one end so that
the loop slowly gets smaller until it breaks.
Bad cleaves are usually due to too big a nick or to pulling the loop in too fast; hackles†
form when the fracture propagates faster than the speed of sound in the glass. Big fibers
are much harder to cleave than little ones. Twisting the fiber makes an angled cut.
It’s worth emphasizing the distinction between all-fiber devices, which use some intrinsic
property of the fiber, and fiber-coupled devices, where the fiber is grafted onto a bulk
optic or integrated optic device. Directional couplers, Bragg gratings, and polarization
compensators are all-fiber, but Pockels cells, isolators, and detectors are merely fiber
coupled. All-fiber devices often have a cost advantage, whereas fiber-coupled ones are
more expensive than bulk. Bulk devices are covered elsewhere, so here we’ll stick with
the all-fiber ones.
† A hackle is a bump or dip where a bit of glass has remained attached to the wrong side of the cleave.
8.7.1 Fiber Couplers
When the cores of two single-mode fibers are brought close together, their mode fields
overlap, leading to the coupled-mode behavior we saw in Section 8.3.3. Phase matching makes
such evanescent wave couplers highly directional. If the beat length is longer than the
interaction zone, nearly 100% coupling can be achieved, but we more often want 3 dB,
which gives maximum fringe visibility in interferometers. Couplers are usually made by
fusing two fibers together and stretching the fused region until its diameter is slightly
less than that of a single fiber, as shown in Figure 8.7a, which shows a 2 × 2 tapered
coupler. By conservation of étendue, the NA must increase as the core diameters drop,
and the high angle light is no longer guided. It therefore goes off into cladding modes.
The reverse process occurs as the core broadens out again, corralling most of the light
into the cores again. Because most of the light is in cladding modes at some point, the
outside of the cladding must be clean and untouched; thus tapered couplers are usually
suspended in air inside a case.
All these devices are inherently 2N -port, so in combining two beams with a 3 dB
coupler, you’re going to lose half of each one out the other coupled port. This is
easy to see thermodynamically, because otherwise you could feed 100 fibers from a
low temperature source, combine them, and use the result to heat a higher temperature
sink. The four-port character is not necessarily a disadvantage, but you do have to be
aware of it.
Coupler figures of merit are the coupling ratio, directivity, and excess loss. For a
four-port (2 × 2, i.e., two fibers in, two fibers out) coupler, if we feed P1 into port
1, the coupling ratio is P4/(P4 + P2), the directivity is P3/P1, and the excess loss is
P1/(P4 + P2), the latter two usually quoted in dB.
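These figures of merit are easy to tabulate. A minimal sketch, using the port naming above and assuming the directivity and excess loss are quoted in dB (the illustrative port powers are mine):

```python
import math

def coupler_figures(P1, P2, P3, P4):
    """Figures of merit for a 2x2 coupler fed with power P1 into port 1.
    Returns (coupling ratio, directivity in dB, excess loss in dB)."""
    ratio = P4 / (P4 + P2)                        # fraction coupled across
    directivity_db = 10 * math.log10(P3 / P1)     # light coming back out port 3
    excess_db = 10 * math.log10(P1 / (P2 + P4))   # power unaccounted for
    return ratio, directivity_db, excess_db

# Illustrative 3 dB coupler: 1 mW in, 0.45 mW out each arm, 1 nW back-coupled
ratio, d_db, x_db = coupler_figures(1e-3, 0.45e-3, 1e-9, 0.45e-3)
# ratio = 0.5, directivity = -60 dB, excess loss ~ 0.46 dB
```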
Provided the taper is sufficiently gentle, the reflections are small, and the excess loss
is well below 1 dB; large-core multimode splitters can have 1.5 dB of excess loss, and
some single-mode ones reach 0.1 dB. The directivity is usually −50 to −60 dB (optical),
and back-reflections are relatively low too, around −50 dB, because the fiber stays more
or less intact. If you need adjustability, you can imbed fibers in epoxy blocks and lap
Figure 8.7. Fiber couplers: (a) tapered fused coupler (the cores narrow, the modes expand into the cladding and overlap, and the gradual transition keeps the loss low) and (b) lapped coupler (each fiber is embedded in an epoxy block and lapped to near the core, with index oil between the blocks).
them until the mode fields reach the surface; putting two such fibers together makes an
adjustable coupler.
8.7.2 Fiber Gratings
A single-mode fiber is an ideal application for Bragg gratings. For a given wavelength,
we know exactly what kz is, and it’s sufficient to put in a grating with kG = 2kz ẑ to
make a very effective mirror; in wavelength terms, the Bragg wavelength λB is
λB = 2nΛG, where ΛG = 2π/kG is the grating pitch.
As with plane gratings, the resolving power of a weak Bragg grating (R < 10%) is
R = mN , the number of lines in the grating times the diffraction order (providing, of
course, that the grating is weak enough that the far end gets illuminated). A weak Bragg
grating exhibits a strong, narrow spectral dip, which is insensitive to polarization (see
Figure 8.8).
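The Bragg condition above fixes the grating pitch once we pick a wavelength. A minimal sketch (the 1550 nm wavelength and n = 1.45 effective index are illustrative values, not from the text):

```python
def bragg_wavelength(n_eff, period):
    """First-order Bragg condition: lambda_B = 2 * n_eff * Lambda_G."""
    return 2 * n_eff * period

def grating_period(n_eff, lambda_b):
    """Invert the Bragg condition to get the required pitch."""
    return lambda_b / (2 * n_eff)

# A 1550 nm reflector in fiber with n_eff ~ 1.45 needs a ~534 nm pitch
period = grating_period(1.45, 1550e-9)
```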
You can use fiber Bragg gratings like mirrors, for example, to make fiber lasers or
interferometers. You can use them as filters, say, to confine the pump light in a fiber
amplifier or laser. And you can use them as fiber sensors, because stretching the fiber or
changing its index will cause the reflection peak to move.
In the beginning, people made these fiber gratings by etching the cladding, but nowadays they’re made by the photorefractive effect: strong UV (248 nm) irradiation causes
Figure 8.8. Fiber Bragg grating: structure and spectral behavior in reflection and transmission, versus frequency offset. In Type I gratings the photorefractive grating is due to stored charge; Type II gratings rely on damage due to localized melting.
permanent changes in the index of the glass. Ordinary Type I fiber gratings (Figure 8.8a)
have a small index change that extends more or less uniformly through the core, which
makes the coupling to cladding modes very weak, normally a very desirable property.
A short-pulse, high rep rate excimer laser shining through a mask writes gratings in
a fiber fed past from reel to reel. This is potentially very inexpensive in large quantities,
although if you buy one unit, it isn’t much different from a 25 mm square plane grating
at this writing ($500).
The peak reflectance of a Type I fiber Bragg grating is typically below 10%, though
the newer commercial ones are long enough to reach 90% or more. The tuning of the
grating depends on strain and temperature; the interaction between the two is slight, so
the shift of the peak can be predicted from (8.24) and (8.27).
8.7.3 Type II Gratings
It is also possible to make extremely strong Bragg gratings, in which the grating lines
are conceptually more like a stack of dielectric films; these Type II gratings are made by
actually melting the core–cladding boundary with much higher laser fluence. They are
asymmetrical and hence couple light into cladding modes very strongly for wavelengths
shorter than λB ; the loss can be as much as 90% (Figure 8.8b).
Like dielectric stacks, these gratings can achieve very high, flat reflectivities over a
moderately broad band—99.8% has been reported.
8.7.4 Fiber Amplifiers
Fibers doped with rare-earth ions are widely used as amplifiers; erbium at 850, 990,
and 1550 nm, neodymium at 1.06 and 1.32 μm, and holmium at 1.38 μm. They boost
the signal level without needing to be converted to electrical signals and back again,
as in a repeater. The spontaneous emission increases the noise, so you can’t use too
many in a row. The spontaneous emission has shot noise, and it also leads to severe
beat noise, caused by its interference with the desired signal. This is similar to the
coherence fluctuations and double Rayleigh scattering problems, and shows up because
there’s no spatial averaging to get rid of it as there is in bulk optics. You can’t do
much about signal/spontaneous emission noise, but a narrow filter will mostly eliminate
spontaneous/spontaneous beats, which can be much worse.
Erbium-doped fiber amplifiers (EDFAs) are pumped with special diode lasers at 980
or 1480 nm; the pump light is fed in via a coupler and stopped with a Bragg grating,
or a coated facet, or sometimes just allowed to exit along the output fiber, where it will
ultimately be attenuated by the fiber loss.
Don’t be tempted to see fiber amplifiers as equivalent to packaged RF amplifiers in
coax systems, because the resulting systems are not nearly as stable; a garden-variety
RF amplifier gives you 20 or 30 dB of isolation from output to input, whereas a fiber
amplifier gives none whatever. In fact, it gives less than 0 dB isolation, since the gain
is not intrinsically directional—it amplifies the light going the other way just the same
amount. RF folks would say that a 20 dB EDFA has minus 20 dB of isolation, which
would make them unhappy, and for good reason. Part of the skill of designing fancy
fiber systems is knowing how to use the minimum number of expensive isolators to get
good stability.
8.7.5 Fiber Lasers
By adding Bragg mirrors tuned to the signal wavelength, putting mirror coatings on the
facets, or by turning it into a ring resonator with a coupler, a fiber amplifier can be made
into a solid state fiber laser. The simplicity of the idea is its main advantage, but in
principle it can be of great benefit in long path difference interferometers such as Fabry–
Perot types, because the linewidth can be made narrower than the typical 10–100 MHz
of a single frequency diode, and the coherence length correspondingly longer.
8.7.6 Fiber Polarizers
Polarizers are an example of something that’s hard to do in fiber but easy in bulk. Persuading a fiber to actually absorb the unwanted polarization requires giving it anisotropic
loss, for example, by lapping the cladding down to get to the mode field, and then putting
on a metal coating to absorb the TM polarization. The TM mode excites surface plasmon waves in the metal, which are enormously lossy. Metal fiber polarizers can have
open/shut ratios of 50 dB and losses of 0.5 dB.
You can get polarizing (PZ) fiber, in two types. One works by the single-mode equivalent of frustrated TIR; if the evanescent field falls off at slightly different rates for s and
p polarizations in the cladding, then a localized cladding region with high absorption
or low refractive index can cause one polarization to be attenuated differently from the
other. The other kind works by the difference in the bend edge in the two polarizations;
a few meters of carefully aligned coiled fiber sends most of the more weakly guided
mode off into the cladding. Usually we just put a bulk polarizer before or after the fiber.
Walkoff plates are especially popular for this because one beam is undeviated, which
makes them easy to make and to align.
8.7.7 Modulators
We’ve already encountered integrated optic Pockels cells, which are the most common
modulators used with fibers. Fiber phase modulators are often made by wrapping many
turns around a large piezoelectric tube, but these fiber stretchers can’t go much faster
than 50 kHz. They also have a lot of bend birefringence unless you anneal the fiber in the
coiled state (which is hard since the piezo won’t stand the 800◦ C annealing temperature).
8.7.8 Switches
The most common type of intensity modulator or switch is the integrated optic Mach–
Zehnder type, in which a phase modulator in one arm of a waveguide Mach–Zehnder
interferometer causes a bright or dark fringe to occur at the output waveguide. Interferometers are always four-port devices, even though these ones don’t look like it; the third
and fourth ports are radiation into and out of the substrate when the two waves are out
of phase. You can also do it by putting fibers on piezos, and moving them in and out
of alignment. People have built N -way switches in that fashion, but it’s a hard way to
make a living.
8.7.9 Isolators
Fiber isolators are also extrinsic devices; the two most common are Faraday and acoustooptic, which we’re familiar with from bulk optics. Faraday isolators rely on polarizers
to get their directional selectivity, so use PM fiber or mount the isolator right next to
the laser, where the polarization is very stable. Faraday rotator mirrors provide some
isolation as well, provided a polarizing beamsplitter is used to separate out the returned light.
Both of these technologies are wonderfully useful, but they require much more engineering than one at first expects. Neither is a quick fix for an implementation problem.
Especially seductive is the ease of hacking up a connectorized fiber system: just twist
the connectors together, and you’re done.
The parallel between this and doing signal processing with a table covered in boxes
and RG-58/U patch cords is real but not perfect; the main differences are that the length of
the optical fiber is at least millions of wavelengths, whereas the coax is usually less than
1, and that electronic amplifiers provide isolation, whereas almost all optical components
are inherently bidirectional. An optical fiber setup is more closely analogous to a whole
cable TV or telecommunications cable system, with miles and miles of coax. Stray cable
reflections can cause severe ghost images in TV reception, much the way etalon fringes
foul up measurements.
Even there, the analogy breaks down somewhat; the reflections from the cable network
don’t cause instability or frequency pulling of the sources, because they’re well isolated;
diode lasers are much more susceptible. And besides, coax doesn’t have polarization or
mode problems, or lose signal when you bend it. There’s no coax network long enough
to get coherence fluctuations from a typical oscillator with a 1 Hz linewidth, either.
Diode lasers and fiber optics work better apart than together, unless you’re prepared to
spend some real dough ($1000 or more) for decent isolators, or to destroy the coherence
by large UHF modulation, and even then, you have the coherence fluctuation problem
we’ve already discussed.
It is important to distinguish a fiber sensor from a sensor with fibers in it. There are right
and wrong reasons for putting fiber optics in your setup. The right reason is that there
are unique physical measurements you can do with the fiber, serious safety concerns,
or compelling cost advantages, and eventually lots of similar sensors are going to be
made, so that it’s worth putting in the engineering time to get it really right. Borderline
reasons are that you need a really good spatial filter, and pinholes aren’t enough, or
that the system needs 100 kV of voltage isolation. The wrong reason is that your bulk
optical system is inconvenient to align, and it would be nice to use fibers to steer the
light around to where it’s needed. If you’re building sensors for others to use, you’ll
quite likely spend a lot of late nights in the lab, wishing you hadn’t done that. Where
conventional sensors (optical or not) can do the job, you should do it that way almost every time.
Fiber sensors can do some really great and unique things, but they’re just one of
many tools for the instrument designer. If you really like fibers, and want to use them
everywhere, you face the same type of problem as the man in the proverb who only has
a hammer—everything looks like a nail.
8.9.1 Sensitivity
The basic trade-off in fiber sensors is sensitivity versus convenience and low cost. It’s
often very fruitful to go the low cost route, since the enormous dynamic range of optical
measurements is not always needed, and the market potential may be enormous. Do
realize, though, that you are making that trade-off, so that if it isn’t low cost, there’s
often no point.
8.9.2 Stabilization Strategy
If we are going to make a fancy fiber sensor, we have to be able to predict its performance. Elsewhere, we’ve been asking “What’s the SNR going to be?” and calculating
it from first principles and measured device parameters. With fiber sensors, we’re not
that fortunate, because there are two prior questions. The first is: “How is it going to
be stabilized?” Every sensitive fiber sensor is seriously unstable unless you figure out a
way to stabilize it, and this instability is the primary bogey in fiber sensor work. Scale
factors drift, operating points go all over the place, and all on a time scale of seconds or
minutes—unless you figure out how to avoid it. Some sort of common-mode rejection
scheme is usually needed, e.g. a fiber strain gauge with an unstrained reference fiber in the
same package, two wavelengths, or two modes. Some of these schemes work a lot better
than others.
8.9.3 Handling Excess Noise
The next question is: “How do we get rid of the excess noise?” After stability, the worst
problem in laser-based fiber sensors is their huge technical noise, caused by demodulation
of source phase noise by etalon fringes and scattered light. Only after dealing with these
two issues do we get to the point of worrying about our SNR in the way we do elsewhere.
Fiber measurements are intrinsically squishier than bulk optical ones (see Section 11.4.2).
8.9.4 Source Drift
LEDs drift to longer wavelengths with increasing T , by as much as several parts in
104 /◦ C. Diode lasers drift less (at least between mode jumps) because they are constrained by their resonators. Any fiber sensor that is sensitive to changes in the source
spectrum will have to neutralize these effects somehow. The easiest way is to temperature-control the source, but temperature compensation or normalization by an independent
source spectrum measurement will often work too.
Most mainstream fiber sensors at present are based on changes in received intensity, at
or near DC. There are many kinds, but only a few physical principles: microbending loss
from pressing against corrugated plates, evanescent coupling between nearby fiber cores,
and misalignment.
Intensity sensors more or less ignore phase and polarization, which makes them fairly
insensitive but pretty trouble-free (for fiber devices). The limitations on their sensitivity
Figure 8.9. Microbending sensors: fiber bends cause loss, but the loss is poorly calibrated.
come from the smallness of the signal and their vulnerability to interference from other
losses, e.g. microbending outside the sensor and source drift and noise.
Intensity measurements can help us normalize the output of fiber interferometers. Sampling the beams before recombining them gives us the intensity information separately,
which can be subtracted or divided out.
8.10.1 Microbend Sensors
Microbending increases loss, with the attenuation being very roughly linear in the curvature. Microbending sensors exploit this effect in a variety of ways (see Figure 8.9).
They aren’t very sensitive. Most cause only a few percent change in transmission, full
scale, and they tend to be fairly badly nonlinear; oddly shaped deviations of tens of percent from the best straight line are common, and that isn’t too confidence-inspiring since
we often don’t know its origin. Microbend vibration sensors are a good technology—the
AC measurement avoids most of these troubles if you’re careful.
8.10.2 Fiber Pyrometers
Black body emission follows the Stefan–Boltzmann equation, so an IR fiber with a
decent black body on one end and a thermal-IR detector makes a nice thermometer for
high temperature applications (up to 800 ◦ C).
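The T^4 dependence is what makes this work well at high temperatures. A minimal sketch of the total (all-wavelength) exitance; the real signal is modified by the detector and fiber passbands, and the 800 °C versus 400 °C comparison is my illustration:

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4

def radiated_power(T_kelvin, area=1.0, emissivity=1.0):
    """Total black-body exitance: P = eps * sigma * A * T^4."""
    return emissivity * SIGMA * area * T_kelvin**4

# T^4 scaling gives strong contrast: an 800 C target radiates ~6.5x
# as much total power as a 400 C one
ratio = radiated_power(800 + 273.15) / radiated_power(400 + 273.15)
```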
8.10.3 Fluorescence Sensors
The upper-state lifetime of many fluorescence transitions goes down as T goes up.
Because the decay is exponential in time, measuring the decay time normalizes out the
intensity shifts. A cell full of fluorophore solution or a fluorescent coating on the fiber
facet are the common ways of doing it.
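Since the decay is exponential, a straight-line fit to the log of the signal recovers the lifetime regardless of overall intensity. A minimal sketch, using the two calibration points quoted below (210 μs at 0 °C, 170 μs at 250 °C) and assuming a linear lifetime-versus-temperature law, which real sensors only approximate:

```python
import math

def lifetime_from_decay(times, signal):
    """Least-squares slope of ln(signal) vs t gives -1/tau; intensity
    scale factors only shift the intercept, so they normalize out."""
    logs = [math.log(s) for s in signal]
    n = len(times)
    tbar = sum(times) / n
    lbar = sum(logs) / n
    slope = sum((t - tbar) * (l - lbar) for t, l in zip(times, logs)) \
        / sum((t - tbar) ** 2 for t in times)
    return -1.0 / slope

def temperature_from_lifetime(tau):
    """Linear calibration through the two points quoted in the text:
    210 us at 0 C and 170 us at 250 C (an assumption; real sensors
    need individual calibration)."""
    return (210e-6 - tau) * 250.0 / (210e-6 - 170e-6)

# Synthetic noiseless decay with tau = 190 us -> mid-range temperature
ts = [i * 20e-6 for i in range(20)]
sig = [math.exp(-t / 190e-6) for t in ts]
tau = lifetime_from_decay(ts, sig)
T = temperature_from_lifetime(tau)
```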
An unselective detector followed by a log amp and a bandpass differentiator is all you
need; this is the same basic idea as cavity ring-down spectroscopy. The most popular
fiber for this is Nd:glass, whose lifetime varies from 210 μs at 0 ◦ C to 170 μs at
250 ◦ C, which is quite a convenient range. The repeatability of an individual, undisturbed
sensor is about 0.5◦ , but the exact dependence of the lifetime on temperature seems
to vary a lot from sensor to sensor (as much as ±3◦ nonlinear shift from sensor to
sensor) so that for high accuracy, individual calibration over the full range appears to be necessary.
Figure 8.10. Optical time-domain reflectometry. The slope of the backscatter trace gives the attenuation, the delay ts − t0 of a reflection spike gives the distance to the splice, and features such as high loss fiber, low loss fiber, index oil, the exit facet, and the beam dump all show up on the trace.
Fluorescence sensors based on fluorophores immobilized in fiber coatings are also
used for oxygen detection; O2 diffusing into the fiber coating quenches the fluorescence.
(Of course, normal fluorescence measurements are often done using fibers as light pipes,
but that isn’t a fiber sensor in the sense used here.)
8.10.4 Optical Time-Domain Reflectometry
Optical time-domain reflectometry (OTDR) works a lot like radar: you send a tall, narrow
pulse down the fiber, and look at what comes back, as a function of time delay. The
result is a plot of log(r) versus t, that is, distance. As shown in Figure 8.10, we get a
constant Rayleigh scatter signal, which decays at twice the fiber loss rate, and reflection
spikes from splices and other discontinuities. A steeply dropping slope means excess
loss. Chemical sensors, microbend sensors (e.g., fiber with a helically wrapped member
inside the armor), and even porous cladding sensors for water can all be interrogated by
OTDR. Polarization and phase are more or less ignored.
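The OTDR bookkeeping is simple: a reflection at round-trip delay t lies at z = ct/(2n), and the backscatter trace on a log plot decays at twice the one-way loss. A sketch, with an assumed group index and made-up trace points:

```python
# OTDR arithmetic sketch. The group index and the two trace points are
# illustrative assumptions, not values from the text.

C = 2.998e8       # speed of light, m/s
N_GROUP = 1.468   # assumed group index of silica fiber

def delay_to_distance(t_s):
    """Round-trip delay t -> one-way distance z = c*t/(2n)."""
    return C * t_s / (2 * N_GROUP)

def loss_db_per_km(z1_m, p1_db, z2_m, p2_db):
    """One-way loss from two points on the backscatter trace (dB scale);
    the trace slope is twice the one-way loss (out and back)."""
    return (p1_db - p2_db) / 2 / ((z2_m - z1_m) / 1000.0)

print(round(delay_to_distance(10e-6), 1))      # 10 us round trip -> ~1021 m
print(loss_db_per_km(0.0, 0.0, 5000.0, -2.0))  # 2 dB drop in 5 km -> 0.2 dB/km
```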
8.11.1 Fiber Bragg Grating Sensors
Bragg grating sensors work by illuminating the fiber with a broadband source such as
an LED, and using a spectrometer to watch the absorption peak move when the fiber
is stretched† or heated, in accordance with (8.24) and (8.27). Stretching the fiber moves
the absorption peak by a combination of the change in length, the piezo-optic effect,
and the change in the waveguide diameter through Poisson’s ratio. The resolution of the
measurement depends on the quality of the spectrometer and the length of the grating.
This spectral encoding makes for a nice stable sensor, at least once the temperature
dependence is removed. Fiber Bragg gratings escape nearly all the problems of other
fiber sensors; they are insensitive to polarization, phase instability, excess loss, and etalon
fringes (if you use fiber-coupled LEDs, you even escape alignment). They do require an
external spectrometer, and the higher its resolution, the higher the sensitivity of the
sensor. Because of the low étendue, you don’t get that much light, but on the other hand
you don’t need much, because these sensors are typically measuring mechanical and
thermal things, which are slow.
By putting several gratings (as many as 10 or 20 of different periods) into the same
fiber at different positions, a single fiber can do simultaneous strain measurements at
many positions. This idea is used especially in smart structures, which take advantage
of Bragg grating sensors’ ability to make measurements in otherwise totally inaccessible
locations, such as the interior of reinforced concrete bridge abutments.
The strain sensitivity of this technique isn’t bad; from Section 8.5.11, it’s about
(1/λB) ∂λB/∂ε = 0.78,
so that a grating stretched by 0.1% of its length (1000 μstrain) changes its wavelength
by +780 ppm. The total allowed strain isn’t very big, 1000–2000 μstrain, so a sharply
peaked wavelength response is obviously important. People have tried using interferometers to read out Bragg gratings, but that throws away the main advantage of the technique:
its insensitivity to everything but strain and temperature. One interesting approach is to
use a Mach–Zehnder delay discriminator to measure the average wavelength of the light
reflected from the grating; although this is obviously highly sensitive to the exact lineshape of the reflection, it does get around the resolution limit imposed by spectrometer
pixels. Fitting a grating instrument with a split detector or a lateral effect cell would be
another way to do this.
The accuracy of fiber strain gauges isn’t bad, a percent or so. A more serious problem
is temperature drift; the tempco of λB is
(1/λB) ∂λB/∂T = +6.7 ppm/°C,
so that a 1 °C change gives the same signal as a 9 μstrain stretch—which is on the
order of 1% of full scale. To leading order, the two effects are both linear in λ, so
separating them is difficult; the best method seems to be the one used in resistive strain
gauges, that is, comparison with an unstrained grating at the same temperature. Various
two-wavelength schemes have been tried, which use the fiber dispersion to make the
strain and temperature coefficients different. These all fail in practice, either because
the wavelengths are so far apart that the fiber is no longer single mode, or because the
coefficients don’t shift enough to give a well-conditioned measurement.
†Of course, you can do it with a tunable laser and a single detector, but then all the usual fiber problems reappear, and sufficiently widely tunable lasers don't grow on trees either.
Current laboratory accuracies are about 1 μstrain or 0.1 °C, but field accuracies are
less because it’s hard to make a good, cheap readout system. The main challenge in
low resolution field systems is in maintaining a good enough bond between the strained
element and the fiber that strain is transferred accurately. Quartz is very rigid stuff; glue
isn’t, especially not glue that’s gotten hot or wet.
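The reference-grating compensation described above amounts to a simple subtraction; here is a sketch using the coefficients quoted in the text (0.78 fractional strain sensitivity, +6.7 ppm/°C tempco). The shift values and function names are illustrative.

```python
# Temperature-compensated Bragg strain readout, using the coefficients
# quoted in the text. The unstrained reference grating at the same
# temperature supplies the common thermal shift, which is subtracted off.
K_STRAIN = 0.78     # (1/lambda_B) dlambda/deps
K_TEMP = 6.7e-6     # (1/lambda_B) dlambda/dT, per deg C

def strain_ustrain(d_sensor_ppm, d_reference_ppm):
    """Wavelength shifts in ppm of lambda_B -> strain in microstrain.
    Subtracting the reference shift removes the thermal term."""
    return (d_sensor_ppm - d_reference_ppm) / K_STRAIN

# 1000 microstrain (+780 ppm) plus a 10 C warm-up (+67 ppm on both gratings):
print(round(strain_ustrain(780.0 + 67.0, 67.0)))   # -> 1000
```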
8.11.2 Extrinsic Fabry–Perot Sensors
Spectral encoding is also the idea behind many types of extrinsic Fabry–Perot sensors, for example, diaphragm-type pressure sensors, which look at the interference of
light reflected from the fiber end and from a nearby diaphragm. Their low finesse
and narrow spacing make their fringes broad enough to interrogate with white light
or LEDs.
Other examples are thermometers based on refractive index drift in silicon.
8.11.3 Other Strain Sensors
Extrinsic piezo-optic strain sensors are sometimes used with fiber coupling, but this
technology seems not to be becoming mainstream. Other strain-based sensors include
magnetostriction-type magnetic field sensors, in which a strain sensor is bonded to a
magnetostrictive element. It’s pretty easy to put an unstrained companion on one of
these, e.g. by bonding one with indium solder or brazing, and the other with elastomer.
8.11.4 Fiber Bundle Spectrometers
Grating spectrometers and FTIRs are big clunky things, so it’s natural to want to couple
them with fibers into inaccessible places. It’s especially nice to use linear bundles, where
the fibers are arranged in a row to fill the entrance slit. This can work OK if you
pay extreme attention to the pupil function you’re imposing on the measurement. An
integrating sphere and a multimode bundle are stable enough for good spectra, but if you
simply bounce light from one fiber off a surface and into another fiber, you’re just asking
for ugly measurement artifacts. You also give up a lot of light unless you use a lot of
fibers; a grating spectrometer’s étendue is nothing to write home about, but a fiber’s is
worse. It also tends to be unstable.
Two of the author’s colleagues abandoned fiber-bundle spectroscopy after giving it a
good run in an in situ application, in favor of mounting miniature monochromators right
on the instrument head—they couldn’t get enough photons otherwise.
One application where fibers actually increase the amount of light you get is in
astronomical fiber plate spectroscopy, where a computer-controlled milling machine drills
holes in a plate just where the stars will be imaged. Sticking multimode fibers into the
plate brings all those widely separated sources into one spectrometer—a nice example
of putting your étendue where you need it.
8.11.5 Raman Thermometers
The Raman effect is a bilinear mixing between vibrational levels in the fiber material
and the optical field, and is similar in character to the acousto-optic interaction we talked
about in Section 7.10.4. A photon can emit a phonon, and so be downshifted (Stokes
shift), or it can absorb a phonon and be upshifted (anti-Stokes). The Stokes/anti-Stokes
intensity ratio depends on the occupation number of the vibrational states, and hence on
the temperature. The ratio changes by a few tenths of a percent per °C at a few hundred wavenumbers offset, but the signal is weak; an OTDR setup with a good filter and a decent PMT or APD gives a few meters' resolution at 1 °C; by using photon counting and very narrow pulses, you can get 5 °C accuracy with 10 cm spatial resolution.
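The sensitivity figure is easy to check from the Boltzmann factor alone: the anti-Stokes/Stokes ratio goes as exp(−hcν̃/kT) (the frequency-ratio prefactor is temperature independent), so d(ln R)/dT = hcν̃/(kT²). A sketch, with the wavenumber offset and temperature chosen for illustration:

```python
# Sensitivity of the anti-Stokes/Stokes ratio from the Boltzmann factor
# R ~ exp(-h*c*nu/(k*T)): d(ln R)/dT = h*c*nu/(k*T^2). At a few hundred
# wavenumbers this lands in the few-tenths-of-a-percent-per-degree range
# quoted above. The 300 cm^-1 offset and 300 K are illustrative.
HC_OVER_K = 1.4388  # second radiation constant, cm*K

def ratio_sensitivity_per_k(wavenumber_cm, t_kelvin):
    """Fractional change in the anti-Stokes/Stokes ratio per kelvin."""
    return HC_OVER_K * wavenumber_cm / t_kelvin ** 2

s = ratio_sensitivity_per_k(300.0, 300.0)
print(round(100 * s, 2))   # -> 0.48 (percent per deg C)
```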
8.11.6 Band Edge Shift
Absorption and fluorescence spectra are often highly temperature dependent. The band
edge of a semiconductor moves toward long wavelength with temperature, while remaining sharp, so it can be used as an extrinsic fiber thermometer. These don’t get better than
1 °C, and they're almost as nonlinear as thermistors, but they can work from room temperature to 800 °C.
8.11.7 Colorimetric Sensors
Fiber chemical sensors can be made by cladding a fiber in a microporous sol-gel glass with
a chromophore (color-changing chemical) mixed in. The chromophore is immobilized in
tiny pores in the glass. The target chemical species diffuses in and combines with the
chromophore, changing its absorption spectrum. This trick is used in commercial pH
sensors, for instance.
8.12.1 Faraday Effect Ammeters
Currents produce magnetic fields, which in turn produce Faraday effect polarization shifts
in fibers. This underpins all-fiber Faraday rotation ammeters, in which the small Verdet
constant of silica (V = 4.6 × 10⁻⁶ rad/A) is overcome by using many turns; with N
turns wrapped around the conductor, the Faraday shift is
θ = V · N · I.
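A quick scale calculation with the quoted Verdet constant shows why many turns are needed; the turn count and current below are made up for illustration.

```python
# Faraday-rotation ammeter scale factor, theta = V*N*I, using the Verdet
# constant of silica quoted in the text. Turn count and current are
# illustrative assumptions.
V_SILICA = 4.6e-6  # rad/A

def rotation_rad(n_turns, current_a):
    return V_SILICA * n_turns * current_a

# 25 turns around a 1 kA bus bar:
print(round(rotation_rad(25, 1000.0), 3))   # -> 0.115 (rad)
```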
This of course works only if the two circular polarizations remain distinct. If they are
strongly coupled, as by bend birefringence, the Faraday effect is quenched ; light in the
fast and slow circular polarizations change places several times inside the coil, so that
the phase difference tends to cancel out (a bit like twisted pair wiring). Annealing the
coiled fiber can help a lot with this, or you can quench the quenching by twisting the
fiber as it is coiled. The resulting circular birefringence prevents the linear birefringence
from building up enough to couple the fast and slow circular polarizations much; the
circular birefringence just gives rise to a fixed phase shift between polarizations, which
isn’t much of a problem. The fibers are relatively short, so double Rayleigh scatter isn’t
usually serious. The path difference between the two polarizations is small, so source PM
doesn’t turn into polarization jitter, and as long as the etalon fringes are well controlled,
the FM–AM conversion is thus minor; the predominant noise is source intensity noise,
1/f noise in the front end amplifier, and the usual huge amount of low frequency junk
due to temperature and bending.
In the scheme of things, nobody’s about to use a fiber sensor when a better and
cheaper technique such as a current transformer or a series resistor and isolation amp
is applicable; thus these sensors are used only inside high voltage transmission systems.
This is helpful from the noise point of view, since the current is AC, and therefore so is
the measurement.
8.12.2 Birefringent Fiber
The change in birefringence with strain, pressure, and temperature is roughly proportional
to the static birefringence, so people have tried using polarimetry with high birefringence
fiber to sense pressure and vibration. This works at some level, but the temperature
dependence is so strong, and the operating point so unstable, that such techniques appear
to have little to recommend them.
8.12.3 Photonic Crystal Fiber
Recently, fibers have been developed whose cross sections are not solid but include
arrays of air holes, making them 2D photonic crystals. Their structure gives these holey
fibers unusual properties. Some are highly nonlinear, for example, those used in femtosecond supercontinuum comb generation; others have very wide mode fields that are
very constant with wavelength, or actually guide the light inside one of the holes, rather
than in the glass. They haven’t been used much in fiber sensors yet, but that’s likely to
change soon.
Fiber interferometers produce an order-unity change in output for a 1 radian phase shift,
just as bulk ones do. Most of the usual kinds can be constructed in fiber: Michelson,
Mach–Zehnder, Fabry–Perot, and so on.
The big challenge is that fiber interferometers are sensitive to everything, especially
temperature. This lack of selectivity makes it hard to have much confidence in the data,
unless great efforts are made. Temperature gradients are especially serious.
At the root of the problem is that the phase shift due to interfering effects tends to go
as the length of the fiber; since it goes into the exponent rather than the scale factor, at
some point the signal-to-noise ratio cannot be improved any more by using more fiber.
Fiber interferometers place immense demands on the stability and coherence of the
laser. Ordinary diode lasers are not even in the running for this use, because their
linewidths are in the tens of megahertz, and their drift, mode hopping, and mode partition
noise make them very difficult to use barefoot. External-cavity stabilized diode lasers are
a good choice, as are solid state devices such as diode-pumped YAGs.
The instability restricts the dynamic range of fiber interferometers. You can use them
over a wide range by fringe counting, or over a narrow one by AC modulation, but
you can’t easily splice them together into a single wide-range measurement as you can
with bulk. Even though the path length may be a factor of 1000 larger in the fiber case,
fringe-counting resolution of 10⁻⁶ m in 1 km is far poorer than the 1 Hz shot noise limit of 10⁻¹⁴ m in 1 m (5 mW HeNe).
8.13.1 Single Mode
If single-mode fiber were really single mode, instead of degenerate two-mode, and that
mode were really lossless, building single-mode fiber interferometers would be a piece of
cake. Although neither is true, still single-mode fiber is so much superior to multimode
for interferometry that it’s the only game in town. We therefore make the best of it.†
8.13.2 Two Mode
The availability of communications fiber that supports two modes at some readily accessible wavelengths has led people to try using the mode coupling or differential delay or
polarization shift between modes as a sensor. This is an admirable example of actually
trying to realize the hitherto mythical cost savings due to economies of scale, instead of
just talking about them. This works OK if you babysit it, but getting exactly two modes
forces us to work in a regime where the second mode is somewhat leaky, and hence sensitive to manufacturing variations in the fiber, as well as to handling, bending, and so on.
Higher order modes are not well preserved in splices, and especially not in connectors.
It’s a tough business. At present, avoid two-mode and multimode interferometric sensors
if you want your gizmo to be usable by anyone but you, but stay tuned—if someone
figures out how to do it well, this trick could really take off.
By using fiber couplers instead of beamsplitters, it is quite straightforward to make
Michelson, Sagnac, and Mach–Zehnder fiber interferometers, as shown in Figure 8.11.
There are two basic approaches to using them; directly as sensors, or as delay line discriminators for optical frequency measurements. A lot of blood has been spilled making
intrinsic interferometric fiber sensors, and it’s just a very hard business to be in. Fiber
gyros have been in development for many years and have only recently overcome their
many, many second-order problems to become a mainstream technology. That’s good if
you’re cranking out PhD theses, but not so good if you’re building sensors for a living.
If you’re in the latter class, beware.
8.14.1 Mach–Zehnder
Mach–Zehnders are the workhorses, but they’re horribly unstable if made of fiber, for all
the usual reasons. Integrated optics ones are much better. All-fiber Mach–Zehnders are
commonly used as delay line discriminators in high velocity Doppler shift measurements,
but require a lot of babysitting due to polarization funnies.
8.14.2 Michelson
Fiber Michelsons are attractive because you can use FRMs, which eliminates the polarization drift and allows passive interrogation—the fringe visibility in a Michelson with
decent FRMs normally stays above 95%, which is pretty good. The main residual
problems are the few degrees of rapid polarization and phase wobble with (average)
†Every mode is its own interferometer, and you can't keep them all in phase.
temperature, and temperature gradients, which cause large amounts of phase drift in the
relative phases of the two arms. A fiber Michelson built with FRMs and interrogated
with the modulation-generated carrier approach works adequately for AC measurements,
where the phase drift and fringe visibility can be taken out in software without screwing
up the operating point.
8.14.3 Sagnac
Sagnac interferometers use waves propagating in the forward and reverse directions,
more or less fixing the phase instability problem.
The static phase delays in the two directions are the same except for the Faraday
effect, Berry’s phase, and the Sagnac effect, which causes a phase shift in a rotating
reference frame. An N-turn coil of cross-sectional area A, turning at angular velocity Ω, produces a phase shift between the clockwise and counterclockwise propagating light of

φ ≈ 8πNAΩ/(λc).
The instability of the fiber is in fact so well suppressed that you don't get any output
below a certain rotation speed—the backscattered light dominates until the Sagnac phase
shift gets sufficiently large. Transient effects of course break this symmetry because they
arrive at different times unless they occur exactly midway on the path.
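The Sagnac formula can be evaluated for a representative fiber gyro; the coil geometry, turn count, and wavelength below are illustrative assumptions, with Earth's rotation rate as the input.

```python
# Sagnac phase-shift sketch, phi = 8*pi*N*A*Omega/(lambda*c), for a fiber
# gyro sensing Earth rotation. Coil size, turn count, and wavelength are
# illustrative assumptions.
import math

def sagnac_phase(n_turns, area_m2, omega_rad_s, wavelength_m):
    c = 2.998e8  # m/s
    return 8 * math.pi * n_turns * area_m2 * omega_rad_s / (wavelength_m * c)

OMEGA_EARTH = 7.292e-5            # rad/s
area = math.pi * 0.05 ** 2        # assumed 10 cm diameter coil
phi = sagnac_phase(1000, area, OMEGA_EARTH, 1.55e-6)
print(f"{phi:.1e}")               # a few tens of microradians
```

The answer, tens of microradians for a kilometer-scale coil, shows why second-order effects like backscatter dominate at low rotation rates.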
Sagnac interferometers based on big coils of fiber have a lot of bend birefringence,
just like Faraday effect ammeters of Section 8.12.1.
8.15.1 Fabry–Perot
There are two kinds of fiber Fabry–Perots: intrinsic ones, where the fiber is entirely
within the cavity, and extrinsic ones, where at least part of the cavity is outside the fiber
(Figure 8.11).
An intrinsic F-P is a stretch of fiber with a Bragg grating or a mirrored facet at each
end. It is sensitive to all the usual F-P things, minus misalignment, plus the usual fiber
things, which is a really bad trade. Polarization instability prevents the use of intrinsic
F-Ps in unattended situations and severely limits the attainable finesse; if you don’t
mind babysitting, however, their potentially very long path length makes them extremely
sensitive detectors of tuning, strain, and temperature, provided that your laser linewidth
is narrow enough. Because the cavity is normally long, the free spectral range is short;
a 10 m piece of fiber has an FSR of 10 MHz, and if its finesse is 50, the FWHM of
the passband is 200 kHz. Diode lasers typically have linewidths of 10–100 MHz, so the
fringes show up in the spectrum rather than in the intensity, making all-fiber F-Ps hard
to use except with very fancy lasers.
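The FSR arithmetic above is easy to reproduce; this sketch assumes a group index of 1.468 for the fiber.

```python
# FSR and passband width for the intrinsic fiber Fabry-Perot example above:
# FSR = c/(2*n*L), FWHM = FSR/finesse. The group index is an assumption.
def fsr_hz(length_m, n=1.468):
    return 2.998e8 / (2 * n * length_m)

fsr = fsr_hz(10.0)            # 10 m of fiber, as in the text
print(round(fsr / 1e6, 1))    # -> about 10 (MHz)
print(round(fsr / 50 / 1e3))  # finesse 50 -> FWHM ~ 200 (kHz)
```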
Low finesse intrinsic F-Ps can be made by cleaving the desired length, depositing a
TiO2 film on each end, and fusion-splicing the result back into the main fiber. Reflectivities are 1–10%. Because of their low finesse, they are optically much more like
unbalanced Michelsons or Mach–Zehnders, but need no couplers. Another trick is the
fiber loop mirror of Figure 8.11c, in which a cross-connected fiber coupler sends part
Figure 8.11. Two-beam fiber interferometers: (a) Mach–Zehnder; (b) Michelson.
of the light back on itself. If the power coupling ratio is x : (1 − x) and we neglect depolarization, the reflectance is

R = 4x(1 − x) sin²(φ).
This simple fiber device can be used as a two-beam Sagnac interferometer itself, or as
a mirror in a larger system such as a fiber laser. Since the reflectance is a very strong
function of wavelength, temperature, and polarization, some sort of tuning will probably
be required.
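The coupling-ratio dependence is worth a quick look: the lossless loop mirror at its fringe peak reflects R = 4x(1 − x) (the standard result, with the phase factor at maximum), so a 3 dB coupler gives full reflection and the response is flat near x = 0.5.

```python
# Peak reflectance of a lossless fiber loop mirror vs. power coupling ratio
# x:(1-x), with the phase factor set to its maximum. This is the standard
# lossless result, quoted here as a check on the coupler tolerance.
def peak_reflectance(x):
    return 4 * x * (1 - x)

print(peak_reflectance(0.5))             # -> 1.0 (3 dB coupler)
print(round(peak_reflectance(0.45), 2))  # -> 0.99 (tolerant of coupler error)
```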
Extrinsic F-Ps come in two flavors: fiber delay lines with one or more bulk optic
mirrors, and fiber coupled F-Ps, where the fiber just brings light to the real Fabry–Perot.
An extrinsic F-P with one Faraday rotator mirror corrects the polarization problems of
the original, and in the process makes the effective length of the cavity polarization
independent, thereby removing one source of ambiguity. (See Figure 8.12.)
8.15.2 Ring Resonator
A ring resonator is very similar to an F-P, except that its transmission looks like the
reflection of an F-P. It has all the same problems, too. If you launch light going both
ways, you can use a ring as a resonant fiber-optic gyro. A good technique to eliminate
polarization problems in a ring resonator is to use PM fiber, but splice it rotated 90°,
so that the two polarizations exchange identities each turn. This actually makes the
periodicity twice as long, but produces two bumps per period; adjacent passbands are
slightly different.
Figure 8.12. Fiber Fabry–Perot sensors: (a) extrinsic, with external diaphragm; (b) intrinsic, with multilayer coating spliced into fiber; and (c) fiber loop (Sagnac interferometer used as a mirror).
8.15.3 Motion Sensors
Doppler-type motion sensors are a good match for fiber interferometry; you put a collimator on the output of one fiber of a Michelson interferometer and point it at the thing
you want to measure. Absolute phase stability is not usually needed, and providing the
polarization instability is controlled, all the low frequency headaches can be filtered out,
and fringe-counting accuracy is usually good enough. Near-field applications such as
fiber catheter flow cytometry don’t need the collimator, but far-field ones do, and a bit of
retroreflecting tape is usually a big help in getting light back into the single-mode fiber.
If the radial velocity is too high to count fringes (v · k/2π > 100 MHz or so), you
can use the fiber interferometer as a delay discriminator on the returned light instead.
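The fringe-counting speed limit quoted above can be put in velocity terms; the wavelength below is an illustrative assumption.

```python
# The text's fringe-counting criterion is v*k/(2*pi) = v/lambda; it passes
# 100 MHz near v = 155 m/s for an assumed 1.55 um wavelength.
def fringe_rate_hz(v_m_s, wavelength_m):
    """Fringe rate v/lambda used in the fringe-counting criterion."""
    return v_m_s / wavelength_m

print(round(fringe_rate_hz(155.0, 1.55e-6) / 1e6))   # -> 100 (MHz)
```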
8.15.4 Coherence-Domain Techniques
There is a family of techniques for measuring displacement and distributed parameters
such as temperature without needing a fast photon-limited OTDR setup (Figure 8.13).
They rely on the idea of white-light fringes. The measured output of an interferometer
is a short-time-averaged autocorrelation between the signal and LO, which means that
for a very wideband source fringes appear only when the path lengths of the two arms
are very close. The key feature of coherence-domain measurements is that, apart from
polarization funnies, the autocorrelation of the signal is preserved pretty well through
the fiber. Some dispersion exists, of course, but that is compensated by the equal-length
reference arm. Wideband sources such as LEDs have only a single autocorrelation peak,
Figure 8.13. Coherence-domain techniques: (a) strain gauge, (b) OCDR proximity sensor, and (c) typical response curve.
unlike Fabry–Perots and highly coherent interferometers. Thus by stretching the reference
arm until the white-light fringes appear, we can measure the distance to a scatterer.
The accuracy is correspondingly less, because we have to locate the center of the
autocorrelation peak, which of course has a fast carrier frequency imposed on it; thus
unless we can use an optical I/Q mixer to get the true envelope, there is always a
one-fringe error band.
Because the autocorrelations of multimode lasers tend to be about 10³ fringes wide,
the SNR needed to get one-fringe accuracy is far higher than with a fringe-counting
interferometer: 50–60 dB instead of 15 dB, which is a serious limitation. An LED with
a 5% bandwidth is rather better, but on the other hand, you don’t get as much light. One
way to get around it is to synthesize an autocorrelation function that is much less smooth,
and in a known way. For example, we can use a slightly unbalanced Mach–Zehnder
to put fringes on the autocorrelation, as shown. This can also be done by using two
wavelengths, or a filter with two passbands on a single wideband source. It is analogous
to a dual-wavelength interferometer, where the fast fringes give high resolution and the
long beat length disambiguates the 2π phase wraps. This is an example of a multiple-scale
measurement, which we discuss in Section 10.4.2. Coherence-domain techniques are
good for low resolution length measurements, such as noncontact surface sensing, for
interrogation of features in fiber that are far too close for OTDR (e.g., 1 mm apart), and
for disambiguation of interferometric data. They’re also good for measurements in turbid
or multiple-scattering media such as tissue.
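The dual-wavelength analogy above comes down to a beat ("synthetic") wavelength Λ = λ₁λ₂/|λ₁ − λ₂|, which sets the unambiguous range while the optical fringes keep the fine resolution. A sketch, with illustrative wavelengths:

```python
# Synthetic-wavelength disambiguation sketch: two wavelengths beat at
# Lambda = lam1*lam2/|lam1 - lam2|, which disambiguates the 2*pi phase
# wraps of the fast optical fringes. Wavelengths are illustrative.
def synthetic_wavelength(lam1, lam2):
    return lam1 * lam2 / abs(lam1 - lam2)

lam = synthetic_wavelength(1.53e-6, 1.57e-6)
print(round(lam * 1e6))   # -> 60 (um) beat, vs. ~1.55 um optical fringes
```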
Getting a stable operating point in a fiber sensor is a bit of a challenge, as we’ve seen.
There are lots of methods in use, of which the following are the best.
8.16.1 Passive Interrogation
If the paths are sufficiently unequal (>1 cm), the phase and amplitude can be interrogated separately using a tunable laser. To avoid the linewidth problems we had with the
Fabry–Perot, the laser should be tunable over many times its linewidth, and the nominal
path difference adjusted accordingly—for example, L = 1.5–2.5 cm for a CD-type diode laser whose tuning range is 1 cm⁻¹. This allows stabilizing the operating point
electronically, by tuning the laser.
One good way to do it is with the modulation-generated carrier approach, as in
Section 13.9.6. This technique requires modulating the phase of the light by ±2.6 radians
or so, where the J1 and J2 terms in the Bessel expansion of the phase modulation are
equal. Because you can combine in-phase and quadrature (I and Q) signals so as to
ignore the slow phase drift completely (perhaps with a feedback loop to get the balance
exactly right), this is a pretty effective solution to the phase problem. You do have to
keep the modulation stable at that value, and the SNR suffers surprisingly little, since
85% of the interference term energy is in the first and second harmonics.
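The ±2.6 rad figure can be checked numerically: J₁ and J₂ first cross near β = 2.63 rad. This sketch computes Jₙ from its integral representation with only the standard library; the step count is an arbitrary choice.

```python
# Check of the modulation depth quoted above: J1 and J2 first cross near
# beta = 2.63 rad. Jn(x) = (1/pi) * integral_0^pi cos(n*t - x*sin(t)) dt,
# evaluated by the trapezoid rule (step count is arbitrary).
import math

def bessel_j(n, x, steps=2000):
    h = math.pi / steps
    f = lambda t: math.cos(n * t - x * math.sin(t))
    total = 0.5 * (f(0.0) + f(math.pi))
    for k in range(1, steps):
        total += f(k * h)
    return total * h / math.pi

beta = 2.63
print(round(bessel_j(1, beta), 3), round(bessel_j(2, beta), 3))  # nearly equal
```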
8.16.2 Frequency Modulation
Sufficiently rapid FM modulation can help a bit in reducing the coherence fluctuation
noise, by transforming the widely different time delays of the twice-scattered components
into large frequency shifts so that most of that contribution lies outside the measurement
bandwidth. The modulation frequency needs to be several times the laser linewidth, and
the modulation index should be large. Still, as we saw in Section 4.7.4, this technique
isn’t as good as we’d like. Very rapid chirps turn distance into frequency shifts, which
is often better than simple FM.
8.16.3 Fringe Surfing
You can use a fiber stretcher or other phase modulator to surf on a fringe, essentially
phase-locking to the fringe drift. This scheme, discussed in Section 10.7.7, is the first
method everybody thinks of. Unfortunately, the phase will in general walk a long way
over long times, far further than phase modulators can accommodate; for instance, a YAG
laser resonator that changes by 0.001% in length will change the delay of a 50 m piece of
fiber by 0.001% of 70 million wavelengths, or 1400π radians, and a 1.5 °C temperature
change in the fiber will do the same. Thus fringe surfing works only in the lab unless your
measurement can be interrupted for a second or two at irregular intervals to reacquire
lock. Because of the limited bandwidth of most such setups (fiber stretchers resonate
in the hundreds of hertz to low kilohertz), they are seriously microphonic, too—bump
the table and the loop will lose lock, to come back to rest somewhere else tens of
milliseconds later.
As with a PLL phase demodulator (Section 13.9.5), you can take as the system output
either the detected output from the optical phase detector, before the loop integrator, or
the control voltage to the fiber stretcher, after the integrator. Unfortunately, the control
voltage here is useless for measurements. There are multiple operating points on the fiber
stretcher, so every time the loop resets, there will be a huge voltage step on that output;
furthermore, the actuator is nonlinear, so there is no easy way of reconnecting a run of
data that has been interrupted by a reacquisition.
8.16.4 Broadband Light
Using broadband light and spectrally encoded data makes a lot of things easier, provided
the source drift is taken care of. The main drawback is the pixel resolution of the
spectrometers used to decode the spectrum.
8.16.5 Ratiometric Operation
One way to get rid of the scale factor drift is by comparing the signal to a known one;
for example, a Faraday effect gaussmeter could use an electromagnet to excite the coil
stably at a known frequency; the ratio of the two signals would correspond to the ratio
of the applied field to the standard one. This requires only high linearity, and not high
stability. Avoid trying to use two wavelengths, because etalon fringes among other things
are highly tuning sensitive.
8.16.6 Polarization-Insensitive Sensors
Polarization drift can be greatly reduced by using Faraday rotator mirrors, as we’ve
seen, and although this isn’t magic, it keeps the polarization drift well enough controlled
that the operating point doesn’t move around much. The residual drift will still need
correcting for in many cases, but that can be done in postprocessing.
One source of residual polarization drift is polarization-dependent losses, for example,
Fresnel reflection at angled fiber cleaves. These sorts of effects produce errors by coupling the supposedly orthogonal modes together, thus ruining the orthogonality on which
the FRM trick depends. Another is light that doesn’t get transformed in the FRM, for
example, facet reflections and front surface reflections, and multiple reflections in the
Faraday crystal. A third is errors of the FRM itself, for example, misadjustment, dispersion, drift, and nonuniformity.
8.16.7 Polarization Diversity
The modulation-generated carrier approach is an example of phase diversity; it is also
possible to use polarization diversity, although this is much more complicated. One
method is to combine the ideas of Sections 6.10.10 and 15.5.4; after recollimating the
output light, use bulk optics to split it into two or three copies. Use polarizing prisms
rotated through different angles to select different projections of the two fields, and pick
whichever one has the best fringe visibility. This takes care of the problem of misaligned
major axes, leaving only the usual relative phase problems.
8.16.8 Temperature Compensation
Separating temperature from other effects usually involves measuring temperature independently and using a calibration curve. Another property of the fiber (e.g., Raman
backscatter) can be used, or an IC temperature sensor. The particular problems with this
are generating the calibration curve, and figuring out what temperature to use, in the face
of significant gradients across the dimension of the fiber system.
8.16.9 Annealing
You can anneal fiber that has been wound round a mandrel by heating it up to 800 °C for
a while and cooling it slowly, being sure not to stretch it by letting the mandrel expand
too much. That gets rid of the bend birefringence, but it tends to be hard on the jacket,
which makes it unattractive unless you really, really need to, e.g. in Faraday sensors for
power transmission ammeters. It is wise to leave the fiber undisturbed on the mandrel afterwards.
Some types of intrinsic fiber sensor lend themselves to multiplexing, which permits the
use of many sensors with only one readout system, and perhaps with only one interrogating fiber. Multiplexing can be done in all the standard electronic ways; time-division
(TDM), frequency-division (FDM), and code-division (CDM), plus a uniquely optical
one, coherence multiplexing. The details of this are really beyond our scope, being
network and interconnection issues, but they are discussed in Udd and elsewhere.
As the old saw runs, “It’s not what you don’t know that hurts you, it’s what you do know
that ain’t so.” Fiber sensors are currently still fashionable, though not as modish as they
were. In part, this is because of the advantages already enumerated, although as we’ve
already seen, there’s less than meets the eye about some of them. The author is somewhat
sceptical of fiber sensor claims in general, because the sheer volume of hype obscures
any clear-eyed assessment of their strengths and weaknesses, and because the discussion in fiber sensor papers—even review papers and book chapters—seldom seems to
include comparisons to bulk optic alternatives. We’re now in a position to enumerate and
critique a few.
1. Fiber Sensors Offer Unique Capability. Some types certainly do. Fiber gyroscopes,
Bragg grating sensors, distributed sensors like Raman thermometers, and arguably Faraday effect current sensors all have capabilities that are difficult or impossible to replicate
with bulk optics. Fiber-optic smart structures, oil-well thermometers, and fiber catheter
devices all have an important place.
2. Fiber Sensors Are Cheap. They could be, in principle, but sensitive ones almost
never are—people who are used to the prices of electronic sensors are liable to keel
over from pure sticker shock. Some intensity-based systems, such as fluorescent fiber
thermometers, really are intrinsically cheap. Those tend to be the ones where the fiber
is just used to pipe the light around, which is fair enough. Distributed systems, such as
fiber Bragg grating strain gauge systems, where the cost of the interrogation system is
amortized across hundreds of sensors, might one day be cheap on a per-sensor basis,
which is why they are interesting for fiber-optic smart structures.
Cheapness is often said to be a consequence of economies of scale in the telecommunications business, but it ain’t so. Telecommunications fibers are not single mode at the
wavelengths of cheap sources and detectors, and the cost of fiber transmission systems
is not driven by the cost of lasers and detectors, as fiber sensor costs are. Besides, the
real economies of scale are in diode laser-based consumer products—which are 100%
bulk optics. (Compare the cost of a 1.5 μm diode laser to a 670 nm one of the same
power level, if you doubt this.)
Simplicity and low cost can rapidly go away when we start dealing with phase and
polarization instability, etalon fringes, FM–AM conversion, and extremely low étendue;
we often find ourselves using $10,000 single-frequency diode-pumped YAG lasers to get
decent performance from $50 worth of fiber.
It’s also common to find apples-to-oranges comparisons; if a sensor has to work in
situ, compare a fiber system to a miniaturized and cost-reduced bulk optic system, not to
some behemoth with an argon laser and a table full of Newport mounts. Think in terms
of DVD drives and digital cameras.
3. Fiber Sensors Are Highly Sensitive. This is sometimes true compared with nonoptical sensors, but is more or less entirely false in comparison to bulk optics. Leaving
aside the expensive kinds, like gyros and Faraday rotator mirror interferometers, the
really unique fiber sensors are all relatively low sensitivity devices. What’s more, the
performance of even fancy fiber sensors falls far short of the fundamental physical
limits.† Fibers are highly sensitive to temperature, bending, vibration, you name it,
and that leads to enormous amounts of low frequency drift and noise. High frequency
performance is limited by the multiplicative effects of the low frequency stuff, including drift of the scale factors and operating points—things we’d never tolerate in a
DVM—and by demodulation of source phase noise by etalon fringes and double Rayleigh scattering.
There are an awful lot of interferometric fiber sensor papers where somebody is
bragging about his 120 dB dynamic range in 1 Hz. That isn’t a bad number, but in
reading these, remember that a bulk optic interferometer with a $5 diode laser and a
$10 laser noise canceler just sits there at the shot noise level forever, unattended, with
a dynamic range of 160 dB in 1 Hz; and that performance isn’t hype, it’s available in
commercial hardware.‡ Don't forget to ask about their relative stability and 1/f noise, too.
† This isn't entirely a fiber problem, of course.
‡ To be fair, getting the noise canceler down to $10 requires building your own—the commercial ones are over $1000. The good part of this is that it's easy to do.
Optical Systems
A pile of rocks ceases to be a rock pile when somebody contemplates it with the idea of a
cathedral in mind.
—Antoine de Saint-Exupéry, Flight to Arras
An optical system is a collection of sources, lenses, mirrors, detectors, and other stuff that
(we hope) does some identifiable useful thing. We’ve talked about the pieces individually,
but until now we haven’t spent much time on how they work together. This chapter should
help you to think through the behavior of the system from end to end, to see how it ought
to behave. That means we need to talk about the behavior of light in systems: practical
aberration and diffraction theory, illumination and detection, and how to calculate the
actual system output from the behavior of the individual elements.
In Section 4.11.2, we looked at the Gaussian (i.e., paraxial) imaging properties of lenses.
We were able to locate the focus of an optical system, calculate magnification, and
generally follow the progress of a general paraxial ray through an optical system by
means of multiplication of ABCD matrices.
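The paraxial machinery is easy to play with numerically. Here is a minimal sketch (the numbers and helper names are mine, not from Section 4.11.2) that traces a ray from an axial object point through a thin lens and recovers the usual conjugate relation:

```python
import numpy as np

def prop(d):            # free propagation by distance d
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):       # thin lens of focal length f
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# Ray leaving an axial object point 300 mm ahead of a 100 mm lens
ray = np.array([0.0, 0.01])            # [height x, slope theta]
ray = thin_lens(100.0) @ prop(300.0) @ ray

# Distance to where the ray recrosses the axis (the image plane)
d_image = -ray[0] / ray[1]
print(d_image)   # 1/do + 1/di = 1/f  ->  di = 150 mm
```

Tracing one off-axis ray the same way gives the transverse magnification directly.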
Here, we concentrate on the finer points, such as the aberrations of an optical system,
which are its deviations from perfect imaging performance. We will use three pictures: the
pure ray optics approach, where the aberrations show up as ray spot diagrams where not
all the rays pass through the image point; the pure wave approach, where the aberrations
are identified with the coefficients of a polynomial expansion of the crinkled wavefront,
derived from exact calculation of the wave propagation through the system; and a hybrid
ray/wave picture. (See Figure 9.1.)
The hybrid picture is messy but useful and, in fact, is the basis of most “wave optics”
models. It takes advantage of the fact that ray optics does a good job except near focus or
in other situations where diffraction is expected to be important. Accordingly, we trace
rays to the vicinity of the exit pupil from a single object point, construct a wavefront
Building Electro-Optical Systems, Making It All Work, Second Edition, By Philip C. D. Hobbs
Copyright © 2009 John Wiley & Sons, Inc.
Figure 9.1. Three ways of looking at an imperfect optical system: (a) ray spot diagram, (b) wavefront polynomial expansion, and (c) wave aberration of rays.
whose phase is determined by the calculated propagation phase along the ray paths,
and then use wave optics from there to the focus. This is unambiguous unless rays
traversing widely different paths get too close to one another. By now you’ll recognize
this as another case of putting it in by hand, which is such a fruitful approach in all of physics.
The different pictures contain different information. A ray tracing run is very poor
at predicting the shape of the focused spot, but contains lots of information about the
performance of the system across the field of view. For example, field curvature and
geometric distortion show up clearly in a ray trace, since different field angles are presented at once, but tend to disappear in a pure wave propagation analysis, where only a
single field position can readily be considered at a time.
9.2.1 Ray Optics
Ray optics assumes that all surfaces are locally planar, and that all fields behave locally
like plane waves. To compute the path of a ray encountering a curved surface, we
notionally expand the ray into a plane wave, and the surface into its tangent plane. We
then apply the law of reflection and Snell’s law to derive the direction of the reflected
and refracted k vectors, and these become the directions and amplitudes of the reflected
and refracted rays. This is a first-order asymptotic approach, valid in the limit ka → ∞,
where a is the typical dimension of the surface (e.g., its radius of curvature, diameter,
or whatever is most appropriate). There is a slight additional subtlety. A single ray,
being infinitesimally thin, transports no energy; to find out the field amplitudes, we must
consider small bundles of rays, or pencil beams, occupying an element of cross-sectional
area dA, measured in a plane normal to their propagation direction. Conservation of
energy requires that the product of the ray intensity I dA be constant along the axis.
Curved surfaces and any refraction or diffraction will in general cause |dA| to change,
either by anamorphic magnification or by focusing. Thus in computing the contribution dI of a given ray bundle to the intensity at a given point x′ from that at x, we must multiply by the Jacobian,

dI(x′) = dI(x) |dA/dA′|.

If the incoming illumination is spatially coherent, we must instead sum the (vector) fields, which transform as the square root of the Jacobian,

dE(x′) = dE(x) |dA/dA′|^{1/2}.
Going the other way, for example, computing a specular reflection by starting from the
obliquely illuminated patch to the propagating beam, we have to put in the reciprocal
of the obliquity factor—otherwise energy wouldn’t be conserved on reflection from a
perfect mirror. (See Section 9.3.5.) The mathematical way of putting this is that the
Jacobian of the oblique projection equals the ratio cos θ2 /cos θ1 . We saw this effect in
radiation from planar surfaces in Section 1.3.12, and it shows up in wave optics as the
obliquity factor (see Sections 9.2.1 and 9.3.4).
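That obliquity bookkeeping can be checked in a few lines. A toy example (the function name and angles are made up): the footprint of a pencil beam maps through a planar surface with area ratio cos θ₂/cos θ₁, and intensity scales by the reciprocal, so a perfect mirror (θ₂ = θ₁) conserves intensity.

```python
import math

def oblique_jacobian(theta1, theta2):
    """Area ratio dA'/dA for a pencil beam going from incidence
    angle theta1 to exit angle theta2 at a planar surface."""
    return math.cos(theta2) / math.cos(theta1)

# Perfect mirror: theta2 == theta1, so dA is unchanged and
# intensity is conserved on reflection.
assert oblique_jacobian(0.5, 0.5) == 1.0

# Refraction into glass (n = 1.5) at 45 degrees: Snell gives theta2
th1 = math.radians(45.0)
th2 = math.asin(math.sin(th1) / 1.5)
J = oblique_jacobian(th1, th2)
# The refracted pencil's cross-section grows by J; intensity scales as 1/J
print(J, 1.0 / J)
```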
9.2.2 Connecting Rays and Waves: Wavefronts
In order to move from one picture to another, we have to have a good idea of their
connections. The basic idea is that of a wavefront (Figure 9.2). Most people who
have had an upper-level undergraduate physical optics class will picture a wavefront
as a plane wave that has encountered some object (e.g., a perforated plane screen or
a transparency) and has had amplitude and phase variations impressed upon it. While
this picture isn’t wrong, it also isn’t what an optical designer means by a wavefront,
and the differences are a frequent source of confusion, especially since the same
diffraction integrals are employed and the conceptual differences are seldom made explicit.
A physicist will tell you that a wavefront is a surface of constant phase, which can
have crinkles and ripples in it, whereas a lens designer will say it’s the deviation from
constant phase on a spherical surface centered on the Gaussian image point. In actual
fact, whenever people actually calculate imaging with wavefronts (as opposed to waving
Figure 9.2. Wavefront definitions: (a) surface of constant phase and (b) deviation from a prespecified spherical wave on a plane.
their arms) the real definition is the phase deviation, on a plane, from a spherical wave:
W(x) = K arg(e^{ik|x−x0|} ψ(x)),
where K is a constant that expresses the units in use: radians, waves (i.e., cycles), or
OPD in meters. As long as the deviations from sphericity are sufficiently small and slow,
the two descriptions are equivalent.
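The plane-on-sphere definition is easy to try numerically. In this sketch (the wavelength, pupil size, and the two focus positions are arbitrary choices of mine, not from the text), a spherical wave converging to a slightly wrong point shows up in W as the familiar quadratic defocus term:

```python
import numpy as np

# Sample a converging spherical wave on the pupil plane and take its
# phase against a reference sphere centered on the nominal image point.
k = 2 * np.pi / 0.633e-6           # HeNe wavelength, m^-1
x = np.linspace(-5e-3, 5e-3, 201)  # pupil coordinate, m

z_true = 0.1000                    # wave actually converges to 100.0 mm
z_ref = 0.1005                     # reference sphere centered at 100.5 mm
psi = np.exp(-1j * k * np.sqrt(x**2 + z_true**2))
phase = np.unwrap(np.angle(np.exp(1j * k * np.sqrt(x**2 + z_ref**2)) * psi))
W = phase / k                      # OPD in meters, up to a piston term

# The focus error appears as the quadratic (defocus) term of W
quad = np.polyfit(x, W, 2)
print(quad[0])                     # close to (1/z_ref - 1/z_true)/2
```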
Deviations from the perfect spherical wave case are called aberrations; aberration
theory is in its essence a theory of the way phase propagates in an optical system.
Amplitude and polarization information is not given equal consideration, which leads to
the total disregard of obliquity factors, among other things. A vernacular translation is,
“it gives the wrong answer except with uniform illumination and small apertures, but is
close enough to be useful.”
Aside: Phase Fronts on the Brain. From what we know already of diffraction theory,
concentrating solely on phase like this is obviously fishy; an amplitude-only object such as
a zone plate can destroy the nice focusing properties, even while leaving the phase intact
initially. Neglect of amplitude effects is our first clue that aberration theory fundamentally
ignores diffraction. The software packages sold with some measuring interferometers
have serious errors traceable to this insistence on the primacy of phase information.
The author’s unscientific sample is not encouraging; he has had two such units, from
different manufacturers, manufactured ten years apart. Both were seriously wrong, and in
different ways.
9.2.3 Rays and the Eikonal Equation
We have the scalar wave equation, which allows us to predict the fields everywhere in a
source-free half-space from an exact knowledge of the full, time-dependent, free-space
scalar field on the boundary of the half-space. In the limit of smooth wavefronts and
wavelengths short compared to D²/d (where D is the beam diameter and d the propagation distance), we can neglect the effects of diffraction. In this limit, the gradient of the field is dominated by the ik · x term, and each segment of the wavefront propagates locally as though it were its own plane wave, ψ_local(x) ≈ A exp(ik_local · x).
Phase is invariant to everything, since it’s based on counting; we can look on the
phase as being a label attached to a given parcel of fields, so that the propagation of the
field is given by the relation between x and t that keeps φ constant. That means that the
direction of propagation is parallel to k_local,

k_local ≈ ∇φ, (9.4)

which gives us a natural connection between rays and wavefronts.
If we take a trial solution for the scalar Helmholtz equation,

ψ(x) = A(x) e^{ik0 S(x)}, (9.5)

applying the scalar Helmholtz equation for a medium of index n and taking only leading order terms as k0 → ∞ suppresses all the differentials of A, which is assumed to vary much more slowly than exp(ik0 S), leaving the eikonal equation

|∇S(x)|² = n²(x), (9.6)
where the eikonal S(x) is the optical path length (it has to have length units because
k0 S must be dimensionless). Once we have S, we can get the propagation direction from
(9.4). What’s more, Born and Wolf show that in a vector version of this, the Poynting
vector lies along ∇S, so in both pictures, (9.6) is the natural connection between rays
and waves.
This connection is not a 1:1 mapping between rays and waves, however. The eikonal
can be used to attach rays to a wavefront, then trace the resulting rays as usual, but that
procedure doesn’t lead to the same results as propagating the field and then computing
the eikonal, because the whole eikonal idea breaks down near foci and caustics, as well as
having serious problems near shadow boundaries. The approximation becomes worthless
at foci because the phase gradient vectors can never really cross; that would require
a field singularity, which is impossible for the wave equation in a source-free region.
Another way to say this is that the optical phase can be made a single-valued function
of position in any given neighborhood, and therefore its gradient is also a single-valued
function of position. So identifying rays purely as gradients of the phase cannot lead to
rays that cross. For example, the eikonal equation predicts that a Fresnel amplitude zone
plate (see Section 4.13.2) has no effect on wave propagation other than blocking some
of the light.
The eikonal approximation shares the wave optics difficulty that there’s no simple way
to include multiple source points in a single run; the total field has only one gradient at
each point.
9.2.4 Geometrical Optics and Electromagnetism
Geometrical optics (GO) is an asymptotic electromagnetic theory, correct in the limit
k → ∞. Like most asymptotic theories (e.g., the method of steepest descents), it requires
some funny shifts of view. We go back and forth between considering our ray to be
infinitesimally narrow (so that all surfaces are planes and all gradients uniform) and
infinitely broad (so that the beam steers like a plane wave). The way a GO calculation
goes is as follows:
1. Pick some starting rays, for example, at the centers of cells forming a rectangular
grid on the source plane. Assign each one an amplitude and phase. (You can do
GO calculations assuming incoherent illumination, but you’re better off computing
the amplitudes and phases of each ray and applying random phasor sums to the
results—it’s computationally cheap at that stage and avoids blunders.)
2. Assuming that the true optical field behaves locally like a plane wave, apply the
law of reflection, Snell’s law, and so on, to follow the path of the ray to the
observation plane.
3. Add up the field contributions at the observation plane by computing the optical
phase (the integral of k · ds over the ray path) and amplitude from each ray and
summing them up. Remember to apply the Jacobian—if you imagine each ray
occupying some small patch of source region, the area of the patch will in general
change by the time it gets to the observation plane. This isn’t mysterious, it’s just
like shadows lengthening in the evening. Field amplitudes scale as the reciprocal
square root of the area. If the ray directions at the observation plane are similar,
you can add up the fields as scalars. Otherwise, you’ll need to be more careful
about the polarization. Nonplanar rotations of k will make your polarization go all
over the place.
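Step 2 is just the vector forms of Snell's law and the law of reflection. A minimal sketch (the helper names are mine; the vector formulas are the standard ones):

```python
import numpy as np

def refract(d, nhat, n1, n2):
    """Vector form of Snell's law. d: unit ray direction; nhat: unit
    surface normal pointing back into medium 1. Returns the refracted
    unit direction, or None on total internal reflection."""
    cos_i = -np.dot(d, nhat)
    r = n1 / n2
    s2 = r * r * (1.0 - cos_i * cos_i)   # sin^2 of the refracted angle
    if s2 > 1.0:
        return None                      # total internal reflection
    return r * d + (r * cos_i - np.sqrt(1.0 - s2)) * nhat

def reflect(d, nhat):
    """Law of reflection for unit direction d and unit normal nhat."""
    return d - 2.0 * np.dot(d, nhat) * nhat

# 45 degree ray going down into glass (n = 1.5); nhat points up
d = np.array([np.sin(np.pi / 4), -np.cos(np.pi / 4)])
t = refract(d, np.array([0.0, 1.0]), 1.0, 1.5)
print(t)   # x component is sin(45 deg)/1.5, as Snell requires
```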
You can trace rays in either direction, so if only a limited observation region is of
interest, you can start there and trace backwards to the source. Either way, you have to
make sure you have enough rays to represent the field adequately. You have to think of
a GO calculation as involving nondiffracting rectangular pencil beams, not just rays; in
general, the patches will overlap in some places, and you have to add all the contributions
in complex amplitude.
In an inhomogeneous but isotropic medium, the geometric optics laws need to be
generalized slightly. The local direction of wave propagation is the gradient of the phase.
This leads to the eikonal equation (9.6) or the curvature equation (9.12), which are
differential equations giving the change of the ray direction as a function of distance
along the ray. Note that the Jacobian has to be carried along as well if you want to
get the correct answer for the field amplitudes. If the medium is anisotropic as well as
inhomogeneous, life gets a good deal harder, as you have to carry along the polarization
state and any beam walkoff as well as the k vector and Jacobian as you go. If you have
sharp edges, caustics, or shadows, geometric optics will give you the wrong answers
there—it ignores diffraction, will exhibit square-root divergences at caustics and edges,
and will predict zero field in the shadow regions.
9.2.5 Variational Principles in Ray Optics
Some propagation problems yield to a less brute-force approach: calculus of variations.
If the medium is nonuniform, so that n is a function of x, we need to put an integral in the exponent instead,

ψ(x) = A(x) exp(−iωt + ik0 ∫ n(x) ds),
as in the eikonal approximation (9.5). Since the ray path depends on n, we don’t know
exactly what path to do the integral over, so it doesn’t look too useful. In fact, we’re
rescued by Fermat’s principle, a variational principle that states that

δS = δ ∫_P n(x) ds = 0; (9.9)
that is, the integral has an extremum on the true ray path P . The path yielding the
extremum is said to be an extremal . Fermat called it the principle of least time,
which assumes that the extremal is a global minimum. There’s obviously no global
maximum—a given path can loop around as much as it wants—so this is a good bet.
We solve variational problems of this sort by imagining we already know the
parametrized P = x(u), where x(0) is the starting point x 0 , x(u1 ) is the end point x1 ,
and u is a dummy variable. We demand that a slight variation, εQ(u) (with Q ≡ 0 at the ends of the interval), shall make a change in S that goes to 0 faster than ε [usually O(ε²)] as ε → 0. Since the arc length is all we care about, we can parameterize the curve any way we like, so we’ll assume that x is a continuous function of u and that dx/du ≠ 0 in [0, u1]. Thus

∫₀^u1 [(n(x) + εQ · ∇n) |ẋ + εQ̇| − n(x)|ẋ|] du = O(ε²),
where dotted quantities are derivatives with respect to u. Since it’s only the ε term we’re worried about, we series-expand the squared moduli, cancel the zero-order term, and keep terms of up to order ε, which yields

∫₀^u1 [n ẋ · Q̇ / |ẋ| + |ẋ| Q · ∇n] du = 0, (9.10)
which isn’t too enlightening until we notice that it’s nearly a total derivative, with one
term in Q and one in Q̇. Integrating by parts, and using the fact that Q = 0 at the ends
and is continuous but otherwise arbitrary, we get the result
n(ẍ − ẋ(ẋ · ẍ)/|ẋ|²)/|ẋ|² = ∇n − ẋ(ẋ · ∇n)/|ẋ|², (9.11)

which can be written more neatly by changing back to arc length and using the convention that ∇⊥ is the gradient perpendicular to ẋ, yielding the curvature equation

d²x/ds² = ∇⊥n / n. (9.12)
This says that the curvature of the path is equal to the perpendicular gradient of log n,
which makes a lot of physical sense, since we don’t expect the path to change when we
(say) double the refractive index everywhere.†
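Fermat’s principle can also be checked by brute force: pick two points straddling a plane interface, minimize the optical path length over the crossing point, and verify that Snell’s law pops out. (The geometry and indices below are arbitrary choices of mine.)

```python
import math

# Light goes from A = (0, 1) in index n1 to B = (1, -1) in index n2,
# crossing the interface y = 0 at (xc, 0). Minimize the optical path.
n1, n2 = 1.0, 1.5
A, B = (0.0, 1.0), (1.0, -1.0)

def opl(xc):
    return (n1 * math.hypot(xc - A[0], A[1]) +
            n2 * math.hypot(B[0] - xc, B[1]))

# Ternary search on the (convex) path length
lo, hi = 0.0, 1.0
for _ in range(200):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if opl(m1) < opl(m2):
        hi = m2
    else:
        lo = m1
xc = 0.5 * (lo + hi)

# At the extremal crossing point, n1*sin(theta1) == n2*sin(theta2)
s1 = (xc - A[0]) / math.hypot(xc - A[0], A[1])
s2 = (B[0] - xc) / math.hypot(B[0] - xc, B[1])
print(n1 * s1, n2 * s2)   # equal to within the search tolerance
```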
9.2.6 Schlieren Effect
One interesting consequence of the curvature equation (9.12) is that light waves steer
like tanks: they turn toward whichever side goes more slowly. Accordingly, a refractive
index gradient causes the wave to bend, the schlieren effect. Since dn/dT < 0 for gases,
a temperature gradient in air produces schlieren, which is why there are mirages. On a
hot, sunny day, the ground is warmer than the air, so dn/dz > 0 and light bends upwards;
an image of a patch of sky appears near the ground in the distance, looking like a pool of
water. At sea, with the opposite sign of dT /dz, a ship becomes visible before it crosses
the geometrical horizon. More complicated gradients can cause multiple images, as in the
beautifully named fata Morgana (after Morgan Le Fay, King Arthur’s nemesis); ghostly
shorelines with fantastic mountains can appear in the middle of the ocean. (Who said
there was no poetry in optics?)
There are a couple of examples that show up more often in instruments: thermal
lensing, which is pretty much like a mirage, and gradient-index (GRIN) optics, as we
saw in Sections 4.13.1 and 8.3.5. Thermal lensing in nonaqueous liquids can be a big
effect—big enough to be a sensitive laser spectroscopy method. A visible probe laser
traverses a long path in some solvent, coaxially with an infrared pump beam. An axially
symmetric temperature gradient results in progressive defocusing of the probe beam,
which can be detected very sensitively with a masked detector. Water is a disappointing
solvent for thermal lensing, with a low dn/dT and a high thermal conductivity.
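The mirage is easy to simulate by marching a nearly horizontal ray with the paraxial form of (9.12), dθ/ds ≈ (dn/dz)/n. The linear index profile below is made up, but the qualitative behavior (the ray bottoms out above the ground and climbs again) is the point:

```python
# Toy mirage: on a hot day n(z) increases with height near the ground
# (hot, rarefied air below), so dn/dz > 0 and rays curve upward.
def n(z):
    return 1.000230 + 1.0e-6 * z      # made-up linear profile, z in m

def dndz(z):
    return 1.0e-6

# Euler-integrate d(theta)/ds ~ (dn/dz)/n for a nearly horizontal ray
z, theta, ds = 1.5, -1.0e-3, 1.0      # eye height, 1 mrad downward, 1 m steps
path = []
for step in range(4000):
    theta += ds * dndz(z) / n(z)
    z += ds * theta
    path.append(z)

# The ray never reaches the ground; an image of the sky appears
# where the road surface should be.
print(min(path), path[-1])
```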
9.2.7 The Geometrical Theory of Diffraction
For objects whose typical dimension a is large compared to a wavelength, the ordinary ray optics laws (the law of reflection and Snell’s law) apply with high absolute
accuracy except near shadow boundaries and places where rays cross—foci and caustics. (The relative accuracy is of course also very bad inside shadows, where geometric
optics predicts zero fields.) For such large objects, it is reasonable to apply a local correction in these situations, the geometrical theory of diffraction (GTD), formulated by
Keller,‡ Ufimtsev,§ and others. Like ray optics, GTD is an asymptotic theory valid in the limit ka ≫ 1, but it lets us include higher order terms to get better accuracy. The basic idea is that illuminated bodies follow geometrical optics (in GTD) or physical optics (in PTD) except within a wavelength or two of shadow boundaries and sharp edges. Large objects (ka ≫ 1) can be looked on as arrangements of flats, curved regions, and edges,
† Extremals that minimize some smooth functional such as (9.9) are called weak extremals, because the true minimum may not be continuous. If discontinuous and unsmooth functions are considered, the result is called a strong extremal. The strong extremal is usually a global minimum, never a maximum, but sometimes just a local minimum. If you don’t know any calculus of variations, consider learning it—it isn’t difficult, and it’s a great help in optical problems. The book by Gelfand and Fomin (see the Appendix) is a good readable introduction.
‡ J. B. Keller, Geometric theory of diffraction. J. Opt. Soc. Am. 52, 116–130 (1962).
§ Pyotr Y. Ufimtsev, Method of Edge Waves in the Physical Theory of Diffraction. Available at http://handle.dtic.
Figure 9.3. GTD and PTD calculations combine geometrical or physical optics calculation with a correction factor due to the vector diffraction from edges and shadow boundaries: (a) edge waves from discontinuities and (b) creeping waves from curves.
and the scattered fields can be decomposed into sums over those individual contributions. Locally these can be described as flat, ellipsoidal, cylindrical, wedge-shaped,
or conical—all shapes for which rigorous analytic solutions exist, at least in the far
field. The beautiful trick of PTD is to take each of these canonical cases, solve it
twice—once rigorously and once by physical optics—and then subtract the two solutions, yielding the edge diffraction contribution alone. For sharp edges, this is expressed
as a line integral around the edges, turning a 3D problem into a 1D problem. In
most calculations the diffracted contributions are further approximated as diffracted
rays. (Curved surfaces give rise to creeping rays, which are harder to deal with—the
F-117A stealth fighter is all flats and angles because it was designed using 1970s computational methods.)
The two kinds of diffracted rays are shown in Figure 9.3: edge rays, which emanate
from each point of discontinuity, such as a corner or an edge, and creeping rays, generally
much weaker, which emanate from shadow edges on smooth portions of the surface.
So the way it works is that you do the calculation via geometrical or physical optics,
which ignores the edge contributions, and then add in the vector diffraction correction in a
comparatively simple and computationally cheap way. Complicated geometries will have
important contributions from multiple scattering, leading to a perturbation-like series in
which nth order terms correspond to n-times scattered light.
These approximations are usually very complicated, but on the other hand, they contain
information about all incident and scattered angles, all positions, and all wavelengths,
in one formula. The information density in that formula dwarfs that of any numerical
solution, and frequently allows algebraic optimization of shapes and materials, which is
very difficult with numerical solutions. This makes GTD and PTD well suited for design
problems, especially with computer algebra systems available for checking.
GTD approximations tend to diverge as x −1/2 at shadow boundaries, caustics, and
foci, which of course are points of great interest. The same idea, local approximation
by analytically known results, can be used to get a uniform asymptotic approximation,
valid everywhere. For details, see the IEEE collected papers volume† and the excellent
monographs of Ufimtsev and of Borovikov and Kinber referenced in the Appendix.
† Robert C. Hansen, ed., Geometric Theory of Diffraction, IEEE Press, New York, 1981.
9.2.8 Pupils
The entrance pupil is an image of the aperture stop, formed by all the elements ahead
of the stop, and the exit pupil is the image of the same stop formed by all succeeding
ones. Thus they are images of one another, and each point in the entrance pupil has
a conjugate point in the exit pupil. Since nobody really knows what a lens does, we
rely on this property heavily in wave optics calculations of imaging behavior. The most
consistent high-NA approach to the problem is to use the ray optics of thin pencil
beams to construct the fields on the pupil plane, and then propagate from there to the
image using the Rayleigh–Sommerfeld integral. Remember the ray/wave disconnect:
in the wave picture, pupil refers to the Fourier transform plane, not to the image of
the aperture stop. (Like most ray/wave terms, the two are related but generally not identical.)
Not all imaging optical systems possess proper pupils; for example, a scanning system
with the x and y deflections performed by separate mirrors lacks one unless intervening
optics are included to image one mirror onto the other. An optical system without a
pupil is not a shift-invariant system, so that Fourier imaging theory must be applied with caution.
9.2.9 Invariants
There are a number of parameters of an optical beam which are invariant under magnification. One is the state of focus: if an object point is 1 Rayleigh range from the beam
waist, its image will be at 1 Rayleigh range from the waist of the transformed beam
(neglecting diffraction). This is because the longitudinal magnification of an image is not M but M².
The best known is the Lagrange invariant, which we’ve encountered already as the
conservation of étendue. You can get this by putting two rays as the columns of a 2 × 2
matrix R. No matter what ABCD matrix you hit R with, the determinant of the result is
equal to Det(R): x1 θ2 − x2 θ1 is invariant in the air spaces in any paraxial optical system.
If we generalize to the case n ≠ 1, the ABCD matrix that goes from index n1 into n2 is

[ 1    0
  0  n1/n2 ],

whose determinant is n1/n2, so the generalized Lagrange invariant L is
L = n(x1 θ2 − x2 θ1 ).
The more usual form of this is the theorem of Lagrange, where for a single surface
between media of indices n1 and n2 ,
n1 x1 θ1 = n2 x2 θ2 .
Another invariant is the number of resolvable spots, which is the field of view diameter
or scan distance measured in spot diameters; if we take the two ends of the scan to be
the two rays in the Lagrange invariant, the range goes up as the cone angle goes down,
and hence the spot size and scan angle grow together.
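Both forms of the invariant can be verified in a couple of lines (the particular lens and ray numbers here are arbitrary):

```python
import numpy as np

# Two paraxial rays as the columns of R: [x1 x2; theta1 theta2].
R = np.array([[1.0, 0.0],
              [0.0, 0.01]])
L0 = np.linalg.det(R)          # x1*theta2 - x2*theta1

# Any chain of unit-determinant ABCD matrices (lenses, air spaces)
# leaves the determinant, i.e. the Lagrange invariant, unchanged.
prop = lambda d: np.array([[1.0, d], [0.0, 1.0]])
lens = lambda f: np.array([[1.0, 0.0], [-1.0 / f, 1.0]])
M = prop(200.0) @ lens(50.0) @ prop(100.0)
L1 = np.linalg.det(M @ R)
print(L0, L1)                  # identical

# Crossing into index n2 multiplies the determinant by n1/n2,
# so n*(x1*theta2 - x2*theta1) is the generalized invariant.
n1, n2 = 1.0, 1.5
Mn = np.array([[1.0, 0.0], [0.0, n1 / n2]])
print(np.linalg.det(Mn @ R) * n2)   # equals n1 * L0
```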
9.2.10 The Abbe Sine Condition
The Lagrange invariant holds for paraxial systems, but not for finite apertures. Its most
natural generalization is the Abbe sine condition,
n1 x1 sin θ1 = n2 x2 sin θ2 ,
which we don’t get for free. (Optical design programs include offense against the sine
condition (OSC) in their lists of aberrations.) A system obeying the sine condition is
said to be isoplanatic, and has little or no coma.† Like other aberration nomenclature,
this term has a related but not identical meaning in the wave picture: an optical system
is said to be isoplanatic if its transfer function does not vary with position in the image.
You can see that this is a different usage by considering vignetting; a few missing
rays won’t make the sine condition false, but they will certainly change the transfer function.
Aside: NA and f-Number. It’s possible to get a bit confused on the whole subject of
numerical aperture and f -number, because there are two competing definitions of f # in
common use. One, historically coming from photography, is
f# = EFL/D = 1/(2 tan θ),
where D is the pupil diameter, θ is the half-angle of the illuminated cone, and EFL is
the effective focal length (just focal length f to us mortals). There’s no clear lower limit to this number—light coming in from a hemisphere effectively has an infinite pupil radius at any nonzero focal length, so f# = 0.
The other definition, coming from microscopy, is
f # = 1/(2 sin θ ) = 0.5/NA,
assuming n = 1. Since NA ≤ 1 in air, in this definition a hemispherical wave would be
coming in at f /0.5. The two are equivalent for small NA and distant conjugates, so they’re
often confused. Photographers care most about image brightness, since that determines
exposure, so the quoted f # on the lens barrel actually applies on the image side of the
lens, and is nearly constant as long as the object distance do ≫ f. Microscopists care
most about resolution, so microscope NA is quoted on the object side, where it’s also
nearly constant because of the small depth of focus. The two definitions express the same
information, but confusion is common when we don’t keep them straight. (The author
recommends the 0.5/NA definition as being closer to the imaging physics as well as
giving a simpler exact formula for image brightness, since the projected solid angle Ω = π(NA)².)
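The two definitions are easy to compare numerically; here is a quick sketch (plain Python; the function names are our own, not standard nomenclature):

```python
import math

def f_number_tan(theta):
    # Photographic definition: f# = EFL/D = 1/(2 tan(theta))
    return 1.0 / (2.0 * math.tan(theta))

def f_number_sin(theta, n=1.0):
    # Microscopy definition: f# = 0.5/NA = 1/(2 n sin(theta))
    return 1.0 / (2.0 * n * math.sin(theta))

# Nearly equal at small angles (distant conjugates, low NA):
print(f_number_tan(math.radians(2)), f_number_sin(math.radians(2)))
# Wildly different for a nearly hemispherical (theta -> 90 degree) cone:
print(f_number_tan(math.radians(89)), f_number_sin(math.radians(89)))
```

At θ = 2° the two agree to better than 0.1%; at θ = 89° the tangent definition heads for f/0 while the sine definition bottoms out near f/0.5, which is the confusion described above in miniature.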
There is not enough space in this book to treat diffraction in complete detail. For purposes
of measurement systems, diffraction is important in four ways: in imaging; in gratings
† Optical system design is full of forbidding terms like that, but don’t worry—half an hour’s work and you’ll be obfuscating with the best of them.
and holograms; in spatial filtering; and in vignetting, the incidental cutting off of parts of
the beam by the edges of optical elements, apertures, and baffles. We’ve already covered
the ordinary Huyghens–Fresnel theory in Section 1.3, so this section concentrates on the
finite-aperture case.
9.3.1 Plane Wave Representation
Monochromatic solutions to the scalar wave equation in free space can be expressed
exactly as sums of plane waves of different k. The k-space solution is exactly equivalent
to the real-space solution; no approximation is involved. Thus if we have a focused beam,
and we know its plane wave spectrum exactly, we can calculate its amplitude and phase
at any point (x, t) we like. It’s important to hold on to this fact in discussing diffraction;
once we have specialized to the scalar case, there are no further approximations in the
actual wave propagation calculation. The additional approximations of diffraction theory
involve how spatial Fourier coefficients on surfaces couple into plane waves, and how
an obstruction in a free-space beam modifies its plane wave spectrum.
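The plane wave picture is also how one computes propagation in practice; a minimal FFT-based angular-spectrum propagator (our own sketch, not code from any particular library) looks like this:

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, z):
    """Exact scalar propagation of a sampled monochromatic field:
    FFT into plane waves, advance each by exp(i kz z), inverse FFT.
    Components with kx^2 + ky^2 > k^2 come out with imaginary kz,
    i.e. evanescent, and decay instead of propagating."""
    n = field.shape[0]
    k = 2 * np.pi / wavelength
    kxy = 2 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(kxy, kxy, indexing="ij")
    kz = np.sqrt((k**2 - kx**2 - ky**2).astype(complex))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z))

# A focused Gaussian spot, pushed 50 um downstream:
n, dx, lam = 128, 1.0e-6, 0.5e-6
x = (np.arange(n) - n // 2) * dx
xx, yy = np.meshgrid(x, x, indexing="ij")
beam = np.exp(-(xx**2 + yy**2) / (2 * (5e-6) ** 2))
out = angular_spectrum_propagate(beam, lam, dx, 50e-6)
```

Because the propagating components just pick up unit-modulus phase factors, the calculation is unitary: power is conserved, and propagating forward and then backward recovers the original field, illustrating that no approximation beyond the scalar one is involved.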
The easiest case is a plane boundary, because different plane waves are orthogonal
on that boundary; thus a Fourier transform of the fields on the surface, appropriately
weighted, gives the plane wave spectrum directly. Life gets significantly harder when
the boundary is nonplanar. There are a handful of other coordinate systems in which
the Laplacian separates, but the only three useful ones are Cartesian, cylindrical, and
spherical. (Spherical coordinates are a special case of ellipsoidal ones, which separate
for electrostatics but not for electrodynamics.) Generally, though, unless you’re a glutton for punishment,
you have to choose among plane interfaces, asymptotically large spheres, and numerical methods.
9.3.2 Green’s Functions and Diffraction
The study of diffraction is based on the idea of the Green’s function, which is the response
of a system consisting of a linear partial differential equation plus boundary conditions
to a source term δ(x − x′). There is so much confusion around as to what the origins
and limitations of diffraction theory are that it seems worth going through the math here.
The following discussion follows Jackson fairly closely, so look there for more detail if
this is unfamiliar. We solve the equation once for the Green’s function, and then we can
solve it for an arbitrary source term f(x′) by a superposition integral (neglecting boundary
terms):

ψ(x) = ∫_{all space} f(x′) G(x, x′) d³x′.
The usual case in diffraction theory is a bit more complicated, in that we actually
have the field (rather than a source) specified on some surface, which may or may not be
one of the boundaries of the space. Boundary conditions couple to the normal derivative
of the Green’s function, n̂ · ∇G. (Don’t confuse n, the refractive index, with n̂, the unit
vector normal to the surface.)
We’ll specialize to the Helmholtz wave equation, so the defining equation for G is
(∇² + k²)G(x, x′) = −δ³(x − x′).
The two Green’s functions of interest are G0 , the one for free space,
G₀(x, x′) = exp(ik|x − x′|) / (4π|x − x′|),
and G+ , the one for Dirichlet boundary conditions on the plane z = 0,
G₊(x, x′) = exp(ik|x − x′|)/(4π|x − x′|) − exp(ik|x − x′′|)/(4π|x − x′′|),

where x′′ is the mirror image of x′ in the plane z = 0.
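The image construction is easy to verify numerically; a little sketch (coordinates and wavenumber are our own choices) checking that G₊ really vanishes on the plane z = 0:

```python
import numpy as np

def G0(x, xp, k=2 * np.pi):
    # Free-space Green's function exp(ik|x - x'|)/(4 pi |x - x'|)
    r = np.linalg.norm(np.asarray(x, float) - np.asarray(xp, float))
    return np.exp(1j * k * r) / (4 * np.pi * r)

def Gplus(x, xp, k=2 * np.pi):
    # Dirichlet Green's function for the half-space z > 0: free-space term
    # minus the contribution of the source mirrored through z = 0
    mirror = np.array([xp[0], xp[1], -xp[2]], float)
    return G0(x, xp, k) - G0(x, mirror, k)

source = np.array([0.3, -0.2, 1.5])
on_plane = np.array([2.0, 1.0, 0.0])
print(abs(Gplus(on_plane, source)))  # vanishes on the boundary
```

Any point on z = 0 is equidistant from the source and its image, so the two terms cancel identically there, which is the whole trick of the method of images.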
Green’s theorem is a straightforward corollary of the divergence theorem,
∫_V (φ∇²ψ − ψ∇²φ) d³x = ∮_S (φ ∂ψ/∂n − ψ ∂φ/∂n) dA,
where surface S encloses volume V . If we choose φ = G+ , and make S the plane z = 0
plus a hemisphere off at infinity, then by applying the wave equation and the definition
of G, we get the Rayleigh–Sommerfeld integral,
ψ(x) = −(ik/2π) ∫_{z′=0} [exp(ik|x − x′|)/|x − x′|] [n̂ · (x − x′)/|x − x′|] [1 + i/(k|x − x′|)] ψ(x′) d²x′.
A limiting argument shows that the contribution from the hemisphere goes to 0.
If we choose G0 instead, we get the Kirchhoff integral ,
ψ(x) = −(1/4π) ∮_S (exp(ikR)/R) [∇′ψ + ik(1 + i/(kR)) (R/R) ψ] · n̂′ dA′,

where R = x − x′ and n̂′ is the inward-directed normal.
These scary-looking things actually turn out to be useful—we’ll revisit them in
Section 9.3.6.
Ideally what we want is to find the exact plane wave spectrum of the light leaving S for
a given plane wave coming in, because that makes it easy to do the propagation calculation. Getting the correct plane wave spectrum is easy for a planar screen, since different
plane waves are orthogonal on a plane, and because we can use the correct Green’s
function in the planar case (the Rayleigh–Sommerfeld theory). For more complicated
boundaries, life gets very much harder since analytically known Green’s functions are
rare, and plane waves are not then orthogonal on S, so we can’t just Fourier transform
our way out of trouble. The surfaces of interest are usually spheres centered on some
image point, so we’d need to expand in partial waves, and then find the plane wave
spectrum from that. Fortunately, there’s an easier way.
Aside: Theory That’s Weak in the Knees. One problem for the outsider coming
to learn optical systems design is that it’s a pretty closed world, and the connections
between the scalar optics of lens design and the rest of optics are not clearly brought out
in books on the subject, or at least those with which the present author is familiar—it
isn’t at all obvious how a given ray intercept error influences the signal-to-noise ratio, for
example. This is not helped by the uniformly inadequate presentation of the theoretical
underpinnings, which almost always base Fourier optics on the Fresnel approximation
and aberration theory on a sloppy use of the Huyghens propagator.
A charitable interpretation of this is that it is an attempt to make the subject accessible
to undergraduates who don’t know anything about Green’s functions. Yet it is unclear
how they are aided by such sloppiness as defining the wavefront on a spherical reference
surface near the exit pupil, then doing the integrals as though it were a plane.
Some claim that this is the Kirchhoff approximation (it isn’t), and others unapologetically toss around the (paraxial) Huyghens integral on the spherical surface, even for
large-aperture lenses. The funny thing about this is that, apart from neglect of obliquity,
they get the right result, but for the wrong reasons. It matters, too, because the confusion
at the root of the way the subject is taught damages our confidence in our results, which
makes it harder to calculate system performance with assurance. If you’re an optics
student, ask lots of rude questions.
9.3.3 The Kirchhoff Approximation
Usually we have no independent way of measuring the actual fields on the boundary
and are reduced to making a guess, based on the characteristics of the incoming wave.
The Kirchhoff approximation says that, on surface S, the fields and their derivatives are
the same as the incoming field in the unobstructed areas and 0 in the obstructed ones.
This turns out to work reasonably well, except for underestimating the edge diffraction
contribution (see Section 9.2.7). The degree of underestimate depends on the choice of
propagator (see below); empirically the Kirchhoff propagator does a bit better on the
edge diffraction contribution than the Rayleigh–Sommerfeld propagator.
You can find lots more on this in Stamnes, but the net is that these physical optics
approximations work pretty well for imaging and for calculating diffraction patterns,
but they won’t get fine details right (for example, the exact ripple amplitude) and will
underestimate the field in the geometric shadow regions. You need GTD or PTD (the
geometrical or physical theory of diffraction) to do that properly.
9.3.4 Plane Wave Spectrum of Diffracted Light
In Section 1.3, we used the Huyghens propagator, which in real space is
ψ(x, y, z) = 1/(iλ(z − z′)) ∫∫_P ψ(x′, y′, z′) exp[ik((x − x′)² + (y − y′)²)/(2(z − z′))] dx′ dy′,
where P is the xy plane, and in k-space is
ψ(x, y, z) = ∫∫_P ψ(u, v)|_{z′=0} e^{i(2π/λ)(ux+vy)} e^{−i(2πz/λ)(u²+v²)/2} du dv,
where P is the uv plane.
If a beam gets cut off sharply, it scatters strong fringes out to high angles. Being
a paraxial approximation, the Huyghens integral requires very large values of z − z′
to be used in that situation. The Rayleigh–Sommerfeld result (9.22) is the rigorously
correct scalar solution for a correctly given ψ(x) on the plane z = 0, because it is based
on the correct Green’s function for a half-space above that plane. To get the k-space
representation (angular spectrum), we choose x to be on the surface of a very large
sphere of radius R, and neglect the constant term −ieikr /R, which yields
ψ(u, v) = −ikw circ(1 − w) ∫∫_P exp[−ik(ux′ + vy′)] ψ(x′, y′) dx′ dy′,
where u and v are the direction cosines in the x and y directions as before, w = (1 −
u² − v²)^{1/2} = k_z/k, and circ(x) is 1 for 0 ≤ x < 1 and 0 otherwise. It is clear from this
equation that the k-space solution is the Fourier transform of the fields on the boundary,
multiplied by a factor of −ik_z = −2πiw/λ = −ik cos θ, where θ is the angle of incidence
of the outgoing light. A heuristic way of looking at this uses a pencil beam rather than a
plane wave. A circular beam coming from a surface at an incidence angle of θ occupies
an elliptical patch on the surface, whose area is πa² sec θ. On this patch, the field
strength is not diminished by spreading out (different places on the long axis of the
patch are seeing the same beam at different times), so the obliquity factor w = cos θ is
required to counteract the tendency of the integral to become large as the angle approaches
grazing. (We saw this as the Jacobian in Section 9.2.1 and in the drinking-straw test of
Section 1.3.12.)
The k-space Kirchhoff integral is similar,

ψ(u, v) = −ik [(w_inc + w)/2] circ(1 − w) ∫∫_P exp[−ik(ux′ + vy′)] ψ(x′, y′) dx′ dy′,

which is just the same as the far-field Rayleigh–Sommerfeld integral except for the
obliquity factor. The Neumann boundary condition case, where n̂ · ∇ψ is specified on
the boundary, yields the same Fourier transform expression with an obliquity factor of
w_inc. The three propagators are all exact in the sense that they predict the same fields
if the source distribution is correct—they differ only when we make an inaccurate guess
at ψ(x′, y′).
9.3.5 Diffraction at High NA
Diffraction from apertures in plane screens can be calculated for all z by assuming that
the field is the same as the incident field in the aperture, and zero elsewhere. In imaging
problems, the screen has reflection or transmission amplitude and phase that depend on
position. If we just take the incident field as our guess, we wind up suppressing the
high-angle components by the obliquity factor (see Section 9.2.1), so in fact we have to
put the reciprocal of the obliquity factor into the illumination beam in order for energy
to be conserved (i.e., multiply by the Jacobian of the inverse transformation). This is
physically very reasonable, since the screen could have a transmission coefficient of 1
(i.e., not be there at all), in which case the plane wave components had better propagate unchanged.
If the illumination beam has high NA, then the obliquity factors of the plane wave
components of the illumination beam will be different, and that has to be taken into
account. If the object has only low spatial frequencies, and doesn’t have large phase
variations due to topography, then each plane wave will be scattered through only a
small angle, so that cos θ doesn’t change much, and the obliquity factors cancel out.
This effect is partly responsible for the seemingly unaccountable success of Fourier
optics at high NA.
As we discussed in Section 1.3.9, the simple thin object model used in diffraction
imaging theory is a complex reflection coefficient, which depends on x and not on k.
Height differences merely change the phase uniformly across the pupil. This works fine
as long as the maximum phase difference across the pupil is smaller than a wave, i.e.,
we’re within the depth of focus, and providing we take a weighted average of the phase
shift over the entire pupil, i.e., the phase shift with defocus isn’t k_z z anymore (see
Example 9.4).
9.3.6 Propagating from a Pupil to an Image
We’re now in a position to say what the exact scalar field propagator is between a pupil
and an image. Consider the exit pupil plane of an optical system, with a mildly wrinkled
wavefront that is basically a spherical wave centered on the nominal image point x0 ,
ψ(x′) = Ã(x′) exp(−ik|x′ − x0|)/|x′ − x0|,
where the pupil function à is a complex envelope that carries the amplitude and phase
information we care about. (In a little while it will be apparent that the natural variables
for expressing à are the direction cosines u and v, just as in the paraxial theory.) We’re
interested in the structure of the image, so we use the Rayleigh–Sommerfeld integral to
propagate to x1 = x0 + ζ , where |ζ | is assumed to be small compared to |x − x0 |. We
further assume that 1/(k|x − x′|) ≪ 1, that is, we’re in the limit of large Fresnel number,
which allows us to discard that term (which turns out to represent the evanescent fields
and pupil edge diffraction), so we write (where P is the uv plane as before)
ψ(x) = −(ik/2π) ∫_P Ã(x′) [exp(ik|x − x′|)/|x − x′|] [exp(−ik|x0 − x′|)/|x0 − x′|] [n̂ · (x − x′)/|x − x′|] d²x′.
Note that we haven’t made any assumptions about small angles or slowly varying
envelopes—apart from the scalar field and discarding the evanescent contributions, this
is an exact result. Providing that ζ is small compared to |x − x′|, we can ignore it in the
denominator, but since it isn’t necessarily small compared to 1/k, we have to keep it in
the exponent. Doing a third-order binomial expansion of the exponent, we get
|x0 + ζ − x′| − |x0 − x′| = ζ · (x0 − x′)/|x0 − x′| + ζ · [ζ − (x0 − x′)(ζ · (x0 − x′))/|x0 − x′|²] / (2|x0 − x′|) + O(ζ³).
The first term is the phase along the radial vector, which as usual is going to turn
into the kernel of a Fourier transform; the second is the phase due to the length of the
vector changing. (Note that there is no radial component of the phase in order ζ 2 .) If we
were in the paraxial case, we’d just forget about terms like that, or at most say that the
focal surface was a sphere centered on x′, but the whole point of this discussion is that
|x′ − x0| be allowed fractional variations of order 1, so we can’t do that.
What we do need to do is restrict ζ . In order for the ζ 2 term to be small compared
to 1/k, it is sufficient that
|ζ| ≪ (λ|x′ − x0|/π)^{1/2}.
Since we imagine that the pupil function has been constructed by some imaging system, the rays have been bent so as to construct the spherical wavefront. For consistency,
we must thus put in the inverse of the obliquity factor, and the n̂ · (x0 − x ) term then
goes away to the same order of approximation as neglecting ζ in the denominator.† We
also transform into direction cosines, so that (dx′, dy′) = |x0 − x′|(du, dv), which leaves
a pure inverse Fourier transform,

ψ(x0 + ζ) = ∫∫_P exp(iku · ζ) Ã(u) du dv.
For a pupil–image distance of 20 mm and a wavelength of 0.5 μm, this Fraunhofer-type approximation is valid in a sphere of at least 100 μm in diameter, even at NA = 1.
In order to cause the image to deviate seriously from the Fourier transform of the pupil
function, there would have to be hundreds of waves of aberration across the pupil, so
that for all interesting cases in imaging, where the scalar approximation applies, Fourier
optics remains valid. This applies locally, in what is called the isoplanatic patch, and
does not imply that the whole focal plane is the Fourier transform of the whole pupil,
as it is in the paraxial case, because that depends on things like the field curvature and
distortion of the lens, and the different obliquities at different field positions.
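The validity figure quoted above is easy to check, using the restriction on |ζ| derived earlier (a two-line calculation of our own):

```python
import math

def validity_radius(wavelength, distance):
    # |zeta| << sqrt(lambda d / pi): the radius inside which the quadratic
    # term of the expansion contributes less than about a radian of phase
    return math.sqrt(wavelength * distance / math.pi)

diameter = 2 * validity_radius(0.5e-6, 20e-3)
print(diameter)  # ~1.1e-4 m, i.e. a sphere roughly 100 um across
```

With λ = 0.5 μm and a 20 mm pupil-to-image distance the sphere comes out a bit over 100 μm in diameter, in agreement with the figure in the text.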
This analysis applies backwards as well, in going from a small-diameter object to the
entrance pupil of the lens, although if the object is a material surface and not itself an
aerial image, the usual thin-object cautions apply. The combination of the two shows
that an object point is imaged to an image point, and that the point spread function of
the system is the Fourier transform of the pupil function Ã, at least in the large Fresnel
number limit.
This is really the main point: the imaging of an object point into an image point via
a pupil is controlled by Fourier optics in all cases, and for an imaging system faithful
enough to deserve the name, the amplitude PSF of the imaging operation really is the
Fourier transform of the pupil function Ã, regardless of NA.
Example 9.1: High-NA Fourier Optics—Metal Lines on Silicon at NA = 0.95.
Figures 9.4 and 9.5 show the Fourier optics result versus experiment for a 90 nm tall
line of gold on silicon. Even though the scalar Fourier optics approximation to high-NA
imaging is a fairly sleazy one, it nevertheless works extremely well in practice.
Example 9.2: When Is the Paraxial Approximation Valid? The special case of a perturbed spherical wave is very important in applications, but the usual Fourier optics result
is more general; the far-field pattern is the Fourier transform of the pupil function. What
is the range of validity of that approximation?
† For a system of unit magnification, this cancellation is exact when both the object-to-pupil and pupil-to-image
transforms are computed; when the magnification is not 1, the pupil function à will need some patching up,
but that’s not a fundamental objection at this point.
Figure 9.4. Heterodyne microscope image of Au lines on Si, 515 nm, 0.90 NA: amplitude.

Figure 9.5. Au lines on Si: phase.
Comparison of the Huyghens integral with the Kirchhoff and Rayleigh–Sommerfeld
ones shows two differences: the Huyghens integral omits the obliquity factor, and for
a plane wave component exp(i2π(ux + vy)/λ), the Huyghens integral replaces the true
phase shift kz(w − 1) by the first term in its binomial expansion, apparently limiting
its use to applications where this causes a phase error of much less than 1 (it is not
enough that it be much less than kz because it appears in an exponent). If we require
that the next term in the binomial expansion be much less than 1, we find that this
requires that
|z|³ ≫ πx⁴/(4λ).     (9.33)
This restriction is necessary for general fields, but not for paraxial ones. The slowly
varying envelope equation is
∂²ψ/∂x² + ∂²ψ/∂y² + 2ik ∂ψ/∂z = 0.     (9.34)
Its validity depends solely on the initial conditions; a sufficiently slowly varying
envelope will be accurately described by this equation for all z. For a slowly varying envelope and small z − z′, the error in the phase term does indeed become large, but a stationary
phase analysis shows that the large-x contribution to the integral goes strongly to zero
as z → z , due to the rapidly varying phase factor, so that the integral remains valid for
all z, and the Huyghens integral is not limited to far-field applications. This is perhaps
easier to see in the spatial frequency representation.
If we take ψ_α(x, z) = e^{iαx} e^{iγz} in (9.34), requiring the phase error to be small compared
to 1 leads to (9.33) for fixed α of order k, and an absolute phase error that grows secularly
with z, as one would expect. This is not a deadly error, as it amounts only to a deviation
of the field curvature from spherical to parabolic; if we take as our reference surface
a parabola instead of a sphere, it goes away; it may make the calculated optical path
incorrect, and in applications where that matters, it should be checked by comparison
with the Rayleigh–Sommerfeld result.
For fixed z, the restriction can be applied to α instead:
|α| ≪ (8k³/z)^{1/4}.
This is easily satisfied for small z as well as large.
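Both restrictions are cheap to evaluate; here is a sketch (our own construction) comparing the exact k_z with its paraxial expansion and the leading error term zα⁴/(8k³):

```python
import math

def paraxial_phase_error(alpha, k, z):
    # Exact kz = sqrt(k^2 - alpha^2) versus the paraxial k - alpha^2/(2k);
    # the difference, times z, is the accumulated phase error in radians.
    exact = math.sqrt(k * k - alpha * alpha)
    parax = k - alpha * alpha / (2 * k)
    return abs(exact - parax) * z

k = 2 * math.pi / 0.5e-6   # 500 nm light
z = 1e-3                   # 1 mm of propagation
alpha = 0.1 * k            # a plane wave about 6 degrees off axis
estimate = z * alpha**4 / (8 * k**3)   # leading term of the error
print(paraxial_phase_error(alpha, k, z), estimate)
```

At this α the error is a modest fraction of a radian and the leading binomial term predicts it to better than 1%, so the quartic estimate is a trustworthy guide to when the Fresnel phase can be trusted.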
9.3.7 Telecentricity
As Figure 9.6 illustrates, a telecentric optical system is one in which the principal ray is
parallel to the optical axis. This means that, roughly speaking, the axis of the cone of
light arriving at the image or leaving the object is not tilted, and is equivalent to saying
that the pupil is at infinity. An optical system can be telecentric in the object space, the
image space, or both.
Figure 9.6. A telecentric optical system.
This property is of more practical interest than it may sound. In a telecentric system, tilting the sample or moving it in and out for focusing does not change the
magnification—the image is an orthographic projection, like an engineering drawing.
Light reflected from a plane sample, such as a microscope slide or a flat mirror, retraces
its path. Both of these properties are very useful for scanning or imaging interferometers
such as shearing interference microscopes.
In a telecentric imaging system with telecentric illumination, the illumination diagram
is independent of position; all points in the field of view are imaged with light covering
the same range of angles. Because both the illumination and collection NAs are constant,
the range of spatial frequencies received is the same everywhere too. These two properties
together give telecentric systems nearly space-invariant point spread functions. This is
of great benefit when interpreting or postprocessing images, for example, in automatic
inspection systems. Obviously a telecentric system can have a field of view no larger
than its objective (the last optical element on the outside), and usually it’s significantly smaller.
9.3.8 Stereoscopy
Stereoscopic vision requires the ability to look at a scene from two different directions and synthesize the resulting images. This is different from merely binocular vision.
A binocular microscope presents the same image to each eye, whereas a properly stereoscopic microscope splits the pupil into two halves, presenting one half to each eye.
Since pupil position corresponds to viewing angle, this reproduces the stereo effect.
Splitting the pupil reduces the resolution, but the gain in intuitive understanding is well
worth it.
9.3.9 The Importance of the Pupil Function
Pupil functions don’t get the respect they deserve. The point spread function h(x) of an
optical system is the Fourier transform of the pupil function Ã(u), and the point spread
function ranks with the étendue as one of the two most important attributes of an imaging
system. The pupil function is the filter that is applied to the spatial frequency spectrum
of the sample to generate the image.
In signal processing, we choose our filter functions very carefully, so as to get the
best measurement, but this is less often done in optics, which is odd since optics are
much more expensive. One reason for it is confusion of two quite different objects,
both called transfer functions, and both giving rise to point spread functions (PSFs). The
confusion has arisen because, for historical reasons, the one less clearly connected to the
electromagnetic field quantities E and B has staked out the high ground.
9.3.10 Coherent Transfer Functions
When we describe the actions of an optical system in terms of the plane wave decomposition of the scalar optical field E, and apply Fourier transform theory to describe how a
sinusoidal component of the field distribution at a sample plane propagates to the image
plane, we are using the coherent transfer function (CTF) of the system. The CTF is the
convolution of the illumination and detection pupil functions, because the amplitude PSF
of the measurement is the product of the illumination and detection PSFs. Most of the
time, one of the two is far more selective than the other, so the CTF and the broader of
the two pupil functions are often interchangeable.
The CTF is the right description of a translationally invariant phase-sensitive optical
system; this class includes holography setups, scanning heterodyne microscopes, and
phase shifting imaging interferometers, as well as any system producing an aerial image,
such as binoculars. To determine the output of such a system, multiply the Fourier
transform of the sample’s complex reflection coefficient, as a function of position, by
the instrument’s CTF, and take the inverse transform. This is of course mathematically
equivalent to convolving the sample function with the instrument’s 2D amplitude point
spread function. Since the optical phase information is preserved, digital postprocessing
can be used to transform the complex fields in a great variety of ways.
The net effect is that provided you measure both phase and amplitude, on a sufficiently
fine grid and at sufficiently high SNR, you can do with postprocessing anything you could
do on an aerial image with optical elements; this is a remarkable and powerful result.
Example 9.3: Heterodyne Microscope. A heterodyne microscope is basically a heterodyne Michelson interferometer, using an AO deflector as its beamsplitter, and with a
microscope objective in one or both arms. Some versions use separate lenses, and some
send both beams down to the sample through the same lens, as shown in Figure 9.7.
A uniform pupil has an illumination pupil function L = circ((u2 + v 2 )/NA2 ), which
transforms to an illumination PSF of
l(χ) = 2J₁(πχ)/(πχ),

Figure 9.7. Heterodyne confocal microscope.
where χ = rNA/λ. The coherent detector uses interference with a nominally identical
beam s(χ ) to produce the AC photocurrent
i_AC = 2R Re ∫ d²x ψ_LO ψ_s*
     = 2R Re [ exp(−iΔωt) ∫ |ψ_LO(x)||ψ_s(x)| exp(iΔφ(x)) dA ],
as in Section 1.5. By the power theorem, this dot product can be computed in the pupil
or the image, or anywhere in between. For our purposes, it is easiest to see what happens
if we compute it at the sample surface. There, the two jinc functions are superimposed
and multiplied by the local complex reflection coefficient r̃ of the sample S. Thus the
total complex AC photocurrent is
ĩ_AC = R ∫_S l(x) r̃(x) s*(x) d²x,
which if both beams are unaberrated and focused on x is

ĩ_AC(x) = (4Rλ²/π²) ∫ r̃(x′) [ J₁((πNA/λ)|x − x′|) / (|x − x′|NA) ]² d²x′,
so by construction, the amplitude PSF of such a microscope is
g(χ) = [2J₁(πχ)/(πχ)]².
The CTF of this microscope is the Chinese hat function,
H(ω) = (2/π){cos⁻¹[ω/(2NA)] − [ω/(2NA)](1 − [ω/(2NA)]²)^{1/2}},
whose name makes perfect sense if you look at Figure 9.8 and remember that it’s cylindrically symmetric. This function has a cusp at 0 and extends out to ω = 2NA. In talking
about the CTF, we’re subtly sliding into the thin-object Fourier optics approximation,
where a spatial frequency component at ν = 2NA/λ scatters light coming in at u all the
way to −u, which can still just make it back through the pupil.
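In code the Chinese hat function is a one-liner; here is a sketch, with the normalization to H(0) = 1 and the argument in units of 1/λ being our own choices:

```python
import numpy as np

def chat(nu, NA):
    """Chinese hat CTF: the autocorrelation of a circular pupil, normalized
    to unity at dc, cutting off at spatial frequency 2*NA (units of 1/lambda)."""
    w = np.clip(np.abs(nu) / (2 * NA), 0.0, 1.0)
    return (2 / np.pi) * (np.arccos(w) - w * np.sqrt(1 - w * w))

# Unity at dc, zero at the 2 NA cutoff, monotonically drooping between,
# with a cusp (nonzero slope) right at the origin:
print(chat(0.0, 0.9), chat(1.8, 0.9), chat(0.9, 0.9))
```

The cusp at the origin and the tangential zero at cutoff are exactly the nondifferentiable features blamed below for the slow settling of the line spread function.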
The line spread function is
l₂(x) = 8πNA H₁(ξ)/ξ² = 16NA Σ_{m=0}^{∞} (−1)^m ξ^{2m} / [(2m + 1)!!(2m + 3)!!],
Figure 9.8. CTFs of a heterodyne interference microscope before and after Fourier postprocessing.
where ξ = 2kxNA, and H1 (x) is the Struve function of order 1 (see Abramowitz and
Stegun). This function has an asymptotic series for large x,
l₂(x) ∼ 16NA ξ⁻² − 8√π NA (cos ξ + sin ξ) ξ^{−5/2} + O(ξ⁻⁴) monotonic + O(ξ^{−4.5}) oscillatory,
which is a distressingly slow falloff. The slowness is due to the cusp at the origin
and the higher order nondifferentiability at the outer edges. Because the heterodyne
system preserves phase information, this cusp can be removed by digital filtering in
a postprocessing step (see Section 17.7.1). Even a very gentle filter can make a big
difference to the settling behavior; for example, F(u) = cos²[πu/(4NA)]/G(u), which turns
the Chinese hat function into a von Hann raised cosine (see Section 17.4.9). This filter
removes the cusp and makes the edges go to 0 quadratically, and as Figures 9.8–9.10
show, the step response settles at its final value when the uncorrected LSF is still 5%
away. It does this with essentially 0 dB noise gain, so there’s no penalty whatever.
A different filter, which boosts the wings of the transfer function further, can yield
results that approach those expected from a microscope operating at half the wavelength,
provided the noise is sufficiently low† ; the 10–90% edge rise going over a λ/6 phase step
can be reduced from 0.45λ to 0.19λ, that is, 90 nm for λ = 514 nm. (Phase edges are
a bit sharper than pure amplitude ones, since the amplitude dip in the middle sharpens
the edge; the pictures in Figures 9.4 and 9.5 were preternaturally sharp because the step
happened to be close to λ/4 tall.) Since an optical image is one of the few real examples
of a band-limited function (because waves with spatial frequency higher than 1/λ cannot
propagate to the detector), this is as much as can be achieved in a model-independent way.
† P. C. D. Hobbs and G. S. Kino, Generalizing the confocal microscope via heterodyne interferometry and digital filtering, J. Microsc. 160(3), 245–264 (December 1990).
Figure 9.9. Experimental and theoretical phase plots for a heterodyne confocal microscope looking at an aluminum-on-aluminum step, 80 nm tall, before and after deconvolution.

Figure 9.10. Theoretical step response of a heterodyne confocal microscope to an amplitude step, and several deconvolutions.
Example 9.4: Modeling Defocus. Another way to illustrate the difference between
coherent and incoherent optical systems, and the value of coherent postprocessing, is
the compensation of defocus. In an incoherent system, it is impossible to distinguish
between positive and negative defocus, because (apart from aberrations) the difference
is only the sign of the phase shift, which gives rise to no intensity change. Although
there are minor differences in the behavior of lenses depending on the direction of the
defocus, this does not change the basic point.
In a coherent system, on the other hand (provided that we measure the amplitude and
phase independently, with adequate spatial resolution and signal-to-noise ratio), we can
numerically refocus an image, or focus on any given depth. This can be done as follows:
decompose the beam into plane waves, multiply by exp[−ikz(1 − u² − v²)^{1/2}], where u
and v are the x and y direction cosines as usual, then put it back together. This is a kind
of convolution filter.
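The refocusing recipe is a few lines of FFT code; a 1D sketch (our own, with evanescent components simply discarded) which also shows that defocusing by +z and refocusing by −z restores the field, which is exactly what an intensity-only measurement cannot do:

```python
import numpy as np

def refocus(field, wavelength, dx, z):
    """Numerically refocus a measured complex field (1D sketch):
    decompose into plane waves, multiply by exp(-i k z sqrt(1 - u^2)),
    and resum.  Evanescent components (|u| > 1) are dropped."""
    k = 2 * np.pi / wavelength
    u = wavelength * np.fft.fftfreq(field.size, d=dx)  # direction cosines
    w2 = 1.0 - u * u
    w = np.sqrt(np.clip(w2, 0.0, None))
    kernel = np.where(w2 > 0, np.exp(-1j * k * z * w), 0.0)
    return np.fft.ifft(np.fft.fft(field) * kernel)

# Defocus a 3 um spot by 20 um, then refocus by -20 um:
n, dx, lam = 256, 1e-6, 0.5e-6
x = (np.arange(n) - n // 2) * dx
field = np.exp(-x**2 / (2 * (3e-6) ** 2))
blurred = refocus(field, lam, dx, 20e-6)
restored = refocus(blurred, lam, dx, -20e-6)
```

Since each propagating component only acquires a phase, the defocus is exactly invertible in amplitude-and-phase data; squaring first (i.e., keeping only intensity) destroys the sign of the phase and with it the ability to tell +z from −z.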
The author’s former colleagues, Paul Reinholdtsen and Pierre Khuri-Yakub, used this
idea with a confocal acoustic microscope to remove blurring caused by out-of-focus
structures, by numerically defocusing an in-focus image of the interfering top surface.
Looking at a quarter, they were able to read QUARTER DOLLAR on the back, right
through the coin, by defocusing George Washington and subtracting him out.
Performing the convolution and taking the real part, we can get the (complex) vertical
response of a confocal reflection microscope (where the phase shift is doubled):
ĩ(z) = 2π ∫₀^{NA} ω dω exp[−i2kz(1 − ω²)^{1/2}].
Here we’ve assumed that the pupil function is uniform, so that the obliquity factors in
transmit and receive cancel out exactly, and that the medium is air. With a change of
variable from ω = sin θ to r = cos θ , this becomes
ĩ(z) = 2π ∫_{r₀}^{1} exp(−i2kzr) r dr,     where r₀ = [1 − (NA)²]^{1/2}.
This is easily done by partial integration, but the result is a mess. We can get a good
memorable result accurate to about 0.2% up to NA = 0.5 by setting the factor of r outside
the exponent to 1 and computing the envelope and carrier:

ĩ(z) ≈ 2π(1 − r₀) sinc[2(1 − r₀)z/λ] exp[−ik(1 + r₀)z],

that is, the amplitude response is a sinc function, and the phase shift is not 2kz but is
reduced by a factor (1 + r₀)/2 = (1 + [1 − (NA)²]^{1/2})/2. The exact result shows that the
phase slope reduction reaches a factor of 2 at NA = 1.
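The envelope-and-carrier form is easy to check against the exact integral numerically (a sketch with our own normalization; we verify only that the two agree to a few percent of the peak over about ±2λ of defocus, without reproducing the tighter figure quoted in the text):

```python
import numpy as np

def v_exact(z, wavelength, NA, npts=4001):
    # i(z) = 2 pi Integral_{r0}^{1} exp(-i 2 k z r) r dr, with r = cos(theta),
    # done by trapezoidal summation
    k = 2 * np.pi / wavelength
    r0 = np.sqrt(1 - NA**2)
    r = np.linspace(r0, 1.0, npts)
    f = np.exp(-2j * k * z * r) * r
    return 2 * np.pi * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(r))

def v_approx(z, wavelength, NA):
    # sinc envelope, carrier at the mean direction cosine (1 + r0)/2
    k = 2 * np.pi / wavelength
    r0 = np.sqrt(1 - NA**2)
    return (2 * np.pi * (1 - r0) * np.sinc(2 * (1 - r0) * z / wavelength)
            * np.exp(-1j * k * (1 + r0) * z))

lam, NA = 0.5e-6, 0.5
zs = np.linspace(-1e-6, 1e-6, 41)
e = np.array([v_exact(z, lam, NA) for z in zs])
a = np.array([v_approx(z, lam, NA) for z in zs])
err = np.max(np.abs(e / e[20] - a / a[20]))   # both normalized at z = 0
print(err)
```

The remaining discrepancy comes from the linear weighting r inside the integral, which the approximation replaces by its mean; at NA = 0.5 the integration interval is narrow enough that this hardly matters.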
9.3.11 Optical Transfer Functions
The CTF is not the most commonly encountered transfer function in the literature. The
more usual optical transfer function (OTF ) is another beast altogether, and it’s important to keep the distinction crystal clear. The OTF predicts the intensity distribution
of the image based on that of the sample, with certain assumptions about the spatial
coherence of the illuminator, i.e., the statistical phase relationships between the various
Fourier components. There is no 1:1 correspondence to the propagation of plane wave
components through the optical system. As we’ll see, the OTF isn’t a proper transfer function at all.
The intensity† of the beam is ψψ∗ cos θ. Since the propagation of Ã to the image
plane is governed by the CTF H , the autocorrelation theorem gives us the OTF O:
O(u, v) = H(u, v) ⋆ H(u, v).
The OTF for an ideal system whose pupil function is circ[(u2 + v 2 )/(NA)2 ] is our old
friend the Chinese hat; the circ function is real and symmetric, so its transform is real
and symmetric, and therefore its self-convolution equals its autocorrelation. (This is only
true in focus, of course.)
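This is easy to verify numerically: sample a circ pupil, form its autocorrelation with FFTs (Wiener–Khinchin), and compare with the analytic Chinese hat. The grid size and pupil radius below are arbitrary choices.

```python
import numpy as np

n = 512
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
r_pix = 100                                    # pupil radius in samples
H = (x**2 + y**2 <= r_pix**2).astype(float)    # circ pupil function

# OTF = autocorrelation of the pupil, via the FFT (Wiener-Khinchin)
F = np.fft.fft2(H)
otf = np.fft.fftshift(np.fft.ifft2(np.abs(F) ** 2).real)
otf /= otf[n // 2, n // 2]                     # normalize to zero frequency

def chat(s):
    """Analytic Chinese hat: fractional overlap area of two unit disks
    whose centers are 2*s apart (s = spatial frequency / cutoff)."""
    s = np.clip(s, 0.0, 1.0)
    return (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s**2))
```

The sampled autocorrelation matches the chat function to a percent or so, and goes identically to zero beyond the cutoff at twice the pupil radius.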
The OTF and CTF each presuppose a concept of spatial frequency, but it must be
understood that these two concepts do not map into each other in a simple way. Intensity
is related to the squared modulus of the field variables; this nonlinearity results in the
field amplitude spatial frequencies of the CTF undergoing large-scale intermodulation
and distortion in the process of becoming the optical intensity spatial frequencies of the
OTF. In particular, the width of the OTF is twice that of the CTF, but that does not
imply the ability to resolve objects half the size. In discussing the OTF, we still use
the variable names u and v, but do be aware that they no longer correspond directly to
pupil plane coordinates, nor to the direction cosines of the plane wave components of ψ.
(This is another example of a problem that’s endemic in optics: reusing nomenclature in
a confusing way.)
Being autocorrelations, optical transfer functions always droop at high spatial frequencies, and since intensity is nonnegative, OTFs must always have a maximum at
zero. Interestingly, the OTF can go negative at intermediate values of spatial frequency,
leading to contrast inversion for objects with periodicities falling in that region, an effect
called spurious resolution. The OTF is purely real for symmetric optical systems but can
exhibit phase shifts in systems lacking an axis of symmetry.
The justification for centering on the OTF is that, with thermal light, the phases of
image points separated by more than a couple of spot diameters are uncorrelated, so there
is no utility in keeping the phase information. This is of course fine if an in-focus image
is being projected on film or an intensity detector, which is intrinsically insensitive to
optical phase, but is inadequate for an aerial image or a phase-preserving system like a
phase shifting interferometer or a laser heterodyne system, where the phase information
still exists and can be very important, not least in fixing the imperfections of the image.
Perhaps the most intuitive way of capturing the distinction is that the OTF is not
changed by putting a ground-glass screen at the image, whereas the CTF has its phase
scrambled. Classical lens and optical systems designers use the OTF as one of their
primary tools, which explains some of the difficulty encountered by workers in the
different fields when they talk to each other.
Aside: Nonuniqueness of the Intensity Pattern. Since the relative phases of the
plane waves in the CTF are lost when going to the OTF, any two patterns whose fields
differ only in phase will produce the same intensity pattern, for example, positive and
negative defocus.
† Well, irradiance, to be exact—the author resists the Humpty-Dumpty approach to radiometric nomenclature
that goes around redefining preexisting terms to mean something else, in this case intensity being used for total
power per steradian through some surface.
9.3.12 Shortcomings of the OTF Concept
The classical formulation of the optical transfer function is not a good analogue to transfer
functions as used in circuit theory, ordinary differential equations, and so forth, although
it might superficially look like it.
The behavior of fields is much more intuitive than that of image irradiance, because
the fields exist throughout the optical system, whereas image irradiance doesn’t. There
are other ways in which the OTF isn’t really a transfer function, the most important one
being that you can’t compute the OTF of two systems in cascade by simply multiplying
the individual OTFs.
For example, consider a 1:1 relay system consisting of two lenses of focal length f ,
spaced 4f apart, as in Figure 12.1a. With an object at −2f from the first lens, there will
be a good image at the center of the system and another one at 2f past the second lens.
If we choose the reference plane for the individual OTFs to be the center, everything
works reasonably well. On the other hand, if we choose it to be off center, the image
at that plane will be out of focus, leading to an ugly OTF, falling off very rapidly from
zero spatial frequency. The second lens will also be defocused, leading to another ugly
OTF, so their product will be ugly squared. This is exactly the right answer, provided
we put a diffuser at the reference plane.
In real life, of course, an odd choice of reference plane doesn’t affect the system
operation at all—the defocus of the first half is undone by the defocus of the second,
leading to a good image. The OTF gets this wrong, but the CTF gets it right—the phase
curvatures of the two CTFs compensate correctly, and you get the right answer.
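The relay argument can be checked in one dimension: give the two half-CTFs equal and opposite defocus phases, then compare the OTF of the cascade (computed from the product of the CTFs) with the product of the individual OTFs. The pupil width and defocus strength below are arbitrary illustrative numbers.

```python
import numpy as np

u = np.linspace(-1, 1, 2001)                # direction cosine across the pupil
P = (np.abs(u) <= 0.5).astype(complex)      # 1-D pupil, half-width 0.5
defocus = np.exp(1j * 8 * u**2)             # quadratic phase from defocus

H1 = P * defocus              # CTF of the first half, defocused
H2 = P * np.conj(defocus)     # CTF of the second half undoes the defocus

def otf(h):
    """OTF as the normalized autocorrelation of the CTF.
    np.correlate conjugates its second argument."""
    c = np.correlate(h, h, mode="full")
    return c / c[len(h) - 1]          # normalize to the zero-lag value

lag = len(u) - 1 + 500                # spatial frequency = half the cutoff
otf_cascade = otf(H1 * H2)[lag]       # right: multiply CTFs, then form OTF
otf_product = otf(H1)[lag] * otf(H2)[lag]  # wrong: multiply the two ugly OTFs
```

The cascade OTF comes out as the in-focus value (0.5 at half the cutoff, the triangle function in 1-D), while the product of the two defocused OTFs is "ugly squared," far smaller.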
Lest anyone say that this is just silly, that nobody would set up a calculation that
way, let’s go a bit deeper into the problem. A symmetric optical system such as this 1:1
relay has no odd-order wave aberrations, because the second half’s aberrations cancel
out the first half’s. (The even orders add.) Computing the overall OTF by multiplying
the two half-OTFs will get this wrong, because the phase information is lost, so all the
aberrations add in RMS instead of directly. Odd-order contributions will be overestimated,
and even-order ones underestimated. Yet this weird OTF thing is called “the transfer
function” and tossed about as though it had physical meaning. Beware.
9.3.13 Modulation Transfer Function
The modulation transfer function (MTF) is the magnitude of the OTF, normalized to
unity at zero spatial frequency, and is most commonly used to describe the resolution
performance of lenses, while not considering their photon efficiency.
9.3.14 Cascading Optical Systems
Under appropriate assumptions, when two optical systems are cascaded, their transfer
functions are multiplied to get the transfer function of the cascade. If there is a diffuser,
image intensifier, television system, or other phase-randomizing device between the two,
use the OTF or MTF. Otherwise, use the CTF.
9.3.15 Which Transfer Function Should I Use?
This depends on the properties of the illuminator, and to a lesser degree on those of
the detector. The assumptions leading to the derivation of the OTF are: an illuminator
with very low spatial coherence, and a detector that is sensitive only to intensity, such
as a television camera or photodiode, with no phase reference (as in an interferometer).
The resulting near-total loss of phase information severely limits the opportunities to
gain from postprocessing, although the work of Fienup and others has demonstrated that
some phase information can often be retrieved.
Example 9.5: OTF of an Ideal CCD Camera. As an example of the use of the OTF,
consider a CCD camera with square pixels of pitch δ, a 100% fill factor, QE = 1 everywhere, and negligible bleed of one pixel into its neighbor. This is a spatial analogue of
the sampled-data systems we’ll encounter in Section 17.4.3, so although the detector is
not shift invariant, we lose no information about the true OTF as long as the pixel pitch
obeys the Nyquist criterion, and it is still sensible to talk about the OTF and MTF of
such a system. The detector sensitivity pattern is rect(x/δ) rect(y/δ), which is unaltered
by squaring. Since u and x/λ are the conjugate variables, the detector CTF is the product
of x and y sinc functions scaled by δ/λ, and its OTF is the same, so the OTF of the
lens/CCD system is
OTFtot(u, v) = OTFlens(u, v) sinc(uδ/λ) sinc(vδ/λ).
(We couldn’t use this detector coherently without an LO beam, of course, so we can
think of the spatial filtering action corresponding to this CTF as occurring on a Gaussian
surface just above the detector.)
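A sketch of the two factors (assuming the normalized sinc convention sinc(x) = sin(πx)/(πx), as in NumPy, and treating the lens as ideal and in focus; the pixel pitch and NA below are made-up numbers):

```python
import numpy as np

def pixel_mtf(u, v, delta, wl):
    """MTF of an ideal 100% fill-factor square pixel of pitch delta.
    u, v are direction cosines; since u and x/wl are conjugate variables,
    the sinc scale factor is delta/wl."""
    return np.abs(np.sinc(u * delta / wl) * np.sinc(v * delta / wl))

def lens_mtf(u, na):
    """Chinese-hat MTF of an ideal in-focus lens; cutoff at u = 2*NA."""
    s = np.minimum(np.abs(np.asarray(u, dtype=float)) / (2 * na), 1.0)
    return (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s**2))

def system_mtf(u, v, delta, wl, na):
    """Lens MTF times detector MTF; the cascade is multiplicative here
    because the CCD surface is intensity-only."""
    return lens_mtf(np.hypot(u, v), na) * pixel_mtf(u, v, delta, wl)
```

The pixel factor has its first null at u = λ/δ, which is why small pixels (or big magnification) are needed to avoid throwing away the lens's high-frequency response.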
A lens is often thought of as imaging an object plane on an image plane, but really it
images a volume into another volume. A perfect imaging system would image every
point in its object space to a corresponding point in its image space. The fidelity of the
image would be limited only by diffraction, and in the transverse direction, it would
be perfectly faithful geometrically as well. Squares would come out square (no distortion or anamorphic errors), and a flat object would give rise to a flat image (no
field curvature), but unless the magnification was unity, the longitudinal magnification
would nonetheless differ from the transverse magnification. Paraxial theory, whether the
ray model (as in ABCD matrices) or the field model, always predicts perfect imaging, apart from defocus. We therefore expect the aberrations to turn up in the higher
order terms.
Unfortunately, the algebra gets ugly in a hurry when we’re dealing with exact ray
tracing or scalar wave propagation; there are lots of square roots. Good quality optical
systems ought to have phase aberrations that are small compared to the total phase delay
through the system, so we anticipate that a power series expansion will yield useful
simplifications. This power series is really not that well defined, because higher orders
yield higher spatial frequency information that will eventually be corrupted by edge
diffraction and vignetting, so that the aberration series is really a high-order polynomial
plus some hard-to-treat residual, which we will assume is small.†
† This is rather like the distortion polynomial of Section 13.5.
Nobody uses high-order analytical aberration theory anymore. Lens designers use the
low-order terms as conveniences, but rely on computer ray tracing and (manually guided)
numerical optimization of an intelligently chosen starting configuration. For the system
designer, high-order aberrations are of peripheral concern as well.
Aberrations are most obtrusive in wide field, high-NA optics, such as lithographic
lenses and fast telescopes. Lots of instruments use lenses like that, but they are seldom
fully custom designs, because the engineering and tooling costs would be astronomical.
Thus the heroic lens design is someone else’s problem—the rest of us mostly live in
the low-NA, narrow field region behind those fancy lenses. Instrument designers need
to know how aberrations propagate, what produces them, and how to avoid doing it.
For this use, the lowest order aberrations are generally enough. For the same reason,
we’ll ignore the pure ray picture entirely and center on phase shifts of the plane wave
components of a focused spot at an arbitrary field position.
9.4.1 Aberration Nomenclature
Aberration theory is somewhat separate from the rest of optics, because it is primarily
used by lens designers, who have been doing much the same sort of work for 100 years,
in sharp distinction to workers in most of the rest of optics. This is not to disparage the
great strides lens design has made in that time, but it remains true that the formalism of
classical aberration theory is not clearly related to the rest of the optical world, and a
number of the terms used have different meanings than elsewhere. For example, to most
practical optics folk, defocus means that the focal plane is offset from where it should be.
In the paraxial approximation, a defocus d translates to a quadratic phase (time delay)
across the pupil,
tdefocus ≈ (nd/c)(1 − u²/2),    (9.49)

whereas the real phase delay is k_z z, which in time delay terms is

tdefocus = (nd/c) cos θ = (nd/c)√(1 − u²),
which of course contains all even orders in u.
In wave aberration theory, defocus means the quadratic expression (9.49), even at
large NA. A pure focus shift in a large-NA system thus comes out as aberrations of all
even orders, including spherical aberration and so on, even though a twist of the focus
knob will restore the image completely. Those of us who use physical intuition heavily
must guard against being led astray by this sort of thing.
Aberrations with the same name in the two different pictures do not correspond
uniquely to one another; we’ve already seen the problem with defocus, but it also exists
in other places. The names of aberrations have only mnemonic value—once again, if
you expect everything to make sense together, you’ll wind up chasing your tail.
As a way of connecting aberration theory with ordinary experience, let’s calculate
the effects of introducing a plane-parallel slab of dielectric into a perfect, converging
spherical wave of limited NA.
Figure 9.11. A plane-parallel slab of dielectric introduced into a plane wave.
9.4.2 Aberrations of Windows
Figure 9.11 shows the k vector of a plane wave incident on a plane-parallel slab of
dielectric constant n2 . The refracted wave travels farther, and through a different index
material. In the wave picture, this one is easy; the phase shift is just (kz2 − kz1 )d. Let’s use
the hybrid picture, where we calculate the phase difference along the ray paths. Inspection
of the figure shows that the change in the propagation time due to the presence of the
slab is
Δt = (d sec θ₂/c)[n₂ − n₁ cos(θ₁ − θ₂)],    (9.51)
since a translation perpendicular to k has no effect on a plane wave. (If this isn’t obvious,
it’s worth spending a bit of time on. This is a move that the hybrid picture relies on a
good deal.) Without loss of generality, if the slab’s faces are parallel to the (x, y) plane,
and the incident plane wave has direction cosines (u1 , 0), then Snell’s law requires that
u2 = n1 u1 /n2 (we’ve used u1 = sin θ1 ). Writing (9.51) in terms of u1 , we get
Δt = (d/c)[n₂√(1 − (n₁u₁/n₂)²) − n₁√(1 − u₁²)],    (9.52)
which (comfortingly enough) is the same as the wave result. This obviously has terms
of all even orders in u₁. Let’s look at the first three orders, t₀ to t₄:

t₀ = (d/c)(n₂ − n₁),  t₂ = (du₁²/2c)(n₁ − n₁²/n₂),  t₄ = (du₁⁴/8c)(n₁ − n₁⁴/n₂³).
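A quick numerical check of these coefficients for the 6 mm BK7 plate of the example (n₂ and d are the text's values; the test direction cosine is an arbitrary choice):

```python
import numpy as np

c = 299792458.0                 # m/s
n1, n2, d = 1.0, 1.517, 6e-3    # air, BK7 (nd = 1.517), 6 mm plate

def dt_exact(u1):
    """Exact differential delay of the slab for direction cosine u1."""
    return (d / c) * (n2 * np.sqrt(1 - (n1 * u1 / n2) ** 2)
                      - n1 * np.sqrt(1 - u1**2))

def dt_series(u1):
    """Orders 0, 2, and 4 of the power series."""
    t0 = (d / c) * (n2 - n1)
    t2 = (d * u1**2 / (2 * c)) * (n1 - n1**2 / n2)
    t4 = (d * u1**4 / (8 * c)) * (n1 - n1**4 / n2**3)
    return t0 + t2 + t4
```

The zero-order delay comes out near 10 ps, consistent with the picosecond scale of Figure 9.12, and the three-term series tracks the exact delay to a small fraction of a percent at moderate u.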
The higher order terms in the series expansion are equally simple, and all have the
same form. Specializing to a 6 mm thick plate and BK7 glass (nd = 1.517), we get the
result shown in Figure 9.12. The bulk of this is obviously caused by the zero-order time delay and the focus shift, but considering that one cycle of green light takes only 1.8 fs, even the residuals may be very large (the right-hand axis goes up to 12,000 waves). We find out the effects of the aberrations on our nice focused beam by using the delay as a function of u and v to construct a phase factor to multiply our pupil function Ã, and then using (9.32) to get the new point spread function.

Figure 9.12. Differential time delay Δt suffered by a plane wave on passing through the dielectric plate of Figure 9.11, together with approximations of orders 0, 2, and 4. The curves at left are aberration residuals up to orders 2 and 4.
As we’ve already discussed, the u21 term is usually called simply “defocus,” though
Figure 9.12 shows up the clear distinction between this and real defocus; to keep things
clear, we’ll call the quadratic term paraxial defocus. The term in u41 is called primary
spherical aberration. Spherical aberration is independent of field position, and so it
occurs even for an on-axis beam.
The curves on the left show the aberrations of fourth order and greater, and of sixth
order and greater, in units of waves (i.e., cycles) for 600 THz (500 nm) light, assuming
that the pupil function of the beam is symmetric around u = 0. If that isn’t true, for
example, a low-NA beam coming in at a big field angle, the true aberration is ν times
the spread of time delays across the pupil function, minus the best-fit defocus.
9.4.3 Broken Symmetry and Oblique Aberrations
Other aberrations, such as astigmatism and coma, show up as soon as we run a finite
field angle, that is, move the center of the pupil function distribution away from (0, 0).
In an axisymmetric system like this plate or most lenses, these oblique aberrations are
purely an effect of a shift of origin. (If the plate had some wedge angle, that would no
longer be true.)
A residual effect of this broken symmetry is that if we move the origin to (u0 , 0)
(which loses us no generality), the pupil function is an even function of v. Thus the
aberrations of a really symmetric system depend only on even powers of v, and by
appropriate rearrangement of terms, that means they depend only on the cosine of the
azimuthal angle θ (u = ρ cos θ , v = ρ sin θ ). Manufacturing and assembly errors are in
general asymmetrical and are frequently of the same order as the design residuals, so
don’t make too much of it.
If we move the center of the pupil function to (u₀, 0), we’re calculating the fields at a point x = (u₀L/(1 − u₀²)^{1/2}, 0, L), where L is the z distance from the pupil to the focus. For simplicity, we’ll call this x coordinate the height h. The aberration polynomial
coefficients get a tiny bit more complicated,

t₀ = (d/c)(a − b),
t₁ = −(d/c) η(αa − γb),
t₂ = −(d/2c){η²[(β + α²)a − (δ + γ²)b] + v²[βa − δb]},
t₃ = −(d/2c){η³[(αβ + α³)a − (γδ + γ³)b] + ηv²[αβa − γδb]},

and so on, where u = u₀ + η, β = 1/(n₂²/n₁² − u₀²), α = u₀β, δ = 1/(1 − u₀²), γ = u₀δ, a = n₁/β^{1/2}, and b = n₁/δ^{1/2}. The coefficients of ηⁱv^j are the aberration amplitudes.
9.4.4 Stop Position Dependence
One good way of reducing the effect of badly aberrated edge rays is to block them with
a strategically placed stop. This may seem wasteful of light, but those rays weren’t doing
our measurement any good anyway, so they’re no loss. This is one example of a case
where the stops may be fairly far from the Fourier transform plane.
The standard method of representing the aberration coefficients of a wavefront is the
wave aberration polynomial,†
W = Σ_{l,m,n} W_{2l+n, 2m+n, n} h^{2l+n} ρ^{2m+n} cos^n φ,

where W is the optical path difference in meters (converted to n = 1 as usual). Think of kW as the (unwrapped) phase of Ã. Apart from the fact that the practical upper limit of this summation is very finite, it’s moderately useful, although more mysterious looking in this form. The coefficients all have names up to order 6 or so (the order of a term is the sum of the exponents of ρ and h, not cos θ), which are listed up to order 4 in Table 9.1.

† Warren J. Smith, Optical Design, Chap. 2 in J. S. Accetta and D. L. Shumaker, The Infrared and Electro-Optical Systems Handbook, Vol. 3.

TABLE 9.1. Seidel Aberrations

W000  1              Piston
W111  hρ cos θ       Tilt
W020  ρ²             (Paraxial) defocus
W040  ρ⁴             Spherical aberration
W131  hρ³ cos θ      Coma
W222  h²ρ² cos²θ     Astigmatism
W220  h²ρ²           Field curvature
W311  h³ρ cos θ      Distortion

Of the ones we haven’t talked about, piston is just an overall phase shift,
which we often don’t care about, and tilt corresponds to a shift in the focal position.
The ray model came first. Ray aberrations are quoted as position errors in the focal
plane; because the ray travels along ∇S, the same term shows up in one lower order
in the ray model—astigmatism is a fourth-order wave aberration but a third-order ray
aberration, which can cause confusion sometimes. We saw in Section 9.2.3 that the local direction of propagation is parallel to ∇S. The ray intercept error is

Δx = −∇(OPL).
The most common way to quote aberration contributions is in peak-to-peak waves over
the full diameter of the pupil.
9.5.1 Seidel Aberrations
Wave aberrations up to order 4 are known as Seidel aberrations; their pupil functions are
shown in Figure 9.13 and their functional forms in Table 9.1. Looking at the spherical aberration and coma profiles, it is clear that the RMS wavefront error could be significantly reduced by compensation, that is, adding a bit of tilt to the coma and a bit of defocus to the spherical aberration so as to minimize the RMS phase error. Compensated wavefronts are shown in the bottom row of Figure 9.13.
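The gain available from compensating spherical aberration with defocus is easy to quantify: least-squares fit a paraxial defocus term aρ² + b to ρ⁴ over the pupil and compare RMS errors before and after (grid resolution is arbitrary):

```python
import numpy as np

# Sample the unit pupil on a grid
n = 400
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
rho2 = x**2 + y**2
inside = rho2 <= 1.0
W = rho2[inside] ** 2          # primary spherical wavefront, rho^4

rms_raw = np.std(W)            # RMS about the mean, uncompensated

# Least-squares fit of paraxial defocus plus piston: a*rho^2 + b
A = np.c_[rho2[inside], np.ones(W.size)]
coef, *_ = np.linalg.lstsq(A, W, rcond=None)
rms_comp = np.std(W - A @ coef)
```

The best-fit defocus reduces the RMS wavefront error by about a factor of 4, which is why balanced spherical aberration is quoted so much lower than the raw ρ⁴ coefficient.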
9.5.2 Aberrations of Beams
Frequently in instruments we want to talk about the aberrations of a fixed laser beam, so
it doesn’t make much sense to talk about dependence on field angle or image height. In
that case, the only relevant terms up to fourth order are paraxial defocus ρ 2 , astigmatism
ρ 2 cos2 (θ − θ0 ), spherical aberration ρ 4 , and coma ρ 3 cos(θ − θ1 ). Since in general no
symmetry constraint applies, the θi can be anything.
Aside: Zernike Circle Polynomials and Measurements. The Zernike polynomials
are an orthogonal basis set for representing the optical phase in a circular pupil. This
sounds like a great way of expressing measurement results—decomposing a wavefront
into orthogonal polynomials is computationally cheap and well conditioned, and all.
Unfortunately, their practical utility is zilch. Due to vignetting and beam nonuniformity,
our pupils are almost never exactly circular or uniformly illuminated, and errors in the
boundary conditions destroy the orthogonality.

Figure 9.13. Seidel aberrations (primary spherical, coma, and astigmatism, with compensated versions in the bottom row).

Defining the “true” Zernike coefficients
is especially problematical when our measuring interferometer is intentionally using an
elliptical clipping boundary, and perhaps choosing a different ellipse for each run. Even
if the pupil stays circular, Zernikes are only obliquely connected to the beam quality
measures we care about (e.g., defocus in diopters).
The power series coefficients stay reasonably still even as the bounding ellipse
changes, and are pretty well connected to what we care about, so use that up to fourth
order in the wavefront. If that isn’t accurate enough, do the calculation numerically.
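The orthogonality loss is easy to demonstrate: two Zernike terms that are orthogonal over the circular pupil acquire a large inner product when the pupil is clipped to an ellipse (the 0.8 axis ratio is an arbitrary example):

```python
import numpy as np

n = 500
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
rho2 = x**2 + y**2

defocus   = np.sqrt(3) * (2 * rho2 - 1)                  # Zernike defocus
spherical = np.sqrt(5) * (6 * rho2**2 - 6 * rho2 + 1)    # Zernike primary spherical

circle  = rho2 <= 1.0                  # full circular pupil
ellipse = x**2 + (y / 0.8) ** 2 <= 1.0  # elliptically clipped pupil

def inner(f, g, mask):
    """Area-normalized inner product over the pupil mask."""
    return np.sum(f[mask] * g[mask]) / np.count_nonzero(mask)

ip_circle  = inner(defocus, spherical, circle)    # ~0: orthogonal as designed
ip_ellipse = inner(defocus, spherical, ellipse)   # decidedly nonzero
```

With the elliptical boundary the defocus and spherical terms mix strongly, so fitted "Zernike coefficients" trade off against one another and lose their individual meaning.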
9.5.3 Chromatic Aberrations
Different wavelengths see different values of n, and a given time delay Δt produces a different phase shift at each wavelength. Thus all the coefficients of the aberration polynomial are wavelength
dependent. Changes in focal length with λ are longitudinal chromatic aberration, changes
in magnification are lateral chromatic aberration or lateral color, and changes in the
primary spherical term are spherochromatism. Few of these things are under our control
as system designers, which isn’t to say they aren’t important.
9.5.4 Strehl Ratio
The Strehl ratio is the ratio of the central intensity of a focused spot to what it would be
with the same amplitude distribution but zero phase error,
R = |∫∫ Ã(u, v) du dv|² / (∫∫ |Ã(u, v)| du dv)².    (9.60)
The Schwarz inequality guarantees that this ratio achieves 1 only when the phase
error is indeed zero, and never exceeds that value. A Strehl ratio of 0.8 corresponds
to Rayleigh’s λ/4 criterion for a system that is diffraction limited.† When building
focused-beam instruments, we frequently find that the electrical signal power goes as
the square of the Strehl ratio, which is a convenient way of including aberration tolerances in our photon budgets. A useful approximation for the Strehl ratio is Marechal’s,

R ≈ exp(−⟨φ²⟩),    (9.61)

where ⟨φ²⟩ is the mean square phase error in rad² (remember to weight the mean square calculation by the intensity and area, and normalize correctly). If you prefer to use the phase p in waves (i.e., cycles), it’s exp(−(2πp)²) ≈ exp(−40p²), which is pretty easy to remember.
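A numerical check of Marechal's approximation against the exact Strehl integral, for uncompensated primary spherical aberration over a uniformly illuminated circular pupil (quadrature details are arbitrary):

```python
import numpy as np

def strehl_exact(w040, n=2000):
    """Exact Strehl ratio for phase error 2*pi*w040*rho^4 (w040 in waves)
    over a uniform circular pupil, by midpoint quadrature in rho."""
    rho = (np.arange(n) + 0.5) / n
    dr = 1.0 / n
    phi = 2 * np.pi * w040 * rho**4
    num = np.abs(np.sum(np.exp(1j * phi) * 2 * rho * dr)) ** 2
    den = np.sum(2 * rho * dr) ** 2
    return num / den

def strehl_marechal(w040, n=2000):
    """Marechal's approximation exp(-<phi^2>), mean (piston) removed."""
    rho = (np.arange(n) + 0.5) / n
    dr = 1.0 / n
    phi = 2 * np.pi * w040 * rho**4
    mean = np.sum(phi * 2 * rho * dr)
    var = np.sum((phi - mean) ** 2 * 2 * rho * dr)
    return np.exp(-var)
```

At a quarter wave of uncompensated spherical aberration both come out near 0.80, matching the footnoted version of Rayleigh's criterion.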
The Strehl ratio suffers from the same problem as equivalent width; if the peak is
shifted from the origin, the ratio may come out very small, even for a good beam. Thus
we often want to calculate the Strehl ratio after the tilt contribution (which moves the
focus sideways) has been removed. Also, it can give some odd results with multiple
transverse mode beams—because the different spatial modes have different frequencies,
DC intensity measurements like Strehl ratio miss the rapidly moving fringe pattern in
the beam, and so underestimate the far-field divergence. The Strehl ratio is an excellent
quality measure for good quality beams, where the intensity profile has one main peak;
for uglier ones, consider using Siegman’s M 2 instead.
Example 9.6: ISICL Signal-to-Noise Calculation. The Strehl ratio shows up by other
names in other fields. In antenna theory, it is called the phase efficiency, which is the ratio
of the on-axis detected signal to that of a perfectly figured antenna, neglecting loss. Strehl
ratio thus shows up as a multiplicative factor on the detected signal from focused-beam
instruments. The ISICL sensor of Example 1.12 uses coherent detection with a Gaussian
beam. Referring to Figure 1.15, a particle with differential scattering cross section dσ/dΩ crossing near the focus produces a scattered photon flux per steradian

(2π PT (NAT)² RT/(hνλ²)) dσ/dΩ,
† Strictly
speaking, the 0.8 corresponds to 0.25 wave of uncompensated spherical aberration.
where RT and NAT are the transmit beam’s Strehl ratio and numerical aperture. When
coherently detected by a similar LO beam, the detection NA is π(NAR )2 RR (the factor
of RR accounts for the dephased (aberrated) part of the LO power, which goes into the
shot noise but not into the coherently detected signal). Doppler shift makes this an AC
measurement, so from Section 1.5.2, the 1 Hz SNR is equal to the number of detected
signal photoelectrons per second, which is
SNR = n = (2π² ηPT (NAT)²(NAR)² RT RR/(hνλ²)) dσ/dΩ,
where η is the detector’s quantum efficiency.
The detected SNR goes as the product of the Strehl ratios of the transmit and LO
beams. This provides a natural connection between aberrations and signal level (and
hence SNR), which is why the Strehl ratio is so useful for instrument designers.
Many books have been written on lens design, and lots of software exists to help. That’s
not what we’re talking about now, and is in fact beyond our scope. Optical design (in
the restricted sense used here) is concerned with sticking lenses and other parts together
to get the desired result. An analogy from electronics is IC design versus application
circuit design, with the lens being like the IC: most of the time you use standard ones,
but occasionally you have to do a custom one; it will be a better match, but will cost
something to design and will usually be more expensive in production unless you have
gigantic volumes.
Computerized exact ray tracing is not usually necessary in instrument design, although
if you have easy-to-use software for it, you might as well—it doesn’t make anything
worse, after all. On the other hand, we often don’t have the full optical prescription for
the lenses, so exactness doesn’t buy us a lot. Thick-lens paraxial ray optics is better than
good enough for layout, and our hybrid wave optics model plus some simple aberration
theory does the job of calculating image or beam quality, signal levels, and SNR.
If necessary, we can use ray tracing or numerical wavefront simulation to dial in the
design once we’ve arrived at a close approximation, but it is needed surprisingly seldom
since the fancy stuff is usually done at very low NA, where life is a lot easier. Apart
from etalon fringes, using mostly collimated beams (or parallel light in imaging systems)
makes it possible to add and remove components freely, with only minor effects on
optical quality.
9.6.1 Keep Your Eye on the Final Output
In Section 1.7.1, we used an expression for the detected photocurrent as the optical
system output, and that remains the right thing to do when aberrations and finite aperture
systems are considered—you just use the real pupil function instead of the paraxial one,
and pay attention to obliquity factors and the Strehl ratio. It’s an excellent way to know
just when our treatment of system aberrations is losing contact with instrument-building
reality. As we saw in Example 9.6, it leads to a natural interest in the Strehl ratio, which
appears in the photocurrent and SNR calculations. There are other things that matter
besides the photocurrent (e.g., geometrical distortion), but if you don’t see how to relate
the figure of merit under discussion to the actual instrument performance, find another
way of describing it until you can. Lens designers produce lenses for a living, and we
build electro-optical systems.
9.6.2 Combining Aberration Contributions
An optical system is made up of a series of imperfect elements, each contributing its
own aberrations. In order to combine them, we note that all the pupils in the system
are images of each other, and so the pupil functions multiply together. The exit pupil
of each element is imaged at the output of the system by the (ideal) imaging action of
all subsequent elements, and the resulting pupil functions multiplied together to get the
total pupil function of the system. Watch out for magnification differences—those pupils
won’t all be the same size, and the smallest one wins.
9.7.1 Spatial Filtering— How and Why
Spatial filtering is the deliberate modification or removal of some plane wave components
from a beam, and is completely analogous to the ordinary electronic filtering performed
in a radio. It is normally done by using a lens to Fourier transform the beam, inserting
at the focal plane a mask that is transparent in some places and opaque in others (such
as a slit or a pinhole), and then transforming back with a second lens. It is widely used
for cleaning up beams and for removing artifacts due to periodic structures in a sample
(e.g., IC lines).
Spatial filtering using a pinhole can make a uniform beam from a Gaussian one but
will waste around 75% of the light doing it. The smallness of the pinhole required for a
good result with a given NA may be surprising—the first Airy null is way too big (see
Example 9.9).
Spatial filters are not as widely used as one might expect, based on the analogy to
electrical filters. They require complex mechanical parts, which is a partial explanation.
The real trouble is tweakiness: they are difficult to align, easy to misalign, and sensitive
to mechanical and thermal drifts; if ordinary translation stages are used, an expensive
and very labor-intensive device results. These problems can be reduced by clever design,
for example, by using a laser or an in situ photographic process (with a fixed optical
system) to build the mask right inside the filter. If you need a pinhole spatial filter, use
a big pinhole (>20 μm diameter in the visible) and correspondingly low NA to get the
best stability. One other problem is that they nearly all have sharp edges, which isn’t
usually optimal.
9.7.2 How to Clean Up Beams
Laser beams are frequently rotten. Most gas lasers produce beams of reasonable quality,
but these are often surrounded by multiple reflections, scattered light, and spontaneous
emission. Diode lasers are worse; their beams suffer from astigmatism and are highly
asymmetric. We may need to free these beams of their bad associations in order to
make them useful, or to change their spatial distributions to something more convenient.
This is done through spatial filtering and the use of apertures (and occasionally special
elements such as anamorphic prism pairs). Since both of these operations are performed
by passing the beam through holes of various sizes, the distinction is somewhat artificial
but is nonetheless useful: apertures are used on the beam before focusing, and spatial
filters in the Fourier transform plane. A seat-of-the-pants test is that if it requires a fine
screw to adjust, it’s a spatial filter.
Putting an aperture some way out in the wings of the beam (say, four times the
1/e2 diameter) has little effect on its propagation characteristics, so use them freely to
reduce artifacts. If the artifacts are close to the beam axis, it may be helpful to let the
beam propagate for some distance before applying the aperture; a lens may be helpful
in reducing the optical path length this might otherwise require (here is where it shades
into spatial filtering). Appropriately placed apertures can turn a uniform beam into a
good Gaussian beam, or chop off the heavily aberrated wings of an uncorrected diode
laser beam.
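The claim that an aperture far out in the wings is harmless is easy to quantify for a Gaussian beam: the fraction of power falling outside radius r is exp(−2r²/w²), where w is the 1/e² intensity radius. A minimal sketch (the function name is ours, not the book's):

```python
import math

def clipped_power_fraction(r_over_w):
    """Fraction of a Gaussian beam's power falling outside an aperture of
    radius r, with r in units of the 1/e^2 intensity radius w.  Follows from
    integrating I(r) = I0*exp(-2 r^2 / w^2) over the plane."""
    return math.exp(-2.0 * r_over_w ** 2)

# An aperture at the 1/e^2 radius itself clips ~13.5% of the power...
print(clipped_power_fraction(1.0))
# ...but at four times the 1/e^2 *diameter* (r = 4w) the loss is negligible.
print(clipped_power_fraction(4.0))
```

The exponential makes the safety margin grow extremely fast, which is why apertures a few beam radii out can be used so freely.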
Example 9.7: How Small Do I Have to Make My Aperture? Slits and pinholes are
frequently used to render an instrument insensitive to the direction from which incident
light originates. A monochromator (see Example 7.1) uses a slit to ensure that the light
incident on its grating arrives from one direction only; this maximizes its spectral selectivity. There’s obviously a trade-off between selectivity and optical throughput; because
different positions across the slit translate to different incidence angles on the grating, a
wider slit translates directly into lower spectral resolution.
More subtly, with a wide slit a change in the position or direction of the incoming
light can cause apparent spectral shifts. This is because the slit doesn’t obliterate the
spatial pattern of the incoming light, it just clips it at the slit edges, and so broadens its
angular spectrum. If the slit is made small enough, it can’t shift far laterally, and we
expect that the width of the diffraction pattern will swamp any likely angular change
from the incoming light. On the other hand, using a slit that narrow can make things
pretty dim. Exactly how narrow does it have to be?
As we saw in Section 5.7.9, scatterers tend to produce patterns that are aligned with
the incident light, but smeared out in angle. The same is true of slits and pinholes; in
the Fourier optics approximation, the (amplitude) angular spectrum of the incident light
is convolved with the Fourier transform of the aperture’s transmission coefficient.
If we illuminate the slit at an incidence angle θ , the main lobe of the sinc function
is aligned with the k vector of the incoming light, and the intensity along the normal
to the slit will decrease as θ increases. Since x/λ and u are conjugate variables, if we
require that a change of ±δ radians cause a fractional decrease of less than ε in the
central intensity of the diffraction pattern, the slit width w must obey

w sin δ < λ√(3ε)
so that for a relatively modest requirement, for example, a shift of less than 5% from
a ±10◦ rotation, w < 2.25λ. It is difficult to use slits this small, but not impossible.
Improving on this, such as requiring a 1% shift from a ±30◦ rotation, is impractical, as
it requires a slit very small compared with a wavelength.
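Taking the bound in the form w sin δ < λ√(3ε), which reproduces the numbers quoted above, the selectivity/throughput trade-off is quick to evaluate; a sketch with our own function name:

```python
import math

def max_slit_width(eps, delta_deg):
    """Largest slit width, in wavelengths, for which a +/- delta rotation of
    the incoming light shifts the central diffracted intensity by less than
    eps.  Assumes the bound w*sin(delta) < lambda*sqrt(3*eps)."""
    return math.sqrt(3.0 * eps) / math.sin(math.radians(delta_deg))

print(max_slit_width(0.05, 10.0))  # about 2.2 wavelengths: hard but feasible
print(max_slit_width(0.01, 30.0))  # about 0.35 wavelengths: impractical
```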
Similar considerations govern the sensitivity to lateral motion of a pinhole in a focused
beam. The moral of this story is that although spatial filters can reduce the sensitivity
of a measurement to angular shifts of illumination, a complete cure is not to be found
here. Single-mode fibers are dramatically better, because they control the direction
from which the light hits the pinhole (see Section 8.2.2). Unfortunately, they’ll often
make your light source dramatically noisier as well, by turning FM noise into AM; see
Section 8.5.13.

TABLE 9.2. Effects on a Gaussian Beam of a Circular Aperture of Radius r Placed
at the Beam Waist. [Columns ΔI (nom) and ΔI (best); the numerical entries were
lost in extraction.]
Example 9.8: How Small Can I Make My Aperture? On the other hand, if we have a
beam that has artifacts out in its wings, such as most gas laser beams, we would like to
make the aperture as small as possible without causing objectionable diffraction rings. If
the beam is Gaussian, the ripple amplitude is very sensitive to slight vignetting. Table 9.2
gives the maximum absolute deviation ΔI from both the nominal and best-fit Gaussian
beam, in the transform plane, due to apertures of different radius placed at the beam
waist. Values have been normalized to a central intensity of 1.0.
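The sensitivity to slight vignetting can be seen numerically: truncate a Gaussian field at a few trial radii, Hankel-transform to the far field, and compare with the untruncated Gaussian. A rough pure-Python sketch (our own helper names and midpoint quadrature, not the book's computation):

```python
import math

def j0(x, n=100):
    # Bessel J0 from its integral representation, midpoint rule
    return sum(math.cos(x * math.sin((i + 0.5) * math.pi / n))
               for i in range(n)) / n

def far_field(q, r_trunc, n=200):
    # Hankel transform of a Gaussian field exp(-r^2) cut off at r_trunc;
    # r is in units of the 1/e^2 intensity radius w, q = k*w*sin(theta)
    dr = r_trunc / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += math.exp(-r * r) * j0(q * r) * r
    return total * dr

def ripple(r_trunc):
    # Peak deviation from the ideal far field exp(-q^2/4), with both
    # patterns normalized to unity on axis
    e0 = far_field(0.0, r_trunc)
    return max(abs(far_field(0.5 * k, r_trunc) / e0
                   - math.exp(-(0.5 * k) ** 2 / 4.0)) for k in range(1, 21))

# Pulling the aperture in from 2.5w to 1.5w raises the ripple dramatically
print(ripple(2.5), ripple(2.0), ripple(1.5))
```

Each half-beam-radius of extra aperture reduces the field at the edge, and hence the diffraction ripple, by nearly an order of magnitude.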
Example 9.9: Using an Aperture to Make a Nearly Gaussian Beam. Let’s consider
the use of an aperture in the pupil to make a uniform beam into a reasonable approximation to a Gaussian beam. Although a uniform beam exhibits an extensive set of
diffraction rings, an appropriately placed aperture can greatly reduce their amplitude. A
uniform beam of amplitude E0 and radius a at a wavelength λ gives rise to a far-field
Airy pattern whose first null is at θ = 0.61λ/a. If the beam is Fourier transformed
by a lens of focal length f , so that the numerical aperture NA = a/f , the pattern is
given by
E(r) = E0 k NA² jinc(k r NA).
Figure 9.14 shows the result of truncating this pattern with a circular aperture at the
first Airy null, and recollimating with a second lens. Here the radius of the original
uniform beam was 100 μm and the wavelength was 530 nm. The Gaussian shown has
a 1/e² (intensity) radius of 95 μm, and the sidelobes are below 1 part in 10³ of the
peak intensity. Only about 15% of the light is lost. Note that the peak intensity is twice
that of the original uniform beam. Contrary to appearances, the total beam power has
gone down—the high peak intensity covers a very small area since it’s at the center.
A graded neutral density filter, which is the competing technique, cannot increase the
central value, so that it is limited to at most half this efficiency, and far less if we expect
the Gaussian to drop to nearly zero before the edge of the beam is reached; on the other
hand, it requires no lenses. (See Figure 9.15.)
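The "about 15% lost" figure follows from the standard encircled-energy result for the Airy pattern: the fraction of total power inside radius r is 1 − J0²(x) − J1²(x), with x = krNA. At the first null, x ≈ 3.832 (the first zero of J1), this gives about 84%. A quick pure-Python check, with the Bessel functions computed from their integral representation:

```python
import math

def bessel_j(nu, x, n=200):
    # J_nu(x) for integer nu, via J_nu(x) = (1/pi) * int_0^pi cos(nu*t - x*sin(t)) dt
    return sum(math.cos(nu * t - x * math.sin(t))
               for t in [(i + 0.5) * math.pi / n for i in range(n)]) / n

x1 = 3.8317  # first zero of J1, i.e. the first Airy null
inside = 1.0 - bessel_j(0, x1) ** 2 - bessel_j(1, x1) ** 2
print(inside)  # ~0.84: truncating at the first null discards ~16% of the power
```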
Figure 9.14. Turning a uniform beam into a nearly Gaussian one with a pinhole of the same radius
as the first Airy null. (Curves: original beam, Gaussian, and after pupil vignetting; intensity in
arbitrary units versus radius in μm.)
Figure 9.15. The data of Figure 9.14 on an expanded scale. (Curves: original beam and after pupil
vignetting; intensity in arbitrary units versus radius in μm.)
9.7.3 Dust Doughnuts
In general, a bit of dust on a lens is no big deal. The exception is when the dusty surface
is near a focus, in which case dust is extremely objectionable, leading to strong shadows
called dust doughnuts. (The shadows are annular in shape in a Cassegrain telescope,
hence the name.) How far out of focus does the dust have to be?
Assuming the dust is less than 100 μm in diameter, and that a 1% intensity error is
acceptable, a focused spot has to be at least 1 mm in diameter at the dirty surface, so
the required defocus is
|δZdefocus| ≥ 1 mm / (2 · NA).
This discussion is indebted to an SPIE short course, “Illumination for Optical Inspection,”
by Douglas S. Goodman.
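The geometry behind the defocus bound is just the cone angle: a beam of numerical aperture NA grows in diameter by 2·NA per unit of defocus, so spreading the spot to 1 mm at the dusty surface needs |δZ| ≥ 1 mm/(2·NA). A trivial sketch (our own function name):

```python
def min_defocus_mm(numerical_aperture, spot_diam_mm=1.0):
    """Defocus needed for a focused beam of the given NA to spread to
    spot_diam_mm at the dusty surface (geometric-optics estimate)."""
    return spot_diam_mm / (2.0 * numerical_aperture)

print(min_defocus_mm(0.1))   # 5 mm for an NA of 0.1
print(min_defocus_mm(0.01))  # 50 mm for a slow NA 0.01 beam
```

Slow beams thus need surprisingly large clearances between dusty windows and any intermediate focus.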
9.8.1 Flying-Spot Systems
A scanning laser microscope is an example of a flying-spot system, as opposed to a
full-field or staring system, in which a larger area is imaged at once. The first flying-spot
optical systems in the 1950s used an illuminated point on a cathode ray tube as the light
source, and a PMT as the detector because the spot was so dim. That at least had the
advantage of speed and no moving parts. Flying-spot systems are simple to analyze,
because their PSFs are the product of the illumination and detection spots (whether
amplitude or intensity is the appropriate description), and there is no problem with speckle
or scattered light smearing out the image.
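The PSF-product rule is easy to illustrate for the case of equal Gaussian spots: the product of two Gaussians of 1/e half-width s is a Gaussian narrower by √2. A sketch with assumed Gaussian spot profiles and our own function names:

```python
import math

def spot(x, s):
    # model illumination or detection spot profile, 1/e half-width s
    return math.exp(-(x / s) ** 2)

def system_psf(x, s_illum, s_det):
    # flying-spot PSF: product of the illumination and detection spots
    return spot(x, s_illum) * spot(x, s_det)

def half_max_point(f, hi=10.0, steps=10000):
    # crude scan for the abscissa where f falls to half its peak
    peak = f(0.0)
    for i in range(steps):
        x = hi * i / steps
        if f(x) < 0.5 * peak:
            return x
    return hi

single = half_max_point(lambda x: spot(x, 1.0))
combined = half_max_point(lambda x: system_psf(x, 1.0, 1.0))
print(combined / single)  # ~0.707: the combined PSF is sqrt(2) narrower
```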
9.8.2 Direction Cosine Space
Full-field systems require a bit more work to specify accurately. We’re usually using
thermal light from a bulb of some sort, so the illuminat