INVERSE OPTICAL DESIGN AND ITS APPLICATIONS

by

Julia Angela Sakamoto

Copyright © Julia Angela Sakamoto 2012

A Dissertation Submitted to the Faculty of the
COLLEGE OF OPTICAL SCIENCES
In Partial Fulfillment of the Requirements For the Degree of
DOCTOR OF PHILOSOPHY
In the Graduate College
THE UNIVERSITY OF ARIZONA
2012

THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Julia A. Sakamoto entitled Inverse Optical Design and Its Applications and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

_________________________________________ Date: 1/6/2012
Harrison H. Barrett

_________________________________________ Date: 1/6/2012
Russell A. Chipman

_________________________________________ Date: 1/6/2012
Eric W. Clarkson

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

_________________________________________ Date: 1/6/2012
Dissertation Director: Harrison H. Barrett

STATEMENT BY AUTHOR

This dissertation has been submitted in partial fulfillment of requirements for an advanced degree at the University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this dissertation are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the copyright holder.
SIGNED: ____________________________________
Julia Angela Sakamoto

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to all family and friends, near and far, who have offered encouragement and support, and a variety of memorable experiences throughout these past years.

Thank you, Mom, for doing anything and everything to ensure my success and well-being throughout my academic life. No one is as supportive and selfless as you. Dad, I immensely value your advice, guidance, and words of wisdom. You are a model person in my life and have my utmost admiration and respect. The old adage, “I could not have done it without (both of) you,” is absolutely fitting.

Kenneth, I am so lucky to have found you in this phase of my life and look forward to beginning the next one together. Thank you for your unwavering support and devotion, your kind and thoughtful spirit, and our treasured conversations during this whole process. You are a gem.

Harry, you have been such a wonderful teacher and mentor, and a terrific role model. I have fond memories of the chalkboard brainstorming sessions and of learning so many fascinating things from you. You have taught me more than mere skills -- you have refined my thinking and expanded my mind. I am very grateful for the many opportunities you have provided over the years.

Thank you also to Dr. Pui Lam for those countless, invaluable problem-solving sessions, meaningful conversations, and your supreme dedication as an educator. You are one of those special teachers who truly make a lifelong impact. And thank you for pushing me out of the nest. ☺

This work was supported by Science Foundation Ireland under grant no. 01/PI.2/B039C and an E.T.S. Walton Fellowship for H. H. Barrett. Development of the basic methodology for parameter estimation was supported in part by the National Institutes of Health under grant numbers R37 EB000803 and P41 EB002035.
Further support was received through the Biomedical Imaging and Spectroscopy (BMIS) and Technology and Research Initiative Fund (TRIF) fellowship programs at the University of Arizona, as well as Canon, Inc. Much appreciation to Robin Richards, Eugene Cochran, and Amy Phillips at the Office of Technology Transfer for your instrumental help in patenting Inverse Optical Design.

DEDICATION

To Mom, Dad, Christina, Grandpa, Kenneth, Casey, Aiko, and Mia for your boundless love and support.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT

CHAPTER 1. INTRODUCTION
1.1 Application to vision science and ophthalmology
1.2 Application to optical shop testing
1.3 Dissertation overview

CHAPTER 2. MAXIMUM-LIKELIHOOD ESTIMATION
2.1 Historical background
2.2 Statement of the problem
2.3 Notation system and terminology
2.4 Fisher information
  2.4.1 Score
  2.4.2 Fisher information matrix
  2.4.3 Cramér-Rao inequality
  2.4.4 System design
2.5 Properties of ML estimators
  2.5.1 Bias
  2.5.2 Variance and covariance
  2.5.3 Mean-square error
  2.5.4 Asymptotic properties
  2.5.5 Invariance
  2.5.6 Sufficiency
2.6 Computer simulated experiments
2.7 Nuisance parameters
2.8 Gaussian distributions and electronic noise
2.9 Practical challenges

CHAPTER 3. OPTIMIZATION METHODS
3.1 Selecting a search algorithm
3.2 Global optimization algorithms
3.3 Simulated annealing
  3.3.1 Overview
  3.3.2 Basic concepts in statistical mechanics
  3.3.3 The Metropolis algorithm
  3.3.4 Continuous minimization by simulated annealing

CHAPTER 4. PROPAGATION OF LIGHT
4.1 The electromagnetic field
  4.1.1 Maxwell’s equations
  4.1.2 Constitutive relations
  4.1.3 Time-dependent wave equation
  4.1.4 Time-independent wave equation
4.2 Plane waves and spherical waves
  4.2.1 Plane waves
  4.2.2 Spherical waves
4.3 Geometrical optics
  4.3.1 The eikonal equation
  4.3.2 Differential equation of light rays
  4.3.3 Refraction and reflection
4.4 Diffraction by a planar aperture
  4.4.1 A brief history of diffraction theory
  4.4.2 Geometry of the problem
  4.4.3 Huygens’ principle
  4.4.4 Fresnel diffraction
  4.4.5 Fraunhofer diffraction

CHAPTER 5. INVERSE OPTICAL DESIGN OF THE HUMAN EYE USING LIKELIHOOD METHODS AND WAVEFRONT SENSING
5.1 Basic anatomy of the human eye
5.2 Ray-tracing through a schematic eye
5.3 Shack-Hartmann wavefront sensors
  5.3.1 Centroid estimation and Fisher information
5.4 Data-acquisition system
  5.4.1 System configuration
  5.4.2 Optical-design program
5.5 Fisher information and Cramér-Rao lower bounds
5.6 Likelihood surfaces
5.7 Maximum-likelihood estimation of ocular parameters
5.8 Summary of Chapter 5

CHAPTER 6. MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERIZED WAVEFRONTS USING MULTIFOCAL DATA
6.1 Formulation of the problem
6.2 Propagation algorithm
  6.2.1 Diffraction propagation vs. ray-tracing
  6.2.2 Diffraction equation for a converging spherical wave
  6.2.3 Parameterized wavefront description
  6.2.4 Sampling considerations
  6.2.5 Parallel processing with the graphics processing unit
6.3 Numerical studies
  6.3.1 Test lens description
  6.3.2 Pupil sampling
  6.3.3 Fisher information and Cramér-Rao lower bounds
  6.3.4 Likelihood surfaces
  6.3.5 Maximum-likelihood estimates
6.4 Experimental results
  6.4.1 System configuration
  6.4.2 Test lens description
  6.4.3 Experimental data
  6.4.4 Pupil sampling
  6.4.5 Huygens’ method vs. Fresnel propagation
  6.4.6 Fisher information and Cramér-Rao lower bounds
  6.4.7 Likelihood surfaces
  6.4.8 Nuisance parameters
  6.4.9 Maximum-likelihood estimates
6.5 Summary of Chapter 6

CHAPTER 7. INVERSE OPTICAL DESIGN FOR OPTICAL TESTING
7.1 Inverse optical design of aspheric lenses
  7.1.1 Optical-design program
  7.1.2 Test lens description and system configuration
  7.1.3 Fisher information and Cramér-Rao bounds
  7.1.4 Likelihood surfaces
  7.1.5 Maximum-likelihood estimates
7.2 Inverse optical design of GRIN-rod lenses
  7.2.1 Ray-tracing through a GRIN-rod lens
  7.2.2 Test lens description
  7.2.3 Fisher information and Cramér-Rao bounds
  7.2.4 Likelihood surfaces
7.3 Summary of Chapter 7

CHAPTER 8. CONCLUSION AND FUTURE WORK

APPENDIX A. FRINGE ZERNIKE POLYNOMIALS
APPENDIX B. LIST OF ACRONYMS

REFERENCES

LIST OF FIGURES

1.1 System configuration for estimating patient-specific ocular parameters, based on a clinical Shack-Hartmann aberrometer for measurement of aberrations
1.2 Anisoplanatism involving pupil aberrations. Image from Stéphane Chamot (National University of Ireland, Galway)
1.3 Basic test configuration for performing inverse optical design of a GRIN-rod lens
1.4 Basic system configuration for parameterized wavefront measurement with a single source
1.5 Basic system configuration for parameterized wavefront measurement with an aspheric test element and multiple point source locations
1.6 Basic system configuration for augmented wavefront measurement with a Shack-Hartmann WFS and multiple point source locations
2.1 Example of a probability distribution of θ̂ conditioned on θ
3.1 A multimodal test function in two dimensions, exhibiting a high degree of nonlinearity and various local minima
3.2 Illustration of the travelling salesman problem and its solution
3.3 The simulated annealing algorithm implemented by Corana et al. (1987)
4.1 Geometry for diffraction by a planar aperture (Barrett & Myers, 2004)
5.1 Basic anatomy of the human eye, as seen through a cross-sectional view
5.2 Geometrical eye model corresponding to parameters in Table 5.2, with an on-axis source and 8-mm pupil to demonstrate spherical aberration
5.3 Shack-Hartmann WFS measuring a perfect incoming wavefront
5.4 Shack-Hartmann WFS measuring an aberrated incoming wavefront
5.5 Blurred spot profiles in the focal plane of a Shack-Hartmann WFS
5.6 Data-acquisition system for estimating ocular parameters
5.7 Geometrical eye model used to generate WFS data, corresponding to ocular parameters in Table 5.2
5.8 WFS data used as input to inverse optical design, for beam angle α = 0°
5.9 WFS data used as input to inverse optical design, for beam angle α = 6°
5.10 WFS data used as input to inverse optical design, for beam angle α = 12°
5.11 Focal spot on the retina for a source beam angle of α = 0°
5.12 Focal spot on the retina for a source beam angle of α = 6°
5.13 Focal spot on the retina for a source beam angle of α = 12°
5.14 FIM for the chosen system configuration (log scale)
5.15 Inverse of the FIM for the chosen system configuration (log scale)
5.16 FIM for the system after increasing the detector element size
5.17 Inverse of the FIM after increasing the detector element size
5.18 Detector data for α = 0° after increasing the beam and pupil diameters
5.19 FIM for the system after increasing the beam and pupil diameters
5.20 Inverse of the FIM after increasing the beam and pupil diameters
5.21 FIM for the system after reducing the number of beam angles
5.22 Inverse of the FIM after reducing the number of beam angles
5.23 Likelihood surface along Rcornea,posterior and Rlens,anterior axes. Final ML estimates indicated by × sign
5.24 Likelihood surface along Rcornea,posterior and Rlens,posterior axes. Final ML estimates indicated by × sign
5.25 Likelihood surface along Rcornea,posterior and ∆tcornea axes. Final ML estimates indicated by × sign
5.26 Likelihood surface along Rcornea,posterior and ∆tant.chamber axes. Final ML estimates indicated by × sign
5.27 Likelihood surface along Rcornea,posterior and ∆tlens axes. Final ML estimates indicated by × sign
5.28 Likelihood surface along Rcornea,posterior and ∆tvitreous axes. Final ML estimates indicated by × sign
5.29 Likelihood surface along Rcornea,posterior and ncornea axes. Final ML estimates indicated by × sign
5.30 Likelihood surface along Rcornea,posterior and nant.chamber axes. Final ML estimates indicated by × sign
5.31 Likelihood surface along Rcornea,posterior and nlens axes. Final ML estimates indicated by × sign
5.32 Likelihood surface along Rcornea,posterior and nvitreous axes. Final ML estimates indicated by × sign
5.33 Likelihood surface along ∆tcornea and ncornea axes. Final ML estimates indicated by × sign
5.34 Likelihood surface along ∆tant.chamber and nant.chamber axes. Final ML estimates indicated by × sign
5.35 Likelihood surface along ∆tlens and nlens axes. Final ML estimates indicated by × sign
5.36 Likelihood surface along ∆tvitreous and nvitreous axes. Final ML estimates indicated by × sign
5.37 Understanding the likelihood as a function of defocus. P1 corresponds to the true minimum and a myopic eye (focuses before retina); P3, P4, and P5 are high points and correspond to zero defocus; P2 corresponds to a hyperopic eye (focuses behind retina)
5.38 Level of defocus at P1 (Rcornea,posterior = 6.381 mm, nvitreous = 16.40 mm)
5.39 Level of defocus at P2 (Rcornea,posterior = 6.000 mm, nvitreous = 15.50 mm)
5.40 Level of defocus at P3 (Rcornea,posterior = 6.188 mm, nvitreous = 15.97 mm)
5.41 Level of defocus at P4 (Rcornea,posterior = 6.512 mm, nvitreous = 15.85 mm)
5.42 Level of defocus at P5 (Rcornea,posterior = 5.871 mm, nvitreous = 16.09 mm)
5.43 16 simulated annealing trials for the estimation of ocular parameters
5.44 Reconstructed eye model of the estimated parameters, superimposed with the true values underlying the data
6.1 Data-acquisition system for collecting multiple irradiance patterns near the focus of an optical element
6.2 Focal region of the highly aberrated test lens. Paraxial focal plane is at z = zf = 157.8 mm
6.3 Wavefront error in the exit pupil of the highly aberrated test lens as a function of normalized radius. Units are in waves
6.4 Detector data at z = z1 for the highly aberrated test lens using a pupil sampling of: (a) P = 1024, (b) P = 512, and (c) P = 256
6.5 Detector data at z = z2 for the highly aberrated test lens using a pupil sampling of: (a) P = 1024, (b) P = 512, and (c) P = 256
6.6 Detector data for the highly aberrated test lens using a pupil sampling of P = 1024 at image plane: (a) z = z1 and (b) z = z2
6.7 FIM for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens (log scale)
6.8 Inverse of the FIM for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens (log scale)
6.9 FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens (log scale)
6.10 Inverse of the FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens (log scale)
6.11 Likelihood surface along the α4 (defocus) and α9 (primary spherical aberration) axes for the highly aberrated test lens
6.12 Likelihood surface along the α4 (defocus) and α25 (tertiary spherical aberration) axes for the highly aberrated test lens
6.13 Likelihood surface along the α9 (primary spherical aberration) and α16 (secondary spherical aberration) axes for the highly aberrated test lens
6.14 Likelihood surface along the α16 (secondary spherical aberration) and α25 (tertiary spherical aberration) axes for the highly aberrated test lens
6.15 Likelihood surface along the α4 (defocus) and α5 (primary astigmatism at 0°) axes for the highly aberrated test lens
6.16 Likelihood surface along the α25 (tertiary spherical aberration) and α7 (primary coma, x-axis) axes for the highly aberrated test lens
6.17 Likelihood surface along the α2 (tilt, y-axis) and α3 (tilt, x-axis) axes for the highly aberrated test lens
6.18 Likelihood surface along the α5 (primary astigmatism at 0°) and α7 (primary coma, x-axis) axes for the highly aberrated test lens
6.19 Likelihood surface along the α3 (tilt, y-axis) and α8 (primary coma, y-axis) axes for the highly aberrated test lens
6.20 12 simulated annealing trials for the estimation of wavefront parameters in the exit pupil of the highly aberrated test lens (log-log scale)
6.21 Comparison between the true data and estimated irradiance patterns for the highly aberrated test lens
6.22 Data-acquisition system for collecting multiple irradiance patterns near the focus of a spherical test lens, including a movable imaging lens
6.23 Focal region of the spherical test lens. Paraxial focal plane is at z = zf = 90.83 mm
6.24 Theoretical wavefront error in the exit pupil of the spherical lens as a function of normalized radius. Units are in waves
6.25 Experimental data for the spherical test lens for image planes: (a) z = z1 and (b) z = z2. Scale bar corresponds to the intermediate image plane just before the imaging lens
6.26 Detector data at z = z1 for the spherical lens using a pupil sampling of: (a) P = 512, (b) P = 256, and (c) P = 128
6.27 Detector data at z = z2 for the spherical lens using a pupil sampling of: (a) P = 512, (b) P = 256, and (c) P = 128
6.28 Irradiance data at z = z1 for the spherical lens: (a) Fresnel approximation, (b) Huygens integral, (c) difference
6.29 Irradiance data at z = z2 for the spherical lens: (a) Fresnel approximation, (b) Huygens integral, (c) difference
6.30 FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens (log scale)
6.31 Inverse of the FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens (log scale)
6.32 Likelihood surface along α4 (defocus) and α16 (secondary spherical aberration) axes for the spherical test lens
6.33 Likelihood surface along α7 (primary coma, x-axis) and α9 (primary spherical aberration) axes for the spherical test lens
6.34 Likelihood surface along α2 (tilt, x-axis) and α4 (defocus) axes for the spherical test lens
6.35 Likelihood surface along α5 (primary astigmatism at 0°) and α16 (secondary spherical aberration) axes for the spherical test lens
6.36 Likelihood surface along α2 (tilt, x-axis) and α7 (primary coma, x-axis) axes for the spherical test lens
6.37 Determining the nuisance parameters in the system for image plane z = z1 via a 2D grid search prior to the estimation of wavefront parameters
6.38 Determining the nuisance parameters in the system for image plane z = z2 via a 2D grid search prior to the estimation of wavefront parameters
6.39 12 simulated annealing trials for the estimation of wavefront parameters in the exit pupil of the spherical test lens
6.40 Comparison between the true data and estimated irradiance patterns for the spherical test lens
6.41 Data-acquisition system for collecting multiple irradiance patterns near the focus of an optical element, including a movable diffuser and imaging lens
7.1 Ray-trace data from our CUDA algorithm for the precision asphere
7.2 Ray-trace data computed by ZEMAX for the precision asphere
7.3 Irradiance data computed at: (a) z = 95 mm after lens for on-axis source, (b) z = 100 mm for same on-axis source, and (c) z = 90 mm for off-axis source
7.4 Irradiance data computed with ZEMAX at: (a) z = 95 mm after lens for on-axis source, (b) z = 100 mm for same on-axis source, and (c) z = 90 mm for off-axis source
7.5 (a) FIM and (b) inverse of the FIM for prescription parameters describing the precision asphere (logarithmic scale)
7.6 Likelihood surface along RC and κ axes. Global minimum is located at center of plot
7.7 Likelihood surface along RC and α4 axes. Global minimum is located at center of plot (logarithmic scale)
7.8 Likelihood surface along RC and α6 axes. Global minimum is located at center of plot (logarithmic scale)
7.9 Likelihood surface along κ and α4 axes.
Global minimum is located at center of plot
7.10 Likelihood surface along κ and α6 axes. Global minimum is located at center of plot
7.11 Likelihood surface along κ and α6 axes. Global minimum is located at center of plot
7.12 20 simulated annealing trials for the estimation of prescription parameters describing the precision asphere
7.13 Refractive index distribution of the GRIN-rod lens
7.14 Real eikonal rays traced through the GRIN-rod lens. Plot is expanded in the transverse direction to show detail
7.15 (a) Irradiance distribution in the detector plane and (b) irradiance profile for the GRIN-rod test lens
7.16 (a) FIM and (b) inverse of the FIM for the parameters describing the refractive index distribution of the GRIN-rod lens (logarithmic scale)
7.17 Likelihood surface along n0 and g axes. Global minimum is located at center of plot
7.18 Likelihood surface along n0 and h4 axes. Global minimum is located at center of plot
7.19 Likelihood surface along g and h4 axes. Global minimum is located at center of plot
A.1 Fringe Zernike Polynomials 2-37

LIST OF TABLES

5.1 Navarro wide-angle schematic eye model at λ = 780 nm
5.2 Geometry of eye model used to generate WFS data
5.3 Square-root of the CRB (standard deviation) for various system configurations
5.4 Estimated ocular parameters, including the true values, starting point in the search, upper and lower limits in the search space, and estimated values with standard deviations
6.1 Product specifications for NVIDIA Tesla C1060 and C2075 models
6.2 System data provided by ZEMAX™ for the highly aberrated test lens at λ = 0.6328 µm
6.3 Fringe Zernike coefficients {αn, n = 1,…, 37}, peak-to-valley, RMS, and variance, provided by ZEMAX™ for the highly aberrated test lens. Unlisted coefficients are zero
6.4 Square-root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens
6.5 Square-root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens
6.6 Range in likelihood surface plots for Fringe Zernike coefficients {αn, n = 4, 9, 16, 25, 36, 37} in the exit pupil of the highly aberrated test lens
6.7 ML estimates of wavefront parameters for the highly aberrated test lens, including their standard deviations and the starting point in the search
6.8 System data provided by ZEMAX™ for the spherical test lens at λ = 0.6328 µm
6.9 Fringe Zernike coefficients {αn, n = 1,…, 37}, peak-to-valley, RMS, and variance, provided by ZEMAX™ for the spherical test lens. Unlisted coefficients are zero
6.10 Computation time using Huygens’ method for a pupil sampling of 256 × 256 and various detector grid sizes
6.11 Square-root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens
6.12 Range in likelihood surface plots for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens
6.13 ML estimates of wavefront parameters for the spherical test lens, including their standard deviations. Design values were used as a starting point in the search
7.1 True values of parameters underlying the irradiance data for the precision asphere, and design values of Edmund Optics Precision Asphere NT47-731
7.2 System data provided by ZEMAX™ for the precision asphere at λ = 0.6328 µm
7.3 Square-root of the CRB for prescription parameters describing the precision asphere
7.4 Range in likelihood surfaces for parameters describing the precision asphere, relative to the true values
7.5 ML estimates of prescription parameters describing the precision asphere, including standard deviations. Design values were used as a starting point in the search
7.6 Design parameters of the GRIN-rod test lens at an arbitrary design wavelength. Included are the distances in the optical system used in the simulations
7.7 Square-root of the CRB for the parameters describing the refractive index distribution of the GRIN-rod lens
7.8 Range in likelihood surfaces for parameters describing the GRIN-rod lens, relative to the true values
A.1 Fringe Zernike Polynomials {Zn, n = 1,…, 37}

ABSTRACT

We present a new method for determining the complete set of patient-specific ocular parameters, including surface curvatures, asphericities, refractive indices, tilts, decentrations, thicknesses, and index gradients. The data consist of the raw detector outputs of one or more Shack-Hartmann wavefront sensors (WFSs); unlike conventional wavefront sensing, we do not perform centroid estimation, wavefront reconstruction, or wavefront correction. Parameters in the eye model are estimated by maximizing the likelihood.
Since a purely Gaussian noise model is used to emulate electronic noise, maximum-likelihood (ML) estimation reduces to nonlinear least-squares fitting between the data and the output of our optical design program. Bounds on the estimate variances are computed with the Fisher information matrix (FIM) for different configurations of the data-acquisition system, thus enabling system optimization. A global search algorithm called simulated annealing (SA) is used for the estimation step, due to multiple local extrema in the likelihood surface. The ML approach to parameter estimation is very time-consuming, so rapid processing techniques are implemented with the graphics processing unit (GPU).

We are leveraging our general method of reverse-engineering optical systems in optical shop testing for various applications. For surface profilometry of aspheres, which involves the estimation of high-order aspheric coefficients, we generated a rapid ray-tracing algorithm that is well suited to the GPU architecture. Additionally, reconstruction of the index distribution of GRIN lenses is performed using analytic solutions to the eikonal equation. Another application is parameterized wavefront estimation, in which the pupil phase distribution of an optical system is estimated from multiple irradiance patterns near focus. The speed and accuracy of the forward computations are emphasized, and our approach has been refined to handle large wavefront aberrations and nuisance parameters in the imaging system.

CHAPTER 1
INTRODUCTION

In traditional optical design, a trial configuration of optical components is entered into a computer, rays are traced, and the images of one or more point objects are computed; then the configuration is altered in some way to improve the images. At each step in this iteration, the problem can be stated: given the optical system, find the image.
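As a toy illustration of the inverse viewpoint ("given the image, find the system") and of how ML estimation reduces to least-squares fitting under Gaussian noise, the sketch below estimates a single system parameter from noisy irradiance data by scanning parameter space for the minimum sum of squared residuals. The one-parameter Gaussian-blur forward model and all names here are hypothetical stand-ins for the full optical design program, not the method used in this dissertation.

```python
import numpy as np

# Detector coordinates for a 1-D "image" of a point source.
x = np.linspace(-1.0, 1.0, 201)

def forward(theta):
    # Hypothetical forward model: irradiance pattern whose blur width
    # depends on a single system parameter theta (e.g. a defocus term).
    return np.exp(-x**2 / (2.0 * (0.1 + theta**2)))

# Simulate noisy detector data for a known "true" parameter value.
rng = np.random.default_rng(0)
theta_true = 0.3
data = forward(theta_true) + 0.01 * rng.standard_normal(x.size)

# ML estimation under i.i.d. Gaussian noise = nonlinear least squares:
# search parameter space for the theta minimizing the squared residuals.
thetas = np.linspace(0.0, 1.0, 1001)
sse = [np.sum((data - forward(t)) ** 2) for t in thetas]
theta_hat = thetas[np.argmin(sse)]
print(f"ML estimate: {theta_hat:.3f} (true value {theta_true})")
```

In the actual problem the scan over a 1-D grid is replaced by a global search (simulated annealing) over many coupled parameters, but the objective function has exactly this least-squares form.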
While this process is invaluable for many uses, simply changing our view of the problem opens up a variety of powerful applications. We have developed a unique method which we refer to as inverse optical design (IOD); that is, given the image, find the system. In other words, by obtaining the data at some output plane, we can estimate the set of parameters describing the optical system. The basic method has been patented by Barrett, Sakamoto, and Goncharov (2010).

The original motivation of this research is to develop a new technique for studying the time-varying optical properties of the eye of an individual patient, either for clinical ophthalmology or basic research. The imaging system is based on a Shack-Hartmann aberrometer for measurement of aberrations in human eyes, in which an incoming light wave provided by a laser diode is distorted as it enters and leaves the complicated, dynamic optical media of the eye. The image in this case consists of the output in the focal plane of a Shack-Hartmann wavefront sensor (WFS), a device that measures the distortions of a wavefront and provides very useful information about aberrations in an optical system. In essence, these data are used to estimate surface curvatures, conic constants, refractive indices, thicknesses, tilts, and decentrations of all components in the eye, as well as the graded-index (GRIN) distribution of the crystalline lens, which has not previously been achieved using a single ocular diagnostic system. The patient-specific eye model could then be used as a theoretical basis for vision correction of higher-order aberrations, to develop databases for the diagnosis of pathologies, to facilitate a broad range of critical studies in vision science, or to optimize a multi-conjugate adaptive-optics (MCAO) system for imaging the entire retina with a substantial improvement in resolution. Fig.
1.1: System configuration for estimating patient-specific ocular parameters, based on a clinical Shack-Hartmann aberrometer for measurement of aberrations.

Inverse optical design relies on computational methods, incorporating an optical design program and statistical analysis. Data are taken from one or more output planes of the optical system, then entered into a computerized optimization algorithm, invoking statistical approaches such as maximum-likelihood estimation or maximum a posteriori (MAP) estimation. ML estimation essentially performs a search through parameter space to find the set of parameters that maximizes the probability of occurrence of the observed data, by comparing the data to the output of the optical design program. MAP estimation is a generalization of ML estimation that incorporates prior knowledge of the probability distribution of the parameters. We chose to implement ML estimation for this research. The performance of such estimators can be analyzed with the Fisher information matrix, which provides the theoretical minimum possible variance in the estimates, referred to as the Cramér-Rao lower bound. The FIM essentially measures the information content of the system in terms of the sensitivity of the data to changes in each parameter, and it also reveals any parametric coupling, including coupling in the estimates.

A significant limitation to overcome in inverse optical design is that the ML estimation step is very time-consuming, so making it practical requires the development of an efficient search algorithm, as well as dedicated computer hardware. Computational time can be reduced by parallelizing the optical design program describing the forward model of the system. Parallel algorithms can be implemented on a variety of hardware platforms, such as the Cell processor used in the Sony PlayStation 3 or the graphics processing unit in high-performance video cards.
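The FIM calculation just described can be sketched numerically. The Python fragment below is not from the dissertation; it is a minimal illustration under stated assumptions, with a hypothetical two-parameter forward model (a Gaussian spot on a 50-pixel detector) standing in for the ray-traced output of an optical design program, and central finite differences standing in for analytic derivatives.

```python
import numpy as np

def fisher_information(forward, theta, sigma, eps=1e-6):
    """FIM for data with i.i.d. Gaussian noise of standard deviation sigma:
    F = J^T J / sigma^2, where J is the Jacobian of the noise-free data
    with respect to the parameters (estimated here by central differences)."""
    J = np.empty((forward(theta).size, theta.size))
    for p in range(theta.size):
        d = np.zeros_like(theta)
        d[p] = eps
        J[:, p] = (forward(theta + d) - forward(theta - d)) / (2 * eps)
    return J.T @ J / sigma**2

# Hypothetical forward model: amplitude and width of a spot on a detector.
x = np.linspace(-1.0, 1.0, 50)
forward = lambda th: th[0] * np.exp(-x**2 / th[1]**2)

F = fisher_information(forward, np.array([1.0, 0.4]), sigma=0.01)
crb = np.sqrt(np.diag(np.linalg.inv(F)))  # lower bounds on estimate std. devs.
```

The diagonal of the inverse FIM bounds the variance of each parameter estimate, and its off-diagonal structure exposes the parametric coupling mentioned above; repeating this computation for different data-acquisition configurations is what enables system optimization.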
It should be noted that inverse optical design is a nonlinear estimation problem, since the data depend nonlinearly on the parameters. Similar methods have been applied in astronomy (Redding, 1993), in which image data are used to estimate optical prescription parameters (i.e., first-order geometrical parameters), a process referred to as prescription retrieval. The original application was to estimate the conic constant of the Hubble Space Telescope primary mirror, after it was fabricated incorrectly and produced an unexpected level of spherical aberration.

1.1 Application to vision science and ophthalmology

The availability of a complete patient-specific eye model would provide many key advantages in vision science and ophthalmology. For example, the model could be used as a theoretical basis for vision correction of higher-order aberrations in laser refractive surgery or with corrective lenses, since classical devices only measure and correct for basic refractive errors (i.e., defocus and astigmatism). Accurate determination of optical parameters in normal and abnormal eyes could also be valuable in developing databases for clinical diagnosis of pathologies, while measurements of ocular surface misalignments would be useful after implantation of intraocular lenses in cataract surgery (Rosales & Marcos, 2006; Rosales, Dubbelman, Marcos, & Van der Heijde, 2006a; Tabernero, Benito, Nourrit, & Artal, 2006). Moreover, knowledge of the refractive index distribution in the lens may be beneficial in optical coherence tomography, which is based on interferometric reflectometry and index changes (Jones, Atchison, Meder, & Pope, 2005; Moffat, Atchison, & Pope, 2002). It could additionally lead to a substantial improvement in retinal imaging.
For instance, current adaptive-optics (AO) ophthalmoscopes incorporating a Shack-Hartmann wavefront sensor and a wavefront corrector conjugated to a single surface of the eye offer high resolution (Hofer et al., 2001; Roorda et al., 2002), but over a very limited field of view (FOV) (Liang, Williams, & Miller, 2007) due to a form of anisoplanatism involving aberrations of the eye. Aberrations collected over different field positions on the retina result from passage through different parts of the ocular media, so that the AO correction is valid only over a certain field area, referred to as the isoplanatic patch (Fig. 1.2). One solution is to conjugate multiple wavefront sensors and correctors to various refractive surfaces in the eye, thereby increasing the isoplanatic patch size and enabling wide-field measurements, but the choice of the optimal planes at which to conjugate the correctors would be facilitated by knowing the real eye structure of the individual.

Fig. 1.2: Anisoplanatism involving pupil aberrations. Image from Stéphane Chamot (National University of Ireland, Galway).

In addition to improvements in vision correction and retinal imaging, the availability of patient-specific parameters could facilitate a broad range of ongoing vision science studies. Of significant interest is the in vivo GRIN distribution and lenticular geometry of the human crystalline lens as a function of both age and accommodation (Hemenger, Garner, & Ooi, 1995; Rosales et al., 2006a; Smith, Atchison, & Pierscionek, 1992), but this information has been difficult to obtain, and reliable measurements are scarce (Dubbelman & Van der Heijde, 2001; Jones et al., 2005; Liang et al., 1997; Moffat et al., 2002; Navarro, Santamaría, & Bescós, 1985).
While previous studies suggested that aspheric surfaces in the anterior segment and an effective refractive index for the lens are sufficient to model spherical aberration, lack of knowledge regarding the GRIN distribution precludes both the prediction of off-axis aberrations and the study of dispersion in the lens, so that experimental data are limited (Navarro et al., 1985). A complete mapping of the human eye could also be used to evaluate intersubject variability and statistical variations, as well as vision performance and image quality in the central and peripheral visual fields (Navarro, Moreno, & Dorronsoro, 1998; Sheehan, Goncharov, O'Dwyer, Toal, & Dainty, 2007), which could be enhanced by accurate measurement of the retinal curvature (Escudero-Sanz & Navarro, 1999; Mallen & Kashyap, 2007). Another fundamental study in physiological optics is how individual ocular components factor into the overall performance of the human eye (Artal & Guirao, 1998) and how such performance would change if one or more surfaces were altered, a critical element in surgical procedures. While schematic eyes have been extremely useful for that purpose, they often lack asymmetries such as decentration of the lens or pupil, which manifest in the fovea as aberrations of non-axially-symmetric systems (e.g., coma, astigmatism, and transverse chromatic aberration) and may have a significant impact on ocular performance (Bará & Navarro, 2003; Rynders, Lidkea, Chisholm, & Thibos, 1995). A patient-specific mapping of the entire eye, including non-axially-symmetric components, would enable further investigations that have previously been unapproachable.

The method of inverse optical design could provide an in vivo, non-invasive, and complete mapping of the human eye, including dozens of parameters that are essential to an accurate representation of the eye and its aberrations. Existing in vivo methods supply only a small subset of ocular parameters.
For example, a common technique in phakometry uses Purkinje images of the back reflections from the anterior and posterior surfaces of both the cornea and crystalline lens, providing basic curvatures, tilts, and decentrations (Rosales & Marcos, 2006; Rosales et al., 2006a). However, one difficulty in this approach is that insufficient knowledge of the refractive index distribution of the lens leads to significant measurement errors in the lens posterior radius (Schwiegerling, 2004). Scheimpflug slit imaging is increasingly being used to obtain sharp cross-sectional images of the anterior eye segment, imparting surface shapes, misalignments, and intraocular distances, although accurate determination of these parameters relies on the correction of optical distortions in the imaging system and within the eye itself (Rosales & Marcos, 2006). Distortion due to the geometry of the Scheimpflug camera can be corrected analytically with relative ease, but correction of distortion due to refraction at intermediate ocular surfaces is much less approachable. Measurements of a particular surface are subject to refraction at all successive surfaces (Dubbelman & Van der Heijde, 2001; Koretz, Strenk, Strenk, & Semmlow, 2004; Rosales et al., 2006a) and traversal through media of individually varying thickness and curvature. Hence, quantification errors in one surface propagate throughout the system (Dubbelman, Weeber, Van der Heijde, & Völker-Dieben, 2002). Conversely, magnetic resonance imaging has recently been used for in vivo visualization of structures in the anterior segment, which eliminates the distortion dilemma (Koretz et al., 2004), but it suffers from low resolution, signal-to-noise ratio (SNR) constraints, and eye motion artifacts due to longer acquisition times (Strenk et al., 1999).
On the other hand, corneal topography is a rapidly developing technique that provides very detailed and reliable measurements of corneal curvature (Schwiegerling, Greivenkamp, & Miller, 1995; Navarro, González, & Hernández, 2006; Zhou, Hong, Miller, Thibos, & Bradley, 2004; Guirao & Artal, 2000), including astigmatism and surface irregularities, although it does not provide information about the remaining ocular surfaces. However, such accurate corneal information could be used to supplement or validate the parameter estimates acquired with our system, or even used as input to inverse optical design to narrow the high-dimensional parameter space.

1.2 Application to optical shop testing

Although inverse optical design has an ophthalmic origin, the basic concept of parameter estimation is much more widely applicable; the same technique could be applied to any situation where the parameters of an optical system are desired. Our method of optical testing via parametric modeling has applications in precision testing of optical components and systems for commercial, industrial, military, and aerospace purposes. Several examples are coating and surface profilometry of aspheres, measurement of aberrations in intraocular lenses (IOLs) and contact lenses, laser machining, and tomographic reconstruction of the three-dimensional refractive index distribution in GRIN lenses or fiber-optic cables (Fig. 1.3).

Fig. 1.3: Basic test configuration for performing inverse optical design of a GRIN-rod lens.

We also need not restrict ourselves to the estimation of geometrical parameters by means of ray-tracing. Another application area is phase retrieval, the process of recovering the wavefront error in an optical system such as a group of lenses, given one or more irradiance measurements near focus.
If we consider the coefficients in an arbitrary wavefront expansion as the estimable parameters, we could in principle measure the wavefront error produced by the optical system. Since the wavefront is treated as a continuous function, this process avoids conventional pitfalls such as aliasing or phase ambiguities and has potential for measuring extremely large aberrations, a problem that confounds traditional interferometry. This technique employs diffraction propagation and is therefore practical on both the microscopic and macroscopic scales, from micro-optics to large telescope mirrors, and for reflective or transmissive parts. It is also practical for measuring very large peak-to-valley wavefront errors. Figure 1.4 shows the basic system configuration for our method of parameterized wavefront measurement with a reflective test element, simply requiring a point source and detector array.

Fig. 1.4: Basic system configuration for parameterized wavefront measurement with a single source.

Our method of parameterized wavefront measurement is similar to that of Brady and Fienup (2004, 2005a), but we extend it in various ways. We perform system optimization by investigating the FIM and associated CRB, as well as the probability surface. Furthermore, we refined our approach to deal with large-aberration wavefronts, and we perform rapid processing on the GPU platform. We devise methods for dealing with common practical issues. For instance, a single source and detector may be adequate for sensing the focal spots generated by small-scale optical elements, but will be insufficient for dealing with large aspheric telescope mirrors that are designed to image sources at infinity, since the nominal spot will not fit onto the detector array. As an example, a parabolic mirror with a 3.5-meter diameter and an f-number of 1.5 (i.e., a high light-collection efficiency) creates a spot size of 100 millimeters, many times larger than most CCD detectors used in scientific cameras.
If the entire spot is not detected, parts of the mirror will be invisible to the system, and their contributions would manifest as phase errors in the estimated aberration function of the mirror. One feasible solution is to use an array of spatially separated identical point sources that scan the entire mirror as viewed by a single detector at a fixed location (Fig. 1.5). Each source contributes optically to the total phase aberration contained in sequential images, such that accurate recovery of the entire aberration function is possible. In a phase-diversity approach, differences between images are used as input to the nonlinear optimization and common-mode information falls out, which could serve as a technique to reduce the stray light entering the problem. Additionally, this configuration contains no moving parts or intervening optics and is cost-efficient compared to a multiple-detector system. Another possible solution involves the use of projection and relay optics to transfer the large focal spot to the smaller detector array, which involves placing a large field lens and projection screen at the intermediate focal plane. If the incorporated lenses are placed in the respective focal planes of the system, they are not considered intervening optics and do not introduce wavefront phase distortions, and they can easily be built into the forward computation model used in the optimization routine.

Fig. 1.5: Basic system configuration for parameterized wavefront measurement with an aspheric test element and multiple point source locations.

A unique feature of our method involves the examination of augmented systems in which an additional element is introduced into the optical path for the sake of greater information yield. A worthy candidate for this purpose is a Shack-Hartmann WFS, containing a two-dimensional array of small lenslets that samples the incoming wavefront and produces an array of blurred spots in the focal plane.
In conventional wavefront sensing, an algorithm processes the detected image and computes the centroids of the spots. The centroids are used to estimate the average local wavefront slopes, which are combined to give a rough reconstruction of the wavefront. Discrete sampling of the continuous wavefront, which contains an infinite number of points, may work decently when aberrations are low, but undersampling of a rapidly varying wavefront will lead to aliasing and phase ambiguities. Moreover, useful information regarding the finer structure of the wavefront is thrown away during the centroiding process. In our method, we do not perform centroiding, but instead use the raw detector outputs in the focal plane. Even if aliasing occurs, we will know what the aliased wavefront looks like from our computational model, so the statistical estimation of phase polynomial coefficients still performs well. Here we use a finite set of data to determine a finite set of polynomial coefficients, but we are able to reconstruct a smooth and continuous wavefront. Additionally, our method permits the splashing of focal spots outside of their territories in the case of large aberrations, a problematic occurrence in the classical centroiding process. This approach to wavefront estimation using WFS data was investigated by Luca Caucci, a graduate student within the research group.

Fig. 1.6: Basic system configuration for augmented wavefront measurement with a Shack-Hartmann WFS and multiple point source locations.

Interferometry has been ubiquitous in surface profilometry, but it requires a large number of optical elements that introduce aberrations into the system, and it is extremely sensitive to vibrations between the reference and test arms. For testing aspheres, it also requires a null optic as a reference surface, which is very difficult and costly to fabricate, and it can suffer from common problems such as undersampling and hysteresis.
Additionally, while phase-stepping and phase-shifting interferometers are specifically designed to handle aberrations greater than an optical wavelength, there are phase errors and ambiguities associated with the phase unwrapping process. Our method avoids these problems, since it requires as little as a source and detector, can operate with or without a null component, and is less sensitive to vibrations due to the single optical path. It can also measure large wavefront errors without the need for phase unwrapping. On the other hand, even if we choose to use a null configuration, ML estimation will still perform well as long as we employ an accurate forward model of the system.

1.3 Dissertation overview

Chapters 2–4 provide the theoretical framework needed to perform inverse optical design, which can be used for all of the aforementioned applications. Chapter 2 introduces fundamental concepts in maximum-likelihood estimation in the context of parameter estimation, including various performance metrics and properties of estimators. We provide rigorous likelihood functions for the detector output, which must incorporate all noise sources and factors that influence the data, and we discuss various methods for handling nuisance parameters. The general formulations of the Fisher information matrix and the Cramér-Rao lower bound are provided. We also discuss limitations of the ML approach, including intensive computational requirements.

A necessary component in implementing the ML approach to parameter estimation is an appropriate optimization algorithm, which searches the parameter or configuration space to find the configuration that maximizes the probability of the observed data. In Chapter 3, we discuss various optimization methods and the selection of a suitable search algorithm. Since the likelihood functions in inverse optical design or wavefront estimation tend to be complicated with many local extrema, we are primarily interested in global search algorithms.
We give a broad overview of the simulated annealing (SA) algorithm, a feasible candidate for the global optimization problem, which can process high-dimensional functions with extensive nonlinearities, discontinuities, and randomness. For the applications presented in this dissertation, we implemented an adaptive form of simulated annealing for optimizing multimodal functions in a continuous domain.

Each iteration of an inverse problem requires a solution to the forward problem; we must compute the output of an optical design program, either through ray-tracing or diffraction propagation. Chapter 4 provides a comprehensive review of the propagation of light, including relevant concepts from geometrical optics and diffraction theory. Although we begin with the fundamental Maxwell equations, we derive many practical expressions that are central to the propagation algorithms developed in this research. We derive the basic equation of geometrical optics, the eikonal equation, which is needed in ray-tracing through GRIN lenses. This leads to a discussion of a useful version of Snell's law in vector form, used throughout this research for the refraction of light rays at optical surfaces. In the wavefront estimation problem, we rely on scalar diffraction theory for modeling the wave propagation from the exit pupil of an optical system to the final image plane. Several expressions are developed under various approximations.

In Chapters 5–7, we present results obtained for several applications using the theoretical background provided in the preceding chapters. Specific details on the propagation algorithm are outlined for each application. We routinely investigate the FIM and associated CRB, as well as the complicated behavior of the likelihood function. Simulated annealing is used in all cases for the estimation procedure.
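The core accept/reject logic of simulated annealing can be sketched in a few lines of Python. This is not the adaptive variant implemented in this work, but a minimal, generic sketch; the multimodal cost function standing in for a negative log-likelihood is purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal(cost, theta0, step=0.5, T0=1.0, cooling=0.995, n_iter=4000):
    """Minimal simulated annealing over a continuous parameter space.
    Uphill moves are accepted with probability exp(-dE/T), which lets
    the search escape local extrema of the cost surface."""
    theta, E = np.array(theta0, float), cost(theta0)
    best, best_E = theta.copy(), E
    T = T0
    for _ in range(n_iter):
        cand = theta + step * T * rng.standard_normal(theta.size)
        E_cand = cost(cand)
        # Metropolis acceptance criterion
        if E_cand < E or rng.random() < np.exp(-(E_cand - E) / T):
            theta, E = cand, E_cand
            if E < best_E:
                best, best_E = theta.copy(), E
        T *= cooling  # geometric cooling schedule
    return best

# Hypothetical multimodal cost standing in for a negative log-likelihood;
# each coordinate term x**2 - cos(5x) has many local minima but a single
# global minimum, located at the "true" parameter values (1.0, -2.0).
true = np.array([1.0, -2.0])
cost = lambda th: np.sum((th - true)**2 - np.cos(5 * (th - true)))
theta_hat = anneal(cost, np.zeros(2))
```

Because candidate moves are scaled by the temperature, the search explores broadly at first and settles into refinement as the system cools, which is the behavior exploited for the multimodal likelihood surfaces encountered in this work.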
Chapter 5 deals with the original motivation of this research, the estimation of patient-specific parameters of the eye using irradiance data from the focal plane of a WFS. We present an overview of the human eye and discuss schematic eye models of varying levels of complexity. A description of our optical design program is provided, which involves an algebraic method for non-paraxial ray-tracing through the optical system of the eye. In Chapter 6, we discuss the estimation of wavefront parameters, including the case of large wavefront errors. Since the computational demands are much higher for diffraction propagation compared to ray-tracing, we stress the complexity and speed of the computations required for accurate determination of wavefront parameters. An introduction to parallel processing with GPUs is provided. Chapter 7 involves two additional applications of IOD for optical testing: the testing of precision aspheric lenses and GRIN lenses. To estimate parameters describing high-order aspheric surfaces, we developed a rapid ray-tracing program for implementation on the GPU platform. For ray-tracing through the refractive-index distribution of GRIN-rod lenses, we used analytic solutions to the eikonal equation.

CHAPTER 2
MAXIMUM-LIKELIHOOD ESTIMATION

The method of maximum likelihood (ML) is one of the oldest and most significant techniques in estimation theory. It is a standard approach in statistical inference, which includes classification tasks and estimation problems. This chapter expounds on the application of ML estimation specifically to parameter estimation. In Section 2.1, we provide a historical overview of parameter estimation, from the earliest methods of estimation to more modern applications with the birth of computer technology. In the process we discuss the evolution of ML estimation, beginning with its discovery by R. A. Fisher.
We describe the fundamental estimation problem in Section 2.2, while distinguishing between the Bayesian and frequentist approaches in statistical inference, including the branch of parameter estimation. Section 2.3 imparts the essential notation and terminology, while introducing the concept of maximizing the likelihood. In Section 2.4, we discuss the concept of information in a probability model. We show how to calculate the Fisher information matrix and use it to derive an important performance bound on parameter estimates, the Cramér-Rao lower bound. Alternative bounds are also discussed, along with their advantages and drawbacks. Section 2.5 deals with general performance metrics and properties of estimators, with an emphasis on the ML estimator. This includes the optimal properties of the ML estimator in the large-sample or asymptotic limit; the development in Section 2.5 parallels that in Section 2.4. Section 2.6 proposes realistic methods for dealing with nuisance parameters in a probability model. In Section 2.7, we provide rigorous likelihood functions for modeling electronic noise in detector arrays, described by Gaussian statistics. Finally, we discuss practical challenges of ML estimation in Section 2.8, including the need for an accurate description of random phenomena, as well as computational limitations to be overcome. This chapter assumes a knowledge of elementary probability and statistics.

2.1 Historical background

The first published proposition of the method of least squares for estimating coefficients in linear curve fitting was made by Legendre (1805), whose primary interest was in predicting comet orbits. He suggested the technique merely as a convenient procedure for treating observations and made no reference to the theory of probability.
Meanwhile, Gauss independently discovered the least-squares method and applied it as early as 1795, but it was not until 1809 that he published a comprehensive treatment of the method, formally outlining the theory and mathematical foundation. In this manuscript, Gauss showed that estimates obtained through least-squares fitting maximized the probability density for a normal distribution of errors, which paved the way for the development of statistical parameter estimation. It was also a precursor to the method of maximum likelihood. Continued work into the early twentieth century focused on the computational side of the least-squares method, involving other mathematicians such as Cauchy, Bienaymé, Chebyshev, Gram, and Schmidt (Seal, 1967). Orthogonal polynomials were also an important development of the work during this era. An overview of the development of the least-squares method is found in Plackett (1972), while Gauss's contribution is examined by Trotter (1957).

The work of Pearson near the turn of the century and of R. A. Fisher in subsequent decades provided the underpinnings for advancements in statistical estimation methods. Several of Pearson's contributions to classical statistics are principal component analysis (Pearson, 1901), the chi-square distribution (Pearson, 1900), correlation theory (Pearson, 1898, 1900), and the method of moments (Pearson, 1894, 1936), with the latter providing an early method for the estimation problem. Fisher based much of his work on that of Pearson, even concurring that the method of moments was superior to the least-squares method, but he had ideas for an even better approach. In 1912, Fisher invoked the principle of inverse probability, from which he derived the "absolute criterion" (Fisher, 1912), although he later discarded the ideas.
By 1922 he had clarified the difference between "probability" and "likelihood", thereby completing the basic theory of maximum likelihood, which he presented in a series of papers (Fisher, 1922, 1925, 1934, 1935). In these papers, he described estimator properties such as sufficiency, efficiency, consistency, and information. Aldrich (1997) gives an excellent, detailed account of the development of ML estimation.

During the next fifteen years, Wald (1939, 1945, 1950) made significant contributions to statistical decision theory, which suggests principles for choosing estimation criteria in optimal decision-making. It has since played a substantial role in point estimation and hypothesis testing. Contemporary application of statistical estimation theory began in the 1940s and 1950s with Hood and Koopmans, who used the theory to estimate variables in macroeconomic models. Their work was very important in the progress of econometrics during this era and is outlined in the Cowles Commission Reports (Hood & Koopmans, 1953). Meanwhile, G. E. P. Box and others (1958, 1959, 1962) made significant contributions in the physical sciences by constructing mathematical models and estimating model parameters.

Methods of nonlinear parameter estimation that were already established by mathematicians such as Newton, Gauss, and Cauchy did not reach extensive practical application until the advent of computer technology in the 1950s. The first general computer program to determine estimates for nonlinear models was written by Booth and Peterson (1958), which implemented nonlinear least-squares fitting. More specifically, it employed Gauss's method with finite-difference approximations to solve least-squares problems of a single equation. The earliest computer program using ML for nonlinear parameter estimation was created by Eisenpress, Bomberault, and Greenstadt (1966a, 1966b) for econometric models of multiple equations.
Their system applied the full Newton method with rotational discrimination by evaluating analytic derivatives of all orders.

2.2 Statement of the problem

This section describes the fundamental components in classical estimation problems. We refrain from elaborating on notation, which we expound on in Section 2.3.

A general estimation procedure consists of several important factors. There must first be a vector parameter θ describing some source or object, representing a point in parameter space. In the case of inverse optical design for the ophthalmic application, for example, the parameters are the curvatures, conic constants, thicknesses, indices, and so on. The next component is a probabilistic mapping from parameter space to the finite-dimensional observation space, in which the observed data g reside. This is simply the probability law, denoted pr(g|θ), governing the effect of the parameters on the data, including any noise characteristics or random phenomena in the system. In inverse optical design, as well as many other estimation problems, the mapping is nonlinear since the data depend nonlinearly on the parameters. The final requirement is an estimation rule, or procedure, for mapping the observation space to an estimate, written θ̂. This rule for processing an observation or set of observations to generate an estimate is also referred to as an estimator. We treat this rule as deterministic, that is, the same data vector will always produce the same estimate.

When the estimate is represented by specific numerical values, the process is referred to as point estimation. However, point estimation conveys nothing about the uncertainty in the estimates. Interval estimation, on the other hand, uses sample data to determine an interval of probable values of the unknown parameters (Neyman, 1937). There are essentially two distinct approaches in the treatment of statistical inference tasks such as estimation problems.
The classical or frequentist method regards the parameter to be estimated as unknown, but not random. This approach considers an ensemble of data vectors acquired through sampling of pr(g|θ) and computes performance metrics using averages of the estimates. In this sense, repeated sampling can be used to verify all probabilities and probability laws. In contrast, parameters are treated as random variables in the Bayesian method, so that knowledge of a prior probability pr(θ) must be assumed. However, this probability is admitted as a “degree of belief” (Ferguson, 1967; Raiffa & Schlaifer, 1961; Savage, 1954) with “subjective choices of plausibility” (Bard, 1974). Both pr(θ) and pr(g|θ) are used in Bayes’s rule to ascribe to the parameter θ a posterior density, denoted pr(θ|g), conditioned on the observed data vector g. Therefore, a Bayesian has no concept of an ensemble of data vectors, and performance metrics are determined solely from the posterior (Barrett, Dainty, & Lara, 2007). We utilize classical estimation theory in this paper and treat the parameters to be estimated as nonrandom variables.

Estimation procedures typically involve the optimization of some objective function or, in many cases, the minimization of a cost function, denoted C(θ̂, θ). The cost function assigns a penalty to the point in parameter space θ̂ when the true underlying parameter is θ. In other words, it measures the departure of the given data from that generated by a proposed system configuration. In the next section, we will describe the quantity that is optimized in ML estimation.

2.3 Notation system and terminology

The notation used throughout this paper is adopted primarily from Barrett and Myers (2004). We use g = (g_1, …, g_M) to represent an M × 1 vector containing random data from some probability law. The probability law itself is described by a P × 1 vector of parameters θ = (θ_1, …, θ_P).
Note that vectors are indicated by boldface lowercase letters, while matrices are denoted by boldface uppercase letters. If the data can take on continuous values, then the probability law is a probability density function (PDF), denoted pr(g|θ). Conversely, if the data are discrete-valued, then the probability law is simply a probability, given by Pr(g|θ). For the purposes of this research, we will consider only continuous random variables.

The PDF pr(g|θ) is simply the distribution from which individual samples of g are drawn. In other words, it represents the probability of obtaining the data vector g conditional upon the parameter vector θ. The most commonly used distributions in practice are the normal, log-normal, and gamma for continuous variables, and the Poisson, binomial, and multinomial for discrete variables. In the classical approach to parameter estimation, once we have a particular data vector, we can express the PDF as a function of the parameters θ given the data g, referred to as the likelihood:

L(θ|g) = pr(g|θ) .   (2.1)

For a set of M independent and identically distributed (i.i.d.) observations, the likelihood can be written as

L(θ|g) = ∏_{m=1}^{M} pr(g_m|θ) .   (2.2)

We must emphasize that L(θ|g) is not a PDF on θ. In general, an estimate of the parameter vector is denoted θ̂, and values that maximize the likelihood, θ̂_ML, are referred to as ML estimates of θ. If the estimate is a deterministic function of g, which is usually the case, we can express it as θ̂(g). However, we will often drop the explicit dependence on g for brevity. ML estimation essentially returns the θ argument that maximizes the probability of occurrence of the observed data, defined as

θ̂_ML ≡ argmax_θ pr(g|θ) ,   (2.3)

where θ̂ represents an estimate of the vector of parameters. Since the logarithm increases monotonically with its argument, (2.3) is equivalent to

θ̂_ML = argmax_θ ln pr(g|θ) ,   (2.4)

where ln pr(g|θ) is the log-likelihood.
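As a concrete numerical illustration of the definition in (2.3) (the following sketch is not part of the original derivation; the exponential model, seed, and grid are illustrative choices), one can maximize the log-likelihood of i.i.d. exponential data over a grid of candidate rate parameters and compare the result with the closed-form ML estimate, the reciprocal of the sample mean:

```python
import math
import random

def log_likelihood(lam, data):
    # ln L(lambda | g) for i.i.d. exponential samples:
    # pr(g | lambda) = lambda * exp(-lambda * g), cf. the product form in (2.2)
    return sum(math.log(lam) - lam * g for g in data)

def ml_estimate_grid(data, grid):
    # Brute-force version of (2.3)/(2.4): the candidate maximizing the likelihood
    return max(grid, key=lambda lam: log_likelihood(lam, data))

random.seed(0)
true_lam = 2.0                                   # illustrative "true" parameter
data = [random.expovariate(true_lam) for _ in range(500)]

grid = [0.01 * k for k in range(1, 1000)]        # candidate rates 0.01 .. 9.99
lam_grid = ml_estimate_grid(data, grid)
lam_closed = len(data) / sum(data)               # analytic ML estimate: 1 / sample mean
print(lam_grid, lam_closed)
```

The grid search and the analytic estimate agree to within the grid spacing, since the exponential log-likelihood is concave in the rate parameter.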
Note that the log-likelihood is a random variable due to its dependence on g. For practical purposes, it is typically more convenient to minimize the negative log-likelihood, so that (2.4) becomes

θ̂_ML = argmin_θ [−ln pr(g|θ)] .   (2.5)

Furthermore, if the log-likelihood has a continuous first derivative with respect to θ, then this derivative evaluated at θ = θ̂_ML is equal to zero. This is called the likelihood equation:

[∂/∂θ ln pr(g|θ)]_{θ = θ̂_ML(g)} = 0 ,   (2.6)

where ∂α/∂θ is the column vector with components [∂α/∂θ]_i = ∂α/∂θ_i for a function α(θ).

Example: Correlated Gaussian noise

Suppose we have a data vector given by g = s(θ) + b + n, where s(θ) is a signal parameterized by θ, b is a known background, and n represents correlated Gaussian noise, with n ∼ N_M(0, K_n). Note that N_M(0, K_n) is the normal distribution with zero mean and covariance matrix K_n, where M samples are drawn. The conditional PDF on the data is written as

pr(g|θ) = 1/[(2π)^{M/2} det(K_n)^{1/2}] exp{−(1/2)[g − b − s(θ)]^t K_n^{−1} [g − b − s(θ)]} ,   (2.7)

and its logarithm is given by

ln pr(g|θ) = ln{1/[(2π)^{M/2} det(K_n)^{1/2}]} − (1/2)[g − b − s(θ)]^t K_n^{−1} [g − b − s(θ)] .   (2.8)

According to (2.5), we can minimize the negative log-likelihood with respect to θ to obtain

θ̂_ML = argmin_θ [g − ḡ(θ)]^t K_n^{−1} [g − ḡ(θ)] ,   (2.9)

where the average data vector is ḡ(θ) = s(θ) + b, since n is zero-mean. In other words, ḡ(θ) is the anticipated data vector for a given set of parameters.

2.4 Fisher information and the Cramér-Rao bound

Before discussing the properties of ML estimators, we will first introduce the concept of Fisher information and how it is used to derive the theoretical minimum possible variance of parameter estimates. This is integral to a discussion of the asymptotic theory of ML estimation, as well as various performance metrics, which are covered in Section 2.5.
2.4.1 Score

The score is a vector that describes the sensitivity of the likelihood to changes in the parameters:

s(g) = [∂ pr(g|θ)/∂θ] / pr(g|θ) = ∂/∂θ ln pr(g|θ) .   (2.10)

Mathematically, it is the gradient with respect to θ of the log-likelihood. It can easily be shown that the expectation of the score with respect to pr(g|θ) is zero: ⟨s⟩_{g|θ} = 0, where the brackets denote the average. Barrett and Myers (2004) also point out that, since the score is the gradient of the log-likelihood, all of its components vanish at the point in parameter space corresponding to the ML estimate, s(g)|_{θ = θ̂_ML} = 0, provided there are no constraints such as positivity.

2.4.2 Fisher information matrix

The performance of an ML estimator can be analyzed with the Fisher information matrix (FIM), as it describes the ability to estimate a vector of parameters. Since s has zero mean, the FIM is simply the covariance matrix of the score, which is expressed in outer-product notation as

F = ⟨s s^t⟩_{g|θ} ,   (2.11)

with individual components F_jk = ⟨s_j s_k⟩. Thus, for a vector parameter of P real components, the FIM is a P × P symmetric matrix with real components given by

F_jk = ⟨[∂ ln pr(g|θ)/∂θ_j][∂ ln pr(g|θ)/∂θ_k]⟩_{g|θ} ,   (2.12)

where the angle brackets denote the average over g for a given θ. Converted to integral form, (2.12) becomes

F_jk = ∫ d^M g pr(g|θ) [∂ ln pr(g|θ)/∂θ_j][∂ ln pr(g|θ)/∂θ_k] ,   (2.13)

which can then be expressed as

F_jk = −⟨∂² ln pr(g|θ)/∂θ_j ∂θ_k⟩_{g|θ} .   (2.14)

The second derivative in (2.14) indicates that the FIM components represent the average curvature of the log-likelihood, where the average encompasses all data sets for a given parameter vector (Barrett & Myers, 2004).

2.4.3 Cramér-Rao inequality

The degree of dispersion in the sampling distribution of a random vector is conveyed by the dispersion matrix, defined as the inverse of the FIM, D = F^{−1}.
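The defining relations (2.10), (2.11), and (2.14) can be checked by simulation. In the sketch below (an illustrative aside, not from the original text; the normal model, seed, and sample sizes are arbitrary choices), the score of i.i.d. normal data with unknown mean is sampled repeatedly; its average is near zero and its second moment approaches the known Fisher information M/σ²:

```python
import random

random.seed(1)
M, theta, sigma = 10, 3.0, 2.0                 # illustrative values
fisher_analytic = M / sigma**2                 # known FIM for the mean of i.i.d. normal data

def score(g):
    # (2.10) for this model: s(g) = sum_m (g_m - theta) / sigma^2
    return sum(gm - theta for gm in g) / sigma**2

trials = 20000
scores = [score([random.gauss(theta, sigma) for _ in range(M)]) for _ in range(trials)]

mean_score = sum(scores) / trials                    # should be near zero
fisher_mc = sum(s * s for s in scores) / trials      # <s s^t> of (2.11), here a scalar
print(mean_score, fisher_mc, fisher_analytic)
```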
We intuitively know that the more disperse the distribution, the more uncertain is the value of any particular realization of the random variable, thereby leading to greater variance in the parameter estimates (Bard, 1974). It is well documented in the literature that the variance of any unbiased estimate obeys the Cramér-Rao inequality (Cramér, 1946; Rao, 1945):

[K_θ̂]_pp = Var{θ̂_p} ≥ [F^{−1}]_pp ,   (2.15)

where θ_p is the pth parameter and K_θ̂ is the covariance matrix of the estimates. (See Section 2.5 for the definition of the bias in an estimate.) Thus, the variance of the pth parameter cannot be smaller than the pth diagonal entry in the dispersion matrix. Although the inequality was first stated by Fisher and proved by Dugué (1937), Cramér and Rao are credited with its discovery. The theoretical minimum possible variance in (2.15) is referred to as the Cramér-Rao lower bound (CRB) of an estimate. An estimate that achieves the CRB is said to be efficient. The concept of efficiency will be discussed further in Section 2.5 when we describe the properties of ML estimators.

The relationship in (2.15) can be stated more generally using the notational convention of Loewner ordering for the two positive-definite matrices K_θ̂ and F^{−1}. Since K_θ̂ − F^{−1} is positive-semidefinite, one can prove that the covariance matrix for an unbiased estimator must satisfy

K_θ̂ ≥ F^{−1} .   (2.16)

For any biased estimator, it can be shown that

K_θ̂ ≥ (∇_θ b + I) F^{−1} (∇_θ b + I)^t ,   (2.17)

where I is the P × P unit matrix. Therefore, the bias changes the minimum variance, and can even reduce the variance if the gradient is negative (Barrett & Myers, 2004).

For a scalar parameter, the variance in the unbiased case as shown in (2.15) is simply bounded by

Var{θ̂} ≥ 1 / ⟨−∂² ln pr(g|θ)/∂θ²⟩ ,   (2.18)

or equivalently,

Var{θ̂} ≥ 1 / ⟨[∂ ln pr(g|θ)/∂θ]²⟩ .   (2.19)

Similarly, the variance in the biased case can be written

Var{θ̂} ≥ [db(θ)/dθ + 1]² / ⟨[∂ ln pr(g|θ)/∂θ]²⟩ .   (2.20)

We can prove the equivalence between (2.18) and (2.19) by starting with the fact that the integral of any probability density is 1:

∫_∞ d^M g pr(g|θ) = 1 .   (2.21)

Here we will assume that the first and second derivatives of the log-likelihood exist and are absolutely integrable. Differentiating (2.21) with respect to θ and applying the differentiation rule

∂ pr(g|θ)/∂θ = [∂ ln pr(g|θ)/∂θ] pr(g|θ) ,   (2.22)

we have

∫_∞ d^M g ∂ pr(g|θ)/∂θ = ∫_∞ d^M g pr(g|θ) [∂ ln pr(g|θ)/∂θ] = 0 .   (2.23)

Differentiating again and applying (2.22) gives

∫_∞ d^M g pr(g|θ) [∂² ln pr(g|θ)/∂θ²] + ∫_∞ d^M g pr(g|θ) [∂ ln pr(g|θ)/∂θ]² = 0 ,   (2.24)

which can be expressed in terms of expected values:

⟨∂² ln pr(g|θ)/∂θ²⟩ = −⟨[∂ ln pr(g|θ)/∂θ]²⟩ .   (2.25)

Therefore, (2.18) and (2.19) are equivalent.

Proof of the Cramér-Rao inequality can be obtained through the Schwarz inequality (Van Trees, 1968). Without loss of generality, we will consider the case of a real scalar parameter. Since the estimate in (2.15) is unbiased, the expectation of the difference between the estimate and the true value of the parameter vanishes:

⟨θ̂(g) − θ⟩_{g|θ} ≡ ∫_∞ d^M g pr(g|θ)[θ̂(g) − θ] = 0 .   (2.26)

Differentiating each side with respect to θ and bringing the differentiation into the integral gives

∫_∞ d^M g ∂/∂θ {pr(g|θ)[θ̂(g) − θ]} = 0 ,   (2.27)

which in turn leads to

−∫_∞ d^M g pr(g|θ) + ∫_∞ d^M g [∂ pr(g|θ)/∂θ][θ̂(g) − θ] = 0 .   (2.28)

After substituting (2.21) and (2.22) into (2.28), we have

∫_∞ d^M g [∂ ln pr(g|θ)/∂θ] pr(g|θ)[θ̂(g) − θ] = 1 ,   (2.29)

which can be rewritten as

∫_∞ d^M g {√pr(g|θ) [∂ ln pr(g|θ)/∂θ]}{√pr(g|θ) [θ̂(g) − θ]} = 1 .   (2.30)

Now the Schwarz inequality states that for any two functions f(x) and g(x),

[∫_a^b f(x) g(x) dx]² ≤ ∫_a^b f²(x) dx ∫_a^b g²(x) dx .   (2.31)

Applying (2.31) to (2.30) leads to

∫_∞ d^M g pr(g|θ)[θ̂(g) − θ]² ∫_∞ d^M g pr(g|θ)[∂ ln pr(g|θ)/∂θ]² ≥ 1 .   (2.32)

Thus, we have

Var{θ̂} ≡ ⟨[θ̂(g) − θ]²⟩_{g|θ} ≥ {⟨[∂ ln pr(g|θ)/∂θ]²⟩}^{−1} ,   (2.33)

which is equivalent to (2.19) and proves the Cramér-Rao inequality.

We can also demonstrate that if an efficient estimate exists, it is the ML estimate (Melsa & Cohn, 1978; Van Trees, 1968). From the derivation of the Schwarz inequality, we know that equality in (2.32) holds if and only if

∂ ln pr(g|θ)/∂θ = α(θ)[θ̂(g) − θ]   (2.34)

for all g and θ, where α(θ) is a quantity that depends on θ but not on g. Combining this with the likelihood equation in (2.6), we have

0 = [∂ ln pr(g|θ)/∂θ]_{θ = θ̂_ML(g)} = α(θ)[θ̂(g) − θ]|_{θ = θ̂_ML(g)} .   (2.35)

The only data-dependent solution that equates the right-hand side to zero requires that θ̂(g) = θ̂_ML. Therefore, the ML estimate is efficient as long as an efficient estimate exists.

Whenever an efficient estimate does not exist, the Cramér-Rao inequality can be improved by computing a larger bound that more accurately depicts the minimum possible variance. While the CRB as stated in (2.18) involves the second partial derivative of the log-likelihood function, the Bhattacharyya bound incorporates higher partial derivatives (Bhattacharyya, 1946, 1947, 1948). Although this procedure is very straightforward, the apparent downside is its computational exhaustiveness, which is prohibitive in most estimation tasks. A bound that offers more practical value is the Barankin bound, since it does not require a differentiable probability density and yields the greatest lower bound (Barankin, 1949; McAulay & Hofstetter, 1971).
One major disadvantage, however, is that it requires a maximization over the function of interest, which is usually not a trivial task. Due to the complexity and impracticality of these alternative bounds, we will restrict our attention to the CRB.

2.4.4 System design

In general, the FIM (and the dispersion matrix) can be computed for any system configuration and can therefore be used to design and optimize the system that acquires the data serving as input to the inverse optical design, prior to practical application. We refer to that system as the inverse-design system. Since the FIM is the covariance matrix of the score, its off-diagonal entries indicate coupling between different pairs of parameters. Strong coupling can lead to great difficulty in the estimation task, and possibly large errors in the parameter estimates. For an efficient estimator, the inverse of the FIM is essentially the covariance matrix of the estimates. Thus, its off-diagonal elements represent coupling between these estimates. One goal in system design is to find a system configuration that lessens the degree of coupling between parameters while reducing the number of local minima in the likelihood surface. In Chapters 5, 6, and 7, we will investigate the FIMs and dispersion matrices for various system configurations and types of estimable parameters.

2.5 Properties of ML estimators

We begin with a discussion of performance metrics for general estimators from the classical perspective. Then we describe the many optimal properties of maximum-likelihood estimators, including those according to the asymptotic theory of ML estimation, such as efficiency, consistency, and unbiasedness. Furthermore, we discuss the invariance of the ML estimator under changes in parameterization, plus its ability to best utilize information in the data.

2.5.1 Bias

In classical estimation theory, the sampling distribution pr(θ̂|θ) (Fig.
2.1) is defined as the distribution of θ̂(g) that is acquired through repeated sampling of the data vector g from pr(g|θ) for fixed θ, then performing the same estimation rule on each sample (Barrett et al., 2007). Since θ̂ is derived from noisy data, it is a random variable that depends on the true value of the parameter. We can implicitly express the mean of a P × 1 vector of estimates in terms of the sampling distribution:

⟨θ̂⟩ = ∫ d^P θ̂ pr(θ̂|θ) θ̂ .   (2.36)

If we also know the sampling distribution and the estimation rule on g, we can transform (2.36) into the following explicit form:

⟨θ̂⟩ = ∫ d^M g pr(g|θ) θ̂(g) ≡ ⟨θ̂(g)⟩_{g|θ} .   (2.37)

The bias of an estimate is the discrepancy between the expected value of the estimate and the true value of the parameter, and it conveys the amount of systematic error in the estimation procedure. For a P × 1 vector parameter, the bias is also a P × 1 vector:

b(θ) ≡ ⟨θ̂⟩ − θ = ∫_∞ d^M g pr(g|θ)[θ̂(g) − θ] ,   (2.38)

or, in terms of the sampling distribution,

b(θ) = ∫_∞ d^P θ̂ pr(θ̂|θ)(θ̂ − θ) .   (2.39)

Fig. 2.1: Example of a probability distribution of θ̂ conditioned on θ.

An unbiased estimate is one whose bias vanishes for all values of the underlying parameter. The concept of unbiasedness will be discussed in Subsection 2.5.4 when we cover the asymptotic properties of ML estimators.

Bias is certainly not the only error in any given estimator, for even an unbiased estimator can generate a bad estimate from a particular data set. We clearly desire estimators with small bias, but the bias itself is removable, even in cases where it is a complicated function of the parameter being estimated and the suitable correction is not obvious (Gray & Schucany, 1972; Miller, 1964; Quenouille, 1956; Robson & Whitlock, 1964).

2.5.2 Variance and covariance

Another performance metric for an estimator is the variance, which quantifies the amount of random error in the estimator.
It results from fluctuations in the estimate θ̂ over multiple trials. Denoting the pth element of the estimate by θ̂_p, the variance of the pth parameter is written

Var{θ̂_p} ≡ ⟨|θ̂_p(g) − ⟨θ̂_p⟩|²⟩_{g|θ} = ∫_∞ d^M g pr(g|θ) |θ̂_p(g) − ⟨θ̂_p(g)⟩|² = ∫_∞ d^P θ̂ pr(θ̂|θ) |θ̂_p − ⟨θ̂_p⟩|² ,   (2.40)

while elements of the general covariance matrix are given by

[K_θ̂]_pp′ = ⟨[θ̂_p − ⟨θ̂_p⟩][θ̂_p′ − ⟨θ̂_p′⟩]*⟩_{g|θ} .   (2.41)

Bear in mind that the variance and covariance are for a particular value of the parameter, although the fluctuations are measured with respect to the average estimate.

2.5.3 Mean-square error

The mean-square error (MSE) is similar to the variance, except that fluctuations are measured about the true value of the underlying parameter, not the average estimate. Therefore, the MSE contains information about both the bias and the variance, that is, the overall fluctuation. For a vector parameter, it is given by

MSE ≡ ⟨‖θ̂(g) − θ‖²⟩_{g|θ} = ∫_∞ d^M g pr(g|θ) ‖θ̂(g) − θ‖² .   (2.42)

For an unbiased estimator, the MSE is equivalent to the variance.

2.5.4 Asymptotic properties

Although ML estimation has broad practical appeal, it sometimes produces inferior results for small sample sizes. For a large number of observations, however, the method possesses many desirable properties. The properties of an ML estimator that are valid when the estimation error is small are commonly referred to as asymptotic (Van Trees, 1968). One way to analyze the asymptotic properties is to draw M independent observations of the data g from the sampling distribution pr(g|θ), then let M → ∞, although they also hold when better data are acquired, such as by acquiring more photons when Poisson noise is dominant or by letting the variance go to zero for Gaussian noise (Barrett et al., 2007).

As mentioned in Section 2.4, an efficient estimate is one that achieves the CRB. We also demonstrated in (2.35) that if an efficient estimator exists, it is the ML estimate.
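A simple Monte Carlo study (an illustrative aside, not part of the original text; the model, seed, and sample sizes are arbitrary choices) confirms this for normal data with unknown mean: the sample mean is the ML estimate, and its variance over many repeated data sets matches the CRB of σ²/M:

```python
import random

random.seed(2)
M, mu, sigma = 25, 0.0, 1.0        # illustrative values
crb = sigma**2 / M                 # Fisher information is M/sigma^2, so the CRB is sigma^2/M

trials = 20000
estimates = [sum(random.gauss(mu, sigma) for _ in range(M)) / M for _ in range(trials)]
mean_est = sum(estimates) / trials
var_est = sum((e - mean_est)**2 for e in estimates) / (trials - 1)
print(var_est, crb)
```

Here the empirical variance of the estimates agrees with the CRB to within Monte Carlo fluctuation, as expected for an efficient estimator.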
Moreover, the ML estimate is asymptotically efficient; in other words, the minimum variance in (2.15) is attained as the number of samples increases without bound. Thus, for a scalar parameter θ, we have

lim_{M→∞} Var{θ̂} / ⟨−∂² ln pr(g|θ)/∂θ²⟩^{−1} = 1 .   (2.43)

Efficient and unbiased estimates are typically not obtained for samples of finite size, but as the sample size approaches infinity, we desire an estimate that converges toward the true parameter value. Consider an estimate based on the data g from M independent observations, denoted θ̂_M(g). We say that the estimate is conditionally consistent if, for any positive ε and η, no matter how small, there exists some N such that

Pr[‖θ̂_M(g) − θ‖ < ε | θ] > 1 − η   (2.44)

for all M > N. The estimate is unconditionally consistent if (2.44) is satisfied for all θ (Barrett & Myers, 2004). For any given small value of ε, there is a sufficiently large N such that, for all larger sample sizes, the probability of the error ∆θ = ‖θ̂_M(g) − θ‖ being less than ε is as close to 1 as we like. The estimate θ̂_M is said to converge in probability, or to converge stochastically, to the true value θ (Kendall & Stuart, 1979). Thus, the distribution of a consistent estimate becomes increasingly narrow about the true value of the parameter as the number of observations increases. Cramér (1946) proved that, over a broad range of conditions, the ML estimate is consistent. Note that the property of consistency is concerned with the behavior of an estimator as the number of observations tends to infinity, but requires nothing of the behavior for a finite set of observations.

If the mean of θ̂_M, as expressed in (2.36) or (2.37), is equal to θ for all θ and M (Kendall & Stuart, 1979), the estimator is unbiased. The terminology for unbiasedness was introduced by Neyman and Pearson (1936) in the context of hypothesis testing.
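A standard illustration of these finite-sample issues (an aside, not from the original text; the normal model, seed, and sample sizes are illustrative) is the ML estimate of the variance of normal data, which divides by M rather than M − 1 and is therefore biased for finite M yet asymptotically unbiased:

```python
import random

random.seed(7)
mu, sigma, M, trials = 0.0, 2.0, 5, 30000      # M is deliberately small

estimates = []
for _ in range(trials):
    g = [random.gauss(mu, sigma) for _ in range(M)]
    gbar = sum(g) / M
    estimates.append(sum((x - gbar)**2 for x in g) / M)   # ML variance: divide by M

mean_est = sum(estimates) / trials
expected_mean = sigma**2 * (M - 1) / M     # known result: E[sigma_hat^2] = sigma^2 (M-1)/M
print(mean_est, expected_mean, sigma**2)
```

The empirical mean of the estimates falls near σ²(M − 1)/M rather than σ², and the bias factor (M − 1)/M tends to 1 as M grows, consistent with the asymptotic theory.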
An estimable parameter is one for which there exists an unbiased estimator for all true values of the parameter. However, even if an estimate is unbiased for a certain value of the parameter θ, it is not necessarily unbiased under reparameterization for nontrivial functions of θ (Bard, 1974). The properties of unbiasedness and consistency do not imply each other; that is, an unbiased estimate is not automatically consistent, and a consistent estimator is not automatically unbiased. Nonetheless, a consistent estimator whose asymptotic distribution has a finite mean must also be asymptotically unbiased (Kendall & Stuart, 1979).

Fisher established that the sampling distribution of an efficient estimate approaches a Gaussian distribution with minimum variance as the number of samples increases (Fisher, 1922). It can also be shown under general conditions that the sampling distribution of an ML estimate is asymptotically Gaussian due to the central-limit theorem (Cramér, 1946; Daniels, 1961; Huber, 1967; Le Cam, 1970), which states that the distribution of the sum of M independent random variables approaches the Gaussian (or normal) distribution as M is made sufficiently large. To summarize, ML estimators are asymptotically efficient, unbiased, consistent, and normally distributed.

Despite the motivation for using the ML estimator, one might wonder whether there exists an estimation technique that outperforms the ML procedure. Even if an efficient estimate does not exist, there could be an unbiased estimate with a lower variance. The caveat is that there is no general rule for discovering one. For a given estimation task, we can attempt to improve the ML estimator, although the resulting process is typically more complicated and difficult to implement. We therefore embrace the ML approach for its relative simplicity, as well as its optimal use of information in the data, as we shall see in the following sections.
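The asymptotic behavior summarized above can be observed numerically. In the sketch below (illustrative values, not from the original text), the mean-square error of the ML rate estimate for i.i.d. exponential data, λ̂ = 1/(sample mean), shrinks roughly as 1/M as the number of observations grows, consistent with consistency and asymptotic efficiency:

```python
import random

random.seed(3)
true_lam, trials = 2.0, 2000     # illustrative values

def mse_of_rate_estimate(M):
    # Empirical MSE of the ML rate estimate lam_hat = M / sum(g) over repeated data sets
    total = 0.0
    for _ in range(trials):
        g = [random.expovariate(true_lam) for _ in range(M)]
        total += (M / sum(g) - true_lam) ** 2
    return total / trials

mse10, mse100, mse1000 = (mse_of_rate_estimate(M) for M in (10, 100, 1000))
print(mse10, mse100, mse1000)
```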
2.5.5 Invariance

A very useful property of the ML estimator is its invariance under a change in parameterization (Tan & Drossos, 1975), so that we can estimate some function of the parameter θ rather than θ itself. Suppose that f(θ) is an invertible single-valued function defined for all θ, where f is a vector of functions. It can be shown that the ML estimator of f, denoted f̂_ML, is given by (Melsa & Cohn, 1978)

f̂_ML = f(θ̂_ML) .   (2.44)

The proof begins with the inverse of f, denoted f⁻¹, such that f⁻¹[f(θ)] = θ for all θ. The probability density of the data g, conditioned on θ, can then be written as

pr(g|f) = pr[g|f⁻¹(f)] .   (2.45)

If we let f* = f(θ̂_ML), then (2.45) transforms into

pr(g|f*) = pr(g|θ̂_ML) .   (2.47)

Now the definition of the ML estimate θ̂_ML states that

pr(g|f̂_ML) ≥ pr(g|f)   (2.48)

for all f ≠ f̂_ML, or equivalently,

pr(g|θ̂_ML) ≥ pr(g|θ)   (2.49)

for all θ ≠ θ̂_ML. Therefore, it must also hold that

pr(g|f*) ≥ pr(g|f)   (2.50)

for all f ≠ f*, so that f̂_ML = f* = f(θ̂_ML), thereby proving that the ML estimate of a function is simply the function evaluated at the ML estimate.

2.5.6 Sufficiency

In estimation, a sufficient statistic is one that extracts all relevant information from the data and optimizes the performance of a particular estimation task (Fisher, 1922, 1925). The maximum-likelihood estimator is a sufficient statistic, since it makes optimal use of the information in the data (Barrett & Myers, 2004). A necessary and sufficient condition for θ̂ to be a sufficient estimate is that there exists a factorization

L(θ|g) ≡ pr(g|θ) = pr(θ̂|θ) f(g) ,   (2.51)

where pr(θ̂|θ) is a function of θ̂ and θ alone and f(g) is independent of θ. We see from (2.51) that the choice of θ̂ to maximize the log-likelihood is equivalent to choosing θ̂ to maximize pr(θ̂|θ).
This is a special case of the Neyman-Fisher factorization criterion, originally established by Fisher (1922), after which Neyman (1935) developed a method of finding sufficient statistics. The proof of this criterion is beyond the scope of this paper, but a rigorous proof can be found in Halmos and Savage (1949).

The condition for sufficiency in (2.51) has a very interesting consequence. Taking the logarithm of both sides and differentiating leads to

∂ ln L(θ|g)/∂θ ≡ ∂ ln pr(g|θ)/∂θ = ∂ ln pr(θ̂|θ)/∂θ .   (2.52)

By comparing (2.52) with (2.34), the condition of efficiency for the log-likelihood function, we find that such an estimator can exist only if there is a sufficient statistic. In other words, as long as (2.34) is satisfied, (2.52) is also satisfied. Thus, the criterion of efficiency is more restrictive than that of sufficiency. In contrast, even if (2.34) does not hold, we may still have a sufficient estimator.

Example: Normal data with an unknown mean

Consider a data vector g that contains M i.i.d. samples of a normal process with mean µ and variance σ², where µ is unknown. Our goal is to factor the likelihood function into the form in (2.51). The conditional PDF of the data is written as

L(µ|g) ≡ pr(g|µ) = ∏_{m=1}^{M} (2πσ²)^{−1/2} exp[−(g_m − µ)²/(2σ²)] ,   (2.52)

or equivalently,

L(µ|g) = (2πσ²)^{−M/2} exp[−(1/2σ²) Σ_{m=1}^{M} (g_m − µ)²] .   (2.53)

We would like to rewrite (2.53) in the form pr(µ̂|µ) f(g) by employing a standard trick for manipulating normal densities. The sample mean is defined as

µ̂ ≡ ḡ = (1/M) Σ_{m=1}^{M} g_m ,   (2.54)

which leads to

L(µ|g) = (2πσ²)^{−M/2} exp[−(1/2σ²) Σ_{m=1}^{M} (g_m − ḡ + ḡ − µ)²]
       = (2πσ²)^{−M/2} exp{−(1/2σ²) Σ_{m=1}^{M} [(g_m − ḡ)² + 2(g_m − ḡ)(ḡ − µ) + (ḡ − µ)²]} .   (2.55)

Now observe that the middle term vanishes:

Σ_{m=1}^{M} (g_m − ḡ)(ḡ − µ) = (ḡ − µ) Σ_{m=1}^{M} (g_m − ḡ) = M(ḡ − µ)(ḡ − ḡ) = 0 .
(2.56)

The likelihood function becomes

L(µ|g) = (2πσ²)^{−M/2} exp[−(1/2σ²) Σ_{m=1}^{M} (g_m − ḡ)²] exp[−(M/2σ²)(ḡ − µ)²] ,   (2.57)

where the portion independent of µ is given by

f(g) = (2πσ²)^{−M/2} exp[−(1/2σ²) Σ_{m=1}^{M} (g_m − ḡ)²] ,   (2.58)

so that the sampling distribution is

pr(µ̂|µ) = exp[−(M/2σ²)(µ̂ − µ)²] ,   (2.59)

which depends on just µ̂ ≡ ḡ and µ. Therefore, the sample mean is a one-dimensional sufficient statistic for the mean.

2.6 Computer-simulated experiments

Proof of principle can be demonstrated with data from a real experiment or with data obtained through numerical simulation of a real physical system. Simulated experiments can help to determine whether a system design or estimation procedure is likely to succeed, and they may be useful in optimizing system configurations prior to practical application.

The following procedure can be used to determine properties of a sampling distribution pr(θ̂|θ) through a set of simulated experiments:

1. Define the forward model in the design program, denoted f(θ), and the probability distribution of the errors. Assign “true” values to the vector of parameters θ_true.

2. Generate a different vector of errors e_µ for each of N experiments, where µ is the experiment index, drawn from the specified probability distribution. Many computers have available routines for producing pseudorandom numbers, which are effectively random numbers uniformly distributed between 0 and 1. These are then used to generate random numbers according to any other desired distribution. The simulated data g_µ are obtained by adding e_µ to the output of the design program at the true parameters, f(θ_true):

g_µ = f(θ_true) + e_µ .   (2.60)

3. For each experiment, apply the estimation procedure to the simulated data as if they were real data. Each replicated experiment yields an estimate θ̂_µ of the parameters.

4. Properties of the sampling distribution are obtained by averaging over the N replications.
The estimated mean and covariance matrix of the sampling distribution are given, respectively, by

⟨θ̂⟩ = (1/N) Σ_{µ=1}^{N} θ̂_µ ,   (2.61)

K̂ = [1/(N − 1)] Σ_{µ=1}^{N} (θ̂_µ − ⟨θ̂⟩)(θ̂_µ − ⟨θ̂⟩)^t .   (2.62)

The estimated bias b̂ of the estimator is written as

b̂ = ⟨θ̂⟩ − θ_true .   (2.63)

The equations above apply whether the data are simulated or real, although an advantage of simulated experiments is that the true parameters underlying the data are exactly known. Computer-simulated experiments also allow us to examine the effects of model mismatch; that is, we can quantify how much estimation error results from deficiencies in the forward model by purposely using a different model in the estimation procedure than was used to generate the data. In doing so, we can determine the robustness of the estimator (Bard, 1974).

2.7 Nuisance parameters

A nuisance parameter is one that influences the data but is of no immediate interest to the estimation problem. However, it must be factored into the likelihood function in order to completely specify the PDF on the data; otherwise, the results can confound the estimation problem. Suppose we partition the parameter vector as

θ = (α, β) ,

where α represents the parameters of interest and β represents the nuisance parameters. Barrett et al. (2007) propose the following options for handling β:

1. Disregard the problem and let pr(g|θ) ≈ pr(g|α).
2. Replace β with a suitable value β_0 and let pr(g|α, β) ≈ pr(g|α, β_0).
3. Estimate β independently from an auxiliary data set, then apply method (2).
4. Assume or measure a prior distribution pr(β), then marginalize over β.
5. Estimate α and β simultaneously and ignore the estimate of β.

Each of these options leads to considerable practical issues. The first approach is essentially equivalent to ignoring modeling errors, which would undoubtedly lead to errors in the estimates of α.
The second option is no different from assuming a prior distribution pr(β) and treating it as a delta function, which is clearly a strong and unrealistic prior on β. The third approach may lead to better estimates of α, but it requires an additional estimation problem altogether.

Barrett and Myers (2004) show that the optimal strategy is to marginalize over the nuisance parameters rather than estimate them, as in the fourth approach. However, this assumes a meaningful prior distribution pr(β), not one that is simply based on belief or selected for mathematical ease.

Now we will discuss the fifth approach, which is to simultaneously estimate both the parameters of interest and the nuisance parameters, and we shall examine the consequences for the variance of the estimates. We showed in (2.15) that the minimum variance on an estimate of the pth parameter θ_p is given by

\mathrm{Var}\{\hat{\theta}_p\} \geq [F^{-1}]_{pp} , \quad (2.64)

where θ is a P × 1 vector parameter and F is the P × P Fisher information matrix for θ. Suppose, for the sake of this discussion, that only θ_p is unknown and needs to be estimated. Then the minimum variance on the pth parameter becomes

\mathrm{Var}\{\hat{\theta}_p\} \geq \frac{1}{F_{pp}} . \quad (2.65)

For real vectors a and b, the extended Cauchy-Schwarz inequality in matrix form is given by

| a^t b |^2 \leq (a^t K a)(b^t K^{-1} b) , \quad (2.66)

where K is a positive-definite matrix and a^t is the transpose of the column vector a. Equality holds if and only if b = cKa, where c is a scalar. Since F is positive-definite, we can write

| a^t b |^2 \leq (a^t F a)(b^t F^{-1} b) . \quad (2.67)

Now consider e_p, the column vector (0, …, 1, …, 0)^t with the pth component equal to 1 and all other components equal to zero. For a = b = e_p, we have

| e_p^t e_p |^2 \leq (F_{pp})(F^{-1})_{pp} . \quad (2.68)

Since the left side is equal to one, (2.68) gives

\frac{1}{F_{pp}} \leq (F^{-1})_{pp} . \quad (2.69)

Therefore, the minimum variance on the estimated parameter when nuisance parameters are absent is less than or equal to the minimum variance when nuisance parameters are present.

2.8 Gaussian distributions and electronic noise

In any practical electronic system, a substantial number of electrons contribute to the overall fluctuations in a nearly independent fashion. Therefore, Gaussian statistics provide an accurate representation of electronic noise by virtue of the central-limit theorem. We first propose a general form for the PDF, making no simplifying assumptions regarding noise characteristics. Then we introduce assumptions that simplify the probability model and suggest a tractable, yet realistic, PDF on the noise.

If we assume a discrete array of detector elements with electronic coupling between elements, then the noise is correlated. The PDF for electronic noise in the absence of other noise sources is a multivariate Gaussian, given by

\mathrm{pr}(g|\theta) = \frac{1}{(2\pi)^{M/2} \{\det[K(\theta)]\}^{1/2}} \exp\left\{ -\frac{1}{2} [g - \bar{g}(\theta)]^t [K(\theta)]^{-1} [g - \bar{g}(\theta)] \right\} , \quad (2.70)

where K(θ) is the covariance matrix conditioned on θ, det[K(θ)] is its determinant, and both the mean and covariance are functions of θ. If we assume that the detector elements are uncorrelated, then the covariance matrix has diagonal components equal to the variances in the detector elements,

[K(\theta)]_{mm} = \sigma_m^2 , \quad (2.71)

with all other components equal to zero, where σ_m² is the variance of the noise at the mth detector element. Moreover, the determinant of a matrix is the product of its eigenvalues, and for a diagonal matrix the eigenvalues are the diagonal entries:

\det[K(\theta)] = \prod_{m=1}^{M} \sigma_m^2 . \quad (2.72)

The signal generated by the optical illumination is not zero-mean, but assuming the noise is independent of the illumination, the illumination simply shifts the PDF on the noise. Therefore, only the mean data vector in the PDF depends on the underlying parameters.
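The inequality in (2.69) is easy to verify numerically. The sketch below builds an arbitrary positive-definite matrix as a hypothetical stand-in for a Fisher information matrix (it is not derived from any optical system) and checks that each diagonal element of the inverse dominates the reciprocal of the corresponding diagonal element.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical Fisher matrix: F = A^T A + I is positive-definite for any A.
P = 5
A = rng.normal(size=(P, P))
F = A.T @ A + np.eye(P)
F_inv = np.linalg.inv(F)

# Eq. (2.69): for every index p, the bound with the other (nuisance)
# parameters unknown, [F^{-1}]_pp, is at least the bound with them
# known, 1/F_pp.
for p in range(P):
    assert F_inv[p, p] >= 1.0 / F[p, p]
```

The gap between the two sides grows with the off-diagonal coupling in F, which is the numerical face of the variance penalty paid for estimating nuisance parameters simultaneously.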
Finally, the PDF reduces to a product of univariate PDFs, written as

\mathrm{pr}(g|\theta) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\left\{ -\frac{[g_m - \bar{g}_m(\theta)]^2}{2\sigma_m^2} \right\} , \quad (2.73)

where g_m is the measured signal at the mth element. Note that the uncorrelated components are also statistically independent, which is true only for a normal random vector, although the reverse implication always holds (Barrett & Myers, 2004).

Barrett et al. (2007) suggest that the PDF given in (2.73) provides a more accurate representation of the data, since the pixels in commercial CCD detectors may have significant variation in dark current and responsivity. These effects can be corrected on average with digital post-processing by measuring and subtracting a dark-current map, then dividing by a gain map, but this process does not result in a uniform variance across the pixels. For instance, a pixel with low response would be divided by a small gain factor, which would actually enhance the variance non-uniformity. The PDF could be utilized by measuring the variance in each element after corrections are made on the data.

If there is good variance uniformity, we can treat the detector elements as identical with constant variance. Thus the noise is modeled as i.i.d. zero-mean Gaussian:

\mathrm{pr}(g|\theta) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{[g_m - \bar{g}_m(\theta)]^2}{2\sigma^2} \right\} . \quad (2.74)

To perform ML estimation, it is especially convenient in this case to take the logarithm of the PDF:

\ln \mathrm{pr}(g|\theta) = -\frac{M}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{m=1}^{M} [g_m - \bar{g}_m(\theta)]^2 . \quad (2.75)

We immediately observe that the first term on the right is a constant, and the second term is a sum of squares preceded by a minus sign. So maximizing the likelihood for Gaussian i.i.d. data reduces to nonlinear least-squares fitting between the measured data and the average data vector:

\hat{\theta}_{\mathrm{ML}} = \arg\min_{\theta} \sum_{m=1}^{M} [g_m - \bar{g}_m(\theta)]^2 . \quad (2.76)

For Gaussian i.i.d.
data, we will derive the FIM components using (2.14) and (2.75):

F_{jk} = \left\langle \frac{\partial^2}{\partial\theta_j \partial\theta_k} \left\{ \frac{M}{2} \ln(2\pi\sigma^2) + \frac{1}{2\sigma^2} \sum_{m=1}^{M} [g_m - \bar{g}_m(\theta)]^2 \right\} \right\rangle_{g|\theta} . \quad (2.77)

Since the first term in the curly brackets is a constant, we have

F_{jk} = \frac{1}{2\sigma^2} \left\langle \frac{\partial^2}{\partial\theta_j \partial\theta_k} \sum_{m=1}^{M} [g_m - \bar{g}_m(\theta)]^2 \right\rangle_{g|\theta} . \quad (2.78)

Carrying out the differentiation gives

F_{jk} = -\frac{1}{\sigma^2} \left\langle \sum_{m=1}^{M} \frac{\partial}{\partial\theta_j} \left\{ [g_m - \bar{g}_m(\theta)] \frac{\partial \bar{g}_m(\theta)}{\partial\theta_k} \right\} \right\rangle_{g|\theta}
= -\frac{1}{\sigma^2} \left\langle \sum_{m=1}^{M} \left\{ -\frac{\partial \bar{g}_m(\theta)}{\partial\theta_j} \frac{\partial \bar{g}_m(\theta)}{\partial\theta_k} + [g_m - \bar{g}_m(\theta)] \frac{\partial^2 \bar{g}_m(\theta)}{\partial\theta_j \partial\theta_k} \right\} \right\rangle_{g|\theta} . \quad (2.79)

Since the noise is zero-mean, ⟨g_m − ḡ_m(θ)⟩ = 0, so the second term vanishes when averaging over the data, which leads to

F_{jk} = \frac{1}{\sigma^2} \sum_{m=1}^{M} \frac{\partial \bar{g}_m(\theta)}{\partial\theta_j} \frac{\partial \bar{g}_m(\theta)}{\partial\theta_k} . \quad (2.80)

Similarly, for uncorrelated but non-identical pixels as described by (2.71), we have

F_{jk} = \sum_{m=1}^{M} \frac{1}{\sigma_m^2} \frac{\partial \bar{g}_m(\theta)}{\partial\theta_j} \frac{\partial \bar{g}_m(\theta)}{\partial\theta_k} . \quad (2.81)

The FIM components in (2.80) and (2.81) depend solely on the average data vector evaluated at the underlying parameters and are inversely proportional to the variance. Therefore, lower noise levels yield greater Fisher information.

2.9 Practical challenges

In practice, measurement procedures have limited accuracy, repeated measurements of the same quantity produce different values, and the idealized conditions under which the model was derived are never perfectly achievable. We also know that unpredicted randomness, not accounted for in deterministic models, always occurs. The randomness is as much a part of physical reality as the values of the parameters underlying the data, so the model would be incomplete without an accurate description of the random phenomena. One major challenge in ML estimation is that an accurate probability model must be used that includes all sources of randomness. For instance, one might incorporate a Poisson distribution of errors to emulate photon noise, Gaussian statistics for electronic noise, or a combination thereof.
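Equations (2.76) and (2.80) can be exercised together on a toy problem. The sketch below assumes a hypothetical linear forward model ḡ_m(θ) = θ₁x_m + θ₂ (chosen only because its least-squares minimizer and Jacobian are available in closed form; it stands in for the nonlinear optical models used later): it forms the ML estimate as the least-squares fit of (2.76) and the FIM of (2.80) as F = JᵀJ/σ², where J is the Jacobian of the mean data vector.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear mean model gbar_m(theta) = theta[0]*x_m + theta[1].
M = 200
x = np.linspace(0.0, 1.0, M)
J = np.column_stack([x, np.ones(M)])   # Jacobian d gbar_m / d theta_j

theta_true = np.array([1.5, 0.3])
sigma = 0.05
g = J @ theta_true + rng.normal(scale=sigma, size=M)  # i.i.d. Gaussian data

# Eq. (2.76): ML estimation reduces to least-squares fitting; for a linear
# model the minimizer is the ordinary least-squares solution.
theta_ml, *_ = np.linalg.lstsq(J, g, rcond=None)

# Eq. (2.80): FIM for i.i.d. Gaussian noise, F_jk = (1/sigma^2) sum_m J_mj J_mk.
F = J.T @ J / sigma ** 2
crb = np.linalg.inv(F)   # Cramer-Rao bound on the estimate covariance
```

Lowering σ scales F up and the bound in (2.64) down, which is the "lower noise yields greater Fisher information" statement in numerical form.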
Since the systems discussed in this research are not light-starved as in astronomical applications, we will adhere to a Gaussian noise model.

Careful attention must also be paid to avoiding misalignments of the test optics, since such misalignments would manifest as errors in the estimates. One solution is to treat uncertain alignments, distances, magnifications, and so on as nuisance parameters, then utilize one of the methods in Section 2.7 for their treatment.

We mentioned in Chapter 1 that a significant limitation of inverse optical design is that the ML estimation step is very computationally intensive, particularly for complicated probability surfaces. Making the method practical requires rapid processing techniques and dedicated computer hardware. Several hardware platforms are available for implementing parallel algorithms, including the graphics processing unit (GPU) in video cards, which is capable of massively parallel high-performance computing in scientific and engineering fields. We will elaborate on GPU technology in Section 6.2.

CHAPTER 3

OPTIMIZATION METHODS

As we discussed in Chapter 2, algorithms that implement the ML approach to estimating parameters θ perform a search through the space defined by all possible values to find a point that maximizes the probability of generating the observed data g:

(\hat{\theta}_1, \hat{\theta}_2, \hat{\theta}_3, \ldots) = \arg\max_{\theta_1, \theta_2, \theta_3, \ldots} [\, \log \mathrm{pr}(g \,|\, \theta_1, \theta_2, \theta_3, \ldots) \,] . \quad (3.1)

Practically all search (i.e., optimization) algorithms locate an extremum through iterative methods that execute in a variable number of steps depending on the starting location, the complexity of the probability surface, and the values selected for convergence factors. In Section 3.1, we describe various considerations in selecting a suitable search algorithm, while distinguishing between local and global algorithms.
Section 3.2 provides a qualitative overview of global optimization algorithms and distinguishes stochastic algorithms from deterministic ones. Much of this chapter is dedicated to the simulated annealing (SA) algorithm for global optimization, covered in Section 3.3. We begin with a general overview of SA, including its many desirable properties and its ability to handle extremely complicated functions. Embedded in each phase of the SA algorithm is the Metropolis algorithm, which was originally developed in the context of statistical mechanics. We discuss the fundamental concepts of statistical mechanics that are vital to understanding the origins of SA, as well as its applicability to general optimization problems based on an interesting analogy to thermodynamics. Lastly, we describe in detail the specific SA algorithm used in this research for continuous minimization of multimodal functions, and we explain the different factors in choosing an appropriate annealing schedule.

We saw in Chapter 2 that ML estimation reduces to nonlinear least-squares fitting (i.e., minimizing the sum of squares) for an i.i.d. Gaussian noise model, so we will regard the optimization problem as minimization for the sake of this discussion. Nothing is sacrificed by restricting our attention to minimization, since maximizing a function is equivalent to minimizing its negative.

3.1 Selecting a search algorithm

If the objective (or cost) function of interest is well-behaved and unimodal within a specified domain, there are many search algorithms to choose from to solve the optimization problem. Direct-search algorithms, which accept only downhill moves along the surface to be minimized, rely exclusively on values of the objective function; therefore they are relatively easy to implement.
Classical direct-search methods, at least in the realm of unconstrained minimization, include pattern-search methods (Hooke & Jeeves, 1961; Polak, 1971; Davidon, 1991; Torczon, 1997), simplex methods (Spendley, Hext, & Himsworth, 1962; Nelder & Mead, 1965), and methods with adaptive sets of search directions (Rosenbrock, 1960; Powell, 1964, 1965). Fletcher (1965) provides an excellent review of direct-search algorithms.

Other available search algorithms incorporate the derivatives of the objective function, such as the gradient-descent or conjugate-gradient method (Fletcher & Reeves, 1963, 1964; Hestenes & Stiefel, 1952; Hestenes, 1969) and Newton's method or quasi-Newton methods (Greenstadt, 1967; Spang, 1962). Although derivative-based methods are often faster and more reliable than direct-search algorithms, they are liable to terminate far from the true solution if the objective function is ill-conditioned (Corana, Marchesi, Martini, & Ridella, 1987).

Conversely, if the objective function contains multiple minima, straightforward minimization will terminate at the nearest local solution, even if the global optimum lies some distance away. A multimodal function can occur for a number of reasons, for instance, if the mean data vector is a nonlinear functional of the parameters (Barrett & Myers, 2004). In inverse optical design, strong coupling occurs between various pairs of prescription parameters, resulting in a complicated probability surface with many local minima. The effects of parametric coupling are also evidenced in the Fisher information matrices, many of which will be presented in Chapters 5 – 7. Since the number of local minima in the domain of interest can increase exponentially with the number of parameters to be estimated, an exhaustive search for the global minimum would require an impractical number of function evaluations. Thus the search must be limited somehow, either deterministically or stochastically.

Fig.
3.1: A multimodal test function in two dimensions, exhibiting a high degree of nonlinearity and various local minima.

3.2 Global optimization algorithms

Global optimization algorithms commonly have two phases. An exhaustive global search in parameter space is performed, iteratively identifying a promising starting point to be used in a local search. A local minimum is then determined from each starting point, usually through a deterministic local-descent algorithm that is executed by the global phase. This is performed as a black-box procedure without deeper insight into the local structure of the objective function, which makes the process amenable to broad classes of problems. The global phase is either stochastic or deterministic. In contrast to deterministic global phases, stochastic global phases are often heuristic in nature (Liberti & Maculan, 2006).

Deterministic algorithms usually operate with a divide-and-conquer scheme: the search space is recursively partitioned into smaller subspaces. Each subspace is then solved globally, and upper and lower bounds of the objective function in the subspace are computed to test for optimality of the local minimum. If the difference between the bounds is less than a specified threshold, the local optimum is regarded as the global solution. In every step of a deterministic algorithm, there exists at most one way to proceed; if no way to proceed exists, the algorithm terminates. Since deterministic algorithms do not use random numbers in their instructions, a given input always produces the same output. Although deterministic algorithms can guarantee optimality and precision in their solutions, they are less efficient in problems with high dimensionality or complicated features, and perform at their best on small- to medium-scale problems with more obvious algebraic formulations.
Stochastic global phases find the starting points for local searches through random sampling, by trying to escape from the basin of the current local minimum, or by employing a combination of the two techniques. Stochastic algorithms include at least one instruction that incorporates random numbers, thus violating the condition of determinism. Although stochastic global phases do not guarantee a specified level of optimality in their solutions, they do guarantee asymptotic convergence to the global minimum as the number of evaluated points in the search space increases. They also tend to be very efficient, although the level of efficiency strongly depends on tuning parameters such as the sampling intensity, escaping capacity, and termination criteria (Liberti & Maculan, 2006). These parameters are usually found empirically and will vary depending on the behavior of the objective function.

Examples of deterministic global phases are Branch-and-Select (Tuy, 1998), spatial Branch-and-Bound (Falk & Soland, 1969), and state-space search. Some of the many examples of stochastic global optimization algorithms are multistart, genetic algorithms (Goldberg & Richardson, 1987), differential evolution (Storn & Price, 1997), adaptive Lagrange-multiplier methods (Wah & Wang, 1999), dynamic tunneling methods (Levy & Montalvo, 1985; RoyChowdury, Singh, & Chansarkar, 2000), and variable neighborhood search (Hansen & Mladenović, 2001, 2002). Another example is simulated annealing (Kirkpatrick, Gelatt, & Vecchi, 1983, 1984), which was investigated for inverse optical design.

3.3 Simulated annealing

3.3.1 Overview

Simulated annealing is a feasible candidate for the task of global optimization, since it provides a good approximation to the global optimum of a function in a high-dimensional search space with many local extrema.
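The simplest stochastic global phase, multistart, is easy to sketch. The example below is a toy illustration, not part of this research's implementation: the multimodal 1-D objective and the naive fixed-step coordinate-descent local phase are both hypothetical stand-ins for the real objective and local algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical multimodal test objective; the global minimum is at x = 0,
# flanked by local minima near multiples of pi/5.
def Q(x):
    return x ** 2 + 2.0 * np.sin(5.0 * x) ** 2

# Deterministic local phase: naive fixed-step descent on a 1e-3 lattice,
# stopping when neither neighboring lattice point is lower.
def local_descent(x, step=1e-3, iters=5000):
    for _ in range(iters):
        if Q(x + step) < Q(x):
            x += step
        elif Q(x - step) < Q(x):
            x -= step
        else:
            break
    return x

# Stochastic global phase (multistart): random starting points, each handed
# to the local phase as a black box; keep the best local minimum found.
starts = rng.uniform(-3.0, 3.0, size=20)
minima = [local_descent(s) for s in starts]
best = min(minima, key=Q)
```

Every local solution is at least as good as its starting point, and the returned point is a (lattice) local minimum; with enough random starts, one of them lands in the global basin with high probability.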
In fact, it has vast utility in large-scale problems with up to tens of thousands of variables (Kirkpatrick et al., 1983, 1984; Romeo, Vincentelli, & Sechen, 1984; Smith, Barrett, & Paxman, 1983; Smith, Paxman, & Barrett, 1985; White, 1984). Simulated annealing is a stochastic algorithm that allows transitions out of a local optimum during the search based on a probability criterion. It is able to process objective functions with high degrees of nonlinearity, discontinuities, and randomness due to noise sources. It can also distinguish between gross features and finer "wrinkles" in the function. Macroscopic features of the eventual state of the system appear earlier in the search process, when the system explores high-cost configurations irrespective of small local minima; finer details develop later in the search, when the system is less likely to escape from the current local basin. Since simulated annealing performs constrained optimization, the search space can contain arbitrary boundary conditions and constraints, allowing one to incorporate a priori information about the system.

Although simulated annealing guarantees the true solution under very stringent conditions, satisfying these requirements would lead to the global optimum much too slowly for practical use; the conditions are instead relaxed to trade off computational time against optimality of the solution (Barrett & Myers, 2004). Even with this compromise, if simulated annealing does not find the true solution, it will find a near-optimal one (Corana et al., 1987). While simulated annealing is very promising in many respects, its major criticism is its high computational demand compared to straightforward optimization methods.

Simulated annealing was originally developed by Metropolis, A. W. Rosenbluth, M. N.
Rosenbluth, and Teller (1953) from the perspective of statistical mechanics, an application of probability theory in thermal physics, which includes a mathematical framework for dealing with large populations of atoms or molecules. Metropolis created a simple algorithm to simulate the thermodynamic equation of state for a complex system of atoms at a given temperature in thermal equilibrium. Since the number of atoms is on the order of 10^23 per cubic centimeter, ensemble averages were replaced by sample averages through Monte Carlo sampling, more specifically the technique known as Markov-chain Monte Carlo (MCMC). It was later discovered by Kirkpatrick et al. (1983, 1984) that the Metropolis algorithm could be applied to general optimization problems by using its underlying concepts to simulate the physical process of annealing in materials science.

Annealing is the heating of a substance, such as a crystal, to a molten state, followed by a gradual reduction in temperature until the crystalline structure is frozen in. Each temperature should be held long enough for the substance to reach equilibrium, and more time must be spent near the freezing point. If done properly, the material will reach a crystalline state of high order and translational symmetry, with the ground state being the configuration of perfect order and minimum energy. However, quenching will occur if the temperature is lowered too rapidly: the substance will depart from equilibrium, and the crystal will lock in many irregularities and defects, with an energy level higher than that of a perfect crystal.

Kirkpatrick was particularly interested in the optimal design of integrated circuits on computer chips, densely wired with an elaborate network of interconnections and electronic gates. The design variables involved the placement of the gates and the partitioning of electrical components, while the objective function was a measure of system performance.
In the analogy to thermodynamics, the configuration of gates and components corresponds to the atomic positions in a gas or liquid, while the objective function corresponds to energy. Thus the state of lowest cost represents the ground state, or the state of lowest energy. Kirkpatrick essentially wanted to minimize the length of connections, given that wire lengths were proportional to time delays in signal propagation. However, the configuration with the shortest possible wires did not necessarily give the best solution, because it would likely lead to congestion and noise such as interference between nearby wires. Since the objective function was defined in a discrete domain, or configuration space, this was a problem in combinatorial optimization. Nevertheless, the method was later modified to perform continuous optimization of objective functions of continuous variables, which will be described in Section 3.3.4.

The classic example of combinatorial optimization is the traveling salesman problem, described by Kirkpatrick et al. (1983): "Given a list of N cities and a means of calculating the cost of traveling between any two cities, one must plan the salesman's route, which will pass through each city once and return finally to the starting point, minimizing the total cost." Many problems involving scheduling and design, such as those in computer science and engineering, are akin to the traveling salesman problem. Two secondary problems are to predict the expected cost of the salesman's route and to estimate the computing effort required to determine the route.

Fig. 3.2: Illustration of the travelling salesman problem and its solution.

3.3.2 Basic concepts in statistical mechanics

In an ensemble of N identical particles, such as in a gas or liquid, the state of the nth particle is defined by its position r_n and velocity v_n in 3D space.
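The combinatorial ingredients that simulated annealing needs for the traveling salesman problem, a tour-cost "energy" and a neighborhood move, can be sketched in a few lines. The city coordinates and the pair-swap move below are hypothetical illustrations; for a tiny N the exact optimum is still available by brute force, giving a benchmark that a stochastic search would try to approximate.

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical city layout: N random points in the unit square.
N = 7
cities = rng.uniform(size=(N, 2))

def tour_cost(order):
    """Total length of the closed route visiting the cities in 'order'
    (the role of energy in the annealing analogy)."""
    pts = cities[list(order)]
    return float(np.sum(np.linalg.norm(pts - np.roll(pts, -1, axis=0), axis=1)))

def random_neighbor(order):
    """Neighboring configuration: swap two randomly chosen cities."""
    i, j = rng.choice(N, size=2, replace=False)
    new = list(order)
    new[i], new[j] = new[j], new[i]
    return new

# For N = 7 an exhaustive search over all 5040 tours is still feasible.
best = min(itertools.permutations(range(N)), key=tour_cost)
```

For realistic N the factorial growth of the tour count is exactly why the exhaustive search is replaced by a limited, stochastic one.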
Each possible system configuration is described by a set of 6N coordinates, which can be regarded as a point in 6N-dimensional phase space (Barrett & Myers, 2004). The energy of the ensemble is denoted by ε({r_n, v_n}), where the brackets comprise the set of N particles; in the jth state of the system, ε_j = ε({r_nj, v_nj}). Due to the random behavior of the system, the energy fluctuates about some average value, where the average is taken over the ensemble of identical systems. Assuming the system is in thermal contact with a heat bath, both the temperature and the mean energy remain constant.

One of the most useful functions in statistical mechanics is the partition function, defined as

Z \equiv \sum_j \exp\left( -\frac{\varepsilon_j}{k_B T} \right) = \sum_j \exp\left( -\frac{\varepsilon_j}{\tau} \right) . \quad (3.2)

The summation is over all possible states of the system, where each state is weighted by its Boltzmann factor, given by exp(−ε_j / k_B T). Here, ε_j is the energy of the system in state j, k_B is Boltzmann's constant with units of energy per kelvin, and T is the absolute temperature in kelvin. Note that the fundamental temperature τ = k_B T differs from the absolute temperature by the scale factor k_B and has units of energy. The partition function is the normalizing factor between the probability Pr(j) of the system being in state j at thermal equilibrium and the respective Boltzmann factor:

\Pr(j) = \frac{1}{Z} \exp\left( -\frac{\varepsilon_j}{\tau} \right) . \quad (3.3)

Derivatives, as well as logarithms, of the partition function lead to a series of important thermodynamic quantities. For instance, the mean energy of a system is obtained by averaging over all states of the ensemble:

\langle \varepsilon(\tau) \rangle \equiv \bar{\varepsilon}(\tau) = \sum_j \varepsilon_j \Pr(j) . \quad (3.4)

This ensemble-average energy represents the states of a system that can exchange energy with a reservoir. Using the relationship

\frac{\partial \ln Z}{\partial Z} = \frac{1}{Z} , \quad (3.5)

we have

\bar{\varepsilon}(\tau) = \frac{1}{Z} \sum_j \varepsilon_j \exp\left( -\frac{\varepsilon_j}{\tau} \right) = -\frac{1}{Z} \frac{\partial Z}{\partial (1/\tau)} = -\frac{\partial \ln Z}{\partial Z} \frac{\partial Z}{\partial (1/\tau)} = -\frac{\partial \ln Z}{\partial (1/\tau)} . \quad (3.6)

Rearranging further gives

\bar{\varepsilon}(\tau) = -\frac{\partial \ln Z}{\partial (1/\tau)} = -\frac{\partial \ln Z}{\partial \tau} \frac{\partial \tau}{\partial (1/\tau)} = \tau^2 \frac{\partial \ln Z}{\partial \tau} . \quad (3.7)

The entropy of the system, denoted σ(τ), is essentially the logarithm of the "number of ways that the state can be constructed from indistinguishable molecules" (Barrett & Myers, 2004), or the number of possible configurations in the ensemble of particles. The fundamental entropy is given by

\sigma(\tau) = -\sum_j \Pr(j) \ln \Pr(j) = \ln(Z) + \frac{\bar{\varepsilon}(\tau)}{\tau} . \quad (3.8)

Entropy is central to the second law of thermodynamics, also called the law of increase of entropy, which states that the entropy of a thermally isolated closed system cannot decrease; if a constraint internal to the system is removed, then the entropy tends to increase (Kittel & Kroemer, 1980). The probability distribution that maximizes the statistical entropy is the Boltzmann distribution (Bonomi & Lutton, 1984).

Another important quantity that utilizes the partition function is the Helmholtz free energy, which is related to the logarithm of Z and carries information regarding the mean energy and the entropy:

F(\tau) = -\tau \ln(Z) = \bar{\varepsilon}(\tau) - \tau \sigma(\tau) . \quad (3.9)

The free energy conveys how to "balance the conflicting demands of a system for minimum energy and maximum entropy" and is at a minimum when the system is coupled to a reservoir, provided that the volume is constant (Kittel & Kroemer, 1980).

In fundamental units, the heat capacity of a system (at constant volume) is defined as the rate of change of the energy with temperature, which can be shown mathematically to be proportional to the variance in energy:

C_V(\tau) \equiv \frac{\partial \bar{\varepsilon}(\tau)}{\partial \tau} = \frac{\langle \varepsilon(\tau)^2 \rangle - \langle \varepsilon(\tau) \rangle^2}{\tau^2} . \quad (3.10)

Observe that the heat capacity according to (3.10) is a dimensionless quantity. In conventional units (i.e., energy per kelvin), however, it is written as

C_V(T) \equiv \frac{\partial \bar{\varepsilon}(T)}{\partial T} = \frac{\langle \varepsilon(T)^2 \rangle - \langle \varepsilon(T) \rangle^2}{k_B T^2} . \quad (3.11)

We can prove the relationship in (3.10) by first expanding the mean energy in terms of the Boltzmann factors, as shown in (3.6):

C_V(\tau) \equiv \frac{\partial \bar{\varepsilon}(\tau)}{\partial \tau} = \frac{\partial}{\partial \tau} \left[ \frac{1}{Z(\tau)} \sum_j \varepsilon_j \exp(-\varepsilon_j / \tau) \right] . \quad (3.12)

Applying the derivative leads to

C_V(\tau) = \frac{1}{Z(\tau)} \sum_j \frac{\varepsilon_j^2}{\tau^2} \exp(-\varepsilon_j / \tau) - \frac{1}{Z(\tau)^2} \frac{\partial Z(\tau)}{\partial \tau} \sum_j \varepsilon_j \exp(-\varepsilon_j / \tau)
= \frac{1}{\tau^2} \left[ \frac{1}{Z(\tau)} \sum_j \varepsilon_j^2 \exp(-\varepsilon_j / \tau) - \left( \frac{1}{Z(\tau)} \sum_j \varepsilon_j \exp(-\varepsilon_j / \tau) \right)^2 \right] , \quad (3.13)

and using the Boltzmann probability in (3.3), we obtain

C_V(\tau) = \frac{1}{\tau^2} \left[ \sum_j \varepsilon_j^2 \Pr(j) - \left( \sum_j \varepsilon_j \Pr(j) \right)^2 \right] . \quad (3.14)

The first and second sums are simply the expected values of ε_j² and ε_j, respectively, so that (3.14) is identical to (3.10). The heat capacity per unit mass is called the specific heat.

An abrupt change in the heat capacity (or specific heat) with a small change in temperature indicates a phase transition of a thermodynamic system. Accordingly, a large value of the heat capacity signifies a change in state of a system, which can be used during optimization to indicate that freezing has begun and that gradual cooling is required to avoid quenching (Kirkpatrick et al., 1983).

3.3.3 The Metropolis algorithm

For complex thermodynamic systems, the partition function for the entire ensemble of particles can be extremely difficult to calculate. In 1953, Metropolis et al. developed a simple algorithm to simulate the equation of state for a many-body system in thermal equilibrium at a given temperature. In this approach, many random samples of various molecular (or atomic) configurations {r_n, n = 1, …, N} are generated through Monte Carlo sampling; thus properties of the system can be estimated by using sample averages in lieu of ensemble averages.

The algorithm proceeds iteratively: at every iteration, a particle undergoes a small random displacement, resulting in a change of energy ∆ε in the configuration of particles.
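For a system with only a few discrete states, the quantities above can be computed directly, and the variance form of the heat capacity in (3.10) can be checked against a numerical derivative of the mean energy. The energy levels and temperature below are hypothetical, chosen only to make the check concrete; fundamental units are used, so τ carries units of energy and k_B is absorbed.

```python
import numpy as np

# Hypothetical discrete energy levels and fundamental temperature.
eps = np.array([0.0, 1.0, 1.5, 3.0])
tau = 0.7

boltz = np.exp(-eps / tau)
Z = boltz.sum()                 # Eq. (3.2): partition function
P = boltz / Z                   # Eq. (3.3): equilibrium probabilities

mean_E = np.sum(eps * P)        # Eq. (3.4): ensemble-average energy
var_E = np.sum(eps ** 2 * P) - mean_E ** 2

# Eq. (3.10): C_V equals the energy variance over tau^2; verify against a
# central-difference derivative of the mean energy.
def mean_energy(t):
    b = np.exp(-eps / t)
    return np.sum(eps * b) / b.sum()

d = 1e-6
C_v = (mean_energy(tau + d) - mean_energy(tau - d)) / (2 * d)
assert np.isclose(C_v, var_E / tau ** 2, rtol=1e-4)
```

Scanning τ and watching C_V(τ) for a sharp peak is precisely the diagnostic mentioned above for detecting the onset of freezing during annealing.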
If ∆ε ≤ 0, the new configuration is automatically accepted, and the system moves to a state of lower energy. If ∆ε > 0, the configuration is accepted with probability Pr(∆ε) = exp(−∆ε/τ). To implement this, a random number ξ uniformly distributed in the range (0, 1) is drawn; if ξ < Pr(∆ε), the new configuration is accepted and the particle moves to its new position, but if ξ > Pr(∆ε), the old configuration is retained. In either case, the current configuration contributes to the calculation of sample averages:

\bar{F} = \frac{1}{M} \sum_{i=1}^{M} F_i , \quad (3.15)

where F represents a property of the system, i is the iteration number, and M is the total number of iterations so far. Through repetitive execution of the basic step described above, one can simulate the thermal motion of particles in equilibrium at a fixed temperature. The choice of Pr(∆ε) means that the system approaches a Boltzmann distribution, as long as all possible states can eventually be reached (Barrett & Myers, 2004). The method is said to be ergodic if this condition is satisfied, that is, if any state can be reached from any other in a finite number of steps.

3.3.4 Continuous minimization by simulated annealing

An adaptive form of simulated annealing was developed by Corana et al. (1987) for minimizing multimodal functions in a continuous domain, in other words, for determining the position in N-dimensional space of the minimum of a given function of N variables. This algorithm considers unique physical constraints (e.g., different finite ranges) for each parameter, as well as different sensitivities of the cost function along different parametric axes. It also optimizes computational time by attempting to maintain a one-to-one ratio between the number of accepted and rejected moves, thereby searching the parameter space more efficiently. The precise method and description of the algorithm as implemented by Corana are outlined in the remainder of this section.
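The Metropolis acceptance rule of Section 3.3.3, which the continuous algorithm reuses with ∆Q in place of ∆ε, can be sketched in a few lines. This is an illustrative stand-alone function, with hypothetical ∆ values and temperatures chosen to show the limiting behavior.

```python
import numpy as np

rng = np.random.default_rng(5)

def metropolis_accept(delta, tau):
    """Metropolis criterion: always accept downhill moves (delta <= 0);
    accept uphill moves with probability exp(-delta / tau)."""
    if delta <= 0:
        return True
    return rng.uniform() < np.exp(-delta / tau)

# At high temperature almost every uphill move is accepted, giving a nearly
# random walk; at low temperature almost none are, giving pure descent.
hot = np.mean([metropolis_accept(1.0, 100.0) for _ in range(2000)])
cold = np.mean([metropolis_accept(1.0, 0.01) for _ in range(2000)])
```

The two empirical acceptance rates bracket the annealing process: a schedule that lowers τ moves the system gradually from the first regime to the second.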
Method

Suppose Q(θ) is the objective function to minimize, where θ = {θ_n, n = 1, …, N} is a vector in R^N. The N variables each range over a continuous, but finite, interval: a_1 < θ_1 < b_1, …, a_N < θ_N < b_N. Though Q must be bounded, it may be discontinuous. To be clear, θ_n^i denotes the nth component of θ at the ith iteration.

The algorithm has an iterative scheme, which is outlined in Fig. 3.3. Starting with a given point θ^0, called the initial guess, it generates a succession of points, θ^0, θ^1, …, θ^i, …, in search of the minimum of the function. The initial guess can be made by the use of prior information, such as estimates computed from previous studies, known values determined from other systems, or values calculated through theoretical means.

Fig. 3.3: The simulated annealing algorithm implemented by Corana et al. (1987).

The set of points that are accessible in a single step are said to be in the neighborhood of the current point. A new candidate point is generated in the neighborhood of the current point θ^i by making a random move along a single coordinate direction; this step is repeated for the remaining directions, yielding a cycle through all variables. The step vector v, also a vector in R^N, represents the size of the neighborhood around θ^i. New coordinate values are uniformly distributed in a bracket centered on the respective coordinate of θ^i, and half the size of this bracket for each coordinate is recorded in v. A new point is discarded if it falls outside the definition domain of Q, and another point is generated until one within the definition domain is obtained.

The Metropolis criterion determines whether a candidate point θ′ is accepted or rejected (Metropolis et al., 1953):

If ∆Q ≤ 0, then accept the new point: θ^{i+1} = θ′,
else accept the new point with probability Pr(∆Q) = exp(−∆Q/τ),

where ∆Q = Q(θ′) − Q(θ^i) and τ is an effective temperature.
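The single-coordinate candidate-generation step described above can be sketched as follows. The bounds, step vector, and current point are hypothetical values, not taken from any system in this research; the resampling loop implements the rule that points outside the definition domain are discarded.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical box constraints a_n < theta_n < b_n and a step vector v
# holding the half-width of the sampling bracket along each coordinate.
a = np.array([-1.0, 0.0])
b = np.array([1.0, 5.0])
v = np.array([0.3, 1.0])

def candidate(theta, h):
    """Trial point: uniform random move along coordinate h only, resampled
    until the new point falls inside the definition domain."""
    while True:
        new = theta.copy()
        new[h] = theta[h] + rng.uniform(-v[h], v[h])
        if a[h] < new[h] < b[h]:
            return new

theta = np.array([0.5, 2.0])
trial = candidate(theta, 0)   # move along coordinate 0; coordinate 1 is fixed
```

Cycling h through all N coordinates, with each trial point passed through the Metropolis criterion, reproduces one full cycle of the search at a fixed temperature.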
Therefore, downhill moves are always accepted, while uphill moves are made probabilistically. At a given temperature, the succession of points θ^0, θ^1, …, θ^i, … is not purely downhill, except at τ = 0, when uphill moves are no longer possible. For large values of τ >> ∆Q_avg, where the average is taken over random pairs of points inside the definition domain of Q, nearly all new points are accepted, resulting in a random sampling of Q over the entire definition domain.

The search begins at a high temperature τ_0 declared by the user. A succession of points θ^0, θ^1, …, θ^i, … is generated until the system equilibrates, that is, when the average value of the objective function Q_avg converges as the number of iterations increases. In the thermodynamic analogy, Q_avg corresponds to the internal energy of the system. The best point obtained is recorded as θ_opt, which is used as a starting point for the next temperature phase. The temperature τ is reduced after each phase according to an annealing schedule, which will be discussed later. At each temperature, the step vector v_m is periodically adjusted to adapt to the function's behavior, attempting to equalize the number of accepted and rejected moves; the index m describes successive step-vector adjustments during a single temperature phase. The process is terminated at a very low temperature when no further improvement is possible, according to a terminating criterion.

In the optimization context, an iterative search accepting only configurations that lower the objective function is like rapidly quenching a thermodynamic system, so it is most likely to become trapped in a metastable, local minimum. Conversely, the effective temperature τ acts as a control parameter that allows the system to make uphill moves and explore high-cost configurations. At higher temperatures, the system is more sensitive to the large-scale features of the function.
As the temperature decreases, finer details of the surface emerge, while the system becomes less likely to escape a local basin. Although simulated annealing does not provide a certificate of optimality for the final point, the method continues to search for a better solution in the presence of many local minima.

Annealing schedule

The simulated annealing algorithm converges to the true minimum provided the temperature is proportional to the reciprocal of the logarithm of the iteration number:

τ_k = τ_0 (ln k_0)/(ln k) ,   (3.16)

where τ_k is the temperature, k is the iteration number, and k_0 is some starting index. For large k, (3.16) can be rewritten as

Δτ = −τ_0 [ln k_0 / (k (ln k)²)] Δk ,  k ≫ 1,   (3.17)

or, using recursive indices,

τ_{k+1} = τ_k − τ_0 [ln k_0 / (k (ln k)²)] .   (3.18)

Such a logarithmic temperature schedule is consistent with the Boltzmann algorithm, but it converges to the true minimum much too slowly for practical use (Ingber, 1993). Thus, in practice, the optimality of the solution is traded for better computational time. To expedite the search process, some researchers employ an exponential annealing schedule:

τ_{k+1} = r τ_k ,  0 < r < 1,   (3.19)

where r is the reduction coefficient. When the iteration number k is large,

Δτ/τ_k = (r − 1) Δk ,  k ≫ 1,   (3.20)

or equivalently,

τ_k = τ_0 exp[k(r − 1)] .   (3.21)

The tail of the exponential function enforces a gradual decline in temperature toward the end of the search, when freezing takes place. Algorithms that use a rapid cooling schedule are referred to as simulated quenching (Gillet & Sheng, 1999; Ingber, 1996; Sato, 1997).

One of the many questions that arise when choosing an efficient annealing schedule concerns the choice of starting temperature τ_0. In combinatorial problems, a recommended value is on the order of magnitude of the standard deviation of the objective function in the domain of interest (White, 1984).
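White's recommendation can be implemented by sampling the objective at random feasible points; a minimal sketch under that assumption (the function name and sampling scheme are ours):

```python
import random
import statistics

def initial_temperature(q, lower, upper, n_samples=1000, rng=random):
    """Estimate tau_0 as the standard deviation of Q over uniformly
    random points in the definition domain (White, 1984)."""
    samples = []
    for _ in range(n_samples):
        theta = [rng.uniform(a, b) for a, b in zip(lower, upper)]
        samples.append(q(theta))
    return statistics.stdev(samples)
```

The estimate costs n_samples extra function evaluations, which is usually negligible next to the annealing run itself.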
In the case of continuous optimization, however, this leads to starting temperatures that are higher than necessary, thereby wasting computational time (Corana et al., 1987).

The success or failure of the simulated annealing algorithm often depends on the choice of annealing schedule, which involves important parameters such as the starting temperature, the reduction rate between successive temperature phases, the number of function evaluations at each temperature, and the time at which to terminate the search process. An effective combination of control parameters is best determined empirically for a given problem, by carefully monitoring the optimization process. (This constitutes yet another optimization problem!) Nonetheless, physical intuition about the problem also plays an instrumental role in simulated annealing.

The algorithm

The following is a description of the algorithm used in this research, as outlined by Corana et al. (1987), apart from notational changes:

Step 0 (Initialization)

Specify:
 A starting point in the parameter space, θ^0.
 An initial temperature, τ_0.
 An initial step vector, v_0.
 A terminating criterion δ and a number of consecutive temperature reductions, N_δ, over which to test for termination.
 A number of cycles between step-vector adjustments, N_S, and a varying criterion c.
 A number of step adjustments between temperature reductions, N_T, and a reduction coefficient r_T.

Set i, j, m, k to 0, where i is the index for successive points, j denotes successive cycles along every direction, m denotes successive step adjustments, and k denotes successive temperature reductions. Set h to 1, where h is the index of the direction along which the trial point is generated, starting from the last accepted point.
Compute Q_0 = Q(θ^0).
Set θ_opt = θ^0 and Q_opt = Q_0.
Set n_u = 0, u = 1, …, N.
Set Q*_u = Q_0, u = 0, −1, …, −N_δ + 1.
Step 1

Starting from the point θ^i, generate a random point θ′ along direction h:

θ′ = θ^i + r v_{m,h} e_h ,   (3.22)

where r is a random number generated in the range (−1, 1) by a pseudorandom number generator, e_h is the unit vector of the hth coordinate direction, and v_{m,h} is the component of the step vector v_m along that direction.

Step 2

If the hth coordinate of θ′ lies outside the definition domain of Q (i.e., if θ′_h < a_h or θ′_h > b_h), then return to Step 1.

Step 3

Compute Q′ = Q(θ′).
If Q′ ≤ Q_i, then accept the new point:
 set θ^{i+1} = θ′ and Q_{i+1} = Q′;
 add 1 to i and to n_h;
 if Q′ < Q_opt, then set θ_opt = θ′ and Q_opt = Q′.
Else (Q′ > Q_i), accept or reject the point with probability p (Metropolis criterion):

p = exp[(Q_i − Q′)/τ_k] .   (3.23)

Generate a random number p′ uniformly distributed over the range (0, 1) and compare it to p. If p′ < p, the point is accepted; otherwise it is rejected. In the case of acceptance:
 set θ^{i+1} = θ′ and Q_{i+1} = Q′;
 add 1 to i and to n_h.

Step 4

Add 1 to h. If h ≤ N, then go to Step 1; else set h to 1 and add 1 to j.

Step 5

If j < N_S, then go to Step 1; else update the step vector v_m. For each direction u, the new step-vector component v′_u is

v′_u = v_{m,u} [1 + c_u (n_u/N_S − 0.6)/0.4]  if n_u > 0.6 N_S ,   (3.24)

v′_u = v_{m,u} / [1 + c_u (0.4 − n_u/N_S)/0.4]  if n_u < 0.4 N_S ,   (3.25)

v′_u = v_{m,u}  otherwise.   (3.26)

Set v_{m+1} = v′, set j to 0, set n_u to 0 for u = 1, …, N, and add 1 to m.

The control parameter for the step variation along the uth direction is denoted c_u. The modifications in step length attempt to equalize the numbers of accepted and rejected moves for maximal efficiency. A high number of accepted moves indicates that the trial points are too close to the current ones and the system evolves too slowly, whereas a high number of rejected moves means that the candidate points are too distant. Either case results in lower computational efficiency and wasted effort.
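The update rule (3.24)–(3.26) can be sketched as follows; a minimal illustration in which v is a list of step sizes, accepted[u] plays the role of n_u, and the function name is ours:

```python
def adjust_step_vector(v, accepted, n_s, c):
    """Corana step-vector update, eqs. (3.24)-(3.26): widen the step
    along directions with many acceptances, shrink it along directions
    with many rejections, and leave it unchanged in between."""
    v_new = []
    for v_u, n_u, c_u in zip(v, accepted, c):
        ratio = n_u / n_s
        if ratio > 0.6:                      # too many acceptances: widen
            v_u = v_u * (1.0 + c_u * (ratio - 0.6) / 0.4)
        elif ratio < 0.4:                    # too many rejections: shrink
            v_u = v_u / (1.0 + c_u * (0.4 - ratio) / 0.4)
        v_new.append(v_u)
    return v_new
```

With c_u = 2, a direction on which every move was accepted (n_u = N_S) has its step tripled, while one on which every move was rejected has its step divided by 3.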
Step 6

If m < N_T, then go to Step 1; else reduce the temperature:
 set τ_{k+1} = r_T τ_k,
 set Q*_k = Q_i,
 add 1 to k,
 set m to 0.

Notice that an exponential annealing schedule is used. Furthermore, a temperature reduction occurs after N_T step adjustments, or N_S ⋅ N_T cycles of moves, where one cycle covers the N coordinate directions.

Step 7 (Terminating criterion)

If
 |Q*_k − Q*_{k−u}| ≤ δ, u = 1, …, N_δ, and
 Q*_k − Q_opt ≤ δ,
then stop the search; else:
 add 1 to i,
 set θ^i = θ_opt and Q_i = Q_opt,
 go to Step 1.

Corana et al. (1987) suggest the following reasonable values of the control parameters for initial test runs:

N_S = 20,
N_T = max(100, 5N),
c_u = 2, u = 1, …, N,
N_δ = 4,
r_T = 0.85.

Of course, suitable control parameters depend on the particular objective function being optimized. They are determined heuristically by carefully observing the system over successive test runs.

CHAPTER 4

PROPAGATION OF LIGHT

Light is an electromagnetic wave phenomenon governed by the same principles that apply to the complete spectrum of electromagnetic radiation. Although electromagnetic waves are vectorial in nature, propagating as mutually coupled electric-field and magnetic-field waves, a scalar wave theory in which light is treated as a scalar wavefunction often suffices to describe many optical phenomena. The effects of the wave nature of light are most appreciable when light propagates around objects whose size is comparable to a wavelength; when the objects are much larger, the wave behavior can be approximated by rectilinear light rays that travel through different media according to a set of geometrical laws. The term diffraction refers to the deviation from rectilinear propagation. The branch of optics that treats light as electromagnetic waves is called electromagnetic optics, while the scalar wave theory and the ray theory are referred to as wave optics and ray optics (or geometrical optics), respectively.
In principle, geometrical optics is the limit of wave optics when the wavelength becomes infinitesimally small, and wave optics is an approximation of electromagnetic optics. These theories are encompassed by classical optics, but certain optical phenomena are quantum mechanical in nature, requiring a quantum theory of light (quantum optics).

This chapter imparts fundamental results in the classical regime, beginning with the basic equations of electromagnetism, called Maxwell's equations, in Section 4.1. From these equations, we derive the general time-dependent and time-independent wave equations, which decouple the electric-field and magnetic-field wave components. We discuss two important solutions of these equations, plane waves and spherical waves, in Section 4.2. In Section 4.3, we use Maxwell's equations once again to derive the basic equation of geometrical optics, namely, the eikonal equation. We then develop several theoretical principles governing the propagation of light rays through dielectric media, as well as reflection and refraction at a planar dielectric boundary. Section 4.4 is dedicated to a facet of diffraction theory: we deal with the specific case where a wave propagates through a planar aperture and travels some distance in free space, and we formulate several laws for calculating the diffraction pattern in different regions of space.

4.1 The electromagnetic field

The electromagnetic field is a real physical entity that occupies the space surrounding electric charges. It is described by four vector fields, each a function of the 3D position vector r = (x, y, z) and time t. These vector fields are the electric field E(r, t), the magnetic field H(r, t), the electric flux density or displacement D(r, t), and the magnetic flux density or induction B(r, t).
While E(r, t) and H(r, t) are regarded as the basic field vectors, D(r, t) and B(r, t) represent the influence of matter, which is described by an electric current density j(r, t) (current per unit area) and a charge density q(r, t) (charge per unit volume). The standard notation for charge density is ρ; however, we reserve that symbol to denote spatial frequency later in this chapter.

4.1.1 Maxwell's equations

All classical electromagnetic phenomena are governed by Maxwell's equations, which describe the dynamics of charged particles interacting with electromagnetic fields. Maxwell's equations are a set of four coupled first-order partial differential equations, written in the International System of Units (SI) as

∇ ⋅ D(r, t) = q(r, t) ,   (4.1a)
∇ ⋅ B(r, t) = 0 ,   (4.1b)
∇ × E(r, t) = −∂B(r, t)/∂t ,   (4.1c)
∇ × H(r, t) = j(r, t) + ∂D(r, t)/∂t .   (4.1d)

These equations incorporate the divergence and curl operators, denoted ∇⋅ and ∇×, respectively, where the vector del operator in Cartesian coordinates is given by

∇ = (∂/∂x, ∂/∂y, ∂/∂z) .   (4.2)

The divergence of a vector field, as in (4.1a) and (4.1b), is a scalar field; thus, the charge density in (4.1a) is scalar-valued. Since the divergence of a curl is always zero, it follows from (4.1d) that

∇ ⋅ j(r, t) = −∇ ⋅ ∂D(r, t)/∂t = −(∂/∂t) ∇ ⋅ D(r, t) .   (4.3)

After invoking (4.1a), we have

∇ ⋅ j(r, t) + ∂q(r, t)/∂t = 0 ,   (4.4)

which is referred to as the continuity equation and expresses local conservation of charge. The divergence theorem states that

∫_V (∇ ⋅ v) d³r = ∫_S v ⋅ n̂ da ,   (4.5)

where v is an arbitrary vector field, V is a volume enclosed by a closed surface S with area element da, and n̂ is the unit outward normal on S.
Applying (4.5) to (4.4) leads to

∫_S j(r, t) ⋅ n̂ da + (∂/∂t) ∫_V q(r, t) d³r = 0 ,   (4.6)

where the first integral represents the total current flowing through the surface,

J = ∫_S j(r, t) ⋅ n̂ da ,   (4.7)

and the second integral is the total charge in the volume,

Q(t) = ∫_V q(r, t) d³r .   (4.8)

Thus, the total charge contained in the volume can change only through the flow of electric current.

4.1.2 Constitutive relations

To uniquely determine the field vectors that appear in Maxwell's equations for a given distribution of charge and current, we need to describe the behavior of material media under the influence of a field. These relationships are called the constitutive relations (or material equations). In isotropic linear media they are

P(r, t) = ε_0 χ_e E(r, t) ,   (4.9)
M(r, t) = χ_m H(r, t) ,   (4.10)

where P(r, t) is the polarization, M(r, t) is the magnetization, χ_e is the electric susceptibility, χ_m is the magnetic susceptibility, and ε_0 is the permittivity of free space.

In a dielectric medium, the polarization is defined as the macroscopic average of the electric dipole moment per unit volume; the magnetization is defined similarly for magnetic dipole moments. Polarization results from distortion of the charge distribution in a medium in the presence of an external electric field. For many materials, the polarization is proportional to the electric field, so that (4.9) holds provided the field is not too strong. In that case, we can express the electric flux density as

D(r, t) = ε_0 E(r, t) + P(r, t) ≡ ε E(r, t) ,   (4.11)

where ε ≡ ε_0 (1 + χ_e) is the dielectric constant (or permittivity). Apart from ferroelectric media, the polarization of a material is impermanent and is induced only by an instantaneous external field. In ferroelectrics, the electric displacement is determined by the past history of the field rather than its instantaneous value; this effect is referred to as hysteresis.
Magnetization occurs when an applied magnetic field creates a net alignment of magnetic dipoles in a magnetic substance. When the magnetization is linearly proportional to the field, as in (4.10), we can write

B(r, t) = µ_0 [H(r, t) + M(r, t)] ≡ µ H(r, t) ,   (4.12)

where µ ≡ µ_0 (1 + χ_m) is the magnetic permeability and µ_0 is the permeability of free space. The magnetic susceptibility χ_m is very close to zero (µ ≈ µ_0) for most materials, but for magnetic media χ_m differs substantially from zero. Electric polarization is usually in the same direction as E(r, t), whereas certain materials experience a magnetization parallel to H(r, t) (e.g., oxygen, aluminum, platinum) and others opposite to H(r, t) (e.g., water, copper, silver); these materials are said to be paramagnetic (χ_m > 0) and diamagnetic (χ_m < 0), respectively. Except for ferromagnetic materials (e.g., iron, cobalt, nickel), the magnetization is not retained after the external field is removed. Moreover, fields at optical frequencies do not affect magnetic dipoles as they do electric ones (Barrett & Myers, 2004), so we can safely assume that χ_m = 0, as in free space.

For exceedingly strong fields, which can be obtained, for instance, by focusing a laser beam, the constitutive relations can be nonlinear functions of E(r, t) and H(r, t). Throughout this chapter, we will deal with light propagation in linear media. We will also consider transparent, non-conducting media, in which light travels without considerable weakening.

In free space, where there are no charges or currents, the constitutive relations become

D(r, t) = ε_0 E(r, t) ,   (4.13)
B(r, t) = µ_0 H(r, t) ,   (4.14)

and Maxwell's equations can be expressed solely in terms of E(r, t) and H(r, t):

∇ ⋅ E(r, t) = 0 ,   (4.15a)
∇ ⋅ H(r, t) = 0 ,   (4.15b)
∇ × E(r, t) = −µ_0 ∂H(r, t)/∂t ,   (4.15c)
∇ × H(r, t) = ε_0 ∂E(r, t)/∂t .   (4.15d)

4.1.3 Time-dependent wave equation

Consider a region without material media, although it may contain charges and currents that generate an electromagnetic field. Using the constitutive relations (4.13) and (4.14), Maxwell's equations become

∇ ⋅ E(r, t) = (1/ε_0) q(r, t) ,   (4.16a)
∇ ⋅ H(r, t) = 0 ,   (4.16b)
∇ × E(r, t) = −µ_0 ∂H(r, t)/∂t ,   (4.16c)
∇ × H(r, t) = j(r, t) + ε_0 ∂E(r, t)/∂t .   (4.16d)

To derive the wave equation for the electric field, we will need the following identity from vector calculus:

∇ × (∇ × v) = ∇(∇ ⋅ v) − ∇²v ,   (4.17)

where the Laplacian operator in Cartesian coordinates is

∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z² .   (4.18)

Taking the curl of (4.16c) leads to

∇ × [∇ × E(r, t)] = −µ_0 (∂/∂t) ∇ × H(r, t) .   (4.19)

Applying (4.16d) and the vector identity (4.17) gives

∇[∇ ⋅ E(r, t)] − ∇²E(r, t) = −µ_0 (∂/∂t) [j(r, t) + ε_0 ∂E(r, t)/∂t] .   (4.20)

Finally, using (4.16a) leads to the wave equation for E(r, t):

[∇² − µ_0 ε_0 ∂²/∂t²] E(r, t) = µ_0 ∂j(r, t)/∂t + (1/ε_0) ∇q(r, t) .   (4.21)

A similar approach leads to the wave equation for H(r, t):

[∇² − µ_0 ε_0 ∂²/∂t²] H(r, t) = −∇ × j(r, t) .   (4.22)

We have now reduced the set of four Maxwell's equations to two second-order equations in which E(r, t) and H(r, t) are uncoupled, so we can solve directly for the field vectors if q(r, t) and j(r, t) are given. Note, however, that (4.21) and (4.22) each comprise three equations, since E(r, t) and H(r, t) each consist of three Cartesian components; these six scalar equations are also uncoupled from one another.

The wave equations (4.21) and (4.22) each have the basic form

[∇² − (1/υ²) ∂²/∂t²] u(r, t) = s(r, t) ,   (4.23)

called the time-dependent inhomogeneous scalar wave equation.
Here, u(r, t) represents a scalar field and s(r, t) is a corresponding scalar source distribution of charges and/or currents, which can be derived from Maxwell's equations (Jackson, 1975; Barrett & Myers, 2004). When there are no sources, s(r, t) = 0, and (4.23) becomes homogeneous.

The constant υ in (4.23) is the speed of wave propagation in the medium; the choice of υ depends on the type of medium and the type of wave. For the material media described in Section 4.1.2,

υ² = 1/(µ_0 ε) ≡ c²/n² ,   (4.24)

where c is the speed of light in vacuum and n is the refractive index of the medium. If the material is homogeneous, then both c and n are constants; in general, however, they can have a spatial or temporal dependence.

By introducing Fourier transformations, we can convert (4.23) from a partial differential equation to an algebraic one. Assuming that u(r, t) and s(r, t) have the Fourier integral representations

u(r, t) = ∫_∞ d³σ ∫_{−∞}^{∞} dν U(σ, ν) exp[2πi(σ ⋅ r − νt)] ,   (4.25a)
s(r, t) = ∫_∞ d³σ ∫_{−∞}^{∞} dν S(σ, ν) exp[2πi(σ ⋅ r − νt)] ,   (4.25b)

we can write the inverse relations as

U(σ, ν) = ∫_∞ d³r ∫_{−∞}^{∞} dt u(r, t) exp[−2πi(σ ⋅ r − νt)] ,   (4.26a)
S(σ, ν) = ∫_∞ d³r ∫_{−∞}^{∞} dt s(r, t) exp[−2πi(σ ⋅ r − νt)] ,   (4.26b)

where σ denotes the 3D spatial frequency. Note that, with the sign convention in (4.25) and (4.26), a wave component travelling in the +σ direction has positive temporal frequency ν. Using these transformations, the time-dependent scalar wave equation in the Fourier domain is given by

−4π² (σ² − ν²/υ²) U(σ, ν) = S(σ, ν) .   (4.27)

4.1.4 Time-independent wave equation

The time dependence of the scalar wave equation can be removed when the source oscillates at a single frequency, ν_0, such that

s(r, t) = s(r) exp(−2πiν_0 t) .   (4.28)

Since s(r, t) must be real, we take the real part of the complex exponential.
We use complex notation for the scalar source and field for mathematical ease in calculations, and then take the real part of the final expression to represent the physical quantity of interest. This procedure is valid because of the linearity of the wave equation. Note that s(r) can be complex, so the magnitude and phase of the oscillation may change with position. In the Fourier domain, the source is given by

S(σ, ν) = S(σ) δ(ν − ν_0) .   (4.29)

A monochromatic source is one that oscillates at a single frequency, satisfying both (4.28) and (4.29). Combining (4.27) and (4.29) gives

−4π² (σ² − ν²/υ²) U(σ, ν) = S(σ) δ(ν − ν_0) .   (4.30)

The only solution is

U(σ, ν) = U(σ) δ(ν − ν_0) ,   (4.31)

which is equivalent to

u(r, t) = u(r) exp(−2πiν_0 t) .   (4.32)

Here we observe that the field has the same monochromatic time dependence as the source, although there may be a phase shift, since u(r) can be complex. The reason is that the wave equation represents a temporal linear shift-invariant system, of which exp(−2πiν_0 t) is an eigenfunction. Any other choice of time dependence for the source, besides the complex exponential, would not lead to the same time dependence of the field (Barrett & Myers, 2004).

For a monochromatic source, the Fourier transform of the wave amplitude satisfies

−4π² (σ² − ν_0²/υ²) U(σ) = S(σ) ,   (4.33)

or, in the space domain,

[∇² + k²] u(r) = s(r) ,   (4.34)

where k = 2πν_0/υ. Each Fourier component of the scalar field satisfies (4.33) and (4.34). Equation (4.34) is called the time-independent scalar wave equation, more commonly referred to as the Helmholtz equation; it is called the homogeneous Helmholtz equation when s(r) = 0. Notice that the time dependence has been removed from (4.34); it enters only implicitly through (4.32).

4.2 Plane waves and spherical waves

An important feature of the time-dependent and time-independent wave equations is the existence of travelling-wave solutions, representing the transport of electromagnetic energy.
In this section, we will examine two fundamental solutions: plane waves and spherical waves. Only simple media, free of sources and characterized by spatially constant permeability and susceptibility, will be considered here.

4.2.1 Plane waves

The simplest solution of the homogeneous wave equation is the monochromatic plane wave, which has the form

u(r, t) = exp(ik ⋅ r − 2πiν_0 t) ,   (4.35)

where the frequency ν_0 is related to the magnitude of the wave vector, k = |k|, by

k = 2πν_0/υ .   (4.36)

Along the direction of k, the function (4.35) is periodic with a wavelength of λ = 2π/k. It is particularly convenient to replace the wave vector with the 3D spatial frequency σ, defined in Cartesian components as

k = 2πσ = (2πξ, 2πη, 2πζ) .   (4.37)

Inserting (4.35) and (4.37) into the homogeneous wave equation (4.23) leads to

[∇² − (1/υ²) ∂²/∂t²] exp[2πi(σ ⋅ r − ν_0 t)] = −4π² (ξ² + η² + ζ² − ν_0²/υ²) exp[2πi(σ ⋅ r − ν_0 t)] = 0 ,   (4.38)

which has the solution

σ² = ξ² + η² + ζ² = ν_0²/υ² .   (4.39)

This is equivalent to

ξ² + η² + ζ² = 1/λ² ,   (4.40)

since σ = k/2π = 1/λ. From (4.40), we see that the components of σ depend on one another through the wavelength λ, so that if we know two of the components, say ξ and η, then ζ is determined by

ζ = ± √(1/λ² − ξ² − η²) .   (4.41)

Although there is a sign ambiguity, a wave that propagates in the +z direction requires the positive sign (Barrett & Myers, 2004).

4.2.2 Spherical waves

A monochromatic spherical wave is written as

u(r, t) = [1/|r − r_0|] exp(ik|r − r_0| − 2πiν_0 t) ,   (4.42)

which has spherical symmetry about the point r_0. Showing that (4.42) is a solution of the wave equation requires taking its Laplacian, but we first make a useful change of variables,

R ≡ r − r_0 .   (4.43)

In spherical coordinates centered on r_0, R has components (R, θ_R, φ_R), but because u(r, t) has no angular dependence, the Laplacian is simply

∇² [exp(ikR)/R] = (1/R²) (∂/∂R) [R² (∂/∂R) (exp(ikR)/R)] = −k² exp(ikR)/R ,   (4.44)

given that R ≠ 0.
The behavior at R = 0 involves a discussion of Green's functions, which we will not elaborate on here. Using (4.44) in the homogeneous wave equation (4.23) gives

[∇² − (1/υ²) ∂²/∂t²] [exp(ikR)/R] exp(−2πiν_0 t) = −(k² − 4π²ν_0²/υ²) [exp(ikR)/R] exp(−2πiν_0 t) = 0 .   (4.45)

We observe from (4.45) that the spherical wave in (4.42) satisfies the homogeneous wave equation as long as k = 2πν_0/υ, as in the case of plane waves. The significance of these results is that we can "decompose an arbitrary solution of the homogeneous wave equation into monochromatic plane waves or spherical waves, and for each component we can define a wavelength λ and an associated k = 2π/λ" (Barrett & Myers, 2004).

4.3 Geometrical optics

The fundamental equation of geometrical optics, called the eikonal equation, is a direct implication of Maxwell's equations in the short-wavelength limit. This section begins with a derivation of the eikonal equation, which we use to define geometrical wavefronts and geometrical light rays. We then develop a set of mathematical laws governing the propagation of light rays through a dielectric medium, as well as the refraction and reflection of rays at the interface between two dielectric media.

4.3.1 The eikonal equation

A general time-harmonic field in a non-conducting isotropic medium can be written as

E(r, t) = E_0(r) exp(−iω_0 t) ,   (4.46a)
H(r, t) = H_0(r) exp(−iω_0 t) ,   (4.46b)

where E_0(r) and H_0(r) are complex vector functions of position. It is understood that the real parts of the expressions on the right-hand side represent the fields. On substituting (4.46) into (4.1), we find that E_0(r) and H_0(r) satisfy the time-free Maxwell's equations.
In source-free media that satisfy the assumptions of Section 4.1.2, these equations are

∇ ⋅ [ε(r) E_0(r)] = 0 ,   (4.47a)
∇ ⋅ [µ(r) H_0(r)] = 0 ,   (4.47b)
∇ × E_0(r) = iω_0 µ(r) H_0(r) ,   (4.47c)
∇ × H_0(r) = −iω_0 ε(r) E_0(r) ,   (4.47d)

where the constitutive relations (4.11) and (4.12) have been applied. Here we let the permittivity ε and the magnetic permeability µ, and therefore the refractive index n = c(µε)^{1/2}, vary with position. We also have

ω_0 = 2πν_0 = ck_0 = 2πc/λ_0 ,   (4.48)

with λ_0 denoting the vacuum wavelength.

Deriving the eikonal equation begins by representing the complex vectors E_0(r) and H_0(r) in the form

E_0(r) = e(r) exp[ik_0 S(r)] ,   (4.49a)
H_0(r) = h(r) exp[ik_0 S(r)] ,   (4.49b)

where S(r) is the eikonal, a real scalar function of position. In contrast, e(r) and h(r) are complex vector functions of position; letting these functions be complex allows all possible polarization states to be included (Born & Wolf, 1999). Applying the familiar vector identities,

∇ ⋅ (f v) = f (∇ ⋅ v) + v ⋅ (∇f) ,   (4.50)
∇ × (f v) = f (∇ × v) − v × (∇f) ,   (4.51)

to (4.49) results in

∇ × H_0(r) = [∇ × h(r) + ik_0 ∇S(r) × h(r)] exp[ik_0 S(r)] ,   (4.52a)
∇ ⋅ [µ(r) H_0(r)] = [µ(r) ∇ ⋅ h(r) + h(r) ⋅ ∇µ(r) + ik_0 µ(r) h(r) ⋅ ∇S(r)] exp[ik_0 S(r)] ,   (4.52b)
∇ × E_0(r) = [∇ × e(r) + ik_0 ∇S(r) × e(r)] exp[ik_0 S(r)] ,   (4.52c)
∇ ⋅ [ε(r) E_0(r)] = [ε(r) ∇ ⋅ e(r) + e(r) ⋅ ∇ε(r) + ik_0 ε(r) e(r) ⋅ ∇S(r)] exp[ik_0 S(r)] .   (4.52d)

Now combining (4.47) and (4.52) gives

e(r) ⋅ ∇S(r) = −(1/ik_0) [∇ ⋅ e(r) + e(r) ⋅ ∇ ln ε(r)] ,   (4.53a)
h(r) ⋅ ∇S(r) = −(1/ik_0) [∇ ⋅ h(r) + h(r) ⋅ ∇ ln µ(r)] ,   (4.53b)
∇S(r) × e(r) − cµ(r) h(r) = −(1/ik_0) ∇ × e(r) ,   (4.53c)
∇S(r) × h(r) + cε(r) e(r) = −(1/ik_0) ∇ × h(r) .   (4.53d)

In the short-wavelength limit (large k_0), we can neglect the right-hand sides of (4.53), provided that the factors multiplying 1/ik_0 are not extremely large (Born & Wolf, 1999). Thus, we have

e(r) ⋅ ∇S(r) = 0 ,   (4.54a)
h(r) ⋅ ∇S(r) = 0 ,   (4.54b)
∇S(r) × e(r) − cµ(r) h(r) = 0 ,   (4.54c)
∇S(r) × h(r) + cε(r) e(r) = 0 .   (4.54d)

Solving for h(r) in (4.54c) and substituting into (4.54d) leads to

∇S(r) × [∇S(r) × e(r)] + c² µ(r) ε(r) e(r) = 0 ,   (4.55)

and applying the vector identity,

v_1 × (v_1 × v_2) = v_1 (v_1 ⋅ v_2) − v_2 (v_1 ⋅ v_1) ,   (4.56)

results in

∇S(r) [e(r) ⋅ ∇S(r)] − e(r) |∇S(r)|² + µ(r) ε(r) c² e(r) = 0 .   (4.57)

The first term vanishes because of (4.54a), so we are finally left with the eikonal equation,

|∇S(r)|² = [n(r)]² ,   (4.58)

or equivalently,

(∂S/∂x)² + (∂S/∂y)² + (∂S/∂z)² = n²(x, y, z) .   (4.59)

The surfaces over which S(r) is constant are called geometrical wavefronts or geometrical wave surfaces; the eikonal equation therefore relates these surfaces to the refractive index function of the medium alone.

4.3.2 Differential equation of light rays

Orthogonal trajectories to the geometrical wavefronts are referred to as geometrical light rays. If r(s) is the position vector of an arbitrary point on a ray and s is the arc length along the ray path, then dr/ds = ŝ, the unit tangent to the ray, so that the equation of the ray is given by

n(r) dr/ds = ∇S(r) .   (4.60)

Although (4.60) describes the ray in terms of the function S(r), we can obtain a differential equation of the ray in terms of the refractive index function alone. We do this by differentiating (4.60) with respect to s:

(d/ds) [n(r) dr/ds] = (d/ds) ∇S(r)
 = (dr/ds ⋅ ∇) ∇S(r)
 = [1/n(r)] (∇S(r) ⋅ ∇) ∇S(r)
 = [1/2n(r)] ∇[|∇S(r)|²]
 = [1/2n(r)] ∇[n(r)²] ,

so that

(d/ds) [n(r) dr/ds] = ∇n(r) .   (4.61)

In a homogeneous medium, n = constant, and (4.61) reduces to

d²r/ds² = 0 ,   (4.62)

which has the solution r = sa + b, where a and b are constant vectors. Therefore, light rays in a homogeneous medium propagate in straight lines.

4.3.3 Refraction and reflection

So far, we have considered the behavior of light rays in media with a continuously varying refractive index function n(r). Now we introduce a surface discontinuity in n(r), namely a planar interface that separates two media with different refractive indices. Let n̂ be the unit vector normal to the interface, and let κ̂_inc and κ̂_tr be unit vectors parallel to the incident and transmitted wavevectors, respectively. If the incident ray propagates through a medium with refractive index n_1, and the transmitted ray through index n_2, then the law of refraction is written as (Barrett & Myers, 2004; Born & Wolf, 1999; Stavroudis, 1972)

n_1 (κ̂_inc × n̂) = n_2 (κ̂_tr × n̂) .   (4.63)

This is Snell's law in vector form; unlike the scalar version of this law, (4.63) is not tied to a particular coordinate system, which makes it very useful in optical design programs. We can make two important observations from Snell's law. First, the tangential component of the ray vector nκ̂ is continuous across the interface, or equivalently, the vector N_12 = n_2 κ̂_tr − n_1 κ̂_inc is normal to the interface. Second, the refracted ray lies in the same plane as the incident ray and the surface normal, called the plane of incidence (Born & Wolf, 1999).

Suppose we know n_1, n_2, and n̂, and would like to determine κ̂_tr for a given κ̂_inc. We can do this with an alternative version of Snell's law, in particular by forming an orthonormal basis for the plane of incidence. The vectors n̂ and κ̂_inc are linearly independent, except at normal incidence, and constitute a basis for the plane of incidence; however, they are not orthonormal.
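In a ray-trace program, the refracted and reflected directions are computed directly from κ̂_inc and n̂. The following is a minimal sketch consistent with (4.63) and the law of reflection, assuming unit vectors with κ̂_inc ⋅ n̂ > 0 (normal taken on the transmission side); the function names are ours:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def refract(k_inc, n_hat, n1, n2):
    """Closed-form solution of the vector Snell's law (4.63): scale the
    tangential component by n1/n2 and restore unit length along n_hat.
    Returns the unit transmitted direction, or None for total internal
    reflection."""
    ci = dot(k_inc, n_hat)                       # cos(theta_inc)
    mu = n1 / n2
    disc = 1.0 + mu * mu * (ci * ci - 1.0)
    if disc < 0.0:
        return None                              # total internal reflection
    return [mu * (k - ci * nh) + math.sqrt(disc) * nh
            for k, nh in zip(k_inc, n_hat)]

def reflect(k_inc, n_hat):
    """Law of reflection: keep the tangential component and flip the
    normal component of the incident direction."""
    ci = dot(k_inc, n_hat)
    return [k - 2.0 * ci * nh for k, nh in zip(k_inc, n_hat)]
```

For a 30° ray going from n_1 = 1 into n_2 = 1.5, the transmitted direction has sin θ_tr = sin 30°/1.5 = 1/3, in agreement with the scalar Snell's law.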
Using Gram-Schmidt orthogonalization (Arfken & Weber, 2001), we can construct an orthonormal basis containing n̂ and the unit vector

n̂_⊥ = [κ̂_inc − (κ̂_inc ⋅ n̂) n̂] / √(1 − (κ̂_inc ⋅ n̂)²) ,   (4.64)

such that n̂ ⋅ n̂_⊥ = 0 and |n̂_⊥|² = 1. Since n̂_⊥ is normal to n̂, which is normal to the interface, n̂_⊥ lies in the interface plane. More specifically, n̂_⊥ lies along the intersection of the interface plane with the plane of incidence (Barrett & Myers, 2004). In terms of the orthonormal vectors n̂ and n̂_⊥, the constraint that both κ̂_inc and κ̂_tr lie in the plane of incidence is expressed as

(n̂_⊥ × n̂) ⋅ κ̂_inc = (n̂_⊥ × n̂) ⋅ κ̂_tr = 0 .   (4.65)

Snell's law can then be rewritten using n̂_⊥:

n_1 (κ̂_inc ⋅ n̂_⊥) = n_2 (κ̂_tr ⋅ n̂_⊥) .   (4.66)

Finally, we have an expression for κ̂_tr that satisfies both (4.65) and (4.66):

κ̂_tr = (n_1/n_2)(κ̂_inc ⋅ n̂_⊥) n̂_⊥ + √(1 − (n_1/n_2)² (κ̂_inc ⋅ n̂_⊥)²) n̂
 = (n_1/n_2)[κ̂_inc − (κ̂_inc ⋅ n̂) n̂] + √(1 + (n_1/n_2)² [(κ̂_inc ⋅ n̂)² − 1]) n̂ .   (4.67)

The corresponding equation for the reflected wavevector is

κ̂_refl = (κ̂_inc ⋅ n̂_⊥) n̂_⊥ − √(1 − (κ̂_inc ⋅ n̂_⊥)²) n̂ = (κ̂_inc ⋅ n̂_⊥) n̂_⊥ − (κ̂_inc ⋅ n̂) n̂ ,   (4.68)

which lies in the plane of incidence, along with κ̂_inc, κ̂_tr, and n̂ (Barrett & Myers, 2004). From (4.68), we see that the angle of reflection equals the angle of incidence. These last two results summarize the law of reflection.

4.4 Diffraction by a planar aperture

4.4.1 A brief history of diffraction theory

Sommerfeld (1954) defined the term diffraction as "any deviation of light rays from rectilinear paths which cannot be interpreted as reflection or refraction." Diffraction results from the lateral confinement of a wave, and the effect is greatest when the size of the confinement is comparable to the wavelength of the radiation involved. There is a rich history regarding the discovery and evolution of diffraction theory, which is described briefly here.
The first advocate of the wave theory of light was Christian Huygens (1678), who expressed intuitively that if each point on a wavefront gave rise to a secondary diverging spherical wave, then the wavefront at a later instant would be the envelope of the secondary wavelets. It was not until 140 years later, in 1818, that Augustin Jean Fresnel made various assumptions about the amplitudes and phases of the secondary wavelets, and was able to accurately predict the distribution of light in diffraction patterns by allowing the wavelets to interfere with each other. The principle of interference was established by Thomas Young (1802). A significant step in the evolution of the wave theory of light occurred in 1860 when Maxwell identified light as an electromagnetic wave phenomenon. In 1882, Gustav Kirchhoff formed a stronger mathematical framework from the combined ideas of Huygens and Fresnel, but he based his theory on two assumptions regarding the boundary values of the wave impinging on an obstacle placed in the path of propagation. However, inconsistencies in these assumptions were later demonstrated by Henri Poincaré (1892) and Arnold Sommerfeld (1896). Sommerfeld then modified the Kirchhoff theory by abandoning one of Kirchhoff's assumptions dealing with the light amplitude at the boundary; he accomplished this through the utilization of Green's functions. The results became known as the Rayleigh-Sommerfeld diffraction theory, which is well accepted for dealing with particular problems in optics. One should be aware, however, that the Kirchhoff and Rayleigh-Sommerfeld theories involve broad simplifications and approximations. The most consequential simplification is that the vectorial nature of electromagnetic waves is ignored, so that light is treated simply as a scalar phenomenon (Goodman, 2005).
Nonetheless, the scalar theory still offers great accuracy provided that two conditions are satisfied: the size of the diffracting aperture must be much greater than a wavelength, and the fields must be observed sufficiently far from the diffracting aperture (Silver, 1962). Both of these conditions will be met for the problems described in this chapter. For a broad overview of diffraction theory, see Baker and Copson (1949) and Bouwkamp (1954).

4.4.2 Geometry of the problem

In optics and imaging, we are often interested in diffraction of light by an open planar aperture in an otherwise opaque screen. As illustrated in Fig. 4.1, a wave is assumed to impinge on the aperture from the left, and the field is calculated at an arbitrary point r in the observation plane, which lies in the z plane. An arbitrary point in the aperture plane is denoted r₀. For convenience, the aperture is placed in the z = 0 plane. Since we know which planes are the input and output planes, it suffices to use the 2D vectors r₀ and r instead of their 3D counterparts.

Fig. 4.1: Geometry for diffraction by a planar aperture (Barrett & Myers, 2004).

In this geometry, the distance between the points r₀ and r, denoted R, is given by

R = √(|r − r₀|² + z²), (4.69)

where we have the expansion

|r − r₀|² = (x − x₀)² + (y − y₀)². (4.70)

We define θ as the angle between r − r₀ and the z axis, so that

cos θ = z / √(|r − r₀|² + z²) = z / √[(x − x₀)² + (y − y₀)² + z²]. (4.71)

4.4.3 Huygens-Fresnel principle

According to Rayleigh-Sommerfeld diffraction theory, the Huygens-Fresnel principle can be stated mathematically as

u_z(r) = (1/iλ) ∫_∞ d²r₀ u_inc(r₀) t_ap(r₀) cos θ [exp(ikR)/R], (4.72)

where u_z(r) is the field evaluated on a plane of fixed z for all (x, y) and t_ap(r) is the amplitude transmittance of the aperture:

t_ap(r) ≡ 1 if r lies in the clear aperture, 0 if r is behind the opaque screen.
(4.73)

Built into (4.72) are two basic approximations, namely, the approximation inherent in scalar diffraction theory and the radiation approximation, which states that the observation distance is many wavelengths from the aperture, z >> λ (Goodman, 2005). The Huygens-Fresnel principle states that the observed field u_z(r) is a superposition of secondary diverging spherical waves exp(ikR)/R, called Huygens' wavelets, emanating from every point r₀ within the aperture. Each wavelet has a 90° phase shift relative to the incident wave, as expressed in the factor 1/i, as well as a directivity pattern, or obliquity factor, cos θ. The amplitude of each wavelet is proportional to the amplitude of the excitation u_inc(r₀) at the respective point in the aperture.

Note that (4.72) is readily seen as a convolution integral, since R and cos θ are both functions of r − r₀; written symbolically,

u_z(r) = [u_inc(r) t_ap(r)] ∗ p_z(r), (4.74)

where p_z(r) is the 2D point spread function (PSF) for propagation,

p_z(r) = (1/iλ) [z/(r² + z²)] exp(ik√(r² + z²)), (4.75)

and r = |r|. The input to the convolution is simply the incident field after being modified by the aperture,

u₀(r) ≡ u_inc(r) t_ap(r), (4.76)

so that (4.72) becomes

u_z(r) = (1/iλ) ∫_∞ d²r₀ u₀(r₀) cos θ [exp(ikR)/R]. (4.77)

The ability to express the Huygens-Fresnel principle as a convolution integral is a direct consequence of the linearity and shift-invariance of the diffraction operation, at least as a 2D mapping from the aperture plane to a parallel plane some distance away. Thus, if u₀(r) and the observation point are shifted together, the result is the same.

If we are only interested in points close to the z-axis, we can apply the paraxial approximation, according to which cos θ ≈ 1. Using this approximation, (4.75) results in

p_z(r) ≈ (1/iλz) exp(ik√(r² + z²)), (4.78)

and (4.77) becomes

u_z(r) ≈ (1/iλz) ∫_∞ d²r₀ u₀(r₀) exp(ik√(|r − r₀|² + z²)). (4.79)

To be clear, the exponential factor in (4.79) still represents a spherical wave originating from r₀ in the z = 0 plane, observed at r in the z plane. The next subsection deals with approximating this factor, which requires great care, as k is often a very large number (on the order of 10⁵ cm⁻¹ at optical wavelengths).

4.4.4 Fresnel diffraction

To reduce (4.79) to a more practical form, we now introduce an approximation to the distance R through a binomial expansion for z > |r − r₀|, so that

R = √(|r − r₀|² + z²) = z + |r − r₀|²/2z − |r − r₀|⁴/8z³ + ... . (4.80)

Therefore, we can rewrite the exponential factor in (4.79) as

exp(ik√(|r − r₀|² + z²)) = exp(ikz) exp(ik|r − r₀|²/2z) exp(−ik|r − r₀|⁴/8z³) ... . (4.81)

We can disregard the quartic and higher terms in (4.80) if

k|r − r₀|⁴/8z³ << π/4, (4.82)

or equivalently,

|r − r₀|⁴ << λz³. (4.83)

If this condition is satisfied, (4.79) becomes

u_z(r) ≈ [exp(ikz)/iλz] ∫_∞ d²r₀ u₀(r₀) exp(iπ|r − r₀|²/λz) = u₀(r) ∗ p_z(r), (4.84)

and the 2D PSF is reduced to

p_z(r) ≈ [exp(ikz)/iλz] exp(iπr²/λz). (4.85)

Equations (4.84) and (4.85) reflect the Fresnel approximation, in which the PSF is a constant (i.e., independent of r) multiplied by a quadratic phase exponential (Barrett & Myers, 2004). Thus, the spherical wavefronts observed on a plane are now approximated by paraboloids. The region where this approximation holds is called the near field of the aperture.

The Fresnel diffraction integral (4.84) is readily converted into a Fourier transform by substituting

|r − r₀|² = r² + r₀² − 2r ⋅ r₀, (4.86)

which yields

u_z(r) ≈ [exp(ikz)/iλz] exp(iπr²/λz) ∫_∞ d²r₀ u₀(r₀) exp(iπr₀²/λz) exp(−2πi r ⋅ r₀/λz). (4.87)

Therefore, (4.87) is immediately seen as the 2D Fourier transform of the product u₀(r₀) exp(iπr₀²/λz), with the spatial frequency given by ρ = r/λz:

u_z(r) ≈ [exp(ikz)/iλz] exp(iπr²/λz) F₂{u₀(r₀) exp(iπr₀²/λz)}|_(ρ = r/λz). (4.88)

Despite the transform, the output u_z(r) is related to the input u₀(r₀) in the space domain; substituting ρ = r/λz converts things from the frequency domain back into the space domain (Barrett & Myers, 2004). The Fourier transform is simply more convenient than performing a spatial convolution with the quadratic phase factor.

4.4.5 Fraunhofer diffraction

We will now consider a more stringent approximation, which is more difficult to satisfy, but will further simplify calculations when valid. Suppose the clear aperture fits into a circle of radius a, that is, r₀ < a for all r₀ in the range of integration in (4.88). Now, if z >> a²/λ, then we can approximate exp(iπr₀²/λz) by unity, so that (4.88) results in

u_z(r) ≈ [exp(ikz)/iλz] exp(iπr²/λz) F₂{u₀(r₀)}|_(ρ = r/λz),  z >> a²/λ. (4.89)

When the approximation z >> a²/λ is valid, we are said to be in the Fraunhofer zone or far field. At optical frequencies, the condition of validity can be quite demanding. For instance, at a wavelength of 0.6 µm and an aperture diameter of 2.5 cm, the observation distance must satisfy z >> 260 meters. Of course, the greater the distance, the better the accuracy of the approximation.

The irradiance of the diffraction pattern, denoted I(r), is defined as the optical power per unit area incident on a surface. It can be shown that the irradiance is proportional to |u(r)|², under certain assumptions, with a proportionality constant that relates to the physical interpretation of u(r) (Barrett & Myers, 2004).
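As a concrete numerical check of these approximations, the single-Fourier-transform form (4.88) and its Fraunhofer limit (4.89) can both be evaluated with an FFT. The sketch below is our own illustration (the function name fresnel_1ft and the sampling choices are assumptions, not from the text); in the far field, z >> a²/λ, the input quadratic phase is negligible and the two forms agree closely.

```python
import numpy as np

def fresnel_1ft(u0, dx, wavelength, z, fraunhofer=False):
    """Single-FT Fresnel propagator, Eq. (4.88); with fraunhofer=True the
    input quadratic phase is dropped, giving Eq. (4.89).  u0 is an N x N
    field sampled with pitch dx; the output pitch is wavelength*z/(N*dx)."""
    N = u0.shape[0]
    x = (np.arange(N) - N // 2) * dx
    X, Y = np.meshgrid(x, x)
    if not fraunhofer:  # input-plane quadratic phase exp(i*pi*r0^2/(lambda z))
        u0 = u0 * np.exp(1j * np.pi * (X**2 + Y**2) / (wavelength * z))
    # centered 2D DFT approximates F2{...} evaluated at rho = r/(lambda z)
    U = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(u0))) * dx**2
    dxo = wavelength * z / (N * dx)
    xo = (np.arange(N) - N // 2) * dxo
    Xo, Yo = np.meshgrid(xo, xo)
    # leading factor exp(ikz)/(i*lambda*z) and output quadratic phase
    phase = np.exp(2j * np.pi * z / wavelength) / (1j * wavelength * z) \
          * np.exp(1j * np.pi * (Xo**2 + Yo**2) / (wavelength * z))
    return phase * U, dxo

# circular aperture of radius a = 0.5 mm observed at z = 50 m >> a^2/lambda
wavelength, a, z = 0.6e-6, 0.5e-3, 50.0
N, dx = 256, 10e-6
x = (np.arange(N) - N // 2) * dx
X, Y = np.meshgrid(x, x)
u0 = (X**2 + Y**2 <= a**2).astype(complex)
u_fres, dxo = fresnel_1ft(u0, dx, wavelength, z)
u_frau, _ = fresnel_1ft(u0, dx, wavelength, z, fraunhofer=True)
```

With this scaling the discrete transform conserves energy (Parseval), and at this distance (a²/λ ≈ 0.42 m) the Fresnel and Fraunhofer irradiance patterns differ only slightly, consistent with the far-field condition.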
Here we will disregard the constant and let I(r) = |u(r)|², so that

I(r) = |u_z(r)|² ≈ (1/λ²z²) |∫_∞ d²r₀ u₀(r₀) exp(−2πi r ⋅ r₀/λz)|² = (1/λ²z²) |U₀(r/λz)|², (4.90)

where U₀(ρ) is the 2D Fourier transform of u₀(r). We see from (4.90) that the irradiance is proportional to the squared modulus of the Fourier transform of the input field.

CHAPTER 5

INVERSE OPTICAL DESIGN OF THE HUMAN EYE USING LIKELIHOOD METHODS AND WAVEFRONT SENSING

In the preceding chapters, we provided the theoretical building blocks of this research, including concepts in estimation theory, global optimization methods with an emphasis on simulated annealing, and relevant topics in geometrical optics and diffraction theory. This chapter presents our results for the first of three applications that utilize the mathematical framework developed in the previous chapters. Specifically, it deals with the original motivation of our research, which is to develop a new approach to studying the human eye by estimating the complete set of ocular parameters for a given patient.

We begin in Section 5.1 by describing the basic optical components of the eye that are integral to the remainder of the chapter. Section 5.2 provides an overview of schematic eye models and their usefulness in evaluating the optical properties of the eye, followed by an algebraic method for ray-tracing through an arbitrary eye model. Section 5.3 discusses the fundamental concepts in wavefront sensor technology and why we chose not to perform traditional wavefront sensing. In Section 5.4, we provide the details of our data-acquisition system and optical-design program. We then present several results from the program, including the final WFS data to be used as input to inverse optical design. In Section 5.5, we explore the Fisher information matrices and Cramér-Rao lower bounds for different configurations of the imaging system to demonstrate the impact of changes in the system parameters.
In Section 5.6, we get a feel for the behavior of the probability surface, or the objective function to be optimized, as we vary the parameters of the eye. The final ML estimation results from a series of trials using simulated annealing are provided in Section 5.7. We conclude in Section 5.8 with a set of ideas for future work related to practical application. In particular, we consider various enhancements to our optical-design program, primarily to increase model robustness, since performing ML estimation on real data requires a highly accurate probability model.

5.1 Basic anatomy of the human eye

The human eye is an extraordinary and highly complex organ with many integrated structures and dynamic, working parts. It is responsible for receiving light and converting it into an electrical signal, which follows the visual pathway to the brain, where visual perception takes place. Light first enters the cornea, a transparent layer forming the front of the eye, followed by a cavity filled with a clear fluid, called the aqueous humor, which occupies the anterior chamber and provides necessary nutrients to the cornea (Fig. 5.1). It then passes through the pupil, an opening in the opaque iris, whose variable size regulates the amount of light that eventually reaches the retina. Behind the iris is the crystalline lens, a transparent biconvex structure whose shape is controlled by ciliary muscles at its edge. Light then travels through the vitreous humor, a clear gelatinous substance that fills the central chamber of the eye. The final destination is the retina, a curved surface in the back of the eye that is densely covered with nearly 130 million light-sensitive photoreceptors. These photoreceptors convert photons to an electrochemical neural signal, which leaves the eye via the optic nerve to the visual centers of the brain, where the optical information is processed (Palmer, 1999). This process is referred to as visual phototransduction.

Fig.
5.1: Basic anatomy of the human eye, as seen through a cross-sectional view.

The lens plays a chief role in proper image formation due to its variable focusing ability, which is achieved by changing the shape of the lens, a process called accommodation. Light from distant objects is brought into focus on the retina when the ciliary muscles are relaxed, resulting in a thin lens. To focus on nearby objects, the ciliary muscles are contracted, so that the lens is thicker and provides more optical power. The total optical power of a relaxed eye, for which the focal length is longest, is about 60 diopters. Roughly two-thirds of this power is provided by the air-cornea interface, and the remaining one-third by the crystalline lens. As the ciliary muscles contract, the lenticular power, or power of the lens, increases and the total focal length of the eye decreases.

It has been well known since 1909 that the crystalline lens has a graded-index (GRIN) distribution, which increases not only the refractive power of the lens, but also the degree of accommodation (Gullstrand, 1962). There has been much interest in recent years in more accurate measurements and mathematical models to gain greater understanding of lens functionality, for instance, regarding how the distribution changes with accommodation (short-term) and age (long-term). Certain models approximate the lens with a shell structure, involving concentric iso-indicial surfaces of constant index with the maximum value at the center (Atchison & Smith, 1995; Goncharov & Dainty, 2007; Navarro et al., 2007, 2007a). Due to limited in vivo experimental data, however, there is plenty of debate surrounding how best to model the index changes with radial and axial position. The goal of the research by Navarro et al. (2007, 2007a) is to develop an adaptive model with adjustable parameters, so that individual data can be fitted for a range of ages and accommodation levels.
There are two distinct classes of photoreceptors in the retinal layer, rods and cones, which differ in shape and functionality. Rods are usually longer, narrower, and have straight, rod-like ends, while cones are shorter, wider, and have tapered, cone-like ends. Rods are much more abundant, with about 120 million cells distributed all throughout the retina except for the very center, called the fovea. They are extremely sensitive to light and are used only for vision under scotopic conditions, that is, at very low light levels. On the other hand, there are about 8 million cones scattered throughout the retina, but with a heavy concentration in the fovea. Cones are much less sensitive to light and are designed for most normal lighting, or photopic, conditions. They are solely responsible for color vision (Palmer, 1999).

Interestingly, cone photoreceptors in the human eye exhibit a directional sensitivity to light, in that marginal rays passing through the periphery of the pupil (off-axis light) are perceived as less intense than rays passing through the center of the pupil (axial light), a phenomenon called the Stiles-Crawford effect (SCE) (Stiles & Crawford, 1933). Cones essentially act like microscopic waveguides, funneling light from one end to the other, with an effective acceptance angle of approximately 5 degrees. The SCE is presumably advantageous to visual performance in that it ameliorates the effects of defocus and aberrations for large pupils, although there have not been many theoretical or experimental studies to verify this (Atchison, Scott, Joblin, & Smith, 2000). If this evolutionary strategy is true, however, it would only pay off under photopic conditions; but this is precisely when cones are in use. Conversely, rod photoreceptors are not as directionally sensitive as cones and have larger acceptance angles, as they are designed for dim light and cannot afford to waste photons.
The theoretical analysis of the physical properties of photoreceptors, using electromagnetic principles, and their influence on the SCE is a very complicated problem, even with many simplifying assumptions. It combines the directional characteristics and relative orientation of individual receptors, as well as the light leakage or cross-talk between cones (He, Marcos, & Burns, 1999). The simplest mathematical representation of this phenomenon incorporates an apodization in the pupil plane, as conceived by Westheimer (1959) and developed by Metcalf (1965). For the greatest simplicity, however, the retina is often treated as a perfect Lambertian reflector.

5.2 Ray-tracing through a schematic eye

Ray-tracing through schematic eye models has vast utility in ophthalmology and vision science for evaluating the optical properties of normal and pathologic eyes. We adopted an algebraic method for non-paraxial ray-tracing through an optical system containing aspheric surfaces up to second order, in which a surface is represented by a 4 × 4 matrix (Langenbucher, Viestenz, Viestenz, Brünner, & Seitz, 2006). The advantage of using second-order, or quadric, surfaces is that the ray-surface intersection, surface normal vector, and direction of the refracted ray can all be determined analytically. We applied this matrix-based approach to the Navarro wide-angle schematic eye and incorporated it into our inverse-design system for estimating ocular parameters.

Schematic eye models can vary greatly in terms of their complexity. The earliest eye models were developed over a century ago by Gullstrand (1962) and Von Helmholtz (1910). These models integrated spherical surfaces for the cornea and lens determined from clinical data in order to predict first-order optical properties.
More recent models have been used to simulate optical functions including retinal illumination (Kooijman, 1983), chromatic aberration (Thibos & Bradley, 1999), and retinal image formation (Camp, Maguire, Cameron, & Rob, 1990a). Kooijman (1983) incorporated aspheric surfaces for the cornea, lens, and retina into a model to study retinal illumination, while the Indiana eye was developed to model chromatic aberration of the eye (Thibos & Bradley, 1999). Camp et al. (1990a, 1990b) integrated corneal topography data into a single refracting surface to model the optical imaging properties of the anterior corneal surface. Eye models that emphasize anatomical accuracy incorporate a GRIN distribution for the crystalline lens, or even non-axially-symmetric features, such as decentered lenses or pupils, which can have a strong impact on optical performance. However, the large number of parameters involved in these advanced models can render them impractical, so there remains much interest in simplified, reduced schematic eyes. Reduced models are often rotationally symmetric and utilize an effective refractive index for the lens, making them more amenable to ray-tracing. Reduced models are generally able to reproduce certain ocular aberrations, such as axial spherical aberration (El Hage & Berny, 1973; Lotmar, 1971; Thibos, Ye, Zhang, & Bradley, 1997) or chromatic aberration (Thibos, Ye, Zhang, & Bradley, 1992).

The Navarro wide-angle schematic eye is based on anatomical data from clinical measurements and contains four centered quadric refracting surfaces with rotational symmetry, plus a spherical image surface representing the retina (Navarro et al., 1985). Therefore, we are currently utilizing an effective refractive index for the crystalline lens, but will later incorporate a GRIN distribution according to models suggested in the latest vision-science studies (Goncharov & Dainty, 2007; Navarro, Palos, & González, 2007, 2007a).
This model has minimal complexity but, on average, it can accurately predict optical performance across the visual field, including longitudinal and transverse chromatic aberration (Escudero-Sanz & Navarro, 1999). Table 5.1 provides a complete parametric description of the eye model, including radii, thicknesses, conic constants, and refractive indices. To calculate refractive indices at 780 nm, we applied the chromatic dispersion model developed by Atchison and Smith (2005) to the reference values provided by Navarro et al. (2007, 2007a).

Table 5.1: Navarro wide-angle schematic eye model at λ = 780 nm.

Surface          | Radius [mm] | Thickness [mm] | Conic Constant | Refractive Index | Optical Medium
Anterior cornea  | 7.72        | 0.55           | −0.26          | 1.3729           | Cornea
Posterior cornea | 6.50        | 3.05           | 0              | 1.3329           | Aqueous
Pupil            | Infinity    | 0              | 0              | N/A              | Aqueous
Anterior lens    | 10.20       | 4.00           | −3.1316        | 1.4138           | Lens
Posterior lens   | −6.00       | 16.3203        | −1.0           | 1.3311           | Vitreous
Retina           | −12.00      | N/A            | 0              | N/A              | N/A

We implemented the Navarro wide-angle model for this study, but varied the values of the parameters to emulate a realistic eye (Table 5.2). In addition, we decentered the lens by 0.20 mm in the horizontal direction and −0.10 mm in the vertical direction, which is consistent with the experimental range of lens misalignments (Rosales & Marcos, 2006).

Table 5.2: Geometry of eye model used to generate WFS data.

Surface          | Radius [mm] | Thickness [mm] | Conic Constant | Refractive Index | Decentration (X, Y) [mm]
Anterior cornea  | 7.46        | 0.554          | −0.24          | 1.3729           | (0, 0)
Posterior cornea | 6.38        | 3.37           | 0              | 1.3329           | (0, 0)
Pupil            | Infinity    | 0              | 0              | N/A              | (0, 0)
Anterior lens    | 10.85       | 4.09           | −3.1304        | 1.4138           | (0.2, −0.1)
Posterior lens   | −5.92       | 16.40          | −0.97          | 1.3317           | (0.2, −0.1)
Retina           | −12.00      | N/A            | 0              | N/A              | (0, 0)

Description of refracting quadric surfaces

A quadric surface S(x, y, z) is implicitly defined by

S(x, y, z) = Ax² + By² + Cz² + 2Dxy + 2Eyz + 2Fxz + 2Gx + 2Hy + 2Iz + K = 0, (5.1)

where x, y, and z are the Cartesian coordinates and A, B, C, D, E, F, G, H, I, K are coefficients.
In the matrix method developed by Langenbucher et al. (2006), (5.1) is written as

xᵗ S x = 0, (5.2)

where x is the generalized coordinate vector,

x = (x, y, z, 1)ᵗ, (5.3)

and S describes the quadric surface in matrix form (rows separated by semicolons),

S = [A D F G; D B E H; F E C I; G H I K]. (5.4)

For instance, suppose we have a convex conic surface whose apex is at the origin and whose axis of rotation coincides with the z-axis. The surface sag is given by

z = (x² + y²) / {R [1 + √(1 − (1 + κ)(x² + y²)/R²)]}, (5.5)

with conic constant κ and apical radius R. Generally, a hyperboloid is described by κ < −1, a paraboloid by κ = −1, a prolate ellipsoid by −1 < κ < 0, a sphere by κ = 0, and an oblate ellipsoid by κ > 0. After rearranging (5.5), we have

S_conic(x, y, z) = x² + y² + (1 + κ)z² − 2Rz = 0, (5.6)

so that the matrix form of the surface is written as

S_conic = [1 0 0 0; 0 1 0 0; 0 0 1+κ −R; 0 0 −R 0]. (5.7)

Translating a quadric surface

When the quadric surface is shifted by a translation vector x_T = (x_T, y_T, z_T), (5.1) becomes

S(x, y, z) = A(x − x_T)² + B(y − y_T)² + C(z − z_T)² + 2D(x − x_T)(y − y_T) + 2E(y − y_T)(z − z_T) + 2F(x − x_T)(z − z_T) + 2G(x − x_T) + 2H(y − y_T) + 2I(z − z_T) + K = 0, (5.8)

or equivalently,

(x − x_T)ᵗ S (x − x_T) = xᵗ S_T x = 0. (5.9)

The elements of the translated surface matrix S_T in (5.9) are given by

A_T = A, B_T = B, C_T = C, D_T = D, E_T = E, F_T = F,
G_T = G − A x_T − D y_T − F z_T,
H_T = H − D x_T − B y_T − E z_T,
I_T = I − F x_T − E y_T − C z_T,
K_T = K + A x_T² + B y_T² + C z_T² + 2D x_T y_T + 2E y_T z_T + 2F x_T z_T − 2G x_T − 2H y_T − 2I z_T. (5.10)

We can now use (5.9) to generate the surface matrices for the quadric surfaces described in Table 5.2, including the decentration of the lens.
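Equations (5.7) and (5.10) are straightforward to implement and to check against the numerical matrices of (5.11). A minimal sketch follows; the helper names conic_matrix and translate are our own, and the translation is applied as a congruence transform Tᵗ S T, which reproduces the element-wise formulas of Eq. (5.10).

```python
import numpy as np

def conic_matrix(R, kappa):
    """4x4 quadric matrix of a conic with apex at the origin, Eq. (5.7)."""
    return np.array([[1.0, 0.0, 0.0,         0.0],
                     [0.0, 1.0, 0.0,         0.0],
                     [0.0, 0.0, 1.0 + kappa, -R ],
                     [0.0, 0.0, -R,          0.0]])

def translate(S, xT, yT, zT):
    """Matrix of the quadric shifted by (xT, yT, zT).  The congruence
    transform T^t S T, with T mapping x to x - xT in homogeneous
    coordinates, is equivalent to Eq. (5.10)."""
    T = np.eye(4)
    T[0, 3], T[1, 3], T[2, 3] = -xT, -yT, -zT
    return T.T @ S @ T

# Anterior lens of Table 5.2: R = 10.85 mm, kappa = -3.1304, decentered by
# (0.2, -0.1) mm and located 0.554 + 3.37 = 3.924 mm behind the corneal apex.
S_lens_ant = translate(conic_matrix(10.85, -3.1304), 0.2, -0.1, 3.924)
```

The resulting matrix is symmetric and reproduces the entries of (5.11c), e.g. S_lens_ant[2, 3] ≈ −2.49 and S_lens_ant[3, 3] ≈ 52.40.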
If the axes of rotation are parallel to the z-axis and the apex of the anterior corneal surface is at the origin, we have

S_cornea,anterior = [1 0 0 0; 0 1 0 0; 0 0 0.76 −7.46; 0 0 −7.46 0], (5.11a)

S_cornea,posterior = [1 0 0 0; 0 1 0 0; 0 0 1 −6.93; 0 0 −6.93 7.38], (5.11b)

S_lens,anterior = [1 0 0 −0.20; 0 1 0 0.10; 0 0 −2.1304 −2.49; −0.20 0.10 −2.49 52.40], (5.11c)

S_lens,posterior = [1 0 0 −0.20; 0 1 0 0.10; 0 0 0.03 5.68; −0.20 0.10 5.68 −92.91], (5.11d)

S_retina = [1 0 0 0; 0 1 0 0; 0 0 1 −12.41; 0 0 −12.41 10.11]. (5.11e)

Similar matrices are determined for the second pass through the ocular media, after light is reflected from the retina and advances toward the detector:

S_retina = [1 0 0 0; 0 1 0 0; 0 0 1 −12.00; 0 0 −12.00 0], (5.12a)

S_lens,posterior = [1 0 0 0.20; 0 1 0 0.10; 0 0 0.03 −6.41; 0.20 0.10 −6.41 202.29], (5.12b)

S_lens,anterior = [1 0 0 0.20; 0 1 0 0.10; 0 0 −2.1304 54.50; 0.20 0.10 54.50 −1339.01], (5.12c)

S_cornea,posterior = [1 0 0 0; 0 1 0 0; 0 0 1 −17.48; 0 0 −17.48 264.85], (5.12d)

S_cornea,anterior = [1 0 0 0; 0 1 0 0; 0 0 0.76 −11.09; 0 0 −11.09 −345.70]. (5.12e)

Determining the surface normal vector

The normal vector to the quadric surface described in (5.1), denoted n, can be determined analytically by taking the gradient of S(x, y, z),

n = ∇S(x, y, z), (5.13)

so that the components are given by

∂S(x, y, z)/∂x = 2Ax + 2Dy + 2Fz + 2G, (5.14a)
∂S(x, y, z)/∂y = 2Dx + 2By + 2Ez + 2H, (5.14b)
∂S(x, y, z)/∂z = 2Fx + 2Ey + 2Cz + 2I. (5.14c)

We can rewrite (5.14) in matrix notation,

n = 2 [A D F G; D B E H; F E C I] x = 2 S(1:3, 1:4) x, (5.15)

where S(1:3, 1:4) is the upper 3 × 4 submatrix of S. Finally, the unit normal vector is determined in the usual way, n̂ = n/√(n ⋅ n).

Determining the ray-surface intersection

A ray is simply characterized by the coordinates of its origin and a direction vector,

x₀ = (x₀, y₀, z₀),  x_d = (x_d, y_d, z_d), (5.16)

so that a point on the ray is

x = x₀ + k x_d = (x₀ + k x_d, y₀ + k y_d, z₀ + k z_d)ᵗ, (5.17)

where k is the scalar propagation constant. The intersection of an arbitrary ray with a quadric surface can be determined analytically with the quadratic formula. Substituting (5.17) into (5.1) leads to

k = [−b ± √(b² − 4ac)] / 2a = k₁, k₂, (5.18)

with the coefficients a, b, and c given by

a = A x_d² + B y_d² + C z_d² + 2D x_d y_d + 2E y_d z_d + 2F x_d z_d, (5.19a)

b = 2[A x₀ x_d + B y₀ y_d + C z₀ z_d + G x_d + H y_d + I z_d + D(x₀ y_d + y₀ x_d) + E(y₀ z_d + z₀ y_d) + F(x₀ z_d + z₀ x_d)], (5.19b)

c = A x₀² + B y₀² + C z₀² + 2D x₀ y₀ + 2E y₀ z₀ + 2F x₀ z₀ + 2G x₀ + 2H y₀ + 2I z₀ + K. (5.19c)

Note the following special cases and how to address them:

a = 0  →  k = −c/b, (5.20a)
b² − 4ac < 0  →  the ray-surface intersection is imaginary, (5.20b)
a ≠ 0  →  k is the smallest positive value of k₁ and k₂. (5.20c)

Direction of the transmitted ray

We saw in Section 4.3 that Snell's law in vector form yields an expression for the transmitted wavevector (ray) at a refracting interface,

x̂′_d = (n/n′)[x̂_d − (x̂_d ⋅ n̂) n̂] + √{1 + (n/n′)²[(x̂_d ⋅ n̂)² − 1]} n̂, (5.21)

where x̂_d and x̂′_d are unit vectors parallel to the incident and transmitted rays, respectively, and n and n′ are the corresponding refractive indices on the two sides of the interface. When using (5.21) in an optical-design program, it is imperative to verify that the transmitted ray propagates in the +z direction, provided that the rays travel from left to right. If not, we must negate the unit normal vector n̂ in this equation, resulting in

x̂′_d = (n/n′)[x̂_d − (x̂_d ⋅ n̂) n̂] − √{1 + (n/n′)²[(x̂_d ⋅ n̂)² − 1]} n̂. (5.22)

Figure 5.2 illustrates the eye model corresponding to the parameters in Table 5.2, with an 8-mm pupil and rays from a collimated, on-axis source.
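The intersection test of Eqs. (5.18)-(5.20) is compact in code. The sketch below is our own; the homogeneous-coordinate shortcut used to compute a, b, and c is an assumption of this example, but it is algebraically equivalent to Eq. (5.19).

```python
import numpy as np

def intersect(S, x0, xd):
    """Smallest positive propagation constant k such that x0 + k*xd lies on
    the quadric x^t S x = 0, per Eqs. (5.18)-(5.20).  In homogeneous
    coordinates p = (x0, 1) and d = (xd, 0), the quadratic coefficients are
    a = d^t S d, b = 2 p^t S d, c = p^t S p, reproducing Eq. (5.19)."""
    p = np.append(np.asarray(x0, float), 1.0)   # ray origin
    d = np.append(np.asarray(xd, float), 0.0)   # ray direction
    a = d @ S @ d
    b = 2.0 * (p @ S @ d)
    c = p @ S @ p
    if abs(a) < 1e-12:                          # Eq. (5.20a)
        return -c / b
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                              # Eq. (5.20b): imaginary
        return None
    roots = [(-b - np.sqrt(disc)) / (2 * a), (-b + np.sqrt(disc)) / (2 * a)]
    pos = [k for k in roots if k > 0]           # Eq. (5.20c)
    return min(pos) if pos else None

# Anterior corneal surface of Table 5.2 (R = 7.46 mm, kappa = -0.24), (5.11a).
S_ac = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0.76, -7.46], [0, 0, -7.46, 0]], float)
k_axial = intersect(S_ac, [0, 0, -10], [0, 0, 1])   # hits the apex at z = 0
```

An axial ray launched from z = −10 mm meets the apex after k = 10 mm; a marginal ray at a height of 1 mm travels slightly farther, k ≈ 10.067 mm, the extra distance being the surface sag of Eq. (5.5).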
A close-up of the focal region emphasizes the spherical aberration in this schematic eye.

Fig. 5.2: Geometrical eye model corresponding to the parameters in Table 5.2, with an on-axis source and 8-mm pupil to demonstrate spherical aberration.

5.3 Shack-Hartmann wavefront sensors

The Shack-Hartmann wavefront sensor (SHWFS) is a simple optical instrument used to characterize aberrations in an imaging system. It is a technological advancement of the Hartmann screen test, which was originally developed for optical testing by the German astrophysicist Johannes Hartmann at the turn of the 20th century (Schwiegerling & Neal, 2005). The SHWFS was invented in the 1960s for applications in astronomy to improve the resolution of images from ground-based telescopes, since the resolution is compromised by atmospheric turbulence (Platt & Shack, 2001). As the technology reached greater sophistication, alternative applications of these sensors received considerable impetus from the expertise developed by astronomers, including applications in ophthalmology, laser-beam-quality measurement, and optical-system alignment (Neal, Copland, & Neal, 2002).

The SHWFS contains a two-dimensional lenslet array for measuring distortions in a wavefront, providing valuable information about aberrations in an optical system. The lenslet array is typically conjugate to the pupil plane of the system. Upon sampling an incoming wavefront, the lenslets produce a grid of focused spots on a CCD placed some distance behind the array, normally in the focal plane. Note that this assumes a locally uniform wavefront over each lenslet, which we will discuss in detail in Section 5.3.1. We know from basic Fourier theory that the displacement of each spot from its ideal, on-axis location is proportional to the average local wavefront slope at the respective lenslet.
Thus, if the wavefront is perfectly uniform (and normally incident), there is no shift in the spots and we observe the focal-plane image shown in Figure 5.3, while Figure 5.4 depicts the same image for an aberrated wavefront.

Fig. 5.3: Shack-Hartmann WFS measuring a perfect incoming wavefront.

Fig. 5.4: Shack-Hartmann WFS measuring an aberrated incoming wavefront.

In classical wavefront sensing, an algorithm processes the detected image and attempts to estimate the centroids (focal-spot positions) produced by each lenslet in the detector plane. The local wavefront slopes are computed from the centroids by comparing to a reference, and the wavefront is then reconstructed from the array of wavefront slopes. For a measured M × 1 irradiance distribution g(θ), the centroid positions are computed from the first moments:

τ̂_xj(θ) = Σ_{m ∈ AOI,j} g_m(θ) x_m / Σ_{m ∈ AOI,j} g_m(θ) ≈ Σ_{m ∈ AOI,j} g_m(θ) x_m / ⟨g_tot⟩, (5.23a)

τ̂_yj(θ) = Σ_{m ∈ AOI,j} g_m(θ) y_m / Σ_{m ∈ AOI,j} g_m(θ) ≈ Σ_{m ∈ AOI,j} g_m(θ) y_m / ⟨g_tot⟩, (5.23b)

where the index j corresponds to the lenslet number and the index m runs over only those detector elements that receive a signal from that lenslet, in the area of interest AOI,j. The sum of irradiance values is denoted g_tot, and the average sum ⟨g_tot⟩. The wavefront slope distribution is computed by comparing the measured centroids τ̂_j = (τ̂_xj, τ̂_yj) to those determined by a reference wavefront, τ_j,ref = (τ_xj,ref, τ_yj,ref):

(β_x, β_y)_j = (⟨∂W/∂x⟩, ⟨∂W/∂y⟩)_j ≈ (1/d)(τ̂_x − τ_x,ref, τ̂_y − τ_y,ref)_j, (5.24)

where W = W(x, y) is the two-dimensional wavefront and d is the distance between the lenslet array and the detector, normally equal to the lenslet focal length f. The angle brackets denote the average, as the spot displacements are proportional to the average local wavefront slope. Wavefront reconstruction is accomplished by relating the set of slopes to the wavefront gradient:

∇W = (∂W/∂x) x̂ + (∂W/∂y) ŷ. (5.25)

Since the local derivatives are approximated by the average over the respective lenslet, this can introduce significant errors in the reconstructed wavefront (fitting errors), particularly for larger lenslet areas (Neal, Topa, & Copland, 2001). Different techniques are used to reconstruct the wavefront from the slope measurements, such as direct numerical integration (zonal) or polynomial fitting (modal). Southwell (1980) provides a useful description of these methods.

5.3.1 Centroid estimation and Fisher information

Blurred spots produced by the lenslets signify a departure from the assumption of a locally uniform wavefront. In reality, there may be wavefront features that are smaller than the lenslet diameter, so that these finer details manifest in the spot profiles (Fig. 5.5). Therefore, both the spot positions and profiles provide indispensable information about aberrations in the optical system, or in our case, the human eye. In an effort to preserve all information, our method does not involve centroid estimation or wavefront reconstruction; the data consist of all detector irradiance values in the focal plane of the WFS, which we refer to as the raw detector outputs.

Fig. 5.5: Blurred spot profiles in the focal plane of a Shack-Hartmann WFS.

To illustrate the last point, we examined the Fisher information matrix when the data consisted of centroid positions used to estimate ocular parameters. If the index i denotes the Cartesian direction in the detector plane and the index j represents the lenslet number, we can write

τ̂_ij = τ̂_xj if i = 1;  τ̂_yj if i = 2, (5.26)

where i = {1, 2}, j = {1, …, L}, and L is the total number of lenslets. By combining (5.23) and (5.26), we can formulate the average spot position for the jth lenslet:
(5.27b) ∑ g m (θ ) ym m ∈ AOI, j g tot The FIM components are then expressed as Fkl = = 2 1 στ2 L ∑∑ ∂τˆij (θ ) ∂τˆij (θ ) i =1 j =1 ∂θk ∂θl ∂τˆ (θ ) ∂τˆ (θ ) ∂τˆ (θ ) ∂τˆ (θ ) xj yj yj , xj + ∂θ 2 ∂θl ∂θk ∂θl k στ j =1 1 L ∑ (5.28) where στ2 is the variance in the centroid estimates, and θk and θl denote the kth and lth parameters, respectively. The variance στ2 is attributed to centroid estimation error and is a direct consequence of the electronic noise in detector systems. For a set of N successive measurements, we have 1 στ = N 2 1 L n =1 N ∑ L ∑[(τˆxj −τˆxj )2 + (τˆyj −τˆyj )2 ] j =1 n . (5.29) 179 This simple formula of course assumes that the x and y centroids, as well as the detector elements and individual measurements, are statistically independent (Neal et al., 2002). The centroid estimation error can be measured by performing consecutive measurements of the same true wavefront, then analyzing the centroid positions. Computation of centroids can possibly lead to correlations in the estimates, which would require the variance in (5.29) to be replaced with a suitable covariance matrix. Barrett et al. (2007) suggests a full treatment of the statistical properties of the estimates τˆ j = (τˆxj ,τˆ yj ) , by initially declaring a conditional PDF pr (τˆ j | θ ) . They suggest that a more realistic PDF on the data may be a correlated multivariate normal distribution. The information loss when replacing raw detector outputs with centroid positions as data in the FIM can be demonstrated by observing the increase in the Cramér-Rao lower bounds. 5.4 Data-acquisition system 5.4.1 System configuration The data acquisition system for performing inverse optical design was modeled after a clinical Shack-Hartmann aberrometer for measurement of aberrations in human eyes, developed by Straub, Schwiegerling, and Gupta (2001), as shown in Figure 1.1. 
In the clinical aberrometer, a narrow collimated beam from a laser diode produces a spot on the retina, which acts as a laser beacon and fills the dilated 6-mm pupil upon reflection (Fig. 1.1). The 30-nm bandwidth of the source, centered at 780 nm, reduces speckle noise in the real system, but we used a single wavelength of 780 nm in our computerized model to minimize computation time. We considered an ideal beamsplitter and illuminated the eye with a Gaussian beam (1 mm at FWHM) at multiple angles of 0, 6, and 12 degrees in the vertical direction to assess both on-axis and off-axis aberrations, thereby increasing the amount of Fisher information in the system. The center of the beam was coincident with the intersection of the optical axis and the anterior corneal surface. We treated the aberrated retinal spot as a perfect diffuse scatterer and did not account for scattering within the ocular media or internal reflections. While the clinical configuration uses relay optics to conjugate the exit pupil of the eye to the SHWFS, we simply placed the lenslet array 10 mm away from the corneal apex. Global wavefront tip and tilt were discarded by rotating the sensor for off-axis angles. The lenslets were 0.6 mm in diameter and 24 mm in focal length, and the detector pitch was 8.0 µm. Our configuration is shown in Figure 5.6. Fig. 5.6: Data acquisition system for estimating ocular parameters. 5.4.2 Optical-design program We developed an optical-design program in C that performs non-paraxial ray-tracing through quadric surfaces for ocular parameters comparable to those in the Navarro wide-angle schematic eye model, given in Table 5.2. Each detector data set resulted from tracing a 256 × 256 bundle of rays through the double-pass system of the eye, where approximately 60% of the rays survived the double pass after the diffuse retinal reflection and vignetting at the pupil. The WFS lenslets were treated as ideal thin lenses in our model.
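The noise step applied to such a simulated detector data set can be sketched as follows. This is a hedged illustration only: the irradiance values below are stand-ins for the ray-trace output, and the peak-SNR definition max(ḡ)/σ is our assumption; the Gaussian noise model itself is described in the text that follows.

```python
# Sketch (assumptions noted in the lead-in) of adding zero-mean Gaussian noise
# to noise-free detector outputs at a fixed peak SNR. The variance is fixed by
# sigma = max(g_bar) / peak_snr, an assumed definition of peak SNR.

import random

def add_gaussian_noise(g_bar, peak_snr=1e3, rng=random.Random(0)):
    """Return a noisy copy of g_bar and the noise standard deviation sigma."""
    sigma = max(g_bar) / peak_snr
    return [g + rng.gauss(0.0, sigma) for g in g_bar], sigma

g_bar = [0.0, 10.0, 250.0, 1000.0, 40.0]   # stand-in noise-free irradiances
g, sigma = add_gaussian_noise(g_bar)       # here sigma = 1000 / 1e3 = 1.0
```

With a peak value of 1000 and a peak SNR of 10^3, the noise standard deviation is 1.0 detector unit, so the noisy data deviate only slightly from the noise-free values.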
Next, we assumed that the system is not photon-starved and used Gaussian statistics to represent electronic noise. We showed in Section 2.8 that for i.i.d. detector elements and noise that is independent of the illumination level, the probability density function (PDF) from which the data are drawn is given by

pr(\mathbf{g} \mid \theta) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{ -\frac{[g_m - \overline{g}_m(\theta)]^2}{2\sigma^2} \right\}, \quad (5.30)

where g is the M × 1 vector of random data, θ is the set of estimable parameters, and σ² is the variance in each detector element. Since the noise is zero-mean Gaussian, \overline{g}_m(\theta) is simply the output of the optical-design program. We fixed the variance to obtain a modest peak SNR of 10^3 and added noise to the data using a Gaussian random-number generator. The geometrical eye model corresponding to the ocular parameters provided in Table 5.2 is illustrated in Figure 5.7, which was generated using our optical-design program. Sample rays for the multiple beam angles used in this study, α = {0°, 6°, 12°}, are also plotted. Fig. 5.7: Geometrical eye model used to generate WFS data, corresponding to ocular parameters in Table 5.2. The trial set of WFS data for the different beam angles, used as input to inverse optical design, is provided in Figures 5.8 − 5.10. In each image, the central focal spots are sharp and focused, while spots near the periphery are blurred and smeared due to the worsening of aberrations in this region. The peripheral spots become increasingly smeared for larger off-axis angles. Fig. 5.8: WFS data used as input to inverse optical design, for beam angle α = 0°. Fig. 5.9: WFS data used as input to inverse optical design, for beam angle α = 6°. Fig. 5.10: WFS data used as input to inverse optical design, for beam angle α = 12°. The corresponding focal spots on the retina are shown in Figures 5.11 − 5.13 with the coordinate system centered on the optical axis, so that the position on the retina can
be read from the axes. Notice that the spot for beam angle α = 0 is centered at (0.027, 0.056) mm, due to the decentration of the lens (Fig. 5.11). Fig. 5.11: Focal spot on the retina for a source beam angle of α = 0°. As the beam angle increases, off-axis aberrations such as coma and astigmatism become more apparent in the retinal image (Figs. 5.12 & 5.13). Fig. 5.12: Focal spot on the retina for a source beam angle of α = 6°. Fig. 5.13: Focal spot on the retina for a source beam angle of α = 12°. 5.5 Fisher information and Cramér-Rao lower bounds Adjustable system parameters for increasing the Fisher information include the beam size, beam angle, lenslet-array geometry, detector-element spacing, and variance in the detector elements. Multiple output planes can be combined to form a larger, diversified data set to decouple pairs of parameters in the FIM. Here we present the Fisher information matrices and Cramér-Rao lower bounds for various system configurations. In each case, the changes will be compared to the original system configuration described in Section 5.4. In this proof-of-principle study, we estimated a reduced set of ocular parameters, including the posterior radius, thickness, and refractive index of the cornea, the thickness and index of the anterior chamber, the anterior and posterior radii, thickness, and equivalent index of the crystalline lens, and the thickness of the vitreous body, for a total of 11 parameters. Thus, the FIM is an 11 × 11 symmetric matrix for each system configuration. This matrix is order-specific, and the indices of the estimated parameters are listed in Table 5.3. Note that the jkth entry has units given by

\text{units of } F_{jk} = \frac{1}{(\text{units of } \theta_j)(\text{units of } \theta_k)}. \quad (5.23)

In Section 2.7, we derived the FIM components for i.i.d. Gaussian data, given by

F_{jk} = \frac{1}{\sigma^2} \sum_{m=1}^{M} \frac{\partial \overline{g}_m(\theta)}{\partial \theta_j}\, \frac{\partial \overline{g}_m(\theta)}{\partial \theta_k}. \quad (5.24)

We computed the FIM according to (5.24) for the chosen system configuration; it is provided in Figure 5.14 on a base-10 logarithmic scale, with values ranging from 10^9.32 to 10^16.29. The large values in the FIM indicate that the data are very sensitive to changes in the parameters; however, the magnitudes of the off-diagonal entries reveal a high degree of detrimental coupling between pairs of parameters. This is intuitive, since first-order geometrical and optical parameters combine to form various optical quantities; for example, refractive index and thickness combine in the optical path length, and curvatures, index, and thickness combine in the optical power. Interestingly, the lenticular thickness (p = 6) is much less coupled to the lenticular anterior radius (p = 2) than to the posterior radius (p = 3), with F_{2,6} = 10^9.50 and F_{3,6} = 10^11.38. The same goes for the lenticular refractive index (p = 10), since F_{2,10} = 10^11.40 and F_{3,10} = 10^13.36. As another example, the structure of the FIM for the four refractive indices, p = {8, 9, 10, 11}, shows that the coupling is stronger for the pairs (8,11) and (9,10), but weaker for the other pairs. Fig. 5.14: FIM for the chosen system configuration (log scale). The inverse of the FIM is shown in Figure 5.15, also displayed on a logarithmic scale. We read off the diagonal entries to determine the CRB for each parameter estimate; the corresponding standard deviations, denoted γ1, are given in Table 5.3. These diminutive values permit the estimation of the parameters to high precision even under pessimistic noise levels, since we implemented a peak SNR of only 10^3; moreover, the CRB can immediately be improved by decreasing the variance in the detector pixels. Fig. 5.15: Inverse of the FIM for the chosen system configuration (log scale). Recall from Chapter 2 that for an efficient estimator, the inverse of the FIM is the covariance matrix of the estimates.
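The FIM of Eq. (5.24) and the CRB read off from the diagonal of its inverse can be sketched with a stand-in forward model. The two-parameter model and the finite-difference sensitivities below are our own illustrative assumptions; the dissertation obtains the sensitivities from its ray-trace program.

```python
# Hedged sketch of Eq. (5.24) for i.i.d. Gaussian data:
#   F_jk = (1/sigma^2) * sum_m (dg_m/dtheta_j) * (dg_m/dtheta_k),
# with sensitivities from central finite differences on a stand-in forward
# model. The CRB for each parameter is the corresponding diagonal entry of
# the inverse FIM (shown here analytically for the 2 x 2 case).

def forward(theta):
    # Stand-in two-parameter model producing an M x 1 mean data vector.
    a, b = theta
    return [a * m + b * m * m for m in range(1, 6)]

def fim_gaussian(theta, sigma, h=1e-6):
    n = len(theta)
    def sens(j):
        tp, tm = list(theta), list(theta)
        tp[j] += h
        tm[j] -= h
        return [(p - q) / (2 * h) for p, q in zip(forward(tp), forward(tm))]
    s = [sens(j) for j in range(n)]
    return [[sum(sj * sk for sj, sk in zip(s[j], s[k])) / sigma**2
             for k in range(n)] for j in range(n)]

def crb_2x2(F):
    # Diagonal of the inverse of a 2 x 2 FIM: variance lower bounds.
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return [F[1][1] / det, F[0][0] / det]

F = fim_gaussian([1.0, 0.5], sigma=1.0)
var_bounds = crb_2x2(F)   # Cramer-Rao lower bounds on the two variances
```

Strong off-diagonal entries in F (here F_{12} = Σ m³) are exactly the kind of parameter coupling discussed above: they inflate the diagonal of the inverse and hence the achievable precision.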
Despite the feasibility of the CRB, there is also a considerable amount of off-diagonal structure in the inverse of the FIM, suggesting a high level of coupling between the parameter estimates. One straightforward solution to eliminate the coupling among the parameters is to diagonalize the FIM, which is possible due to the properties of Hermitian matrices. A Hermitian matrix is one that is equal to its own conjugate transpose; the FIM fulfills this requirement, since it is real and symmetric. Barrett and Myers (2004) and Strang (1980) show that a Hermitian matrix can be diagonalized by an appropriate unitary transformation, possibly invoking Gram-Schmidt orthogonalization. The only question, however, is whether the estimable parameters after the transformation would be useful from an ophthalmological standpoint.

Table 5.3: Square-root of the CRB (standard deviation) for various system configurations.

No. | Parameter                         | True Value | γ1          | γ2          | γ3          | γ4
1   | Cornea posterior radius [mm]      | 6.38       | 6.2 × 10^-6 | 4.1 × 10^-5 | 2.0 × 10^-6 | 2.0 × 10^-5
2   | Lens anterior radius [mm]         | 10.85      | 5.4 × 10^-6 | 3.6 × 10^-5 | 2.1 × 10^-6 | 1.0 × 10^-5
3   | Lens posterior radius [mm]        | -5.92      | 7.7 × 10^-6 | 4.9 × 10^-5 | 1.9 × 10^-6 | 3.5 × 10^-5
4   | Cornea thickness [mm]             | 0.554      | 5.6 × 10^-6 | 3.9 × 10^-5 | 1.8 × 10^-6 | 2.5 × 10^-5
5   | Anterior chamber thickness [mm]   | 3.37       | 7.8 × 10^-6 | 4.6 × 10^-5 | 2.7 × 10^-6 | 3.3 × 10^-5
6   | Lens thickness [mm]               | 4.09       | 7.1 × 10^-6 | 4.9 × 10^-5 | 2.4 × 10^-6 | 3.0 × 10^-5
7   | Vitreous thickness [mm]           | 16.40      | 5.0 × 10^-7 | 3.5 × 10^-6 | 1.6 × 10^-7 | 8.0 × 10^-6
8   | Cornea refractive index           | 1.3729     | 9.0 × 10^-8 | 6.3 × 10^-7 | 2.5 × 10^-8 | 1.1 × 10^-7
9   | Anterior chamber refractive index | 1.3329     | 5.6 × 10^-8 | 3.7 × 10^-7 | 1.6 × 10^-8 | 4.7 × 10^-7
10  | Lens refractive index             | 1.4138     | 8.4 × 10^-8 | 4.9 × 10^-7 | 2.7 × 10^-8 | 1.3 × 10^-7
11  | Vitreous refractive index         | 1.3317     | 9.9 × 10^-9 | 6.8 × 10^-8 | 2.9 × 10^-9 | 9.4 × 10^-8

Adjusting the detector element size from 8 µm to 25 µm, but leaving all other system parameters unchanged, reduces the total number of detector elements that receive a signal from
the source. Since only these elements contribute to the FIM, the pixel enlargement causes an increase in the CRB; the square-root of the CRB is denoted as γ2 in Table 5.3. Although the overall FIM is smaller in magnitude, there is a minimal effect on the structure of the FIM (Fig. 5.16) and inverse FIM (Fig. 5.17), so that the relative degree of coupling between parameters is roughly the same. Fig. 5.16: FIM for the system after increasing the detector element size. Fig. 5.17: Inverse of the FIM after increasing the detector element size. A very similar effect is observed after increasing the beam diameter (from 1 mm to 2 mm) and the pupil diameter (from 6 mm to 8 mm), with no other changes. Note that we are again comparing to the original system configuration with 8-µm detector elements. The larger beam and pupil sizes produce a bigger focal spot on the retina, which in turn leads to bigger spots in the focal plane of the WFS. Figure 5.18 provides the new detector data for α = 0°, showing larger focal spots throughout the image. More pixels now receive a non-zero signal, which increases the Fisher information in the system; therefore the CRB becomes smaller. The corresponding standard deviations are labeled γ3 in Table 5.3. Once again, however, there is a negligible difference in the structure of the FIM (Fig. 5.19) and inverse FIM (Fig. 5.20). Fig. 5.18: Detector data for α = 0° after increasing the beam and pupil diameters. Fig. 5.19: FIM for the system after increasing the beam and pupil diameters. Fig. 5.20: Inverse of the FIM after increasing the beam and pupil diameters. Something more interesting happens when we reduce the number of beam angles to a single angle of α = 0°. In this case the system loses its sensitivity to off-axis aberrations, which has a considerable impact on the structure of the FIM (Fig. 5.21).
Compared to Figure 5.14, we immediately observe greater entanglement among the refractive indices, p = {8, 9, 10, 11}, based on the off-diagonal structure in this region. Upon closer inspection, we can also see relatively higher coupling between the indices and the other parameters, aside from an overall decrease in the magnitude of the FIM. Fig. 5.21: FIM for the system after reducing the number of beam angles to one. Inversion of the FIM (Fig. 5.22) leads to a matrix that differs dramatically from the original inverse matrix shown in Figure 5.15. There is a moderate increase in the CRB, whose square-root is denoted as γ4 in Table 5.3, but this would probably be less troublesome during the estimation step than the greater parametric coupling in the system. Fig. 5.22: Inverse of the FIM after reducing the number of beam angles. 5.6 Likelihood surfaces When solving optimization problems, it helps to have a strong sense of the objective function to be optimized. This is particularly useful when fitting nonlinear, multivariate functions that may be plagued with numerous local extrema. In the context of ML estimation, a plot of pr(g|θ) versus θ for a particular g is called the likelihood surface for that data vector. We demonstrated in Section 2.7 that for a purely Gaussian noise model, ML estimation reduces to nonlinear least-squares fitting between the data and the output of the optical-design program:

\hat{\theta}_{\mathrm{ML}} = \underset{\theta}{\operatorname{argmin}} \sum_{m=1}^{M} [g_m - \overline{g}_m(\theta)]^2. \quad (5.25)

Although we are minimizing the sum of squares (i.e., the negative log-likelihood, up to constants), we will refer to this objective function as the likelihood surface. For a vector set of N parameters, the likelihood surface exists in an N-dimensional hyperspace. However, we are currently restricted to plotting the likelihood surface while varying up to two parameters at a time. We selected a handful of pairs of parameters for the following figures and applied the ranges given in Table 5.4.
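The objective being mapped in these plots is simply the sum of squared residuals between the data and the forward-model output. A toy sketch (the linear forward model below is a stand-in for the ray-trace program, and the normalization by σ² and M matches the convention used later in this section):

```python
# Sketch of the Gaussian-noise ML objective: maximizing the likelihood is
# equivalent to minimizing the sum of squared residuals between the data g
# and the forward-model output. forward() is a stand-in model, not the
# dissertation's ray-trace program.

def forward(theta):
    return [theta[0] * m for m in range(1, 5)]   # stand-in forward model

def sum_of_squares(g, theta):
    return sum((gm - fm) ** 2 for gm, fm in zip(g, forward(theta)))

def normalized_cost(g, theta, sigma):
    # Normalizing by sigma^2 and the number of elements M makes the value at
    # the true minimum approach unity for noisy data.
    return sum_of_squares(g, theta) / (sigma ** 2 * len(g))

g = forward([2.0])                    # noise-free data with true theta = 2.0
cost_true = sum_of_squares(g, [2.0])  # zero at the true parameter
cost_off = sum_of_squares(g, [3.0])   # grows away from the minimum
```

A 2-D likelihood slice, as plotted below, is obtained by evaluating this cost on a grid over two parameters while holding the rest fixed.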
In each plot, the fixed parameters were set to the true values underlying the data. Figures 5.23 – 5.32 illustrate the likelihood surface along the axes of the posterior corneal radius, Rcornea,posterior, and each of the other 10 parameters. For every pair of parameters, including those not shown in the following plots, there exists a groove along which the likelihood is nearly constant and which runs through the global minimum. We speculate that these grooves are caused by the same coupling of parameters that is evident in the Fisher information matrices. Of the aforementioned plots, Figures 5.24, 5.26 – 5.28, 5.31, and 5.32 feature a high barrier followed by a local minimum at the boundary. The absence of a local minimum in the other figures does not guarantee an absence of local minima at other points in the search space. Due to the strong entanglement between parameters, a 2-D likelihood plot is very likely to change in shape or scale if the 9 fixed parameters are altered to some extent, which will likely shift the position of the local minimum. The × signs correspond to the final ML estimates, which we will save for discussion in the next section. Fig. 5.23: Likelihood surface along Rcornea,posterior and Rlens,anterior axes. Final ML estimates indicated by × sign. Fig. 5.24: Likelihood surface along Rcornea,posterior and Rlens,posterior axes. Final ML estimates indicated by × sign. Fig. 5.25: Likelihood surface along Rcornea,posterior and ∆tcornea axes. Final ML estimates indicated by × sign. Fig. 5.26: Likelihood surface along Rcornea,posterior and ∆tant.chamber axes. Final ML estimates indicated by × sign. Fig. 5.27: Likelihood surface along Rcornea,posterior and ∆tlens axes. Final ML estimates indicated by × sign. Fig. 5.28: Likelihood surface along Rcornea,posterior and ∆tvitreous axes. Final ML estimates indicated by × sign. Fig. 5.29: Likelihood surface along Rcornea,posterior and ncornea axes. Final ML estimates indicated by × sign.
Fig. 5.30: Likelihood surface along Rcornea,posterior and nant.chamber axes. Final ML estimates indicated by × sign. Fig. 5.31: Likelihood surface along Rcornea,posterior and nlens axes. Final ML estimates indicated by × sign. Fig. 5.32: Likelihood surface along Rcornea,posterior and nvitreous axes. Final ML estimates indicated by × sign. Figures 5.33 – 5.36 illustrate how the likelihood surface varies with both the thickness ∆t and refractive index n of each optical medium in the eye; these plots closely resemble the previous ones. At first glance, Figure 5.33 appears different from the others, but bear in mind that the range in corneal thickness ∆tcornea is only 0.54 – 0.56 mm, based on its low variation across populations. Interestingly, both the likelihood and the optical path length, OPL = n∆t, are roughly constant along each groove, since the grooves are straight. Fig. 5.33: Likelihood surface along ∆tcornea and ncornea axes. Final ML estimates indicated by × sign. Fig. 5.34: Likelihood surface along ∆tant.chamber and nant.chamber axes. Final ML estimates indicated by × sign. Fig. 5.35: Likelihood surface along ∆tlens and nlens axes. Final ML estimates indicated by × sign. Fig. 5.36: Likelihood surface along ∆tvitreous and nvitreous axes. Final ML estimates indicated by × sign. The likelihood plots comparing pairs of thicknesses or pairs of indices, as well as the remaining pairs, are markedly similar to the plots we have shown so far. As mentioned, the common features are a low groove containing the global minimum and a high ridge that is roughly parallel to the groove. To understand this general behavior, we picked a plot exhibiting these features, Figure 5.28, and ran our ray-trace program for several different points in parameter space (Fig. 5.37). P1 represents the true values of the parameters underlying the data, corresponding to a nearsighted (myopic) eye, which focuses before the retina when accommodation is relaxed (Fig. 5.38).
We formed a line perpendicular to the ridge and groove, running through P1 and two additional points, P2 and P3. As we follow this line, we pass through paraxial focus, represented by P3 and a peak cost-function value (Fig. 5.39). The cost function then decreases as we approach P2, corresponding to a farsighted (hyperopic) eye (Fig. 5.40). Other selected points on the ridge, P4 and P5, lead to paraxial focus as well (Figs. 5.41 & 5.42). This illustrates that the likelihood is largely a function of defocus and that points in parameter space having the same likelihood correspond to comparable levels of defocus. Fig. 5.37: Understanding the likelihood as a function of defocus. P1 corresponds to the true minimum and a myopic eye (focuses before retina); P3, P4, and P5 are high points and correspond to zero defocus; P2 corresponds to a hyperopic eye (focuses behind retina). Fig. 5.38: Level of defocus at P1 (Rcornea,posterior = 6.381 mm, ∆tvitreous = 16.40 mm). Fig. 5.39: Level of defocus at P3 (Rcornea,posterior = 6.188 mm, ∆tvitreous = 15.97 mm). Fig. 5.40: Level of defocus at P2 (Rcornea,posterior = 6.000 mm, ∆tvitreous = 15.50 mm). Fig. 5.41: Level of defocus at P4 (Rcornea,posterior = 6.512 mm, ∆tvitreous = 15.85 mm). Fig. 5.42: Level of defocus at P5 (Rcornea,posterior = 5.871 mm, ∆tvitreous = 16.09 mm). 5.7 Maximum-likelihood estimation of ocular parameters After generating the data, we pretended not to know the values of selected parameters and estimated them by maximizing the likelihood, or minimizing the sum of squares as in (5.25), since we used a Gaussian noise model. We performed the optimization with the simulated annealing algorithm presented in Section 3.3. After testing a series of tuning-parameter combinations, we chose the following values during initialization (see Sec.
3.3.4): τ0 = 10^5, δ = 0.1, Nδ = 5, NS = 80, c = 2.0, NT = 60, rT = 0.9, v0 = 0.5(θupper − θlower), where the initial temperature τ0 has the same units as the cost function, and θupper and θlower are the upper and lower limits of the parameter space during the search process. We verified that τ0 was large enough for the system to perform a random search of the parameter space (i.e., nearly all proposed configurations are accepted), so that the local minima may be sufficiently sampled. We selected limits that compared well with the normal ranges of the parameters, based on clinical population studies, and chose the center of the search space as the starting point. The number of iterations per temperature phase is NI = NS × NT = 4800, with an iteration defined as one cycle through all parameters. Figure 5.43 shows the results from 16 optimization trials, plotting the optimal cost function versus the iteration number on a log-log scale. Each trial represented a different noise realization for the same parameter values, so that the variance in the estimates could be determined, and the same estimation procedure was applied to each trial. The plots begin at the end of the first temperature phase, at 4800 iterations. After normalization by the detector variance and the total number of elements, the average sum of squares described in (5.25) was 462.25 at the starting point of the optimization, 1.3136 at the termination point, and 1.0011 at the true global minimum. As the number of detector elements increases, the normalized true minimum should approach unity. Fig. 5.43: 16 simulated annealing trials for the estimation of ocular parameters. On a 64-bit AMD Opteron 246 CPU with 3 GFLOPS of peak computing power, the average computation time per ray was 0.230 µs. There were three output planes per forward propagation, corresponding to 45.2 ms of computation time for the bundle of rays. Thus, each temperature phase took 39.8 min to compute.
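The annealing loop can be sketched in drastically simplified form. Only the geometric cooling rate r_T = 0.9 is taken from the text above; the toy one-parameter quadratic cost, the fixed step size, and the iteration counts are our own illustrative choices, and the adaptive step-size control of the Section 3.3 algorithm is omitted.

```python
# Much-simplified simulated-annealing sketch (toy objective; geometric
# cooling with r_T = 0.9 as in the text; the dissertation's adaptive
# algorithm also tunes step sizes and cycles through all parameters).

import math
import random

def anneal(cost, x0, lower, upper, tau0=1e5, r_t=0.9, n_phases=60, n_iter=50,
           rng=random.Random(1)):
    x, tau = x0, tau0
    best, best_cost = x, cost(x)
    for _ in range(n_phases):
        for _ in range(n_iter):
            step = 0.1 * (upper - lower) * rng.uniform(-1, 1)
            cand = min(max(x + step, lower), upper)   # stay inside the bounds
            delta = cost(cand) - cost(x)
            # Metropolis acceptance: always accept downhill moves; accept
            # uphill moves with probability exp(-delta / tau).
            if delta < 0 or rng.random() < math.exp(-delta / tau):
                x = cand
                if cost(x) < best_cost:
                    best, best_cost = x, cost(x)
        tau *= r_t                                     # cool geometrically
    return best, best_cost

best, best_cost = anneal(lambda x: (x - 3.0) ** 2, x0=0.0,
                         lower=-10.0, upper=10.0)
```

With the large initial temperature, nearly all proposed moves are accepted at first, so the early phases amount to a random search of the interval, consistent with the verification of τ0 described above.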
There was an average of 80.1 phases for the optimization trials, so the average computation time per trial was 53 hrs. We discuss in Section 6.2 that the maximum floating-point computing speed of a single NVIDIA Tesla C2075 GPU is 1030 GFLOPS. There are many other factors that affect performance in computers, such as memory speed and architecture, storage technology, cache coherence, internal bus speeds, and software (i.e., operating system and application), so the clock rate alone does not accurately gauge relative performance, except when comparing it to other processors in the same line. However, assuming the computational time can be rescaled with a quoted computing speed, the total time would be reduced from 53 hours to 9.3 min with a single Tesla C2075. The parameter estimates are listed in Table 5.4, along with the true values underlying the data, the starting point in the ML search, the upper and lower bounds of the search space, the final estimates, and the standard deviation in each estimate. All of the parameters were estimated to within one standard deviation, and each estimate has a very small bias and variance. The accuracy of the estimates was down to three to four decimal places for radii and thicknesses and four to five decimal places for refractive indices.

Table 5.4: Estimated ocular parameters, including the true values, starting point in the search, upper and lower limits in the search space, and estimated values with standard deviations.

No. | Parameter                         | True Value | Lower Limit | Starting Point | Upper Limit | ML Estimate
1   | Cornea posterior radius [mm]      | 6.38       | 5.75        | 6.25           | 6.75        | 6.3805 ± 0.0008
2   | Lens anterior radius [mm]         | 10.85      | 10.50       | 11.00          | 11.50       | 10.846 ± 0.007
3   | Lens posterior radius [mm]        | -5.92      | -7.00       | -6.00          | -5.00       | -5.922 ± 0.006
4   | Cornea thickness [mm]             | 0.554      | 0.54        | 0.55           | 0.56        | 0.553 ± 0.002
5   | Anterior chamber thickness [mm]   | 3.37       | 3.00        | 3.75           | 4.50        | 3.368 ± 0.006
6   | Lens thickness [mm]               | 4.09       | 3.25        | 4.00           | 4.75        | 4.08 ± 0.01
7   | Vitreous thickness [mm]           | 16.40      | 15.50       | 16.50          | 17.50       | 16.397 ± 0.005
8   | Cornea refractive index           | 1.3729     | 1.3700      | 1.3750         | 1.3800      | 1.3728 ± 0.0002
9   | Anterior chamber refractive index | 1.3329     | 1.3300      | 1.3350         | 1.3400      | 1.33287 ± 0.00007
10  | Lens refractive index             | 1.4138     | 1.4100      | 1.4150         | 1.4200      | 1.4139 ± 0.0001
11  | Vitreous refractive index         | 1.3317     | 1.3300      | 1.3350         | 1.3400      | 1.33175 ± 0.00008

Figure 5.43 illustrates the eye model reconstructed from the estimates, showing precise overlap with the true model. Though the corneal anterior radius was not estimated, we plotted it in the reconstruction to indicate the estimated corneal thickness. Another way to visualize the ML estimates is to represent them as points on the likelihood surface; we used × marks rather than actual points in the likelihood plots of Figures 5.23 – 5.36. Notice that all of these marks fall on the grooves that run through the true minimum. Fig. 5.43: Reconstructed eye model of the estimated parameters, superimposed with the true values underlying the data.
In our method, we do not perform centroid estimation or wavefront reconstruction, but instead use the raw detector outputs of the WFS as input to IOD. We saw that another way of increasing the information is to enlarge the beam and pupil diameters, resulting in larger spots on the retina and in the focal plane of the WFS. This is also opposite to classical wavefront sensing, in which smaller focal spots are preferred for the centroid-estimation step. The key point here is that IOD actually prefers poor imaging for a greater information yield and smaller CRBs. After investigating the Fisher information in various system configurations, we implemented multiple input angles of the source beam to assess both on- and off-axis aberrations, which produced a feasible Cramér-Rao lower bound and reduced parametric coupling of the ocular parameters. Although we obtained excellent results in this proof-of-principle study, we are still far from working with real patient data. The greatest obstacle in performing inverse optical design of the human eye is certainly the requirement of an accurate forward model of the eye that includes all sources of randomness. Fluctuations of ocular aberrations associated with live imaging of the eye, such as the optical tear-film effect, would have to be taken into account. Future studies must consider more complexities in the model, such as the coherence properties of the source, the GRIN distribution of the crystalline lens, irregularities in the corneal surface, scattering in the ocular media, Fresnel reflections, and the Stiles-Crawford effect. It may even be necessary to consider other sources of noise, since an i.i.d. Gaussian model is very idealized, as well as a more realistic model for the WFS lenslets. We also anticipate practical issues such as stray light and misalignments, or other effects that may contribute to estimation errors. These issues are beyond the scope of this stage of the research, but may be dealt with in subsequent stages.
To enhance the inverse optical design algorithm, prior information obtained from statistical studies or by other reliable modalities, such as corneal topography, can be incorporated. This information can be used to select a starting point or to narrow the search space. An approximate method based on optimization with reverse ray-tracing (Goncharov, Nowakowski, Sheehan, & Dainty, 2008) could provide a promising starting point in our likelihood approach. In order to reach clinical application, rapid processing techniques must be explored to improve computation time, which would allow an increase in model robustness. This can be accomplished with dedicated computer hardware and the parallelization of the optical-design program. Global search algorithms such as simulated annealing are very time-consuming, though if we could get to a point of negligible parameter coupling and a unimodal likelihood surface, perhaps straightforward optimization could be used instead. This would not only reduce the computation time, but may also result in more reliable estimates. Ultimately, the goal is to make the system practical in the clinical setting and to obtain accurate estimates of the full set of ocular parameters for a given patient. CHAPTER 6 MAXIMUM-LIKELIHOOD ESTIMATION OF PARAMETERIZED WAVEFRONTS USING MULTIFOCAL DATA In this chapter, we present the second of three applications of likelihood methods in optics, as it applies to high-precision optical testing. Section 6.1 introduces the general approach to acquiring irradiance data from which wavefront parameters are estimated. Section 6.2 is dedicated to modeling the wave propagation and emphasizes the complexity and accuracy of the computations needed for finding ML estimates. We conclude with a discussion of the graphics processing unit (GPU) for rapid processing, since the estimation procedure requires substantial computation time.
In Sections 6.3 and 6.4, we present proof-of-principle results obtained in simulation and in experiment, respectively. We discuss the added challenge of large-aberration wavefronts in the numerical study. On the experimental side, we discuss the accuracy of the propagation algorithm and the handling of nuisance parameters. In both cases, we examine Fisher information matrices, Cramér-Rao bounds, and likelihood surfaces, after which we provide ML estimates obtained by simulated annealing.

6.1 Formulation of the problem

Phase retrieval (PR) is a useful method for recovering the phase distribution in the pupil of an optical system from the irradiance distribution in the focal plane. However, the usual PR problem is ill posed, since both distributions are unrestricted 2D functions (Stefanescu, 1985) and a single irradiance measurement does not ensure that the recovered phase is unique (Seldin & Fienup, 1990; Teague, 1983). A straightforward way of avoiding the ambiguities is to make multiple irradiance measurements near the focal plane. Another approach is to estimate the parameters describing the phase function, instead of the function itself (Brady & Fienup, 2004, 2005; Brady, Guizar-Sicairos, & Fienup, 2009). Parameterization of the phase is achieved with a set of expansion functions. Obvious choices for circular pupils are Zernike polynomials, although there are many other options. In our method, the data consist of irradiance measurements in multiple planes near the focus of an aberrated optical element (Fig. 6.1). From the multifocal data, we estimate the coefficients of phase polynomials in a wavefront expansion, as proposed by Brady and Fienup, but with various extensions. As mentioned briefly in Section 1.2, we optimize the data-acquisition system by analyzing the Fisher information matrix and Cramér-Rao lower bound, as well as the likelihood surface.
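As a concrete illustration of the parameterization step, the sketch below builds a wavefront from a small, rotationally symmetric subset of the Fringe Zernike terms. The coefficient values, grid size, and term subset here are illustrative assumptions, not those of the study.

```python
import numpy as np

# Illustrative subset of the rotationally symmetric Fringe Zernike terms;
# the coefficients used below are made-up magnitudes in waves.
def fringe_zernike_radial(n, rho):
    """Rotationally symmetric Fringe Zernike polynomials Z1, Z4, Z9."""
    if n == 1:                                   # piston
        return np.ones_like(rho)
    if n == 4:                                   # defocus
        return 2.0 * rho**2 - 1.0
    if n == 9:                                   # primary spherical aberration
        return 6.0 * rho**4 - 6.0 * rho**2 + 1.0
    raise ValueError("term not implemented in this sketch")

def wavefront(coeffs, rho):
    """W(rho) = sum_n alpha_n Z_n(rho) inside the unit pupil, NaN outside."""
    W = sum(a * fringe_zernike_radial(n, rho) for n, a in coeffs.items())
    return np.where(rho <= 1.0, W, np.nan)

# Pupil grid, normalized to the exit-pupil radius
x = np.linspace(-1.0, 1.0, 257)
X, Y = np.meshgrid(x, x)
rho = np.hypot(X, Y)

coeffs = {4: 75.18, 9: 23.82}                    # waves; illustrative values
W = wavefront(coeffs, rho)
pv = np.nanmax(W) - np.nanmin(W)                 # peak-to-valley error [waves]
```

Note that large coefficients, not many terms, are what make the wavefront error large: here two terms already give a peak-to-valley error of about 150 waves.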
Additionally, we have developed our method to handle very large wavefront aberrations. To deal with the resulting high computational demand, we employ rapid-processing techniques using dedicated computer hardware.

Fig. 6.1: Data-acquisition system for collecting multiple irradiance patterns near the focus of an optical element.

6.2 Propagation algorithm

6.2.1 Diffraction propagation vs. ray-tracing

An imaging system whose aberrations are well corrected will convert a diverging spherical wave emitted by a point source into a converging spherical wave centered on the geometrical image point, through which all geometrical rays, or wavefront normals, will pass. Since there are an infinite number of rays and each ray contributes a finite amount of energy, the geometrical treatment predicts an infinite irradiance at the focus and zero irradiance everywhere else in the image plane. We know that this is nonphysical; therefore geometrical optics is invalid in the focal region (Stamnes, 1986). For an imaging system that is not well corrected, the converging wave leaving the system will no longer be spherical, and not all rays will meet at the geometrical focus. If the aberrations are sufficiently large, the ray density may provide a crude approximation of the irradiance distribution in the focal region, though the imaging is poor. Even under this circumstance, the prediction of geometrical optics is still invalid in the vicinity of the caustic. Thus, in all cases, we must consider diffraction theory when determining the irradiance in the focal region (Stamnes, 1986). While our method of estimating wavefront parameters from irradiance data does not make demands on the image quality, it does require an accurate forward model of the system when working with real data.
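The crude ray-density picture can be sketched with a toy Monte Carlo calculation. The pupil radius and focal distance below loosely follow Table 6.2, while the amount of longitudinal spherical aberration is a made-up value chosen only to spread the ray crossings over a few millimeters; this is a sketch, not the study's forward model.

```python
import numpy as np

rng = np.random.default_rng(0)

f = 157.8            # paraxial focal distance [mm] (cf. Table 6.2)
R = 44.2             # exit-pupil radius [mm] (cf. Table 6.2)
b = 1.7e-5           # made-up spherical-aberration strength [mm^-2]

# Uniformly sample ray intersections with the exit pupil
n_rays = 200_000
rho = R * np.sqrt(rng.uniform(size=n_rays))
phi = rng.uniform(0.0, 2.0 * np.pi, size=n_rays)
x0, y0 = rho * np.cos(phi), rho * np.sin(phi)

# Longitudinal spherical aberration: marginal rays cross the axis early
z_cross = f / (1.0 + b * rho**2)

# Transverse ray coordinates in an observation plane at z
z = 155.0
scale = 1.0 - z / z_cross
x, y = x0 * scale, y0 * scale

# Ray-density "irradiance": a 2D histogram of ray intersections
H, _, _ = np.histogram2d(x, y, bins=128, range=[[-1.0, 1.0], [-1.0, 1.0]])
```

The histogram H is only a crude stand-in for irradiance; as the text notes, it fails precisely near the caustic, where diffraction theory must take over.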
The diffraction integral without the Fresnel approximation is given by (4.79),

$$ u_z(\mathbf{r}) = \frac{1}{i\lambda z} \int_\infty d^2 r_0 \, u_0(\mathbf{r}_0) \exp\!\left(ik\sqrt{|\mathbf{r}-\mathbf{r}_0|^2 + z^2}\right), \qquad (6.1) $$

which is essentially a mathematical refinement of the Huygens wavelet formulation. The field at each observation point in the image plane is the sum of an infinite number of secondary waves emanating from the aperture. Since the integrand of the double integral in (6.1) can vary rapidly over the integration domain, especially when the aberrations are large, a significant amount of sampling in the aperture is required to accurately predict the irradiance distribution in the image plane. Thus, numerical evaluations of the Huygens diffraction integral are prohibitively time-consuming, and an approximation to this integral is necessary to reduce the computing problem. Note that both the Huygens wavelet formulation and the Fresnel approximation assume scalar diffraction theory, which we assume to be adequate for our method of wavefront measurement with multifocal data, even when large aberrations are present.

6.2.2 Diffraction equation for a converging spherical wave

Suppose a converging wave has wavefront error W(r0) in the exit pupil of an optical system. Then the field in the exit pupil is given by

$$ u_0(\mathbf{r}_0) = \exp[ikW(\mathbf{r}_0)] \, \frac{\exp[-ikR_f(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)}, \qquad (6.2) $$

where R_f(r0) = √(|r0|² + f²) and f is defined as the radius of curvature of the unaberrated wave in the exit pupil (i.e., the distance between the exit pupil and paraxial focus). Inserting (6.2) into (6.1) leads to

$$ u_z(\mathbf{r}) = \frac{1}{i\lambda z} \int d^2 r_0 \, \exp[ikW(\mathbf{r}_0)] \, \frac{\exp[-ikR_f(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)} \exp\!\left(ik\sqrt{|\mathbf{r}-\mathbf{r}_0|^2 + z^2}\right), \qquad (6.3) $$

or equivalently,

$$ u_z(\mathbf{r}) = \frac{1}{i\lambda z} \int d^2 r_0 \, \frac{\exp[ikW(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)} \exp\!\left(-ik\,\frac{\mathbf{r}\cdot\mathbf{r}_0}{z}\right) \exp\!\left\{ik\left[\sqrt{|\mathbf{r}-\mathbf{r}_0|^2 + z^2} - \sqrt{|\mathbf{r}_0|^2 + f^2} + \frac{\mathbf{r}\cdot\mathbf{r}_0}{z}\right]\right\}. \qquad (6.4) $$

Note that the last exponential term is part of the integrand.
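Before proceeding with the expansion, it is worth making the sampling burden of (6.1) concrete. The sketch below evaluates the Huygens sum by brute force for an unaberrated converging wave (W = 0) over a deliberately tiny pupil and detector patch; the grid sizes, pupil diameter, and focal distance are illustrative choices, far below what a realistic aperture would demand.

```python
import numpy as np

lam = 0.6328e-3                  # wavelength [mm]
k = 2.0 * np.pi / lam
f = 157.8                        # radius of curvature of the converging wave [mm]
z = f                            # observe in the paraxial focal plane
Dxp = 10.0                       # deliberately small pupil [mm]

# Coarse P x P pupil grid -- far below what accuracy would normally demand
P = 32
x0 = np.linspace(-Dxp / 2, Dxp / 2, P)
X0, Y0 = np.meshgrid(x0, x0)
Rf = np.sqrt(X0**2 + Y0**2 + f**2)
in_pupil = X0**2 + Y0**2 <= (Dxp / 2)**2
u0 = np.where(in_pupil, np.exp(-1j * k * Rf) / Rf, 0.0)   # Eq. (6.2), W = 0
dA = (x0[1] - x0[0])**2

# Direct Huygens sum of Eq. (6.1), one observation point at a time:
# O(P^2) exponentials per point, O(P^2 M^2) in total
def huygens(xs, ys):
    rr = np.sqrt((xs - X0)**2 + (ys - Y0)**2 + z**2)
    return np.sum(u0 * np.exp(1j * k * rr)) * dA / (1j * lam * z)

M = 8                            # tiny detector patch near the axis [mm]
xd = np.linspace(-0.01, 0.01, M)
I = np.array([[abs(huygens(xs, ys))**2 for xs in xd] for ys in xd])
```

With W = 0 and z = f the quadratic phase terms cancel, so the brightest samples fall near the center of the patch. Scaling this direct sum to P = 1024 and a full detector array would require on the order of 10¹² complex exponentials per plane, which is why the FFT form of (6.9) is essential.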
A binomial expansion for z > |r − r0| and f > r0 leads to

$$ \sqrt{|\mathbf{r}-\mathbf{r}_0|^2 + z^2} - \sqrt{|\mathbf{r}_0|^2 + f^2} + \frac{\mathbf{r}\cdot\mathbf{r}_0}{z} = z - f + \frac{r^2}{2z} + \frac{r_0^2}{2}\left(\frac{1}{z} - \frac{1}{f}\right) + \mathrm{HOT}, \qquad (6.5) $$

where HOT denotes the higher-order terms in r0:

$$ \mathrm{HOT} = -\frac{|\mathbf{r}-\mathbf{r}_0|^4}{8z^3} + \frac{|\mathbf{r}_0|^4}{8f^3} + \frac{|\mathbf{r}-\mathbf{r}_0|^6}{16z^5} - \frac{|\mathbf{r}_0|^6}{16f^5} + \ldots \,. \qquad (6.6) $$

Combining (6.4) and (6.5) results in

$$ u_z(\mathbf{r}) = A(\mathbf{r}) \int d^2 r_0 \, \frac{\exp[ikW(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)} \exp\!\left\{ik\left[\frac{r_0^2}{2}\left(\frac{1}{z} - \frac{1}{f}\right) + \mathrm{HOT}\right]\right\} \exp\!\left(-ik\,\frac{\mathbf{r}\cdot\mathbf{r}_0}{z}\right), \qquad (6.7) $$

where the function A(r) is given by

$$ A(\mathbf{r}) = \frac{1}{i\lambda z} \exp\!\left\{ik\left[z - f + \frac{r^2}{2z}\right]\right\}. \qquad (6.8) $$

If the higher-order terms in (6.7) can be ignored, that equation can be represented by a 2D Fourier transform,

$$ u_z(\mathbf{r}) \approx A(\mathbf{r}) \, \mathcal{F}_2\!\left\{ \frac{\exp[ikW(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)} \exp\!\left[ik\,\frac{r_0^2}{2}\left(\frac{1}{z} - \frac{1}{f}\right)\right] \right\}_{\boldsymbol{\rho} = \mathbf{r}/\lambda z}, \qquad (6.9) $$

where the spatial frequency is ρ = r/λz, as we saw in Section 4.4.4. The corresponding irradiance under this approximation is given by

$$ I(\mathbf{r}) = |u_z(\mathbf{r})|^2 \approx \frac{1}{\lambda^2 z^2} \left| \mathcal{F}_2\!\left\{ \frac{\exp[ikW(\mathbf{r}_0)]}{R_f(\mathbf{r}_0)} \exp\!\left[ik\,\frac{r_0^2}{2}\left(\frac{1}{z} - \frac{1}{f}\right)\right] \right\}_{\boldsymbol{\rho} = \mathbf{r}/\lambda z} \right|^2. \qquad (6.10) $$

The approximation in (6.9) breaks down as the numerical aperture of the optical system increases. Under circumstances when the higher-order terms are not negligible, we might consider including, say, the fourth-order terms in (6.6). The caveat is that r and r0 are inseparable in these terms, so including them in the integral means that we cannot take advantage of the FFT. Since a brute-force computation of the diffraction integral is too computationally expensive and impractical, as we will show in Section 6.4.5, the best we can do is minimize the terms in (6.6) by considering planes sufficiently close to nominal focus, so that z ≈ f and r is small.

6.2.3 Parameterized wavefront description

The unknown function of interest in these equations is the wavefront error W(r0), which we represent in a parameterized form in the exit pupil of the optical system.
Our approach is based on the fundamental assumption that the continuous wavefront can be approximated to sufficient accuracy by a finite set of expansion functions. We choose to represent this function by expanding it in some number of Zernike polynomials,

$$ W(\mathbf{r}_0) \approx \sum_{n=1}^{N} \alpha_n Z_n(\mathbf{r}_0), \qquad (6.11) $$

where Z_n(r0) is the nth Zernike polynomial with coefficient αn and r0 is a 2D position vector in the pupil. The parameters to be estimated are the Zernike coefficients {αn, n = 1,…, N}, but an important determination is the number of coefficients necessary for an accurate representation of the wavefront. Even if a small number of terms is used in the expansion, this does not imply that the wavefront aberration is small; representing large wavefront errors simply requires large coefficients. In our approach, we assume that the wavefront is smoothly varying, so that sufficient accuracy can be achieved with a relatively small value of N. We choose N = 37, the maximum number of coefficients calculated in ZEMAX. Specifically, we choose to use the Fringe Zernike polynomials, provided in Appendix A. These are identical to the original Zernike polynomials, except for the manner and order in which they are listed.

6.2.4 Sampling considerations

Let Dxp denote the diameter of the exit pupil, P × P the array size in the pupil plane, and F × F the array size for the FFT propagation including zero-padding, so that F > P. Then we define the following parameters,

$$ \Delta x_p \equiv \frac{D_{xp}}{P}, \qquad (6.12a) $$

$$ \Delta\nu \equiv \frac{1}{F \Delta x_p} = \frac{P}{F D_{xp}}, \qquad F > P, \qquad (6.12b) $$

$$ \Delta x_d \equiv \lambda z \Delta\nu = \frac{P}{F}\,\frac{\lambda z}{D_{xp}}, \qquad (6.12c) $$

where Δx_p is the pupil element spacing, Δν is the spatial-frequency spacing, and Δx_d is the detector element spacing in each transverse direction. Most of the F × F detector elements do not receive a signal, which only creates unnecessary computation time when computing the objective function during optimization.
For this reason, we automatically extract the M1D × M1D innermost elements from the F × F output of each FFT operation and discard the remainder. Note that we use M1D rather than M, since the latter is reserved for the M × 1 data vector g.

6.2.5 Parallel processing with the graphics processing unit

Among the earliest applications driving the development of the graphics processing unit were computer-aided design and flight simulation in the 1960s. GPU technology has come a long way since then, and modern uses now include web-browser graphics, complex mechanical CAD, DVD video playback, and 3D video games approaching cinematic realism. The same fundamental technology used in dedicated systems, such as entertainment consoles and medical imaging stations, is now being used for massively parallel high-performance computing in the scientific and engineering fields. General-purpose computing with the GPU became possible at the turn of the century, when pioneering programmers realized that the pixel shaders on graphics chips could be treated as stream processors or thread processors with their own registers and local memory. However, the programming model was extremely awkward and clumsy, requiring graphics application programming interfaces (APIs), such as OpenGL and Cg. By 2006, the GPU was modified with added support for C/C++, which allowed significant simplification of the programming model and accessibility to a larger community of application programmers. Modern GPU programming is accomplished using an extension to the C programming language called CUDA (“Compute Unified Device Architecture”), which provides software development tools and allows functions in C to be implemented on a GPU’s multiple stream processors. The programming model is heterogeneous; the sequential part runs on a host CPU and the computationally intensive part on one or more compute devices, which are massively parallel coprocessors.
CUDA devices support the Single-Instruction, Multiple-Data (SIMD) model, in which all concurrent threads are based on the same code, though the path of execution may differ between threads. An important aspect of CUDA is that it features a hardware abstraction mechanism, by which the runtime transparently compiles the data-parallel computation to shader programs (Ryoo et al., 2008). CUDA programming is achieved with a minimal set of keywords and extensions of the standard ANSI C language, which assign kernels, or data-parallel functions, and their associated data structures to the compute devices. The kernels provide instructions to single threads, usually calling upon thousands of threads at a time. Threads are organized by the developer into thread bundles, or thread blocks, in which they can exchange data in their own shared memory and synchronize actions, while they also have access to global memory and read-only access to constant memory and texture memory. Through a language-integration programming interface, the CUDA runtime supports the execution of standard C functions on the device, including library functions for managing device memory and transferring data between the host and device (Ryoo et al., 2008). There are many advantages of using GPU hardware as opposed to other hardware platforms, such as the field-programmable gate array (FPGA) or the cell broadband engine architecture (CBEA) found in the Sony PlayStation 3. New hardware becomes available on a continual basis, providing much hardware flexibility based on programming needs, and ample memory exists on both the device and the host machine. Also, CUDA is comparatively straightforward to use, with many useful library routines (e.g., FFTW, BLAS), resulting in high programmer productivity. The latest, most advanced GPUs currently on the market include the NVIDIA Tesla models, such as the C1060 and C2075, whose specifications are provided in Table 6.1.
For instance, the C2075 contains 448 processing cores and offers an unprecedented 515 GFLOPS of peak double-precision floating-point performance, where FLOPS stands for “floating-point operations per second”.

Table 6.1: Product specifications for NVIDIA Tesla C1060 and C2075 models.

                                                   Tesla C1060   Tesla C2075
Peak double-precision floating-point performance   78 GFLOPS     515 GFLOPS
Peak single-precision floating-point performance   933 GFLOPS    1030 GFLOPS
CUDA cores                                         240           448
Memory size                                        4 GB          6 GB
Memory bandwidth                                   102 GB/sec    144 GB/sec

6.3 Numerical studies

For the numerical proof-of-principle system, we chose a rather large lens with substantial aberrations. We examined the ideal amount of pupil sampling for a good representation of the irradiance data, but without wasting computational effort. We thoroughly investigated Fisher information matrices, Cramér-Rao bounds, and likelihood surfaces. Due to the multitude of local minima in the cost function, we used simulated annealing to obtain ML estimates of the wavefront parameters.

6.3.1 Test lens description

Much of the work presented in this chapter was completed under a contract with a corporation that funded our research. Due to the confidentiality agreement between our institution and this corporation, we cannot disclose any design parameters of the lens system referenced in this numerical study. For the purpose of estimating wavefront parameters, however, the design parameters are not pertinent. We are primarily interested in the wavefront emerging from the system and the region of space between the exit pupil and the image planes. We chose to operate the lens, which is rotationally symmetric, at finite conjugates in our optical design program by placing an on-axis point source 113.0 mm from the entrance pupil of the lens. Table 6.2 summarizes the relevant system data calculated by ZEMAX using a wavelength of λ = 0.6328 µm, including an exit pupil diameter of Dxp = 88.41 mm and a working f-number of f/#w = 1.796.
Note that the position of the paraxial focal plane at z = zf = 157.8 mm is measured from the exit pupil, which lies in the z = 0 plane.

Table 6.2: System data provided by ZEMAX™ for the highly aberrated test lens at λ = 0.6328 µm.

Effective Focal Length [mm]          66.71521
Back Focal Length [mm]               40.80047
Image Space f/#                      0.9309758
Paraxial Working f/#                 1.785467
Working f/#                          1.796028
Image Space NA                       0.2696646
Object Space NA                      0.3022556
Entrance Pupil Diameter [mm]         71.6616
Exit Pupil Diameter [mm]             88.40518
Paraxial Focal Plane Position [mm]   157.8443

An illustration of the focal region of the lens is provided in Figure 6.2, showing a distance of 5.0 mm between marginal and paraxial focus.

Fig. 6.2: Focal region of the highly aberrated test lens at λ = 0.6328 µm. Paraxial focal plane is at z = zf = 157.8 mm.

The Fringe Zernike coefficients describing the wavefront error W(r0) in the exit pupil, according to ZEMAX, are provided for N = 37 in Table 6.3. Since the system has rotational symmetry, the coefficients for all non-rotationally-symmetric Zernike terms are equal to zero, while the non-zero coefficients correspond to piston, defocus, and various orders of spherical aberration. Figure 6.3 shows the wavefront error map computed with these coefficients as a function of normalized radius, indicating a peak-to-valley measurement of 149.1λ.

Table 6.3: Fringe Zernike coefficients {αn, n = 1,…, 37}, peak-to-valley, RMS, and variance, provided by ZEMAX for the highly aberrated test lens at λ = 0.6328 µm. Unlisted coefficients are zero.

Index   Aberration Type                         Design Value [λ]
1       Piston                                  50.73020573
4       Defocus                                 75.17884184
9       Spherical Aberration, Primary           23.81690475
16      Spherical Aberration, Secondary         -0.60555615
25      Spherical Aberration, Tertiary          0.04113341
36      Spherical Aberration, Quaternary        -0.00287742
37      Spherical Aberration, 12th-order Term   -0.01310320
        Peak-to-valley [λ]                      149.14956942
        RMS [λ]                                 44.18240916
        Variance [λ²]                           1952.08527937

Fig.
6.3: Wavefront error in the exit pupil of the highly aberrated test lens at λ = 0.6328 µm, as a function of normalized radius. Units are in waves.

6.3.2 Pupil sampling

Before simulating the irradiance data, we determined the optimal amount of pupil sampling to accurately represent the irradiance data without unnecessary increases in computation time. A critical design option is the number and location of planes at which to measure the irradiance. It is well known that two-plane measurements are sufficient to determine the pupil phase, so to minimize computation time, we selected two output planes just before paraxial focus, z = z1 = zf − 0.25 mm and z = z2 = zf − 0.43 mm. For each plane, we computed the irradiance using (6.10) with pupil sampling levels of P = 256, P = 512, and P = 1024 (Figs. 6.4 & 6.5). In each case, we held the ratio P/F in (6.12) constant at 1/2, thereby fixing the size of the detector elements in the output plane and allowing us to isolate the effect of pupil sampling on the data. The data computed with P = 256 and F = 512 (Figs. 6.4c & 6.5c) exhibit severe signs of undersampling, such as discontinuities and D4 symmetry (i.e., the symmetry of a square) inherited from the rectilinear grid of the input function, and the irradiance patterns are clearly unphysical. While the data with P = 512 and F = 1024 (Figs. 6.4b & 6.5b) show much improvement, the sampling artifacts are still evident. Since these artifacts are diminished for P = 1024 and F = 2048 (Figs. 6.4a & 6.5a), we chose this pupil sampling for the study.

Fig. 6.4: Detector data at z = z1 for the highly aberrated test lens using a pupil sampling of: (a) P = 1024, (b) P = 512, and (c) P = 256.

Fig. 6.5: Detector data at z = z2 for the highly aberrated test lens using a pupil sampling of: (a) P = 1024, (b) P = 512, and (c) P = 256.

The final data set, containing electronic noise with a peak SNR of 10⁴, is shown in Figure 6.6. According to (5.12c), the detector element size is roughly 0.56 µm.
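The quoted detector spacing can be checked directly from the grid relations in (6.12); the numbers below use the Table 6.2 exit pupil and the first measurement plane, and this is a back-of-the-envelope sketch rather than the study's actual code.

```python
# Grid relations of Eq. (6.12) for the P = 1024, F = 2048 configuration
lam = 0.6328e-3          # wavelength [mm]
Dxp = 88.41              # exit-pupil diameter [mm] (Table 6.2)
zf = 157.8               # paraxial focal plane [mm]
z1 = zf - 0.25           # first measurement plane [mm]
P, F = 1024, 2048

dxp = Dxp / P            # pupil element spacing, Eq. (6.12a) [mm]
dnu = 1.0 / (F * dxp)    # spatial-frequency spacing, Eq. (6.12b) [mm^-1]
dxd = lam * z1 * dnu     # detector element spacing, Eq. (6.12c) [mm]

# Equivalently dxd = (P / F) * lam * z1 / Dxp; about 0.56 um here,
# consistent with the value quoted in the text
print(round(dxd * 1e3, 3))   # prints 0.564
```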
Fig. 6.6: Detector data for the highly aberrated test lens using a pupil sampling of P = 1024 at image plane: (a) z = z1 and (b) z = z2.

We saw that for a given system and image plane, the FFT method requires a minimum value of P to avoid sampling artifacts. Likewise, there is a finite range in the image position for which a given P is sufficient. For the system configuration described in Section 6.3.1 and with P = 1024, this range is limited to approximately zf − 1.8 mm < z < zf − 0.2 mm, which is roughly one-third of the range between marginal and paraxial focus, zf − 5.0 mm < z < zf. Beyond this range, the irradiance distribution degrades very rapidly.

6.3.3 Fisher information and Cramér-Rao lower bounds

We computed the Fisher information matrix using (2.76), according to an i.i.d. Gaussian noise model, for Fringe Zernike coefficients {αn, n = 2,…, 37} (Fig. 6.7). Since α1 corresponds to piston and does not influence the irradiance data, we had no interest in estimating it and disregarded it in the FIM. We determined the variance in the detector elements σ² from a peak SNR of 10⁴ in the data and evaluated the FIM components at the true parameter vector θ, based on the values in Table 6.3. (See Appendix A for a list of Fringe Zernike polynomials.) The high diagonal values in the FIM indicate that there is abundant information in the data with respect to each estimable parameter. However, the off-diagonal structure is also very pronounced, indicating strong parametric coupling that may confound the estimation problem. Recall that the degree of coupling between two parameters is proportional to the magnitude of the respective FIM component.

Fig. 6.7: FIM for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens (log scale).

When an optical system has rotational symmetry, there is immediate interest in the coefficients associated with defocus and spherical aberration, which are {αn, n = 4, 9, 16, 25, 36, 37}.
Unsurprisingly, the FIM indicates strong coupling between these parameters, which only intensifies as the pupil sampling is reduced. There is also significant coupling among terms with bilateral symmetry about the x-axis, including {αn, n = 2, 7, 10, 14, 19, 23, 26, 30}, which are x-axis tilt and all orders of x-axis coma and trefoil. The same is true for the y-axis counterparts of these terms, that is, for {αn, n = 3, 8, 11, 15, 20, 24, 27, 31}. However, the coupling is minimal for any pair that includes a parameter from each set, such as (α2, α3) or (α7, α11). After all, it only makes sense that the system does not confuse changes in the x-direction with those in the y-direction, but it is reassuring that the FIM verifies this. A similar observation can be made about terms with two axes of bilateral symmetry. Forming one group, we have the terms {αn, n = 5, 12, 21, 32}, whose axes of symmetry are the x- and y-axes; these correspond to all orders of astigmatism at 0°. Another group is formed by {αn, n = 6, 13, 22, 33}, the orders of astigmatism at 45°. In other words, the system can distinguish any two astigmatic terms with relative ease, as long as one is at 0° and the other at 45°, but it has difficulty otherwise. We computed the inverse of the FIM (Fig. 6.8) and read off its diagonal components to determine the Cramér-Rao bound for the parameters, whose square root is provided in Table 6.4. The diminutive values in (CRB)^1/2, on the order of 10⁻⁹ to 10⁻⁸, permit the estimation of the wavefront parameters to very high precision, provided that the forward model is exact.

Fig. 6.8: Inverse of the FIM for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens (log scale).

Table 6.4: Square root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 37} in the exit pupil of the highly aberrated test lens at λ = 0.6328 µm. Units are in waves λ.
Index   True Value [λ]   (CRB)^1/2 [λ]      Index   True Value [λ]   (CRB)^1/2 [λ]
2       0                1.4 × 10⁻⁸         20      0                1.2 × 10⁻⁸
3       0                1.4 × 10⁻⁸         21      0                4.8 × 10⁻⁹
4       75.17884184      1.3 × 10⁻⁸         22      0                4.5 × 10⁻⁸
5       0                6.6 × 10⁻⁹         23      0                7.7 × 10⁻⁹
6       0                7.4 × 10⁻⁸         24      0                7.7 × 10⁻⁹
7       0                6.9 × 10⁻⁸         25      0.04113341       5.5 × 10⁻⁹
8       0                6.9 × 10⁻⁸         26      0                7.6 × 10⁻⁹
9       23.81690475      2.4 × 10⁻⁸         27      0                7.7 × 10⁻⁹
10      0                2.0 × 10⁻⁸         28      0                2.8 × 10⁻⁹
11      0                2.0 × 10⁻⁸         29      0                2.9 × 10⁻⁹
12      0                2.9 × 10⁻⁹         30      0                7.8 × 10⁻⁹
13      0                1.2 × 10⁻⁸         31      0                7.8 × 10⁻⁹
14      0                1.1 × 10⁻⁸         32      0                1.9 × 10⁻⁹
15      0                1.1 × 10⁻⁸         33      0                1.7 × 10⁻⁸
16      -0.60555615      2.4 × 10⁻⁸         34      0                3.6 × 10⁻⁹
17      0                4.1 × 10⁻⁹         35      0                3.6 × 10⁻⁹
18      0                5.6 × 10⁻⁹         36      -0.00287742      1.2 × 10⁻⁸
19      0                1.2 × 10⁻⁸         37      -0.01310320      4.3 × 10⁻⁹

In this proof-of-principle study, we chose to estimate the Zernike coefficients {αn, n = 2,…, 9, 16}, which we will discuss in Section 6.3.5. In a real physical system, the coefficients related to tilt and off-axis aberrations (i.e., coma and astigmatism) would be useful in determining misalignments in the optical system, while the remaining coefficients represent defocus and spherical aberration, relating to optical power and the curvatures and asphericities of the refractive surfaces. The reduced FIM for the selected coefficients is shown in Figure 6.9. After the inversion of this matrix (Fig. 6.10), we actually see an improvement in (CRB)^1/2 by roughly one to two orders of magnitude for each parameter. These values are provided in Table 6.5.

Fig. 6.9: FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens (log scale).

Fig. 6.10: Inverse of the FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens (log scale).

Table 6.5: Square root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the highly aberrated test lens at λ = 0.6328 µm. Units are in waves λ.
Index   True Value [λ]   (CRB)^1/2 [λ]
2       0                6.6 × 10⁻¹⁰
3       0                6.6 × 10⁻¹⁰
4       75.17884184      1.2 × 10⁻¹⁰
5       0                6.8 × 10⁻¹⁰
6       0                1.5 × 10⁻⁹
7       0                3.6 × 10⁻¹⁰
8       0                3.6 × 10⁻¹⁰
9       23.81690475      2.2 × 10⁻⁹
16      -0.60555615      1.5 × 10⁻⁹

6.3.4 Likelihood surfaces

Reiterating from Chapters 2 and 5, the cost function to be minimized for our i.i.d. Gaussian noise model is the sum of squares between the data and the output of our optical-design program, expressed in (2.72). As before, we will refer to the following plots of the cost function as likelihood surfaces. Using the data in Figure 6.6, we computed the surface along two parametric axes at a time, where the parameters were selected from {αn, n = 2,…, 9, 16, 25, 36, 37}. Each plot is centered about the true minimum, and the range for each parameter is given in Table 6.6.

Table 6.6: Range in likelihood surface plots for Fringe Zernike coefficients {αn, n = 2,…, 9, 16, 25, 36, 37} in the exit pupil of the highly aberrated test lens. Units are in waves λ.

Index   True Value [λ]   Range
2       0                ± 4λ
3       0                ± 4λ
4       75.17884184      ± 10λ
5       0                ± 4λ
6       0                ± 4λ
7       0                ± 2λ
8       0                ± 2λ
9       23.81690475      ± 4λ
16      -0.60555615      ± λ/2
25      0.04113341       ± λ/2
36      -0.00287742      ± λ/4
37      -0.01310320      ± λ/4

We first examined pairs of the rotationally symmetric terms, {αn, n = 4, 9, 16, 25, 36, 37}. Several examples are provided in Figures 6.11 – 6.14. These likelihood plots are very reminiscent of those shown in Chapter 5, in which we estimated ocular parameters and concluded that the likelihood shape was primarily a function of the defocus level in the eye. Although the plots here incorporate the various spherical-aberration terms, there is intense entanglement between these terms and defocus, as evidenced by the FIM in Section 6.3.3. A key difference, though, is that these plots contain fine wrinkles, specifically along the α4 and α9 axes (Figs. 6.11 – 6.13), whereas the plots in Chapter 5 are smoothly varying.
These wrinkles are likely to cause difficulty in the later stages of optimization using simulated annealing, when finer features in the cost function begin to emerge.

Fig. 6.11: Likelihood surface along the α4 (defocus) and α9 (primary spherical aberration) axes for the highly aberrated test lens.

Fig. 6.12: Likelihood surface along the α4 (defocus) and α25 (tertiary spherical aberration) axes for the highly aberrated test lens.

Fig. 6.13: Likelihood surface along the α9 (primary spherical aberration) and α16 (secondary spherical aberration) axes for the highly aberrated test lens.

Fig. 6.14: Likelihood surface along the α16 (secondary spherical aberration) and α25 (tertiary spherical aberration) axes for the highly aberrated test lens.

Likelihood plots comparing a rotationally symmetric term with an off-axis term, such as tilt (α2 and α3), astigmatism (α5 and α6), or coma (α7 and α8), tend to resemble one of the plots in Figures 6.15 and 6.16. These are clearly marked by a number of local basins and a predictable axis of symmetry. Conversely, combining the x- and y-axis versions of an off-axis term (e.g., x- and y-axis tilt) results in a plot with full rotational symmetry, as in Figure 6.17. Any other combination of these off-axis terms results in a plot similar to those in Figures 6.18 and 6.19, once again characterized by a large number of local extrema.

Fig. 6.15: Likelihood surface along the α4 (defocus) and α5 (primary astigmatism at 0°) axes for the highly aberrated test lens.

Fig. 6.16: Likelihood surface along the α25 (tertiary spherical aberration) and α7 (primary coma, x-axis) axes for the highly aberrated test lens.

Fig. 6.17: Likelihood surface along the α2 (tilt, x-axis) and α3 (tilt, y-axis) axes for the highly aberrated test lens.

Fig. 6.18: Likelihood surface along the α5 (primary astigmatism at 0°) and α7 (primary coma, x-axis) axes for the highly aberrated test lens.

Fig.
6.19: Likelihood surface along the α3 (tilt, y-axis) and α8 (primary coma, y-axis) axes for the highly aberrated test lens.

6.3.5 Maximum-likelihood estimates

After generating the data (Figure 6.6), we disregarded our knowledge of the Zernike coefficients {αn, n = 2,…, 9, 16}, then estimated them using ML estimation according to our Gaussian noise model. We tried a variety of tuning-parameter combinations for the simulated annealing algorithm described in Section 3.3, observing the cost function and the trajectory taken during the search process. Ultimately, we decided on the following values during initialization:

τ0 = 10⁶, δ = 1.0, Nδ = 5, NS = 10, c = 2.0, NT = 20, rT = 0.90, v0 = 0.5 (θupper − θlower),

where θupper and θlower are respectively the upper and lower limits in the parameter space, based on the ranges in Table 6.6. To give local minima throughout the entire search space an equal chance of being sampled, we imposed a high initial temperature τ0 that would enable the system to accept virtually all proposed configurations. We produced a random starting point for the search, as listed in Table 6.7. We ran 12 optimization trials, with each trial representing a distinct noise realization for the same wavefront parameters, and implemented the same estimation procedure on each. Figure 6.20 illustrates the optimal cost function versus iteration number for the trials, starting at the end of the first temperature phase. The average final cost and number of temperature phases were 147.7 and 87.3, respectively. This number of phases is equivalent to a final temperature of 116.1.

Fig. 6.20: 12 simulated annealing trials for the estimation of wavefront parameters in the exit pupil of the highly aberrated test lens (log-log scale).
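A stripped-down version of the annealing loop can convey the role of these tuning parameters. The sketch below uses the same geometric temperature schedule (reduction factor rT = 0.90) on a toy two-parameter cost with local minima; the cost function, initial temperature, step sizes, and phase count are made-up values, and the step-size adaptation controlled by δ, Nδ, and c in the full Section 3.3 algorithm is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cost with local minima, standing in for the sum-of-squares likelihood
def cost(theta):
    x, y = theta
    return (x - 2.0)**2 + (y + 1.0)**2 \
        + 0.5 * np.sin(5 * x)**2 + 0.5 * np.sin(5 * y)**2

tau, rT = 1.0e2, 0.90        # initial temperature and reduction factor
NS, NT = 10, 20              # steps per cycle, cycles per temperature phase
v = np.array([2.0, 2.0])     # step sizes (held fixed in this sketch)
theta = np.array([-4.0, 4.0])
c_cur = cost(theta)
best, c_best = theta.copy(), c_cur

for phase in range(60):                      # temperature phases
    for _ in range(NS * NT):                 # Metropolis moves at fixed tau
        cand = theta + v * rng.uniform(-1.0, 1.0, size=2)
        c_new = cost(cand)
        # accept downhill always, uphill with Boltzmann probability
        if c_new < c_cur or rng.uniform() < np.exp(-(c_new - c_cur) / tau):
            theta, c_cur = cand, c_new
            if c_cur < c_best:
                best, c_best = theta.copy(), c_cur
    tau *= rT                                # geometric cooling
```

A high initial temperature accepts nearly every move, so the walk samples the whole space; as tau falls, the chain settles into (ideally) the global basin, with the best-visited point retained throughout.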
As we discussed in Chapter 2, bias and variance are used to specify estimator performance, where bias is defined as the deviation of the average parameter estimate from the true value, and variance is the mean-square fluctuation of the estimate about its mean. In essence, bias is related to accuracy and variance to precision. Inherent bias in the estimator and systematic errors due to miscalibration or inaccurate modeling of the system both factor into the overall bias. Variance provides a measure of the random errors that fluctuate from one measurement to another for a given wavefront. Final average estimates for the Zernike coefficients {αn, n = 2,…, 9, 16} are given in Table 6.7, along with the respective standard deviations. Both the bias and variance are very small for every parameter, and all parameters were estimated to within one standard deviation. Based on the low biases, it is not surprising that the true and estimated irradiance patterns are virtually indistinguishable (Fig. 6.21).

Table 6.7: ML estimates of wavefront parameters for the highly aberrated test lens at λ = 0.6328 µm, including their standard deviations and the starting point of the search. Units are in waves λ.

Index   Aberration Type                     True Value [λ]   Start Point [λ]   Estimate [λ]
2       Tilt (x-axis)                       0                2.33765863        0.11 ± 0.34
3       Tilt (y-axis)                       0                3.67593941        -0.09 ± 0.30
4       Defocus                             75.17884184      81.29481398       75.5 ± 1.0
5       Astigmatism, Primary (0° or 90°)    0                -3.71430657       -0.09 ± 0.28
6       Astigmatism, Primary (±45°)         0                2.79303444        0.17 ± 0.23
7       Coma, Primary (x-axis)              0                1.73597299        0.08 ± 0.33
8       Coma, Primary (y-axis)              0                -1.81531443       -0.10 ± 0.36
9       Spherical Aberration, Primary       23.81690475      20.60294575       23.58 ± 0.59
16      Spherical Aberration, Secondary     -0.60555615      -0.29345783       -0.604 ± 0.047

Throughout this dissertation, we have emphasized the need for an accurate probability model when performing ML estimation, particularly when working with physical data.
In this simulation study, we assumed the validity of the Fresnel approximation despite the low f-number of the test lens. This did not pose any problems, however, since any modeling errors were present in both the forward and inverse problems. Thus, the negligible bias probably resulted from the lack of systematic errors in the two-way mapping. Regarding the variance, the large deviations from the CRB must have arisen from the host of local extrema in the likelihood surface. If the magnitude of the fluctuations is unacceptable for a specific application, then one must either devote more computation time to navigating the search space or find a system configuration with fewer extrema.

Fig. 6.21: Comparison between the true and estimated irradiance patterns for the highly aberrated test lens.

An iteration is defined as one cycle through every parameter, so there were NI = NS × NT = 200 iterations, or 1.8×10^3 forward propagations, per temperature phase. For the pupil sampling of P = 1024 and FFT grid size of F = 2048, the computation time was 790 ms per forward propagation, including both output planes. So, the average of 87.3 temperature phases per trial corresponds to 34.5 hours of computation time. We carried out this study using an NVIDIA Tesla C1060 GPU, whose peak double-precision (DP) performance is 78 GFLOPS. Had we used the Tesla C2075 model, offering 515 GFLOPS of DP power, the computation time per trial would have been roughly 5.2 hours. Even further improvement can be achieved with a cluster of GPUs. For instance, the VSC455 V8 GPU workstation by Velocity Micro combines 8 Tesla C2075s for over 4 TFLOPS of DP power, which would turn the 5.2 hours into just 40 minutes.

6.4 Experimental results

In the experimental proof-of-principle study, we selected a relatively benign lens with fewer aberrations. In contrast to the numerical study, we dealt with nuisance parameters in the system and evaluated the accuracy of the Fresnel approximation.
As usual, we determined the optimal pupil sampling and investigated the FIM, CRB, and likelihood surface. Despite the lower aberrations, the cost function still contained numerous local minima, so we performed ML estimation by simulated annealing.

6.4.1 System configuration

We obtained multifocal irradiance data in the focal region of a spherical test lens, described in detail in Section 6.4.2, again at finite conjugates and with a HeNe source (λ = 0.6328 µm) (Fig. 6.22). The distance from the on-axis point source, created with a 40X microscope objective and a 10-µm pinhole, to the test lens was 457 mm. To generate an increase in information by illuminating substantially more detector elements, we magnified the intermediate image with an imaging lens placed just before the CCD. This had the added benefit of eliminating CCD saturation without the use of neutral-density filters. The imaging lens was a Nikon 40X microscope objective (NA = 0.95), configured at the design conjugates; the manufacturer claims that the objective is corrected for spherical aberration when used at the proper conjugates. This particular objective was infinity-corrected with an optical tube length of 160 mm. To maintain these conjugates, we placed the imaging lens and CCD on a translation stage, allowing us to scan through the focal region of the test lens. We used a CCD with a fine pixel size of 4.4 µm for high information yield.

Fig. 6.22: Data-acquisition system for collecting multiple irradiance patterns near the focus of a spherical test lens, including a movable imaging lens.

6.4.2 Test lens description

The test lens for this study was a double-convex spherical lens by Edmund Optics (part no. NT45-891), with a diameter of Dlens = 25 mm and radii of curvature of R1 = -R2 = 76.66 mm. According to ZEMAX, the exit pupil diameter and working f-number were Dxp = 25.39 mm and f/#w = 3.471, respectively (Table 6.8).
It was imperative that the image-space NA of the test lens (NA = 0.14) be less than that of the imaging lens to avoid information loss. The position of the paraxial focal plane was z = zf = 90.83 mm, where the z = 0 plane contained the exit pupil, and the distance between marginal and paraxial focus was 4.0 mm (Fig. 6.23).

Table 6.8: System data provided by ZEMAX™ for the spherical test lens at λ = 0.6328 µm.

Effective Focal Length [mm]: 74.99634
Back Focal Length [mm]: 73.83225
Image Space f/#: 2.999853
Paraxial Working f/#: 3.576688
Working f/#: 3.471054
Image Space NA: 0.1384479
Object Space NA: 0.02729433
Entrance Pupil Diameter [mm]: 25.0
Exit Pupil Diameter [mm]: 25.39417
Paraxial Focal Plane Position [mm]: 90.82701

Fig. 6.23: Focal region of the spherical test lens. The paraxial focal plane is at z = zf = 90.83 mm.

The ZEMAX calculations of the Fringe Zernike coefficients (N = 37) in the exit pupil of the lens are provided in Table 6.9. As before, the coefficients for non-rotationally-symmetric terms are zero. With a smaller peak-to-valley wavefront error of 30.6λ, we anticipated less stringent requirements on the pupil sampling in our propagation algorithm. A map of the wavefront error according to the design parameters is shown in Figure 6.24.

Table 6.9: Fringe Zernike coefficients {αn, n = 1,…, 37}, peak-to-valley, RMS, and variance, provided by ZEMAX for the spherical test lens at λ = 0.6328 µm. Unlisted coefficients are zero. Units are in waves λ.

Index | Aberration Type | Design Value [λ]
1 | Piston | 10.15431626
4 | Defocus | 15.26099035
9 | Spherical Aberration, Primary | 5.12576917
16 | Spherical Aberration, Secondary | 0.01863364
25 | Spherical Aberration, Tertiary | -0.00048193
36 | Spherical Aberration, Quaternary | -0.00002109
37 | Spherical Aberration, 12th-order term | -0.00000061
Peak-to-valley [λ] | N/A | 30.55920575
RMS [λ] | N/A | 8.99563403
Variance [λ2] | N/A | 80.92143161

Fig.
6.24: Theoretical wavefront error in the exit pupil of the spherical lens as a function of normalized radius. Units are in waves.

6.4.3 Experimental data

Figure 6.25 displays the physical data that we used as input to our optimization algorithm, consisting of two image planes just before paraxial focus, where the scale bar corresponds to the intermediate image plane between the test lens and objective lens. In Section 6.4.8, we discuss the estimation of nuisance parameters in the system, namely, the image-plane locations and the true magnification of the objective lens. Since the computer simulations in Sections 6.4.4 – 6.4.7 were rerun after obtaining these estimates, we will simply quote the results here:

z1 = zf + ∆z1 = zf − 0.6745 mm, M1 = 39.96, (6.13a)
z2 = zf + ∆z2 = zf − 0.7945 mm, M2 = 40.09, (6.13b)

where ∆z is the distance from paraxial focus and M is the magnification.

Fig. 6.25: Experimental data for the spherical test lens for image planes: (a) z = z1 and (b) z = z2. Scale bar corresponds to the intermediate image plane just before the imaging lens.

6.4.4 Pupil sampling

As previously mentioned, the required amount of pupil sampling is generally proportional to the degree of aberrations in the exit pupil, while other factors to consider are the curvature of the unaberrated reference wave and the distance of the output plane from optimal focus. Simulated detector data for various sampling levels, P = 128, P = 256, and P = 512, are provided for both output planes in Figures 6.26 and 6.27. The ratio P/F was fixed at 1/4 for all plots, resulting in a detector pitch of roughly 0.56 µm. Sampling artifacts are quite evident in the data for P = 128, F = 512 (Figs. 6.26c & 6.27c), as the irradiance pattern has a cogwheel appearance. There are no apparent signs of undersampling for P = 256, F = 1024 (Figs. 6.26b & 6.27b), and the irradiance is smooth and seemingly physical.
Since the irradiance is unaffected by a further increase in sampling to P = 512, F = 2048, we decided on P = 256 for this particular study.

Fig. 6.26: Detector data at z = z1 for the spherical lens using pupil sampling of: (a) P = 512, (b) P = 256, and (c) P = 128.

Fig. 6.27: Detector data at z = z2 for the spherical lens using pupil sampling of: (a) P = 512, (b) P = 256, and (c) P = 128.

6.4.5 Huygens’ method vs. Fresnel propagation

Since this estimation task dealt with physical data, accurate forward modeling of the system was imperative. Here we compare the Fresnel approximation, or FFT method, with a brute-force evaluation of the Huygens wavelet formula in (6.1), which we will treat as the gold standard. We computed the detector data in both output planes for the spherical lens using the two methods (Figs. 6.28 & 6.29). Each irradiance pattern was normalized to unit area, consistent with conservation of power. Displayed in Figures 6.28c and 6.29c are the difference images, indicating modest peak discrepancies of 1.5% for z = z1 and 1.7% for z = z2. Despite the slight variation in irradiance between the methods, there is a tremendous difference in computation time. With an NVIDIA Tesla C1060, the computation time per output plane with the FFT method for a pupil sampling of P = 256 and FFT grid size of F = 1024 was TFFT = 154 ms, independent of the number of useful elements we choose to extract from the 1024 × 1024 FFT output. Table 6.10 provides the computation time using the Huygens integral (Tint) for the same P, but various detector grid sizes M1D. The ratio Tint/TFFT ranges from 377 for M1D = 128 to an astounding 2.42×10^4 for M1D = 1024. Although M1D = 128 is large enough to include the irradiance pattern for the wavefront parameters in Table 6.9, it may not be large enough as the parameter space is explored. Regardless, all of the computation times listed below are prohibitively long for any extensive optimization routine.
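The cost gap follows directly from the structure of the two methods: the brute-force Huygens sum couples every pupil sample to every detector element. A schematic 1-D version (a sketch for illustration only, with constant factors and obliquity terms omitted) makes the O(P·M) scaling explicit:

```python
import cmath, math

def huygens_1d(pupil_field, x_pupil, x_det, z, wavelength):
    """Schematic 1-D Huygens-Fresnel summation: each detector point sums a
    spherical-wavelet contribution exp(ikr)/r from every pupil sample.
    The nested loops give O(P * M) work per output plane, versus the
    O(F log F) of the single-FFT Fresnel propagator."""
    k = 2.0 * math.pi / wavelength
    field = []
    for xd in x_det:
        acc = 0j
        for u, xp in zip(pupil_field, x_pupil):
            r = math.hypot(z, xd - xp)            # pupil-to-detector distance
            acc += u * cmath.exp(1j * k * r) / r
        field.append(acc)
    return field
```

In two dimensions the same structure becomes O(P² · M²), which is consistent with the three-orders-of-magnitude ratios reported in Table 6.10.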
Table 6.10: Computation time using Huygens’ method for a pupil sampling of 256 × 256 and various detector grid sizes.

P | M1D | Tint [sec] | Tint [min] | Tint/TFFT
256 | 1024 | 3728 | 62.1 | 2.42×10^4
256 | 512 | 932 | 15.5 | 6.05×10^3
256 | 256 | 233 | 3.88 | 1.51×10^3
256 | 128 | 58.1 | 0.968 | 3.77×10^2

Fig. 6.28: Irradiance data at z = z1 for the spherical lens: (a) Fresnel approximation, (b) Huygens integral, (c) difference.

Fig. 6.29: Irradiance data at z = z2 for the spherical lens: (a) Fresnel approximation, (b) Huygens integral, (c) difference.

6.4.6 Fisher information and Cramér-Rao lower bounds

We computed the FIM for the Fringe Zernike coefficients to be estimated, {αn, n = 2,…, 9, 16}, evaluating it at the point in parameter space corresponding to the design parameters in Table 6.9. As in Section 6.3.2, we based the FIM on an i.i.d. Gaussian noise model and a peak SNR of 10^4. The FIM (Fig. 6.30) structure is comparable to that for the highly aberrated lens, including strong coupling among the rotationally-symmetric terms, α4, α9, and α16. From the inverse matrix (Fig. 6.31), we determined the CRB for the set of parameters (Table 6.11), which again takes on diminutive values. Thus, the variance from detector noise is unlikely to preclude high-precision estimates.

Fig. 6.30: FIM for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens (log scale).

Fig. 6.31: Inverse of the FIM for Fringe Zernike coefficients α2–α9 and α16 in the exit pupil of the spherical test lens (log scale).

Table 6.11: Square root of the CRB for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens at λ = 0.6328 µm. Units are in waves λ.
Index | Design Value [λ] | (CRB)^1/2 [λ]
2 | 0 | 8.3 × 10^-9
3 | 0 | 8.3 × 10^-9
4 | 15.26099035 | 4.2 × 10^-9
5 | 0 | 3.9 × 10^-9
6 | 0 | 3.9 × 10^-9
7 | 0 | 5.7 × 10^-9
8 | 0 | 5.7 × 10^-9
9 | 5.12576917 | 3.5 × 10^-9
16 | 0.01863364 | 2.4 × 10^-9

6.4.7 Likelihood surfaces

We greatly expanded the search space in this study, compared to the numerical study, which revealed many new and interesting characteristics of the likelihood surface. The adjusted ranges are listed in Table 6.12 and several examples are shown in Figures 6.32 – 6.36.

Table 6.12: Range in likelihood surface plots for Fringe Zernike coefficients {αn, n = 2,…, 9, 16} in the exit pupil of the spherical test lens. Units are in waves λ.

Index | Design Value [λ] | Range
2 | 0 | ± 10λ
3 | 0 | ± 10λ
4 | 15.26099035 | ± 20λ
5 | 0 | ± 5λ
6 | 0 | ± 5λ
7 | 0 | ± 5λ
8 | 0 | ± 5λ
9 | 5.12576917 | ± 10λ
16 | 0.01863364 | ± λ

Fig. 6.32: Likelihood surface along α4 (defocus) and α16 (secondary spherical aberration) axes for the spherical test lens.

Fig. 6.33: Likelihood surface along α7 (primary coma, x-axis) and α9 (primary spherical aberration) axes for the spherical test lens.

Fig. 6.34: Likelihood surface along α2 (tilt, x-axis) and α4 (defocus) axes for the spherical test lens.

Fig. 6.35: Likelihood surface along α5 (primary astigmatism at 0°) and α16 (secondary spherical aberration) axes for the spherical test lens.

Fig. 6.36: Likelihood surface along α2 (tilt, x-axis) and α7 (primary coma, x-axis) axes for the spherical test lens.

6.4.8 Nuisance parameters

In Section 2.7, we discussed various methods for dealing with nuisance parameters, which are defined as parameters that affect the data and are therefore fundamental to the probability model, but are not useful to the estimation task. Suppose α denotes the wavefront parameters of interest, β denotes the wavefront parameters of no immediate interest, and χ denotes all other nuisance parameters, so that the entire vector parameter can be written as θ = (α, β, χ)^t.
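As described next, the system nuisance parameters (an image-plane position zi and a magnification Mi) were found by an exhaustive 2-D grid search over the cost function. A minimal sketch of such a search (the quadratic cost here is a stand-in for the actual Gaussian-noise cost):

```python
def grid_search_2d(cost, z_values, m_values):
    """Evaluate cost(z, m) at every point of a 2-D grid and return the
    minimizer. Exhaustive, so it cannot be trapped by local minima within
    the grid's resolution."""
    best = None
    for z in z_values:
        for m in m_values:
            c = cost(z, m)
            if best is None or c < best[0]:
                best = (c, z, m)
    return best[1], best[2]
```

A coarse grid can be refined around the returned minimizer to reach the precision quoted for ẑi and M̂i below.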
Therefore, α comprises the Zernike coefficients {αn, n = 2,…, 9, 16}, β comprises all remaining coefficients, and χ consists of unknown system parameters to be discussed next. We dealt with β by replacing it (up to the ZEMAX limit of N = 37) with the respective design coefficients in Table 6.9, denoted as β0, then separately estimated χ prior to the primary estimation task. Finally, we set pr(g | α, β, χ) ≈ pr(g | α, β0, χ̂) and proceeded to estimate α.

Two of the critical nuisance parameters in the system were the absolute image-plane positions, z1 = zf + ∆z1 and z2 = zf + ∆z2. During acquisition of the detector data, it was difficult to accurately pinpoint the paraxial focal plane, z = zf, which served as the reference plane in our studies. Another nuisance parameter was the true magnification of the microscope objective just before the detector, denoted as M1 and M2 for the first and second image, respectively. To achieve the design magnification of 40X, the detector must be placed 160 mm from the rear principal plane of the objective lens; however, we had no knowledge of the exact location of this plane. For the ith output plane, we estimated zi and Mi from the respective detector data in Figure 6.25 by performing a straightforward 2D grid search. During this step, we fixed all wavefront parameters, α and β, to the design parameters, α0 and β0, and computed the cost function for both image planes assuming a Gaussian noise model (Figs. 6.37 & 6.38). Prior to generating these plots, we first did a broader search for local minima in the region of interest, but did not find any. The final nuisance-parameter estimates are given by

ẑ1 = 90.153 mm, M̂1 = 39.96, (6.14a)
ẑ2 = 90.033 mm, M̂2 = 40.09. (6.14b)

Since the detector element size is 4.4 µm, the effective pixel sizes after magnification are ∆xd = 0.1101 µm and ∆xd = 0.1098 µm, respectively. Fig.
6.37: Determining the nuisance parameters in the system for image plane z = z1 via a 2D grid search prior to the estimation of wavefront parameters.

Fig. 6.38: Determining the nuisance parameters in the system for image plane z = z2 via a 2D grid search prior to the estimation of wavefront parameters.

6.4.9 Maximum-likelihood estimates

To compare the physical data with the output of the optical design program during optimization, we first interpolated the data so that its coordinates matched those of the FFT output. Then we normalized the irradiance pattern in both the data and the FFT output, which is analogous to normalizing the power of the source beam. We minimized the cost function for an i.i.d. Gaussian noise model, given by (2.72), using the broad parametric ranges in Table 6.12 and the following tuning parameters: τ0 = 10^3, δ = 1.0, Nδ = 5, NS = 10, c = 2.0, NT = 25, rT = 0.90, v0 = 0.5(θupper − θlower). The starting point θ0 in the search space was based on the design values for the Zernike coefficients, since this offered the best possible initial guess. Figure 6.39 illustrates the optimal cost function versus number of iterations for 12 trials, each with a different noise realization in the data. The average final cost function was 0.150, while the average number of temperature phases was 105, corresponding to a temperature of 0.0174.

Fig. 6.39: 12 simulated annealing trials for the estimation of wavefront parameters in the exit pupil of the spherical test lens.

In each temperature phase, there were NI = NS × NT = 250 iterations, equaling 2.25×10^3 forward propagations for 9 parameters. With P = 256 and F = 1024, the computation time was 308 ms per forward propagation. There was an average of 105 temperature phases per trial, corresponding to 20.2 hours of computation time with a single NVIDIA Tesla C1060. On a Tesla C2075, the computation time per trial would be 3.1 hours, and clusters of GPUs could be used as well.
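The timing bookkeeping above is easy to check; a minimal sketch (our own helper, using the iteration counts and per-propagation times quoted in this chapter):

```python
def trial_hours(n_params, n_s, n_t, n_phases, sec_per_prop):
    """Wall time for one annealing trial. An iteration cycles through every
    parameter once, so each temperature phase costs
    n_params * (n_s * n_t) forward propagations."""
    props_per_phase = n_params * n_s * n_t
    return props_per_phase * n_phases * sec_per_prop / 3600.0
```

With 9 parameters, NS = 10, NT = 25, 105 phases, and 308 ms per propagation, this reproduces the 20.2-hour figure; the same bookkeeping with the numerical study's settings reproduces its 34.5 hours.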
Final average estimates and standard deviations for the Zernike coefficients {αn, n = 2,…, 9, 16} are given in Table 6.13. Without knowledge of the true values or a gold standard, we have no basis for evaluating biases in the estimates. In theory, estimator bias should be insignificant as long as the global minimum of the cost function can be found, while systematic bias is likely to arise from errors in the measurement-plane positions or the source position. Still, the estimated values are within one standard deviation of the design values. Once again, large departures from the CRB must have resulted from the numerous local basins in the cost function.

Table 6.13: ML estimates of wavefront parameters for the spherical test lens at λ = 0.6328 µm, including their standard deviations. Design values were used as the starting point in the search. Units are in waves λ.

Index | Aberration Type | Design Value [λ] | ML Estimate [λ]
2 | Tilt (x-axis) | 0 | 0.10 ± 0.30
3 | Tilt (y-axis) | 0 | -0.13 ± 0.28
4 | Defocus | 15.26099035 | 15.57 ± 0.64
5 | Astigmatism, Primary (0° or 90°) | 0 | -0.09 ± 0.19
6 | Astigmatism, Primary (±45°) | 0 | 0.13 ± 0.26
7 | Coma, Primary (x-axis) | 0 | 0.08 ± 0.18
8 | Coma, Primary (y-axis) | 0 | -0.11 ± 0.23
9 | Spherical Aberration, Primary | 5.12576917 | 4.92 ± 0.32
16 | Spherical Aberration, Secondary | 0.01863364 | 0.0196 ± 0.0014

Fig. 6.40: Comparison between the true and estimated irradiance patterns for the spherical test lens.

6.5 Summary of Chapter 6

In both the numerical and experimental studies, the data-acquisition method involved multiple irradiance patterns collected near the focus of an optical system for the purpose of estimating the pupil phase distribution. We considered various approaches for a suitable propagation algorithm to accurately model the wave propagation and developed the mathematical framework for an aberrated wave emerging from the exit pupil, where we parameterized the wavefront using expansion functions, particularly Fringe Zernike polynomials.
To substantially reduce the computation time, we implemented parallel processing with a state-of-the-art GPU. We obtained proof-of-principle results in both simulation and experiment. In each study, we evaluated the sampling requirements and verified that significant pupil sampling is needed for large wavefront errors. Fisher information matrices featured prominent coupling among specific groups of parameters, such as the group containing only rotationally-symmetric Zernike terms. The associated Cramér-Rao bounds were incredibly small, thereby permitting high-precision estimates, although this generally requires an accurate forward model and a search algorithm that can reliably locate the global minimum of the cost function. After discovering numerous local extrema in the likelihood surfaces, we chose simulated annealing for the estimation of selected Zernike coefficients. In the numerical study with the highly aberrated lens, the estimate biases were negligible, probably because of the lack of systematic errors in the estimation procedure. Although the variances were fairly small, they were far from the CRB, certainly due to entrapment in local basins of the cost function.

In the experimental study, we used a benign test lens with significantly fewer aberrations and a larger working f-number of f/#w = 3.47. Since the imaging (i.e., relay) lens was much faster, with f/# ≈ 0.53 (NA = 0.95), it should not have caused information loss from the suppression of high spatial frequencies. However, this may become an issue if the f-number of the test lens is low enough. To avoid having to include the imaging lens in the forward model, one solution is to use the translation stage to place a ground glass in any plane where an irradiance measurement is desired, then relay the incoherent irradiance, instead of the field, to the detector (Fig. 6.41). Note that it is necessary to have a rotating ground glass to decorrelate the resulting speckle noise.
An alternative to a rotating diffuser is a liquid-crystal diffuser operated in a dynamic-scattering mode.

Fig. 6.41: Data-acquisition system for collecting multiple irradiance patterns near the focus of an optical element, including a movable diffuser and imaging lens.

Experimentally, we were challenged with the absolute requirement of an accurate probability model, as well as the presence of nuisance parameters in the system. We verified the accuracy of the Fresnel approximation in our forward model by comparison with the Huygens wavelet formula. Nuisance parameters included the image-plane positions and the true magnification of the imaging lens. We dealt with them by means of a 2D grid search and located a single extremum for each image plane. The estimate variances were comparable to those in the simulation study, and each estimate was within one standard deviation of the design values.

CHAPTER 7
INVERSE OPTICAL DESIGN FOR OPTICAL TESTING

Inverse optical design provides a unique approach to testing graded-index and aspheric lenses to ensure that they have been fabricated to specification. In our method of optical testing via parametric modeling, the parameters to be estimated may include coefficients in the refractive-index distribution of GRIN lenses or coefficients describing the high-order surfaces of precision aspheres. We present results from numerical studies for both types of lenses. Section 7.1 is devoted to aspheric lenses and Section 7.2 to GRIN lenses. In Section 7.1, we discuss our rapid ray-tracing algorithm that was developed in CUDA on the GPU platform. In Section 7.2, we outline the theoretical framework for tracing rays through GRIN-rod lenses, which involves analytic solutions to the eikonal equation. In both cases, we provide Fisher information matrices, Cramér-Rao bounds, and likelihood surfaces. As usual, we provide maximum-likelihood estimates obtained with simulated annealing.
7.1 Inverse optical design of aspheric lenses

The primary objective in this particular application is to process high-order aspheric surfaces by means of ray-tracing. One problem, however, is that the ray-surface intercepts cannot be determined analytically for surfaces beyond fourth order, so iterative techniques must be implemented (Blinn, 2006, 2006a). We begin this section with a detailed description of an iterative algorithm used in our optical design program.

7.1.1 Optical-design program

We developed a rapid ray-trace algorithm in CUDA that performs non-paraxial ray-tracing through high-order aspheric surfaces. The ray-surface intersection is determined iteratively through a marching-points algorithm for root isolation and repeated bisections for root refinement.

Description of refracting high-order aspheric surfaces

We assume the following expression to describe a high-order even asphere:

z = r^2 / {R [1 + sqrt(1 − (1 + κ) r^2 / R^2)]} + α4 r^4 + α6 r^6 + α8 r^8 + α10 r^10 + … , (7.1)

where z is the sag of the surface and r^2 = x^2 + y^2. Writing the polynomial terms as A(r) = α4 r^4 + α6 r^6 + α8 r^8 + α10 r^10, the surface can be expressed in implicit form as

S(r, z) = (1 + κ)[z − A(r)]^2 − 2R[z − A(r)] + r^2. (7.2)

Determining the ray-surface intersection

Singh and Narayanan (2007) developed a simple algorithm for ray-tracing general implicit surfaces that is well suited to the SIMD architecture of the GPU. We applied this method to aspheric surfaces described by (7.2) and found that it delivers high performance through robust root-finding. While analytical solutions are possible for polynomials of fourth order or lower, roots of higher-order polynomials must be determined iteratively.
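For reference, the even-asphere sag (7.1) can be evaluated directly; a minimal sketch (our own helper names; with κ = 0 and no polynomial terms it reduces to the exact spherical sag):

```python
import math

def asphere_sag(r, radius, kappa, alphas):
    """Sag of an even asphere, Eq. (7.1): conic base term plus even-order
    polynomial deformation terms. alphas maps order -> coefficient,
    e.g. {4: a4, 6: a6, 8: a8, 10: a10}."""
    c = 1.0 / radius  # vertex curvature
    base = c * r * r / (1.0 + math.sqrt(1.0 - (1.0 + kappa) * c * c * r * r))
    return base + sum(a * r ** order for order, a in alphas.items())
```

Completing the square in z on this expression is exactly what produces the implicit quadratic form (7.2), whose smallest positive root along a ray is what the algorithm below must isolate.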
There are many root-finding techniques that are popular for ray-tracing, such as the Newton-Raphson, Newton-bisection, and Laguerre methods (Kajiya, 1982; Press, Teukolsky, Vetterling, & Flannery, 1992; Wyvill & Trotman, 1990), or extensions of these methods that integrate interval arithmetic (Duff, 1992; Mitchell, 1990). Other techniques for finding roots of polynomials incorporate auxiliary polynomials (Sederberg & Chang, 1993) or Sturm sequences (Nister, 2004). However, many of these methods are difficult to implement with the SIMD model and can be quite complicated for higher-order surfaces; quick, simple computations perform best on the GPU (Singh & Narayanan, 2007). In the method that we adopted, a simple marching-points scheme is used to isolate the smallest positive root for a given ray, whereby the ray is sampled at consecutive points and the first bracket containing a root is returned. A straightforward test for root-containment is applied. Root refinement is achieved with repeated bisections, which recursively split the bracket into sub-intervals and keep the first one containing a root.

Prior to root-finding, the surface S(x, y, z) = 0 is cast to the form

Ff(k) = 0 (7.3)

by substituting the ray equation (5.17) for an arbitrary surface fragment f, where k is the ray parameter. Recall that a ray is defined as

x(k) = x0 + k xd. (7.4)

An alternative to computing the univariate polynomial (7.3) for a given k is to evaluate the points according to (7.4), then substitute the values into the surface equation, which becomes

S(x, y, z) = S(x(k)) = 0. (7.5)

The computational implications can vary greatly between the two approaches, and for higher-order polynomials, the expression Ff(k) usually has a large number of terms. The choice of which expression to evaluate must be made for each particular surface.
In the root-isolation stage, bounds on the total search range [ks, ke] for the rays are initially specified, and then the range is divided into N equal intervals. The interval endpoints for a given ray are evaluated one by one until a root is found, that is, until Ff(k) crosses zero between two neighboring points. If this occurs at the ith iteration, then the algorithm returns [ki, ki+1] as containing a root. A sign test is used to check for root-containment, where a root exists in the ith interval if the function changes sign between the endpoints:

S(p(ki)) · S(p(ki+1)) < 0, root exists, (7.6a)
S(p(ki)) · S(p(ki+1)) > 0, root does not exist. (7.6b)

Although this test does not produce false roots, it may miss roots if an interval contains an even number of roots (Singh & Narayanan, 2007).

In the bisection method for root refinement, the bracket [ki, ki+1] is divided into two sub-intervals, [ki, km] and [km, ki+1], using the midpoint km. The first sub-interval containing a root is identified, and the process is repeated until the maximum number of bisections is reached or a tolerance condition is met. This method is robust and never fails, as long as the bracketing is correct. Since the function Ff(k) is evaluated only at the sample points, the computational complexity is very low. Provided that the total number of point evaluations is roughly the same for all rays, the algorithm fits the parallel architecture of the GPU very well. When this is not the case, Singh and Narayanan (2007) propose an adaptive marching-points scheme that samples each ray non-uniformly based on the algebraic distance to the surface, as well as the angle relative to the surface normal. This optimizes the performance of the GPU by balancing the computational load across threads, but adds to the computational burden and involves a derivative computation.
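The two stages just described, marching-points root isolation followed by bisection refinement, can be sketched in scalar form (an illustration of the scheme, not the CUDA kernel):

```python
def find_first_root(f, k_start, k_end, n_intervals, n_bisections=60):
    """Isolate the smallest root of f in [k_start, k_end] by sampling at
    uniform marching points and testing for a sign change, then refine the
    bracket by repeated bisection. As noted in the text, a bracket holding
    an even number of roots is missed by the sign test."""
    h = (k_end - k_start) / n_intervals
    k_prev, f_prev = k_start, f(k_start)
    for i in range(1, n_intervals + 1):
        k = k_start + i * h
        fk = f(k)
        if f_prev * fk < 0.0:                 # sign change: bracket found
            lo, hi, f_lo = k_prev, k, f_prev
            for _ in range(n_bisections):     # root refinement
                mid = 0.5 * (lo + hi)
                f_mid = f(mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid                  # root lies in [lo, mid]
                else:
                    lo, f_lo = mid, f_mid     # root lies in [mid, hi]
            return 0.5 * (lo + hi)
        k_prev, f_prev = k, fk
    return None                               # no sign change detected
```

Because each step is a function evaluation plus a comparison, the per-thread work is uniform and branch-light, which is what makes the scheme a good fit for the SIMD execution model.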
For the precision asphere described in Table 7.1, the expense outweighs the benefit, so we used the basic marching-points approach.

Direction of the transmitted ray

As discussed in Section 5.2, the direction of the refracted ray x̂′d is determined by (5.21):

x̂′d = (n/n′)[x̂d − (x̂d · n̂) n̂] + sqrt{1 + (n/n′)^2 [(x̂d · n̂)^2 − 1]} n̂. (7.7)

The unit normal n̂, generally given by (5.16), is constructed from the gradient of S, whose components for the 10th-order precision asphere are

∂S(x, y, z)/∂x = 2x − 4Bx[(1 + κ)(z − A) − R], (7.8a)
∂S(x, y, z)/∂y = 2y − 4By[(1 + κ)(z − A) − R], (7.8b)
∂S(x, y, z)/∂z = 2[(1 + κ)(z − A) − R], (7.8c)

where

A(r) = α4 r^4 + α6 r^6 + α8 r^8 + α10 r^10, (7.9)
B(r) = 2α4 r^2 + 3α6 r^4 + 4α8 r^6 + 5α10 r^8. (7.10)

7.1.2 Test lens description and system configuration

We simulated detector data for a precision plano-convex aspheric lens with parameters based on Edmund Optics Precision Asphere NT47-731 (Table 7.1). The parameters we chose to estimate in this study, RC, κ, α4, and α6, contain small deviations from the design values of this lens. Table 7.2 provides the system data computed by ZEMAX, where the positions of the entrance and exit pupils are measured from the first surface of the lens.

Table 7.1: True values of parameters underlying the irradiance data for the precision asphere, and design values of Edmund Optics Precision Asphere NT47-731.
Parameter | Units | Design Value | True Value
Radius of curvature, RC | mm | 18.41 | 18.55
Conic constant, κ | N/A | -1.607913 | -1.737913
4th-order aspheric coefficient, α4 | mm^-3 | 2.0634554 × 10^-5 | 2.413455 × 10^-5
6th-order aspheric coefficient, α6 | mm^-5 | -7.6489765 × 10^-9 | -7.438977 × 10^-9
8th-order aspheric coefficient, α8 | mm^-7 | 1.117573 × 10^-11 | 1.117573 × 10^-11
10th-order aspheric coefficient, α10 | mm^-9 | -1.010058 × 10^-14 | -1.010058 × 10^-14
Refractive index at λ = 632.8 nm, n | N/A | 1.58708982 | 1.58708982
Center thickness, t | mm | 6.50 | 6.50

Table 7.2: System data provided by ZEMAX™ for the precision asphere at λ = 0.6328 µm.

Effective Focal Length [mm]: 31.59653
Back Focal Length [mm]: 31.59653
Image Space f/#: 1.579826
Paraxial Working f/#: 4.432388
Working f/#: 19.32893
Image Space NA: 0.112095
Object Space NA: 0.1995864
Entrance Pupil Diameter [mm]: 20
Entrance Pupil Position [mm]: 4.095546
Exit Pupil Diameter [mm]: 20
Exit Pupil Position [mm]: 6.5

Our model incorporated two point sources, one on-axis and the other displaced by y = 10 mm from the optical axis (z-axis), while both were placed 45 mm before the test lens along the optical axis (Fig. 7.1). A 20-mm-diameter iris was positioned immediately before the lens. Since larger spot sizes produce greater information yield, we intentionally oriented the lens opposite to the manufacturer-intended orientation, so that the flat surface faced the source. We traced rays through the system for two on-axis image-plane positions at z = {95, 100} mm from the lens, plus one off-axis position at z = 90 mm. Our ray-trace results are practically identical to those of ZEMAX (Fig. 7.2).

Fig. 7.1: Ray-trace data from our CUDA algorithm for the precision asphere.

Fig. 7.2: Ray-trace data computed by ZEMAX for the precision asphere.

Using our Gaussian noise model, we emulated electronic noise in the detector data with a modest peak signal-to-noise ratio (SNR) of 10^3 and a pixel size of 12 µm (Fig. 7.3).
The irradiance patterns compare very well with the output of ZEMAX (Fig. 7.4).

Fig. 7.3: Irradiance data computed at: (a) z = 95 mm after the lens for the on-axis source, (b) z = 100 mm for the same on-axis source, and (c) z = 90 mm for the off-axis source.

Fig. 7.4: Irradiance data computed with ZEMAX at: (a) z = 95 mm after the lens for the on-axis source, (b) z = 100 mm for the same on-axis source, and (c) z = 90 mm for the off-axis source.

7.1.3 Fisher information and Cramér-Rao bounds

We originally computed the FIM for the parameters appearing in (7.2), including RC, κ, α4, α6, α8, and α10. However, the presence of α8 and α10 results in a singular, noninvertible matrix, since the data are not sensitive to changes in these higher-order coefficients. Although there are methods for dealing with singular information matrices that often involve their pseudoinverses (Hero, Fessler, & Usman, 1996; Rao, 1973), there are no unbiased estimators for the confounding parameters (i.e., α8 and α10) with finite variance (Stoica, 2001). The simplest solution is to exclude them from both the FIM and the general estimation procedure. We computed the FIM according to (2.76), as well as its inverse, based on our standard Gaussian noise model and a peak SNR of 10^3 (Fig. 7.5). While the resulting CRBs (Table 7.3) are impressively small, thus indicating substantial information in the system for the selected parameters, they are meaningful only if we can locate the global maximum of the likelihood function.

Fig. 7.5: (a) FIM and (b) inverse of the FIM for prescription parameters describing the precision asphere (logarithmic scale).

Table 7.3: Square root of the CRB for prescription parameters describing the precision asphere.
Index   Parameter                             Units   True value         (CRB)^1/2
1       Radius of curvature, RC               mm      18.55              1.5 × 10⁻⁷
2       Conic constant, κ                     N/A     -1.737913          1.7 × 10⁻⁷
3       4th-order aspheric coefficient, α4    mm⁻³    2.413455 × 10⁻⁵    2.6 × 10⁻¹²
4       6th-order aspheric coefficient, α6    mm⁻⁵    -7.438977 × 10⁻⁹   2.8 × 10⁻¹⁵

7.1.4 Likelihood surfaces

Likelihood surfaces for all six pairs of parameters are provided in Figures 7.6 – 7.11, based on the parametric ranges in Table 7.4. Since the ranges are centered on the parameters underlying the data, the global minimum occurs at the center of each plot. Most of the irregularities and local minima occur along the RC axis, as seen in Figures 7.6 – 7.8. (Note that Figure 7.7 is shown on a logarithmic scale to bring out features otherwise suppressed by a very strong and narrow peak.) In contrast, there is barely any variation along the α6 axis (Figs. 7.8, 7.10, & 7.11), which could hinder the estimation process. Finally, there is an interesting interplay in the pairs (κ, α4) (Fig. 7.9) and (α4, α6) (Fig. 7.11) that leads to likelihood plots resembling those we saw in the estimation of ocular parameters, as well as wavefront coefficients, in Chapters 5 and 6.

Table 7.4: Range in likelihood surfaces for parameters describing the precision asphere, relative to the true values.

Parameter                             Units   True value         Range
Radius of curvature, RC               mm      18.55              ± 3.0
Conic constant, κ                     N/A     -1.737913          ± 5.0
4th-order aspheric coefficient, α4    mm⁻³    2.413455 × 10⁻⁵    ± 5.0 × 10⁻⁵
6th-order aspheric coefficient, α6    mm⁻⁵    -7.438977 × 10⁻⁹   ± 5.0 × 10⁻⁹

Fig. 7.6: Likelihood surface along RC and κ axes. Global minimum is located at center of plot.

Fig. 7.7: Likelihood surface along RC and α4 axes. Global minimum is located at center of plot. (logarithmic scale)

Fig. 7.8: Likelihood surface along RC and α6 axes. Global minimum is located at center of plot.

Fig. 7.9: Likelihood surface along κ and α4 axes. Global minimum is located at center of plot.

Fig. 7.10: Likelihood surface along κ and α6 axes.
Global minimum is located at center of plot.

Fig. 7.11: Likelihood surface along α4 and α6 axes. Global minimum is located at center of plot.

7.1.5 Maximum-likelihood estimates

We chose to estimate a subset of the parameters: the radius of curvature RC, the conic constant κ, and the 4th- and 6th-order aspheric coefficients, α4 and α6. Pretending to know nothing of the true values of these parameters, we estimated them according to (2.72) with the following tuning parameters in our simulated annealing algorithm: τ0 = 10³, δ = 1.0, Nδ = 5, NS = 10, c = 2.0, NT = 20, rT = 0.90, and v0 = 0.5(θ_upper − θ_lower). Bounds on the parameters during the search process are based on the ranges in Table 7.4. To assess the variance in our estimates, we computed 20 data sets for the given set of parameters in Table 7.1, where each data set represented a different noise realization (Fig. 7.12). We rescaled the cost function to equal unity at the true minimum. The average final cost and number of temperature phases were 1.0908 and 65.3, respectively, where the latter corresponds to a final temperature of 1.1790.

Fig. 7.12: 20 simulated annealing trials for the estimation of prescription parameters describing the precision asphere.

Both the estimate bias and variance are very small for each parameter (Table 7.5), except for the broad variance in α̂6. This is not unexpected, since the likelihood surface hardly varies along the α6 axis. Conversely, the radius of curvature RC demonstrated the best performance, with accuracy to 0.1 µm, despite the complicated features in the likelihood surface along this axis.

Table 7.5: ML estimates of prescription parameters describing the precision asphere, including standard deviations. Design values were used as a starting point in the search.
Parameter                             Units   True value         ML estimate
Radius of curvature, RC               mm      18.55              18.5500 ± 0.0004
Conic constant, κ                     N/A     -1.737913          -1.738 ± 0.008
4th-order aspheric coefficient, α4    mm⁻³    2.413455 × 10⁻⁵    (2.41 ± 0.02) × 10⁻⁵
6th-order aspheric coefficient, α6    mm⁻⁵    -7.438977 × 10⁻⁹   (-7.46 ± 0.41) × 10⁻⁹

When a bundle of rays is launched for a particular source location, the rays are propagated through the system in parallel on the GPU. Using an NVIDIA Tesla C1060 graphics card, it takes 0.65 sec to compute detector data for a 1024 × 1024 bundle of rays, equivalent to approximately 0.6 µsec per ray. Thus, the computation time for one forward propagation (covering both source positions) is roughly 1.3 sec. There were NI = NS × NT = 200 iterations per temperature phase, or 800 forward propagations (one per perturbed parameter), which takes 17.3 min. So, the total computation time for 65 phases is 18.7 hrs.

In applications where ray optics does not provide an accurate representation of the irradiance data, the ray-trace code can be modified to keep track of the optical path length (OPL) along a ray and then construct a wavefront perpendicular to all of the rays in a specified reference plane. The wavefront can then be propagated to the final image plane by FFT propagation. Such a hybridized approach would relieve some of the computational burden of using solely diffraction propagation. One caveat in this process is that the ray bundle is no longer uniformly distributed in the reference plane, so an interpolation scheme must be applied prior to the FFT. This is non-trivial, as standard interpolators convert a regular grid to an irregular one, but not the other way around.

7.2 Inverse optical design of GRIN-rod lenses

Ray-tracing through GRIN lenses requires solutions of the eikonal equation, discussed in Section 4.3.1. Finding such solutions is generally complicated, except for a few simple textbook problems, and exact solutions do not exist for GRIN-rod lenses.
We begin by presenting approximate analytic solutions to the equation, which we used in our optical design program.

7.2.1 Ray-tracing through a GRIN-rod lens

Toyokazu Sakamoto (1993, 1995) worked out analytic solutions of the eikonal equation for both meridional and skew rays using the perturbation method of Streifer and Paxton (1971). Since we are interested in systems with rotational symmetry, we will outline the theoretical framework for meridional rays, as presented by Sakamoto (1993). A basic assumption in the analytic solutions is that the refractive index distribution n(r) of a GRIN-rod lens can be represented as

    n^2(r) = n_0^2 \left[ 1 - (gr)^2 + h_4 (gr)^4 + h_6 (gr)^6 + h_8 (gr)^8 + \cdots \right],    (7.11)

where r is the distance from the optical axis z, n_0 is the refractive index along the axis, g is the focusing parameter, and h_4, h_6, and h_8 are the fourth-, sixth-, and eighth-order refractive index coefficients, respectively. Equation (7.11) assumes perfect rotational symmetry and an index profile independent of z, which may not be realistic in optical testing, but is a start for this proof-of-principle study.

Ray equation

The ray equation (4.61) for a medium with refractive index n(r) is given by

    \frac{d}{ds} \left[ n(r) \frac{d\mathbf{r}}{ds} \right] = \nabla n(r),    (7.12)

where r^2 = x^2 + y^2, ds is the line element along a ray path s, and \mathbf{r} = (x, y, z) is the 3D position vector for an arbitrary point on the ray. Since n(r) is independent of z, the z component of (7.12) can be written as

    n(r) \frac{dz}{ds} = n_i \cos\gamma_i,    (7.13)

where n_i and \cos\gamma_i are the refractive index and z component of the direction cosine, respectively, at the initial ray position (x_i, y_i, 0). Combining (7.13) and (7.12) leads to the ray equation for a meridional ray in a GRIN-rod lens,

    \frac{d^2 x}{dz^2} = \frac{1}{2 n_i^2 \cos^2\gamma_i} \, \frac{\partial n^2(r)}{\partial x}.    (7.14)

Eikonal equation

The eikonal equation (4.58) for a medium with refractive index n(r) can be rewritten as

    \left( \frac{dS}{ds} \right)^2 = [n(r)]^2,    (7.15)

where S = S(\mathbf{r}) is the eikonal or optical path length. Inserting (7.13) into (7.15) gives

    \frac{dS}{dz} = \frac{n^2(r)}{n_i \cos\gamma_i}.    (7.16)

Variable transformations

The following variable transformations are made for mathematical ease:

    n_i \cos\gamma_i \, \zeta = n_0 g z,    (7.17a)
    n_0 g^2 W = g S - n_0 \zeta.    (7.17b)

Equations (7.11) and (7.17) can be used to express the ray equation as a second-order differential equation in x alone,

    \frac{d^2 x}{d\zeta^2} + x = 2 h_4 g^2 x^3 + 3 h_6 g^4 x^5 + 4 h_8 g^6 x^7 + \cdots,    (7.18)

while an alternative expression for the eikonal equation is

    \frac{dW}{d\zeta} + x^2 = h_4 g^2 x^4 + h_6 g^4 x^6 + h_8 g^6 x^8 + \cdots.    (7.19)

Perturbation method

For many GRIN-rod lenses, (gx)^2 << 1, so that g^2 can be treated as a perturbation parameter. To begin, the ray path x(z) and the parameter W(z) are respectively modeled as

    x(z) = x_0 + g^2 x_1 + g^4 x_2 + g^6 x_3 + \cdots,    (7.20a)
    W(z) = \Delta + W_0 + g^2 W_1 + g^4 W_2 + g^6 W_3 + \cdots,    (7.20b)

so the zeroth-order perturbation solutions, x_0 and W_0, are given by

    x_0 = a \cos\psi,    (7.21a)
    W_0 = -\frac{a^2}{2} \psi - \frac{a^2}{4} \sin 2\psi,    (7.21b)

where

    \psi = \omega \zeta + \psi_i,    (7.21c)
    \omega = 1 + g^2 \omega_1 + g^4 \omega_2 + g^6 \omega_3 + \cdots.    (7.21d)

Initial conditions specify the constants a, \psi_i, and \Delta, and the removal of a secular term determines the frequency-correction terms \omega_1, \omega_2, and \omega_3. By substituting (7.20) and (7.21) into (7.18) and (7.19), then matching terms of the same order in g^2, the first order in g^2 leads to the following coupled differential equations:

    \frac{d^2 x_1}{d\psi^2} + x_1 = 2 \omega_1 x_0 + 2 h_4 x_0^3,    (7.22a)
    \frac{dW_1}{d\psi} + \omega_1 \frac{dW_0}{d\psi} = -2 x_0 x_1 + h_4 x_0^4.    (7.22b)

Inserting (7.21a) into (7.22a) and applying a Fourier-series expansion to the right-hand side gives

    \frac{d^2 x_1}{d\psi^2} + x_1 = \left( 2 \omega_1 + \frac{3}{2} a^2 h_4 \right) a \cos\psi + \frac{1}{2} a^3 h_4 \cos 3\psi.    (7.23)

The secular term in (7.23) (i.e., the \cos\psi term) is the one responsible for boundless growth as \psi increases, and eliminating it allows the approximation to hold for all \psi. Removing this term requires that

    \omega_1 = -\frac{3}{4} a^2 h_4,    (7.24)

which can be substituted into (7.22a), along with x_0, to obtain

    x_1 = -\frac{a^3 h_4}{2^4} \cos 3\psi.    (7.25a)

W_1 is determined by substituting the zeroth- and first-order perturbation solutions in (7.22b):

    W_1 = \frac{a^4}{2^6} \left( 6 h_4 \sin 2\psi + 3 h_4 \sin 4\psi \right).    (7.25b)

The procedure for finding the third-order solutions is too lengthy and complicated to reproduce here, but can be found in Appendix A of Sakamoto (1993).

Ray path

Based on the perturbation solutions up to third order, the ray path x(z) for meridional rays is expressed as

    g x(z) = g a \cos\psi + \sum_{j=1}^{3} A_{2j+1} \cos(2j+1)\psi,    (7.26)

where the coefficients A_3, A_5, and A_7 are given by

    A_3 = -\frac{(ga)^3}{2^4} h_4 - \frac{(ga)^5}{2^8} (21 h_4^2 + 30 h_6) - \frac{(ga)^7}{2^{12}} (417 h_4^3 + 984 h_4 h_6 + 672 h_8),    (7.27a)
    A_5 = \frac{(ga)^5}{2^8} (h_4^2 - 2 h_6) + \frac{(ga)^7}{2^{12}} \left( 43 h_4^3 + 24 h_4 h_6 - \frac{224}{3} h_8 \right),    (7.27b)
    A_7 = -\frac{(ga)^7}{2^{12}} \left( h_4^3 - 6 h_4 h_6 - \frac{16}{3} h_8 \right).    (7.27c)

A similar equation can be determined for the optical path length S(z); however, we are only interested in x(z) for now.

Initial conditions

As previously mentioned, the initial conditions of the ray and eikonal equations specify the constants a, \psi_i, and \Delta. Since \Delta is primarily related to S(z), it will not be discussed here. The necessary initial conditions are the initial position and slope:

    x(0) = x_i,    (7.28a)
    \frac{dx(0)}{dz} = \tan\gamma_i.    (7.28b)

Note that the period of oscillation in x(z) is governed by the initial ray angle. Inserting (7.28) into (7.26) leads to these coupled equations:

    g x_i = g a \cos\psi_i + \sum_{j=1}^{3} A_{2j+1} \cos(2j+1)\psi_i,    (7.29a)
    \frac{n_i}{n_0} \sin\gamma_i = -\omega \left[ g a \sin\psi_i + \sum_{j=1}^{3} (2j+1) A_{2j+1} \sin(2j+1)\psi_i \right].    (7.29b)

To express a and \psi_i as functions of x_i and \gamma_i, the following expansions are assumed:

    a = a_0 + g^2 a_1 + g^4 a_2 + g^6 a_3 + \cdots,    (7.30a)
    \psi_i = \psi_0 + g^2 \psi_1 + g^4 \psi_2 + g^6 \psi_3 + \cdots.    (7.30b)

Substituting (7.30) into (7.29) and again matching terms of the same order in g^2 leads to the following coupled equations:

(zeroth order)

    a_0 \cos\psi_0 = x_i,    (7.31a)
    a_0 \sin\psi_0 = -\frac{n_i \sin\gamma_i}{n_0 g},    (7.31b)

(first order)

    a_1 = \frac{a_0^3}{2^4} (6 h_4 - 4 h_4 \cos 2\psi_0 - h_4 \cos 4\psi_0),    (7.32a)
    \psi_1 = \frac{a_0^2}{2^4} (8 h_4 \sin 2\psi_0 + h_4 \sin 4\psi_0).    (7.32b)

7.2.2 Test lens description

Prescription parameters of the GRIN-rod test lens used in this numerical study are summarized in Table 7.6, where the refractive index distribution n(r) is specified up to fourth order by n0, g, and h4. Figure 7.13 provides a 2D plot of n(r), indicating a 0.20% change in index between the center and the periphery.

Table 7.6: Design parameters of the GRIN-rod test lens at an arbitrary design wavelength. Included are the distances in the optical system used in the simulations.

Parameter                               Value
Index at center, n0                     1.4140
Index at edge, n(R)                     1.4112
Focusing parameter, g                   0.018 mm⁻¹
Fourth-order coefficient, h4            -120
Radius of lens, R                       3 mm
Length of lens, L                       160 mm
Distance from source to entrance face   200 mm
Distance from exit face to CCD          100 mm

Fig. 7.13: Refractive index distribution of the GRIN-rod lens.

When the front face of the lens is placed 200 mm from an on-axis point source, it subtends a half-angle of 0.86°. Using (7.26), we traced a 1200 × 1200 bundle of rays through the lens system for this source position and a detector positioned 100 mm from the exit face. Sample rays are illustrated in Figure 7.14. The irradiance pattern, computed for a pixel size of 25 µm and a peak SNR of 10³, features a pronounced central peak and a relatively faint outer ring (Fig. 7.15).

Fig. 7.14: Real eikonal rays traced through the GRIN-rod lens. Plot is expanded in the transverse direction to show detail.

Fig. 7.15: (a) Irradiance distribution in the detector plane and (b) irradiance profile for the GRIN-rod test lens.
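As a numerical check on the formulas above, the perturbation solution can be evaluated through first order in g² with the Table 7.6 prescription. The sketch below implements (7.24), (7.26), (7.27a), (7.31), and (7.32), setting h6 = h8 = 0 (an assumption, since the test lens is specified only to fourth order); meridional_ray is our illustrative helper, not the dissertation's CUDA code:

```python
import math

# GRIN-rod lens parameters from Table 7.6 (mm units)
n0, g, h4 = 1.4140, 0.018, -120.0

def n_of_r(r):
    """Index profile n(r) of Eq. (7.11), truncated at fourth order."""
    return n0 * math.sqrt(1.0 - (g * r)**2 + h4 * (g * r)**4)

def meridional_ray(xi, gamma_i, z):
    """Ray height x(z) from the perturbation solution (7.26),
    keeping terms through first order in g**2."""
    ni = n_of_r(xi)
    # zeroth-order amplitude and phase, Eq. (7.31)
    s = -ni * math.sin(gamma_i) / (n0 * g)
    a0 = math.hypot(xi, s)
    psi0 = math.atan2(s, xi)
    # first-order corrections, Eq. (7.32)
    a1 = (a0**3 / 16.0) * (6*h4 - 4*h4*math.cos(2*psi0) - h4*math.cos(4*psi0))
    psi1 = (a0**2 / 16.0) * (8*h4*math.sin(2*psi0) + h4*math.sin(4*psi0))
    a = a0 + g**2 * a1
    psi_i = psi0 + g**2 * psi1
    # frequency correction (7.24) and leading path coefficient (7.27a)
    omega = 1.0 - 0.75 * (g * a)**2 * h4
    A3 = -(g * a)**3 * h4 / 16.0
    zeta = n0 * g * z / (ni * math.cos(gamma_i))   # transformation (7.17a)
    psi = omega * zeta + psi_i
    return a * math.cos(psi) + (A3 / g) * math.cos(3 * psi)

# A ray entering parallel to the axis at 1 mm height oscillates;
# print its height along the 160-mm rod
for z in (0.0, 80.0, 160.0):
    print(f"z = {z:6.1f} mm, x = {meridional_ray(1.0, 0.0, z):+.4f} mm")
```

The recovered edge index n(3 mm) reproduces the 1.4112 of Table 7.6, and the 160-mm rod is roughly half an oscillation period for this ray, consistent with the crossing rays in Figure 7.14.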
7.2.3 Fisher information and Cramér-Rao bounds

We computed the FIM and its inverse for the parameters describing the index distribution (i.e., n0, g, and h4), which indicate that the optical system, as simple as it is, contains a wealth of information (Fig. 7.16). The Cramér-Rao bounds on the parameters are once again incredibly small, enabling very precise estimates.

Fig. 7.16: (a) FIM and (b) inverse of the FIM for the parameters describing the refractive index distribution of the GRIN-rod lens. (logarithmic scale)

Table 7.7: Square-root of the CRB for the parameters describing the refractive index distribution of the GRIN-rod lens.

No.   Parameter                      Units   True value   (CRB)^1/2
1     Index at center, n0            N/A     1.4140       9.5 × 10⁻⁷
2     Focusing parameter, g          mm⁻¹    0.018        1.7 × 10⁻⁸
3     Fourth-order coefficient, h4   N/A     -120         6.3 × 10⁻⁴

7.2.4 Likelihood surfaces

Figures 7.17 – 7.19 provide likelihood surfaces for the three pairs of parameters. Each plot is centered on the true minimum. While the surfaces are slowly varying along the n0 axis, the behavior along the g axis is quite complicated. However, the SA algorithm should be able to process this cost function with relative ease, based on its prior performance with convoluted, high-dimensional surfaces.

Table 7.8: Range in likelihood surfaces for parameters describing the GRIN-rod lens, relative to the true values.

No.   Parameter                      Units   True value   Range
1     Index at center, n0            N/A     1.4140       ± 0.06
2     Focusing parameter, g          mm⁻¹    0.018        ± 0.006
3     Fourth-order coefficient, h4   N/A     -120         ± 40

Fig. 7.17: Likelihood surface along n0 and g axes. Global minimum is located at center of plot.

Fig. 7.18: Likelihood surface along n0 and h4 axes. Global minimum is located at center of plot.

Fig. 7.19: Likelihood surface along g and h4 axes. Global minimum is located at center of plot.
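For the Gaussian noise model used throughout, the FIM reduces to an outer product of data sensitivities scaled by the noise variance. The sketch below illustrates the recipe on a hypothetical one-dimensional spot model; the names fisher_gaussian and spot are ours, and the toy model merely stands in for the actual ray-traced forward model:

```python
import numpy as np

def fisher_gaussian(mean_fn, theta, sigma, h=1e-6):
    """FIM for i.i.d. Gaussian noise (cf. Eq. (2.76)):
    F_jk = (1/sigma^2) * sum_m (dgbar_m/dtheta_j)(dgbar_m/dtheta_k),
    with sensitivities taken by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    grads = []
    for j in range(theta.size):
        dt = np.zeros_like(theta)
        dt[j] = h * max(1.0, abs(theta[j]))
        grads.append((mean_fn(theta + dt) - mean_fn(theta - dt)) / (2 * dt[j]))
    J = np.stack(grads, axis=1)              # (pixels, parameters)
    return J.T @ J / sigma**2

# Toy forward model: a Gaussian spot on a 1-D detector, with
# parameters (amplitude, width)
x = np.linspace(-3.0, 3.0, 201)
spot = lambda th: th[0] * np.exp(-(x / th[1])**2)

theta_true = [1.0, 0.8]
sigma = 1.0 / 1e3                            # peak SNR of 10^3
F = fisher_gaussian(spot, theta_true, sigma)
crb = np.sqrt(np.diag(np.linalg.inv(F)))     # square-root of the CRB
print("sqrt(CRB) per parameter:", crb)
```

Adding an insensitive parameter (one whose column of J is essentially zero) makes F singular, which is exactly the situation encountered with α8 and α10 for the precision asphere.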
7.3 Summary of Chapter 7

Through numerical analysis, we demonstrated how inverse optical design can be used in optical testing to profile high-order aspheric surfaces or to measure the refractive index distribution of GRIN lenses. Although the propagation algorithm consisted of ray-tracing in both cases, the two types of lenses required vastly different approaches. An iterative method was employed to find the precise ray-surface intersection for an aspheric lens, involving a marching-points scheme for root isolation and repeated bisections for root refinement. We showed that the output of our rapid ray-tracing program compared extremely well with ZEMAX. Conversely, rays were traced through a GRIN-rod lens using analytic but approximate solutions to the eikonal equation based on the perturbation method.

In each application, the FIMs indicated high sensitivity of the data to changes in the parameters, thereby resulting in diminutive CRBs, even with a modest peak SNR. The exceptions were the higher-order aspheric coefficients, α8 and α10, which caused a singular, non-invertible information matrix, so we excluded them from the FIM, as well as the estimation task. Since the likelihood surfaces again revealed many local extrema, we implemented global optimization with simulated annealing for the precision asphere. The estimate biases and variances were very small, except for the variance in the sixth-order aspheric coefficient, α6. Averaged over an increasing number of trials, however, the bias for this coefficient approached zero.

The next stage in this research is to repeat the estimation procedure on real data. An important objective is to identify deficiencies in the forward model, including additional sources of noise. In cases where ray-tracing is inadequate for representing the irradiance data, we suggested a combined approach of ray-tracing and diffraction propagation to reduce the computational burden.
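The marching-points/repeated-bisections intersection scheme summarized above can be sketched in a few lines. The example below assumes the standard even-asphere sag form with the Table 7.1 design values and traces a meridional ray in the (r, z) plane; it illustrates the idea, not the CUDA implementation:

```python
import math

def sag(r):
    """Illustrative even-asphere sag (Table 7.1 design values, mm);
    the functional form of Eq. (7.2) is assumed here."""
    RC, kappa = 18.41, -1.607913
    a4, a6, a8, a10 = 2.0634554e-5, -7.6489765e-9, 1.117573e-11, -1.010058e-14
    c = 1.0 / RC
    z = c * r * r / (1.0 + math.sqrt(1.0 - (1.0 + kappa) * c * c * r * r))
    return z + a4*r**4 + a6*r**6 + a8*r**8 + a10*r**10

def ray_surface_intersect(r0, z0, dr, dz, t_max=100.0, step=2.0, n_bis=60):
    """Marching points isolate a sign change of f(t) = z(t) - sag(r(t));
    repeated bisections then refine the bracketed root."""
    f = lambda t: (z0 + t * dz) - sag(r0 + t * dr)
    a = 0.0
    while a < t_max:                    # root isolation
        b = a + step
        if f(a) * f(b) <= 0.0:
            for _ in range(n_bis):      # root refinement
                m = 0.5 * (a + b)
                if f(a) * f(m) <= 0.0:
                    b = m
                else:
                    a = m
            return 0.5 * (a + b)
        a = b
    raise ValueError("no intersection found")

# A ray parallel to the axis at 5 mm height, starting 10 mm before the vertex
t = ray_surface_intersect(5.0, -10.0, 0.0, 1.0)
print(f"intersection at z = {(-10.0 + t):.6f} mm")
```

On the GPU, each thread can run this loop independently for its own ray, which is why the scheme parallelizes so well.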
CHAPTER 8

CONCLUSION AND FUTURE WORK

In this dissertation, we presented the theoretical framework and several applications of our basic method of inverse optical design. The results for these applications are summarized here.

In Chapter 5, we presented results for the original application of IOD, which is to estimate the complete set of patient-specific ocular parameters from Shack-Hartmann WFS data. We developed an optical design program that performs non-paraxial ray-tracing through the quadric surfaces of the eye, which may incorporate surface misalignments. The system configuration involved multiple beam angles to detect on- and off-axis aberrations, resulting in reduced parametric coupling, greater Fisher information, and smaller CRBs. One of the key points in our approach is that we do not perform centroid estimation as in classical wavefront sensing, since this results in severe information loss; instead, the raw detector outputs of the WFS are used as input to IOD.

Due to the multitude of local extrema in the likelihood surface, we implemented an SA search algorithm. The bias and variance for each estimate were very small, giving much hope for success in a real experiment. However, this does not take into account any modeling errors, since the same program that generated the data was also used during the optimization. For this method to succeed with real patient data, there must be very accurate modeling of the extremely complex optical system of the eye. As mentioned in Chapter 5, we did not consider factors such as the optical tear film effect, the GRIN distribution of the crystalline lens, irregularities in the corneal surface, scattering in the ocular media, the Stiles-Crawford effect, and so on. One way to test estimator robustness is to intentionally use a different model during optimization than was used to generate the data, so that we can determine the estimation error due to model deficiencies.
For instance, the data may include the GRIN distribution of the lens, while the estimation procedure may involve an equivalent refractive index. Making our method practical in the clinical setting requires rapid processing techniques, especially if the computational time increases due to enhancements in the forward model. The studies in Chapter 5 were performed in the early stages of this project, prior to the use of GPU technology within our research group, so there is still much to explore here.

Although the original motivation of our work relates to vision science and ophthalmology, we are also leveraging the basic method in optical shop testing. In Chapter 6, we provided both numerical and experimental results for parameterized wavefront estimation. Here we estimated the pupil phase distribution of an optical system from multiple irradiance measurements near focus, by first parameterizing the wavefront with a set of expansion functions, the Zernike polynomials. We developed a parallel algorithm in CUDA to simulate the wave propagation, taking advantage of the FFT in the Fresnel approximation. The required amount of pupil sampling was carefully examined for lenses of different f-numbers and aberration levels.

In the numerical study, our method was successful for a test lens with a large peak-to-valley wavefront error of 150λ. Both the estimate biases and variances were negligible, although the diminutive CRBs were not achieved. We noted that the only way to attain the CRBs is to locate the global minimum of the cost function, which is plagued with numerous local minima for this application. If the biases or variances are unsatisfactory for a particular purpose, either more time must be spent searching for the global basin, or a new configuration with fewer local extrema must be found.
In the experimental wavefront estimation study, we used a benign test lens with a larger f-number and much lower aberrations; therefore, the pupil sampling requirement was relatively relaxed and the Fresnel approximation more accurate. However, the use of real data created additional challenges, including the presence of nuisance parameters and the stringent requirement of an accurate forward model. The nuisance parameters were the actual magnification of the imaging lens (a microscope objective) and the exact image plane locations. We performed a 2D grid search of the unimodal likelihood surface for these parameters, assuming the design Zernike coefficients for the lens. We examined the accuracy of the Fresnel approximation by comparing the output irradiance patterns to those of the Huygens diffraction integral, and the peak discrepancies were only 1.5–1.7% for the output planes of interest. Without knowledge of the true values of the parameters, we were unable to evaluate estimate biases; however, the estimates were within one standard deviation of the design values. Interestingly, the variances were comparable to those in the simulation study, perhaps because the variance is primarily influenced by stagnation in local basins, rather than potential systematic errors or model deficiencies.
Model-mismatch effects can be determined by using the Huygens formula to generate the data and the Fresnel FFT method in the estimation procedure.

In Chapter 7, we presented numerical results for additional applications in optical testing, namely parametric surface profilometry of precision aspheres and GRIN lens testing. For the former application, we produced a rapid ray-trace algorithm in CUDA that iteratively finds ray-surface intercepts for high-order aspheric surfaces, resulting in a computational time of 0.6 µsec per ray with a single GPU. It might be the case that ray-tracing does not generate accurate irradiance data, and that diffraction propagation might be necessary when working with real data. A useful model-mismatch study would be to use the more accurate wave propagation for data generation and ray-tracing during the optimization. Aspheric coefficients beyond sixth order were excluded from the estimation step, since they did not influence the data and resulted in a singular FIM. Each ML estimate contained a very small bias and variance, apart from the variance of the sixth-order coefficient. For the testing of GRIN-rod lenses, the estimated parameters were the coefficients of the index distribution. We developed a ray-trace program using analytic solutions of the eikonal equation. As usual, we discovered substantial Fisher information and multiple minima in the cost function.

A common characteristic of the likelihood surfaces across all applications is the presence of multiple local extrema, so we chose simulated annealing for each estimation procedure. Although SA was very successful in every case, a major drawback of this search algorithm is its slowness. One way to mitigate this is to switch to a local descent algorithm once the system homes in on the final basin. When the temperature is low enough during the search process, the system essentially behaves like straightforward downhill optimization anyway, since it is very unlikely to accept uphill moves. However, SA makes an inefficient local descent algorithm, and it would save time to use a suitable algorithm at this point.
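The hand-off from SA to a local descent algorithm can be sketched as follows. This is an illustrative scheme (a plain Metropolis loop with geometric cooling followed by a simple coordinate descent), not the Corana-style algorithm used in our studies, and all names are ours:

```python
import math
import random

def anneal_then_descend(cost, x0, t0=1.0e3, r_t=0.90, n_per_T=200,
                        t_switch=1.0, step=0.5, seed=0):
    """Simulated annealing with geometric cooling (tau' = r_t * tau);
    once tau < t_switch, hand off to coordinate descent, since SA
    rarely accepts uphill moves at low temperature anyway."""
    rng = random.Random(seed)
    x, fx, tau = list(x0), cost(x0), t0
    while tau >= t_switch:                        # annealing stage
        for _ in range(n_per_T):
            y = [xi + rng.uniform(-step, step) for xi in x]
            fy = cost(y)
            # Metropolis acceptance rule
            if fy < fx or rng.random() < math.exp((fx - fy) / tau):
                x, fx = y, fy
        tau *= r_t
    h = step                                      # local-descent stage
    while h > 1e-9:
        improved = False
        for j in range(len(x)):
            for d in (+h, -h):
                y = list(x)
                y[j] += d
                fy = cost(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            h *= 0.5
    return x, fx

# Multimodal test function with global minimum value 0 at the origin
cost = lambda x: sum(xi*xi - 2.0*math.cos(3.0*xi) + 2.0 for xi in x)
x_hat, f_hat = anneal_then_descend(cost, [3.0, -2.5])
print("estimate:", [round(v, 4) for v in x_hat], "cost:", round(f_hat, 6))
```

The descent stage converges to the bottom of whichever basin SA has settled into, at a tiny fraction of the function evaluations SA would need for the same polish.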
We also performed extensive statistical analysis with the Fisher information matrix and visualization of the likelihood surfaces, investigating parametric coupling and information content in numerous system configurations through simulation. Furthermore, proof-of-principle studies for the various applications have been primarily computational. Although more work needs to be done in the way of physical experiments, we believe that the computational work and theory have been sufficiently developed, such that the next researcher can readily perform IOD on real data. 331 APPENDIX A FRINGE ZERNIKE POLYNOMIALS Optical imaging systems typically have a circular or annular pupil, as well as an axis of rotational symmetry. There are many applications in which it is useful to expand the wave aberration function of these systems in a power series or a complete set of orthogonal polynomials. Zernike polynomials (Zernike, 1934) are excellent candidates for this task, since they are orthogonal over a circular pupil and represent balanced aberrations with minimum variance. A Zernike polynomial of a particular order in pupil coordinates achieves balanced aberrations by including terms of equal or lower order in a power series expansion, such that the variance is minimized (Born & Wolf, 1999; Mahajan, 2001, 2004). Note that this is different from balanced aberrations that yield minimum variance with respect to ray aberrations (Malacara, 2007). Consider an optical system whose optical axis coincides with the z-axis. Let r be a position vector in the exit pupil, which is orthogonal to the optical axis. Using the standard convention for the polar angle θ , defined as the angle of r with the x-axis, we have x = rcosθ , y = rsinθ , where r = | r | . 332 For the purpose of wavefront estimation, discussed in Chapter 6, we are interested in systems with a circular pupil that are not necessarily rotationally symmetric. 
The wave aberration function for such a system will consist of terms in both cosmθ and sinmθ, where m ≥ 0, and can be expanded in terms of orthogonal Zernike polynomials Z n ( ρ, θ ) : W ( ρ, θ ) = ∑α n Z n ( ρ, θ ) , n A.1 where αn are the expansion coefficients. Without going into detail on the many mathematical properties of Zernike polynomials, we will simply quote the Fringe Zernike polynomials, developed at the University of Arizona. These are identical to the standard polynomials, except for the indexing format and the order in which they are listed. The expressions are provided in Table A.1 for {n = 1,…, 37}, with the corresponding plots in Figure A.1. Note that these are the orthogonal, not orthonormal, versions of the Fringe Zernike polynomials. 333 Table A.1: Fringe Zernike Polynomials {Zn, n = 1,…, 37}. n Fringe Zernike Polynomial Aberration Type 1 1 Piston 2 ρcosθ Distortion - Tilt (x-axis) 3 ρsinθ Distortion - Tilt (y-axis) 4 2 ρ2 −1 Defocus - Field Curvature 5 ρ 2 cos2θ Astigmatism, Primary (0° or 90°) 6 ρ 2sin2θ Astigmatism, Primary (±45°) 7 (3 ρ 3 − 2 ρ) cosθ Coma, Primary (x-axis) 8 (3 ρ 3 − 2 ρ) sinθ Coma, Primary (y-axis) 9 6ρ4 − 6ρ2 +1 Spherical Aberration, Primary 10 ρ 3cos3θ Trefoil, Primary (x-axis) 11 ρ 3sin3θ Trefoil, Primary (y-axis) 12 (4 ρ 4 − 3 ρ 2 ) cos2θ Astigmatism, Secondary (0° or 90°) 13 (4 ρ 4 − 3 ρ 2 ) sin2θ Astigmatism, Secondary (±45°) 14 (10 ρ 5 − 12 ρ 3 + 3 ρ) cosθ Coma, Secondary (x-axis) 15 (10 ρ 5 − 12 ρ 3 + 3 ρ) sinθ Coma, Secondary (y-axis) 16 20 ρ 6 − 30 ρ 4 + 12 ρ 2 − 1 Spherical Aberration, Secondary 17 ρ 4 cos4θ Tetrafoil, Primary (x-axis) 18 ρ 4sin4θ Tetrafoil, Primary (y-axis) 19 (5 ρ 5 − 4 ρ 3 ) cos3θ Trefoil, Secondary (x-axis) 20 (5 ρ 5 − 4 ρ 3 ) sin3θ Trefoil, Secondary (y-axis) 21 (15 ρ 6 − 20 ρ 4 + 6 ρ 2 ) cos2θ Astigmatism, Tertiary (0° or 90°) 22 (15 ρ 6 − 20 ρ 4 + 6 ρ 2 ) sin2θ Astigmatism, Tertiary (±45°) 23 (35 ρ 7 − 60 ρ 5 + 30 ρ 3 − 4 ρ) cosθ Coma, Tertiary (x-axis) 24 (35 ρ 7 − 60 ρ 5 + 30 ρ 3 − 4 
ρ) sinθ Coma, Tertiary (y-axis) 334 25 70 ρ 8 − 140 ρ 6 + 90 ρ 4 − 20 ρ 2 + 1 Spherical Aberration, Tertiary 26 ρ 5 cos5θ Pentafoil, Primary (x-axis) 27 ρ 5sin5θ Pentafoil, Primary (y-axis) 28 (6 ρ 6 − 5 ρ 4 ) cos4θ Tetrafoil, Secondary (x-axis) 29 (6 ρ 6 − 5 ρ 4 ) sin4θ Tetrafoil, Secondary (y-axis) 30 (21ρ 7 − 30 ρ 5 + 10 ρ 3 ) cos3θ Trefoil, Tertiary (x-axis) 31 (21ρ 7 − 30 ρ 5 + 10 ρ 3 ) sin3θ Trefoil, Tertiary (y-axis) 32 (56 ρ 8 − 105 ρ 6 + 60 ρ 4 − 10 ρ 2 ) cos2θ Astigmatism, Quaternary (0° or 90°) 33 (56 ρ 8 − 105 ρ 6 + 60 ρ 4 − 10 ρ 2 ) sin2θ Astigmatism, Quaternary (±45°) 34 (126 ρ 9 − 280 ρ 7 + 210 ρ 5 − 60 ρ 3 + 5 ρ ) cosθ Coma, Quaternary (x-axis) 35 (126 ρ 9 − 280 ρ 7 + 210 ρ 5 − 60 ρ 3 + 5 ρ) sinθ Coma, Quaternary (y-axis) 36 252 ρ10 − 630 ρ 8 + 560 ρ 6 − 210 ρ 4 + 30 ρ 2 − 1 Spherical Aberration, Quaternary 37 924 ρ12 − 2772 ρ10 + 3150 ρ 8 − 1680 ρ 6 + 420 ρ 4 − 42 ρ 2 + 1 Spherical Aberration, 12th order 335 Fig. A.1: Fringe Zernike Polynomials 2-37. 336 APPENDIX B LIST OF ACRONYMS AO : adaptive-optics API : application programming interface CBEA : cell broadband engine architecture CRB : Cramér-Rao lower bound CUDA : Compute Unified Device Architecture DP : double-precision FIM : Fisher information matrix FOV : field-of-view FPGA : field-programmable gate array FWHM : full width at half maximum GPU : graphics processing unit FLOPS : floating-point operations per second GRIN : graded-index i.i.d. 
: independent and identically distributed IOD : inverse optical design IOL : intraocular lens MAP : maximum a posteriori MCAO : multi-conjugate adaptive optics 337 MCMC : Markov-chain Monte Carlo ML : maximum-likelihood MSE : mean-square error OPL : optical path length PDF : probability density function PR : phase retrieval PSF : point-spread function SA : simulated annealing SCE : Stiles-Crawford effect SHWFS : Shack-Hartmann wavefront sensor SI : International System of Units SIMD : single-instruction multiple-data SNR : signal-to-noise ratio WFS : wavefront sensor 338 REFERENCES Aldrich, J. (1997). “R. A. Fisher and the making of maximum likelihood 1912 – 1922.” Stat. Sci., 12, 162-176. Arfken, G. B. and Weber, H. J. (2001). Mathematical Methods for Physicists, Fifth Edition. Academic Press, San Diego. Artal, P. and Guirao, A. (1998). “Contributions of the cornea and lens to the aberrations of the human eye.” Opt. Lett., 23, 1713-1715. Atchison, D. A. and Smith, G. (1995). “Continuous gradient index and shell models of the human lens.” Vision Res., 35, 2529-2538. Atchison, D. A., Scott, D. H., Joblin, A., and Smith, G. (2000). “Influence of StilesCrawford effect apodization on spatial visual performance with decentered pupils.” J. Opt. Soc. Am. A, 18, 1201-1211. Atchison, D. A. and Smith, G. (2005). “Chromatic dispersions of the ocular media of human eyes.” J. Opt. Soc. Am. A, 22, 29-37. Audet, C. and Dennis, Jr., J. E. (2000). “Pattern search algorithms for mixed variable programming.” SIAM J. Optimiz., 11, 573-594. Audet, C. and Dennis, Jr., J. E. (2003). “Analysis of generalized pattern searches.” SIAM J. Optimiz., 13, 889-903. Baker, B. B. and Copson, E. T. (1949). The Mathematical Theory of Huygens’ Principle, Second Edition. Clarendon Press, Oxford. Bará, S. and Navarro, R. (2003). “Wide-field compensation of monochromatic eye aberrations: expected performance and design trade-offs.” J. Opt. Soc. Am. A, 20, 1-10. Barankin, E. W. (1949). 
“Locally best unbiased estimates.” Ann. Math. Statist., 20, 477-501. Bard, Y. (1974). Nonlinear Parameter Estimation. Academic Press, New York and London. 339 Barrett, H. H. and Myers, K. J. (2004). Foundations of Image Science. Wiley, New Jersey. Barrett, H. H., Dainty, C., and Lara, D. (2007). “Maximum-likelihood methods in wavefront sensing: stochastic models and likelihood functions.” J. Opt. Soc. Am. A, 24, 391-414. Barrett, H. H., Sakamoto, J. A., and Goncharov, A., “Inverse optical design”, U. S. Patent 7,832,864. Issued on 11/16/2010. Bhattacharyya, A. (1946). “On some analogues of the amount of information and their use in statistical estimation. Part 1.” Sankhya, 8, 1-14. Bhattacharyya, A. (1947). “On some analogues of the amount of information and their use in statistical estimation. Part 2.” Sankhya, 8, 201-218. Bhattacharyya, A. (1948). “On some analogues of the amount of information and their use in statistical estimation. Part 3.” Sankhya, 8, 315-328. Blinn, J. F. (2006). “How to solve a cubic equation, part 1: The shape of the discriminant.” IEEE Comput. Graph. Appl., 26(3), 84-93. Blinn, J. F. (2006a). “How to solve a cubic equation, part 3: General depression and a new covariant.” IEEE Comput. Graph. Appl., 26(6), 92-102. Bonomi, E. and Lutton, J. (1984). “The N-city travelling salesman problem: Statistical mechanics and the Metropolis algorithm.” SIAM Rev., 26, 551-568. Booth, G. W. and Peterson, T. I. (1958). “Nonlinear Estimation.” IBM SHARE Program Pa. No. 687 WLNLI. Born, M. and Wolf, E. (1999). Principles of Optics, 7th edition. Cambridge University Press, Cambridge. Bouwkamp, C. J. (1954). “Diffraction Theory.” In A. C. Strickland, editor, Reports on Progress in Physics, Vol. XVII. The Physical Society, London. Box, G. E. P. and Muller, M. E. (1958). “A note on the generation of random normal deviates.” Ann. Math. Statist. 29, 610-611. Box, G. E. P. and Lucas, H. L. (1959). “Design of experiments in nonlinear situations.” Biometrika. 
46, 77-90. 340 Box, G. E. P. and Hunter, W. G. (1962). “A useful method for model-building.” Technometrics, 4, 301-318. Brady, G. R. and Fienup, J. R. (2004). “Improved optical metrology using phase retrieval.” 2004 Optical Fabrication & Testing Topical Meeting, OSA, Rochester, NY, paper OTuB3. Brady, G. R. and Fienup, J. (2005). “Phase retrieval as an optical metrology tool.” Optifab: Technical digest, SPIE Technical Digest TD03, pp. 139-141. Brady, G. R., Guizar-Sicairos, M., and Fienup, J. (2009). “Optical wavefront measurement using phase retrieval with transverse translation diversity.” Opt. Express, 17, 624-639. Camp, J. J., Maguire, L. J., Cameron, B. M., and Robb, R. A. (1990a). “A computer model for the evaluation of the effect of corneal topography on optical performance.” Am. J. Ophthalmol., 109(4), 379-386. Camp, J. J., Maguire, L. J., Cameron, B. M., and Robb, R. A. (1990b). “An efficient ray tracing algorithm for modeling visual performance from corneal topography.” In Proc. First Conf. on Visualization in Biomedical Computing, Atlanta, GA, May 22-25. Piscataway, NJ, IEEE. Corana, A., Marchesi, M., Martini, C., and Ridella, S. (1987). “Minimizing multimodal functions of continuous variable with the ‘simulated annealing’ algorithm.” ACM T. Math. Software, 13, No. 3, 262-280. Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ. Daniels, H. E. (1961). “The asymptotic efficiency of a maximum likelihood estimator.” Proc. 4th Berkeley Symp. Math. Statist. and Prob., 1, 151. Davidon, W. C. (1991). “Variable metric method for minimization.” SIAM J. Optim., 1, 1-17. Dubbelman, M. and Van der Heijde, G. L. (2001). “The shape of the aging human lens: curvature, equivalent refractive index and the lens paradox.” Vision Res., 41, 1867-1877. Dubbelman, M., Weeber, H. A., Van der Heijde, G. L., and Völker-Dieben, H. J. (2002). 
“Radius and asphericity of the posterior corneal surface determined by corrected Scheimpflug photography.” Acta. Ophthalmol. Scand., 80, 379-383. 341 Duff, T. (1992). “Interval arithmetic recursive subdivision for implicit functions and constructive solid geometry.” SIGGRAPH Comput. Graph., 26(2), 131-138. Dugué, D. (1937). “Application des propriétés de la limite au sens du calcul des probabilités a l’étude de diverse questions d’estimation.” Ecol. Poly., 3(4), 305-372. Eisenpress, H., Bomberault, A., and Greenstadt, J. (1966a). “Nonlinear Regression Equations and Systems, Estimation and Prediction (IBM) 7090.” Computer program 7090-G2 IBM0035 G2, IBM, Hawthorne, New York. Eisenpress, H. and Greenstadt, J. (1966b). “The estimation of nonlinear econometric systems.” Econometrica, 34, 851-861. El Hage, S. G. and Berny, F. (1973). “Contribution of the crystalline lens to the spherical aberration of the eye.” J. Opt. Soc. Am., 63, 205-211. Escudero-Sanz, I. and Navarro, R. (1999). “Off-axis aberrations of a wide-angle schematic eye model.” J. Opt. Soc. Am. A, 16, 1881-1891. Falk, J. E. and Soland, R. M. (1969). “An algorithm for separable nonconvex programming problems.” Manage. Sci., 15, 550-569. Ferguson, T. S. (1967). Mathematical Statistics, A Decision Theoretic Approach. Academic Press, New York. Fisher, R. A. (1912). “On an absolute criterion for fitting frequency curves.” Messenger Math., 41, 155-160. Fisher, R. A. (1922). “On the mathematical foundations of theoretical statistics.” Philos. Trans. Roy. Soc. London Ser. A, 222, 309-368. Fisher, R. A. (1925). “Theory of statistical estimation.” Proc. Cambridge Philos. Soc., 22, 700-725. Fisher, R. A. (1934). “Two new properties of mathematical likelihood.” Proc. Roy. Soc. Ser. A., 144, 285-307. Fisher, R. A. (1935). “The logic of inductive inference.” J. Roy. Statist. Soc., 98(1), 3954. Fletcher, R. and Powell, M. J. D. (1963). “A rapidly convergent descent method for minimization.” Comput. J., 6, 163-168. 
Fletcher, R. and Reeves, C. M. (1964). "Function minimization by conjugate gradients." Comput. J., 7, 149-154.
Fletcher, R. (1965). "Function minimization without evaluating derivatives: a review." Comput. J., 8, 33-41.
Gauss, K. F. (1809). "Theoria Motus Corporum Coelestium." Werke, 7, 240-254.
Gillet, J. and Sheng, Y. (1999). "Simulated quenching with temperature rescaling for designing diffractive optical elements." In Proc. 18th Congress Int. Commission for Optics, volume 3749 of Proc. SPIE, pp. 683-684.
Goldberg, D. E. and Richardson, J. (1987). "Genetic algorithms with sharing for multimodal function optimization." In Grefenstette, J. J., editor, Proc. Second Int. Conf. on Genetic Algorithms, 41-49. Lawrence Erlbaum.
Goncharov, A. V. and Dainty, C. (2007). "Wide-field schematic eye models with gradient-index lens." J. Opt. Soc. Am. A, 24, 2157-2174.
Goncharov, A. V., Nowakowski, M., Sheehan, M. T., and Dainty, C. (2008). "Reconstruction of the optical system of the human eye with reverse ray-tracing." Opt. Express, 16, 1692-1703.
Goodman, J. W. (2005). Introduction to Fourier Optics, 3rd edition. Roberts & Company Publishers, Englewood.
Gray, H. L. and Schucany, W. R. (1972). The Generalized Jackknife Statistic. Dekker, New York.
Greenstadt, J. (1967). "On the relative efficiencies of gradient methods." Math. Comp., 21, 360-366.
Guirao, A. and Artal, P. (2000). "Corneal wave aberration from videokeratography: accuracy and limitations of the procedure." J. Opt. Soc. Am. A, 17, 955-965.
Gullstrand, A. (1962). Helmholtz's Handbuch der Physiologischen Optik, 3rd Edition. English translation edited by J. P. Southall (Optical Society of America), Vol. 1, 351-352.
Halmos, P. R. and Savage, L. J. (1949). "Application of the Radon-Nikodym theorem to the theory of sufficient statistics." Ann. Math. Statist., 20, 225-241.
Hansen, P. and Mladenović, N. (2001). "Variable neighborhood search: Principles and applications." Eur. J. Oper. Res., 130, 449-467.
Hansen, P. and Mladenović, N. (2002). "Variable neighborhood search." In P. Pardalos and M. Resende, editors, Handbook of Applied Optimization, Oxford, 221-234.
He, J. C., Marcos, S., and Burns, S. A. (1999). "Comparison of cone directionality determined by psychophysical and reflectometric techniques." J. Opt. Soc. Am. A, 16, 2363-2369.
Hemenger, R. P., Garner, L. F., and Ooi, C. S. (1995). "Change with age of the refractive index gradient of the human ocular lens." Invest. Ophth. Vis. Sci., 36, 703-707.
Hero, III, A. O., Fessler, J. A., and Usman, M. (1996). "Exploring estimator bias-variance tradeoffs using the uniform CR bound." IEEE Trans. Signal Processing, 44, 2026-2041.
Hestenes, M. R. and Stiefel, E. (1952). "Methods of conjugate gradients for solving linear systems." J. Res. N. B. S., 49, 409-436.
Hestenes, M. R. (1969). "Multiplier and gradient methods." J. Opt. Theo. Applns., 4, 303-320; also in Computing Methods in Optimization Problems, 2 (Eds. L. A. Zadeh, L. W. Neustadt, and A. V. Balakrishnan), Academic Press, New York, 1969.
Hofer, H., Chen, L., Yoon, G., Singer, B., Yamauchi, Y., and Williams, D. R. (2001). "Improvement in retinal image quality with dynamic correction of the eye's aberrations." Opt. Express, 8, 631-643.
Hood, W. C. and Koopmans, T. C., eds. (1953). Studies in Econometric Method. Wiley, New York.
Hooke, R. and Jeeves, T. A. (1961). "Direct search solution of numerical and statistical problems." J. Assoc. Comput. Mach., 8, 212-229.
Huber, P. J. (1972). "Robust statistics: a review." Ann. Math. Statist. and Prob., 1, 221.
Huygens, C. (1690). Traité de la Lumière. Leyden; Engl. transl. Thompson, S. P. (1912). Treatise on Light. Macmillan, London.
Ingber, L. (1993). "Simulated annealing: Practice versus theory." Math. Comp. Model., 18, 29-57.
Ingber, L. (1996). "Adaptive simulated annealing (ASA): Lessons learned." Control Cybern., 25, 33-54.
Jackson, J. D. (1975). Classical Electrodynamics, 2nd edition. John Wiley & Sons, New York.
Jones, C. E., Atchison, D. A., Meder, R., and Pope, J. M. (2005). "Refractive index distribution and optical properties of the isolated human lens measured using magnetic resonance imaging (MRI)." Vision Res., 45, 2352-2366.
Kajiya, J. T. (1982). "Ray tracing parametric patches." In SIGGRAPH '82, pp. 245-254.
Kendall, M. and Stuart, A. (1979). The Advanced Theory of Statistics, Vol. 2: Inference and Relationship, 4th edition. Charles Griffin & Co. Ltd., London.
Kirkpatrick, S., Gelatt, Jr., C. D., and Vecchi, M. P. (1983). "Optimization by simulated annealing." Science, 220(4598), 671-680.
Kirkpatrick, S., Gelatt, Jr., C. D., and Vecchi, M. P. (1984). "Optimization by simulated annealing: Quantitative study." J. Stat. Phys., 34, 975.
Kittel, C. and Kroemer, H. (1980). Thermal Physics. W. H. Freeman and Company, New York.
Kooijman, A. C. (1983). "Light distribution on the retina of a wide-angle theoretical eye." J. Opt. Soc. Am., 73, 1544-1550.
Koretz, J. F., Strenk, S. A., Strenk, L. M., and Semmlow, J. L. (2004). "Scheimpflug and high-resolution magnetic resonance imaging of the anterior segment: a comparative study." J. Opt. Soc. Am. A, 21, 346-354.
Langenbucher, A., Viestenz, A., Viestenz, A., Brunner, H., and Seitz, B. (2006). "Ray tracing through a schematic eye containing second-order (quadric) surfaces using 4 × 4 matrix notation." Ophthal. Physiol. Opt., 26, 180-188.
Le Cam, L. (1970). "On the assumptions used to prove asymptotic normality of maximum likelihood estimates." Ann. Math. Statist., 41, 802.
Legendre, A. M. (1805). Nouvelles Méthodes pour la Détermination des Orbites des Comètes. Paris.
Levy, A. V. and Montalvo, A. (1985). "The tunneling algorithm for the global minimization of functions." SIAM J. Sci. Stat. Comp., 6, 15-29.
Liang, J., Williams, D. R., and Miller, D. (1997). "Supernormal vision and high-resolution retinal imaging through adaptive optics." J. Opt. Soc. Am. A, 14, 2884-2892.
Liberti, L. and Maculan, N. (2006). Global Optimization: From Theory to Implementation. Springer, New York.
Lotmar, W. (1971). "Theoretical eye model with aspherics." J. Opt. Soc. Am., 61, 1522-1529.
Mahajan, V. N. (2001). Optical Imaging and Aberrations, Part I: Ray Geometrical Optics. SPIE Press, Bellingham, Washington, second printing.
Mahajan, V. N. (2004). Optical Imaging and Aberrations, Part II: Wave Diffraction Optics. SPIE Press, Bellingham, Washington, second printing.
Malacara, D. (2007). Optical Shop Testing, Third edition. Wiley, Hoboken.
Mallen, E. and Kashyap, P. (2007). "Technical note: measurement of retinal contour and supine axial length using the Zeiss IOLMaster." Ophthal. Physiol. Opt., 27, 404-411.
McAulay, R. J. and Hofstetter, E. M. (1971). "Barankin bounds on parameter estimation." IEEE T. Inform. Theory, 17, 669-676.
Melsa, J. L. and Cohn, D. L. (1978). Decision and Estimation Theory. McGraw-Hill, New York.
Metcalf, H. J. (1965). "Stiles-Crawford apodization." J. Opt. Soc. Am., 55, 72-74.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., and Teller, A. H. (1953). "Equation of state calculations by fast computing machines." J. Chem. Phys., 21(6), 1087-1092.
Miller, L. H. (1964). "A trustworthy jackknife." Ann. Math. Statist., 35, 1549-1605.
Mitchell, D. P. (1990). "Robust ray intersection with interval arithmetic." In Proc. Graphics Interface '90, pp. 68-74.
Moffat, B. A., Atchison, D. A., and Pope, J. M. (2002). "Age-related changes in refractive index distribution and power of the human lens as measured by magnetic resonance micro-imaging in vitro." Vision Res., 42, 1683-1693.
Navarro, R., Santamaría, J., and Bescós, J. (1985). "Accommodation-dependent model of the human eye with aspherics." J. Opt. Soc. Am. A, 2, 1273-1280.
Navarro, R., Moreno, E., and Dorronsoro, C. (1998). "Monochromatic aberrations and point-spread functions of the human eye across the visual field." J. Opt. Soc. Am. A, 15, 2522-2529.
Navarro, R., González, L., and Hernández, J. L. (2006). "Optics of the average normal cornea from general and canonical representations of its surface topography." J. Opt. Soc. Am. A, 23, 219-232.
Navarro, R., Palos, F., and González, L. M. (2007). "Adaptive model of the gradient index of the human lens. I. Optics formulation and model of aging ex vivo lenses." J. Opt. Soc. Am. A, 24, 2175-2185.
Navarro, R., Palos, F., and González, L. M. (2007a). "Adaptive model of the gradient index of the human lens. II. Optics of the accommodating aging lens." J. Opt. Soc. Am. A, 24, 2911-2920.
Neal, D. R., Topa, D. M., and Copland, J. (2001). "The effect of lenslet resolution on the accuracy of ocular wavefront measurements." In Proc. SPIE, 4245, 78-91.
Neal, D. R., Copland, J., and Neal, D. (2002). "Shack-Hartmann wavefront sensor precision and accuracy." In Proc. SPIE, 4779, 148-160.
Nelder, J. A. and Mead, R. (1965). "A simplex method for function minimization." Comput. J., 7, 308-313.
Neyman, J. (1935). "Sur un teorema concernente le cosidette statistiche sufficienti." Inst. Ital. Atti. Giorn., 6, 320-334.
Neyman, J. (1937). "Outline of a theory of statistical estimation based on the classical theory of probability." Philos. Trans. Roy. Soc. London Ser. A, 236, 333-380.
Neyman, J. and Pearson, E. S. (1936). "Contributions to the theory of testing statistical hypotheses. I. Unbiased critical regions of type A and type A1." Stat. Res. Mem., 1, 1-37.
Nister, D. (2004). "An efficient solution to the five-point relative pose problem." IEEE Trans. Pattern Anal. Mach. Intell., 26(6), 756-777.
Palmer, S. E. (1999). Vision Science: Photons to Phenomenology. MIT Press, Cambridge.
Pearson, K. (1894). "Contributions to the mathematical theory of evolution." Philos. Trans. Roy. Soc. London Ser. A, 185, 71-110.
Pearson, K. (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling." Philos. Mag. Fifth Series, 50, 157-175.
Pearson, K. (1901). "On lines and planes of closest fit to systems of points in space." Philos. Mag. Sixth Series, 2, 559-572.
Pearson, K. (1936). "Method of moments and method of maximum likelihood." Biometrika, 28, 34-59.
Plackett, R. L. (1972). "The discovery of the method of least squares." Biometrika, 59, 239-251.
Platt, B. and Shack, R. (2001). "History and principles of Shack-Hartmann wavefront sensing." J. Refract. Surg., 17, S573-S577.
Poincaré, H. (1892). Théorie Mathématique de la Lumière, vol. II. Georges Carré, Paris.
Polak, E. (1971). Computational Methods in Optimization: A Unified Approach. Academic Press, New York.
Powell, M. J. D. (1964). "An efficient method for finding the minimum of a function of several variables without calculating derivatives." Comput. J., 7, 155-162.
Powell, M. J. D. (1965). "A method for minimizing a sum of squares of nonlinear functions without calculating derivatives." Comput. J., 7, 303-307.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C: The Art of Scientific Computing, 2nd edition. Cambridge University Press, New York.
Quenouille, M. H. (1956). "Notes on bias in estimation." Biometrika, 43, 353-360.
Raiffa, H. and Schlaifer, R. (1961). Applied Statistical Decision Theory. Graduate School of Business Administration, Harvard Univ., Boston.
Rao, C. R. (1945). "Information and accuracy attainable in the estimation of statistical parameters." Bull. Calcutta Math. Soc., 37, 81-91.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, Second Edition. Wiley, New York.
Redding, D., Dumont, P., and Yu, J. (1993). "Hubble Space Telescope prescription retrieval." Appl. Opt., 32, 1728-1736.
Robson, D. S. and Whitlock, J. H. (1964). "Estimation of a truncation point." Biometrika, 51, 33-39.
Romeo, F., Sangiovanni Vincentelli, A., and Sechen, C. (1984). "Research on simulated annealing at Berkeley." In Proc. IEEE Int. Conf. on Computer Design, ICCD 84, IEEE, New York, 652-657.
Roorda, A., Romero-Borja, F., Donnelly, III, W., Queener, H., Hebert, T., and Campbell, M. (2002). "Adaptive optics scanning laser ophthalmoscopy." Opt. Express, 10, 405-412.
Rosales, P. and Marcos, S. (2006). "Phakometry and lens tilt and decentration using a custom-developed Purkinje imaging apparatus: Validation and measurements." J. Opt. Soc. Am. A, 23, 509-520.
Rosales, P., Dubbelman, M., Marcos, S., and Van der Heijde, G. L. (2006a). "Crystalline lens radii of curvature from Purkinje and Scheimpflug imaging." J. Vis., 6, 1057-1067.
Rosenbrock, H. H. (1960). "An automatic method for finding the greatest or least value of a function." Comput. J., 3, 175-184.
RoyChowdury, P., Singh, Y. P., and Chansarkar, R. A. (2000). "Hybridization of gradient descent algorithms with dynamic tunneling methods for global optimization." IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 30, 384-390.
Rynders, M., Lidkea, B., Chisholm, W., and Thibos, L. N. (1995). "Statistical distribution of foveal transverse chromatic aberration, pupil centration, and angle psi in a population of young adult eyes." J. Opt. Soc. Am. A, 12, 2348-2357.
Ryoo, S., Rodrigues, C. I., Baghsorkhi, S. S., Stone, S. S., Kirk, D. B., and Hwu, W. W. (2008). "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA." In Proc. 13th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, ACM Press, 73-82.
Sakamoto, J. A., Barrett, H. H., and Goncharov, A. V. (2008). "Inverse optical design of the human eye using likelihood methods and wavefront sensing." Opt. Express, 16, 304-314.
Sakamoto, T. (1993). "Analytic solutions of the eikonal equation for a GRIN-rod lens 1. Meridional rays." J. Mod. Optics, 40, 503-516.
Sakamoto, T. (1995). "Analytic solutions of the eikonal equation for a GRIN-rod lens 2. Skew rays." J. Mod. Optics, 42, 1575-1592.
Sato, S. (1997). "Simulated quenching: A new placement method for module generation." In ICCAD '97: Proc. 1997 IEEE/ACM Int. Conf. on Computer-Aided Design, 538-541, San Jose, California.
Savage, L. J. (1954). The Foundations of Statistics. Wiley, New York.
Schwiegerling, J., Greivenkamp, J. E., and Miller, J. M. (1995). "Representation of videokeratoscopic height data with Zernike polynomials." J. Opt. Soc. Am. A, 12, 2105-2113.
Schwiegerling, J. (2004). Field Guide to Visual and Ophthalmic Optics. SPIE Press, Bellingham, Washington.
Schwiegerling, J. and Neal, D. (2005). "Historical development of the Shack-Hartmann wavefront sensor." In Harvey, J. E. and Hooker, R. B., editors, Robert Shannon and Roland Shack: Legends in Applied Optics, SPIE Press Monograph Vol. PM148, pp. 132-139.
Seal, H. L. (1967). "The historical development of the Gauss linear model." Biometrika, 54, 1-24.
Sederberg, T. W. and Chang, G. (1993). "Isolating the real roots of polynomials using isolator polynomials." In Algebraic Geometry and Applications, Springer-Verlag.
Seldin, J. H. and Fienup, J. R. (1990). "Numerical investigation of the uniqueness of phase retrieval." J. Opt. Soc. Am. A, 7, 412-427.
Sheehan, M. T., Goncharov, A. V., O'Dwyer, V. M., Toal, V., and Dainty, C. (2007). "Population study of the variation in monochromatic aberrations of the normal human eye over the central visual field." Opt. Express, 15, 7367-7380.
Silver, S. (1962). "Microwave aperture antennas and diffraction theory." J. Opt. Soc. Am., 52, 131.
Singh, J. M. and Narayanan, P. J. (2007). "Real-time ray tracing of implicit surfaces on the GPU." Technical Report IIIT/TR/2007/72, International Institute of Information Technology, Hyderabad, India.
Smith, G., Atchison, D. A., and Pierscionek, B. K. (1992). "Modeling the power of the aging human eye." J. Opt. Soc. Am. A, 9, 2111-2117.
Smith, W. E., Barrett, H. H., and Paxman, R. G. (1983). "Reconstruction of objects from coded images by simulated annealing." Opt. Lett., 8, 199-201.
Smith, W. E., Paxman, R. G., and Barrett, H. H. (1985). "Application of simulated annealing to coded-aperture design and tomographic reconstruction." IEEE Trans. Nuc. Sci., NS-32, 758-761.
Sommerfeld, A. (1896). "Mathematische Theorie der Diffraction." Math. Ann., 47, 317.
Sommerfeld, A. (1954). Optics, volume IV of Lectures on Theoretical Physics. Academic Press, New York.
Southwell, W. (1980). "Wave-front estimation from wave-front slope measurements." J. Opt. Soc. Am., 70, 998-1006.
Spang, III, H. A. (1962). "A review of minimization techniques for nonlinear functions." SIAM Rev., 4, 343-365.
Spendley, W., Hext, G. R., and Himsworth, F. R. (1962). "Sequential application of simplex designs in optimization and evolutionary operation." Technometrics, 4, 441-461.
Stamnes, J. J. (1986). Waves in Focal Regions. Taylor & Francis Group, New York.
Stavroudis, O. N. (1972). The Optics of Rays, Wavefronts, and Caustics. Academic Press, New York and London.
Stefanescu, I. S. (1985). "On the phase retrieval problem in two dimensions." J. Math. Phys., 26, 2141-2160.
Stiles, W. S. and Crawford, B. H. (1933). "The luminous efficiency of rays entering the eye pupil at different points." P. Roy. Soc. Lond. B, 112, 428-450.
Stoica, P. (2001). "Parameter estimation problems with singular information matrices." IEEE T. Signal Proces., 49, 87-90.
Storn, R. and Price, K. (1997). "Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces." J. Global Optim., 11, 341-359.
Strang, G. (1980). Linear Algebra and Its Applications, 2nd edition. Academic Press, Orlando, FL.
Straub, J., Schwiegerling, J., and Gupta, A. (2001). "Design of a compact Shack-Hartmann aberrometer for real-time measurement of aberrations in human eyes." Vision Science and Its Applications, OSA Technical Digest (Optical Society of America, Washington, DC), 110-113.
Streifer, W. and Paxton, K. B. (1971). "Analytic solution of ray equations in cylindrically inhomogeneous guiding media. 1: Meridional rays." Appl. Optics, 10, 769-775.
Strenk, S. A., Semmlow, J. L., Strenk, L. M., Munoz, P., Gronlund-Jacob, J., and DeMarco, J. K. (1999). "Age-related changes in human ciliary muscle and lens: a magnetic resonance imaging study." Invest. Ophthalmol. Visual Sci., 40, 1162-1169.
Tabernero, J., Benito, A., Nourrit, V., and Artal, P. (2006). "Instrument for measuring the misalignments of ocular surfaces." Opt. Express, 14, 10945-10956.
Tan, P. and Drossos, C. (1975). "Invariance properties of maximum likelihood estimators." Math. Mag., 48, 37-41.
Teague, M. R. (1983). "Deterministic phase retrieval." J. Opt. Soc. Am., 73, 1434-1441.
Thibos, L. N., Ye, M., Zhang, X., and Bradley, A. (1992). "The chromatic eye: a new reduced-eye model of ocular chromatic aberration in humans." Appl. Optics, 31, 3594-3600.
Thibos, L. N., Ye, M., Zhang, X., and Bradley, A. (1997). "Spherical aberration of the reduced schematic eye with elliptical refracting surface." Optom. Vision Sci., 74, 548-556.
Thibos, L. N. and Bradley, A. (1999). "Modeling the refractive and neurosensor systems of the eye." Published as Chapter 4 (pp. 101-159) in Visual Instrumentation: Optical Design and Engineering Principles, Pantazis Mouroulis, ed. McGraw-Hill, New York.
Torczon, V. (1997). "On the convergence of pattern search algorithms." SIAM J. Optim., 7, 1-25.
Trotter, H. F. (1957). "Gauss's work (1803-1826) on the theory of least squares." Technical Report 5, Statistical Techniques Research Group, Princeton Univ.
Tuy, H. (1998). Convex Analysis and Global Optimization. Kluwer, Dordrecht.
Van Trees, H. L. (1968). Detection, Estimation, and Modulation Theory. Wiley, New York.
von Helmholtz, H. (1910). Handbuch der physiologischen Optik. Translation by Southall, J. P. C. (1925). Helmholtz's Treatise on Physiological Optics, Vol. III. Dover, New York.
Wah, B. W. and Wang, T. (1999). "Efficient and adaptive Lagrange-multiplier methods for nonlinear continuous global optimization." J. Global Optim., 14, 1-25.
Wald, A. (1939). "Contributions to the theory of statistical estimation and testing hypotheses." Ann. Math. Stat., 10, 299-326.
Wald, A. (1945). "Statistical decision functions which minimize the maximum risk." Ann. Math. Second Series, 46, 265-280.
Wald, A. (1950). Statistical Decision Functions. Wiley, New York.
Westheimer, G. (1965). "Retinal light distribution for circular apertures in Maxwellian view." J. Opt. Soc. Am., 49, 41-44.
White, S. R. (1984). "Concepts of scale in simulated annealing." In Proc. IEEE Int. Conf. on Computer Design, ICCD 84, New York, 646-651.
Wyvill, G. and Trotman, A. (1990). "Ray-tracing soft objects." In CG International '90, pp. 469-476. Springer-Verlag, New York.
Young, T. (1802). "On the theory of light and colours." Phil. Trans. R. Soc., 92, 12-48.
Zernike, F. (1934). "Beugungstheorie des Schneidenverfahrens und seiner verbesserten Form, der Phasenkontrastmethode." Physica, 1, 689-704.
Zhou, F., Hong, X., Miller, D. T., Thibos, L. N., and Bradley, A. (2004). "Validation of a combined corneal topographer and aberrometer based on Shack-Hartmann wave-front sensing." J. Opt. Soc. Am. A, 21, 683-696.
