NEW METHODS FOR SUPER-RESOLUTION

by
David Oliver Walsh

A Thesis Submitted to the Faculty of the
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
In Partial Fulfillment of the Requirements For the Degree of
MASTER OF SCIENCE WITH A MAJOR IN ELECTRICAL ENGINEERING
In the Graduate College
THE UNIVERSITY OF ARIZONA
1993

STATEMENT BY AUTHOR

This thesis has been submitted in partial fulfillment of requirements for an advanced degree at The University of Arizona and is deposited in the University Library to be made available to borrowers under rules of the Library.

Brief quotations from this thesis are allowable without special permission, provided that accurate acknowledgment of source is made. Requests for permission for extended quotation from or reproduction of this manuscript in whole or in part may be granted by the head of the major department or the Dean of the Graduate College when in his or her judgment the proposed use of the material is in the interests of scholarship. In all other instances, however, permission must be obtained from the author.

SIGNED:

APPROVAL BY THESIS DIRECTOR

This thesis has been approved on the date shown below:

Pamela A. Nielsen
Assistant Professor of Electrical and Computer Engineering

ACKNOWLEDGMENTS

"That which does not kill us, makes us stronger." -Friedrich Nietzsche

First and foremost I wish to thank my advisor, Dr. Pamela Nielsen, who deserves credit for many of the new ideas contained in this thesis. I am particularly grateful for the generous financial assistance which she provided to me. I also wish to thank Dr. Donald Dudley and Dr. Michael Marcellin for taking the time to serve on my thesis committee. I must thank Mr. David Marshall for serving as a mathematical reference and for providing valuable feedback, and I would like to thank Mr. Justin Judkins for acknowledging me in his thesis. Thanks also go to Dr. Richard Ziolkowski and Dr.
Hal Tharp for providing excellent reference materials and computer software which I used for this thesis. And, of course, I must thank my parents, who have supported me in all my endeavors.

In Memory of Noog (1981-1992)

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1. INTRODUCTION
2. THE DIRECT METHOD
2.1. Description
2.2. Errors And Error Bounds
2.2.1. Types of Errors
2.2.2. Effects of Errors
2.2.3. Error Bounds
2.3. The Least Squares Solution
2.4. Results From Experimental Trials
2.4.1. Case 1 Trials—Gaussian Noise
2.4.2. Case 2 Trials—Uniform Noise
2.4.3. Results
2.4.4. Analysis of Trial Results
2.5. Summary
3. THE GERCHBERG ALGORITHM
3.1. Description of the Gerchberg Algorithm
3.1.1. Example 3.1.1
3.1.2. Error Energy Reduction
3.1.3. Discrete Implementation
3.1.4. The Method of Alternating Orthogonal Projections
3.1.5. Example 3.1.2
3.2. Relationship to the Direct Method
3.2.1. Example 3.2
3.3. The Overdetermined Case
3.3.1. Example 3.3
3.4. Error Bounds for the Gerchberg Algorithm
3.5. Summary, Advantages and Disadvantages
4. TERMINATION SCHEMES FOR THE GERCHBERG ALGORITHM
4.1. Convergence For The Gerchberg Algorithm
4.2. Results From Experimental Trials
4.2.1. Procedure
4.2.2. Results
4.2.3. Aliasing Error
4.3. Termination Schemes
4.3.1. Convergence Factor
4.3.2. 2nd Derivative of Energy
4.3.3. Statistically Optimum Termination
4.3.4. Comparison Between Termination Schemes
4.4. Summary
5. THE SVD METHOD
5.1. Eigenvector Expansion
5.2. SVD Expansion
5.2.1. Advantages
5.3. Example: Two Point Target
5.4. 2-D Example
5.5. Error Concentration
5.6. Summary
6. SUMMARY AND CONCLUSION
REFERENCES

LIST OF FIGURES

1.1. Effect of bandlimiting a time-limited function
1.2. Infinite frequency spectrum which has been bandlimited to ±10 Hz
2.1. The original time-limited function w(t)
2.2. The infinite frequency spectrum W(f) of the time-limited function w(t)
2.3. The periodic time-limited function ws(t)
2.4. The periodic frequency spectrum Ws(f) of ws(t)
2.5. Aliasing error between continuous frequency spectrum W(f) and discrete periodic frequency spectrum Ws(f)
2.6. A-priori error bound and error norms for 1000 trials (exactly determined case)
2.7. A-priori error bound and error norms for 1000 trials (overdetermined case)
3.1. Flow chart for the Gerchberg algorithm
3.2. Known portion of frequency spectrum
3.3. Time domain representation of known frequency spectrum
3.4. Figure 3.3 truncated to time-limited region
3.5. Frequency spectrum of Figure 3.4
3.6. Known portion of frequency spectrum replaced
3.7. Time domain representation of Figure 3.6
3.8. The method of alternating orthogonal projections
3.9. Convergence factor vs. iteration number for Example 3.2
3.10. Convergence to numerical limit of computer
3.11. Correction energy for Example 3.3
4.1. Example of reconstructed energy
4.2. Second derivative of total reconstructed energy from Figure 4.1
4.3. Normalized mean squared error between true solution and Gerchberg algorithm's latest estimate
5.1. Original time-limited object
5.2. Discrete frequency spectrum of original object
5.3. Known portion of frequency spectrum distorted by noise
5.4. Image from noisy, diffraction limited spectrum
5.5. Error norm for Gerchberg algorithm
5.6. Result from Gerchberg algorithm after 154 iterations
5.7. Result from SVD method: 6 singular values thrown out
5.8. Result from SVD method: 5 singular values thrown out
5.9. Original space-limited object
5.10. Frequency spectrum of original object
5.11. Frequency spectrum distorted by noise
5.12. Known portion of frequency spectrum
5.13. Image from noisy, diffraction-limited spectrum
5.14. SVD result using 8 singular vectors
5.15. SVD result using 6 singular vectors
5.16. SVD result using 7 singular vectors
5.17. SVD result using 9 singular vectors
5.18. SVD result using 10 singular vectors

LIST OF TABLES

2.1. Results from Case 1 trials
2.2. Results from Case 2 trials
2.3. Condition number of A
3.1. Result after each step of example in Figure 3.8
4.1. Expected distribution of f and e over eigenvectors of P
4.2. Change in distribution of f and e over eigenvectors of P as known frequencies move away from main lobe of frequency spectrum

ABSTRACT

This thesis presents a new, non-iterative method for super-resolution which we call the direct method. By exploiting the inherent structure of the discrete signal processing environment, the direct method reduces the discrete super-resolution problem to solving a linear set of equations. The direct method is shown to be closely related to the Gerchberg algorithm for super-resolution. A mathematical justification for early termination of the Gerchberg algorithm is presented and the design of optimal termination schemes is discussed. Another new super-resolution method, which we call the SVD method, is presented. The SVD method is based on the direct method and employs SVD techniques to minimize errors in the solution due to noise and aliasing errors on the known frequency samples. The new SVD method is shown to provide results nearly identical to the optimal solution given by the Gerchberg algorithm, with huge savings in time and computational work.

CHAPTER 1
INTRODUCTION

Super-resolution is the process of restoring lost frequency or spatial frequency information to improve the resolution of a time or space domain object. The missing frequency information has usually been lost by passing the time domain function through a bandlimited system.
For example, consider the time-limited function

f(t) = { 1,  −10 ms ≤ t ≤ 10 ms
       { 0,  elsewhere                           (1.1)

which has been passed through an ideal low-pass filter with cutoff frequencies of ±10 Hz. The original time domain function f(t) is shown as the dotted line in Figure 1.1. The result of low-pass filtering (bandlimiting) this function to ±10 Hz is shown as the solid line in Figure 1.1. The filtered version of f(t) has clearly lost the sharp edges of the original function. The original frequency spectrum of f(t) is shown as the dotted line in Figure 1.2 and the low-pass filtered frequency spectrum is shown as the solid line in the same figure. If the original frequency spectrum (the dotted line in Figure 1.2) can be restored, then f(t) can be completely resolved.

Figure 1.1: Effect of bandlimiting a time-limited function.

Figure 1.2: Infinite frequency spectrum which has been bandlimited to ±10 Hz.

Super-resolution has been shown to be theoretically possible for a time-limited (space-limited) function [1], such as f(t) in the example above. The theory is based on the fact that the frequency spectrum of a finite object is analytic [2], and any analytic function is uniquely determined by a finite portion of itself. All super-resolution methods discussed in this thesis make use of known time-limited (space-limited) constraints as well as known portions of the frequency spectrum to achieve super-resolution.

Bandlimited extrapolation is essentially the same problem as super-resolution with the domains reversed. Using a finite portion of an analytic time or space domain function, and known bandlimited constraints, the entire analytic function can be restored.
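The effect in this example can be checked with a short numerical sketch. The following pure-Python fragment is illustrative only (the helper names `W` and `lowpass`, and the integration step, are choices made here, not part of the thesis); it low-pass filters f(t) by integrating its spectrum over the ±10 Hz passband:

```python
import math

def W(f):
    # Spectrum of the 20 ms rectangular pulse f(t) of Eq. 1.1:
    # W(f) = sin(2*pi*f*0.01)/(pi*f), with W(0) = 0.02.
    if f == 0.0:
        return 0.02
    return math.sin(2 * math.pi * f * 0.01) / (math.pi * f)

def lowpass(t, fc=10.0, steps=4000):
    # Inverse transform keeping only |f| <= fc (ideal low-pass filter).
    # f(t) is real and even, so the cosine part of the kernel suffices.
    df = 2.0 * fc / steps
    return sum(W(-fc + k * df) * math.cos(2 * math.pi * (-fc + k * df) * t) * df
               for k in range(steps + 1))

peak = lowpass(0.0)    # centre of the pulse: well below the original height of 1
tail = lowpass(0.05)   # 50 ms, far outside the pulse: nearly zero
```

The filtered peak comes out near 0.4 rather than 1, and the transitions are smoothed, which is the loss of sharp edges visible in Figure 1.1.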
To avoid confusion, this thesis will deal only with the super-resolution problem, although all of the super-resolution methods discussed can be used for the bandlimited extrapolation problem as well.

There are many applications for super-resolution in addition to the example given above. Super-resolution can be used to improve the optical resolution of a diffraction limited object. Any optical imaging system which has a finite aperture (such as a camera lens) will have a diffraction limit (only a finite range of spatial frequencies are passed). A space-limited object must have an infinite frequency spectrum; therefore a diffraction limited system will reduce the resolution of a space-limited object, such as an antenna.

Super-resolution also has applications in geophysical imaging. Using samples from a small portion of the frequency response of a finite portion of the earth, super-resolution can restore the entire frequency spectrum and thus provide a high resolution model of underground structures. Super-resolution has found similar applications for limited angle tomography in the field of medicine.

In some cases super-resolution can reduce data storage requirements. Since super-resolution can be used to restore the entire function from the partial information, only a small portion of a sampled analytic function needs to be stored.

In Chapter 2 of this thesis a new super-resolution method which we call the direct method will be presented and analyzed. Chapter 3 will examine the popular Gerchberg algorithm for super-resolution. The Gerchberg algorithm will be related to the direct method and the overdetermined case will be re-evaluated. Chapter 4 will provide the first mathematical justification for early termination of the Gerchberg algorithm. The design of termination schemes for the Gerchberg algorithm will be discussed and three specific schemes will be implemented and tested.
Chapter 5 will present a new non-iterative method based on the direct method of Chapter 2 and the early termination criteria of Chapter 4. We call this new method the SVD method because it incorporates SVD techniques. The SVD method is so fast and insensitive to error that it should have significant impact on the field of super-resolution. Chapter 6 will summarize the results in this thesis and draw a few conclusions from them.

CHAPTER 2
THE DIRECT METHOD

This chapter will introduce a new super-resolution method which we call the direct method. The direct method is a non-iterative method which extrapolates functions using known time-limited constraints, available frequency domain samples, and the Discrete Fourier Transform (DFT) coefficients which relate the known frequency samples to the unknown time domain samples. In Section 2.1 the direct method is described. Section 2.2 begins with an explanation of the sources of error; the effects of the errors are then illustrated, and error bounds are discussed and implemented. Section 2.3 generalizes the direct method to the overdetermined case and a least squares solution is implemented. In Section 2.4 the results of extensive experimental trials are presented and analyzed. Section 2.5 summarizes the chapter.

2.1 Description

Suppose there is a discrete periodic time domain function f with a period of N samples which is known to be time-limited to n non-zero samples. Also suppose that we know n samples of this function's periodic frequency spectrum F. The direct method of super-resolution utilizes the relationship between the time and frequency coefficients of the DFT and a-priori knowledge about the duration and location of the function in the time domain to solve for the n unknown time domain samples given n known frequency samples.
For example, consider a function f which is known to be time-limited to the first 3 time domain samples, has a periodic length of 8 samples, and for which only the 1st, 2nd, and 8th frequency domain samples are known (only the low frequency components are known):

f = [f_1 f_2 f_3 0 0 0 0 0]   (2.1)

F = [F_1 F_2 F_3 F_4 F_5 F_6 F_7 F_8]   (2.2)

Each element of F is related to the elements of f by the discrete Fourier transform

F_m = Σ_{n=0}^{N-1} f_{n+1} W_N^{(m-1)n}   (2.3)

where W_N = e^{-j2π/N}. For this example, N = 8, so W_N = W_8 = e^{-jπ/4}. Since f_4 through f_8 are all known to be zero, we have the following equations for F_1, F_2, and F_8:

F_1 = f_1 W_8^0 + f_2 W_8^0 + f_3 W_8^0   (2.4)
F_2 = f_1 W_8^0 + f_2 W_8^1 + f_3 W_8^2   (2.5)
F_8 = f_1 W_8^0 + f_2 W_8^7 + f_3 W_8^{14}   (2.6)

Since F_1, F_2, and F_8 are known, we have a set of 3 linear equations in 3 unknowns (f_1, f_2, and f_3). We can use Gaussian elimination to solve the system Ax = b for the unknown vector x, where

A = [ 1  1      1
      1  W_8^1  W_8^2
      1  W_8^7  W_8^{14} ] ,   x = [ f_1
                                     f_2
                                     f_3 ] ,   b = [ F_1
                                                     F_2
                                                     F_8 ]   (2.7)

Having solved for the unknown time domain samples, the entire time domain function f can be transformed via the DFT to yield the entire discrete frequency spectrum F. Thus, given partial frequency information and knowledge about the location and duration of the time domain function f, we can determine f and F completely.

In the example above, the system of equations is exactly determined (the number of known frequency samples is equal to the number of unknown time domain samples). Since the rows of A are independent, a unique solution for x is guaranteed. When the number of unknown time domain samples is greater than the number of known frequency samples, the system Ax = b is underdetermined and the solution for x will not be unique. To avoid the underdetermined case the sampling rates in each domain can be adjusted such that the number of known frequency samples is equal to the number of unknown time domain samples. The overdetermined case and the least squares solution will be discussed in detail in Section 2.3.
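The 3-by-3 system above can be assembled and solved numerically. Below is a minimal pure-Python sketch (the `gauss_solve` helper and the choice of test signal are illustrative, not from the thesis); it builds A from the DFT coefficients and recovers the unknown samples of a test signal f = [1 1 1 0 0 0 0 0] from its three known frequency samples:

```python
import cmath

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (complex)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

N, n = 8, 3
W8 = cmath.exp(-2j * cmath.pi / N)      # W_8 = e^{-j*pi/4}
known = (0, 1, 7)                       # F_1, F_2, F_8 (stored as m - 1)

f = [1.0, 1.0, 1.0] + [0.0] * (N - n)   # test signal, time-limited to 3 samples
F = [sum(f[t] * W8 ** (m * t) for t in range(N)) for m in range(N)]

# Direct method: DFT coefficients relating known F samples to unknown f samples.
A = [[W8 ** (m * t) for t in range(n)] for m in known]
b = [F[m] for m in known]
x = gauss_solve(A, b)                   # recovers f_1, f_2, f_3
```

With the three unknown time domain samples recovered, one forward DFT of the completed f yields the entire spectrum F, exactly as described above.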
2.2 Errors And Error Bounds

The direct method is subject to many kinds of errors and can be quite sensitive to them. This section will introduce and discuss sources of error and their effects on the direct method. Also, a-priori and a-posteriori error bounds for the direct method's solution will be derived and implemented.

2.2.1 Types of Errors

Two commonly encountered sources of error are random noise and measurement errors. These errors may be incurred when sampling the frequency response directly, or they may propagate to the frequency domain from noisy or poorly measured time domain samples.

Another type of error to which frequency samples may be subjected is aliasing error. Aliasing error occurs whenever a non-periodic sequence is modeled as a periodic sequence, which is usually what we need to do to implement the direct method. The following example illustrates how aliasing errors could occur.

Suppose we want to reconstruct the rectangular waveform w(t) shown in Figure 2.1 from samples of its continuous frequency spectrum W(f) (Figure 2.2) taken between −1.5 Hz and +1.5 Hz. Suppose we also know that w(t) is time-limited to between −0.5 and +0.5 seconds, and since w(t) is time-limited we know that W(f) cannot be bandlimited. The time domain sampling rate has been arbitrarily chosen as 32 Hz for this example. Since the direct method is based on the DFT, it assumes that w(t) is the periodic function ws(t) (Figure 2.3) with periodic frequency spectrum Ws(f) (Figure 2.4). By using the direct method we are modeling w(t) as one period of ws(t).

If error-free samples from the periodic spectrum Ws(f) are used, we can reconstruct ws(t) perfectly, but remember we are not using samples from Ws(f); we are using samples from W(f).

Figure 2.1: The original time-limited function w(t).

From Figure 2.5 it is apparent that W(f) and Ws(f) are
not exactly the same over the frequencies of interest. The difference between W(f) and Ws(f) is the aliasing error. Whenever samples from a non-bandlimited continuous frequency spectrum are used in the reconstruction, aliasing error will occur. The aliasing error can be reduced by increasing the sampling rate in the time domain; however, this will increase the number of unknown time domain samples which, as will be shown later in this chapter, increases the direct method's sensitivity to errors dramatically.

Figure 2.2: The infinite frequency spectrum W(f) of the time-limited function w(t).

Figure 2.3: The periodic time-limited function ws(t).

Figure 2.4: The periodic frequency spectrum Ws(f) of ws(t).

Figure 2.5: Aliasing error between continuous frequency spectrum W(f) and discrete periodic frequency spectrum Ws(f).

A third type of error often encountered is computer roundoff error. Although roundoff error in the frequency samples used for the reconstruction is usually negligible compared to noise and aliasing error, roundoff error on the elements of the matrix A (the DFT coefficients) can become significant. As the periodic length N of the signal gets large, the DFT elements W_N^n get closer together, which increases the condition number cond(A).

2.2.2 Effects of Errors

The effects of errors in the known frequency samples will be illustrated by an example. Consider the function f from Section 2.1 which was time-limited to the first 3 samples and had a periodic length of 8 samples.
For this example we will let the first 3 samples equal 1:

f = [1 1 1 0 0 0 0 0]   (2.8)

Recall, we do not know the values of f, only that f_4 through f_8 are zero. We want to determine f and F completely, but we know only the first, second, and eighth samples of F (the periodic frequency spectrum of f):

b = [ F_1     [ 3.0000
      F_2   =   1.7071 − 1.7071i
      F_8 ]     1.7071 + 1.7071i ]   (2.9)

The matrix A is the same as before:

A = [ 1  W_8^0  W_8^0      [ 1  1                 1
      1  W_8^1  W_8^2    =   1  0.7071 − 0.7071i  −i
      1  W_8^7  W_8^{14} ]   1  0.7071 + 0.7071i   i ]   (2.10)

Solving the system Ax = b we get

x = [ f_1     [ 1.0000
      f_2   =   1.0000
      f_3 ]     1.0000 ]   (2.11)

which is the true solution for x. To illustrate the effect of errors, the vector r, consisting of samples of complex Gaussian noise of zero mean and variance 0.01, is added to the known frequency samples b:

r = [ −0.1140 − 0.1435i
      −0.0516 + 0.0825i
       0.0316 + 0.0673i ]   (2.12)

b̂ = b + r = [ 2.8860 − 0.1435i
              1.6555 − 1.6246i
              1.7387 + 1.7744i ]   (2.13)

Now the system Ax̂ = b̂ is solved, yielding:

x̂ = [ 0.8254 − 0.2269i
      1.2329 + 0.4268i
      0.8277 − 0.3434i ]   (2.14)

The resulting error in the solution due to the input error r is given by e:

e = x̂ − x = [ −0.1746 − 0.2269i
               0.2329 + 0.4268i
              −0.1723 − 0.3434i ]   (2.15)

The output error e for this example is relatively large compared to the input error r. It is shown in the next subsection that this amplification of error is limited by the character of the matrix A.

2.2.3 Error Bounds

We would like to compute bounds for errors in the solution x due to input errors on the known frequency samples b, rounding errors on the DFT coefficients of A, or both. Mathematical bounds have already been established for errors in the solution to systems of linear equations, hence these known bounds can be used to bound the absolute error, ||e||, and the relative error, ||e||/||x||, of a reconstructed function.

First consider the solution of Ax = b when there are errors in the known frequency samples b̂.
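The error amplification in the example above is easy to reproduce. The sketch below (pure Python; the `gauss_solve` helper is an illustrative stand-in for the Gaussian elimination step) builds b exactly, adds the noise vector r of Eq. 2.12, and compares the input and output error norms:

```python
import cmath
import math

def gauss_solve(A, b):
    # Gaussian elimination with partial pivoting (complex-valued).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            fac = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= fac * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

W8 = cmath.exp(-2j * cmath.pi / 8)
A = [[W8 ** (m * t) for t in range(3)] for m in (0, 1, 7)]   # Eq. 2.10
x_true = [1.0, 1.0, 1.0]
b = [sum(A[i][t] * x_true[t] for t in range(3)) for i in range(3)]

r = [-0.1140 - 0.1435j, -0.0516 + 0.0825j, 0.0316 + 0.0673j]  # Eq. 2.12
x_hat = gauss_solve(A, [b[i] + r[i] for i in range(3)])
e = [x_hat[i] - x_true[i] for i in range(3)]                  # Eq. 2.15

norm_r = math.sqrt(sum(abs(v) ** 2 for v in r))   # ~0.22
norm_e = math.sqrt(sum(abs(v) ** 2 for v in e))   # ~0.68: about 3x larger
```

The output error norm of roughly three times the input error norm is precisely the amplification that the bounds derived next limit in terms of the matrix A.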
As in the previous example, b̂ = b + r where r is the error in the frequency samples, and x̂ = x + e where e is the error in the solution. Now substituting for x̂ and b̂ the system becomes:

A(x + e) = (b + r)   (2.16)

Since this system is linear, and since we know that Ax = b, the system can be separated into two parts:

Ax = b ,   Ae = r   (2.17)

From the second part of the previous equation:

e = A^{-1} r   (2.18)

Given that A^{-1} is bounded, then there exists a finite c_1 such that:

||A^{-1} r|| ≤ c_1 ||r||   (2.19)

Define ||A^{-1}|| as the smallest c_1 such that Eq. 2.19 holds. Then ||A^{-1} r|| ≤ ||A^{-1}|| ||r|| and we have the following bound for the absolute error:

||e|| ≤ ||A^{-1}|| ||r||   (2.20)

Eq. 2.20 is equivalent to:

||e||/||x|| ≤ ||A^{-1}|| ||r|| / ||x||   (2.21)

where the left hand side is the relative error. Now given that A is bounded, then there exists a finite c_2 such that:

||Ax|| ≤ c_2 ||x||   (2.22)

Define ||A|| as the smallest c_2 such that Eq. 2.22 holds. Then ||b|| = ||Ax|| ≤ ||A|| ||x||. Substituting ||b|| for the denominator in Eq. 2.21 we obtain the following relative error bound for perturbations in b:

||e||/||x|| ≤ ||A|| ||A^{-1}|| ||r|| / ||b||   (2.23)

The term ||A|| ||A^{-1}|| is known as the condition number of the matrix A [5], and is denoted as cond(A). Since the condition number can vary with the choice of norm, another condition number, cond*(A), is defined as the ratio of the largest singular value of A to the smallest singular value of A [3]. The justification for the use of cond*(A) is based on the singular value decomposition of A, which will be discussed in detail in Chapter 5.

A similar (but longer) process is used to derive the following generalized relative error bound for errors in A and/or b, where δx, δb, and δA are perturbations in x, b, and A respectively:

||δx||/||x|| ≤ [ cond(A) / (1 − cond(A) ||δA||/||A||) ] ( ||δA||/||A|| + ||δb||/||b|| )   (2.24)

The proof for this error bound has been omitted, but the interested reader is referred to Atkinson [3] for details. An alternate to the a-priori bounds given by Eqs.
2.23 and 2.24 is an a-posteriori bound. The bound suggested by Aird and Lynch [4] was implemented and tested. The error e = x̂ − x for the solution to the problem Ax = b can be bounded as:

||Cr|| / (1 + τ) ≤ ||e|| ≤ ||Cr|| / (1 − τ)   (2.25)

where C ≈ A^{-1} is the computed inverse of A, τ = ||CA − I|| < 1, and the residual r = b − Ax̂.

Aird and Lynch [4] have implemented the a-posteriori bound for real numbers as follows. First, assume that all elements of vectors and matrices are real and that A is error free. Compute C ≈ A^{-1}. Assume that the magnitude of the error on each element of b is less than some known positive number K. Since b̂ and K are known, each element of b can be confined to an interval [b_l, b_u] with the corresponding element of b̂ in the center of this interval. Next, using the endpoints of [b_l, b_u], the residual r can be confined to an interval [r_l, r_u]. Now, using the extreme values of [r_l, r_u] and appropriate matching of signs in the inner products, Cr can be confined to an interval C[r_l, r_u] = [Cr_l, Cr_u]. Now substitute the minimum value of ||Cr|| in the lower bound and the maximum value of ||Cr|| in the upper bound. If zero is in the interval containing a component of a vector, then the lower bound for that component is zero. Aird and Lynch [4] have shown that this bound can provide a marked improvement over the a-priori bound when used for applications with real numbers.

For our application it must be assumed that all components can be complex. We adapt the a-posteriori bound of Aird and Lynch to complex elements as follows. Again, we know b̂, and we know that the magnitude of the error for each element must be less than K. Since the components of b are complex, we can no longer confine the elements of b to an interval. However, we can confine each element of b to a circular region whose radius is K and whose center is the corresponding element of b̂. Next we confine each element of the residual r = b − b̂ to a circular region whose radius is K and whose center is zero.
Since we know that the magnitude of each element of r must be less than K, we can find the worst case ||r|| (the largest possible ||r||) for whichever type of norm we choose.¹ Next we compute C ≈ A^{-1} as before and use the definition of the matrix norm to bound Cr by ||Cr|| ≤ ||C|| ||r_worst||, where ||r_worst|| is the largest possible ||r||. Substituting this bound for ||Cr|| into the a-posteriori error upper bound, it becomes

||e|| ≤ ||A^{-1}|| ||r_worst|| / (1 − τ)   (2.26)

which is identical to the a-priori upper bound except for the term in the denominator. Since 0 < τ < 1, this bound, when adapted for complex elements, is at best equivalent to the a-priori bound and therefore not worth computing.

¹In a practical situation, x and b are unknown. Hence the bound in Eq. 2.25 cannot be implemented directly. However, it may be reasonable to assume that the error on each element of b is less than some fixed value. Such a situation arises with measurement errors.

2.3 The Least Squares Solution

Suppose the number of known frequency samples is greater than the number of unknown time domain samples. In this case the system of equations Ax = b is overdetermined. If the known frequency samples contain error, there may be no solution x which satisfies the entire set of equations, and hence the direct method will fail. One way to get around this problem is to simply throw away the extra frequency samples and use the direct method to solve an exactly determined set of equations. A better idea is to use the extra samples, and the redundancy which they provide, to compute a least squares solution [5] for x. The least squares solution to the system

A_{m×n} x_{n×1} = b_{m×1} ,   m > n   (2.27)

is that x which minimizes (b − Ax)^H (b − Ax). This can be computed by the following equation [5]:

x = (A^H A)^{-1} A^H b   (2.28)

When using MATLAB [6] the command x = A\b, which uses a subroutine based on Gaussian elimination, gives a numerically more accurate solution than the previous equation, which requires computing an inverse.
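For a concrete (if toy) instance of Eq. 2.28, the sketch below fits two unknowns to three equations using the normal equations. The 3-by-2 data are arbitrary illustrative values, and `lstsq_normal` is hard-coded for two real unknowns; Eq. 2.28 itself is general:

```python
def lstsq_normal(A, b):
    """Least squares via Eq. 2.28, x = (A^T A)^{-1} A^T b (real data,
    two unknowns only -- an illustrative sketch, not a general solver)."""
    m = len(A)
    # Form the normal equations: A^T A (2x2) and A^T b (2x1).
    ata = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(2)]
           for i in range(2)]
    atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(2)]
    # Invert the 2x2 matrix A^T A directly.
    det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
    inv = [[ata[1][1] / det, -ata[0][1] / det],
           [-ata[1][0] / det, ata[0][0] / det]]
    return [inv[i][0] * atb[0] + inv[i][1] * atb[1] for i in range(2)]

# Three equations, two unknowns; this particular system is consistent,
# so the least squares solution solves it exactly.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [1.0, 3.0, 5.0]
x = lstsq_normal(A, b)
```

As the text notes, explicitly inverting A^H A is numerically inferior to an elimination-based solver (MATLAB's x = A\b); the normal-equations form is shown here only because it mirrors Eq. 2.28.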
2.4 Results From Experimental Trials

Over 160 thousand experimental trials were performed to find the practical limits of the direct method and the least squares method. The trials were designed to answer a number of questions: we wanted to know how the SNR affected the accuracy of the results and to be able to determine a minimum SNR for a desired accuracy; we wanted to determine whether the least squares method produced better results than the direct method; and we wanted to evaluate the tightness of the error bounds presented in Section 2.2.

The trials were divided into 2 cases. The Case 1 trials were designed to test the direct method's performance in the presence of Gaussian noise. These trials were intended to give a general idea of the magnitude of the errors to be expected for a given set of values for the total number of samples, N, the number of unknown time domain samples, n, the number of known frequency samples, m, and SNR. The Case 2 trials tested the same things as the Case 1 trials except the noise was uniform. The
A vector of complex Gaussian noise samples^ of zero mean and standard deviation a were then added to the frequency samples, and the (m + l)/2 lowest non-negative frequency samples (including the DC sample) were selected for the reconstruction. Since the function / was known to be real, the complex conjugates of the known frequency samples (except the DC sample) were computed and used as the corresponding neg ative low pass samples to provide a total of m known frequency samples. Next the direct method was used to form the estimate / of / (if m > n the least squares ^Each noise sample was the sum a + jb, where a and 6 were chosen randomly from the Gaussian distribution f(x) = —e"^ 34 A solution was computed) and the norm of the error j|/ — /|) was computed. All vector norms for the Case 1 and Case 2 trials were computed as the Euclidean or 2-norm given in Atkinson [5] as m \ (2.29) j=i where Vj is the jth element of the vector V. Next the signal to noise ratio for the trial was computed by the following formula. SNR = (2.30) To implement the a-priori error bound, K was taken as the magnitude of the largest element of the noise vector. (2.31) = worst (2.32) 2=1 For the least squares solution the relative error bound was computed. Upon completion of 1000 trials various statistics (mean and standard deviation of the error norm ||/ — /||, SNR, a-priori bound, etc ...) were calculated and stored. The number of violations of the error bounds, if any, was also stored. 2.4.2 Case 2 Trials—Uniform Noise The Case 2 trials were similar to the Case 1 trials except for the noise. The noise was complex with the magnitude and phase components uniformly and independently distributed between [0, K] and [0,27r] respectively. Values for N, n, m, and K were 35 fixed and each trial was performed in the same manner as the Case 1 trials. As in the Case 1 trials the error bound was computed by Eq. 
2.33:

    ||f − f̂|| ≤ ||A⁻¹|| ||v_worst||   (2.33)

    ||v_worst|| = ( Σ_{i=1}^{m} K² )^{1/2} = √m K   (2.34)

where K was the maximum possible magnitude of a noise sample. For the least squares case, this bound was computed using the pseudoinverse of A. As in the Case 1 trials, various statistics were compiled upon completion of 1000 trials.

2.4.3 Results

Some of the statistics compiled from the Case 1 and Case 2 trials are shown in Tables 2.1 and 2.2 respectively. Each entry in the statistics columns of these tables represents a statistic calculated from 1000 independent trials. Each of these entries is accurate to ±1/√(number of trials) = 1/√1000, or about ±3%. For this reason only two significant digits are given for these entries. In Table 2.1 the least squares results for fixed values of N, n, and SNR are given in the same row as the direct method for easy comparison. The condition number cond∗(A) was calculated for a variety of values of N, n, and m. The results are tabulated in Table 2.3.

After studying the three tables it becomes clear that the factor which has the greatest effect on the size of the error in the solution is the number of unknown time domain samples (n). For example, from Table 2.3, for N = 16, n = 3, m = 5, cond∗(A) = 14.3396. By increasing the number of unknowns by 2 (N = 16, n = 5,

Table 2.1: Results from Case 1 trials. [Mean error norm for m = n, m = n + 2, m = n + 4, and m = n + 10, the a-priori error bound, and average SNR, for various N, n, and σ.]
[Table 2.1 entries are not legible in this reproduction.] N = periodic length of function, n = number of unknown time domain samples, m = number of known frequency samples, σ = standard deviation for noise samples.

Table 2.2: Results from Case 2 trials. [A-priori bound and mean error norm for m = n and m = n + 2, for various N, n, and K.]
[Table 2.2 entries are not legible in this reproduction.] N = periodic length of function, n = number of unknown time domain samples, m = number of known frequency samples, K = maximum magnitude of each noise sample.

Table 2.3: Condition number of A.
[Table 2.3 entries are not legible in this reproduction, except where quoted in the text.] N = periodic length of function, n = number of unknown time domain samples, m = number of known frequency samples.

m = 5), cond∗(A) = 422.3243, an increase by a factor of about 30. As another example from Table 2.3, for N = 128, n = 3, m = 5, cond∗(A) = 1049.8. Again, just by increasing n by 2 (N = 128, n = 5, m = 5), cond∗(A) jumps to 2,494,900, an increase by a factor of about 2377.

The condition number cond∗(A) is also dependent on the periodic length N and the number of unknown time domain samples n. For example, if n and m are both fixed at 3, doubling N tends to increase cond∗(A) by a factor of 4. If n = 3 and m = 5, doubling N still tends to increase cond∗(A) by a factor of about 4.
If n and m are fixed at 5, doubling N tends to increase cond∗(A) by a factor of about 16. If n = 5 and m = 7, doubling N will also tend to increase cond∗(A) by a factor of about 16. While the effects of N and n appear to be linked, the effect of adding extra known frequency samples seems to be independent. Referring again to Table 2.3, if N = 64 and n = 5, the addition of two extra known frequency samples, from m = 5 to m = 7, decreases cond∗(A) by a factor of about 6.4. Using 4 extra samples (m = 9) decreases cond∗(A) by a factor of about 21. Using 10 extra (m = 15) and 16 extra (m = 21) decreases cond∗(A) by factors of 210 and 978 respectively. Decreasing the norm of the error by a factor of 978 translates to about 3 more significant digits in the solution.

Increasing the magnitude of the noise appears to cause a linearly proportional increase in the magnitude of the error. Table 2.1 shows that increasing the standard deviation of the noise by a constant factor results in the size of the error in the solution increasing by the same factor.

Figure 2.6: A-priori error bound and error norms for 1000 trials (exactly determined case). [L = 64, n = 3, m = 3, K = 0.0001.]

In over 160,000 independent trials, not a single violation of any of the error bounds occurred. For the Case 2 trials using the direct method, the a-priori error bound was typically 3 to 4 times as large as the mean of the error norm. Figure 2.6 shows the error norms and a-priori bound for a typical 1000 trial run. For several of the trials in Figure 2.6 the norm of the error was greater than 80% of the bound, suggesting that this bound is about as tight as one can expect while still being guaranteed.

Figure 2.7: A-priori error bound and error norms for 1000 trials (overdetermined case). [L = 64, n = 3, m = 5, K = 0.0001.]
For the Case 2 trials using the least squares method, the a-priori bound was implemented using the pseudoinverse of A and was generally 4 to 6 times as large as the mean of the error norms. Although this bound is not guaranteed for the least squares implementation, there was not a single violation in 27,000 trials. Figure 2.7 shows the error norms and a-priori bound for a typical 1000 trial run using the least squares method. All variables for this example were the same as those in Figure 2.6 except the number of known frequency samples m. The lower error bound and error norms of Figure 2.7 illustrate the value of using any extra samples in the least squares method.

2.4.4 Analysis of Trial Results

Using Tables 2.1 through 2.3 one can estimate how well the direct method or the least squares method will work for a given application. The precision of the solution is limited by the condition number of A and the size of the errors in the known frequency samples. The size of the errors in the frequency samples may not always be controllable, but there are ways to control the size of cond∗(A). The most effective way to control the size of cond∗(A) is to limit the number of unknowns n in the time domain (preferably to fewer than 7 samples). This can be accomplished by lowering the time-domain sampling rate. If an application requires a large number of unknown time domain samples (such as resolving fine details in a complicated time-limited signal) then another extrapolation method should be used.

Another way to improve the condition of A is to use any and all extra frequency samples when they are available. The number of available samples will be limited by the extent of the frequency spectrum which is available. Decreasing the sampling interval in the frequency domain will increase the number of available frequency samples m without affecting n, but the resulting increase in the total length N will mostly offset the value of using the extra frequency samples.
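These trends are easy to reproduce. The sketch below is my construction: it assumes A is the matrix of DFT coefficients relating the n unknown time samples to the m known low-frequency samples, and it uses the 2-norm condition number, since the thesis does not restate here which norm lies behind cond∗.

```python
import numpy as np

def cond_A(N, n, m):
    """2-norm condition number of the m-by-n DFT coefficient matrix A."""
    low = (m + 1) // 2
    # DC plus the lowest positive frequencies and their conjugate counterparts.
    known = list(range(low)) + list(range(N - low + 1, N))
    A = np.exp(-2j * np.pi * np.outer(known, np.arange(n)) / N)
    return np.linalg.cond(A)   # sigma_max / sigma_min
```

Under this construction the qualitative behavior mirrors Table 2.3: increasing n sharply worsens the conditioning, adding extra frequency samples improves it, and doubling N worsens it; the absolute values need not match the thesis's cond∗(A) exactly.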
Section 2.2 alluded briefly to the unique problems caused by aliasing. To decrease aliasing errors the sampling frequency must be increased. Increasing the sampling frequency will necessarily increase n, which in turn increases cond∗(A). We have already seen the drastic effects on cond∗(A) of even slight increases in n. Our experimental simulations have shown that reducing the sampling frequency by 1/2 roughly doubles the magnitude of the aliasing error. The additional aliasing error is more than offset by the decrease in cond∗(A) caused by reducing n; therefore it is better to limit n at the expense of additional aliasing error.

2.5 Summary

This chapter has introduced a new super-resolution method which we call the direct method. This method uses DFT coefficients and known frequency samples to solve for the unknown time domain samples directly. Unlike other super-resolution methods ([2], [7]), which were first conceived in a continuous form and then adapted to the discrete form, the direct method was originally conceived in the discrete form. (In fact, there does not exist a continuous form for the direct method.) As a result, the direct method exploits the inherent structure of the discrete form to provide a very simple, very fast super-resolution method. Unfortunately, the direct method's usefulness is severely limited by its condition number, but it does represent a new way of looking at the super-resolution problem and it provides the foundation for more sophisticated super-resolution methods to be introduced in Chapter 5.

CHAPTER 3

THE GERCHBERG ALGORITHM

The algorithm examined in this chapter was proposed independently by both Gerchberg [2] and Papoulis [7]. Since [2] predates [7], this algorithm for super-resolution will be referred to as the Gerchberg algorithm throughout this thesis. The Gerchberg algorithm can be used to extrapolate time or frequency domain functions from incomplete information.
As in the previous chapter, only the frequency extrapolation (super-resolution) problem will be considered, although the algorithm works essentially the same for both applications. This chapter begins with a review of the continuous and discrete forms of the Gerchberg algorithm. In Section 3.2, the relationship of the Gerchberg algorithm to the direct method is discussed and the exactly determined case is examined. In Section 3.3 the overdetermined case is analyzed and the correction energy described by Gerchberg [2] is re-evaluated. Computable error bounds for the Gerchberg algorithm are introduced in Section 3.4. Finally, the advantages and disadvantages of the Gerchberg algorithm will be discussed.

3.1 Description of the Gerchberg Algorithm

If a portion of the frequency spectrum of a time-limited object is known, and the location and extent of the object in time are also known, then under certain reasonable conditions [8] the Gerchberg algorithm can be used to reconstruct the unknown portion of the frequency spectrum, thereby improving the object's resolution. The algorithm is iterative and each iteration consists of four steps. The first step transforms the known portion of the frequency spectrum into the time domain. Step two sets the time domain function to zero outside the region in which it is known to be time-limited. Step three transforms this result back to the frequency domain, and the fourth step replaces the known portion of the frequency spectrum in its appropriate location. This four-step process is repeated, starting with the new frequency spectrum estimate, and iterates until a satisfactory estimate is obtained. A flow chart for the Gerchberg algorithm is shown in Figure 3.1. The algorithm is better understood with the help of the following example.

3.1.1 Example 3.1.1

Consider a periodic time-limited signal which has been sent through a band-limited channel so that its high frequency components have been lost.
The location and extent of the original signal in the time domain are known. The application of one cycle of the Gerchberg algorithm to this problem is illustrated in Figures 3.2–3.7. The known portion of the frequency spectrum is shown in Figure 3.2 (the dotted line indicates the original frequency spectrum which is to be reconstructed).

Figure 3.1: Flow chart for the Gerchberg algorithm. [Blocks: known portion of frequency spectrum; Fourier transform; estimated object corrected to zero outside known extent; Fourier transform; estimated frequency spectrum corrected over known portion.]

Figure 3.2: Known portion of frequency spectrum.

Figure 3.3: Time domain representation of known frequency spectrum.

Step one transformed the known portion of the frequency spectrum into the time domain as shown in Figure 3.3. Next, step two truncated the function to satisfy the known time-limited constraints as shown in Figure 3.4. Step three transformed this result back into the frequency domain as shown in Figure 3.5. Step four replaced the known portion of the frequency spectrum into its corresponding frequencies as shown in Figure 3.6. Beginning another iteration, step one transformed the new frequency spectrum estimate back to the time domain as shown in Figure 3.7.

Figure 3.4: Figure 3.3 truncated to time-limited region.

Figure 3.5: Frequency spectrum of Figure 3.4.

Figure 3.6: Known portion of frequency spectrum replaced.

Figure 3.7: Time domain representation of Figure 3.6.

3.1.2 Error Energy Reduction

The convergence of this algorithm is based on reducing the error energy at each iteration. The following arguments were originally provided by Gerchberg [2].
The entire band-limited frequency spectrum of the object can be considered as the sum of the true spectrum and an error spectrum. Assuming the known portion of the frequency spectrum is free from error, the error spectrum will be zero over the known frequency region and it will be equal and opposite to the true spectrum outside the known region. Since the algorithm is linear, its effect on the true spectrum and the error spectrum will be independent. As long as the time-limited constraints are not underestimated, the true spectrum will be unaffected by the algorithm [2]. Since the error spectrum has a finite length section equal to zero, it cannot be analytic, so its inverse transform (in the time domain) will be infinite in extent. From Parseval's theorem, the error energy at this point will be equal to the original error energy. After the time-limiting step, all of the error energy outside the time-limited region will be lost, so the error energy will be less than its original value. Now since this function is time-limited, its Fourier transform in the frequency domain must be analytic, so it will have energy in the region where the true spectrum is known. When the known portion of the true spectrum is replaced, the error energy in this region will be lost. Therefore, at each iteration the error energy is reduced twice.

3.1.3 Discrete Implementation

To implement the Gerchberg algorithm on a computer, the time and frequency domain functions must be represented by finite length vectors, so a discrete version of the Gerchberg algorithm is required. For the discrete version, continuous time and frequency functions are modeled as discrete periodic vectors and transforms are performed using the DFT. The known information will consist of sampled data, which can be obtained by sampling the continuous frequency spectrum directly. In the discrete form, the problem is set up exactly the same for the Gerchberg algorithm as it would be for the direct method.
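The discrete form of the four-step iteration can be sketched in a few lines (a minimal sketch of my own; the function name and conventions are mine, using NumPy's FFT for the DFT):

```python
import numpy as np

def gerchberg(h, known, N, n, iters=1000):
    """Discrete Gerchberg algorithm.

    h:     known frequency (DFT) samples, at the indices in `known`
    N:     periodic length; the signal is time-limited to samples 0..n-1
    """
    F = np.zeros(N, dtype=complex)
    F[known] = h                   # initial estimate: known samples, zeros elsewhere
    for _ in range(iters):
        f = np.fft.ifft(F)         # step 1: transform to the time domain
        f[n:] = 0                  # step 2: enforce the time-limited constraint
        F = np.fft.fft(f)          # step 3: transform back to the frequency domain
        F[known] = h               # step 4: replace the known portion
    return np.fft.ifft(F)
```

With error-free samples and an exactly determined setup, this iteration converges to the true time-limited signal, which is the behavior analyzed in Section 3.2.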
Both methods utilize the same sampled frequency information and time-limited constraints; the same choices relating to sampling frequency and frequency spacing must be made. The two methods are also subject to the same types of errors. In particular, the discrete version of the Gerchberg algorithm will be subject to the same aliasing problems discussed in Section 2.3.

3.1.4 The Method of Alternating Orthogonal Projections

The discrete Gerchberg algorithm has been shown by Youla [8] to be a special case of the method of alternating orthogonal projections. If a vector f is known a-priori to belong to a subspace Pb of a parent Hilbert space H, but all that is known to the observer is its projection g = Pa f, then the original vector f can be restored by the following three-step iterative algorithm: 1) project the latest estimate f_n onto Pb, 2) project this result onto ⊥Pa (the subspace of H which is orthogonal to Pa), 3) add back the known vector g to obtain the new estimate f_{n+1}. Each iteration of the algorithm can be expressed mathematically by the following equation (with Pb and ⊥Pa here denoting the orthogonal projections onto the corresponding subspaces):

    f_{n+1} = g + ⊥Pa Pb f_n   (3.1)

The known vector g is used as the initial estimate and the algorithm is allowed to iterate until a satisfactory estimate is obtained. If the known vector g is free from error and Pb ∩ ⊥Pa = {0}, then f_n will converge to f as n approaches infinity (see the proof in Youla [8] for details). The operation of this algorithm and its relationship to the Gerchberg algorithm are illustrated by the following simple example in 2-space.

3.1.5 Example 3.1.2

The method of alternating orthogonal projections can be used to restore the original vector f = [2, 0] (shown in Figure 3.8) from g = [1, 1] (its projection onto Pa). The vector f is known a-priori to belong to the normalized subspace Pb = [1, 0]. The normalized subspaces Pa = [0.7071, 0.7071] and ⊥Pa = [0.7071, -0.7071] are also shown in Figure 3.8 (all vectors and subspaces shown in Figure 3.8 are in the time domain).
In terms of the Gerchberg algorithm, Pb is the set of functions time-limited to the first sample, Pa is the set of functions band-limited to the first sample, and ⊥Pa is the set of functions band-limited to the second sample. By inspecting the time domain representations of Pa and ⊥Pa this may not seem obvious, but their respective frequency domain transformations (obtained by the DFT) are [1, 0] and [0, 1].

Figure 3.8: The method of alternating orthogonal projections. [Sketch of the subspaces Pa, Pb, and ⊥Pa and the iterates in the plane.]

Table 3.1: Result after each step of the example in Figure 3.8.

                 time domain        frequency domain
    g (known)    [1, 1]             [2, 0]
    step 1       [1, 0]             [1, 1]
    step 2       [0.5, -0.5]        [0, 1]
    step 3       [1.5, 0.5]         [2, 1]
    step 1       [1.5, 0]           [1.5, 1.5]
    step 2       [0.75, -0.75]      [0, 1.5]
    step 3       [1.75, 0.25]       [2, 1.5]
    step 1       [1.75, 0]          [1.75, 1.75]
    step 2       [0.875, -0.875]    [0, 1.75]
    step 3       [1.875, 0.125]     [2, 1.75]

The three steps of the algorithm are illustrated in Figure 3.8 for three iterations. The first step, projecting the latest estimate of f onto Pb, is equivalent to time-limiting to the first sample (step two of the Gerchberg algorithm). The second step, the projection onto ⊥Pa, is equivalent to zeroing the frequency spectrum over the region where it is known. The third step, adding back the known vector g, is equivalent to adding back the known portion of the frequency spectrum. The combination of the second and third steps of the method of alternating orthogonal projections is therefore equivalent to step four of the Gerchberg algorithm. A chart listing the result in both the time and frequency domains after each step of this example is shown in Table 3.1.

For this example the method of alternating orthogonal projections performs the same basic operations, time-limiting and replacing the known frequency information, as the Gerchberg algorithm.
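The iterates of this 2-space example can be reproduced directly; the short sketch below (my code, with hypothetical names) runs the three projection steps and converges toward f = [2, 0]:

```python
import numpy as np

# Normalized subspace directions from Example 3.1.2.
pb = np.array([1.0, 0.0])              # time-limited to the first sample
pa_perp = np.array([0.7071, -0.7071])  # orthogonal complement of Pa
g = np.array([1.0, 1.0])               # known projection of f onto Pa

est = g.copy()
for _ in range(3):
    s1 = (est @ pb) * pb               # step 1: project onto Pb
    s2 = (s1 @ pa_perp) * pa_perp      # step 2: project onto the complement of Pa
    est = g + s2                       # step 3: add back the known vector g
print(est)   # after three iterations: approximately [1.875, 0.125]
```

Each pass closes half the remaining distance to [2, 0], matching the last row of Table 3.1.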
In general, the method of alternating orthogonal projections can project onto any subspaces, not just time-limited or band-limited spaces; therefore the discrete Gerchberg algorithm is a special case of the method of alternating orthogonal projections.

3.2 Relationship to the Direct Method

There are many similarities between the direct method and the Gerchberg algorithm. As was pointed out in the previous section, they have the same problem setup requirements and are subject to the same errors. In fact, for the exactly determined case, the Gerchberg algorithm will converge to the same result given by the direct method (at least to the numerical precision of the computer). To explain why this is true, consider the exactly determined case where the known frequency samples are free from error. Neglecting computational roundoff errors, the direct method will provide the correct solution.¹ Furthermore, Youla [8] and Jones [9] have both shown that when the known samples are error free the Gerchberg algorithm will converge to the correct solution. Since there can only be one correct solution, the two methods must provide the same solution for the exactly determined error-free case.

¹Define the correct solution to be the exact solution to the system Ax = b, or in the case of a system perturbed by errors, the exact solution to Ax = b̂. The correct solution to the system Ax = b̂ is not necessarily the same as the correct solution to the error free system Ax = b.

Now consider the exactly determined case where the known frequency samples do contain errors. In this case there is a unique solution to the set of equations and the direct method will find it. This solution will not be the same as that given by error-free samples, but it will be the correct solution for the given frequency samples which have been corrupted by error.
The Gerchberg algorithm will also converge to the correct solution for the given frequency samples; it will find the one function which satisfies both the time-limited constraints and the known frequency samples simultaneously. In other words, the known portion of the frequency spectrum of f which contains error can be thought of as a portion of the error-free spectrum of another time-limited function f̂. The direct method and the Gerchberg algorithm will both find the correct solution for the given information: f̂.

Before demonstrating the equivalence of the direct method and the Gerchberg algorithm for the exactly determined case, a convergence factor will be defined. The purpose of defining a convergence factor is to track the rate of convergence of the Gerchberg algorithm. The convergence factor is defined as

    cf = ( ||f̂ − f̂_{n−1}|| − ||f̂ − f̂_n|| ) / ||f̂ − f̂_{n−1}||   (3.2)

where f̂ is the solution to which the Gerchberg algorithm will converge and f̂_n is the estimate of the solution after n iterations. Obviously, computing the convergence factor requires knowing the solution f̂ a-priori. When the vector estimate f̂_n is converging in a straight line toward f̂ the convergence factor can equivalently be defined as

    cf = ||f̂_n − f̂_{n−1}|| / ||f̂ − f̂_{n−1}||   (3.3)

where || · || indicates the standard Euclidean norm. When defined as in Eq. 3.3, the convergence factor represents the fraction of the distance to the solution closed at each iteration. For example, a constant convergence factor of 0.5 means the difference between the latest estimate and the solution is being cut in half with each iteration.

3.2.1 Example 3.2

The direct method and the Gerchberg algorithm are used to reconstruct the time-limited function

    f = [1 2 3 0 0 0 0 0]   (3.4)

from the first, second and eighth samples of its discrete frequency spectrum, which have been corrupted by noise. The known frequency samples are given as ĥ.
ĥ = h + noise, where

    h = [6.0000 + 0.0000i, 2.4142 − 4.4142i, 2.4142 + 4.4142i]   (3.5)

    noise = [0.0582 + 0.0000i, 0.0038 + 0.0176i, 0.0038 − 0.0176i]

    ĥ = [6.0582 + 0.0000i, 2.4180 − 4.3966i, 2.4180 + 4.3966i]   (3.6)

The solution f̂ given by the direct method is as follows (the imaginary parts are zero to the precision shown):

    f̂ = [1.1268, 1.8260, 3.1055, 0, 0, 0, 0, 0]   (3.7)

The Gerchberg algorithm is allowed to iterate 10000 times and the convergence factor (Eq. 3.3) is computed after each iteration using the f̂ obtained from the direct method above. The estimate given by the Gerchberg algorithm after 10000 iterations is

    f̂_10000 = [1.1268, 1.8260, 3.1055, 0, 0, 0, 0, 0]   (3.8)

The difference f̂ − f̂_10000 is on the order of 10^−16 times the size of f̂, which is numerically equivalent to zero for the double precision computer system on which the reconstructions were performed.

The convergence factor (Eq. 3.2) is plotted for the first 100 iterations in Figure 3.9. After about twenty iterations the convergence factor approaches a constant of about 0.0185.

Figure 3.9: Convergence factor vs. iteration number for Example 3.2.

The reason for the convergence factor eventually approaching a constant will be discussed in Chapter 4. For now, the fact that the convergence factor will eventually become a constant will be used to prove that for the exactly determined case the Gerchberg algorithm will converge to the same solution given by the direct method.

Assume that the Gerchberg algorithm is converging toward the direct method's solution f̂ (at a constant rate) after k iterations, with k finite. Then the distance between f̂ and f̂_k is given by

    ||f̂ − f̂_k|| =
||f̂ − f̂_{k−1}|| (1 − cf)   (3.9)

It will now be shown that

    ||f̂ − f̂_{k+n}|| = ||f̂ − f̂_k|| (1 − cf)^n   (3.10)

Since the convergence factor is assumed to be constant,

    ||f̂ − f̂_{k+1}|| = ||f̂ − f̂_k|| (1 − cf)   (3.11)

so Eq. 3.10 is true for n = 1. Now assume that it is true for some n ≥ 1. Then

    ||f̂ − f̂_{k+(n+1)}||   (3.12)
        = ||f̂ − f̂_{k+n}|| (1 − cf)   (3.13)
        = ||f̂ − f̂_k|| (1 − cf)^n (1 − cf)   (3.14)
        = ||f̂ − f̂_k|| (1 − cf)^{n+1}   (3.15)

Eq. 3.10 is therefore true for n + 1, so by induction it must be true for all n. Since 0 < cf < 1, the term (1 − cf)^n must converge to zero as n approaches infinity. Therefore ||f̂ − f̂_{k+n}|| must also converge to zero as n approaches infinity, and so the Gerchberg algorithm's estimate f̂_n must converge to the direct method's solution f̂ as the number of iterations approaches infinity, provided the convergence factor is constant.

In practice, the numerical precision of the computer will limit how closely the Gerchberg solution can converge to f̂. Figure 3.10 is a plot of the convergence factor for this example between iterations 2000 and 6000. This plot illustrates the convergence factor becoming unstable as f̂ − f̂_n nears the precision limit of the computer. At about 4800 iterations the Gerchberg algorithm finally reaches its numerical convergence limit and will converge no further.

Figure 3.10: Convergence to numerical limit of computer. [Convergence factor vs. iteration number, iterations 2000–6000.]
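The numbers of Example 3.2 can be reproduced with a few lines. The sketch below is my construction: it assumes A is the matrix of DFT coefficients for the known sample indices, with NumPy's unnormalized DFT convention, and solves the exactly determined direct-method system for the noisy samples of Eq. 3.6.

```python
import numpy as np

N, n = 8, 3
known = [0, 1, 7]                  # first, second, and eighth DFT samples
# Noisy known frequency samples, from Eq. 3.6.
h_noisy = np.array([6.0582 + 0.0000j, 2.4180 - 4.3966j, 2.4180 + 4.3966j])

# Direct method: solve the exactly determined 3x3 system A x = h for the
# unknown time samples, where A holds the DFT coefficients.
A = np.exp(-2j * np.pi * np.outer(known, np.arange(n)) / N)
x = np.linalg.solve(A, h_noisy)
print(np.round(x.real, 4))         # close to [1.1268, 1.8260, 3.1055] of Eq. 3.7
```

Because ĥ is conjugate-symmetric, the solution comes out real to machine precision, as Eq. 3.7 shows.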
In this case, the Gerchberg algorithm will provide a solution estimate but the correction energy, as defined by Gerchberg [2], will not converge to zero and the Gerchberg algorithm's estimate will not in general be the same as the least squares solution. Gerchberg [2] defined the correction energy as the amount by which the error energy is reduced with each correction, and he also showed that the correction energy must decrease with each iteration. Gerchberg [2] provided examples in which the correction energy failed to converge to zero and reasoned that this was caused by error on the known frequency samples. This is partly true, but the failure of the correction energy to converge to zero is also due to the fact that the system is overdetermined. This concept is illustrated by the following example. 66 3.3.1 Example 3.3 Consider the discrete time-limited function / and its discrete frequency spectrum '1• T 1 1 /= 0 0 0 0 .0 F == 3.0000 -h O.OOOOi 1.7071 - 1.707h' 0.0000 - l.OOOOi 0.2929 -t- 0.2929i l.OOOO-I-O.OOOOi 0.2929 - 0.2929i 0.0000 + l.OOOOi 1.7071 + 1.707h' (3.16) The frequency samples are corrupted by the following noise vector. noise = 0.3750 + O.OOOOi 1.1252 -t-0.3180i 0.7286 - 0.51121 -2.3775 - 0.0020Z -0.2738 4-1.6065i -2.3775 -F 0.0020^• 0.7286 H-0.5112i 1.1252 - 0.3180i (3.17) The Gerchberg algorithm was used to reconstruct / for an exactly determined case (using the first, second, and eighth corrupted frequency samples) and for an overdetermined case (using the third and seventh samples also). The correction energy was calculated at each iteration for the two cases and is plotted in Figure 3.11. It is apparent from the figure that the correction energy converged to zero for the exactly determined case despite significant error on the known frequency samples. Clearly, the correction energy converged to a non-zero value for the overdetermined case. 
While the least squares solution of the direct method finds the optimum solution to the overdetermined problem in the least squares sense, the Gerchberg algorithm 67 Correction Energy For Example 3.3 -T 1 1 1 1 0.2 1 r 800 900 0.18 0.16 0.14 0.12 0.1 0.08 0.06 Dotted Line:Overdetermined 0.04 0.02 0 0 Solid Line:Exactly Determined 100 200 300 400 500 600 700 Iteration Number Figure 3.11: Correction energy for Example 3.3. 1000 68 finds the optimum solution in terms of minimizing the correction energy. Which solution is better is a matter of debate, but it seems reasonable to suggest that they won't be radically different for most cases. As with the direct method, when the system is underdetermined (there are more unknown time domain samples than known frequency samples) there will not be a unique solution. The Gerchberg algorithm will provide a solution in this case but it is not guaranteed to be correct even if error free frequency samples are used for the reconstruction, therefore the underdetermined case should always be avoided. 3.4 Error Bounds for the Gerchberg Algorithm Youla [8] suggested a theoretical error bound based on the sin of the infimum of the angle between the subspaces Pb and ± Pa- For most applications this error bound would be difficult to visualize and more difficult to implement. For the exactly deter mined case, since the Gerchberg algorithm will converge to the solution given by the direct method, the error bounds given in Chapter 2 for the direct method also apply for the Gerchberg algorithm for the exactly determined case. These bounds (specifi cally the a-priori bounds given by Eq's. 2.20 and 2.23) are very easily implemented and were shown in Chapter 2 to be reasonably tight. 
For the overdetermined case, since the Gerchberg algorithm does not converge, in general, to the least squares solution, the least squares bounds would not necessarily apply to the Gerchberg algorithm, although it might be reasonable to use the least squares bound as an estimate of the bound for the Gerchberg algorithm. It also might be possible to use the correction energy to estimate the error in the solution given by the Gerchberg algorithm. This is a topic for future research.

3.5 Summary, Advantages and Disadvantages

This chapter examined the Gerchberg algorithm for super-resolution and related it to the method of alternating orthogonal projections and the direct method of Chapter 2. It was shown that for the exactly determined case, the Gerchberg algorithm will converge to the result given by the direct method. This relationship was used to establish easily computable a-priori error bounds for the Gerchberg algorithm. The correction energy described by Gerchberg [2] was reevaluated and linked to the overdetermined case.

The one obvious disadvantage of the Gerchberg algorithm is its slow convergence. It takes at least several thousand times as long for the Gerchberg algorithm to converge to within one percent of its final value as it takes for the direct method to find the solution. In general, a problem which the direct method can solve in milliseconds will take minutes for the Gerchberg algorithm to solve. To this point, it would not seem that there is any advantage to using the Gerchberg algorithm at all. For the exactly determined case and for the overdetermined error-free case, the Gerchberg method converges to the same solution given by the direct method, and for the overdetermined case with error, the Gerchberg algorithm converges to a solution which is worse (in the least squares sense) than the direct method's solution.
There is, however, one big advantage to using the Gerchberg algorithm when there is error on the known frequency samples, especially for problems where the condition number for the direct method is large: the Gerchberg algorithm can get closer to the true solution by terminating the iteration early. The reasons for this and the design of early termination schemes are discussed in the next chapter.

CHAPTER 4

TERMINATION SCHEMES FOR THE GERCHBERG ALGORITHM

Papoulis [7] states that the propagation of error through the Gerchberg algorithm can be controlled by early termination of the iteration. This is generally accepted as true. To date, however, there has been no explanation as to why early termination should limit the effects of error, and no criteria have been offered as to the optimal termination point. This chapter will expand the model of Jones [9] to explain the conditions under which early termination will limit the effects of error, and the results of extensive experimental trials will show that these conditions seldom fail to occur. These results may be used as a basis for the design of termination schemes. Finally, three specific termination schemes are presented and tested.

4.1 Convergence For The Gerchberg Algorithm

The following model was introduced by Jones [9] to analyze the rate of convergence of the Gerchberg algorithm. Define x_hat as the solution for the unknown time domain samples to which the Gerchberg algorithm converges as the number of iterations approaches infinity. Define Omega_n as the matrix of DFT coefficients relating the unknown frequency samples to the unknown time domain samples. The rows of Omega_n correspond to the unknown frequency samples and the columns of Omega_n correspond to the unknown time domain samples. Define P = (1/N) Omega_n^H Omega_n (H indicates conjugate transpose). Now the solution x_hat can be expanded in eigenvectors of the matrix P:

x_hat = SUM_{j=1}^{n} a_j v_j   (4.1)

where v_j is the jth eigenvector of P and the coefficient a_j is a real or complex number.
The following example illustrates the meaning of the matrix Omega_n. Consider again the time-limited function f with discrete frequency spectrum F:

f = [f1  f2  f3  0  0  0  0  0]   (4.2)

F = [F1  F2  F3  F4  F5  F6  F7  F8]   (4.3)

Suppose that only the first, second, and eighth frequency samples are known. The matrix Omega contains the DFT coefficients relating all of the frequency samples to all of the time domain samples; its entries are the coefficients W_8^{kl} = e^{-j 2 pi k l / 8}:

Omega = [ W_8^{kl} ],  k, l = 0, 1, ..., 7   (4.4)

Since the first three time domain samples are unknown, and the third through seventh frequency samples are unknown, Omega_n is the intersection of the first three columns and the third through seventh rows of Omega:

Omega_n = [ W_8^{kl} ],  k = 2, 3, ..., 6;  l = 0, 1, 2   (4.5)

From Jones [9], the estimate of x_hat after r iterations is given by

x_hat_r = SUM_{j=1}^{n} a_j (1 - lambda_j^r) v_j + P^r x_hat_0   (4.6)

where lambda_j is the eigenvalue corresponding to the jth eigenvector of P. From [9], since all eigenvalues of P satisfy 0 < lambda < 1, the term P^r x_hat_0 will converge to zero as r approaches infinity. From Eq. 4.6, it is apparent that the speed of convergence of each eigenvector is determined by the size of its corresponding eigenvalue. Eigenvectors with small corresponding eigenvalues will converge rapidly, while eigenvectors with large eigenvalues (close to 1) will converge slowly. Therefore, if the function x_hat is concentrated in the eigenvectors with small eigenvalues, it can be recovered quickly, and if x_hat is concentrated in eigenvectors with large eigenvalues it will take a large number of iterations to converge.

In practice, there will be error in the solution x_hat due to errors (noise and aliasing) on the known frequency samples. Hence, the solution can be represented as the sum of a true solution x and an error e:

x_hat = x + e   (4.7)

The true solution and the error can be expanded separately in eigenvectors of P.
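The matrices Omega and Omega_n of the example above can be constructed numerically as follows. This is a sketch under my own naming; the 1/N normalization of P is my assumption, chosen so that the eigenvalues fall strictly between 0 and 1 as the text requires.

```python
import numpy as np

N = 8
k = np.arange(N)
# full 8x8 DFT coefficient matrix (Eq. 4.4)
Omega = np.exp(-2j * np.pi * np.outer(k, k) / N)
# intersection of the first three columns and rows 3-7 (Eq. 4.5)
Omega_n = Omega[np.ix_([2, 3, 4, 5, 6], [0, 1, 2])]

# P built from Omega_n; the 1/N normalization is assumed here so that
# all eigenvalues satisfy 0 < lambda < 1
P = Omega_n.conj().T @ Omega_n / N
lam, V = np.linalg.eigh(P)                 # ascending eigenvalues
```

For this 8-point example the eigenvalues of P all lie strictly between 0 and 1, with one eigenvalue close to 1, which is the slowly converging direction discussed in the text.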
x_hat = x + e = SUM_{j=1}^{n} alpha_j v_j + SUM_{j=1}^{n} beta_j v_j   (4.8)

By terminating the algorithm early, the contribution to the solution x_hat from eigenvectors with large eigenvalues can be eliminated almost entirely. Therefore, whenever the error e is concentrated in the eigenvectors of P with large eigenvalues, the contribution from e can be minimized by terminating the algorithm early. The results from the next section will show that the error will usually be concentrated in these eigenvectors, thereby justifying early termination.

4.2 Results From Experimental Trials

Experimental trials were performed in order to determine how the true function x and the error e are usually distributed among the eigenvectors of P. These trials were made feasible by the fact that for the exactly determined case the Gerchberg algorithm will converge to the same result given by the direct method. Using the Gerchberg algorithm to obtain the final result to within a small tolerance would have required thousands of iterations per trial.

4.2.1 Procedure

For each trial a time-limited function f was generated by choosing the non-zero time domain samples randomly from a uniform distribution. Complex Gaussian noise samples¹ were then added to the frequency samples used for the reconstruction, and the solution for the unknown time domain samples x_hat was obtained by the direct method. The solution x_hat was divided into a true function x and an error function e. The functions x and e were expanded separately in eigenvectors of P:

x = SUM_{j=1}^{n} alpha_j v_j,    e = SUM_{j=1}^{n} beta_j v_j   (4.9)

¹ Each noise sample was the sum a + jb, where a and b were chosen randomly from a zero-mean Gaussian distribution.

The eigenvectors v_j were ordered such that their corresponding eigenvalues were in increasing order:

lambda_1 <= lambda_2 <= ... <= lambda_{n-1} <= lambda_n   (4.10)

In this way the first eigenvector v_1 converged fastest and the last eigenvector v_n was the slowest to converge. The fractions of the true function x and the error function e in the jth eigenvector of P were calculated as follows:

alpha'_j = |alpha_j|^2 / SUM_{i=1}^{n} |alpha_i|^2,    beta'_j = |beta_j|^2 / SUM_{i=1}^{n} |beta_i|^2   (4.11)

For a given set of values for m (number of known frequency samples), n (number of unknown time domain samples), and N (periodic length), 1000 independent trials were performed and expected values for alpha'_j and beta'_j were computed and tabulated.

4.2.2 Results

Table 4.1: Expected distribution of f and e over eigenvectors of P.

   N   n   m   alpha'_1  alpha'_2  alpha'_3  alpha'_4  alpha'_5  alpha'_6  alpha'_7
   8   3   3    0.6524    0.1730    0.1746
  16   3   3    0.6419    0.1761    0.1818
  16   5   5    0.5219    0.1124    0.1413    0.1112    0.1133
  32   3   3    0.6434    0.1761    0.1805
  32   5   5    0.5392    0.1138    0.1169    0.1141    0.1161
  32   7   7    0.4641    0.0823    0.1172    0.0836    0.0824    0.0891    0.0814
  64   3   3    0.6484    0.1700    0.1816
  64   5   5    0.5413    0.1142    0.1160    0.1154    0.1132
  64   7   7    0.4863    0.0844    0.0877    0.0855    0.0841    0.0848    0.0873
 128   3   3    0.6485    0.1755    0.1760
 128   5   5    0.5490    0.1168    0.1074    0.1127    0.1142
 128   7   7    0.4822    0.0870    0.0868    0.0843    0.0855    0.0854    0.0887
 256   3   3    0.6517    0.1738    0.1746
 256   5   5    0.5443    0.1117    0.1155    0.1113    0.1173
 256   7   7    0.4836    0.0892    0.0842    0.0851    0.0843    0.0876    0.0861

   N   n   m   beta'_1   beta'_2   beta'_3   beta'_4   beta'_5   beta'_6   beta'_7
   8   3   3    0.0930    0.2017    0.7053
  16   3   3    0.0318    0.1265    0.8418
  16   5   5    0.0045    0.0065    0.0172    0.1113    0.8604
  32   3   3    0.0111    0.0851    0.9037
  32   5   5    0.0003    0.0010    0.0054    0.0700    0.9232
  32   7   7    0.0000    0.0000    0.0001    0.0005    0.0042    0.0647    0.9304
  64   3   3    0.0037    0.0493    0.9470
  64   5   5    0.0000    0.0001    0.0015    0.0428    0.9555
  64   7   7    0.0000    0.0000    0.0000    0.0001    0.0013    0.0361    0.9626
 128   3   3    0.0010    0.0299    0.9691
 128   5   5    0.0000    0.0000    0.0004    0.0233    0.9763
 128   7   7    0.0000    0.0000    0.0000    0.0000    0.0003    0.0281    0.9716
 256   3   3    0.0004    0.0175    0.9822
 256   5   5    0.0000    0.0000    0.0001    0.0152    0.9847
 256   7   7    0.0000    0.0000    0.0000    0.0000    0.0001    0.0147    0.9852

N = periodic length of function, n = number of unknown time domain samples, m = number of known frequency samples, alpha'_j = expected fraction of f in jth eigenvector of P, beta'_j = expected fraction of e in jth eigenvector of P.
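A single trial of the procedure above can be sketched as follows. The code is mine, not the thesis software: the random-number seed, the assumed 1/N normalization of P, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m = 32, 3, 3                       # periodic length, unknowns, knowns
t_idx = np.arange(n)                     # non-zero (unknown) time samples
f_idx = np.arange(m)                     # known low-pass frequency samples

f_true = rng.uniform(0.0, 1.0, n)        # random time-limited function
F = np.fft.fft(np.r_[f_true, np.zeros(N - n)])
b = F[f_idx] + (rng.normal(size=m) + 1j * rng.normal(size=m))  # noisy samples

W = np.exp(-2j * np.pi / N)
A = W ** np.outer(f_idx, t_idx)          # DFT coefficients, unknowns -> knowns
x_hat = np.linalg.solve(A, b)            # direct method (exactly determined)
e = x_hat - f_true                       # error function

unknown_f = np.setdiff1d(np.arange(N), f_idx)
Omega_n = W ** np.outer(unknown_f, t_idx)
lam, V = np.linalg.eigh(Omega_n.conj().T @ Omega_n / N)   # ascending order
alpha = np.linalg.solve(V, f_true)       # expansion x = V alpha (Eq. 4.9)
beta = np.linalg.solve(V, e)             # expansion e = V beta
alpha_frac = np.abs(alpha) ** 2 / np.sum(np.abs(alpha) ** 2)   # Eq. 4.11
beta_frac = np.abs(beta) ** 2 / np.sum(np.abs(beta) ** 2)
```

Averaging alpha_frac and beta_frac over many such trials yields entries of the kind tabulated in Table 4.1.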
Table 4.1 contains results from trials using time-limited functions with a substantial dc component. The time domain samples of x were randomly chosen from the uniform distribution [0, 1]. The frequency samples used for the reconstruction were taken from the low-pass portion of the spectrum (including the dc frequency component).

The top half of Table 4.1 contains the expected distribution of x over the eigenvectors of P. The largest portion of x was found in the first eigenvector, which was the only eigenvector with a non-zero dc component. The function x is distributed fairly evenly among the remaining eigenvectors. Additional trials were performed using time-limited functions generated from the uniform distribution [-0.5, +0.5] (the expected value of the dc component for each trial was zero). For these trials the true function x was generally spread evenly among all of the eigenvectors of P.

The lower half of Table 4.1 contains the sample density of the error function e over the eigenvectors of P. From the table, it is apparent that the error tends to be concentrated in the last few eigenvectors (the slowly converging ones), and this tendency becomes very strong as N and n get large (as cond(A) gets large). This result is important because when the error is heavily concentrated in the last few eigenvectors its contribution to x_hat can be reduced greatly by terminating the Gerchberg algorithm early.

Table 4.2 illustrates how the distribution of the true function x among the eigenvectors of P changes as the location of the known frequency samples shifts away from the low-pass region.

Table 4.2: Change in distribution of f and e over eigenvectors of P as known frequencies move away from main lobe of frequency spectrum.

   N   n   m  1st   alpha'_1  alpha'_2  alpha'_3  alpha'_4  alpha'_5  alpha'_6
  64   6   6    2    0.4821    0.1296    0.0984    0.0987    0.0974    0.0938
  64   6   6   10    0.0953    0.1005    0.4813    0.1004    0.1322    0.0903
  64   6   6   20    0.0881    0.1002    0.1985    0.0931    0.0875    0.4326
 128   6   6    2    0.5006    0.1071    0.0978    0.0952    0.1000    0.0992
 128   6   6   10    0.4598    0.0945    0.1696    0.0917    0.0920    0.0924
 128   6   6   20    0.0935    0.1033    0.4780    0.0921    0.1357    0.0974
 128   6   6   30    0.1556    0.0899    0.0915    0.1186    0.4516    0.0928
 128   6   6   40    0.0901    0.1018    0.1981    0.0933    0.0912    0.4256

   N   n   m  1st   beta'_1   beta'_2   beta'_3   beta'_4   beta'_5   beta'_6
  64   6   6    2       ?         ?         ?         ?         ?         ?
  64   6   6   10    0.0013    0.0013    0.0094    0.0099    0.1990    0.7791
  64   6   6   20    0.0019    0.0021    0.0117    0.0204    0.2299    0.7341
 128   6   6    2    0.0000    0.0000    0.0001    0.0021    0.2067    0.7912
 128   6   6   10    0.0000    0.0000    0.0007    0.0029    0.1324    0.8639
 128   6   6   20    0.0003    0.0003    0.0043    0.0046    0.2067    0.7838
 128   6   6   30    0.0007    0.0009    0.0105    0.0136    0.4124    0.5618
 128   6   6   40    0.0005    0.0005    0.0063    0.0094    0.2460    0.7372

N = periodic length of function, n = number of unknown time domain samples, m = number of known frequency samples, 1st = sample number of first known frequency sample, alpha'_j = expected fraction of f in jth eigenvector of P, beta'_j = expected fraction of e in jth eigenvector of P.

The samples of x were again chosen randomly from the uniform distribution [0, 1] to ensure a substantial dc component. From Table 4.2 it can be seen that as the known frequencies move away from the low-pass region, the eigenvector containing the dc component of x shifts to the right. For cases where the known frequency samples are quite far from the low-pass region, the dc component is in the last (slowest converging) eigenvector. This means that for functions with a substantial dc component, it is important that the known frequencies be as close to the low-pass region as possible. In general, the reconstruction will be most effective if the known frequencies are as close to the center of the main lobe of the frequency spectrum as possible.
4.2.3 Aliasing Error

The previous trials have shown that error due to random noise is usually concentrated in the slower converging eigenvectors of P. It is more difficult to determine the effects of aliasing error experimentally. To introduce aliasing error into a trial one must start with a continuous time-limited function whose continuous frequency spectrum can be expressed mathematically by the Fourier Transform equation:

F(w) = INTEGRAL_{-inf}^{inf} f(t) e^{-jwt} dt   (4.13)

Therefore randomly generated signals are not applicable, and one quickly runs out of time-limited functions whose Fourier transforms are easily derived. In lieu of thousands of trials, this section provides two examples of the distribution of the error function e caused by aliasing.

For the first example consider the time-limited pulse function

f(t) = 1 for 0 <= t <= 1,  f(t) = 0 elsewhere   (4.14)

whose Fourier Transform is given by Equation 4.15:

F(w) = (1 - e^{-jw}) / (jw)   (4.15)

Five low frequency samples from F(w) were modeled as five samples of a periodic frequency spectrum 128 samples long with a sampling frequency of 4.2 Hz. The only error on the known frequency samples was the aliasing error as described in Section 2.3. The solution x_hat = x + e was computed by the direct method, and the true function x and the error function e were expanded separately in eigenvectors of P. The fractions of x and e in each eigenvector are given in the following table:

alpha'_1 = 0.9960   alpha'_2 = 0.0000   alpha'_3 = 0.0000   alpha'_4 = 0.0040   alpha'_5 = 0.0000
beta'_1  = 0.0015   beta'_2  = 0.0834   beta'_3  = 0.3665   beta'_4  = 0.0710   beta'_5  = 0.4476   (4.16)

For the second example consider the time-limited ramp function

f(t) = t for 0 <= t <= 1,  f(t) = 0 elsewhere   (4.17)

whose Fourier Transform is given by Eq. 4.18:

F(w) = ((1 + jw) e^{-jw} - 1) / w^2   (4.18)

Again five samples from F(w) were used to reconstruct x_hat = x + e into a vector 128 samples long. The true function x and the error e were expanded in eigenvectors of P, and the fractions of x and e in each eigenvector are given in the following table.
alpha'_1 = 0.5840   alpha'_2 = 0.4130   alpha'_3 = 0.0024   alpha'_4 = 0.0007   alpha'_5 = 0.0000
beta'_1  = 0.0010   beta'_2  = 0.1540   beta'_3  = 0.2478   beta'_4  = 0.2606   beta'_5  = 0.3366   (4.19)

The results from these two examples are not intended to be taken as a standard for error in the solution due to aliasing, but they do show that aliasing error can be a substantial problem. The error for these two examples is not nearly as well behaved as in the similar results shown for random noise (see Table 4.1: N = 128, m = n = 5). It is interesting to note that for these two examples the true function x is particularly well behaved in terms of being concentrated in the faster converging eigenvectors.

The results from this section indicate that the error in the solution due to input errors will usually be concentrated in the slowly converging eigenvectors of the matrix P, and as the size of the error increases the error becomes more heavily concentrated in these eigenvectors. By terminating the Gerchberg algorithm early, the contribution from these eigenvectors (and thus the error) can be minimized. However, terminating the algorithm early introduces its own error, because a portion of the true function x is also in the slowly converging eigenvectors. Therefore we need to find the point which minimizes the overall error; this is the optimal termination point.

4.3 Termination Schemes

In the past, the Gerchberg algorithm has been terminated by a human observer at his discretion [10]. We would like to have a termination scheme which automatically terminates the algorithm at or near the optimal termination point. Three possible termination schemes are suggested in this section. These schemes are not necessarily the only schemes, or even the optimal schemes, but they all make use of the results from the preceding section.

4.3.1 Convergence Factor

This method is based on computing the convergence factor (Eq. 2.3) between iterations and terminating the algorithm when it becomes a constant (or changes less than a certain threshold).
When the convergence factor is constant, there is only one eigenvector (the last eigenvector) which is still converging. This follows from Eq. 4.6, which relates the speed of convergence to the eigenvectors of P. Assuming all eigenvectors except the last eigenvector v_n have converged, and the term P^r x_hat_0 has converged to zero, Eq. 4.6 can be rewritten as:

x_hat - x_hat_r = lambda_n^r (alpha_n + beta_n) v_n   (4.20)

where x_hat_r is the estimate of x_hat after r iterations. Since lambda_n < 1, (4.20) will decrease by the constant factor lambda_n at each iteration, so x_hat - x_hat_r will also decrease by a constant factor at each iteration.

By terminating the algorithm when the convergence factor approaches a constant, the contribution from the last eigenvector will be minimized and the contributions from the other eigenvectors will be realized. Figure 3.9 from Chapter 3 shows a typical plot of the convergence factor. For that example, the algorithm would be terminated at around iteration number 20.

Since the convergence factor can be computed only for the exactly determined case, this termination scheme is of limited value. Also, since this scheme can reliably eliminate only the last eigenvector, it would be inappropriate for applications where more than one eigenvector would need to be eliminated (such as longer time-limited functions with substantial noise on the known frequency samples).

4.3.2 2nd Derivative of Energy

This method computes the total reconstructed energy after each iteration and terminates the algorithm when the slope of the energy function approaches a constant (its second derivative approaches zero). The justification for this method is based on the assumption that most of the true function will reside in the first few eigenvectors and the error will be concentrated in the last few eigenvectors. Hence, there should be a sharp increase in the reconstructed energy in the early iterations as the first few eigenvectors converge rapidly.
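The second-derivative criterion can be sketched as a simple stopping rule applied to the per-iteration energy sequence. The threshold, warm-up count, and the synthetic energy curve below are illustrative choices of mine, not values from the thesis.

```python
import numpy as np

def terminate_by_energy(energies, threshold, warmup=5):
    """Stop when the discrete second derivative of the reconstructed
    energy first drops below `threshold` in magnitude (a sketch; the
    threshold and warm-up count are illustrative, not thesis values)."""
    d2 = np.diff(energies, n=2)            # discrete second derivative
    for i in range(warmup, len(d2)):
        if abs(d2[i]) < threshold:
            return i + 2                   # index offset from differencing
    return len(energies) - 1               # criterion never met: run out

# synthetic energy curve: sharp early rise, then slow nearly-linear growth,
# mimicking the behavior described above
it = np.arange(200)
energies = 1000.0 * (1.0 - 0.8 ** it) + 0.5 * it
stop = terminate_by_energy(energies, threshold=0.1)
```

On this synthetic curve the rule fires once the fast exponential rise has died out and only the slow linear growth remains, which is exactly the situation the text associates with the slowly converging eigenvectors.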
After this point the energy should increase at a slow, nearly constant rate as the slower eigenvectors converge.

Figure 4.1: Example of reconstructed energy. Solid: total energy; dotted: energy due to true function.

The main advantage of the energy method proposed herein is its easy implementation: simply compute the energy after each iteration and compute an approximate second derivative as the algorithm moves along. When the magnitude of the second derivative drops below a specified threshold the algorithm is terminated. The main disadvantage of this method is its lack of a rigorous supporting theory.

An example of this scheme is shown in Figures 4.1-4.3. Figure 4.1 shows the total reconstructed energy from using noisy frequency samples, and the energy due to the true function f alone. Figure 4.2 shows the second derivative of the total reconstructed energy.

Figure 4.2: Second derivative of total reconstructed energy from Figure 4.1. Zero crossing at iteration #25.

The energy termination scheme would terminate the algorithm at the first zero crossing (iteration number 25). Figure 4.3 shows the normalized mean squared error for this example. It reaches its minimum at iteration number 10, and then increases steadily.

Figure 4.3: Normalized mean squared error between true solution and the Gerchberg algorithm's latest estimate. Minimum at iteration #10.

For this example the energy method works well because the mean squared error at iteration number 25 (the termination point given by the energy method) is close to the minimum mean squared error.

4.3.3 Statistically Optimum Termination

This method is based on performing experimental trials to determine one termination point which is the best on average.
The basic idea is to run large numbers of experimental trials using functions and errors similar to those which will be encountered in a particular application. One can then find the optimal termination point for each trial and choose the mean or the median as the statistically optimal termination point.

4.3.4 Comparison Between Termination Schemes

Experimental trials were performed to compare the performance of the different methods for two cases. Random time-limited functions and noise samples were generated as in the previous trials, and the Gerchberg algorithm was allowed to iterate up to 500 times for each trial. For each trial the minimum mean squared error and the iteration number at which it occurred were recorded. The termination points given by the various termination schemes were also recorded, along with their corresponding mean squared errors.

For the first case, the performance of the energy method and the statistically optimal method were tested for discrete functions with N = 64 and m = n = 9. After 1000 trials the mean was computed for the number of iterations by which each method missed the optimal termination point. The average mean squared error in the solution at those points was also computed. The energy method did slightly better on both measures, but neither method performed well consistently. On average, the energy method missed the optimal termination point by about 120 iterations, and the statistical method missed it by about 131 iterations.

For the second sample case, the energy and convergence factor methods were tested for discrete functions with N = 32 and m = n = 3. Again neither method proved to be extremely reliable. On average, the energy method missed the optimal termination point by about 76 iterations and the convergence factor method missed by about 91 iterations.
The two methods had an average mean squared error slightly less than twice the minimum mean squared error, so in this regard they did not perform too badly. It is possible that for some of the trials the energy method mistakenly terminated the algorithm early because the termination criterion coincidentally appeared before it should have. Errors such as these would normally be detected by a human observer, and the algorithm would be allowed to continue.

4.4 Summary

This chapter explained the conditions under which early termination of the Gerchberg algorithm will limit the effects of input error on the result. The results from experimental trials showed that these conditions can usually be expected, and the conditions generally become stronger as the size of the error increases. Three schemes were designed to automatically terminate the algorithm. These schemes were implemented and proved to be fairly reliable, but not perfect.

CHAPTER 5

THE SVD METHOD

This chapter presents a new super-resolution method which could make the Gerchberg algorithm obsolete. The first section presents a super-resolution method which combines the direct method of Chapter 2 and the eigenvector expansion of Chapter 4. Although this method is non-iterative, it is based on the early termination criterion of Chapter 4, and it achieves results nearly identical to the Gerchberg algorithm's optimal solution with great savings in work and time. Section 2 presents a similar, even more efficient method which is based on singular value decomposition (SVD) techniques. Section 3 compares the new SVD method to the Gerchberg algorithm by means of the familiar two point target example. Section 4 presents an example of the SVD method used for a 2-dimensional super-resolution problem. The final section provides explanations for the distribution of error and the change in the distribution of the true function as the known portion of the frequency spectrum moves away from the main lobe.
5.1 Eigenvector Expansion

In the previous chapter, the solution x_hat given by the Gerchberg algorithm (and by the direct method for the exactly determined case) was expanded in eigenvectors of the matrix P in order to show that the error is usually concentrated in one or a few undesirable eigenvectors. The contribution from these eigenvectors (and thus the error in the solution) can be minimized by terminating the algorithm early. Now, if the solution x_hat can be determined by the direct method and expanded in eigenvectors of P, and the undesirable eigenvectors can be identified, then these eigenvectors can be eliminated completely by simply recombining the expanded version of x_hat while omitting the components of x_hat in the undesirable eigenvectors.

For example, consider once again the discrete time-limited function f with discrete frequency spectrum F:

f = [f1  f2  f3  0  0  0  0  0]   (5.1)

F = [F1  F2  F3  F4  F5  F6  F7  F8]   (5.2)

Suppose the first, second, and eighth frequency samples are known but have been corrupted by noise. The direct method can be used to find the solution x_hat for the unknown time domain samples. Since the known frequency samples contain error, the solution x_hat will be the sum of a true solution x and an error function e. The solution x_hat is expanded in eigenvectors of P by solving the linear set of equations V a = x_hat for a:

[ v11  v12  v13 ] [ alpha_1 ]   [ x_hat_1 ]
[ v21  v22  v23 ] [ alpha_2 ] = [ x_hat_2 ]   (5.3)
[ v31  v32  v33 ] [ alpha_3 ]   [ x_hat_3 ]

so that

x_hat = alpha_1 [v11 v21 v31]^T + alpha_2 [v12 v22 v32]^T + alpha_3 [v13 v23 v33]^T

where the columns of V are the eigenvectors of P. Suppose that the error e is concentrated in the third term alpha_3 v_3 and that the true solution x is concentrated in the first two terms. The error in the solution can then be greatly reduced by simply throwing away the third term and estimating the solution x as the sum of the first two terms.
x_estimate = alpha_1 [v11 v21 v31]^T + alpha_2 [v12 v22 v32]^T   (5.4)

For the exactly determined case, the eigenvector expansion can find the estimated solution given at any iteration of the Gerchberg algorithm by weighting the eigenvectors in Eq. 5.3 by the terms a_j (1 - lambda_j^r) from Jones [9], where lambda_j is the eigenvalue corresponding to the jth eigenvector v_j and r is the iteration number. For the overdetermined case, the least squares solution (Eq. 2.27) can also be expanded in eigenvectors of P and the undesirable vectors can be thrown out as described above.

By eliminating the undesirable eigenvectors directly, the eigenvector expansion method accomplishes the same thing as the Gerchberg algorithm, and it achieves nearly identical results. The eigenvector expansion method, however, provides tremendous savings in work and time over the Gerchberg algorithm. The entire process consists of three steps. Step one is solving Ax = b for x_hat. Step two is expanding x_hat in eigenvectors of P, which requires solving another set of linear equations. Step three is recombining the expanded components of x_hat while omitting those components in the undesired eigenvectors.

5.2 SVD Expansion

The solution x_hat can also be expanded in right singular vectors of the matrix A. Recall from Chapter 2 that the matrix A consists of the DFT coefficients relating the known frequency samples to the unknown time domain samples. From [5], any m x n matrix A whose number of rows m is greater than or equal to its number of columns n can be written as the product of an m x n column-orthonormal matrix U, an n x n diagonal matrix W with non-negative elements in decreasing order down the diagonal, and the transpose of an n x n orthonormal matrix V:

A = U W V^T,   W = diag(w_1, w_2, ..., w_n)

The w_j's are the singular values of A. From [11], the columns of V (known as the right singular vectors of A) are an orthonormal set of eigenvectors of A^H A. The singular values w_j corresponding to the columns of V are the positive square roots of the eigenvalues of A^H A. The columns of V form an orthonormal set of vectors which span the space orthogonal to the nullspace of A, and the columns of U form an orthonormal set of vectors which span the range of A [12]. For our application, each column of A is linearly independent. Therefore if m >= n then A is full rank and its null space consists of the zero vector, so in this case the columns of V span R^n [13].

Experimental trials have shown that when the singular value decomposition is done in this way, the columns of V are similar (in some cases identical) to the orthonormal eigenvectors of the matrix P. This means that the error will usually be concentrated in the last few columns of V, just as it is concentrated in the last few eigenvectors of P. These are the columns of V which have the smallest corresponding singular values. Expanding the solution x_hat in column vectors of V (right singular vectors of A) and throwing away the vectors with small corresponding singular values accomplishes essentially the same thing as the eigenvector expansion described previously.

5.2.1 Advantages

The SVD expansion is preferable to the eigenvector expansion because it involves only the matrix A, whereas the eigenvector expansion requires constructing the matrix Omega_n, which can be very large for long functions. Extra matrix manipulations must also be performed to find the matrix P, and numerical errors occur with each extra computation. The entire process of computing x_hat, expanding it in column vectors of V, and eliminating the undesirable column vectors can be accomplished in one step by computing the following equation from [5]:

x_hat = V [diag(1/w_j)] [U^T b]   (5.5)

If the 1/w_j terms are left unaltered, the equation above will compute the least squares solution for Ax = b.
Setting the largest 1/w_j terms to zero will eliminate the contribution from the undesirable column vectors of V to the solution x_hat. The following expansion of Eq. 5.5 illustrates the process:

x_hat = V [diag(1/w_j)] [U^T b]
      = (1/w_1)(U_1^T b) V_1 + (1/w_2)(U_2^T b) V_2 + ... + (1/w_n)(U_n^T b) V_n   (5.6)

where V_j and U_j denote the jth columns of V and U. From the final form of the equation above it is clear that x_hat is a linear combination of the right singular vectors V_j. It is also clear that the contribution from a particular V_j can be eliminated by setting the corresponding 1/w_j to zero.

Equation 5.5 is the main result of this thesis. The entire discrete-time super-resolution problem has been reduced to finding the SVD of the matrix A and performing a simple matrix multiplication. There are no linear equations to solve and no inverses to compute. This method accomplishes essentially the same thing as terminating the Gerchberg algorithm early: it eliminates the components of the solution which are dominated by error. Therefore, the SVD method can provide results nearly identical to the optimal solution given by the Gerchberg algorithm with huge savings in time and work.

5.3 Example: Two Point Target

This example will be used to compare the performance of the SVD method to the Gerchberg algorithm for a two point target. This is the same example which was used by Gerchberg in his original paper [2], with the exception of the noise samples (Gerchberg used uniform noise; this example uses Gaussian noise).

The original time-limited object and its discrete frequency spectrum are shown in Figures 5.1 and 5.2 respectively. The original object is a two point target. Each point of the target consists of 2 samples, and the points are separated by a distance of 7 samples.
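Eq. 5.5 maps directly onto a standard SVD routine. The sketch below is my code, not the thesis software; numpy's svd returns the singular values in the decreasing order assumed by Eq. 5.5, and conjugate transposes are used since A is complex. The 8-point data at the end are a small illustrative case, not an example from the thesis.

```python
import numpy as np

def svd_superres(A, b, n_drop):
    """Eq. 5.5 with the 1/w_j terms for the n_drop smallest singular
    values set to zero."""
    U, w, Vh = np.linalg.svd(A, full_matrices=False)  # w in decreasing order
    inv_w = np.zeros_like(w)
    keep = len(w) - n_drop
    inv_w[:keep] = 1.0 / w[:keep]                     # zeroed 1/w_j terms
    return Vh.conj().T @ (inv_w * (U.conj().T @ b))

# illustrative 8-point case: three unknown time samples reconstructed
# from the first, second, and eighth frequency samples (plus small noise)
N, n = 8, 3
W8 = np.exp(-2j * np.pi / N)
A = W8 ** np.outer([0, 1, 7], np.arange(n))
b = A @ np.array([1.0, 1.0, 1.0]) + np.array([0.05, -0.02 + 0.03j, 0.01 - 0.04j])
x_full = svd_superres(A, b, 0)     # all terms kept: the least squares solution
x_trunc = svd_superres(A, b, 1)    # smallest singular value thrown out
```

With n_drop = 0 the routine reproduces the ordinary (least squares) solution; with n_drop = 1 the component along the weakest right singular vector, where the error tends to concentrate, is removed.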
The total periodic length of the function is 256 samples. As in [2], all that is known of the original object are the 37 low-pass samples of its frequency spectrum, which have been corrupted by random noise, and the exact extent of the object in the time domain (9 samples).

Figure 5.1: Original time-limited object.
Figure 5.2: Discrete frequency spectrum of original object.

The known portion of the frequency spectrum is shown in Figure 5.3, and the image given by this noisy, diffraction-limited spectrum is shown in Figure 5.4. The image in Figure 5.4 has lost all information about the two point nature of the original object due to noise and diffraction in the frequency domain. Note that we did not zero the samples outside the time-limited region, as would normally be done. The energy outside the time-limited region is the correction energy which was described in Chapter 3.

Figure 5.3: Known portion of frequency spectrum distorted by noise.
Figure 5.4: Image from noisy, diffraction-limited spectrum.

First, the Gerchberg algorithm's optimal solution was found. The algorithm was allowed to iterate 1000 times, and the norm of the error between the original object and the algorithm's estimate over the time-limited region was computed after each iteration.

Figure 5.5: Error norm for Gerchberg algorithm (minimum error norm = 999.5 at 154 iterations).
A plot of the error norm versus iteration number is shown in Figure 5.5, from which it appears the minimum error norm was about 1000. The actual minimum error norm was found to be 999.5, and it occurred at iteration number 154.

The Gerchberg algorithm was restarted and allowed to iterate 154 times. The result given by the Gerchberg algorithm after 154 iterations is shown in Figure 5.6; this is the best result obtainable using the Gerchberg algorithm.^

Figure 5.6: Result from Gerchberg algorithm after 154 iterations (norm of error = 999.5).

Next, the SVD method was used for the same problem. Figure 5.7 shows the result obtained by the SVD method after eliminating 6 right singular vectors. This result is nearly identical to Figure 5.6, with a slightly lower error norm. Figure 5.8 shows the result obtained by the SVD method after eliminating 5 right singular vectors. Although the error norm for this result is slightly larger than that for Figure 5.6, the result is still a reasonable approximation to the original object.

Figure 5.7: Result from SVD method, 6 singular values thrown out (norm of error = 989.9).
Figure 5.8: Result from SVD method, 5 singular values thrown out (norm of error = 1030.4).

^The portion of the result outside the known time-limited region would normally be set to zero. This portion of the function is the error energy which was discussed in Chapter 3.

To provide a fair comparison of the time and work requirements of the two methods, a 'stripped-down' version of the Gerchberg algorithm was used for this problem.
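Such a stripped-down iteration can be sketched as follows. This is a minimal NumPy sketch under our own variable names and toy sizes; the thesis's actual Matlab implementation is not reproduced here.

```python
import numpy as np

def gerchberg(b_known, known_idx, support_idx, n, n_iter):
    """Alternate between enforcing the known time-limited support
    and restoring the measured low-pass frequency samples."""
    spectrum = np.zeros(n, dtype=complex)
    spectrum[known_idx] = b_known
    support = np.zeros(n, dtype=bool)
    support[support_idx] = True
    for _ in range(n_iter):
        x = np.fft.ifft(spectrum)
        x[~support] = 0.0                # impose the time-limited extent
        spectrum = np.fft.fft(x)
        spectrum[known_idx] = b_known    # restore the measured samples
    return np.fft.ifft(spectrum).real

# Toy run: a short object, noise-free low-pass measurements.
n = 64
support_idx = np.arange(4, 9)
x_true = np.zeros(n)
x_true[support_idx] = [1.0, 2.0, 3.0, 2.0, 1.0]
known_idx = np.r_[0:8, 57:64]            # 15 low-pass DFT samples
b_known = np.fft.fft(x_true)[known_idx]
x_est = gerchberg(b_known, known_idx, support_idx, n, 200)
```

Each pass costs two length-n FFTs, which is where the iteration counts quoted below come from.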
The stripped-down Gerchberg algorithm took about 8 seconds to complete 154 iterations, about 800 times as long as the 10 milliseconds required by the SVD method. In terms of computing work, the SVD method required 9144 flops. The Gerchberg algorithm required 3,895,740 flops for 154 iterations, an increase by a factor of about 426 over the SVD method.

The savings in time and computing work for this example are probably uncharacteristically low. The signal to noise ratio for this example was less than 10 dB, which is pretty dismal. As the SNR increases, the Gerchberg algorithm requires many more iterations to reach its optimal result, while the SVD method will not require any additional time or work. For the extreme case when there is no error, the SNR will be infinite and theoretically the Gerchberg algorithm will require an infinite number of iterations to reach its optimal solution. Therefore, much greater savings in time and work can be expected for applications with a reasonably high signal to noise ratio.

5.4 2-D Example

This example is included to show that the SVD method is applicable to multidimensional super-resolution problems. The original object (shown in Figure 5.9) was known to be space-limited to a 7 x 7 square in the center of the 128 x 128 image plane. The discrete spatial frequency spectrum of the original object is shown in Figure 5.10. The frequency spectrum was corrupted by complex Gaussian noise samples,^ as shown in Figure 5.11, and a 15 x 15 section of the noisy frequency spectrum (shown in Figure 5.12) was used for the reconstruction. The image given by the known portion of the frequency spectrum before the reconstruction is shown in Figure 5.13. This blurred image gives no indication of the four distinct points of the original object. The SVD method was used to reconstruct the image from the known portion of the noisy frequency spectrum.
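The reduction of a 2-D problem of this kind to the same $Ax = b$ form can be sketched as follows. For a separable 2-D DFT, if $G_k$ is the block of the 1-D DFT matrix with rows at the known frequencies and columns on the support, the known spectrum of an object $x$ is $X_k = G_k\,x\,G_k^T$, so $\mathrm{vec}(X_k) = (G_k \otimes G_k)\,\mathrm{vec}(x)$. The NumPy sketch below uses our own toy sizes, not the 128 x 128 case from the thesis.

```python
import numpy as np

n, s = 16, 3                               # toy grid and support sizes
F = np.fft.fft(np.eye(n))                  # n x n DFT matrix
support = np.arange(s)                     # support rows/columns
known = np.r_[0:3, 14:16]                  # 5 known frequency rows/columns
Gk = F[np.ix_(known, support)]             # partial DFT block (5 x 3)

# The 2-D system matrix is the Kronecker product of the 1-D blocks.
A = np.kron(Gk, Gk)                        # (25 x 9)

rng = np.random.default_rng(0)
x = rng.normal(size=(s, s))                # space-limited object
Xk = Gk @ x @ Gk.T                         # known portion of its 2-D spectrum
b = Xk.flatten(order='F')                  # column-major vec

# b = A vec(x): the 2-D problem is the same linear system as the
# 1-D case, so the truncated-SVD solution of Eq. 5.5 applies unchanged.
```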
The best result (lowest error norm) was obtained using 8 singular vectors of $A$ and is shown in Figure 5.14. The image in Figure 5.14 clearly shows 4 distinct peaks and is a great improvement over the blurred image in Figure 5.13. The four point nature of the original object was also clearly visible in the results using 6, 7, 9, and 10 singular vectors. These results are shown in Figures 5.15, 5.16, 5.17, and 5.18 respectively.

^Each noise sample was the sum $a + jb$, where $a$ and $b$ were chosen randomly from the Gaussian distribution $f(x) = e^{-x^2}$.

Figure 5.9: Original space-limited object.
Figure 5.10: Discrete spatial frequency spectrum of original object.
Figure 5.11: Frequency spectrum distorted by noise.
Figure 5.12: Known portion of frequency spectrum.
Figure 5.13: Image from noisy, diffraction-limited spectrum.
Figure 5.14: SVD result using 8 singular vectors.
Figure 5.15: SVD result using 6 singular vectors.
Figure 5.16: SVD result using 7 singular vectors.
Figure 5.17: SVD result using 9 singular vectors.
Figure 5.18: SVD result using 10 singular vectors.

5.5 Error Concentration

The results from earlier sections have shown that as the size of the error increases, it becomes increasingly concentrated in the right singular vectors of $A$ with small corresponding singular values. It was also shown that as the known frequencies move away from the main lobe of the frequency spectrum, the true function $x$ becomes increasingly concentrated in these same singular vectors. In this section the nature of the SVD solution itself will be used to partially explain why these two phenomena occur.
Again consider the problem $Ax = b$ and the equivalent problem $A(x + e) = (b + r)$, where $e$ is the error in the solution and $r$ is the error on the known frequency samples. The error $e$ is given by the SVD solution as:

$$e = V \cdot \mathrm{diag}(1/w_j) \cdot U^T r = \frac{U_1 r}{w_1}\,V_1 + \frac{U_2 r}{w_2}\,V_2 + \cdots + \frac{U_n r}{w_n}\,V_n \eqno(5.7)$$

where $U_j$ denotes the $j$th row of $U^T$ (the transpose of the $j$th column of $U$). From the final form of Eq. 5.7 it is clear that the error $e$ is a linear combination of the right singular vectors ($V_j$'s) of $A$. The contribution to $e$ from each $V_j$ is given by the coefficient $(1/w_j)U_j r$. Using the Schwarz inequality and the fact that the $U_j$'s are orthonormal, the size of the contribution from each right singular vector $V_j$ is limited by the following formula:

$$\left\| \frac{U_j r}{w_j}\,V_j \right\| \le \frac{\|r\|}{w_j} \eqno(5.8)$$

Eq. 5.8 implies that if the error norm $\|e\|$ is large compared to $\|r\|$, then the error $e$ must have large components from singular vectors with small corresponding singular values.

The SVD expansion can also be used to explain the change in the distribution of the true function $x$ among the $V_j$'s as the known frequencies move away from the main lobe of the frequency spectrum. The true function $x$ can be expanded in terms of the SVD components, as was done above for the error $e$. Omitting the intermediate steps, $x$ can be expressed as the following linear combination of right singular vectors of $A$:

$$x = \frac{U_1 b}{w_1}\,V_1 + \frac{U_2 b}{w_2}\,V_2 + \cdots + \frac{U_n b}{w_n}\,V_n \eqno(5.9)$$

Again using the Schwarz inequality and the fact that the $U_j$'s are orthonormal, the contribution to $x$ from each $V_j$ is limited by the following formula:
$$\left\| \frac{U_j b}{w_j}\,V_j \right\| \le \frac{\|b\|}{w_j} \eqno(5.10)$$

As the known portion of the frequency spectrum $b$ moves away from the main lobe, its size, given by $\|b\|$, decreases. The size of $x$ remains constant, however, so as $\|b\|$ decreases, larger coefficients $(1/w_j)U_j b$ are required to reconstruct $x$. Since the terms with large singular values $w_j$ are severely limited by Eq. 5.10, the signal is effectively forced into the singular vectors with small corresponding singular values.

5.6 Summary

This chapter introduced two new super-resolution methods, the eigenvector expansion method and the SVD method. Both of these methods are based on solving the linear system $Ax = b$. It was shown that the eigenvector expansion method is a generalization of the discrete Gerchberg algorithm. Hence the eigenvector method can directly determine the result given by the Gerchberg algorithm at any iteration, with huge savings in time and work. The SVD method was introduced and shown to do essentially the same thing as the eigenvector expansion method, and the use of SVD techniques makes this method numerically superior to the eigenvector expansion method. An example was used to compare the performance of the SVD method to the Gerchberg algorithm. A second example applied the SVD method to a 2-dimensional super-resolution problem. Finally, the SVD solution was expanded to partly explain the expected distribution of the true function $x$ and the error $e$ among the right singular vectors of $A$.

CHAPTER 6

SUMMARY AND CONCLUSION

This thesis has introduced several new concepts and methods for super-resolution. A new super-resolution method, which we call the direct method, was introduced in Chapter 2. This method is the first to recognize that, in the discrete form, the super-resolution problem can be reduced to solving a linear set of equations relating the known frequency samples to the unknown time or space domain samples.
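Concretely, that linear system can be sketched as follows. This is a NumPy sketch with hypothetical sizes and indices of our own choosing, not the thesis's implementation.

```python
import numpy as np

n = 32
support = np.arange(10, 15)       # object known to be time-limited to 5 samples
known = np.r_[0:4, 29:32]         # 7 measured low-pass frequency samples
F = np.fft.fft(np.eye(n))         # full DFT matrix
A = F[np.ix_(known, support)]     # rows: known frequencies, cols: support

x_true = np.array([1.0, 3.0, 5.0, 3.0, 1.0])
b = A @ x_true                    # what the instrument measures

# Overdetermined and noise-free: least squares recovers the object exactly.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
```

With noisy b, solving this system naively amplifies the error, which is what motivates the early-termination and truncated-SVD ideas summarized here.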
By taking full advantage of the inherent structure of the discrete case, the direct method achieves super-resolution very quickly and efficiently. The direct method can be generalized to provide a least squares solution for the overdetermined case. The main drawback of the direct method is its sensitivity to input errors.

The Gerchberg algorithm for super-resolution was examined in Chapter 3. The discrete Gerchberg algorithm is an iterative method for solving the same set of equations which the direct method solves directly. For the exactly determined case, the discrete Gerchberg algorithm converges to the same solution given by the direct method. For the overdetermined case, the discrete Gerchberg algorithm converges to the solution which minimizes the correction energy, which is not usually the same as the least squares solution.

Chapter 4 presented a mathematical justification for early termination of the Gerchberg algorithm to limit the effects of errors. The conditions under which early termination would minimize the effects of errors were outlined, and the results from experimental trials showed that these conditions seldom fail to occur. Three termination schemes designed to limit the effects of error were introduced and tested.

In Chapter 5 two new super-resolution methods were introduced, the eigenvector expansion method and the SVD method. The eigenvector expansion method is a noniterative generalization of the discrete Gerchberg algorithm based on solving the set of linear equations directly. The eigenvector expansion method can directly determine the solution given by the Gerchberg algorithm at any iteration. The SVD method was shown to accomplish essentially the same thing. Due to the SVD techniques which it employs, the SVD method is faster and numerically more accurate than the eigenvector expansion method. The SVD method has overcome the most significant drawback of the Gerchberg algorithm, its slow convergence speed.
The savings in time and computational work which it provides over the Gerchberg algorithm are huge. What might take hours for the Gerchberg algorithm to accomplish can be done in seconds with the new SVD method. The savings in time might be especially crucial to the prospects for real-time super-resolution applications.

REFERENCES

[1] J. L. Harris, "Diffraction and Resolving Power," Journal of the Optical Society of America, vol. 54, no. 7, pp. 931-936, July 1964.

[2] R. W. Gerchberg, "Super-resolution through error energy reduction," Optica Acta, vol. 21, pp. 709-720, Sept. 1974.

[3] K. E. Atkinson, An Introduction to Numerical Analysis. New York, NY: John Wiley & Sons, 1978.

[4] T. J. Aird and R. E. Lynch, "Computable Accurate Upper and Lower Error Bounds for Approximate Solutions of Linear Algebraic Systems," ACM Transactions on Mathematical Software, vol. 1, no. 3, pp. 217-231, Sept. 1975.

[5] W. H. Press et al., Numerical Recipes. Cambridge, UK: Cambridge University Press, 1986.

[6] The MathWorks, Matlab: A Tutorial. Natick, MA: The MathWorks, Inc., 1985.

[7] A. Papoulis, "A New Algorithm in Spectral Analysis and Band-Limited Extrapolation," IEEE Transactions on Circuits and Systems, vol. CAS-22, no. 9, pp. 735-742, Sept. 1975.

[8] D. C. Youla, "Generalized Image Restoration by the Method of Alternating Orthogonal Projections," IEEE Transactions on Circuits and Systems, vol. CAS-25, no. 9, pp. 694-702, Sept. 1978.

[9] M. C. Jones, "The Discrete Gerchberg Algorithm," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 3, pp. 624-626, June 1986.

[10] I. Sadka and H. Ur, "On the Application of Cadzow's Extrapolation Method of BL Signals," in International Conference on Acoustics, Speech, and Signal Processing, 1992.

[11] R. A. DeCarlo, Linear Systems: A State Variable Approach with Numerical Implementation. Englewood Cliffs, NJ: Prentice-Hall, 1989.

[12] H. S. Tharp, "A Numerical Algorithm for Chained Aggregation and Modified Chained Aggregation," M.S. thesis, University of Illinois, Urbana-Champaign, 1983.

[13] F. Ayres, Theory and Problems of Matrices. New York, NY: McGraw-Hill, 1962.
