University of Huddersfield Repository

Cooper, Philip

Rational approximation of discrete data with asymptotic behaviour

Original Citation

Cooper, Philip (2007) Rational approximation of discrete data with asymptotic behaviour. Doctoral thesis, University of Huddersfield.

This version is available at http://eprints.hud.ac.uk/2026/

Rational Approximation of Discrete Data with Asymptotic Behaviour

Philip Cooper

A thesis submitted to the University of Huddersfield in partial fulfilment of the requirements for the degree of Doctor of Philosophy

The University of Huddersfield in collaboration with the National Physical Laboratory.

June 2007

TABLE OF CONTENTS

List of Tables
List of Figures

Chapter 1: Introduction
  1.1 Metrology
  1.2 Linear approximation of discrete data
    1.2.1 Norms
  1.3 Best approximations
  1.4 $\ell_1$ approximation
  1.5 Chebyshev approximation
  1.6 Least-squares approximation
    1.6.1 QR Factorisation
    1.6.2 Orthogonal polynomials
  1.7 Rational approximation forms
  1.8 Properties of Rational Functions
    1.8.1 Existence of poles
    1.8.2 Numerical considerations
    1.8.3 Nonlinearity
    1.8.4 Asymptotic limits
    1.8.5 Decay to a constant value as $x \to \pm\infty$
    1.8.6 Decay to zero as $x \to \pm\infty$
    1.8.7 Approximating limiting behaviour of the type $x^k$ as $x \to \pm\infty$
    1.8.8 Approximation of double sided asymptotic limits
  1.9 Example applications from industry
    1.9.1 The Blasius equation
    1.9.2 Mesopic efficiency functions

Chapter 2: Existing methods for fitting rational approximations
  2.1 Introduction
  2.2 The Loeb algorithm
    2.2.1 Least-squares approximation with the Loeb algorithm
    2.2.2 Ill-conditioning associated with the Loeb algorithm
  2.3 The Differential Correction Method (DCM)
  2.4 Other work on rational approximation
  2.5 Padé approximation
  2.6 Appel’s algorithm
  2.7 Non-Uniform Rational B-Splines
  2.8 Classical non-linear optimization techniques
    2.8.1 Newton’s method
    2.8.2 The Gauss-Newton method
    2.8.3 The Levenberg-Marquardt algorithm
  2.9 Summary

Chapter 3: Extensions of the Loeb Algorithm
  3.1 Introduction
  3.2 Asymptotically constrained approximation using Loeb’s algorithm
    3.2.1 Adequate representation of asymptotic behaviour by the data
    3.2.2 Approximation of the data
    3.2.3 Numerical results
    3.2.4 Extrapolation of the data
  3.3 Semi-infinite rational splines
    3.3.1 Continuity conditions at the knot
    3.3.2 Satisfying the continuity requirements
    3.3.3 Least-squares SIRS approximation using Loeb’s algorithm
    3.3.4 Changing the position of the knot $\lambda$
  3.4 Variable numerator SIRS
    3.4.1 Examples of fitting the variable numerator SIRS
    3.4.2 Optimal choice for $\lambda$
  3.5 Conclusion

Chapter 4: Asymptotic Polynomials
  4.1 Introduction
  4.2 Asymptotic polynomials
  4.3 Approximation with asymptotic polynomials
    4.3.1 Fixed auxiliary parameters
    4.3.2 Auxiliary parameters as additional approximation parameters
    4.3.3 Elimination of the parameters $\mathbf{a}$
    4.3.4 Choice of initial values of the auxiliary parameters
  4.4 Example applications
    4.4.1 Photometric data
    4.4.2 Approximation of sigmoid function
  4.5 Modelling two asymptotic limits simultaneously
  4.6 Concluding remarks

Chapter 5: Pole Free Least-Squares Rational Approximation
  5.1 Introduction
  5.2 Positive denominator rational approximation
  5.3 Requirements for a strictly positive denominator
  5.4 Implementing positive denominator constraints
  5.5 Parameter evaluation
    5.5.1 Ill-conditioning issues
    5.5.2 Consideration of other types of constraint
    5.5.3 Pole-free SIRS approximations
  5.6 Conclusion

Chapter 6: $\ell_p$ Rational Approximation
  6.1 Introduction
  6.2 The Lawson algorithm, and the Rice and Usow extension
  6.3 Application of Lawson’s algorithm to $\ell_p$ Rational approximation
  6.4 A combined Rice-Usow-Loeb Algorithm
  6.5 A Lawson-Loeb algorithm
  6.6 Conclusion

Chapter 7: Conclusions
  7.1 Introduction
  7.2 The Semi-Infinite Rational Spline
  7.3 Asymptotic polynomial approximation
  7.4 Pole free rational approximation
  7.5 $\ell_p$ Rational Approximation
  7.6 Radial basis function rational approximation

Bibliography

LIST OF TABLES

2.1 Effect on condition number of choices of orthogonal polynomial basis functions.
3.1 Comparison of results from constrained and unconstrained Loeb algorithms for approximation of $\tanh(x)$.
3.2 Comparison of results from constrained and unconstrained Loeb algorithms for approximation of $0.5x(1 + e^{-x})^{-1}$.
3.3 Comparison of SIRS ($\lambda = 0$) and polynomial ratio approximations to $f(x)$ fitted with Loeb’s algorithm.
3.4 The effect of variation of $\lambda$ on the SIRS approximation.
3.5 Comparison of $R^{4,5}_{5,4}$ ($\lambda = 0$) and degree (5,4) polynomial ratio approximations.
3.6 The effect of variation of $\lambda$ on the SIRS approximation.
3.7 Comparison of fixed knot SIRS fitted with Loeb algorithm, with variable knot SIRS fitted with L-M algorithm.
4.1 Norm of update step $\mathbf{p}$ in Newton and Gauss-Newton methods for the photopic efficiency function example (Fig. 4.8).
5.1 Error norms and iterations to convergence of pole-free rational approximations.
6.1 Degree 3 $\ell_p$ approximation of $\log(x + 3)$ using Lawson’s algorithm.
6.2 Degree 3 $\ell_p$ approximation of $\tanh(x)$ using Lawson’s algorithm.
6.3 Degree 6 $\ell_p$ approximation of $e^{-x^2}$ using Lawson’s algorithm.
6.4 Degree (5, 5) $\ell_p$ rational approximation of $\tanh(x)$ using Rice-Usow-Loeb algorithm.
6.5 Comparison of approximations obtained using the method of Watson and the RUL algorithm.
6.6 Degree (5, 5) $\ell_\infty$ rational approximations using Lawson-Loeb algorithm.
7.1 Approximation of $\log(x + 3)$ using Loeb’s algorithm.
7.2 Approximation of $e^{x} + e^{-0.5x}$ using Loeb’s algorithm.

LIST OF FIGURES

1.1 Comparison of $\ell_1$ and $\ell_2$ polynomial approximations of data containing outliers.
1.2 Equioscillation property of a linear $\ell_\infty$ approximation error.
1.3 Degree 8 $\ell_2$ polynomial approximation of $f(x) = \tan(x)$ at 30 equally spaced points on the interval $[0, \pi]$.
1.4 Degree (2,2) rational $\ell_2$ approximation of $f(x) = \tan(x)$ at 30 equally spaced points on the interval $[0, \pi]$.
1.5 Degree (3, 5) $\ell_2$ rational approximation of $f(x) = e^{-x^2}$ containing normally distributed noise.
1.6 The effect of the pole at $x = 3.823$ on the approximation of Figure 1.5.
1.7 Artificial data exhibiting double-sided asymptotic behaviour sampled from $f(x) = x(1 + e^{-x})^{-1}$.
1.8 Experimental data from the mesopic efficiency experiment.
2.1 The effect of increasing a control point’s weight.
3.1 $y = \tanh(x)$ on the interval $[-1, 1.5]$.
3.2 $y = \tanh(x)$ on the interval $[-1, 4]$.
3.3 Error of degree (4, 4) polynomial ratio approximations to $\tanh(x)$ obtained with constrained and unconstrained Loeb algorithms.
3.4 Approximation error of degree (5, 4) polynomial ratios fitted using constrained and unconstrained Loeb algorithms.
3.5 Extrapolation error of degree (4, 4) polynomial ratio approximations to $\tanh(x)$.
3.6 Extrapolation error of degree (5, 4) polynomial ratio approximations to $0.5x(1 + e^{-x})^{-1}$.
3.7 Error curves from degree (5,5) polynomial ratio and SIRS approximations to $f(x) = 2 + \tan(x)$.
3.8 Error curves from degree (5,4) polynomial ratio and $R^{4,5}_{5,4}$ approximations to $f(x)$.
4.1 The effect of varying the auxiliary parameter $t$ of the weight function $w(x, \mathbf{b})$ with $s = 1$, $k = 2$ held constant.
4.2 The effect of varying the auxiliary parameter $s$ of the weight function $w(x, \mathbf{b})$ with $t = 0$, $k = 2$ held constant.
4.3 The effect of varying the auxiliary parameter $k$ of the weight function $w(x, \mathbf{b})$ with $s = 1$, $t = 0$ held constant.
4.4 First four orthogonal asymptotic polynomials $\tilde{\phi}_j(\mathbf{b}, x)$ generated with $\mathbf{b} = (3, 0, 4)^T$.
4.5 601 experimental measurements obtained from material properties of aluminium.
4.6 Artificial data with asymptotic limits.
4.7 Artificial data with asymptotic limits.
4.8 Asymptotic polynomial approximation of photopic efficiency function.
4.9 Polynomial of degree 6 and asymptotic polynomial of degree 3 fits to a sigmoid curve.
4.10 Polynomial of degree 9 and asymptotic polynomial of degree 6 fits to 601 measurements of material properties (for aluminium).
4.11 Approximation errors of degree 10 spline and standard asymptotic polynomial fits to $\tanh(x)$.
5.1 Degree [4,4] pole-free rational approximation to $e^{0.1x}\cos(2x)$.
6.1 Lawson algorithm approximations to data with outliers, for various $p < 2$.
6.2 Error plots of $\ell_4$ and $\ell_{16}$ rational approximations to $\tanh(x)$ using the Rice-Usow-Loeb algorithm.
6.3 Error plot of $\ell_\infty$ rational approximation to $\tanh(x)$ using the Lawson-Loeb algorithm.

ABSTRACT

This thesis is concerned with the least-squares approximation of discrete data that appear to exhibit asymptotic behaviour. In particular, we consider using rational functions as they are able to display a number of types of asymptotic behaviour. The research is biased towards the development of simple and easily implemented algorithms that can be used for this purpose. We discuss a number of novel approximation forms, including the Semi-Infinite Rational Spline and the Asymptotic Polynomial. The Semi-Infinite Rational Spline is a piecewise rational function, continuous across a single knot, and may be defined to have different asymptotic limits at $\pm\infty$. The continuity constraints at the knot are implicit in the function definition, and it can be fitted to data without the use of constrained optimisation algorithms. The Asymptotic Polynomial is a linear combination of weighted basis functions, orthogonalised with respect to a rational weight function of nonlinear approximation parameters.
We discuss an efficient and numerically stable implementation of the Gauss-Newton method that can be used to fit this function to discrete data. A number of extensions of the Loeb algorithm are discussed, including a simple modification for fitting Semi-Infinite Rational Splines, and a new hybrid algorithm that is a combination of the Loeb algorithm and the Lawson algorithm (including its Rice and Usow extension), for fitting $\ell_p$ rational approximations. In addition, we present an extension of the Rice and Usow algorithm to include $\ell_p$ approximation for values $p < 2$. Also discussed is an alternative representation of a polynomial ratio denominator that allows pole-free approximations to be fitted to data with the use of unconstrained optimisation methods. In all cases we present a large number of numerical applications of these methods to illustrate their usefulness.

Chapter 1

INTRODUCTION

The main focus of this thesis is the fitting of least-squares approximations to discrete data using rational functions. In this chapter we introduce some of the methods and notation that will be used throughout the thesis and outline the industrial requirements that have motivated this study. We start with a general introduction to the concept of linear approximation and describe common methods used for fitting linear functions to data. We then go on to define generalised rational functions, and describe some of the properties that make them particularly suitable for certain approximation problems. Such problems include the approximation of data or functions that contain singularities, or exhibit asymptotic behaviour such as exponential decay. We then highlight some of the disadvantages and difficulties associated with the use of rational functions for data fitting.
Finally we discuss the aims of this project and why it is important to the area of metrology, and include a brief overview of some real-world data fitting problems that are particularly well suited to approximation by rational functions.

1.1 Metrology

Metrology is the scientific study of measurement, and the development of methods for modelling measurement data is an active area of research in this field. A common problem that arises in metrology is the need to model the functional relationship between two variables $x$ and $f$ that is represented by a set of $m$ discrete pairs of data points $\{(x_i, f_i)\}_{i=1}^{m} \subset \mathbb{R}^2$ obtained by experimental measurement.

A practical example of such a situation is the modelling of the relationship between temperature and pressure of a gas contained in a fixed volume. In this example, each of the $x_i$ values represents a specified temperature at which a measurement of the pressure is made and recorded as a value $f_i$. The $x_i$ values are considered as fixed, exact values, while the $f_i$ are treated as inexact values due to the presence of some measurement error. In practice, there are also measurement errors in the $x_i$ values, but for many approximation problems, including those presented in this thesis, we regard the $x_i$ values as fixed and without error.

The general situation in metrology that we consider is one in which the $x_i$ are known values at which the experimenter is able to record a measurement of the quantity of interest $f_i$. Many of the datasets that we use within this project were obtained in this manner. The contamination of the $f_i$ values can be explained by a number of physical causes, such as imperfections in the measurement instrument, poor calibration, or mechanical defects. Given the data, we are then faced with the problem of trying to find a function that is able to approximate it well.
We can represent the noise in the data mathematically as

\[ f_i = h(x_i) + \epsilon_i, \tag{1.1} \]

where $h(x)$ represents an unknown function that explains the true functional relationship underlying the data, and $\epsilon_i$ is a component of measurement error associated with the measurement $f_i$. We treat $h(x)$ as a function that describes the data in the absence of any error, and it is this function that we then try to approximate.

For any particular problem there may be experimental or theoretical knowledge that will assist in the approximation process, or at least suggest an appropriate choice of function to approximate with. For example, it would be more appropriate to fit experimental data exhibiting exponential decay with a Gaussian function rather than a polynomial, or to fit periodic data using trigonometric polynomials rather than standard polynomials.

It may also be that the statistical distribution of the measurement errors $\epsilon_i$ is known. This is useful knowledge, and it affects the choice of method used to approximate the data, as we will explain in more detail later.

1.2 Linear approximation of discrete data

As described previously, we consider the problem of finding a function that approximates a discrete set of data points $\{(x_i, f_i)\}_{i=1}^{m} \subset \mathbb{R}^2$. The function chosen to approximate the data is denoted by $F(x)$ and is called an approximation form, and is usually chosen from a particular class of functions. Although the functional form is considered to be fixed, $F(x)$ will have a large degree of flexibility due to its dependence on a number of variable coefficients called approximation parameters. The values of these parameters are initially unknown and need to be evaluated in such a way that $F(x)$ approximates the data as well as possible.

The function $F(x)$ is called a linear approximation if it can be expressed in the form

\[ F(\mathbf{a}, x) = \sum_{j=1}^{n} a_j \phi_j(x), \tag{1.2} \]

where $\{\phi_j(x)\}_{j=1}^{n}$ is a set of specified linearly independent basis functions and $\mathbf{a} = (a_1, \ldots, a_n)^T \in \mathbb{R}^n$ is the vector whose elements are the approximation parameters. It is usually the case that $n \le m$, so that there are at least as many data points as unknown approximation parameters. Commonly used basis functions are orthogonal polynomials, splines, radial basis functions [48], and trigonometric functions, as these choices result in a linear form $F(\mathbf{a}, x)$ that is simple to evaluate, differentiate, and integrate. Linear approximation forms (1.2) are widely used in data fitting as they are linear with respect to their approximation parameters, and as a consequence these parameters are often easy to evaluate.

Once an approximation form $F(\mathbf{a}, x)$ has been chosen, it is necessary to find a method of calculating the parameters $\mathbf{a}$ that will yield a good approximation to the data. In order to achieve this, a means of assessing the quality of an approximation is needed, which will enable the distinction between good and poor approximations to be made. At any individual point $x_i$ we define the approximation error, or residual, as

\[ e_i = f_i - F(\mathbf{a}, x_i), \tag{1.3} \]

and define the error or residual vector $\mathbf{e}$ as

\[ \mathbf{e} = (e_1, \ldots, e_m)^T. \tag{1.4} \]

The smaller the residuals, the better the approximation, and so we try to find values of $\mathbf{a}$ that will minimise the error vector in some way.

1.2.1 Norms

A semi-norm is a function $g : \mathbb{R}^n \to \mathbb{R}$ that satisfies the following properties for all $\mathbf{c}, \mathbf{d} \in \mathbb{R}^n$:

1. $g(\mathbf{c}) \ge 0$.
2. $g(\lambda \mathbf{c}) = |\lambda|\, g(\mathbf{c})$, for all $\lambda \in \mathbb{R}$.
3. $g(\mathbf{c} + \mathbf{d}) \le g(\mathbf{c}) + g(\mathbf{d})$.

A norm is a function $g$ that qualifies as a semi-norm and also satisfies the property $g(\mathbf{c}) = 0 \Leftrightarrow \mathbf{c} = \mathbf{0}$. The norm of a vector $\mathbf{c}$ is commonly represented by $\|\mathbf{c}\|$, and we will use this notation throughout the thesis. Norms are multi-dimensional generalisations of the absolute value function, and provide a means with which to measure the size of error vectors and hence the quality of approximations.
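The linear form (1.2) and the residual vector (1.3)-(1.4) introduced in Section 1.2 can be illustrated with a minimal Python sketch. The monomial basis, the parameter values and the data below are illustrative assumptions only, and the helper names (`monomial_basis`, `linear_form`, `residuals`) are hypothetical rather than taken from the thesis.

```python
def monomial_basis(n):
    """Return n monomial basis functions 1, x, ..., x**(n-1) (an assumed choice)."""
    return [lambda x, j=j: x ** j for j in range(n)]


def linear_form(a, basis, x):
    """Evaluate F(a, x) = sum_j a_j * phi_j(x), as in (1.2)."""
    return sum(aj * phi(x) for aj, phi in zip(a, basis))


def residuals(a, basis, xs, fs):
    """Residual vector e with e_i = f_i - F(a, x_i), as in (1.3)-(1.4)."""
    return [f - linear_form(a, basis, x) for x, f in zip(xs, fs)]


# Made-up example: F(a, x) = 1 + 2 x^2 compared with four data points.
basis = monomial_basis(3)
a = [1.0, 0.0, 2.0]
xs = [-1.0, 0.0, 1.0, 2.0]
fs = [3.0, 1.5, 3.0, 8.0]
e = residuals(a, basis, xs, fs)
print(e)
```

Any norm of the list `e` can then be used to judge how well the chosen parameters fit the data.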
For discrete approximation problems, the most common choices of norm are the $\ell_p$ and $\ell_\infty$ norms, which are defined as

\[ \|\mathbf{e}\|_p = \left[ \sum_{i=1}^{m} |e_i|^p \right]^{1/p}, \quad \text{for } 1 \le p < \infty, \tag{1.5} \]

and

\[ \|\mathbf{e}\|_\infty = \max_i |e_i|, \quad i = 1, \ldots, m, \tag{1.6} \]

respectively.

1.3 Best approximations

If possible, it would be desirable to find an approximation that minimises the error norm over every possible choice of the approximation parameters. If there exists a parameter vector $\mathbf{a}^* \in \mathbb{R}^n$ such that

\[ \|\mathbf{f} - \mathbf{F}(\mathbf{a}^*, \mathbf{x})\| \le \|\mathbf{f} - \mathbf{F}(\mathbf{a}, \mathbf{x})\| \quad \forall\, \mathbf{a} \in \mathbb{R}^n, \tag{1.7} \]

where

\[ \mathbf{f} = (f_1, \ldots, f_m)^T, \tag{1.8} \]

\[ \mathbf{x} = (x_1, \ldots, x_m)^T, \tag{1.9} \]

\[ \mathbf{F}(\mathbf{a}, \mathbf{x}) = (F(\mathbf{a}, x_1), \ldots, F(\mathbf{a}, x_m))^T, \tag{1.10} \]

then we refer to $F(\mathbf{a}^*, x)$ as a best approximation to $\mathbf{f}$. Provided that the basis functions are linearly independent, a best approximation will always exist whatever the norm [46], but uniqueness of a best approximation will depend on the particular norm that is being used, and on the choice of basis functions used to define $F(\mathbf{a}, x)$. Uniqueness has been proved for best $\ell_2$ and $\ell_\infty$ approximations (the latter provided the basis functions form a Chebyshev set) [11, 46], but the best linear $\ell_1$ approximation is not necessarily unique. For some practical problems it may not be possible or even necessary to find a best approximation (although one will always exist), and often an approximation that represents the data well and has a small error norm is deemed satisfactory.

1.4 $\ell_1$ approximation

$\ell_1$ approximations are usually fitted to data that are known to be largely accurate, but contain a small number of wild points or outliers. An $\ell_1$ approximation has a tendency to ignore these outliers, resulting in small error values at the accurate data points and large errors at the outliers. This property is clearly illustrated by the $\ell_1$ approximation in Figure 1.1. For given abscissae and data vectors $\mathbf{x} \in \mathbb{R}^m$ and $\mathbf{f} \in \mathbb{R}^m$ (where $f_i = f(x_i)$ and $i = 1, \ldots, m$), we now state without proof a characterization theorem for a best $\ell_1$ polynomial approximation.
Theorem 1 ($\ell_1$ Characterization Theorem) Let $F(\mathbf{a}^*, x)$ be an arbitrary polynomial approximation of degree $n$. We define the set $A = \{x_i \,;\, F(\mathbf{a}^*, x_i) = f_i\}$, and define $s^*$ as the sign function

\[ s^*(x_i) = \begin{cases} 1, & f_i > F(\mathbf{a}^*, x_i), \\ 0, & f_i = F(\mathbf{a}^*, x_i), \\ -1, & f_i < F(\mathbf{a}^*, x_i). \end{cases} \]

The element $F(\mathbf{a}^*, x)$ is a best $\ell_1$ approximation to $\mathbf{f}$ if and only if

\[ \left| \sum_{i=1}^{m} s^*(x_i)\, F(\mathbf{a}, x_i) \right| \le \sum_{x_i \in A} |F(\mathbf{a}, x_i)| \]

for all $\mathbf{a} \in \mathbb{R}^{n+1}$.

This theorem is taken from [46] (in which a proof can also be found), and has been modified to accommodate the notation used in this chapter. An additional characteristic of a best polynomial $\ell_1$ approximation is that it will interpolate the data at no fewer than $n + 1$ points, where $n$ is the degree of the approximating polynomial. There are a number of methods of fitting a best $\ell_1$ approximation, including the widely used algorithm of Barrodale and Roberts [9], which is the algorithm that is used in this thesis for fitting linear $\ell_1$ approximations.

[Figure 1.1: Comparison of $\ell_1$ and $\ell_2$ polynomial approximations of data containing outliers. Horizontal axis: X; vertical axis: f(X); legend: data, $\ell_1$ approximation, $\ell_2$ approximation.]

1.5 Chebyshev approximation

For a given approximation problem where it is known that the data are accurate, contain no outliers, and that the error in the data is uniformly distributed, it is appropriate to fit the data by minimizing the $\ell_\infty$ norm of the residual vector. Approximations obtained in this way are commonly referred to as Chebyshev, minimax, or uniform approximations. The existence and uniqueness of best Chebyshev approximations can be proved for the case when the approximation form is linear [11]. In addition, the best linear Chebyshev approximation has the following characterization theorem, a proof of which can be found in [46].
Theorem 2 (Chebyshev Characterization Theorem) Let $F(\mathbf{a}^*, x)$ be an element from a linear approximation space of degree $n$, spanned by basis functions $\{\phi_j(x)\}_{j=1}^{n}$ that form a Chebyshev set. $F(\mathbf{a}^*, x)$ is a best Chebyshev approximation to $\mathbf{f}$ if and only if there exist $n + 1$ points $\{\gamma_1, \ldots, \gamma_{n+1}\} \subset \mathbf{f}$ with $\gamma_i < \gamma_{i+1}$ that satisfy

1. $|F(\mathbf{a}^*, \gamma_i) - \gamma_i| = \|\mathbf{F}(\mathbf{a}^*, \mathbf{x}) - \mathbf{f}\|_\infty$, $\quad i = 1, \ldots, n + 1$.
2. $F(\mathbf{a}^*, \gamma_i) - \gamma_i = -(F(\mathbf{a}^*, \gamma_{i+1}) - \gamma_{i+1})$, $\quad i = 1, \ldots, n$.

This alternation property exhibited by the Chebyshev error function is sometimes referred to as the equioscillation property and is illustrated in Figure 1.2. There are a number of methods of fitting Chebyshev approximations to data, of which the most widely used is probably the algorithm of Barrodale and Phillips [8]. This is the algorithm of choice for any linear Chebyshev approximation problems that are presented in this thesis (unless explicitly stated otherwise).

[Figure 1.2: Equioscillation property of a linear $\ell_\infty$ approximation error. Horizontal axis: X; vertical axis: approximation error (scale $\times 10^{-6}$).]

1.6 Least-squares approximation

A least-squares approximation refers to an approximation that has been fitted with respect to the $\ell_2$ norm (also known as the Euclidean or least-squares norm). Least-squares approximation is the primary focus of this thesis, and for this reason it is discussed in greater detail than approximation with respect to other norms. Although we are mainly concerned with nonlinear approximations, we begin with a discussion of linear least-squares problems and solution methods, as many nonlinear solution methods require solving a set of linearised problems as part of an iterative process.

For data that are subject to uncorrelated errors that are believed to follow a Normal distribution with mean zero and constant variance, it is appropriate to fit a least-squares approximation.
Under these assumptions on the error distribution, it can be proved (for the case of linear least-squares approximation) that the maximum likelihood estimates for the approximation parameters are identical to those obtained using the least-squares method.

The least-squares norm of the error vector (1.4) is defined as

\[ \|\mathbf{e}\|_2 = \left[ \sum_{i=1}^{m} \left( f_i - F(\mathbf{a}, x_i) \right)^2 \right]^{1/2}, \tag{1.11} \]

and is the quantity we wish to minimise over all possible approximation parameters $\mathbf{a}$. The function (1.11) has the same minimiser as its square, which is easier to work with, and we express this in matrix form as

\[ E(\mathbf{a}) = \|\mathbf{f} - C\mathbf{a}\|_2^2 = (\mathbf{f} - C\mathbf{a})^T (\mathbf{f} - C\mathbf{a}), \tag{1.12} \]

where $C$ is the $m \times n$ observation matrix, defined as the matrix having $(i, j)$th element $C_{ij} = \phi_j(x_i)$. This is a quadratic function of the elements of the parameter vector, and will have a turning point at $\mathbf{a}$ if $\mathbf{a}$ is a solution of the set of equations

\[ \frac{\partial E(\mathbf{a})}{\partial a_k} = 0, \quad k = 1, \ldots, n. \tag{1.13} \]

The system of equations (1.13) is referred to as the normal equations, which have a global solution given by

\[ \mathbf{a} = (C^T C)^{-1} C^T \mathbf{f}. \tag{1.14} \]

The normal equations have a unique solution of the form (1.14) only if the observation matrix $C$ is of full rank. This may not be the case when the data contain repeated abscissae values $x_i$. The parameter vector solution to the normal equations gives a turning point of the function $E$, but we need to verify that it is a global minimum. The first partial derivatives of $E$ with respect to the parameters can be expressed in matrix form as

\[ \frac{\partial E}{\partial \mathbf{a}} = -2 C^T (\mathbf{f} - C\mathbf{a}). \tag{1.15} \]

The $n \times n$ Hessian matrix of $E$ is defined as the matrix having $(i, j)$th element

\[ H_{ij} = \frac{\partial^2 E}{\partial a_i \, \partial a_j}, \tag{1.16} \]

and can be represented with respect to the observation matrix $C$ as

\[ H = 2 C^T C. \tag{1.17} \]

$H$ is symmetric positive definite if the observation matrix $C$ is of full rank. As has been mentioned previously, this will be the case provided the abscissae are distinct and the basis functions are linearly independent.
Therefore, under these conditions, the solution vector $\mathbf{a}$ given by the normal equations is the minimiser of $E(\mathbf{a})$. This solution must be unique as $E$ is a quadratic function of the approximation parameters with a positive definite Hessian, and therefore has a unique stationary point.

1.6.1 QR Factorisation

A potential problem with the normal equations (1.13) is that they may suffer from ill-conditioning, because the condition number of the matrix $C^T C$ is the square of the condition number of the observation matrix $C$. This will have little effect if the basis functions are chosen to be orthogonal polynomials, but for other basis functions it can pose a serious problem. If we are presented with an ill-conditioned system of normal equations, solving them via matrix inversion will exacerbate the problem. This can be avoided with the use of QR factorisation. The $m \times n$ observation matrix $C$ can be factorised as

\[ C = Q \begin{pmatrix} R \\ 0 \end{pmatrix}, \tag{1.18} \]

where $Q$ is an $m \times m$ orthogonal matrix, $R$ is an $n \times n$ upper triangular matrix, and $0$ is the $(m - n) \times n$ zero matrix. Given the matrix $Q$, we can also factorise $\mathbf{f}$ as

\[ \mathbf{f} = Q \begin{pmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{pmatrix}, \tag{1.19} \]

with $\boldsymbol{\theta}_1 \in \mathbb{R}^n$ and $\boldsymbol{\theta}_2 \in \mathbb{R}^{m-n}$, from which we can rewrite (1.12) as

\[ E(\mathbf{a}) = \left\| Q \left( \begin{pmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix} \mathbf{a} \right) \right\|_2^2. \tag{1.20} \]

Due to the fact that the $\ell_2$ norm of a vector is invariant with respect to multiplication by an orthogonal matrix, this reduces to

\[ E(\mathbf{a}) = \left\| \begin{pmatrix} \boldsymbol{\theta}_1 \\ \boldsymbol{\theta}_2 \end{pmatrix} - \begin{pmatrix} R \\ 0 \end{pmatrix} \mathbf{a} \right\|_2^2. \tag{1.21} \]

Thus $E(\mathbf{a})$ is minimised when $\mathbf{a}$ is a solution to the equation

\[ R\mathbf{a} = \boldsymbol{\theta}_1, \tag{1.22} \]

and its minimum value is given by $\|\boldsymbol{\theta}_2\|_2^2$. Using QR factorisation does not require matrix inversion, as equation (1.22) can be solved by back substitution. It is also possible to form the QR factorisation where $Q \in \mathbb{R}^{m \times n}$ is formed of orthonormal columns, and $R \in \mathbb{R}^{n \times n}$ is a square upper triangular matrix [24]. There are a number of ways to obtain a QR factorisation of a matrix, some more numerically stable than others. A discussion of a number of such methods and their numerical stability can be found in [24].
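As an illustration of the QR route to the least-squares solution, the following Python sketch forms the thin factorisation (with $Q \in \mathbb{R}^{m \times n}$), computes $\boldsymbol{\theta}_1 = Q^T \mathbf{f}$ and solves (1.22) by back substitution. It is a minimal sketch under stated assumptions, not the implementation used in the thesis: the factorisation is obtained by modified Gram-Schmidt purely for brevity (Householder-based methods are generally regarded as more stable), the observation matrix is assumed to have full column rank, and the function names and the straight-line example data are hypothetical.

```python
import math


def thin_qr(C):
    """Thin QR of an m x n matrix (list of rows) by modified Gram-Schmidt.

    Returns Q (m x n, orthonormal columns) and R (n x n, upper triangular).
    Assumes C has full column rank, so no diagonal entry of R is zero.
    """
    m, n = len(C), len(C[0])
    cols = [[C[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Project out the component along the k-th orthonormal column.
            R[k][j] = sum(q * x for q, x in zip(q_cols[k], v))
            v = [x - R[k][j] * q for x, q in zip(v, q_cols[k])]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R


def back_substitute(R, b):
    """Solve the upper triangular system R a = b."""
    n = len(b)
    a = [0.0] * n
    for i in reversed(range(n)):
        s = sum(R[i][k] * a[k] for k in range(i + 1, n))
        a[i] = (b[i] - s) / R[i][i]
    return a


def least_squares_qr(C, f):
    """Minimise ||f - C a||_2 by solving R a = theta_1 with theta_1 = Q^T f."""
    Q, R = thin_qr(C)
    n = len(R)
    theta1 = [sum(Q[i][j] * f[i] for i in range(len(f))) for j in range(n)]
    return back_substitute(R, theta1)


# Made-up example: fit a_1 + a_2 x to data lying exactly on f = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
f = [1.0, 3.0, 5.0, 7.0]
C = [[1.0, x] for x in xs]   # observation matrix with C_ij = phi_j(x_i)
a = least_squares_qr(C, f)
print(a)
```

Because the example data lie exactly on a straight line, the computed parameters should agree with the intercept 1 and slope 2 up to rounding error.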
1.6.2 Orthogonal polynomials

The ill-conditioning of the normal equations is dependent on the type of basis functions used. As an example, the use of monomial basis functions for least-squares approximation typically results in an ill-conditioned observation matrix, particularly when the abscissae are spaced uniformly. This ill-conditioning can be improved with the use of different polynomial basis functions such as orthogonal polynomials. Given vectors $\mathbf{f}, \mathbf{g}, \mathbf{h} \in \mathbb{R}^m$, an inner product is defined as a function from $\mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$, whose result is denoted by $\langle \mathbf{f}, \mathbf{g} \rangle$, and which satisfies

1. $\langle \mathbf{f}, \mathbf{f} \rangle \geq 0$, with equality when $\mathbf{f} = 0$
2. $\langle \mathbf{f}, \mathbf{g} \rangle = \langle \mathbf{g}, \mathbf{f} \rangle$
3. $\langle \mathbf{f}, a\mathbf{g} + b\mathbf{h} \rangle = a\langle \mathbf{f}, \mathbf{g} \rangle + b\langle \mathbf{f}, \mathbf{h} \rangle$, where $a, b \in \mathbb{R}$.

A simple example of such a function is the standard scalar product in $\mathbb{R}^m$ defined by

\[ \langle \mathbf{f}, \mathbf{g} \rangle = \sum_{j=1}^{m} f_j g_j. \]

The definition of an inner product also applies when the arguments are continuous functions rather than vectors. For example, the following function taking arguments $f, g \in C[-1, 1]$, defined by

\[ \langle f, g \rangle = \int_{-1}^{1} f(x) g(x) w(x) \, dx, \]

also satisfies the axioms for an inner product, where $w(x) > 0$ is a weight function on $[-1, 1]$. A set of $n$ polynomial basis functions $\{\phi_j(x)\}_{j=1}^{n}$ are said to be orthogonal polynomials if

\[ \langle \phi_l(x), \phi_k(x) \rangle = c_{lk}\,\delta_{lk}, \]

where $c_{lk}$ is a constant and $\delta_{lk}$ is the Kronecker delta symbol that takes the value 1 when $l = k$, and 0 when $l \neq k$. If the constant $c_{lk}$ is equal to 1 for all values of $l, k$ then the polynomials are orthonormal. Use of orthogonal polynomial basis functions usually results in a well-conditioned observation matrix. Among the most widely used orthogonal polynomials are the Chebyshev polynomials, which are defined on $[-1, 1]$ and are orthogonal with respect to the inner product

\[ \langle \phi_l(x), \phi_k(x) \rangle = \int_{-1}^{1} \frac{\phi_l(x)\,\phi_k(x)}{(1 - x^2)^{1/2}} \, dx. \tag{1.23} \]

Specifically these are referred to as Chebyshev polynomials of the first kind [40], and the degree $n$ Chebyshev basis function is denoted by $T_n(x)$.
As well as having good numerical properties, the basis functions $T_n(x)$ also naturally exhibit the equioscillation property between $-1$ and $1$ on the interval $[-1, 1]$. The function $2^{1-n} T_n(x)$ is the monic polynomial of degree $n$ that is the best Chebyshev approximation to zero on $[-1, 1]$, and this property makes Chebyshev polynomials useful for fitting best Chebyshev approximations to data. There are a number of other orthogonal polynomials that are widely used, including Hermite, Legendre, Laguerre, and Chebyshev polynomials of the second, third, and fourth kind [38],[39],[40]. All orthogonal polynomials also have a general three-term recurrence relation

\[ \phi_i(x) = (a_i x - b_i)\,\phi_{i-1}(x) - c_i\,\phi_{i-2}(x), \]

where the constants $a_i, b_i, c_i$ depend on the type of orthogonal polynomial. In the case of Chebyshev polynomials, this recurrence relation is given by

\[ T_n(x) = 2x\,T_{n-1}(x) - T_{n-2}(x), \tag{1.24} \]

with $T_0(x) = 1$ and $T_1(x) = x$.

1.7 Rational approximation forms

We define a generalised rational function of degree $(n, m)$ as

\[ R^n_m(\mathbf{p}, \mathbf{q}, x) = \frac{P_n(\mathbf{p}, x)}{Q_m(\mathbf{q}, x)}, \tag{1.25} \]

where

\[ P_n(\mathbf{p}, x) = \sum_{i=0}^{n} p_i \phi_i(x), \qquad Q_m(\mathbf{q}, x) = \sum_{j=0}^{m} q_j \psi_j(x), \]

$\{\phi_i(x)\}_{i=0}^{n}$ and $\{\psi_j(x)\}_{j=0}^{m}$ are sets of linearly independent basis functions, and

\[ \mathbf{p} = (p_0, \ldots, p_n)^T \in \mathbb{R}^{n+1}, \tag{1.26} \]
\[ \mathbf{q} = (q_0, \ldots, q_m)^T \in \mathbb{R}^{m+1}, \tag{1.27} \]

are vectors of approximation parameters. This defines a broad class of rational functions, but in this thesis we are mainly concerned with polynomial ratios using monomial or orthogonal polynomial basis functions. In addition, we will only be concerned with rational approximation forms (1.25) with real-valued approximation parameters and basis functions. Data fitting for rational functions proceeds in the same way as it does in the linear case, that is, we calculate approximation parameters $\mathbf{p}, \mathbf{q}$ that minimise

\[ \|\mathbf{f} - \mathbf{R}^n_m(\mathbf{p}, \mathbf{q}, \mathbf{x})\|, \tag{1.28} \]

where

\[ \mathbf{f} = (f_1, \ldots, f_N)^T \in \mathbb{R}^N, \tag{1.29} \]
\[ \mathbf{x} = (x_1, \ldots, x_N)^T \in \mathbb{R}^N, \tag{1.30} \]

and

\[ \mathbf{R}^n_m(\mathbf{p}, \mathbf{q}, \mathbf{x}) = \bigl(R^n_m(\mathbf{p}, \mathbf{q}, x_1), \ldots, R^n_m(\mathbf{p}, \mathbf{q}, x_N)\bigr)^T \in \mathbb{R}^N. \tag{1.31} \]

When using rational functions for approximation purposes, it is also necessary to impose a normalisation constraint on the approximation parameters. The reason for this is that for any non-zero choice of parameters $\mathbf{p}$ and $\mathbf{q}$ we have

\[ R^n_m(\mathbf{p}, \mathbf{q}, x) = R^n_m(\lambda\mathbf{p}, \lambda\mathbf{q}, x) \]

for any constant $\lambda \neq 0$, which allows a particular rational function to be defined by an infinite number of different parameter vectors. Clearly this will pose a problem for approximation purposes, and so we normalise the parameters, usually with the constraint $q_0 = 1$ or by forcing $\|\mathbf{q}\| = 1$.

Although the principles of data fitting with rational functions are the same as for linear functions, the problem is considerably more complicated, as the error norms that we wish to minimise are nonlinear with respect to the approximation parameters. As a consequence we are often required to use methods applicable to nonlinear problems, which are in general more complicated and computationally intensive than the linear methods described previously.

Polynomial ratios are the most widely used rational approximation form and are the subject of the majority of the research literature in rational approximation. The most notable exception to this is the study of non-uniform rational B-splines (NURBS), which are used extensively in the area of Computer Aided Geometric Design (CAGD). A slightly more detailed introduction to NURBS is given in chapter 2.

1.8 Properties of Rational Functions

Rational approximation forms, such as those defined in (1.25), have a number of features and intrinsic properties that make them a particularly suitable choice of form, especially when the data or function being approximated exhibits certain types of behaviour. In such cases, rational forms can provide far superior approximations to those obtained using linear forms.
We now describe the kind of situations and behaviour for which rational function approximations are particularly well suited, and also discuss some of the advantages and disadvantages associated with their use.

1.8.1 Existence of poles

If we consider the problem of approximating $f(x) = \tan(x)$ on the range $[0, \pi]$, we are dealing with a function that has a simple pole at $x = \pi/2$. This particular example illustrates a type of functional behaviour that linear approximations (particularly those of low degree) lack the ability to approximate effectively. This can be seen in figure 1.3, which shows a least-squares polynomial approximation to a discrete set of points sampled from $f(x) = \tan(x)$ on $[0, \pi]$. The approximation is generally poor over the entire interval, but appears to be particularly bad in the vicinity of the pole.

Figure 1.3: Degree 8 $\ell_2$ polynomial approximation of $f(x) = \tan(x)$ at 30 equally spaced points on the interval $[0, \pi]$.

Unlike linear forms, a rational function is also able to have a simple pole, and provides an approximation form that has the potential to reproduce the same asymptotic behaviour as the data being approximated. Figure 1.4 shows a least-squares polynomial ratio approximation of the same set of points used in figure 1.3. The superiority of this rational approximation is clear to see, and the pole at $x = \pi/2$ has been fitted in the rational function to an accuracy of 6 decimal places. The rational form fitted to the data in this example is of degree (2,2), and despite having fewer approximation parameters than the polynomial approximation of figure 1.3 it still provides a superior approximation.

The ability of rational forms to have poles is clearly advantageous in the case of the previous example, but this may not always be the case. Sometimes a pole is reproduced in the approximant with much less precision, and sometimes not reproduced at all.
This is particularly true when approximating a function with a large number of poles. Careful thought needs to be given to the degree of rational function used to approximate a function or data known to contain poles, as the maximum number of poles the approximation may have is equal to the degree of the denominator.

Figure 1.4: Degree (2,2) rational $\ell_2$ approximation of $f(x) = \tan(x)$ at 30 equally spaced points on the interval $[0, \pi]$.

Although it has been shown that pole fitting is a potentially useful property of rational functions, it can also cause problems. It is particularly troublesome when the data or function being fitted does not contain poles, but the resultant rational approximation does. This may not be a problem provided the poles lie outside the interval of approximation and away from the data. If there are poles present in the approximation range then this commonly leads to an increase in the approximation error in the vicinity of the pole. Figure 1.5 shows a degree (3, 5) rational least-squares approximation of a set of equally spaced points from the function $f(x) = e^{-x^2}$ to which a small amount of normally distributed noise has been added. It can be seen that the approximation is good on the majority of the range, but an unwanted pole has been fitted at $x = 3.823$. Figure 1.6 provides a magnified view of the region near the pole and clearly illustrates the detrimental effect that this has on the quality of the approximation.

Figure 1.5: Degree (3, 5) $\ell_2$ rational approximation of $f(x) = e^{-x^2}$ containing normally distributed noise.

Figure 1.6: The effect of the pole at $x = 3.823$ on the approximation of Figure 1.5.

1.8.2 Numerical considerations

Numerical error also needs to be considered in relation to the problem of unwanted poles. Consider the case where we have used an arbitrary algorithm to fit a rational approximation to a set of data, and the algorithm has converged to a solution. Let us further assume that this solution takes the form

\[ \frac{(x - c) \displaystyle\sum_{i=0}^{n-1} p_i x^i}{(x - c) \displaystyle\sum_{i=0}^{m-1} q_i x^i}, \tag{1.32} \]

where $c \in \mathbb{R}$. If this is the case, the factor $(x - c)$ common to denominator and numerator can be cancelled from the function. However, it may be the case (due to numerical issues such as rounding error) that the algorithm has converged to a solution that is of the form

\[ \frac{(x - (c + \delta_1)) \displaystyle\sum_{i=0}^{n-1} p_i x^i}{(x - (c + \delta_2)) \displaystyle\sum_{i=0}^{m-1} q_i x^i}, \tag{1.33} \]

where $\delta_1 \neq \delta_2$ are very small non-zero real numbers. In this case cancellation is not possible, and so we are faced with the question of whether the pole in the approximation should be there or not. Rational functions that have had all factors common to numerator and denominator removed are referred to as irreducible.

To help avoid this kind of potential problem, it is possible to prevent unwanted poles by constraining the parameters in a way that forces the denominator to be strictly positive on the approximation range. It is also possible to try to force the rational function to fit poles outside of the range of approximation, or to explicitly define a factor $(x - c)$ in the denominator, where $c \in \mathbb{R}$ lies outside the approximation range. In particular we can try to force the roots of the denominator polynomial to be complex, and some simple methods to achieve this are presented in chapter 5.

1.8.3 Nonlinearity

As has been previously mentioned, the major difficulty associated with rational approximation is that the approximation depends nonlinearly on its parameters. Methods for fitting linear approximations are not immediately applicable to nonlinear problems, and instead we will often need to utilise nonlinear optimisation methods.
There are a number of optimisation techniques available for fitting general nonlinear forms to data, and these can be applied to the rational approximation problem. Optimisation methods are useful tools but they can be computationally intensive, and often their convergence is conditional on an appropriate selection of initial values for the approximation parameters. Some of the most commonly used optimisation methods (such as the Newton (2.8.1) and Gauss-Newton (2.8.2) methods) are utilised extensively in this thesis and will be described in detail in Chapter 2.

Despite the nonlinearity of the parameters in a rational function, there are a number of methods available for fitting rational approximations that only require the solution of linear systems of equations (usually as part of an iterative process). Such methods include the Gauss-Newton and Newton methods [18] mentioned previously. Others include methods for finding rational interpolants [51],[23],[33], Padé approximations [4], Thiele interpolants [14], and weighted iterative methods for fitting rational approximations (such as Loeb's algorithm [7] and the Differential Correction Method [27]). We will describe some of these methods in greater detail in Chapter 2.

1.8.4 Asymptotic limits

Another advantageous feature of rational functions is that they may be used to approximate over an infinite range. They can also be used to approximate functions that exhibit certain types of asymptotic behaviour as their arguments tend to infinity. A simple example of this is the approximation of $\tanh(x)$ on the interval $[0, \infty)$. The function $\tanh(x)$ has the finite limit

\[ \lim_{x \to \infty} \tanh(x) = 1, \tag{1.34} \]

and so a polynomial would seem to be an unsuitable choice of form for approximating it on this interval, because all non-constant polynomials have an asymptotic limit of $\pm\infty$.
However, there is a subset of generalised rational functions that also have a constant asymptotic limit, and it would then seem natural to use such an approximant for this problem. We now describe some of the types of asymptotic behaviour that rational functions are able to model effectively.

1.8.5 Decay to a constant value as x → ±∞

We consider the problem of approximating data sampled from a function $f(x)$ that has known asymptotic behaviour given by

\[ \lim_{x \to \infty} f(x) = \alpha, \tag{1.35} \]

where $\alpha \neq 0$ is a real-valued constant. Asymptotic decay to a constant is an intrinsic property of a ratio of equal degree polynomials, and so it would seem sensible to approximate $f(x)$ using an approximation of this kind. The exact asymptotic limit of a degree $(n, n)$ polynomial ratio $R^n_n(\mathbf{p}, \mathbf{q}, x)$ is given by

\[ \lim_{x \to \infty} R^n_n(\mathbf{p}, \mathbf{q}, x) = \frac{p_n}{q_n}, \tag{1.36} \]

where $p_n$ and $q_n$ are the coefficients of the highest degree polynomial basis functions used in the numerator and denominator respectively. Therefore, we can explicitly force a degree $(n, n)$ polynomial ratio approximant to have exactly the same limit as $f(x)$ by imposing the parameter constraint

\[ p_n = q_n \alpha. \tag{1.37} \]

This constraint can be enforced explicitly in the function definition using substitution, or could be utilised in the form of a parameter constraint as part of the approximation process. We have specified an asymptotic limit as $x \to \infty$ for illustrative purposes, and the same results will apply to a limit specified as $x \to -\infty$. We investigate the quality of approximation using rational forms with the same limiting behaviour enforced upon them in later chapters.

1.8.6 Decay to zero as x → ±∞

Now suppose we wish to approximate data sampled from a function $f(x)$ that has known asymptotic behaviour given by

\[ \lim_{x \to \infty} f(x) = 0. \tag{1.38} \]

There are a large number of generalised rational functions that have the same asymptotic limit as that in (1.38).
The most obvious choice of such a function would be a polynomial ratio whose numerator degree is less than that of the denominator, which leads to a large number of potential choices for the degrees $n, m$.

1.8.7 Approximating limiting behaviour of the type x^k as x → ±∞

Now we consider the problem of approximating data from a function $f(x)$ that has the limit

\[ \lim_{x \to \infty} \frac{f(x)}{x^k} = \alpha, \tag{1.39} \]

where $\alpha \in \mathbb{R}$ and $k \in \mathbb{N}$ are known constants. This type of limiting behaviour can be achieved once again by a ratio of polynomials, provided that the numerator degree is larger than that of the denominator. We restrict our choice of approximant to a degree $(n, m)$ rational approximant $R^n_m(\mathbf{p}, \mathbf{q}, x)$, whose asymptotic limit as $x \to +\infty$ is given by

\[ \lim_{x \to \infty} \frac{R^n_m(\mathbf{p}, \mathbf{q}, x)}{x^{(n-m)}} = \frac{p_n}{q_m}, \qquad n > m, \tag{1.40} \]

where $p_n$ and $q_m$ are the coefficients of the highest degree numerator and denominator basis functions respectively. Clearly our rational approximant will then share the asymptotic limit (1.39) provided that the following conditions are satisfied:

\[ n - m = k, \tag{1.41} \]
\[ p_n = q_m \alpha. \tag{1.42} \]

The first of these conditions is easily imposed, as the degree of the approximating form needs to be fixed prior to approximation anyway. The second condition can be imposed explicitly within the definition of the form by direct substitution. It may also be possible to enforce this constraint as a part of the approximation algorithm being used for data fitting.

We now consider approximation of functions with the asymptotic limit as $x \to -\infty$

\[ \lim_{x \to -\infty} \frac{f(x)}{x^k} = \alpha, \tag{1.43} \]

for $k \in \mathbb{N}$ and $\alpha \in \mathbb{R}$. In the same way as in the previous example we will need to satisfy constraint (1.41), and then the equivalent limit for the approximant will be

\[ \lim_{x \to -\infty} \frac{R^n_m(\mathbf{p}, \mathbf{q}, x)}{x^k} = \frac{p_n}{q_m}. \tag{1.44} \]

From this equation it is clear that we can make the approximant satisfy the limit (1.43) with the constraint (1.42) as before.
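The effect of the constraints (1.41) and (1.42) can be checked numerically. The following Python sketch is an illustration under stated assumptions rather than a method from this thesis: it uses a monomial basis, the helper name `constrained_ratio` and all coefficient values are hypothetical, and the constraint (1.42) is imposed by direct substitution of the leading numerator coefficient.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

def constrained_ratio(p_free, q, alpha):
    """Build R(x) = P(x)/Q(x) with the constraint p_n = alpha * q_m substituted in.

    p_free holds the free coefficients p_0..p_{n-1}; q holds q_0..q_m.
    Both use a monomial basis with coefficients in increasing powers of x.
    """
    p = np.append(p_free, alpha * q[-1])      # constraint (1.42)
    k = (len(p) - 1) - (len(q) - 1)           # k = n - m, condition (1.41)

    def R(x):
        return poly.polyval(x, p) / poly.polyval(x, q)

    return R, k

alpha = 2.5
q = np.array([1.0, 0.3, 0.7])         # m = 2, denominator has no real roots
p_free = np.array([0.4, -1.0, 0.2])   # p_0..p_2, so n = 3 and k = 1
R, k = constrained_ratio(p_free, q, alpha)

# R(x) / x^k evaluated far out on both sides of the real line.
xs = np.array([1e6, -1e6])
scaled = R(xs) / xs**k
```

With these values `scaled` is close to `alpha` at both ends of the real line, which is the behaviour described by (1.40) and (1.44).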
We do not have to deal separately with the issue of the limit of $x^k$ as $x \to -\infty$ for odd values of $k$, as (1.42) ensures the correct limiting behaviour of the function.

1.8.8 Approximation of double sided asymptotic limits

We now need to address the problem of approximating a function or data over the entire real line. In this case we are faced with the task of modelling two asymptotic limits, one as $x \to -\infty$, and one as $x \to +\infty$. We will consider the problem of approximation of functions $f(x)$ that have asymptotic limits of the form

\[ \lim_{x \to \infty} \frac{f(x)}{x^{k_1}} = \alpha_1, \qquad \lim_{x \to -\infty} \frac{f(x)}{x^{k_2}} = \alpha_2, \tag{1.45} \]

where $\alpha_1, \alpha_2 \in \mathbb{R}$ and $k_1, k_2 \in \mathbb{N}$. An example of a dataset exhibiting double-sided asymptotic behaviour is illustrated in figure 1.7. As in the previous section, a polynomial ratio would seem to be the most appropriate choice of rational approximation form. The limits of a polynomial ratio are either zero, or given by (1.40) or (1.44), depending on the degrees of numerator and denominator. Therefore, it is not possible for a standard polynomial ratio approximant to possess the limits (1.45) except for special cases where $k_1 = k_2$ and $\alpha_1 = \alpha_2$.

Figure 1.7: Artificial data exhibiting double-sided asymptotic behaviour sampled from $f(x) = x(1 + e^{-x})^{-1}$.

In the case of modelling asymptotic decay to zero in both directions this will not pose a problem, as we can set $k_1 = k_2 = 0$, but for the more generic problem involving mixed limits, we require a new rational form to approximate the data. This problem is revisited in more detail in chapter 3.

We also consider problems with double limits specified by (1.45) but extend the definition to deal with non-integer values for the exponents $k_1, k_2$. This vastly increases the different types of data that may be approximated, as we are no longer restricted to consideration of integer powers of $x$ as a limit.
Clearly this problem cannot be dealt with using standard polynomial ratios, and so new approximation forms and methods are required. This work will be presented in chapter 4.

The least-squares approximation of data that exhibits these types of asymptotic behaviour is the primary focus of this project, and this is the reason that specific study of rational functions has been undertaken. The particular types of asymptotic behaviour described here have been chosen as they are of specific interest to the National Physical Laboratory (NPL), who are the collaborating partial sponsor of this project. The reason for their interest in these specific types of asymptotic behaviour is that they occur frequently in physical systems (particularly decay to zero or decay to a constant), or are exhibited by the solutions to some types of differential equations.

The objective of the project is to obtain good approximations to these specific types of data set with the use of traditional rational functions and some new nonlinear approximation forms that are of a rational nature. Specifically, we hope to obtain good approximations with the use of carefully selected approximation forms that mimic the asymptotic behaviour of the observed data, and investigate whether they are superior to those obtained with approximation forms that do not (such as polynomials and splines). It is important to specify that we will consider two types of problem:

1. The approximation of data that has a specified type of asymptotic behaviour that is known beforehand. An example of such a problem is the modelling of data from an experiment where there is some theoretical knowledge of the physics of the experiment that provides knowledge of the asymptotic behaviour.

2. The approximation of data that has an implied asymptotic behaviour based on the shape of the data in question, but for which there is no theoretical justification.
Within this project, approximation of data that falls into the second of these categories is dealt with more frequently than the first.

1.9 Example applications from industry

This chapter has highlighted some of the properties of rational functions that make them a particularly appropriate form for modelling the kinds of asymptotic behaviour exhibited by many physical systems. These types of asymptotic behaviour are found to occur in many of the physical systems currently being investigated by the National Physical Laboratory (NPL) in the area of Metrology, which explains their interest in this project. An example from physics that illustrates a problem suitable for rational functions is the approximation of the solution of the Blasius equation.

1.9.1 The Blasius equation

The Blasius equation occurs in the boundary layer problem of hydrodynamics, and is a differential equation which can be expressed as

\[ \frac{d^3 y}{dx^3} + y \frac{d^2 y}{dx^2} = 0 \tag{1.46} \]

subject to the boundary conditions $y(x) = y'(x) = 0$ at $x = 0$ and $y'(x) \to \gamma$ as $x \to \infty$, where $\gamma$ is a known positive constant. We see immediately from the last of these boundary conditions that the solution to this equation has the asymptotic behaviour described in section 1.8.7 and is therefore a suitable candidate for approximation using a polynomial ratio. Finding rational function solutions to the Blasius equation (1.46) has been studied by Mason [37].

1.9.2 Mesopic efficiency functions

Another application of rational forms is the approximation of data that has been used to calculate a mesopic efficiency function. This is an experiment undertaken by City University, the aim of which is to model the sensitivity of the eye to light of a variety of wavelengths. Data have been collected from an experiment which records the length of time it takes a human subject to respond to light signals of differing wavelengths and intensities.
As it is known that the human eye can only detect electromagnetic radiation lying between infra-red and ultra-violet wavelengths, we know that the response variable will decay to zero as wavelength increases and decreases. The actual data from one of these experiments are shown in figure 1.8. This particular experiment and the approximation of the recorded data are discussed again in chapter 4.

Figure 1.8: Experimental data from the mesopic efficiency experiment (efficiency plotted against wavelength).

Chapter 2

EXISTING METHODS FOR FITTING RATIONAL APPROXIMATIONS

2.1 Introduction

In this chapter we describe some of the more popular methods used for fitting rational approximations of the form (1.25) to discrete data or functions. Some of these methods have been proved to converge to a best approximation in certain norms. The majority of the methods presented here are used at some stage in the thesis, with others mentioned only to illustrate areas of rational approximation that have been researched and found not to be directly applicable to the project. We also describe some common nonlinear optimisation techniques that may be used to fit rational functions.

2.2 The Loeb algorithm

Loeb's algorithm is a weighted iterative procedure that can be used to fit a generalised rational function to a set of discrete data points $\{(x_i, f_i)\}_{i=1}^{N}$. Instead of minimising the norm of the residual vector (1.28), the algorithm works by solving

\[ \min_{\mathbf{p}^{(k)}, \mathbf{q}^{(k)}} \|\boldsymbol{\Delta}^{(k)}(\mathbf{p}^{(k)}, \mathbf{q}^{(k)}, \mathbf{x})\| \tag{2.1} \]

at iteration $k$, where $\boldsymbol{\Delta}^{(k)} \in \mathbb{R}^N$ is the vector with $i$th element

\[ \Delta_i^{(k)} = \frac{1}{Q_m(\mathbf{q}^{(k-1)}, x_i)} \bigl( P_n(\mathbf{p}^{(k)}, x_i) - f_i\, Q_m(\mathbf{q}^{(k)}, x_i) \bigr). \tag{2.2} \]

The function $(Q_m(\mathbf{q}^{(k-1)}, x))^{-1}$ is treated as a known weight function, and is obtained by evaluating the denominator function using the parameters $\mathbf{q}^{(k-1)}$ from the previous iteration.
The result of this is that the original nonlinear approximation problem has been simplified to an iterative weighted linear problem, with the fixed weight vector replacing the denominator function. The approximation parameters are usually normalised by setting $q_0 = 1$ and the algorithm initialised with weight $Q_m(\mathbf{q}^{(0)}, x) = 1$. Variation of the choice of start weight has been found to have very little effect on the performance of the algorithm.

The Loeb algorithm may be applied to any norm, although it was originally suggested for the $\ell_\infty$ norm by Loeb [32] and subsequently for the $\ell_2$ norm by Whittmeyer [53]. Barrodale and Mason [7] apply the Loeb algorithm to fit approximations to discrete data using the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. This method is attractive due to its ease of implementation, although it is sometimes unreliable due to lack of convergence. In our experience the algorithm converges in the vast majority of cases, and when it does converge it does so to good approximations, particularly for least-squares problems. These findings are supported by the results of Barrodale and Mason [7], who have used the method to approximate a large number of functions, including those with poles.

A drawback of the algorithm is that, as far as we are aware, there is no convergence proof for it, and when used for least-squares problems, even if the algorithm does converge, it is almost certain not to converge to a best approximation [7]. Despite this, the parameters obtained from one or two iterations of the Loeb algorithm are often a good choice of start parameters for classical optimisation methods such as Newton's method (described in section 2.8.1). We have also noticed that the application of optimisation methods from these start parameters generates only a very minor improvement in the quality of the approximation.
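The iteration described above can be sketched for the $\ell_2$ norm as follows. This is a minimal Python illustration under stated assumptions, not the implementation used in this thesis: it assumes a monomial basis and the normalisation $q_0 = 1$, the function name `loeb_l2` and the test data are hypothetical, and each weighted linear problem is solved with `numpy.linalg.lstsq` (an SVD-based solver) rather than an explicit QR factorisation.

```python
import numpy as np
from numpy.polynomial import polynomial as poly

def loeb_l2(x, f, n, m, max_iter=50, tol=1e-10):
    """Sketch of Loeb's iteration for a degree (n, m) polynomial ratio, q_0 = 1."""
    P = np.vander(x, n + 1, increasing=True)            # columns x^0..x^n
    Q = np.vander(x, m + 1, increasing=True)[:, 1:]     # columns x^1..x^m
    C = np.hstack([P, -f[:, None] * Q])                 # linearised observation matrix
    w = np.ones_like(x)                                 # start weight: Q^(0)(x) = 1
    d = np.zeros(n + m + 1)
    for _ in range(max_iter):
        # Weighted linear least-squares problem for this iteration.
        d_new, *_ = np.linalg.lstsq(w[:, None] * C, w * f, rcond=None)
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
        # New weight: reciprocal of the denominator from the current parameters.
        q = np.concatenate([[1.0], d[n + 1:]])
        w = 1.0 / poly.polyval(x, q)
    return d[:n + 1], np.concatenate([[1.0], d[n + 1:]])

# Illustrative data sampled exactly from a degree (1, 2) polynomial ratio.
x = np.linspace(-1.0, 1.0, 41)
f = (1.0 + 2.0 * x) / (1.0 + 0.5 * x**2)
p, q = loeb_l2(x, f, n=1, m=2)
```

For noise-free data generated by a rational function of the chosen degree, the linearised problem is solved exactly at the first iteration, so the sketch recovers the generating coefficients. The sketch makes no attempt to detect a denominator that vanishes at a data point, which would make the weight undefined.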
2.2.1 Least-squares approximation with the Loeb algorithm

We now describe application of the Loeb algorithm to the specific case of least-squares approximation with polynomial ratios, as it will be applied extensively throughout the thesis. For the $\ell_2$ rational approximation problem we wish to minimise the quantity

\[ \sum_{i=1}^{N} \left( f_i - \frac{P(\mathbf{p}^{(k)}, x_i)}{Q(\mathbf{q}^{(k)}, x_i)} \right)^2, \tag{2.3} \]

at iteration $k$. We linearise this quantity by multiplying each residual by the denominator and a fixed weight term $w_i^{(k)}$ to give

\[ \sum_{i=1}^{N} \Bigl( w_i^{(k)} \bigl[ f_i\, Q(\mathbf{q}^{(k)}, x_i) - P(\mathbf{p}^{(k)}, x_i) \bigr] \Bigr)^2, \tag{2.4} \]

where the weight

\[ w_i^{(k)} = \frac{1}{Q(\mathbf{q}^{(k-1)}, x_i)} \tag{2.5} \]

is chosen to simulate the effect of the denominator. The use of a fixed weight reduces the problem to a linear system which can be solved easily. Because of the normalisation condition $q_0 = 1$, we can write (2.4) as

\[ \sum_{i=1}^{N} \left( w_i^{(k)} \left[ f_i + f_i \sum_{j=1}^{m} q_j^{(k)} x_i^j - \sum_{j=0}^{n} p_j^{(k)} x_i^j \right] \right)^2, \tag{2.6} \]

and this least-squares problem can be represented in matrix form at iteration $k$ as

\[ W^{(k)} \mathbf{f} = W^{(k)} C \mathbf{d}^{(k)}, \tag{2.7} \]

which is to be solved in the least-squares sense. Here the observation matrix $C \in \mathbb{R}^{N \times (m+n+1)}$ is defined by its $(i,j)$th element

\[ C_{ij} = x_i^{\,j-1}, \qquad (j = 1, \ldots, n+1), \tag{2.8} \]
\[ C_{ij} = -f_i\, x_i^{\,(j-(n+1))}, \qquad (j = n+2, \ldots, m+n+1), \tag{2.9} \]

$W^{(k)} \in \mathbb{R}^{N \times N}$ is a diagonal matrix of weight terms defined by its $(i,j)$th element

\[ W_{ii}^{(k)} = w_i^{(k)}, \qquad (i = 1, \ldots, N), \tag{2.10} \]
\[ W_{ij}^{(k)} = 0, \qquad (i \neq j), \tag{2.11} \]

and $\mathbf{d}^{(k)} = \{p_0^{(k)}, \ldots, p_n^{(k)}, q_1^{(k)}, \ldots, q_m^{(k)}\} \in \mathbb{R}^{m+n+1}$ is the vector of approximation parameters at step $k$. The Loeb algorithm is then easily implemented by solving the linear least-squares system (2.7) using QR factorisation (section 1.6.1).

2.2.2 Ill-conditioning associated with the Loeb algorithm

The matrix $C$ defined in equations (2.8) and (2.9) is susceptible to ill-conditioning as a consequence of its structure. Furthermore, this ill-conditioning is not dependent on the choice of polynomial basis used.
To illustrate this, we compare the condition number of the matrix $C$ formed from a degree (5, 5) polynomial ratio using a monomial basis and a Chebyshev polynomial basis. As data points we take 51 equally spaced abscissae on $[-1, 1]$, at which we evaluate the function $\cos(x)$ to represent the data points $f_i$. Forming the observation matrix $C$ using the monomial basis leads to a condition number of $6.266 \times 10^{9}$, while the same matrix calculated with a Chebyshev polynomial basis has condition number $5.155 \times 10^{9}$. Approximation using polynomials with uniformly spaced abscissae results in ill-conditioning [40], but if we change the abscissae to be the zeros of the degree 51 Chebyshev polynomial basis function $T_{51}(x)$, the condition number of $C$ (using Chebyshev bases) is still $4.517 \times 10^{9}$. The use of Chebyshev basis functions evaluated at Chebyshev zeros normally gives a well-conditioned system of equations [40], but in this case it does not, and we conclude that the ill-conditioning is associated with the Loeb method itself.

In an attempt to improve the conditioning we tried degree (5, 5) rational functions defined with different orthogonal polynomial basis functions in numerator and denominator. The condition number for a selection of these different choices is shown in table 2.1. We can explain the possible reasons for this ill-conditioning by writing the matrix $C$ as

\[ C = \begin{bmatrix} P & BQ \end{bmatrix}, \tag{2.12} \]

where $P$ and $Q$ are the standard observation matrices obtained from the numerator and denominator basis functions $\{\phi_i(x)\}_{i=0}^{n}$ and $\{\psi_j(x)\}_{j=1}^{m}$ respectively (1.25), and $B$ is the diagonal matrix with $i$th diagonal element $B_{ii} = f_i$ and all other elements equal to zero.
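The comparison described above can be sketched in Python as follows. This is an illustration only: the exact construction used to obtain the figures quoted in the text is not specified in enough detail to reproduce them, so the sketch assumes the normalisation $q_0 = 1$ (so that the denominator block has columns for $\psi_1, \ldots, \psi_m$ only), takes the second block with the sign used in (2.9), and uses the hypothetical helper name `loeb_matrix`. The computed condition numbers therefore need not match the quoted values.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def loeb_matrix(x, f, n, m, basis):
    """Observation matrix [P | -B Q] for a degree (n, m) ratio with q_0 = 1."""
    deg = max(n, m)
    if basis == "monomial":
        V = np.vander(x, deg + 1, increasing=True)   # columns x^0..x^deg
    else:
        V = cheb.chebvander(x, deg)                  # columns T_0..T_deg
    # Numerator block uses basis functions 0..n, denominator block 1..m.
    return np.hstack([V[:, :n + 1], -f[:, None] * V[:, 1:m + 1]])

nodes = {
    "uniform": np.linspace(-1.0, 1.0, 51),
    # Zeros of the degree 51 Chebyshev polynomial T_51.
    "chebyshev zeros": np.cos((2 * np.arange(1, 52) - 1) * np.pi / (2 * 51)),
}

conds = {}
for basis in ("monomial", "chebyshev"):
    for name, x in nodes.items():
        C = loeb_matrix(x, np.cos(x), 5, 5, basis)
        conds[(basis, name)] = np.linalg.cond(C)
```

Under these assumptions every entry of `conds` is large, regardless of the basis or the abscissae, which is consistent with the conclusion that the ill-conditioning comes from the structure of $C$ rather than from the choice of basis.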
The matrix $B$ has the effect of multiplying row $i$ of $Q$ by the constant value $f_i$, and so when we use the same basis functions in numerator and denominator we end up with columns that are identical up to a diagonal matrix multiplication.

Table 2.1: Effect of the choice of orthogonal polynomial basis functions on the condition number of $C$

Numerator basis    Denominator basis    Cond(C)
Chebyshev          Monomial             $4.024 \times 10^{9}$
Chebyshev          Legendre             $4.189 \times 10^{9}$
Legendre           Chebyshev            $5.130 \times 10^{9}$
Hermite            Monomial             $1.088 \times 10^{11}$

Now if we consider the case of approximating data from a function having almost zero gradient using monomial basis functions, then we will have column $j$ of $P$ almost equal to column $j + 1$ of $Q$, so that the corresponding columns of $P$ and $BQ$ are nearly proportional, and such linear dependence between columns will result in serious ill-conditioning or numerical rank deficiency. This is of particular importance when bearing in mind the kind of approximation problems that we are interested in, as described in section 1.8.7, when using monomial basis functions. In such cases the multiplicative function values satisfy $f_i \simeq x_i^k$ for integer values $k$ (1.39), which leads to linear dependence between certain columns of $P$ and $Q$. The conditioning is reasonable (in the region of $10^{6}$) for degree $(n, m)$ approximations for which $n + m \leq 8$, but quickly becomes worse for higher degrees.

We have also observed that in general, when using a monomial basis, the condition number of $C$ is minimal for the choice of degree with $n = m$. This will be due to the fact that the matrices $P$ and $Q$ are themselves monomial basis observation matrices, which are generally ill-conditioned to start with. The highest degree monomial basis function contained in $C$ will be $x^{\max(n,m)}$, and clearly, for a fixed total number of parameters, $\max(n, m)$ is minimised when $n = m$.

Observations on the behaviour of Loeb's algorithm

For least-squares problems, we have found that the Loeb algorithm is very easy to apply and in general gives good approximations that are only slightly poorer in quality than those obtained using optimisation techniques.
However, after extensive use of this algorithm, we have noticed some patterns of behaviour that are worth mentioning. Firstly, in a large number of the cases where the algorithm fails to converge, the failure appears to be caused by a poor choice of degree for the shape of the data. As an example, a degree $(3, 3)$ polynomial ratio was used to approximate 51 sampled points of the function $f(x) = e^{-x^{2}}$ on the interval $[-4, 4]$, with a small amount of Gaussian noise present. Convergence was defined as occurring when the Chebyshev norm of the difference between the solution vectors from two successive iterations was less than $10^{-10}$. Using the Loeb algorithm to fit the $(3, 3)$ polynomial ratio, convergence still had not occurred after 250 iterations. However, when approximating with a $(2, 2)$ polynomial ratio, the algorithm converged within 10 iterations, despite having two fewer approximation parameters than the $(3, 3)$ approximation. This behaviour has also been observed in other cases, usually when the function being approximated is even and the degree of the approximant is odd (and vice versa).

Another observation is that the algorithm can often get 'stuck', with the solution parameters from successive iterations oscillating between two or more sets of values. In some cases the algorithm has been seen to cycle between six distinct sets of solution parameters. In all cases where we have observed this behaviour, at least one of the sets of parameters yields an approximation that contains poles in the approximation range. In such cases, it is possible to terminate the algorithm and choose the best set of parameters from those that the algorithm oscillates between. This behaviour is most commonly seen when the degree of the approximant is too small, and can be overcome by increasing the degree of the numerator or denominator (or both).
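As a rough illustration of the kind of iteration whose behaviour is discussed above, the following sketch fits a degree $(1, 1)$ polynomial ratio with $q_0 = 1$ by repeatedly solving a weighted linear least-squares problem and stopping on the Chebyshev-norm test on successive parameter vectors. It is not the implementation used in this thesis: the weight $1/Q^{(k-1)}(x_i)$, the monomial basis, the use of normal equations, the test data and the names `solve_linear` and `loeb_fit_11` are all assumptions made for the example.

```python
def solve_linear(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def loeb_fit_11(xs, fs, tol=1e-10, max_iter=250):
    """Fit (p0 + p1 x) / (1 + q1 x); returns ([p0, p1, q1], iterations used)."""
    d_old = [0.0, 0.0, 0.0]  # q1 = 0, so the first weights are all 1
    for k in range(1, max_iter + 1):
        # Assumed weight: reciprocal of the previous denominator at each x_i.
        w = [1.0 / (1.0 + d_old[2] * x) for x in xs]
        # Weighted rows and right-hand side of f_i = p0 + p1 x_i - f_i q1 x_i.
        rows = [[wi, wi * x, -wi * f * x] for wi, x, f in zip(w, xs, fs)]
        rhs = [wi * f for wi, f in zip(w, fs)]
        # Normal equations; acceptable here only because there are 3 parameters.
        A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
        y = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(3)]
        d_new = solve_linear(A, y)
        # Chebyshev norm of the change in the parameter vector.
        if max(abs(a - b) for a, b in zip(d_new, d_old)) < tol:
            return d_new, k
        d_old = d_new
    return d_old, max_iter


xs = [0.2 * i for i in range(11)]                       # abscissae on [0, 2]
fs = [(1.0 + 2.0 * x) / (1.0 + 0.5 * x) for x in xs]    # exact (1, 1) ratio
params, iterations = loeb_fit_11(xs, fs)
```

Because the test data come from an exact $(1, 1)$ ratio, the linearised problem is consistent and the iteration settles on $(p_0, p_1, q_1) \approx (1, 2, 0.5)$ almost immediately. Returning `max_iter` alongside the last parameters is one simple way to flag the non-convergent and oscillating cases described above.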
2.3 The Differential Correction Method (DCM)

The differential correction method is an iterative algorithm specific to the $\ell_\infty$ norm, and is proven to converge to the best Chebyshev rational approximation on a discrete data set. The algorithm as we describe it here is specific to the case of approximation with polynomial ratios. The DCM has two variations, the first of which was put forward in [12]. We describe here the modified version given in [11], in which a proof of (at least linear) convergence is also given. Starting with the same data points and approximation problem described in (2.2), the DCM begins by choosing an arbitrary initial rational function $R^{(0)}(x) = P^{(0)}(x)/Q^{(0)}(x)$, where for ease of notation we define

\[ P^{(k)}(x) = P_n(\mathbf{p}^{(k)}, x), \tag{2.13} \]
\[ Q^{(k)}(x) = Q_m(\mathbf{q}^{(k)}, x). \tag{2.14} \]

The only restriction placed on $R^{(0)}$ is that it has no poles within the interval of approximation. Then at iteration $k$, denote the current maximum approximation error by

\[ \Delta^{(k)} = \| f - R^{(k)}(x) \|_\infty, \tag{2.15} \]

and minimise the quantity

\[ \delta^{(k)} = \max_{1 \le i \le N} \left\{ \left| f_i Q^{(k+1)}(x_i) - P^{(k+1)}(x_i) \right| - \Delta^{(k)} Q^{(k+1)}(x_i) \right\} \tag{2.16} \]

with respect to the parameters $\mathbf{p}^{(k+1)}, \mathbf{q}^{(k+1)}$, subject to the constraint

\[ \| Q^{(k+1)}(x) \|_\infty = 1. \tag{2.17} \]

The algorithm terminates when we find a function $R^{(k+1)}(x)$ such that $\delta^{(k)} \ge 0$, and the best approximation is then given by $R^{(k)}(x)$. The original DCM [12] differs from the modified version only in the definition of $\delta^{(k)}$, which becomes

\[ \delta^{(k)} = \max_{1 \le i \le N} \left\{ \frac{\left| f_i Q^{(k+1)}(x_i) - P^{(k+1)}(x_i) \right| - \Delta^{(k)} Q^{(k+1)}(x_i)}{Q^{(k)}(x_i)} \right\}. \tag{2.18} \]

The modified version of the DCM was widely favoured over the original version because of its proven linear convergence; however, the original version was subsequently proved to have quadratic convergence [27]. A number of variations of the DCM have been published, most involving some form of constrained approximation.
Kauffman and Taylor implement a version that allows linear constraints to be placed on the approximation parameters [28], as well as another version that places a strictly positive lower bound on the denominator function [29]. Gugat [25] presents an implementation of the algorithm that forces the denominator to be bounded above and below by continuous functions. The general DCM was also extended by Cheney and Powell [13] to approximation using generalised rational functions, and was proved to have superlinear convergence provided that the solution is unique.

2.4 Other work on rational approximation

In addition to the differential correction method and its variants, there are a number of other methods for fitting Chebyshev rational approximations, both to discrete data and to functions. The exchange algorithm of Remes [14] is another method often referred to, and is the rational equivalent of the exchange algorithm used to fit best linear Chebyshev approximations. Kauffman, Leeming and Taylor [20] consider an approach using a combined Remes-differential correction method for fitting approximations on subsets of $[0, \infty)$, and another method for approximation on the same interval using polynomial reciprocals [19]. A number of other methods of fitting Chebyshev rational approximations are put forward by Maehly in [34, 35].

We have not found a great deal of material in the field of least-squares rational approximation. There is a huge amount of material and research on nonlinear least-squares optimisation methods, but only a small amount specifically concerning rational approximation of discrete data. The paper of Pomentale [45] deals with fitting pole-free least-squares rational approximations with a denominator function $Q(x)$ bounded below by a parameter $\epsilon > 0$.
We have also found some research on complex variable least-squares rational approximation [5].

2.5 Padé approximation

The area of Padé approximation is very large, with a considerable literature, and for this reason any discussion of rational approximation would be incomplete without mentioning it. Let us assume that we are trying to approximate a function that can be defined by a power series

\[ f(z) = \sum_{i=0}^{\infty} f_i z^{i}. \tag{2.19} \]

If there exist two polynomials $P_n(z)$ and $Q_m(z)$ of degrees $n$ and $m$ respectively such that

\[ Q_m(z) f(z) - P_n(z) = O(z^{n+m+1}), \tag{2.20} \]

then the function $\pi_{nm} = P_n / Q_m$ is called the Padé approximant of order $(n, m)$ of $f(z)$. It can be viewed as matching the power series of $f(z)$ truncated at the $(n+m+1)$th term. This definition has been taken from [43], which also describes how to calculate the Padé approximant as follows. If we express the polynomials $P_n$, $Q_m$ by

\[ P_n(z) = a_n z^{n} + \dots + a_1 z + a_0, \qquad Q_m(z) = b_m z^{m} + \dots + b_1 z + b_0, \]

we can see that condition (2.20) is equivalent to equating to zero the coefficients of $z^{k}$ in $Q_m(z) f(z) - P_n(z)$ for $k = 0, \dots, n + m$. This leads to the requirements

\[ \begin{aligned} a_0 &= f_0 b_0, \\ a_1 &= f_0 b_1 + f_1 b_0, \\ &\;\;\vdots \\ a_n &= \sum_{i=1}^{\min(n,m)} f_{n-i} b_i + f_n b_0, \end{aligned} \]

and

\[ \begin{aligned} b_0 f_{n+1} + b_1 f_n + \dots + b_m f_{n-m+1} &= 0, \\ b_0 f_{n+2} + b_1 f_{n+1} + \dots + b_m f_{n-m+2} &= 0, \\ &\;\;\vdots \\ b_0 f_{n+m} + b_1 f_{n+m-1} + \dots + b_m f_n &= 0, \end{aligned} \]

where $f_i = 0$ when $i < 0$. The second system of equations always has a non-trivial solution, as it is a homogeneous linear system of $m$ equations in the $m + 1$ unknowns $b_0, \dots, b_m$; the coefficients of $P_n(z)$ are then easily obtained from the first system of equations.

There is a large amount of research into Padé approximation, not only in the univariate case but in the multivariate case too [16], [1]. Other areas are Newton-Padé approximation [17], vector Padé approximation and multipoint Padé approximation.
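The two linear systems in the Padé construction can be solved mechanically. The following is a minimal sketch, not taken from [43], that fixes the normalisation $b_0 = 1$ (an assumption; the homogeneous system only determines the $b_j$ up to scale) and uses exact rational arithmetic. The helper names `solve` and `pade` are hypothetical, and the sketch assumes the resulting $m \times m$ system is nonsingular.

```python
from fractions import Fraction
from math import factorial


def solve(A, y):
    """Gauss-Jordan elimination in exact arithmetic (A assumed nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]


def pade(f, n, m):
    """Return (a, b) for the order (n, m) approximant, with b[0] = 1.

    f holds the series coefficients f_0, ..., f_{n+m}.
    """
    coef = lambda i: f[i] if i >= 0 else Fraction(0)   # f_i = 0 for i < 0
    # Second system: sum_{j=0}^{m} b_j f_{n+k-j} = 0 for k = 1..m, b_0 = 1.
    A = [[coef(n + k - j) for j in range(1, m + 1)] for k in range(1, m + 1)]
    y = [-coef(n + k) for k in range(1, m + 1)]
    b = [Fraction(1)] + (solve(A, y) if m else [])
    # First system: a_k = sum_{j=0}^{min(k,m)} f_{k-j} b_j for k = 0..n.
    a = [sum(coef(k - j) * b[j] for j in range(min(k, m) + 1))
         for k in range(n + 1)]
    return a, b


# Power series of exp(z): f_i = 1 / i!
f = [Fraction(1, factorial(i)) for i in range(5)]
a, b = pade(f, 2, 2)
```

For the exponential series this gives numerator coefficients $1, \tfrac12, \tfrac1{12}$ and denominator coefficients $1, -\tfrac12, \tfrac1{12}$, which is the familiar order $(2, 2)$ approximant of $e^{z}$.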
As the specific problem that this project is faced with is approximation of discrete data rather than functions (or their power series expansions), we have found that the field of Padé approximation is not directly applicable to the project. It has been mentioned because it makes a large contribution to the area of rational approximation as a whole.

2.6 Appel's algorithm

We consider the specific problem of approximating data exhibiting exponential decay as described in section 1.8.6. A suitable approximation form for this purpose is a rational approximation $R(x)$ of the form

\[ R(x) = \frac{s(x)}{(Q_m(\mathbf{q}, x))^{r}}, \tag{2.21} \]

where $s(x)$ is a specified fixed function, $Q_m(\mathbf{q}, x)$ is a linear denominator function as defined in (1.25), and $r \in \mathbb{Z}$ is chosen to model the decay effectively. This form can be fitted to a discrete data set with a simple one-step algorithm due to Appel [3], which we now describe. At each data point $x$ we have an error component $e(x)$ given by

\[ f(x) = R(\mathbf{q}, x) + e(x). \tag{2.22} \]

Substitution of (2.21) into this equation gives

\[ Q_m(\mathbf{q}, x) = \left( \frac{s(x)}{f(x) - e(x)} \right)^{\frac{1}{r}} = \left( \frac{s(x)}{f(x)} \right)^{\frac{1}{r}} \left( 1 - \frac{e(x)}{f(x)} \right)^{-\frac{1}{r}}. \tag{2.23} \]

Taking a Taylor expansion of the second factor on the right hand side of (2.23) and ignoring quadratic and higher order terms gives

\[ Q_m(\mathbf{q}, x) \approx \left( \frac{s(x)}{f(x)} \right)^{\frac{1}{r}} \left( 1 + \frac{e(x)}{r f(x)} \right), \tag{2.24} \]

and so

\[ G(x) Q_m(\mathbf{q}, x) - r f(x) \approx e(x), \tag{2.25} \]

where

\[ G(x) = r f(x) \left( \frac{f(x)}{s(x)} \right)^{\frac{1}{r}}. \tag{2.26} \]

Thus we can expect to obtain (close to best) approximations to the nonlinear problem

\[ \min_{\mathbf{q}} \| R(\mathbf{q}, x) - f(x) \|, \tag{2.27} \]

by solving the linear approximation problem

\[ \min_{\mathbf{q}} \| G(x) Q_m(\mathbf{q}, x) - r f(x) \|. \tag{2.28} \]

2.7 Non-Uniform Rational B-Splines

Non-uniform rational B-splines (NURBS) are another widely used kind of rational function. They are used not only within the field of approximation, but are also very powerful tools for shape representation, and are used extensively within the area of computer aided geometric design (CAGD) and within CAD software.
A NURBS curve is a parametric curve formed as a weighted combination of fixed geometric points, called control points, multiplied by ratios of B-spline basis functions. The following formal definition of a NURBS curve is taken from [44]. A degree $p$ NURBS curve $C(u)$ is defined over a parametric interval $[a, b]$ by

\[ C(u) = \frac{\sum_{i=0}^{n} N_{i,p}(u) w_i P_i}{\sum_{i=0}^{n} N_{i,p}(u) w_i}, \qquad a \le u \le b, \tag{2.29} \]

where $P_i$ are the set of $n + 1$ control points, $w_i > 0$ are the weights associated with each control point, and $N_{i,p}(u)$ are the degree $p$ B-spline basis functions defined on the knot vector

\[ U = \{ \underbrace{a, \dots, a}_{p+1},\; u_{p+1}, \dots, u_{m-p-1},\; \underbrace{b, \dots, b}_{p+1} \}. \]

The interval $[a, b]$ is commonly assumed to be $[0, 1]$, and the curve is often defined in the form

\[ C(u) = \sum_{i=0}^{n} R_{i,p}(u) P_i, \tag{2.30} \]

where

\[ R_{i,p}(u) = \frac{N_{i,p}(u) w_i}{\sum_{j=0}^{n} N_{j,p}(u) w_j} \tag{2.31} \]

are called the rational basis functions. The number of control points $(n + 1)$, the degree $p$, and the number of knots $(m + 1)$ are related by

\[ m = n + p + 1. \tag{2.32} \]

Each weight $w_i$ is associated with control point $P_i$ and describes the affinity that the curve has for that control point. If the weight of a control point is increased, the curve is more strongly attracted to that control point. This is illustrated in figure 2.1, where the weight associated with the second control point for the curve labelled NURB2 is 5 times that for the curve NURB1. This also illustrates why NURBS are a powerful tool for designers and users of CAD software, as control points and weights can be manipulated until the desired curve or surface shape is obtained.

Figure 2.1: The effect of increasing a control point's weight. (Plot legend: control points, knots, and the curves NURB1 and NURB2.)

The theory of NURBS also has links with projective geometry [21]. NURBS have not been directly utilised in this thesis, but a study of NURBS motivated the work of chapter 3.
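The effect of a weight on the curve can be checked numerically. The sketch below evaluates (2.29) for a small planar example using the Cox-de Boor recursion for the B-spline basis functions; the recursion, the control points, the knot vector and the function names `bspline_basis` and `nurbs_point` are illustrative assumptions and are not taken from [44] or from figure 2.1.

```python
def bspline_basis(i, p, u, knots):
    """Degree-p B-spline basis function N_{i,p}(u) by the Cox-de Boor recursion.

    Uses half-open knot spans, so u equal to the last knot needs separate
    handling; terms with a zero-length knot span are taken to be zero.
    """
    if p == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + p] - knots[i]
    if d1 > 0:
        left = (u - knots[i]) / d1 * bspline_basis(i, p - 1, u, knots)
    d2 = knots[i + p + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + p + 1] - u) / d2 * bspline_basis(i + 1, p - 1, u, knots)
    return left + right


def nurbs_point(u, p, knots, ctrl, weights):
    """Evaluate the NURBS curve C(u) of (2.29) for 2-D control points."""
    num_x = num_y = den = 0.0
    for i, ((px, py), w) in enumerate(zip(ctrl, weights)):
        nw = bspline_basis(i, p, u, knots) * w
        num_x += nw * px
        num_y += nw * py
        den += nw
    return num_x / den, num_y / den


# Degree p = 2, n + 1 = 3 control points, m + 1 = 6 knots (m = n + p + 1 = 5).
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
equal = nurbs_point(0.5, 2, knots, ctrl, [1.0, 1.0, 1.0])
heavy = nurbs_point(0.5, 2, knots, ctrl, [1.0, 5.0, 1.0])
```

With equal weights the point at $u = 0.5$ is $(1, 1)$; raising the middle weight to 5 moves it to $(1, 5/3)$, closer to the middle control point $(1, 2)$, which is the attraction effect described above.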
2.8 Classical non-linear optimization techniques

To solve the rational approximation problems described previously, we need to find a set of parameters that minimise a multi-dimensional non-linear error surface. This is a typical example of the kind of problem that can be solved with the use of classical optimisation methods. A general optimisation problem requires the minimisation of a function $f : \mathbb{R}^{n} \to \mathbb{R}$, called the objective function, with respect to a set of parameters $\mathbf{a} \in \mathbb{R}^{n}$. An unconstrained optimisation problem is one where there are no restrictions on the values of the parameter vector $\mathbf{a}$. If we are required to minimise $f(\mathbf{a})$ subject to the restriction that $\mathbf{a} \in \Omega$, where $\Omega \subset \mathbb{R}^{n}$, then the problem is referred to as a constrained optimisation problem, and the set $\Omega$ is termed the feasible set or constraint set. In this thesis we will mainly be concerned with unconstrained approximation problems, although we will look at constrained problems when we wish to fit rational approximations that are without poles in the approximation range.

There are a large number of optimisation methods available, with many being subtle variants of others [50], [15]. We now describe some of the more commonly used optimisation methods that we will use in later chapters. The descriptions of the Newton, Gauss-Newton, and Levenberg-Marquardt methods presented here are summarised versions of the explanations given in [15], using a very similar notation.

2.8.1 Newton's method

We present a summarised version of the description of Newton's method given in [15]. Suppose we are given an objective function $h(\mathbf{a})$ which we wish to minimise with respect to the $n$ parameters $\mathbf{a} = (a_1, \dots, a_n)$. Provided that $h(\mathbf{a})$ has continuous first and second derivatives, we can obtain a Taylor series expansion of $h$ about an arbitrary point $\mathbf{a}^{(k)}$.
Neglecting terms of order three and above, this expansion about $\mathbf{a}^{(k)}$ will be denoted by $q(\mathbf{a})$ and is given by

\[ q(\mathbf{a}) = h(\mathbf{a}^{(k)}) + (\mathbf{a} - \mathbf{a}^{(k)})^{T} \nabla h(\mathbf{a}^{(k)}) + \frac{1}{2} (\mathbf{a} - \mathbf{a}^{(k)})^{T} H(\mathbf{a}^{(k)}) (\mathbf{a} - \mathbf{a}^{(k)}), \tag{2.33} \]

where $H(\mathbf{a}^{(k)})$ is the Hessian matrix of $h$ at $\mathbf{a}^{(k)}$, defined by its $i, j$th element

\[ H_{ij}(\mathbf{a}^{(k)}) = \frac{\partial^{2} h}{\partial a_i \partial a_j}(\mathbf{a}^{(k)}), \tag{2.34} \]

and $\nabla h(\mathbf{a}^{(k)}) \in \mathbb{R}^{n}$ is the gradient vector having $i$th component

\[ (\nabla h(\mathbf{a}^{(k)}))_i = \frac{\partial h}{\partial a_i}(\mathbf{a}^{(k)}). \tag{2.35} \]

The function $q(\mathbf{a})$ provides a quadratic approximation to the objective function $h$ in the neighbourhood of $\mathbf{a}^{(k)}$ and has the same first and second derivatives as $h$ at this point. The principle behind the Newton method is to minimise the approximation $q(\mathbf{a})$ instead of the objective function itself, and then to use this minimum as a new point at which to construct another Taylor approximation. The process continues iteratively until the $\mathbf{a}^{(k)}$ converge (although convergence is not guaranteed), at which point a minimum of $h$ has been obtained. The function $q(\mathbf{a})$ is quadratic and, provided the Hessian is nonsingular, has a unique stationary point at the value $\mathbf{a}_{stat}$ that satisfies

\[ \mathbf{0} = \nabla q(\mathbf{a}_{stat}). \tag{2.36} \]

The function $\nabla q(\mathbf{a})$ is given by

\[ \nabla q(\mathbf{a}) = \nabla h(\mathbf{a}^{(k)}) + H(\mathbf{a}^{(k)}) (\mathbf{a} - \mathbf{a}^{(k)}), \tag{2.37} \]

and will be equal to zero at the point $\mathbf{a}_{stat}$ given by

\[ \mathbf{a}_{stat} = \mathbf{a}^{(k)} - H(\mathbf{a}^{(k)})^{-1} \nabla h(\mathbf{a}^{(k)}). \tag{2.38} \]

This will be a minimum provided that the Hessian matrix at $\mathbf{a}^{(k)}$ is positive definite. Since this minimum forms the starting point of the next iteration, the process can be defined recursively as

\[ \mathbf{a}^{(k+1)} = \mathbf{a}^{(k)} + \Delta \mathbf{a}^{(k)}, \tag{2.39} \]

where $\Delta \mathbf{a}^{(k)}$ is given by the solution to

\[ H(\mathbf{a}^{(k)}) \Delta \mathbf{a}^{(k)} = -\nabla h(\mathbf{a}^{(k)}), \tag{2.40} \]

and is referred to as the update parameter at iteration $k$. The Newton method works well provided that the Hessian is positive definite. However, even if this is the case, convergence cannot always be guaranteed unless the start point $\mathbf{a}^{(0)}$ is reasonably close to the true minimum of the objective function $h$.
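The recursion (2.39), (2.40) can be written out for a small problem. The following is a minimal sketch with two parameters, where the $2 \times 2$ system (2.40) is solved by Cramer's rule; the test function $h(\mathbf{a}) = a_1^{4} + a_1 a_2 + (1 + a_2)^{2}$, the start point and the function names are assumptions chosen for illustration and do not come from [15].

```python
def grad(a):
    """Gradient of h(a) = a1**4 + a1*a2 + (1 + a2)**2."""
    a1, a2 = a
    return [4.0 * a1**3 + a2, a1 + 2.0 * (1.0 + a2)]


def hessian(a):
    """Hessian of the same h; positive definite near the start point used."""
    a1, _ = a
    return [[12.0 * a1**2, 1.0], [1.0, 2.0]]


def newton(a, tol=1e-12, max_iter=20):
    """Iterate a <- a + step, where H(a) step = -grad(a), as in (2.39)-(2.40)."""
    for _ in range(max_iter):
        g = grad(a)
        H = hessian(a)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Cramer's rule for H * step = -g.
        step = [(-g[0] * H[1][1] + g[1] * H[0][1]) / det,
                (-g[1] * H[0][0] + g[0] * H[1][0]) / det]
        a = [a[0] + step[0], a[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return a


a_min = newton([0.75, -1.25])
```

From this start point the first step moves to $(0.70, -1.35)$ and the iterates then settle near $(0.6959, -1.3479)$ within a handful of iterations. The sketch has no safeguard for an indefinite Hessian or a poor start point, which are exactly the situations in which the plain Newton iteration can fail.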
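Despite this sensitivity to the start point, Newton's method is popular, as it is proven to converge quadratically [15] when implemented with a choice of start parameter close to the solution. Despite the quadratic convergence, the Newton method can be computationally expensive because of the calculation of the Hessian matrix.

2.8.2 The Gauss-Newton method

When applying Newton's method to a nonlinear least-squares problem, the objective function we are trying to minimise is of the form

\[ h(\mathbf{a}) = \sum_{l=1}^{m} (e_l(\mathbf{a}))^{2} = \mathbf{e}(\mathbf{a})^{T} \mathbf{e}(\mathbf{a}), \tag{2.41} \]

where $\mathbf{e}(\mathbf{a}) \in \mathbb{R}^{m}$ is the vector of approximation errors or residuals as defined in (1.4). Application of Newton's method to solve this problem requires the calculation of the Hessian and the gradient of the objective function $h$. The gradient vector $\nabla h(\mathbf{a})$ (2.35) for this problem can be expressed as

\[ \nabla h(\mathbf{a}) = 2 J(\mathbf{a})^{T} \mathbf{e}(\mathbf{a}), \tag{2.42} \]

where $J(\mathbf{a})$ represents the Jacobian matrix of $\mathbf{e}$ evaluated at $\mathbf{a}$ and is defined by its $i, j$th element

\[ J(\mathbf{a})_{ij} = \frac{\partial e_i}{\partial a_j}(\mathbf{a}). \tag{2.43} \]

The $i, j$th component of the Hessian matrix of $h$ evaluated at $\mathbf{a}$ is given by

\begin{align}
H_{ij}(\mathbf{a}) &= \frac{\partial^{2} h}{\partial a_i \partial a_j}(\mathbf{a}) \tag{2.44} \\
&= \frac{\partial}{\partial a_i} \sum_{l=1}^{m} 2 e_l(\mathbf{a}) \frac{\partial e_l}{\partial a_j}(\mathbf{a}) \tag{2.45} \\
&= 2 \sum_{l=1}^{m} \left( \frac{\partial e_l}{\partial a_i}(\mathbf{a}) \frac{\partial e_l}{\partial a_j}(\mathbf{a}) + e_l(\mathbf{a}) \frac{\partial^{2} e_l}{\partial a_i \partial a_j}(\mathbf{a}) \right). \tag{2.46}
\end{align}

The first term on the right hand side of (2.46) can be seen to be the $i, j$th element of the matrix $2 J(\mathbf{a})^{T} J(\mathbf{a})$, and so we can write the Hessian matrix as

\[ H(\mathbf{a}) = 2 \left( J(\mathbf{a})^{T} J(\mathbf{a}) + S(\mathbf{a}) \right), \tag{2.47} \]

where $S(\mathbf{a})$ is the matrix with $i, j$th element

\[ S_{ij}(\mathbf{a}) = \sum_{l=1}^{m} e_l(\mathbf{a}) \frac{\partial^{2} e_l}{\partial a_i \partial a_j}(\mathbf{a}). \tag{2.48} \]

With the Hessian and gradient calculated, the application of Newton's method (2.39) to the problem (2.41) is given by

\[ \mathbf{a}^{(k+1)} = \mathbf{a}^{(k)} + \Delta \mathbf{a}^{(k)}, \tag{2.49} \]

where, after cancelling the common factor of 2 from (2.42) and (2.47), the update parameter is given by the solution to

\[ \left( J(\mathbf{a}^{(k)})^{T} J(\mathbf{a}^{(k)}) + S(\mathbf{a}^{(k)}) \right) \Delta \mathbf{a}^{(k)} = -J(\mathbf{a}^{(k)})^{T} \mathbf{e}(\mathbf{a}^{(k)}). \tag{2.50} \]

When the residuals have low curvature around $\mathbf{a}^{(k)}$, the second partial derivatives that form the elements of the matrix $S(\mathbf{a})$ are often very small and so can be ignored.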
When the matrix $S(\mathbf{a})$ is omitted from the calculation of the update parameter, equation (2.50) reduces to

\[ J(\mathbf{a}^{(k)})^{T} J(\mathbf{a}^{(k)}) \Delta \mathbf{a}^{(k)} = -J(\mathbf{a}^{(k)})^{T} \mathbf{e}(\mathbf{a}^{(k)}), \tag{2.51} \]

which are the normal equations for the least-squares solution of the overdetermined system

\[ J(\mathbf{a}^{(k)}) \Delta \mathbf{a}^{(k)} = -\mathbf{e}(\mathbf{a}^{(k)}). \tag{2.52} \]

The method described above is referred to as the Gauss-Newton method, and is a popular choice of algorithm for solving nonlinear least-squares problems. A useful feature of the Gauss-Newton method is that it only requires knowledge of the first derivatives of the residual vector $\mathbf{e}(\mathbf{a})$, thus avoiding the potentially expensive calculation of the Hessian. The method works well when the objective function $h(\mathbf{a})$ has low curvature (and hence small second derivatives), so that the matrix $J(\mathbf{a})^{T} J(\mathbf{a})$ is a good approximation to the Hessian matrix. When the objective function exhibits high curvature, this approximation is less good and the Gauss-Newton method may fail to converge. In this case better results may be obtained using Newton's method. When it does converge, the order of convergence of the Gauss-Newton method is linear. As with the Newton method, the Gauss-Newton method is not guaranteed to work well for choices of start parameter that are not close to the true solution. In regions of high curvature, it is not guaranteed to converge, even if it is arbitrarily close to a local minimum.

An important point for consideration is the positive definiteness of the matrix $J(\mathbf{a})^{T} J(\mathbf{a})$. If the matrix is not positive definite, then even if the algorithm does converge it may not converge to a local minimum. The same is true of the Newton method if the Hessian is not positive definite. This problem can be overcome with the use of the Levenberg-Marquardt algorithm, which is described below.

2.8.3 The Levenberg-Marquardt algorithm

Application of either the Newton or Gauss-Newton methods generates at each iteration $k$ an update parameter $\Delta \mathbf{a}^{(k)}$ from which we obtain the next parameter $\mathbf{a}^{(k+1)} = \mathbf{a}^{(k)} + \Delta \mathbf{a}^{(k)}$.
When close to the minimum, the search direction given by

\[ \mathbf{d}^{(k)} = \mathbf{a}^{(k+1)} - \mathbf{a}^{(k)} \]

is such that the objective function decreases at each iteration,

\[ h(\mathbf{a}^{(k+1)}) < h(\mathbf{a}^{(k)}), \]

and when this is the case, the search direction is said to point in a descent direction. When the matrix $J(\mathbf{a})^{T} J(\mathbf{a})$ is not positive definite, the search direction is not guaranteed to point in a descent direction. To ensure a descent direction at each iteration it is possible to calculate the update parameter from the equation

\[ \left( J(\mathbf{a}^{(k)})^{T} J(\mathbf{a}^{(k)}) + \mu_k I \right) \Delta \mathbf{a}^{(k)} = -J(\mathbf{a}^{(k)})^{T} \mathbf{e}(\mathbf{a}^{(k)}), \tag{2.53} \]

where the parameter $\mu_k$ is chosen such that the matrix

\[ A_k = J(\mathbf{a}^{(k)})^{T} J(\mathbf{a}^{(k)}) + \mu_k I \tag{2.54} \]

is positive definite. We can be certain of the positive definiteness of $A_k$ for sufficiently large $\mu_k$ for the following reason. We will denote the set of eigenvalues of $J^{T} J$ by $\lambda_1, \dots, \lambda_m$, which may or may not be distinct. By definition, if $J^{T} J$ is not positive definite, then at least one of these eigenvalues is not positive. The eigenvalues of the matrix $A_k$ will be

\[ \lambda_1 + \mu_k, \dots, \lambda_m + \mu_k, \]

because

\[ A_k \mathbf{c}_i = (J^{T} J + \mu_k I) \mathbf{c}_i = J^{T} J \mathbf{c}_i + \mu_k I \mathbf{c}_i = \lambda_i \mathbf{c}_i + \mu_k \mathbf{c}_i = (\lambda_i + \mu_k) \mathbf{c}_i, \]

where the vector $\mathbf{c}_i$ is the eigenvector with corresponding eigenvalue $\lambda_i$. Thus if the value of $\mu_k$ is large enough (larger than $-\lambda_{\min}$, where $\lambda_{\min}$ is the smallest eigenvalue $\lambda_i$), the eigenvalues of $A_k$ will all be positive and therefore $A_k$ will be positive definite.

There are a number of heuristic algorithms for choosing the value of $\mu_k$. Usually the value is increased whenever the value of the error increases at a particular iteration. In such cases, the value of $\mu_k$ is increased by a factor until there is a decrease in the error. Similarly, when the error is decreasing steadily, the value of $\mu_k$ is decreased at each iteration. A method similar to this was originally proposed by Marquardt [36]. The method described above is referred to as the Levenberg-Marquardt algorithm and is described in greater detail in [41].
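The update (2.53) together with the heuristic for $\mu_k$ can be sketched for a very small rational model. The example below fits $p_0 / (1 + q_1 x)$, that is a degree $(0, 1)$ polynomial ratio with $q_0 = 1$, to data generated from $2 / (1 + 0.5 x)$. The factor of 10 used to raise and lower $\mu_k$, the initial value of $\mu$, the test data and all function names are assumptions made for illustration; they are not the choices of [36] or [41].

```python
def residuals(a, xs, fs):
    """e_i = f_i - p0 / (1 + q1 x_i)."""
    p0, q1 = a
    return [f - p0 / (1.0 + q1 * x) for x, f in zip(xs, fs)]


def jacobian(a, xs):
    """Rows (d e_i / d p0, d e_i / d q1) = (-1/Q_i, p0 x_i / Q_i**2)."""
    p0, q1 = a
    return [[-1.0 / (1.0 + q1 * x), p0 * x / (1.0 + q1 * x) ** 2] for x in xs]


def levenberg_marquardt(a, xs, fs, mu=1e-2, max_iter=50, tol=1e-12):
    e = residuals(a, xs, fs)
    h = sum(v * v for v in e)
    for _ in range(max_iter):
        J = jacobian(a, xs)
        # A = J^T J + mu I and right-hand side -J^T e, as in (2.53).
        A11 = sum(r[0] * r[0] for r in J) + mu
        A12 = sum(r[0] * r[1] for r in J)
        A22 = sum(r[1] * r[1] for r in J) + mu
        b1 = -sum(r[0] * v for r, v in zip(J, e))
        b2 = -sum(r[1] * v for r, v in zip(J, e))
        det = A11 * A22 - A12 * A12
        step = [(b1 * A22 - b2 * A12) / det, (A11 * b2 - A12 * b1) / det]
        trial = [a[0] + step[0], a[1] + step[1]]
        e_trial = residuals(trial, xs, fs)
        h_trial = sum(v * v for v in e_trial)
        if h_trial < h:
            # Error decreased: accept the step and relax the damping.
            a, e, h = trial, e_trial, h_trial
            mu /= 10.0
            if max(abs(s) for s in step) < tol:
                break
        else:
            # Error increased: reject the step and increase the damping.
            mu *= 10.0
    return a, h


xs = [0.5 * i for i in range(5)]             # abscissae on [0, 2]
fs = [2.0 / (1.0 + 0.5 * x) for x in xs]     # data from p0 = 2, q1 = 0.5
a_fit, h_fit = levenberg_marquardt([1.0, 0.0], xs, fs)
```

Setting `mu` to zero throughout would reduce each step to the Gauss-Newton update (2.51). Because steps are only accepted when the error decreases, the sum of squares is non-increasing from one accepted iterate to the next.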
A similar approach may be used for the Newton method to ensure positive definiteness of the Hessian by addition of a multiple of the identity matrix. The Levenberg-Marquardt method is more robust than the Gauss-Newton method and is not as reliant on a good choice of start parameter in order for convergence to occur. In addition, there are some variations of the Levenberg-Marquardt algorithm that have been shown to be globally convergent [42].

Application of the Gauss-Newton method to generalised rational approximation

We have defined a generalised rational function in section 1.7 as

\[ R_m^{n}(\mathbf{p}, \mathbf{q}, x) = \frac{P_n(\mathbf{p}, x)}{Q_m(\mathbf{q}, x)} = \frac{\sum_{i=0}^{n} p_i \phi_i(x)}{\sum_{j=0}^{m} q_j \psi_j(x)}, \tag{2.55} \]

and the rational approximation residual vector $\mathbf{e}(\mathbf{p}, \mathbf{q})$ as the column vector that has $i$th element

\[ e_i(\mathbf{p}, \mathbf{q}) = f_i - R_m^{n}(\mathbf{p}, \mathbf{q}, x_i). \tag{2.56} \]

Evaluation of the partial derivatives of $\mathbf{e}$ with respect to the approximation parameters then gives the required elements of the Jacobian matrix (2.43), and these are given by

\[ \begin{aligned} \frac{\partial e_i}{\partial p_j} &= -\frac{\phi_j(x_i)}{Q_m(\mathbf{q}, x_i)}, & j &= 0, \dots, n, \\ \frac{\partial e_i}{\partial q_k} &= \frac{\psi_k(x_i) P_n(\mathbf{p}, x_i)}{(Q_m(\mathbf{q}, x_i))^{2}}, & k &= 0, \dots, m. \end{aligned} \tag{2.57} \]

If we define the vector of approximation parameter estimates at iteration $k$ as

\[ \mathbf{a}^{(k)} = (p_0^{(k)}, \dots, p_n^{(k)}, q_0^{(k)}, \dots, q_m^{(k)})^{T}, \tag{2.58} \]

then we can define the Jacobian matrix $J \in \mathbb{R}^{N \times (m+n+2)}$ as the matrix with $i, j$th element

\[ J_{ij} = \begin{cases} -\dfrac{\phi_{j-1}(x_i)}{Q_m(\mathbf{q}, x_i)}, & j = 1, \dots, n + 1, \\[2ex] \dfrac{\psi_{j-(n+2)}(x_i) P_n(\mathbf{p}, x_i)}{(Q_m(\mathbf{q}, x_i))^{2}}, & j = n + 2, \dots, m + n + 2. \end{cases} \tag{2.59} \]

With the Jacobian matrix given, we can now apply the Gauss-Newton method as described in section 2.8.2.

2.9 Summary

In this chapter we have presented some of the major areas of rational approximation, some of which are utilised in the rest of the thesis. Some of the described methods are not used anywhere in this thesis, as they were found not to be directly applicable to the specific problem addressed by the project.
These methods are included here as they were researched initially to assess suitability for the project, and also because they are major components of the vast subject of rational approximation. The Loeb algorithm and the nonlinear optimisation methods have been described in more detail as they are used extensively in the following chapters.

Chapter 3

EXTENSIONS OF THE LOEB ALGORITHM

3.1 Introduction

In this chapter we describe some modifications to the basic Loeb algorithm (2.2) that provide good rational approximations that have been forced to share the asymptotic behaviour exhibited by the data being approximated. In addition, we have found that this type of approximation can have good extrapolation properties. A slightly modified version of Loeb's algorithm is applied to fit rational functions with a constrained asymptotic limit, and we compare the quality of the resulting approximations to those obtained with the standard Loeb algorithm. We also introduce the semi-infinite rational spline, which is a new rational form that is capable of having different asymptotic limits as $x \to +\infty$ and $x \to -\infty$, and we show how to fit this form to discrete data with the use of the Loeb algorithm.

3.2 Asymptotically constrained approximation using Loeb's algorithm

In this section we look at the problem of approximating a set of discrete data that is known to come from a function that has a specified asymptotic limit. We consider the general problem of approximating a set of data $\{(x_i, f_i)\}_{i=1}^{m} \subset \mathbb{R}^{2}$ on a positive interval $[\alpha, \beta]$, where the $f_i$ are assumed to be sampled from an unknown function $f(x)$ that has the asymptotic limit

\[ \lim_{x \to +\infty} \frac{f(x)}{x^{\gamma}} = \mu, \tag{3.1} \]

where $\mu \in \mathbb{R}$ and $\gamma \in \mathbb{Z}^{+}$ are coefficients that are known prior to approximation. Our aim is to use an approximation form that is forced to share the same asymptotic limit (3.1) as the function $f(x)$, and to investigate whether this provides a better approximation than when we impose no constraints on the approximant.
This general problem of approximating limits of the form (3.1) was described in sections 1.8.5, 1.8.6 and 1.8.7 for the specific case of polynomial ratios, and in these sections suitable parameter constraints for fixing their asymptotic limits were derived. In this section we apply these constraints and fit polynomial ratios to the data with the use of the Loeb algorithm. We define our degree $(n, m)$ polynomial ratio approximation form by

\[ R_m^{n}(\mathbf{p}, \mathbf{q}, x) = \frac{\sum_{i=0}^{n} p_i x^{i}}{1 + \sum_{i=1}^{m} q_i x^{i}}, \tag{3.2} \]

where we have used the parameter normalisation condition $q_0 = 1$. We recall from Chapter 1 that a polynomial ratio will have the asymptotic limit (3.1) provided that the following constraints are satisfied:

\[ n - m = \gamma, \tag{3.3} \]
\[ p_n = q_m \mu. \tag{3.4} \]

For the case where the limit (3.1) is defined with $\mu = 0$ (the case of modelling asymptotic decay to zero), the only constraint we require for a suitable polynomial ratio approximant is $n < m$.

Given a polynomial ratio $R_m^{n}(x)$ that has degrees $m, n$ chosen to satisfy (3.3), we can enforce the parameter constraint (3.4) directly within the definition of $R_m^{n}(x)$ by substituting $q_m \mu$ for $p_n$. Our approximating function $R_m^{n}(x)$ is now explicitly defined by

\[ R_m^{n}(\mathbf{p}, \mathbf{q}, x) = \frac{\sum_{i=0}^{n-1} p_i x^{i} + \mu q_m x^{n}}{1 + \sum_{i=1}^{m} q_i x^{i}}. \tag{3.5} \]

Although this results in the loss of one free approximation parameter, the function $R_m^{n}(\mathbf{p}, \mathbf{q}, x)$ now has exactly the same limiting behaviour as the function we are approximating.

3.2.1 Adequate representation of asymptotic behaviour by the data

We next need to consider whether or not the data sufficiently represents the asymptotic behaviour specified in (3.1). We will assume that the interval of approximation $[\alpha, \beta]$ contains data from the region where the function $f(x)$ actually starts to exhibit the asymptotic behaviour we are trying to model.
If this is not the case, we would have no justifiable reason to expect that an approximant with the same limit as $f(x)$ would provide a better choice of form than any other arbitrary form. To illustrate this situation we look at figures 3.1 and 3.2, which show the same function $f(x) = \tanh(x)$ on two different intervals. The interval in figure 3.1 is not one where the function's asymptotic behaviour starts to take effect, unlike the one shown in figure 3.2, where it can clearly be seen that the function values start to approach the asymptotic limit as $x \to \infty$.

Figure 3.1: $y = \tanh(x)$ on the interval $[-1, 1.5]$

Figure 3.2: $y = \tanh(x)$ on the interval $[-1, 4]$

3.2.2 Approximation of the data

Assuming that we have fixed the parameters of the approximant $R(x)$ as described previously, we can now apply the Loeb algorithm and fit $R(x)$ to the data. The Loeb algorithm is described in detail in section 2.2, but for this constrained problem we need to modify the observation matrix slightly. We recall that for the standard Loeb algorithm we need to minimise the quantity (2.6), which is done by solving the least-squares system (2.7). For the constrained problem described here, with the rational form (3.5), the $\ell_2$ error norm at iteration $k$ is given by

\[ \sum_{i=1}^{N} \left( w_i^{(k)} \left[ f_i + f_i \sum_{j=1}^{m-1} q_j^{(k)} x_i^{j} + (f_i x_i^{m} - \mu x_i^{n}) q_m^{(k)} - \sum_{j=0}^{n-1} p_j^{(k)} x_i^{j} \right] \right)^{2}, \tag{3.6} \]

with the weight $w_i^{(k)}$ as defined in (2.5). We can then represent this weighted linear least-squares problem at iteration $k$ in matrix form by

\[ W^{(k)} \mathbf{f} = W^{(k)} C \mathbf{d}^{(k)}, \tag{3.7} \]

where the observation matrix $C \in \mathbb{R}^{N \times (m+n)}$ is defined by its $i, j$th element

\[ C_{ij} = \begin{cases} x_i^{j-1}, & j = 1, \dots, n, \\ -f_i x_i^{j-n}, & j = n + 1, \dots, m + n - 1, \\ \mu x_i^{j-m} - f_i x_i^{j-n}, & j = n + m. \end{cases} \]

The weighting matrix $W^{(k)} \in \mathbb{R}^{N \times N}$ is defined in equation (2.10), and

\[ \mathbf{d}^{(k)} = (p_0^{(k)}, \dots, p_{n-1}^{(k)}, q_1^{(k)}, \dots, q_m^{(k)})^{T} \in \mathbb{R}^{m+n} \]

is the vector of approximation parameters, obtained by solving the normal equations for (3.7), or by using QR factorisation.

3.2.3 Numerical results

We now present some results obtained from application of the asymptotically constrained Loeb method to some sample data sets.

Example 1. Sigmoid type functions

We took 51 uniformly spaced abscissae $x_i$ on the interval $[-1, 4]$, at which we evaluated function values $f_i = \tanh(x_i)$. These were approximated using a polynomial ratio of degree $(n, m)$. To implement the constrained Loeb algorithm for this problem, we require the parameter constraints

\[ n = m, \qquad p_n = q_m, \]

to ensure that the approximant has the same asymptotic limit as the function $\tanh(x)$. We then applied this algorithm to the data with a choice of degrees $m = n = 4$, and assumed convergence at iteration $k$ when

\[ \| \mathbf{d}^{(k)} - \mathbf{d}^{(k-1)} \|_\infty < 10^{-10}. \tag{3.8} \]

We then applied the standard Loeb algorithm to the same data using the same $(4, 4)$ polynomial ratio. A comparison of both algorithms is made in Table 3.1, and plots of the approximation errors are shown in Figure 3.3, where $\mathbf{e}$ represents the residual vector of the approximation obtained at convergence.

Table 3.1: Comparison of results from constrained and unconstrained Loeb algorithms for approximation of $\tanh(x)$.

Algorithm        Iterations    $\|\mathbf{e}\|_2$         Cond(C)
Unconstrained    7             $5.6029 \times 10^{-5}$    $1.4077 \times 10^{5}$
Constrained      10            $3.6370 \times 10^{-4}$    $5.8904 \times 10^{3}$

Figure 3.3: Error of degree $(4, 4)$ polynomial ratio approximations to $\tanh(x)$ obtained with constrained and unconstrained Loeb algorithms

Example 2. Approximation of $f(x)$, where $f(x) \to kx$ as $x \to \infty$

We took the same set of abscissae values as in the previous example, and evaluated the set of function values $f_i = f(x_i)$, this time using the function

\[ f(x) = \frac{x}{2(1 + e^{-x})}. \]

This function behaves like $\frac{1}{2} x$ as $x \to \infty$, and so for a degree $(n, m)$ polynomial ratio approximation to behave similarly we need to set

\[ n = m + 1, \qquad p_n = \tfrac{1}{2} q_m. \]

As before, we applied both the constrained and the unconstrained Loeb algorithms to this data using a degree $(5, 4)$ polynomial ratio. The results for both of the resulting approximations are shown in Table 3.2, and a plot of the different error curves is displayed in figure 3.4.

Figure 3.4: Approximation error of degree $(5, 4)$ polynomial ratios fitted using constrained and unconstrained Loeb algorithms

Table 3.2: Comparison of results from constrained and unconstrained Loeb algorithms for approximation of $0.5x(1 + e^{-x})^{-1}$.

Algorithm        Iterations    $\|\mathbf{e}\|_2$         Cond(C)
Unconstrained    6             $1.7162 \times 10^{-7}$    $4.4525 \times 10^{7}$
Constrained      7             $2.4483 \times 10^{-6}$    $8.8900 \times 10^{5}$

3.2.4 Extrapolation of the data

We can see from the previous examples that the unconstrained algorithm provides a better approximation than the constrained version, which we might naturally expect, as the constrained form loses one free parameter in order to satisfy (3.4). We also note that the condition number of the observation matrix for the constrained version is significantly smaller, which we would again expect to be the case through having one fewer column. The constrained algorithm was specifically applied in an attempt to improve approximations on the approximation interval itself, but our results show that the algorithm is unsuccessful for this purpose. However, despite the apparent lack of improvement on the approximation interval itself, we did make an interesting observation on the behaviour of the approximations outside the interval.
We found that the approximation obtained using the constrained version is far superior to that of the standard Loeb method when it comes to extrapolation outside the interval (in the direction of the asymptotic limit we have modelled). This is illustrated in Figure 3.5, which shows the extrapolation error of the approximations obtained from both algorithms, plotted on an interval ten times larger than the approximation interval.

[Figure 3.5: Extrapolation error of degree (4, 4) polynomial ratio approximations to $\tanh(x)$. Axes: approximation error against $x$ on $[0, 40]$; curves: unconstrained approximation, constrained approximation.]

We obtain a similar improvement in the extrapolation error for the second example of section 3.2.3; however, it is not as good as for the constant limit asymptote. A proposed reason for this is that functions like that in the second example will generally have an asymptote of the form $y = a + bx$, where $a$ and $b$ are real constants. Our algorithm constrains the asymptotic behaviour, and this fixes the gradient $b$ of the asymptote (3.2.4), but we have not specified any constraint that will provide the correct value of the intercept $a$. In this case we would expect our extrapolation error to converge to the value of the intercept $a$. This is illustrated in Figure 3.6, which shows the extrapolation error of both approximations over a much larger interval. With the constant asymptote example we do not have the same problem, and so our constrained rational form has exactly the same limit as the function being approximated.

When comparing the differences in asymptotic limits between the approximations obtained from the two algorithms, we have always observed that the extrapolation residuals from one algorithm are opposite in sign to those of the other, although we are unclear as to exactly why this is so and cannot provide a theoretical reason for it.
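The comparison between the constrained and standard Loeb algorithms can be reproduced in outline with the sketch below. It is only an illustration under stated assumptions: a monomial basis, the normalisation $q_0 = 1$, and the constraint $p_n = \mu q_m$ imposed by eliminating $p_n$. The function name `loeb_fit` and its arguments are hypothetical and not taken from the thesis.

```python
import numpy as np

def loeb_fit(x, f, n, m, mu=None, iters=50, tol=1e-10):
    """Loeb iteration for P(x)/Q(x) with Q(x) = 1 + q_1 x + ... + q_m x^m.

    If mu is given, the leading coefficients are tied by p_n = mu * q_m,
    so that P/Q behaves like mu * x**(n - m) for large |x|.
    Returns (p, q) in ascending powers, with q[0] = 1.
    """
    X = np.vander(x, max(n, m) + 1, increasing=True)      # X[:, j] = x**j
    if mu is None:
        # unknowns: p_0..p_n, q_1..q_m
        A = np.hstack([X[:, :n + 1], -f[:, None] * X[:, 1:m + 1]])
    else:
        # unknowns: p_0..p_{n-1}, q_1..q_{m-1}, q_m (p_n eliminated)
        tied = mu * X[:, n] - f * X[:, m]
        A = np.hstack([X[:, :n], -f[:, None] * X[:, 1:m], tied[:, None]])
    w = np.ones_like(x)                                   # first pass is unweighted
    d_old = None
    for _ in range(iters):
        d, *_ = np.linalg.lstsq(w[:, None] * A, w * f, rcond=None)
        if mu is None:
            p, q = d[:n + 1], np.concatenate(([1.0], d[n + 1:]))
        else:
            q = np.concatenate(([1.0], d[n:]))
            p = np.concatenate((d[:n], [mu * q[-1]]))
        w = 1.0 / np.polyval(q[::-1], x)                  # Loeb weights 1/Q(x_i)
        if d_old is not None and np.max(np.abs(d - d_old)) < tol:
            break
        d_old = d
    return p, q

def ratio(p, q, t):
    return np.polyval(p[::-1], t) / np.polyval(q[::-1], t)

x = np.linspace(-1.0, 4.0, 51)
f = np.tanh(x)
p_c, q_c = loeb_fit(x, f, 4, 4, mu=1.0)    # constrained: p_4 = q_4
p_u, q_u = loeb_fit(x, f, 4, 4)            # standard Loeb

err_c = np.linalg.norm(f - ratio(p_c, q_c, x))
err_u = np.linalg.norm(f - ratio(p_u, q_u, x))
far = np.linspace(4.0, 40.0, 200)          # extrapolation range
ext_c = np.max(np.abs(np.tanh(far) - ratio(p_c, q_c, far)))
ext_u = np.max(np.abs(np.tanh(far) - ratio(p_u, q_u, far)))
print(err_c, err_u, ext_c, ext_u)
```

Printing the four numbers gives the interval error and the worst extrapolation error for each fit; the values will depend on the implementation details assumed here and need not match the tables digit for digit.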
In summary, we have found that the constrained algorithm yields a better conditioned observation matrix, and generates good approximations that have much smaller extrapolation errors than the unconstrained approximations. However, it has also been found that there is no notable improvement in approximation error on the approximation interval itself, and that in general the constrained algorithm takes a slightly larger number of iterations to converge than the standard algorithm does.

[Figure 3.6: Extrapolation error of degree (5, 4) polynomial ratio approximations to $0.5x(1 + e^{-x})^{-1}$. Axes: approximation error against $x$ on $[0, 40]$; curves: unconstrained approximation, constrained approximation.]

3.3 Semi-infinite rational splines

We now introduce a new approximation form, motivated by the need to approximate double sided asymptotic limits as discussed in section 1.8.8. Consider the approximation problem described in section 1.8.8, where we wish to approximate a set of data $\{(x_i, f_i)\}_{i=1}^{m} \subset \mathbb{R}^2$, where the $f_i$ come from a function $f(x)$ having two specified asymptotic limits

$$\lim_{x \to -\infty} \frac{f(x)}{x^{\alpha_1}} = \mu_1, \tag{3.9}$$

$$\lim_{x \to +\infty} \frac{f(x)}{x^{\alpha_2}} = \mu_2, \tag{3.10}$$

where $\mu_1, \mu_2 \in \mathbb{R}$ and $\alpha_1, \alpha_2 \in \mathbb{N}$.

As in the previous section, we wish to approximate this data with an approximant that has the same asymptotic limits (3.9), (3.10). A polynomial ratio with suitably chosen numerator and denominator degrees can be forced to share one of these asymptotic limits, but it will rarely be able to have both. The exceptions will be for cases where both asymptotic limits are zero ($\mu_1 = \mu_2 = 0$), or are constants of equal value ($\mu_1 = \mu_2 \neq 0$, $\alpha_1 = \alpha_2 = 0$).
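The reason a single polynomial ratio struggles with two different limits can be made explicit with a short worked calculation. The block below is a sketch in the notation of section 3.2 (numerator coefficients $p_i$, denominator coefficients $q_i$), and assumes $p_n \neq 0$ and $q_m \neq 0$.

```latex
R(x) = \frac{p_0 + p_1 x + \dots + p_n x^n}{1 + q_1 x + \dots + q_m x^m}
     = \frac{p_n}{q_m}\, x^{\,n-m} \bigl( 1 + O(|x|^{-1}) \bigr),
     \qquad |x| \to \infty,
\]
\[
\lim_{x \to -\infty} \frac{R(x)}{x^{\,n-m}}
  = \lim_{x \to +\infty} \frac{R(x)}{x^{\,n-m}}
  = \frac{p_n}{q_m}.
```

Both limits are therefore governed by the same exponent $n - m$ and the same coefficient $p_n / q_m$. For instance, $\tanh(x)$ has $\alpha_1 = \alpha_2 = 0$ with $\mu_1 = -1$ and $\mu_2 = 1$, which would require $p_n / q_m$ to equal both $-1$ and $1$ at once.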
To allow the two different asymptotic limits (3.9), (3.10) to be modelled simultaneously, we consider the following piecewise rational function defined by

$$R^{n}_{m_1,m_2}(\mathbf{p}, \mathbf{b}, \mathbf{c}, x) = \begin{cases} \dfrac{P(\mathbf{p}, x)}{B(\mathbf{b}, x)} = \dfrac{\sum_{i=0}^{n} p_i x^i}{1 + \sum_{i=1}^{m_1} b_i x^i} = R_-(\mathbf{p}, \mathbf{b}, x) & \text{for } x \le 0, \\[2ex] \dfrac{P(\mathbf{p}, x)}{C(\mathbf{c}, x)} = \dfrac{\sum_{i=0}^{n} p_i x^i}{1 + \sum_{i=1}^{m_2} c_i x^i} = R_+(\mathbf{p}, \mathbf{c}, x) & \text{for } x \ge 0, \end{cases} \tag{3.11}$$

where $\mathbf{p} \in \mathbb{R}^{n+1}$, $\mathbf{b} \in \mathbb{R}^{m_1}$, $\mathbf{c} \in \mathbb{R}^{m_2}$ are vectors of approximation parameters defined in the usual way. With an appropriate choice of $n, m_1, m_2$, (3.11) provides a function that can potentially model both of the asymptotic limits we require. The piecewise rational functions (3.11) are defined in a way that ensures that $R^{n}_{m_1,m_2}(\mathbf{p}, \mathbf{b}, \mathbf{c}, x)$ is $C^0$ continuous across the value $x = 0$, which behaves like a knot does in a spline function. We will refer to functions of the form (3.11) as semi-infinite rational splines (SIRS).

We mention at this point that we could have chosen to define the SIRS with a fixed denominator and variable numerator. The reason for not doing so is that by considering two denominator functions, we have more chance that our rational approximation will have no poles. That is to say, it is possible for all the real poles of $R_-(\mathbf{p}, \mathbf{b}, x)$ (if there are any) to lie in $x > 0$ and all the real poles of $R_+(\mathbf{p}, \mathbf{c}, x)$ to lie in $x < 0$. In such cases, the resulting SIRS approximation will have no poles anywhere on the real line. It is important to mention also that, by the same argument, we may be allowing twice as many poles to be present in the approximation by considering two separate denominators, but we feel that the variable denominator approach is favourable because of the potential for pole-free approximations to be fitted.

3.3.1 Continuity conditions at the knot

In an analogous way to splines, we now look at the conditions required for a SIRS to be continuous at the knot at $x = 0$.
As a consequence of its definition, the SIRS already has $C^0$ continuity at $x = 0$, as

$$R^{n}_{m_1,m_2}(\mathbf{p}, \mathbf{b}, \mathbf{c}, 0) = p_0 \tag{3.12}$$

for all possible values of $\mathbf{p}, \mathbf{b}, \mathbf{c}$. To obtain conditions for $C^1$ continuity, we first evaluate the derivatives of $R_-(x)$ and $R_+(x)$ with respect to $x$, and these are given by

$$R_-'(x) = \frac{P'(x)}{B(x)} - \frac{P(x)}{B(x)^2} B'(x), \tag{3.13}$$

$$R_+'(x) = \frac{P'(x)}{C(x)} - \frac{P(x)}{C(x)^2} C'(x). \tag{3.14}$$

Evaluating these derivatives at the knot gives

$$R_-'(0) = p_1 - p_0 b_1, \tag{3.15}$$

$$R_+'(0) = p_1 - p_0 c_1, \tag{3.16}$$

which are equal (for non-zero $p_0$) provided that

$$b_1 = c_1, \tag{3.17}$$

and so we have $C^1$ continuity at $x = 0$ if this constraint is satisfied. In the special case of $p_0 = 0$, the SIRS is $C^1$ continuous for all possible choices of $\mathbf{b}, \mathbf{c}$. Differentiating the functions (3.13), (3.14) again gives the second derivatives as

$$R_-''(x) = \frac{P''(x)}{B(x)} - \frac{P'(x)}{B(x)^2} B'(x) - \frac{P(x)}{B(x)^2} B''(x) - B'(x) \left( \frac{P'(x)}{B(x)^2} - \frac{2P(x)}{B(x)^3} B'(x) \right), \tag{3.18}$$

$$R_+''(x) = \frac{P''(x)}{C(x)} - \frac{P'(x)}{C(x)^2} C'(x) - \frac{P(x)}{C(x)^2} C''(x) - C'(x) \left( \frac{P'(x)}{C(x)^2} - \frac{2P(x)}{C(x)^3} C'(x) \right). \tag{3.19}$$

Since $P''(0) = 2p_2$, $B''(0) = 2b_2$ and $C''(0) = 2c_2$, the values at the knot are given by

$$R_-''(0) = 2p_2 - p_1 b_1 - 2p_0 b_2 - b_1 (p_1 - 2p_0 b_1), \tag{3.20}$$

$$R_+''(0) = 2p_2 - p_1 c_1 - 2p_0 c_2 - c_1 (p_1 - 2p_0 c_1). \tag{3.21}$$

If we assume that the condition (3.17) for $C^1$ continuity holds, the second derivatives at the knot will be equal (for non-zero $p_0$) provided that

$$b_2 = c_2, \tag{3.22}$$

and hence the SIRS will be $C^2$ continuous at $x = 0$ if this constraint is satisfied. We could go on to obtain higher order continuity conditions, but will restrict ourselves to the study of SIRS that are $C^2$ continuous at the knot, as we feel this is sufficient for our requirements.

3.3.2 Satisfying the continuity requirements

In a similar manner to that of section 3.2, we can explicitly satisfy the continuity constraints (3.17) and (3.22) by replacing the parameters $c_1$ and $c_2$ with $b_1$ and $b_2$ respectively.
We now redefine the SIRS accordingly as

$$R^{n}_{m_1,m_2}(\mathbf{d}, x) = \begin{cases} \dfrac{\sum_{i=0}^{n} p_i x^i}{1 + \sum_{i=1}^{m_1} b_i x^i} = R_-(x) & \text{for } x \le 0, \\[2ex] \dfrac{\sum_{i=0}^{n} p_i x^i}{1 + b_1 x + b_2 x^2 + \sum_{i=3}^{m_2} c_i x^i} = R_+(x) & \text{for } x \ge 0, \end{cases} \tag{3.23}$$

where $\mathbf{d} = (p_0, \dots, p_n, b_1, \dots, b_{m_1}, c_3, \dots, c_{m_2})^T$ is the vector of combined approximation parameters. Although we have lost the two approximation parameters $c_1$ and $c_2$, the function (3.23) is now $C^2$ continuous at $x = 0$ for all values of $\mathbf{d}$, and is capable of modelling two different asymptotic limits as $x \to +\infty$ and $x \to -\infty$.

Up to this point we have considered the knot at $x = 0$ to be fixed, the reason being that the monomial basis functions $x^i$ evaluated at this point are zero ($i \neq 0$), which results in the very simple continuity constraints we have derived. Had we used a different set of basis functions, such as Chebyshev polynomials, these constraints would be significantly more complicated, since for an even integer $n$ the $n$th Chebyshev basis function satisfies $T_n(0) \neq 0$. However, the use of monomial basis functions is likely to result in numerical instability for approximation intervals far away from this knot at $x = 0$. Another reason for considering a variable knot value is that the SIRS reduces to a standard polynomial ratio if the knot value itself does not lie in the interval of approximation. In order to avoid these problems, and in an attempt to provide more flexibility, we consider approximating with a SIRS defined as a function of the transformed variable

$$u = x - \lambda. \tag{3.24}$$

In this way we obtain a new shifted SIRS with knot at $x = \lambda$ ($u = 0$), defined as a polynomial ratio in powers of $u$ rather than $x$. $R^{n}_{m_1,m_2}(\mathbf{p}, \mathbf{b}, \mathbf{c}, u)$ then provides us with an approximation form that has exactly the same constraints for continuity at the knot as those derived for $R^{n}_{m_1,m_2}(\mathbf{p}, \mathbf{b}, \mathbf{c}, x)$. We now describe an effective method of fitting the SIRS to a discrete set of data.
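As an illustration of the shifted form, the following sketch evaluates a single-numerator SIRS with knot $\lambda$ and checks numerically that the two pieces agree in value and in one-sided first and second differences at the knot. The function name `sirs_eval`, the argument layout and the test coefficients are hypothetical choices made for this example only.

```python
import numpy as np

def sirs_eval(x, p, b, c_tail, lam=0.0):
    """Evaluate a C^2 single-numerator SIRS at the points x.

    p      : numerator coefficients p_0..p_n (ascending powers of u)
    b      : left denominator coefficients b_1..b_m1, with m1 >= 2
    c_tail : right denominator coefficients c_3..c_m2
    lam    : knot position; the form is a ratio in u = x - lam
    """
    u = np.asarray(x, dtype=float) - lam
    P = np.polyval(np.asarray(p, dtype=float)[::-1], u)
    B = np.polyval(np.concatenate(([1.0], b))[::-1], u)
    # the right denominator shares b_1 and b_2 with the left one
    C = np.polyval(np.concatenate(([1.0], b[:2], c_tail))[::-1], u)
    return np.where(u <= 0.0, P / B, P / C)

# arbitrary illustrative coefficients
p = [1.0, 0.5, -0.2, 0.1]
b = [0.3, 0.2, 0.05]
c_tail = [-0.4, 0.1]
lam, h = 0.7, 1e-3

R = lambda t: float(sirs_eval(t, p, b, c_tail, lam))
left1 = (R(lam) - R(lam - h)) / h                        # one-sided slopes
right1 = (R(lam + h) - R(lam)) / h
left2 = (R(lam - 2 * h) - 2 * R(lam - h) + R(lam)) / h**2    # one-sided curvatures
right2 = (R(lam + 2 * h) - 2 * R(lam + h) + R(lam)) / h**2
print(left1, right1, left2, right2)
```

The one-sided differences agree only up to terms of order $h$, because the third derivatives of the two pieces are in general different at the knot.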
3.3.3 Least-squares SIRS approximation using Loeb's algorithm

Now that we have defined a new approximation form (3.23) that has the required level of continuity and asymptotic limits, we need a method of fitting it to data. Due to its ease of implementation, we consider fitting least-squares SIRS approximations using the Loeb algorithm (section 2.2). One of the reasons for this choice is that it can be easily applied to the SIRS, because the required continuity conditions are satisfied immediately by its definition (3.23). The Loeb algorithm has already been discussed, and we now describe how to implement it for the SIRS.

We are approximating the data set $\{(x_i, f_i)\}_{i=1}^{N} \subset \mathbb{R}^2$, and firstly assume that the data points have been ordered with respect to ascending values of the abscissae $x_i$. To begin with, we choose the knot $\lambda$ to be the midpoint of the approximation interval, and define $t$ to be the largest integer for which $u_t \le 0$, where $u_t = x_t - \lambda$. Alternatively, we could take $\lambda$ to be the median value of the abscissae $x_i$ in the case of non-uniformly spaced abscissae. It seems sensible to ensure that we have an approximately equal number of abscissae on either side of the knot, as we are approximating simultaneously with $R_-(x)$ and $R_+(x)$. If this is the case we will approximate roughly half the data with $R_-(x)$ and half with $R_+(x)$, and so this should prevent either one of these functions from dominating the approximation. We will discuss other factors to bear in mind when choosing the knot in a later section.

As we are approximating with $R_-(x)$ and $R_+(x)$ simultaneously, the observation matrix $A$ needed to implement Loeb's algorithm is defined as having $(i, j)$th element $A_{ij}$ given by

$$A_{ij} = \begin{cases} u_i^{\,j-1} & i = 1, \dots, N, \quad j = 1, \dots, n + 1, \\ -f_i u_i^{\,j-n-1} & i = 1, \dots, N, \quad j = n + 2,\ n + 3, \\ -f_i u_i^{\,j-n-1} & i = 1, \dots, t, \quad j = n + 4, \dots, n + m_1 + 1, \\ 0 & i = 1, \dots, t, \quad j = n + m_1 + 2, \dots, n + m_1 + m_2 - 1, \\ 0 & i = t + 1, \dots, N, \quad j = n + 4, \dots, n + m_1 + 1, \\ -f_i u_i^{\,j-(n+m_1-1)} & i = t + 1, \dots, N, \quad j = n + m_1 + 2, \dots, n + m_1 + m_2 - 1. \end{cases} \tag{3.25}$$

Here columns $n + 2$ and $n + 3$ correspond to the shared parameters $b_1, b_2$, columns $n + 4, \dots, n + m_1 + 1$ to $b_3, \dots, b_{m_1}$, and the remaining columns to $c_3, \dots, c_{m_2}$.

Once the observation matrix has been formed, we require the Loeb weight vector. This weight vector at iteration $k$ will be denoted by $\mathbf{w}^{(k)} \in \mathbb{R}^N$, and is defined as the vector having $i$th element

$$w_i^{k} = \frac{1}{B(\mathbf{b}^{(k)}, u_i)} \quad \text{for } i = 1, \dots, t, \tag{3.26}$$

$$w_i^{k} = \frac{1}{C(\mathbf{c}^{(k)}, u_i)} \quad \text{for } i = t + 1, \dots, N, \tag{3.27}$$

where $\mathbf{b}^{(k)}, \mathbf{c}^{(k)}$ are the denominator parameters obtained at iteration $k$. Given the weight vector $\mathbf{w}^{(k)}$, the modified SIRS Loeb algorithm proceeds by finding at the $(k + 1)$th iteration the least-squares solution vector $\mathbf{d}^{(k+1)}$ of the equation

$$D^{w^k} A\, \mathbf{d}^{(k+1)} = D^{w^k} \mathbf{f}, \tag{3.28}$$

where the matrix $D^{w^k} \in \mathbb{R}^{N \times N}$ is the diagonal matrix with elements

$$D^{w^k}_{ii} = w_i^{k}, \quad i = 1, \dots, N. \tag{3.29}$$

The algorithm is initialised by setting $D^{0}$ as the $N \times N$ identity matrix, and then applied until the solution vectors $\mathbf{d}^{(k+1)}$ converge.

If the knot is coincident with a data point then we have to approximate using both spline functions at the knot. This leads to the same abscissa value $x_i = \lambda$ having two rows in the observation matrix. If we did this with standard linear least-squares methods, the algorithm would break down as the observation matrix would not be of full rank. We can do this with the SIRS as we are fitting the same point twice but with different sets of approximation parameters. We now present some results from the application of this algorithm.

Example 3

We will consider the approximation of the function $f(x) = 2 + \tanh(x)$ on the interval $[-4, 4]$. This is exactly the type of function suitable for SIRS approximation, as it has asymptotic limits

$$\lim_{x \to -\infty} f(x) = 1, \tag{3.30}$$

$$\lim_{x \to +\infty} f(x) = 3. \tag{3.31}$$

We chose the abscissae vector $\mathbf{x}$ to consist of 50 uniformly spaced points on the interval $[-4, 4]$, with corresponding function values $f_i = 2 + \tanh(x_i)$. We applied the Loeb algorithm described previously to fit a degree (5,5) SIRS to this data set, with a choice of knot $\lambda = 0$.
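The fitting procedure of this section can be sketched as follows. This is a minimal illustration rather than the implementation used for the reported results: it assumes a monomial basis in $u = x - \lambda$, no data point lying exactly on the knot, and $m_1, m_2 \ge 2$; the function name `sirs_loeb` and its arguments are hypothetical.

```python
import numpy as np

def sirs_loeb(x, f, n, m1, m2, lam, iters=100, tol=1e-10):
    """Loeb iteration for the C^2 single-numerator SIRS with knot lam.

    Parameter vector d = (p_0..p_n, b_1..b_m1, c_3..c_m2).
    Returns (d, residual vector f - R).
    """
    u = x - lam
    left = u <= 0.0                                    # rows fitted with R_-
    U = np.vander(u, max(n, m1, m2) + 1, increasing=True)   # U[:, j] = u**j
    fcol = f[:, None]
    A = np.hstack([
        U[:, :n + 1],                                  # numerator, all rows
        -fcol * U[:, 1:3],                             # shared b_1, b_2, all rows
        np.where(left[:, None], -fcol * U[:, 3:m1 + 1], 0.0),   # b_3.., left rows only
        np.where(left[:, None], 0.0, -fcol * U[:, 3:m2 + 1]),   # c_3.., right rows only
    ])
    w = np.ones_like(u)                                # first pass: identity weights
    d_old = None
    for _ in range(iters):
        d, *_ = np.linalg.lstsq(w[:, None] * A, w * f, rcond=None)
        p = d[:n + 1]
        b = d[n + 1:n + 1 + m1]
        c = np.concatenate((b[:2], d[n + 1 + m1:]))    # c_1 = b_1, c_2 = b_2
        B = 1.0 + U[:, 1:m1 + 1] @ b
        C = 1.0 + U[:, 1:m2 + 1] @ c
        denom = np.where(left, B, C)
        w = 1.0 / denom                                # Loeb weights for the next pass
        if d_old is not None and np.max(np.abs(d - d_old)) < tol:
            break
        d_old = d
    resid = f - (U[:, :n + 1] @ p) / denom
    return d, resid

x = np.linspace(-4.0, 4.0, 50)
f = 2.0 + np.tanh(x)
d, resid = sirs_loeb(x, f, n=5, m1=5, m2=5, lam=0.0)
print(d.size, np.linalg.norm(resid))
```

The parameter vector has $n + m_1 + m_2 - 1$ entries, matching the number of columns of the observation matrix.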
We also fitted a standard degree (5,5) polynomial ratio to the data and used the same convergence criterion (3.8) for both methods. We compare the resulting approximations in Table 3.3 and Figure 3.7. We can clearly see that we get fast convergence to good approximations using both forms. It also appears that the SIRS provides a superior approximation at the ends of the interval, but performs less well in the vicinity of the knot. This is highly likely to be due to the fact that the SIRS has only $C^2$ continuity at the knot. Also, because the SIRS has been constrained at the knot to ensure this level of continuity, the approximation is less flexible in this region.

Table 3.3: Comparison of SIRS ($\lambda = 0$) and polynomial ratio approximations to $f(x)$ fitted with Loeb's algorithm.

| Approximation | Iterations | $\lVert \mathbf{e} \rVert_2$ | Cond($C$) |
| Polynomial ratio | 6 | $5.3800 \times 10^{-5}$ | $1.7366 \times 10^{6}$ |
| SIRS | 7 | $1.0226 \times 10^{-4}$ | $5.6611 \times 10^{6}$ |

[Figure 3.7: Error curves from degree (5,5) polynomial ratio and SIRS approximations to $f(x) = 2 + \tanh(x)$. Axes: approximation error against $x$ on $[-4, 4]$.]

3.3.4 Changing the position of the knot λ

For the data given in the previous example, the approximation obtained after convergence did not contain any unwanted poles in the interval of approximation, but this is not always the case for a given choice of $\lambda$. By changing the value of the knot, we can obtain approximations that are pole-free on the range of interest. In addition, should the Loeb algorithm fail to converge for a particular knot value, we can modify the knot slightly and achieve convergence, as the next example illustrates.

Example 4.

We now approximate $f(x) = \tanh(x)$ on the same interval and number of points as in the previous example. Again we choose degree 5 numerator and denominator, and again we fix the knot at $\lambda = 0$.
Using the same convergence criterion as for the previous example, the algorithm went through 250 iterations and still failed to converge, due to the oscillatory behaviour in the parameters described in section 2.2.2. However, approximation of the data with a choice of knot $\lambda = 0.5$ resulted in convergence after 21 iterations to a very good approximation. Table 3.4 shows how the speed of convergence and quality of approximation vary according to the value of the knot for this particular problem.

Table 3.4: The effect of variation of $\lambda$ on the SIRS approximation.

| $\lambda$ | Iterations | $\lVert \mathbf{e} \rVert_2$ | Cond($C$) |
| 0.0 | Failed | - | - |
| 0.2 | Failed | - | - |
| 0.4 | 63 | $1.0396 \times 10^{-4}$ | $1.7895 \times 10^{6}$ |
| 0.6 | 15 | $7.1490 \times 10^{-5}$ | $2.3837 \times 10^{6}$ |
| 0.8 | 13 | $4.7651 \times 10^{-5}$ | $4.1835 \times 10^{6}$ |
| 1.2 | 13 | $7.5587 \times 10^{-5}$ | $2.0446 \times 10^{6}$ |
| 1.5 | 24 | $8.5695 \times 10^{-5}$ | $4.8264 \times 10^{7}$ |

3.4 Variable numerator SIRS

We now consider another form of SIRS that incorporates a separate numerator for each rational spline, providing the approximation form with greater flexibility. We define this new approximation form as

$$R^{n_1,n_2}_{m_1,m_2}(\mathbf{g}, \mathbf{h}, \mathbf{b}, \mathbf{c}, u) = \begin{cases} \dfrac{G(\mathbf{g}, u)}{B(\mathbf{b}, u)} = \dfrac{\sum_{i=0}^{n_1} g_i u^i}{1 + \sum_{i=1}^{m_1} b_i u^i} = R_-(\mathbf{g}, \mathbf{b}, u) & \text{for } u \le 0, \\[2ex] \dfrac{H(\mathbf{h}, u)}{C(\mathbf{c}, u)} = \dfrac{\sum_{i=0}^{n_2} h_i u^i}{1 + \sum_{i=1}^{m_2} c_i u^i} = R_+(\mathbf{h}, \mathbf{c}, u) & \text{for } u \ge 0, \end{cases} \tag{3.32}$$

where $u = x - \lambda$ as before. This type of SIRS is likely to provide greater flexibility due to the extra parameters we have introduced. The parameter constraints needed to impose $C^2$ continuity at $u = 0$ are very similar to those for the single numerator SIRS described previously. It is evident that the function (3.32) will be $C^0$ continuous at the knot provided that

$$g_0 = h_0. \tag{3.33}$$

The first derivatives of each component of (3.32) are given by

$$R_-'(u) = \frac{G'(u)}{B(u)} - \frac{G(u)}{B(u)^2} B'(u), \tag{3.34}$$

$$R_+'(u) = \frac{H'(u)}{C(u)} - \frac{H(u)}{C(u)^2} C'(u), \tag{3.35}$$

and evaluated at $u = 0$ take the values

$$R_-'(0) = g_1 - g_0 b_1, \tag{3.36}$$

$$R_+'(0) = h_1 - h_0 c_1. \tag{3.37}$$

Provided that $g_0$ and $h_0$ are non-zero, and assuming the $C^0$ continuity constraint (3.33) is satisfied, the variable numerator SIRS will be $C^1$ at the knot if

$$g_1 = h_1 - g_0 (c_1 - b_1). \tag{3.38}$$

A simple set of constraints that can be used to ensure this equation is satisfied is to set

$$g_1 = h_1, \tag{3.39}$$

$$c_1 = b_1. \tag{3.40}$$

It is clear that these constraints are more restrictive than (3.38), and that their use will result in us considering only a subset of all possible $C^1$ SIRS for approximation purposes. However, these constraints are much easier to enforce, and once again they can be implemented within the definition of the approximation form itself, allowing us to avoid the use of complex constrained optimisation techniques. For $C^2$ continuity we need the second derivatives of the SIRS, and these are given by

$$R_-''(u) = \frac{G''(u)}{B(u)} - \frac{G'(u)}{B(u)^2} B'(u) - \frac{G(u)}{B(u)^2} B''(u) - B'(u) \left( \frac{G'(u)}{B(u)^2} - \frac{2G(u)}{B(u)^3} B'(u) \right), \tag{3.41}$$

$$R_+''(u) = \frac{H''(u)}{C(u)} - \frac{H'(u)}{C(u)^2} C'(u) - \frac{H(u)}{C(u)^2} C''(u) - C'(u) \left( \frac{H'(u)}{C(u)^2} - \frac{2H(u)}{C(u)^3} C'(u) \right), \tag{3.42}$$

and, since $G''(0) = 2g_2$, $H''(0) = 2h_2$, $B''(0) = 2b_2$ and $C''(0) = 2c_2$, at $u = 0$ take the values

$$R_-''(0) = 2g_2 - g_1 b_1 - 2g_0 b_2 - b_1 (g_1 - 2g_0 b_1), \tag{3.43}$$

$$R_+''(0) = 2h_2 - h_1 c_1 - 2h_0 c_2 - c_1 (h_1 - 2h_0 c_1). \tag{3.44}$$

If we assume that the conditions (3.33), (3.39) and (3.40) for $C^1$ continuity are satisfied, we will have $C^2$ continuity at $u = 0$ if

$$g_2 = h_2, \tag{3.45}$$

$$b_2 = c_2. \tag{3.46}$$

As mentioned for the $C^1$ parameters, the set of SIRS that satisfy the constraints (3.33), (3.39) and (3.45) is only a subset of the set of all $C^2$ continuous SIRS; however, the constraints derived here are attractive due to their simplicity and the ease with which they can be satisfied. As before with the original SIRS, we will now impose $C^2$ continuity by redefining the SIRS with the constraints explicitly satisfied.
The resulting SIRS is now defined as

$$R^{n_1,n_2}_{m_1,m_2}(\mathbf{d}, u) = \begin{cases} \dfrac{\sum_{i=0}^{n_1} g_i u^i}{1 + \sum_{i=1}^{m_1} b_i u^i}, & u \le 0, \\[2ex] \dfrac{g_0 + g_1 u + g_2 u^2 + \sum_{i=3}^{n_2} h_i u^i}{1 + b_1 u + b_2 u^2 + \sum_{i=3}^{m_2} c_i u^i}, & u \ge 0, \end{cases} \tag{3.47}$$

where

$$\mathbf{d} = \{g_0, \dots, g_{n_1}, h_3, \dots, h_{n_2}, b_1, \dots, b_{m_1}, c_3, \dots, c_{m_2}\}^T \tag{3.48}$$

is the vector whose elements are the remaining approximation parameters. As before, this form can be easily fitted with an appropriate modification of Loeb's algorithm similar to that shown in section 3.3.3. We now present some results from the application of this method.

3.4.1 Examples of fitting the variable numerator SIRS

We have found that fitting a variable numerator SIRS to data using the Loeb algorithm yields an observation matrix that has a significantly higher condition number than for the fixed numerator SIRS and standard polynomial ratios. However, we have found that we can usually ensure a condition number of the order $10^6$ or less provided that we keep to fairly low degrees of numerator and denominator. In general, we have found that we get a condition number of $10^7$ or less provided that

$$\max(n_1 + m_1,\ n_2 + m_2) \le 8. \tag{3.49}$$

This is only a rough estimate based on our experience of repeated applications of the algorithm to approximate data defined on intervals of a reasonable size (intervals such as $[-5, 5]$ or smaller). Over much larger intervals (such as $[-10, 10]$, for example) we are faced with poorer conditioning. We now present some numerical examples that illustrate the improvement in approximation quality given by variable numerator SIRS compared with standard polynomial ratios.

Example 5.

The function

$$f(x) = \frac{x}{1 + e^{-x}}$$

is a good example of a function having two different asymptotic limits, and so we approximated this at 100 equally spaced abscissae on the interval $[-4, 4]$, using the SIRS $R^{4,5}_{5,4}$ with initial knot value $\lambda = 0$.
Table 3.5 shows how the SIRS approximation compares with the standard polynomial ratio approximation of degree (5,4) to the same data, and Figure 3.8 compares the approximation errors.

Table 3.5: Comparison of $R^{4,5}_{5,4}$ ($\lambda = 0$) and degree (5,4) polynomial ratio approximations.

| Approximation | Iterations | $\lVert \mathbf{e} \rVert_2$ | Cond($C$) |
| Polynomial ratio | 4 | $3.2004 \times 10^{-5}$ | $6.8604 \times 10^{5}$ |
| SIRS | 6 | $6.9154 \times 10^{-6}$ | $3.0789 \times 10^{6}$ |

This example illustrates the superior quality of the approximation using the SIRS over the polynomial ratio, even though both approximations have roughly the same convergence and condition number for the observation matrix.

[Figure 3.8: Error curves from degree (5,4) polynomial ratio and $R^{4,5}_{5,4}$ approximations to $f(x)$. Axes: approximation error against $x$ on $[-4, 4]$.]

Example 6.

We approximated $f(x) = \tanh(x)$ at 100 equally spaced points on $[-5, 5]$, using a degree (5,5) polynomial ratio and the SIRS $R^{5,5}_{5,5}$. With an initial knot value of $\lambda = 0$ the SIRS converged to a solution in a reasonable number of iterations. The resulting approximation was of comparable quality to the standard polynomial ratio, but had the undesirable side effect of having 2 poles within the approximation interval. We then tried to refit the data using various different values for the knot. The properties of the resulting approximations are shown in Table 3.6. The standard degree (5,5) polynomial ratio had an error norm of $2.9047 \times 10^{-4}$, converged in 6 iterations and had better conditioning than the SIRS for any of the knot values. This example clearly shows that we can obtain convergence for a number of different values of the knot, and that changing this value can result in better approximations, and can also result in the elimination of poles from the range. This is something that cannot be done with the standard Loeb algorithm: if the algorithm fits a pole where it is not wanted, then there is nothing that can be done about it.
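Whether a fitted SIRS has poles in the range of interest can be checked directly from the denominator coefficients. The sketch below is one possible way to do this and is not taken from the thesis; the function names and the example coefficients are hypothetical. It uses the fact that a real root of $B$ is a pole of the spline only if it lies in $u \le 0$, and a real root of $C$ only if it lies in $u \ge 0$.

```python
import numpy as np

def real_roots_in_range(den, lo, hi, tol=1e-9):
    """Real roots of 1 + den[0]*u + den[1]*u**2 + ... lying in [lo, hi]."""
    coeffs = np.concatenate(([1.0], den))[::-1]     # np.roots wants highest power first
    roots = np.roots(coeffs)
    real = roots[np.abs(roots.imag) < tol].real
    return np.sort(real[(real >= lo) & (real <= hi)])

def sirs_poles(b, c, lam, x_min, x_max):
    """Poles (in x) of a SIRS with knot lam inside [x_min, x_max].

    b holds b_1..b_m1 and c holds c_1..c_m2 (for a C^2 SIRS, c_1 = b_1, c_2 = b_2).
    """
    left = real_roots_in_range(b, min(x_min - lam, 0.0), 0.0)
    right = real_roots_in_range(c, 0.0, max(x_max - lam, 0.0))
    return np.concatenate((left, right)) + lam

# B(u) = (1 - u)(1 - u/2) has roots 1 and 2; C(u) = (1 + u)(1 + u/2) has roots -1 and -2
b = [-1.5, 0.5]
c = [1.5, 0.5]
single_ratio_poles = real_roots_in_range(b, -4.0, 4.0)   # B used on the whole interval
spline_poles = sirs_poles(b, c, 0.0, -4.0, 4.0)           # B on the left, C on the right
print(single_ratio_poles, spline_poles)
```

In this constructed example the denominator $B$ would give two poles if used over the whole interval, while the spline built from $B$ on the left and $C$ on the right has none.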
It may be the case that there are other data sets for which we cannot find a value of $\lambda$ that gives a pole-free approximation, but this example illustrates the added flexibility that the SIRS has over standard polynomial ratios for approximation purposes.

Table 3.6: The effect of variation of $\lambda$ on the SIRS approximation.

| $\lambda$ | Iterations to convergence | $\lVert \mathbf{e} \rVert_2$ | Cond($C$) | Poles present in approximation range |
| 0.0 | 26 | $2.0402 \times 10^{-4}$ | $2.4556 \times 10^{6}$ | Yes |
| 0.5 | Failed | - | - | - |
| 1.0 | Failed | - | - | - |
| 1.5 | 13 | $1.2838 \times 10^{-4}$ | $1.2123 \times 10^{8}$ | No |

3.4.2 Optimal choice for λ

The previous example illustrates how the value of $\lambda$ has a huge effect on the properties of the resulting approximation, in particular convergence, approximation quality, and occurrence of poles. It has been shown that it is possible to modify the knot value in an arbitrary manner until a satisfactory approximation has been obtained, but this is a trial and error approach to data fitting. This approach may be sufficient for certain problems, but it would be desirable to find an optimal value for $\lambda$, or at least to find a method that allows it to be treated as a free approximation parameter. We cannot hope to continue fitting the SIRS to data using Loeb's algorithm if we wish to use this approach, but the Gauss-Newton method gives us an algorithm that allows the knot value to be treated as an additional parameter. However, we can still utilise a few steps of the Loeb algorithm for a specific choice of $\lambda$ to provide a set of initial values for the approximation parameters with which to apply the Gauss-Newton method. Using the definition (3.47) for the SIRS, the partial derivatives of the $i$th residual

$$e_i = f_i - R^{n_1,n_2}_{m_1,m_2}(\mathbf{d}, u_i) \tag{3.50}$$

with respect to each of the approximation parameters are given by

$$\frac{\partial e_i}{\partial d_j} = \begin{cases} -\dfrac{u_i^{\,j}}{B(\mathbf{b}, u_i)} & u \le 0, \\ 0 & u \ge 0, \end{cases} \tag{3.51}$$

for $j = 0, \dots, n_1$,

$$\frac{\partial e_i}{\partial d_j} = \begin{cases} -\dfrac{u_i^{\,k}}{C(\mathbf{c}, u_i)} & u \ge 0, \\ 0 & u \le 0, \end{cases} \tag{3.52}$$

for $j = n_1 + 1, \dots, n_2 + n_1 - 1$,

$$\frac{\partial e_i}{\partial d_j} = \begin{cases} \dfrac{u_i^{\,k}\, G(\mathbf{g}, u_i)}{(B(\mathbf{b}, u_i))^2} & u \le 0, \\ 0 & u \ge 0, \end{cases} \tag{3.53}$$

for $j = n_1 + n_2, \dots, n_1 + n_2 + m_1$,

$$\frac{\partial e_i}{\partial d_j} = \begin{cases} \dfrac{u_i^{\,k}\, H(\mathbf{h}, u_i)}{(C(\mathbf{c}, u_i))^2} & u \ge 0, \\ 0 & u \le 0, \end{cases} \tag{3.54}$$

for $j = n_1 + n_2 + m_1 + 1, \dots, n_1 + n_2 + m_1 + m_2 - 2$. In (3.52) to (3.54), $k$ denotes the power of $u$ that multiplies the parameter $d_j$. Finally, since $u_i = x_i - \lambda$, for the knot $\lambda$ we have

$$\frac{\partial e_i}{\partial \lambda} = \begin{cases} \dfrac{G'(\mathbf{g}, u_i) B(\mathbf{b}, u_i) - G(\mathbf{g}, u_i) B'(\mathbf{b}, u_i)}{(B(\mathbf{b}, u_i))^2} & u \le 0, \\[2ex] \dfrac{H'(\mathbf{h}, u_i) C(\mathbf{c}, u_i) - H(\mathbf{h}, u_i) C'(\mathbf{c}, u_i)}{(C(\mathbf{c}, u_i))^2} & u \ge 0. \end{cases} \tag{3.55}$$

These partial derivatives are used to form the Jacobian matrix, which is then used to implement the Gauss-Newton method as described in section 2.8.2. However, care is needed in the application of the Gauss-Newton method: if at any iteration we end up with the knot $\lambda$ lying outside the approximation interval, then we will have a rank deficient Jacobian matrix due to repeated columns of zeros.

We have applied the full step Gauss-Newton method (2.52) to a variety of functions (using the Loeb algorithm solution parameters as the initial Gauss-Newton parameters) with a number of different initial values for the knot, and found that the algorithm did not converge in the vast majority of cases. In many cases this was due to the knot being placed outside the approximation interval, as described above. The lack of convergence could also be due to high curvature, which results in the matrix $J^T J$ being a poor approximation to the Hessian. This is not entirely unexpected, as by allowing the knot to become a free parameter we have introduced a much higher degree of nonlinearity in the approximation form. Alternatively, it was possible that we were unable to find a good enough start parameter for the knot, and so with this in mind we applied the MATLAB implementation of the Levenberg-Marquardt algorithm, as it is more likely to converge with a poor choice of start parameters. This was much more successful than the Gauss-Newton method, although convergence in many cases was very slow, on occasion taking more than 200 iterations.
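Analytic partial derivatives of this kind are easy to get wrong in sign, so it can be worth comparing them with finite differences before building a Jacobian. The sketch below does this for one residual on the left-hand piece ($u \le 0$), differentiating $e = f - G(u)/B(u)$ with $u = x - \lambda$. It is an illustration only: the helper names and test values are hypothetical, and it treats a single rational piece rather than the full constrained SIRS.

```python
import numpy as np
from numpy.polynomial import Polynomial

def piece(g, b, u):
    """Return G(u), B(u), G'(u), B'(u) for G = sum g_j u^j and B = 1 + sum b_k u^k."""
    G = Polynomial(g)
    B = Polynomial(np.concatenate(([1.0], b)))
    return G(u), B(u), G.deriv()(u), B.deriv()(u)

def residual(f, g, b, x, lam):
    G, B, _, _ = piece(g, b, x - lam)
    return f - G / B

def analytic_partials(g, b, x, lam):
    """Partial derivatives of the residual with respect to g_j, b_k and the knot."""
    u = x - lam
    G, B, dG, dB = piece(g, b, u)
    de_dg = [-u**j / B for j in range(len(g))]              # numerator parameters
    de_db = [u**k * G / B**2 for k in range(1, len(b) + 1)] # denominator parameters
    de_dlam = (dG * B - G * dB) / B**2                      # knot, using du/dlam = -1
    return de_dg, de_db, de_dlam

def central_diff(fun, h=1e-6):
    return (fun(h) - fun(-h)) / (2.0 * h)

# arbitrary illustrative values with u = x - lam < 0
f, x, lam = 0.7, -1.2, 0.4
g = np.array([1.0, 0.5, -0.2])
b = np.array([0.3, 0.1])

de_dg, de_db, de_dlam = analytic_partials(g, b, x, lam)
fd_lam = central_diff(lambda h: residual(f, g, b, x, lam + h))
fd_g1 = central_diff(lambda h: residual(f, g + h * np.eye(3)[1], b, x, lam))
fd_b2 = central_diff(lambda h: residual(f, g, b + h * np.eye(2)[1], x, lam))
print(de_dlam - fd_lam, de_dg[1] - fd_g1, de_db[1] - fd_b2)
```

The three printed differences should be at the level of the finite-difference error if the analytic expressions are consistent with the residual definition.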
However, based on the application of this method to a variety of different problems, we would recommend the Levenberg-Marquardt method if it is desired to optimise for the knot. We have also observed that, in general, the best choice of start value for the knot parameter is the midpoint of the approximation interval.

To illustrate the performance of the fixed knot variable numerator SIRS approximations, we present some numerical results in Table 3.7. We compared the SIRS approximations obtained using the Loeb algorithm with fixed knot against approximations obtained using the Levenberg-Marquardt method optimising for the knot also. We tested a number of functions, in each case using 50 equally spaced abscissae on the approximation interval and using a degree (3,3) polynomial ratio. We use the abbreviation L-M to refer to the Levenberg-Marquardt algorithm in these results.

Table 3.7: Comparison of fixed knot SIRS fitted with Loeb algorithm, with variable knot SIRS fitted with L-M algorithm.

| $f(x)$ | Interval | Loeb convergence | Loeb $\lVert \mathbf{e} \rVert_2$ | L-M convergence | L-M $\lVert \mathbf{e} \rVert_2$ |
| $\tanh(x)$ | $[-2, 2]$ | 7 | $7.3875 \times 10^{-4}$ | 13 | $2.8395 \times 10^{-7}$ |
| $\tanh(x)$ | $[-3, 3]$ | 97 | $2.3341 \times 10^{-2}$ | 13 | $2.6575 \times 10^{-6}$ |
| $1 + \dfrac{1}{1 + e^{-x}}$ | $[-4, 4]$ | 7 | $3.6937 \times 10^{-4}$ | 4 | $7.4797 \times 10^{-8}$ |

These results clearly show that the SIRS approximation optimising for the knot value gives a much better approximation than the fixed knot case. Using unconstrained optimisation methods, we have no way of ensuring that the value of the knot lies within the approximation interval and, as mentioned previously, in some cases we have had the algorithm break down for this very reason. However, we have not attempted to apply the algorithm using constrained techniques, as the motivation behind the thesis is to experiment with and develop simple algorithms that are easy to apply, and in the majority of cases we get very good approximations using the basic fixed knot SIRS Loeb algorithm.
If the resulting approximations are not sufficiently accurate, we can also treat the knot value as an additional approximation parameter and fit the SIRS using the Levenberg-Marquardt method.

3.5 Conclusion

We have seen that the use of polynomial ratios with a constrained asymptotic limit provides a useful tool for approximation, particularly for extrapolation purposes. The various forms of the SIRS are also useful tools for approximation and extrapolation of data or functions that exhibit double sided asymptotic limits. The two approaches can be combined to provide improved extrapolation properties. In particular, we observe a very small approximation error toward the ends of the approximation interval when approximating this kind of data using an appropriate choice of the SIRS form. This approximation error is generally much smaller than that of standard polynomial ratio approximations on the same data. We have also seen that these forms can be fitted with a modification of the Loeb algorithm, which is considerably simpler than standard optimisation techniques. Furthermore, there is little loss of accuracy in using the modified Loeb algorithm in place of optimisation methods in the case of fixed knot SIRS approximations. However, in general we have found that to obtain the most accurate approximations, the best approach is to optimise for the value of the knot and fit the variable numerator SIRS using the Levenberg-Marquardt method. This method was found to be the most robust and the most accurate.

Chapter 4
ASYMPTOTIC POLYNOMIALS

4.1 Introduction

Within the field of metrology, it is quite common for there to be some type of asymptotic behaviour associated with the physical system that is being studied. Some of these types of behaviour have already been described in sections 1.8.5, 1.8.6, 1.8.7.
Empirical models such as polynomials, splines and Fourier series [6, 10] are not well suited for modelling this kind of behaviour, and so in this chapter we consider an easily implemented method to allow various classes of asymptotic behaviour to be modelled effectively. This is achieved by modifying a set of polynomial basis functions using a nonlinear weighting function designed to enable the correct type of asymptotic behaviour to be modelled. We refer to these weighted polynomials as asymptotic polynomials. The weight function itself is a rational function dependent on a small number of parameters that can be regarded as being fixed, or as free approximation parameters. In the latter case we describe numerically stable algorithms for approximation with asymptotic polynomials that exploit the fact that nonlinearity is introduced through nonlinear diagonal weighting matrices. We also deal with the problem of modelling variable asymptotic behaviour at $\pm\infty$ (as described in section 1.8.8) by using a piecewise continuous weight function. Finally, in section 4.4.2, we compare asymptotic polynomial and standard (Chebyshev) polynomial fits to metrology data.

The vast majority of the work presented in this chapter has been published as a collaborative paper with Professors Alistair Forbes and John Mason, in the proceedings of the 5th conference on Algorithms for approximation (A4A5) which took place at Chester in the UK in July 2005.

4.2 Asymptotic polynomials

Let $\{\phi_j(x)\}_{j=0}^{n}$ be a set of polynomial basis functions such as Chebyshev polynomials [40], and define the weight function

$$w(x) = w(x, \mathbf{b}) = \frac{1}{(1 + s^2 (x - t)^2)^{k/2}}, \qquad s > 0, \quad k > 0, \tag{4.1}$$

where $\mathbf{b} = (s, t, k)^T$. The weighting function $w(x)$ is continuous with $0 < w(x) \le 1$ and is a rational function that has the desirable property of being pole free over the entire real line. In addition, $w(x)$ behaves like $|x|^{-k}$ as $|x| \to \infty$. The choice of this weight was inspired by some research into radial basis functions [47].
It can be seen that this function is similar to the multiquadric radial basis function

\[ \phi(r) = (r^2 + c^2)^{1/2}, \tag{4.2} \]

where $c \in \mathbb{R}$, $c > 0$, and $r = \|x - t\|$ for some $t \in \mathbb{R}$. This function has been modified slightly with the introduction of the parameter $k$ in order to achieve the desired type of asymptotic behaviour. We now define a modified basis function

\[ \tilde{\phi}_j(x, b) = w(x, b)\,\phi_j(x), \tag{4.3} \]

and then consider approximation with linear combinations

\[ \tilde{\phi}(x, a, b) = \sum_{j=0}^{n} a_j \tilde{\phi}_j(x, b), \tag{4.4} \]

where $a = (a_0, \dots, a_n)^T$. We refer to the function (4.4) as an asymptotic polynomial, and we refer to $b = (s, t, k)^T$ as the auxiliary parameters associated with the model $\tilde{\phi}(x, a, b)$.

Each one of the auxiliary parameters has a different effect on the behaviour of the asymptotic polynomial. For $x$ limited to a finite interval, the parameter $s$ controls the degree to which asymptotic behaviour is imposed on the model within that interval. The parameter $t$ is where the weight function attains its maximum value, and acts like a centre around which there is radial symmetry. Finally, the parameter $k$ provides control over the limiting behaviour of the asymptotic polynomial according to

\[ \lim_{|x| \to \infty} \frac{\tilde{\phi}(x, a, b)}{|x|^{n-k}} = \frac{a_n}{s^k}. \tag{4.5} \]

It is the effect of this parameter $k$ that makes the asymptotic polynomial a useful tool for modelling a wide variety of asymptotic limits, including the cases of specific interest described in sections 1.8.5, 1.8.6 and 1.8.7. Figures 4.1, 4.2 and 4.3 illustrate the effect that varying each of the auxiliary parameters has on the shape of the weight function.

[Figure 4.1: The effect of varying the auxiliary parameter $t$ of the weight function $w(x, b)$ with $s = 1$, $k = 2$ held constant. Curves for $t = -3, -1, 1, 3$; horizontal axis $X$ from $-10$ to $10$, vertical axis $w(x, b)$ from 0 to 1.]
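As an illustration of definitions (4.1), (4.3) and (4.4), the following is a minimal sketch in Python with NumPy. The function names `weight` and `asymptotic_poly` are illustrative choices rather than part of the original work, and the Chebyshev basis is evaluated directly at $x$ on the assumption that the abscissae have already been scaled appropriately.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def weight(x, s, t, k):
    """Weight function (4.1): w(x, b) = (1 + s^2 (x - t)^2)^(-k/2)."""
    x = np.asarray(x, dtype=float)
    return (1.0 + s**2 * (x - t) ** 2) ** (-0.5 * k)


def asymptotic_poly(x, a, s, t, k):
    """Asymptotic polynomial (4.4) using the Chebyshev basis phi_j = T_j."""
    return weight(x, s, t, k) * cheb.chebval(x, a)


# Weight curve comparable to one of those in Figure 4.1 (s = 1, k = 2, t = 3).
x = np.linspace(-10.0, 10.0, 2001)
w = weight(x, s=1.0, t=3.0, k=2.0)
```

The weight peaks at $x = t$ with value 1, and for large $|x|$ it decays like $(s|x|)^{-k}$, which is the behaviour the parameter $k$ is intended to control.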
4.3 Approximation with asymptotic polynomials

We now consider the problem of obtaining a least-squares approximation to a set of $m$ pairs of discrete data points $\{(x_i, y_i)\}_{i=1}^{m}$ using an asymptotic polynomial. Given the abscissae $x = (x_1, \dots, x_m)^T$, we denote by $C$ the basis matrix generated from the $\phi_j$, which has $(i, j)$th element

\[ C_{ij} = \phi_j(x_i). \tag{4.6} \]

Similarly, we define $\tilde{C} = \tilde{C}(b)$ to be the modified observation matrix generated from the modified basis functions $\tilde{\phi}_j$, which has $(i, j)$th element given by

\[ \tilde{C}_{ij} = \tilde{\phi}_j(x_i) = w_i C_{ij}, \tag{4.7} \]

where $w_i = w(x_i, b)$.

[Figure 4.2: The effect of varying the auxiliary parameter $s$ of the weight function $w(x, b)$ with $t = 0$, $k = 2$ held constant. Curves for $s = 1, 0.5, 0.25, 0.125$; horizontal axis $X$ from $-10$ to $10$, vertical axis $w(x, b)$ from 0 to 1.]

[Figure 4.3: The effect of varying the auxiliary parameter $k$ of the weight function $w(x, b)$ with $s = 1$, $t = 0$ held constant. Curves for $k = 1, 2, 3$; horizontal axis $X$ from $-10$ to $10$, vertical axis $w(x, b)$ from 0 to 1.]

We can now attempt to approximate the data in one of two ways. In the first case we can consider the auxiliary parameters as being fixed, in which case the problem reduces to a weighted linear problem. Alternatively, we can treat them as additional approximation parameters, in which case we are faced with a nonlinear least-squares optimisation problem.

4.3.1 Fixed auxiliary parameters

With the auxiliary parameters fixed, the approximation form $\tilde{\phi}(x, a, b)$ is just a weighted linear combination of the basis functions $\{\phi_j(x)\}_{j=0}^{n}$. In this case we merely need to choose a suitable degree $n$ for $\tilde{\phi}(x, a, b)$, calculate the modified observation matrix $\tilde{C}(b)$ defined in (4.7), and then find the linear least-squares solution parameters $a$ of the problem

\[ \min_{a} \| y - \tilde{C} a \|_2, \tag{4.8} \]

where $y = (y_1, \dots, y_m)^T$.
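The fixed-parameter fit (4.8) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: the helper name `fit_fixed` and the synthetic test data are hypothetical, and the Chebyshev basis is assumed for $\phi_j$.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def fit_fixed(x, y, n, s, t, k):
    """Solve (4.8) for a with the auxiliary parameters b = (s, t, k) fixed."""
    C = cheb.chebvander(x, n)                       # C[i, j] = T_j(x_i), eq. (4.6)
    w = (1.0 + s**2 * (x - t) ** 2) ** (-0.5 * k)   # w_i = w(x_i, b), eq. (4.1)
    C_tilde = w[:, None] * C                        # eq. (4.7)
    a, *_ = np.linalg.lstsq(C_tilde, y, rcond=None)
    return a, C_tilde


# Synthetic data that coincide with the weight function for b = (5, 0, 2).
x = np.linspace(-1.0, 1.0, 201)
y = 1.0 / (1.0 + 25.0 * x**2)
a, C_tilde = fit_fixed(x, y, n=2, s=5.0, t=0.0, k=2.0)
residual = y - C_tilde @ a
```

Because the synthetic data here equal $w(x, b)$ exactly, the fitted coefficients are $a \approx (1, 0, 0)^T$ and the residual vanishes to rounding error. Real data would not be reproduced exactly, but the same steps apply.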
This approach using fixed auxiliary parameters may be particularly useful if the type of asymptotic behaviour exhibited by the data is explicitly known beforehand. The value of $b$ can then be chosen and fixed to match this behaviour, in a similar manner to that used for polynomial ratios in section 3.2. This type of approximation is similar to that proposed by Kilgore [30], who considers approximation on $[0, \infty)$ using polynomials of degree at most $2n$, weighted by the function $(1 + x^2)^{-n}$.

Generation of orthogonal polynomials

We have suggested the use of Chebyshev polynomials for the choice of the basis functions $\{\phi_j(x)\}_{j=0}^{n}$ due to their numerical stability and discrete orthogonality property. Use of the Chebyshev polynomial basis results in mutually orthogonal columns of the observation matrix $C$. However, this orthogonality may be lost when we form the modified observation matrix $\tilde{C}$, due to the multiplication of the original Chebyshev polynomials by the weight function $w(b)$. If we can generate a set of basis functions that are orthogonal with respect to the weight function $w(b)$, then the resulting observation matrix $\tilde{C}$ formed from these functions will have orthonormal columns. This can be achieved with the Forsythe method [22], which generates polynomial basis functions that are orthogonal with respect to a specified weight function $w(x)$. We now describe this approach.

Given abscissae $x = (x_1, \dots, x_m)^T$ and weights $w = (w_1, \dots, w_m)^T$, we generate polynomial basis functions $\phi_j(x)$ of degree $j$ such that

\[ \sum_{i=1}^{m} w_i^2 \phi_j^2(x_i) = 1, \tag{4.9} \]

\[ \sum_{i=1}^{m} w_i^2 \phi_j(x_i)\phi_l(x_i) = 0, \qquad l \ne j. \tag{4.10} \]

We can rewrite these equations more concisely by setting $\phi_j = (\phi_j(x_1), \dots, \phi_j(x_m))^T$ and $\tilde{\phi}_j = (w_1\phi_j(x_1), \dots, w_m\phi_j(x_m))^T$, so that (4.9) and (4.10) become

\[ \|\tilde{\phi}_j\| = 1, \tag{4.11} \]

\[ \tilde{\phi}_j^T \tilde{\phi}_l = 0, \qquad l \ne j. \tag{4.12} \]

The following algorithm constructs the $m \times (n+1)$ matrices $C$ and $\tilde{C}$ along with the $(n+1)$-vector $\alpha$, the $n$-vector $\beta$ and the $(n-1)$-vector $\gamma$. The vectors $\alpha = (\alpha_0, \dots, \alpha_n)^T$, $\beta = (\beta_0, \dots, \beta_{n-1})^T$ and $\gamma = (\gamma_0, \dots, \gamma_{n-2})^T$ are such that

\[ \phi_0(x) = 1/\alpha_0, \tag{4.13} \]

\[ \phi_1(x) = \alpha_0 (x + \beta_0)\phi_0(x)/\alpha_1, \tag{4.14} \]

and, for $j = 2, \dots, n$,

\[ \phi_j(x) = \alpha_{j-1}(x + \beta_{j-1})\phi_{j-1}(x)/\alpha_j + \alpha_{j-2}\gamma_{j-2}\phi_{j-2}(x)/\alpha_j. \tag{4.15} \]

I. Evaluate $\tilde{\phi}_0$ and $\alpha_0$. Set $\alpha_0 = \|w\|$ and, for $i = 1, \dots, m$, set $\phi_{i,0} = 1$ and $\tilde{\phi}_{i,0} = w_i/\alpha_0$.

II. Evaluate $\tilde{\phi}_1$, $\alpha_1$ and $\beta_0$. Set

\[ \beta_0 = -\sum_{i=1}^{m} x_i \tilde{\phi}_{i,0}^2, \tag{4.16} \]

and, for $i = 1, \dots, m$,

\[ \phi_{i,1} = (x_i + \beta_0)\phi_{i,0}, \qquad \tilde{\phi}_{i,1} = w_i \phi_{i,1}. \tag{4.17} \]

Set $\alpha_1 = \|\tilde{\phi}_1\|$ and normalize $\tilde{\phi}_1 := \tilde{\phi}_1/\alpha_1$.

III. For $j = 2, \dots, n$, calculate $\alpha_j$, $\beta_{j-1}$, $\gamma_{j-2}$ and $\tilde{\phi}_j$ from $\phi_{j-1}$ and $\phi_{j-2}$. Set

\[ \beta_{j-1} = -\sum_{i=1}^{m} x_i \tilde{\phi}_{i,j-1}^2, \tag{4.18} \]

and $\gamma_{j-2} = -(\alpha_{j-1}/\alpha_{j-2})^2$, and, for $i = 1, \dots, m$,

\[ \phi_{i,j} = (x_i + \beta_{j-1})\phi_{i,j-1} + \gamma_{j-2}\phi_{i,j-2}, \tag{4.19} \]

\[ \tilde{\phi}_{i,j} = w_i \phi_{i,j}. \tag{4.20} \]

Set $\alpha_j = \|\tilde{\phi}_j\|$ and normalize $\tilde{\phi}_j := \tilde{\phi}_j/\alpha_j$.

IV. Normalize $\phi_j$: for $j = 0, \dots, n$, set $\phi_j := \phi_j/\alpha_j$.

It can be seen that evaluation of the basis functions requires only the vectors $\alpha$ and $\beta$, since $\gamma$ is determined by $\alpha$. Since $\tilde{C}$ has orthonormal columns, the best fit parameters $a$ are given by $a = \tilde{C}^T y$. Figure 4.4 shows the first four orthogonal basis functions $\tilde{\phi}_j$ defined on the interval $[-1, 1]$ using the weight function $w(b)$ when $b = (3, 0, 4)^T$.

[Figure 4.4: First four orthogonal asymptotic polynomials $\tilde{\phi}_j(b, x)$ generated with $b = (3, 0, 4)^T$. Curves for $j = 1, 2, 3, 4$; horizontal axis $X$ from $-1$ to $1$, vertical axis $Y$.]

4.3.2 Auxiliary parameters as additional approximation parameters

The asymptotic polynomial will have greater flexibility if we regard one or more of $s$, $t$ and $k$ as additional parameters to be determined as part of the optimization. In this case the matrix $\tilde{C} = \tilde{C}(b)$ is a nonlinear function of $b$, and we are required to find parameters $a$, $b$ that solve the nonlinear least-squares problem

\[ \min_{a, b} \| y - \tilde{C}(b) a \|_2^2. \tag{4.21} \]

The optimization problem (4.21) can be solved using the Gauss-Newton algorithm, which has been described in section 2.8.2.
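Before turning to the details of the Gauss-Newton iteration, the Forsythe recurrence of section 4.3.1 (steps I to IV) can be sketched in code. This is an illustrative sketch only: the function name `forsythe` and the array layout are assumptions. The sums defining $\beta$ are taken over the weighted, normalised columns $\tilde{\phi}_{j-1}$, which is what orthogonality with respect to $w$ requires.

```python
import numpy as np


def forsythe(x, w, n):
    """Steps I-IV: return C, C_tilde (orthonormal columns), alpha, beta, gamma."""
    m = x.size
    P = np.zeros((m, n + 1))       # unnormalised phi_j evaluated at the x_i
    Ct = np.zeros((m, n + 1))      # normalised weighted columns phi~_j
    alpha = np.zeros(n + 1)
    beta = np.zeros(n)
    gamma = np.zeros(max(n - 1, 0))

    # Step I
    alpha[0] = np.linalg.norm(w)
    P[:, 0] = 1.0
    Ct[:, 0] = w / alpha[0]

    # Steps II and III share the same recurrence; gamma only enters for j >= 2
    for j in range(1, n + 1):
        beta[j - 1] = -np.sum(x * Ct[:, j - 1] ** 2)
        P[:, j] = (x + beta[j - 1]) * P[:, j - 1]
        if j >= 2:
            gamma[j - 2] = -(alpha[j - 1] / alpha[j - 2]) ** 2
            P[:, j] += gamma[j - 2] * P[:, j - 2]
        Ct[:, j] = w * P[:, j]
        alpha[j] = np.linalg.norm(Ct[:, j])
        Ct[:, j] /= alpha[j]

    C = P / alpha                  # Step IV: normalise each column of phi_j
    return C, Ct, alpha, beta, gamma


# Weights from (4.1) with b = (3, 0, 4), as used for Figure 4.4.
x = np.linspace(-1.0, 1.0, 101)
w = (1.0 + 9.0 * x**2) ** (-2.0)
C, Ct, alpha, beta, gamma = forsythe(x, w, 3)
```

With this construction `Ct.T @ Ct` is the identity to rounding error for modest degrees, so the linear least-squares coefficients for fixed $b$ reduce to `Ct.T @ y`.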
If we define the $m \times m$ matrix $W(b)$ to be the diagonal matrix formed from the elements of the weight vector

\[ w(b) = (w(x_1, b), \dots, w(x_m, b))^T, \tag{4.22} \]

we can express $\tilde{C}(b)$ as

\[ \tilde{C}(b) = W(b) C. \tag{4.23} \]

We then form the Jacobian matrix of partial derivatives of the quantity

\[ h(a, b) = y - W(b) C a \tag{4.24} \]

with respect to the optimization parameters. These partial derivatives are given by

\[ \frac{\partial h}{\partial a_j} = -\tilde{\phi}_j, \tag{4.25} \]

\[ \frac{\partial h}{\partial b_l} = -\frac{\partial W}{\partial b_l} C a, \tag{4.26} \]

and using these the Gauss-Newton method can then be implemented. Given an initial estimate $b_0$ of the parameters $b$, the polynomial basis can be chosen to be orthogonal with respect to the weight vector $w(b_0)$ so that, for $b$ close to $b_0$, the associated Jacobian matrix is relatively well-conditioned. In order to maintain well-conditioned matrices, we can periodically reparametrize the polynomials based on the current estimate of the auxiliary parameters $b$.

4.3.3 Elimination of the parameters a

For the case where we treat the auxiliary parameters as free approximation parameters rather than fixed values, it is possible to eliminate the parameters $a$ from the optimization, thus reducing the complexity of the problem. The least-squares problem (4.21) can be rewritten as

\[ \min_{a, b} h^T(a, b) h(a, b), \tag{4.27} \]

where

\[ h(a, b) = y - \tilde{C}(b) a, \tag{4.28} \]

and $\tilde{C}(b)$ is an $m \times n$ matrix, $m > n$, depending on parameters $b$. If we fix the value of $b$, then the optimal value of $a$ in (4.27) is the solution of the linear least-squares problem

\[ \min_{a} \| y - \tilde{C}(b) a \|_2^2, \tag{4.29} \]

where $\tilde{C}(b)$ is now a fixed matrix. The solution parameters $a$ of this problem must satisfy the normal equations

\[ \tilde{C}^T(b)\tilde{C}(b) a = \tilde{C}^T(b) y, \tag{4.30} \]

and these equations can be seen to implicitly define the parameters $a$ as a function of the auxiliary parameters $b$. If we now define the quantity

\[ f(b) = y - \tilde{C}(b) a(b), \tag{4.31} \]

we see that the initial nonlinear least-squares problem of equation (4.27) is equivalent to

\[ \min_{b} f^T(b) f(b). \tag{4.32} \]

This is also a nonlinear least-squares problem, but is much simpler as we only need to solve for the auxiliary parameters $b$. Once we have solved for $b$, the parameters $a$ are easily obtained from the normal equations (4.30).

In order to solve this problem using the Gauss-Newton method, it is necessary to calculate $a$, $f$ and the partial derivatives of $f$ with respect to the auxiliary parameters $b$. For any fixed value of $b$, the optimal parameters $a$ are easily obtained from the normal equations. Once the parameters $a$ are known, the value of $f$ can be calculated from (4.31). The partial derivatives of $f$ are given by

\[ f_l = -\tilde{C}_l a - \tilde{C} a_l, \tag{4.33} \]

where we use the subscript $l$ to represent a derivative with respect to $b_l$. Differentiation of the normal equations (4.30) with respect to $b_l$ leads to

\[ \tilde{C}_l^T \tilde{C} a + \tilde{C}^T \tilde{C}_l a + \tilde{C}^T \tilde{C} a_l = \tilde{C}_l^T y, \tag{4.34} \]

which reduces to

\[ a_l = \left[ \tilde{C}^T \tilde{C} \right]^{-1} \left[ \tilde{C}_l^T f - \tilde{C}^T \tilde{C}_l a \right], \tag{4.35} \]

after substituting $y$ with $f + \tilde{C} a$ from (4.31). We note here that, from equation (4.33), evaluation of $f_l$ only requires the calculation of the product $\tilde{C} a_l$ and not the vector $a_l$ itself. If we assume $\tilde{C}$ has QR decomposition $\tilde{C} = QR$ ($Q \in \mathbb{R}^{m \times n}$, $R \in \mathbb{R}^{n \times n}$) [24], then substituting this into (4.35) we get

\[ (QR)^T \tilde{C} a_l = \tilde{C}_l^T f - (QR)^T \tilde{C}_l a. \]

Further manipulation gives

\[ R^T Q^T \tilde{C} a_l = \tilde{C}_l^T f - R^T Q^T \tilde{C}_l a, \]

\[ \tilde{C} a_l = Q R^{-T} \left[ \tilde{C}_l^T f - R^T Q^T \tilde{C}_l a \right], \]

where the second line uses the fact that $\tilde{C} a_l$ lies in the column space of $Q$, so that $Q Q^T \tilde{C} a_l = \tilde{C} a_l$. From this we obtain the following expression for $\tilde{C} a_l$:

\[ \tilde{C} a_l = Q R^{-T} \tilde{C}_l^T f - Q Q^T \tilde{C}_l a. \tag{4.36} \]

If we now consider vectors $c_l$ and $q_l$ such that

\[ R^T c_l = \tilde{C}_l^T f, \tag{4.37} \]

\[ q_l = Q^T (\tilde{C}_l a), \tag{4.38} \]

then we can rewrite (4.36) as

\[ \tilde{C} a_l = Q (c_l - q_l). \tag{4.39} \]

In this way, the derivatives $f_l$ in (4.33) with respect to the parameters $b_l$ can be calculated from the expression

\[ f_l = -\tilde{C}_l a - Q (c_l - q_l). \tag{4.40} \]

At this point we note that the function $f(b) = y - \tilde{C}(b) a(b)$ and its derivatives are independent of the choice of basis functions used to represent the polynomials.
In particular, we can choose to use the Forsythe method to generate the basis functions so that $\tilde{C}$ has orthonormal columns. Using this basis means that the upper triangular matrix $R$ in equation (4.37) must be the identity matrix, which also means $Q = \tilde{C}$. Using equations (4.37) and (4.38) and the fact that $R$ is the identity matrix, equation (4.40) reduces to

\[ f_l = -\tilde{C}_l a - \tilde{C} \left( \tilde{C}_l^T f - \tilde{C}^T (\tilde{C}_l a) \right). \tag{4.41} \]

This means that the derivatives of $f$ can be calculated using matrix-vector multiplication, which avoids the need for computationally expensive matrix inversions. The orthogonality of $\tilde{C}$ also results in a simplification of the normal equations (4.30), and the parameters $a$ are given by

\[ a = \tilde{C}^T y. \tag{4.42} \]

An additional efficiency gain can be made using the fact that $\tilde{C} = W(b) C$, where $W(b)$ is a diagonal weighting matrix with diagonal elements $w_i(b) > 0$. If we write

\[ d_{i,l} = \frac{1}{w_i} \frac{\partial w_i}{\partial b_l}, \tag{4.43} \]

then we have

\[ \tilde{C}_l = D_l \tilde{C}, \tag{4.44} \]

where $D_l$ is the diagonal matrix with diagonal elements $d_{i,l}$. Substituting the quantities

\[ \tilde{C}_l^T f = \tilde{C}^T (D_l f), \tag{4.45} \]

\[ \tilde{C}_l a = D_l (\tilde{C} a), \tag{4.46} \]

used in (4.37) and (4.38), into (4.41) gives

\[ f_l = -D_l (\tilde{C} a) - \tilde{C} \left( \tilde{C}^T (D_l f) - \tilde{C}^T (D_l (\tilde{C} a)) \right). \tag{4.47} \]

This means that we can calculate the derivatives of $f$ using only $f$, $a$, $D_l$ and $\tilde{C}$. We can now calculate the elements of the Jacobian matrix $J$ from the elements of the partial derivative vectors $f_l$ according to $J_{il} = \partial f_i / \partial b_l$, and implement the Gauss-Newton method, which iteratively updates the current parameter estimate to $b := b + p_{GN}$, where

\[ J^T J p_{GN} = -J^T f. \]

The Gauss-Newton algorithm for minimizing a sum of squares $F(b) = f^T(b) f(b)/2$ works well if the Jacobian matrix $J$ is such that $J^T J$ is a good approximation to the Hessian matrix of second partial derivatives of $F(b)$. In cases where the approximating form has high curvature, it is common for $J^T J$ to provide a poor approximation to the Hessian, and this can be the case with asymptotic polynomials.
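The elimination scheme of section 4.3.3 can be sketched as follows for a general (not necessarily orthogonal) basis, using the QR form (4.40) together with the diagonal matrices $D_l$ of (4.43) to (4.46). This is a sketch under stated assumptions and not the implementation used in this work: the function names are hypothetical, the Chebyshev basis and the $\tanh$ test data are arbitrary illustrative choices, and the entries $d_{i,l}$ are obtained by differentiating $\log w$ from (4.1) by hand.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


def log_weight_derivs(x, s, t, k):
    """Columns d_{i,l} = (1/w_i) dw_i/db_l of (4.43) for b = (s, t, k)."""
    u = 1.0 + s**2 * (x - t) ** 2
    return np.column_stack([
        -k * s * (x - t) ** 2 / u,   # derivative of log w with respect to s
        k * s**2 * (x - t) / u,      # derivative of log w with respect to t
        -0.5 * np.log(u),            # derivative of log w with respect to k
    ])


def reduced_residual(b, x, y, C):
    """Return f(b) of (4.31), J with J[i, l] = df_i/db_l from (4.40), and a(b)."""
    s, t, k = b
    w = (1.0 + s**2 * (x - t) ** 2) ** (-0.5 * k)
    Ct = w[:, None] * C                                  # C~ = W(b) C, eq. (4.23)
    Q, R = np.linalg.qr(Ct)                              # reduced QR factorisation
    a = np.linalg.solve(R, Q.T @ y)                      # linear least-squares solution
    f = y - Ct @ a
    D = log_weight_derivs(x, s, t, k)
    J = np.empty((x.size, 3))
    for l in range(3):
        Cl_a = D[:, l] * (Ct @ a)                        # eq. (4.46)
        c = np.linalg.solve(R.T, Ct.T @ (D[:, l] * f))   # eq. (4.37) with (4.45)
        q = Q.T @ Cl_a                                   # eq. (4.38)
        J[:, l] = -Cl_a - Q @ (c - q)                    # eq. (4.40)
    return f, J, a


def gauss_newton_step(b, x, y, C):
    """One update b + p_GN, where p_GN solves J^T J p = -J^T f in the least-squares sense."""
    f, J, _ = reduced_residual(b, x, y, C)
    p, *_ = np.linalg.lstsq(J, -f, rcond=None)
    return b + p


x = np.linspace(-1.0, 1.0, 51)
y = np.tanh(3.0 * x)
C = cheb.chebvander(x, 3)
b = np.array([2.0, 0.1, 1.5])
f, J, a = reduced_residual(b, x, y, C)
```

A convenient check on an implementation of this kind is to compare each column of `J` with a central finite difference of `f` in the corresponding auxiliary parameter. A single Gauss-Newton step is not guaranteed to reduce the sum of squares, so in practice some form of step control may be needed.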
A poor approximation of the Hessian by $J^T J$ can prevent the convergence of the Gauss-Newton method, and in such cases we may achieve more success by applying the Newton method instead, although again convergence is not always guaranteed. We recall that the Newton step $p_N$ to update $b := b + p_N$ is the solution of

\[ H p_N = -g, \qquad g = J^T f, \]

where $H$ is the Hessian matrix of $F(b)$. If convergence occurs, the Newton update step leads to quadratic convergence near the solution, and for this reason there can be computational advantages in using a Newton update. In our implementation, we have used finite differences to approximate $H$, using an implementation provided by Professor Alistair Forbes of NPL.

4.3.4 Choice of initial values of the auxiliary parameters

When we fit data with an asymptotic polynomial using either the Gauss-Newton or Newton methods, careful thought needs to be given to the initial values of the auxiliary parameters. As has been mentioned in chapter 2, the convergence of these methods is dependent on a suitable set of start parameters close to a local minimum of the objective function, which in this case is the least-squares approximation error surface. Although we can also employ other methods that are not so sensitive to start values (such as the Levenberg-Marquardt method), it is still possible to choose good start values for the Gauss-Newton and Newton methods based on a visual assessment of the data. Although the guidelines we now propose for start parameter selection are far from rigorous, they are often very effective at ensuring convergence of the optimization algorithms used.

The auxiliary parameter $t$ acts like a centre around which the weight function has radial symmetry. Due to this behaviour, we have found that a good starting value for $t$ can usually be chosen to coincide with a region of symmetry within the data, if there is one. If there is not a great deal of symmetry in the data, then we have found that an alternative is to choose $t$ to coincide with a stationary point of the data.
Figure 4.5 shows a plot of measurements of some material properties of aluminium (provided by NPL). This data exhibits some loose symmetry centred somewhere in the region $155 < x < 157$, and so for an initial choice of $t$ we would try a value in this range, probably concentrating on values close to the two localised peaks in this data at $x = 155.7$ and $x = 156.3$.

The auxiliary parameter $k$ is usually chosen to mimic the asymptotic behaviour exhibited by the data, in conjunction with the degree of the orthogonal polynomial basis in the numerator. For example, suppose we are approximating data that appears (or is known theoretically) to behave like $\sqrt{x}$ as $x \to \infty$, using a degree $n$ asymptotic polynomial. In order for the correct limiting behaviour to be achieved by the asymptotic polynomial, we would require a choice of $k = n - \frac{1}{2}$, because

\[ \lim_{x \to \infty} \frac{\alpha x^n}{x^{n-k} \left(1 + s^2 (x - t)^2\right)^{k/2}} = \frac{\alpha}{s^k}. \tag{4.48} \]

The choice of start value for $s$ is the most difficult of the three auxiliary parameters, because it requires a guess of how quickly the asymptotic behaviour takes effect. Figures 4.6 and 4.7 show artificial sets of data that have the same asymptotic limits, but with different rates at which these limits are attained. It is clear that the data in Figure 4.6 approaches its asymptotes faster than that of Figure 4.7, and so we would expect an asymptotic polynomial approximation of the first dataset to have a larger value of $s$ than that of the second dataset. This only gives an indication of a good start value of $s$ relative to some other measure, but it does suggest that larger values of $s$ are needed for data that attains its limits quickly.

Using these guidelines it is often possible to choose start parameters that result in convergence of the Gauss-Newton or Newton methods. In cases where convergence is not possible, we can either make adjustments to the initial parameter values, or implement the Levenberg-Marquardt method described in section 2.8.3. This is less sensitive to the choice of start parameters and in our experience has given good results for these problem cases. For a more robust choice of algorithm, the Levenberg-Marquardt method could be used all the time, but for a good selection of initial parameters we would expect faster convergence with the Newton method.

[Figure 4.5: 601 experimental measurements obtained from material properties of aluminium. Horizontal axis $X$ from 152 to 158, vertical axis $Y$ from 600 to 1600.]

4.4 Example applications

We now present some applications of the asymptotic polynomial to the approximation of experimental data obtained from industry, and make some comparisons with linear approximations to the same data. We note here that instead of optimising directly for the auxiliary parameter $s > 0$, we expressed it in the form $e^s$ and optimised for the exponent, thus ensuring the required positivity of the parameter without the need for constrained optimisation methods.

[Figure 4.6: Artificial data with asymptotic limits. Horizontal axis $X$ from $-10$ to $10$, vertical axis $Y$ from $-1.5$ to $1.5$.]

[Figure 4.7: Artificial data with asymptotic limits. Horizontal axis $X$ from $-10$ to $10$, vertical axis $Y$ from $-1.5$ to $1.5$.]

4.4.1 Photometric data

This experiment was described in section 1.9.2, and here we present the results of approximating the collected data with asymptotic polynomials. The data is composed of 471 measurements, and an initial inspection of the data reveals a shape similar to a Gaussian curve with fairly mild asymptotic behaviour. Clearly the data exhibits exponential decay, and so using the guidelines for start parameter selection (section 4.3.4) we chose initial auxiliary parameters $b_0 = (-10, 505, 7)^T$.
An asymptotic polynomial of degree 6 was then fitted to the photometric data using both Newton (via finite difference approximation to the Hessian) and Gauss-Newton methods. Table 4.1 compares the norms of the update step $p$ calculated at consecutive iterations of these methods, and illustrates the superior convergence of the Newton method for this example. Figure 4.8 shows degree 9 Chebyshev polynomial approximations and degree 6 asymptotic polynomial approximations to the photometric data, from which the superior performance of the latter is evident.

Table 4.1: Norm of update step $p$ in Newton and Gauss-Newton methods for the photopic efficiency function example (Fig. 4.8).

Iteration    ||p_GN||_2    ||p_N||_2
1            0.8496        0.6573
2            0.3354        0.2203
3            0.1380        0.0019
4            0.0568        2.075e-06
5            0.0235        3.855e-13
6            0.0097

4.4.2 Approximation of sigmoid function

Figure 4.9 shows fits of a polynomial of degree 6 and an asymptotic polynomial of degree 3 to the sigmoid curve

\[ y = \frac{2}{1 + e^{-x}} - 1. \tag{4.49} \]

[Figure 4.8: Asymptotic polynomial approximation of photopic efficiency function. Curves labelled "asymptotic polynomial fit" and "polynomial fit"; horizontal axis Wavelength/nm from 350 to 850, vertical axis Efficiency.]

(In many circumstances the response of a system to a step change in input has a sigmoid-type behaviour.) We chose to fit the function to 501 equally spaced points from the sigmoid over the interval $[-10, 10]$. The asymptotic polynomial fit is indistinguishable from the sigmoid curve and the maximum error of approximation is less than $2.5 \times 10^{-4}$. The degree 6 polynomial fit is much worse. (In the examples considered here the degree of the standard polynomial is 3 more than that of the asymptotic polynomial, so that both models have the same number of parameters.) Finally, figure 4.10 shows standard polynomial and asymptotic polynomial fits to another experimental dataset (provided by NPL).
This dataset consists of 601 measurements of material properties of aluminium, and again we can clearly see the superior performance of the asymptotic polynomial over the Chebyshev polynomial approximation.

[Figure 4.9: Polynomial of degree 6 and asymptotic polynomial of degree 3 fits to a sigmoid curve. Curves labelled "polynomial fit" and "asymptotic polynomial fit"; horizontal axis from $-10$ to $10$, vertical axis from $-1.5$ to $1.5$.]

[Figure 4.10: Polynomial of degree 9 and asymptotic polynomial of degree 6 fits to 601 measurements of material properties (for aluminium). Curves labelled "asymptotic polynomial fit" and "polynomial fit"; horizontal axis from 152 to 158, vertical axis from 600 to 1600.]

4.5 Modelling two asymptotic limits simultaneously

Up to now we have used asymptotic polynomials to model various types of asymptotic behaviour more effectively than standard polynomials. This works well when we are dealing with a one sided asymptotic limit, but may not be effective at modelling the double sided asymptotic behaviour that was described in section 1.8.8. To recap, that section described the problem of modelling a function $f(x)$ that has twin asymptotic limits

\[ \lim_{x \to -\infty} \frac{f(x)}{x^{\beta_1}} = \alpha_1, \tag{4.50} \]

\[ \lim_{x \to +\infty} \frac{f(x)}{x^{\beta_2}} = \alpha_2, \tag{4.51} \]

where $\beta_1, \beta_2, \alpha_1, \alpha_2 \in \mathbb{R}$. In chapter 3 we introduced the semi-infinite rational spline as an approximation tool for modelling this type of asymptotic behaviour, and in this section we utilise a similar approach to allow asymptotic polynomials to model two different limits simultaneously. This is achieved by considering a new piecewise continuous weight function defined by

\[ w(b, x) = \begin{cases} w_-(b, x) = \dfrac{1}{\left(1 + s_1^2 (x - \lambda)^2\right)^{k_1/2}} & \text{for } x \le \lambda, \\[2ex] w_+(b, x) = \dfrac{1}{\left(1 + s_2^2 (x - \lambda)^2\right)^{k_2/2}} & \text{for } x \ge \lambda, \end{cases} \tag{4.52} \]

where $b = (s_1, s_2, k_1, k_2, \lambda)^T$. As was the case with the SIRS of chapter 3, we regard the parameter $\lambda$ as a knot (which we can treat as fixed or variable) and examine the conditions required for continuity of the weight function at this knot.
Clearly $w(b, x)$ is $C^0$ continuous at the knot, as we have

\[ w_-(b, x) = w_+(b, x) = 1, \tag{4.53} \]

when $x = \lambda$. For $C^1$ continuity at the knot we require the first derivatives of $w_-(b, x)$ and $w_+(b, x)$ with respect to $x$ to have the same value at $x = \lambda$. These derivatives are given by

\[ w_-'(b, x) = \frac{-k_1 s_1^2 (x - \lambda)}{\left(1 + s_1^2 (x - \lambda)^2\right)^{\frac{k_1}{2} + 1}}, \qquad w_+'(b, x) = \frac{-k_2 s_2^2 (x - \lambda)}{\left(1 + s_2^2 (x - \lambda)^2\right)^{\frac{k_2}{2} + 1}}, \tag{4.54} \]

and $C^1$ continuity is verified as being an intrinsic property of the weight function, as we have

\[ w_-'(b, x) = w_+'(b, x) = 0, \tag{4.55} \]

when $x = \lambda$.

So far we have $C^1$ continuity at the knot satisfied automatically due to the definition of the weight function. However, $C^2$ continuity is not so straightforward to enforce. The second derivatives of the spline functions $w_-(b, x)$ and $w_+(b, x)$ are given by

\[ w_-''(b, x) = (k_1 + 2) s_1^4 k_1 (x - \lambda)^2 u_1^{-\left(\frac{k_1}{2} + 2\right)} - s_1^2 k_1 u_1^{-\left(\frac{k_1}{2} + 1\right)}, \tag{4.56} \]

\[ w_+''(b, x) = (k_2 + 2) s_2^4 k_2 (x - \lambda)^2 u_2^{-\left(\frac{k_2}{2} + 2\right)} - s_2^2 k_2 u_2^{-\left(\frac{k_2}{2} + 1\right)}, \tag{4.57} \]

where

\[ u_1 = 1 + s_1^2 (x - \lambda)^2, \tag{4.58} \]

\[ u_2 = 1 + s_2^2 (x - \lambda)^2. \tag{4.59} \]

Evaluating these second derivatives at $x = \lambda$ and forcing their equality leads to the constraint

\[ s_1^2 k_1 = s_2^2 k_2, \tag{4.60} \]

and as a consequence, we are now faced with a nonlinear optimisation problem involving nonlinear constraints on the parameters. If we are approximating data where the asymptotic behaviour is known explicitly, then we could consider fixing the values of $k_1$ and $k_2$, which would result in a significant simplification of the constraint (4.60), as we could then explicitly define $s_1$ in terms of $s_2$. However, we would then be faced with an optimisation problem involving a single parameter (either $s_1$ or $s_2$), and this is likely to reduce the flexibility of the asymptotic polynomial significantly, so we will not consider this option further.
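The piecewise weight function (4.52) and the continuity conditions at the knot can be illustrated with the following sketch. The function names are hypothetical, and the parameter values are arbitrary choices made so that the constraint (4.60) happens to hold.

```python
import numpy as np


def spline_weight(x, s1, s2, k1, k2, lam):
    """Piecewise weight (4.52) with knot lam."""
    x = np.asarray(x, dtype=float)
    d2 = (x - lam) ** 2
    left = (1.0 + s1**2 * d2) ** (-0.5 * k1)    # w_-, used for x <= lam
    right = (1.0 + s2**2 * d2) ** (-0.5 * k2)   # w_+, used for x >= lam
    return np.where(x <= lam, left, right)


def second_derivative_jump(s1, s2, k1, k2):
    """w_+''(lam) - w_-''(lam) = s1^2 k1 - s2^2 k2, which vanishes iff (4.60) holds."""
    return s1**2 * k1 - s2**2 * k2


lam = 0.5
h = 1e-6
value_at_knot = spline_weight(lam, 2.0, 1.0, 1.0, 4.0, lam)
slope_right = (spline_weight(lam + h, 2.0, 1.0, 1.0, 4.0, lam) - value_at_knot) / h
slope_left = (value_at_knot - spline_weight(lam - h, 2.0, 1.0, 1.0, 4.0, lam)) / h
jump = second_derivative_jump(2.0, 1.0, 1.0, 4.0)
```

Here both one-sided difference quotients tend to zero as `h` decreases, consistent with (4.55), and the second derivative jump is zero because $s_1^2 k_1 = s_2^2 k_2 = 4$ for these values. For a general parameter choice the jump is nonzero and the weight is only $C^1$ at the knot.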
Due to the complexities of constrained optimisation, we will therefore settle for $C^1$ continuity of the weighting function, since this requires no constraints at all and is an intrinsic property of $w(b, x)$. Approximation of data with this modified version of the asymptotic polynomial can then be achieved in exactly the same way as described previously, the only difference being the addition of the extra optimisation parameters $k_2$ and $s_2$.

When approximating double asymptotic limits, we have found that this splined version of the asymptotic polynomial yields better approximations than those obtained using the standard asymptotic polynomial form. We finish this section by presenting figure 4.11, which illustrates this superiority of the splined version for a degree 10 asymptotic polynomial approximation of $\tanh(x)$ at 101 equally spaced abscissae on $[-5, 5]$.

[Figure 4.11: Approximation errors of degree 10 spline and standard asymptotic polynomial fits to $\tanh(x)$. Plot title "Comparison of approximation errors"; curves labelled "Standard" and "Spline"; horizontal axis $X$ from $-5$ to $5$, vertical axis Approximation Error in units of $10^{-4}$.]

4.6 Concluding remarks

In this chapter we have demonstrated that data reflecting asymptotic behaviour can be modelled by polynomial basis functions multiplied by a nonlinear weighting function depending on three auxiliary parameters. In addition, we have developed numerically stable optimization algorithms using polynomial basis functions orthogonal with respect to the weighting function. Further simplification has been achieved with the implementation of a parameter elimination scheme that allows the approximation problem to be solved compactly. The model can easily be extended to allow for different asymptotic behaviour as $x \to \infty$ and $x \to -\infty$, by using a piecewise continuous weighting function that itself has variable asymptotic behaviour.
We have also observed some guidelines for selecting initial values for the parameters that in many cases result in convergence of our algorithm. However, in cases where we cannot find a good set of starting parameters, we can still fit the asymptotic polynomial to data using the more robust Levenberg-Marquardt algorithm. The examples presented have shown that these asymptotic polynomial approximations can be much more effective than standard polynomial approximations.

Chapter 5

POLE FREE LEAST-SQUARES RATIONAL APPROXIMATION

5.1 Introduction

In this chapter we introduce a method of fitting polynomial ratios to data in such a way that the denominator function has no real roots, resulting in the rational approximation having no singularities anywhere on the real line. We do this by presenting an alternative representation of the denominator polynomial and applying nonlinear optimisation techniques to evaluate parameters. This approach is very simple and we illustrate how to combine it with the SIRS discussed in Chapter 3, resulting in pole-free SIRS approximations.

5.2 Positive denominator rational approximation

Unless we are faced with a specific problem requiring the modelling of a singularity in the data, we will most often wish for a rational approximation to have no poles on the approximation range. For Chebyshev rational approximation there are a number of methods available to constrain the approximation parameters, and using these methods we may be able to find a way to ensure that the denominator has no zeros. Most of these methods are based on modifications of the Differential Correction Method which was presented in Chapter 2. Kauffman and Taylor have implemented variations on the DCM that allow linear constraints to be placed on the approximation parameters [28], and another that places a strictly positive lower bound on the denominator function [29].
Another method has been developed by Gugat [25], which forces the denominator to be bounded above and below by continuous functions. Our primary interest is in least-squares approximation and, as far as we are currently aware, there are no equivalent methods to those described above, other than constrained optimisation techniques.

5.3 Requirements for a strictly positive denominator

When faced with the general problem of approximating discrete points on an interval $[a, b]$, it is not immediately obvious what constraints on the approximation parameters will lead to a positive polynomial denominator. If we are instead faced with an approximation problem on an interval $[c, d]$ where $c, d > 0$, then strict positivity can be achieved with the simple constraint

\[ 0 < q_i, \qquad i = 1, \dots, m, \tag{5.1} \]

where our denominator function

\[ Q_m(q, x) = \sum_{j=0}^{m} q_j x^j \tag{5.2} \]

is defined as in previous chapters. In such cases we can enforce these constraints and get pole free approximations. With these constraints satisfied, however, the denominator is monotonically increasing on the approximation interval, and hence some flexibility in the approximation is lost. If we are approximating on a general range $[a, b]$ with $a < b < 0$, or with $a < 0 < b$, then we could consider approximating using a denominator of the form

\[ Q_m(q, x) = \sum_{j=0}^{m/2} q_j x^{2j}. \tag{5.3} \]

Using such a denominator removes the odd monomial basis functions, and then enforcing the constraint (5.1) will again provide a positive denominator. However, again we have lost some flexibility with the removal of the odd functions.

5.4 Implementing positive denominator constraints

In an attempt to avoid the use of constrained optimisation techniques, we propose a simple method of data fitting, using a subset of the set of all pole-free rational approximations. This method works by considering an alternative representation of the denominator polynomial in the rational form.
Up to now we have mainly been using polynomial ratios $R^n_m(p, q, x)$ for approximation, where $n$, $m$ are chosen according to the shape of the data, and are generally of low degree to avoid ill-conditioning problems with our proposed methods. For the moment, we will restrict our discussion to cases where the denominator polynomial $Q_m(q, x)$ has even degree. This denominator has at most $m$ distinct roots, and we can represent $Q_m(q, x)$ as

\[ Q_m(q, x) = q_m \prod_{i=1}^{m} (x - \alpha_i), \tag{5.4} \]

where the $\alpha_i \in \mathbb{C}$ are the denominator roots. If all of the roots are complex (that is, non-real), then the denominator does not vanish anywhere on the real line and, with a suitable normalisation, is strictly positive there, resulting in a pole free rational approximation. We now wish to examine some conditions that we can apply to enforce this situation.

Because we have chosen $m$ to be an even integer, we can also express the denominator (suitably normalised with $q_0 = 1$) as a product of quadratic polynomials

\[ Q_m(q, x) = \prod_{i=1}^{m/2} (\gamma_i x^2 + \xi_i x + 1), \tag{5.5} \]

for constants $\gamma_i, \xi_i \in \mathbb{R}$. Now if we can ensure that each of the quadratics in the product (5.5) has complex roots, then we can be sure that $Q_m$ has no real roots and as a consequence the rational approximation contains no poles. For all roots of (5.5) to be complex, we require that

\[ 4\gamma_i > \xi_i^2, \tag{5.6} \]

for $i = 1, \dots, \frac{m}{2}$. Now if we apply the explicit parameter constraints

\[ \xi_i^2 = \gamma_i, \tag{5.7} \]

for $i = 1, \dots, \frac{m}{2}$, then the conditions (5.6) are automatically satisfied whenever $\xi_i \ne 0$ (if $\xi_i = 0$ the corresponding factor reduces to the constant 1, which has no roots). Thus if we consider rational approximation using denominators of the form

\[ Q_m(q, x) = \prod_{i=1}^{m/2} (q_i^2 x^2 + q_i x + 1), \tag{5.8} \]

then we are guaranteed to have no poles on the entire real line. In addition, this representation defines a degree $m$ polynomial given in terms of only $\frac{m}{2}$ coefficients, and so we have reduced the number of denominator approximation parameters by half.
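The denominator representation (5.8) can be sketched in a few lines. The function name `denominator` is a hypothetical choice, and the random coefficients are drawn purely for illustration.

```python
import numpy as np


def denominator(q, x):
    """Q_m(q, x) of (5.8): product of m/2 quadratics q_i^2 x^2 + q_i x + 1."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    factors = (q[:, None] * x) ** 2 + q[:, None] * x + 1.0   # one row per quadratic
    return factors.prod(axis=0)


rng = np.random.default_rng(0)
q = rng.uniform(-1.0, 1.0, size=2)          # m = 4, so two quadratic factors
x = np.linspace(-100.0, 100.0, 20001)
Q = denominator(q, x)
```

Writing $z = q_i x$, each factor is $z^2 + z + 1 = (z + \tfrac{1}{2})^2 + \tfrac{3}{4}$, so every factor is at least $\tfrac{3}{4}$ and the product is bounded below by $(3/4)^{m/2}$ for all real $x$ and any real $q$. This gives a direct confirmation that no constraint on $q$ is needed to keep the denominator positive.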
Obviously the space of polynomials described by (5.8) is only a subset of the space of all degree $m$ polynomials, and so we are reducing the size of the space of approximation forms that we are using. In particular, the set of polynomials given in (5.8) is only a subset of the set of polynomials strictly greater than zero. Despite these limitations, the simplicity of the constraint provides us with exactly what we require, without the need for constrained optimisation algorithms, and has the advantage of reducing the number of optimisation parameters by $m/2$.

5.5 Parameter evaluation

In order to evaluate the parameters we need to apply an optimisation algorithm such as Gauss-Newton, and so need to evaluate the Jacobian matrix for this problem. Using the denominator representation (5.8), our rational approximant is defined as
\[ R^n_m(p, q, x) = \frac{P(p, x)}{Q(q, x)} = \frac{\sum_{i=0}^{n} p_i x^i}{\prod_{i=1}^{m/2} (q_i^2 x^2 + q_i x + 1)}. \tag{5.9} \]
The Jacobian matrix of partial derivatives of the residual vector is defined in (2.43) and, for approximations of the form (5.9), is the matrix with elements
\[ J(p, q)_{ij} = \begin{cases} \dfrac{x_i^{j-1}}{Q(q, x_i)}, & j = 1, \ldots, n+1, \\[2ex] -R(p, q, x_i)\, \dfrac{2 q_k x_i^2 + x_i}{q_k^2 x_i^2 + q_k x_i + 1}, \quad k = j - n - 1, & j = n+2, \ldots, n+1+\tfrac{m}{2}. \end{cases} \tag{5.10} \]
These entries are the partial derivatives of $R$ with respect to the parameters; if the residual is taken as $f_i - R(p, q, x_i)$, every entry changes sign.

With the Jacobian defined we can apply the Gauss-Newton method to fit the polynomial ratio representation (5.9) to a set of data. The main problem we are faced with in attempting to do this is that it is not obvious how to obtain a good set of start parameters for the algorithm. We would hope for the Gauss-Newton method to have good local convergence in the vicinity of a local minimum (although this may not be the case), but even then we have no way of ensuring a start set that will converge to a solution.
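The partial derivatives that enter the Jacobian can be checked numerically before they are handed to an optimiser. The sketch below is a minimal illustration, not the implementation used for the results in this chapter: it codes the derivatives of $R = P/Q$ with respect to the numerator coefficients ($x^{k}/Q$) and the denominator parameters ($-R\,(2 q_k x^2 + x)/(q_k^2 x^2 + q_k x + 1)$), and compares them with central finite differences. All function names and the sample parameter values are assumptions made for the example, and the signs are those of the derivative of $R$ itself (a residual defined as $f - R$ would flip them).

```python
def denominator(q, x):
    """Q(q, x) = prod_k (q_k^2 x^2 + q_k x + 1)."""
    value = 1.0
    for qk in q:
        value *= qk * qk * x * x + qk * x + 1.0
    return value


def rational(p, q, x):
    """R(p, q, x) = P(p, x) / Q(q, x) with a monomial numerator."""
    numerator = sum(pk * x ** k for k, pk in enumerate(p))
    return numerator / denominator(q, x)


def jacobian_row(p, q, x):
    """Partial derivatives of R at x: first for p_0..p_n, then for each q_k."""
    big_q = denominator(q, x)
    r = rational(p, q, x)
    d_numer = [x ** k / big_q for k in range(len(p))]
    d_denom = [-r * (2.0 * qk * x * x + x) / (qk * qk * x * x + qk * x + 1.0)
               for qk in q]
    return d_numer + d_denom


def finite_difference_row(p, q, x, h=1e-6):
    """Central-difference approximation to the same row, used only as a check."""
    params = list(p) + list(q)
    n = len(p)
    row = []
    for j in range(len(params)):
        up = params[:]
        down = params[:]
        up[j] += h
        down[j] -= h
        row.append((rational(up[:n], up[n:], x)
                    - rational(down[:n], down[n:], x)) / (2.0 * h))
    return row


# Hypothetical parameter values for a degree [2,4] form (two quadratic factors).
p_demo = [1.0, -0.5, 0.25]
q_demo = [0.3, -0.7]
analytic = jacobian_row(p_demo, q_demo, 1.3)
numeric = finite_difference_row(p_demo, q_demo, 1.3)
print(max(abs(a - b) for a, b in zip(analytic, numeric)))
```

Stacking one such row per data point gives the full Jacobian; two equal denominator parameters produce two identical columns, which is the rank deficiency discussed under ill-conditioning.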
With the standard representation of the rational function, where the denominator parameters appear linearly in the denominator, we would usually apply the Loeb algorithm to obtain a suitable set of start parameters for the Gauss-Newton process, as was discussed in Chapter 2. However, with the quadratic product representation for the denominator we have no such method, and so we must use an optimisation technique that converges for a wide range of start parameters. We have used the MATLAB optimisation toolbox to implement the Levenberg-Marquardt method to obtain convergence from arbitrarily chosen start parameters. In general we have used randomly selected parameters from the interval $[-1, 1]$.

The problem with this approach is that the optimisation technique may well converge to a local minimum only, and we have no idea how this compares with the global minimum of the objective function. In addition, the approach is very much 'trial and error', as for certain sets of start parameters we get either no convergence or very slow convergence. However, after numerous applications of this approach to a large number of datasets, we have found that most of the time the algorithm converges to good pole-free rational approximations within an acceptable number of iterations. In addition, it was encouraging to see that in most cases the algorithm converged to the same set of solution parameters for a variety of different starting parameters.

Table 5.1 shows the results obtained from application of this method to data points taken from a number of different functions. In each case, we fitted a degree [4,4] rational approximation to 51 equally spaced samples on $[-2, 2]$ from each function, with start parameters selected randomly from $[-1, 1]$.
It is important to note here that for each function we used a separate random selection of start parameters, but we repeated the algorithm a number of times for each function using different random parameters and arrived at the same set of parameters at convergence (when convergence occurred). Figure 5.1 shows one of the approximations obtained using this method.

Table 5.1: Error norms and iterations to convergence of pole-free rational approximations.

Function                  Iterations to convergence   $\|e\|_2$
$e^{0.1x}\cos(2x)$        20                          $6.2567 \times 10^{-2}$
$\tanh(x)$                38                          $7.7715 \times 10^{-2}$
$\cosh(x)$                38                          $1.1547 \times 10^{-2}$
$\tan(x)$                 42                          $57.574$
$\sin(2x)$                14                          $5.7270 \times 10^{-2}$
$e^{-x^2}$                14                          $2.0982 \times 10^{-2}$
$e^{x^2}$                 -                           -
$\ln(x + 3)$              34                          $1.0541 \times 10^{-2}$
$e^{-x^2}\tanh(x)$        12                          $1.9510 \times 10^{-2}$

[Plot: f(x) = exp(0.1x)cos(2x) and its approximation against X on [-2, 2].]
Figure 5.1: Degree [4,4] pole-free rational approximation to $e^{0.1x}\cos(2x)$.

5.5.1 Ill-conditioning issues

The columns of the Jacobian matrix that correspond to the partial derivatives of the residual vector with respect to the denominator parameters are symmetric, in the sense that if at any step of the algorithm two (or more) of the denominator parameters become equal, the Jacobian will have two (or more) identical columns and will be rank deficient. This will also be the case should any of the denominator parameters converge to the same value as any of the others. This is an important issue to consider when choosing the start parameters. When we generated random start parameters as described previously, we first checked that the initial Jacobian matrix generated with them was not too ill-conditioned. If it was, parameters that were too close together were manually adjusted, or a new set of random parameters was created. In the vast majority of cases we found that for the [4,4] pole-free rational approximation we would not see a Jacobian with condition number higher than the order of $10^3$.
For the case of a [6,6] pole-free rational form, we found the condition number generally to be no larger than $10^6$. We also examined the [8,8] form, but in this case we were often faced with condition numbers higher than $10^9$. We also found that convergence was generally slower for pole-free approximations of degree higher than [6,6], and would not recommend fitting these higher degree approximations for these reasons.

5.5.2 Consideration of other types of constraint

The approximation form described in the previous section is guaranteed to be pole free on the entire real line, but in some cases we may not require such strict conditions. For example, if we were not intending to use the approximation to extrapolate outside the approximation interval, we may be happy to have poles in the approximation, provided they do not occur in the approximation interval itself. Another example is approximation on a real interval $[a, b]$ where $0 < a < b$. In this case we can ensure strict positivity of the denominator on the range $[0, \infty)$ by defining the denominator polynomial in the usual way as
\[ Q_m(q, x) = \sum_{j=0}^{m} q_j x^j, \tag{5.11} \]
and then setting constraints on the parameters in the form
\[ 0 < q_0 < q_1 < \ldots < q_m. \tag{5.12} \]
However, this has the slight disadvantage that the denominator is a monotonically increasing function, and so provides less flexibility than polynomial ratios formed using (5.8), which is not necessarily monotonically increasing. In addition, it is hard to see how the approximation form with denominator constraints (5.12) will result in a superior approximation to the asymptotic polynomial of the previous chapter, which also has a strictly positive denominator but has fewer coefficients and is more stable numerically.

The constraints in (5.7) are very simple to implement, and also serve to reduce the dimensionality of the approximation parameter vector.
However, the set of approximations generated using these constraints is only a subset of the entire space of pole-free rational approximations, and so we could gain even greater flexibility by considering the constraints
\[ \gamma_i = \lambda_i \xi_i^2, \tag{5.13} \]
where $\lambda_i > \tfrac{1}{4}$, so that $4\gamma_i = 4\lambda_i \xi_i^2 > \xi_i^2$ and (5.6) holds; (5.7) is the special case $\lambda_i = 1$. Using these constraints does increase the dimensionality of the approximation parameter vector, but it enlarges the space of approximations to include all possible strictly positive denominators, and provides a more flexible denominator and therefore potentially more flexible and superior rational approximations.

5.5.3 Pole-free SIRS approximations

The simplicity of the constraints (5.7) allows us to extend the SIRS work of Chapter 3 to include pole-free constraints in the denominator. If we define a SIRS approximation to be composed of only one denominator, defined by (5.8), we can introduce the continuity constraints across the knot of the SIRS merely through the numerator functions. Explicitly, the resulting pole-free SIRS approximation will be defined by
\[ R^{n_1, n_2}_m(g, h, q, u) = \begin{cases} R_-(g, q, u) = \dfrac{\sum_{i=0}^{n_1} g_i u^i}{\prod_{i=1}^{m/2} (q_i^2 x^2 + q_i x + 1)} & \text{for } u \le 0, \\[3ex] R_+(h, q, u) = \dfrac{\sum_{i=0}^{n_2} h_i u^i}{\prod_{i=1}^{m/2} (q_i^2 x^2 + q_i x + 1)} & \text{for } u \ge 0, \end{cases} \tag{5.14} \]
where $u = x - \lambda$, and $\lambda$ is the SIRS knot. As mentioned previously, we cannot utilise the Loeb algorithm as we could with the standard SIRS approximation, and for this particular case we would use the Levenberg-Marquardt algorithm to obtain a solution.

5.6 Conclusion

In this chapter, we have suggested a very simple factorisation that provides an alternative representation of the approximation parameters and that, with the application of some simple constraints, results in a pole-free rational approximation over the entire real line. In addition we have shown that we can successfully fit such approximations using established optimisation techniques.
While it is accepted that these approximations do not give as small an approximation error as the free rational approximation form does, they will obviously be superior in cases where an unwanted pole is fitted in the approximant of an unconstrained form. This representation also has the advantage of reducing the number of approximation parameters in the problem, at the expense of some flexibility. Should such approximations not be as accurate as desired, greater flexibility can be obtained using the full set of approximation parameters as given by the wider range of constraints (5.13).

Chapter 6
$\ell_p$ RATIONAL APPROXIMATION

6.1 Introduction

In this chapter we consider another modification of the Loeb algorithm, this time applying it to $\ell_p$ rational approximations with the use of Lawson's algorithm [31]. Lawson's algorithm is an iterative weighted least-squares procedure, first studied by Lawson in [31], and has been proven to converge to a best linear $\ell_\infty$ approximation on discrete data sets. The algorithm was later extended by Rice and Usow in [49] and has been proved to converge to best $\ell_p$ approximations for $p > 2$. We describe this algorithm and present a modification that extends its applicability to $\ell_p$ approximation for $p < 2$, along with numerical results to support this. Finally we propose a combined Lawson-Loeb algorithm for use in generating $\ell_p$ rational approximations on discrete data sets. We describe a general combined algorithm which has a number of subtle variants, and present some numerical results.

The work of this chapter is published in the conference proceedings from the 2004 conference on Mathematical Methods for Curves and Surfaces held in Tromso. This paper also contains other work on the Loeb algorithm that was done in collaboration with Professor John Mason, and is contained in Appendix A. This other work is not included as a part of this thesis as it is only partially the author's work.
The reason for mentioning this is that it forms part of a joint effort to prove convergence of the Loeb algorithm, which is a central theme of the thesis. Given the wide use of the Loeb algorithm within this work, we feel it appropriate to show that some attempt was made to prove its convergence.

6.2 The Lawson algorithm, and the Rice and Usow extension

The Lawson algorithm [31] is used to fit linear $\ell_\infty$ approximations to discrete data. It is an iterative procedure that has proven convergence to a best linear $\ell_\infty$ approximation, and its implementation involves fitting weighted $\ell_2$ approximations at each iteration. This algorithm was extended by Rice and Usow [49] to include $\ell_p$ approximation problems ($p > 2$) using the same weighted least-squares approach, and has also been proved to converge to a best linear $\ell_p$ approximation. The original Lawson algorithm has also been utilised in more recent work on the Huber M-Estimator [2].

We will now describe the Rice and Usow algorithm in more detail. Consider the linear form
\[ L(A, x) = \sum_{i=0}^{n} a_i \phi_i(x), \tag{6.1} \]
where $A = (a_0, \ldots, a_n)$ is a set of approximation parameters and the $\phi_i(x)$ are a set of linearly independent basis functions. We wish to fit $\ell_p$ approximations to a set of function values $f(x_j) = f_j$ ($j = 1, \ldots, m$) defined on a discrete point set $x = (x_1, x_2, \ldots, x_m)$. This is achieved by finding solution parameters $A$ that minimise the quantity
\[ \|e(A, x)\|_p = \left( \sum_{j=1}^{m} |e_j(A, x_j)|^p \right)^{1/p}, \tag{6.2} \]
where
\[ e_j(A, x_j) = f_j - L(A, x_j) \tag{6.3} \]
are the elements of the residual vector $e(A, x)$. The basic principle behind the algorithm is to convert the problem into a weighted least-squares problem, by considering
\[ |e_j(A, x_j)|^p \approx w\, |e_j(A, x_j)|^2, \tag{6.4} \]
where $w$ is a suitable weight term.
Specifically, the Rice and Usow algorithm works by updating the weight at iteration $(k + 1)$ according to the formula
\[ w_j^{k+1} = \left( w_j^k |e_j^k| \right)^{\frac{p-2}{p-1}}, \tag{6.5} \]
where
\[ e_j^k = f_j - L(A^k, x_j) \tag{6.6} \]
is the residual at $x = x_j$ at iteration $k$. If
\[ w_j^k \approx |e_j^k|^{p-2} \tag{6.7} \]
then $w_j^k |e_j^k|^2 \approx |e_j^k|^p$ and so we have an $\ell_p$ approximation. Now if $w_j^k \approx |e_j^k|^{p-2}$ then
\[ w_j^{k+1} \approx \left( |e_j^k|^{p-2} |e_j^k| \right)^{\frac{p-2}{p-1}} = |e_j^k|^{p-2}, \tag{6.8} \]
and so $w_j \to |e_j|^{p-2}$. As has been mentioned, this algorithm is proved to converge to a best linear $\ell_p$ approximation for $p > 2$, and an extensive proof of this can be found in [49].

Before discussing the application to rational approximation, we first propose a further extension of this algorithm to the case of linear $\ell_p$ approximation for $p < 2$. Our proposed algorithm differs from that of Rice and Usow by updating the weight at iteration $(k + 1)$ according to the formula
\[ w_j^{k+1} = \left( \frac{w_j^k}{|e_j^k|} \right)^{\frac{2-p}{3-p}}. \tag{6.9} \]
Now if $w_j^k \approx |e_j^k|^{p-2}$ we have
\[ w_j^{k+1} \approx \left( \frac{|e_j^k|^{p-2}}{|e_j^k|} \right)^{\frac{2-p}{3-p}} = |e_j^k|^{p-2}, \tag{6.10} \]
and so $w_j \to |e_j|^{p-2}$ as required. We now describe the implementation of the algorithm before presenting some numerical results of its application.

At iteration $k$, define $e_j^k = f_j - L(A^k, x_j)$ and $w_j^k$ as the approximation error and weight at $x_j$ respectively, and choose an initial set of weights $w_j^1 = \frac{1}{m}$.

1. Find the best $\ell_2$ approximation $L(A, x)$ to $f$ with weight $w_j^k$ at $x_j$ and calculate $e_j^k$.

2. Calculate new weights $w_j^{k+1}$ using (6.5) for $p > 2$ or (6.9) for $p < 2$.

3. Normalise the weights $w_j^{k+1}$:
\[ w_j^{k+1} := \frac{w_j^{k+1}}{\sum_{i=1}^{m} w_i^{k+1}}. \tag{6.11} \]

4. Proceed from step 1 until the solution has converged.

As has been shown, the principle behind the algorithms is to generate a weight term approximately equal to $|e|^{p-2}$, thus reducing the $\ell_p$ approximation problem to a weighted least-squares problem. This can be presented more generally by writing the equations for updating the weights as
\[ w_j^{k+1} = (w_j^k)^{\lambda_1} |e_j^k|^{\lambda_2} \tag{6.12} \]
for general indices $\lambda_1, \lambda_2$.
The requirement that $w_j \approx |e_j|^{p-2}$ leads to
\[ |e|^{p-2} = |e|^{(p-2)\lambda_1} |e|^{\lambda_2}. \tag{6.13} \]
If we set $\lambda_1 = \lambda_2 = \lambda$ and then equate indices in (6.13), we obtain $p - 2 = \lambda(p - 1)$, that is $\lambda = \frac{p-2}{p-1}$, and we are left with the Rice and Usow algorithm (6.5). Similarly, setting $\lambda_1 = -\lambda_2 = \lambda$ gives $p - 2 = \lambda(p - 3)$, that is $\lambda = \frac{2-p}{3-p}$, which is the update (6.9). The algorithm is easy to implement but can be rather slow to converge, although methods of accelerating convergence are described in [49]. Because of the division by the approximation error in (6.9), we have set a maximum weight value to avoid infinite or very large weights in the case of data points having residuals very close to zero. We have chosen an upper bound of $10^6$ and set any weights greater than this equal to $10^6$.

We now present some numerical results for this algorithm for various values of $p < 2$. In practice we have found that choices of $p \le 1$ generally fit a good $\ell_1$ approximation to data, and so we compare them with the best $\ell_1$ approximation obtained using the Barrodale and Roberts algorithm [9]. In particular we compare the data points that are interpolated by these methods. In all cases we have chosen a Chebyshev polynomial basis and taken 21 equally spaced abscissae $x_i \in [-1, 1]$.
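The iteration described above (weighted least-squares fit, weight update for $p < 2$, cap at $10^6$, normalisation) can be sketched in a few lines of Python for the simplest possible linear form, a single constant, where the weighted least-squares solution is just the weighted mean. This is an illustrative sketch under that simplifying assumption, not the Chebyshev-basis implementation used for the results that follow; the function name, the iteration count and the guard value `1e-300` are hypothetical choices. For $p = 1$ the best $\ell_1$ constant is the median of the data, which gives a convenient check.

```python
def modified_lawson_constant(f, p, iterations=500, max_weight=1e6):
    """Fit a constant to the data f by the weighted least-squares iteration for p < 2."""
    m = len(f)
    w = [1.0 / m] * m                    # initial weights w_j = 1/m
    exponent = (2.0 - p) / (3.0 - p)     # index used in the weight update for p < 2
    a = 0.0
    for _ in range(iterations):
        # Step 1: weighted least-squares fit; for a constant this is the weighted mean.
        a = sum(wj * fj for wj, fj in zip(w, f)) / sum(w)
        e = [fj - a for fj in f]
        # Step 2: new weights (w / |e|)^exponent, guarding against zero residuals
        # and capping very large weights at max_weight.
        w = [min((wj / max(abs(ej), 1e-300)) ** exponent, max_weight)
             for wj, ej in zip(w, e)]
        # Step 3: normalise so that the weights sum to one.
        total = sum(w)
        w = [wj / total for wj in w]
    return a


# Data with one outlier: the mean is 21.2, the median is 2.
data = [0.0, 1.0, 2.0, 3.0, 100.0]
print(modified_lawson_constant(data, p=1.0))
```

The fitted constant settles on the interpolated data point while the outlier is effectively ignored, which is the behaviour one would expect from an $\ell_1$ fit.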
Algorithm              Interpolation points $x_i$    $\|L(x) - f\|_1$    Iterations
Barrodale Roberts      ±0.9, ±0.3                    0.00542             -
Lawson ($p = 1$)       ±0.9, -0.4, 0.3               0.00550             10
Lawson ($p = 0.1$)     ±0.9, -0.4, 0.3               0.00550             7
Lawson ($p = -1$)      ±0.9, -0.4, 0.3               0.00550             6

Table 6.1: Degree 3 $\ell_p$ approximation of $\log(x + 3)$ using Lawson's algorithm

Algorithm              Interpolation points $x_i$    $\|L(x) - f\|_1$    Iterations
Barrodale Roberts      ±0.9, ±0.5, 0                 0.05849             -
Lawson ($p = 1$)       ±0.9, ±0.5, 0                 0.05849             10
Lawson ($p = 0.1$)     ±0.9, -0.4, 0.3               0.05849             8
Lawson ($p = -1$)      ±0.9, -0.4, 0.3               0.05849             7

Table 6.2: Degree 3 $\ell_p$ approximation of $\tanh(x)$ using Lawson's algorithm

Algorithm              Interpolation points $x_i$    $\|L(x) - f\|_1$           Iterations
Barrodale Roberts      ±1.0, ±0.8, ±0.5, ±0.2        $2.3487 \times 10^{-3}$    -
Lawson ($p = 1$)       ±1.0, ±0.8, ±0.5, ±0.2        $2.3487 \times 10^{-3}$    8
Lawson ($p = 0.1$)     ±1.0, ±0.8, ±0.5, ±0.2        $2.3487 \times 10^{-3}$    6
Lawson ($p = -1$)      ±1.0, ±0.8, ±0.5, ±0.2        $2.3487 \times 10^{-3}$    6

Table 6.3: Degree 6 $\ell_p$ approximation of $e^{-x^2}$ using Lawson's algorithm

As can be seen from Tables 6.1, 6.2 and 6.3, using the Lawson algorithm with $p = 1$ has generated a best or near-best $\ell_1$ approximation. Good $\ell_1$ approximations are also obtained for other choices of $p < 1$. We have also found that these approximations successfully ignore outliers in the data, as one would expect with an $\ell_1$ approximation [9]. In addition we have generally observed that, as the value of $p$ decreases, the system becomes increasingly ill-conditioned, but the algorithm converges more quickly. For choices of $p$ less than 1.5, we have also noticed that the resulting approximation interpolates at $n + 1$ data points, where $n$ is the degree of the approximation. For $p$ values between 1.5 and 2, the approximation starts to exhibit behaviour consistent with least-squares solutions. In this sense it would appear that for values $1 < p < 2$ the resulting approximation is a compromise between $\ell_1$ and $\ell_2$ approximation, interpolating at some data points and approximating the remainder.
Examining the behaviour of approximations for various values in this range, the transition from $\ell_1$ to $\ell_2$ approximations can clearly be seen, as shown in Figure 6.1. Here we can see the effect of outliers on some approximations fitted to data for various values of $p$.

[Plot: "Modified Lawson approximations for various p values"; the data and the fits for p = 1.99, p = 1.8 and p = 1, plotted as f(X) against X.]
Figure 6.1: Lawson algorithm approximations to data with outliers, for various $p < 2$

6.3 Application of Lawson's algorithm to $\ell_p$ Rational approximation

The study of the Rice and Usow variant of Lawson's algorithm, in addition to the previous work on the Loeb algorithm, inspired the idea of combining these two approaches into a single algorithm for use in fitting $\ell_p$ rational approximations. This was mainly because both algorithms are iterative weighted least-squares methods, and it seemed reasonable to try to combine the two. The development of this algorithm is further motivated by the apparent shortage of methods available for this type of data fitting. We have not seen a great deal of published work on algorithms specifically dealing with $\ell_p$ rational approximation, other than the work of Watson [52], who considers the problem via the Gauss-Newton method with numerator and denominator variable separation.

6.4 A combined Rice-Usow-Loeb Algorithm

We consider the general $\ell_p$ rational approximation problem in which we wish to approximate a set of function values $f(x_i) = f_i$ defined on a discrete point set $x = (x_1, x_2, \ldots, x_N)$ by a polynomial ratio $R(p, q, x)$ as defined in section 1.7. We propose to minimise the quantity
\[ \| f - R(p, q, x) \|_p \tag{6.14} \]
by iteratively minimising
\[ \left\| w_k^l \left( f\, Q(q_k^l, x) - P(p_k^l, x) \right) W_k^l \right\|_2 \tag{6.15} \]
with respect to the approximation parameters $q, p$, over two iteration variables $k, l$.
In this sense we are using two weights, one of which (the $w$ term) corresponds to the linear $\ell_p$ approximation weight from the Rice-Usow algorithm, and the other (the $W$ term) is a Loeb-type weight defined from the denominator evaluated with the denominator solution parameters obtained at the previous iteration. We update the $w_k^l$ terms when iterating over $k$ in a similar manner to the Rice-Usow variant algorithm, according to the formula
\[ w_{k+1}^l = \begin{cases} \left( w_k^l |e_k^l| \right)^{\frac{p-2}{p-1}} & \text{for } p > 2, \\[2ex] \left( \dfrac{w_k^l}{|e_k^l|} \right)^{\frac{2-p}{3-p}} & \text{for } p < 2, \end{cases} \tag{6.16} \]
where
\[ e_k^l(x) = f\, Q(q_k^l, x) - P(p_k^l, x), \tag{6.17} \]
with initial weights set as $w_1^1 = \frac{1}{N}$. We update the $W_k^l$ when iterating over the $l$ variable by setting
\[ W_k^{l+1} = \frac{1}{Q(q_k^l, x)}, \tag{6.18} \]
as is the case with Loeb's algorithm. Clearly we may proceed in a number of ways by choosing the way in which to iterate over the variables $k, l$. We suggest the most obvious:

1. Cycle through $l$ until convergence with $k$ fixed and then through $k$ with $l$ fixed. Repeat until convergence.

2. Cycle through $k$ until convergence with $l$ fixed and then through $l$ with $k$ fixed. Repeat until convergence.

3. Set $k = l$ and update both weights simultaneously.

We now present some numerical results from the application of the third of the above algorithms. For these results we fitted a degree (5, 5) polynomial ratio to 101 equally spaced values on $[-2, 2]$ from the function $\tanh(x)$. Table 6.4 shows iterations to convergence for a number of different values of $p$. In each case the algorithm converged successfully to a pole-free solution. As can be seen in Figure 6.2, the error curves from the approximations for $p = 4$ and $p = 16$ both behave like a minimax error curve, with the larger $p$ value exhibiting the equioscillation property to a larger extent than the smaller. We have observed similar behaviour in all cases we have tested, with the error curve tending to a minimax error curve for increasingly larger values of $p$.
This behaviour suggests that we can obtain approximately minimax rational approximations using this algorithm for large values of $p$.

$p$ value    Iterations    $\|e\|_2$
3            10            $3.3157 \times 10^{-7}$
4            15            $3.3902 \times 10^{-7}$
8            30            $3.4559 \times 10^{-7}$
12           53            $3.4786 \times 10^{-7}$
16           51            $3.4900 \times 10^{-7}$
32           93            $3.5070 \times 10^{-7}$

Table 6.4: Degree (5, 5) $\ell_p$ rational approximation of $\tanh(x)$ using Rice-Usow-Loeb algorithm

Of the three variants of the proposed Rice-Usow-Loeb algorithm, we have found that the third provides the fastest convergence. All three methods give similarly good approximations, but the faster convergence of the third makes it our preferred option. To illustrate the performance of this algorithm we compared it with the method of Watson for fitting $\ell_p$ rational approximations for various values of $p$. In each case, we fitted degree (2,2) rational approximations to 51 equally spaced points from the function $y = \tanh(x)$ over the interval [0,1]. The method used by Watson was applied using the solution parameters from the Rice-Usow-Loeb algorithm at convergence. The results are displayed in Table 6.5, in which we use the abbreviation RUL for the Rice-Usow-Loeb algorithm. From these results, we can see that the RUL algorithm performs well, generating approximations as good as those of the method of Watson, but with faster convergence.

Norm           RUL convergence    RUL $\|e\|_p$               Watson convergence    Watson $\|e\|_p$
$\ell_3$       12                 $1.28184 \times 10^{-3}$    18                    $1.28180 \times 10^{-3}$
$\ell_8$       18                 $6.57338 \times 10^{-4}$    55                    $6.57330 \times 10^{-4}$
$\ell_{16}$    44                 $5.48020 \times 10^{-4}$    113                   $5.48016 \times 10^{-4}$

Table 6.5: Comparison of approximations obtained using the method of Watson and the RUL algorithm

[Plot: "Approximation error for degree (5,5) Lawson-Loeb algorithm"; approximation error (scale $\times 10^{-8}$) against X on [-2, 2] for p = 16 and p = 4.]
Figure 6.2: Error plots of $\ell_4$ and $\ell_{16}$ rational approximations to $\tanh(x)$ using the Rice-Usow-Loeb algorithm.
6.5 A Lawson-Loeb algorithm

Up to now, we have only discussed the Rice-Usow variation of the original Lawson algorithm, used for $\ell_p$ approximations ($p > 2$). The original Lawson algorithm is a linear weighted least-squares method that is proved to converge to a best linear Chebyshev approximation [31]. It is implemented as follows.

For a linear approximation form $L(A, x)$, we define at iteration $k$ the residual $e_j^k = f_j - L(A^k, x_j)$ and $w_j^k$ as the weight at $x_j$, and choose an initial set of weights $w_j^1 = \frac{1}{m}$, where $m$ is the number of data points we are approximating.

1. Find the best $\ell_2$ approximation $L(A, x)$ to $f$ with weights $w_j^k$ at $x_j$ and calculate the residuals $e_j^k$.

2. Calculate new weights according to the equation
\[ w_j^{k+1} = \frac{w_j^k |e_j^k|}{\sum_{i=1}^{m} w_i^k |e_i^k|}. \tag{6.19} \]

3. Proceed from step 1 until the solution has converged.

As with the Rice-Usow-Loeb algorithm, we propose a Lawson-Loeb algorithm of the same kind, but this time with the first weight $w_j^{k+1}$ calculated according to (6.19), and the second using the formula
\[ W_k^{l+1} = \frac{1}{Q(q_k^l, x)}. \tag{6.20} \]
With these weights, we can again propose the same three variants of the algorithm:

1. Cycle through $l$ until convergence with $k$ fixed, and then through $k$ with $l$ fixed. Repeat until convergence.

2. Cycle through $k$ until convergence with $l$ fixed and then through $l$ with $k$ fixed. Repeat until convergence.

3. Set $k = l$ and update both weights simultaneously.

As was the case with the Rice-Usow-Loeb algorithm, we have again found that the third option provides the fastest convergence, and is the method we would recommend. Table 6.6 shows the results of the application of this algorithm to a variety of functions, showing iterations to convergence and the approximation error of the resulting approximations. In all cases, the data points were evaluated at 101 equally spaced points on the interval $[-2, 2]$ and approximated using a degree (5, 5) polynomial ratio.
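To make the linear Lawson weight update concrete, the following minimal Python sketch applies it to the simplest linear form, a single constant, for which the weighted least-squares fit is the weighted mean and the best Chebyshev approximation is the midrange of the data. It illustrates only the linear Lawson iteration, not the combined Lawson-Loeb algorithm, and the function name and iteration count are hypothetical choices.

```python
def lawson_constant(f, iterations=2000):
    """Fit a constant to the data f using Lawson's weight update w <- w|e| / sum(w|e|)."""
    m = len(f)
    w = [1.0 / m] * m                    # initial weights w_j = 1/m
    a = 0.0
    for _ in range(iterations):
        # Step 1: weighted least-squares fit; for a constant this is the weighted mean.
        a = sum(wj * fj for wj, fj in zip(w, f)) / sum(w)
        # Step 2: multiply each weight by the size of its residual and renormalise.
        products = [wj * abs(fj - a) for wj, fj in zip(w, f)]
        total = sum(products)
        if total == 0.0:                 # all residuals are zero: exact fit, stop
            break
        w = [v / total for v in products]
    return a


# The minimax constant for this data is the midrange (0 + 100) / 2 = 50.
data = [0.0, 1.0, 2.0, 3.0, 100.0]
print(lawson_constant(data))
```

In this small example the weights of the non-extremal points decay only geometrically, at a rate set by how close their residuals are to the maximum residual, which gives some intuition for why many iterations can be needed.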
Function               Iterations    $\|e\|_{\ell_\infty}$
$\sin(x)$              531           $4.7141 \times 10^{-8}$
$\cos(x)$              na            na
$\tanh(x)$             360           $4.8890 \times 10^{-8}$
$e^{-x}$               725           $1.9652 \times 10^{-10}$
$\cosh(x)$             na            na
$\cos(x) + \sin(x)$    1272          $1.2220 \times 10^{-7}$

Table 6.6: Degree (5, 5) $\ell_\infty$ rational approximations using Lawson-Loeb algorithm

As the results indicate, even when convergence does occur, it is very slow. In some cases for the approximations in Table 6.6, convergence occurred with a slight modification to the degree of the approximation (using even degrees for $\cos(x)$, for example) but again the convergence was slow. In fact, we have observed that this algorithm is even more sensitive to a good choice of degree for the approximation than the standard Loeb algorithm or the Rice-Usow-Loeb algorithm. When convergence does occur, the resulting error curves exhibit the equioscillation property perfectly, as is shown in Figure 6.3.

The inconsistent performance of the algorithm, combined with its slow speed, is slightly disappointing, as it had been hoped that this would provide a fast, weighted least-squares algorithm to obtain Chebyshev rational approximations. Judging by the results, it would appear that from a performance perspective it would be better to try to obtain near-best rational Chebyshev approximations using high values of $p$ in the Rice-Usow-Loeb algorithm instead. For example, fitting the same data used for $\tanh(x)$ using the Rice-Usow-Loeb algorithm with a choice of $p = 16$, the resulting Chebyshev norm of the error is $5.0937 \times 10^{-8}$, which is almost the same as that obtained using the Lawson-Loeb method, and the algorithm converges in 30 steps, rather than the 360 shown in Table 6.6.
[Plot: "Approximation error for cos(x)"; approximation error (scale $\times 10^{-7}$) against X on [-2, 2].]
Figure 6.3: Error plot of $\ell_\infty$ rational approximation to $\tanh(x)$ using the Lawson-Loeb algorithm.

6.6 Conclusion

We have proposed an extension of the Rice-Usow algorithm for cases $p < 2$, which converges quickly to near-best $\ell_1$ linear approximations, and generally compares favourably with approximations obtained with the Barrodale-Roberts algorithm. This algorithm is also considerably easier to implement than the Barrodale-Roberts algorithm, although it may not be favoured over it because of the lack of a convergence proof. We have also observed good results with the proposed Rice-Usow-Loeb algorithm, which exhibits good convergence properties with the data that we have tested it on, and produces good $\ell_p$ approximations that compare favourably with those obtained using the optimisation methods proposed by Watson.

As an extension of this, for fitting Chebyshev rational approximations, we have also proposed a Lawson-Loeb algorithm that is similar in style to the Rice-Usow-Loeb algorithm. This algorithm is in general not so successful or robust, with slow convergence and in some cases no convergence at all. However, when it does converge, it appears to converge to an approximation whose error exhibits the equioscillation property. If this algorithm were more reliable with regard to convergence, it would be interesting to research methods of improving its speed of convergence, but as it stands the algorithm is slow and unreliable. As a consequence of this poor performance, if it were desired to fit a Chebyshev rational approximation to data, we would recommend the use of an established algorithm such as the differential correction method, which has proven convergence properties.
Chapter 7
CONCLUSIONS

7.1 Introduction

We have presented a number of algorithms and new rational approximation forms for the approximation of discrete data in general, but with particular emphasis on data that exhibits the type of asymptotic behaviour described in chapter 1. This research has been motivated by the requirement of the sponsoring institution (NPL) for new approximation methods to better approximate discrete data of this kind. In this final chapter, we summarise the research presented in this thesis, and highlight some potential areas where this work could be extended with future research.

7.2 The Semi-Infinite Rational Spline

The work of chapter 3 introduces the semi-infinite rational spline (SIRS) approximation form, and discusses a modification to the Loeb method that provides a straightforward algorithm to fit this form to discrete data. The SIRS provides an approximation form that is able to have different asymptotic limits as $x$ approaches $\pm\infty$. This makes it a useful tool in the approximation of data that has a similar asymptotic behaviour, and we have shown that we can obtain very good approximations to such data, particularly in the region where the asymptotic behaviour starts to dominate the shape of the data. In many cases such approximations are superior to those of a standard polynomial ratio fitted using the Loeb algorithm.

The first very obvious extension to the work of chapter 3 is SIRS approximation in the other norms $\ell_1$ and $\ell_\infty$, as the work of Barrodale and Mason [7] has shown the Loeb algorithm to be effective and easily implemented in these norms also. Similarly, it seems conceivable that the Differential Correction Method (section 2.3) could be modified for approximation with the SIRS, as its continuity constraints at the knot are implicitly satisfied by the definition of the function.
Although this has not been looked into in great detail, it certainly seems to be a potential area of extension of the algorithm. It will be evident that this thesis has been of a very practical nature, with almost no results or proofs of existence or uniqueness of best SIRS approximations. This is another important area for future work, and would certainly make the SIRS more attractive as an approximation tool if such proofs could be established.

7.3 Asymptotic polynomial approximation

The asymptotic polynomial work of chapter 4 is another useful tool for approximation of data with asymptotic behaviour. It has the advantage of being numerically stable, by orthogonalising the numerator basis functions with respect to the denominator weight function. We have found that this work gives very good approximations, which have the advantage of being pole free on the entire real line because of the strictly positive denominator function.

One way in which this work could be extended would be to consider another weight function very similar to that described in chapter 4, but having two knots instead of one. We define this new weight function by
\[ w(b, x) = \begin{cases} w_-(b, x) = \dfrac{1}{\left(1 + s_1^2 (x - \lambda_1)^2\right)^{k_1/2}} & \text{for } x \le \lambda_1, \\[2ex] w_0(b, x) = 1 & \text{for } \lambda_1 \le x \le \lambda_2, \\[1ex] w_+(b, x) = \dfrac{1}{\left(1 + s_2^2 (x - \lambda_2)^2\right)^{k_2/2}} & \text{for } x \ge \lambda_2, \end{cases} \tag{7.1} \]
where $\lambda_1 < \lambda_2$, and $b = (s_1, s_2, k_1, k_2, \lambda_1, \lambda_2)^T$. Approximation with asymptotic polynomials defined with weight function (7.1) is effectively fitting a linear approximation to the data on the interval $[\lambda_1, \lambda_2]$, and only fits weighted polynomials outside this interval to enforce the correct type of asymptotic behaviour. Again we could consider the knots to be fixed or variable, and it would be interesting to see if this type of weight function would give superior approximations to the original method.

Another very interesting extension would be to consider approximation using bivariate asymptotic polynomials.
The paper of Huhtanen and Larsen [26] describes an algorithm that generates a set of bivariate orthogonal polynomials. Unfortunately, that work deals only with orthogonality with respect to a unit weight function, which for our needs would result in bivariate linear approximation. If it were possible to extend it to generate bivariate orthogonal polynomials with respect to arbitrary weight functions (or a small class of special functions appropriate for our needs), it may be possible to extend the asymptotic polynomial to bivariate approximation.

7.4 Pole free rational approximation

Chapter 5 introduces a very simple and novel approach, based on the Levenberg-Marquardt algorithm, for fitting polynomial ratios that are guaranteed to be pole free over the entire real line. The denominator polynomial is represented as a product of quadratics, each of which is forced, through the definition of the denominator approximation parameters, to have a strictly negative discriminant and hence complex roots. Again, it would be valuable if existence proofs could be established for this form, and we therefore highlight this as an area of future work.

As was the case for the SIRS, it seems reasonable to assume that this approach could be applied to approximation in other norms; in particular, it may be possible to modify the Differential Correction Method to fit pole free Chebyshev rational approximations. If this were possible, it may also be possible to establish convergence proofs for the algorithm applied to this form. This potential extension to other norms would seem to apply to almost all of the work in this thesis, which has dealt almost exclusively with least-squares approximation in accordance with the needs of the collaborating CASE sponsor, NPL.
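One way to realise the strictly negative discriminant described in section 7.4 is to write each quadratic factor of the denominator as a completed square plus a positive constant. The Python sketch below is illustrative only: the function name, the parameter names `p` and `t`, and the exponential reparameterisation are assumptions made here, and the parameter definitions actually used in chapter 5 may differ.

```python
import numpy as np

def pole_free_denominator(x, params):
    """Evaluate B(x) = prod_j (x^2 + p_j*x + q_j) with q_j = p_j^2/4 + exp(t_j).

    `params` is a sequence of (p_j, t_j) pairs (hypothetical names).
    Since exp(t_j) > 0, each discriminant p_j^2 - 4*q_j = -4*exp(t_j)
    is strictly negative, so no factor has a real root.
    """
    x = np.asarray(x, dtype=float)
    B = np.ones_like(x)
    for p, t in params:
        q = 0.25 * p * p + np.exp(t)   # forces a negative discriminant
        B = B * (x * x + p * x + q)    # factor equals (x + p/2)^2 + exp(t)
    return B
```

Because each factor equals (x + p/2)^2 + exp(t), an unconstrained optimiser such as Levenberg-Marquardt could vary `p` and `t` freely without the denominator vanishing anywhere on the real line.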
The natural assumption would be that the work could be extended to the problem of Chebyshev rational approximation, as this is a well established field with a great deal of research material available.

Another area of interesting research would be to look at other constraints on the individual quadratic factors of the denominator. For example, if it were only necessary for the approximation to be pole free on the approximation interval itself (rather than on the entire real line), we would be faced with a less stringent set of conditions on the parameters, and a potentially more flexible approximation as a result.

7.5 ℓp Rational Approximation

Chapter 6 introduces two new hybrid algorithms, Loeb-Lawson for ℓ∞ rational approximation and Rice-Usow-Loeb for ℓp rational approximation. We have also proposed an extension to the Rice-Usow algorithm for ℓp approximation in the cases p < 2. Clearly we would have liked to produce a proof of convergence for this extension, so this is one area for future consideration. The same applies to the Rice-Usow-Loeb algorithm, although we would anticipate considerable difficulty here: as far as we are aware, the original Loeb algorithm has not itself been proved to converge, so a proof for the hybrid algorithm may well require a proof for the original Loeb method first.

We found that the Loeb-Lawson algorithm was generally less reliable, with many cases of non-convergence, and that when it did converge the solution was very close to that of the Rice-Usow-Loeb method with a large value of p. The latter method also converged much more quickly. Had the performance of the Loeb-Lawson algorithm been more impressive, we would have considered a direct comparison with the solutions of the DCM (section 2.3), which converges to the best rational Chebyshev approximation.
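The Lawson component of the Loeb-Lawson combination discussed above is an iteratively reweighted least-squares scheme. The Python sketch below shows it for the linear ℓ∞ case only, assuming the textbook weight update w_i ← w_i |r_i| / Σ_j w_j |r_j|. It is not taken from chapter 6, and the function name, the design matrix `Phi` and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def lawson_linf(Phi, f, iters=200):
    """Approximate the best l_inf fit of Phi @ c to f by Lawson's iteration.

    Phi : (N, q) design matrix of basis functions evaluated at the data points.
    f   : (N,) data values.
    Returns the coefficient vector c and the final weights w (which sum to 1).
    """
    Phi = np.asarray(Phi, dtype=float)
    f = np.asarray(f, dtype=float)
    w = np.full(len(f), 1.0 / len(f))        # uniform starting weights
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least squares: minimise sum_i w_i * r_i**2
        c, *_ = np.linalg.lstsq(Phi * sw[:, None], f * sw, rcond=None)
        r = np.abs(f - Phi @ c)
        total = np.dot(w, r)
        if total == 0.0:                     # exact fit, nothing to reweight
            break
        w = w * r / total                    # Lawson update keeps sum(w) == 1
    return c, w
```

For example, fitting a constant to the three values 0, 1 and 4 drives the weight on the middle point towards zero, and the fitted constant towards the midrange value 2, which is the best ℓ∞ constant for these data.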
Another possibility is to combine the semi-infinite rational spline work with the Rice-Usow-Loeb algorithm, creating an algorithm for ℓp SIRS rational approximation.

7.6 Radial basis function rational approximation

Another area that was researched (although not presented in the thesis) was the fitting of ratios of linear combinations of radial basis functions [48]. This was only a small study, in which approximations were fitted using the Loeb algorithm with various combinations of the abscissae as the centres of the radial basis functions. We found that the resulting approximations were generally very good, but the study gave rise to a large number of questions, such as how to choose the centres to begin with, and then how to select those appearing in the numerator and those in the denominator. For linear interpolation using radial basis functions, the centres are usually chosen to be the data points themselves, but when approximating with radial basis functions there are other methods of choosing the centres, such as clustering.

After some initial research, we found that any significant investigation in this area would take the thesis in a very different direction, so once the initial encouraging results had been obtained the work was set aside in favour of work more appropriate to the rest of the thesis. However, we feel that this is a very interesting and new area, in which we have as yet seen no published research. The nature of radial basis functions makes them a popular tool for multivariate approximation and interpolation, so the development of algorithms for fitting ratios of linear combinations of radial basis functions would be a significant achievement, of interest to the multivariate approximation, radial basis function and rational approximation research communities.

APPENDIX A

Here we present the remainder of the contents of the paper that was jointly published by the author of this thesis.
It is the first part of the paper "Rational and ℓp Approximations - Renovating Existing Algorithms", published in the proceedings of the 2004 conference on Mathematical Methods for Curves and Surfaces held in Tromsø. The work in this part of the paper cannot be claimed to be the sole work of the author, and was done in collaboration with Professor John Mason. It is reproduced as it appears in the conference proceedings and is included for completeness, as it complements the other work in this thesis, being largely concerned with obtaining a detailed error analysis for a slightly modified Loeb algorithm.

Abstract

In this paper we present a modified version of Loeb's algorithm for ℓ2 rational approximation, together with an error analysis and numerical results that suggest linear convergence. We also propose an extension to Lawson's algorithm to include ℓp approximation for p < 2, and a combined Lawson-Loeb algorithm for use in generating ℓp rational approximations to discrete data.

Introduction

The fact that ratios of linear forms can operate in infinite ranges and can have asymptotic limits and poles makes them a very desirable tool for use in the modelling of physical systems. Rational functions are particularly well suited to the modelling of functions that are known to have a particular asymptotic behaviour, such as decay, or singularities. In consequence the subject of rational approximation needs to be updated, as do some of the very early, but re-usable, algorithms.

First of all we investigate the use of Loeb's algorithm [14] in fitting least squares rational approximations to discrete data. This is a very simple yet effective iterative procedure which has been shown to converge quickly to near-best rational approximations for a large number of functions [7]. We present a modified version of this algorithm that yields an informative error analysis which, in conjunction with our numerical results, suggests linear convergence.
This algorithm also has the property of fitting a relative approximation, while simultaneously fitting the data values and their reciprocals.

We then go on to study the Lawson algorithm. This is an iterative weighted least squares procedure, first studied by Lawson in [31], which has been proven to converge to a best linear ℓ∞ approximation on discrete data sets. The algorithm was later extended by Rice and Usow in [49], and the extension has been proved to converge to best ℓp approximations for p > 2. We describe this algorithm and present a modification that extends its applicability to ℓp approximation for p < 2, along with numerical results to support this.

Finally we propose a combined Lawson-Loeb algorithm for use in generating ℓp rational approximations on discrete data sets. There appears to be a shortage of methods available for this approximation problem, and so we describe a general method which has a number of subtle variants, along with some numerical results.

ℓ2 Rational approximation - Loeb's algorithm

Consider the rational approximation form

\[
R(x) = \frac{A(x)}{B(x)} = \frac{a_0 \phi_0(x) + \dots + a_n \phi_n(x)}{b_0 \phi_0(x) + \dots + b_m \phi_m(x)} \tag{7.2}
\]

where the $\phi_i(x)$ form a set of linearly independent basis functions, and $\{a_0, \dots, a_n, b_0, \dots, b_m\}$ is the set of approximation parameters.

We consider the problem of finding the best ℓ2 rational approximation to a given set of function values $f(x_i) = f_i$ defined on a discrete point set $X = \{x_1, x_2, \dots, x_N\}$. Thus we seek the parameter vector that minimizes the quantity $\|f(x) - R(x)\|_2$.

One method available for fitting rational functions to discrete data is Loeb's algorithm [14]. This is an iterative procedure in which the quantity

\[
\|w_k (f B_k - A_k)\|_2 \tag{7.3}
\]

is minimized with respect to the approximation parameters at the kth iteration. Here $A_k$ and $B_k$ respectively denote the numerator and denominator functions A(x) and B(x) obtained at the current step k.
The term $w_k$ is a weight function defined as $1/B_{k-1}$, where $B_{k-1}$ is the denominator function obtained from the previous iteration and evaluated at the data points $x_i$. To prevent the trivial solution A(x) = B(x) ≡ 0, the set of approximation parameters is usually normalized by setting $b_0 = 1$, and the iteration is usually started with initial weight $w_1 = 1$. In practice this is a very reliable algorithm and is attractive due to its ease of implementation and fast convergence to near-best approximations in the majority of cases, particularly when used to obtain least squares approximations [7].

We propose a modified version of Loeb's algorithm which uses the weight function

\[
w_k = \left[ \frac{1}{2 (f B_{k-1})^2} + \frac{1}{2 (A_{k-1})^2} \right]^{1/2} \tag{7.4}
\]

in place of the original weight $1/B_{k-1}$. The reasons for this choice of weight function are as follows.

1. The weight is suitable for relative approximation of $f$ by $A/B$:

\[
\frac{f B_k - A_k}{f B_{k-1}} \;\to\; \frac{f - \frac{A}{B}}{f} \equiv \frac{fB - A}{fB}. \tag{7.5}
\]

It is also suitable for relative approximation of the reciprocal function $f^{-1}$ by $B/A$:

\[
-\frac{f^{-1} A_k - B_k}{f^{-1} A_{k-1}} \;\to\; -\frac{f^{-1} - \frac{B}{A}}{f^{-1}} \equiv \frac{fB - A}{A}. \tag{7.6}
\]

Thus we have chosen the new weight $w$ as the r.m.s. of the weights $\frac{1}{fB}$ and $\frac{1}{A}$, so as to approximate both $f$ and $f^{-1}$.

2. $w$ treats $A$ and $fB$ alike and involves both.

3. $fB \simeq A$ and so, to a modest accuracy,

\[
w \simeq \frac{1}{fB} \simeq \frac{1}{A}. \tag{7.7}
\]

It is assumed here that both $f(x)$ and $f^{-1}(x)$ have no zeros in the domain of the data, so that the weight has no poles.

We can find this approximation $A/B$ with weight (7.4) using an iterative procedure in which a Galerkin solution is obtained at step k. Consider the following inner product for functions $u(x)$, $v(x)$, defined by

\[
\langle u, v \rangle = \sum_{i=1}^{N} u(x_i) v(x_i). \tag{7.8}
\]

We can then obtain a solution at step k by solving the system of m + n + 1 equations

\[
\langle (f B_k - A_k), w_k^2 \phi_j \rangle = 0, \qquad j = 0, \dots, m + n \tag{7.9}
\]

with respect to the m + n + 1 approximation parameters.
Solving (7.9) is equivalent to finding the ℓ2 solution of the overdetermined set of equations

\[
\begin{aligned}
(f B_{k-1})^{-1} (f B_k - A_k)(x_i) &= 0, \qquad i = 1, \dots, N, \\
(A_{k-1})^{-1} (f B_k - A_k)(x_i) &= 0, \qquad i = 1, \dots, N,
\end{aligned}
\tag{7.10}
\]

with respect to the same set of approximation parameters.

Convergence and Error Analysis

At step k we obtain functions $A_k$ and $B_k$ that satisfy the Galerkin property (7.9). We define the following limiting functions, which also satisfy the Galerkin property (7.9):

\begin{align}
A &= \lim_{k \to \infty} A_k \tag{7.11} \\
B &= \lim_{k \to \infty} B_k \tag{7.12} \\
w &= \lim_{k \to \infty} w_k \tag{7.13}
\end{align}

Although we have assumed existence of $A$, $B$ and $w$, we have found that these limits exist in practice for reasonably behaved data. From (7.9) and (7.8) we have

\[
\sum_{j=0}^{m+n} \langle (f B_k - A_k), w_k^2 \phi_j \rangle \phi_j = 0 \tag{7.14}
\]

and

\[
\sum_{j=0}^{m+n} \langle (f B - A), w^2 \phi_j \rangle \phi_j = 0. \tag{7.15}
\]

Subtracting (7.14) from (7.15) we are left with

\[
\sum_{j=0}^{m+n} \langle (f \delta B_k - \delta A_k) w^2 + (w^2 - w_k^2)(f B_k - A_k), \phi_j \rangle \phi_j = 0 \tag{7.16}
\]

where $\delta B_k = B - B_k$ and $\delta A_k = A - A_k$. From the definition of the weight (7.4) we have

\[
w^2 - w_k^2 = \frac{1}{2} \left[ \frac{1}{(fB)^2} + \frac{1}{A^2} - \frac{1}{(f B_{k-1})^2} - \frac{1}{(A_{k-1})^2} \right] \tag{7.17}
\]

which reduces to

\[
-\frac{f^2 \, \delta B_{k-1} (2B - \delta B_{k-1})}{2 f^4 B^2 B_{k-1}^2} - \frac{\delta A_{k-1} (2A - \delta A_{k-1})}{2 A^2 A_{k-1}^2}. \tag{7.18}
\]

If we neglect quadratic terms involving $\delta$ this is approximately equal to

\[
-\frac{\delta (A_{k-1} + E_{k-1})}{(A + E)(A_{k-1} + E_{k-1})^2} - \frac{\delta A_{k-1}}{A A_{k-1}^2} \tag{7.19}
\]

where

\begin{align}
E_k &= f B_k - A_k \tag{7.20} \\
E &= f B - A \tag{7.21} \\
\delta E_k &= E - E_k. \tag{7.22}
\end{align}

Since we are dealing with good approximations, the $E$ terms are small, and so we are left with

\[
w^2 - w_k^2 \simeq -\frac{2 \, \delta A_{k-1}}{A A_{k-1}^2}. \tag{7.23}
\]

Substituting (7.23) into (7.16) and rearranging gives

\[
\sum_{j=0}^{m+n} \left\langle (f \delta B_k - \delta A_k) w - 2 w^{-1} (f B_k - A_k) \frac{\delta A_{k-1}}{A A_{k-1}^2}, \; w \phi_j \right\rangle \phi_j \simeq 0. \tag{7.24}
\]

Using (7.7), equation (7.24) reduces to

\[
\sum_{j=0}^{m+n} \langle \delta E_k w, w \phi_j \rangle w \phi_j - 2 \sum_{j=0}^{m+n} \left\langle \frac{E_k A \, \delta A_{k-1}}{A_{k-1}^2} w, \; w \phi_j \right\rangle w \phi_j \simeq 0. \tag{7.25}
\]

In a Galerkin space, the best linear ℓ2 approximation of degree p to a function $G$ is given by

\[
G \simeq \sum_{j=1}^{p} \langle G, \phi_j \rangle \phi_j \tag{7.26}
\]

and the weighted equivalent is defined by

\[
G w \simeq \sum_{j=1}^{p} \langle G w, w \phi_j \rangle w \phi_j = \sum_{j=1}^{p} \langle G w^2, \phi_j \rangle w \phi_j. \tag{7.27}
\]

Also in a Galerkin space the following inequality is valid:

\[
\| G w \|_2 \le 2 \left\| \sum_{j=1}^{p} \langle G w, w \phi_j \rangle w \phi_j \right\|_2. \tag{7.28}
\]

Thus from (7.27) it is evident that the first term in (7.25) is a best ℓ2 approximation to $\delta E_k w$ and the second term is a best ℓ2 approximation to $2 w E_k \delta A_{k-1} A A_{k-1}^{-2}$. Therefore

\[
\delta E_k \simeq 2 E_k \, \delta A_{k-1} A_{k-1}^{-1} \tag{7.29}
\]

which, using (7.7), can be expressed as

\[
\frac{\delta E_k}{E_k} \simeq 2 \frac{\delta F_{k-1}}{F_{k-1}} \tag{7.30}
\]

where $F_k = f B_k + A_k$ and $\delta F_k = f \delta B_k + \delta A_k$. Integrating both sides of (7.30) leads to

\[
E_k \simeq c F_{k-1}^2 \tag{7.31}
\]

for some constant of integration $c$. Finally, from (7.28) and (7.29) we obtain

\[
\| \delta E_k \|_2 \le 4 \| E_k \, \delta A_{k-1} A_{k-1}^{-1} \|_2. \tag{7.32}
\]

We now go on to present some numerical results to support some of the above results. In all cases we have chosen a monomial basis, a quadratic numerator and denominator in the approximation form, and data points $\{x_i\}_{i=1}^{21} \in [-1, 1]$.

k   ‖E_k‖        ‖F_{k-1}^2‖   ‖E_k‖/‖F_{k-1}^2‖   ‖δE_k‖       ‖E_k δA_{k-1}/A_{k-1}‖
2   5.5475e-06   20.3554       2.7253e-07          3.1809e-06   5.9954e-06
3   6.7892e-06   32.4337       2.0933e-07          1.4810e-08   1.9351e-08
4   6.7975e-06   32.5078       2.0910e-07          6.1040e-11   7.9779e-11
5   6.7975e-06   32.5081       2.0910e-07          2.5109e-13   3.2857e-13
6   6.7975e-06   32.5081       2.0910e-07          1.9179e-15   1.3506e-15
7   6.7975e-06   32.5081       2.0910e-07          6.0130e-18   0.0

Table 7.1: Approximation of log(x + 3) using Loeb's algorithm

k   ‖E_k‖        ‖F_{k-1}^2‖   ‖E_k‖/‖F_{k-1}^2‖   ‖δE_k‖       ‖E_k δA_{k-1}/A_{k-1}‖
2   9.3961e-04   50.8086       1.8493e-05          9.0611e-05   2.3917e-03
3   9.4561e-04   88.4315       1.0693e-05          1.7738e-07   2.6125e-07
4   9.4560e-04   88.4269       1.0694e-05          7.4319e-10   1.2449e-09
5   9.4560e-04   88.4269       1.0694e-05          3.5068e-12   5.5449e-12
6   9.4560e-04   88.4269       1.0694e-05          2.6475e-14   0

Table 7.2: Approximation of e^x + e^{−0.5x} using Loeb's algorithm

The results presented in Table 7.1 and Table 7.2 show that the quantities $E_k$ and $F_{k-1}^2$ are proportional, as in (7.31). The results also support the validity of equations (7.29) and (7.32). In our
In our 137 testing of this algorithm, we have found that it generally converges in the same number of iterations (or less) as the existing Loeb algorithm. It also compares favourably in terms of the size of the norm of the approximation error at convergence. 138 BIBLIOGRAPHY [1] J. Abouir and A. Cuyt. Multivariate partial Newton-Padé and Newton-Padé type approximants. Journal of Approximation Theory, 72:301–316, 1993. [2] I. Anderson, J. C. Mason, and C. Ross. Extending Lawson’s algorithm to include the Huber M-estimator. In Albert Cohen, Christophe Rabut, and Larry L. Schumaker, editors, Curve and Surface Fitting, pages 1–8. Vanderbuilt University Press, 2000. [3] K. Appel. Rational approximation of decay-type functions. Nordisk Tidskr. Informationsbehandling, 2:69–75, 1962. [4] G. A. Baker and P. R. Graves-Morris. Padé Approximants : Basic Theory. Addison-Wesley, 1981. [5] M. Van Barel and A. Bultheel. Discrete linearized least squares rational approximation on the unit circle. J. Comput. Appl. Math., 50:545–563, 1994. [6] R. M. Barker, M. G. Cox, A. B. Forbes, and P. M. Harris. Software Support for Metrology Best Practice Guide 4: Modelling Discrete Data and Experimental Data Analysis. Technical report, National Physical Laboratory, Teddington, UK, 2004. [7] I. Barrodale and J. C. Mason. Two simple algorithms for discrete rational approximation. Mathematics Of Computation, 24(112):877–891, 1970. 139 [8] I. Barrodale and C. Phillips. Solution of an overdetermined system of linear equations in the Chebyshev norm. ACM Transactions on Mathematical Software, 1(3):264–270, 1975. [9] I. Barrodale and F. D. K. Roberts. An improved algorithm for discrete ℓ1 approximation. SIAM J. Numer. Anal., 10:839–848, 1973. [10] R. Boudjemaa, A. B. Forbes, P. M. Harris, and S. Langdell. Multivariate empirical models and their use in metrology. Technical Report CMSC 32/03, National Physical Laboratory, Teddington, UK, 2003. [11] E. W. Cheney. Introduction to Approximation Theory. 
McGraw Hill, 1966.
[12] E. W. Cheney and H. L. Loeb. Two new algorithms for rational approximation. Numer. Math., 3:72–75, 1961.
[13] E. W. Cheney and M. J. D. Powell. The differential correction algorithm for generalized rational functions. Constr. Approx., 3:249–256, 1987.
[14] E. W. Cheney and T. H. Southard. A survey of methods for rational approximation. SIAM Review, 5(3):219–231, 1963.
[15] Edwin K. P. Chong and Stanislaw H. Żak. An Introduction to Optimization. Wiley, 2001.
[16] A. A. M. Cuyt. The QD-algorithm and multivariate Padé-approximants. Numer. Math., 42:259–269, 1983.
[17] A. A. M. Cuyt and B. M. Verdonk. General order Newton-Padé approximants for multivariate functions. Numer. Math., 43:293–307, 1984.
[18] L. C. W. Dixon. Nonlinear Optimization. The English Universities Press Limited, 1972.
[19] E. H. Kaufman, D. J. Leeming, and G. D. Taylor. Uniform approximation on subsets of [0, ∞) by reciprocals of polynomials. In Approximation Theory IV, pages 553–559. Academic Press, Inc, 1983.
[20] E. H. Kaufman, D. J. Leeming, and G. D. Taylor. Uniform approximation on subsets of [0, ∞) by rational functions. In Approximation Theory 5, pages 407–410. Academic Press, Inc, 1986.
[21] G. E. Farin. NURB curves and surfaces, from projective geometry to practical use. A. K. Peters, Wellesley, Massachusetts, 1995.
[22] G. E. Forsythe. Generation and use of orthogonal polynomials for data fitting with a digital computer. SIAM J., 5:74–88, 1957.
[23] Luca Gemignani. Chebyshev rational interpolation. Numerical Algorithms, 15:91–110, 1997.
[24] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[25] M. Gugat. The Newton differential correction algorithm for rational Chebyshev approximation with constrained denominators. Numerical Algorithms, 13:107–122, 1996.
[26] M. Huhtanen and R. M. Larsen. On generating discrete orthogonal bivariate polynomials. BIT, 42:393–407, 2002.
[27] I. Barrodale, M. J. D. Powell, and F. D. K. Roberts. The differential correction algorithm for rational ℓ∞ approximation. SIAM J. Numer. Anal., 9(3):493–504, 1972.
[28] E. H. Kaufman Jr. and G. D. Taylor. Linearly constrained generalised rational approximation. In Charles K. Chui, L. L. Schumaker, and J. D. Ward, editors, Approximation Theory 6, volume 2, pages 353–356. Academic Press, Inc, 1989.
[29] E. H. Kaufman and G. D. Taylor. Uniform approximation by rational functions having restricted denominators. J. Approx. Theory, 32:9–26, 1981.
[30] T. Kilgore. Rational approximation on infinite intervals. Computers and Mathematics with Appl., 48(9):1335–1344, 2004.
[31] C. L. Lawson. Contributions to the theory of linear least maximum approximation. Ph.D. thesis, UCLA, 1961.
[32] H. L. Loeb. On rational fraction approximations at discrete points. Convair Astronautics Applied Mathematics, ser. 9, 1957.
[33] Nathaniel Macon and D. E. Dupree. Existence and uniqueness of interpolating rational functions. The American Mathematical Monthly, 69(8):751–759, 1962.
[34] H. J. Maehly. Methods for fitting rational approximations, Part 1: Telescoping procedures for continued fractions. J. ACM, 7:150–162, 1960.
[35] H. J. Maehly. Methods for fitting rational approximations, Parts II and III. J. ACM, 10:257–277, 1963.
[36] D. W. Marquardt. An algorithm for least squares estimation of non-linear parameters. SIAM J. Appl. Math., 11:431–441, 1963.
[37] J. C. Mason. Some new approximations for the solution of differential equations. DPhil thesis, The Queen's College, Oxford, UK, 1965.
[38] J. C. Mason. Laurent-Padé approximants to four kinds of Chebyshev polynomial expansions part 1: Maehly type approximants. Journal of Numerical Algorithms, 38:3–18, 2005.
[39] J. C. Mason. Laurent-Padé approximants to four kinds of Chebyshev polynomial expansions part 2: Clenshaw-Lord type approximants. Journal of Numerical Algorithms, 38:19–29, 2005.
[40] J. C. Mason and D. C. Handscomb. Chebyshev Polynomials.
Chapman and Hall / CRC Press, London, 2003.
[41] Jorge J. Moré. The Levenberg-Marquardt algorithm: Implementation and theory. Lecture Notes in Mathematics, 630:105–116, 1977.
[42] M. R. Osborne. Nonlinear least squares - the Levenberg algorithm revisited. Journal of the Australian Mathematical Society, 19:343–357, 1976.
[43] P. P. Petrushev and V. A. Popov. Rational Approximation of Real Functions (Encyclopedia of mathematics and its applications: v. 28). Cambridge University Press, 1987.
[44] Les Piegl and Wayne Tiller. The NURBS Book. Springer, 2nd edition, 1997.
[45] Tomaso Pomentale. On discrete rational least squares approximation. Numerische Mathematik, 12:40–46, 1968.
[46] M. J. D. Powell. Approximation Theory and Methods. Cambridge University Press, 1981.
[47] M. J. D. Powell. The theory of radial basis function approximation in 1990. Technical Report DAMTP/1990/NA11, University of Cambridge, Cambridge UK, 1990.
[48] M. J. D. Powell. Recent research at Cambridge on radial basis functions. Technical Report DAMTP 1998/NA05, University of Cambridge, Cambridge UK, 1998.
[49] J. R. Rice and K. H. Usow. The Lawson algorithm and extensions. Mathematics of Computation, 22:118–127, 1968.
[50] Adrian J. Shepherd. Second-Order Methods for Neural Networks. Springer-Verlag, 1997.
[51] Jieqing Tan and Yi Fang. Newton-Thiele's rational interpolants. Numerical Algorithms, 24:141–157, 2000.
[52] G. A. Watson. Discrete ℓp approximation by rational functions. In P. R. Graves-Morris, E. B. Saff, and R. S. Varga, editors, Rational Approximation and Interpolation, pages 489–501. Springer-Verlag, 1983.
[53] L. Wittmeyer. Rational approximation of empirical functions. Nordisk Tidskr. Informationsbehandling, 2:53–60, 1962.
