TREE-RING BULLETIN, Vol. 41, 1981

THE SMOOTHING SPLINE: A NEW APPROACH TO STANDARDIZING FOREST INTERIOR TREE-RING WIDTH SERIES FOR DENDROCLIMATIC STUDIES

EDWARD R. COOK and KENNETH PETERS
Lamont-Doherty Geological Observatory
Palisades, New York

ABSTRACT

A new approach to removing the non-climatic variance of forest interior tree-ring width series, using the smoothing spline, is described. This method is superior to orthogonal polynomials because it makes no assumptions about the shape of the curve to be used for standardization. Also, because the spline curve can range continuously from a linear least squares fit to cubic interpolation through the data, it is far more flexible than polynomials and provides a more "natural" fit. For computing the spline, we found that specifying the Lagrange multiplier p, which appears in the calculus of variations solution, rather than the residual variance as suggested by Reinsch, was both practical and more efficient. In effect, the smoothing spline is a one-parameter family of low-pass filters defined by p. We describe the general characteristics of these filters in the time and frequency domains and compute the response functions for several of them. The smoothing spline is an excellent tree-ring standardization method because its filtering characteristics are well defined. Its utility for dendroclimatology should be considerable since, outside of semiarid environments, sites similar to forest interiors predominate.

INTRODUCTION

In the semiarid environments of western North America, trees growing at or near the upper and lower forest borders are generally unaffected by stand competition and disturbance because of the wide spacing between neighboring trees. The rate of annual radial growth of trees growing in such open-canopy environments generally declines in an orderly fashion with increasing age to a relatively stationary mean level. Because this declining growth rate is biological in origin, it must be removed from each tree-ring series before the final composite tree-ring chronology can be used to study variations in past climate. The process of modeling and removing such non-climatic "noise" is known as standardization (Fritts 1971). Simple linear regressions and modified negative exponential curves (Fritts et al. 1969) are commonly used to standardize ring-width series for semiarid site trees.
When one moves to the closed-canopy forest interior sites common to the deciduous forests of eastern North America, the non-climatic component in annual ring-width series becomes increasingly complex and variable because of competition between trees for light and nutrients and because of stand disturbances. Such additional noise is difficult to model because it is often episodic in nature. The standardization techniques used for semiarid site tree growth lack sufficient flexibility for removing the non-climatic variance of forest interior ring-width series. This problem is not trivial since it can severely limit the potential of dendroclimatic research in eastern North America and Europe. One approach towards minimizing this problem has been the use of orthogonal polynomials of varying degrees to model and remove non-climatic variance (Fritts 1976). Because of occasional dissatisfaction with the polynomial standardization technique, we have researched a promising alternative, the smoothing spline.

BACKGROUND

Splines have been used traditionally as mathematical analogues to the thin flexible strips used in drafting for interpolating new values between adjacent measurements. As such, the spline is of no use for tree-ring research since it passes through the data points on the assumption that these points are measured without error. Reinsch (1967), however, considered the case where the data points were subject to unwanted experimental error. In order to extract the underlying function from the experimental noise, he developed an algorithm for a smoothing spline. Like the cubic interpolating spline, this smoothing spline has continuous first and second derivatives. The cubic spline can be thought of as a concatenation of cubic polynomial segments that are joined together at their ends, or knots. The continuity of the first and second derivatives assures that the segments are joined in a very smooth fashion.
In this sense the smoothing spline is a series of piecewise cubic polynomials with a knot at each data point abscissa (Wold 1974).

Splines are inherently superior to polynomials for approximating functions that are disjointed or episodic in nature. To quote Rice (1969: 123) from Wold (1974):

"spline functions are the most successful approximating functions for practical applications so far discovered. The reader may be unaware of the fact that ordinary polynomials are inadequate in many situations. This is particularly the case when one approximates functions which arise from the physical world rather than from the mathematical world. Functions which express physical relationships are frequently of a disjointed or disassociated nature. That is to say that their behavior in one region may be totally unrelated to their behavior in another region. Polynomials, along with most other mathematical functions, have just the opposite property. Namely, their behavior in a small region determines their behavior everywhere. Splines do not suffer this handicap since they are defined piecewise, yet, for [more than 3 data points] they represent nice, smooth curves in the physical world."

A major advantage of orthogonal polynomials that allows for automated curve fitting is the statistical independence between successively higher order curves. The widely used ring-width standardization program developed by the Laboratory of Tree-Ring Research at The University of Arizona utilizes a testing algorithm in its orthogonal polynomial option. It accepts a given order polynomial fit when the next two higher orders do not reduce the residual variance by 5% or more. By contrast, the greatest obstacle to using the spline in an efficient automated fashion was the lack of a satisfactory criterion for specifying the degree of smoothing.
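The acceptance rule just described can be illustrated with a short sketch. This is not the Arizona program. It uses ordinary least squares polynomials from NumPy, whose residual variance for a given degree does not depend on whether an orthogonal basis is used. The names `residual_variance`, `select_order`, `max_degree` and `threshold` are hypothetical, and the reading of the 5% test (each of the next two orders is compared separately with the residual variance of the current order) is an assumption, since the text does not spell it out.

```python
import numpy as np
from numpy.polynomial import Polynomial


def residual_variance(x, y, degree):
    """Mean squared residual of a least squares polynomial of the given degree."""
    fit = Polynomial.fit(x, y, degree)
    return float(np.mean((y - fit(x)) ** 2))


def select_order(x, y, max_degree=8, threshold=0.05):
    """Lowest degree for which neither of the next two higher degrees
    reduces the residual variance by `threshold` or more (assumed reading
    of the 5% rule; hypothetical helper, not the original program)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Residual variance for every degree that may be needed in the test.
    var = [residual_variance(x, y, d) for d in range(max_degree + 3)]
    for d in range(1, max_degree + 1):
        if var[d] == 0.0:
            return d  # exact fit, nothing left to reduce
        gains = [(var[d] - var[d + k]) / var[d] for k in (1, 2)]
        if max(gains) < threshold:
            return d
    return max_degree


# Usage on a synthetic series: a quadratic trend plus small random noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 500)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.01, x.size)
print("selected polynomial order:", select_order(x, y))
```

A rule of this kind needs only a sequence of residual variances, which is why it automates easily. The spline had no comparably simple stopping criterion until the parameter p was adopted.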
THE SMOOTHING SPLINE

The smoothing spline algorithm of Reinsch (1967) minimizes the total squared curvature of the spline g(x),

$$\int_{x_0}^{x_n} \left[ g''(x) \right]^2 \, dx, \tag{1a}$$

under the constraint

$$\sum_{i=0}^{n} \left( \frac{g(x_i) - y_i}{\delta y_i} \right)^2 \le S, \tag{1b}$$

where $y_i$ is the input series, $\delta y_i$ is a series of weights, and S is a scaling parameter. The quantities $\delta y_i$ control the extent of smoothing and are implicitly rescaled by varying S. Reinsch suggested using for $\delta y_i$ a standard deviation associated with $y_i$. We tried this but found that it led to a noticeable bias in the fits and in the resultant tree-ring indices. Also, there are good reasons for weighting all of the ring widths equally. First, the actual measurement errors are independent, of reasonably constant variance, and usually negligible. Second, most of the variation in the ring widths, which is ring-width dependent, is not error. For ring-width series the local standard deviation is proportional to the local mean. Roughly speaking, the standardization curve should pass through the local average of the ring widths, and this result is not achieved easily if mean-dependent weighting is used. Thus we decided to set $\delta y_i$ = 1.0 for all i. The above expression then reduces to an unweighted residual sum-of-squares criterion, and the spline corresponding to a given S value is computed by an iterative procedure. For a given data set (with all $\delta y_i$ equal to 1.0) the spline fit is determined by one parameter, S, which we scaled to be a fraction, s', of the variance of the data about the mean.

A better parameter for spline selection was found by examining the calculus of variations solution of the Reinsch problem. Each spline can also be defined uniquely by the value of the Lagrange multiplier p that is associated with the constraint (1b). The base 10 logarithm of p ranges from +∞ to -∞, but virtually all the corresponding variation in S occurs between +2.00 and -10.00 for our data.
The positive extreme defines an interpolating spline and the negative extreme a simple linear least squares fit to the data. S (and s') increases monotonically with 1/p. An important property of the p versus s' relation is that it is unaffected by the mean or variance of the data and, for a time-invariant process that is sampled sufficiently often, is independent of N also. That is, a particular value of p or s' corresponds uniquely to a particular fraction of variance removed from a given process. In addition, specifying p instead of s' allows us to compute the spline directly rather than iteratively, which greatly reduces the computation time per spline.

Figure 1. Four examples of the smoothing spline applied to forest interior ring-width series (panels A-D, plotted against years). Each spline was computed for log p = -4.0. The solid curve is the spline fit and the dashed curve is the orthogonal polynomial fit. The four series are from the same site in southeastern New York, and the lower two plots are cores from the same tree.

Using a wide range of p values, we fit splines to many different raw ring-width series of forest interior trees. By displaying each curve fit on a cathode ray tube, the aptness of the fit was quickly, albeit subjectively, evaluated. By this trial and error procedure, we found that the log p value of -4.00 generally yielded a satisfactory curve fit to the data.

Figure 1 illustrates four examples of this smoothing spline for log p = -4.00. For comparison, the orthogonal polynomial curve fit computed by The University of Arizona program is shown as a dashed line in each series. The four ring-width series are from trees growing within 100 meters of each other on a site in southeastern New York. Series A and B are from two different trees, while series C and D are from opposite sides of a third tree.
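The direct computation for a specified p can be sketched as follows. This is a minimal illustration, not the authors' program. It assumes unit weights (all δy_i = 1.0) and unit spacing between successive rings, and it solves the linear system of Reinsch (1967) specialized to that case, namely (QᵀQ + pT)c = Qᵀy followed by g = y - Qc, where Qᵀ is the second-difference operator and T is tridiagonal. The function name `smoothing_spline` and the dense solve are choices made here for brevity. The matrices are banded, so a banded solver would be the natural choice for long series.

```python
import numpy as np


def smoothing_spline(y, p):
    """Values of the cubic smoothing spline at the data points.

    Sketch assuming unit spacing and unit weights. Large p approaches
    interpolation; p = 0 gives the least squares straight line.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    m = n - 2  # number of interior knots
    rows = np.arange(m)

    # Q^T: second-difference operator (m x n) for unit spacing.
    qt = np.zeros((m, n))
    qt[rows, rows] = 1.0
    qt[rows, rows + 1] = -2.0
    qt[rows, rows + 2] = 1.0

    # T: symmetric tridiagonal (m x m), 4/3 on the diagonal and 1/3 beside it.
    t = np.zeros((m, m))
    t[rows, rows] = 4.0 / 3.0
    t[rows[:-1], rows[:-1] + 1] = 1.0 / 3.0
    t[rows[:-1] + 1, rows[:-1]] = 1.0 / 3.0

    # Solve (Q^T Q + p T) c = Q^T y, then subtract Q c from the data.
    c = np.linalg.solve(qt @ qt.T + p * t, qt @ y)
    return y - qt.T @ c


# Usage on a synthetic ring-width series (invented data, for illustration only).
rng = np.random.default_rng(0)
years = np.arange(200)
widths = 1.5 * np.exp(-years / 60.0) + 0.5 + rng.normal(0.0, 0.1, years.size)
curve = smoothing_spline(widths, 10.0 ** -4.0)  # log p = -4.0
print(curve[:5])
```

Because p enters the linear system directly, one solve yields the spline. Specifying S or s' instead would require searching for the p that satisfies constraint (1b).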
Series A is an example where the polynomial and spline fits give roughly the same solution. Here the underlying growth curve is consistent in the sense that its characteristics over the entire length are reasonably modeled by the selected polynomial equation. Series B-D, however, illustrate a major advantage of the spline. Because of their piecewise nature, the spline curves have a "natural" flexibility, as if they were fit to the data by eye, and are consistently satisfactory. The polynomial curves are either totally unsatisfactory, as in C, or adequate only in certain intervals, as Rice (1969) described.

The poorer performance of the orthogonal polynomials may in part be a function of the testing algorithm. If the required reduction of residual variance were reduced to allow for more flexibility, the polynomial curves might better approximate the splines, as in series A. Higher order polynomials, however, are still subject to more constraints than splines, and the residual variance test may not consistently select the best order polynomial for each data series.

Because the smoothing spline can range continuously from a simple linear fit to an interpolation through the data, problems of overfitting can quickly arise. By overfitting, we mean that an excessive amount of variance, some of it climatic, has been removed from the tree-ring series. Since we have no a priori knowledge of the variance structure of the climatic signal, other than that its spectrum is sometimes "red" (weighted toward low frequencies), we can best minimize the overfitting problem by comparing the spline fits of different tree-ring series from the same site. Any consistent low frequency similarities between the curves would suggest that a common signal, perhaps climate, has been removed. In Figure 1, the four splines show very little covariation through time, indicating that the non-climatic variance in each series has been reasonably modeled.
In cases where a general disturbance such as fire or insect infestation has affected an entire stand of trees, this approach will be more difficult to apply.

TIME AND FREQUENCY DOMAIN PROPERTIES

The results shown in Figure 1, and others using different p values, suggest that the smoothing spline behaves at least approximately as a running average, with the shape of the weight function determined by the parameter p: peaky for large p and flat for small p. Under this interpretation the smoothing spline can be characterized by an impulse response function in the time domain and by a frequency response function in the frequency domain.

Figure 2 shows the results of computing three different splines for a 300-point series consisting of a unit spike at point 150 added to a constant series of 300 values of 1.0 (solid lines). The weight functions are symmetric, and only the central values and the leading weights are shown. Each filter is smoothly tapered and has minor side lobes that dampen out quickly. Note that the base widths of the filters are all moderate fractions of 300. When the filter base width is comparable to the number of data points, or if the spike is near the ends of the data, the shape of the response is different. For instance, the log p = -4.0 response is the same to a few parts in ten thousand for a unit spike centered in 100 points, but for 50 points the response is noticeably different (dashed line). For spikes near the ends of the data the response is asymmetric as well.

By Fourier transforming the matrix operations occurring in the computation of the spline and ignoring the finite length of the data set, a frequency response function of the form

$$u(f) = 1 - \frac{1}{1 + \dfrac{p(\cos 2\pi f + 2)}{6(\cos 2\pi f - 1)^2}} \tag{2}$$

is derived. Several of these functions are plotted in Figure 3. The frequency at which the spline reduces the amplitude of a sine wave by 50% is shown on each curve.
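As a hedged illustration, the frequency response (2) can be evaluated numerically. The sketch below writes (2) as a single fraction, u(f) = p(cos 2πf + 2) / [p(cos 2πf + 2) + 6(cos 2πf - 1)²], which is algebraically the same for p > 0 and avoids a division by zero at f = 0. The function name `spline_frequency_response` and the choice of periods are assumptions made for this example. Frequencies are in cycles per year, so they run from 0 to 0.5.

```python
import numpy as np


def spline_frequency_response(f, p):
    """Equation (2) written as a single fraction (intended for p > 0).

    f is the frequency in cycles per sample (cycles per year for annual rings).
    """
    c = np.cos(2.0 * np.pi * np.asarray(f, dtype=float))
    num = p * (c + 2.0)
    return num / (num + 6.0 * (c - 1.0) ** 2)


# Fraction of amplitude retained by the spline at several periods (in years)
# for three illustrative values of log p.
periods = np.array([1000.0, 400.0, 200.0, 100.0, 53.0, 20.0, 10.0, 5.0, 2.0])
for log_p in (-2.0, -4.0, -6.0):
    u = spline_frequency_response(1.0 / periods, 10.0 ** log_p)
    print(log_p, np.round(u, 3))
```

The response equals 1 at f = 0 and falls monotonically toward the Nyquist frequency, which is the low-pass behavior the text describes. Smaller p moves the roll-off toward longer periods.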
For example, the 50% frequency response for log p = -4.0 occurs at a period of 53 years.

Figure 2. The impulse response functions of the smoothing spline for different log p values, plotted against filter width. The units of each axis are dimensionless. The dashed line filter (log p = -4.0) is from a data set with only 50 points.

Also, from (2) we can compute the p value for a spline which has a 50% frequency response at a specified frequency:

$$p = \frac{6(\cos 2\pi f - 1)^2}{\cos 2\pi f + 2} \tag{3}$$

Again, for small p values (corresponding to large base widths in the time domain), (2) and (3) do not tell the whole story. When p = 0, for instance, u(f) = 0 everywhere except at f = 0, where it equals 1.0. In the time domain this corresponds to convolving with a function that is zero everywhere. In fact the spline fit is a sloping line which, according to Reinsch, is a least squares fit to the original data.

The time series description of the smoothing spline outlined here, which is developed more fully in another paper (Peters and Cook 1981), is intended to help one select the degree of smoothing objectively on the basis of the frequency response function (2) or (3) when the error variance in the data values is unknown or a meaningless concept, thus extending the applicability of the technique. In practice we have found that the qualifications regarding small values of p, although they must be borne in mind, are not a serious restriction.

Another example of the end effects is shown in Figure 4. This ring-width series of an eastern hemlock is a dramatic example of the suppression and release found in some trees growing in forest interiors. Two splines were computed using a value of log p = -4.0: one for the entire series (solid line) and one for a segment (dash-dot line).
The end effects are obvious but acceptably small when compared with the magnitude of the ring-width variation. The spline fits the data near the end points as though a reasonable extension had been made and then a moving average filter defined by (2) had been applied. This is the case for all the other data sets we have worked with also, for all values of p. Defining a "reasonable" extension as one in accord with these observations, we have found that a reliable rule of thumb is to select a p value on the basis of (2) or (3) as though the frequency response function were an accurate description of the spline for all values of p.

Figure 3. Frequency response functions for several smoothing splines, plotted against log frequency with the equivalent period in years. The 50% frequency response in years for each spline is shown.

DENDROCLIMATIC CONSIDERATIONS

We believe that the smoothing spline offers a major improvement in standardizing ring-width series that are poorly modeled with straight line or negative exponential curve fits. Because splines form a well-defined class of low-pass filters, their behavior is well characterized in both the time and frequency domains. A spline provides a more natural fit to the data because it operates effectively as a centrally weighted moving average on the data. Orthogonal polynomials, however, try to generalize the underlying structure of the data by operating on the entire sequence in a least squares sense. While a polynomial fit may coincide with a spline fit, as in Figure 1A, it is always under more constraints, which usually cause distortion in the shape of the computed growth curve.

Because the filtering action of these splines is known, each tree-ring chronology should always be catalogued with information about the frequency response of the splines used in standardization.
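The rule of thumb above, and the cataloguing of the 50% frequency response, both come down to converting between p and a cutoff period. A small sketch under stated assumptions: `p_for_half_response` is equation (3) with f = 1/period, and `half_response_period` inverts it by treating (3) as a quadratic in cos 2πf. Both function names are hypothetical. The bound p ≤ 24 follows from requiring the cutoff to lie at or below the Nyquist frequency of 0.5 cycles per year, and it is derived here from (3) rather than stated in the text.

```python
import math


def p_for_half_response(period):
    """Equation (3): the p whose 50% frequency response falls at `period`
    (in years for annual rings; period must be at least 2)."""
    c = math.cos(2.0 * math.pi / period)
    return 6.0 * (c - 1.0) ** 2 / (c + 2.0)


def half_response_period(p):
    """Invert equation (3): the period at which the response is 50%."""
    if not 0.0 < p <= 24.0:
        raise ValueError("a 50% point exists only for 0 < p <= 24")
    # Equation (3) rearranges to 6c^2 - (12 + p)c + (6 - 2p) = 0 with
    # c = cos(2*pi*f); the root with c <= 1 is the physical one.
    c = ((12.0 + p) - math.sqrt(p * p + 72.0 * p)) / 12.0
    return 2.0 * math.pi / math.acos(c)


# Usage: tabulate log p for several target 50% periods, then go back again.
for period in (20.0, 53.0, 100.0, 200.0):
    p = p_for_half_response(period)
    print(period, round(math.log10(p), 2), round(half_response_period(p), 1))
```

For a target period of 53 years this gives a log p close to the -4.0 quoted in the text. As the text cautions, these conversions ignore end effects and become less informative for very small p.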
For those researchers investigating long-term climatic change or low-frequency cyclic phenomena, this information must be provided lest they arrive at biased conclusions due to the filtered nature of the data.

The choice of each spline used for standardization is still subjective and must be made on the basis of both frequency and time domain considerations. We want to preserve as much low frequency climatic variance as possible and yet remove divergent non-climatic anomalies that, in the time domain, could be wrongly interpreted as exceptional climatic events. Ideally, each spline should be "as straight as possible" and still remove most of the variance that is not common to all tree-ring series collected from the same site. The key to this approach is adequate sampling which, for forest interior sites, means a minimum field collection of 40 increment cores from trees of the same species. By carefully comparing the splines by eye or statistically prior to merging the standardized tree-ring series into a composite site chronology, the effects of inadequate curve fits can be minimized.

Figure 4. An example of the effect of filter truncation at the ends of a data series. The solid line is the spline fit to the complete (1690-1976) tree-ring series. The dash-dot line is the spline fit to the middle (1750-1899) segment. Each spline was computed using a log p value of -4.0.

The log p value of -4.0, which defines a spline with a 50% frequency response of 53 years, is a useful starting point for using the smoothing spline. It was generally satisfactory for the ring-width series we used for testing purposes because any lower frequency climatic variance was indistinguishable from the variance judged to be non-climatic. The latter component, being a combination of biological growth trend, changing stand density, and episodic disturbance, masked more slowly varying climatic signals.
There will certainly be instances, however, where more low frequency variance should be retained in the final tree-ring chronology when the configuration of the non-climatic component is relatively simple.

The smoothing spline is not a panacea for removing non-climatic variance in forest ring-width series. A certain amount of climatic information will always be lost due to the shape of the frequency response curves and where the signal and noise spectra overlap in the lower frequencies. These are problems common to any filtering operation. Nor will the spline allow us to relax the need for adequate sampling, since good replication is still the best way to increase the signal-to-noise ratio in tree-ring chronologies. With these considerations in mind, the smoothing spline represents a highly flexible standardization technique that can be tailored to the needs of the researcher. Although developed specifically for tree-ring series from forest interior sites, its application extends to any series for which a particular model is not easily justifiable.

ACKNOWLEDGEMENTS

We thank Drs. W. S. Broecker, P. Stoffa, G. C. Jacoby and H. C. Fritts for comments and suggestions that improved this paper. This research was supported by Grant ATM77-19217 from the Climate Dynamics Research Section of the National Science Foundation. Lamont-Doherty Geological Observatory Contribution No. 3283.

REFERENCES

Fritts, H. C.
  1971 Dendroclimatology and dendroecology. Quaternary Research 1: 419-49.
  1976 Tree-rings and climate. Academic Press, New York.

Fritts, H. C., J. E. Mosimann, and C. P. Bottorff
  1969 A revised computer program for standardizing tree-ring series. Tree-Ring Bulletin 29: 15-20.

Peters, K. and E. R. Cook
  1981 The cubic smoothing spline as a digital filter. Lamont-Doherty Geological Observatory of Columbia University, Technical Report #CU-1-81/TRI.

Reinsch, C. H.
  1967 Smoothing by spline functions. Numerische Mathematik 10: 177-83.

Rice, J. R.
  1969 The approximation of functions, Vol. 2. Addison-Wesley, Reading, Mass.

Wold, S.
  1974 Spline functions in data analysis. Technometrics 16: 1-11.