# Image Resampling

Neil Anthony Dodgson
Technical Report Number 261

Technical Report UCAM-CL-TR-261
ISSN 1476-2986
University of Cambridge Computer Laboratory
August 1992

15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 1992 Neil Anthony Dodgson

This technical report is based on a dissertation submitted by the author for the degree of Doctor of Philosophy to the University of Cambridge, Wolfson College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: http://www.cl.cam.ac.uk/TechReports/

Summary

Image resampling is the process of geometrically transforming digital images. This dissertation considers several aspects of the process.

We begin by decomposing the resampling process into three simpler sub-processes: reconstruction of a continuous intensity surface from a discrete image, transformation of that continuous surface, and sampling of the transformed surface to produce a new discrete image.

We then consider the sampling process, and the subsidiary problem of intensity quantisation. Both of these are well understood, and we present a summary of existing work, laying a foundation for the central body of the dissertation, where the sub-process of reconstruction is studied. The work on reconstruction divides into four parts, two general and two specific:

1. Piecewise local polynomials: the most studied group of reconstructors. We examine these, and the criteria used in their design. One new result is the derivation of two piecewise local quadratic reconstructors.

2. Infinite extent reconstructors: we consider these and their local approximations, the problem of finite image size, the resulting edge effects, and the solutions to these problems. Amongst the reconstructors discussed are the interpolating cubic B-spline and the interpolating Bezier cubic. We derive the filter kernels for both of these, and prove that they are the same.
Given this kernel, we demonstrate how the interpolating cubic B-spline can be extended from a one-dimensional to a two-dimensional reconstructor, providing a considerable speed improvement over the existing method of extension.

3. Fast Fourier transform reconstruction: it has long been known that the fast Fourier transform (FFT) can be used to generate an approximation to perfect scaling of a sample set. Donald Fraser (in 1987) took this result and generated a hybrid FFT reconstructor which can be used for general transformations, not just scaling. We modify Fraser's method to tackle two major problems: its large time and storage requirements, and the edge effects it causes in the reconstructed intensity surface.

4. A priori knowledge reconstruction: we first consider what can be done if we know how the original image was sampled, and then consider what can be done with one particular class of image coupled with one particular type of sampling. In this latter case we find that exact reconstruction of the image is possible. This is a surprising result, as this class of images cannot be exactly reconstructed using classical sampling theory.

The final section of the dissertation draws all of the strands together to discuss transformations and the resampling process as a whole. Of particular note here is work on how the quality of different reconstruction and resampling methods can be assessed.

About this document

This report is, apart from some cosmetic changes, the dissertation which I submitted for my PhD degree. I performed the research for this degree over the period January 1989 through May 1992.

The page numbering in this version differs from that in versions prior to 2005 because I have been able to incorporate many of the photographs which were previously unavailable in the online version. Figures 1.4, 2.1, 2.2, 4.34, 7.9, and 7.10 are either incomplete or missing, owing to the way in which the original manuscript was prepared.
Appendix C is also missing for the same reason.

Acknowledgements

I would like to thank my supervisor, Neil Wiseman, for his help, advice, and listening ear over the course of this research. I am especially grateful for his confidence in me, even when I had little of my own, and for reading this dissertation and making many helpful comments. I would also like to thank David Wheeler and Paul Heckbert for useful discussions about image resampling and related topics, which clarified and stimulated my research.

Over the last three and a third years several bodies have given me financial assistance. The Cambridge Commonwealth Trust paid my fees and maintenance for the first three years, for which I am grateful. Wolfson College, through the kind offices of Dr Rudolph Hanka, and the Computer Laboratory have both contributed towards travel, conference, and typing expenses, and I am grateful for these too. Finally, I am extremely grateful to my wife, Catherine Gibbs, for her financial support, without which that last third of a year would have been especially difficult.

This dissertation was typed in by four people, not a pleasant task. Many thanks go to Lynda Murray, Eileen Murray, Annemarie Gibbs, and Catherine Gibbs for taking this on, and winning! The Rainbow graphics research group deserve many thanks for their support and friendship over my time in Cambridge; they have made the research much more enjoyable. Thanks go to Olivia, Stuart, Jeremy, Heng, Alex, Adrian, Dan, Tim, Rynson, Nicko, Oliver, Martin, Jane, and Neil. Equal, if not more, thanks go to Margaret Levitt and Eileen Murray, who have helped to keep me sane; and also to the other inmates of Austin 4 and 5 for the interesting working environment they provide. Final heartfelt thanks go to my parents and to Catherine, to whom this dissertation is dedicated. Their support and encouragement have been invaluable in getting me to this point.

Contents

Notation
Preface
1 Resampling
  1.1 Introduction
  1.2 Images
  1.3 Sources of discrete images
  1.4 Resampling: the geometric transformation of discrete images
  1.5 The practical uses of resampling
  1.6 Optical image warping
  1.7 Decomposition
  1.8 How it really works
  1.9 The literature
  1.10 Summary
2 Sampling
  2.1 Converting the continuous into the discrete
  2.2 The problem of discrete intensities
  2.3 Producing a discrete image
  2.4 The sampling process
  2.5 Area vs point sampling
  2.6 Practical considerations
  2.7 Anti-aliasing
  2.8 Sampling which considers the reconstruction method
  2.9 Summary
3 Reconstruction
  3.1 Classification
  3.2 Skeleton reconstruction
  3.3 Perfect interpolation
  3.4 Piecewise local polynomials
  3.5 What should we look for in a NAPK reconstructor?
  3.6 Cubic functions
  3.7 Higher orders
  3.8 Summary
4 Infinite Extent Reconstruction
  4.1 Introduction to infinite reconstruction
  4.2 Edge effects
  4.3 Sinc and other perfect interpolants
  4.4 Gaussian reconstructors
  4.5 Interpolating cubic B-spline
  4.6 Local approximations to infinite reconstructors
  4.7 Summary
5 Fast Fourier Transform Reconstruction
  5.1 The reconstructed intensity surface
  5.2 DFT sample rate changing
  5.3 FFT intensity surface reconstruction
  5.4 Problems with the FFT reconstructor
  5.5 Summary
6 A priori knowledge reconstruction
  6.1 Knowledge about the sampler
  6.2 Knowledge about the image content
  6.3 One-dimensional sharp edge reconstruction
  6.4 Two-dimensional sharp edge reconstruction
  6.5 Summary
7 Transforms & Resamplers
  7.1 Practical resamplers
  7.2 Transform dimensionality
  7.3 Evaluation of reconstructors and resamplers
  7.4 Summary
8 Summary & Conclusions
  8.1 Major results
  8.2 Other results
  8.3 Future work
A Literature Analysis
B Proofs
  B.1 The inverse Fourier transform of a box function
  B.2 The Fourier transform of the sinc function
  B.3 Derivation of a = −2 + √3
  B.4 The Straightness Criteria
  B.5 A signal cannot be bandlimited in both domains
  B.6 The Fourier transform of a finite-extent comb
  B.7 The not-a-knot end-condition for the B-spline
  B.8 Timing calculations for the B-spline kernel
  B.9 The error in edge position due to quantisation
C Graphs
D Cornsweet's figures
Glossary
Bibliography

Notation

There is little specialised notation in this thesis. We draw your attention to the following notational conventions. Spatial domain functions tend to be represented by lowercase letters, for example g and f(x), or by lowercase words, for example sinc(x). Frequency domain functions tend to be represented by the equivalent uppercase letters, for example G and F(ν), or by capitalised words, for example Box(ν). Domains are represented by calligraphic uppercase letters, for example Z (the integers) and R (the real numbers). This non-standard notation is due to the inability of our typesetter to generate the correct notation.

We use three functions which generate an integer from a real number: ceiling(x) gives the nearest integer greater than, or equal to, x; floor(x) gives the nearest integer less than, or equal to, x; round(x) gives the nearest integer to x. These are represented in equations as:

  ceiling(x) = ⌈x⌉
  floor(x) = ⌊x⌋
  round(x) = ⌊x + 1/2⌋

Some of the derivations required the use of a book of tables. We used Dwight [1934]. References to Dwight are written thus: [Dwight, 850.1], giving the number of the appropriate entry in Dwight's book.

The term continuous is used in this dissertation in a non-standard way: a continuous function is one that is defined at all points in the real plane, but is not necessarily continuous in the mathematical sense. A more rigorous description for this use of continuous would be everywhere defined. The concept of mathematical continuity is represented in this dissertation by the term C0-continuous.

There is a glossary of terms at the back of the dissertation, just before the bibliography.

Preface

Digital image resampling is a field that has been in existence for over two decades.
However, when we began this research in 1989, all work on image resampling existed in conference proceedings, journals, and technical reports. Since that time, the subject has come more into the public eye.

Heckbert's Masters thesis, "Fundamentals of Texture Mapping and Image Warping" [Heckbert, 1989], brought many of the ideas together, concentrating on two-dimensional geometric mappings (specifically affine, bilinear and projective) and on the necessary anti-aliasing filters. Wolberg's "Digital Image Warping" [Wolberg, 1990] is the first book on the subject. Its strength is that it draws most of the published work together in one place. Indeed, it grew out of a survey paper [Wolberg, 1988] on geometric transformation techniques for digital images. The book has a computational slant to it, being aimed at people who need to implement image resampling algorithms. The same year saw the publication of the long awaited successor to Foley and van Dam's "Fundamentals of Interactive Computer Graphics" [1982]. Foley, van Dam, Feiner and Hughes [1990] contains a good summary of several image resampling concepts and techniques [ibid. pp.611, 614, 617–646, 741–745, 758–761, 815–834, 851, 910, 976–979], in contrast with its predecessor, which does not mention resampling at all.

In the face of this flurry of publicity, what can we say of image resampling? Certainly it has been in use for over twenty years, and is now used regularly in many fields. Many of the problems have been overcome, and the theory behind resampling has come more to the fore in recent years. What place, then, is there for research in the field? Much: image resampling is a complicated process. Many factors must be taken into account when designing a resampling algorithm. A framework is required which can be used to classify and analyse resampling methods. Further, there are holes in the coverage of the subject.
Certain topics have not been studied because they are not perceived as important, because the obvious or naive answer has always been used, or because no-one has yet thought to study them.

In this dissertation we produce a framework which is useful for analysing and classifying resampling methods, and we fit the existing methods into it. We continue by studying the parts of the framework in detail, concentrating on the reconstruction sub-process. Thus we look at piecewise quadratic reconstructors, edge effects, the interpolating Bezier cubic, the kernel of the interpolating cubic B-spline, Fraser's [1987, 1989a, 1989b] FFT method and its extensions, and a priori knowledge reconstructors; whilst not ignoring the well-trod ground of, for example, piecewise cubic reconstructors and classical sampling theory.

The dissertation is divided into eight chapters, arranged as follows:

1. Resampling — we introduce the concepts of a digital image, a continuous image, and the resampling process. Resampling is decomposed into three sub-processes: reconstruction of a continuous intensity surface from a discrete image, transformation of that continuous surface, and sampling of the transformed surface to produce a new discrete image. This decomposition underpins the rest of the dissertation.

2. Sampling — the concepts of sampling are introduced, laying a foundation for the work on reconstruction and resampling as a whole. As a side issue we discuss the problem of intensity quantisation as it relates to image resampling.

3. Reconstruction — an introduction to reconstruction. Most of the chapter is given over to a discussion of piecewise local polynomial reconstructors, and the factors involved in their design.

4. Infinite Extent Reconstruction — sinc, Gaussian, interpolating cubic B-spline, and interpolating Bezier cubic reconstructors are all theoretically of infinite extent. We discuss these, their local approximations, and the problem of a finite image.
The latter problem applies to local reconstructors as well as to infinite extent ones.

5. Fast Fourier Transform Reconstruction — a method of implementing an approximation to sinc reconstruction, developed by Donald Fraser [1987]. We describe his method, and then present improvements which solve two of its major problems.

6. A priori knowledge reconstruction — in the preceding three chapters we assumed that we had no a priori knowledge about how the image was sampled, or about its contents. In this chapter we briefly consider what can be achieved given a priori knowledge about how the image was sampled. The bulk of this chapter, however, is an analysis of what can be achieved given a particular set of intensity surfaces (those with areas of constant intensity separated by sharp edges) and one particular type of sampling (exact area sampling).

7. Transforms and Resamplers — where we consider transforms, and the resampling process as a whole. An important section in this chapter is the discussion of how to measure the quality of a reconstructor and of a resampler.

8. Summary and Conclusions — where all our results are drawn into one place.

Chapter 1
Resampling

1.1 Introduction

Digital image resampling occupies a small but important place in the fields of image processing and computer graphics. In the most basic terms, resampling is the process of geometrically transforming digital images.

Resampling finds uses in many fields. It is usually a step in a larger process, seldom an end in itself, and is most often viewed as a means to an end. In computer graphics, resampling allows texture to be applied to surfaces in computer-generated imagery, without the need to explicitly model the texture. In medical and remotely sensed imagery it allows an image to be registered with some standard co-ordinate system, be it another image, a map, or some other reference. This is primarily done to prepare for further processing.
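As a concrete illustration of what such a geometric transformation involves, the following sketch resamples a discrete greyscale image by inverse mapping: for each output pixel it reconstructs an intensity from the input image and point-samples it. This is a minimal illustrative sketch only, not a method from this dissertation: the nearest-neighbour reconstruction, the clamping (edge extension) at the image boundary, and all function names are assumptions made for the example.

```python
def reconstruct(image, x, y):
    # Nearest-neighbour reconstruction of a continuous intensity surface
    # from a discrete image (a list of rows of intensities): the surface
    # takes the value of the nearest pixel. Coordinates outside the image
    # are clamped to the nearest edge pixel (edge extension).
    h, w = len(image), len(image[0])
    i = min(max(int(y + 0.5), 0), h - 1)
    j = min(max(int(x + 0.5), 0), w - 1)
    return image[i][j]

def resample(image, inverse_map, out_w, out_h):
    # Sample the transformed surface at each output pixel. The geometric
    # transformation is supplied as its inverse, mapping output pixel
    # coordinates back into the input intensity surface.
    return [[reconstruct(image, *inverse_map(u, v)) for u in range(out_w)]
            for v in range(out_h)]

# Example: enlarge a 2x2 image by a factor of two in each dimension.
src = [[10, 20],
       [30, 40]]
big = resample(src, lambda u, v: (u / 2, v / 2), 4, 4)
```

The three sub-processes are visible in the code: `reconstruct` builds the continuous surface, `inverse_map` carries the transformation, and the loop over output pixels performs the sampling.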
1.2 Images

There are two basic types of image: continuous and discrete. Continuous images have a real intensity at every point in the real plane. The image that is presented to a human eye or to a video camera, before any processing is done to it, is a continuous image. Discrete images have intensity values only at a set of discrete points, and these intensities are drawn from a set of discrete values. The image stored in a computer display's frame buffer is a discrete image. The discrete values are often called 'pels' or 'pixels' (contractions of 'picture elements').

There are intermediate types of image, which are continuous in some dimensions but discrete in others. Wolberg [1990, p.12] gives two examples of these: where the image is continuous in space but discrete in intensity, and where it is continuous in intensity but discrete in space. A television signal is an interesting third case: it is continuous in intensity and in one spatial dimension (horizontally), but discrete in the other spatial dimension (vertically).

A digital computer, being a discrete device, can deal only with discrete images. What we perceive in the real world are invariably continuous images (which our eyes discretise for processing). For a human to perceive a discrete image it must be displayed as a continuous image, thus always putting us one step away from the data on which the computer operates. We must accept (and compensate for) this limitation if we wish to use the power of digital computers to process images.

Binkley [1990] discusses some of the philosophical problems of digital imagery. He states that 'digital image' is an oxymoron: "If a digital image is something one can see (by experiencing it with one's eyes), one cannot compute it; but if one can apply mathematical operations to it, then it has no intrinsic visual manifestation." Resampling is essentially a mathematical operation
mapping one collection of numbers into another. The important point here is that what is seen by a human to be the input to and output from the resampling process are continuous representations of the underlying discrete images. There are many different ways of reconstructing a continuous image from a discrete one, so we must be careful about the conclusions we draw from the continuous representations of discrete images. Binkley's figures 2 and 3 show how different two continuous representations of the same discrete image can be. Figure 1.1 and figure 1.2 are based on Binkley's figures, and show that the same image can give rise to very different continuous representations. Figure 1.3 shows the discrete representation that is stored in the computer's memory.

Figure 1.1: The same image represented in three different ways: (1) displayed on a monitor.

Figure 1.2: The same image represented in three different ways: (2) rendered in halftone by a laser printer.

Figure 1.3: The same image represented in three different ways: (3) the array of numbers held in the computer's memory. [The figure shows blocks of pixel intensity values from the image.]

The images we deal with are thus discrete. In general, any practical discrete image that we meet will be a rectangular array of intensity values. Experimental and theoretical work is being done on hexagonal arrays [Mersereau, 1979; Wüthrich, Stucki and Ghezal, 1989; Wüthrich and Stucki, 1991] but, at present, all practical image resampling is performed on rectangular arrays.

1.3 Sources of discrete images

Figure 1.4: The three ways of producing an image: (a) capturing a real-world scene; (b) rendering a scene description; and (c) processing an existing image. [Figure incomplete in this version.]

There are three distinct ways of producing such a rectangular array of numbers for storage in a computer. These can be succinctly described as capturing, rendering, or processing the discrete image (figure 1.4):

Image capture is where some real-world device (for example: a video camera, digital x-ray scanner, or fax machine) is used to capture and discretise some real-world (continuous) image.

Image rendering is where some description of a scene is stored within the computer and this description is converted into an image by a rendering process (for example: ray-tracing). The field of computer graphics is generally considered as encompassing all the aspects of this rendering process.

Image processing is where a previously generated discrete image is processed in some way to produce a new discrete image (for example: thresholding, smoothing, or resampling).

Figure 1.5: Pavlidis [1982] figure 1.1, showing the relationship between image processing, pattern recognition (image analysis), and computer graphics.

Figure 1.6: The three sources of discrete imagery shown as processes.
Pavlidis [1982] produced a simple diagram, reproduced in figure 1.5, showing the links between discrete images and scene descriptions. Based on his diagram, we produce figure 1.6, which shows the three sources of discrete imagery.

In order for us to evaluate digital imagery we must add ourselves into the picture (figure 1.7). Here we see that, to perceive a discrete image, we need to display that image in some continuous form (be it on a cathode ray tube (CRT), on photographic film, on paper, or wherever) and then use our visual system on that continuous image.

Figure 1.7: The complete diagram, showing the connection between images in the discrete, continuous, and perceptual worlds.

To complete the diagram, consider the scene description. Such a description could be generated from an image by an image analysis process [see, for example, Rosenfeld, 1988]; or it could be generated from another scene description (for example: by converting a list of solid objects into a set of polygons in preparation for rendering, as in the Silicon Graphics pipeline machines; or Bley's [1980] cleaning of a 'picture graph' to give a more compact description of the image from which the graph was created). Mantas [1987] suggests that pattern recognition is partially image analysis and partially scene description processing. A third way of generating a scene description is for a human being to physically enter it into the computer, basing the description on some imagined scene (which could itself be related to some perceived scene, a process not shown in the figures).
Adding these processes to figure 1.6 gives us figure 1.7, which illustrates the relationships between these processes and the various image representations.

To avoid confusion between continuous and discrete images in this dissertation, we have chosen to call a continuous image an intensity surface. It is a function mapping each point in the real plane to an intensity value, and can be thought of as a surface in three dimensions, with intensity being the third dimension. We can thus take it that, when the word 'image' appears with no qualification, it means 'discrete image'.

1.4 Resampling: the geometric transformation of discrete images

Figure 1.8: The relationship between the resampled and the original images. The two continuous versions must be perceived as transformed versions of one another.

In image resampling we are given a discrete image and a transformation. The aim is to produce a second discrete image which looks as if it was formed by applying the transformation to the original discrete image. What is meant by 'looks as if' is that the continuous image generated from the second discrete image appears, to all intents and purposes, to be identical to a transformed version of the continuous image generated from the first discrete image, as illustrated in figure 1.8. What makes resampling difficult is that, in practice, it is impossible to generate the second image by directly applying the mathematical transformation to the first (except for a tiny set of special-case transformations). The similarity between the two images can be based on visual (perceptual) criteria or on mathematical criteria, as we shall see in section 3.5.
If based on perceptual criteria, then we should consider the perceptual images, which are closely related to the continuous images. If based on mathematical criteria, the perceptual images are irrelevant and we concentrate on the similarities between the continuous images and/or between the discrete images. Whatever the ins and outs of how we think about image resampling, it reduces to a single problem: determining a suitable intensity value for each pixel in the final image, given an initial image and a transformation (c.f. Fiume's definition of rendering, 1989, p.2).

1.5 The practical uses of resampling

Resampling initially appears to be a redundant process. It seems to beg the question: "why not re-capture or re-render the image in the desired format rather than resampling an already existing image to produce a (probably) inferior image?" Resampling is used in preference to re-capturing or re-rendering when it is either:

• the only available option; or
• the preferential option.

The first case occurs where it is impossible to capture or render the image that we require. In this situation we must 'make do' with the image that we have, and thus use resampling to transform the image that we have into the image that we want. The second case occurs where it is possible to capture or render the required image but it is easier, cheaper, or faster to resample from a different representation to get the required image.

Various practical uses of image resampling (this list is based on one by Braccini and Marino [1980, p.27]) are:

• distortion compensation of optical systems. Fraser, Schowengerdt and Briggs [1985] give an example of compensating for barrel distortion in a scanning electron microscope. Gardner and Washer [1948] give an example of how difficult it is to compensate for distortion using an optical process rather than a digital one (this example is covered in more detail in section 1.6).

• registration to some standard projection.
• registration of images from different sources with one another; for example, registering the different bands of a satellite image with one another for further processing.

• registering images for time-evolution analysis. A good example of this is recorded by Herbin et al [1989] and Venot et al [1988], where time-evolution analysis is used to track the healing of lesions over periods of weeks or months.

• geographical mapping; changing from one projection to another.

• photomosaicing; producing one image from many smaller images. This is often used to build up a picture of the surface of a planet from many photographs of parts of the planet's surface. A simpler example can be found in Heckbert [1989], figures 2.9 and 2.10.

• geometrical normalisation for image analysis.

• television and movie special effects. Warping effects are commonplace in certain types of television production, often being used to enhance the transition between scenes. They are also used by the movie industry to generate special effects. Wolberg [1990, sec. 7.5] discusses the technique used by Industrial Light and Magic, the special effects division of Lucasfilm Ltd. This technique has been used to good effect in, for example, the movies Willow, Indiana Jones and the Last Crusade, and The Abyss. Smith [1986, ch. 10] gives an overview of Industrial Light and Magic's various digital image processing techniques.

• texture mapping; a technique much used in computer graphics. It involves taking an image and applying it to a surface to control the shading of that surface in some way. Heckbert [1990b] explains that it can be applied so as to modify the surface's colour (the narrower meaning of texture mapping), its normal vector ('bump mapping'), or in a myriad of other ways. Wolberg [1990, p.3] describes the narrower sense of texture mapping as mapping a two-dimensional image onto a three-dimensional surface, which is then projected back onto a two-dimensional viewing screen.
Texture mapping provides a way of producing visually rich and complicated imagery without the overhead of having to explicitly model every tiny feature in the scene [Smith, 1986, p.208]. Image resampling, as described in this dissertation, can be taken to refer to texture mapping in its narrow sense (that of mapping images onto surfaces). The other forms of texture mapping described by Heckbert [1990b] produce different visual effects and so the discussion will have little relevance to them.

[2] Based on a list by Braccini and Marino [1980, p.27].
[3] One character (Raziel) changes shape from a tortoise, to an ostrich, to a tiger, to a woman. Rather than simply cross-dissolve two sequences of images, which would have looked unconvincing, two sequences of resampled images were cross-dissolved to make the character appear to change shape as the transformation was made from one animal to another.
[4] The script called for a character (Donovan) to undergo rapid decomposition in a close-up of his head. In practice, three model heads were used, in various stages of decomposition. A sequence for each head was recorded and then resampled so that, in a cross-dissolve, the decomposition appeared smooth and continuous.
[5] At one point in the movie, a creature composed of seawater appears. Image resampling was used in the generation of the effect of a semi-transparent, animated object: warping what could be seen through the object so that there appeared to be a transparent object present.

1.6 Optical image warping

Image warping need not be performed digitally. The field of optical image warping has not been completely superseded by digital image warping [Wolberg, 1990, p.1]. However, optical systems are limited in control and flexibility. A major advantage of optical systems is that they work at the speed of light.
For example Chiou and Yeh [1990] describe an optical system which will produce multiple scaled or rotated copies of a source image in less than a millionth of a second. These images can be used as input to an optical pattern recognition system. There is no way that such images could be digitally transformed this quickly, but there is a limited use for the output of this optical process. The problems of using an optical system for correction of distortion in imagery are well illustrated by Gardner and Washer [1948]. In the early 1940s, Zeiss, in Germany, developed a lens with a large amount of negative distortion, speciﬁcally for aerial photography. The negative distortion was designed into the lens in order that a wide ﬁeld of view (130◦) could be photographed without the extreme loss of illumination which would occur near the edges of the ﬁeld if no distortion were present in the lens. A photograph taken with this lens will appear extremely distorted. To compensate, Zeiss developed (in parallel with the lens) a copying or rectifying system by which an undistorted copy of the photograph could be obtained from the distorted negative. For the rectiﬁcation to be accurate the negative has to be precisely centred in the rectifying system. Further, and more importantly, the particular rectifying system tested did not totally correct the distortion. Gardner and Washer suggest the possibility that camera lens and rectiﬁer were designed to be selectively paired for use (that is: each lens must have its own rectiﬁer, rather than being able to use a general purpose rectiﬁer) and that the lens and rectiﬁer they tested may not have been such a pair. However, they also suggest that it would be diﬃcult to actually make such a lens/rectiﬁer pair so that they accurately compensate for one another. The advent of digital image processing has made such an image rectiﬁcation process far simpler to achieve. 
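The digital rectification process can be sketched in code. The following is a minimal illustration only, not Zeiss's rectifier: it assumes a simple polynomial radial distortion model, r_d = r_u(1 + k·r_u²), and nearest-neighbour reconstruction; the function name, the coefficient `k`, and the model itself are illustrative assumptions, not taken from the sources cited above.

```python
def rectify(img, k, cx, cy):
    """Digitally rectify a radially distorted image by inverse mapping.

    For each pixel of the output (undistorted) image, apply the assumed
    distortion model to find where that point fell on the distorted
    negative, then reconstruct a value there (nearest-neighbour, for
    brevity).  `k` is the radial distortion coefficient of the model
    r_d = r_u * (1 + k * r_u**2); negative k gives the negative
    (barrel-like) distortion discussed in the text.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            r2 = dx * dx + dy * dy
            scale = 1 + k * r2            # distortion model: r_d = r_u (1 + k r_u^2)
            sx = cx + dx * scale          # position on the distorted negative
            sy = cy + dy * scale
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:   # points mapping outside are left black
                out[y][x] = img[iy][ix]
    return out
```

Note that, unlike an optical rectifier, changing the distortion model here means changing one line of arithmetic rather than grinding a new lens.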
Further, digital image processing allows a broader range of operations to be performed on images. One example of such processing power is in the Geosphere project, where satellite photographs of the Earth, taken over a ten month period, were photomosaiced to produce a cloudless satellite photograph of the entire surface of the Earth [van Sant, 1992][6].

[6] An interesting artifact is visible in the Antarctic section of this photomosaic: the Antarctic continent is so stretched by the projection used that resampling artifacts are visible there. However, the rest of the mosaic looks just like a single photograph — no artifacts are visible.

1.7 Decomposition

Image resampling is a complex process. Many factors come into play in a resampling algorithm. Decomposing the process into simpler parts provides a framework for analysing, classifying, and evaluating resampling methods, gives a better understanding of the process, and facilitates the design and modification of resampling algorithms. One of the most important features of any useful decomposition is that the sub-processes are independent of one another. Independence of the sub-processes allows us to study each in isolation and to analyse the impact of changing one sub-process whilst holding the others constant.

Figure 1.9: A two part decomposition of resampling into reconstruction and sampling (Image → Reconstruct → Intensity Surface → Sample → Image).

Figure 1.10: An example of a two-part decomposition (figure 1.9). The squares are the original image's sample points, which are reconstructed to make the intensity surface. The open circles are the resampled image's sample points, which are not evenly spaced in the reconstructed intensity surface, but are evenly spaced in the resampled image.

In chapter 7 we will discuss the equivalence
of resamplers; that is, how two dissimilar decompositions of a resampling method can produce identical results. The simplest decomposition is into two processes: reconstruction, where a continuous intensity surface is reconstructed from the discrete image data; and sampling, where a new discrete image is generated by taking samples oﬀ this continuous intensity surface. Figure 1.9 shows this decomposition. Mitchell and Netravali [1988] follow this view as a part of the cycle of sampling and reconstruction that occurs in the creation of an image. Take, for example, a table in a ray-traced image, texture-mapped with a real wooden texture. First, some wooden texture is sampled by an image capture device to produce a digital texture image. During the course of the ray-tracing this image is reconstructed and sampled to produce the ray-traced pixel values. The ray-traced image is subsequently reconstructed by the display device to produce a continuous image which is then sampled by the retinal cells in the eye. Finally, the brain processes these samples to give us the perceived image, which may or may not be continuous depending on quite what the brain does. These many processes may be followed in ﬁgure 1.7, except that image resampling is a single process there, rather than being decomposed into two separate ones. This decomposition into reconstruction and sampling has also been proposed by Wolberg [1990, p.10], Ward and Cok [1989, p.260], Parker, Kenyon and Troxel [1983, pp.31–32], and Hou and Andrews [1978, pp.508–509]. It has been alluded to in many places, for example: by Schreiber and Troxel [1985]. It is important to note that all of these people (with the exception of Wolberg) used only single point sampling in their resampling algorithms. This decomposition becomes complicated when more complex methods of sampling are introduced and Wolberg introduces other decompositions in order to handle these cases, as explained below. 
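The reconstruct/sample decomposition can be made concrete with a short sketch. This is a minimal one-dimensional illustration under stated assumptions (linear interpolation as the reconstructor, single point sampling, with the transform represented by the mapping from each new sample's index to its position on the reconstructed surface); it is not any particular author's method.

```python
def reconstruct(image):
    """Return a continuous intensity 'surface' (here a 1-D function)
    built from the discrete image by linear interpolation."""
    def surface(x):
        # clamp to the image's extent (a crude answer to edge effects)
        x = min(max(x, 0.0), len(image) - 1.0)
        i = int(x)
        i1 = min(i + 1, len(image) - 1)
        t = x - i
        return (1 - t) * image[i] + t * image[i1]
    return surface

def resample(image, position, n_out):
    """Two-part decomposition (figure 1.9): reconstruct, then point
    sample.  `position(j)` gives the location on the reconstructed
    surface at which output sample j is taken; the geometric transform
    is folded into these sampling positions."""
    surface = reconstruct(image)
    return [surface(position(j)) for j in range(n_out)]
```

For example, `resample([0, 10, 20], lambda j: j / 2.0, 5)` enlarges the three-sample image by a factor of two, yielding `[0.0, 5.0, 10.0, 15.0, 20.0]`.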
Figure 1.10 gives an example of a resampling decomposed into these two sub-processes.

A second simple decomposition into two parts is that espoused by Fraser, Schowengerdt and Briggs [1985, pp.23–24] and Braccini and Marino [1980, p.130]; it is also mentioned by Foley, van Dam, Feiner and Hughes [1990, pp.823–829] and Wolberg [1990]. The first part is to transform the source or destination pixels into their locations in the destination or source image (respectively). The second part is to interpolate between the values of the source pixels near each destination pixel to provide a value for each destination pixel. This decomposition is shown in figure 1.11.

Figure 1.11: A two part decomposition into transformation and interpolation (Image → Transform → Image → Interpolate → Image).

The interpolation step could be thought of as a reconstruction followed by a point sample at the (transformed) destination pixel, but this further decomposition is not explicit in this model of resampling[7]. Note that point sampling is assumed in the model of the resampling process and thus any more complicated sampling method cannot easily be represented.

Both of these two part models are valid and useful decompositions of resampling. However, both contain the implicit assumption that only single point sampling will be used. Many resampling processes do exclusively use single point sampling. However, those which do not will not fit into these models. A more general model of resampling is required if we are to use it to classify and analyse all resampling methods.

It may not be immediately apparent why these models cannot embrace other sampling methods. In the case of the second decomposition, there is no place for any other sampling method. Single point sampling is built into the model. The sampling is performed as part of the interpolation step, and the assumption is that interpolation is done at a single point from each sample value.
It is possible to simulate other forms of sampling by modifying the interpolation method, but such modification is transform-specific and thus destroys the independence of the two sub-processes in the model.

The first decomposition can be expanded to include other sampling methods, but there are difficulties. To illustrate this, let us look at exact-area sampling using a square kernel the size of one pixel. This resampling model (figure 1.9) has the transform implicitly built into the sampling stage. This works well with point sampling: a transformed point is still a point. With area sampling we have two choices: either we transform the centre of the pixel and apply the area sampler's kernel 'as is' (as in figure 1.12(a)), or we transform the whole area and apply the transformed kernel (as in figure 1.12(b)). The former method is equivalent to changing the reconstructor[8] in a transformation-independent way and then using point sampling. This is not what is desired. We would get the same results by retaining point sampling and using a different reconstructor. The latter method is what is wanted: the whole area is warped by the transformation. Unfortunately, the sampler is now transform-dependent (remember that transforming a point produces a point, so a point sampler is transform-independent: although the positions of the point samples change, the shape of the sampling kernel (a point) does not; in area sampling, the position and shape of the kernel change and thus the area sampler is transform-dependent). Therefore, neither method of expanding the reconstruct/sample decomposition is particularly helpful. To produce an all-encompassing decomposition we need to split resampling into three parts.

1.7.1 A three-part decomposition

Fiume [1989, pp.36–38; 1986, pp.28–30] discusses texture mapping using a continuous texture function defined over a texture space [see p.37 of Fiume, 1989].
He comments: “In practice, texture mapping consists of two distinct (but often interleaved) parts: the geometric texture mapping technique itself. . . and rendering”. Rendering is a sampling process (Fiume discusses rendering in great detail in his chapter 3 [Fiume, 1989, pp.67–121]).

[7] The 'interpolation' need not be a true interpolation but may be an 'approximation', that is a function which does not interpolate the data points but still produces an intensity surface which reflects the shape implied by the data points.
[8] A reconstructor is a reconstruction method.

Figure 1.12: Two ways of extending the two-part decomposition to cope with area sampling. The squares are the original image's pixels on a regular grid. The open circles are the resampled image's pixels at the locations they would be at if they were generated by taking point samples off the intensity surface (figure 1.10). As an example, we have taken a square area sampler centred at this point. This can be implemented in two ways: (a) transform-independent sampling, where the sample filter shape does not change; (b) transform-dependent sampling, where the sample filter shape changes.

Figure 1.13: Fiume's [1989] decomposition of resampling (Intensity Surface → Transform → Intensity Surface → Render (Sample) → Image). He does not, however, start with an image, but with an intensity surface. Some extra process is required to produce the intensity surface from an image.

Figure 1.13 shows Fiume's decomposition. Compare this with Mitchell and Netravali's 'reconstruct/sample' decomposition (figure 1.9). These two decompositions do not have the same starting point. Fiume is interested in the generation of digital images from scene specifications (the rendering process in figure 1.13), thus he starts with some continuous intensity surface.
He states that this continuous intensity surface could itself be generated procedurally from some scene specification, or it could be generated from a digital image, but gives no further comment on the process required to generate a continuous intensity surface from a digital image. The required process is reconstruction, as found in Mitchell and Netravali's decomposition. Fiume, however, splits Mitchell and Netravali's sampling step into two parts: the geometric transformation and the sampling, where 'sampling' can be any type of sampling, not just single point sampling. He comments: “The failure to distinguish between these [two] parts can obscure the unique problems that are a consequence of each.” The three part decomposition thus developed from these two-part decompositions is shown in figure 1.14.

Figure 1.14: The three part decomposition of resampling into reconstruction, transformation, and sampling (Image → Reconstruct → Intensity Surface → Transform → Intensity Surface → Sample → Image).

Figure 1.15: Smith's [1982] four part decomposition of resampling (Image → Reconstruct → Intensity Surface → Transform → Intensity Surface → Prefilter → Intensity Surface → Single Point Sample → Image).

1.7.2 Description of the parts

A discrete image, consisting of discrete samples, is reconstructed in some way to produce a continuous intensity surface. The actual intensity surface that is produced depends solely on the image being reconstructed and on the reconstruction method. There are many different reconstruction methods, so a given image can give rise to many different intensity surfaces.

This intensity surface is then transformed to produce another intensity surface. This is a well-understood and simple idea, but it is the heart of resampling. Resampling itself, the geometric transformation of one digital image into another (or, rather, the illusion of such a transformation, as described in section 1.4) is a complex process.
The transformation stage in this decomposition is applying that geometric transform to a continuous intensity surface, and is a much simpler idea. Mathematically an intensity surface can be thought of as a function, I(x, y), deﬁned over the real plane. The function value I(x, y) is the intensity at the point (x, y). When we say that the intensity surface is transformed we mean that every point, (x, y), in the real plane is transformed and that the intensity values move with the points. Thus a transform τ will move the point (x, y) to τ (x, y) and the transformed intensity surface, Iτ (x, y) will bear the following relation to the original intensity surface: Iτ (τ (x, y)) = I(x, y) This holds for one-to-many and one-to-one transforms. For many-to-one transforms some arbiter is needed to decide which of the many points mapping to a single point should donate its intensity value to that point (or whether there should be some sort of mixing of values). Catmull and Smith [1980] call this the ‘foldover’ problem because it often manifests itself in an image folding over on top of itself, perhaps several times. An example of such a transform is mapping a rectangular image of a ﬂag into a picture of a ﬂag ﬂapping in the breeze. Whatever the transform, the end result is a new intensity surface, Iτ (x, y). The ﬁnal stage in this decomposition is to sample oﬀ this new surface to produce a new, resampled image. Any sampling method can be used, whether it is single point, multiple point, or area (ﬁltered) sampling. This decomposition is general. All three parts are essential to a resampling method. It is our hypothesis that virtually any resampling method can be decomposed into these three stages. Appendix A lists the resamplers discussed in around ﬁfty of the resampling research papers, and decomposes them into these three parts. It is, however, possible to decompose resampling further. 
The two four-part decompositions which have been proposed are described in the following section.

1.7.3 Further decomposition

The two decompositions discussed here have more parts than those discussed above and are not as general as the three-part decomposition. They can each be used to classify only a subset of all resamplers, although in the first case this subset is large.

Smith and Heckbert's four part decomposition

Smith [1982] proposed this decomposition in the early eighties, and Heckbert [1989, pp.42–45] subsequently used it with great effect in his development of the idea of a 'resampling filter'.

Figure 1.16: The super-sampling, or postfiltering, decomposition (Image → Reconstruct → Intensity Surface → Transform → Intensity Surface → Point Sample → Image → Postfilter → Image).

Smith's decomposition is shown in figure 1.15. It arose out of sampling theory, where it is considered desirable to prefilter an intensity surface before point sampling to remove any high frequencies which would alias in the sampling process. The two stages of prefilter and single point sample replace the single sample stage in the three-part decomposition.

Not all samplers can be split easily into a prefilter followed by single point sampling. Such a split is appropriate for area samplers, which Heckbert [1989] uses exclusively [ibid., sec. 3.5.1]. For multiple point samplers this split is inappropriate, especially for methods such as jittered sampling [Cook, 1986] and adaptive super-sampling. Heckbert [1989, pp.43–44] slightly alters Smith's model by assuming that the reconstruction is one that can be represented by a filter. The majority of reconstructors can be represented as reconstruction filters, but a few cannot and so are excluded from Heckbert's model. Two examples of these exceptions are the sharp edge reconstruction method described in section 6.4, and Olsen's [1990] method for smoothing enlarged monochrome images.
However, the ideas of a reconstruction filter and a prefilter are important ones in the development of the concept of a resampling filter. This decomposition is useful for those resamplers where the filter idea is appropriate. See section 1.8.2 for a discussion of resampling filters.

The super-sampling or postfiltering decomposition

This decomposition is only appropriate for resamplers which use point sampling. It consists of the same reconstruction and transformation stages as the three-part decomposition, with the sampling stage replaced by point sampling at a high resolution and then postfiltering the high resolution signal to produce a lower resolution output [Heckbert, 1989, pp.46–47]. This is illustrated in figure 1.16. It can be considered to be a decomposition into two separate resampling processes. In the first, the transformed intensity surface is sampled at high resolution, while in the second (the postfiltering) this high resolution image is resampled to a low resolution image.

Area sampling does not fit into this model, although high resolution single point sampling and postfiltering is used as an approximation to area sampling. As the number of points tends to infinity, point sampling plus postfiltering tends towards area sampling, but using an infinite number of point samples is not a practicable proposition. Certain types of multi-point sampling do not fit into this model either, notably adaptive super-sampling. Therefore this decomposition, while useful for a subset of resampling methods, is not useful as a general decomposition.

1.7.4 Advantages of the three-part decomposition

Our view is that the three-part decomposition is the most useful general decomposition for the study of resampling. This is justified below. Accordingly, this thesis is divided in much the same way, with chapters associated with each of the three parts. This decomposition is fairly intuitive to a person taking an overview of all resampling methods, but is not explicitly stated elsewhere.
Wolberg [1990, p.6] mentions “the three components that comprise all geometric transformations in image warping: spatial transformations, resampling and antialiasing”; but these are not the three parts we have described, and it is odd that 'resampling' (the whole process) is considered to be a part of the whole by Wolberg. Foley et al [1990, p.820], on the other hand, describe a decomposition as follows: “we wish to reconstruct the abstract image, to translate it, and to sample this translated version.” This is our three-part decomposition (figure 1.14), but for translation rather than general transformation. Other transforms naturally follow; Foley et al discuss these without explicitly generalising the decomposition.

The three-part decomposition into reconstruct, transform and sample has several advantages over other decompositions:

Independence of parts. Most resamplers can be split into these three independent parts so that changing any one part does not affect the other two. This means that neither the reconstructor nor the sampler is dependent on the transform, and that the reconstructor is the same for all parts of the original image and the sampler the same for all parts of the resampled image. In any resampling method, some constituents of the method will depend on position in relation to the original sampling grid and some in relation to the resampled sampling grid. Placing the former in the reconstructor and the latter in the sampler makes them independent of whatever transform is applied. The transform itself is left in splendid isolation as a simple mathematical process.

There exist resampling methods where the sampler or reconstructor cannot be made independent of the transform because some part of one or the other depends on both the original and the resampled images' sampling grids.
One such method is Williams’ [1983] pyramidal resampling scheme where the level in the pyramid is selected based on how an area in the resampled image plane maps into the original image plane. The choice of level for any given point (whether in the original or resampled image’s plane) is transform-dependent and thus either the reconstructor or the sampler (or perhaps both) must be transform-dependent. The transform itself still remains a simple mathematical process of mapping one plane into the other (for further discussion of Williams’ method see section 7.1.1.) It is helpful for analysis of reconstruction and sampling methods that they be transformindependent. It is also necessary that, having discussed and analysed these sub-processes, the resampling process as a whole is considered to see the overall eﬀect produced by all three sub-processes. It is useful to be able to alter one sub-process without aﬀecting or being aﬀected by either of the other sub-processes. This aids immensely in analysing the sub-processes and various resampling methods. Equivalence of sampling grids. By extracting the transformation as a separate step between reconstruction and sampling we allow the original and resampled images’ sampling grids to be identical. The main problem with the two-part reconstruct/sample decomposition (ﬁgure 1.9) is that the sampling takes place on a non-uniform grid. As pointed out in the discussion of this decomposition, this non-uniform sampling makes area sampling (and some point sampling) a complex business. Further, virtually all sampling theory and practice is directed at sampling on a regular grid (even the stochastic sampling methods are based on regular sampling grids). Keeping the transform separate from the sampler allows the sampler to be based on a regular sampling grid. The fact that the sampling grids which produce the original and resampled images are identical, simpliﬁes and uniﬁes the discussion and analysis of the sampling process. 
It also relieves us of the need to consider transform-dependent sampling methods. These change as one moves around the (non-uniform) sampling grid and are tricky to deal with. By allowing us always to use a uniform sampling grid, the three-part decomposition frees us from this difficult area and makes for clear, simple theory. A resampler will be spatially-variant, except in the case of very simple transformations (affine transforms and scaling transforms). This decomposition removes the spatial variance from the reconstruction and sampling stages so that we need only consider their spatially invariant cases, greatly simplifying the analysis of resampling techniques.

Figure 1.17: The original image (at top) is reconstructed to give an intensity surface. This is then transformed to produce the transformed intensity surface, which is sampled, giving the resampled image. Reconstruction and sampling cross between the discrete world and the continuous world; in a digital computer the entire process is performed in the discrete world, shown by the dotted arrow in this figure.

Separation of format conversion. Reconstruction and sampling are opposites. Reconstruction converts a discrete image into a continuous intensity surface. Sampling converts a continuous intensity surface into a discrete image. Resampling is the complex process of geometrically transforming one digital image to produce another (see figure 1.8). This decomposition replaces that transformation with the conceptually much simpler transformation from one continuous intensity surface to another, and it adds processes to convert from the discrete image to the continuous intensity surface (reconstruction) and back again (sampling). Thus the decomposition separates the format conversion (discrete to continuous, and continuous to discrete) from the transform (the core of the process). This whole process is illustrated in figure 1.17.
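The independence of the three parts can be made concrete in code. The sketch below is a minimal illustration under stated assumptions (nearest-neighbour reconstruction, single point sampling on a unit grid, and the transform supplied as its inverse so that the transformed surface can be evaluated anywhere); each of the three arguments can be swapped without touching the other two.

```python
def nearest_reconstruct(image):
    """Reconstructor: build a continuous intensity surface from a
    discrete image by nearest-neighbour reconstruction."""
    h, w = len(image), len(image[0])
    def surface(x, y):
        i = min(max(int(round(y)), 0), h - 1)   # clamp: crude edge handling
        j = min(max(int(round(x)), 0), w - 1)
        return image[i][j]
    return surface

def point_sample(h, w):
    """Sampler: single point sampling on a regular h-by-w unit grid."""
    def sample(surface):
        return [[surface(x, y) for x in range(w)] for y in range(h)]
    return sample

def resample(image, reconstruct, transform_inverse, sample):
    """Three-part decomposition with the parts kept independent:
    `reconstruct` sees only the original image, `sample` sees only a
    continuous surface, and the transform sits between them."""
    surface = reconstruct(image)                                   # discrete -> continuous
    transformed = lambda x, y: surface(*transform_inverse(x, y))   # continuous -> continuous
    return sample(transformed)                                     # continuous -> discrete
```

For example, a translation one pixel to the right is obtained with `transform_inverse = lambda x, y: (x - 1, y)`, and replacing `nearest_reconstruct` with a bilinear reconstructor changes neither the transform nor the sampler.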
Generality of the decomposition. Almost all resamplers can be split into these three parts, but decomposing resampling into more parts limits the types of resamplers that can be represented, as illustrated in section 1.7.3. Appendix A lists many of the resamplers that have been discussed in the literature and decomposes them into these three parts.

1.7.5 One-dimensional and multi-pass resampling

Multi-pass resampling can mean a sequence of two-dimensional resampling stages to produce the desired result. An example of this is the decomposition in figure 1.16, which can be thought of as one resampling followed by another. Such methods fit easily into our framework by considering them to be a sequence of resamplings, with each resampling being decomposed individually into its three component stages.

Multi-pass resampling, however, usually refers to a sequence of two or more one-dimensional resamplings. For example, each horizontal scan-line in the image may be resampled in the first pass and each vertical scan-line in the second. The overall effect is that of a two-dimensional resampling. Catmull and Smith [1980] deal with this type of resampling; other researchers have suggested three-pass [Paeth, 1986, 1990] and four-pass [Weiman, 1980] one-dimensional resampling.

Figure 1.18: The decomposition of resampling for a two-pass one-dimensional resampler. The whole resampling decomposes into two sets of resamplings (one over all horizontal scan lines, one over all vertical scan lines), one resampling per scan line. Each scan line's resampling then decomposes into the three parts, as before: Scan Line → Reconstruct → Intensity Curve → Transform → Intensity Curve → Sample → Scan Line.

To include such resampling methods in our framework we note that the overall resampling consists of two or more resampling stages.
In each of these stages, each one of a set of scan-lines undergoes a one-dimensional resampling which can be decomposed into reconstruction, transformation and sampling. An example of this decomposition for a two-pass resampling is given in figure 1.18. Thus the decomposition is quite valid for multi-pass one-dimensional resampling methods. Indeed, many two-dimensional reconstructors are simple extrapolations of one-dimensional ones (for example: see section 3.4.2).

Multi-pass, one-dimensional resampling tends to be faster to implement than the same two-dimensional resampling. In general, though, the one-dimensional version produces a more degraded resampled image than the two-dimensional one. Fraser [1989b] and Maisel and Morden [1981] are amongst those who have examined the difference in quality between one-pass, two-dimensional and multi-pass, one-dimensional resampling.

1.7.6 Three dimensions and mixed dimensional cases

Resampling theory can be simply extended to three or more dimensions (where an intensity volume or hyper-volume is reconstructed). There are still the same three sub-processes. Multi-pass resampling also generalises: three-dimensional resampling, for example, can be performed as a multi-pass one-dimensional process [Hanrahan, 1990]. Finally, some resamplers involve the reconstruction of an intensity surface (or volume) of higher dimension than the image. One example is the 1½-dimensional reconstruction of a one-dimensional image, discussed in section 7.2. Another is Williams' [1983] resampler: reconstructing a three-dimensional intensity volume from a two-dimensional image. This is discussed in detail in section 7.1.1.

Figure 1.19: How resampling is implemented when it involves point sampling.
For each point sample required in the process, the point is inverse transformed into the original image's space and a reconstruction calculation performed for the point from the original image's samples in the region of the inverse transformed point.

1.8 How it really works

The theory is that a digital image is reconstructed to a continuous intensity surface, which is transformed, and then sampled on a regular grid to produce a new digital image (figure 1.14). In practice resampling cannot be done this way because a digital computer cannot contain a continuous function. Thus practical resampling methods avoid the need to contain a continuous function, instead calculating values from the original image's data values only at the relevant points. Conceptually, the usual method of implementation is to push the sampling method back through the transformation so that no continuous surface need be created. Whilst this process is conceptually the same for an area sampler as for a point sampler, the former is practically more complicated and so we will discuss the two cases separately.

1.8.1 The usual implementation of a point sampler

Each point sample that must be taken is taken at a single point on the transformed intensity surface. This point usually corresponds to exactly one point on the original intensity surface. Remember that for many-to-one transformations some arbiter must be included to decide which one of the many takes priority, or to decide how to mix the many reconstructed intensities to produce a single intensity[9]. The value of the intensity surface at this point (or points) in the original image can then be computed from the original image's data values. This process is illustrated in figure 1.19. For a single point sampler this value is assigned to the appropriate pixel in the final image. For a multi-point sampler, several such samples are combined in some way to produce a pixel value.
Thus to get the value of a point sample the following process is used:

1. Inverse transform the location of the point from the final image's space to the original image's space.
2. Calculate the value of the intensity surface at this point in the original image's space.

9 The situation where a final intensity surface point has no corresponding original intensity surface point is discussed in the section on edge effects (section 4.2).

Figure 1.20: How resampling is implemented when it involves area sampling. For each area sample required in the process, the area is inverse transformed into the original image's space and a reconstruction calculation performed for the whole area from the original image's samples in the region of the inverse transformed area.

Figure 1.21: An equivalent decomposition of an area sampler. Rather than use a transform-invariant area sampler, here we use an inverse transformed prefilter, followed by inverse transformed single point sampling. These two steps combine to produce the same effect as a transform followed by area sampling.

1.8.2 The usual implementation of an area sampler

Analogous to the point sampling case, the area of the sample is inverse transformed (taking with it, of course, the weighting information for every point in the area). This inverse transformed area will be spatially-variant because it is transformation-dependent: the same shaped area in different parts of the final image's space will inverse transform to differently shaped areas in the original image's space.
Somehow this spatially-variant area sampler must be combined with the spatially-invariant reconstructor to produce a calculation which will give a single value for each final image pixel. The process is illustrated in figure 1.20. Heckbert [1989] explains how this combination can be achieved in the case where the reconstructor can be represented as a filter:

Heckbert's resampling filter

An area sampler can be thought of as a filter (called a 'prefilter') followed by single point sampling. Thus the three-part decomposition becomes four-part (see figure 1.15). If both sampling stages are pushed back through the transform we get the situation shown in figure 1.21. The space-invariant reconstruction filter can now be convolved with the generally space-variant inverse transformed prefilter to produce what Heckbert calls the resampling filter, shown in figure 1.22. The right-hand process is a point sampler and so the intensity surface values need only be generated at those points, in a similar way to the point sampling case. Thus the area sampling resampler can be implemented as a point sampling resampler.

Figure 1.22: Heckbert's [1989] resampling filter. The whole resampling is decomposed into two stages for implementation: a resampling filter composed of the convolution of the reconstruction filter and the inverse transformed prefilter, and inverse transformed single point sampling.

1.8.3 Comparison

The major difference between a point and an area sampling process is that the calculation to produce a value is space-invariant in the former case and generally space-variant in the latter. Only in the case of an affine transform will the area-sampling case produce a space-invariant calculation [Heckbert, 1989, p.46, sec. 3.4.4]. The added complexity of space-variance makes area sampling much more difficult to implement than point sampling.
This disadvantage must be weighed against the advantages of area sampling.

1.8.4 Other implementations

Whilst those shown here are the most common implementations, it is possible to generate others. For those, relatively rare, reconstructors which cannot be represented as filters, Heckbert's 'resampling filter' is not a viable implementation. Some other way of combining the reconstructor and area sampler would need to be found if such a reconstructor were to be used with an area sampler (the alternative is that this reconstructor could only ever be used with point sampling methods).

On a different tack, resamplers could be found for which the following implementation would prove possible: the original digital image is analysed to produce a digital 'image description' which describes an intensity surface. This image description is transformed, producing another image description which describes the transformed intensity surface. This image description can then be rendered, potentially by any sampler, to produce the final image (see figure 1.23). Such reconstruction methods are discussed in chapter 6.

1.8.5 Comments

However we think of a particular resampling process (whether as a single process; as a resampling filter that is point sampled; as reconstruction followed by sampling; or as reconstruct, transform, sample), it is the same resampler producing the same results. The implementation may appear totally different to the theoretical description, but the results are the same.

1.9 The literature

Digital image resampling is useful in many fields and so the literature relating to it is scattered across many disparate places. Medical imaging, remote sensing, computer vision, image processing and computer graphics all make use of resampling. The practitioners in the various fields all have their own views on the subject, driven by the uses to which they put resampling. Thus we find, for example, that
in one field (computer graphics), where distortion can be very high in resampled imagery, greater emphasis is placed on sampling methods than in another field (registering medical images) where distortion is mild [Heckbert, 1989; Wolberg, 1990].

Figure 1.23: An alternative method of resampling. The image is analysed to give a scene description, which is transformed, and the transformed version rendered to produce the resampled image.

Digital image resampling has its beginnings in the early '70s. Catmull [1974] was the first in the field of computer graphics to work on texture mapping, when he discovered a way to map images onto curved surfaces. Rifman, McKinnon, and Bernstein [Keys, 1981; Park and Schowengerdt, 1983] did some of the first work on resampling for correction of remotely sensed image data (satellite imagery) around the same time. However, none of this work made its way into the standard textbooks on image processing [Pratt, 1978; Gonzalez and Wintz, 1977; Rosenfeld and Kak, 1982] or computer graphics [Foley and van Dam, 1982], which were all produced around 1980. The exception to this is Newman and Sproull's [1979] "Principles of Interactive Computer Graphics", but even they only mention the subject in passing [ibid., p.265]. The image processing texts do spend time on image restoration, which is related to a priori knowledge reconstruction (chapter 6), but none on resampling.

It was in the eighties that far more interest was paid to digital image resampling (only three of the 54 references listed in appendix A were published before 1980). The rush of publications was probably due to increased computer power and increased interest in raster displays, both for geometrical correction of digital images and for generation of realistic synthetic images.
Heckbert [1986a] presented a good survey of texture mapping as it was to date. He built on this foundation in his master's thesis [Heckbert, 1989], which is a valuable reference for those working in texture mapping. The work on image resampling reached a milestone in 1990. In that year the first book on the subject [Wolberg, 1990] was published, and the new standard texts [Foley et al., 1990; Pratt, 1991] included sections on image resampling and its related topics.

"Digital Image Warping" by George Wolberg [1990] is the only existing book on digital image resampling. Prior to its publication the only source of information on image resampling was a myriad of articles scattered through technical journals and conference proceedings across half a dozen fields. Wolberg's book draws much of this material into one place, providing a "conceptually unified, interdisciplinary survey of the field" [Heckbert, 1991a]. Wolberg's book is not, however, the final word on image resampling. His specific aim was to emphasise the computational techniques involved rather than any theoretical framework [Wolberg, 1991]. "Digital Image Warping" has its good points [McDonnell, 1991] and its bad [Heckbert, 1991b], but as the only available text it is a useful resource for the study of resampling (for a review of the book see Heckbert [1991a]; the discussion sparked by this review may be found in Wolberg [1991], McDonnell [1991] and Heckbert [1991b]).

The most recent standard text, Foley, van Dam, Feiner and Hughes [1990] "Computer Graphics: Principles and Practice", devotes considerable space to image resampling and its related topics. Its predecessor [Foley and van Dam, 1982] made no mention at all of the subject. This reflects the change in emphasis in computer graphics from vector to raster graphics over the past decade.
The raster display is now the most common 'soft-copy' output device [Sproull, 1986; Fournier and Fussell, 1988], cheap raster displays having been developed in the mid nineteen-seventies as an offshoot of the contemporary television technology [Foley and van Dam, 1982]. Today it is true to say that "with only a few specialised exceptions, line [vector] graphics has been subsumed by... raster graphics" [Fiume, 1986, p.2].

A number of the myriad articles on resampling are analysed in appendix A. The resamplers in each article have been decomposed into their three constituent parts, which are listed in the table there. This produces some insights into research in resampling. Over half the articles consider only single point sampling, concentrating instead on different reconstructors or transforms. Many articles examine only one of the three parts of the decomposition, keeping the other two constant. A significant number consider only a scaling transform and single point sampling. Others concentrate on the transforms. Only a minority consider the interaction between all parts of the resampler, and these are mostly in the computer graphics field, where distortions tend to be highest and hence the resampler has to be better (the higher the distortion the harder it is to resample well).

Most of the resamplers surveyed fit neatly into our three-part decomposition, having three independent parts. A few fit less neatly, in that the three parts cannot be made independent. This latter set of resamplers are those in which area sampling is approximated. They depend on an approximation, in original image space, to an area in resampled image space transformed back into original image space. Because of the approximation either the reconstructor or the sampler must be transformation-dependent. Only one of the resamplers in appendix A does not fit well into the decomposition: Namane and Sid-Ahmed's [1990] method of binary image scaling.
Their method cannot be said to produce any kind of continuous intensity surface and has thus proven difficult to fit into the three-part decomposition. One 'resampler' not listed in the table is Kornfeld's [1985, 1987] image prism. This device performs only a limited set of orthogonal transformations which exactly map pixels onto pixels, and therefore no resampling is required.

In the following chapters many of these articles will be referred to in the discussion of the three parts of resampling. At the end we will draw the strands together and consider resampling as a whole.

1.10 Summary

The concept of digital image resampling has been introduced. It is a process which finds uses in many fields. We have seen how it fits into the larger framework of operations on images, and how the continuous image we view differs from the discrete image that the computer operates on. Several decompositions of resampling have been considered. We favour the three-part decomposition into reconstruction, transformation, and sampling. The remainder of this dissertation is organised around this decomposition. We concluded the chapter by considering how a resampler can be implemented, and by briefly looking at the history and literature of digital image resampling. We now go on to consider the third of the sub-processes: sampling.

Chapter 2

Sampling

Sampling is the process of converting a continuous intensity surface into a discrete representation. As well as being the third sub-process of resampling, it is the process by which the original image is generated. Thus sampling is important in its own right as a sub-process, and also in how it affects the original image and subsequent reconstruction from that image. Conventional sampling theory deals with a regular grid of point samples, with each sample in the image being generated by a single one of these point samples. In this dissertation 'sampling' refers to many sampling methods besides this single point sample method.
In this chapter we discuss these sampling methods and their applicability to various sampling tasks.

2.1 Converting the continuous into the discrete

Sampling takes a continuous signal, one defined over all space, and produces a discrete signal, one defined over only a discrete set of points. In practice the two signals are only defined within a region of interest. Discussion of what lies outside this region is left until chapter 4. In our case, the continuous signal is an intensity surface; that is, a three-dimensional function with two spatial dimensions and one intensity dimension, where each point in the spatial plane has a single intensity. When dealing with colour, the number of dimensions increases, typically to three colour dimensions (but still only two spatial dimensions). When dealing with one or three spatial dimensions the situation also changes, but the underlying principle is the same: for every spatial point there is a single intensity or colour. An intensity surface is a many-to-one mapping from a spatial domain into an intensity (or colour) domain.

The discrete signal in our case is an image. It has the same number of spatial and intensity (colour) dimensions as the intensity surface but is discrete in the spatial domain. A digital computer cannot represent a continuous quantity and so the image will be discrete in intensity as well. In practice we find that spatial discreteness is central to the resampling problem, whilst intensity discreteness is a peripheral problem which can, for the most part, be safely ignored. Why this is so is discussed in the next section.

2.2 The problem of discrete intensities

Ideally, infinite precision in recorded intensity is desirable; no errors would then arise from slightly imprecise intensity values. In practice, intensities are recorded to finite precision, which leads to errors in the reconstructed intensity surface, whether on the screen or in the resampling process.
These errors are most noticeable when only a few discrete intensity levels are available, and most image processing texts include an example showing the effect on a displayed image of allowing a larger or smaller number of intensity levels [see, for example, Rosenfeld and Kak, 1982, pp.107–8, figs. 14 and 15; Gonzalez and Wintz, 1977, fig. 2.8].

Figure 2.1: The minimum noticeable intensity difference (ΔI/I) as a function of intensity (I) [Pratt, 1991, fig. 2.3–1(a)]. It is 2% for much of the range, rising to higher values for very dark or very light intensities.

The extreme case is where there are only two levels: black and white. Such an image takes comparatively little storage (one bit per pixel) but has a limited usefulness. With digital half-toning techniques it can be made to simulate a wide range of grey shades but, for resampling, it is preferable to have true shades of grey rather than simulated ones. The general case, then, is one where a significant number of discrete intensity levels are used to represent a continuous range of intensities. But how many intensity levels are sufficient?

The minimum necessary number of intensity levels

The human visual system is limited in how small an intensity change it can detect. Research suggests that, for two large areas of constant intensity, a two percent difference can just be detected [Crow, 1978, p.4]. The minimum difference that can be detected rises to a higher value for very dark or very bright areas [Pratt, 1978, pp.17–18] (see figure 2.1). It also rises when comparing small areas of constant intensity [ibid., pp.18–19, fig. 2.5].
When dealing with colour images the minimum noticeable differences for pure colour information (that is, with no intensity component) are much larger than those for intensity information; hence broadcast television has an intensity channel with high spatial resolution and two channels carrying the colour information at low spatial resolution [NMFPT, 1992]. Here, we shall consider only intensity.

The number of intensity levels required to produce a faithful representation of an original intensity surface depends on the image itself [Gonzalez and Wintz, 1977, pp.27–28] and on the type of reconstruction. If we wish to display any image on a 'reasonable' display device then how many levels are required? Many researchers have answered this question and their results are given below. We are working with binary digital computers and so, sensibly, most of the answers are powers of two.

Crow [1978, p.4] says that between six and eight bits (64 to 256 intensity levels) are required, depending on the quality of the display. He also calculates that the typical display would need about 162 intensity levels. This calculation is reproduced and explained later. Gonzalez and Wintz [1977, pp.19 and 26] suggest that 64 levels (six bits) may be sufficient in some cases but say that "to obtain displays that will appear reasonably smooth to the eye for a large class of image types, a range of over 100 intensity levels is generally required." Rosenfeld and Kak [1982, p.106] also suggest that more than 100 levels may be needed for some applications. Finally, Foley and van Dam [1982, p.594] and Pavlidis [1982, p.39] both agree that 64 levels are generally adequate but that 256 levels may be needed for some images (Pavlidis hints that 256 levels may not be enough for a small number of images).

Figure 2.2: The minimum noticeable intensity difference (ΔI/I) as a function of intensity (I); in region A ΔI/I is higher than 2.35%. So long as the darkest possible intensity falls in this region, 1024 linearly spaced intensity levels will be sufficient for all images (after [Pratt, 1991, fig. 2.3–1(a)]).

The consensus appears to be that many images require only 64 intensity levels and virtually none need more than 256. If we wish all images that we could possibly display to exhibit no artifacts due to intensity quantisation then we must take into account those images with large areas of slowly changing intensity at the dark end of the intensity range. Crow [1978, p.4] performs a 'back of the envelope' calculation to see how many intensity levels are required, given the two percent minimum perceivable intensity difference mentioned earlier. He notes that the most intense spot that a typical cathode ray tube (CRT) can produce is 25 times more intense than the dimmest. If the intensity levels are exponentially distributed (that is, each is two percent brighter than the previous level) then about log 25 / log 1.02 ≈ 163 intensity levels are required.

However, for work in image resampling it is important that the intensities are distributed linearly. This means that the average of two intensities is a third intensity that is perceptually halfway between the two. Thus a checkerboard of squares in any two intensities, viewed from sufficient distance that the individual checks are imperceptible, will appear like a plane of constant intensity of the average of the two intensities. This property is important because virtually all resampling work involves taking the weighted average of a number of intensities and this must produce the perceptually correct result. (An exponential distribution could be used if it were transformed to the linear domain before any calculations and transformed back afterwards. This is a tremendous overhead for a resampling operation and should thus be avoided.)
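Crow's back-of-the-envelope figure is easy to check; the sketch below (illustrative only, with the 25:1 dynamic range and two percent just-noticeable difference taken from the text) counts the exponentially spaced levels:

```python
import math

dynamic_range = 25.0  # brightest CRT spot is 25 times the dimmest
jnd = 0.02            # two percent minimum perceivable difference

# Each level is 2% brighter than the previous, so n steps span the
# range when 1.02**n = 25, i.e. n = log 25 / log 1.02.
steps = math.log(dynamic_range) / math.log(1.0 + jnd)
print(round(steps))  # about 163 intensity levels
```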
With linearly distributed intensity levels all adjacent levels are the same distance apart, say ΔI, in intensity space. If, as above, the brightest level is 25 times as bright as the darkest (taking the darkest level to have intensity 1) and the total number of levels is n, then ΔI will be:

    ΔI = (25 − 1) / (n − 1)

ΔI must be such that the second darkest level is two percent brighter than the darkest; therefore, in the above formula, ΔI = 0.02. This means that the number of levels, n, is around 1201. 1200 is considerably more than the 256 quoted earlier, and is slightly more than 210 (1024). Ten bits should, however, be sufficient, because the minimum perceivable intensity difference for the darkest levels is known to be higher than two percent. With 1024 levels the difference between the darkest and second darkest levels is 2.35%. This should be small enough provided that the darkest intensity on the screen is dark enough to be in the area of the response curve which rises above the two percent level (region A in figure 2.2). With 256 levels (eight bits) the difference is 9.41%, which may be too large. Observations on an eight bit, linearly distributed display show that it is too large: false contouring can be seen in areas of slowly varying dark intensity.

Blinn [1990, p.85] notes the same effect in digital video. In his discussion of digital video format D1 he says: "Digital video sounds like the world of the future, but I understand there's still a bit of a problem: 8 bits don't really give enough resolution in the darker areas to prevent contouring or banding." Murch and Weiman [1991] have performed experiments which show that ten is the maximum number of bits required to make intensity differences between adjacent levels imperceptible, with the possible exception of CRTs with an extremely high maximum intensity.
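The linear-spacing arithmetic above can be checked with a short sketch (illustrative; the darkest level is normalised to intensity 1 and the helper name is ours):

```python
def darkest_step(n_levels, dynamic_range=25.0):
    """Relative brightness difference between the darkest and second
    darkest of n_levels linearly spaced levels spanning [1, dynamic_range]."""
    # Adjacent levels are (dynamic_range - 1)/(n_levels - 1) apart;
    # relative to the darkest level (intensity 1) this is also the
    # fractional step at the dark end of the range.
    return (dynamic_range - 1.0) / (n_levels - 1)

# Levels needed so the step at the dark end is two percent:
n = (25.0 - 1.0) / 0.02 + 1
print(round(n))                            # 1201
print(round(darkest_step(1024) * 100, 2))  # 2.35 (per cent)
print(round(darkest_step(256) * 100, 2))   # 9.41 (per cent)
```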
For any image without large areas of slowly varying dark intensities, ten bits of intensity information is too much, and so, for many images, if the intensity values are stored to a precision of eight bits, few visible errors will result from the quantisation of intensity. If they are stored to a precision of ten bits or more, practically no visible errors will arise.

2.2.1 Extra precision

Resampling involves the manipulation of intensity values. As these values are used in calculations, a little extra precision may be desirable to absorb any numerical inaccuracies which arise. This is illustrated in the following diagram:

[Diagram: the stored precision divided into high-order bits for human visual consumption and low-order bits to absorb numerical errors.]

In calculations it is also possible for intensities to be produced that are 'whiter than white'. Such supra-white intensities have values brighter than the brightest that can be represented. Extra bits should be provided at the top of the intensity value to catch such overflow, otherwise the calculation can wrap around and produce spurious intensities. Underflow will produce negative intensities which, while physically unrealisable, are a possible outcome of resampling calculations. Thus the intensities need to be stored as a signed quantity to catch such sub-black intensities, for the same reasons we needed to catch supra-white values: a sign bit must be included. This amends our diagram to:

[Diagram: a sign bit, overflow bits, bits for human visual consumption (displayed), and absorption bits.]

Such a representation cannot catch all overflows, underflows, or numerical inaccuracies, but it will help immensely in catching most of them. As a practical example, our experimental image processing system has a generous bit allocation scheme.
It uses 32 bit integers and an 8 bit display: overflow is allocated seven bits plus the sign bit, absorption is given fourteen bits, and the lowest two of the ten bits required for correct display cannot be displayed:

[Diagram: of the 32 bits, from most to least significant: 1 sign bit, 7 overflow bits, 10 bits required for human visual consumption (of which only the top 8 are displayed), and 14 absorption bits.]

Figure 2.3: The three image creation processes subdivided. On the right is the image that is created. In each case it is generated from an intensity surface by a sampling sub-process. This intensity surface is generated in a different way for each creation process.

To conclude: with these precautions it can be seen that few, if any, visible errors or artifacts will arise from intensity quantisation. Thus the fact that the intensity values are quantised can be largely ignored in our discussion. We may now proceed to discuss the more important problem of spatial discreteness.

2.3 Producing a discrete image

Sampling occurs in three distinct processes: image capture, image rendering and image resampling. These are the three image creation processes described in section 1.3. In each of these a continuous intensity surface is sampled to produce a discrete image. The continuous intensity surface is generated in a different way in each process but the sampling is fundamentally the same in all three cases, as illustrated in figure 2.3. In image capture some physical device is used to sample a real world scene. Such devices include video cameras, scanners and fax machines. Figure 2.3 divides the image capture process into two parts.
This subdivision can be thought of in two ways: (a) the theoretical two-dimensional intensity surface is generated by the optics of the capture device and occurs immediately in front of the light sensors (figure 2.4(a)); (b) the theoretical two-dimensional intensity surface exists just in front of the entire capture device, and the entire device, optics and light sensors included, forms part of the sampling process (figure 2.4(b)). Some capture devices have no optics, in which case (a) and (b) are identical.

Figure 2.4: The two ways of subdividing the image capture process: (a) a theoretical two-dimensional intensity surface is generated by the optics of the capture device, just before it is captured by the light sensors; (b) a theoretical two-dimensional intensity surface is assumed to occur just before the entire capture device, and this intensity surface is sampled by the whole device.

(b) has the disadvantage that depth effects caused by the optics of the system cannot be modelled as pointing the capture device at a two-dimensional intensity surface, because depth effects (for example, depth of field) depend on the three-dimensional layout of the real-world scene. On the other hand (a) models all effects, because the intensity surface is placed after the optics which process the scene into a two-dimensional function. In (b) the intensity surface could reasonably be thought of as a two-dimensional window into the three-dimensional scene. The big disadvantage of (a) is that part of the capture device is included in the sampling process and part is not. It is far preferable for the entire device to be included in the sampling process, as all parts of the device can affect the samples that are produced.
For this reason (b) is the preferred way of thinking of this theoretical division of capture into two parts. Note that atmospheric effects are here part of the scene, not of the sampling process.

Wolberg and Boult [1992] discuss image creation, subdividing the process into five parts: warping before entering the capture device, blurring by the optics of the capture device, warping by the optics of the capture device, blurring by the light sensors, and single point sampling. This subdivision can be placed in our diagram as follows: the warping before entering the device is considered part of the real world scene; the blurring and warping by the optics, the blurring by the light sensors, and the single point sampling are all part of the sampling sub-process. Alternatively, if we were to use (a) rather than (b), the blurring and warping caused by the optics would be part of the process of generating the two-dimensional intensity surface, and only the blurring by the image sensors and the single point sampling would be included in the sampling process. This latter assumption is used by Boult and Wolberg in their generation of a reconstructor which corrects for the capture device's sampling method (see section 6.1).

Image rendering takes a digital scene description and generates a continuous, usually three-dimensional, scene from this description. As in image capture, this three-dimensional scene is projected to a theoretical two-dimensional intensity surface which is then sampled. Again, if the sampling method generates depth effects then this model is not totally accurate. However, it is a useful approximation. In image resampling there are no problems with depth effects: the entire process takes place in two dimensions. In cases where resampling is used to generate part of a scene in a rendering process (the dashed arrow in figure 2.3) depth effects may come into play. These three processes cover the three ways of generating an image.
Image processing in general (with the exception of image resampling) does not involve the creation of an intermediate intensity surface. Virtually all image processing operations carry a direct correspondence between input and output pixels and can work by a discrete convolution process [see, for example, Fang, Li and Ni, 1989, for a discussion of discrete and continuous convolution]. Image resampling, by comparison, does not have this direct correspondence, and some process is required so that output pixels can be positioned anywhere in relation to the input pixels. In our three-part model this process is provided by reconstructing an intensity surface, transforming this, and sampling the transformed version to produce a new image.

We have seen that sampling is a very similar process for the three distinct operations of image capture, image rendering, and image resampling. It is thus important for its part in image resampling and for its part in creating the images that are resampled. We now go on to discuss this sampling process.

2.4 The sampling process

The aim of sampling is to generate pixel values so that they "best represent" the two-dimensional intensity surface [Foley et al., 1990, p.619]. This is a good concept, but what is meant by 'best represent'? To some it may mean that the samples obey classical sampling theory. To others it may mean that the resulting image, when displayed on a certain device, is visually as similar as possible to the two-dimensional intensity surface. These two ideas are not necessarily the same thing. In the subsequent sections we discuss both of these ideas and the tension between them.

2.4.1 Classical sampling theory

The roots of sampling theory go back to 1915, with Whittaker's work on interpolation of a set of equispaced samples. However, most people attribute the sampling theorem to Shannon [1949], who acknowledges a debt to many others in its development.
Attribution of the theorem has been jointly given to Whittaker and Shannon [Gonzalez and Wintz, 1977, p.72], to Shannon and Nyquist [Turkowski, 1986], and to Shannon, Kotel’nikof and Whittaker [Petersen and Middleton, 1962, p.279]. Shannon’s statement of the sampling theorem is:

If a function f(t) contains no frequencies higher than W cps it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart. [Shannon, 1949, Theorem 1]

(The theory laid out by Shannon [1949] and others is for one-dimensional sampling only; Petersen and Middleton [1962] extended Shannon’s work to many dimensions. Here cps = cycles per second ≡ Hertz.)

Mathematically, the sampling process is described as the product of f(t) with the comb function:

comb(t) = Σ_{n=−∞}^{∞} δ(t − nT)

where δ(t) is the Dirac delta function: it is zero everywhere except at t = 0, and the area under it is unity, that is ∫_{−∞}^{∞} δ(t) dt = 1. One definition of the Dirac delta function is δ(t) = lim_{a→∞} √a e^{−aπt²}. This gives the sampled function:

f̂(t) = f(t) Σ_{n=−∞}^{∞} δ(t − nT) = Σ_{n=−∞}^{∞} f(nT) δ(t − nT)   (2.1)

This is not the same as:

f̂_n = f(nT)   (2.2)

Equation 2.1 is a continuous function which consists of an infinite sum of weighted, shifted Dirac delta functions and is zero everywhere except at t = nT, n ∈ Z. Equation 2.2 is a discrete function which is defined on the set n ∈ Z.

Figure 2.5: The sampling process: (a) the continuous spatial domain function, f(x), has a Fourier transform, F(ν), bandlimited to below half the sampling frequency, 1/(2T); (b) when it is sampled (f(x) × comb(x)) its Fourier transform is convolved with Comb(ν), producing replicas of the original Fourier transform at a spacing of 1/T.
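By way of numeric illustration (this sketch is ours, not part of the original text; the sampling rate and frequencies are arbitrary choices), point-sampling a sinusoid that lies above half the sampling frequency produces samples indistinguishable from those of a lower-frequency alias:

```python
import math

def sample(f_hz, fs_hz, n_samples):
    # Point-sample sin(2*pi*f*t) at t = n/fs: multiplication by the comb function
    return [math.sin(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]

fs = 1.5  # sampling rate; the folding frequency is fs/2 = 0.75 cps
above = sample(1.0, fs, 8)    # 1.0 cps lies above the folding frequency...
alias = sample(-0.5, fs, 8)   # ...so its samples match those of 1.0 - 1.5 = -0.5 cps
assert all(abs(a - b) < 1e-9 for a, b in zip(above, alias))
```

The two sample sequences agree because their arguments differ by exact multiples of 2π, which is precisely the aliasing described above.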
The values of the discrete function are the weights on the delta functions that make up the continuous function, that is:

f̂(t) = Σ_{n=−∞}^{∞} f̂_n δ(t − nT)

In a computer we, of course, store the discrete version; but mathematically and theoretically we deal with the continuous one. The sampling theorem can be justified by considering the function in the frequency domain. The continuous spatial domain function, f(t), is bandlimited; that is, it contains no frequencies higher than ν_b (W in the statement of Shannon’s theorem above). Its Fourier transform, F(ν), is thus zero outside the range (−ν_b, ν_b). Sampling involves multiplying f(t) by comb(t). The equivalent operation in the frequency domain is to convolve F(ν) with Comb(ν), the Fourier transform of comb(t). Comb(ν) is composed of Dirac delta functions at a spacing of 1/T (for a proof of this see Marion [1991, pp.31–32]). Convolving this with F(ν) produces replicas of F(ν) at a spacing of 1/T. Figure 2.5 illustrates this process. If T < 1/(2ν_b) then the copies of F(ν) will not overlap, and the original F(ν) can be retrieved by multiplying F̂(ν) by a box function:

Box(ν) = 1, |ν| < ν_b; 0, otherwise

This removes all the copies of the original except for the copy centred at ν = 0. As this is the original frequency domain function, F(ν), the spatial domain function will also be perfectly reconstructed.

Figure 2.6: An example of sampling at too low a frequency. Here we see the frequency domain representations of the functions. The original function, (a), is not bandlimited to within half the sampling frequency, and so when it is sampled, (b), the copies overlap and add up, producing the function shown by the dark line. It is impossible to recover the original function from this aliased version.
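The non-overlap condition can be checked numerically (again our sketch; the rate and frequency are arbitrary): a 5 cps sine sampled at 32 samples per second lies well below the folding frequency, so its discrete Fourier transform shows energy only at the expected pair of frequencies, with no overlapping copies:

```python
import numpy as np

fs = 32.0                        # samples per second; spectral copies repeat every 1/T = 32 cps
t = np.arange(64) / fs           # two seconds of samples
x = np.sin(2 * np.pi * 5.0 * t)  # 5 cps is below the folding frequency fs/2 = 16 cps
spectrum = np.abs(np.fft.fft(x))
peaks = set(np.nonzero(spectrum > 1.0)[0])
assert peaks == {10, 54}         # energy only at +/-5 cps (bins 10 and 64 - 10)
```

Because the signal is bandlimited well inside (−fs/2, fs/2), the baseband copy of its spectrum is untouched by the replicas.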
Multiplying by a box function in the frequency domain is equivalent to convolving in the spatial domain by the box’s inverse Fourier transform, s(x). This can be shown to be s(x) = 2ν_b sinc(2ν_b x), where sinc(x) = sin(πx)/(πx) (a proof of this can be found in appendix B.1). The sinc function is discussed in sections 3.3 and 4.3. If T ≥ 1/(2ν_b) then the copies of F(ν) will overlap (figure 2.6). The overlapping parts sum to produce F̂(ν). There is no way that this process can be reversed to retrieve F(ν), and so f(t) cannot be perfectly reconstructed. If F̂(ν) is multiplied by the box function (the perfect reconstructor) then the resulting function is as if F(ν) had been folded about the frequency 1/(2T) and summed (figure 2.7). For this reason ν = 1/(2T) is known as the folding frequency. The effect of this folding is that the high frequencies in F(ν) alias into low frequencies. This causes artifacts in the reconstructed image which are collectively known as ‘aliasing’. In computer graphics the term aliasing is usually incorrectly used to refer to both aliasing and rastering artifacts [Pavlidis, 1990]. This point is picked up in section 2.7. Figure 2.8 gives an example of aliasing. It normally manifests as unsightly ripples, especially near sharp changes in intensity. To avoid aliasing we must ensure that the sampled intensity surface is bandlimited to below the folding frequency. If it is not, then it can be prefiltered to remove all information above this frequency. The procedure here is to multiply the intensity surface’s Fourier transform by a box function, a process known as bandlimiting the intensity surface. Foley et al [1990, fig. 14.29] give an example of this procedure. This prefiltering followed by point sampling is equivalent to an area sampling process, as is explained later. These procedures are mathematically correct and produce an image free of aliasing. However, they have their drawbacks.
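For adequately sampled data, convolving the samples with the sinc kernel recovers values between the sample points. The sketch below is our own illustration (T = 1, and the sum is truncated to the finite set of available samples, which is exactly the practical compromise discussed later in the dissertation):

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, x):
    # f(x) = sum_n f(nT) sinc((x - nT)/T), here with T = 1, truncated to the
    # samples we actually have
    return sum(f_n * sinc(x - n) for n, f_n in enumerate(samples))

samples = [math.sin(2 * math.pi * 0.05 * n) for n in range(200)]  # 0.05 cps << 0.5 cps
# sinc interpolation passes exactly through the sample points...
assert abs(reconstruct(samples, 50) - samples[50]) < 1e-12
# ...and, far from the truncated ends, closely recovers in-between values
midpoint = reconstruct(samples, 100.5)
assert abs(midpoint - math.sin(2 * math.pi * 0.05 * 100.5)) < 0.05
```

The residual error at the midpoint comes entirely from truncating the infinite sum, foreshadowing the edge-effect discussion of chapter 4.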
Firstly, if an intensity surface has to be prefiltered then the image does not represent the original intensity surface but rather the filtered one. For certain applications this may be undesirable. To represent a flat or linearly sloped intensity surface, infinite frequencies are required. Bandlimiting prevents such surfaces from being perfectly represented, and so any ‘flat’ part of a bandlimited image will be ripply. An example of this effect is that any discontinuity in a band-unlimited intensity surface will have a 9% overshoot either side if it is bandlimited, no matter what the bandlimit is [Lynn and Fuerst, 1989, p.145]; ripples will also propagate out from the discontinuity, reducing in magnitude as they get farther away. Thus a bandlimited intensity surface is, by its very nature, ripply. This can be seen in figure 2.12(a): there is no aliasing in this figure; the intensity surface is inherently ripply due to being bandlimited. The human visual system abhors ripples; they seriously degrade an image. If, however, the ripples in the reconstructed intensity surface are undetectable by the human eye then the surface looks very good. These points are picked up in chapter 7.

Figure 2.7: An example of aliasing. (a) is the frequency spectrum of the original image. If the original image is sampled and then reconstructed by multiplying its frequency spectrum by a box function then (b) is the result. The copies of the spectrum overlap and add up, as if F(ν) in (a) had been folded back about 1/(2T) and summed.

Figure 2.8: An example of a reconstructed intensity surface which contains aliases. These manifest themselves as ripples around the sharp edges in the image. The small image of the letter ‘P’ is the original. The large image is an intensity surface reconstructed from this small original using an approximation to sinc reconstruction.
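The 9% overshoot figure can be verified numerically (our sketch; the number of terms and the sampling grid are arbitrary choices). A truncated Fourier series of a ±1 square wave overshoots the discontinuity by roughly 9% of the jump, however many terms are kept:

```python
import math

def square_partial(t, n_terms):
    # Truncated Fourier series of a square wave that jumps from -1 to +1 at t = 0
    return (4 / math.pi) * sum(math.sin((2 * k + 1) * t) / (2 * k + 1)
                               for k in range(n_terms))

# scan just past the discontinuity for the first (largest) ripple
peak = max(square_partial(i * 1e-4, 50) for i in range(1, 2000))
overshoot = (peak - 1.0) / 2.0   # as a fraction of the jump from -1 to +1
assert 0.08 < overshoot < 0.10   # the ~9% (Gibbs) overshoot
```

Increasing `n_terms` narrows the ripples but leaves the overshoot fraction essentially unchanged, which is why no choice of bandlimit removes it.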
More importantly, it is practically impossible to achieve perfect prefiltering, and so some aliasing will creep in. In an image capture device some (imperfect) prefiltering will occur, but the device does not perform perfect prefiltering. Fraser [1987] asserts that most people depend on this prefiltering to bandlimit the intensity surface enough that little aliasing occurs. In the digital operations of rendering and resampling, perfect prefiltering is simply impossible, ensuring that some aliasing always occurs. This is because perfect prefiltering involves either continuous convolution by an infinite sinc function or multiplication of the continuous Fourier transform by a bandlimiting function. Both of these operations are impossible in a discrete computer. Finally, perfect reconstruction is also practically impossible because it requires an infinite image. Thus, whilst it would be theoretically possible to exactly recreate a bandlimited intensity surface from its samples, in practice it is impossible. In fact, most display devices reconstruct so badly that correct sampling can be a liability. Sampling which takes into account the reconstruction method is the topic of section 2.8. Before discussing this, however, we need to examine the various types of sampling. These fall into two broad categories: area samplers and point samplers.

2.5 Area vs point sampling

Any sampling method will fall into one of these two categories. Area sampling produces a sample value by taking into account the values of the intensity surface over an area. These values are weighted somehow to produce a single sample value. Point sampling takes into account the values of the intensity surface only at a finite set of distinct points⁴. If the set contains more than one point then these values must be combined in some way to produce a single sample value.
This is not the usual description of point sampling, because ‘point sampling’ is usually used in its narrow sense: to refer to single point sampling. Fiume [1989, section 3.2.4] studied the various sampling techniques; the following sections draw partly on his work.

2.5.1 Area sampling

Exact-area sampling. The assumptions behind exact-area sampling are that each pixel has a certain area, that every part of the intensity surface within that area should contribute equally to the pixel’s sample value, and that any part of the intensity surface outside that area should contribute nothing. Hence, if we make the common assumption that pixels are abutting squares, then for a given pixel the sample value produced by the exact-area sampler will be the average intensity of the intensity surface within that pixel’s square (figure 2.9(a)). Alternatively, we could assume that pixels are abutting circles [Durand, 1989] or overlapping circles [Crow, 1978] and produce the average value of the intensity surface over the relevant circular area (figure 2.9(b) and (c)). Obviously other assumptions about pixel shape are possible. Exact-area sampling is very common, and is discussed again when we look at sampling methods which take into account how the image will be reconstructed (section 2.8).

⁴Fiume [1989, p.82] states that the set of point samples should be countable or, more generally, a set of measure zero.

Figure 2.9: Three common assumptions about pixel shape: (a) abutting squares, (b) abutting circles, and (c) overlapping circles.

General area sampling. The general case for area sampling is that some area is chosen, inside which intensity surface values will contribute to the pixel’s sample value. This area is usually centred on the pixel’s centre. Some weighting function is applied to this area and the pixel’s value is the weighted average over the area. Exact-area sampling is obviously a special case of this.
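An exact-area sampler for abutting square pixels can be sketched as follows (our illustration: the area integral is approximated by a regular grid of point samples, and a general area sampler would simply weight the grid values):

```python
def exact_area_sample(surface, px, py, n=64):
    # Average the intensity surface over the unit square of pixel (px, py),
    # approximating the integral with an n x n grid of point samples.
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += surface(px + (i + 0.5) / n, py + (j + 0.5) / n)
    return total / (n * n)

# A surface that is 1 on the left half of every pixel and 0 on the right half:
stripes = lambda x, y: 1.0 if (x % 1.0) < 0.5 else 0.0
assert abs(exact_area_sample(stripes, 0, 0) - 0.5) < 1e-12
```

Every pixel of the striped surface averages to mid-grey, regardless of where the stripe edges fall within the pixel grid.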
Equivalence to prefiltering. Area sampling is equivalent to prefiltering followed by single point sampling. In area sampling, a weighted average is taken over an area of the intensity surface. With prefiltering, a filter is convolved with the intensity surface and a single point sample is taken for each pixel from this filtered intensity surface. To be equivalent, the prefilter must be the same function as the area-sample weighting function. The two concepts are different ways of thinking about the same process. Looking back to chapter 1, Heckbert [1989] proposed that the sampling sub-process be decomposed into prefiltering and single point sampling. Whilst this is entirely appropriate for all area samplers, not all point samplers can be represented in this way, because some point samplers cannot be represented by filters. Specific examples are adaptive super-samplers and stochastic point samplers, discussed below.

2.5.2 Point sampling

Unlike an area-sampling process, a point sampler ignores all but a countable set of discrete points to define the intensity value of a pixel. Point sampling comes in two flavours: regular and irregular (or stochastic) point sampling. Wolberg [1990, sections 6.2 and 6.3] summarises the various types of point sampling; here we only outline them.

Single point sampling. In single point sampling, the samples are regularly spaced and each pixel’s sample value is produced by a single point. In fact this is exactly the method of sampling described in classical sampling theory (section 2.4.1). All other point sampling methods are attempts to approximate area sampling methods or attempts to reduce artifacts in the reconstructed intensity surface (often they attempt to be both).

Super-sampling. As in single point sampling, the samples are arranged in a regular fashion, but more than one sample is used per pixel. The pixel’s intensity value is produced by taking a weighted average of the sample values taken for that pixel.
This can be seen as a direct approximation to area sampling. Fiume [1989, p.96, theorem 6] proves that, given a weighting function, a super-sampling method using this weighting function for its samples converges, as the number of samples increases, toward an area sampling method using the same weighting function. Such a super-sampling technique can be represented as a prefilter followed by single point sampling; the adaptive super-sampling method, on the other hand, cannot.

Adaptive super-sampling. Adaptive super-sampling is an attempt to reduce the amount of work required to produce the samples. In order to produce a good quality image, many super-samples need to be taken in areas of high intensity variance, while a few samples give good results in areas of low intensity variance. Ordinary super-sampling would need to take many samples in all areas to produce a good quality image, because it applies the same sampling process for every pixel. Adaptive super-sampling takes a small number of point samples for a pixel and, from these sample values, ascertains whether more samples need to be taken to produce a good intensity value for that pixel. In this way fewer samples need to be taken in areas of lower intensity variance and so less work needs to be done.

Stochastic sampling. Stochastic, or irregular, sampling produces a sample value for each pixel based on one or more randomly placed samples. There are obviously some bounds within which each randomly placed sample can lie. For example, it may be constrained to lie somewhere within the pixel’s area, or within a third of a pixel width from the pixel’s centre. The possible locations may also be constrained by some probability distribution so that, say, a sample point has a greater chance of lying near the pixel centre than near its edge. Stochastic methods cannot be represented as a prefilter followed by single point sampling, because of the random nature of the sample locations.
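A minimal sketch of a stochastic point sampler (ours; the sample count, seed, and test surface are arbitrary choices). With uniformly distributed sample positions, the average converges toward the pixel’s exact-area (box-weighted) sample as the number of samples grows:

```python
import random

random.seed(1)

def stochastic_sample(surface, n=4096):
    # Average of n uniformly random point samples within the unit pixel;
    # approaches the pixel's exact-area sample as n increases.
    return sum(surface(random.random(), random.random()) for _ in range(n)) / n

ramp = lambda x, y: x + y          # exact-area sample over the unit square is 1.0
assert abs(stochastic_sample(ramp) - 1.0) < 0.05
```

The estimate is noisy rather than systematically wrong, which is precisely the trade of regular for irregular artifacts discussed next.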
Stochastic sampling is in favour amongst the rendering community because it replaces the regular artifacts which result from regular sampling with irregular artifacts. The human visual system finds these irregular artifacts far less objectionable than the regular ones, and so an image of equal quality can be achieved with fewer stochastic samples than with regularly spaced samples (see Cook [1986], but also see Pavlidis’ [1990] comments on Cook’s work). Fiume [1989, p.98, theorem 7] proves that the choice of point samples stochastically distributed according to a given probability distribution will converge to the analogous area-sampling process as the number of points taken increases, a result similar to that for super-sampling.

2.5.3 Summary

Area sampling yields mathematically precise results, if we are attempting to implement some prefilter (section 2.4.1). However, in digital computations it may be impossible, or difficult, to perform area sampling because it involves continuous convolution. In resampling it is possible to implement some area samplers, at least approximately, but a general area sampler is practically impossible. The point sampling techniques have been shown to be capable of approximating the area sampling methods. Such approximations are necessary in cases where area sampling is impracticable [Fiume, 1989, p.102].

Figure 2.10: Two very similar images: (a) on the left, was generated by summing the first twenty-nine terms of the Fourier series representation of the appropriate square wave; (b) on the right, was generated by super-sampling a perfect representation of the stripes.

Figure 2.11: The Fourier transforms of the images in figure 2.10: (a) on the left and (b) on the right. The Fourier transforms are shown on a logarithmic scale. Note that the background is black (zero) in (a) but has non-zero, but small, magnitude in (b).

2.6 Practical considerations

An image capture device always implements some area sampling process.
There is usually some small change that can be made in the area sampler by, for example, changing the focus of the lenses or inserting a mask. Major changes involve the expensive process of designing a new device. Rendering and resampling occur in a digital computer and, for rendering, most area sampling techniques are analytically impossible and/or computationally intractable [Fiume, 1989, p.102]. Thus they cannot be implemented. The complexity of scenes in image rendering means that area sampling, where possible, can be extremely expensive to compute. Image rendering therefore makes almost exclusive use of point-sampling techniques.

Figure 2.12: The two images of figure 2.10 reconstructed using as near perfect reconstruction as possible: (a) on the left and (b) on the right. This figure shows part of the reconstructed intensity surface. Aliasing artifacts occur in (b) only.

Figure 2.13: The two images of figure 2.10 reconstructed using nearest-neighbour reconstruction: (a) on the left and (b) on the right. This figure shows part of the reconstructed intensity surface. (a) contains only rastering artifacts, while (b) contains a combination of rastering and aliasing artifacts.

Image resampling involves a simpler ‘scene’ and so both the computationally possible area samplers and the point samplers can be used. Heckbert [1989, sec. 3.5.1] dismisses point sampling techniques on the grounds that they do not eliminate aliasing. A super-sampling method shifts the folding frequency higher, but band-unlimited signals are common in texture mapping [ibid., p.47], and so, while super-sampling will reduce aliasing, it will not eliminate all aliases, no matter how many super-samples are taken. Heckbert thus considers only area sampling techniques. This argument is certainly valid for the surface texture mapping application discussed by Heckbert.
However, stochastic point sampling methods have done much to eliminate aliasing artifacts (by replacing them with less objectionable noise artifacts), and many people use point sampling methods in preference to area sampling methods simply because they are easier to implement. Further, in rendering work it is often expedient to treat all samples identically, and so, if texture mapping (a form of resampling) occurs in the scene, the samples will be taken in the same way as samples are taken in the rest of the scene: almost invariably by some point sampling technique. Wolberg [1990] covers the stochastic sampling methods, drawing on the rendering literature. However, Heckbert [1991a] found this material out of place in Wolberg’s book. We think that it is necessary because such methods are used for resampling work. Finally, it is useful to realise that many practitioners in resampling use single point sampling. It appears to be the default method. We believe that this is because single point sampling is (a) simple to implement; (b) the obvious sampling method; and (c) all that is required if the image need only undergo a mild transformation. This latter point draws on the fact that the aliasing (and other) artifacts which more sophisticated sampling methods suppress do not manifest themselves very much under a mild transformation (that is, scale changes of ±20% at most). Some applications of resampling only require such mild transformations (for example, the medical image registration system developed by Venot et al [1988] and Herbin et al [1989]), while others require more violent transforms (for example, texture mapping) and thus require more advanced sampling, especially in cases of image shrinking, where the sampling method dominates the resampler (see chapter 7).

2.7 Anti-aliasing

The purpose of all sampling, other than single-point sampling, is ostensibly to prevent any artifacts from occurring in the image.
In fact, it is usually used to prevent artifacts from occurring in the intensity surface reconstructed from the image by the display device. This prevention is generally known in computer graphics as anti-aliasing. This is something of a misnomer, as many of the artifacts do not arise from aliases. The perpetuation of the incorrect terminology is possibly due to the fact that the common solution to cure aliasing also alleviates rastering, the other main source of artifacts in images [Pavlidis, 1990, p.234]. Crow [1977] was the first to show that aliasing (sampling a signal at too low a rate) was the cause of some of the artifacts in digital imagery. He also notes [ibid., p.800] that some artifacts are due to failing to reconstruct the signal properly. This latter problem he termed ‘rastering’. ‘Aliasing’, however, became the common term for both of these effects, with some subsequent confusion, and it is only recently that the term ‘rastering’ has reappeared [Foley et al, 1990, p.641]. Mitchell and Netravali [1988] do discuss the two types of artifact as distinct effects but perpetuate the terminology by naming aliasing and rastering ‘pre-aliasing’ and ‘post-aliasing’ respectively. Sampling has thus been used to alleviate both types of artifact. This has an inbuilt problem: to correct for reconstruction artifacts (rastering), one needs to know the type of reconstruction that will be performed on the image. Most images are displayed on a CRT, and all CRTs have a similar reconstruction method, so this is not too big a problem. However, when displayed on a different device (for example, a film recorder) artifacts may appear which were not visible on the CRT, because the reconstruction method is different. Further, an image used for resampling which has been ‘anti-aliased’ for use on a CRT may not be particularly amenable to the reconstruction method used in the resampling algorithm.
Figures 2.10 through 2.13 illustrate the distinction between aliasing artifacts and reconstruction artifacts. Two images are shown of alternate dark and bright stripes. These images were specially designed so that no spurious information due to edge effects would appear in their discrete Fourier transforms (that is, the Fourier transforms shown here are those of these simple patterns replicated to infinity so as to fill the whole plane). Figure 2.10(a) was generated by summing the first twenty-nine terms of the Fourier series representation of the appropriate square wave, as can be seen from its Fourier transform (figure 2.11(a)). Figure 2.10(b) was generated by one of the common anti-aliasing techniques. It was rendered using a 16 × 16 super-sampling grid on each pixel, with the average value of all 256 super-samples being assigned to the pixel. Figure 2.11(b) shows its Fourier transform. It is similar in form to figure 2.11(a), but the aliasing can be clearly seen in the wrap-around effect of the line of major components, and also in that the other components are not zero, as they are in the unaliased case. When these are reconstructed using as near-perfect reconstruction as possible we get the intensity surfaces shown in figure 2.12. The ripples in figure 2.12(a) are not an aliasing artifact but the correct reconstruction of the function; it is, after all, a sum of sine waves. The ripply effect in figure 2.12(b) is due to aliasing. The intensity surface that was sampled had constant shaded stripes with infinitely sharp transitions between the dark and light stripes. The perfectly reconstructed intensity surface shown here perfectly reconstructs all of the aliases caused by sampling the original intensity surface. By contrast, figure 2.13 shows the intensity surfaces which result from an imperfect reconstructor: the nearest-neighbour interpolant. The artifacts in figure 2.13(a) are entirely due to rastering (the image contains no aliasing).
This is the familiar blocky artifact so often attributed, incorrectly, to aliasing. The artifacts in figure 2.13(b) are due to a combination of aliasing and rastering. Oddly, it is this latter intensity surface which we tend to find intuitively preferable. This is probably due to the areas perceived as having constant intensity in figure 2.10(b) retaining this constant intensity in figure 2.13(b). How these intensity surfaces are perceived does depend on the scale to which they are reconstructed. The ‘images’ in figure 2.10 are of course intensity surfaces, but are reconstructed to one eighth the scale of those in figure 2.12 and figure 2.13. If one stands far enough back from these larger-scale figures then they all look identical. It is fascinating that the intensity surface with both types of artifact in it appears to be the preferred one. This is possibly due to two facts: (a) the human visual system is good at detecting intensity changes, hence rippling is extremely obvious; and (b) most scenes consist of areas of constant intensity or slowly varying intensity separated by fairly sharp edges, hence representing them as a bandlimited sum of sinusoids will produce what we perceive as an incorrect result: most visual scenes simply do not contain areas of sinusoidally varying intensity. Thus there is a tension between the sampling theory and the visual effect of the physical reconstruction on the display device. Indeed many researchers, instead of turning to the perfect sampling theory method of sampling, turn instead to methods which take the reconstruction process into account.

2.8 Sampling which considers the reconstruction method

In such sampling the reconstruction is usually onto a display device, rather than to any theoretical intensity surface. Several models of the display device have been used in the design of sampling methods. The common assumption is that each pixel represents a finite area on the screen and therefore should be area sampled from that area. This leads to the exact-area sampling method given in section 2.5.1. Such an assumption can be found in many papers on resampling; for example, Weiman [1980], Paeth [1986], Fant [1986], and Durand and Faguy [1990] all assume square (or rectangular) abutting pixels. Paeth [1990, pp.194–5] later revised his earlier work to allow for more sophisticated sampling methods. Durand [1989] used abutting circular pixels, which seems a less obvious choice, particularly as parts of the sampled intensity surface are not included in the sum for any pixel. His reasoning was to reduce the number of calculations in his resampling method (compared with abutting rectangular pixels). This assumption can also be found in Kiryati and Bruckstein [1991], but they are modelling a physical sampling device, not a physical reconstruction. Foley et al [1990, pp.621–3] show that this exact-area sampling method is inadequate for sequences of images. They suggest weighted area sampling functions which cover a larger area than the pixel (for example, a circularly symmetric tent function centred at the pixel centre). Finally, it is well known that a CRT display reconstructs an image by first convolving it with a nearest-neighbour function (the effect of the digital to analogue converter) and then with a Gaussian (the effect of the spot in the CRT tube) [Blinn, 1989b]. Crow [1978] used this type of model in his sampling method. He says that: “A properly adjusted digital raster display consists of regularly spaced dots which overlap by about one-half.” His conclusion was that a pixel sampled over such an area with a weighting function that peaked in the centre and fell off to the edges gave reasonable results. We thus find that these methods are motivated by the reconstruction method rather than any sampling theory. Fortunately, they fit neatly into the sampling theory concept of prefiltering (that is, area sampling).
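The CRT model just described (sample-and-hold by the digital to analogue converter, then a Gaussian spot) can be sketched in one dimension as follows; this is our illustration, and the spot width `sigma` is an assumed value, not one from the text:

```python
import numpy as np

def crt_profile(pixels, oversample=16, sigma=0.5):
    # Nearest-neighbour stage: the DAC holds each pixel value over its width.
    held = np.repeat(np.asarray(pixels, dtype=float), oversample)
    # Gaussian stage: the CRT spot, with sigma in pixel widths (assumed value),
    # truncated to +/-2 pixel widths and normalised to unit area.
    xs = np.arange(-2 * oversample, 2 * oversample + 1) / oversample
    spot = np.exp(-xs ** 2 / (2 * sigma ** 2))
    spot /= spot.sum()
    return np.convolve(held, spot, mode='same')

profile = crt_profile([0.0, 1.0, 0.0, 1.0, 0.0])
# A normalised, non-negative spot cannot push intensities outside the input range
assert profile.min() >= -1e-9 and profile.max() <= 1.0 + 1e-9
```

Widening `sigma` trades sharpness for smoothness, which is exactly why a sampler designed for this reconstructor weights the pixel centre most heavily.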
A final important class of samplers are those which take into account the very limited range of intensity levels of a particular image format or display device. Examples of such samplers are those which perform half-toning (for example, the PostScript half-toning algorithm, illustrated in figure 1.2) and those which use dithering to reduce quantisation error effects. These methods cater for devices with a small number of intensity levels, and so do not produce images for which the assumption made in section 2.2 will hold. Reconstructing an intensity surface from such an image can produce bizarre results unless intelligence is built into the reconstruction algorithm, but they are useful where the target of the resampling is a device with a limited range of intensity levels. The PostScript image command [Adobe, 1985a, p.170, pp.72–83] essentially implements an image resampling process which uses a nearest-neighbour reconstruction technique and a half-toning sampling technique to decide which pixels are white and which are black in the image in the printer’s memory. This resampled image is then reconstructed by the printer when it produces the picture on paper.

2.9 Summary

Sampling is the process of converting a continuous intensity surface into a discrete image. Sampling is used in all three processes (capturing, rendering, and resampling) which generate a discrete image, and is fundamentally the same in all three cases. Quantisation of intensity does not pose a problem, provided enough quantisation levels are available. There are two basic types of sampling: area, where a sample’s value is determined from the intensity surface’s values over some non-zero area; and point, where a sample’s value is determined from the intensity surface’s values at only a finite set of points. The main artifact caused by sampling is aliasing, not to be confused with the rastering artifact caused by reconstruction.
Chapter 3: Reconstruction

Reconstruction is the process of producing a continuous intensity surface from a discrete image. The produced intensity surface should follow the implied shape of the discrete data. In this chapter we first present several ways of classifying reconstructors, introducing several concepts which are used in later work. From there we look at skeleton, super-skeleton and Dirac delta function reconstruction methods before turning our attention to more conventional reconstructors. The first of these is the ‘perfect’ interpolant: the sinc function. A brief glimpse at the theory of perfect interpolation is given here as a precursor to the discussion of piecewise local polynomials which takes up the remainder of this chapter. We return to perfect interpolation in chapter 4. The piecewise local polynomial reconstructors are widely used, simple, and easy to implement. The sections in this chapter cover the first four orders of piecewise local polynomial (up to cubic) and briefly consider the extension to higher orders. There is also a substantial discussion of the criteria involved in designing a particular piecewise local polynomial, which has application to all reconstructors. Beyond this chapter lie three more on reconstruction. The first considers edge effects, infinite extent reconstructors, and local approximations to these; the second examines the fast Fourier transform reconstructors; and the third develops some issues in a priori knowledge reconstruction. We begin by presenting some classifications for reconstructors and introduce much of the terminology which is used throughout these four chapters.

3.1 Classification

There are several ways of classifying reconstructors. The following are some of the main distinctions which can be drawn between reconstruction methods.

3.1.1 A priori knowledge

The reconstruction process is influenced by the sampling method which generated the discrete image.
How we assume the samples were created affects how we create our reconstruction method. How the samples were actually created, in collusion with our reconstruction method, affects the result. If these two sampling methods are not the same then the reconstructed intensity surface may not be what we intended. In many cases the sampling method is unknown, or we may choose to ignore what we do know about it. In such a situation, where all that is known about the image are the image sample values (and their relative positions), we must use a reconstructor which only takes those values into consideration. We call such a reconstructor a ‘no a priori knowledge’ (NAPK) reconstructor. That is, no knowledge is utilised except for the image sample values. Such a reconstructor can thus be applied to any image. If we have some a priori knowledge about the image (that is, knowledge that is separate from the image data itself) then we can potentially produce a better reconstructed intensity surface than a NAPK reconstructor could. Such a priori knowledge could be about the sampling method used, the content of the image, or both. A reconstructor which uses such knowledge we call an ‘a priori knowledge’ (APK) reconstructor. APK reconstruction is discussed in chapter 6.

3.1.2 Reconstruction filters

Reconstruction methods can be classified by whether or not they can be represented by a filter. Many reconstruction methods can be expressed as filters. A filter is some continuous function which is convolved with the sample values to produce the continuous intensity surface. In dealing with images it is important to remember that there are two ways of thinking about them. The first is to think of an image as it is stored in the computer: as a finite array of intensity values, f_{i,j}, where (i, j) is the co-ordinate of the pixel with value f_{i,j}.
The second way is to consider the image to be a continuous function consisting of an infinite sum of equispaced, weighted Dirac delta functions. For a square pixel array, for example:

    ι(x, y) = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} f_{i,j} δ(x − i, y − j)

ι(x, y) is a continuous function which is non-zero only at integer locations[1]. It is this latter, continuous, function with which the reconstruction filter, h(x, y), is convolved to produce the continuous intensity surface:

    f(x, y) = h(x, y) ∗ ι(x, y)

h(x, y) is referred to as the filter's impulse response, the filter's kernel, or sometimes simply as the filter itself. A problem with this is that the mathematical image function, ι(x, y), is infinite in extent whilst the discrete function, f_{i,j}, is finite in extent. This means that all of the weights in ι(x, y) which lie beyond the edges of the discrete image have an indeterminate value. The solutions to this edge effect problem, how to determine these unknown values, are discussed in chapter 4. Throughout the following chapters two different functions will be used in analysing reconstruction filters: the filter itself, h(x, y), and the intensity surface produced by that filter, f(x, y). In the case of a reconstruction method that cannot be represented by a filter, only the intensity surface, f(x, y), can be used.

3.1.3 Skeleton vs continuous reconstruction

The skeleton reconstructors do not generate a continuous intensity surface; instead they generate a surface which only has a defined value at certain points and is elsewhere undefined. Such a surface cannot be represented by a filter. However, a similar idea can be, that is: a surface which is non-zero and finite at certain points and is elsewhere zero. A related idea to this is the Dirac delta function surface, described above, which consists of an infinite sum of weighted, shifted Dirac delta functions. Both the skeleton and Dirac delta function reconstructors are discussed under the general heading of ‘skeleton reconstruction’ (section 3.2), although a Dirac delta function surface can be considered continuous for area sampling processes. All other reconstructors produce a continuous intensity surface. As this is the norm, no special section is required to discuss continuous reconstruction.

[1] It is a simplification to assume that the samples occur at integer locations in the real plane. But here we can make this simplification without loss of generality, because any rectangular sampling grid can be converted to one where the samples occur at integer locations. For example, if the samples are at (x, y) = (x0 + i∆x, y0 + j∆y), then they can be put at integer locations by changing the coordinate system: x′ = (x − x0)/∆x and y′ = (y − y0)/∆y.

3.1.4 Interpolating or approximating

There are two broad categories of reconstructor: the interpolants and the approximants. The interpolants (interpolating reconstructors) generate intensity surfaces which pass through all of the sample data points; the approximants (approximating reconstructors) generate surfaces which do not have to do this, although an approximant may generate an intensity surface which passes through any number of the points. If we have no a priori knowledge, then we are best advised to use an interpolant. This is because we know nothing about how the image was captured, so we must make an assumption. The simplest, and generally accepted, assumption is that the samples are single point samples of an original intensity surface and thus the original surface interpolates the sample points. Lacking any further information we assume that it is this intensity surface that we ideally wish to reconstruct, and thus the reconstructed intensity surface should interpolate the points.
The concept of interpolating, rather than the more general reconstruction, is so strong that many authors assume that a reconstructor is an interpolant (Mitchell and Netravali [1988], for example, discuss approximants but use the word ‘interpolation’ to describe all reconstructors). If, however, we have some a priori knowledge, or some such knowledge can be extracted from the image, then we will probably find that our reconstructor is an approximant, as it is well known that no captured images are produced by single point sampling, owing to limited resolution and bandwidth in the sampling process. Few rendered images are produced by single point sampling either, because more sophisticated sampling methods are used to prevent artifacts occurring in the images. Thus the samples do not necessarily lie on the original intensity surface, and so interpolating through them will not produce the original intensity surface.

3.1.5 Finite or infinite; local or non-local

Some reconstructors are formally infinite; they depend on an infinite number of sample points. Put another way, they have reconstruction filters of infinite support (assuming they have filters). Other reconstructors are finite; they depend on only a finite number of sample points to produce the value at any given point on the intensity surface. A finite reconstructor may be local or non-local. This is a less sharp division. Local reconstructors base the value of the intensity surface at any point on only local data points. Typically this means data points no more than a few pixel widths away from the point being calculated. Non-local reconstructors depend on data farther away (as well as the local data). An infinite-extent reconstructor could be considered the limiting case of a non-local reconstructor. Most practical reconstructors are finite. Some infinite reconstructors can be implemented so that the calculation takes finite time, but the vast majority cannot.
Infinite-extent reconstructors and the finite-extent approximations to them are discussed in chapter 4. Localness is an important consideration when implementing a reconstructor. In general, the more local a reconstructor, the faster the value of the intensity surface can be calculated at an arbitrary point. Localness is also an important consideration in stream processing: the fewer points needed at any one time, the better. It is discussed in more detail in section 3.5.5.

Figure 3.1: Each grid (solid lines) represents part of an infinite array of square pixels. On the left are the reflection lines (dashed lines) for those reflections which exactly map pixel centres into pixel centres. On the right are the rotation centres (dotted circles) for the 180n° and 90n° rotations which map pixel centres to pixel centres.

3.1.6 Other classifications

There are other ways of classifying reconstructors:

• is it a polynomial or not?
• does it have a strictly positive kernel or not? (see section 3.5.7)
• does it approximate some other function? If so, which? For example, approximations to the sinc function or to a Gaussian function are common.

In this dissertation several families of reconstructors are discussed. They are certainly not mutually exclusive. The family of piecewise local polynomials is examined later in this chapter. There, many of the criteria used in the design of reconstructors are evaluated. These criteria can also be used to classify reconstructors. Whilst the discussion there is focussed on piecewise local polynomials, it is applicable to all reconstructors. Thus, we leave discussion of these other classifications until then (section 3.5). We now turn to the slightly odd families of skeleton and Dirac delta function reconstructors.

3.2 Skeleton reconstruction

The question arises as to whether a continuous intensity surface is actually required.
In some circumstances it is only necessary to have the value of the intensity surface at the sample points themselves. Elsewhere the intensity surface is undefined (some may prefer to say that it is zero, which is not the same thing). Such a bare-bones reconstruction is called skeleton reconstruction. Skeleton reconstruction is only useful with a limited set of transformations and samplers. For example, with single point sampling, the set of transformations is that where the transformations exactly map pixel centres to pixel centres. This set includes (for a square pixel grid) rotation by 90n° (n ∈ Z); mirroring about certain horizontal, vertical and 45° diagonal lines; and translations by an integral number of pixels in the horizontal and vertical directions (see figure 3.1 and also Kornfeld [1985, 1987]). For one bit per pixel images, Fiume [1989, section 4.6] proves that these transformations are the only ones which can be performed on a square pixel grid without degrading the image. For many bit per pixel images the question of whether there are other transformations that lose no information is an open one.

Figure 3.2: Shear distortion — the image on the left is sheared to produce that on the right. The pixel centres in the right-hand image are still in the same square grid orientation. However, the pixels themselves are distorted. This leads to distortions in the output image if only pixel centres are considered in the resampling from one image to the other.

As well as these, there are lossy transformations which map pixel centres to pixel centres. These include scaling by a factor of 1/N, N ∈ Z, N ≠ 0, about certain points (generally pixel centres). This maps some pixel centres to pixel centres and is a digital sub-sampling operation. Guibas and Stolfi [1982] propose this form of image shrinking in their bitmap calculus.
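This digital sub-sampling operation needs no reconstruction at all, since pixel centres map exactly onto pixel centres. A minimal sketch in Python (illustrative only, not code from this dissertation):

```python
def subsample(img, N):
    """Shrink a discrete image by a factor of 1/N (N a non-zero integer)
    by keeping every N-th pixel in each direction. The kept pixel centres
    map exactly onto pixel centres of the output grid; every other source
    pixel is simply discarded."""
    return [row[::N] for row in img[::N]]

img = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [11, 21, 31, 41],
       [51, 61, 71, 81]]
small = subsample(img, 2)  # keeps the pixels at (0,0), (0,2), (2,0), (2,2)
```

Discarding three quarters of the samples outright, with no area sampling, is exactly why this produces the least pleasing result of all shrinking methods, as noted below.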
It gives the least pleasing result of all shrinking methods and is only applicable to shrinking by integral factors (1/N, N ∈ Z, N ≠ 0)[2]. A further set of transformations which map pixel centres to pixel centres is the set of shears by an integral number of pixels. However, these transformations produce distortions if done in this simplistic way. These distortions occur because, while pixel centres do map to pixel centres, the pixel areas do not map onto pixel areas. Figure 3.2 shows the distortion for square pixels sheared by one pixel distance[3]. Skeleton reconstruction does not produce a continuous intensity surface because the value of the ‘intensity surface’ is undefined except at the sample points. But a surface can be designed which produces similar results to those described. The method described above has an intensity surface which takes the sample values at the sample points and is undefined at all other points. ‘Undefined’ is a difficult concept to express mathematically, so to generate a filter kernel which describes this skeleton reconstruction we replace ‘undefined’ with ‘zero’ and get the kernel:

    h(x, y) = 1,  x = 0 and y = 0
              0,  otherwise

The properties of an intensity surface reconstructed by this kernel are:

• it has non-zero values only at the sample points;
• it will produce a sensible result only if it is point sampled at only these points;
• a point sample anywhere else will produce the value zero;
• an area sampling method will not work at all.

This last point is due to the fact that any area sampler's prefilter applied to this surface will produce the filtered surface g(x, y) = 0, ∀x, y ∈ R. Thus all the sample values from the area sampler will be zero. This reconstructor is different to the Dirac delta function reconstructor discussed by Heckbert [1989, p.51], and it produces different results. The Dirac delta function produces a very different kind of intensity surface. Its filter is defined as:

    h(x, y) = δ(x, y)

and it has the following properties:

• it is the identity function, that is, it has the following effect on the continuous image function:

    f(x, y) = δ(x, y) ∗ ι(x, y) = ι(x, y)

• it is zero everywhere except at pixel centres, where it tends to infinity;
• for some small ε, less than the distance between pixels:

    ∫_{i−ε}^{i+ε} ∫_{j−ε}^{j+ε} f(x, y) dy dx = f_{i,j}

• it only works with area sampling methods, because using a point sampling method could result in the product of two Dirac delta functions, the value of such a product being undefined;
• not all area samplers produce a good result, only those for which a large number of delta functions fall within the area of the sampler for each sample taken.

The notable difference is that the Dirac delta function works with area samplers, but not point samplers, whilst the skeleton reconstructor works with point samplers but not area ones. There are other reconstructors which produce a similar effect to skeleton reconstruction, the difference being that their set of defined points is not limited to the sample points.

[2] For shrinking by a factor of 1/R, R ∈ R, |R| ≥ 1, you need at least some form of reconstruction or a more sophisticated sampling method. Heckbert [1989] says that for R ≫ 1 it is only necessary to reconstruct using a Dirac delta function reconstructor, but that a good area sampler is required.
[3] Interestingly enough, many other reconstruction methods will produce as bad a result in this case unless a more sophisticated sampling method is employed. This is because the value of the intensity surface at the pixel centres under interpolation methods is exactly the sample value at that pixel centre before interpolation, and so point sampling at these points produces exactly the same value as if we had used skeleton reconstruction.
They are able to produce a finite number of new points between the existing data points. However, they cannot practically be used to find the value of the surface at an arbitrary point. Such a reconstructor can be dubbed ‘super-skeleton’, by analogy to ‘sampling’ and ‘super-sampling’. The best example of a super-skeleton method is the fast Fourier transform (FFT) scaling method [Fraser, 1987, 1989a, 1989b; Watson, 1986] (section 5.1). This method produces a finite number of equispaced samples between existing samples but cannot be used to produce samples at arbitrary points. Fraser's implementation allows for x × 2^n, n ∈ Z, points from x original sample points. Watson's method allows us to produce an arbitrary number of points in place of our original x (these figures are quoted for one-dimensional signals; they need to be squared for two dimensions). To produce a continuous surface would require an infinite number of points and hence infinite time; so whilst theoretically possible it is hardly practicable. In this method, all the points have to be generated before you can use them for further processing; they cannot be generated one at a time. This FFT method reconstructs a surface which, as before, has non-zero values only at a set of points and is elsewhere undefined (alternatively, zero). There is no way of stating the method as a reconstruction filter. It is, however, a perfectly valid reconstruction method. Again, on its own, it is only useful for a limited number of applications, most notably for scaling the image. However, as part of a more powerful reconstruction method it has great utility, as we shall see later (section 5.3). A super-skeletal intensity surface, like a skeletal surface, has defined values only at certain points, and is undefined elsewhere. The values at these points are finite.
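The flavour of this kind of scaling can be sketched with a naive discrete Fourier transform in place of the FFTs that Fraser's and Watson's implementations actually use: zero-padding the spectrum of x samples yields x × factor new equispaced samples, all produced at once rather than one at a time. A hedged sketch, not the published algorithms:

```python
import cmath
import math

def dft(x):
    # naive forward discrete Fourier transform (an FFT would be used in practice)
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def fft_upsample(x, factor):
    """Produce len(x) * factor equispaced samples by zero-padding the
    spectrum: a super-skeleton operation, since all the new points appear
    at once and only at these equispaced positions."""
    N = len(x)
    X = dft(x)
    M = N * factor
    Y = [0j] * M
    h = N // 2
    for k in range(h):            # positive frequencies
        Y[k] = X[k]
    for k in range(h, N):         # negative frequencies
        Y[M - N + k] = X[k]
    if N % 2 == 0:                # split the Nyquist bin between the two halves
        Y[h] = X[h] / 2
        Y[M - N + h] = X[h] / 2
    return [factor * v.real for v in idft(Y)]

x = [math.cos(2 * math.pi * n / 8) for n in range(8)]
y = fft_upsample(x, 2)  # 16 equispaced samples of the same band-limited signal
```

For a band-limited periodic signal such as this cosine, the new points lie exactly on the underlying continuous function and the original samples are returned unchanged at the even output positions.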
An analogous procedure could be followed to produce a ‘super-delta function’ reconstructor with extra delta functions occurring between the ones at the pixel centres. Which is chosen depends on the use that is being made of the intensity surface. The former is useful for point sampling; the latter for area sampling. Again, only a limited set of transformations and samplers can be used with these reconstruction methods. We now turn our attention to those reconstruction processes which do produce a continuous intensity surface in the more usual sense. We begin by looking at the theory of perfect interpolation.

3.3 Perfect interpolation

For any set of finitely spaced sample points there exist an infinite number of interpolating functions. Which of these functions is the one that we want? Whittaker posed this question in 1915 and produced the possibly unexpected result that through any set of finite-valued, finitely spaced points there exists a single interpolating function which is everywhere finite valued and has no spatial frequencies greater than one half the sample frequency[4]. This function Whittaker called the cardinal function of the infinite set of functions through these sample points. This result was also discovered empirically in signal processing work. Shannon [1949] states that both Nyquist (in 1928) and Gabor (in 1946) had pointed out that “approximately 2TW numbers are sufficient” to represent a one-dimensional function limited to the bandwidth W and the time interval T. Mitchell and Netravali [1988] note that Mertz and Gray (in 1934) had established the same principle as a rule of thumb for processing images (two-dimensional signals). However, it was in 1949 that Shannon first formalised what we now call ‘the sampling theorem’, described in section 2.4.1. His contribution was to prove it as an exact relation, not just an approximate one. Shannon's theorem can be seen as a restatement of Whittaker's result, in different language.
Given a set of samples of some function, one can reconstruct this cardinal function by convolving the samples with a sinc function (see section 2.4.1). The theory behind this can be found in both Whittaker [1915] and Shannon [1949], but perhaps more understandably in Wolberg [1990] or Foley et al [1990, section 14.10]. The sinc function has also been called the ‘cardinal function’ [Petersen and Middleton, 1962, p.292], the ‘cardinal sine’ [Marion, 1991, p.24], and some call it the ‘cardinal spline’ [Maeland, 1988, p.213]. However, the term ‘cardinal spline interpolation’ is often used in a wider sense to mean any interpolation. Schoenberg [1973] uses the term to describe interpolants which fall between the sinc interpolant and linear interpolation in complexity. Other perfect interpolation reconstructors exist for other classes of signal but, for a square pixel grid, a two-dimensional sinc function perfectly reconstructs all signals band-limited to |ν_x|, |ν_y| < 1/(2T). To use this ‘perfect interpolation’ one of the following two methods must be used: either (a) convolve the original image samples with the appropriate sinc function, or (b) do the appropriate processing in the continuous frequency domain. The latter method, being continuous, is impossible in a digital computer; the FFT method discussed in chapter 5 is, however, a digital approximation to it. The former method has drawbacks which will be discussed in more detail in section 4.3. At this point all we need to know is that the ‘perfect’ interpolation method does have drawbacks; notably it is painfully slow, and for non-bandlimited images it reconstructs aliases (manifesting mainly as ripples in the intensity surface). Because of these drawbacks other interpolating and approximating reconstruction methods have been used over the years. We begin our discussion of these with one very important family of reconstruction methods: the piecewise local polynomials.
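Method (a) can be sketched very simply in one dimension, assuming unit sample spacing. This deliberately naive sketch (illustrative, not from the dissertation) makes the drawbacks visible: the sum over a finite image is necessarily truncated, and every sample contributes to every evaluated point.

```python
import math

def sinc(t):
    # the 'cardinal sine': sin(pi t) / (pi t), with sinc(0) = 1
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def sinc_reconstruct(samples, x):
    """Evaluate the reconstructed surface f(x) = sum_i f_i sinc(x - i).
    With a finite image the infinite sum is truncated, so this is only
    an approximation to the perfect interpolant (edge effects: chapter 4)."""
    return sum(f * sinc(x - i) for i, f in enumerate(samples))
```

At integer x this returns the sample value (sinc is 1 at 0 and 0 at every other integer), so it is an interpolant; each evaluation costs a sum over all samples, which is one reason the method is painfully slow.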
[4] Equivalently, “no periodic constituents of period less than twice the sampling period” [Whittaker, 1915, p.187].

3.4 Piecewise local polynomials

The piecewise local polynomials are a very useful and commonly used family of reconstructors. Before looking at them in detail we discuss the general form in one dimension and its extension to two or more dimensions. This is followed by a section on each order from the first to the third; the third being used as a case study of the design of such reconstructors. The case study leads us into a discussion of the factors involved in the design of piecewise local polynomial (and, more generally, all) reconstructors. In the final major part of this section we discuss the fourth order, finishing with a brief look at higher orders. At this juncture we need to point out that the order of a polynomial is one more than its highest degree term. Thus a quadratic is a second degree polynomial, but a third order one.

3.4.1 The general case in one dimension

The piecewise local polynomial curve consists of polynomial pieces, each one sample width wide. Every piece of the curve depends on the m nearest sample points. Thus, if m is even, each piece starts and ends at adjacent sample points; if m is odd, each piece starts and ends halfway between adjacent data points. This is important and is elaborated on in section 3.4.6. The general form of the piecewise local polynomial is:

    f(x) = Σ_{i=−a}^{b} c_i(x) f_{α+i}

a and b depend on m:

    a = ⌊(m − 1)/2⌋        b = ⌊m/2⌋

They can be shown to obey the property a + b = m − 1. Thus there are m terms in the sum. α is an integer dependent on x:

    α = ⌊(x − x_0)/∆x⌋,           m even
    α = ⌊(x − x_0)/∆x + 1/2⌋,     m odd

The samples are assumed to be equispaced at a spacing of ∆x. Sample zero has value f_0 and is at location x_0 (figure 3.3). c_i(x) is a polynomial function of order n. Its general form is:

    c_i(x) = Σ_{j=0}^{n−1} k_{i,j} t^j

where:

    t = (x − x_α)/∆x

For m even: 0 ≤ t < 1. For m odd: −1/2 ≤ t < 1/2.
x is thus split into two parameters: an integer part, α, and a fractional part, t. The relationship between them is:

    x = x_0 + (α + t)∆x = x_α + t∆x

Figure 3.3: The relationship between x_i, f_i and ∆x. The x_i are equispaced along the x-axis with a spacing of ∆x. The value of the sample at x_i is f_i.

The whole general form can be represented as the single equation:

    f(x) = Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j f_{α+i}        (3.1)

where a, b, α and t are as defined above. Note that the formula has m × n degrees of freedom (the k_{i,j} values). The larger m × n, the more computation is required. The smaller m × n, the less the reconstructor can achieve. A balance must be found, choosing m and n to give the best result. In normal use m = n; the reasons for this are given in section 3.5.5. Figure 3.4 gives some examples of piecewise local polynomial reconstructors.

3.4.2 Extending to two dimensions

The standard way of extending this to more than one dimension is to use the same one-dimensional polynomial along each axis of the co-ordinate system (assuming the usual rectangular co-ordinate system). In two dimensions this thus gives:

    f(x, y) = Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j Σ_{p=−a}^{b} Σ_{q=0}^{n−1} k_{p,q} s^q f_{α+i,β+p}
            = Σ_{i=−a}^{b} Σ_{p=−a}^{b} Σ_{j=0}^{n−1} Σ_{q=0}^{n−1} K_{i,j,p,q} t^j s^q f_{α+i,β+p}        (3.2)

where:

    K_{i,j,p,q} = k_{i,j} × k_{p,q}

Figure 3.4: Some piecewise local polynomial reconstructed curves, showing the number of points each segment is based on (m) and the order of the piecewise curve (n). (a) m = 1, n = 1: nearest-neighbour; (b) m = 2, n = 2: linear; (c) m = 2, n = 4: two point cubic; (d) m = 3, n = 3: interpolating quadratic; (e) m = 3, n = 3: approximating quadratic; (f) m = 4, n = 4: Catmull-Rom cubic; and (g) m = 4, n = 4: approximating cubic B-spline.
Here, as before:

    a = ⌊(m − 1)/2⌋        b = ⌊m/2⌋

    α = ⌊(x − x_0)/∆x⌋, m even;    α = ⌊(x − x_0)/∆x + 1/2⌋, m odd
    β = ⌊(y − y_0)/∆y⌋, m even;    β = ⌊(y − y_0)/∆y + 1/2⌋, m odd

    t = (x − x_α)/∆x        s = (y − y_β)/∆y

Whilst this is only the product of two functions of the form of equation 3.1, it appears considerably more complex. Because the two-dimensional form is unwieldy we use the one-dimensional form in our work. We keep in mind that, in real image resampling, the equivalent two-dimensional form is used. The extension is similar for three or more dimensions. It is possible to extend from one to many dimensions in a different way. This is very seldom done, but one such extension method is discussed in section 3.4.5. It is also possible to use different m and n values in each dimension, although it is difficult to conceive of a situation that would warrant it. Perhaps a spatio-temporal resampling would require a different reconstructor in the temporal dimension to that used in the spatial dimensions.

3.4.3 Why are they so useful?

The piecewise local polynomials are easy to understand, simple to implement, and quick to evaluate. The localness is especially useful in stream processing applications. The family contains many reconstructors which have been studied over the years: some in their own right and some as approximations to more complex functions. Having examined the general form of the family we now move on to discuss the first, second, and third order sub-families.

3.4.4 Nearest-neighbour

The simplest reconstruction method of all is the nearest-neighbour method, which assigns to each point the intensity of the sample at the sample point nearest to that point. It is the first order piecewise local polynomial. The parameters, m and n, are set to m = n = 1. The single degree of freedom is given the value 1 so that the function interpolates the sample points. The function is:

    f(x) = f_α

where:

    α = ⌊(x − x_0)/∆x + 1/2⌋

and its kernel is the box function:

    h(s) = 1,  |s| < 1/2
           0,  otherwise        (3.3)

Thus the value of f(x) at any point is based on just a single sample value.
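The general form (equation 3.1) is straightforward to evaluate directly. The sketch below (illustrative, not from the dissertation) assumes unit sample spacing with x_0 = 0 and no handling of the image edges; the coefficient table k is indexed k[i][j] for i ∈ −a..b, j ∈ 0..n−1. Setting m = n = 1 with the single coefficient equal to 1 gives exactly the nearest-neighbour reconstructor above; the second table gives c_0(t) = 1 − t, c_1(t) = t, the linear interpolation of the next section.

```python
import math

def piecewise_local_poly(samples, k, m, x):
    """Evaluate equation (3.1): f(x) = sum_{i=-a}^{b} sum_j k[i][j] t^j f_{alpha+i},
    assuming samples lie at integer positions (x_0 = 0, dx = 1)."""
    a, b = (m - 1) // 2, m // 2              # so that a + b = m - 1
    if m % 2 == 0:
        alpha = math.floor(x)                # t in [0, 1)
    else:
        alpha = math.floor(x + 0.5)          # t in [-1/2, 1/2): nearest sample
    t = x - alpha
    return sum(sum(kij * t**j for j, kij in enumerate(k[i])) * samples[alpha + i]
               for i in range(-a, b + 1))

samples = [10.0, 20.0, 40.0, 30.0]
nn = {0: [1]}                    # m = n = 1: nearest-neighbour
lin = {0: [1, -1], 1: [0, 1]}    # m = n = 2: (1 - t) f_alpha + t f_{alpha+1}
```

The m × n coefficients are all that distinguish one member of the family from another, which is what makes the family so amenable to the design-criteria analysis of section 3.5.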
An example of nearest-neighbour reconstruction is shown in figure 3.4(a). This reconstructor produces an intensity surface which is, in general, discontinuous. It is the worst of the common methods, but it has been widely used because of the speed with which it can be implemented and its sheer simplicity. Improvements in computer power have meant that it has been replaced in many applications, although it is still the dominant method in one area: frame buffer hardware zoom functions [Wolberg, 1990, p.127]. It is only recently that hardware has become available for real-time hardware magnification using more sophisticated algorithms [TRW, 1988, chip TMC2301]. This reconstructor is the first order member of the family of B-splines. Each member of this family is generated by convolving a number of box functions to produce the filter kernel [Andrews and Patterson, 1976]. The first order member's filter (equation 3.3) is a single box function. Convolving two of these together produces the second order member: linear interpolation.

3.4.5 Linear interpolation

This is, perhaps, the most used reconstruction method. In one dimension it involves the two closest sample points to each point. The general form of a second order piecewise linear polynomial with m = n = 2 is (from equation 3.1):

    f(x) = (at + b)f_α + (ct + d)f_{α+1}

where:

    α = ⌊(x − x_0)/∆x⌋        t = (x − x_α)/∆x

If we wish this to be an interpolating method then the four degrees of freedom are completely specified, to give:

    f(x) = (1 − t)f_α + t f_{α+1}

It is basically a ‘join the dots with straight lines’ method and it is hard to think of any other useful values for the degrees of freedom. The filter kernel for this function is the convolution of two box functions (equation 3.3), producing a ‘tent’ function:

    h(s) = 1 + s,  −1 < s < 0
           1 − s,  0 ≤ s < 1
           0,      otherwise

An example of this type of reconstruction is shown in figure 3.4(b). There are two ways of extending this to two dimensions.
The first is as described in section 3.4.2 and produces possibly non-planar, bilinear patches bounded by four sample points. The other divides each square bounded by four adjacent sample points into two triangles and produces a planar patch in each triangle (figure 3.5) [Pratt, 1978, pp.114–116]. The two methods generally do not produce the same result, and the latter method can produce different results depending on how the square is split into triangles. We will consider only the former as it is better, is more amenable to mathematical analysis, and is by far the most used. Linear interpolation produces an intensity surface which is guaranteed continuous; however, it is discontinuous in its first derivative. It is one of the most widely used reconstruction algorithms and is recommended by Andrews and Patterson [1976] in their study of interpolating B-splines, of which it is the second order member. It produces reasonable, but blurry, results at a moderate cost [Wolberg, 1990, p.128]. However, often a higher fidelity is required and so we go on to higher orders of polynomials.

3.4.6 Quadratic functions

The quadratic functions have m = n = 3; each section is thus a quadratic curve based on three points. This family of reconstructors does not appear to have been studied. Hence we make the quadratics a case study in the design of reconstructors. We begin with the general case, adding more and more criteria to the function until we have used all of the degrees of freedom to produce the best possible quadratic reconstructor.

Figure 3.5: Two ways of extending linear interpolation to two dimensions. (a) shows the bilinear approach, which we recommend. (b) and (c) show the subdivision into two planar triangles approach. Notice that this can lead to different results depending on how the square is split into triangles (after Pratt [1978] fig. 4.3–6).

Before doing so, however, we must clear up a misunderstanding in the literature.
Wolberg [1990] says that third order polynomials are inappropriate because it has been shown that their filters are space-variant with phase distortions and, further, that these problems are shared by all polynomial reconstructors of odd order. This result is attributed to the fact that the number of sampling points on each side of the reconstructed point always differs by one. Wolberg thus does not consider any polynomials of odd order. He bases his conclusion on the work of Schafer and Rabiner [1973], who show this result given certain assumptions (perhaps their work explains why no-one has considered quadratic reconstruction). However, as will be shown, if we replace one of Schafer and Rabiner's initial assumptions with another, we can produce quadratic polynomials whose filters are space-invariant and which produce no phase distortion. It should also be noted that nearest-neighbour (first order) reconstruction is an odd order method, and that Wolberg (and many others) consider it without remarking that the number of sampling points on each side of the reconstructed point always differs by one (zero on one side and one on the other). This lack of consistency seems to be because nearest-neighbour is so simple that it has linear phase anyway, and so produces no phase distortions. Parker et al [1983], who also consider the nearest-neighbour interpolant, say that using three points for reconstruction (that is: quadratic reconstruction) would result in two points on one side of the reconstructed point and only one point on the other side. They believe that this is a bad thing and therefore conclude that the next logical reconstruction function (after linear reconstruction) would use the two nearest points in each direction, that is: cubic reconstruction.
It is true that quadratic reconstruction requires that there be two points on one side of the reconstructed point and only one on the other, but our assertion is that, if these three points are the three points closest to the reconstructed point, then a space-invariant, non-phase-distorted, perhaps even useful, reconstruction method can be created.

It is this assumption, that the three points used for quadratic reconstruction are the three closest to the reconstructed point, that distinguishes our work from that of Schafer and Rabiner [1973]. They assume that each piecewise segment of the curve must start and end at sample points. This leads to an unbalanced situation (figure 3.6). In half of each piecewise segment there is a fourth sample point that is closer to the reconstructed point than one of the sample points that is used to calculate the reconstructed value. Schafer and Rabiner prove that a quadratic Lagrange interpolating function based on this assumption does not have linear phase (that is: it produces phase distortion in the reconstructed intensity surface) and, hence, is useless as a reconstructor. It can be shown that any quadratic reconstructor based on this assumption will not have linear phase (see section 3.5.1 for a discussion of linear phase).

Figure 3.6: An unbalanced situation with a quadratic piece starting and ending at sample locations. The piece depends on three points: Q, R, and S. It is defined over the range α, to start and end at points Q and R. Thus for half the range, that is for range γ, point P is closer to the points on the reconstructed curve than point S, although it is point S which is used to calculate values.

Figure 3.7: The kernel of one of Schafer and Rabiner's quadratic interpolants. This kernel does not have linear phase, as can be readily seen from the fact that the kernel has no vertical line of symmetry.

The filter kernel for the continuous version of one of Schafer and Rabiner's quadratic Lagrange interpolants is shown in figure 3.7. It is defined as:

h(s) = \begin{cases}
  \frac{1}{2}s^2 + \frac{3}{2}s + 1, & -1 < s \le 0 \\
  -s^2 + 1, & 0 < s \le 1 \\
  \frac{1}{2}s^2 - \frac{3}{2}s + 1, & 1 < s \le 2 \\
  0, & \text{otherwise}
\end{cases} \qquad (3.4)

The other possible Lagrange interpolant's kernel is this mirrored in the line s = 0. It is obvious from this that the interpolant does not have linear phase. A reconstructor with linear phase obeys the rule that h(s - s_0) = h(-(s - s_0)) for some constant s_0. That is to say, the function is symmetric about the line s = s_0. For zero phase, s_0 = 0 and h(s) = h(-s). The effect of linear phase is to translate the zero phase intensity surface in the spatial domain. Thus we can, without loss of generality, concentrate on the zero phase situation.

The idea of linear, rather than zero, phase comes from work in the time domain, where a reconstructed point can only be output after all of the sample points which affect its value have been input. In a real-time system there must thus be some delay between points going into and points coming out of the system when data occurring after the time of the required output point affects that output point's value. Linear phase is necessary in such cases. In our case we have all of the data points available at once and so need only concern ourselves with the zero phase case. Should we need to shift the spatial domain filter later, then we can be assured that it will have linear phase if we simply let h'(s) = h(s - s_0) for some constant s_0.

We will start by looking at the general quadratic reconstructor in the spatial domain. Its intensity surface will be made up of a piecewise quadratic curve. Every point on the curve will depend on the three sample points closest to that point. Thus each quadratic piece will start and end halfway between samples (figure 3.8).

Figure 3.8: Each piece of a quadratic local polynomial starts and ends halfway between sample points.
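For concreteness, the kernel of equation 3.4 can be evaluated directly. The following sketch (the function name is ours) confirms that the kernel interpolates, that the weights on the three nearest samples always sum to one, and that it nonetheless has no vertical line of symmetry, so it has neither zero nor linear phase:

```python
def sr_kernel(s):
    """The kernel of equation 3.4 (support -1 < s <= 2); name is ours."""
    if -1 < s <= 0:
        return 0.5 * s * s + 1.5 * s + 1.0
    if 0 < s <= 1:
        return -s * s + 1.0
    if 1 < s <= 2:
        return 0.5 * s * s - 1.5 * s + 1.0
    return 0.0

# It interpolates: 1 at s = 0, 0 at every other integer in its support.
assert sr_kernel(0) == 1.0
assert sr_kernel(1) == 0.0 and sr_kernel(2) == 0.0 and sr_kernel(-1) == 0.0
# The weights applied to the three nearest samples always sum to one.
assert abs(sr_kernel(0.25) + sr_kernel(-0.75) + sr_kernel(1.25) - 1.0) < 1e-12
# But there is no vertical line of symmetry: not about s = 0...
assert sr_kernel(0.5) != sr_kernel(-0.5)        # 0.75 versus 0.375
# ...and not about the centre of its support, s = 1/2, either.
assert sr_kernel(0.5 + 1.0) != sr_kernel(0.5 - 1.0)
```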
The equation of the general quadratic reconstructor is:

f(x) = (at^2 + bt + c)\,f_{\alpha-1} + (dt^2 + et + f)\,f_\alpha + (gt^2 + ht + i)\,f_{\alpha+1} \qquad (3.5)

where \alpha = \left\lfloor \frac{x - x_0}{\Delta x} + \frac{1}{2} \right\rfloor (the index of the nearest sample point) and t = \frac{x - x_\alpha}{\Delta x}.

We want this function to have zero phase. This applies the restriction that the function's filter kernel, h(s), is real and even. For f(x) above, the kernel is:

h(s) = \begin{cases}
  gs^2 + (h + 2g)s + (i + h + g), & -\frac{3}{2} \le s \le -\frac{1}{2} \\
  ds^2 + es + f, & -\frac{1}{2} \le s \le \frac{1}{2} \\
  as^2 + (b - 2a)s + (c - b + a), & \frac{1}{2} \le s \le \frac{3}{2} \\
  0, & \text{otherwise}
\end{cases}

For this to be real and even, we require h(s) = h(-s). This gives us the values of four parameters, say e, g, h, and i, in terms of the other five:

e = 0, \quad g = a, \quad h = -b, \quad i = c

thus reducing equation 3.5 to:

f(x) = (at^2 + bt + c)\,f_{\alpha-1} + (dt^2 + f)\,f_\alpha + (at^2 - bt + c)\,f_{\alpha+1} \qquad (3.6)

We now have a system with five degrees of freedom, compared to the original nine. What we use these degrees of freedom for depends on what features we would like our function to have. Looking back we can see that nearest-neighbour interpolation does not produce a continuous curve, whilst linear interpolation does. So the first 'desirable feature' (after zero phase) is to constrain the curve to be continuous (this is called C0-continuity: continuity of the function itself). We do this by making sure that all the pieces join up at the 'knots'. If f(x) = f(\alpha, t) (that is: f(x) = f(x_\alpha + t\Delta x)), then we want f(\alpha, \frac{1}{2}) = f(\alpha + 1, -\frac{1}{2}):

(\tfrac{1}{4}a + \tfrac{1}{2}b + c)f_{\alpha-1} + (\tfrac{1}{4}d + f)f_\alpha + (\tfrac{1}{4}a - \tfrac{1}{2}b + c)f_{\alpha+1} = (\tfrac{1}{4}a - \tfrac{1}{2}b + c)f_\alpha + (\tfrac{1}{4}d + f)f_{\alpha+1} + (\tfrac{1}{4}a + \tfrac{1}{2}b + c)f_{\alpha+2}

which gives us two parameters in terms of the other three (say, c = -\frac{1}{2}b - \frac{1}{4}a and d = -4(b + f)). This reduces equation 3.6 to:

f(x) = (at^2 + bt - \tfrac{1}{2}b - \tfrac{1}{4}a)\,f_{\alpha-1} + (-4(b + f)t^2 + f)\,f_\alpha + (at^2 - bt - \tfrac{1}{2}b - \tfrac{1}{4}a)\,f_{\alpha+1} \qquad (3.7)

leaving us with three degrees of freedom: a, b and f. We can use these in an attempt to make the piecewise quadratic interpolate the points.
This requires that f(x_\alpha) = f_\alpha:

f_\alpha = f(x_\alpha) = (-\tfrac{1}{2}b - \tfrac{1}{4}a)\,f_{\alpha-1} + (f)\,f_\alpha + (-\tfrac{1}{2}b - \tfrac{1}{4}a)\,f_{\alpha+1}

giving us a and f, say, in terms of the one remaining parameter: a = -2b and f = 1. We are thus left with one degree of freedom: b. Equation 3.7 can now be rewritten as:

f(x) = (-2bt^2 + bt)\,f_{\alpha-1} + (-4(b + 1)t^2 + 1)\,f_\alpha + (-2bt^2 - bt)\,f_{\alpha+1} \qquad (3.8)

Thus far our function is doing no more than linear interpolation; it is: zero phase, C0-continuous and interpolating. Let us add to this list: continuous in the first derivative (that is: C1-continuous). This requires:

f'(\alpha, \tfrac{1}{2}) = f'(\alpha + 1, -\tfrac{1}{2}) \qquad (3.9)

Substituting this into equation 3.8 we get:

(-b)\,f_{\alpha-1} + (-4b - 4)\,f_\alpha + (-3b)\,f_{\alpha+1} = (3b)\,f_\alpha + (4b + 4)\,f_{\alpha+1} + (b)\,f_{\alpha+2}

giving us the two conditions that b = 0 and b = -\frac{4}{7}. This is an inconsistency and so we cannot have a zero-phase, C1-continuous, interpolating, piecewise quadratic reconstructed surface.

If we now remove the criterion that the curve be interpolating and enforce the criterion that it be C1-continuous, we can apply equation 3.9 to equation 3.7 giving, say, b and f in terms of the one remaining parameter: b = -a and f = \frac{3}{2}a. Again we can rewrite equation 3.7:

f(x) = (at^2 - at + \tfrac{1}{4}a)\,f_{\alpha-1} + (-2at^2 + \tfrac{3}{2}a)\,f_\alpha + (at^2 + at + \tfrac{1}{4}a)\,f_{\alpha+1}

Factoring a out of this we get:

f(x) = a\left[ (t^2 - t + \tfrac{1}{4})\,f_{\alpha-1} + (-2t^2 + \tfrac{3}{2})\,f_\alpha + (t^2 + t + \tfrac{1}{4})\,f_{\alpha+1} \right] \qquad (3.10)

So we now have two families of curves: zero-phase, C0-continuous and interpolating, as given in equation 3.8; and zero-phase, C1-continuous and approximating, as given in equation 3.10. These two families are plotted in figures 3.9 and 3.10, respectively, acting on a sample data set designed to illustrate some of their properties. Given these families, what criterion can we use to pick the best reconstructor? Chu [1990], in a paper on the use of splines in CAD, uses the criterion that, if several of the points lie in a straight line, then the interpolating function should give a straight line.
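As a check on the algebra, the two families can be evaluated numerically. The sketch below (function names are ours) verifies that every member of the family of equation 3.8 interpolates the central sample, and that every member of equation 3.10 joins its pieces with matching value and first derivative:

```python
def interp_family(b, fm1, f0, fp1, t):
    """Equation 3.8: the zero-phase, C0-continuous, interpolating family."""
    return ((-2*b*t*t + b*t) * fm1
            + (-4*(b + 1)*t*t + 1) * f0
            + (-2*b*t*t - b*t) * fp1)

def approx_family(a, fm1, f0, fp1, t):
    """Equation 3.10: the zero-phase, C1-continuous, approximating family."""
    return a * ((t*t - t + 0.25) * fm1
                + (-2*t*t + 1.5) * f0
                + (t*t + t + 0.25) * fp1)

f = [3.0, 7.0, 2.0, 5.0]                  # arbitrary sample data
for p in (-1.0, -0.5, 0.25, 1.0):         # arbitrary free-parameter values
    # Equation 3.8 interpolates: at t = 0 it returns the central sample.
    assert interp_family(p, f[0], f[1], f[2], 0.0) == f[1]
    # Equation 3.10 is C1: adjacent pieces agree in value and derivative
    # at the join (piece alpha at t = +1/2, piece alpha+1 at t = -1/2).
    left = approx_family(p, f[0], f[1], f[2], 0.5)
    right = approx_family(p, f[1], f[2], f[3], -0.5)
    assert abs(left - right) < 1e-12
    eps = 1e-6                            # one-sided numerical derivatives
    dl = (left - approx_family(p, f[0], f[1], f[2], 0.5 - eps)) / eps
    dr = (approx_family(p, f[1], f[2], f[3], -0.5 + eps) - right) / eps
    assert abs(dl - dr) < 1e-4
```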
This criterion makes sense in an image, in as much as, if several adjacent sample points have the same intensity, then the intensity surface between those points should be that constant intensity (except possibly near the end of the run of constant intensity sample points). Look at it this way: we are producing a quadratic segment based on the nearest three sample points. If these three points lie in a straight line then the only sensible shape for that quadratic segment is to lie on that straight line. The only information available for the segment is those three points: if they lie in a straight line then the segment must too; there is no more available information for it to do anything else.

Applying this criterion to our two families, we find that, for the C0-continuous interpolant: b = -\frac{1}{2}. For the C1-continuous approximant: a = \frac{1}{2}. Examining figures 3.9 and 3.10, we see that these two curves are the ones which appear to most nearly match the implied shape of the data. These final two curves are shown in figure 3.11. Their equations are:

C0-continuous, interpolating quadratic:

f(x) = (t^2 - \tfrac{1}{2}t)\,f_{\alpha-1} + (-2t^2 + 1)\,f_\alpha + (t^2 + \tfrac{1}{2}t)\,f_{\alpha+1} \qquad (3.11)

C1-continuous, approximating quadratic:

f(x) = (\tfrac{1}{2}t^2 - \tfrac{1}{2}t + \tfrac{1}{8})\,f_{\alpha-1} + (-t^2 + \tfrac{3}{4})\,f_\alpha + (\tfrac{1}{2}t^2 + \tfrac{1}{2}t + \tfrac{1}{8})\,f_{\alpha+1} \qquad (3.12)

Perhaps the most important thing to notice about both functions is that they are constrained to pass through the midpoints of the segments joining each pair of adjacent data points; that is, they must pass through the points:

\left( \frac{x_\alpha + x_{\alpha+1}}{2}, \frac{f_\alpha + f_{\alpha+1}}{2} \right)

Figure 3.9: The one parameter family of C0-continuous interpolating quadratics (equation 3.8) acting on a one-dimensional image, for b = -1.25, -1.00, -0.75, -0.50, -0.25, 0.00 and +0.25.

Formally this constraint is a direct result of imposing the C0-continuity and straightness criteria on f(x).
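The midpoint and straightness properties of equations 3.11 and 3.12 are easy to verify numerically; the sketch below (function names are ours) also checks, in passing, the linear-blend form of the interpolant that is illustrated in figure 3.12:

```python
def quad_interp(fm1, f0, fp1, t):
    """Equation 3.11: the C0-continuous, interpolating quadratic."""
    return (t*t - 0.5*t) * fm1 + (-2*t*t + 1) * f0 + (t*t + 0.5*t) * fp1

def quad_approx(fm1, f0, fp1, t):
    """Equation 3.12: the C1-continuous, approximating quadratic."""
    return ((0.5*t*t - 0.5*t + 0.125) * fm1
            + (-t*t + 0.75) * f0
            + (0.5*t*t + 0.5*t + 0.125) * fp1)

def blend(f1, f2, f3, t):
    """Figure 3.12: c(t) = (1 - t) p(t + 1/2) + t q(t - 1/2), where
    p(r) = (1 - r) f1 + r f2 and q(s) = (1 - s) f2 + s f3."""
    return ((1 - t) * ((0.5 - t) * f1 + (t + 0.5) * f2)
            + t * ((1.5 - t) * f2 + (t - 0.5) * f3))

fm1, f0, fp1 = 4.0, 9.0, 1.0              # arbitrary adjacent samples
# Both pass through the midpoint of adjacent data points at t = 1/2,
# independently of the value of the third sample.
assert quad_interp(fm1, f0, fp1, 0.5) == (f0 + fp1) / 2
assert quad_approx(fm1, f0, fp1, 0.5) == (f0 + fp1) / 2
# Straightness: samples on the line f_i = 2i + 1 are reproduced exactly.
line = lambda i: 2.0 * i + 1.0
for t in (-0.4, 0.0, 0.3, 0.5):
    expected = line(1 + t)                # alpha = 1, unit sample spacing
    assert abs(quad_interp(line(0), line(1), line(2), t) - expected) < 1e-12
    assert abs(quad_approx(line(0), line(1), line(2), t) - expected) < 1e-12
# The interpolant is the linear blend of figure 3.12 (substitute t -> t + 1/2).
for t in (-0.5, -0.2, 0.0, 0.1, 0.35, 0.5):
    assert abs(blend(fm1, f0, fp1, t + 0.5) - quad_interp(fm1, f0, fp1, t)) < 1e-12
```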
Intuitively, the reasoning behind the midpoint constraint is that any two adjacent parabolic segments must meet at a point midway between the two sample points (that is: at x = \frac{x_\alpha + x_{\alpha+1}}{2}) and that, at this point, the value of f(x) can only depend on the two sample points common to both segment definitions. Thus the only sensible value of f(x) lies on the line segment connecting those two points; and therefore f(x) is constrained to pass through the midpoint of this line segment. This problem will occur, in some form, for all odd order, piecewise polynomials because, in each case, the segments meet midway between sample points rather than at sample points (which is where they meet in even order, piecewise polynomials).

The C0-continuous interpolant (equation 3.11) does no more than the linear interpolant (section 3.4.5). They have the same features (zero-phase, C0-continuity, interpolating and straightness); they achieve the same thing in different ways. With linear interpolation each segment is the single straight line segment between two adjacent data points; while with the quadratic interpolant each segment is the single parabolic segment through a data point and the two adjacent midpoints. In both cases the piecewise curve is exactly defined by the only possible pieces that can pass through the controlling points.

The C0-continuous interpolant can be shown to be a linear blend of two linear functions, as illustrated in figure 3.12. This is analogous to Brewer and Anderson's [1977] Overhauser cubic function, which they derived as a linear blend of two interpolating parabolas. That fourth order function is better known as the Catmull-Rom cubic [Catmull and Rom, 1974]. Our third order function is one of the general set of Catmull-Rom curves described in section 3.6.2.

Figure 3.10: The one parameter family of C1-continuous approximating quadratics (equation 3.10) acting on a one-dimensional image, for a = +1.25, +1.00, +0.75, +0.50, +0.25, 0.00, -0.25 and -0.50.

Figure 3.11: The two piecewise local quadratic reconstructors acting on a one-dimensional image: (a) the C0-continuous interpolant (equation 3.11), (b) the C1-continuous approximant (equation 3.12).

Figure 3.12: The C0-continuous quadratic interpolant as a linear blend of two linear functions. p(r) and q(s) are straight lines passing through the data points as shown. c(t) is a linear blend of these, producing a quadratic function which passes through the central data point, and the midpoints of the two lines. c(t) is calculated from p(r) and q(s) as: c(t) = (1 - t)\,p(t + \frac{1}{2}) + (t)\,q(t - \frac{1}{2})

The C1-continuous approximant (equation 3.12) is also constrained to pass through the midpoints, but not through the sample points. It seems counter-intuitive to have a function which is constrained to pass through non-sample points and yet does not, in general, pass through the sample points themselves. However, this function is of good lineage, being the third order B-spline. It can be generated by convolving three box functions (equation 3.3).

Having looked at these two quadratic reconstruction methods, we can see that both have drawbacks. The C0-continuous interpolant can be said to do little more than linear interpolation (in fact, it can be viewed as adding a small 'correction' to linear interpolation [Press et al, 1988, chapter 3]), whilst the C1-continuous approximant is constrained to interpolate a set of points that are not the sample points (but are derived from them).

3.4.7 Visual evaluation

While the above comments are valid from an inspection of the equations, an important test of the quality of any reconstructor is a visual inspection of the intensity surface that it produces.
We said in section 3.4.5 that the linear interpolant produces a blurry intensity surface. Comparing the two quadratic reconstructors to linear interpolation, we find that the approximant generates a blurrier surface than linear, and the interpolant generates a less blurry surface. If we now compare these quadratics to similar cubic reconstructors, we find that the quadratic approximant is almost as blurry as the approximating cubic B-spline. The surprising result is that the intensity surface reconstructed by the quadratic interpolant is practically indistinguishable from that generated by the Catmull-Rom cubic interpolant, which is the best of the cubic interpolants (section 3.6.2). As the quadratic can be evaluated in approximately 60% of the time of the cubic, this quadratic would appear to offer a faster way of achieving similar visual quality to the Catmull-Rom cubic.

3.4.8 Summary

We have shown that third order, zero-phase reconstructors exist. We have not disproved Schafer and Rabiner's [1973] proof that such a reconstructor does not exist given certain initial assumptions; rather, we have changed one of the initial assumptions and thus been able to consider a different family of functions.

From the family of third order, zero-phase reconstructors we have found two interesting members by progressively applying fruitful criteria to the general third order, zero-phase reconstructor. Both of these satisfy the straightness and C0-continuity criteria. One is also C1-continuous and can be shown to be the third order B-spline. The other is interpolating and only C0-continuous. It can be shown to be a linear blend of two linear functions, and gives similar visual quality to the Catmull-Rom cubic discussed in section 3.6.

The derivation of these third order reconstructors has brought out the important issue of what criteria a reconstruction method should include. The next section considers the various criteria which could be incorporated in a reconstructor.
3.5 What should we look for in a NAPK reconstructor?

There are many factors which could be taken into consideration when designing a 'no a priori knowledge' (NAPK) reconstructor. In this section we will discuss these factors, with specific reference to piecewise local polynomials.

Figure 3.13: The effects of a non-linear phase reconstructor. (a) and (b) are the same data set, one the reverse of the other. They have both been reconstructed by the same non-linear phase reconstructor (equation 3.4 was used). (c) is (a) reversed so as to allow easy comparison with (b). The two curves are obviously different.

3.5.1 No phase distortion

The first need is for linear phase. This idea was introduced in section 3.4.6. It basically ensures that the signal will not be distorted by its frequency components having their phases shifted by non-linear amounts (shifting them by linear amounts is equivalent to moving the entire signal in the spatial domain [see page 66]; shifting them by non-linear amounts produces distortion in the spatial domain). Practically, linear phase implies that the reconstruction filter's kernel is real and evenly symmetric about some point. In filter design we can, without loss of generality, use the zero phase situation. With zero phase the kernel is real and even (symmetric about the origin). In general, it would seem that a spatial filter can always have zero phase, while a temporal filter needs linear phase in the cases where a zero phase filter would be non-causal.

A very basic reason behind the need for zero or, at most, linear phase is for the intensity surface to be the same whether the data is presented forward or backward. With a non-linear phase reconstructor the orientation of the data affects the shape of the intensity surface, while with a linear phase reconstructor the orientation of the data has no bearing on the shape of the intensity surface. An example of this is given in figure 3.13.
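The experiment of figure 3.13 can be reproduced numerically. The sketch below (helper names are ours) reconstructs a short one-dimensional image and its reverse, once with the non-linear phase kernel of equation 3.4 and once with the symmetric (zero phase) linear interpolation kernel: only the symmetric kernel gives the same curve in both orientations.

```python
def sr_kernel(s):
    """Equation 3.4: a quadratic Lagrange kernel without linear phase."""
    if -1 < s <= 0: return 0.5*s*s + 1.5*s + 1.0
    if 0 < s <= 1:  return -s*s + 1.0
    if 1 < s <= 2:  return 0.5*s*s - 1.5*s + 1.0
    return 0.0

def tent(s):
    """Linear interpolation kernel: symmetric, hence zero phase."""
    return max(0.0, 1.0 - abs(s))

def reconstruct(samples, x, kernel):
    """f(x) = sum_i f_i h(x - i), unit sample spacing."""
    return sum(f * kernel(x - i) for i, f in enumerate(samples))

data = [0.0, 1.0, 5.0, 2.0, 4.0, 0.0]
rev = data[::-1]
n = len(data) - 1
x = 2.25                                  # a probe point away from the ends
# Zero phase: reversing the data reverses the curve, f_rev(n - x) = f(x).
assert abs(reconstruct(rev, n - x, tent) - reconstruct(data, x, tent)) < 1e-12
# Non-linear phase: the reversed data gives a genuinely different curve.
assert abs(reconstruct(rev, n - x, sr_kernel)
           - reconstruct(data, x, sr_kernel)) > 1e-6
```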
The practical outworking of having linear phase is that, if there are an odd number of points controlling each section, then the sections will start and end midway between sample points. If there are an even number of points, then the sections will start and end at sample points.

Other factors

While a linear phase reconstructor ensures that we will get no spatial distortion due to phase shifts in the frequency domain, our reconstructor still needs many more constraints in order that the spatial domain intensity surface is a good approximation to the original intensity surface. What is meant by 'good approximation' tends to depend on who you ask: are we trying to approximate the cardinal function through the sample points, a Gaussian reconstructed surface, or the real original function? Are we trying to make the intensity surface mathematically nice (that is: obey some predefined mathematical criteria of goodness) or visually nice (that is: look right)? In this entire section we consider both mathematical and visual criteria for correctness.

Whatever the effect we are trying to achieve, from now on we can only affect magnitudes in the frequency domain, because we have already constrained our reconstructor to have zero (or at most linear) phase. Thus, if ever we need to look at the frequency response of a linear phase filter, we know that we need only consider the magnitude of the frequency response.

3.5.2 Continuity

Continuity is arguably the most important criterion for our intensity surfaces, after linear phase. There are several grades of continuity. We met the first two in section 3.4.6. C0-continuity is simply continuity of the intensity surface. C1-continuity is C0-continuity plus continuity of the first derivative of the intensity surface. In general, Cn-continuity is C0-continuity plus continuity of all derivatives of the intensity surface up to and including the nth. The degree of continuity required depends on the application.
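These grades of continuity can be checked mechanically at the knots of the two quadratics of section 3.4.6 (the kernels below are written out by us from equations 3.11 and 3.12): both are C0 at the knot s = 1/2, only the approximant is C1 there, and neither is C2.

```python
def h_interp(s):
    """Kernel of the interpolating quadratic (equation 3.11), |s| <= 3/2."""
    s = abs(s)
    if s <= 0.5: return 1.0 - 2.0*s*s
    if s <= 1.5: return s*s - 2.5*s + 1.5
    return 0.0

def h_approx(s):
    """Kernel of the approximating quadratic (equation 3.12)."""
    s = abs(s)
    if s <= 0.5: return 0.75 - s*s
    if s <= 1.5: return 0.5 * (s - 1.5)**2
    return 0.0

eps = 1e-6
def d_left(h, s):  return (h(s) - h(s - eps)) / eps
def d_right(h, s): return (h(s + eps) - h(s)) / eps

# Both kernels are C0 at the knot s = 1/2: the pieces meet (value 1/2)...
assert abs(h_interp(0.5) - 0.5) < 1e-12 and abs(h_approx(0.5) - 0.5) < 1e-12
# ...but only the approximant is C1 there:
assert abs(d_left(h_interp, 0.5) - d_right(h_interp, 0.5)) > 0.4   # -2 vs -1.5
assert abs(d_left(h_approx, 0.5) - d_right(h_approx, 0.5)) < 1e-3  # -1 both sides
# Neither is C2: the second derivative jumps across the knot.
def dd(h, s0, s1):
    """Exact second derivative of a quadratic piece spanning [s0, s1]."""
    mid = (s0 + s1) / 2
    return (h(s0) - 2*h(mid) + h(s1)) / ((s1 - s0) / 2)**2
assert abs(dd(h_interp, 0.0, 0.4) - dd(h_interp, 0.6, 1.4)) > 1.0  # -4 vs 2
assert abs(dd(h_approx, 0.0, 0.4) - dd(h_approx, 0.6, 1.4)) > 1.0  # -2 vs 1
```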
In Computer-Aided Design (CAD), where design curves are being 'reconstructed' from control points, a minimum of C2-continuity is generally desirable [Chu, 1990, p.283]. For our application (reconstructing intensity surfaces) we have seen that a reconstructor with no continuity (the nearest-neighbour interpolant) is in frequent use and that the most common reconstructor (linear) has only C0-continuity. Thus the stringent requirements of CAD are not essential here. How much continuity is required for intensity surface reconstruction?

Nearest-neighbour interpolation is not good enough for many purposes and has been replaced by linear and other methods. The human eye is very sensitive to intensity differences [Pratt, 1977] and so C0-continuity is desirable. It is true that the original intensity surface may have had C0-discontinuities (that is: edges, see footnote 5), but it is extremely unlikely that the nearest-neighbour reconstructed intensity surface has C0-discontinuities in the same place. The human eye is so sensitive to discontinuities that adding incorrect ones produces very visible rastering artifacts (for example: the familiar blocky effect due to nearest-neighbour interpolation; see figure 2.13). C0-continuity is thus highly desirable because of the human eye's ease of picking out C0-discontinuities.

Can we stop at C0-continuity? In the design of our quadratic reconstructors we attempted to enforce C1-continuity. How much continuity is required? Keys [1981], in his development of a cubic interpolant, constrains it to be zero-phase, interpolating, and C1-continuous, but gives no reason for doing so [Keys, 1981, p.1154] except to say that C1-discontinuities are not normally desirable [ibid., p.1159]. Mitchell and Netravali [1988, p.223], in their study of cubic filters, insist that the filter should be smooth in the sense that its value and first derivative are continuous everywhere (that is: C1-continuous).
Their reason is that discontinuities in the function lead to high-frequency leakage in the frequency response of the filter, which can allow aliasing. Chu [1990, p.283] states that, for CAD applications, the curves should have C2-continuity (that is: continuity of curvature). Foley, van Dam, Feiner and Hughes [1990, p.479] say that there are applications, such as the aerodynamic design of cars and aeroplanes, where it is important to control higher derivatives than the second; so we can assume that C3- or even C4-continuity is required here.

However, we are dealing with an intensity surface rather than with a curve that will be seen as a wiggly line. A gradient, or a discontinuity, in intensity is perceived differently to a gradient or a discontinuity in a line, for example in a CAD curve. For this reason the criteria used for continuity in CAD applications are probably irrelevant. The visual evidence favours C0-continuity: even though this intensity surface will never be seen (it must go through at least one sample-and-reconstruct cycle before it is consumed visually), lack of C0-continuity can cause visual problems which propagate through this cycle.

Unfortunately, it is not true that C0-continuity is desirable in all cases. If a C0-discontinuity in the intensity surface matches a C0-discontinuity in the original surface then the result is what we want (assuming that the sampling method is chosen intelligently). A good example of this is using nearest-neighbour interpolation to reconstruct a certain type of checkerboard image. The edges in the original image fell along pixel edges and so the nearest-neighbour interpolated image is an exact reconstruction of the original (figure 7.9). Such cases are, however, rare.

Footnote 5: 'C0-discontinuity' is not a term defined in the literature. We define 'Cn-discontinuity' to mean "discontinuous in the nth derivative but continuous in all derivatives from 0 to n - 1".
All we can say from visual evidence is that introducing C0-discontinuities in the reconstructed intensity surface in places where they did not exist in the original intensity surface is a bad idea. With no a priori knowledge of where any C0-discontinuities could have been in the original image, C0-continuity is therefore a good criterion to adopt.

The visual evidence favouring C1-continuity is much weaker than that for C0-continuity, because the visual response to a C1-discontinuity is much weaker than the response to a C0-discontinuity. There is evidence that the human visual system cannot detect C2-, or higher, discontinuities in intensity, and little contrary evidence in favour of a human ability to detect these higher order discontinuities. Mach [1865] does suggest that the Mach band effect is related to the value of the second derivative of intensity and hence that the visual system is slightly sensitive to C2-discontinuities. It seems unlikely that we are affected by C2-discontinuities in the reconstructed intensity surface, especially as there is at least one cycle of sampling and reconstruction between this intensity surface and the intensity surface that is displayed and, hence, viewed by a human. On the other hand, if humans could be slightly sensitive to them, we should take this into consideration.

Turning to the mathematical evidence for requiring continuity, we find much stronger arguments. Any discontinuity in a spatial function requires high, up to infinite, frequencies in the frequency domain. To minimise aliasing artifacts, high frequencies must be minimised; thus discontinuities should be avoided. The biggest problems in reconstructed intensity surfaces arise from C0-discontinuities; however, C1-discontinuities are enough of a problem that Mitchell and Netravali [1988, p.223] cite this as the reason for requiring C1-continuity.
C2-discontinuities seem to cause few problems and C2-continuity can be considered an optional extra: it is nice to have, but not at the expense of other, more important criteria. Above C2-continuity, other problems start to creep in because of the number of degrees of freedom required to produce higher orders of continuity. The intensity surface may oscillate wildly to allow for Cn-continuity (n > 2) and this is a visually undesirable effect.

The recommendations for continuity are:

• C0-continuity is a very high priority;

• C1-continuity is a medium priority (it makes for smooth changes and there is evidence that humans detect C1-discontinuities);

• C2-continuity is a low priority; it need only be pursued when more important criteria have been met;

• any higher continuity is a very low priority; it may actually be counter-productive to insist on higher continuity in a piecewise local polynomial, as a lot of degrees of freedom, and hence a high order, are required to allow for it. Forcing a higher order of continuity may cause undesirable rippling artifacts in the reconstructed intensity surface.

However, from a frequency domain point of view, the higher the n in Cn-continuity, the better the frequency response should be, all other things being equal. Having decided, for a given reconstructor, that Cn-continuity is desired, it is enforced by ensuring that the piecewise function is Cn-continuous across the joins, or knots, between the pieces. Each piece is infinitely differentiable along its length by its very nature; it is only the ends that need to be matched up.

3.5.3 Straightness

This property can be interpreted in two ways, one of which includes the other. The first is that, for a piecewise function where each piece depends on the m nearest points: if, for a given piece, the sample value at all m points is the same, then the piece should have that value along its entire length.
In other words, if the samples controlling a piece have a constant value then the reconstruction for that piece is a flat, constant signal, producing a visually correct result. Mitchell and Netravali [1988, p.223] explain the need for this criterion in a different way: it removes any sample frequency ripple. If you look back to figure 3.9 you will see that, for every reconstructed function except the one which meets this criterion (b = -\frac{1}{2}), there is a periodic ripple at the sample frequency. By insisting on this straightness criterion, this ripple is designed out of the reconstruction. Mitchell and Netravali [1988] illustrate this in their figure 8 using an unnormalised Gaussian filter. They say that sample frequency ripple can be shown to be aliasing of the image's DC ('direct current', or zero frequency) component in the frequency domain. Designing sample frequency ripple out of the filter ensures that its frequency response will be zero at all integer multiples of the sampling frequency (except zero), thus eliminating all extraneous replicas of the DC component.

Parker, Kenyon and Troxel [1983, p.36] say that the straightness criterion also forces the reconstructor to have a DC amplification of one. This means that the average intensity of the intensity surface is the same as that of the image. The mathematical expression of this constraint is that:

\sum_{m=-\infty}^{\infty} h(s - m) = 1, \quad \forall s \in \mathbb{R}

Looking at it another way, for any order of reconstructor as given in equation 3.1, in the case where all the points affecting the piece have the same value (f_i = f_\alpha, i \in [\alpha - a, \alpha + b]) we have:

f(x) = f_\alpha \sum_{i=-a}^{b} \sum_{j=0}^{n-1} k_{i,j} t^j

We would like:

f_\alpha = f_\alpha \sum_{i=-a}^{b} \sum_{j=0}^{n-1} k_{i,j} t^j, \quad \forall t \in T, \qquad T = \begin{cases} [0, 1), & m \text{ even} \\ [-\frac{1}{2}, \frac{1}{2}), & m \text{ odd} \end{cases}

which implies that:

\sum_{i=-a}^{b} k_{i,j} = 0, \quad \forall j \in \{1, 2, \ldots, n-1\}; \qquad \sum_{i=-a}^{b} k_{i,0} = 1 \qquad (3.13)

These two conditions enforce the straightness criterion.
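The no-ripple condition can be checked numerically for the two quadratic reconstructors of section 3.4.6. The sketch below (kernels written out by us from equations 3.11 and 3.12) confirms that the sum of each kernel over all integer shifts is identically one:

```python
def h_interp(s):
    """Kernel of the interpolating quadratic (equation 3.11)."""
    s = abs(s)
    if s <= 0.5: return 1.0 - 2.0*s*s
    if s <= 1.5: return s*s - 2.5*s + 1.5
    return 0.0

def h_approx(s):
    """Kernel of the approximating quadratic (equation 3.12),
    the quadratic B-spline."""
    s = abs(s)
    if s <= 0.5: return 0.75 - s*s
    if s <= 1.5: return 0.5 * (s - 1.5)**2
    return 0.0

# Sum over all integer shifts is one everywhere: DC amplification of one
# and no sample frequency ripple.
for kernel in (h_interp, h_approx):
    for k in range(101):
        s = -0.5 + 0.01 * k              # one sample period suffices
        total = sum(kernel(s - m) for m in range(-2, 3))
        assert abs(total - 1.0) < 1e-12
```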
The second interpretation of straightness goes farther than this and insists that, for any set of m sample points in a straight line, the reconstructor produces that straight line for the appropriate polynomial piece. It is easy to show that, to ensure this, two constraints must be added to the two in equation 3.13 (the derivation of these equations is given in appendix B.4):

\sum_{i=-a}^{b} i\,k_{i,j} = 0, \quad \forall j \in \{0, 2, 3, \ldots, n-1\}; \qquad \sum_{i=-a}^{b} i\,k_{i,1} = 1 \qquad (3.14)

The first two constraints (equation 3.13) ensure that, for sample points lying on a first order curve (that is: a horizontal straight line), the reconstructed function also lies on that first order curve. The second pair of constraints (equation 3.14) ensure that, for sample points lying on a second order curve (that is: on an arbitrary straight line), the reconstructed function also lies on that second order curve. It is also possible to extend this criterion to third order curves, in which case the points lie not on a straight line but on a parabola.

The first order constraint allows us to select functions with no sample frequency ripple (as in figure 3.9) and also with the correct multiplicative constant (as in figure 3.10), because it constrains the function to be both flat (no sample frequency ripple [Mitchell and Netravali, 1988, p.223]) and interpolative (correct multiplicative constant [Parker et al, 1983, p.36]) when the m points all have the same sample value. For both interpolants and approximants this constraint ensures that, given m sample points with the same sample value, the piece controlled by those points will be flat, constant and interpolating. This is obviously desirable for an interpolant. A few moments' thought, and a glance at figure 3.10, will show that it is also the only sensible choice for an approximant; basically the m data points reduce to a single piece of information (the common sample value) and so the approximant can do no better than be flat, constant and interpolating.
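The constraints of equations 3.13 and 3.14 can be checked directly against the coefficient tables k_{i,j} of the two quadratics derived in section 3.4.6 (the arrays below are read off equations 3.11 and 3.12):

```python
# k[i][j]: coefficient of t^j applied to sample f_{alpha+i}, i in {-1, 0, 1}.
k_interp = {-1: (0.0, -0.5, 1.0),       # equation 3.11:  t^2 - t/2
            0:  (1.0,  0.0, -2.0),      #                -2t^2 + 1
            1:  (0.0,  0.5, 1.0)}       #                 t^2 + t/2
k_approx = {-1: (0.125, -0.5, 0.5),     # equation 3.12:  t^2/2 - t/2 + 1/8
            0:  (0.75,   0.0, -1.0),    #                -t^2 + 3/4
            1:  (0.125,  0.5, 0.5)}     #                 t^2/2 + t/2 + 1/8

for k in (k_interp, k_approx):
    # First order straightness (equation 3.13): constant data reproduced.
    assert sum(k[i][0] for i in k) == 1.0
    for j in (1, 2):
        assert sum(k[i][j] for i in k) == 0.0
    # Second order straightness (equation 3.14): any straight line reproduced.
    assert sum(i * k[i][1] for i in k) == 1.0
    for j in (0, 2):
        assert sum(i * k[i][j] for i in k) == 0.0
```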
Obviously, the criterion can be extended to higher orders, but it is the first order that is important. Only the first order constraints (equation 3.13) are required to enforce the straightness criterion as it is needed for reconstructing intensity surfaces.

3.5.4 Interpolating or approximating

If we assume that our sample points are perfect point samples then we should use an interpolating function, because the sample points are constrained to lie on the original intensity surface. In an ideal world that would be the end of the matter; however, it is well known that the intensity samples in captured imagery cannot be perfect point samples and often do not approximate point samples at all well. In rendered imagery, perfect point samples are possible, but they are not common in good quality rendered images. Given this, an approximant is a valid reconstructor. In order to produce a useful approximant, however, some a priori knowledge is required about how the samples are produced and/or about the type of image sampled.

In the discussion in this section we are restricting ourselves to the case where we have no a priori knowledge. In this case we assume, as mentioned earlier (section 3.1.4), that the samples are perfect point samples, and reconstruct the surface represented by those samples as best we can. It can be argued that, because we know that we have not got perfect point samples, it is better to make some assumption of non-perfection. The problem is that we do not know what assumption to make. It would be possible to make a guess as to how the samples were formed, or it may be possible to derive such information from the image. Here we make the guess that they are single point samples, which is the commonest assumption. Other assumptions could throw us into the world of APK reconstructors, considered in chapter 6, where reconstruction is done on the basis of knowledge about the sampling process.
Given the assumption of single point sampling, is there a use for any approximant? The answer is yes. In order to make a piecewise curve of a certain order interpolate, we have to use some of its degrees of freedom. If we remove the interpolation constraint then we release these degrees of freedom. They can then be used to constrain the piecewise polynomial to meet other criteria. This is exactly what was done in the case of the quadratic in section 3.4.6; the interpolation criterion was dropped, releasing two degrees of freedom which were used to meet the C1-continuity criterion. Thus an approximant is useful where criteria other than interpolation are considered more desirable than interpolation.

3.5.5 Localisation

This is not a criterion applied to our general form (equation 3.1) but is more fundamental, relating to how that formula was derived. The question is: which sample points are used to calculate the intensity at any given point? In our examples so far we have used the rule that, for a piecewise reconstructor of order n, the function value at any point is based solely on the n nearest sample points to that point. This leads to several questions:

• Why use the sample points directly?
• Why n points?
• Why the nearest points?

The first question is profound. Our general formula for a piecewise polynomial reconstructor is given in equation 3.1. This is, however, a specific form of the more general:

f(x) = \sum_{i=-a}^{b} \sum_{j=0}^{n-1} k_{i,j}\, t^j\, g_{\alpha+i}    (3.15)

where:

a = \lfloor (m-1)/2 \rfloor, \qquad b = \lfloor m/2 \rfloor

\alpha = \begin{cases} \lfloor (x - x_0)/\Delta x \rfloor, & m \text{ even} \\ \lfloor (x - x_0)/\Delta x + \frac{1}{2} \rfloor, & m \text{ odd} \end{cases}
\qquad t = \frac{x - x_\alpha}{\Delta x}

The difference here is that g_{α+i} is used in place of f_{α+i}. The g_α are coefficients calculated from the f_α plus, in some cases, other fixed conditions. Equation 3.1 is a special case of equation 3.15 with g_α = f_α. A well known example of equation 3.15 is the interpolating cubic B-spline. Wolberg [1990, pp.136–137] gives an example of this showing that one method of calculating the g_α from the f_α is by the
following matrix equation (assuming N sample points):

\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_{N-2} \\ f_{N-1} \end{pmatrix}
=
\begin{pmatrix}
4 & 1 &   &        &   &   \\
1 & 4 & 1 &        &   & 0 \\
  & 1 & 4 & 1      &   &   \\
  &   &   & \ddots &   &   \\
0 &   &   & 1      & 4 & 1 \\
  &   &   &        & 1 & 4
\end{pmatrix}
\times
\begin{pmatrix} g_0 \\ g_1 \\ g_2 \\ \vdots \\ g_{N-2} \\ g_{N-1} \end{pmatrix}

That is, F = KG. Inverting the tridiagonal matrix, K, gives its inverse, K^{-1}, giving us G = K^{-1}F. We thus get a function for each g_α:

g_\alpha = \sum_{l=0}^{N-1} K^{-1}_{\alpha,l}\, f_l

where K^{-1}_{i,j} is the element in the ith row and jth column of the matrix K^{-1}. The nature of K^{-1} (it contains no zeros) means that every g_α depends on every f_α. Thus every point, f(x), on the intensity surface depends on all the f_α sample values. In general, for almost all such methods, each g_α value is a function of all of the f_α values, and so the conclusions from this specific interpolating B-spline example can be generalised. Thus, from equation 3.15:

f(x) = \sum_{i=-a}^{b} \sum_{j=0}^{n-1} k_{i,j}\, t^j \sum_{l=0}^{N-1} K^{-1}_{\alpha+i,l}\, f_l
     = \sum_{l=0}^{N-1} \sum_{j=0}^{n-1} \left( \sum_{i=-a}^{b} k_{i,j}\, K^{-1}_{\alpha+i,l} \right) t^j\, f_l
     = \sum_{l=0}^{N-1} \sum_{j=0}^{n-1} \kappa_{\alpha,j,l}\, t^j\, f_l    (3.16)

where:

\kappa_{\alpha,j,l} = \sum_{i=-a}^{b} k_{i,j}\, K^{-1}_{\alpha+i,l}

The κ_{α,j,l} values can be precalculated in the same way that the k_{i,j} values are for equation 3.1. Indeed equation 3.16 is almost equation 3.1 with m = N, but note that here κ depends on α while, in equation 3.1, the equivalent k is independent of α. By using such a method every point on the intensity surface will, in general, depend on all the sample points. Finding the intensity value, f(x), for any point will thus involve a lot of calculation. The great advantage of using only the local sample points (the n nearest) is that the calculation is fast. If, as in many resampling situations, there is a need for quick computation or for fast arbitrary sampling off the intensity surface, localisation is a good idea and therefore equation 3.1 is preferable to equation 3.16. It is feasible to precalculate all of the g_α coefficients for use in equation 3.15; and this is generally what is done in any real system.
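As a concrete sketch of this precomputation, the tridiagonal system F = KG above can be solved by the standard Thomas algorithm (the [1 4 1] rows and the boundary rows are taken exactly as printed; any overall scale factor in the formulation is immaterial here). A single non-zero sample then visibly spreads into every coefficient:

```python
# Sketch: precompute the interpolating cubic B-spline coefficients g by
# solving the tridiagonal system K g = f (Thomas algorithm), with K the
# [1 4 1] matrix shown above.
def bspline_coefficients(f):
    """Solve K g = f where K is tridiagonal with 4 on the diagonal."""
    n = len(f)
    cp = [0.0] * n   # modified super-diagonal
    dp = [0.0] * n   # modified right-hand side
    cp[0] = 1.0 / 4.0
    dp[0] = f[0] / 4.0
    for i in range(1, n):                 # forward sweep
        denom = 4.0 - cp[i - 1]
        cp[i] = (1.0 / denom) if i < n - 1 else 0.0
        dp[i] = (f[i] - dp[i - 1]) / denom
    g = [0.0] * n
    g[-1] = dp[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        g[i] = dp[i] - cp[i] * g[i + 1]
    return g

f = [0.0, 0.0, 0.0, 6.0, 0.0, 0.0, 0.0]   # one non-zero sample
g = bspline_coefficients(f)

# The solution satisfies the interior rows f_i = g_{i-1} + 4 g_i + g_{i+1}.
for i in range(1, len(f) - 1):
    assert abs(g[i - 1] + 4 * g[i] + g[i + 1] - f[i]) < 1e-9

# Every g depends on every f: the single non-zero sample has spread
# (with alternating sign and decaying magnitude) into all coefficients.
assert all(abs(x) > 0 for x in g)
```

This is the dense dependence that makes K⁻¹ contain no zeros, and why precomputation is the only way to keep resampling from such a reconstructor fast.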
This gives the same speed advantage in the actual resampling as using equation 3.1. Instead of calculating intensity values from local sample values they are calculated from local, pre-computed, g_α coefficients. The disadvantage here is that the precomputation still has to be done and is different for each image (whilst the k_{i,j} values are the same for all images, and the κ_{α,j,l} for all images of the same size). If we want a quick reconstructor we may not be willing to pay the price of the precomputation. Further, if the data is being processed as a stream, as in scan-line algorithms, it may be impractical to read and store a whole scanline, calculate the coefficients and produce the output samples. In this case an algorithm requiring this precomputation of coefficients will be impractical. The advantage of using precomputed coefficients is that they allow all of the data to be taken into account at each sample point rather than just the local data. This should, arguably, make possible a more accurate reconstructor. In chapter 4 we look at such non-local reconstructors and their advantages. For now we will concentrate on the simplicity of local reconstructors, that is, reconstructors where g_α = f_α. It is possible to conceive a way of calculating the g_α values which makes f(x) dependent only on local f_α values. For example, g_α = \frac{1}{4}f_{α−1} + \frac{1}{2}f_α + \frac{1}{4}f_{α+1}. Such a reconstructor can, however, be easily recast in the terms of equation 3.1 by changing the k_{i,j} values. Another local possibility is a non-linear manipulation of the f_α values, for example g_α = \log(f_α). We do not consider such functions here because non-linear manipulations of the intensity values are not useful in resampling work; section 2.2 points out the necessity of the intensity surface being a linear function of the input image data. This still leaves two questions:

• Why n points?
• Why the nearest points?
Why do we use n points when reconstructing with an nth order piecewise polynomial? A polynomial of order n can be expressed as

f(x) = \sum_{i=0}^{n-1} c_i x^i

That is, it has n coefficients. These n coefficients can be uniquely determined from n data values. For example, in a piecewise polynomial function each piece is a function of the form:

f(t) = \sum_{j=0}^{n-1} c_j t^j \qquad \text{where} \qquad c_j = \sum_{i=-a}^{b} k_{i,j}\, f_{\alpha+i}

(this is simply a reformulation of equation 3.1). Note that a + b = n − 1, thus there are n f_{α+i} values involved in calculating the c_j coefficients. But why use exactly n data points when wanting an order n function? Approaching the problem from the other end: given m data points why limit ourselves to a function of order m? There are two alternatives: either use a function with order less than m or use a function with order greater than m. A function of order less than m has fewer than m coefficients and thus cannot take all the available data into account. With an order n function (n < m) the effective result will be that m − n data points will be irrelevant in calculating the function values, that is: m independent variables are used to calculate n values, therefore the potential of m − n values is 'lost'. Given that m points have the expressive power to produce m independent coefficients [Foley et al, pp.478–479] it seems ludicrous to limit the function to any order less than m. The counter to this observation is that a good reconstructor, the interpolating cubic B-spline, has m > n. The m data values are combined to give n coefficients for each segment of the curve, producing a better result than can be obtained from just n data values. Another example of m > n is given by Keys [1981, pp.1159–60]. He presents a cubic function based on six data points which has an O(h^4) convergence rate to the Taylor series expansion of the function being interpolated, as compared to O(h^3) for the Catmull-Rom spline.
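The per-piece reformulation c_j = Σ_i k_{i,j} f_{α+i} is easy to sketch numerically. The coefficient table below is the Catmull-Rom member of the cubic family (an illustrative choice, not the only one); with m = n = 4, four samples determine the four cubic coefficients of the piece, and the piece interpolates its two central samples:

```python
# Sketch of the per-piece reformulation c_j = sum_i k[i][j] * f_{alpha+i},
# using the Catmull-Rom coefficients (m = n = 4) as a worked example.
k = {
    -1: [0.0, -0.5,  1.0, -0.5],
     0: [1.0,  0.0, -2.5,  1.5],
     1: [0.0,  0.5,  2.0, -1.5],
     2: [0.0,  0.0, -0.5,  0.5],
}

def piece_coefficients(f4):
    """Four samples f_{alpha-1}..f_{alpha+2} -> cubic coefficients c_0..c_3."""
    return [sum(k[i][j] * f4[i + 1] for i in range(-1, 3)) for j in range(4)]

def eval_piece(c, t):
    """Horner evaluation of sum_j c_j t^j for 0 <= t <= 1."""
    return ((c[3] * t + c[2]) * t + c[1]) * t + c[0]

f4 = [2.0, 5.0, 3.0, 4.0]   # f_{alpha-1}, f_alpha, f_{alpha+1}, f_{alpha+2}
c = piece_coefficients(f4)
assert abs(eval_piece(c, 0.0) - 5.0) < 1e-12   # interpolates f_alpha at t = 0
assert abs(eval_piece(c, 1.0) - 3.0) < 1e-12   # interpolates f_{alpha+1} at t = 1
```

The n coefficients are uniquely determined by the n local samples, which is exactly the m = n case argued for in the text.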
The use of a function of order n greater than m is very tempting. While the data can only determine m independent coefficients, the extra n − m coefficients give more degrees of freedom with which to satisfy more criteria (note that with a function of order n, and m data points, you get m × n degrees of freedom). Unfortunately, the problem here is that the coefficients are underconstrained and extra data items have to be created to define them. There is simply not enough data available to fully define the function and so the extra criteria that are imposed (using the extra degrees of freedom) force 'default data values' to be accepted — for example a quintic function (n = 6) based on m = 4 data points has a potential for two extra data values. As these are not supplied the criteria force these extra data values to some value, which may be constant for all pieces. For example, in this case, it may be that the value of the second derivative at each end of the segment is zero. Thus we now have six data values to base the six coefficients on: the m = 4 real data points plus the two constant data values that f''(α, 0) = 0 and f''(α, 1) = 0. The following example will help to clarify this. Previously we produced the standard linear interpolant (m = n = 2) when deriving a reconstructor for the two point case. Now let m = 2 and n = 4. Thus we will produce a piecewise cubic function where each piece depends on two data points:

f(x) = (at^3 + bt^2 + ct + d)f_\alpha + (et^3 + ft^2 + gt + h)f_{\alpha+1}

If we now impose the criteria met by the piecewise linear function in section 3.4.5 and use the extra freedom given by the cubic function to also enforce C1-continuity we get:

f(x) = f(\alpha, t) = (2t^3 - 3t^2 + 1)f_\alpha + (-2t^3 + 3t^2)f_{\alpha+1}

This function's kernel is:

h(s) = \begin{cases} -2s^3 - 3s^2 + 1, & -1 \le s \le 0 \\ 2s^3 - 3s^2 + 1, & 0 \le s \le 1 \\ 0, & \text{otherwise} \end{cases}    (3.17)

This is a member of the one parameter family of cubics described by Wolberg [1990, pp.129–131] and many others (equation 3.20).
To produce this particular member the parameter, a, is set to zero. Analysis of this function in the spatial domain shows that, for any α: f′(α, 0) = 0 and f′(α, 1) = 0 (where f(α, t) = f(x_α + t∆x)). Thus the two extra data points required by the cubic are provided by these two 'default data points'; the first derivatives of the function at x_α and x_{α+1} default to zero. This gives the interpolated function the form shown in figure 3.14. The extra C1-continuity provided by the cubic function means that its frequency response is smoother; it does, however, produce a rather odd looking interpolant. Maeland [1988] claims, from frequency domain evidence, that this piecewise cubic interpolant produces a much better quality of interpolation than the simple linear interpolation. In contrast, Fraser [1989b, pp.272–3] claims that it produces much the same quality. Our visual evaluation shows that it is less blurry than linear interpolation but has a blocky (anisotropic) artifact, similar to nearest-neighbour, but not as obvious (Mitchell and Netravali's [1988] subjective evaluation also shows that the dominant artifact in this reconstructor is anisotropy).

Figure 3.14: The two point cubic operating on an image. Notice that the slope of the curve at each data point is zero.

Using a function of order higher than m allows more constraints to be met by the reconstructor at the expense of every piece being partly controlled by these 'default data values'. Given that a piecewise function of order n can be used to meet up to n independent data items, the most sensible course of action is to take those n items to be n real data points from the set of data available, rather than force n − m (m < n) of these items to be default data values, most likely the same value for every piece. We conclude that m = n is the ideal case but that both m < n and m > n are useful.

Why the nearest points?
The simple answer is that, to find the value of the function at a given point, the data points which are most relevant are those nearest to the point. The farther apart two points are, the weaker the relation between their function values. Thus if you are going to use m data points to calculate the value at some point, then it is logical to use the m nearest data points. Thus we should use the nearest sample points to define the pieces of a local polynomial piecewise function. This gives us simplicity and localisation of data references in implementations. No data point beyond the local neighbourhood of the point we are interested in is used in the calculation of the function value at that point; unlike methods with precalculation, where every data point tends to be involved in calculating the function value at every point.

3.5.6 Approximating sinc, approximating Gaussian, or neither

One measure of the quality of a reconstructor is how closely it approximates some other, more desirable, reconstructor. The reasons for not implementing this more desirable reconstructor could be many; most often we are looking for a good local approximation to a non-local reconstructor. Mathematically the sinc reconstructor is the most desirable. Much effort can be expended in designing a local filter to match the sinc function as closely as possible [Schafer and Rabiner, 1973]. This work is usually undertaken in the frequency domain, where the frequency response is designed to be as close to a box function as possible. For example, Reichenbach and Park [1989] analyse the two-parameter family of cubics (equation 3.19) in the frequency domain to find the best approximation to sinc reconstruction. Another example is Wolberg's [1990, sec.5.4.7] tanh filter, which has an approximate box function frequency response composed of the product of two tanh functions. Visually, the sinc function may not necessarily be the best function to use (figures 2.10–2.13).
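A rough numerical look at such a frequency response is straightforward: sample the kernel's Fourier transform H(ν) = ∫ h(s) cos(2πνs) ds directly. The sketch below does this for the Catmull-Rom kernel (in units where the sample spacing is 1, so the Nyquist limit is ν = 1/2); the trapezoidal integration and step count are implementation choices, not anything from the text:

```python
# Numerical estimate of a cubic kernel's frequency response, to see how it
# approximates the ideal box filter.
import math

def h(s):
    """Catmull-Rom kernel (the a = -1/2 member of the cubic family)."""
    s = abs(s)
    if s <= 1.0:
        return 1.5 * s**3 - 2.5 * s**2 + 1.0
    if s <= 2.0:
        return -0.5 * s**3 + 2.5 * s**2 - 4.0 * s + 2.0
    return 0.0

def H(nu, steps=4000):
    """Trapezoidal estimate of integral h(s) cos(2 pi nu s) ds over [-2, 2]."""
    a, b = -2.0, 2.0
    ds = (b - a) / steps
    total = 0.5 * (h(a) * math.cos(2 * math.pi * nu * a)
                   + h(b) * math.cos(2 * math.pi * nu * b))
    for i in range(1, steps):
        s = a + i * ds
        total += h(s) * math.cos(2 * math.pi * nu * s)
    return total * ds

# Unit DC gain (unit area under the kernel), and a response that vanishes
# at the sampling frequency nu = 1: the frequency-domain face of the
# no-sample-frequency-ripple (flatness) constraint.
assert abs(H(0.0) - 1.0) < 1e-6
assert abs(H(1.0)) < 1e-3
```

The zero at the sampling frequency is exactly what the first order straightness constraints of section 3.5.3 buy in the frequency domain.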
The Catmull-Rom function is considered to be a local approximation to the sinc function [Wolberg 1990, sec.5.4.3], yet both Mitchell and Netravali [1988], and Ward and Cok [1989] claim that visually the Catmull-Rom cubic produces a better reconstructed intensity surface than the sinc function. What reasons can be put forward to explain this apparent inconsistency? Ward and Cok [1989] suggest that this is due to the original signal not being bandlimited and thus sinc reconstruction causing ripples near edges because of the aliased high spatial frequencies. However, we saw in section 2.7 that even a perfectly bandlimited image can display objectionable ripples in its sinc reconstructed intensity surface. The sinc reconstructed intensity surface of any image will contain ripples, although, if the image is bandlimited to well below the Nyquist limit, then the ripples will almost certainly be visually acceptable. Ward and Cok's theory is that the sharp cutoff in the frequency domain of the sinc reconstruction (infinitely sharp, it is a box function) causes the rippling and that passing some frequency information above the Nyquist limit is beneficial, producing visually nicer results, especially near edges. They believe that linear interpolation makes the picture too blurry (and thus must pass too much information above Nyquist and too little below) whilst the Catmull-Rom cubic interpolant gives about the right balance. Taking their argument and developing it further we postulate the following frequency domain explanation for the perceived differences in intensity surface quality arising from sinc and Catmull-Rom reconstructions: almost any image of interest will have a frequency spectrum which is not bandlimited; this spectrum will usually peak near zero and fall off rapidly at higher frequencies, as shown in figure 3.15(a).
When sampled without perfect prefiltering, as is usual, the resulting spectrum contains aliases, as illustrated in figure 3.15(b). Reconstructing this using sinc perfectly reproduces these aliases (figure 3.15(c)). Comparing figure 3.15(c) with figure 3.15(a) shows what is wrong with the reconstructed intensity surface: the frequencies just below the Nyquist frequency have too large a magnitude, whilst those just above have too small. Catmull-Rom cubic reconstruction, on the other hand, produces a reconstructed surface closer to the original. Figure 3.15(d) shows that the frequency domain representations of the original and the reconstructed surfaces match much better than with sinc reconstruction. Those extra high frequencies prevent the offensive rippling artifacts. These diagrams look quite convincing, but how realistic are they? Firstly, many images do exhibit this distribution of frequencies: a high peak near zero and a long tail off to infinity. Certainly the images we looked at had frequency spectra that dropped as they approached the Nyquist frequency, but they did not fall all the way to zero, which indicates the presence of aliases. The effect of sampling such a signal will be as illustrated in figure 3.15(b): the frequencies near zero undergoing little change whilst the frequencies near Nyquist exhibit large changes. However, these diagrams show only the effect on magnitudes. The effect of phase in all this has not been mentioned. The phases of the frequency components of the reconstructed intensity surfaces near and above Nyquist will bear little relationship to those in the original intensity surface, because the correct component and the aliased component will be added together, producing an incorrect result. The aliased component has a large magnitude relative to the correct component around the Nyquist frequency, so causing the largest errors in phase around this frequency.
This may turn out to be unimportant, but it throws doubt on this explanation of why we observe that the Catmull-Rom cubic is better than the sinc function at image reconstruction. The ripples caused by sinc reconstruction are certainly caused by the limited frequency domain spectrum; that is, there are no spatial frequencies greater than the Nyquist frequency. The outworking of this is that no change can occur in the intensity surface faster than a distance of \frac{1}{2\nu_N} (the sample spacing) and that any change is accompanied by subsidiary ripples on either side. To explain the good results from Catmull-Rom interpolation we can consider the results in the spatial domain. Here we note several features of this reconstruction:

• like sinc, no changes can occur faster than \frac{1}{2\nu_N} (the sample spacing);
• the side lobe sharpening effect (one ripple either side of an edge) is used to make edges appear sharper, and is thus not perceived as a ripple but as a sharper edge;
• unlike sinc, no distracting extra ripples are produced;
• large, near constant, areas can be reproduced with no visible internal rippling.

Figure 3.15: The effects, in the frequency domain, of sampling and subsequent reconstruction by sinc and Catmull-Rom reconstructors: (a) the original intensity surface's Fourier transform; note that it has frequencies higher than the Nyquist frequency ν_N; (b) the sampled image's Fourier transform after sampling; the dotted line is the curve from (a); aliasing has occurred; (c) the Fourier transform of the sinc reconstructed image; the Fourier transform of the sinc reconstructor is shown by the dotted line; (d) the Fourier transform of the Catmull-Rom reconstructed image; the Fourier transform of the Catmull-Rom reconstructor is shown by the dotted line.
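The last point in the list above can be demonstrated directly: in a region where all the nearby samples are equal, a reconstructor satisfying first order straightness is exactly flat, whereas a sinc sum (here truncated to the available samples, an assumption of this sketch) still ripples. A minimal comparison on a step-edge image:

```python
# Sketch comparing ripple in 'flat' regions: reconstruct a step edge with a
# (truncated) sinc and with the Catmull-Rom cubic, and inspect a point far
# from the edge where all the nearby samples are equal.
import math

samples = [0.0] * 8 + [1.0] * 8   # a step edge at index 8
x = 3.5                            # well inside the flat black region

def sinc(s):
    return 1.0 if s == 0 else math.sin(math.pi * s) / (math.pi * s)

# Truncated sinc reconstruction: sum over all available samples.
sinc_value = sum(f * sinc(x - i) for i, f in enumerate(samples))

def catmull_rom(s):
    s = abs(s)
    if s <= 1.0:
        return 1.5 * s**3 - 2.5 * s**2 + 1.0
    if s <= 2.0:
        return -0.5 * s**3 + 2.5 * s**2 - 4.0 * s + 2.0
    return 0.0

# Catmull-Rom reconstruction: only the 4 nearest samples matter.
cr_value = sum(f * catmull_rom(x - i) for i, f in enumerate(samples))

# The local kernel reproduces the flat region exactly; the sinc sum still
# ripples there (every sample on the far side of the edge contributes).
assert abs(cr_value) < 1e-12
assert abs(sinc_value) > 1e-3
```

At this point the truncated sinc surface sits a few percent of the step height above black, which is the 'extra ripples in flat areas' artifact described above.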
Basically the Catmull-Rom cubic panders to our visual system's predilections, whilst the sinc function panders to our desire for mathematical correctness. We are good at recognising sharp edges; the Catmull-Rom cubic sharpens edges. We object to extra ripples in 'flat shaded' areas, which sinc reconstruction produces in abundance; the Catmull-Rom cubic produces none.

Other approximations

The other function to which a good local approximation is frequently sought is the Gaussian. The Gaussian is infinite in extent, like sinc, and it falls off to zero as it goes to infinity, also like sinc. There the similarities stop. Sinc is an oscillatory function whilst a Gaussian is strictly positive. The Gaussian is the limit of the B-spline family: it can be generated by convolving an infinite number of boxes together, and all members of the B-spline family are approximations to the Gaussian. The Gaussian tends to produce a blurry intensity surface (as do most B-splines) but is prized because it is strictly positive and has other nice analytical properties [Turkowski, 1990, p.150]. These other properties are irrelevant when designing an approximation as they will not be present in the approximation. The strict positiveness is an important property which is discussed in the next section. The sinc and Gaussian are by far the most approximated functions. There appears to be no work on approximating other functions. Keys [1981] chooses his cubic function by finding the member of the family of cubics which produces an intensity surface which matches the Taylor series expansion of the original intensity surface to as many terms as possible. This criterion is not making the reconstruction filter approximate some other filter, but is making the reconstructed intensity surface approximate the original intensity surface in a well defined mathematical way. The final option is to approximate no particular function at all.
This means that the reconstructor is designed based on the other criteria discussed in this chapter and judged on its own merits, not on its closeness to some more 'desirable' reconstructor. Any resulting resemblance to some other reconstructor is thus purely coincidental.

3.5.7 Negative lobes vs strictly positive

These terms refer to the shape of the reconstruction filter. All reconstruction filters are mostly positive (to ensure that the average intensity of the intensity surface is the same as that of the image from which it is reconstructed, the integrated area under the whole reconstruction filter must be unity). The debate is over whether the filter should be all positive or whether negative lobes are permissible. The problem with negative lobes is that they introduce the possibility that some parts of the intensity surface may have intensity values brighter than the brightest white in the image, or, worse still, darker than black. The former can be dealt with, as it is always (theoretically) possible to produce a brighter white. The latter is a major problem: there is no such thing as negative light, there is no intensity 'blacker than black', and so such intensities cannot be displayed9. Blinn [1989b] says that clamping these values to zero (black) is about the best you can do, but is not ideal. This issue is taken up in section 4.5.2. Because of this problem of negative light some have advocated using only a strictly positive kernel. Unfortunately, such reconstructors tend to produce blurry intensity surfaces. Further, this makes it difficult to produce a good interpolating reconstruction. Blinn [1989b, p.87] asserts that any decent approximation to a low-pass filter, for either sampling or reconstruction, has negative lobes.

9 Negative light is an interesting concept.
A negative light bulb, for example, would be very useful: if you are working nights and have to sleep during the day, just turn on your negative light bulb, flooding the room with negative light, and there you have it, a darkened room. It can't do anything about traffic noise though.

It is the negative lobes in the Catmull-Rom cubic, for example, that generate the desirable edge sharpening effect. Thus we are faced with a simple choice: either allow negative lobes and cope with any negative intensities which arise; or reject negative lobes and accept a certain amount of blurriness in the intensity surfaces.

3.5.8 Computational considerations

In all this we have given little consideration to the length of time required to generate actual values of the intensity at arbitrary points. Keys [1981, p.1153] remarks: "Because of the [large] amount of data associated with digital images, an efficient interpolation algorithm is essential." Obviously some housekeeping calculations are common to all piecewise local polynomials (for example, finding α and t) but there is part of the calculation which depends on m and n. To calculate a single value using equation 3.1 (the one-dimensional version) requires m(n + 1) multiplications (plus (n − 2) to find all the t^j values for j ∈ {2, 3, . . . , n − 1}) and (mn − 1) additions. Thus it is an O(mn) process. For computational efficiency we therefore need to keep m × n as small as possible. Small m also means that less of the image needs to be in memory at once, an important point for stream processing: the fewer input points that need to be operated on for each output point, the better. Speed of computation, which is a function of m × n, not just m, is also very important in stream processing. So we find a tension here: on the one hand the larger m × n, the more criteria the reconstruction can meet; on the other hand the smaller m × n, the faster the resampling can be performed.
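The operation count is easy to confirm by instrumenting a direct evaluation of equation 3.1. The sketch below counts each k_{i,j}·t^j product (including j = 0), each multiplication by a sample value, and each power of t, which is one way of arriving at the m(n + 1) + (n − 2) figure quoted above; the Catmull-Rom coefficients are again only a worked example:

```python
# Cost-counting sketch of evaluating equation 3.1 directly:
# m(n+1) multiplications per output point, plus (n-2) to form powers of t.
mults = 0

def mul(a, b):
    global mults
    mults += 1
    return a * b

def eval_eq_3_1(k, f_local, t):
    """k[i] holds the n coefficients for sample offset i (m rows in all)."""
    n = len(next(iter(k.values())))
    powers = [1.0, t]
    for _ in range(n - 2):                 # t^2 .. t^{n-1}: (n-2) mults
        powers.append(mul(powers[-1], t))
    total = 0.0
    for i, row in k.items():
        inner = sum(mul(row[j], powers[j]) for j in range(n))  # n mults
        total += mul(inner, f_local[i])                        # 1 mult
    return total

# Catmull-Rom coefficients as the worked example (m = n = 4).
k = {-1: [0.0, -0.5, 1.0, -0.5], 0: [1.0, 0.0, -2.5, 1.5],
      1: [0.0, 0.5, 2.0, -1.5],  2: [0.0, 0.0, -0.5, 0.5]}
f_local = {-1: 2.0, 0: 5.0, 1: 3.0, 2: 4.0}

value = eval_eq_3_1(k, f_local, 0.0)
m, n = 4, 4
assert mults == m * (n + 1) + (n - 2)   # 22 multiplications
assert abs(value - 5.0) < 1e-12         # interpolates f_alpha at t = 0
```

Doubling either m or n doubles this count, which is the O(mn) tension the text describes.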
This generalises to other reconstructors; computational cost is related to 'M', the number of input points used to produce one output point, and 'N', the amount of processing required for each of the 'M' input points. In general, the more criteria a reconstructor has, and the more input points required to generate one output point, the longer it takes to evaluate.

3.5.9 No extra knowledge

Finally it should be noted that there is no knowledge about the image besides the data points and thus any reconstruction method must be based solely on information in the data points. Indeed, this is what is meant by 'no a priori knowledge'. Later we will look at what can be achieved when more knowledge is available. For example, Mitchell and Netravali [1988] show that a much better cubic reconstruction can be produced if both intensity and first derivative of intensity values are available at each sample point.

3.5.10 Summary

Chu [1990] gives a list of the five features which a good interactive interpolation procedure for CAD should possess. While providing a good starting point for investigating features which a good reconstructor should possess, the use of the resulting curve dictates different priorities. The features which a good NAPK reconstructor should possess are:

• Linear phase.
• C0-continuity; C1-continuity if possible; higher orders of continuity if desirable (a general rule is that the higher the order the better the frequency response, all other things being equal).
• First order straightness, so that the intensity surface does not exhibit sample frequency ripple.
• Maximum use of the available information — a practical outworking of this is that if we restrict ourselves to an order n function then we should use the same number of data points (n) and the most relevant ones (the closest).
• Reasonable computational effort.

3.6 Cubic functions

We now turn back to considering specific functions.
The cubic piecewise functions are amongst the most studied group of reconstructors. The general form is:

f(x) = (k_{-1,3}t^3 + k_{-1,2}t^2 + k_{-1,1}t + k_{-1,0})\, f_{\alpha-1}
     + (k_{0,3}t^3 + k_{0,2}t^2 + k_{0,1}t + k_{0,0})\, f_{\alpha}
     + (k_{1,3}t^3 + k_{1,2}t^2 + k_{1,1}t + k_{1,0})\, f_{\alpha+1}
     + (k_{2,3}t^3 + k_{2,2}t^2 + k_{2,1}t + k_{2,0})\, f_{\alpha+2}

or, more compactly:

f(x) = \sum_{i=-1}^{2} \sum_{j=0}^{3} k_{i,j}\, t^j\, f_{\alpha+i}    (3.18)

where α = \lfloor (x - x_0)/\Delta x \rfloor and t = (x - x_α)/∆x.

Given these 16 degrees of freedom (k_{i,j}) we agree with Mitchell and Netravali [1988] in imposing the following criteria:

• Linear phase (Mitchell and Netravali do not explicitly state this but it is implicit in their formulation of the function as a symmetric cubic filter with only eight degrees of freedom; the other eight are required to ensure linear phase).
• C0-continuity.
• C1-continuity.
• Straightness (which they call designing "the problem of sample-frequency ripple" out of the filter).

With these constraints, which, from our discussion in section 3.5, are not contentious, the degrees of freedom in equation 3.18 are reduced from sixteen to two (the only bone of contention here could be whether the C1-continuity is necessary; as we decided in section 3.5.2 that C1-continuity is desirable, if possible, and as there are quite enough degrees of freedom to allow it, it is sensible to include it as a criterion here). The function is now as follows:

f(x) = \left( (-\tfrac{1}{6}B - C)t^3 + (\tfrac{1}{2}B + 2C)t^2 + (-\tfrac{1}{2}B - C)t + \tfrac{1}{6}B \right) f_{\alpha-1}
     + \left( (2 - \tfrac{3}{2}B - C)t^3 + (-3 + 2B + C)t^2 + (1 - \tfrac{1}{3}B) \right) f_{\alpha}
     + \left( (-2 + \tfrac{3}{2}B + C)t^3 + (3 - \tfrac{5}{2}B - 2C)t^2 + (\tfrac{1}{2}B + C)t + \tfrac{1}{6}B \right) f_{\alpha+1}
     + \left( (\tfrac{1}{6}B + C)t^3 + (-C)t^2 \right) f_{\alpha+2}    (3.19)

where B and C are the two remaining degrees of freedom, as used by Mitchell and Netravali [1988, p.223]. If we add the C2-continuity criterion then the equation above reduces to the case where B = 1 and C = 0. This C2-continuous cubic is the approximating cubic B-spline (which is not interpolating).
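The algebra of equation 3.19 is easy to check numerically. The sketch below transcribes the coefficient table, recovers the Catmull-Rom member at B = 0, C = 1/2, shows that B = 1, C = 0 (the approximating cubic B-spline) is not interpolating, and checks that a member with 2C + B = 1 reproduces an arbitrary straight line exactly:

```python
# The two-parameter cubic of equation 3.19 as a coefficient table k[i][j]
# (j = 0..3 per row), for checking special cases with exact arithmetic.
from fractions import Fraction as F

def bc_coefficients(B, C):
    B, C = F(B), F(C)
    return {
        -1: [F(1,6)*B,     -F(1,2)*B - C,  F(1,2)*B + 2*C,     -F(1,6)*B - C],
         0: [1 - F(1,3)*B,  F(0),          -3 + 2*B + C,        2 - F(3,2)*B - C],
         1: [F(1,6)*B,      F(1,2)*B + C,   3 - F(5,2)*B - 2*C, -2 + F(3,2)*B + C],
         2: [F(0),          F(0),          -C,                   F(1,6)*B + C],
    }

# B = 0, C = 1/2 recovers the Catmull-Rom coefficients of equation 3.20.
cr = bc_coefficients(0, F(1, 2))
assert cr[-1] == [0, -F(1,2), 1, -F(1,2)]
assert cr[0]  == [1, 0, -F(5,2), F(3,2)]

# B = 1, C = 0 (the approximating cubic B-spline) is not interpolating:
# at t = 0 the weights are (1/6, 2/3, 1/6, 0) rather than (0, 1, 0, 0).
bs = bc_coefficients(1, 0)
assert [bs[i][0] for i in (-1, 0, 1, 2)] == [F(1,6), F(2,3), F(1,6), 0]

# A member with 2C + B = 1 reproduces an arbitrary straight line exactly
# (here B = C = 1/3, evaluated at t = 1/2 on samples of f(x) = 7 + 3x).
mn = bc_coefficients(F(1,3), F(1,3))
line = {i: 7 + 3 * i for i in (-1, 0, 1, 2)}
val = sum(mn[i][j] * line[i] * F(1,2)**j for i in mn for j in range(4))
assert val == 7 + 3 * F(1,2)
```

Exact fractions avoid any floating point ambiguity in these identities.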
If we impose the interpolating criterion (and do not impose the C2-continuity criterion) then we get B = 0, with C still a free parameter. This one-parameter family of interpolating cubic functions has been extensively studied [Keys, 1981; Park and Schowengerdt, 1983; to a lesser extent: Maeland, 1988 and Parker et al., 1983]. Keys [1981] uses the parameter a (a = −C) while Park and Schowengerdt [1983] use the parameter α (α = a). Whatever the name of the parameter, it is the same family of cubics. Its general form is:

f(x) = \left( (a)t^3 + (-2a)t^2 + (a)t \right) f_{\alpha-1}
     + \left( (2 + a)t^3 + (-3 - a)t^2 + 1 \right) f_{\alpha}
     + \left( (-2 - a)t^3 + (3 + 2a)t^2 + (-a)t \right) f_{\alpha+1}
     + \left( (-a)t^3 + (a)t^2 \right) f_{\alpha+2}    (3.20)

Various members of this family have been advanced as useful:

−3 ≤ a ≤ 0 because between these limits the parametric cubic is a local approximation to the sinc interpolant [Wolberg, 1990, p.130].

a = −1 because the slope of the kernel, h(s), at s = ±1 is then equal to the slope of the sinc function kernel at s = ±1 [Parker et al, p.36].

a = −3/4 to make the function C2-continuous at s = ±1; it is still C2-discontinuous at s = ±2 [see Maeland, 1988].

a = −1/2 to make the interpolation function agree with the Taylor series expansion of the function being interpolated to as many terms as possible [Keys, 1981]. In this case this means that the interpolation function can exactly reproduce a quadratic (second-order) function. Park and Schowengerdt [1983] propose this value also, their argument being that a = −1/2 gives the best frequency response in that it provides the best low frequency approximation to the ideal reconstruction filter (sinc) and is also particularly effective at suppressing aliasing [ibid., p.265]. Therefore, they state that it is the optimal value of a in the image-independent case.

a = 0 is recommended by Maeland [1988] as a strictly-positive two-point cubic reconstructor and is, in fact, the two-point cubic that we derived in section 3.5.5.

"a dependent on the image energy spectrum".
This idea, from Park and Schowengerdt [1983], is that the value of a is tailored to the energy spectrum of the image that is to be resampled. They show that, in typical cases, this value should lie in the range −3/4 ≤ a ≤ −1/2.

3.6.1 Other criteria

Mitchell and Netravali [1988] performed an experiment to find the combination of values for B and C which gave the best subjective results. They concluded that the reconstructor defined by B = 1/3, C = 1/3 in equation 3.19 is subjectively the best (based on nine experts' opinions). This is neither interpolating nor C2-continuous but a curious blend of two-thirds Catmull-Rom spline (see below) plus one-third approximating cubic B-spline. This reconstructor will exactly reproduce a first-order function, as will any reconstructor for which 2C + B = 1 [Mitchell and Netravali, 1988, p.244]; reconstructors for which 2C + B ≠ 1 can only exactly reproduce zeroth-order functions, and the reconstructor C = 1/2, B = 0 (Catmull-Rom spline) can (as mentioned already) exactly reproduce a quadratic function. Mitchell and Netravali [1988] propose a further reconstructor, B = 3/2, C = −1/4, which very strongly suppresses rastering artifacts but also blurs the image.

3.6.2 The Catmull-Rom spline

The reconstructor for which B = 0, C = 1/2 is commonly known as the Catmull-Rom spline. The name derives from a 1974 paper by Catmull and Rom in which they propose a large family of functions defined by the weighted blending of other (usually simpler) functions. Their general form, translated into our notation, is:

f(x) = \frac{\sum_i g_i(x)\, w_i(x)}{\sum_i w_i(x)}    (3.21)

where the g_i(x) are the original functions, the w_i(x) / \sum_i w_i(x) are the weighting or blending functions, and f(x) is the resulting function. Note that in this model functions are blended together, rather than sample values as in our model (equation 3.1).
While there are many reconstructors that can be defined using Catmull and Rom's method, one specific example in their paper has become widely known as the Catmull-Rom spline. It is generated as follows. The gi(x) are defined to be the set of straight lines passing through adjoining points:

gi(x) = fi (1 − (x − xi)/Δx) + fi+1 ((x − xi)/Δx)

The wi(x) are defined to be offset third-order B-splines:

wi(x) = B3((x − xi)/Δx − 1/2)

where:

B3(s) =  (1/2)s² − (3/2)s + 9/8,   1/2 ≤ s ≤ 3/2
         −s² + 3/4,                −1/2 ≤ s ≤ 1/2
         (1/2)s² + (3/2)s + 9/8,   −3/2 ≤ s ≤ −1/2

(Note that B3(s) is also the kernel of one of our quadratic reconstructors; see equation 3.12.) This produces the Catmull-Rom spline, which was also chosen by Keys [1981] and Park and Schowengerdt [1983] as the best member of the family of parametric cubics which they were studying.

There is yet another way of deriving the Catmull-Rom reconstructor. It can be generated by taking a linear blend of two parabolic functions. This process is explained by Brewer and Anderson [1977], who base their derivation on earlier work by Overhauser [1968]. It is illustrated in figure 3.16. The unique quadratic through the three points (xα−1, fα−1), (xα, fα) and (xα+1, fα+1) can be shown to be:

qα(s) = ( (1/2)s² − (1/2)s )fα−1 + ( −s² + 1 )fα + ( (1/2)s² + (1/2)s )fα+1

where s = (x − xα)/Δx. In figure 3.16, p(r) = q2(r) and q(s) = q3(s). From this we can easily derive the cubic, cα(t):

cα(t) = (1 − t)qα(t) + t qα+1(t − 1)

which is the Catmull-Rom spline yet again (in figure 3.16, c(t) = cα(t)). Brewer and Anderson [1977] call this spline the 'Overhauser Curve' after the original inventor of the parabolic blending method. However, Overhauser's [1968] statement of the method (and Rogers
and Adams' [1975, pp.133–138] subsequent restatement) uses a slightly different derivation.

Figure 3.16: Brewer and Anderson's [1977] Overhauser cubic, defined as the linear blend, c(t), of two interpolating parabolas, p(r) and q(s): c(t) = (1 − t)p(r) + tq(s) = (1 − t)p(t) + tq(t − 1).

Figure 3.17: How Overhauser [1968] defined the Overhauser cubic (after Overhauser [1968] figure 1, p.1).

His derivation uses a local coordinate system for each of the two parabolas, as illustrated in figure 3.17. c(t) is then linearly blended from the two functions, as before, by:

c(t) = (1 − t/t0) p(r) + (t/t0) q(s)

To make sense of this it is necessary to have r and s as functions of t. Overhauser chose to define these two functions by dropping a perpendicular from the point t, on the t-axis, to the r- or s-axis, as illustrated by the points t1, s1 and r1 in figure 3.17. Unfortunately, this makes the resulting algebra extremely complex; so much so that it is not worth following for our use¹¹. Thus, while Brewer and Anderson [1977] do produce a curve in the spirit of Overhauser, it is not the same curve as that developed by Overhauser [1968].

Brewer and Anderson's 'Overhauser Curve' can be fitted into Catmull and Rom's formalism by defining:

gi(t) = qi(t)
wi(x) = B2(x − i)

where:

B2(s) =  1 + s,   −1 ≤ s ≤ 0
         1 − s,   0 ≤ s ≤ 1
         0,       otherwise

B2(s) is a linear blending function. It is the same function as the linear interpolation kernel and the second-order B-spline. We have now seen that the same spline has been derived independently in four different ways.
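The equivalence of the parabolic-blend derivation to the Catmull-Rom member of the one-parameter cubic family (a = −1/2) is easy to check numerically. A minimal sketch (names are ours, not from the text):

```python
def parabola(f0, f1, f2, s):
    # The unique quadratic through (-1, f0), (0, f1), (1, f2).
    return (0.5*s*s - 0.5*s)*f0 + (1 - s*s)*f1 + (0.5*s*s + 0.5*s)*f2

def overhauser(f, t):
    # Linear blend of two parabolas: c(t) = (1 - t) q_a(t) + t q_{a+1}(t - 1),
    # where f = [f(alpha-1), f(alpha), f(alpha+1), f(alpha+2)].
    return (1 - t)*parabola(f[0], f[1], f[2], t) \
         + t*parabola(f[1], f[2], f[3], t - 1)

def catmull_rom(f, t):
    # Equation 3.20 with a = -1/2, expanded.
    return ((-0.5*t**3 + t*t - 0.5*t)*f[0] + (1.5*t**3 - 2.5*t*t + 1)*f[1]
            + (-1.5*t**3 + 2*t*t + 0.5*t)*f[2] + (0.5*t**3 - 0.5*t*t)*f[3])

samples = [2.0, -1.0, 3.0, 0.5]
agree = all(abs(overhauser(samples, k/10) - catmull_rom(samples, k/10)) < 1e-12
            for k in range(11))
```

Expanding the blend symbolically gives the same four cubic weight polynomials; the numerical check above merely confirms the algebra at eleven points of the span.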
To Catmull and Rom [1974] it was the quadratic B-spline blend of three linear functions; to Brewer and Anderson [1977] it was the linear blend of two quadratics; to Keys [1981] it was the C1-continuous interpolating cubic which matched a Taylor series expansion to the highest order; and to Park and Schowengerdt [1983] it was the C1-continuous interpolating cubic with the best frequency response. As if this were not enough, there is a fifth derivation of the same cubic, which can be found in the next section. The fact that the same cubic has been independently put forward as somehow 'the best' cubic by two separate methods, and is also the end result of three other derivations, would seem to be strong evidence in favour of adopting the Catmull-Rom spline as the best all-round cubic interpolant. Mitchell and Netravali's [1988] subjective analysis backs this up; the Catmull-Rom spline falls near the middle of the region of 'satisfactory' subjective behaviour.

Whether the Catmull-Rom spline is the best all-round cubic reconstructor is another question. Mitchell and Netravali would say no: the reconstructor defined by B = 1/3, C = 1/3 in equation 3.19 is subjectively better. Reichenbach and Park [1989], in their critique of Mitchell and Netravali's [1988] paper, claim that, from a frequency domain analysis, yes, the Catmull-Rom spline is the best. Their criterion for 'best' is closeness of the reconstructor's frequency response to that of the perfect reconstructor: sinc. Wolberg [1990, p.132] points out that this disagreement brings into question whether Reichenbach and Park's criterion for 'best' is useful, as it does not equate to the subjective 'best' and, for pictures for human consumption, it is subjective quality that is important. On the other hand, Mitchell and Netravali's experiment involved scaling and sampling as part of the resampling process, so their results judge the subjective quality of the whole resampling, not just the reconstruction.
This theme is picked up in chapter 7.

For a given application, it may be desirable to use a different interpolant. In particular, if a strictly positive interpolating kernel is required, then the cubic interpolant with parameter a = 0 can be used [Maeland, 1988, pp.215-216]. If an interpolant is required that perceptually sharpens the picture then the cubic interpolant with a = −1 can be used [Parker et al., 1983, pp.36, 39]. However, Mitchell and Netravali [1988] claim that this interpolant introduces too much ringing into the reconstructed intensity surface. An interpolant with properties midway between this sharpening interpolant and the Catmull-Rom interpolant can be obtained by using a = −3/4 [Parker et al., 1983, p.36].

¹¹However, it should be noted that Overhauser's use (CAD) is very different in its requirements to ours.

3.6.3 Other cubic methods

There are many methods for generating cubic reconstructors in the literature. Some fit into our formalism and some do not. Cubic polynomials are the most often used polynomials because lower-order polynomials give too little flexibility and higher-order polynomials can introduce unwanted wiggles (and require more computation) [Foley et al., 1990, p.478, sec. 11.2]. Their usefulness has led to a large body of knowledge about them, mostly about their usefulness for describing surfaces in computer aided design (CAD) and its related fields. It is useful to briefly scan through these methods to see if they shed any more light on useful methods for intensity surface reconstruction. Foley et al. [1990] give a summary of the most common of these. We will consider the utility of each one.

Hermite

The Hermite form of a cubic is determined from two endpoints and tangents (first derivatives) at those endpoints.
The formula for a piecewise Hermite is [Foley and van Dam, 1982, p.516]:

f(x) =   (2t³ − 3t² + 1)fα
       + (−2t³ + 3t²)fα+1
       + (t³ − 2t² + t)f′α
       + (t³ − t²)f′α+1

where α = (x − x0)/Δx and t = (x − xα)/Δx as usual. Here the fi are the sample values, as before, and the f′i are the tangents of the sampled function at the sample points¹².

We will not, in general, have these tangent values as we are dealing with NAPK reconstructors. However, Mitchell and Netravali [1988] show that, if these tangent values are available (as, for example, is possible with some artificially generated images), then the Hermite form produces a better result than the best cubic reconstruction based only on the sample values. In the absence of explicit tangent information, it is common to approximate the tangent by the difference between adjacent sample values. That is:

f′i ≈ Δfi = fi+1 − fi

¹²N.B. f′i = df/dt at x = xi, not df/dx, because the equation is in terms of t rather than x: f′i = df/dt = (df/dx)(dx/dt) = (Δx) df/dx, all evaluated at x = xi.

Figure 3.18: How the Bezier and Hermite cubics are related: the Hermite form of the curve is controlled by the two data points, Pα and Pα+1, and by the two derivative vectors P′α and P′α+1; the Bezier form is controlled by the two data points, Pα and Pα+1, and by two other control points, P̂α,1 and P̂α+1,−1. The relationship between these is given in equations 3.22 and 3.23.

The approximation f′i ≈ fi − fi−1 is equally valid, and it is the average of these two approximations that is generally taken:

f′i ≈ (Δfi + Δfi−1)/2

where Δfj = fj+1 − fj. If this approximation is used in the Hermite cubic then the resulting function is the Catmull-Rom spline. This is the fifth derivation of the Catmull-Rom spline (mentioned in section 3.6.2). Reichenbach and Park [1989] independently discovered this derivation of the spline.
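This fifth derivation can also be checked numerically: feeding the averaged-difference tangent approximation into the Hermite form reproduces the Catmull-Rom value at every point of the span. A minimal sketch (names are ours):

```python
def hermite(f0, f1, d0, d1, t):
    # Piecewise Hermite on one span, from endpoint values and tangents.
    return ((2*t**3 - 3*t**2 + 1)*f0 + (-2*t**3 + 3*t**2)*f1
            + (t**3 - 2*t**2 + t)*d0 + (t**3 - t**2)*d1)

def hermite_central(f, t):
    # Tangents approximated by the average of forward and backward
    # differences, f = [f(alpha-1), f(alpha), f(alpha+1), f(alpha+2)].
    d0 = (f[2] - f[0]) / 2
    d1 = (f[3] - f[1]) / 2
    return hermite(f[1], f[2], d0, d1, t)

def catmull_rom(f, t):
    # Equation 3.20 with a = -1/2, for comparison.
    return ((-0.5*t**3 + t*t - 0.5*t)*f[0] + (1.5*t**3 - 2.5*t*t + 1)*f[1]
            + (-1.5*t**3 + 2*t*t + 0.5*t)*f[2] + (0.5*t**3 - 0.5*t*t)*f[3])
```

Substituting d0 = (f2 − f0)/2 and d1 = (f3 − f1)/2 into the Hermite weights and collecting terms on f0..f3 yields exactly the Catmull-Rom weight polynomials.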
Bezier

The Bezier form of the cubic is related to the Hermite. It uses four control points for each piece: two endpoints plus two other points to control the shape of the curve. Take the two endpoints to be (xα, fα) and (xα+1, fα+1), and the two control points to be (x̂α,1, f̂α,1) and (x̂α+1,−1, f̂α+1,−1). The location of these two control points is linked to the Hermite form by the formulae [see Foley and van Dam, 1982, p.519]:

P̂α,1 = Pα + (1/3)P′α                        (3.22)
P̂α+1,−1 = Pα+1 − (1/3)P′α+1                 (3.23)

This relationship is illustrated in figure 3.18. The Bezier form can be used in two ways: either by defining the control points in some way, or by using the existing data points as control points. The former method presents us with a situation similar to that with the Hermite: how do we define these extra points? If we have the derivative values then we can use equations 3.22 and 3.23 to calculate the positions of the control points and thus produce a Hermite cubic as before. Rasala [1990] shows how the positions of the control points can be computed from all the available data¹³; this is discussed in section 4.5.1. However, his first approximations to this globally controlled piecewise cubic, which require four defining points, are, in one case, the Catmull-Rom spline (again!) and, in the other case, the one-parameter cubic (equation 3.20) with a = −3/8.

¹³Theoretically all of the available data should be used. In practice it is only necessary to use the 12 to 16 nearest points, still rather more than the four nearest points we are using here.

The second way of using a Bezier is either to allow every third sample point to be an endpoint (of two adjoining sections), with the intermediate points being the other control points, or to use the four nearest points as the control points. The former produces a curve which, in general, is only C0-continuous and interpolates only one third of the points.
The latter produces a curve which, in general, is discontinuous and not interpolating. For these reasons, neither is useful.

B-splines

The uniform non-rational cubic B-spline [Foley et al., 1990, pp.491-495] is the reconstructor defined by setting B = 1 and C = 0 in equation 3.19, otherwise known as the approximating cubic B-spline.

Non-uniform, non-rational B-splines [Foley et al., 1990, pp.495-500] have advantages over the uniform non-rational B-spline, in that the knots do not have to be uniformly spaced in parameter space and knots can have a multiplicity which creates a Cn-discontinuity at the knot. Neither of these properties is useful for intensity surface reconstruction because the samples are uniformly spaced in parameter space (α = (x − x0)/Δx) and there is no need for multiple knots. Thus non-uniform, non-rational B-splines offer features that we do not need. They could be useful if we had an image that was not sampled uniformly.

Non-uniform rational B-splines (NURBS) [Foley et al., 1990, pp.501-504; Piegl, 1991] have many features which make them extremely useful in CAD/CAM and other graphics applications. However, for image reconstruction they are 'over the top' because, again, the extra features that they provide are unnecessary for reconstruction. These features are similar to those for non-uniform, non-rational B-splines, plus the fact that they can precisely define any conic section.

β-splines

β-splines [Foley et al., 1990, pp.505-507] have two extra parameters: β1, the bias parameter, and β2, the tension parameter. These are used to further adjust the shape of the B-spline curve. They can be either local to each control point ('continuously shaped β-splines' or 'discretely shaped β-splines'), in which case we are again faced with the problem of how to derive these two extra data for each point, or global ('uniformly shaped β-splines'), in which case we need to find acceptable values for β1 and β2.
β1 biases each segment more towards either its left-hand control points (1 < β1 ≤ ∞) or its right-hand control points (0 ≤ β1 < 1). This is undesirable in our case as there should be no global bias in either direction. Thus we must constrain β1 = 1. This makes the β-spline C1-continuous:

f(x) = [1/(12 + β2)] [   (−2t³ + 6t² − 6t + 2)fα−1
                       + ((2β2 + 6)t³ + (−3β2 − 12)t² + (β2 + 8))fα
                       + ((−2β2 − 6)t³ + (3β2 + 6)t² + 6t + 2)fα+1
                       + (2t³)fα+2 ]

where α = (x − x0)/Δx and t = (x − xα)/Δx.

If β2 = 0 then this is the ordinary cubic B-spline, which can be represented by the Mitchell and Netravali cubic (equation 3.19) with B = 1 and C = 0. However, if β2 ≠ 0 then this function cannot be represented by equation 3.19, and so it cannot be meeting all of the criteria met by equation 3.19. A little analysis shows that it does not have linear phase for β2 ≠ 0, and so we must reject it for this reason (see section 3.5.1).

The final cubic in Foley et al.'s [1990, p.507] list is the Kochanek-Bartels function: a variant of the Hermite cubic where each segment depends on the four nearest control points and three extra parameters specific to that segment. As with the Hermite curve, the problem with using this function is that we have no way of generating these extra parameters.

This completes the list of cubics given by Foley et al. [1990]. Others do exist, and some families have been given names which have not been mentioned here; for example, the Mitchell and Netravali cubics with C = 0 are Duff's tensioned B-splines [see Mitchell and Netravali, 1988, p.223]. Most of these functions have been primarily designed for CAD; few of them are applicable to reconstruction.
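As a check on the β-spline formula above, setting β2 = 0 should reproduce the uniform cubic B-spline weights, and the four weights should sum to one for any β2 (so constants are reproduced). A small sketch (names are ours):

```python
def beta_spline_weights(beta2, t):
    # Weights of the beta1 = 1 beta-spline applied to
    # f[alpha-1], f[alpha], f[alpha+1], f[alpha+2].
    d = 12 + beta2
    return (
        (-2*t**3 + 6*t**2 - 6*t + 2) / d,
        ((2*beta2 + 6)*t**3 + (-3*beta2 - 12)*t**2 + (beta2 + 8)) / d,
        ((-2*beta2 - 6)*t**3 + (3*beta2 + 6)*t**2 + 6*t + 2) / d,
        (2*t**3) / d,
    )

def bspline_weights(t):
    # Uniform cubic B-spline (equation 3.19 with B = 1, C = 0).
    return ((1 - t)**3 / 6, (3*t**3 - 6*t**2 + 4) / 6,
            (-3*t**3 + 3*t**2 + 3*t + 1) / 6, t**3 / 6)
```

Note that the constant terms of the four numerators sum to 12 + β2, which is what makes the weights a partition of unity for every value of the tension parameter.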
The aims of the two fields are different: there tends to be more data available for describing the curve in CAD than in resampling (for example: derivative values), thus allowing these more complex forms, and CAD has greater need of weird and wonderful properties in its curves than resampling (for example: discontinuities at some sample points, and not at others). However, this section has, at least, considered the utility of these functions. The approximating cubic B-spline has already been met as a case of equation 3.19. The interpolating cubic B-spline and interpolating Bezier cubic are discussed in chapter 4.

3.6.4 Summary

It would appear that the all-round best cubic interpolant is the Catmull-Rom spline. There is less consensus on the all-round best cubic reconstructor, but Mitchell and Netravali's results would indicate that it is some linear blend of the Catmull-Rom spline and the cubic B-spline:

f(x) = (1 − ζ)CatRom(x) + ζ BSpline(x)

with ζ closer to zero than one. Mitchell and Netravali [1988] recommend ζ = 1/3, while Reichenbach and Park [1989] recommend ζ = 0.

3.7 Higher orders

In section 3.5.10 we saw the multitude of conditions which it would be useful to get a reconstructor to obey. It would be almost impossible to meet all these conditions. If we look at the Catmull-Rom spline, it does meet most of the conditions, but fails in that it is not C2-continuous, and its frequency response, while the closest to the perfect frequency response of any cubic [Reichenbach and Park, 1989], is still not particularly close to it. (Note that the approximating cubic B-spline is the only C2-continuous cubic, and it is not an interpolant; it produces a rather blurry intensity surface.) The solution to this problem could be to go to a higher order.
Parker et al. [1983, p.39] state that "although the length [cubic function, 4 data points] for interpolating functions is adequate for many tasks, additional improvement in performance could be made by including more points in the calculation. The same criteria used for selecting interpolating functions used in this paper — frequency responses, phase shift and gain — would also be useful in selecting between functions with a longer extent." Keys' [1981, pp.1159-60] six-point cubic function has an O(h⁴) convergence rate to the Taylor series expansion of the function being interpolated. As cubics can give at most O(h⁴) accuracy, it immediately follows that higher-order convergence rates require higher-order piecewise polynomials.

However, Mitchell and Netravali [1988, p.225] believe that no simple filter (such as a piecewise local polynomial) will be found that will improve much on the cubic filters they have derived. They thus suggest that other avenues of research would be more fruitful. Further, Foley and van Dam [1982] note that higher-order polynomials can introduce unwanted wiggles in the reconstructed function. Quadratics and cubics allow a single lobe, or ripple, either side of an edge, which perceptually sharpens the edge (section 4.5.2). Quartics, and higher orders, allow more ripples either side, which can perceptually degrade the image. Hence it may be counter-productive to go to higher orders than cubic.

We have done little work on fifth, and higher, order piecewise polynomials and so cannot give evidence as to the validity of these arguments. It should be fairly simple, though, to extend our analysis to higher orders. We leave this for future work. However, there seems to be little room for significant improvement, and so, perhaps, the subject of piecewise local polynomials should be closed.

3.8 Summary

We have seen several ways of classifying reconstructors, including local vs non-local, and skeleton vs continuous.
Skeleton reconstruction describes a situation where the intensity surface is defined only at specific points, and is undefined elsewhere. Delta function reconstruction is a similar idea.

The bulk of this chapter was devoted to piecewise local polynomials. They are, by far, the most studied group of local reconstructors. They are easily understood, quickly implemented, and fast to evaluate. We proved that zero-phase piecewise local quadratics do exist, and derived two new quadratic reconstructors. One of these provides similar visual quality to the Catmull-Rom cubic, with faster evaluation. The study of quadratics led into an examination of the factors involved in the design of reconstructors. This in turn brought us to a discussion of piecewise local cubics. Here we saw that the evidence indicates that the Catmull-Rom cubic is the best all-round cubic interpolant, and that it, or some linear mix of it and the cubic B-spline, is the best all-round cubic reconstructor.

Chapter 4

Infinite Extent Reconstruction

In the previous chapter we discussed the most studied group of reconstructors, the piecewise local polynomials. We now turn our attention to other classes of reconstructor:

1. the infinite reconstructors
2. the local approximations to the infinite reconstructors
3. the a priori knowledge reconstructors.

The first two are discussed in this chapter. The final class we leave until chapter 6.

4.1 Introduction to infinite reconstruction

Into this class fall all the reconstructors which use all of the sample points in the image. We will concentrate on these four:

• sinc interpolant
• Gaussian approximant
• interpolating cubic B-spline
• interpolating Bezier cubic

All these methods are based on a theory of an infinite image; that is, the theory does not take the image's edges into account. In a practical situation the image has edges, and these must be dealt with if we are to use an infinite reconstructor.
Edge effects and the edge extension solutions are discussed in the next section.

4.2 Edge effects

While the finite size of real images is a readily apparent problem for infinite extent reconstructors, the problem exists to a lesser extent for finite-extent reconstructors, such as those discussed in the previous chapter. It is feasible that a certain transformation, reconstructor, and sampler may combine to produce a resampling which never needs to access data beyond the edge of the image, but in general we have to answer the question: what is out there?

The simple answer is that nothing is out there: we know absolutely nothing about the intensity values beyond the edges of our finitely sized image. This is unfortunately of little use in cases where we need to know the value of a sample beyond the edge of the image. There are at least eight ways of specifying the values of the pixels beyond the image's edge, thus producing an infinite image from a finite one. These are:

1. Undefined
2. Error Limits
3. Constant
4. Extrapolation
5. Replication
6. Reflection
7. Transparent
8. Special Cases

The last one requires special knowledge about the reconstructor involved and must be tailored to suit. The others are independent of the reconstructor.

There are two types of occurrence which require information about pixel values beyond the edge of the image: (a) when we require the value of the intensity surface at some point within the finite image's edges, but, to calculate the value at that point, the reconstructor needs pixel values from beyond the edge of the image; and (b) when we require the value of the intensity surface at some point outside the finite image's edges. Figure 4.1 gives an example of both these types of pixel. It is possible to use a different method to generate intensity surface values of type (a) from that used for type (b). For example, type (a) could be generated by some edge extension method while type (b) is given a constant value.
In most situations, these two cases collapse nicely into the single case: ‘when the reconstructor requires the value of a pixel beyond the image’s edges’. In ﬁgure 4.1 the shaded band shows the area near the edge of the image where the reconstructor needs values beyond the image’s edge to produce intensity surface values. For an inﬁnite extent reconstructor this shaded area extends over the whole image, but for ﬁnite extent reconstructors it is a band, usually a few pixels wide, around the edge of the image. It is non-existent only for nearest-neighbour, skeleton, and Delta-function reconstruction. The necessity of edge extension depends on the reconstructor, transformer, and sampler used. If they form a resampling where the required portion of the intensity surface lies within the inner region in ﬁgure 4.2 (region A) then no image extension is required, and no edge eﬀects will occur. An example is shown in ﬁgure 4.3(a). However, many resampling applications require values of the intensity surface in region B (the band) and often also in region C (the outer region). An example is shown in ﬁgure 4.3(b). In these cases we must decide how to perform edge extension and how to minimise the resulting edge eﬀects. There are two possible ways of deﬁning edge extension. One is to assign values to the unknown pixels and use the reconstruction method on these assigned sample values to generate the intensity surface. The other method may give values to some, or all, of these unknown pixels but it also overrides the reconstruction method by explicitly deﬁning part of the intensity surface, usually region C in ﬁgure 4.2. Little work has been done on the edge extension problem. The usual assumption that is made is that any pixel outside the image has the intensity zero [Fiume, 1989, p.78, deﬁnition 16; Pratt, 1978, ch.9]. Two other common assumptions are that the image is repeated (that is: it tiles the plane), or that it is mirrored and repeated. 
These are illustrated later in the chapter. White and 100 CHAPTER 4. INFINITE EXTENT RECONSTRUCTION (a) (b) Figure 4.1: The two types of destination pixel for which edge eﬀects are important. The image on the left is rotated (dashed line on right) to produce a new image (solid line on right). The shaded area in the left hand image represents that part of the reconstructed intensity surface whose values depend on samples outside the image’s edges. Pixel (a) in the ﬁnal image lies within the original image’s edges but its value depends on sample values outside those edges. Pixel (b) lies within the ﬁnal image’s edges but outside the original image’s edges. A B C image edge Figure 4.2: The three areas of an image: A: the intensity surface in this area depends only on pixel values within the image’s edges. B: the intensity surface in this area depends on pixel values both inside and outside the image’s edges. C: the intensity surface in this area lies outside the image’s edges. 4.2. EDGE EFFECTS 101 (a) (b) Figure 4.3: Two examples of transforms, one requiring edge extension and one not. (a) all intensity surface values in the ﬁnal image (right) lie within region A (ﬁgure 4.2) in the original image, so no edge extension is required. (b) Intensity surface values in the ﬁnal image lie in all three regions of the original image, so some edge extension is required. Brzakovic [1990] present two novel methods of image extension but, otherwise, there appears to be little published on it. These, and other, methods of edge extension are discussed in detail below. 4.2.1 The methods of edge extension Edge extension is fundamentally the process of producing pixel values outside the image’s edges for the reconstructor to operate on, thus producing an intensity surface of inﬁnite extent. It also includes those methods which explicitly deﬁne part of the intensity surface to have values independent of the reconstructor. 
The ensuing discussion covers the eight methods of edge extension.

(1) Undefined

Our opinion is that this is the philosophically correct value to give to those pixels which lie beyond the image's edges. We don't know what these values are and so they should remain undefined. Furthermore, if the value 'undefined' occurs in the definition of the intensity surface's value at any point then that intensity surface value is also undefined. The practical upshot of this is that the only part of the intensity surface which is defined is that which lies within the inner boundary of the shaded area in figure 4.2 (region A). So, if we have an N × N image, and a reconstructor which requires an L pixel wide boundary, the resulting intensity surface will be defined over only (N − 2L) × (N − 2L) pixels. This assumption is often used in other image processing operations, where the loss of a few pixels is unimportant. For example, see Fang, Li and Ni's [1989] discussion of two-dimensional convolution.

In resampling, however, this solution is unworkable. Firstly, if we use an infinite extent reconstructor, practically the entire intensity surface will be undefined¹. Secondly, and more importantly, in most cases some of the destination pixels will have an undefined value. It is impossible to display 'undefined' as it is not a real intensity value. We could decide that a pixel with an undefined value is displayed with some constant intensity but, if we do this, undefined pixels cannot be distinguished from pixels which really have that intensity. Further, if undefined values are stored as this constant value then errors will occur if the destination image is the source for further processing, unless we also keep note of which stored pixels are defined and which are not.

The problem has a simple solution in Fang, Li and Ni's case. Their destination image is simply smaller than the source, and there are no undefined pixels within the destination's edges.
In our general resampling case we cannot guarantee this, and, if we do display an image which contains undefined pixels, we must display them with some intensity other than 'undefined'. We cannot have undefined pixels and display them too! This particular solution to the edge problem, while philosophically sound, is unsatisfactory for practical purposes. We must look to other solutions.

(2) Error bounds

In the previous case we assumed that we knew nothing about the value of the pixels beyond the image's edges. This is untrue, in that we will, in general, know the bounds within which these pixel values are constrained to lie. If we are dealing with an image representation which bounds all pixel values between two extrema, then we can assume that all the unknown pixel values must lie between them also. Having defined bounds within which the unknown pixel values can lie, it is possible to calculate bounds within which the intensity surface must lie, as illustrated in figure 4.4. Thus, for every destination pixel, it is possible to give an upper and a lower bound to the pixel's value. For some destination pixels (under some reconstructors) these two bounds will be equal to one another, and for some they will be equal to the extrema, or even beyond them, as in figure 4.4 (Catmull-Rom cubic).

It may be thought that the error bounds of source pixels just beyond an image edge should be less separated than those well beyond the edge. That is, the error bounds should increase as one moves away from the edge until they reach the extrema (figure 4.5(b)). Unfortunately, this is not the case. Theorem 1 gives the lie to the scenario in figure 4.5(b) for a general image.

Theorem 1: The first unknown pixel beyond an image's edge can take any value between the extrema.
Proof: Whittaker [1915] proves that, for any set of equispaced samples,

F∞ = {(xi, fi) : i ∈ Z, xi = x0 + iΔx}

there exists a unique function, f(x), through those sample points such that f(x) is bandlimited to below 1/(2Δx) (later named the folding frequency²).

¹'Practically' because, for example, a sinc reconstructed intensity surface would be undefined everywhere except at the sample points, where it would be equal to the sample value at that point.

²Note that the Nyquist frequency of f(x) will be less than or equal to the folding frequency [see Lynn and Fuerst, 1989, p.11].

Figure 4.4: Error bounds on intensity surface values. The crosses mark known pixel values, the dotted vertical lines unknown pixel values, bounded above and below. The shaded area shows the range of values within which the intensity surface can lie. Here we show three local piecewise polynomial reconstructors (two-point cubic, approximating cubic B-spline, Catmull-Rom cubic) operating on three situations: (a) constant intensity; (b) oscillating intensity; (c) slowly varying intensity.

Figure 4.5: Two possible scenarios for error bounds on unknown pixels. (a) All unknown pixels can take any value between the extrema; (b) pixels close to the image edge are more tightly bounded than those farther away. Theorem 1 gives the lie to the scenario in (b).

Now, our image is of finite extent:

Fa,b = {(xi, fi) : i ∈ [a, b], xi = x0 + iΔx}

and we have assumed that we do not know the values fi, i ∉ [a, b]. By assigning values to these, an infinite number of infinite sets, F∞, can be created, each containing the set Fa,b. Thus, from Whittaker's result, we see that there are an infinite number of bandlimited functions which pass through any finite set of sample points.
Therefore the ﬁrst unknown pixel (fa−1 or fb+1 ) can take any value and there will still be a bandlimited function through the set of samples. If all the samples are constrained to lie between two extrema then these ﬁrst unknown pixels will be able to take any value between them. QED. If we know that the image is bandlimited to below the folding frequency then we may be able to deﬁne tighter bounds than the extrema for unknown pixels close to the image’s edge. In general, we do not have this a priori knowledge, therefore all unknown pixels are bounded between the extrema. This solution to the edge problem has much the same drawbacks as the previous solution: (1) it is impossible to display a range of intensities for a single pixel; (2) an inﬁnite extent reconstructor will contain some uncertainty at practically every point. For example, the sinc reconstructor would produce bounds at the extrema for all but the known sample points. A solution to this problem would be to display the mean of all possible values at each point. However, calculating this mean is not a simple matter. An approximation to the mean could be obtained by setting all unknown values to the mean of the two extrema and calculating the intensity surface from these. This collapsing of the error bounds onto a single value takes us to a third solution: the constant value. (3) Constant The solution most often adopted is to set all unknown pixels to be a constant value. By common consent it appears that this value is invariably chosen to be zero, which nearly always represents black. Zero, at ﬁrst glance, is a sensible value. Any multiplications involving zero evaluate to zero and thus need not be performed. Thus, for any reconstructor that can be represented by a ﬁlter, no unknown values need to be taken into consideration when producing the intensity surface because they contribute nothing to the evaluation of the surface values. However, the identiﬁcation of zero with black3 is unfortunate. 
Black is at one end of the spectrum of possible intensity values. We are thus setting all unknown pixels to one of the two extreme values. In other applications, for example sound and electrical current, zero is an average value; the signal can fluctuate to positive or negative. With intensity, zero is a minimum value: you cannot have negative light. While it is sensible to set zero intensity to be absence of light (in a similar way to zero mass being absence of mass), and setting unknown values to be zero can be justified (an identification of 'nothing' with 'nil'), the combination of the two produces a problem, in that it can cause excessive ringing artifacts and negative intensities near the image's edges.

The previous edge extension method contains the seed of a better constant value: the value halfway between the two extrema; a mid-grey. Using mid-grey in place of any unknown value minimises the maximum discrepancy between the last known value and the first unknown value on either side of an edge (see figure 4.6).

Purists may balk at the idea of using any value but zero to replace the unknown values. A simple solution to this objection is to subtract the mid-grey value from all pixels in the image, set unknown pixels to zero, resample the image, and then add back the mid-grey value. However, this only works for resamplings in which the final pixel values are linear combinations of the original pixel values, which, fortunately, are the great majority of resamplings (see section 2.2).

[3] or, in some rare cases, with the brightest white, for much the same effect.

Figure 4.6: Minimising the maximum discrepancy between pixel values and the constant value. (a) using a mid-grey constant, the maximum discrepancy is half the full range; (b) using a black constant, the maximum discrepancy is the full range.

Another candidate constant value is the average value of all the known image pixels.
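The purists' work-around described above can be checked numerically. In this sketch (our own; `sample_linear` is a hypothetical one-dimensional resampler whose outputs are affine combinations of pixel values) the subtract, zero-fill, add-back procedure gives exactly the same result as extending with mid-grey directly:

```python
import math

def sample_linear(img, x, const):
    """Linearly interpolate a 1-D image at real position x, using `const`
    for any pixel index that falls outside the image."""
    i = math.floor(x)
    t = x - i
    pix = lambda j: img[j] if 0 <= j < len(img) else const
    return (1 - t) * pix(i) + t * pix(i + 1)

img = [10.0, 200.0, 180.0, 90.0]
mid = 127.5                          # mid-grey for a 0..255 intensity range
positions = [-0.5, 0.25, 1.75, 3.5]  # some positions fall beyond the edges

direct = [sample_linear(img, x, const=mid) for x in positions]
trick = [sample_linear([p - mid for p in img], x, const=0.0) + mid
         for x in positions]
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, trick))
```

The equivalence holds because linear interpolation's weights sum to one at every output position; a resampler whose outputs are not linear combinations of the pixel values breaks it, as the text warns.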
One motivation behind this choice is that it retains the DC component of the frequency spectrum: the DFT of the ﬁnite image and the FT of the inﬁnite extended image have the same DC component. In most cases this average value would be close to the suggested mid-grey value. For the minority of cases in which the image is unusually dark or unusually light, this average value is a better choice than the mid-grey. For the rare situations with a dark centre and bright edge (or vice-versa) it is conceivable that this average would be a worse choice than the mid-grey. Though, overall it would appear that the average value would be slightly preferable to the mid-grey value, this advantage must be oﬀset against the need to calculate the average. Taking this into consideration, there is little to choose between the two values. Use of a constant intensity beyond the edges of the image is simple and eﬃcient. It requires little computational eﬀort (except when the average intensity of the known image values needs to be calculated, but even that need only be done once per resampling). When a destination pixel falls well outside the edge of the transformed source image it will, in general, be given a value close to the constant value. However, just inside and just outside the source image edge, the reconstructed surface will display ‘edge eﬀects’. The severity of these eﬀects depends partly on the diﬀerence between the pixel values just inside the edge and the constant value assigned to the unknown pixels just outside the edge. It also depends on the reconstructor used. The sinc interpolant produces many obvious artifacts, while a very local reconstructor, such as linear interpolation, displays virtually no artifacts at all (see ﬁgure 4.7). We can attempt to reduce this edge eﬀect problem by extrapolating more ‘sensible’ values oﬀ the edge of the image rather than simply setting all the unknown pixels to a constant. 
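A constant edge extension is trivial to implement. The sketch below is ours (the names are invented, and a 0 to 255 intensity range is assumed); it offers the three constants discussed: zero, mid-grey, and the image mean:

```python
def extend_constant(img, j, mode="midgrey", lo=0.0, hi=255.0):
    """Return the value of pixel j of a 1-D image extended by a constant.
    Inside the image the stored pixel is returned; outside it, the constant
    is 'zero', 'midgrey' ((lo + hi) / 2) or 'mean' (average of known pixels)."""
    if 0 <= j < len(img):
        return img[j]
    if mode == "zero":
        return 0.0
    if mode == "midgrey":
        return (lo + hi) / 2.0
    if mode == "mean":
        return sum(img) / len(img)  # in practice, computed once per resampling
    raise ValueError(mode)
```

As the text notes, the 'mean' constant should really be precomputed once per resampling rather than recomputed per pixel as in this sketch.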
(4) Extrapolation

Extrapolation out from the edge of the image is an attempt to make what is beyond the edge bear some relation to what is inside the edge. It cannot create any extra information, but the hope is that the available information can be used to produce sensible values for the unknown pixels. Extrapolation is considerably more hazardous than interpolation [Press et al, 1988, p.85]. A typical interpolating function (which is perforce an extrapolating function) will tend to go berserk at a distance beyond the image edge greater than the inter-pixel spacing [ibid., p.88]. We must be careful how we extrapolate. One advantage we have over general extrapolation is our knowledge of the bounds within which the values must lie. These can be used to prevent the values from going berserk and reaching values way beyond the bounds.

The simplest extrapolation method is to copy the final row of pixels to infinity. This is illustrated in figure 4.9, which shows the images in figure 4.8 extrapolated by this method.

Figure 4.7: The effect of a constant zero edge extension on two reconstructors. On the left is the intensity surface reconstructed by sinc interpolation; notice the ripples near the image edges (both inside and outside of the edges), caused by the sharp fall off to zero. On the right is the intensity surface reconstructed by linear interpolation. The only edge effect is a slight fading off at the very edge of the image. The right hand image is quite blurry, but this is a property of the reconstructor, not an edge effect.

Figure 4.8: The original images used in the extrapolation examples.

Figure 4.9: Extrapolation by copying the final row of pixels off to infinity. This process is applied to the two images in figure 4.8.

White and Brzakovic [1990] modify this idea by assigning a virtually constant intensity to each unknown
half-row or half-column[4]. These values are based on a set of known pixels in that row or column. This, their 'random sequence' method, uses the n pixels on the row (or column) just within the image edge to calculate the values, using linear extrapolation under the assumption that the image intensities form a random field. The values assigned to an unknown half-row or half-column are not quite constant, they are very slowly varying, but the effect looks much the same as a constant [see White and Brzakovic, 1990, figs. 3(b), 3(c), 4(b), 4(c), 5(b) and 5(c)]. White and Brzakovic point out that, while this method generates pixels whose statistical properties are close to those of true pixels within a few pixels of the image edge, this closeness deteriorates farther from the edge. For our purposes, this method gives little advantage over simple pixel replication. It requires much more calculation and this outweighs the slightly better estimate of the unknown pixel values just beyond the edge.

White and Brzakovic's [1990] second image extension method holds more promise. This 'derivative' method attempts to preserve the directionality of the pattern lying along the image border. It is more successful than the other methods in generating natural looking images for a small extension, but it deteriorates at greater distances from the image's edges. The images in figure 4.10 were generated using generalised Sobel operators of size 3 × 3 [Rosenfeld and Kak, 1982, vol. 2, pp.96, 99–100] to calculate the derivative values[5]. A modification of their method which resolves ties by taking the average intensity of the tied pixels produces very similar results (figure 4.11).

Figure 4.10: Extrapolation by White and Brzakovic's [1990] derivative method, which resolves ties (equal derivative values in two or three directions) by using a random number generator. This process is applied to the two images in figure 4.8.
The original method resolves ties in two or three directions by the use of a random number generator.

Extrapolation is a risky business. Even these moderately sophisticated methods only succeed for a small distance out from the image edge. This is good for small kernel operators, those which only need information up to a few pixels out (for example: a local reconstructor calculating values just inside the image's edges and needing some pixel values from just outside). But they are not useful for infinite operators nor for large kernel operators (although White and Brzakovic [1990] imply that they need to use kernels up to size 47 × 47, an edge extension of 23 pixels). Nor are they particularly useful for finding a pixel value when the destination pixel maps to a source pixel well outside the image's edge (case (b) in figure 4.1).

[4] A half-row is all of the pixels in a row of pixels to the right (left) of a given pixel. An unknown half-row is all of the unknown pixels to the right (left) of the image edge. A similar definition holds for 'half-column' and 'unknown half-column'.
[5] Note that there are two errors in White and Brzakovic's paper: (1) Equation 6 (page 348) should read:

D(j, i) = Σ_{m=1}^{k} Σ_{n=1}^{k} M_φ(m, n) I(j − (k+1) + m, i + n − λ − (k+1)/2)

where λ = ((k+1)/2) tan φ and m, n ∈ {1, 2, . . . , k}. Alternatively and equivalently:

D(j, i) = Σ_{m=0}^{k−1} Σ_{n=0}^{k−1} M_φ(m, n) I(j − k + m, i + n − λ − (k−1)/2)

where λ = ((k+1)/2) tan φ (as before) and m, n ∈ {0, 1, . . . , k − 1}. And (2) using this formula, pixels A, B and C in figure 2 (page 344) are the centre pixels for estimating derivatives for 5 × 5 derivative masks but not for any other size.

Figure 4.11: Extrapolation by a modification of White and Brzakovic's [1990] derivative method. This modification resolves ties by taking the average of the intensities in the tied directions. This process is applied to the two images in figure 4.8.
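The simplest method mentioned earlier, copying the final row (or column) of pixels off to infinity (figure 4.9), amounts to clamping the pixel index. A minimal sketch, with invented names:

```python
def extend_clamp(img, j):
    """Pixel value for any index j of a 1-D image extended by copying the
    edge pixels outward to infinity (figure 4.9's method)."""
    return img[max(0, min(j, len(img) - 1))]
```

In two dimensions the same clamping is applied independently to the row and column indices, which is why this method extrapolates only horizontally or vertically (figure 4.22, bottom left).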
Because extrapolation is inaccurate at large distances from the image edge we propose the following 'hybrid' solution: for any pixel farther outside the image edge than a given distance, the pixel's value is a given constant. Between this region's boundary and the image edge the pixels are given extrapolated values such that they match up neatly with both the image and the constant at the respective edges. Figure 4.12 illustrates this concept. There are still questions to be answered: what constant to use, how wide the extrapolated region should be, and how to perform the extrapolation. However, this solution holds promise in that: (a) it does not have the sharp changes in intensity at the image edge that the constant solution has, and (b) it does not tend to produce bad results at great distances from the image edge, as straightforward extrapolation can do.

Our discussion in the previous section gives us guidance as to which constant value to use. The three useful possibilities are:

• zero (usually black), because of the simplification this can make to the calculation of intensity surface values;

• half-way between the minimum and maximum intensities (a mid-grey), because this minimises the maximum possible difference between the intensity of an edge pixel and the constant intensity[6];

• the average intensity of the known pixels in the image (usually a mid-grey[7]), because this keeps the average intensity of the infinite image the same as that of the finite image.

[6] an 'edge pixel' is the last known pixel, on a row or column, just within the image's edge.

Figure 4.12: Extrapolation to a constant. (a) shows the basic idea: the image is surrounded by an area that is extrapolated; beyond this area the pixel values are set to a constant. (b) shows the idea for a single scanline.

Once a constant value is chosen, the other problems remain to be tackled.
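One minimal realisation of the hybrid scheme of figure 4.12, assuming a linear fade from the edge pixel to the constant (the names are ours, and the default width of eight pixels is one of the practical widths the text suggests):

```python
def extend_hybrid(img, j, const, width=8):
    """1-D hybrid edge extension: image values inside the image, a linear
    fade from the edge pixel to `const` over `width` pixels outside it,
    and `const` everywhere beyond."""
    n = len(img)
    if 0 <= j < n:
        return img[j]
    d = -j if j < 0 else j - (n - 1)       # pixels beyond the nearest edge
    if d >= width:
        return const                        # constant region
    edge = img[0] if j < 0 else img[n - 1]  # nearest edge pixel
    t = d / width                           # 0 at the edge, 1 at the constant
    return (1 - t) * edge + t * const
```

The extrapolated values match the image at one end of the band and the constant at the other, exactly as figure 4.12(b) requires.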
How wide should the extrapolated region be? It must be wide enough that we do not get any untoward effects in the intensity surface near the image edges. A zero width gives the equivalent of the constant solution in the previous section. This is obviously too small a width, as it is the edge effects caused by that solution which led us to consider other solutions. The minimum acceptable width of the extrapolated region is a subjective judgement, depending partly on the type of image and the constant value chosen. On the other hand, the extrapolated region cannot be too wide. If it is allowed to be too wide then the problem of the extrapolation being inaccurate rears its head once more. A width of eight or sixteen pixels would seem to be practical. A width of four pixels may be acceptable.

Given this band of unknown values, how do we extrapolate sensible values? There are some obvious simple methods of interpolating between the edge pixels and the constant pixels: for example, simple linear interpolation between the edge pixel and the first constant pixel (figure 4.13), or cubic interpolation through two edge pixels and the first constant pixel, ensuring that the first derivative of the interpolated curve is zero at the first constant pixel (figure 4.14). Such methods, and the like, produce results similar to that shown in figure 4.15. More complex methods, which take into account more of the context of each extrapolated pixel, could be employed. However, there is a simple method of ensuring that the extrapolated values fade off to the correct value: extrapolate using any of the existing methods and then window the resulting data. Figure 4.16 illustrates this idea.

[7] not exactly half-intensity grey like the previous value, but usually close to this.

Windowing allows such methods as the windowed derivative method illustrated in figure 4.17. Compare this with figure 4.10 (the unwindowed derivative
method) and figure 4.15 (linear extrapolation to a constant).

Figure 4.13: Extrapolation between the image edge and the constant region by simple linear interpolation between the edge pixel and the first constant pixel.

Figure 4.14: Extrapolation between the image edge and the constant region by cubic interpolation between the edge pixel, its neighbour (the pixel second from the edge), and the first constant pixel, setting the slope of the curve to zero at the first constant pixel.

Figure 4.15: Extrapolation by linear interpolation between the edge pixel and the first constant pixel over a band of pixels around the image edge. All pixels beyond this band are set to a constant (mid-grey) value. This process is applied to the two images in figure 4.8.

Figure 4.16: Windowed extrapolation. (a) shows the image extrapolated; (b) shows this windowed so that the pixels outside the window are constant, the image pixels are unchanged, and the pixels in between are smoothly windowed off to the constant value.

Figure 4.17: Extrapolation by the derivative method (as in figure 4.10) windowed using a piecewise linear window. This process is applied to the two images in figure 4.8.

Figure 4.18: Two simple windows: (a) piecewise linear; (b) cosine half-bells in the extrapolated region. Both windows have value one inside the image and zero in the constant region.

Windowing, in this context, has some non-standard features. Firstly, the window does not necessarily fade the function to zero, but to the constant intensity.
This can be achieved by using the formula:

i′(x) = W(x) × (i(x) − c) + c

where i(x) and i′(x) are the unwindowed and windowed intensity functions respectively, W(x) is the windowing function in the range zero to one, and c is the constant intensity to which i′(x) fades. Secondly, the window function should be unity for all pixels within the finite image. This restriction is to prevent corruption of the existing, and presumably correct, intensity values. The shape of the window is probably of little importance, given that the extrapolated values are themselves approximations. A piecewise linear function, as in figure 4.18(a), or a cosine half-bell (figure 4.18(b)) are two simple windows. The cosine half-bell is the better of these two, as it is C1-continuous.

An extension of this idea is to window the known pixel values and thus remove the need for any extrapolation. All unknown pixels are set to the constant value. Pixels just within the image edge are windowed, while the central portion of the image remains unaffected. Fraser [1989b] uses such a method to fade the image to zero at the edges. However, he is quick to point out that such windowing, while a good technique for other signal types (for example, sound), causes an undesirable effect, vignetting, when applied to images. Figure 4.19 gives an example of vignetting.

(5) Replication

This is a simple method, often used in texture mapping applications. The image is simply replicated an infinite number of times to completely tile the plane. Glassner [1990b] uses this as the simplest way of generating an infinite texture from a finite 'texture cell'. When using the fast Fourier transform (FFT) (section 5.4.1), this is the method of edge extension implicit in the algorithm. This property of the FFT has important ramifications, as we shall see in chapter 5. Hall [1989] used this method in his pipelined binary image processor, where he dealt with black text on a white background.
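The windowed fade to a constant can be sketched directly from the formula i′(x) = W(x)(i(x) − c) + c, here with the cosine half-bell of figure 4.18(b) (the function name and the treatment of distance are our own):

```python
import math

def window_to_constant(value, d, const, width=8):
    """Apply i'(x) = W(x)(i(x) - c) + c, where d is the distance beyond the
    image edge (d <= 0 means inside the image). W is 1 inside the image,
    a cosine half-bell over the extrapolated band, and 0 beyond it."""
    if d <= 0:
        w = 1.0                                        # image: untouched
    elif d >= width:
        w = 0.0                                        # constant region
    else:
        w = 0.5 * (1.0 + math.cos(math.pi * d / width))  # cosine half-bell
    return w * (value - const) + const
```

Inside the image W = 1, so existing pixels are untouched, as the second non-standard feature requires; the half-bell makes the fade C1-continuous.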
His assumption was that the image would be all white near the edges and so there would be no untoward edge effects (as the edges, being all white, would exactly match up). In his later work, Hall [1991] rejected this assumption and altered his algorithm (to a constant edge extension) in order to avoid the errors it could conceivably cause.

Figure 4.19: An example of vignetting. The image on the left is the original; on the right it is vignetted: it fades off towards its edges. This is almost always an undesirable effect.

Replicating the image in this way is a valid image extension method, and works well for certain texture mapping applications (for example, generating wallpaper). However, in other areas it has a major flaw. Except in fortuitous cases, the intensity values at one edge of the image bear no relation to those at the opposite edge. Yet, in replication, these unrelated pixels are placed next to one another. This causes edge effects similar to those caused by the constant extension method, as illustrated in figure 4.20. In general, this method is not recommended except for specialist applications such as the texture mapping operation mentioned above.

Along with simple copying, Glassner [1990b] explains that many different textures can be generated by rotating, reflecting and/or offsetting the texture cell before copying. This is basically using the texture cell as the unit cell for one of the wallpaper groups [Martin, 1982]. Some of these are illustrated in figure 4.21. While this is extremely useful for generating repetitive textures (so long as the textures are carefully chosen), it is not generally useful for edge extension. The one wallpaper group that is useful is that illustrated in figure 4.21 (W22 (pmm)), where the image is reflected about its edges.

(6) Reflection

This, as well as being one of the methods of generating a wallpaper pattern, has been widely used as an inexpensive method of image extension [White and Brzakovic, 1990, p.343].
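Replication and reflection both reduce to an index-mapping rule rather than to any new pixel values. A sketch of the two rules (ours; note that the reflection convention here duplicates the edge pixel, and other conventions exist):

```python
def index_replicate(j, n):
    """Map any index onto a length-n image tiled periodically (method 5)."""
    return j % n

def index_reflect(j, n):
    """Map any index onto a length-n image reflected about its edges
    (method 6, the pmm wallpaper group of figure 4.21)."""
    j %= 2 * n                       # the reflected pattern has period 2n
    return j if j < n else 2 * n - 1 - j
```

With replication, pixel n is pixel 0 again, so unrelated opposite edges meet (figure 4.20); with reflection, pixel n mirrors pixel n − 1, so values just outside the edge match those just inside.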
It is inexpensive because there is no need to generate any additional values; the only overhead is the simple one of evaluating which of the existing image values to use. It is widely used because of the expectation that, by reflecting, the pixels just outside the edge will bear a good relation to those just inside, hence minimising edge effects. This method is implicit in the discrete cosine transform (DCT) variation of the FFT method described in section 5.4.1, greatly improving the edge effect performance over the FFT's implicit replication method.

White and Brzakovic [1990] use this method as the standard against which they compare their new methods. They conclude that this method leads to good results only in cases of highly textured regions or regions of uniform intensity. However, this method is certainly better than either copying or using a constant, and is certainly cheaper than explicitly extrapolating. One of White and Brzakovic's complaints is that this method fails dismally for highly directional images. Figure 4.22 illustrates how the various other methods cope with such a situation. Note that White and Brzakovic's derivative method copes splendidly with features aligned at angles of zero, forty-five and ninety degrees, but not with a feature at twenty-three degrees. Thus, whilst giving better results in some cases, it fails in other cases and so it has little or no advantage over reflection.

Figure 4.20: An example of edge effects caused by the replication edge extension method. The top photograph shows an image replicated; notice how the edges of the image do not match up, causing a large intensity difference where the replicas meet. The bottom photograph shows an intensity surface reconstructed from this image by sinc reconstruction. Notice the ripples in the image near the edges, similar to those in figure 4.7 (left) for a constant edge extension method.

Figure 4.21: Some of the possible textures that can be generated by replicating a square texture cell with no rotational or reflectional symmetry of its own. On the left is the texture cell; the four wallpaper groups shown here are: W1, simple replication of the cell; W4, a replication that has centres of four-fold rotation; W22, replication by reflection in both horizontal and vertical reflection lines; and W13, replication with glide reflections. All except W4 could also be used with a rectangular texture cell. The notation used here is that used by Martin [1982]. That in brackets is the common abbreviated form of the International Union of Crystallography.

Figure 4.22: How the edge extension methods cope with a pathological case. At top left is the original, which consists of diagonal lines at 22.5° (left half) and 45° (right half). Bottom left is extrapolated by copying; this fails for either case as it extrapolates horizontally or vertically. Top right is White and Brzakovic's [1990] derivative method (figure 4.10), and bottom right is our modification of that method (figure 4.11). Both of these perform excellent extrapolation of the 45° features, but fail for the 22.5° features, extending them horizontally, vertically, or at 45°.

Figure 4.23: An example of an image folding back on itself. The shaded region represents the finite sized image, and the clear region around it part of the infinite plane on which the image sits. (a) shows the original image, (b) the transformed image. If edge extension is done by any method other than transparent then the image will be obscured by the area around it, which is undesirable.

The final two edge extension methods are special cases.
(7) Transparent

In this solution, the intensity surface beyond the image's edge is defined to be transparent. That is, whatever existed in destination image space before the resampling can still be seen in these regions. In many cases this will simply be equivalent to setting the intensity surface to a constant outside the image edges, because the original content of the destination image will have been that constant intensity. However, in some situations it is vital that we make the assumption that the intensity surface beyond the image's edges is transparent. If the transform is such that the image folds back on itself, then the source image must be transparent beyond its edges (and, also, it needs to have the same intensity values on the 'back' as on the 'front'). Intuitively, it is as if the opaque image is printed on a transparent medium (such as an overhead projector acetate) which is then manipulated (see figure 4.23). Were this transparent medium to be replaced by an opaque one (for example: of constant intensity) then the image itself would be obscured. Transforms of this nature have the further problem that a point in the destination intensity surface can map to two or more points on the source intensity surface, and arbitrating between them, to determine which is visible, or how to mix the values, can be a difficult problem.

Compounding the difficulty of the transparent solution is the problem of defining the intensity surface just within the image edge (region B in figure 4.2). The intensity values in this area are evaluated using pixel values both inside and outside the image edges. Therefore, even though the intensity surface is defined as being transparent outside the image edge, it is still necessary to extend the image in order to calculate the values of the intensity surface just inside the image edge.
Thus one of the other methods of image extension must be used to produce these values, and so there still exists the potential for edge effects just within the image edge. One advantage of this method is that extrapolation need only be performed out as far as the reconstructor needs values. For an infinite extent reconstructor, obviously all values still need to be available. But, for a piecewise cubic reconstructor, a band two pixels wide outside the image edge is all that need be generated by extension. Such situations as these lead to the final solution to the edge extension problem.

(8) Special cases

In situations where the intensity surface beyond the edge of the image is defined to be a constant, or transparent, or some other case that does not depend on the reconstructor, then a small-kernel reconstructor can use a special case extension method to produce the necessary pixel values beyond the image edge. These methods are not the general extension methods discussed earlier but methods specific to particular reconstructors, which may only be valid for an extension of one or two pixels beyond the image edge. For local piecewise polynomials (chapter 3) the only documented extension method of this nature is Keys' [1981] single pixel extension method for the Catmull-Rom interpolant. Keys derives the interpolant by insisting that the cubic agree with the Taylor's series expansion of the original function to as many terms as possible. He then [ibid., p.1155] develops a formula to approximate one pixel value beyond the image edge in order to maintain this agreement. This extension can be used for pixels farther out from the edge but it was developed specifically for the first pixel out and specifically for the Catmull-Rom reconstructor. The cubic B-spline interpolant, discussed later in this chapter (section 4.5), is theoretically of infinite extent.
In order to use this in a practical (finite) situation, end conditions have to be specified. In this instance we do not require the values of pixels beyond the image edge, but other edge conditions. For example, the 'natural cubic spline' criterion requires that the second derivative of the intensity surface at the edge points is zero [Press et al, 1988, p.96]. Wolberg [1990, p.136], on the other hand, implicitly assumes that the pixels just outside the image edge have intensity zero when he uses the cubic B-spline interpolant.

A final category of special case extension methods includes the a priori method described in section 6.4. The analysis done by this method can be used to generate an infinite image by extending the edges found to infinity. Alternatively the pixels outside the image edge can be left undefined and the final image given an implicit extra edge around the outside. The intensity surface beyond the image edge can then be undefined, constant or transparent as required.

Summary

All these methods fail in certain cases. The best option for general image extension would appear to be either reflection, for quick processing, or a good extrapolation windowed to a constant. Such an extrapolation would need to give good results out to the extent of the window[8]. If extremely fast processing is required, a constant value may be used. For resampling work, it is usually desirable to use these methods of edge extension when generating pixel values of type (a) in figure 4.1, and to use a constant value (for example: zero, mid-grey, transparent) for pixels of type (b).

4.3 Sinc and other perfect interpolants

The theory of perfect interpolation is straightforward. If the original intensity surface is bandlimited and the sampling rate is high enough then the samples in the image contain all the information necessary to exactly reconstruct the original intensity surface. The reconstructors that can perform this exact reconstruction are the 'perfect interpolants'.
When we introduced perfect interpolation, in section 3.3, we only looked at the sinc function, but other perfect interpolants exist.

[8] In fact it need not be good quite all the way, as any strange effects at the limits would be rendered practically negligible by the windowing.

Figure 4.24: Examples of the ranges over which an image's Fourier transform can be non-zero. (a) an example of the non-zero area of a bandlimited image's Fourier transform; (b) a rectangular passband, with Nyquist frequencies νxN and νyN; (c) a circular passband, with Nyquist frequency νrN.

An infinite extent intensity surface, s(x, y), in the spatial domain can be represented equally well in the frequency domain by its Fourier transform:

S(νx, νy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} s(x, y) e^{−i2π(xνx + yνy)} dy dx

This frequency domain function, if bandlimited, will have a certain range over which it is non-zero, shown in figure 4.24(a). Point sampling on a regular grid in the spatial domain equates to regularly replicating in the frequency domain (section 2.4.1), as illustrated in figure 4.25. To reconstruct a continuous spatial function from this we need an interpolation filter with a frequency response which is unity over the original bandlimited area (shaded in figure 4.25) and zero over all the replicas. Obviously such a filter cannot exist if the replicas overlap the original in any way, so the samples need to be sufficiently close that the replicas are separated. However, the filter does not need to take any specific value between the replicas, because the Fourier transform of the sampled image is zero in these regions. So, if the replicas do not tile the plane, there are an infinite number of perfectly reconstructing filters for a given intensity surface.
If the bandlimited area can be bounded by a circle (figure 4.24(c)) then a perfect interpolation filter is one which is unity within the circle and zero outside it. In the spatial domain this filter is a first-order Bessel function [Pratt, 1978; Heckbert, 1989]. Mersereau's [1979] results allow us to derive the spatial domain equation for a perfect hexagonal filter, useful when processing hexagonally sampled images. However, except for a few experimental cases [Wüthrich, Stucki and Ghezal, 1989; Wüthrich and Stucki, 1991], all images are sampled on a rectangular grid.

For a given sampling rate there is a single function that can perfectly reconstruct any function that is bandlimited at or below half that rate. If we take the largest rectangular area in the frequency domain which can be replicated without overlap (figure 4.24(b)), then the perfect interpolation filter must be unity inside this area and zero outside it, because the replicas tile the plane. In the spatial domain this filter is a two-dimensional sinc function. For bandlimits of ν_xN and ν_yN in the frequency domain, the spatial domain function is:

  h(x, y) = 4 ν_xN ν_yN sinc(2xν_xN) sinc(2yν_yN)    (4.1)

This is the function which is generally used as the perfect interpolant, because it perfectly interpolates all rectangularly bandlimited images. This function's spatial separability is advantageous when implementing it. If rotational invariance is more desirable then the first-order Bessel function may be used, though it is a more difficult function to evaluate.

[Figure 4.25: The example frequency spectrum of figure 4.24(a) replicated infinitely many times. This is caused by sampling in the spatial domain. The original copy is shown shaded, the others are left white.]
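As a concrete sketch of equation 4.1 and its use as a reconstructor, the following Python/NumPy fragment (our own illustration, not code from the dissertation) assumes unit-spaced samples, for which ν_xN = ν_yN = ½ and NumPy's `np.sinc(t)` = sin(πt)/(πt) is exactly the sinc required:

```python
import numpy as np

def perfect_interp_kernel(x, y, nu_xn=0.5, nu_yn=0.5):
    """Equation 4.1: the two-dimensional sinc kernel for a rectangular
    passband; np.sinc(t) is sin(pi t)/(pi t)."""
    return 4 * nu_xn * nu_yn * np.sinc(2 * x * nu_xn) * np.sinc(2 * y * nu_yn)

def reconstruct(image, x, y):
    """Evaluate the reconstructed surface at (x, y) for unit-spaced
    samples (nu_xN = nu_yN = 1/2).  Samples outside the array are taken
    to be zero, so the sum is finite; separability keeps it cheap.
    image[j, i] holds the sample at integer position (i, j)."""
    rows, cols = image.shape
    wx = np.sinc(x - np.arange(cols))   # weights along x
    wy = np.sinc(y - np.arange(rows))   # weights along y
    return wy @ image @ wx
```

The separable evaluation (one weight vector per axis) is what makes the rectangular-passband sinc cheaper to apply than the circularly symmetric Bessel filter.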
The most common complaint levelled against the sinc function (one which is equally valid against the other perfect interpolants) is that it produces unsightly ripples in the reconstructed intensity surface [Mitchell and Netravali, 1988; Ward and Cok, 1989]. Another commonly mentioned problem is that the sinc function can produce negative values in the intensity surface [Pratt, 1978; Blinn, 1989b] (section 3.5.7). In answer to these two criticisms, it is necessary to point out that a perfect interpolant perfectly reconstructs a sufficiently bandlimited intensity surface. If the original intensity surface is sufficiently bandlimited then it will contain no negative values (negative light being impossible) and any ripples in it are supposed to be there. The sinc function cannot introduce any negative light or extra ripples because it exactly reconstructs the original surface. These complaints about the sinc interpolator come from attempting to apply it to an image sampled from an intensity surface which is not bandlimited within the limits required. In such a case, aliasing will occur and the sinc interpolator will perfectly reconstruct an aliased version of the original intensity surface, possibly with negative values and unsightly ripples, rather than the original surface itself. Using the sinc function with an intensity surface that is not sufficiently bandlimited is thus fraught with difficulties.

Opinion is divided in the literature about how many real images are sampled from intensity surfaces which are bandlimited within tight enough limits to prevent aliasing. Any real-world intensity surface will be bandlimited before sampling by the physical device used to capture the image, because no real-world device has an infinite bandwidth [Wolberg, 1990, p.99]. Fraser [1987] believes that people, often unwittingly, rely on the physical device to bandlimit the signal sufficiently that either minimal or no aliasing occurs.
Artificially generated intensity surfaces can contain infinite frequencies due to discontinuities in the surfaces. Such surfaces are not bandlimited and will always generate images that contain aliases. If such a surface is filtered before point sampling then it may, depending on the filter, be bandlimited sufficiently. From the images we have seen, it would appear that most images are not bandlimited sufficiently to prevent the sinc interpolant from producing ripples around quick intensity changes; Wolberg [1990, p.108] is of the same opinion.

This is not the end of the problems with the sinc interpolant. As discussed earlier, a practical image must be finite. When using sinc interpolation the only sensible edge extension methods are those which fall off to zero (including the constant method, which sets all values outside the edges to zero). The following formula can be used to calculate the value of the interpolated surface at the point (x, y):

  f(x, y) = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} sinc(x − i) sinc(y − j) f_{i,j}    (4.2)

where the sample points are located at integer locations in the xy-plane and f_{i,j} is the sample located at (i, j). If the sample values, f_{i,j}, in the image are zero outside certain bounds then this infinite (and hence impossible to do) sum reduces to a finite (do-able) one. This is why defining all samples outside certain limits to be zero is the only sensible method of edge extension. Sinc reconstruction is a linear process, and so we could use the trick described in section 4.2.1 to allow any constant value, not just zero, and still require only a finite sum⁹.

Unfortunately, by insisting that the image be zero outside a finite boundary, the sampled intensity surface cannot be bandlimited. While this result is not intuitive, we have found a proof of it, which is given in theorem 2.

Theorem 2: An image that is only non-zero within finite bounds must contain aliases.
Proof: an infinite, bandlimited intensity surface is sampled by an infinite extent comb function. The resulting function (a continuous representation of an image (section 2.4.1)) is clamped to zero outside finite limits. This is equivalent to multiplying the sampled function by a box function. Multiplication is commutative, and so we could have got the same result by first multiplying by a box function and then multiplying by the comb sampling function. The function that is sampled is thus the product of the original intensity surface and a box function. It is thus of finite extent in the spatial domain. A function cannot be simultaneously of finite extent in both the spatial domain and the frequency domain (a detailed proof of this statement is given in appendix B.5), and so this function cannot be bandlimited; therefore the sampled version must contain aliases. QED.

The most obvious effect of this aliasing is rippling around the edges; this is the reason why we went from the constant edge solution in section 4.2.1 to the 'extrapolation to a constant' edge solutions in the following subsection. However, even this latter edge solution produces aliasing effects, although not as badly.

Pratt [1978] shows another way of looking at this finite-extent sampling process: a finite extent sampling grid can be generated by multiplying the box and comb functions and applying the product to the continuous intensity surface. The product is simply a truncated (finite-extent) comb function. In the same way that sampling using an infinite extent comb is equivalent to convolving with the comb's Fourier transform in the frequency domain, sampling using this finite extent comb is equivalent to convolving with this function's Fourier transform in the frequency domain. This Fourier transform is fairly simple to evaluate.
With limits (±b_x Δx, ±b_y Δy), for a comb of spacing (Δx, Δy), it can be shown to be:

  D(ν_x, ν_y) = [sin(2πν_x(b_x − ½)Δx) / sin(πν_x Δx)] × [sin(2πν_y(b_y − ½)Δy) / sin(πν_y Δy)]    (4.3)

The derivation of this can be found in appendix B.6. This function is illustrated in figure 4.26. It has a similar, but nonetheless different, effect to convolving with a comb, with the results described earlier.

⁹ The ensuing discussion applies equally to those cases, as the effect of the trick in the frequency domain is to modify only the DC component, then to do the same processing as for the zero case (leaving the modified DC component unchanged), and finally to reset the DC component to its original value (leaving the rest of the components unchanged).

[Figure 4.26: The Fourier transform of a finite extent comb function. (a) the Fourier transform of a comb function with 8 teeth; (b) the Fourier transform of a comb function with 32 teeth.]

4.3.1 Richards' positive perfect interpolant

Pratt [1978, p.98] points out that a sinc-reconstructed function can be considered to be the superposition of an infinite number of weighted, translated sinc functions. Because each of the functions in the sum has negative lobes, it is impossible to optically create the image by superposition, because negative light does not exist. Pratt then points to Richards' [1966] discovery of a family of perfect interpolants that are positive in regions in which their magnitude is large, and hence can be used for sequential optical interpolation. In resampling, we can add the intensities up inside a computer, and so negative addends are not a problem. Nevertheless, it is instructive to look at Richards' function. Richards' interpolants are interesting in that he generates them in a similar way to the Bezier and B-spline interpolants discussed later (section 4.5).
He effectively defines a set of coefficients:

  h_n = f_n − α(f_{n−1} + f_{n+1})    (4.4)

which he convolves with a kernel, g(x), such that:

  h ∗ g = f ∗ sinc

g(x) can be evaluated from this equation and shown to be [Richards, 1966, equation 4]:

  g(x) = (1/√(1 − 4α²)) (sin πx / πx) [1 − 2x² Σ_{n=1}^{∞} (−β)ⁿ/(n² − x²)]

where β = (1 − √(1 − 4α²))/(2α). Figure 4.27 shows this kernel graphed for two different values of α.

[Figure 4.27: Richards' [1966] kernel, for α = 0.2875 (lower curve) and α = 0.4 (upper curve).]

While this kernel is indeed positive over a significant range, when combined with the kernel …, 0, 0, −α, 1, −α, 0, 0, … operating on h(x) (equation 4.4) it is exactly equivalent to the sinc function, and so is simply another way of writing sinc convolution. This situation is analogous to that with the B-spline and Bezier interpolants discussed in section 4.5. In those cases, a local kernel operates on coefficients, g_n, derived from the data values, f_n. However, the overall kernel which operates on the f_n values is not local. In a similar manner, the Richards kernel is positive over its central region and operates on a set of coefficients, h_n, derived from the data values, f_n. However, the sinc kernel which operates directly on the f_n data values is not positive over such a wide region.

Pratt was attracted to Richards' kernel because it is positive in regions where its magnitude is large. This, he claimed, allows it to be employed for sequential optical interpolation. Unfortunately this claim is only true if all of the h_n coefficients are non-negative. It is true that all of the data values, f_n, must be non-negative, because negative light does not exist. So applying a Richards kernel to these values would have the effect desired by Pratt. However, the Richards kernel is applied to the h_n coefficients.
Thus, sequentially generating an image using a Richards kernel can only be achieved if all the h_n are non-negative. A coefficient, h_i, is non-negative if, and only if:

  f_i ≥ α(f_{i−1} + f_{i+1})    (4.5)

For the Richards kernel to be positive in the region |x| < 5 (this is the significant range used by Richards [1966, fig. 1]) we must have α ≥ 0.34222888 (±5 × 10⁻⁹), rather than Richards' value of α > 0.2875. This curve is shown in figure 4.28. With α in this range, many images will contain at least one location where equation 4.5 will not be satisfied. This unfortunately means that the Richards kernel cannot be used for sequential optical interpolation of these images. For an image that is bandlimited well below the folding frequency, equation 4.5 should be satisfied at all locations and, in these cases, it can be used for sequential optical interpolation.

[Figure 4.28: Richards' [1966] kernel, for α = 0.342229. For α greater than this value the kernel is positive in the range |x| < 5.]

4.4 Gaussian reconstructors

The Gaussian filter is of the form:

  h(s) = (1/(σ√(2π))) e^{−s²/2σ²}    (4.6)

for a standard deviation of σ, with unit area under the curve. The Gaussian has many interesting features:

• it will never produce negative values, unlike the sinc function.

• it falls to zero much faster than the sinc function and so can be truncated with little loss of quality, unlike the sinc function [Heckbert, 1989, p.41]. See section 4.6 for more details.

• it is both circularly symmetric and separable (the only function that is), properties that are useful in implementing the filter, for example in Greene and Heckbert's [1986] elliptical weighted average filter [see also Heckbert, 1989, sec 3.5.8 and 3.5.9].

• it is an approximant, as opposed to sinc, which is an interpolant.

Whilst its properties make it useful, it produces a blurrier intensity surface than, say, the Catmull-Rom spline.
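Equation 4.6 is straightforward to realise in code. The sketch below (our own illustration, with σ = 0.5 as a representative value) also checks the unit-area property numerically:

```python
import numpy as np

def gaussian(s, sigma=0.5):
    """Equation 4.6: a Gaussian reconstruction kernel, normalised to
    unit area under the curve."""
    return np.exp(-s**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Numerical check of the unit-area property (at this sigma the tails
# beyond +-8 are utterly negligible):
s = np.linspace(-8.0, 8.0, 160001)
area = np.sum(gaussian(s)) * (s[1] - s[0])   # ~= 1.0
```

Because the Gaussian is separable, a two-dimensional version is just the product `gaussian(x) * gaussian(y)`, and its circular symmetry means the same function also describes the kernel's radial profile.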
When using a Gaussian we must decide what value of σ to use. For reconstruction of unit-spaced samples, Heckbert [1989, p.41] recommends a standard deviation of about σ = ½. Ratzel [1980], and subsequently Schreiber and Troxel [1985], use σ = 0.52 (see footnote 10 for details of where this value comes from, as it is not the value which they report). Turkowski [1990] uses two Gaussian functions, expressed in the form:

  h(x) = 2^{−x²/τ²}

where τ = 1/√2 or τ = ½. Some consider this form to be more convenient than equation 4.6 [Wolberg, 1990]. This form is not normalised, but when it is (by multiplying by the correct constant) these two values of τ correspond to σ ≈ 0.600561 and σ ≈ 0.424661 respectively.

4.4.1 The sharpened Gaussian

This was developed by Ratzel [1980] to improve on the sharpness of a Gaussian reconstructed intensity surface. The Gaussian's kernel is¹⁰:

  h_g(s) = (1/(0.52√(2π))) e^{−s²/2(0.52)²}    (4.9)

The sharpened Gaussian kernel is the weighted sum of three of these functions, two of them offset from the origin:

  h_sg(s) = (1/5)(7h_g(s) − h_g(s − 1) − h_g(s + 1))    (4.10)

This is graphed in figure 4.29. The single side-lobe each side of the central lobe perceptually sharpens the picture, as discussed in section 4.5.2.

[Figure 4.29: The sharpened Gaussian kernel.]

The sharpened Gaussian can be considered to be the convolution of two simpler functions:

  h_sg(s) = h_g(s) ∗ m

where m is a discrete function: m₀ = 7/5; m₁ = −1/5; m_n = 0, n ≥ 2; and m_{−n} = m_n. Thus, a sharpened Gaussian intensity surface can be generated by precomputing a set of coefficients, g = m ∗ f, and then applying the simple Gaussian (equation 4.9) to these coefficients:

  f(x) = h_g(s) ∗ g

This is computationally more efficient than the straightforward f(x) = h_sg(s) ∗ f, especially for the two-dimensional version of the reconstructor [Schreiber and Troxel, 1985].
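The equivalence h_sg ∗ f = h_g ∗ (m ∗ f) can be illustrated numerically. In the sketch below (our own; the sample data are invented, and the equality is exact here because the example's end samples are zero, which sidesteps edge effects in the truncated discrete convolution):

```python
import numpy as np

def h_g(s, sigma=0.52):
    """Equation 4.9: the Gaussian kernel with sigma = 0.52."""
    return np.exp(-s**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def h_sg(s):
    """Equation 4.10: the sharpened Gaussian kernel."""
    return (7 * h_g(s) - h_g(s - 1) - h_g(s + 1)) / 5

def reconstruct(x, samples, kernel):
    """Evaluate (kernel * samples)(x) for unit-spaced samples."""
    i = np.arange(len(samples))
    return np.sum(kernel(x - i) * samples)

# Invented sample data with zero end values.
f = np.array([0.0, 0.2, 1.0, 0.8, 0.3, 0.1, 0.0])
m = np.array([-1/5, 7/5, -1/5])        # m_0 = 7/5, m_{+-1} = -1/5
g = np.convolve(f, m, mode='same')     # precomputed coefficients g = m * f

# reconstruct(x, f, h_sg) and reconstruct(x, g, h_g) agree for any x
```

The saving is that g is computed once per image with a cheap three-tap discrete convolution, after which only the plain Gaussian need be evaluated per output pixel; in two dimensions the three-tap pass is applied along each axis.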
¹⁰ Ratzel claims to have used this Gaussian [Ratzel, 1980, p.68]:

  h_g(s) = (1/(2.6√(2π))) e^{−n²/2(2.6)²}    (4.7)

where n is an integer and s = n/5. This is equation 4.6 with σ = 2.6. However, converting this to normal co-ordinates gives:

  h_g(s) = (1/(2.6√(2π))) e^{−s²/2(0.52)²}    (4.8)

which is not normalised; to normalise it requires us to use equation 4.6 with σ = 0.52, which is equation 4.9. If one examines Ratzel's hand-tuned approximation this is, in fact, the function that he used, rather than the virtually identical one (equation 4.7) he claims to have used. The only difference is the normalising coefficient. This is a minor correction, but it makes a difference to the results in table 4.5 (section 4.6).

Ratzel's [1980, p.97] results indicate that the sharpened Gaussian is a perceptual improvement over the ordinary Gaussian¹¹.

4.5 Interpolating cubic B-spline

In chapter 3 we met the approximating cubic B-spline (equation 3.19 with B = 1 and C = 0). It does not produce an interpolating intensity surface. To make an interpolating cubic B-spline we apply the B-spline filter to a different set of sample values, g_i, derived from the image's sample values, f_i, in such a way that the resulting curve interpolates the image's sample points. This process was briefly described in section 3.5.5. The interpolating cubic B-spline intensity surface is defined as:

  f(x) = Σ_{i=−1}^{2} Σ_{j=0}^{3} k_{i,j} t^j g_{α+i}    (4.11)

where α = ⌊(x − x₀)/Δx⌋ and t = (x − x_α)/Δx. The k_{i,j} are listed in table 4.1 [Foley and van Dam, 1982].

Table 4.1: The coefficients k_{i,j} in equation 4.11.

   i \ j |   0     1     2     3
  -------+-------------------------
    −1   |  1/6  −1/2   1/2  −1/6
     0   |  2/3    0    −1    1/2
     1   |  1/6   1/2   1/2  −1/2
     2   |   0     0     0    1/6

In order that equation 4.11 interpolates all the sample points, we must ensure that:

  f_i = (1/6)(g_{i−1} + 4g_i + g_{i+1})    (4.12)

which gives us an infinite set of equations to solve. In a practical situation, only a finite number, say n, of the f_i will be known. Say these values are f₀, f₁, . . .
, f_{n−1}; then, to define the entire curve between x₀ and x_{n−1}, n + 2 values, g_{−1}, g₀, g₁, . . . , g_{n−1}, g_n, are required. These can be partially evaluated by solving the n equations in n + 2 unknowns:

  g_{i−1} + 4g_i + g_{i+1} = 6f_i,    i = 0, 1, . . . , n − 1

This leaves us with two edge effects problems: (1) how to define the last two equations required to find the g_i values; and (2) how to define the intensity surface outside the range x₀ to x_{n−1}. Three common solutions to the first problem are the free-end, cyclic, and not-a-knot boundary conditions [Wolberg, 1990, p.289].

¹¹ Ratzel actually used a truncated version of this Gaussian, setting h_g(s) = 0 for |s| > 1.2. Such approximations are discussed in section 4.6.

The free-end, or natural, end condition involves setting the second derivative of the intensity function equal to zero at the end points. That is, f″(x₀) = 0 and f″(x_{n−1}) = 0. This is equivalent to setting:

  g_{−1} = 2g₀ − g₁
  g_n = 2g_{n−1} − g_{n−2}

and from equation 4.12 we can show that this implies that g₀ = f₀ and g_{n−1} = f_{n−1}, giving us a system of n + 2 equations in n + 2 variables:

  g_{−1} − 2g₀ + g₁ = 0
  g_{i−1} + 4g_i + g_{i+1} = 6f_i,    i = 0, 1, . . . , n − 1    (4.13)
  g_{n−2} − 2g_{n−1} + g_n = 0

The cyclic end condition puts g_i = g_{i+n}, which means that g_{−1} = g_{n−1} and g_n = g₀. This gives n equations in n variables:

  g_{(i−1) mod n} + 4g_i + g_{(i+1) mod n} = 6f_i,    i = 0, 1, . . . , n − 1    (4.14)

The third end condition, that favoured by Wolberg, is the not-a-knot boundary condition. This condition requires f‴(x) to be continuous at x₁ and x_{n−2} [Wolberg, 1990, p.289]. This is equivalent to:

  g_{−1} = 4g₀ − 6g₁ + 4g₂ − g₃
  g_n = 4g_{n−1} − 6g_{n−2} + 4g_{n−3} − g_{n−4}
These, combined with equation 4.12, give (see appendix B.7):

  g₁ = (1/6)(8f₁ − f₀ − f₂)
  g_{n−2} = (1/6)(8f_{n−2} − f_{n−1} − f_{n−3})

which, together with the interpolation equations g_{i−1} + 4g_i + g_{i+1} = 6f_i (i = 0, 1, . . . , n − 1) and the two not-a-knot conditions above, determine all n + 2 coefficients.    (4.15)

Note that equation 4.13 is a tridiagonal system of equations and can be solved in O(n) time [Press et al, 1988, sec 2.6]. Equation 4.14 is virtually tridiagonal and so should be solvable in similar time [ibid., sec 2.10]. A similar comment applies to equation 4.15.

The second question (what exists beyond the edges) is a little more tricky. Note that, by defining the values of the n + 2 coefficients g_{−1}, g₀, g₁, . . . , g_{n−1}, g_n, the intensity curve described by equation 4.11 is defined over the range x ∈ [x₀, x_{n−1}). If we only define the values of g₀, g₁, . . . , g_{n−1} (as Wolberg [1990, p.136] does) then equation 4.11 describes an intensity curve defined over the smaller range x ∈ [x₁, x_{n−2}).

Outside these ranges, the intensity curve is undefined, because the necessary g_i coefficients are undefined. One solution is to set the intensity curve to a constant value outside this range, thus making it unnecessary to specify any of these unknown coefficient values. The cyclic boundary condition automatically defines all the coefficients by the formula g_i = g_{i+n}. This gives a result identical to the 'replication' solution in section 4.2.1. For the other boundary conditions we could specify the unknown f_i values, the unknown g_i values, or the value of the intensity curve outside the defined area, as described above. Specifying the unknown f_i values would allow us to find the g_i values by the impossible task of solving an infinite system of equations. It would, however, be possible to specify the unknown f_i values only in the region which will actually be seen in the output, making it possible to calculate all the necessary g_i values.
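A minimal sketch of this machinery (our own code, assuming unit-spaced samples starting at x = 0 and n ≥ 4): solve the free-end system with the O(n) Thomas algorithm, then evaluate equation 4.11 through the coefficients of table 4.1:

```python
import numpy as np

# Table 4.1: K[i][j] with rows i = -1, 0, 1, 2 and columns j = 0, 1, 2, 3.
K = np.array([
    [1/6, -1/2,  1/2, -1/6],   # i = -1
    [2/3,  0.0, -1.0,  1/2],   # i =  0
    [1/6,  1/2,  1/2, -1/2],   # i =  1
    [0.0,  0.0,  0.0,  1/6],   # i =  2
])

def coeffs_free_end(f):
    """Solve g_{i-1} + 4 g_i + g_{i+1} = 6 f_i with the free-end
    conditions g_{-1} = 2 g_0 - g_1 and g_n = 2 g_{n-1} - g_{n-2},
    which pin g_0 = f_0 and g_{n-1} = f_{n-1}.  The remaining interior
    values form a tridiagonal system, solved in O(n) by the Thomas
    algorithm.  Returns g_{-1}, g_0, ..., g_n (g_k at index k + 1)."""
    n = len(f)
    g = np.empty(n + 2)
    g[1], g[n] = f[0], f[n - 1]              # g_0 and g_{n-1}
    m = n - 2                                # interior unknowns g_1..g_{n-2}
    rhs = 6.0 * np.asarray(f[1:n - 1], dtype=float)
    rhs[0] -= g[1]                           # move known g_0 across
    rhs[-1] -= g[n]                          # move known g_{n-1} across
    c, d = np.empty(m), np.empty(m)          # forward elimination
    c[0], d[0] = 0.25, rhs[0] / 4.0
    for i in range(1, m):
        c[i] = 1.0 / (4.0 - c[i - 1])
        d[i] = (rhs[i] - d[i - 1]) * c[i]
    g[m + 1] = d[m - 1]                      # back substitution
    for i in range(m - 2, -1, -1):
        g[i + 2] = d[i] - c[i] * g[i + 3]
    g[0] = 2 * g[1] - g[2]                   # g_{-1}
    g[n + 1] = 2 * g[n] - g[n - 1]           # g_n
    return g

def spline_eval(g, x):
    """Equation 4.11 for unit-spaced samples: alpha = floor(x)."""
    alpha = int(np.floor(x))
    tj = (x - alpha) ** np.arange(4)
    return sum((K[i + 1] @ tj) * g[alpha + i + 1] for i in range(-1, 3))

f = np.array([0.0, 1.0, 3.0, 2.0, 1.0, 0.5])
g = coeffs_free_end(f)
# spline_eval(g, i) reproduces f[i] at each sample point i = 0 .. n-2
```

The curve is defined on [x₀, x_{n−1}); with this indexing, evaluation at the final sample point itself would need g_{n+1}, which is why the usage note stops at i = n − 2.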
Specifying the unknown g_i values directly is perhaps a more tempting solution, and certainly requires far less computation. This extrapolation can be achieved by any of the edge solutions in section 4.2.1, although we are no longer working with the image samples but with equation 4.11's coefficients. Wolberg [1990, p.136] seems to assume that g_i = 0 for i ∉ {0, 1, 2, . . . , n − 1}.

The interpolating cubic B-spline has been praised as being of higher quality than the Catmull-Rom cubic spline [Keys, 1981]. It has also been recommended as being a strictly positive interpolant, which it certainly is not. This confusion arises from the fact that the cubic B-spline function, which is applied to the g_i coefficients, is a strictly positive function. However, the function that is applied to the f_i values is not strictly positive [Maeland, 1988]. The kernel of the interpolating cubic B-spline actually oscillates in a manner similar to the sinc interpolant's kernel.

The interpolating cubic B-spline's kernel

The kernel of the interpolating cubic B-spline can be analytically derived if we ignore boundary conditions. Equation 4.11 can now be written as a discrete convolution:

  f = h ∗ g    (4.16)

where f = . . . , f_{−2}, f_{−1}, f₀, f₁, f₂, . . ., and likewise for h and g. Further, h is defined as: h₀ = 4/6; h_{−1} = h₁ = 1/6; h_n = 0, n ∉ {−1, 0, 1}.

In the frequency domain, equation 4.16 is equivalent to:

  F = H × G    (4.17)

where, for unit-spaced samples, H can be shown to be:

  H(ν) = 4/6 + (2/6) cos 2πν

To find g in terms of f, we invert equation 4.17:

  G = (1/H) × F = L × F

where L is:

  L(ν) = 1/H(ν) = 3/(2 + cos 2πν)

and by judicious use of Dwight [417.2] we get:

  L(ν) = (3/2) × ((1 + a²)/(1 − a²)) × (1 + 2 Σ_{j=1}^{∞} a^j cos 2jπν)    (4.18)

where a = −2 + √3. A little algebra shows that (3/2) × (1 + a²)/(1 − a²) = √3.
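Equation 4.18 can be checked numerically; the fragment below (our own illustration) compares the closed form of L with its cosine-series expansion:

```python
import numpy as np

a = -2 + np.sqrt(3)

def L_closed(nu):
    """L(nu) = 1/H(nu) = 3/(2 + cos 2 pi nu)."""
    return 3.0 / (2.0 + np.cos(2 * np.pi * nu))

def L_series(nu, terms=60):
    """Equation 4.18; the terms decay like a^j (|a| ~ 0.27), so a few
    dozen terms are ample."""
    j = np.arange(1, terms + 1)
    s = 1.0 + 2.0 * np.sum(a**j * np.cos(2 * np.pi * j * nu))
    return 1.5 * (1 + a**2) / (1 - a**2) * s
```

The constant in front of the series is exactly √3, which is what makes the spatial-domain coefficients below come out as √3 a^|n|.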
Transferring this back to the discrete spatial domain, we get:

  g = l ∗ f

where l is the inverse Fourier transform of L:

  l_n = √3 a^{|n|}    (4.19)

Thus each g_i is dependent on all of the f_i values. Numerical verification of the values of l_i can be achieved by running a modified version of Sedgewick's [1988, p.549] algorithm for finding interpolating B-spline coefficients. On a dataset of 32 samples, this gave values (for the middle region of the data, where edge effects are minimised) for the l_i to three significant figures of:

  l₀ = 1.732 (±0.001)
  l_{−1} = l₁ = −0.464 (±0.001)
  l_{−2} = l₂ = 0.124 (±0.001)
  l_{−3} = l₃ = −0.033 (±0.001)
  l_{−4} = l₄ = 0.009 (±0.001)
  l_{−5} = l₅ = −0.002 (±0.001)
  l_{−n} = l_n = 0.000 (±0.001), n ≥ 6

in agreement with the theoretically derived formula.

The creation of an intensity curve can be written as:

  i = b ∗ g

where b is the approximating cubic B-spline kernel. This is equivalent to:

  i = b ∗ (l ∗ f) = (b ∗ l) ∗ f

The approximating B-spline kernel, b, is graphed in figure 4.30, and the interpolating B-spline kernel, b ∗ l, is graphed in figure 4.31. Note that b ∗ l has negative lobes.

This derivation roughly parallels that given by Maeland [1988]. Maeland's formula for the cubic B-spline kernel [p.214, equation 8] is, however, wrong. It is:

  B_Maeland(s) = (3/4)|s|³ − (3/2)|s|² + 1,   0 ≤ |s| < 1
                 −(1/4)|s|³ + 1/2,            1 ≤ |s| < 2
                 0,                           otherwise

Compare this with the correct version (equation 4.21) provided by Parker et al [1983, p.35] and Wolberg [1990, p.135, equation 5.4.18]. Maeland's conclusions are still valid, but the details are slightly incorrect.

The interpolating cubic B-spline is often cited as being better than the Catmull-Rom spline (or other cubics which fit Mitchell and Netravali's [1988] formula (equation 3.19)) [Keys, 1981; Maeland, 1988]. This is due to the fact that this interpolant considers a wider range of data than these other cubics. Further, it is sinc-like in form, with the advantages that negative lobes bring (section 3.5.7).
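The numerical verification described above is easy to reproduce. The sketch below (our own code) solves the 1–4–1 system for a unit impulse on 32 samples, compares the middle of the solution with √3 a^|n|, and also evaluates the kernel b ∗ l directly, using the fact that the approximating kernel is zero outside (−2, 2) so only four terms of the convolution survive at any point:

```python
import numpy as np

a = -2 + np.sqrt(3)

# Solve (1/6)(g_{i-1} + 4 g_i + g_{i+1}) = f_i for a unit impulse f
# (coefficients outside the 32 samples are taken as zero).
n = 32
A = (np.diag(np.full(n, 4.0)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / 6.0
f = np.zeros(n)
f[n // 2] = 1.0
g = np.linalg.solve(A, f)
l = np.sqrt(3) * a ** np.abs(np.arange(-5, 6))
# g[n//2 - 5 : n//2 + 6] matches l to three significant figures

def b4(s):
    """The approximating cubic B-spline kernel, b."""
    s = abs(s)
    if s < 1:
        return 0.5 * s**3 - s**2 + 2 / 3
    if s < 2:
        return -s**3 / 6 + s**2 - 2 * s + 4 / 3
    return 0.0

def interp_kernel(s):
    """The interpolating kernel b * l: four non-zero terms per point."""
    j0 = int(np.floor(s))
    return sum(b4(j - s) * np.sqrt(3) * a ** abs(j)
               for j in range(j0 - 1, j0 + 3))
```

`interp_kernel` is unity at the origin, zero at every other integer, and has a negative first lobe (for example at s = 1.5), which is the oscillatory, sinc-like behaviour described above.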
However, it has an advantage over the sinc function in that, while the sinc kernel has only a 1/x decay, the interpolating B-spline kernel exhibits an exponential decay (see section 4.6).

[Figure 4.30: The approximating cubic B-spline kernel, b.]

[Figure 4.31: The interpolating cubic B-spline kernel, b ∗ l.]

[Figure 4.32: A Bezier cubic curve is defined by four points: its two endpoints, P_i and P_{i+1}, and two control points, Q_i and R_i.]

The interpolating B-spline kernel can be used directly on the image data values, f_i. In this case boundary effects would be handled by one of the methods in section 4.2.1. The interpolating B-spline kernel is analytically defined as:

  h(s) = Σ_{j=−∞}^{∞} B₄(j − s) √3 a^{|j|},    a = −2 + √3    (4.20)

where B₄(s) is the approximating B-spline kernel:

  B₄(s) = (1/2)|s|³ − |s|² + 2/3,          0 ≤ |s| < 1
          −(1/6)|s|³ + |s|² − 2|s| + 4/3,  1 ≤ |s| < 2
          0,                               otherwise    (4.21)

This equation looks as if it is uncomputable because of the infinite sum. However, for any given value of s, only four of the terms in the sum will be non-zero: those for which j ∈ {⌊s⌋ − 1, ⌊s⌋, ⌊s⌋ + 1, ⌊s⌋ + 2}.

4.5.1 Bezier cubic interpolation

Bezier cubic interpolation is related to cubic B-spline interpolation in that a particular cubic function is applied to a set of coefficients derived from the data points. We briefly discussed the Bezier cubic in section 3.6.3. The Bezier cubic form is described by Foley and van Dam [1982, sec. 13.5.2] and the Bezier cubic interpolation formula is presented by Rasala [1990]. The cubic form is:

  P(x) = [t³ t² t 1] ⎡ −1   3  −3   1 ⎤ ⎡ P_α     ⎤
                     ⎢  3  −6   3   0 ⎥ ⎢ Q_α     ⎥
                     ⎢ −3   3   0   0 ⎥ ⎢ R_α     ⎥    (4.22)
                     ⎣  1   0   0   0 ⎦ ⎣ P_{α+1} ⎦

where α = ⌊(x − x₀)/Δx⌋, t = (x − x_α)/Δx, and P_α = (x_α, f_α) is a point in space rather than simply a function value.
This equation produces a cubic curve which passes through its end points (P_α and P_{α+1}) and whose shape is controlled by the endpoints and by the control points, as shown in figure 4.32. Given a set of points, {P_i : i ∈ Z}, we can generate a set of control points, {(Q_i, R_i) : i ∈ Z}, such that the piecewise Bezier cubic defined by the points is C2-continuous. Rasala [1990] gives the conditions required to generate the control points as:

  P_i − R_{i−1} = Q_i − P_i    (4.23)
  P_i − 2R_{i−1} + Q_{i−1} = R_i − 2Q_i + P_i    (4.24)

Given equation 4.23, a difference vector can be defined:

  D_i = P_i − R_{i−1}    (4.25)
      = Q_i − P_i    (4.26)

Equations 4.24, 4.25, and 4.26 can be combined to produce a recurrence relation between the D_i and the P_i:

  D_{i+1} + 4D_i + D_{i−1} = P_{i+1} − P_{i−1}

Rasala produces the following result for a closed loop of n points (that is: P_i = P_{i−n}):

  D_i = Σ_{k=1}^{m} a_k (P_{i+k} − P_{i−k})    (4.27)

where:

  a_k = −f_{m−k}/f_m
  f_j = −4f_{j−1} − f_{j−2},  j ≥ 2
  for n even:  m = (n − 2)/2,  f₀ = 1,  f₁ = −3
  for n odd:   m = (n − 1)/2,  f₀ = 1,  f₁ = −4

For the infinite case (that is: a non-repeating infinite set of points) it can be shown that:

  a_n = −(a)ⁿ

where a = −2 + √3 (see appendix B.3 for details). This is the same constant as that used in the B-spline case (equation 4.20) and, indeed, the same sequence of coefficients (a_n). This is unsurprising, as both situations involve the 1–4–1 kernel. Note, however, that these values are used in different ways.

If we let D_i = (ε_i, δ_i)ᵀ then Q_i = (x_i + ε_i, f_i + δ_i) and R_i = (x_{i+1} − ε_{i+1}, f_{i+1} − δ_{i+1}). Taking equation 4.27 for the infinite case, and splitting it into two parts, we get:

  ε_i = Σ_{k=1}^{∞} a_k (x_{i+k} − x_{i−k})    (4.28)
  δ_i = Σ_{k=1}^{∞} a_k (f_{i+k} − f_{i−k})    (4.29)

With the sample points equispaced (x_{i+1} = x_i + Δx), equation 4.28 becomes:

  ε_i = −Σ_{k=1}^{∞} (a)^k × 2k Δx
      = −2Δx Σ_{k=1}^{∞} k(a)^k
      = −2Δx a/(1 − a)²    [Dwight 9.06]
      = (1/3)Δx

Thus the four control points are equispaced over the unit distance, at x_i, x_i + (1/3)Δx, x_i + (2/3)Δx, and x_{i+1}.
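The series manipulation above can be checked numerically (our own sketch; the truncation at 200 terms is ample, since the terms decay like a^k):

```python
import numpy as np

# Check: -2 * sum_{k>=1} k a^k = 1/3, so epsilon_i = dx/3 (here dx = 1).
a = -2 + np.sqrt(3)
k = np.arange(1, 200)
eps = -2.0 * np.sum(k * a**k)          # truncated series
closed_form = -2.0 * a / (1 - a)**2    # Dwight 9.06
```

Both evaluate to 1/3, confirming the equispacing of the control points.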
Further, δ_i (equation 4.29) can be expressed as:

  δ_i = −Σ_{k=1}^{∞} (a)^k (f_{i+k} − f_{i−k})

Substituting this into equation 4.22, we get:

  f(x) = (−t³ + 3t² − 3t + 1) f_α
       + (3t³ − 6t² + 3t) (f_α + δ_α)
       + (−3t³ + 3t²) (f_{α+1} − δ_{α+1})    (4.30)
       + t³ f_{α+1}

From this we can evaluate this interpolant's kernel:

  h(x) = a^{i+1}(3x³ + (−9i − 3)x² + (9i² + 6i)x + (−3i³ − 3i²))
           + a^i(3x³ + (−9i − 6)x² + (9i² + 12i + 3)x + (−3i³ − 6i² − 3i)),   i ≥ 1, i = ⌊x⌋, i ≤ x < i + 1

         a(3x³ − 3x²) + 2x³ − 3x² + 1,    0 ≤ x < 1
                                                                              (4.31)
         a(−3x³ − 3x²) − 2x³ − 3x² + 1,   −1 ≤ x < 0

         a^{i−1}(−3x³ + (−9i + 3)x² + (−9i² + 6i)x + (−3i³ + 3i²))
           + a^i(−3x³ + (−9i + 6)x² + (−9i² + 12i − 3)x + (−3i³ + 6i² − 3i)),   i ≥ 2, i = −⌊x⌋, −i ≤ x < −i + 1

This kernel is shown in figure 4.33.

[Figure 4.33: The Bezier interpolating cubic's filter kernel.]

Comparing figures 4.33 and 4.31, it may be suspected that this Bezier kernel is identical to the interpolating cubic B-spline kernel (equation 4.20), in the same way that several paths led to the Catmull-Rom. This suspicion is in fact true: a goodly amount of algebra will show that equations 4.31 and 4.20 are identical.

An image to which Bezier/B-spline reconstruction is to be applied can be edge extended in any of the usual ways (section 4.2.1). Rasala's [1990] paper, rather than discussing the general method presented here, gives the exact Bezier reconstruction for two special cases: the case of a 'closed loop', that is the replication edge extension method, where the image edges, in effect, wrap round

[Figure 4.34: Brown's [1969] figure 8 (p.252). Images were reconstructed using a filter with first and second side lobes. The second side lobe height was set at 2% of the central lobe's height.]
[Figure 4.34, continued: the graph plots the subjective score for two images, one live and one still, against the height of the first side lobe as a percentage of the height of the central lobe. The best subjective quality occurs at a height of around 12%.]

to meet one another; and the case of an 'open curve'. In this latter case, one specifies the tangent values of the curve at the end points and reflects the curve about its endpoints. This is a special case edge extension, similar, but not equal, to the reflection edge extension method.

The B-spline/Bezier kernel has the properties of oscillating about zero and of decaying exponentially to zero. It thus has negative lobes. The first negative lobe is by far the biggest, and hence contributes most to any sub-black or supra-white value; the second and subsequent negative lobes contribute a negligible amount to the final intensity surface values. The Catmull-Rom spline has these properties also. It has been suggested that the interpolating cubic B-spline is a good interpolant because it is continuous in its second derivative, while the Catmull-Rom spline, which is not as good [Keys, 1981], has only first derivative continuity. However, the discussion in section 3.5.2 indicates that the presence or absence of C2-continuity has little effect on the visual fidelity of the interpolation. It appears that the sharpening effect caused by the negative lobes is the important factor here.

4.5.2 The sharpening effect of negative lobes

A reconstruction filter which has a negative side lobe, such as the Catmull-Rom or B-spline/Bezier, produces a visually sharper intensity surface than reconstructors without such lobes. Schreiber and Troxel [1985] noted this effect in their sharpened Gaussian filter. They point out that the overshoot caused by the negative lobes appears to be a defect but is actually the feature responsible for the better tradeoff between sharpness and sampling structure.
However, ringing, which occurs when there is more than one lobe either side of the central lobe, is perceived as a bad effect. Brown [1969] carried out a test to discover the subjective effects of these side lobes. Brown's results show that the presence of the first side lobe increases subjective picture quality, provided the second lobe is non-existent or at least small. The subjective quality was measured on a scale of 0–4 by seven observers and the average value plotted in figure 4.34. A value of 1 indicates no improvement; less than 1 indicates degradation. A value of 3 indicates that the image is subjectively as good as an image of twice the test bandwidth. The best improvements occur for a lobe of height 10–14% of the height of the intensity change (in our filters this means a first side lobe of height 10–14% of the central lobe height). Brown's results for the second lobe show that it causes subjective degradation of the image but that, for a height of 4% or less, this degradation is not significant compared to the enhancing attributes of the first lobe.

    Reconstructor          First side lobe    Second side lobe
    Sinc                   22%                13%
    Catmull-Rom            7.4%               0%
    B-spline/Bezier        14%                3.7%
    Sharpened Gaussian     8.4%               0%

Table 4.2: The heights of the side lobes of four reconstructors' filters, given as a percentage of the height of the central lobe.

Table 4.2 contains the sizes of the first two side lobes of each of the reconstructors with a negative first side lobe. The figures are given as percentages of the height of the central lobe. This table indicates that sinc will produce a visually degraded intensity surface. This result is well known from simply viewing sinc reconstructed and enlarged images [Mitchell and Netravali, 1988; Ward and Cok, 1989]; see figures 2.8, 2.12, and 4.20 for examples of this degradation.
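The lobe heights quoted in table 4.2 for the two interpolating kernels can be checked numerically. The sketch below is illustrative, not code from this dissertation: it takes the Catmull-Rom kernel as the a = −1/2 member of the one-parameter cubic family, and builds the B-spline/Bezier kernel as h(s) = Σ l_n b(s − n), assuming the closed form l_n = √3 a^|n| with a = √3 − 2, consistent with l_{x,y} = 3a^(|x|+|y|) in section 4.5.3.

```python
import math

A = math.sqrt(3) - 2  # pole of the B-spline inverse filter, a ~ -0.2679

def bspline(t):
    """Approximating cubic B-spline kernel b(t)."""
    t = abs(t)
    if t < 1:
        return (3*t**3 - 6*t**2 + 4) / 6
    if t < 2:
        return (2 - t)**3 / 6
    return 0.0

def bezier_bspline(s, terms=30):
    """Interpolating B-spline/Bezier kernel, h(s) = sum_n l_n b(s - n),
    assuming l_n = sqrt(3) * A**|n| (a derived form of equation 4.19)."""
    return sum(math.sqrt(3) * A**abs(n) * bspline(s - n)
               for n in range(-terms, terms + 1))

def catmull_rom(s):
    """Catmull-Rom kernel: the one-parameter cubic with a = -1/2."""
    s = abs(s)
    if s < 1:
        return 1.5*s**3 - 2.5*s**2 + 1
    if s < 2:
        return -0.5*s**3 + 2.5*s**2 - 4*s + 2
    return 0.0

def first_lobe_depth(kernel):
    """Depth of the first negative lobe, sampled on 1 < s < 2."""
    return -min(kernel(1 + i/10000) for i in range(10000))

cr = first_lobe_depth(catmull_rom)     # ~ 0.074, i.e. 7.4% of the unit central lobe
bb = first_lobe_depth(bezier_bspline)  # ~ 0.137, i.e. about 14%
```

The measured depths, roughly 0.074 and 0.137 of the unit-height central lobe, agree with the 7.4% and 14% entries in table 4.2.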
The B-spline/Bezier interpolant should produce a result which is about right, whilst both the Catmull-Rom and the sharpened Gaussian appear to have a first side lobe that is slightly small for best effect. Brown states [p.252] that, for moving pictures with a signal-to-noise ratio of 40 dB, the best height for the first side lobe is reduced: the ideal value moves from 12% (noiseless) to 6% (noisy). Thus the Bezier/B-spline reconstructor would be expected to do badly with noisy images, while the other two would do better.

This effect does not account for all subjective quality, as it only improves the sharpness of the image. However, it does give an important pointer to why some reconstructors appear better than others. Other factors, though, also come into play. The sinc and Bezier/B-spline kernels are both oscillatory. The reason the sinc kernel causes ringing artifacts while the B-spline tends not to is that sinc has a much slower fall-off to zero than the B-spline. Thus sinc has much more prominent side lobes than the B-spline. This point is picked up in section 4.6.

The Mach band effect

One possible explanation of this sharpening is that it is due to the Mach band effect. This physiological effect was first described by Mach [1865] and is described in detail by Ratliff [1965] (Ratliff's book also includes translations of Mach's papers on this and related effects). The basic effect, as it concerns our work, is that the bright side of an edge appears brighter than its surrounds and the dark side appears darker. The stimulus presented to the eye by a sharp edge, and the sensation passed by the eye to the brain, are illustrated in figure 4.35. The sharper the edge, the more pronounced the effect [Ratliff, 1965, p.89].
The importance of this effect is that, if the eye is presented with a different intensity distribution which produces a sensation similar to the sensation curve in figure 4.35, then the visual system will interpret it as the stimulus curve in the same figure. Thus, to give the appearance of a sharp edge, it is not necessary to present the eye with an actual sharp edge. It is sufficient to present a sloping edge with a lobe at either end, producing a very similar sensation curve to that given by a sharp edge. Cornsweet [1970] presents an interesting illustration of two different stimuli producing the same sensation. This is shown in appendix D.

The response function of the human visual system is known to be shaped something like that shown in figure 4.36 [Ratliff, 1965]. This shape, with a single prominent negative side lobe, is very like that of the Catmull-Rom, sharpened Gaussian, and B-spline/Bezier kernels. Thus these kernels will produce an intensity curve, around an edge in the image, similar to the eye's response to a sharp edge. This may fool the eye into seeing a sharp edge at that point. This phenomenon could account in part for the success of these kernels as reconstructors, that is: they are perceived to be good because they take advantage of a physiological effect of the eye.

Figure 4.35: The Mach band effect, showing the stimulus presented to the eye and the sensation passed by the eye to the brain. The sensation curve has bumps either side of the edge which make the intensity appear slightly darker on the dark side and slightly lighter on the light side.

Figure 4.36: The response function of the human visual system, similar in shape to the kernels of the Catmull-Rom, sharpened Gaussian, and B-spline/Bezier reconstructors.

The problem of negative intensities

A final note here is that there is some debate as to whether negative lobes are desirable (section 3.5.7).
The main argument against negative lobes is that they can produce negative pixel values. Blinn [1989b] states "there is no such thing as negative light and clamping these negative values to zero is about all you can do, but it is not ideal". Clamping will, in fact, destroy the lobes we are using to sharpen the edge. An alternative to clamping is to rescale the values so that the lowest and highest intensities are physically realisable [Wolberg, 1990, p.131]. This maintains the intensity changes required to produce the desired effect. However, the unfortunate consequence is that the image's contrast is reduced, which is undesirable.

Despite Blinn's [1989b, p.87] statement that clamping negative values to zero is about all you can do, it appears that something more could be achieved, as the visual phenomena shown in appendix D suggest. There, Cornsweet's [1970] figures indicate that a third solution, that of locally rescaling intensity values, could produce desirable results. Cornsweet shows that two different intensity distributions produce identical visual responses in two different cases. This, and other, evidence indicates that local intensity changes are far more important than slow (non-local) changes and absolute intensity values.

Figure 4.37: A possible solution to the problem of generating negative intensity values. (a) shows the curve we would like to simulate: a sharp edge. (b) shows the stimulus we would like to present to the eye to fool the brain into seeing the sharp edge. (c) shows this stimulus clamped to zero, thus destroying the illusion of a sharp edge. (d) shows a possible solution: locally modifying the intensities so that the stimulus is still there at the dark side of the edge, and gently blending this into the dark area to the left.
Thus, if we could 'raise' a region of negative light into positive values and gently blend this raised region in with the surrounding areas, then the resulting visual response should be closer to the desired response than simply clamping negative values to zero, and it would not have the undesirable characteristic of globally reducing image contrast. The basic idea is shown in figure 4.37. Some way of implementing a function of this nature needs to be found. This is not a simple process, as the correction applied at any one point can depend on a large neighbourhood around that point, on the distance to the 'centre' of the nearest negative trough, and on the depth of that trough. It is difficult to decide if such a 'correction' can be performed. If it can, then a decision must be made as to whether it is worth the effort. Further work on this solution is required.

4.5.3 The two-dimensional interpolating cubic B-spline

The B-spline interpolant has been presented in a one-dimensional form. When using filters such as the sinc or Gaussian, it is simple to extrapolate from the one- to the two-dimensional filter kernel by convolving two one-dimensional filter kernels, perpendicular to one another. This is illustrated in figures 4.38, 4.39, and 4.40. This two-dimensional kernel can be directly convolved with the sample points. The B-spline kernel could also be treated like this. However, the usual way of producing a B-spline curve is to evaluate the coefficients, gi, and then use a simple, local B-spline kernel on these. The standard way of extending this to two dimensions is to evaluate the coefficients for each scan-line in one direction; then, for each point for which a value is required, the coefficients must be evaluated for the line, passing through the point, perpendicular to those scan-lines, and hence the value at that point calculated (see figure 4.41). This is a very time-consuming process.
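The per-scan-line coefficient calculation just mentioned amounts to solving the tridiagonal system fi = (gi−1 + 4gi + gi+1)/6. A minimal sketch of one such solve, assuming the simple linear end conditions g−1 = 2g0 − g1 and gN = 2gN−1 − gN−2 (other end conditions are possible; these particular ones pin g0 = f0 and gN−1 = fN−1):

```python
def bspline_coefficients(f):
    """Solve (g[i-1] + 4*g[i] + g[i+1]) / 6 = f[i] for the coefficients g,
    assuming linear end conditions g[-1] = 2g[0] - g[1] and
    g[n] = 2g[n-1] - g[n-2], which pin g[0] = f[0] and g[n-1] = f[n-1]."""
    n = len(f)
    g = [0.0] * n
    g[0], g[-1] = f[0], f[-1]
    if n <= 2:
        return g
    # right-hand sides for the interior unknowns g[1..n-2]
    rhs = [6 * f[i] for i in range(1, n - 1)]
    rhs[0] -= g[0]
    rhs[-1] -= g[-1]
    m = len(rhs)
    # forward sweep of the Thomas algorithm (diagonals are 1, 4, 1)
    cp = [0.0] * m
    dp = [0.0] * m
    cp[0] = 1 / 4
    dp[0] = rhs[0] / 4
    for i in range(1, m):
        denom = 4 - cp[i - 1]
        cp[i] = 1 / denom
        dp[i] = (rhs[i] - dp[i - 1]) / denom
    # back substitution
    g[m] = dp[m - 1]                       # this is g[n-2]
    for i in range(m - 2, -1, -1):
        g[i + 1] = dp[i] - cp[i] * g[i + 2]
    return g

f = [0.0, 1.0, 4.0, 2.0, 5.0, 3.0]
g = bspline_coefficients(f)
```

The Thomas algorithm makes each scan-line solve O(N); the expense of the traditional two-dimensional method comes from repeating a perpendicular solve of this kind for every evaluated point.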
In practice, this limits the B-spline interpolant's usefulness to scan-line algorithms rather than 2-D algorithms.

Figure 4.38: The sinc kernel aligned along the x-axis.

Figure 4.39: The sinc kernel aligned along the y-axis.

Figure 4.40: The two-dimensional sinc kernel, generated by convolving the two one-dimensional kernels shown in figures 4.38 and 4.39.

Figure 4.41: The traditional extension of the one-dimensional interpolating cubic B-spline to two-dimensional interpolation. Coefficients are calculated for each scan-line in the horizontal direction, then to find the value at any one point we need to calculate a complete set of coefficients in the vertical direction — a time-consuming process, because in general, the required points can be anywhere in the plane, and so these vertical coefficients will not be used for more than one point.

However, if we treat the B-spline as a filter (equation 4.20), rather than as a calculation of coefficients followed by a simpler filter, the two-dimensional version is easy to construct from two one-dimensional versions, as in figures 4.38 through 4.40:

    h(x, y) = h(x) × h(y)

where h(x, y) is the two-dimensional kernel and h(s) is the one-dimensional kernel. The question now arises: is it possible to generate a two-dimensional set of coefficients and apply a simple local kernel (for example: the two-dimensional approximating cubic B-spline) to them to produce the same result as applying the complicated non-local kernel to the image sample values? If we assume an image of infinite extent, then equation 4.20 is the one-dimensional filter kernel for the interpolating B-spline:

    h(s) = b(s) ∗ ľ(s)

where b(s) is the cubic B-spline kernel and ľ(s) is the discrete kernel which is convolved with the data values, fi, to produce the coefficients, gi.
The two one-dimensional versions of h(s) are:

    hx(x, y) = { h(x),  y = 0
               { 0,     otherwise

    hy(x, y) = { h(y),  x = 0
               { 0,     otherwise

These can be convolved to produce the two-dimensional interpolating cubic B-spline kernel:

    h(x, y) = hx(x, y) ∗ hy(x, y)

Expanding this out and rearranging terms we get:

    h(x, y) = bx(x, y) ∗ ľx(x, y) ∗ by(x, y) ∗ ľy(x, y)
            = bx(x, y) ∗ by(x, y) ∗ ľx(x, y) ∗ ľy(x, y)
            = b(x, y) ∗ ľ(x, y)

where b(x, y) is the two-dimensional approximating cubic B-spline kernel and ľ(x, y) can be shown to be a discrete function with values:

    lx,y = lx × ly,   x, y ∈ Z

(ln is defined in equation 4.19). Thus:

    lx,y = 3a^(|x|+|y|),   x, y ∈ Z

Now, if we perform the discrete convolution:

    gx,y = lx,y ∗ fx,y

then the two-dimensional B-spline kernel, b(x, y), applied to the gx,y coefficients will produce an intensity surface which interpolates the fx,y data values.

lx,y is an infinitely wide filter. Its values, however, decrease exponentially as |x| + |y| increases (this rapid decrease is noted by Rasala [1990, p.584] in his discussion of the Bezier interpolant). This means that an excellent approximation to lx,y can be obtained by considering only values of x and y for which |lx,y| is greater than some constant. For example, if we consider all values such that |lx,y| ≥ 10^−4 (an adequate lower bound for our purposes) then we need only consider values of x and y for which |x| + |y| ≤ 7. This gives a diamond of lx,y coefficients to consider, with a diagonal length of 15 coefficients (113 coefficients in total). This can be used to calculate all the gx,y coefficients. There are the usual edge effect problems near the image edges, but these can be solved by any of the standard edge solutions in section 4.2.1.

This method of producing the coefficients differs from the standard method. The standard method involves the solution of a one-dimensional system of equations for each scan-line. This system has
special end conditions (because scan-lines are finite in length) and thus cannot be represented as a digital filter¹². The special end conditions cause a problem in that a solution cannot easily be found to a two-dimensional system of equations, and so we must resort to the time-consuming method shown in figure 4.41. A two-dimensional system of equations could be generated based on the fact that:

    fx,y = (1/36) (gx−1,y−1 + 4gx−1,y + gx−1,y+1
                   + 4[gx,y−1 + 4gx,y + gx,y+1]
                   + gx+1,y−1 + 4gx+1,y + gx+1,y+1)

End conditions could be imposed for the edge values. This would produce a system of (N + 2)² equations in (N + 2)² unknowns. This two-dimensional system does not have the tridiagonal property of its one-dimensional counterpart and so would take O(N⁶) time to solve [Press et al, 1988]. It may be possible to reduce this order with a specially designed algorithm [ibid., sec. 2.10], but this possibility has yet to be investigated.

The coefficients resulting from the method of figure 4.41 are virtually identical to those generated by the filter method, except near the image edges. Near the edges, the different results are due to the different edge solutions employed by the two methods.

4.5.4 Using the two-dimensional interpolating B-spline

We have developed three different ways of producing an intensity surface using the interpolating cubic B-spline. To evaluate the usefulness of each method, assume that the original image contains N × N sample values and that we need to evaluate the intensity surface at M × M points. That is: the reconstructor forms part of a resampler which also includes some sort of point sampler.

The first of the three methods is to convolve the two-dimensional interpolating B-spline kernel, h(x, y), directly with the fx,y data values. Given these assumptions, resampling using this first method is an O(M²) process.
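The filter method of section 4.5.3 can be demonstrated directly against the tensor-product relation above. The following sketch (with illustrative helper names, not code from this dissertation) computes the gx,y coefficients with the diamond-truncated filter lx,y = 3a^(|x|+|y|), |x| + |y| ≤ 7, using replication (wrap-round) edge extension, and checks that the two-dimensional B-spline mask recovers the fx,y values to within the truncation error:

```python
import math

A = math.sqrt(3) - 2   # a = sqrt(3) - 2, so the filter values are 3 * a^(|x|+|y|)
R = 7                  # diamond radius: keep the 113 terms with |dx| + |dy| <= 7

def coefficients_2d(f):
    """g = l * f with the diamond-truncated filter l = 3*a^(|dx|+|dy|),
    using replication (wrap-round) edge extension on a square image."""
    n = len(f)
    g = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            s = 0.0
            for dx in range(-R, R + 1):
                for dy in range(-R + abs(dx), R - abs(dx) + 1):
                    s += 3 * A**(abs(dx) + abs(dy)) * f[(x + dx) % n][(y + dy) % n]
            g[x][y] = s
    return g

def bspline_2d_at_knot(g, x, y):
    """Tensor-product cubic B-spline mask at an integer point (x, y):
    (1/36)(g[x-1][y-1] + 4g[x-1][y] + g[x-1][y+1]
           + 4(g[x][y-1] + 4g[x][y] + g[x][y+1])
           + g[x+1][y-1] + 4g[x+1][y] + g[x+1][y+1])."""
    n = len(g)
    gg = lambda i, j: g[i % n][j % n]
    return (gg(x-1, y-1) + 4*gg(x-1, y) + gg(x-1, y+1)
            + 4*(gg(x, y-1) + 4*gg(x, y) + gg(x, y+1))
            + gg(x+1, y-1) + 4*gg(x+1, y) + gg(x+1, y+1)) / 36

f = [[((3*x + 5*y) % 7) / 7.0 for y in range(8)] for x in range(8)]
g = coefficients_2d(f)
```

With the full (infinite) filter the recovery would be exact; truncating at |lx,y| ≥ 10^−4 leaves errors of a few parts in a thousand, in line with the bound quoted above.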
The second method is to precalculate the gx,y coefficients using the lx,y filter on the fx,y data values. The local B-spline kernel, b(x, y), can then be applied to these. The precalculation is an O(N²) process and the intensity surface evaluation is an O(M²) process. Thus, overall, this is an O(M²) process (N is of order O(M)).

The third method is that illustrated in figure 4.41. It involves precalculating one-dimensional coefficients for each scan-line and then, for each point evaluated, calculating a perpendicular set of one-dimensional coefficients and using these to evaluate the value of the intensity surface at the point. The precalculation here is O(N²), but the evaluation of the intensity surface values is O(N M²), except in certain special cases where it can be reduced to only O(M²). This makes the overall process O(M³) (except in those special cases).

Thus, either of the first two methods reduces the time from O(M³) (for the third, traditional, method) to O(M²). Which of these two is faster depends on the actual timings of the two, and on the values of N and M. As a rough guide, let us say that the execution times of the two methods can be written as follows:

    T1 ≈ k1 M²
    T2 ≈ c N² + k2 M²

Now, for each intensity surface value evaluated by the first method, 113 numbers have to be computed and summed (assuming we adopt the diamond-shaped kernel of page 139), while the total for the second method is only 16. Thus k1 ≈ 7 k2. For the second method to be faster than the first thus requires:

    c N² < (6/7) k1 M²

It can be shown that k1 is between 2 and 3 times c (see appendix B.8) and thus, for T2 < T1, we require:

    N² < 2 M²

Therefore, so long as the destination image has the same number of pixels as the source image, or more, the second method should be noticeably faster. On the other hand, if the destination has significantly fewer pixels than the source (say M² ≤ ¼N²), then the first method will be noticeably faster. The third method, being O(M³), will be slower than both, except, perhaps, for very small M.

¹² Remember that we originally produced the filter version by assuming an infinitely long scan-line, and therefore no end conditions.

    Method                     Precalculation    Calculation
    1. Direct convolution      —                 O(M)
    2. Filter precalculation   O(N)              O(M)
    3. Traditional method      O(N)              O(M)

Table 4.3: The orders of the operations involved in each of the three ways of calculating the one-dimensional, interpolating cubic B-spline.

4.5.5 Using the one-dimensional interpolating B-spline

While this O(M³) behaviour makes the traditional method unsuitable for two-dimensional reconstruction, it suffers no such problem in one dimension. Do these new methods of evaluating the interpolating B-spline offer any advantages in one-dimensional reconstruction? Again, assume that the one-dimensional source 'image' contains N sample values and the destination, M sample values. Table 4.3 gives the time complexities of the three methods. They are all of the same order. To get some idea of how they compare, we will again produce some approximate timings. Assume:

    T1 ≈ k1 M
    T2 ≈ c2 N + k2 M
    T3 ≈ c3 N + k3 M

    Operation                k1    k2    k3    c2    c3
    Multiplies               62    18    18    15    1
    Divides                  —     —     —     —     3
    Adds                     74    15    15    14    5
    Table lookups            60    16    16    15    —
    Function value lookups   15    —     —     15    3

Table 4.4: The operations involved in calculating a value for each of the constants in the timing equations for the three ways of calculating the one-dimensional, interpolating cubic B-spline.

The number of operations involved in each of these constants is shown in table 4.4. This shows that k2 = k3 (the process is identical for both methods), k1 ≈ 4k2, and c2 ≈ k2. Thus, for the second method to be faster than the first requires N < 3M, that is, the source image is at most three times the length of the destination. The second and third methods are identical for calculating the destination image values.
Comparing their times for precalculating the coefficients depends on how long a divide operation takes compared to a multiply. If a divide takes about five times as long as a multiply, the two methods take about the same time; if less, then the third is faster; if more, then the second is faster. Tests on one particular machine (a DEC Decstation 5400 running Ultrix 3.1 and X11R4) indicated that a divide took about five times as long as a multiply. This indicates that both methods are as fast as each other, although the second method uses less storage than the third.

Thus, for shrinking a one-dimensional image by a factor smaller than a third, the first method is fastest. For other cases there is little to choose between the second and the third (traditional) method. However, as we have seen, for two-dimensional images the third method is considerably slower than either of the two new methods.

4.6 Local approximations to infinite reconstructors

The infinite extent reconstructors are good theoretical devices. With appropriate edge solutions they can be used in a real situation, operating on an entire image. In practice, however, resampling is a faster operation if a smaller, local reconstructor is used. Thus we seek local approximations to these non-local reconstructors.

There are two basic types of approximation: either a windowed version of the infinite reconstructor, or a local polynomial approximation matching the shape of the infinite reconstructor. We have already discussed several members of this latter class. In particular, the one-parameter family of cubics (equation 3.20) was originally developed by Rifman [Keys, 1981, p.1153] as an efficient approximation to the sinc reconstructor [Park and Schowengerdt, 1983, p.260; Wolberg, 1990, p.129]. As such, Rifman set the parameter a = −1 in order to match the slope of the sinc function at s = ±1.
Since then, Keys [1981] and Park and Schowengerdt [1983] have shown that a = −1/2 (the Catmull-Rom spline) is a better choice. It is still, of course, a valid approximation to a sinc function. The local B-splines, on the other hand, are local approximations to the Gaussian [Heckbert, 1989, p.41]. The approximating cubic B-spline is the most used member of this family.

The other form of approximation to an infinite function is a windowed version of that infinite function. Windowing is a simple process where the infinite function's kernel is multiplied by an even, finite-extent window kernel. The simplest window function is a box, which has the value one within certain bounds and zero outside. Windowing by a box function is commonly called truncation.

Truncation's effectiveness depends on the nature of the infinite function being truncated. If the infinite function's values are negligible outside the truncation bounds then the effect of truncation on the function will also be negligible. However, if the function's values outside these bounds are not negligible then truncation will have a noticeable, usually deleterious, effect on the reconstructor.

Table 4.5 gives the bounds beyond which the listed functions' magnitudes remain below each of the listed limit values. Note that all the functions are normalised so that the area under the curve is one. This especially relates to the two 2^(−kx²) Gaussians, which are usually presented in their unnormalised form. (Note that the Gaussians chosen by the different researchers are all similar in value, with standard deviations ranging between σ = 0.42 and σ = 0.61.)

All of the tabulated functions, except for the sinc function, decay exponentially. All of the exponential methods' kernel values become negligible within at most ten sample spaces. The fact that these reconstructor kernels become negligible so quickly means that they can be truncated locally with negligible loss of quality [Heckbert, 1989, p.41; Wolberg, 1990, pp.143–146].
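For a normalised Gaussian, the sort of truncation bound tabulated below can be computed in closed form: solving e^(−x²/2σ²)/(σ√2π) = L for x gives x = σ√(2 ln(1/(Lσ√2π))), while the 1/(πx) envelope of sinc gives roughly x = 1/(πL). A small sketch (the σ = 0.5 and σ = 0.52 cases correspond to the Heckbert and Ratzel Gaussians; the helper names are illustrative):

```python
import math

def gaussian_limit(sigma, level):
    """Smallest x beyond which a normalised Gaussian of standard deviation
    sigma stays below `level`: solve exp(-x^2/(2 sigma^2))/(sigma sqrt(2 pi)) = level."""
    return sigma * math.sqrt(2 * math.log(1 / (level * sigma * math.sqrt(2 * math.pi))))

def sinc_limit(level):
    """Approximate bound for sinc(x) = sin(pi x)/(pi x), whose envelope is 1/(pi x)."""
    return 1 / (math.pi * level)

# Heckbert's sigma = 0.5 Gaussian at the 0.01, 0.001 and 0.0001 levels
bounds = [gaussian_limit(0.5, L) for L in (0.01, 0.001, 0.0001)]
```

The computed bounds (about 1.480, 1.828, and 2.119) grow only slowly as the limit tightens, whereas the sinc bound grows by a factor of ten per decade: a few thousand sample spaces at the 0.0001 level.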
Examples of such truncation include Ratzel's [1980, p.68] truncation of his Gaussian to |x| < 1.3 and of the sharpened Gaussian to |x| < 2.3 [ibid., p.69]. This truncation limit on the Gaussian would appear to be a little low. Turkowski [1990, pp.153–154] truncates his 2^(−2x²) filter to |x| < 2 and his 2^(−4x²) filter to |x| < 1.5. His basic assumption appears to be that three decimal places are sufficient accuracy for his purposes [see ibid., figures 9, 10, 13, and 14]. These limits are close approximations to the ±0.001 limits in table 4.5.

    Function                                      ±0.01     ±0.001    ±0.0001    Source
    √(8 ln 2 / 2π) 2^(−4x²)                       1.280     1.571     1.816      Gaussian; Turkowski [1990], Wolberg [1990]
    (1 / (0.5 √(2π))) e^(−x² / (2 (0.5)²))        1.480     1.828     2.119      Gaussian; Heckbert [1989, p.41]
    (1 / (0.52 √(2π))) e^(−x² / (2 (0.52)²))      1.532     1.895     2.199      Gaussian; Ratzel [1980], Schreiber and Troxel [1985]
    √(4 ln 2 / 2π) 2^(−2x²)                       1.740     2.165     2.520      Gaussian; Turkowski [1990], Wolberg [1990]
    Sharpened Gaussian                            2.212     2.650     2.992      Ratzel [1980], Schreiber and Troxel [1985]
    tanh, k = 4                                   1.81      2.79      3.76       Wolberg [1990, pp.145–146]
    Interpolating B-spline                        2.855     4.804     6.735      section 4.5
    tanh, k = 10                                  3.77      6.52      8.68       Wolberg [1990, pp.145–146]
    sinc(x)                                       ≈ 31.5    ≈ 318.5   ≈ 3183.0   section 4.3

Table 4.5: The functions on the left are all normalised. The table shows the minimum value of x above which the function's value never goes outside the listed limits. All values are given to 3 or 4 significant digits. On the right are the sources for each function.

Wolberg [1990, p.146] reports that the tanh filter yields excellent results when truncated to |s| ≤ 3. The numbers in table 4.5 confirm this for the k = 4 filter, truncated to the ±0.001 limit, but throw doubt on it for the k = 10 filter. Wolberg does not make it clear which of the two filters this truncation limit is applied to; presumably it is the k = 4 version. In section 4.5.3 we truncated the interpolating B-spline to |s| ≤ 7, that is, to the ±0.0001 level.
Judging from the other results, this is a little conservative. Limiting at the ±0.001 level, say |s| ≤ 5, seems to be adequate, and agrees with Rasala's [1990, p.584] estimation of the truncation limit. (Rasala estimates that for practical purposes a point is affected by values up to seven samples away on either side. However, he comments that this estimate is pessimistic and that, given certain limits [met by our application], only values up to five samples away need be considered.)

4.6.1 Sinc

The sinc function decays as 1/x. Table 4.5 shows that the practical outworking of this is that sinc cannot be truncated locally without severely affecting its quality. Here, 'locally' means within at most a dozen sample spaces. Ratzel [1980, p.69] used a sinc truncated to |x| ≤ 10. He found that it performed poorly. While for a Gaussian this truncation limit is adequate, even excessive, for a sinc it is nowhere near big enough to make the truncation errors negligible.

A solution to this problem is to multiply the sinc function by a better window function, one which falls off smoothly to zero. Wolberg [1990, pp.137–143] and Smith [1981, 1982] describe the common windowing functions used with the sinc interpolant. These windows were all originally derived for use in one-dimensional signal processing, for signals such as sound or radar. Visual signals tend to be a little different (see section 2.7) and so we should not expect the results from these types of signal to necessarily hold for images [Blinn, 1989a].

Unfortunately, any window, except a very local one, will leave the resulting reconstructor with large side lobes beyond the first. These will produce ringing artifacts in the reconstructed intensity surface.
Of the very local windows, the best appears to be the Lanczos-3 window [Turkowski, 1990; Wolberg, 1990; Blinn, 1989b], which is:

    Lanczos3(s) = { sinc(s/3),  |s| < 3
                  { 0,          otherwise

This gives a windowed function:

    S(s) = { sinc(s) sinc(s/3),  |s| < 3
           { 0,                  otherwise

However, a window this local produces a function which is quite different to the original sinc function. Hence, neither type of windowing is ideal for the sinc function.

4.7 Summary

We have discussed the problem of edge effects and the methods of edge extension. The best edge extension solutions are windowed extrapolation and reflection. Constant edge extension is the fastest to evaluate.

Several of the infinite extent reconstructors have been examined. Sinc and Gaussian are well-known reconstructors, and we briefly examined these and some of their variants. We have derived the kernels of the interpolating cubic B-spline and the interpolating Bezier cubic. These have been shown to be identical, and the kernel has allowed us to implement an efficient two-dimensional version of the interpolating cubic B-spline.

Finally, we saw that all of the common infinite reconstructors can be locally truncated with practically no loss of quality, except for the sinc reconstructor. Any non-local windowing of sinc will not remove the problem of ripply artifacts, whereas a local windowing will. However, the local windowing will also significantly change the reconstructor. We now go on to consider a way of implementing an approximation to sinc reconstruction which implicitly uses the replication edge extension method.

Chapter 5

Fast Fourier Transform Reconstruction

The theoretically perfect reconstructed intensity surface is generated by convolving the image with a sinc function. Whatever the advantages and disadvantages of this reconstruction method, it is sometimes necessary to implement it. Sinc reconstruction is, however, extremely expensive to implement.
We saw in section 4.6.1 that the sinc function cannot be truncated to produce a local reconstructor without severe artifacts; windowing gives better results but is still not ideal. To implement ideal sinc reconstruction for use with point sampling and constant edge extension will be an O(N²) process for every sample point for an N × N image. Thus resampling an N × N image to another N × N image with single point sampling will be an O(N⁴) process. Fraser [1987, 1989a, 1989b] presents a method which uses the fast Fourier transform (FFT) to produce practically identical results to sinc reconstruction in O(N² log N) time. This chapter discusses his method and presents two variations on it: one to improve its accuracy near image edges; the other to increase its speed and reduce its memory requirements.

5.1 The reconstructed intensity surface

The fast Fourier transform techniques are fast ways of implementing the discrete Fourier transform (DFT). The DFT transforms a finite-length discrete image into a finite-length discrete frequency spectrum. The intensity surface generated by DFT reconstruction is the periodic function described by a Fourier series consisting of the DFT coefficients below the Nyquist frequency and zeros above it [Fraser, 1987]. No doubt this statement requires some explanation.

Given a one-dimensional sequence of N samples, fj, the DFT produces N frequency values, Fk [Lynn and Fuerst, 1989, p.212]:

    Fk = Σ_{j=0}^{N−1} fj e^(−i2πjk/N),   k ∈ {0, 1, . . . , N − 1}

These two discrete series, fj and Fk, each form the weights of periodic weighted comb functions:

    ι(x) = Σ_{j=−∞}^{∞} fj δ(x − j/N),   fj = fj−N

    I(ν) = Σ_{k=−∞}^{∞} Fk δ(ν − k),    Fk = Fk−N

Figure 5.1: The relationship between the discrete and continuous representations of a sampled function.
(a) shows the discrete samples, which can be discrete Fourier transformed to give (b), the DFT of (a). (c) shows ι(x), the continuous version of (a); it consists of an infinite periodic set of Dirac delta functions. Its Fourier transform, (d), is also an infinite periodic set of Dirac delta functions and is the continuous representation of (b).

Figure 5.2: The three possible forms of the box function:
    (a) b1(x) = 1 for |x| < m/2; 0 for |x| ≥ m/2
    (b) b2(x) = 1 for |x| < m/2; 1/2 for |x| = m/2; 0 for |x| > m/2
    (c) b3(x) = 1 for |x| ≤ m/2; 0 for |x| > m/2

Watson [1986] shows that these two comb functions are Fourier transforms of one another. The relationship is illustrated in figure 5.1. Now, to perfectly reconstruct f(x) from ι(x) we convolve it with N sinc(N x). This is equivalent to multiplying I(ν) in the frequency domain by a box function, suppressing all periods except the central one between ν = −N/2 and ν = N/2.

An important question here is what happens at the discontinuities of the box function. Figure 5.2 gives the three plausible answers to this: the value at the discontinuity can be zero, a half, or one. In continuous work it does not matter particularly which version is used. Here, however, we are dealing with a weighted comb function. If this comb has a tooth at the same place as the box function's discontinuity then it makes a good deal of difference which version of the box function is used. Fraser [1987, p.122] suggests that the central version be used (where b2(N/2) = 1/2). His reasoning is that all components of the Fourier sequence appear once in the box-windowed function but the component at the Nyquist frequency occurs twice (once at each end) and so must be given a weight of one half relative to the other components.
A more rigorous reason can be found by taking the Fourier transform of the spatial domain sinc function:
\[ B(\nu) = \int_{-\infty}^{\infty} N \operatorname{sinc}(Nx)\, e^{-i2\pi\nu x}\, dx, \qquad N > 0 \]
This can be shown (appendix B.2) to be:
\[ B(\nu) = \begin{cases} 1, & |\nu| < \frac{N}{2} \\ \frac{1}{2}, & |\nu| = \frac{N}{2} \\ 0, & |\nu| > \frac{N}{2} \end{cases} \]  (5.1)
Thus we see that the middle version of the box function is the mathematically correct one to use^1. Applying the box function (equation 5.1) to the function I(ν) gives a non-periodic weighted comb function, V(ν):
\[ V(\nu) = I(\nu)\, B(\nu) = \sum_{k=-\infty}^{\infty} V_k\, \delta(\nu - k) \]
\[ V_k = \begin{cases} F_k, & 0 \le k < \frac{N}{2} \\ F_{N+k}, & -\frac{N}{2} < k < 0 \\ \frac{1}{2} F_{N/2}, & |k| = \frac{N}{2} \\ 0, & |k| > \frac{N}{2} \end{cases} \]
Note that there will only be coefficients at |k| = N/2 when N is even. The coefficients of V(ν) form a Fourier series which describes a continuous function in the spatial domain [Lynn and Fuerst, 1989, p.338]:
\[ f(x) = \sum_{k=-\infty}^{\infty} V_k\, e^{i2\pi kx} \]  (5.2)
This is the intensity surface reconstructed by DFT reconstruction. The whole process is illustrated in figures 5.3 and 5.4. The same sequence of events can be seen in Watson [1986, figs 2 and 4], except that Watson uses the third version of the box function (b_3(N/2) = 1) rather than the second.

The intensity surface is thus perfectly reconstructed from the samples: theoretically, by convolving with a sinc function; in practice, by using the DFT method. The significant fact about this surface is that it is periodic. Edge extension has been done by replication, and thus the infinite convolution of ι(x) by N sinc(Nx) can be performed in finite time: the value at any point on the surface can be found by evaluating the sum of N or N + 1 terms in equation 5.2, although this method of evaluation is as expensive as sinc convolution of the spatial domain samples.

The edge extension implicit in the DFT method is different from that which must be used for sinc interpolation (section 4.3). There we saw that all pixels beyond the image edges must be given the value zero (or somehow be faded off to zero).
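As a concrete sketch (mine, not part of the dissertation), the series of equation 5.2 can be evaluated directly from an off-the-shelf FFT. The helper name `dft_surface` is hypothetical; it builds the V_k, giving the Nyquist coefficient half weight at each end as the b_2 box requires, and returns the periodic surface, which interpolates the original samples:

```python
import numpy as np

def dft_surface(f):
    """Return the periodic intensity surface of equation 5.2 for samples f.

    Builds V_k from the DFT coefficients, giving the Nyquist component
    half weight at each end (the b2 box function) when N is even.
    """
    f = np.asarray(f, dtype=float)
    N = len(f)
    F = np.fft.fft(f)
    ks = np.arange(-(N // 2), N // 2 + 1)   # k = -N/2 .. N/2 (both ends, even N)
    V = np.empty(len(ks), dtype=complex)
    for i, k in enumerate(ks):
        if N % 2 == 0 and abs(k) == N // 2:
            V[i] = F[N // 2] / 2            # half-weight Nyquist coefficient
        else:
            V[i] = F[k % N]                 # periodic DFT coefficients
    def surface(x):
        # f(x) = (1/N) sum_k V_k e^{i 2 pi k x}; real for real samples
        return (V * np.exp(2j * np.pi * ks * x)).sum().real / N
    return surface
```

The surface passes through every sample, f(j/N) = f_j, and has unit period, matching the replicated edge extension described above.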
Here edge extension is by copying, which means that edge effects will be visible in the intensity surface unless we get a fortuitous case where the edges match [Fraser, 1989a, pp.667–668] (section 4.2.1).

^1 Any of the versions, when Fourier transformed, gives the sinc function, but the inverse transform of the sinc function gives the middle version of the box function.

Figure 5.3: The spatial domain representation of FFT reconstruction. The sampled function, ι(x), is shown in (a). It is Fourier transformed to give figure 5.4(a). This Fourier transform is multiplied by a box function (figure 5.4(b)) to give a finite extent function (figure 5.4(c)) in the frequency domain. This is inverse Fourier transformed to give the continuous function (c) here. Multiplication by a box in the frequency domain is equivalent to convolving by a sinc function, (b), in the spatial domain.

5.2 DFT sample rate changing

The DFT can be used to change the sample rate of the image, that is: to scale the image. The seed of this idea can be found in Schafer and Rabiner [1973]. The idea itself was developed by Prasad and Satyanarayana [1986] and modified slightly by Fraser [1989a] to use the correct box function. Their implementation only allows for magnification by an integer power of two. Watson [1986] independently developed the idea, though he does not use the correct box function in part of his implementation. His algorithm allows for scaling by a rational factor such that, if the original sequence is N samples and the final L samples, then scaling is by a factor L/N. Watson's algorithm can thus be used to scale from any whole number of samples to any other whole number of samples, whilst Fraser's can only scale from N to 2^n N, n ∈ ℕ.
5.2.1 How it works

The following explanation is based on Watson's [1986] work, with the modification that the correct box function is used in the first multiplication (Watson already uses it in the second stage, so it is surprising that he is inconsistent). We first present the algorithm in the continuous domain, then explain how it can be implemented digitally. Again the one-dimensional case is used for clarity.

Figure 5.4: The frequency domain representation of FFT reconstruction. See the caption of figure 5.3.

Figure 5.5: Critical sampling. The sine waves in (a), (c), and (e) are sampled at exactly twice their frequencies. The perfectly reconstructed versions are shown in (b), (d), and (f) respectively. The function is only correctly reconstructed when the samples are in exactly the right place.

Figure 5.6: An example of 'perfect' minification. The sampled spatial function (a) is transformed to give (b); this is multiplied by a box function producing (d) [and thus generating a continuous function (c) in the spatial domain]. (d) is multiplied by another box function to band-limit it [filtering the spatial domain function (e)], and then sampling is performed to replicate the frequency spectrum (h) and produce a minified image in the spatial domain (g).

The continuous version of the DFT sample rate changing process is illustrated in figure 5.6 (minification) and figure 5.7 (magnification). Our image sample values represent the weights on one period of a periodic, weighted comb function (figure 5.6(a)). This is Fourier transformed to give another periodic weighted comb function (figure 5.6(b)). The Fourier transform is then correctly box filtered (figure 5.6(d)), which gives a continuous intensity surface in the spatial domain (figure 5.6(c)). To resample this intensity surface at a different frequency (equivalent to scaling it and then sampling it at the same frequency) we first perfectly prefilter it, using another correct box function in the frequency domain. If we are enlarging the image, this makes no difference to the image (figure 5.7(f)); if reducing it, then the higher frequency teeth in the comb function are set to zero and any component at the new Nyquist frequency is halved (figure 5.6(f)). We can then point sample at the new frequency, which produces periodic replication in the frequency domain. Figure 5.6(h) shows this for reduction and figure 5.7(h) for enlargement.

Notice that the first multiplication with a box is redundant for reduction, and the second is redundant for enlargement. So, in either case, only one multiplication by a box function is required. Note also that the periodic copies move farther apart in enlargement, as if they had been pushed apart at the Nyquist frequency (and its copies); and they are pushed closer together in reduction, hence the need to prefilter to prevent overlap and thus aliasing. The overlap at the Nyquist frequency (and its copies at (2N + 1)ν_N, N ∈ ℤ) appears to be acceptable [Watson, 1986]. Here the negative and positive Nyquist frequencies add together (hence, again, the need for the box function to have half values at the Nyquist frequencies).
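A quick numerical illustration (my own, not from the dissertation) of the Nyquist-frequency overlap just described: sampling a cosine at exactly twice its frequency catches its peaks, while sampling the corresponding sine catches only its zero crossings, so only the even part of a component at the Nyquist frequency survives sampling.

```python
import numpy as np

# Sample cos and sin of frequency 4 with 8 samples per unit interval:
# exactly twice the signal frequency, i.e. sampling at the Nyquist limit.
n = np.arange(8)
cos_samples = np.cos(2 * np.pi * 4 * n / 8)   # even component: alternates +1, -1
sin_samples = np.sin(2 * np.pi * 4 * n / 8)   # odd component: every sample is zero
```

The cosine samples carry the full amplitude; the sine samples are indistinguishable from the zero signal, which is why a general component at the Nyquist frequency keeps only its even portion.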
With a real spatial domain function this means that the even component at this frequency is doubled and the odd component cancels itself out. So, if the original spectrum is non-zero at the Nyquist frequency, then samples at the sampling frequency preserve only the even portion of this component. This is known as critical sampling [Watson, 1986, p.4]. Figure 5.5 shows three sine waves sampled at twice their own frequency, and the perfectly reconstructed version of each: only the even component survives.

Figure 5.7: An example of 'perfect' magnification. See the caption of figure 5.6 for a description of the processes. Note that multiplying (d) by the appropriate box function to give (f) has no effect on the function in this magnification case.

Whilst this theoretical explanation clearly shows that a continuous intensity surface is generated and resampled, as a practical reconstruction method a super-skeleton surface is generated, because samples can only be produced at regularly spaced points, not at any arbitrary point. To be able to sample at any point there would need to be an infinite number of points generated in each period, and hence an infinite length of time to evaluate all the points (they must all be evaluated in a single operation using this method).

5.2.2 Practical implementation

This algorithm is implemented using the discrete Fourier transform (normally its fast version, the fast Fourier transform (section 5.2.4)). The process mirrors that shown in figures 5.6 and 5.7. In this section we consider first the case of discrete reduction and then that of enlargement. In both cases we use even length sequences in our examples.
We consider the differences between even and odd length sequences in the next section.

For reduction, the image (figure 5.8(a)) is discrete Fourier transformed (figure 5.8(b)). Notice that this transform has only one Nyquist component (here shown at the negative end of the spectrum). The periodic nature of the continuous equivalent means that the other Nyquist component, and all the other copies of this component, have the same value. The transformed data are then multiplied by a box function the same width as the new sample sequence. For an even length sequence, this creates a sequence of length one more than the new sample sequence, with a Nyquist frequency component at each end (figure 5.8(c)). These two components are added together to give a single Nyquist component, here shown at the negative end of the period (figure 5.8(d)). Compare this with figure 5.6(f) and (h), where sampling in the spatial domain causes the copies of the spectrum to overlap at the Nyquist frequency and its copies, and hence add up at these points. The modified sequence can now be inverse transformed to give the resampled image (figure 5.8(e)).

Figure 5.8: An example of DFT minification. The image, (a), is discrete Fourier transformed, giving (b), which is multiplied by a box function to give (c). The two Nyquist components are collapsed into one, producing (d), which is inverse discrete Fourier transformed, thus generating (e), a minified version of (a).

Enlargement is a similar process. The image (figure 5.9(a)) is discrete Fourier transformed to give its frequency domain representation (figure 5.9(b)). The Nyquist frequency component is split into two halves, one at each end of the sequence (figure 5.9(c)). The sequence is now one sample longer. Compare this step with figure 5.7(b) and (d), where the positive and negative Nyquist components are both halved, leaving a sequence with one tooth more in figure 5.7(d) than the number of teeth in each period in figure 5.7(b). The altered transformed sequence in figure 5.9(c) is now padded with zeros at each end to give a sequence of the desired length (figure 5.9(d)). Note, in this example, that one more zero is added at the negative end than at the positive end, because we are assuming that the Nyquist component is at the negative end of the spectrum. The discrete spectrum is finally inverse discrete Fourier transformed to produce the desired enlarged image (figure 5.9(e)).

Figure 5.9: An example of DFT magnification. The image, (a), is discrete Fourier transformed, giving (b). The Nyquist component is split into two halves, producing (c). This is padded with zeros at either end to give (d), which is inverse discrete Fourier transformed, thus generating (e), a magnified version of (a).

Figure 5.10: Comparison of the usual FFT output order and the DFT output order used in figure 5.8 and figure 5.9. At the top is the usual FFT output order: the Nyquist component is number 5. This can be easily converted to our DFT output order, shown at the bottom, where the Nyquist component is −5.

5.2.3 Odd vs even length sequences

All of the examples up to this point have used even length sequences. The processing required for an odd length sequence is slightly different and slightly easier.
This is due to the fact that there is no frequency component at the Nyquist limit in an odd length sequence. Refer back to the continuous case in figure 5.6(a) and (b). In general, for a sequence of length L, there are L samples (teeth) in a unit distance in the spatial domain. Each period in the frequency domain is L units long, with the teeth of the comb function at integer locations. The Nyquist frequency is at ±L/2. For an even length sequence there are components (teeth) at these frequencies; for an odd length sequence there are no components at these frequencies, because they are not at integer locations. There are therefore no special case frequencies to consider when dealing with an odd length sequence, because there is no Nyquist component.

5.2.4 Implementation using the FFT

Implementing this algorithm using a fast Fourier transform is straightforward. Two transforms are required, one of the original length and one of the final length. Provided an FFT algorithm can be implemented for both lengths, the whole algorithm can be implemented. All FFT algorithms decompose the DFT into a number of successively shorter, and simpler, DFTs [Lynn and Fuerst, 1989, p.221]. Thus an FFT of length 2^n, n ∈ ℕ, is well known and widely used. Decomposition of other, non-prime length DFTs is less widely used. Finally, FFTs for prime length DFTs, or for lengths with large prime factors, are most difficult, because the DFT has to be split into pieces of unequal length.

One important implementation issue is that most FFT algorithms produce N coefficients in the frequency domain ranging from ν = 0 to ν = 2ν_N − 1. The examples shown here in figures 5.8 and 5.9 show them ranging from ν = −ν_N to ν = ν_N − 1. This is no great problem: the values generated are exactly the same, because the sequence is periodic with period 2ν_N (see figure 5.1). Figure 5.10 shows the equivalence. So long as this is borne in mind in implementation, no problems will arise.

5.3 Using the FFT to produce a continuous reconstructed intensity surface

As pointed out in section 3.2, the FFT method is a super-skeleton reconstruction method. As it stands it is only useful for scaling and a limited set of other transforms. Fraser [1987, 1989a, 1989b] takes this FFT method and extends it to produce a continuous intensity surface, which is thus useful for any transform.

Fraser's idea is to use the FFT method to generate several extra points between the existing sample points. If the FFT method is used to scale by a factor of M (an integer), then M − 1 extra sample points will be generated between each adjacent pair of original sample points. A local interpolant is then used to interpolate between these points, which all lie on the theoretically perfect intensity surface (equation 5.2). The FFT enlargement eliminates the majority of the rastering artifacts that would arise from applying the local interpolant to the original samples. In practice Fraser [1987, p.122] scales the image by a factor of four or eight using the FFT method and then employs the Catmull-Rom cubic interpolant to produce the intensity surface.

It is profitable to compare interpolation by the sinc function, the Catmull-Rom cubic, and the FFT/Catmull-Rom interpolant. Ignoring edge effects, figure 5.11 compares these three interpolants in the frequency domain. The sinc function preserves the central copy of the signal, as is required for perfect interpolation. The Catmull-Rom interpolant attenuates some frequencies below the Nyquist frequency and passes some above it; thus the reconstructed signal is a distorted version of the 'perfectly' reconstructed signal. The FFT/Catmull-Rom method consists of two steps. The first, the FFT step, produces a spectrum which has copies of the original spectrum at four or eight times the spacing in the sampled signal, with gaps between them of three or seven times the sampling frequency respectively.
In the second step, where the Catmull-Rom interpolant is convolved with the FFT scaled signal, the central copy is almost perfectly preserved and the other copies are almost completely suppressed. Thus the FFT/Catmull-Rom method produces a nearly perfect reconstructed surface.

5.3.1 Implementation

There are two distinct steps in this process, as illustrated in figure 5.12. First the image is 'enlarged' by a factor of, say, four, and then this intermediate image is reconstructed using the Catmull-Rom cubic to give the continuous intensity surface.

The FFT process requires a large amount of extra storage for the intermediate image. The intermediate image is generated in its entirety by a single FFT and cannot be generated a piece at a time on demand (but see section 5.4.2). Further, storage must be provided for the frequency domain representation of the image. This can be stored in the same location as the images if an in-place FFT is used [Lynn and Fuerst, 1989]. However, the image data is real while the frequency domain data is complex; hence the frequency domain data requires twice the storage space of the image data if a naive FFT algorithm is used.

Two features of the process enable it to be implemented more efficiently than a general FFT. Firstly, the image data is real, so the frequency domain data exhibits conjugate symmetry; that is: F_k = F*_{−k}. This symmetry enables us to halve the number of operations required for both the forward and inverse fast Fourier transforms [Watson, 1989, p.15] and halve the storage required for the frequency domain data. Secondly, there are a large number of zero values in the frequency domain image that is to be inverse transformed. Operations on these zeros are unnecessary, because the FFT consists of several multiply and add stages: multiplying by zero and adding the product gives the same result as doing nothing.
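A one-dimensional sketch of the two-stage process (my own illustration, not Fraser's code; the helper names are mine): FFT-enlarge the samples by M = 4, then evaluate the Catmull-Rom cubic on the intermediate sequence. The FFT step here is the zero-padding enlargement of section 5.2, with periodic indexing standing in for the replicated edge extension.

```python
import numpy as np

def fft_enlarge(f, M):
    """Zero-pad the spectrum (splitting the Nyquist bin) to scale an
    even-length real sequence by an integer factor M (section 5.2)."""
    N = len(f)
    F = np.fft.fft(f)
    G = np.zeros(M * N, dtype=complex)
    h = N // 2
    G[:h], G[h], G[-h] = F[:h], F[h] / 2, F[h] / 2
    G[-h + 1:] = F[h + 1:]
    return np.fft.ifft(G).real * M

def catmull_rom(p, x):
    """Evaluate the Catmull-Rom cubic through samples p at real position x
    (periodic indexing keeps the example self-contained)."""
    i = int(np.floor(x))
    t = x - i
    n = len(p)
    p0, p1, p2, p3 = (p[(i + d) % n] for d in (-1, 0, 1, 2))
    return 0.5 * (2 * p1 + (p2 - p0) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3)

def fft_catmull_rom_surface(f, M=4):
    """Two-stage reconstruction: FFT enlargement, then local interpolation."""
    g = fft_enlarge(np.asarray(f, dtype=float), M)
    return lambda x: catmull_rom(g, x * M)   # x in original sample units
```

Because the intermediate samples lie on the surface of equation 5.2, the local cubic only ever interpolates over one quarter of an original sample spacing, which is what suppresses most of its rastering artifacts.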
Some savings can be made by implementing a special version of the FFT which does not perform the unnecessary operations. Smit, Smith and Nichols [1990] discuss precisely this version of the FFT. They show that, for scaling an image of size N by a factor of M, a reduction in time of around \(\log_2 N / \log_2 MN\) can be achieved when M and N are both powers of two. Combining both savings, a 512 sample one-dimensional sequence magnified by a factor of four takes just over 40% of the time that a naive FFT implementation would take. The saving for a 512 × 512 two-dimensional image is the same.

Figure 5.11: A comparison of sinc, Catmull-Rom, and FFT/Catmull-Rom reconstruction. (a) shows the Fourier transform of the sampled function; (b) is the sinc reconstructed function: there are no aliases in this function; (c) is the Catmull-Rom reconstructed function: it contains aliases. (d) is the FFT scaled function, which has the effect of pushing the components farther apart (in this figure they are moved twice as far apart). When (d) is reconstructed using a Catmull-Rom function, fewer aliases are produced (e). In a real situation the components will be moved farther apart than shown here, reducing the errors further.

Figure 5.12: FFT/Catmull-Rom reconstruction is a two-stage process, requiring production of an intermediate (scaled) image. The scaling can be ×4 or ×8.

5.4 Problems with the FFT reconstructor

There are two major problems with the FFT reconstructor. Firstly, edge effects appear in the intensity surface to a considerable distance from the edges. Secondly, the process is both time and memory intensive.
We tackle these two problems separately and draw them together at the end.

5.4.1 Edge effects

Inherent in the FFT reconstructor is the assumption that edge extension is done by replicating (copying) the image. Looking back to section 5.1 we can see this in the explanation of FFT scaling. Except in exceptional cases, there will be a sharp change in intensity along the edges of the image where the replicas join. Figure 4.20 (top) gives an example of this. From section 4.2.1 we know that perfect reconstruction causes unsightly ripples for some distance around such sharp changes. The FFT method implements perfect reconstruction, and so there will be ripples visible around the 'edge' of the intensity surface. Figure 4.20 (bottom) gives an example of these ripples. Visual examination of images with large intensity differences at the edges indicates that these ripples can extend twelve to sixteen original sample spacings from the edge. This means that around twelve percent of a 512 × 512 image is degraded by this process. A larger proportion is degraded in smaller images.

Solving the edge effect problem

Fraser [1989a, p.668] noted the errors due to edge effects, and that low errors occur when there are "fortuitous end-around effects", that is: when the intensities at opposite edges are very close. We simply cannot depend on good fortune to always provide images with nice end-around effects; indeed, most images do not fall into this category. Fraser [1989a, p.669; figs. 3, 4; 1989b, p.269 & fig. 1] avoided the edge effects in his experimental measurements by considering only the central portion of the image, remote from the edges and any edge effects. In resampling work, however, it is usually necessary to use the entire image, and so the edge effect problem cannot be avoided.

The solution is to produce a situation where the intensities at opposite edges of the image match before the image is transformed. There are several ways to do this:

1. window the image;
2. extend the image so that the edges match;
3. linearise the image;
4. use reflection in place of replication.

(1) Window the image

Fraser [1989a, pp.671–3] solves the edge effect problem by end-windowing the image. He applies cosine half-bells to each end of each row and column in the image to form a new image whose intensities fall off smoothly to zero at the edges. Fraser showed that only a small amount of windowing can significantly affect the result. A window covering only four sample spacings (two at each end) reduces the distance from the edge at which errors are noticeable to around a sixth of the distance without windowing, in the cases where the ripples are most noticeable.

The problem with this method is that it causes a different visual artifact, known as vignetting, illustrated in figure 4.19. Windowing reduces the rippling artifact near the edges of the image at the expense of introducing the vignetting artifact. In places where the rippling was less noticeable, the vignetting may cause larger errors than the ripples would have. That is: it is possible for the vignetting to be more objectionable than the rippling.

(2) Extend the image so that the edges match

This is similar to the windowing idea. The image is left unchanged in this case, and a few extra lines of pixels are added around the outside to 'fade off' to some constant value. A more sophisticated algorithm than simple fading could be tried (section 4.2.1). Once the image has been enlarged by the FFT method the extra lines can be discarded. A simple algorithm can be developed which fades each row and column to a constant value within a few pixels.

A problem with this method is that many FFT algorithms are designed for sequences of length 2^n, n ∈ ℕ, and many images have heights and widths which are also powers of two. Extending an image by a few pixels before transforming will be extremely inconvenient in these cases.
Extending an image to twice its width and height (with mostly some constant value in the extension) would be more convenient (figure 5.13).

Figure 5.13: Three ways of coping with edge effects in FFT expansion of an image. (a) windowing the image, shaded area affected; (b) extending the image; and (c) extending the image to twice its size in both dimensions. In all three cases the resulting image (larger in (b) and (c)) is extended by replication by the FFT.

(3) Linearisation

The idea behind linearisation is to remove a linear trend from the data, so that the two ends match in intensity, then do the processing, and finally add the trend back in. This process, illustrated in figure 5.14, can be described for one dimension as follows. Given a set of N data points, f_n, subtract a linear trend from the data to give a new sequence g_n:
\[ g_n = f_n - sn, \qquad \text{where } s = \frac{f_{N-1} - f_0}{N} \]
Expand g_n by a factor Z, using the FFT method, to give h_n. This can be converted back to an expansion of f_n by adding back the linear trend:
\[ i_n = h_n + s\frac{n}{Z} \]

Figure 5.14: An example of linearisation. (a) shows the original samples and a linear trend through them; (b) shows the same samples with the linear trend removed; (c) is the FFT expanded version of (b); (d) is (c) with the linear trend added back in.

Analysing this method shows that it removes the edge ripples from the resulting image. The transform of the original image, f, is F:
\[ F_k = \sum_{n=0}^{N-1} f_n\, e^{-i2\pi kn/N} \]
This can be expanded by the FFT method to give J, the Fourier transform of the expanded image, j:
\[ J_k = \sum_{n=0}^{ZN-1} j_n\, e^{-i2\pi kn/ZN} \]
In general j will contain edge ripples.
On the other hand, the transform of the linearised image, g, is G:
\[ G_k = \sum_{n=0}^{N-1} g_n\, e^{-i2\pi kn/N} = \sum_{n=0}^{N-1} f_n\, e^{-i2\pi kn/N} - s \sum_{n=0}^{N-1} n\, e^{-i2\pi kn/N} = F_k - sE_k \]
This can be expanded to give H:
\[ H_k = J_k - s\bar{E}_k \]
where \bar{E} is the FFT-expanded version of E. Inverse transforming H gives the image, h, which then has the linear trend added back in to give the final image, i:
\[ i_n = j_n - s\bar{e}_n + s\frac{n}{Z} \]
where \bar{e}_n is the inverse transform of \bar{E}_k. The difference between the latter two terms is an edge ripple which cancels out the edge ripple in j.

Extending to two dimensions

It is a simple matter to linearise a two-dimensional image so that either the horizontal or the vertical edges match. This is achieved by linearising each column or row individually. It is possible to linearise every row and then linearise every column in the row-linearised image (this can be shown to leave every row and every column linearised). However, after FFT enlargement the image cannot be de-linearised, because new rows and columns have been generated, between the existing rows and columns, for which no linearisation information exists. It is impossible to linearise the image as a two-dimensional whole, so, if we wish to apply the linearisation process to a two-dimensional image, we must think of some other way to perform the operation. To do this we alter the two-dimensional FFT expansion.

A two-dimensional FFT is performed by applying the FFT to every row in the image and then to every column in the row-transformed image. The FFT expansion expands the resulting frequency domain image and then two-dimensionally inverse transforms it. As noted above, this process is not amenable to linearisation. However, if the process is altered so that the rows are expanded first, and the columns of the row-expanded image are expanded as a separate step, then linearisation is possible. The standard method and this new two-stage method are illustrated in figures 5.15 and 5.16. It is not immediately obvious that both methods should produce the same result (without linearisation).
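The one-dimensional linearisation step can be sketched as follows (my own code, not the dissertation's; here the expansion routine is the zero-padding FFT enlargement of section 5.2, standing in for any FFT expansion method). Detrending before expansion, and re-adding the ramp afterwards, leaves the original samples untouched while removing the edge discontinuity that causes the ripples:

```python
import numpy as np

def fft_expand(g, Z):
    """Plain FFT expansion of an even-length real sequence
    (zero-padding with a split Nyquist bin, section 5.2)."""
    N = len(g)
    G = np.fft.fft(g)
    H = np.zeros(Z * N, dtype=complex)
    h = N // 2
    H[:h], H[h], H[-h] = G[:h], G[h] / 2, G[h] / 2
    H[-h + 1:] = G[h + 1:]
    return np.fft.ifft(H).real * Z

def linearised_expand(f, Z):
    """Remove the linear trend s*n, expand, then add the trend back in."""
    f = np.asarray(f, dtype=float)
    N = len(f)
    s = (f[-1] - f[0]) / N          # slope of the removed trend
    g = f - s * np.arange(N)        # g_n = f_n - s n: the ends now nearly match
    h = fft_expand(g, Z)            # expand the detrended sequence
    m = np.arange(Z * N)
    return h + s * m / Z            # i_n = h_n + s n / Z
```

Since the expansion leaves the samples of g in place, the final sequence satisfies i[Zn] = f_n, exactly as for the unlinearised expansion, but without the replication discontinuity.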
Experiments on several images have shown that they are equivalent (to within the numerical accuracy of the computer on which the operations were performed).

Figure 5.15: The standard way of performing two-dimensional FFT expansion of an image. This method is not amenable to linearisation.

Figure 5.16: The two-stage method of performing two-dimensional FFT expansion of an image. This method is amenable to linearisation at each of the two stages.

Figure 5.17: Two ways of reflecting an image to produce an image of twice the size in each dimension. (a) the original image; (b) reflected with the original in a corner; (c) reflected with the original in the centre.

Applying the linearisation to each of the two stages suppresses the edge ripples which would otherwise result. Obviously, linearisation is applied to the rows in the row-expansion stage and to the columns in the column-expansion stage².

(4) Using reflection in place of replication

In chapter 4 we saw that reflection is a better extension method than replication. Reflection ensures that the edges match. Reflection can be implemented in the FFT expansion stage by using an image twice the size in both dimensions and filling the extra three quarters with reflected versions of the image, as illustrated in figure 5.17. Once this image is enlarged, using FFT expansion, only the appropriate quarter of the expanded image need be used in the ensuing Catmull-Rom reconstruction. This method quadruples the computation required to perform the FFT expansion. Fortunately there is a way to compute the expansion in the same time as the basic FFT method: that is, to use the discrete cosine transform (DCT) in place of the DFT.
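Reflection extension in the style of figure 5.17(b), with the original in a corner, is a one-line mirroring per axis. A sketch of mine for the two-dimensional case:

```python
import numpy as np

def reflect_double(img):
    """Fill an image twice the size in each dimension with reflected
    copies of img, original in the top-left corner (figure 5.17(b))."""
    img = np.asarray(img)
    top = np.concatenate([img, img[:, ::-1]], axis=1)   # mirror left-right
    return np.concatenate([top, top[::-1, :]], axis=0)  # mirror top-bottom
```

After FFT expansion of the doubled image, only the quarter covering the original image is kept for the subsequent Catmull-Rom reconstruction.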
Discrete cosine transform expansion The discrete cosine transform of a sequence fn , n ∈ {0, 1, . . . , N − 1}, is [Ahmed, Natarajan and Rao, 1974]: F0 Fk = √ N −1 2 fn N n=0 = N −1 2 (2n + 1)kπ , k ∈ {1, 2, . . . , N − 1} fn cos N n=0 2N The special case for F0 can be combined into the general form as illustrated in the two-dimensional DCT [Chen and Pratt, 1984]: Fk,l = M −1 N −1 (2n + 1)lπ 4C(k)C(l) (2m + 1)kπ cos fm,n cos MN 2M 2N m=0 n=0 2 Linearisation is a good word to play that game where you try to get as many words as possible from a single long word. For example: line, arise, ear, rise, is, at, on, in, ate, note, not, near, rest, least, sat, sate, nation, sir, sire, sit, sits, tin, tins, linear, salt, real, role, sole, aerial, nil, sale, rail, tail, tale, notion, elation, eat, eats, nit, trial, etc. [Gibbs, 1992] 162 CHAPTER 5. FAST FOURIER TRANSFORM RECONSTRUCTION (a) N (b) 2N Figure 5.18: The implicit periodicity of (a) the discrete Fourier transform, and (b) the discrete cosine transform (after Rabbani and Jones [1991], ﬁgure 10.4). C(j) k ∈ {0, 1, . . . , M − 1} l ∈ {0, 1, . . . , N − 1} √1 , j = 0 2 = 1, otherwise (5.3) The DCT expresses the image as a sum of cosinusoidal terms [Chen and Pratt, 1984]: f (m, n) = M −1 N −1 C(k)C(l)Fk,l cos k=0 l=0 (2n + 1)lπ (2m + 1)kπ cos 2M 2N Compare this with the DFT decomposition into a sum of complex exponential terms (equation 5.2). The two transforms have diﬀerent implicit periodicities (ﬁgure 5.18). As we have seen the DFT assumes a replicated periodic image whilst the DCT assumes a reﬂected periodic image [Rabbani and Jones, 1991, pp.108-111]. Expansion of the DCT can be performed in much the same way as for the DFT. But there turns out to be a slight catch to DCT expansion, which we now explain. Working in one dimension for clarity, given an image fn , n ∈ {0, 1, . . . , N − 1}, the DCT is deﬁned as: N −1 2C(k) (2n + 1)kπ , k ∈ {0, 1, . . . 
F_k = (2C(k)/N) Σ_{n=0}^{N−1} f_n cos((2n+1)kπ/2N),  k ∈ {0, 1, ..., N−1}    (5.4)

where C(k) is defined in equation 5.3. This can be expanded by a factor Z to G_k:

G_k = F_k, k < N;  G_k = 0, otherwise;  k ∈ {0, 1, ..., ZN−1}

Note that, unlike the DFT expansion, there is no special (Nyquist) frequency component, nor are there any negative frequencies. G can now be inverse transformed to give an expanded image, g:

g_n = Σ_{k=0}^{ZN−1} C(k) G_k cos((2n+1)kπ/2ZN),  n ∈ {0, 1, ..., ZN−1}    (5.5)

Here lies the catch: with the DFT, every Zth member of the expanded image would be in the original image, with Z−1 new image samples between every adjacent pair. That is: g_{Zn} = f_n. With the DCT, as it stands, this is not the case, and a small correction must be made to equation 5.5 to make it so. To explain what happens, note that:

f_n = Σ_{k=0}^{N−1} C(k) F_k cos((2n+1)kπ/2N),  n ∈ {0, 1, ..., N−1}    (5.6)

this is simply the inverse of equation 5.4. Equation 5.5 can be rewritten as:

g_n = Σ_{k=0}^{N−1} C(k) F_k cos((2n+1)kπ/2ZN),  n ∈ {0, 1, ..., ZN−1}    (5.7)

The differences between these two equations are the factor of 1/Z in the cosine's argument, and the fact that there are Z times as many g_n as there are f_n. We require:

g_{Zn} = f_n,  n ∈ {0, 1, ..., N−1}

We can rewrite equation 5.7 for g_{Zn} as:

g_{Zn} = Σ_{k=0}^{N−1} C(k) F_k cos((2n + 1/Z)kπ/2N),  n ∈ {0, 1, ..., N−1}    (5.8)

The only difference between equation 5.6 and equation 5.8 is that the (2n+1) in equation 5.6 is replaced by (2n + 1/Z) in equation 5.8. The net result is that there is a linear phase shift in the g_n with respect to the f_n. This can be simply corrected by either:

(a) altering equation 5.5 to:

g_n = Σ_{k=0}^{ZN−1} C(k) G_k cos((2n+Z)kπ/2ZN)

or:

(b) correcting for the linear phase shift in the subsequent small-kernel interpolation. That is: instead of sample g_n being at location x = x_0 + n(Δx/Z), it is shifted slightly to be at:

x = x_0 + (n − (1/2 − 1/2Z))(Δx/Z)
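The corrected expansion can be checked numerically. The sketch below (an illustrative O(N²) implementation, not a fast one) computes the DCT of equation 5.4, zero-pads it to length ZN, and inverse transforms with the (2n + Z) kernel of correction (a); every Zth sample of the result then reproduces the original image:

```python
import math

def dct(f):
    """Forward DCT of equation 5.4: F_k = (2C(k)/N) sum f_n cos((2n+1)k pi/2N)."""
    N = len(f)
    return [(2.0 / N) * (1 / math.sqrt(2) if k == 0 else 1.0)
            * sum(f[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N))
            for k in range(N)]

def dct_expand(f, Z):
    """Zero-pad the DCT coefficients to length ZN, then inverse transform
    with the phase-corrected kernel (2n + Z) in place of (2n + 1)."""
    N = len(f)
    G = dct(f) + [0.0] * (Z * N - N)
    return [sum((1 / math.sqrt(2) if k == 0 else 1.0) * G[k]
                * math.cos((2 * n + Z) * k * math.pi / (2 * Z * N))
                for k in range(Z * N))
            for n in range(Z * N)]

f = [0.2, 0.9, 0.4, 0.7]
g = dct_expand(f, 3)
# g[3n] == f[n] for every n: the original samples survive the expansion
```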
Thus the DCT can be successfully used as an expansion method which does not produce edge ripple effects.

Expansion using other transforms

The DFT and DCT are only two of many types of image transform. We have seen that both of these can be used for image expansion. In general, any transform can be used for image expansion provided that N of the basis functions of the expanded version are identical to the N basis functions of the smaller version, except for a scale factor and possibly a linear phase shift, n′ = sn + c. The DFT and DCT are examples of such transforms. For the DFT the basis functions are:

{ e^{−i2πkn/N},  k ∈ {−N/2, −N/2+1, ..., 0, 1, ..., N/2−1},  n ∈ {0, 1, 2, ..., N−1} }    (5.9)

Expanding this by a factor Z we find that the basis functions for k ∈ {−N/2, −N/2+1, ..., 0, 1, ..., N/2−1} are:

{ e^{−i2πkn′/ZN} }    (5.10)

Setting n′ = Zn gives N functions from equation 5.10 which are equal to the N in equation 5.9. Thus it is possible to expand an image using the DFT. Likewise, for the DCT the basis functions are:

{ 1/√2, cos((2n+1)kπ/2N),  k ∈ {1, 2, ..., N−1},  n ∈ {0, 1, 2, ..., N−1} }    (5.11)

Expanding by a factor of Z we find that the lowest N basis functions of the new sequence are:

{ 1/√2, cos((2n′+1)kπ/2ZN),  k ∈ {1, 2, ..., N−1} }    (5.12)

Setting n′ = Zn + (Z−1)/2 makes N functions in equation 5.12 equal to those in equation 5.11, and so the DCT can also be used for image expansion. Such an expansion is carried out by taking the coefficients of the basis functions in the original sequence and allocating them to the equivalent basis functions in the expanded sequence. All other coefficients in the expanded sequence are set to zero. The DFT is a bit of a special case, as the Nyquist component in the original sequence has two matching components in the expanded sequence, with half its coefficient being given to each.
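The DFT case, including the halving of the Nyquist coefficient, can be sketched as follows (an illustrative O(N²) implementation for even N, not an FFT):

```python
import cmath

def dft(f):
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def dft_expand(f, Z):
    """Zero-pad in the frequency domain (N even): the positive and negative
    frequencies keep their coefficients, and the Nyquist coefficient is
    split equally between +N/2 and -N/2."""
    N = len(f)
    F = dft(f)
    G = [0j] * (Z * N)
    for k in range(N // 2):              # positive frequencies
        G[k] = F[k]
    for k in range(1, N // 2):           # negative frequencies
        G[Z * N - k] = F[N - k]
    G[N // 2] = F[N // 2] / 2            # Nyquist component, split in two
    G[Z * N - N // 2] = F[N // 2] / 2
    # inverse transform; the 1/N (not 1/ZN) scale makes g[Z*n] == f[n]
    return [sum(G[k] * cmath.exp(2j * cmath.pi * k * m / (Z * N))
                for k in range(Z * N)).real / N
            for m in range(Z * N)]

f = [0.0, 1.0, 0.5, 0.25]
g = dft_expand(f, 2)
# g[2n] == f[n]: the original samples are interpolated between, not altered
```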
To illustrate this principle for other transforms, take two square-wave transforms: the Walsh-Hadamard transform and the Haar transform, both considered for N, Z ∈ {2^n, n ∈ ℕ}. The basis functions of both transforms are square-like waves. The first sixteen Walsh-Hadamard basis functions are shown in figure 5.19(a), and the first sixteen Haar basis functions are shown in figure 5.19(b). Enlarging either by a factor of Z leaves the first N basis functions scaled but otherwise unchanged, and thus either can be used for image expansion. This can be implemented so that there is no phase shift, or so that there is the same phase shift as in the DCT. For both transforms, the latter case produces identical results to nearest-neighbour expansion of the image, which would be considerably quicker to implement. The former produces a shifted version of this nearest-neighbour expansion. Therefore, while theoretically possible, neither of these transforms provides a particularly efficient way of achieving this type of expansion.

5.4.2 Speed and storage requirements

These image expansion methods are extremely time and memory intensive. Transform expansion for an N × N image is O(N² log N), thus increasing rapidly as image size increases. An N × N image expanded by a factor Z requires Z²N² complex numbers to be stored for a straightforward FFT expansion, a quarter of this for an optimised FFT expansion, and Z²N² real numbers for the other transforms discussed in the previous section. For example, a DCT expansion of a 256 × 256 image by a factor of four would typically require 4MB of storage. Experience on the Computer Laboratory's workstations has shown that they tend to fail when asked to allocate these huge quantities of memory to an image processing program. On occasion processes have required over 50MB of memory to expand a large image by a factor of eight, and such quantities seem well beyond the capability of the machines.
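The 4MB figure can be checked directly, assuming one 4-byte real per expanded sample (an assumption consistent with the quoted number):

```python
def dct_expand_storage(N, Z, bytes_per_value=4):
    """Storage for DCT expansion of an N x N image by a factor Z: the
    expanded transform holds (ZN)^2 real values."""
    return (Z * N) ** 2 * bytes_per_value

print(dct_expand_storage(256, 4))  # → 4194304 bytes, i.e. 4MB
```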
Even where large quantities of memory are available, most of it will tend to be in virtual memory, incurring an extra time cost as data is paged in and out of random access memory. In order to speed up the algorithm, and to reduce the need for large quantities of quickly accessible storage, we can tile the image (figure 5.20). Tiling is used in image coding to reduce the time and memory cost of the coding algorithm, and because images tend to be locally similar and globally dissimilar. The JPEG image compression method, for example, splits the image into 8 × 8 pixel non-overlapping tiles [Rabbani and Jones, 1991, p.114]. In image expansion we will use relatively large tiles (32 × 32 pixels is a good size), and they will overlap. Processing is done on each tile separately and so, if they did not overlap, the edges of the tiles would be obvious. Overlapping the tiles places each tile in its local context. That is: a tile is expanded but only the central portion is used, as illustrated in figure 5.21.

Figure 5.19: (a) the first sixteen Walsh-Hadamard basis functions; (b) the first sixteen Haar basis functions (after Hall [1979, pp.138-145]); these remain the same no matter how many more coefficients there are.

Figure 5.20: An image (shaded) divided into tiles.

Figure 5.21: Using a tiled DFT expansion. Each tile is individually expanded, within its local context, and the appropriate expanded portion put into the expanded image.

Figure 5.22: The values in table 5.1.
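The bookkeeping of figure 5.21 can be sketched in one dimension. In the fragment below (illustrative only), nearest-neighbour expansion stands in for the DFT/DCT tile expansion, and the signal is clamped at its ends to give the boundary tiles a context; each tile is expanded, but only its central portion is written to the output:

```python
def nn_expand(f, Z):
    """Nearest-neighbour expansion: a stand-in for a transform-based expander."""
    return [v for v in f for _ in range(Z)]

def tiled_expand(f, Z, tile=8, core=4):
    """Expand a 1-D signal tile by tile: each `tile`-sample window is
    expanded in its local context, but only the expansion of the central
    `core` samples is kept (cf. figure 5.21)."""
    N = len(f)
    pad = (tile - core) // 2
    out = []
    for start in range(0, N, core):
        lo = start - pad
        # clamp indices at the ends of the signal to provide a context
        ctx = [f[min(max(i, 0), N - 1)] for i in range(lo, lo + tile)]
        big = nn_expand(ctx, Z)
        out.extend(big[pad * Z : (pad + core) * Z])
    return out[:N * Z]

f = list(range(16))
assert tiled_expand(f, 2) == nn_expand(f, 2)   # tiling changes nothing here
```

With this stand-in expander the tiled and untiled results agree exactly; with a transform-based expander they agree only approximately, as discussed below.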
In figure 5.22, error intensity is the height of the largest ripple in intensity values, ranging from 0 to 255, graphed on a logarithmic scale. Only values for which Central Portion Size + Window Size ≤ 32 were evaluated.

As with the untiled version, if we use a DFT expansion we are faced with edge effects. The DFT of a tile implicitly assumes that the infinite image is that tile replicated to infinity. The central quarter of the expanded tile will probably be unaffected by these edge effects, but transforming an image twice the size of the useful portion in both dimensions is quite an overhead. It would be better if the useful portion were a larger proportion of the whole tile. Here the edge-effect solutions come into play. We could apply a DCT, or linearisation and a DFT, to the tile, both of which would markedly increase the size of the usable inner portion of the tile. Further, windowing is a viable proposition here, because the windowed pixels will be in the unused outer portion of the tile.

An experiment has been performed to ascertain the best size of window and central portion given a tile size of 32 × 32 pixels. A completely black image was enlarged, with windowing designed to fade the image off to white. Thus any artifacts in the expanded image will be extremely obvious as non-black pixels. This experiment gives a feel for the maximum errors likely due to windowing. In practice windowing should be done to a mid-grey, halving these errors; but many people prefer to window off to an extreme, usually black, and so it is necessary to record these maximum errors. The results of this experiment are tabulated in table 5.1 and graphed in figure 5.22. The window size refers to the number of pixels in a row (or column) that are windowed; half this number are windowed at each end of a row or column. Any combination which has an intensity error less than three produces no visible artifacts.
If windowing is done to a mid-grey then any combination with an intensity error under six produces no visible artifacts. Therefore any combination of an n × n central portion in a 32 × 32 tile is good provided n ≤ 26 and the window width is greater than 2 and less than 32 − n. Besides requiring no visible artifacts, we also require the largest possible central portion relative to the tile size. Thus we chose a 26 × 26 central portion in a 32 × 32 tile with a four pixel window (that is, a two pixel wide cosine half-bell windowed border around the tile), fading to a mid-grey. This is illustrated in figure 5.23.

Table 5.1: The maximum height of the ripples caused by DFT reconstruction. These heights are measured in intensity levels on a scale from 0 (no ripples) to 255. Obviously there was little point in testing cases where the windowing intruded into the central portion, so these entries are marked with a cross. The values in this table are graphed in figure 5.22.

Central                           Window Size
Portion
Size         2             4              6              8              10
 30     200  ± 0.5        ×              ×              ×              ×
 28     57.8 ± 0.2    62.8 ± 0.2         ×              ×              ×
 26     57.0 ± 0.2    4.24 ± 0.01    29.2 ± 0.1         ×              ×
 24     30.8 ± 0.2    3.70 ± 0.01    1.68 ± 0.005   16.7 ± 0.05        ×
 22     31.6 ± 0.1    0.98 ± 0.005   1.46 ± 0.005   0.91 ± 0.005   10.75 ± 0.03

Figure 5.23: (a) a 26 × 26 pixel tile within a 32 × 32 pixel context; the outer two-pixel-wide border is windowed using the window function shown in (b). (c) shows the actual effect of this window, which fades the function off to a mid-grey value.

Speed and storage savings

If the tiles are of size L × L, and the image to be expanded is of size N × N, then this tiling method reduces the time required from O(N² log N) to O(N² log L). The memory required to perform the transforms is also reduced, from O(N²) to O(L²). The memory required to store the expanded image remains at O(N²). However, each pixel in the expanded image store need only be accessed
once in the expansion process, whilst each element in the transform memory must be accessed 2 + 2 log₂ L times (for an untiled transform, L = N). There is also the possibility that the expanded image could be generated a few tiles at a time, as required, rather than as a whole, which is what the untiled version must do. The tiled version of the algorithm thus makes a big difference to the amount of memory required for the transform. With a 256 × 256 pixel image split into 32 × 32 tiles, a sixty-fourth of the memory is required, reducing the required 4MB of random access memory quoted on page 164 to 64kB. If the whole operation can be carried out in random access memory then little difference is made to the time taken to expand the image. In the example in the previous paragraph, the time is reduced by thirty percent if 32 × 32 non-overlapping tiles are used, and increased by nine percent if a 26 × 26 pixel central portion is used in a 32 × 32 tile. However, if the untiled version is swapping in and out of virtual memory then the tiled version can run considerably faster.

Implementation in hardware

Fraser [1989a] notes that the FFT is a time-intensive process, which perhaps explains why this method of reconstruction has been neglected. However, the advent of very fast FFT hardware means that FFT interpolation is feasible. As an example, Dataβ produce a dedicated FFT processor which can perform a 1024-point FFT every 700µs [Data Beta, 1992]. To enlarge a 256 × 256 image to 1024 × 1024 using this processor would take 753ms. Using 32 × 32 non-overlapping tiles would take 525ms, and using overlapping 32 × 32 tiles with a 26 × 26 pixel central portion would take 819ms.

Theoretical ramifications of tiling

By tiling the image we destroy the theoretical basis behind applying the DFT or DCT to the whole image as a way of performing sinc reconstruction. Applying the DFT or DCT to a tile gives an approximation to this reconstruction.
The untiled version gives an approximate sinc reconstruction based on all of the information in the image; the tiled version gives an approximate sinc reconstruction based on local information only. We can consider the entire image to be a large tile taken from an infinite, non-repeating picture. In this scenario the tiled version of the process is a scaled-down version of the untiled version, with the advantage that most of the tiles can be placed in some sort of context. In spite of this, it is surprising that the tiled and untiled algorithms give virtually identical results. This observation lends weight to the hypothesis that it is local information that is important in determining the intensity surface's values, rather than any globally encoded intensity information.

5.5 Summary

We have seen how DFT expansion can be used, with a local piecewise polynomial reconstructor, to generate a near-perfect sinc-reconstructed intensity surface in O(N² log N) time. We have examined two areas where improvements can be made to the existing method. The first, the edge effect problem, has several solutions, one of which led to the use of the DCT in place of the DFT, with better intensity surface quality near edges. This also laid the foundation for the use of any transform in performing the necessary expansion. The second, the need for time and space savings, led to a tiling solution, which dramatically reduces the amount of memory required to perform the transforms and, where virtual memory must be used for a non-tiled version, can lead to speed improvements also.

Chapter 6

A priori knowledge reconstruction

If we have some a priori knowledge about the image we are resampling then we may be able to reconstruct a better intensity surface than we could without this knowledge. A priori knowledge comes in two forms:

1. Knowledge about how the image was sampled;
2. Knowledge about the image's content.
The majority of this chapter is concerned with finding an exact reconstruction of the image in one particular case where both of these are known. Before progressing to this we look at what can be achieved when only the first is known.

6.1 Knowledge about the sampler

Most images are captured by some physical device. Knowledge of that device's sampling process gives the potential to invert the process and restore what would have been produced by single point sampling the original intensity surface, or at least a closer approximation to it than that produced by the device. The closer an image is to the single point sampled image, the better many of the NAPK reconstructors discussed in the previous three chapters will work. Much effort has been expended over the years on image restoration: the attempt to produce a better image from a less good one. Such a process could be applied to an image, which could then be reconstructed using a NAPK method. Referring back to figure 2.3 we can add this restoration process to the figure, and add also the theoretically equivalent path of single point sampling, as shown in figure 6.1. This inversion of the capture process could be built into the reconstructor, giving an APK reconstructor by combining a restorer and a NAPK reconstructor.

6.1.1 Inverting the sampling filter

All capture devices implement some form of area sampling. This sampling can be thought of as convolving the intensity surface with a sampling filter, followed by single point sampling the resulting intensity surface (section 2.5.1). If this sampling filter can be inverted, and the inversion implemented, then it can be used for image restoration and hence APK reconstruction. This process is illustrated in figure 6.2, in the frequency domain.
Figure 6.1: The image capture process, and how inverting the capture produces a better image. The first two stages are taken directly from figure 2.3. A real-world image is projected to an intensity surface, which is captured by some area sampling process. Inverting this sampling produces an image which is as if the intensity surface had been single point sampled.

Figure 6.2: Frequency domain illustration of area sampling and APK reconstruction. The Fourier transform of the continuous intensity surface, (a), is multiplied by the area sampler's Fourier transform, (b), giving a filtered intensity surface, (c). This is sampled by convolving with a comb, (d), to give the sampled, filtered image, (e). Inverting (b) gives (f). If (e) is multiplied by (f) then we get (g), an exact reconstruction of (a).

If the sampling filter has frequency spectrum S(ν) then the inverted filter could be given the spectrum I₁(ν) = 1/S(ν). If S(ν) extends beyond the folding frequency then this will have the undesirable effect of amplifying copies of the central component of the image's spectrum. So the desired form of the inverted filter is:

I₂(ν) = B₁(ν)/S(ν)

That is: the ideal filter's spectrum, B₁(ν), multiplied by the inverted sampling filter's spectrum, 1/S(ν). I₂(ν) can be implemented as convolution in the spatial domain, if the inverse filter's kernel can be determined. Or it can be approximated using the DFT, with the inverse filtering being performed in the frequency domain and an appropriate NAPK reconstruction method applied to the resulting image. The DFT reconstruction methods of the previous chapter could be combined with this DFT restoration to produce an improved reconstructed intensity surface.

The problem with this method is that all practical sampling filters fall off as they move away from zero.
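In a noise-free, finite, circular setting the plain inversion I₁(ν) = 1/S(ν) can be demonstrated directly. The sketch below (an illustration, not from the dissertation) blurs a signal by circular convolution with a small sampling kernel whose DFT is nowhere zero, then recovers the signal exactly by division in the frequency domain; with noise present, or a kernel whose spectrum reaches zero, this division fails:

```python
import cmath

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def circular_convolve(f, s):
    N = len(f)
    return [sum(f[m] * s[(n - m) % N] for m in range(N)) for n in range(N)]

f = [0, 0, 1, 1, 1, 0, 0, 0]          # the "intensity surface"
s = [0.6, 0.2, 0, 0, 0, 0, 0, 0.2]    # sampling kernel; its DFT has no zeros
blurred = circular_convolve(f, s)

F, S = dft(blurred), dft(s)
restored = idft([F[k] / S[k] for k in range(len(f))])   # multiply by 1/S
# restored equals f to within rounding error
```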
Thus the inverse filter amplifies those frequencies near the folding frequency. It is these frequencies in the image which have the smallest amplitudes, and which are most affected by noise and by any aliasing which may occur. The frequencies near zero, which are least amplified by the inverse filter, are those which are least affected by these two problems. Therefore inverse filtering amplifies any noise and aliasing artifacts present in an image. It is thus unsuitable for all images except those with very low noise and practically no aliasing.

6.1.2 Modified inverted sampling filter

Oakley and Cunningham [1990] have considered the problem of finding the best reconstruction filter given the sampling function. They develop an error measure which they claim is a good approximation to the visual quality of the reconstructed images [ibid., p.191]. By minimising this measure over all images they produce a formula for the optimal reconstruction filter. In one dimension it is:

I(ν) = K(ν)S*(ν) / [ (1/Δx) Σ_{n=−∞}^{∞} |S(ν + n/Δx)|² ]    (6.1)

where S*(ν) is the complex conjugate of S(ν); Δx is the sample spacing in the spatial domain; and K(ν) is the Fourier transform of the ideal reconstruction filter. Here 'ideal reconstruction filter' means the filter you would like to convolve the original intensity surface with to produce the reconstructed intensity surface. Several such functions are of interest:

1. K(ν) = 1; that is: the reconstructed intensity surface should be as close as possible to the original intensity surface. Oakley and Cunningham [1990] call this a restoration case. This reconstruction filter is highly sensitive to noise [ibid., p.180]. Note that, if the sampling filter is bandlimited to below the folding frequency, then the reconstruction filter generated by equation 6.1 is equal to both I₁(ν) and I₂(ν), except for a multiplicative constant. A second restoration filter is where K(ν) is the statistically optimal Wiener filter for the continuous image formation case.
This produces an implementation of the Wiener filter which is corrected for the effects of sampling.

2. K(ν) is a low-pass filter. Here the reconstructed intensity surface approximates as closely as possible the original intensity surface convolved with the low-pass filter. Of special interest is the case where K(ν) = B₁(ν): the reconstructed intensity surface will approximate a perfectly low-pass filtered version of the original intensity surface. It is instructive to note that the sum of squares in the denominator of equation 6.1 is the sampling filter squared in the frequency domain, then sampled in the spatial domain and transformed into the frequency domain. Thus the function:

S*(ν) / Σ_{n=−∞}^{∞} |S(ν + n/Δx)|²

is closely related to 1/S(ν), with some modification to take into account the fact that the image is sampled, not just an intensity surface filtered by S(ν). If S(ν) is bandlimited to below the folding frequency then:

S*(ν) / Σ_{n=−∞}^{∞} |S(ν + n/Δx)|² = 1/S(ν)

3. K(ν) = S(ν). The reconstructed intensity surface here is the original surface convolved by the sampling filter, thus producing the theoretical surface generated just before single point sampling occurred. Oakley and Cunningham [1990, p.181] call this "matched reconstruction" or the "interpolating case". If S(ν) is bandlimited to below the folding frequency then the resampling filter is a box function: I(ν) = Δx B₁(ν). This means that the reconstructed intensity surface is the same as that produced by convolving the image with the sinc function.

Both Oakley and Cunningham [1990] and Boult and Wolberg [1992] have generated implementations of matched reconstruction filters. These filters have been based on rectangular and hexagonally sampled Gaussian filters [Oakley and Cunningham, 1990], and on box and cubic B-spline filters [Boult and Wolberg, 1992].
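Equation 6.1 is straightforward to evaluate numerically. The sketch below is an illustration only (not either published implementation): it assumes a Gaussian sampling filter, takes the matched case K(ν) = S(ν), and truncates the infinite sum to a few replicas:

```python
import math

def S(nu, sigma=0.5):
    """Fourier transform of a Gaussian sampling filter (illustrative choice)."""
    return math.exp(-2.0 * (math.pi * sigma * nu) ** 2)

def optimal_filter(nu, dx=1.0, terms=8):
    """Equation 6.1 with K(nu) = S(nu): the 'matched reconstruction' case.
    The infinite sum in the denominator is truncated to 2*terms + 1 replicas."""
    numerator = S(nu) * S(nu)            # K(nu) * conj(S(nu)); real-valued here
    denominator = sum(S(nu + n / dx) ** 2
                      for n in range(-terms, terms + 1)) / dx
    return numerator / denominator

# the resulting filter is even, positive, and peaks at nu = 0
assert abs(optimal_filter(0.37) - optimal_filter(-0.37)) < 1e-12
assert 0.0 < optimal_filter(0.5) < optimal_filter(0.0)
```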
Both claim that their derived reconstruction filters perform better than the best NAPK reconstructors, and notably better than the Catmull-Rom cubic interpolant.

6.2 Knowledge about the image content

It is possible that a better reconstruction can be achieved if we know something about the image's content. This is true for image restoration; Oakley and Cunningham [1990, p.180] state that: "In general, successful image restoration relies not only on correct linear processing, but on the exploitation of prior knowledge about the image source." In the image resampling field itself, Ward and Cok [1989, p.264] hypothesise: "If there is sufficient a priori knowledge of the content of an image, the visual quality can be improved by using interpolation functions that use this a priori information. For example, if the original image is known to consist of uniform areas and edges, interpolated pixels can be constrained by some rule to be close in value to some of the neighbouring pixels." This is an interesting idea, but Ward and Cok do not take it any farther.

In the fields of image understanding and computer vision much work has been, and is being, done on trying to extract features from a digital image. If an accurate list of features could be extracted then it could be used to drive a renderer which would generate a transformed image. This process is illustrated in figure 6.3. It is analogous to the reconstruct-transform-sample decomposition in figure 1.14, the difference being that the intensity surfaces are replaced by scene descriptions. Unfortunately, the generation of a detailed enough scene description is extremely difficult for even the simplest scenes, and so this process would appear to be impractical for image resampling, except in the case of very limited sets of images.
Figure 6.3: A three-stage decomposition for a resampler which analyses an image into a scene description, transforms this, and then renders it (image → extract features → scene description → transform → transformed scene description → render → image). Compare figure 1.14.

Figure 6.4: The general intensity curve input to one-dimensional sharp edge reconstruction. It consists of areas of constant intensity separated by sharp discontinuities.

To illustrate the type of algorithm that could be developed we will consider a simple case, in which we limit the class of images to those which consist of areas of constant intensity bounded by infinitely sharp edges. These images will be sampled using an exact-area sampler on a square pixel grid. We begin by considering the one-dimensional case before progressing to the two-dimensional case.

6.3 One-dimensional sharp edge reconstruction

The general intensity curve in the one-dimensional case consists of a series of plateaux of constant intensity separated by sharp discontinuities. Such a curve is shown in figure 6.4. It can be described mathematically as:

g(x) = g_c,  x_c ≤ x < x_{c+1}    (6.2)

where (..., (x_{−1}, g_{−1}), (x_0, g_0), (x_1, g_1), ...) is an ordered set of ordered pairs of position and intensity information. The pairs are ordered on position values: x_c < x_{c+1}, and the intensity values fall in the range 0 ≤ g_c ≤ 1.

6.3.1 The black and white case

To simplify matters we begin by considering only the black and white case, where g_c = 0 or g_c = 1. Zero is taken to be black, and one, white. We start by looking at the simplest case: a single edge between a black area and a white area (see figure 6.5). This can be represented as the set ((−∞, 0), (x₁, 1), (∞, φ)), where φ is an arbitrary value which is never used.
Without loss of generality we can consider sampling with unit spacing, and with position x₁ between zero and one (0 ≤ x₁ < 1). If the samples are represented as {ĝ_n, n ∈ ℤ} then sample ĝ_n is taken at point x = n + 1/2 [Heckbert, 1990a]. This disparity between continuous and discrete coordinates, illustrated in figure 6.6, greatly simplifies many of the calculations on them.

Figure 6.5: The black and white case with a single edge. The edge is at x₁. To the left the intensity curve is black (0), and to the right, white (1).

Figure 6.6: The positions of the discrete samples relative to the continuous coordinate system [after Heckbert, 1990a, fig 1].

If the function shown in figure 6.5 is point sampled then the sample values will be:

ĝ_n = 0, n < 0
ĝ_0 = 0, x₁ > 1/2;  ĝ_0 = 1, x₁ ≤ 1/2
ĝ_n = 1, n > 0

The information about the discontinuity's position is lost here: it is only possible to say where the edge occurred to within ±1/2. However, we want to use exact-area sampling, which is equivalent to convolving with a box filter followed by point sampling (section 2.5.1). Convolving the intensity curve with this box filter produces the function in figure 6.7. Point sampling this function produces the sample values:

ĝ_n = 0, n < 0
ĝ_0 = 1 − x₁
ĝ_n = 1, n > 0

The information about the position of the discontinuity is now encoded in ĝ_0. It can be exactly reconstructed from that value:

x₁ = 1 − ĝ_0    (6.3)

A similar situation holds in the case where g(x) = 1, x < x₁ and g(x) = 0, x ≥ x₁, shown in figure 6.8. Here the sample values will be:

ĝ_n = 1, n < 0
ĝ_0 = x₁
ĝ_n = 0, n > 0

And the discontinuity can be reconstructed exactly as being at:

x₁ = ĝ_0    (6.4)

Thus, in either case, we can exactly reconstruct the curve from the image.

Figure 6.7: The intensity curve of figure 6.5 convolved with a box filter of width one.
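Equation 6.3 can be verified in a couple of lines. The fragment below (illustrative) computes the exact-area sample of the step of figure 6.5 over the pixel [n, n+1):

```python
def area_sample_step(x1, n):
    """Exact-area sample over [n, n+1) of the curve that is 0 for x < x1
    and 1 for x >= x1 (figure 6.5): the white fraction of the pixel."""
    return min(max(n + 1 - x1, 0.0), 1.0)

x1 = 0.3
samples = [area_sample_step(x1, n) for n in (-1, 0, 1)]
# samples is approximately [0.0, 0.7, 1.0];
# equation 6.3 then recovers the edge: x1 = 1 - 0.7 = 0.3
```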
Figure 6.8: The other single-edge black and white case: white to the left of the discontinuity and black to the right.

Black and white case — many edges

If we extend this to a situation where there are many edges, alternating the intensity between black and white, then we can apply the above result in the following way. The original function, g(x), is represented by its exact-area samples {..., ĝ_{−1}, ĝ_0, ĝ_1, ...}. The function that is reconstructed from these samples is g_r(x). g_r(x) can be generated in a piecewise manner as follows:

Algorithm 1

For every pixel, a:
  If ĝ_a = 0 then g_r(x) = 0, a ≤ x < a+1
  If ĝ_a = 1 then g_r(x) = 1, a ≤ x < a+1
  If 0 < ĝ_a < 1 then
    if lim_{x→a⁻} g_r(x) = 0 or lim_{x→(a+1)⁺} g_r(x) = 1 then
      g_r(x) = 0, a ≤ x < a+1−ĝ_a
      g_r(x) = 1, a+1−ĝ_a ≤ x < a+1
    if lim_{x→a⁻} g_r(x) = 1 or lim_{x→(a+1)⁺} g_r(x) = 0 then
      g_r(x) = 1, a ≤ x < a+ĝ_a
      g_r(x) = 0, a+ĝ_a ≤ x < a+1

Figure 6.9: Two ways of interpreting the sample sequence 0.5, 1.0, 0.7, if each pixel is allowed to contain one edge. Only (b) is possible if edges are constrained to be at least one sample space apart.

This algorithm will not give an exact reconstruction for all possible g(x). In theorem 3 we see what restrictions must be put on g(x) for g_r(x) to be an exact reconstruction, that is: to ensure that g_r(x) = g(x). The last part of this algorithm shows that, in addition to the intensity level of the edge pixel, one extra bit of information is required to uniquely determine the location of the edge. This bit is the orientation of the edge: whether it is rising from zero to one, or falling from one to zero. The orientation can be ascertained from the orientation of the reconstructed function in either of the adjacent pixels. A single zero or one pixel somewhere in the entire infinite sequence is required to fix the orientation for all edges. This information can then be propagated through the sequence to produce g_r(x).
An alternative representation of g_r(x) would be to find the values (..., (x_{−1}, g_{−1}), (x_0, g_0), (x_1, g_1), ...) which define g(x) (equation 6.2). This can be done from g_r(x), as produced by algorithm 1. An algorithm to directly generate the (x_c, g_c) pairs must be constructed so as to take into account the possibility of edges which fall exactly at pixel boundaries. Here the pixels on either side of the edge will have sample values zero and one, indicating the discontinuity, rather than a single pixel with a value between zero and one indicating it¹. Notice that algorithm 1 implicitly deals with this situation in its first two cases.

Theorem 3
Algorithm 1 exactly reconstructs g_r(x) = g(x) if adjacent edges in g(x) are spaced more than one sample space apart. That is: x_{c+1} − x_c > 1, ∀c.

Proof: if x_{c+1} − x_c > 1, ∀c, then:

1. No pixel can contain more than one edge, thus ensuring that any grey pixel codes the position of one, and only one, edge. This edge can be reconstructed using orientation information and equation 6.3 or 6.4.

2. Any edge occurring at a pixel boundary generates a zero sample value on one side and a one sample value on the other. This ensures that no dichotomies arise in the reconstruction. For example, if we imposed the lesser restriction that each pixel could only contain a single edge (but that the edges could be any distance apart) then the sample sequence ..., 0.5, 1, 0.7, ... could be interpreted in two different ways, as can be seen in figure 6.9, whereas the tighter restriction imposed here permits only the unique interpretation in figure 6.9(b).

3. Orientation information can be generated by any pair of adjacent pixels. If their sample

¹The assumption that the samples on either side will have values zero and one is valid if the conditions in theorem 3 are adhered to.
ONE-DIMENSIONAL SHARP EDGE RECONSTRUCTION 179 values are ĝc and ĝc+1 then the following table gives the orientation information: ĝc ĝc+1 0 1 − − α α − − 0 1 β >1−α β <1−α lim x→(c+1)− gr (x) lim x→(c+1)+ 0 1 − − 1 0 gr (x) − − 0 1 1 0 where 0 < α, β < 1. We can never have a situation where β = 1 − α because this would mean that two edges were exactly one sample space apart which is not allowed. From this information we can produce an algorithm which generates the exact reconstruction, gr (x) from local information, rather than depending on some global method to propagate the orientation information back and forth along the number line. Such an algorithm is essential for the reconstruction of a ﬁnite length sequence of samples because it is possible to conceive of a situation where no zero or one samples would occur within a ﬁnite length to provide the orientation information for algorithm 1. The improved algorithm, for a sequence of inﬁnite length, is: Algorithm 2 For every pixel, c: If ĝc = 0: gr (x) = 0, c ≤ x < c + 1 If ĝc = 1: gr (x) = 1, c ≤ x < c + 1 If 0 < ĝc < 1: If ĝc−1 < 1 − ĝc : gr (x) = If ĝc−1 > 1 − ĝc : gr (x) = 0, 1, c ≤ x < c + 1 − ĝc c + 1 − ĝc ≤ x < c + 1 1, 0, c ≤ x < c + ĝc c + ĝc ≤ x < c + 1 An algorithm to ﬁnd the ordered pairs that deﬁne g(x) is: Algorithm 3 1. For every pixel, c: If ĝc = 0 and ĝc−1 = 1: output (c, 0) If ĝc = 1 and ĝc−1 = 0: output (c, 1) If 0 < ĝc < 1: If ĝc−1 < 1 − ĝc : output (c + 1 − ĝc , 1) If ĝc−1 > 1 − ĝc : output (c + ĝc , 0) 2. Sort the ordered pairs, (xi , gi ), which have been output, into an ordered set on their x-values. 180 CHAPTER 6. A PRIORI KNOWLEDGE RECONSTRUCTION Finite-length black and white functions A ﬁnite length function can be deﬁned as g(x) = gc , xc ≤ x < xc+1 , c ∈ {0, 1, 2, ..., N } (6.5) where: ((−∞, g0 ), (x1 , g1 ), (x2 , g2 ), . . . , (xN , gN ), (+∞, φ)) is an ordered set of ordered pairs, xc < xc+1 − 1. gN +1 = φ is irrelevant, as its value is never used. 
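Algorithms 2 and 3 are purely local, so they translate directly into code. The following Python sketch (the function names are my own, not from the text) reconstructs each pixel's piece of gr(x) from its sample and its left neighbour, assuming, per theorem 3, that adjacent edges are more than one sample space apart:

```python
def reconstruct_pixel(c, g, g_prev):
    """Piece of gr(x) over [c, c+1) as (start_x, value) pairs (algorithm 2).

    g      -- exact-area sample of pixel c (in [0, 1])
    g_prev -- sample of pixel c - 1, used to orient any edge in pixel c
    """
    if g == 0.0:
        return [(c, 0.0)]
    if g == 1.0:
        return [(c, 1.0)]
    if g_prev < 1.0 - g:                     # rising edge: 0 then 1
        return [(c, 0.0), (c + 1.0 - g, 1.0)]
    return [(c, 1.0), (c + g, 0.0)]          # falling edge: 1 then 0

def reconstruct(samples, start):
    """Apply algorithm 2 pixel by pixel; samples[0] is pixel `start`.

    The first sample is used only to orient the second pixel.
    """
    pieces = []
    for k in range(1, len(samples)):
        pieces += reconstruct_pixel(start + k, samples[k], samples[k - 1])
    return pieces
```

For the samples 0.6, 1, 0.2 of a curve that rises at x = 0.4 and falls at x = 2.2, the pieces for pixels 1 and 2 come out as (1, 1) and (2, 1), (2.2, 0): the grey value 0.2, oriented by the white pixel to its left, places a falling edge at x = 2 + 0.2.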
g(x) is defined over all real numbers; a finite length version of g(x) must be defined over at least x ∈ [x1 − 1, xN + 1), giving at least the samples ĝ⌊x1⌋−1, ĝ⌊x1⌋, . . . , ĝ⌊xN⌋−1, ĝ⌊xN⌋. The reason for these limits is to ensure that even an edge lying exactly on a pixel boundary (x1 = ⌊x1⌋ or xN = ⌊xN⌋) can be detected. The algorithms to reconstruct g(x) from its samples are very similar to algorithm 2 and algorithm 3. The main difference is that the first pixel is a special case. Note that the algorithms given below only work if there are at least two samples in the sample set. Edge effects are avoided by assuming that the intensity value at either end of the sequence continues to infinity. This is explicit in the definition of g(x) (equation 6.5).

Algorithm 4
To reconstruct gr(x) from samples ĝa, ĝa+1, . . . , ĝb−1, ĝb:
For every value c ∈ {a, a + 1, . . . , b − 1, b}:
  If ĝc = 0: gr(x) = 0, c ≤ x < c + 1
  If ĝc = 1: gr(x) = 1, c ≤ x < c + 1
  If 0 < ĝc < 1:
    If c = a:
      If ĝc+1 > 1 − ĝc: gr(x) = 0, c ≤ x < c + 1 − ĝc
                                1, c + 1 − ĝc ≤ x < c + 1
      If ĝc+1 < 1 − ĝc: gr(x) = 1, c ≤ x < c + ĝc
                                0, c + ĝc ≤ x < c + 1
    If c ≠ a:
      If ĝc−1 < 1 − ĝc: gr(x) = 0, c ≤ x < c + 1 − ĝc
                                1, c + 1 − ĝc ≤ x < c + 1
      If ĝc−1 > 1 − ĝc: gr(x) = 1, c ≤ x < c + ĝc
                                0, c + ĝc ≤ x < c + 1

This algorithm exactly reconstructs the values of g(x) over the range a ≤ x < b, given the following conditions:

1. xc < xc+1 − 1, ∀c ∈ {0, 1, . . . , N} (that is: all adjacent edges are greater than one sample space apart);
2. a < b (that is: at least two samples);
3. a ≤ x1 − 1 and b ≥ xN (that is: enough information to adequately reconstruct the edges at both ends).

The last constraint allows us to exactly reconstruct the intensity surface over the entire number line. If we only need the values of g(x) for a ≤ x < b, its bounds can be relaxed to: a ≤ x1 and b ≥ xN − 1.

Figure 6.10: The two possible interpretations of a grey pixel value for a single pixel.
The need for constraint 1 was determined in theorem 3, and the need for constraint 2 is implicit in algorithm 4, which requires two adjacent pixels to ascertain the edge position in certain cases. If there is only one pixel then we cannot extract the orientation information, and thus there are two possibilities for a grey pixel, as illustrated in figure 6.10.

The finite length algorithm to find the edge positions (that is: the (xc, gc) pairs) is similar to algorithm 3. To exactly reconstruct g(x) it requires constraints 1, 2, and 3 as given above.

Algorithm 5
To reconstruct gr(x) from samples ĝa, ĝa+1, . . . , ĝb−1, ĝb:
1. Set n ← 0, x0 ← −∞
2. If ĝa = 0: g0 ← 0
   If ĝa = 1: g0 ← 1
   If 0 < ĝa < 1:
   (a) n ← 1
   (b) If ĝa+1 < 1 − ĝa: g0 ← 1, x1 ← a + ĝa, g1 ← 0
       If ĝa+1 > 1 − ĝa: g0 ← 0, x1 ← a + 1 − ĝa, g1 ← 1
3. For c = a + 1 to b:
   If ĝc = 0 and ĝc−1 = 1:
   (a) n ← n + 1
   (b) gn ← 0, xn ← c
   If ĝc = 1 and ĝc−1 = 0:
   (a) n ← n + 1
   (b) gn ← 1, xn ← c
   If 0 < ĝc < 1:
   (a) n ← n + 1
   (b) If gn−1 = 0: gn ← 1, xn ← c + 1 − ĝc
       If gn−1 = 1: gn ← 0, xn ← c + ĝc
4. xn+1 ← ∞, gn+1 ← φ
5. gr(x) = gz, xz ≤ x < xz+1, z ∈ {0, 1, 2, . . . , n}

We now present an example of how algorithm 5 works. Let

  ((x0, g0), (x1, g1), (x2, g2), (x3, g3), (x4, g4), (x5, g5)) = ((−∞, 0), (0.4, 1), (2.2, 0), (3.4, 1), (5, 0), (∞, φ))   (6.6)

This defines g(x) as illustrated in figure 6.11.

Figure 6.11: The intensity curve used in this example is shown in (a). It is defined by equation 6.6 and equation 6.5. The finite image sampled from this intensity curve is shown in (b).

Taking ĝn, n ∈ {0, 1, 2, 3, 4, 5} we get: ĝ0 = 0.6, ĝ1 = 1, ĝ2 = 0.2, ĝ3 = 0.6, ĝ4 = 1, ĝ5 = 0.

Stepping through algorithm 5:

  Step        Actions
  1.          n ← 0, x0 ← −∞
  2.          (a) n ← 1
              (b) g0 ← 0, x1 ← 0.4, g1 ← 1
  3. (c = 1)  no action
  3. (c = 2)  (a) n ← 2
              (b) g2 ← 0, x2 ← 2.2
  3. (c = 3)  (a) n ← 3
              (b) g3 ← 1, x3 ← 3.4
  3. (c = 4)  no action
  3. (c = 5)  (a) n ← 4
              (b) g4 ← 0, x4 ← 5
  4.          x5 ← ∞, g5 ← φ

In step 5, gr(x) is specified by the ordered set of ordered pairs:

  ((−∞, 0), (0.4, 1), (2.2, 0), (3.4, 1), (5, 0), (∞, φ))

which is identical to the specification of the original intensity curve, g(x) (equation 6.6).

6.3.2 The grey level case

The extension to grey levels immediately increases the amount of information required to locate an edge's position. In the black and white case we require the intensity value of the pixel in which the edge falls (hereafter called the 'edge pixel') plus one bit of orientation information to ascertain whether the edge is a rise from zero to one or a fall from one to zero. In the grey level case, where the original function is defined in equation 6.2, we require three pieces of information to exactly locate an edge. These three are the intensity of the edge pixel and the intensities of the grey level plateaux on either side of the edge. In the absence of any other information we assume that these latter two values are provided by the two pixels immediately adjacent to the edge pixel. This situation is illustrated in figure 6.12.

Figure 6.12: An edge in the grey level case. The intensity to the left of the edge is gl; to the right, gr. The three sample values are: ĝa−1 = gl, ĝa = gr + (xc − a)(gl − gr), and ĝa+1 = gr.

Figure 6.13: Two possible ways of reconstructing the same finite-length sequence of pixel values. (a) shows a sequence of pixel values. (b) and (c) show the two possible ways of reconstructing these values. Note that each case has one pixel in which there is insufficient information to decide what happens within that pixel.
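Algorithm 5 can be transcribed almost line for line into Python. The sketch below (the function name is mine) returns the ordered set of (xc, gc) pairs, representing the leading −∞ by -inf and the final (∞, φ) pair by (inf, None):

```python
import math

def find_edges(samples, a):
    """Algorithm 5: breakpoints of a black and white g(x) from its
    exact-area samples g_hat[a], ..., g_hat[b] (at least two samples)."""
    g = samples
    xs, gs = [-math.inf], []
    # Step 2: the first pixel is a special case, oriented by its right neighbour.
    if g[0] == 0.0:
        gs.append(0.0)
    elif g[0] == 1.0:
        gs.append(1.0)
    elif g[1] < 1.0 - g[0]:                    # falling edge in pixel a
        gs += [1.0, 0.0]
        xs.append(a + g[0])
    else:                                       # rising edge in pixel a
        gs += [0.0, 1.0]
        xs.append(a + 1.0 - g[0])
    # Step 3: remaining pixels, oriented by the level reached so far.
    for k in range(1, len(g)):
        c = a + k
        if g[k] == 0.0 and g[k - 1] == 1.0:     # edge exactly at the boundary
            xs.append(float(c)); gs.append(0.0)
        elif g[k] == 1.0 and g[k - 1] == 0.0:
            xs.append(float(c)); gs.append(1.0)
        elif 0.0 < g[k] < 1.0:
            if gs[-1] == 0.0:                   # rising edge
                xs.append(c + 1.0 - g[k]); gs.append(1.0)
            else:                               # falling edge
                xs.append(c + g[k]); gs.append(0.0)
    xs.append(math.inf); gs.append(None)        # the final (∞, φ) pair
    return list(zip(xs, gs))
```

Running it on the example image of figure 6.11 recovers equation 6.6: find_edges([0.6, 1, 0.2, 0.6, 1, 0], 0) yields (−inf, 0), (0.4, 1), (2.2, 0), (3.4, 1), (5, 0), (inf, None).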
Note that, in figure 6.13, it is every second pixel that contains an edge. We can find the position of the edge from the three sample values as:

  xc = a + (ĝa+1 − ĝa) / (ĝa+1 − ĝa−1)   (6.7)

and the value of the plateau in the intensity curve immediately to the right of this edge as gc = ĝa+1. Therefore, to exactly reconstruct an edge, there must be a single edge within the edge pixel and no edges within the two adjacent pixels. We will assume that this is the case for all edges in the intensity curve, thus no edge pixel will be adjacent to any other edge pixel.

How then do we ascertain which are edge pixels and which are not? For simplicity consider only the cases where no edges occur at pixel boundaries (we will shortly examine the special case where they do occur). The problem is that we cannot determine from an individual sample's value whether it is an edge pixel or not. In the black and white case it is simple: an edge pixel has a value between zero and one, whilst a non-edge pixel has a value of either zero or one (remember, we are not considering edges that occur at pixel boundaries). In this grey scale case it is simple to invent a finite-length sequence which is ambiguous. For example, the set of samples 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 can be reconstructed in two possible ways, shown in figure 6.13.

It proves to be easiest to identify pixels which do not contain edges and to ascertain from these which pixels do contain edges. There are two cases where a pixel can definitely be said not to contain an edge:

1. if an adjacent pixel has the same value as this pixel then neither can contain an edge;
2. if both adjacent pixels have different values to this pixel, and these values are either both greater or both less than this pixel's value, then this pixel does not contain an edge.

If a pixel's value is zero or one, then one of these two cases will always be true.
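Equation 6.7 is a one-liner. The snippet below (hypothetical names, not from the text) builds the three samples of figure 6.12 from chosen plateaux and edge position, then recovers that position:

```python
def edge_position(a, g_left, g_mid, g_right):
    """Equation 6.7: x-position of the single edge in pixel a, given the
    exact-area samples of pixels a-1, a and a+1."""
    return a + (g_right - g_mid) / (g_right - g_left)

# Build the samples of an edge at x_c = 1.4 between plateaux gl = 0.3 and
# gr = 0.8 (the construction given in the caption of figure 6.12) ...
a, x_c, gl, gr = 1, 1.4, 0.3, 0.8
samples = (gl, gr + (x_c - a) * (gl - gr), gr)   # (g_{a-1}, g_a, g_{a+1})
recovered = edge_position(a, *samples)            # ... and recover x_c = 1.4
```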
Figure 6.14: The arrangement required to identify an edge at a pixel boundary is shown in (a): two pixels either side of the edge with the same intensity. (b) shows a situation that can be interpreted wrongly. If γ < α < β then an edge will be found in the α pixel, rather than an edge at x = a and one in the γ pixel, as occurred in the original intensity curve.

Figure 6.15: The situation for a four-pixel sequence.

Having found a single non-edge pixel, the entire infinite length becomes an unambiguous encoding of g(x). The only case where no non-edge pixel could be identified using the above rules is where the sample values continuously ascend or descend, never reaching one or zero at either end.

Let us now consider the case of an edge at a pixel boundary. Equation 6.7 can still be used to find the edge position. But, to prevent ambiguity, there must be two samples of equal value on either side of the edge, giving a sequence of four samples with the edge in the middle of the sequence. Figure 6.14 shows the ambiguity which arises if this constraint is not obeyed. This constraint, and the constraint that the two pixels adjacent to an edge pixel may not be edge pixels, can be drawn into a single rule: any two edges in the original image must be farther than two sample spaces apart. That is, in equation 6.2:

  xa+1 − xa > 2   (6.8)

It is debatable whether this equation should have ≥ rather than >. In the black and white case we found that the strict inequality allowed us to unambiguously reproduce the reconstructed surface from any pair of sample values. Proceeding on that precedent, we will attempt to find some unambiguous reconstruction from a finite-length set of contiguous samples, assuming that equation 6.8 holds.
Is there an unambiguous reconstruction from a four-pixel sequence? The shortest length which can contain two potential pairs of edges is four pixels long. If these four pixels contain either of the two non-edge situations described in section 6.3.2, then the original function can be unambiguously reconstructed over at least the central two pixels (the outer two may require one extra pixel of information in order for us to apply equation 6.7). In the cases where neither situation occurs, the four sample values will be either increasing or decreasing. Without loss of generality we can take the increasing case with the left-most pixel being pixel zero, as illustrated in figure 6.15. We know the values of ĝ0, ĝ1, ĝ2, ĝ3. They are related by:

  0 < ĝ0 < ĝ1 < ĝ2 < ĝ3 < 1

The values of ĝ−1 and ĝ4 are unknown, but are used in some of the intermediate formulae. There are two possibilities: either edges occur in pixels 0 and 2, or in pixels 1 and 3. We want to find some way of distinguishing between these two. The only constraint on the original function is that any two edges must be more than two sample spaces apart. Therefore we need to find the potential edge locations, x0, x2 and x1, x3, and the distances between these pairs of edges, ∆0,2 and ∆1,3.

  Possible edge pixels   Number of cases   Percent of cases
  0 and 2                1,167,877         29.8%
  1 and 3                1,092,634         27.9%
  Both                   793,406           20.2%
  Neither                867,308           22.1%
  Total                  3,921,225         100.0%

Table 6.1: The results for a four pixel long sequence, with pixels having values of 0.01n, n ∈ {1, 2, . . . , 99}. The table shows the number of cases in which edges could occur in pixels 0 and 2 only, pixels 1 and 3 only, either of these cases, or neither of them.
The edge locations can be found using equation 6.7, and the distances between edges thus become:

  ∆0,2 = 2 + (ĝ3 − ĝ2)/(ĝ3 − ĝ1) − (ĝ1 − ĝ0)/(ĝ1 − ĝ−1)   (6.9)
  ∆1,3 = 2 + (ĝ4 − ĝ3)/(ĝ4 − ĝ2) − (ĝ2 − ĝ1)/(ĝ2 − ĝ0)   (6.10)

The distance between edges must be greater than two, so, to ensure unambiguous reconstruction, either ∆0,2 > 2 or ∆1,3 > 2. If both inequalities are true, then either case could hold and reconstruction is ambiguous. A problem with finding when these inequalities hold is that the unknown value ĝ−1 appears in equation 6.9 and ĝ4 in equation 6.10. If we set the former to 0 and the latter to 1, we maximise the values of ∆0,2 and ∆1,3. We need to know these maxima in order to ascertain if unambiguous reconstruction can be achieved given just the values ĝ0, ĝ1, ĝ2, and ĝ3. Setting these two values to these extrema gives us the following conditions:

1. For it to be possible for edges to be located in pixels 0 and 2, we require:

   (ĝ3 − ĝ2)/(ĝ3 − ĝ1) − (ĝ1 − ĝ0)/ĝ1 > 0   (6.11)

2. For it to be possible for edges to be located in pixels 1 and 3, we require:

   (1 − ĝ3)/(1 − ĝ2) − (ĝ2 − ĝ1)/(ĝ2 − ĝ0) > 0   (6.12)

It can easily be shown that both conditions can be true for some combinations of sample values. For example: ĝ0 = 1/20, ĝ1 = 2/20, ĝ2 = 4/20, and ĝ3 = 9/20 produces, in equation 6.11: 3/14 > 0, and in equation 6.12: 1/48 > 0. To give some idea of what is going on, an experiment was performed using all possible values of ĝ0, ĝ1, ĝ2, ĝ3 ∈ {0.01, 0.02, 0.03, . . . , 0.98, 0.99}. This gave the results shown in table 6.1. In nearly 26% of the cases where one of the edge cases is possible, the other is possible too. Therefore the inequalities in equation 6.9 and equation 6.10 do not effectively arbitrate between the two cases.

Longer sequences

An obvious extension to this idea is to use a larger context to determine where the edges fall. Rather than explicitly extend to a five pixel context, then to a six pixel context, and so on, we present the general case, for an n pixel context.
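The counterexample can be checked with exact rational arithmetic. The sketch below (function names are mine) evaluates the left-hand sides of equations 6.11 and 6.12; both come out positive, so the four samples are ambiguous:

```python
from fractions import Fraction as F

def cond_0_2(g0, g1, g2, g3):
    """LHS of equation 6.11: edges possibly in pixels 0 and 2 (g_-1 set to 0)."""
    return (g3 - g2) / (g3 - g1) - (g1 - g0) / g1

def cond_1_3(g0, g1, g2, g3):
    """LHS of equation 6.12: edges possibly in pixels 1 and 3 (g_4 set to 1)."""
    return (1 - g3) / (1 - g2) - (g2 - g1) / (g2 - g0)

# The sample values from the text: 1/20, 2/20, 4/20, 9/20.
g = [F(1, 20), F(2, 20), F(4, 20), F(9, 20)]
# cond_0_2(*g) evaluates to 3/14 and cond_1_3(*g) to 1/48.
```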
  Possible edge pixels   Number of cases   Percent of cases
  Even only              9,349,562         12.4%
  Odd only               35,036,048        46.5%
  Either                 2,624,600         3.5%
  Neither                28,277,310        37.6%
  Total                  75,287,520        100.0%

Table 6.2: The results for a five pixel long sequence, with pixels having values of 0.01n, n ∈ {1, 2, . . . , 99}. The table shows the number of cases in which edges could occur in the even numbered pixels only, the odd numbered pixels only, either even or odd, or neither even nor odd.

  Possible edge pixels   Number of cases   Percent of cases
  Even only              189,397,344       15.9%
  Odd only               179,280,225       15.0%
  Either                 8,826,079         0.74%
  Neither                814,548,752       68.3%
  Total                  1,192,052,400     100.0%

Table 6.3: The results for a six pixel long sequence, with pixels having values of 0.01n, n ∈ {1, 2, . . . , 99}. The table shows the number of cases in which edges could occur in the even numbered pixels only, the odd numbered pixels only, either even or odd, or neither even nor odd.

Given n consecutive pixel values: ĝ0, ĝ1, . . . , ĝn−1, such that 0 < ĝa < 1 and ĝa < ĝa+1, and given that any two edges will be greater than two sample spaces apart:

1. For it to be possible for the edges to be in the even numbered pixels, we require:

   δ0,2 > 0, δ2,4 > 0, . . . , δn−3,n−1 > 0, n odd
   δ0,2 > 0, δ2,4 > 0, . . . , δn−4,n−2 > 0, n even

2. For it to be possible for the edges to occur in the odd numbered pixels, we require:

   δ1,3 > 0, δ3,5 > 0, . . . , δn−3,n−1 > 0, n even
   δ1,3 > 0, δ3,5 > 0, . . . , δn−4,n−2 > 0, n odd

where:

   δa−1,a+1 = (ĝa+2 − ĝa+1)/(ĝa+2 − ĝa) − (ĝa − ĝa−1)/(ĝa − ĝa−2)

and the two required but unknown pixel values are set to:

   ĝ−1 = 0
   ĝn = 1

When n = 5, both cases are possible for, for example: ĝ0 = 0.2, ĝ1 = 0.3, ĝ2 = 0.4, ĝ3 = 0.5, and ĝ4 = 0.7. With these values: δ0,2 = 1/6, δ1,3 = 1/6, and δ2,4 = 1/10. When n = 6, both cases are still possible. For example, with ĝ0 = 0.01, ĝ1 = 0.02, ĝ2 = 0.03, ĝ3 = 0.05, ĝ4 = 0.1, and ĝ5 = 0.3: δ0,2 = 1/6, δ1,3 = 3/14, δ2,4 = 2/15, and δ3,5 = 4/63.
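The general test can be written as a single function of the sample list, with the two sentinel values ĝ−1 = 0 and ĝn = 1 supplied whenever an index falls off either end (a sketch with my own naming):

```python
from fractions import Fraction as F

def delta(g, a):
    """δ_{a-1,a+1} for samples g[0..n-1], with sentinels ĝ_{-1} = 0 and
    ĝ_n = 1.  Valid for a in 1 .. n-2."""
    g_m2 = F(0) if a - 2 < 0 else g[a - 2]            # ĝ_{a-2}
    g_p2 = F(1) if a + 2 > len(g) - 1 else g[a + 2]   # ĝ_{a+2}
    return ((g_p2 - g[a + 1]) / (g_p2 - g[a])
            - (g[a] - g[a - 1]) / (g[a] - g_m2))

# The n = 5 example from the text: δ0,2, δ1,3 and δ2,4 are all positive,
# so either the even or the odd pixels could hold the edges.
g = [F(2, 10), F(3, 10), F(4, 10), F(5, 10), F(7, 10)]
deltas = [delta(g, a) for a in (1, 2, 3)]
```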
For n = 5 and n = 6, experiments similar to that for n = 4 were performed. The results of these are shown in tables 6.2 and 6.3. For the case of n = 6 we would expect an equal number of cases for even only and odd only. The difference in table 6.3 is most likely due to bias in the particular 1.2 billion samples we took. Obviously the proportion of situations where either case could be true declines as n increases from 4 to 6; but the fact that there are still situations where either case is possible means that we cannot use these inequalities to arbitrate between the cases.

We have been unable to find a proof which shows that, for any length, there are always situations where both cases are possible, or that there is a length for which it is possible to unambiguously arbitrate between the two cases. However, the series of sample values ĝa = ln(ca + k), with suitable choices for c and k, appears to provide situations where either case could be true for very large n. For example: c = 0.001, k = 1.001 provides a situation where both cases are true for any n, n ≤ 1717. To discover whether a logarithmic sequence can produce a situation where both cases are true for arbitrarily large n seems an insoluble problem in itself. For example, computer calculations show that c = 10⁻⁴ and k = 1 + 10⁻⁴ gives a situation where either case could be true for n ≤ 17,181, whilst c = 10⁻⁶ and k = 1 + 10⁻⁶ does not give such a situation for any n.

In the face of this failure to find an unambiguous reconstruction from a finite-length sequence, what can we say? Firstly, with n = 6, only about 2.3% of the situations where one case could be true can also represent the other case. Thus, we could use n = 6 and, in ambiguous situations, depend on pixels beyond these six to provide sufficient contextual information to unambiguously reconstruct.
This option would succeed almost always but, as we have seen, even with a very long sequence of samples there exist situations which are ambiguous over the entire sequence. Secondly, these calculations have been based on the assumption that any two adjacent edges are greater than two sample spaces apart (equation 6.8). We could increase this limit. The simplest way to ensure unambiguous reconstruction for a finite-length sequence is to have one known non-edge pixel in the sequence. Non-edge pixels can be identified using one of the two rules given in section 6.3.2. Pixels obeying the second rule are impossible to ensure for any length of sequence with any minimum limit on the distance between adjacent edges. It is possible to ensure that at least one non-edge pixel can be identified using the first rule, provided that:

  xa+1 − xa > 2 + 2/(n − 1)   (6.13)

for a sequence of n pixels. This condition ensures that, in most cases, there are adjacent pixels in the sequence of n pixels which can be identified as non-edge pixels using the first rule. In the remaining cases, where no two adjacent pixels have the same value, we can be sure that the pixels at the extreme ends of the sequence are both non-edge pixels. This latter situation only occurs when n is odd. The limiting value can obviously be made closer to two by increasing the length of the sequence.

6.3.3 Practical reconstruction

To perfectly reconstruct a function from a sequence of samples we need to convert the above results to a practical algorithm. One such algorithm considers sequences of three pixels, thus requiring all edges to be greater than three sample spaces apart. By considering longer sequences a smaller minimum distance can be used, and by considering all N samples a distance very close to two could be achieved for large enough N. The algorithm for three pixel sequences is:

Algorithm 6
Given a sequence of samples: ĝa, ĝa+1, . . .
, ĝb, generated from a function g(x), where g(x) is described by equation 6.2, and the ordered set of pairs is ((−∞, g0), (x1, g1), . . . , (xm, gm), (∞, φ)), an exact reconstruction, gr(x), can be made from the samples in the range x ∈ [a + 1, b), if:

1. b − a ≥ 2 (at least three samples);
2. xc+1 − xc > 3, ∀c ∈ {1, 2, . . . , m − 1} (all adjacent edges greater than three sample spaces apart).

gr(x) is generated piecewise as follows:

For every value c ∈ {a + 1, a + 2, . . . , b − 2, b − 1}:
  If ĝc−1 = ĝc or ĝc+1 = ĝc then gr(x) = ĝc, c ≤ x < c + 1
  If ĝc−1 ≠ ĝc and ĝc+1 ≠ ĝc then gr(x) = ĝc−1, c ≤ x < c + ε
                                          ĝc+1, c + ε ≤ x < c + 1
    where ε = (ĝc+1 − ĝc)/(ĝc+1 − ĝc−1)
If ĝa = ĝa+1 then gr(x) = ĝa, a ≤ x < a + 1; otherwise gr(x) is undefined for a ≤ x < a + 1
If ĝb = ĝb−1 then gr(x) = ĝb, b ≤ x < b + 1; otherwise gr(x) is undefined for b ≤ x < b + 1

You will note that this algorithm perfectly reconstructs the original function in the range x ∈ [a + 1, b). It is undefined in the ranges x ∈ [a, a + 1) and x ∈ [b, b + 1) if there are edges in pixels a and b respectively. This is a manifestation of the edge effect problem: there is not enough information to exactly reconstruct the function in these two end pixels. An equivalent algorithm to find the (xi, gi) pairs will be unable to ascertain the values at the extreme ends of the range of pixels in the same situations. An alternative strategy is to allow adjacent edges to be closer together but insist that no edge occur in either of the pixels at the extreme ends of the sample sequence. These changes to the criteria would enable us to always exactly reconstruct an intensity curve. This final, one-dimensional algorithm exactly reconstructs the entire original function, provided that the original function obeys the given criteria.

Algorithm 7
Given a set of samples: ĝa, ĝa+1, . . .
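Algorithm 6's per-pixel rule is again purely local. A Python sketch (ε and the function name are my own notation):

```python
def grey_pixel(c, g_prev, g, g_next):
    """Algorithm 6, interior case: piece of gr(x) over [c, c+1) as
    (start_x, value) pairs.  Assumes adjacent edges are more than three
    sample spaces apart, so at most one of the three pixels holds an edge."""
    if g == g_prev or g == g_next:
        return [(c, g)]                        # no edge in this pixel
    eps = (g_next - g) / (g_next - g_prev)     # edge offset, per equation 6.7
    return [(c, g_prev), (c + eps, g_next)]
```

With the samples 0.3, 0.6, 0.8 around pixel 1 (the grey-level edge of figure 6.12 at x = 1.4), grey_pixel(1, 0.3, 0.6, 0.8) places the step at x = 1.4; a pixel flanked by an equal-valued neighbour is reproduced as a constant.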
, ĝb−1, ĝb, generated from a function g(x), an exact reconstruction, gr(x), of g(x) can be generated by this algorithm provided:

• g(x) is generated by equation 6.2 from the ordered set of ordered pairs: ((−∞, g0), (x1, g1), (x2, g2), . . . , (xm, gm), (∞, φ)) where xc < xc+1, ∀c ∈ {1, 2, . . . , m − 1};
• xc+1 − xc > 2, ∀c ∈ {1, 2, . . . , m − 1};
• a + 1 ≤ x1;
• xm < b.

gr(x) is generated by:

1. m ← 0, x0 ← −∞, g0 ← ĝa, previous ← NotEdge
2. For c ← a + 1 to b − 1:
   If ĝc ≠ ĝc−1 and previous = NotEdge then:
   (a) m ← m + 1
   (b) gm ← ĝc+1, xm ← c + (ĝc+1 − ĝc)/(ĝc+1 − ĝc−1), previous ← Edge
   otherwise: previous ← NotEdge
3. xm+1 ← ∞, gm+1 ← φ
4. gr(x) = ge, xe ≤ x < xe+1, e ∈ {0, 1, 2, . . . , m}

For an infinite length sequence all that need be done is to identify a single non-edge pixel and apply this algorithm forward toward positive infinity and a slightly modified version of it backward toward negative infinity. Thus, provided that there is a single identifiable non-edge pixel in the infinite image, a perfect reconstruction can be generated. As equation 6.8 must hold in this algorithm, an identifiable non-edge pixel will almost certainly appear in an infinitely long sequence.

6.3.4 Use in resampling

If we are resampling functions of the type described by equation 6.2 then the above algorithms give an exact reconstruction of the original function. This is better than any of the NAPK methods could do with this particular set of original functions, and even better than the APK methods discussed in section 6.1, because neither the sampling kernel nor the original function is band-limited. The band-unlimited nature of these functions is discussed in the next section. The final algorithm in the previous section (algorithm 7) generates a description of the function which can be transformed and the new description rendered, as illustrated by figure 6.3.
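Algorithm 7 as a whole fits in a few lines of Python (names are mine); it returns the breakpoint list, with -inf standing for the leading −∞ and (inf, None) for the final (∞, φ) pair:

```python
import math

def find_grey_edges(samples, a):
    """Algorithm 7: breakpoints of a piecewise constant g(x) from its
    exact-area samples g_hat[a..b].  Requires adjacent edges more than two
    sample spaces apart, no edge before a + 1 and none at or beyond b."""
    g = samples
    xs, gs = [-math.inf], [g[0]]
    previous_was_edge = False
    for k in range(1, len(g) - 1):             # c = a + 1, ..., b - 1
        if g[k] != g[k - 1] and not previous_was_edge:
            # Pixel c holds an edge: place it with equation 6.7.
            xs.append(a + k + (g[k + 1] - g[k]) / (g[k + 1] - g[k - 1]))
            gs.append(g[k + 1])
            previous_was_edge = True
        else:
            previous_was_edge = False
    xs.append(math.inf); gs.append(None)       # the final (∞, φ) pair
    return list(zip(xs, gs))
```

For the samples 0.3, 0.6, 0.8, 0.8 (a = 0, b = 3) it returns (−inf, 0.3), (1.4, 0.8), (inf, None): a single edge at x = 1.4 between plateaux 0.3 and 0.8.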
Thus the resampled version is as accurate as it could possibly be, having been rendered from an exact description of the function.

6.3.5 Relationship to sampling theory

It is well known that any function, f(x), can be exactly reconstructed from single point samples, f̂i, provided that the highest spatial frequency in f(x) has wavelength longer than two sample spacings (section 2.4.1). If, without loss of generality, we take the sample spacing to be one, then this condition is that the highest spatial frequency is less than one half. Perfect reconstruction is performed by convolving the samples with an appropriate sinc function.

The algorithms given in this chapter have shown that any function, g(x), which can be defined by equation 6.2 can be exactly reconstructed from its exact-area samples, ĝi, provided that any two edges in the image are greater than two sample spaces apart and that there is at least one identifiable non-edge pixel in the sample set. Perfect reconstruction in this case is performed by algorithm 7, modified for an infinite set of samples as explained in section 6.3.3.

The sets of functions which can be exactly reconstructed by these two methods are disjoint, except for the trivial case f(x) = g(x) = K, a constant. This can be proven by noting that, except for the trivial case, g(x) always contains sharp edges, which require infinite frequencies to generate, whilst f(x) must always be bandlimited.

Classical sampling theory produces criteria for perfect reconstruction based on Fourier transform theory [Shannon, 1949]. The resulting method of perfect reconstruction is a simple mathematical process: convolution of the samples by a sinc function. The method of perfect reconstruction presented here, on the other hand, consists of an algorithm which generates a description of the original function.
  Original function:
    Sinc reconstruction: f(x) = ∫_{−∞}^{∞} F(ν) e^{i2πνx} dν
    Sharp edge reconstruction: g(x) = ga, xa ≤ x < xa+1
  Specified by:
    Sinc: its spectrum, F(ν)
    Sharp edge: an ordered set of ordered pairs of edge position and intensity level information: ((−∞, g−∞), . . . , (x−1, g−1), (x0, g0), (x1, g1), . . . , (∞, φ))
  Restrictions on the specification:
    Sinc: |ν| < 1/2, which is equivalent to |λ| > 2, where the wavelength λ = 1/ν
    Sharp edge: xc+1 − xc > 2, ∀c, plus at least one identifiable non-edge pixel
  Sampling method:
    Sinc: single point sampling
    Sharp edge: exact area sampling
  Reconstruction method:
    Sinc: convolution by a sinc function
    Sharp edge: algorithm 7 extended to the infinite case
  Comments:
    Sinc: only works for an infinite set of samples
    Sharp edge: can work on both infinite and finite sets; to work on a finite set requires an extra restriction, that is: no edge in the first or last pixel in the set

Table 6.4: Comparison of sinc and sharp edge reconstruction, assuming unit spaced samples.

A further difference between the two is in the types of function and types of sampling which are used. Classical sampling theory considers a function to be a (finite) sum of sinusoids, whilst the process described in this chapter considers a piecewise constant function. The former is typical of audio and AC electrical signals, whilst the latter is typical of visual signals. Classical sampling theory considers single point samples, while this process considers exact-area sampling, which is a good approximation to the form of sampling employed by a charge coupled device (CCD) sensor [Kuhnert, 1989] or by a microdensitometer with appropriate aperture shape [Ward and Cok, 1989], both image capture devices. A summary of the differences between the two perfect reconstruction methods is given in table 6.4.

The effect of errors

Two types of error beset these algorithms: one due to the finite number of samples, the other due to errors in the sample values themselves. Sinc reconstruction will only work for an infinite number of samples.
A finite set causes edge effects, which must be dealt with using one of the methods in chapter 4 to produce an approximately correct reconstruction. Exact grey scale reconstruction can deal with finite numbers of samples, as described in algorithm 7 or algorithm 6.

Errors in the sample values themselves, from whatever source, prevent either method from producing an exact reconstruction of the original function. Sinc reconstruction degrades gracefully as the magnitude of the error increases. With small errors in the sample values, sinc still produces a good approximation to the original function. Exact grey scale reconstruction, on the other hand, degrades gracefully as quantisation error increases but fails completely for any other type of error. This failure is due to an inability to ensure that adjacent non-edge pixels have the same intensity, a fact which is assumed by all the algorithms. With only quantisation errors, adjacent non-edge pixels will have the same intensity, but it will be a quantised version of the correct intensity. With other errors, it is possible to modify the reconstruction algorithm to take into account an error distribution over the sample values; perhaps by considering two adjacent pixels to have the same grey value if their values lie within a certain distance of each other. Such a modification is left for future work.

An analysis of the algorithm's performance for quantisation errors shows that, if the two grey levels either side of an edge quantise to the same quantisation level, then that edge cannot be reconstructed at all: it 'disappears'. All other edges will tend to move slightly. The smaller the difference in grey level either side of the edge, the larger the maximum possible difference in edge position between the original and the reconstructed functions; but the less visible the edge is to the eye, and so, possibly, the less the eye will care about the shift.
For N quantisation levels, when ∆g, the difference in grey level across the edge, is greater than 1/(N − 1), this maximum shift in edge position is approximately ±1/(∆g(N − 1)). See appendix B.9 for details.

We have demonstrated that, in an error-free system, a class of intensity curves can be exactly reconstructed by sharp edge reconstruction which cannot be exactly reconstructed by classical sampling theory methods. We now consider whether such a class exists for two-dimensional intensity surfaces.

6.4 Two-dimensional sharp edge reconstruction

The original intensity surface in this case consists of areas of constant intensity bounded by sharp edges. The image is produced by taking exact-area samples off this intensity surface. We will consider a square pixel grid and, without loss of generality, unit spaced samples. Thus, if the original function is g(x, y), then ĝi,j is taken over the area i ≤ x < i + 1, j ≤ y < j + 1; that is:

  ĝi,j = ∫_i^{i+1} ∫_j^{j+1} g(x, y) dy dx

A good deal of research has been done on locating the positions of edges in images. Such methods tend to be developed for, and tested on, real, noisy images, and give approximate locations of the edges in the image. By contrast, we are here assuming a perfectly sampled, perfect intensity surface; thus no errors appear in the sample values. The aim is to ascertain whether, in the absence of any errors, perfect reconstruction is possible and, if so, what criteria the original intensity surface must obey for perfect reconstruction to be possible.

6.4.1 Single point sampling case

Before looking at exact-area sampling, we examine how much information we can extract from a single point sampled image. Here a pixel's sample value is the value of the intensity surface at the pixel's centre. An edge will thus be identifiable as being close to where adjacent pixels have different intensities. In the one-dimensional case the edge occurs in a single pixel, and thus the error in edge position is ±1/2 sample spaces.
In the two-dimensional case, an infinitely long edge will pass through an infinite number of pixels. Thus there is potential to extract accurate information about the line's slope and position by considering more of the evidence than a single pixel's worth.

Theorem 4 If we have an intensity surface containing a single, infinitely long straight edge, single point sampled over all Z², then the slope of that edge can be exactly determined, whilst its position may have some inaccuracy.

Proof: Consider only edges with a slope between zero and one: y = mx + c, 0 ≤ m ≤ 1, where the area below the edge is white and above it is black. All other cases can be transformed into this case by rotation of 90n°, n ∈ Z, and/or reflection in the line y = 0.

Figure 6.16: An example of a single point sampled single edge. The h_i are a Rothstein code for the edge.

For every integer x-coordinate we define a number h_i, which is either one or zero, depending on the values of the pixels near the edge. To ascertain the value of h_i we first find j such that ĝ_{i,j} ≠ ĝ_{i,j+1}, then:

\[ h_i = \begin{cases} 1, & \hat{g}_{i-1,j} \neq \hat{g}_{i,j} \\ 0, & \hat{g}_{i-1,j} = \hat{g}_{i,j} \end{cases} \]

Part of an edge is illustrated in figure 6.16, with the h_i values given underneath. A value of 1 indicates that this column contains the first white pixel on a row; a value of 0 indicates that it does not. The sequence ..., h_{−1}, h_0, h_1, h_2, ... is a Rothstein code describing the line [Weiman, 1980; Rothstein and Weiman, 1976]. The slope of the line is the number of 1s in the sequence divided by the total number of digits in the sequence:

\[ m = \lim_{k\to\infty} \frac{\sum_{i=-k}^{k} h_i}{\sum_{i=-k}^{k} 1} \qquad (6.14) \]

If the sequence of h_i values is aperiodic then equation 6.14 converges on the real value of m [Rothstein and Weiman, 1976, p.123]. A finite length fragment gives an approximation to the real slope [ibid., p.110].
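A finite-fragment slope estimate in the spirit of equation 6.14 can be sketched as follows, under the conventions above (white below the line, slope between zero and one, samples at integer coordinates); the helper names, and the use of exact rationals to avoid rounding at pixel boundaries, are my own.

```python
import math
from fractions import Fraction

def rothstein_code(m, c, n):
    """h_i for columns i = 1..n of a point-sampled edge y = m*x + c.
    With white below the line, the topmost white row in column i is
    floor(m*i + c); h_i = 1 exactly when that row index increases from
    column i-1 to column i (this column starts a new white row),
    which matches the definition of h_i in the text for 0 <= m <= 1."""
    top = lambda i: math.floor(m * i + c)
    return [1 if top(i) != top(i - 1) else 0 for i in range(1, n + 1)]

def slope_estimate(h):
    """Equation 6.14 over a finite fragment: the fraction of 1s."""
    return sum(h) / len(h)
```

For the rational slope m = 3/5 the estimate converges on 0.6, e.g. `slope_estimate(rothstein_code(Fraction(3, 5), Fraction(1, 4), 1000))`.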
These statements are also true for a periodic sequence of h_i values, but in this case the value of m is rational. If the length of a period is q, and each period contains p 1s, then the slope is [ibid., p.109]:

\[ m = \frac{p}{q} \]

Thus the slope of the edge can be exactly determined, in all cases, because we have an infinite number of h_i values.

If h_i is periodic (that is: m is rational) then the position, c, can only be determined to within some error bounds. If the slope is expressed as m = p/q, with p and q relatively prime, then c can be determined with an error of ±1/(2q). This is illustrated in figure 6.17. The truth of this error bound can be seen by finding the y-intercepts, y_i, of the edge with each of the sampling lines x = i + ½, i ∈ Z. The fractional parts of these values repeat with period q, and within a period they are evenly spaced. Thus, if the fractional part of y_i is e_i = y_i − ⌊y_i⌋, then, because y_{i+1} = y_i + p/q, we can write, for some e_n:

\[ e_n = o + \frac{np \bmod q}{q} \qquad (6.15) \]

where n is chosen so that e_n < 1/q, and thus o, the offset from zero, is such that o < 1/q.

Figure 6.17: The error bounds on the position of a line of rational slope, in this case the slope m = p/q = 3/5. The line can be positioned anywhere over a vertical distance of 1/q.

The values of e_i are periodic, with period q. They are also evenly distributed: the minimum difference between any two different values of e_i is 1/q. Thus o can range over values up to 1/q apart and yet the same sample values will be generated. The error in the position, c, is plus or minus
half this number, that is ±1/(2q).

If we wish to know the positional error in the direction perpendicular to the edge, rather than along the y-axis, then, given that the error along the y-axis is ε_y = ±1/(2q), the positional error perpendicular to the edge is:

\[ \epsilon_\perp = \pm\frac{1}{2\sqrt{p^2+q^2}} \]

If the sequence h_i is aperiodic (that is: m is irrational), then we can consider the situation as q goes to infinity:

\[ \lim_{q\to\infty} \epsilon_\perp = 0 \]

Therefore, there exists only one possible position for a line of irrational slope, and c can be exactly determined in this case. QED.

For a finite length edge the slope can only be determined approximately. When a sequence of n h_i values is available, h_1, h_2, ..., h_n, the slope is approximated by:

\[ \check{m} = \frac{\sum_{i=1}^{n} h_i}{n} \]

We can see that a good approximation to the line's slope and position can be obtained from the single point sample case. Can the exact-area sample case improve on these results?

6.4.2 Exact area sampling — black and white case, single edge

Given an image, area sampled from an intensity surface consisting of a single edge between an area of constant intensity zero and an area of constant intensity one, what can we say? Firstly, the position of the edge can be approximately determined from the location of the grey pixels in the image. The edge must pass through all these grey pixels; we call such pixels 'edge pixels'. Secondly, the exact slope and position of the edge should be determinable from a finite number of pixel values. (In the single point sample case an infinite number of pixel values were required to exactly determine the slope, and there existed situations where the position could not be exactly determined.)

Figure 6.18: The (ρ, θ) parameterisation of a straight line. θ is the angle the line makes with the positive x-axis, ρ is the perpendicular distance from the line to the origin.
Hypothesis: The position and slope of a single, infinitely long, straight edge between an area of constant intensity zero and an area of constant intensity one can be exactly determined from two exact-area sample values.

This hypothesis has its foundation in the recognition that only two numbers are required to exactly specify a straight line. In section 6.3.1 we saw that a single number specified the position of a sharp edge in one dimension, and that this positional information was coded in a single pixel value (plus one extra bit of orientation information, which could be derived from a neighbouring pixel). Here we will see that the slope and direction of a straight line are indeed coded in the sample values of two adjacent edge pixels, plus some extra bits of orientation information which can be derived from neighbouring pixels. Because of the need for this orientation information, the hypothesis is incorrect, but it is not far from the truth.

The equation of a straight line

Three parameterisations of a straight line, which prove useful when determining the position of an edge, are:

1. y = mx + c, the standard equation of the line, using the slope, m, and y-intercept, c, as the parameters. This equation has problems in that a vertical line cannot be represented, and a small change in m makes a large difference in the line when m is small and a small difference in the line when m is large, meaning that an error measure on m (or on c, which has a similar problem) is difficult.

2. x = t cos θ + ρ sin θ, y = t sin θ − ρ cos θ, using parameters ρ and θ. ρ is the perpendicular distance from the line to the origin, and θ is the angle the line makes with the positive x-axis (figure 6.18).
These two parameters are better than m and c in that they render a uniform description for any position and orientation (error bounds will be the same for all orientations and positions) and have no special case (such as the vertical lines in the previous case) [Kitchen and Malin, 1987]. However, they can also make the algebra more difficult.

3. x = x0·t, y = y0(1 − t). The two parameters x0 and y0 are the x- and y-intercepts of the line, as shown in figure 6.19(a). This parameterisation can represent any line except one for which x0 = y0 = 0, or a vertical or a horizontal line which does not pass through the origin. An alternative parameterisation using the same philosophy is that shown in figure 6.19(b), using y0 and y1, the y-intercepts of the line at x = 0 and x = 1 respectively. This gives: x = t, y = (1 − t)y0 + t·y1. This parameterisation cannot represent vertical lines. Neither of these parameterisations renders a uniform description for all positions and orientations. In the former, for example, when either parameter is close to zero, a small change in the other will cause a large change in the line. However, both of these very similar parameterisations prove useful in the ensuing work.

Figure 6.19: Two intercept parameterisations of a straight line. (a) in terms of its x- and y-axis intercepts, x0 and y0. (b) in terms of two y-intercepts, y0 and y1.

Figure 6.20: A single edge pixel's value can represent an edge at any orientation. Here we see three pixel values (0.25, 0.5, and 0.75), each with five of the many possible edges that could pass through the pixel.

All three parameterisations completely determine the position and slope of a line in two parameters. It is therefore not unreasonable to expect that two pixel values will also completely determine the position and slope of the edge. What can one pixel value tell us?
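Conversion from (m, c) to the (ρ, θ) form of parameterisation 2 is straightforward. The sketch below is mine, and uses a *signed*-ρ convention (one of several possible sign choices); it checks that the parametric form reproduces y = mx + c.

```python
import math

def to_rho_theta(m, c):
    """Convert y = m*x + c to the form
    x = t*cos(theta) + rho*sin(theta), y = t*sin(theta) - rho*cos(theta).
    theta is the angle of the line with the positive x-axis; rho is a
    signed perpendicular distance, |rho| = |c| / sqrt(1 + m^2)."""
    theta = math.atan(m)
    rho = -c / math.sqrt(1.0 + m * m)
    return rho, theta

def point_on_line(rho, theta, t):
    """The point at parameter t on the (rho, theta) line."""
    return (t * math.cos(theta) + rho * math.sin(theta),
            t * math.sin(theta) - rho * math.cos(theta))
```

Substituting back: y − mx = −ρ/cos θ = c for every t, so every generated point lies on the original line.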
To again simplify matters, we consider only an edge with slope between zero and one, with black (intensity zero) above it and white (intensity one) below it. All other possible situations can be transformed into this set of situations by reflection and/or rotation. The single edge pixel will have a grey value between zero and one. Its value reduces the set of possible edges through the pixel from an infinite set to a smaller, but still infinite, set. Examples are shown in figure 6.20.

The ambiguity in the line's slope and position can be removed by considering the value of an adjacent edge pixel. Before doing this we classify edge pixels into three groups, based on how the edge passes through the pixel. Type I pixels have the edge passing through their bottom and right sides (figure 6.21(b)). Type II have the edge passing through the left and right sides (figure 6.21(c)), and Type III have the edge passing through the left and top sides (figure 6.21(d)). No other combination of sides can occur for an edge with slope between zero and one. Edges passing through the bottom-left or top-right corner can be considered to be of any appropriate type.

If we now apply a local coordinate system to the pixel, with the origin at the bottom left corner and the pixel of size 1 × 1 (figure 6.21(a)), then we can describe the edge simply, using the third of our three parameterisations of a straight line on page 194. These descriptions are shown in figure 6.21(b), (c), and (d).

Figure 6.21: The coordinate system for a pixel and the three types of pixel. (a) Local coordinate system for the pixel. (b) Type I pixel, parameterised by x0 and y1. (c) Type II pixel, parameterised by y0 and y1. (d) Type III pixel, parameterised by y0 and x1.
Table 6.5: How the third intercept is calculated from the first two in all five possible cases.

Case   | Third intercept calculation
I.II   | y2 = y1(1 + 1/(1 − x0))
I.III  | x2 = 1 + (1 − y1)(1 − x0)/y1
II.II  | y2 = 2y1 − y0
II.III | x2 = (1 − y0)/(y1 − y0)
III.I  | y2 = y0 + (1 − y0)/x1

The sample value, A, for each type of pixel can easily be determined using these simple formulae:

\[ A_{I} = \tfrac{1}{2}(1 - x_0)y_1 \qquad (6.16) \]
\[ A_{II} = \tfrac{1}{2}(y_0 + y_1) \qquad (6.17) \]
\[ A_{III} = 1 - \tfrac{1}{2}x_1(1 - y_0) \qquad (6.18) \]

The information that can be gathered from two adjacent edge pixels

Given an edge pixel, we will consider also the next edge pixel to the right or, if this is not an edge pixel, then the next edge pixel above. This gives the five possible situations shown in figure 6.22. No other situations are possible, given the restrictions we have placed on edge orientation. In each case the third intercept is given the subscript 2, in order to save confusion and unify the notation for all cases. This third intercept can be derived from the other two, because the first two uniquely define the line. These derivations are shown in table 6.5.

Figure 6.22: The five possible arrangements of two adjacent edge pixels for a line of slope between zero and one. Each is named by the types of the two pixels, ordered from left to right, or bottom to top. The origin is always considered to be at the bottom left corner of the bottom left pixel.

Given the two pixel values, call the left one A and the right one B, it is possible to recover the values of the first and second intercepts, thus uniquely determining the edge position and slope from the two pixels' values.

Case II.II This is the simplest case. The pixel values are given by:

\[ A = \tfrac{1}{2}(y_0 + y_1) \]
\[ B = \tfrac{1}{2}(y_1 + y_2) = \tfrac{1}{2}(3y_1 - y_0) \]

where the last step is produced by replacing y2 with its equivalent expression from table 6.5.
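Equations 6.16–6.18 can be checked numerically: the closed forms below (function names are mine) should agree with a brute-force integral of the white area under the edge over the unit pixel.

```python
def area_type_I(x0, y1):
    # Eq 6.16: white triangle with corners (x0, 0), (1, 0), (1, y1).
    return 0.5 * (1.0 - x0) * y1

def area_type_II(y0, y1):
    # Eq 6.17: white trapezium below the chord from (0, y0) to (1, y1).
    return 0.5 * (y0 + y1)

def area_type_III(y0, x1):
    # Eq 6.18: whole pixel minus the black triangle (0, y0), (0, 1), (x1, 1).
    return 1.0 - 0.5 * x1 * (1.0 - y0)

def area_numeric(m, c, n=2000):
    """White fraction of the unit pixel below y = m*x + c (midpoint rule)."""
    return sum(min(max(m * ((k + 0.5) / n) + c, 0.0), 1.0)
               for k in range(n)) / n
```

For instance, the line y = 0.3x + 0.2 makes the unit pixel a type II pixel with y0 = 0.2 and y1 = 0.5, and both routes give a sample value of 0.35.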
Some fairly simple algebra leads to the results:

\[ y_0 = \tfrac{1}{2}(3A - B), \qquad y_1 = \tfrac{1}{2}(A + B) \]

From these, and table 6.5, we can also find y2:

\[ y_2 = \tfrac{1}{2}(3B - A) \]

The other four cases are not as simple as this. Rather than show the working here, we present only the start and end of the algebraic manipulation.

Case I.II The equations for the pixel values are:

\[ A = \tfrac{1}{2}(1 - x_0)y_1 \]
\[ B = \tfrac{1}{2}(y_1 + y_2) = y_1 + \frac{y_1}{2(1 - x_0)} \]

which produce these equations for the intercepts:

\[ y_1 = -2A + 2\sqrt{A^2 + AB} \]
\[ x_0 = 1 - \frac{2A}{-2A + 2\sqrt{A^2 + AB}} \]

For the denominator to be zero in the latter equation requires A = B = 0, which means that the edge passes along the lower boundary, rather than intercepting it, so the value of x0 is undefined. Such a situation should be considered as case II.II, not case I.II.

Case I.III The equations for the pixel values are:

\[ A = \tfrac{1}{2}(1 - x_0)y_1 \]
\[ B = 1 - \tfrac{1}{2}(x_2 - 1)(1 - y_1) = 1 - \frac{(1 - y_1)^2(1 - x_0)}{2y_1} \]

which produce these equations for the intercepts:

\[ y_1 = \frac{\sqrt{A}}{\sqrt{A} + \sqrt{1 - B}} \]
\[ x_0 = 1 - 2A - 2\sqrt{A(1 - B)} \]

For the denominator to be zero in the equation for y1 requires A = 0 and B = 1, which implies that the slope of the line is greater than one. Therefore this situation cannot arise.

Case II.III The equations for the pixel values are:

\[ A = \tfrac{1}{2}(y_0 + y_1) \]
\[ B = 1 - \tfrac{1}{2}(1 - y_1)(x_2 - 1) = 1 - \tfrac{1}{2}(1 - y_1)\left[\frac{1 - y_0}{y_1 - y_0} - 1\right] \]

which produce these equations for the intercepts:

\[ y_1 = 1 + 2(1 - B) - 2\sqrt{(1 - B)^2 + (1 - A)(1 - B)} \]
\[ y_0 = 1 - 2(1 - A) - 2(1 - B) + 2\sqrt{(1 - B)^2 + (1 - A)(1 - B)} \]

Table 6.6: The equivalences between the various cases. Note that II.II is equivalent to itself.

III.I → I.III:  y0 → 2 − x2,  x1 → 1 − y1,  y2 → 2 − x0,  A → B,  C → A
I.II → II.III:  x0 → 2 − x2,  y1 → 1 − y1,  y2 → 1 − y0,  A → 1 − B,  B → 1 − A
II.II → II.II:  y0 → 1 − y2,  y1 → 1 − y1,  y2 → 1 − y0,  A → 1 − B,  B → 1 − A
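The case II.II and I.II inversions can be sanity-checked with a round trip: construct the intercepts from a known line, form A and B from equations 6.16–6.17 and table 6.5, then recover the intercepts from the closed forms. A sketch under the conventions above (function names mine):

```python
import math

def invert_case_II_II(A, B):
    """Recover (y0, y1, y2) from the two sample values in case II.II."""
    return 0.5 * (3 * A - B), 0.5 * (A + B), 0.5 * (3 * B - A)

def invert_case_I_II(A, B):
    """Recover (x0, y1) from the two sample values in case I.II.
    Undefined when A = B = 0 (the edge lies along the lower boundary)."""
    y1 = -2 * A + 2 * math.sqrt(A * A + A * B)
    x0 = 1 - 2 * A / y1
    return x0, y1
```

For example, the line y = 0.3x + 0.2 over two adjacent pixels has y0, y1, y2 = 0.2, 0.5, 0.8, hence A = 0.35 and B = 0.65 (case II.II); the line y = 0.4x − 0.2 has x0 = 0.5, y1 = 0.2, y2 = 0.6, hence A = 0.05 and B = 0.4 (case I.II). Both inversions return the original intercepts.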
Case III.I This is different from the other four cases in that it uses the pixel above pixel A, rather than the pixel to the right. We will identify this pixel's value as C. The pixel to the right has value B = 1. The equations for the pixel values are:

\[ A = 1 - \tfrac{1}{2}(1 - y_0)x_1 \]
\[ C = \tfrac{1}{2}(1 - x_1)(y_2 - 1) = \frac{(1 - x_1)^2(1 - y_0)}{2x_1} \]

which produce these equations for the intercepts:

\[ x_1 = \frac{\sqrt{1 - A}}{\sqrt{1 - A} + \sqrt{C}} \]
\[ y_0 = 1 - 2(1 - A) - 2\sqrt{(1 - A)C} \]

For the denominator to be zero in the equation for x1 requires A = 1 and C = 0, which means that the edge passes along the boundary between the two pixels, rather than intercepting it, so the value of x1 is undefined. Such a situation should be considered as case II.II, not case III.I.

Equivalences between cases

In figure 6.22 we can see that cases I.II and II.III are very similar. By a substitution of variables one case can be made equivalent to the other, thus linking the sets of equations for the two cases. The same is true of cases I.III and III.I. The equivalences are shown in table 6.6. Substituting the values in the right hand column in the place of those in the left hand column in the left hand case's equations will produce the right hand case's equations.

Identifying the case

Given an edge pixel of value 0 < A < 1, if the next pixel to the right is not an edge pixel (that is: B = 1) then we know that we have case III.I. The pixel, of value C, above pixel A is used in the calculation. If C = 0 and B = 1 then the edge passes through the top right corner of the pixel. Case III.I still calculates the intercepts correctly; they are:

\[ x_1 = 1, \qquad y_0 = 2A - 1 \]

If 0 < B < 1 then we could be dealing with any of the other four cases. Fortunately, the values A and B uniquely identify which to apply.
Figure 6.23: The arrangement of pixels A, B, and C.

Table 6.7: The restrictions on the values of the parameters in the four cases which use pixels A and B.

I.II:   0 ≤ x0 ≤ 1,  0 ≤ y1 ≤ 1,  0 ≤ y2 ≤ 1
I.III:  0 ≤ x0 ≤ 1,  0 ≤ y1 ≤ 1,  1 ≤ x2 ≤ 2
II.II:  0 ≤ y0 ≤ 1,  0 ≤ y1 ≤ 1,  0 ≤ y2 ≤ 1
II.III: 0 ≤ y0 ≤ 1,  0 ≤ y1 ≤ 1,  1 ≤ x2 ≤ 2

Possible values of A and B are restricted to lie in the square:

0 ≤ A ≤ 1,  0 ≤ B ≤ 1

Further restrictions are produced by making the slope of the edge lie between zero and one. A minimum slope of zero means that

A ≤ B

A maximum slope of one affects only one case, I.III, where it implies that x2 − x0 ≥ 1 (in the other three cases the slope can never exceed one). Working through the algebra of this gives us the restriction:

B ≤ ½ − A + √(2A),  0 ≤ A ≤ ½

These four restrictions define the boundary in figure 6.24 within which A and B must lie. The other lines in figure 6.24 are set by restricting the values of the intercepts in each case to lie within the bounds listed in table 6.7. These lines bound the areas with which each case deals. Some involved algebraic manipulation, or a dense sampling of (A, B) pairs, will determine which areas within the boundary are dealt with by each case. Figure 6.25 shows these areas. They do not overlap, and completely fill the area within the boundary. The boundaries between the various areas are where the edge passes through the corner of one of the pixels. These are illustrated in figure 6.26, along with the situations that occur at the boundaries of the whole area.

These results are all well and good for edges with a slope between zero and one, allowing us to exactly reconstruct the line from two adjacent sample values. This leaves the question of how to identify the orientation of a general edge so that it can be transformed into this case and its intercepts determined.
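The 'dense sampling of (A, B) pairs' suggested above is easy to carry out. The sketch below (helper names mine) samples random lines of slope between zero and one across two horizontally adjacent pixels and confirms the two boundary restrictions, A ≤ B and, for A ≤ ½, B ≤ ½ − A + √(2A).

```python
import math
import random

def pixel_value(i, m, c, n=500):
    """White fraction of pixel [i, i+1) x [0, 1) below y = m*x + c,
    by a midpoint rule over n vertical slices."""
    return sum(min(max(m * (i + (k + 0.5) / n) + c, 0.0), 1.0)
               for k in range(n)) / n

random.seed(1)
samples = []
for _ in range(200):
    m = random.uniform(0.0, 1.0)       # slope in [0, 1]
    c = random.uniform(-m, 1.0)        # keep the edge near the two pixels
    A, B = pixel_value(0, m, c), pixel_value(1, m, c)
    if 0.0 < A < 1.0 and 0.0 < B < 1.0:  # both pixels are edge pixels
        samples.append((A, B))
```

Every sampled pair lies inside the boundary of figure 6.24 (up to the small numerical error of the area estimate).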
Identifying the orientation

Given two edge pixels A and B, arranged as shown in figure 6.23, the orientation of the edge can be determined to one of two octants. This is shown in figures 6.27 and 6.28. A similar diagram can be drawn for edge pixels A and C, arranged as in figure 6.23. Examples are given in figure 6.29. To determine which of these two choices is correct we require more information than is supplied by the two pixel values.

Figure 6.24: The restrictions on A and B values for the various cases; this figure shows the bounding lines of the areas covered by each case. The light line shows the boundary within which (A, B) pairs must lie; the equations defining this boundary are given in the text. The other four lines are generated by imposing the restrictions in table 6.7. They are:

L1: B = 2 − A − 1/(4A)
L2: A = (B − ½)²/(1 − B)
L3: B = 1 − ⅓(1 − A)
L4: B = 3A

Figure 6.25: The four disjoint areas covered by the four cases.

Figure 6.26: The special cases, on boundaries in figure 6.25. They are defined by their locations in (A, B) space.
(a) (0, ½), the point on the left boundary where I.II meets I.III; (b) (¼, ¾), the point in the middle where all four regions meet; (c) (½, 1), the point on the top boundary where I.III and II.III meet; (d) (0, 0), the bottom left corner; (e) (1, 1), the top right corner; (f) moving along the boundary line A = B from (0, 0) to (1, 1); (g) moving around the boundary from (0, ½) to (½, 1); (h) moving along the top boundary from (½, 1) to (1, 1); (i) moving along L3 from (¼, ¾) to (1, 1); (j) moving along L2 from (0, ½) to (¼, ¾); (k) moving up the left boundary from (0, 0) to (0, ½); (l) moving along L4 from (0, 0) to (¼, ¾); (m) moving along L1 from (¼, ¾) to (½, 1).

Figure 6.27: The four areas of (A, B) space, α, β, γ, and δ, corresponding to four pairs of octants. We have been considering area β. An edge in any area can be identified to be in either of two possible locations, shown in figure 6.28.

Figure 6.28: The four areas in figure 6.27 can identify an edge to be one of just two possibilities: case α, octants 2 and 7; case β, octants 1 and 8; case γ, octants 4 and 5; case δ, octants 3 and 6. In each case these two edges lie in two different quadrants. Examples are shown here for each case.

Figure 6.29: Two examples of the two possibilities for a given pair of adjacent grey pixel values. (a)(i) can represent either (a)(ii) or (a)(iii); (b)(i) can represent either (b)(ii) or (b)(iii).

Figure 6.30: The 3 × 3 context of the edge pixel A, with the pixel labels (κ, λ, µ, ν, B, C) used in the text.
Figure 6.31: To ensure that every edge generates at least one edge pixel at the centre of a 3 × 3 context through which no other edge passes, there must be at least one edge point on every edge such that no other edge passes through a 4 × 4 sample spacing 'exclusion zone' centred on that edge point.

If we pick a single edge pixel and use the 3 × 3 pixel region around it, we can be certain to get the orientation right. The 3 × 3 region is illustrated in figure 6.30. If the 3 × 3 region is rotated and reflected so that κ ≤ λ ≤ µ ≤ ν, then we can be certain that the edge is in the first octant. We can then apply our equations to A and B (or A and C, if B = 1) and then transform the results back to the correct octant. Note that, after rotation and reflection, κ = 0 and ν = 1.

6.4.3 Extension to grey scale — single edge

A single edge, where the intensity on either side may have any value, can be exactly reconstructed as easily as a black and white edge. Take a 3 × 3 area, as in figure 6.30, and again rotate and reflect it so that κ ≤ λ ≤ µ ≤ ν. Then, if A ≠ κ and A ≠ ν, A is an edge pixel. All values in the 3 × 3 square can be modified so that κ = 0 and ν = 1. Taking o to be the old pixel value and m the modified pixel value, the modification is simply

m = (o − κ)/(ν − κ)

for each pixel value. The edge can now be found in the same way as for the black and white case.

6.4.4 Extension to many edges

Many edges in an intensity surface means that it consists of areas of constant intensity separated by sharp edges. Provided every edge generates at least one edge pixel at the centre of a 3 × 3 block of pixels through which no other edges run, the method of the previous section can be used to find the slope and position of all the edges in the image.
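The grey-scale reduction of section 6.4.3 is one operation per pixel. A minimal sketch (function name mine), assuming the 3 × 3 block has already been rotated and reflected as described:

```python
def normalise_block(block):
    """Map a 3x3 list of rows of intensity values so that the minimum,
    kappa, becomes 0 and the maximum, nu, becomes 1, using
    m = (o - kappa) / (nu - kappa) for each old value o.
    Assumes kappa != nu (i.e. the block straddles an edge)."""
    flat = [v for row in block for v in row]
    kappa, nu = min(flat), max(flat)
    return [[(v - kappa) / (nu - kappa) for v in row] for row in block]
```

After normalisation the block can be treated exactly as a black and white (0/1 background) case.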
To ensure that every edge generates at least one block of pixels of this type, there must be at least one position along the edge such that no other line passes inside a square of side four sample spacings centred at that point. An example is given in figure 6.31. A 4 × 4 region is required because we do not necessarily know where the pixel edges are going to fall, but we can be sure that there will be a 3 × 3 pixel block inside this 4 × 4 exclusion zone, that the central pixel of this block will contain this edge point, and thus that this central pixel will be an edge pixel.

Figure 6.32: The algorithm discussed in the text would classify the pixels as shown for this combination of edges. (Key: an edge; potential edge pixel; potential area pixel; pixel that has been identified as both a potential edge and a potential area pixel.)

Figure 6.33: No edge can generate three edge pixels of the same intensity in any of these L-shaped arrangements.

How edge pixels are found

This restriction on the original intensity surface ensures that there will be at least one 3 × 3 pixel block for every edge through which no other edge passes. This allows us to calculate the slope and position of the edge. The problem is now to separate those 3 × 3 blocks which contain a single edge from those which contain no edges or which contain two or more edges.

One solution to this problem is to first find all of the four-connected areas of constant intensity and mark all of the eight-connected neighbours of each area as potential edge pixels. An example is shown in figure 6.32. The second step is to apply our equations to the 3 × 3 region around each potential edge pixel (and around any pixels which were not identified as either potential edge pixels or area pixels). The edge information is only retained for those regions for which it is consistent with the pixel values. This gives us the position and slope of all edges in the image.
These two steps need some fleshing out. The first step, area-finding, is achieved by a single observation and a modification of a well-known algorithm. The observation is that no single edge can generate three edge pixels of the same intensity in an L-shaped arrangement (those illustrated in figure 6.33). Such an arrangement of three pixels can thus be assumed to be either part of an area or caused by a combination of several edges. In the first case none of the three pixels is an edge pixel; in the latter, none is at the centre of a 3 × 3 block containing only a single edge. Thus no pixel identified by this method need be passed to the second step. A pixel from an L-shaped block of three of the same intensity can be used as the seed to a flood-fill algorithm [Foley et al, pp.979–982]. Every pixel in the filled area is identified as a possible area pixel, and every pixel not in, but eight-connected to, the area is identified as a possible edge pixel. This process is repeated until all identifiable areas have been flood-filled. If an L-shaped block is generated by multiple edges the whole algorithm should still work, because neither these pixels, nor any pixel of the same intensity four-connected to them, will uniquely identify a single edge.

Note that some pixels may be identified as both possible edge pixels and possible area pixels (for example, when an edge falls on a pixel boundary); this is not a problem and, for the next stage, these pixels should be treated as possible edge pixels.

In the second step, every possible edge pixel is processed individually. All pixels that have not been identified as possible edge or possible area pixels must also be processed; such pixels may arise in a small, single pixel wide area with no L-shaped group of pixels in it. An attempt is made to rotate and reflect the 3 × 3 region centred on each pixel so that κ ≤ λ ≤ µ ≤ ν in figure 6.30.
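The area-finding step can be sketched with a standard four-connected flood fill plus eight-connected neighbour marking. This is a simplified sketch (names mine): seed selection via the L-shape test is omitted and the seed is supplied directly.

```python
from collections import deque

def flood_fill_area(img, seed):
    """Return the set of (x, y) pixels four-connected to `seed` that share
    its intensity (a 'possible area'), using a BFS flood fill.
    `img` is a list of rows, indexed img[y][x]."""
    h, w = len(img), len(img[0])
    target = img[seed[1]][seed[0]]
    area, queue = {seed}, deque([seed])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in area \
                    and img[ny][nx] == target:
                area.add((nx, ny))
                queue.append((nx, ny))
    return area

def potential_edge_pixels(img, area):
    """Pixels not in the area but eight-connected to it."""
    h, w = len(img), len(img[0])
    edge = set()
    for (x, y) in area:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                p = (x + dx, y + dy)
                if 0 <= p[0] < w and 0 <= p[1] < h and p not in area:
                    edge.add(p)
    return edge
```

Repeating this for each remaining L-shaped seed classifies the whole image into possible area pixels and possible edge pixels, as in figure 6.32.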
If this is impossible then we can be certain that more than one edge passes through this region; otherwise we apply our equations to generate a slope and position for an edge, along with the intensity on either side of the edge. These four values can then be used to generate pixel values for all nine pixels in the region. If these match the true pixel values then there is only one edge in the region and we can output the edge parameters and the grey values either side of the edge; otherwise we ignore these values. This will generate the parameters for all of the edges in the image (duplicate sets of parameters can be discarded). This process gives us the slope and position of all the edges in the image, because every edge has at least one pixel in a 3 × 3 area which will uniquely identify it. As a byproduct, the process gives us the intensities of the areas either side of each edge.

What it does not give us are the end points of the edges. However, for a scene to be consistent, all areas must be of constant intensity and bounded by straight edges. Hence it should be possible to devise an algorithm which generates the end points of all the edges given the image, the slopes and positions of all the edges, and the grey values either side of each edge, all of which can be derived from the above methods and equations. Such an algorithm has not been developed; it is left for future work. Some idea of how it could operate is given below:

Every edge's edge pixels are rendered assuming that it is the only edge in the image. The values produced are compared with the values in the image, and all correct values are noted as being pixels through which this edge is likely to be the only edge. The pixels at either end of a run of such edge pixels are likely to contain two or more edges. Such end pixels can be rendered using all suitable edges in the description. If the correct value is generated, it is likely that the edges used pass through the pixel.
Intersections between edges must be found in order to render these pixel values, and these intersections are likely to be the ends of lines. When three or more edges pass through a pixel, all combinations of intersections may need to be tried to find the correct one. Without designing such an algorithm it is impossible to say whether or not any extra restrictions must be placed on the input intensity surface to allow for exact reconstruction.

6.4.5 Discussion

Whether or not extra restrictions are required, this work shows that there exists a class of two-dimensional intensity surfaces which can be exactly reconstructed from their exact-area sampled images. Apart from the trivial, constant intensity surface, no member of this class can be exactly reconstructed from its single point sampled image using classical sampling theory. Such an exact reconstruction would be extremely useful in resampling, as it allows the resampling to produce the best possible resulting image. However, the class of intensity surfaces that can be exactly reconstructed by this method is extremely limited.

Real images

Several types of real image could potentially be exactly reconstructed by these methods. For example, printed matter (text and diagrams) and especially cartographic documents (maps) may meet the requirement for areas of constant intensity separated by sharp edges. The ability to scan a map into a computer and get an exact digital representation would be extremely useful.

Figure 6.34: Several different types of edge which occur in real images: step, extended edge, peak, roof or bar, line, and shading edge. Only the step and line edges are adequately modelled in our exact reconstruction process (after Marr [1976] and Horn [1977]).
Unfortunately, the nature of maps (for example, line widths of 0.1mm or finer are commonplace [Woodsford, 1988]) means that an extremely high-resolution scan would be required to produce an image which met our criteria for exact reconstruction. There is the additional problem that the task does not just involve finding edges, but requires significant intelligence in recognising what each area of colour and each edge represents (information on the problems involved in digitising maps may be found in Waters [1984, 1985], Woodsford [1988], Laser-Scan [1989] and Waters, Meader and Reinecke [1989]).

Real images of real scenes, as opposed to real images of two-dimensional 'flat' scenes (books, paper, maps), pose difficulties. A major problem is that the things we humans perceive as edges may bear little resemblance to the sharp discontinuity used to model an edge in this work. Horn [1977] discusses edges in real scenes and explains how different edges in real scenes produce different intensity profiles. He and Marr [1976] classify the different types of edge, some of which are shown in figure 6.34. Only two of these types match our model of an edge as a discontinuity. Despite these problems with applying our work to real images, the process developed here produces interesting theoretical results, in that a class of intensity surfaces which cannot be exactly reconstructed by classical sampling theory can be exactly reconstructed by these methods and equations.

Figure 6.35: Layout of CCD sensors. The solid lines show the receptive area of each pixel's sensor. (a) common CCD sensor, (b) advanced CCD sensor, (c) ideal CCD sensor. (After Kuhnert [1989] fig 3.)

Errors

As in the one-dimensional case, the introduction of errors causes problems.
Again, quantisation error introduces an inaccuracy into the calculated edge parameters, while other errors require parameter calculations based on probabilities, rather than being deterministic as they have been in the above discussion. The advantage here over the one-dimensional case is that every line has many edge pixels, and each pair of these can be used to estimate the slope and position of the line (note that not only adjacent pairs of edge pixels can be used, but any two edge pixels, provided we know their types (I, II, or III) and their relative position). Also, the positions of all the edge pixels in the pixel grid give bounds on the slope and position of the line, because the line has to pass through all of its edge pixels. Therefore we can expect to be able to generate a fairly accurate approximation from a noisy image, provided that none of the edges is very short.

Other published work

Klassman [1975] considered the case of thick lines. These can be considered as two parallel edges between which lies a black region and outside which lie two white regions. Klassman's results indicated that, even with no errors in the pixel values, there will always be an error in the estimated position of the line, no matter how thick it is. Our process, on the other hand, would be able to find the position of the line exactly, provided its width is such that the 4 × 4 exclusion zone constraint is satisfied for some point on each edge.

Kuhnert [1989] considers rectangular pixels as used in a CCD device; he gives three types of CCD cell, shown in figure 6.35. We have been using an ideal CCD-sensor layout (figure 6.35(c)). Kuhnert uses the advanced layout (figure 6.35(b)). He considers an edge between two areas of constant intensity, but concludes that his model of the sensor does not permit a unique conclusion on the position and slope of the edge. He says that this is unsurprising given that the original intensity surface violates the sampling theorem.
However, we have seen here that exact reconstruction of such an intensity surface is possible, given exact-area sampling.

Hyde and Davis [1983] consider a similar situation to ours: a single edge, of finite length, between two areas of constant intensity. They use two least-squares edge estimation techniques, assuming some noise in the image. The first step, which they assume is done for them, is to identify the edge pixels. Then, one technique gives the least-squares edge fitted to the centres of the edge pixels; that is, no intensity information is utilised. This is similar to the black and white case discussed in section 6.4.1. The second technique gives a least-squares fit based on the intensity values of the edge pixels: the sum of the squares of the differences between the true intensity values and the estimated intensity values is minimised. Their conclusion is that the differences between the two techniques are not significant enough to warrant the use of the second, more costly technique. That is, taking the intensity values into consideration gives little advantage over simply using the positions of all the edge pixels to calculate the edge's parameters. Compare this with our method, which gives the exact position of an edge, using grey-level information. However, it must be said that our method does not take noise into account, whilst Hyde and Davis assume a noisy image.

Figure 6.36: Kiryati and Bruckstein [1991] use exact-area sampling over abutting circular areas. The dots are the pixel centres, the circles are the pixel areas. (After Kiryati and Bruckstein [1991] fig 1.)

Figure 6.37: A case which violates the restrictions of Kiryati and Bruckstein's [1991] exact reconstructor.

Kitchen and Malin [1989] consider a single, infinitely long, straight edge between areas of intensity zero and one, which has been exact-area sampled. They study the behaviour of several common edge operators on this step edge.
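Hyde and Davis's first technique, fitting a line to the centres of the edge pixels alone, can be sketched in a few lines. The following is an illustrative reconstruction, not their code: the function name and the use of a principal-direction fit (via the singular value decomposition, which avoids problems with near-vertical edges that a y = mx + c fit would have) are our own choices.

```python
import numpy as np

def fit_edge_to_centres(centres):
    """Least-squares line through the centres of identified edge pixels,
    in normal form: normal . p = rho for points p on the line.

    No intensity information is used, only pixel positions, as in the
    first of Hyde and Davis's two techniques.
    """
    pts = np.asarray(centres, dtype=float)
    mean = pts.mean(axis=0)
    # The principal direction of the scatter is the line's direction.
    _, _, vt = np.linalg.svd(pts - mean)
    direction = vt[0]                        # unit vector along the line
    normal = np.array([-direction[1], direction[0]])
    rho = float(normal @ mean)               # signed distance from origin
    return normal, rho
```

For exactly collinear centres the residual normal . p - rho is zero at every centre; with noisy centres it is the least-squares fit.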
Their results show that none of these operators gives a particularly good approximation to the correct orientation or position of the edge. The principal conclusion is that, even under ideal conditions, the common edge operators are not very accurate.

The most recent piece of research is that of Kiryati and Bruckstein [1991]. They consider a similar situation to ours, except that they use exact-area sampling in abutting circular pixels, as shown in figure 6.36. Their work is detailed, and they show that, given an intensity surface which obeys certain restrictions, an exact reconstruction can be made from its circular exact-area sampled image. This again shows that classical sampling theory is not the final arbiter of which intensity surfaces can be exactly reconstructed from their images and which cannot. One interesting point is that Kiryati and Bruckstein's restrictions exclude intensity surfaces like that shown in figure 6.37. Provided the six edges in this intensity surface are long enough that each has a point surrounded by a 4 × 4 zone excluding all the other edges, there seems no reason why the algorithms outlined in this chapter could not exactly reconstruct this image.

6.4.6 Extensions

Three possible extensions to the described reconstructor are:

1. To make it more useful, this algorithm could be altered to recognise sharp edges between areas of slowly varying intensity. Then exact reconstruction could be used at the edges, and some local reconstructor in the slowly varying parts. Cumani, Grattoni and Guiducci [1991] have tried a similar idea for image coding: exactly storing the image near sharp edges and storing very little information in the slowly varying areas between regions. Such an idea could be tried with resampling, where the intensity surface in the slowly varying regions is reconstructed differently to that near the edges.

2. A straight edge can be represented by two parameters; a circle by three.
An extension could be made to find a circle's parameters from three pixel values on the circle's edge. Preliminary results show that the algebra involved in these computations is considerably more involved than that for straight edges.

3. Two problems with images that have been enlarged using good NAPK reconstructors are: (1) the straight edges become blurry; and (2) textured areas do not reveal any finer texture (for example, enlarging a tree produces a blurry tree; you do not see more of the tree's structure). Kajiya and Kay [1989] call this effect 'the painter's illusion'. The image gives the impression that more detail is there at the sub-pixel level than is actually present. When the image is enlarged, this lack of fine detail becomes readily apparent. The work in this section has attempted to solve the first of the two problems. An extension would be to tackle the second: how to reconstruct, or resample, a textured region so that the resampled image is visually acceptable.

6.5 Summary

We have seen that, given knowledge about how an image was sampled, a better reconstructed intensity surface can be generated. The bulk of this chapter dealt with sharp edge reconstruction. We have proven that, for a certain set of one-dimensional intensity curves, exact reconstruction is possible from the exact-area sampled image of an intensity curve in the set. We have also given strong evidence that a similar result holds in two dimensions. The significance of this result is two-fold. Firstly, it shows that classical sampling theory is not the final word on which intensity surfaces can be exactly reconstructed from images. Secondly, it indicates that a good reconstruction of this particular class of real intensity surface may be possible.

Chapter 7

Transforms & Resamplers

In this chapter we draw together the threads of reconstruction, sampling, and transformation. We discuss the overall resampler, the measurement of quality, and some points about transforms in general.
The transform is at the heart of the resampling. The other two sub-processes can be considered as simply format converters (chapter 1), allowing the transform to be performed in the continuous domain, where it is simple, rather than in the discrete domain, where it is complicated. In general the transform moves intensity information around, but neither loses nor adds any information. Reconstruction and sampling, on the other hand, tend to be lossy operations. Thus it is the choice of reconstructor and sampler which determines the quality of the overall resampler. These should therefore be the best possible, regardless of the transform. For certain classes of images there are perfect reconstruction and sampling processes but, for the average image, such perfect processes do not exist. So we must content ourselves with the best imperfect resampler that we can implement.

Before we continue we need to note that, in chapter 1, a resampler consisted of a specific reconstructor, transform, and sampler. Often, however, we will wish to discuss the same reconstructor and sampler coupled with many different transforms, or some class of transforms. In these cases we still speak of a resampler, but it is one in which the transform can be any of a set of transforms, or sometimes any transform at all, whilst the reconstructor and sampler remain fixed.

7.1 Practical resamplers

In general, the better the reconstructor or sampler, the more difficult it is to implement, and so we need to trade off between a good reconstructor or sampler and an easily implemented one. We need to make as many simplifications as possible, with the least effect on quality, in order to produce a quick, practical, and good resampler. For certain transforms it transpires that changing the reconstructor or changing the sampler makes little difference to the overall result of the resampling, while changing the other can make a great difference.
Heckbert [1989, pp.51–52] shows that, for large magnifications, the reconstruction method dominates the resampler and the sampler is irrelevant, so it may as well be a single point sampler, for simplicity. He also shows that, for large minifications, the sampling method dominates and the reconstructor is irrelevant. In this case we can use a simple reconstructor with much the same effect as a complex one. For a resampler which may have to cope with both large magnifications and large minifications, we require either a good reconstructor and a good sampler, or some sort of combination where a switch is made from a good magnification resampler to a good minification resampler as the scale factor passes through one [Heckbert, 1989, p.52]. Near a scale factor of one, the evidence indicates that changing the sampler makes little or no difference to the quality of the resampled image¹, and that there is little to choose between the quality offered by the better reconstructors. With a magnification factor near one, reconstruction by nearest-neighbour produces noticeable artifacts, and reconstruction by linear interpolation often gives a blurry result, but other methods produce few, if any, artifacts (see section 7.3.2, which discusses the effect of reconstructors at large scale factors compared with their effect at a scale factor of one). Therefore, good samplers are required for minification, and good reconstructors for same-scale operations and magnification.

7.1.1 Restricted resamplers

While theoretical resamplers can be applied to any transform, a practical resampling method may only be applicable to a limited set of transforms. A resampler employing point sampling can be used with any invertible transform, provided that the reconstructor can give the value of the original intensity surface at an arbitrary point.
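Heckbert's observation suggests a simple practical design: use a reconstruction-dominated path for magnification and a sampling-dominated path for minification, switching as the scale factor passes through one. The following one-dimensional sketch, in Python with NumPy, is our own illustration of that idea (the function name and the hard switch at a scale factor of exactly one are illustrative choices, not Heckbert's code):

```python
import numpy as np

def resample_1d(signal, scale):
    """Resample a 1D signal by `scale` (> 1 magnifies, < 1 minifies).

    Magnification: linear-interpolation reconstruction with single
    point sampling (at large scales the sampler barely matters).
    Minification: box area-averaging sampler (at small scales the
    reconstructor barely matters).
    """
    n_out = max(1, int(round(len(signal) * scale)))
    if scale >= 1.0:
        # Point-sample a linearly interpolated reconstruction.
        x = np.arange(n_out) / scale
        return np.interp(x, np.arange(len(signal)), signal)
    else:
        # Average the source samples falling in each output pixel's span.
        out = np.empty(n_out)
        width = len(signal) / n_out
        for i in range(n_out):
            lo, hi = int(i * width), int(np.ceil((i + 1) * width))
            out[i] = signal[lo:hi].mean()
        return out
```

A production resampler would use a better reconstructor than linear interpolation and weight partially overlapped source samples in the minification branch; the point here is only the change of dominant sub-process either side of unit scale.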
An invertible transform here is one for which any output point can be mapped, by the computer program, to a point in the input intensity surface. Many transforms are covered by this restriction; those which are not can be closely approximated by transforms which are.

Practical resamplers which employ area sampling tend to be more restricted in the types of transform they can be used with. This restriction is due to their implementation, discussed in section 1.8.2. Greene and Heckbert [1986], for example, consider a resampler composed of a Gaussian reconstructor and a Gaussian area sampler. This resampler only works with affine transforms. However, a local affine approximation to the true transform can be used, allowing the resampler to approximate the correct results for a wide range of transforms. Other area sampling resamplers give a transform-dependent approximation to a transform-independent sampler. Such resamplers include those developed by Crow [1984], Glassner [1986], and Fournier and Fiume [1988]. Their starting point in algorithm development is a transform-independent area sampler; the final algorithm has a transform-dependent area sampler in order to be practical.

Williams [1983] has an interesting approach to approximating a nearest-neighbour reconstruction/exact-area sampling resampler. If we look at his entry in appendix A, we see that his method is equivalent to point sampling off a three-dimensional reconstructed volume. The first two dimensions are the usual spatial coordinates, whilst the third dimension, d, is related to the size of the output pixel projected back onto original image space. The volume is created by first minifying the image several times (by a nearest-neighbour reconstruction/exact-area sampling resampler) to generate an image pyramid. Each image in the pyramid is then reconstructed by linear interpolation, and the whole volume is filled in by linear interpolation between the intensity planes in the pyramid (figure 7.1).
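Williams' scheme can be sketched directly from this description: build the pyramid by repeated two-by-two exact-area averaging, then point-sample the volume with bilinear interpolation within a level and linear interpolation between the two bracketing levels. The following Python/NumPy code is a minimal sketch under those assumptions (function names are ours; Williams' own implementation differs in detail):

```python
import numpy as np

def build_pyramid(image):
    """Image pyramid by repeated 2x2 box averaging: the exact-area,
    nearest-neighbour scale-by-half resampler applied several times."""
    levels = [np.asarray(image, dtype=float)]
    while min(levels[-1].shape) > 1:
        a = levels[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2
        a = a[:h, :w]
        levels.append((a[0::2, 0::2] + a[1::2, 0::2] +
                       a[0::2, 1::2] + a[1::2, 1::2]) / 4.0)
    return levels

def bilinear(a, y, x):
    """Linear-interpolation reconstruction within one pyramid level."""
    y = min(max(y, 0.0), a.shape[0] - 1.0)
    x = min(max(x, 0.0), a.shape[1] - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, a.shape[0] - 1), min(x0 + 1, a.shape[1] - 1)
    fy, fx = y - y0, x - x0
    top = a[y0, x0] * (1 - fx) + a[y0, x1] * fx
    bot = a[y1, x0] * (1 - fx) + a[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def sample_volume(levels, y, x, d):
    """Point-sample the intensity volume at (x, y, d): bilinear within
    each level, linear between the two levels bracketing d."""
    d = min(max(d, 0.0), len(levels) - 1.0)
    lo = int(d)
    hi = min(lo + 1, len(levels) - 1)
    f = d - lo
    v_lo = bilinear(levels[lo], y / 2.0 ** lo, x / 2.0 ** lo)
    v_hi = bilinear(levels[hi], y / 2.0 ** hi, x / 2.0 ** hi)
    return v_lo * (1 - f) + v_hi * f
```

The d coordinate is chosen per output pixel from the projected pixel size, so minification reads from an already minified level rather than skipping over source pixels.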
7.1.2 Resampler equivalences

The implementation of an area sampler, described in section 1.8.2, is possible because of the equivalence of two different decompositions. In general, any resampler can be decomposed in many different ways: the transform remains the same, but the reconstructor and sampler differ. As an example, consider a resampler consisting of nearest-neighbour reconstruction, a translation, and exact-area sampling. This same resampler can be decomposed into linear interpolation and single point sampling; or into Dirac delta function reconstruction and triangle filtered area sampling. These three decompositions are illustrated in figure 7.2, for one-dimensional signals.

¹ The evidence indicating this result for samplers is in the papers which discuss resampling for geometric correction of images. Such correction does not require large scale changes, and none of these papers discuss any form of sampling other than single point. Our own observations indicate that an exact-area sampler makes no appreciable difference in near-same scale transforms, compared with single point sampling, unless a nearest-neighbour reconstructor is used.

Figure 7.1: Williams' [1983] reconstructor. An image pyramid is formed by nearest-neighbour/exact-area scaling by a factor of two, several times. Each level of the pyramid is then enlarged by linear interpolation, and then the third dimension is filled in by linear interpolation between the resulting intensity surfaces.

Figure 7.2: Three equivalent decompositions of the same resampler: (a) nearest-neighbour/exact-area, (b) linear/single point, and (c) delta-function/triangle filter.
If we now change the transform, these three decompositions are no longer equivalent. For example, take a resampler with nearest-neighbour reconstruction, a scaling transform, and exact-area sampling. Its equivalent decomposition with point sampling is shown in figure 7.3: the reconstructor depends on the scale of the transform. This equivalence can lead to confusion. Durand and Faguy [1990], for example, mistakenly equate scaling using nearest-neighbour reconstruction and exact-area sampling with scaling using linear interpolation and single point sampling, when this equivalence is only correct for a scale factor of one. For a general transform using transform-independent reconstruction and area sampling, the equivalent decomposition with single point sampling will have a reconstructor that is transform-dependent. We hypothesise that, given a transform-independent reconstructor, an arbitrary transform, and a transform-independent sampler, any equivalent resampler will have a transform-dependent reconstructor and/or a transform-dependent sampler. This hypothesis shows why area sampling resamplers are so much more difficult to implement than point sampling ones: in the method of section 1.8.2, the area sampler has to be converted to a point sampler to be implementable, and so the reconstructor in the equivalent point sampling resampler is transform-dependent, and hence difficult to implement.

Figure 7.3: Two equivalent decompositions of the same resampler: (a) nearest-neighbour/exact-area, (b) a transform-dependent reconstructor combined with point sampling. A third decomposition could be generated with delta-function reconstruction and a transform-dependent sampler.

7.2 Transform dimensionality

Resampling has been applied to one-, two-, and three-dimensional images. One use of resampling is in texture mapping, where a two-dimensional image is mapped onto a three-dimensional surface which is then projected onto a two-dimensional plane, giving an overall 2D to 2D transform. Image warping and rectification are also generally 2D to 2D transforms. For any dimensionality the theory is much the same. However, resamplings where a many-dimensional transform is split into two or more one-dimensional passes are a special case (see section 1.7.5). There are two types of one-dimensional resampling used in such multi-pass algorithms on two-dimensional images. We have chosen to call these 1D resampling and 1½D resampling. In 1D resampling each scan-line is treated as a one-dimensional resampling problem. That is, the scan-line has length but not breadth [Euclid, 300BC]. It is thus easy to use the one-dimensional theory to develop resamplers for this case. In 1½D resampling each scan-line is considered to be one pixel wide, and the transform applied to it does not necessarily have to be the same across the entire width of the line; it is thus a restricted two-dimensional resampling. An example of 1½D resampling is given by Paeth [1990], where he implements rotation as three shears. Each shear is implemented as a one-dimensional resampling on each row or column of the image (figure 7.4). Paeth takes this exact 1½D resampling and approximates it by a 1D nearest-neighbour reconstruction/exact-area sampling resampler. This idea can be extended to, for example, three dimensions.

Figure 7.4: A shear transform, with nearest-neighbour reconstruction and exact-area sampling. (a) The 2D resampling; (b) the 1½D resampling process for a single scanline.
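Paeth's decomposition rests on the matrix identity R(θ) = Sx(α) · Sy(β) · Sx(α), with α = −tan(θ/2) and β = sin θ, where Sx and Sy are horizontal and vertical shears. The coordinate-level sketch below (Python; the function name is ours) traces where a sample point goes; a full implementation would, as Paeth does, resample each row or column with a 1D reconstructor and sampler:

```python
import math

def rotate_point_three_shears(x, y, theta):
    """Rotate (x, y) by theta radians as shear-x, shear-y, shear-x.

    Each shear moves samples along only one axis, which is why each
    pass can be performed as a 1D (or 1.5D) resampling of rows or
    columns of the image.
    """
    a = -math.tan(theta / 2.0)
    b = math.sin(theta)
    x = x + a * y          # first horizontal shear
    y = y + b * x          # vertical shear
    x = x + a * y          # second horizontal shear
    return x, y
```

Multiplying the three shear matrices out gives exactly the rotation matrix, so the decomposition is exact; the approximation Paeth introduces lies only in the per-pass resampling.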
With one-dimensional resampling in 3D, we can either have a breadth-less 1D resampling, or a volume with cross-section of size a pixel width by a pixel width. This could be called 1⅔D resampling.

Figure 7.5: The reconstruction process we want to evaluate is on the left; we would like to view the intensity surface that it produces. The other three processes (magnify, single point sample, and display) are needed for us to be able to view an approximation to the reconstructed intensity surface on a physical display device.

7.3 Evaluation of reconstructors and resamplers

In chapter 3 we said that reconstructors are evaluated either on their visual results or on their mathematical properties. But how do we visually evaluate a reconstructor? It produces an intensity surface which cannot be represented inside the computer, and cannot be displayed directly unless a piece of hardware is built which takes the discrete image as input and produces the reconstructed intensity surface in a form that can be viewed directly by the eye. In general, such a piece of hardware would be impossible or, at best, expensive and difficult to build. The usual way to visually assess reconstructors is to resample an image using the appropriate reconstructor, a magnification, and single point sampling, and then display this much larger image on an ordinary CRT [Andrews and Patterson, 1976; Schreiber and Troxel, 1985; Mitchell and Netravali, 1988; Wolberg, 1990; Boult and Wolberg, 1992]. The process involved is shown in figure 7.5. The intensity surface we are evaluating is therefore not the intensity surface which we view, but is separated from it by a sampling and another (probably dissimilar) reconstruction. How valid are the results of this type of evaluation?
We can conclude that they are adequate if the effects of the sampling and the display reconstruction are small compared to the effects of the reconstructor we are studying. Such a situation will arise under a significant magnification of the intensity surface, and when the displayed intensity surface is viewed from sufficient distance that the display's reconstruction effects are not noticeable (see the discussion in section 7.3.1 on the effect of viewing distance on the perceived intensity surface).

The former point about significant magnification is justified because of two results of insufficient magnification. The first is that, under low magnification, the sampling significantly affects the result. For small magnification the combination of a reconstructor and sampler can produce very different results to those expected from considering the reconstructor alone. This happens because the combination of reconstructor and sampler generates a discrete reconstructor with different properties to the continuous one. The solution is either to magnify by a large factor (for example: ×8), which ameliorates this problem; or to magnify by an irrational factor (for example: ×e or ×π), which randomly distributes errors across the magnified image rather than producing a regular error pattern. The latter solution is not preferred, because the error is still there; it is just not as obvious, because it is random [Cook, 1986].

The second reason for using a large magnification factor is so that the reconstruction filter is large compared to the display reconstruction filter. This means that the display reconstruction filter has little effect on the overall filter, and so what we see is very similar to the reconstruction filter which we are trying to evaluate. If we assume that the display reconstructs using a typical Gaussian convolved with a box filter², then the overall reconstruction filter viewed is shown in figure 7.6 for various magnification factors of a linear interpolant.
Thus, for our results to be valid, we need large magnification. A factor of four is the minimum that will be useful; eight is better. Andrews and Patterson [1976] used a factor of 16, which is the largest factor recorded in the literature.

² The box filter is the nearest-neighbour reconstruction used in most display drivers, and the Gaussian is the blur caused by the phosphor spot on a CRT's surface.

Figure 7.6: The effect of magnification factor on displaying a reconstructor's results. On the left is the linear interpolation filter, (a), whose effects we are trying to view. After sampling this becomes (b) for magnification by 2, (c) for magnification by 4, and (d) for magnification by 8. Reconstruction by the display's Gaussian∗box reconstructor gives an overall reconstructor shown in (e), (f) and (g). As the magnification factor increases, so the overall reconstructor better approximates (a), the reconstructor we are trying to evaluate.

Whilst useful for evaluating reconstructors, enlargement of this nature is not a particularly useful function of resampling. By enlarging an image to, say, four times its size in both dimensions, we are basically using the original as a compressed version of the resampled image. It is possible to achieve much higher quality reproduction, using the same number of bits of information, by using a better compression method than sampling at one quarter of the frequency and reproducing the original by resampling. However, Reed, Algazi, Ford and Hussain [1992] have used such resampling as part of a more complex compression method. First, the image is subsampled at an eighth of the resolution in each dimension, and this minified image is encoded and transmitted. Secondly, the difference between the original image and this small image enlarged by cubic interpolation is quantised, encoded and transmitted.
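The structure of this minify-then-code-the-residual scheme can be sketched as follows. This is our own illustrative Python/NumPy reconstruction, not Reed et al's codec: entropy coding is omitted, a uniform quantiser stands in for theirs, and bilinear enlargement stands in for their cubic interpolation, for brevity.

```python
import numpy as np

def box_downsample(img, k=8):
    """Minify by k in each dimension via exact-area (box) averaging."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(img, k=8):
    """Separable linear-interpolation enlargement by k (a stand-in for
    the cubic interpolation used by Reed et al)."""
    h, w = img.shape
    xs = np.linspace(0, w - 1, w * k)
    ys = np.linspace(0, h - 1, h * k)
    rows = np.array([np.interp(xs, np.arange(w), r) for r in img])
    return np.array([np.interp(ys, np.arange(h), c) for c in rows.T]).T

def encode(img, k=8, step=4.0):
    """Transmit side: minified image plus quantised residual."""
    small = box_downsample(img, k)
    residual = img[:small.shape[0] * k, :small.shape[1] * k] - upsample(small, k)
    return small, np.round(residual / step)

def decode(small, q, k=8, step=4.0):
    """Receive side: enlarge the minified image, add back the residual."""
    return upsample(small, k) + q * step
```

With a uniform quantiser of step s, the reconstruction error is bounded by s/2 per pixel, whatever the enlargement method, which is the point of coding the residual.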
At the receiver, the minified image is decoded and enlarged by the same resampling method, and the difference image is decoded and added back to the enlarged image.

Resampling cannot create extra information, so reconstruction is the process of arranging the available information in the best possible way. By enlarging an image we stretch the available information over a wide area, and the artifacts of the reconstruction method become readily apparent. This is why we use enlargement to subjectively study reconstruction methods: it is a way of getting a handle on what the reconstructed surface actually looks like. However, in practical applications, it is desirable to have as much information as possible available. Thus, if we know beforehand the range to which we may have to transform our original image, then we should ensure that the original contains as many, if not more, samples than the final image will require. This ensures that we will be discarding information rather than having to 'fill in the gaps', and thus our final image will contain the maximum amount of information possible. This is a tall order, because often we do not know, when an image is captured, what transformations will have to be performed on it. However, if the choice exists between recapturing the image at higher resolution or resampling it to higher resolution, then recapturing it is far preferable from the point of view of image quality.

It is interesting to consider an application where the size of the final image is roughly known before the original is captured. Venot et al [1988] and Herbin et al [1989] describe a system for the registration of medical images. The normal process is that two or more images are captured over the period of treatment and registered with one another to track the progress of healing. The first image to be captured is the standard.
Subsequent images are then taken over a period of weeks and are registered against the standard. Venot et al [1988, p.300] give their default range of scale factors to be 0.8 to 1.2. This implies that they expect subsequent images to be at roughly the same scale as the first. This is a sensible assumption. If a subsequent image were much larger than the original, excessive processing would be required to reduce it down to size and, more importantly, with a fixed size image some of the necessary parts of the second image could be lost off the edges (for example, the edge of a region of scar tissue could fall outside the image's edge). If it were much smaller than the original, there would be insufficient detail in the second image: it would need to be enlarged too much and, of course, no extra information would be available. Our recommendation is that, ideally, the second image should be a little larger than the first, to allow for some information loss during the resampling process. Some information is bound to be lost in this process, so having more than enough to start with will partly compensate for the quality degradation due to the resampling.

7.3.1 Display artifacts

Now consider the effect of the display reconstructor on the resampled image. Pavlidis [1990] and Blinn [1989b] both say that it is the display reconstructor which causes at least some of the rastering effects in the continuous intensity surfaces that we view. This is certainly true. Pavlidis [1990] suggests that we could improve the performance of our graphics displays by using a better reconstructor between the digital image and the displayed intensity surface. Such an idea merits consideration, but there are two factors which may make it impractical or unnecessary. The first is that this idea cannot be implemented easily with CRT technology.
A better reconstructor than the currently used box convolved with a Gaussian could be designed for reconstruction along a raster scan line, by altering the digital-to-analogue converter used to drive the CRT. But it is difficult to see how it could be made to work between scan lines, because the reconstruction method in this direction is inherent in the CRT and cannot be altered by changing the process which drives the display. Perhaps extra scan lines could be used between the true ones?

The second is that, regardless of the reconstruction method, from a great enough distance the reconstructed image looks the same. Using nearest-neighbour, linear, Catmull-Rom and FFT reconstruction, we found that the four magnified versions were indistinguishable when a single pixel subtended a visual angle of less than about 2 minutes of arc. Thus, provided the CRT is always viewed from at least this distance (50–60cm for a typical display), rastering effects caused by the display should be undetectable. As we cannot guarantee that the display will always be viewed from at least this distance, there is value in Pavlidis' suggestion, although it may require non-CRT display technology to implement.

As to worries about evaluating resampled images on an imperfect display device: (1) the image will always have to be displayed somehow, so it may be a good thing to evaluate it on a typical display; (2) if we are worried about it, we can enlarge the image using any reconstructor we think desirable, and then view the image on the display from a much greater distance, ameliorating the effects of the display reconstructor, as discussed earlier.

Figure 7.7: Two images rotated by 5° using four reconstructors and single point sampling. In both cases: top left: nearest-neighbour; top right: linear interpolation; bottom left: Catmull-Rom cubic interpolation; bottom right: FFT interpolation.
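The quoted viewing distance follows from simple trigonometry: a pixel of pitch p subtends less than a visual angle φ beyond the distance d = p / tan(φ). A quick check in Python (the 0.3mm pixel pitch is our assumption for a typical display of the period, not a figure from the text):

```python
import math

def min_viewing_distance(pixel_pitch_m, arc_minutes=2.0):
    """Distance beyond which a single pixel subtends less than the
    given visual angle, so display rastering should be undetectable."""
    angle = math.radians(arc_minutes / 60.0)
    return pixel_pitch_m / math.tan(angle)

# An assumed 0.3mm pixel pitch gives roughly the 50-60cm quoted above.
d = min_viewing_distance(0.0003)   # about 0.52 m
```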
7.3.2 Judging a resampler by its reconstructor's quality

Resamplers are evaluated visually by examining the displayed resampled image and comparing it with the displayed original image (as in figure 1.8). They are evaluated mathematically by comparing the original and resampled images in some way, or by considering the mathematical features of the whole resampler. Resamplers are often evaluated by considering only their reconstructors, assuming that the quality of the resampler depends solely on the quality of the reconstructor. We have already seen that this is not necessarily a valid assumption.

One practical example of this error of reasoning occurred in the following situation. Four reconstruction methods (nearest-neighbour, linear, Catmull-Rom and an FFT method) were used with single point sampling. Three different transformations were used: magnification by a factor of four, and rotation by five and by forty-five degrees. The first transformation is a standard way of assessing the quality of different reconstructors [Mitchell and Netravali, 1988; Andrews and Patterson, 1976]. The results from the magnification (figures 7.9 and 7.10) indicated that the FFT method is inferior to the Catmull-Rom method. The nearest-neighbour and linear methods are also inferior³. The results from the rotation by 5 degrees (figure 7.7), however, indicate that the FFT method is as good as the Catmull-Rom method. The nearest-neighbour and linear methods are still inferior⁴. However, when rotating by forty-five degrees, all four methods produce virtually identical results (figure 7.8). The trained eye can pick up some blurriness in the linear version and some minor defects in the nearest-neighbour version, but these are almost negligible. The point here is that a resampling method cannot be judged on its reconstructor alone, as some are wont to do, but rather on the reconstructor, transform, and sampler together.
Here we see that reconstructors of widely differing quality can produce resamplers of similar quality, because the transform and sampler are such that the differences in reconstructor quality are negligible in the overall resampler. It follows that there are situations where a low quality reconstructor will be just as effective as a high quality one. The best example of this is minification, where the quality of the sampler is important, and the quality of the reconstructor is practically irrelevant.

³ The main problems with the reconstructors are: blockiness (nearest-neighbour), blur (linear) and ringing (FFT).
⁴ Nearest-neighbour produces a lot of obvious 'jaggies'; linear produces a slightly blurry image.

7.3. EVALUATION OF RECONSTRUCTORS AND RESAMPLERS 219

Figure 7.8: Two images rotated by 45° using four reconstructors and single point sampling. In both cases: top left: nearest-neighbour; top right: linear interpolation; bottom left: Catmull-Rom cubic interpolation; bottom right: FFT interpolation.

7.3.3 Artifacts in resampled images

Mitchell and Netravali [1988] consider several artifacts which can occur in a resampled image: sample frequency ripple, anisotropic effects, ringing, blurring, and aliasing. If we are comparing different types of resampler (for example: linear against cubic against FFT reconstructors), rather than searching for, say, the best cubic reconstructor, then we can assume that the first two problems have been designed out of the resampler. These two tend to be easy to remove early in resampler design; if they do occur, the solution is to use a better reconstructor. This leaves us with several artifacts which may occur in an image: Mitchell and Netravali's latter three plus an additional few. They all have their causes and most can be corrected for:

• Aliasing — caused by bad sampling.
May be corrected by sampling well, or may be desirable in certain cases (for example: the exact reconstruction method in chapter 6 required an aliased image as input).

• Ringing — caused by a combination of aliasing and near 'perfect' reconstruction. May be corrected by using a less than perfect reconstructor (for example: Catmull-Rom cubic, rather than sinc) or by not using aliased images. May also be caused by a well sampled image and near perfect reconstruction giving a result that is exactly correct, but which we perceive to be very wrong (see section 2.7).

• Rastering — an artifact caused by bad reconstruction. Most noticeable with nearest-neighbour reconstruction, where it produces the well-known 'jaggy' effect (for example, figure 7.10, top left). The solution is to use a better reconstructor.

• Blurring — caused by certain types of reconstruction. Notably, all strictly positive reconstructors produce blurred results. The solution is to use a reconstructor with negative lobes, or an exact reconstructor, as in chapter 6. The former solution can lead to negative intensities; there is a trade-off here between removing blur and being certain of always having positive intensities.

Photographs 'Checks 1 and 2' and 'Checks 3 and 4': the image files for these photographs are unavailable.

Figure 7.9: A checkerboard image enlarged fourfold using four reconstructors and single point sampling. Top left: nearest-neighbour; top right: linear interpolation; bottom left: Catmull-Rom cubic interpolation; bottom right: FFT interpolation. For this particular image the nearest-neighbour method gives an exact reconstruction, an unusual occurrence.
Photographs 'Kings 1 and 2' and 'Kings 3 and 4': the image files for these photographs are unavailable.

Figure 7.10: A captured image enlarged fourfold using four reconstructors and single point sampling. Top left: nearest-neighbour; top right: linear interpolation; bottom left: Catmull-Rom cubic interpolation; bottom right: FFT interpolation.

• Missing detail — this was mentioned at the end of chapter 6. It occurs when a textured region is enlarged and no extra detail becomes apparent. There is no obvious solution to this problem.

• Blurry edges — occur under magnification, even with a good reconstructor. They are due to a lack of information about edge position. One solution to this was developed in chapter 6.

These last two artifacts are due to the original image being a representation, at a specific scale, of an intensity surface. Magnifying the image reveals the lack of underlying detail in this representation ('the painter's illusion' [Kajiya and Kay, 1989]). There are certain methods which try to produce underlying detail. One is the exact reconstructor of chapter 6, which can reconstruct straight edges. Another is the fractal compression technique [Sloan, 1992], which represents an image as a set of fractal definitions. This allows an image to be magnified so as to produce finer detail than was in the original. There is, of course, no guarantee that this fine detail will be accurate. In general, we cannot access any underlying detail and so must content ourselves with the information that exists in the original image.

7.3.4 Quality measures

One very useful tool in the evaluation of resamplers would be a single number, calculated from an image or images, which would indicate the quality of the resampling. For example, say we had a quality measure which we could apply to a single image.
When we generate several resampled images from the same original, using the same transform but different resamplers, we could calculate this measure for each of them. Ideally, the measure should order the resampled images by increasing visual quality.

In practice, subjective visual judgements are often used. For example, Mitchell and Netravali [1988] used subjective visual judgement by experts to assess image quality. Vogt [1986] and Fiume [1989] both bemoan the lack of a formal measure of goodness, which leaves it up to a human to make arbitrary statements about how good an image is. Vogt especially comments that it is difficult to compare more than four images at one time without losing judgement.

Pratt [1978] discusses quantitative measures of image quality. He says that they are desirable because:

• they form a basis for the design and evaluation of imaging systems;

• they eliminate the cumbersome, and often inaccurate, efforts of image rating by human observers;

• they can form the framework for the optimisation of image processing systems.

He goes on to comment that, unfortunately, developing such quantitative measures is tricky and that, while much progress has been made, the measures that have been developed are not perfect. Counter-examples can often be generated that possess a high quality rating but are subjectively poor in quality, and vice versa. He claims that the key to the formulation of better image quality measures is a better understanding of the human visual system. At the time of his writing, Pratt states that the most common and reliable judgement of image quality is subjective rating by human observers.

Nagioff [1991] discussed the problem of comparing image compression techniques. She concluded that the best way of judging image quality, to date, is the root mean square difference between compressed and original versions combined with visual inspection, but that neither on its own is a sufficiently good quality measure.
The image quality measures that have been developed can be divided into two classes [Pratt, 1978]:

• bivariate — a numerical comparison between two images;

• univariate — a numerical rating assigned to a single image, based upon measurements on that image.

Many bivariate measures have been used; one such is the root mean square difference. In image resampling we are faced with the problem that we cannot use difference measures of this type directly, because the original and resampled images are not aligned. A solution to this problem is to resample the image and then resample the resampled image to realign it with the original. Hammerin and Danielsson [1991] and Parker et al [1983] do this to get mean square differences for rotated images. The argument here is that the degradation in both rotations is roughly the same, and so the mean square difference is a valid measure of the quality of a single rotation.

Unfortunately, this argument fails when the transform involves a scale change. Magnifying an image and reducing an image are very different operations, and an algorithm that is good at one can be hopeless at the other. For example, we performed an experiment in which an image was magnified and reduced, and the root mean square difference between the original and the doubly resampled image calculated. This was done for a range of magnifications and a number of resamplers. The results are shown in appendix C. The worst resampler (nearest-neighbour reconstruction/single point sampling) performed the best in this test, because it allows the original image to be exactly reproduced for any magnification factor greater than one, yet it produces the worst magnified image. The better resamplers produced worse numerical results on this test, but better visual results in the magnified images.
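The root mean square difference used in these double-resampling experiments can be sketched as follows (assuming grey-scale images supplied as flat, aligned sequences of intensities; an illustration, not the dissertation's implementation):

```python
import math

def rms_difference(image_a, image_b):
    """Root mean square difference between two aligned, equal-sized images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must be the same size")
    total = sum((a - b) ** 2 for a, b in zip(image_a, image_b))
    return math.sqrt(total / len(image_a))
```

Identical images give zero; any degradation raises the value. As the text notes, the measure is only meaningful once the two images are brought back into alignment.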
Thus the root mean square difference between an image and its doubly resampled version is not a good measure of quality for transforms involving any sort of scale change. What is required is a univariate quality measure, one which can be calculated for a single image. Such a measure is likely to be very hard to find. Pratt [1978, pp. 175–176] gives only one true univariate measure, the 'equivalent rectangular passband measure', and he says that this has not proven to be well correlated with subjective testing. One possible candidate is the mean intensity level. However, this should be constant for all images resampled from the same source, and so is unlikely to be a good discriminator between resamplers. Oakley and Cunningham [1990, p. 191] claim that their accuracy metric is closely related to the visual quality of the results. They propose to present a future paper discussing the relationship between the theoretical and visual performance of the optimal reconstruction filter generated using this accuracy metric.

To give some idea of the relative merits of various possible measures, we performed experiments comparing five different quality measures, on six different images, using four different resamplers, over a range of rotations from zero to forty-five degrees. The first quality measure was the root mean square difference between a doubly rotated image and the original; it is a valid measure here because there is no scale change in the transform. The other four were calculated from the central portion of a single, resampled image. These were determined as follows.

Three were measures of image entropy. One was the entropy itself, defined as:

H = −Σ_{i=0}^{N−1} p_i ln p_i

where p_i is the probability that a pixel will have intensity i and N is the number of intensity levels. Entropy was chosen because it is often proposed as a measure of the amount of information in an image. It is debatable how well this information measure relates to image quality.
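This first entropy measure can be computed directly from the image's intensity histogram. A minimal sketch, using the natural logarithm as in the definition above:

```python
import math
from collections import Counter

def entropy(pixels):
    """First-order entropy H = -sum_i p_i ln p_i over intensity levels."""
    counts = Counter(pixels)
    n = len(pixels)
    # p_i is the fraction of pixels with intensity i; zero-count levels
    # contribute nothing and never appear in the histogram.
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

A constant image has entropy 0; a two-level image with equal populations has entropy ln 2, which is why the two-level test images described below have very low entropies.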
What it does measure is the minimum number of bits per pixel required to encode the image, with no relationships between adjacent pixels taken into consideration.

Figure 7.11: The six images used in the quality measure experiment. See text for a description of the images.

The other two entropy measures do take adjacent pixel values into consideration. One was the entropy of the differences between a pixel's intensity and each of its four-connected neighbours' intensities; this we termed the 'difference entropy'. The third entropy measure was second order entropy, which uses the probability that a pixel will be a certain intensity given the intensity of one of its four-connected neighbours. The hypothesis in choosing these measures is that it is not so much the absolute intensity of each pixel that is important in image quality, but the relative intensities; the eye is more sensitive to intensity differences (for example: edges) than to absolute intensities.

The fifth potential quality measure that was tested was developed from the same hypothesis. The concept behind it is to take the weighted root mean square of the difference between each pixel and every other pixel:

I = √[ (Σ_i Σ_{j≠i} (f_i − f_j)² w_{i,j}) / (Σ_i Σ_{j≠i} w_{i,j}) ]

The weights, w_{i,j}, were set to be the inverse square of the distance between the two pixels. The measure we actually implemented in our experiment is an approximation to this, which considers only a pixel's eight-connected neighbours. A low value of this measure should indicate a smooth image and a high value a heavily corrugated one.

Figure 7.11 shows the six images used in the experiment. Images 1 and 6 have only two intensity levels (a dark and a light grey, not black and white) and hence very low entropies. Images 2 and 3 are extracts from natural images, one of a log cabin and one of a woman with some party decorations in the background.
Image 4 is extracted from a piece of computer art with many grey levels and some patterned areas. Image 5 is a computer generated sine wave, sampled very close to its Nyquist limit. It is thus a pathological case, for which we would expect any resampler to fail except one using sinc reconstruction.

Four resamplers were tested, all using single point sampling. The reconstructors involved were nearest-neighbour, linear interpolation, Catmull-Rom cubic interpolation, and FFT/Catmull-Rom interpolation, which approximates sinc reconstruction. The results of the experiments are graphed in appendix C.

The visual evidence indicates that the Catmull-Rom and FFT methods give similar results, both producing acceptable resampled (rotated) images; the linear interpolant often gives blurry results, and the nearest-neighbour method gives results which usually contain obvious jaggy artifacts, most noticeable along edges.

Analysis of the results in appendix C gives some interesting information about these resamplers and the images they produce:

1. Apart from pathological image 5, the RMS difference between original and doubly resampled images remains almost constant for the linear, Catmull-Rom, and FFT methods for all images at all rotation angles. This indicates that there is something inherent in these methods which causes the differences, rather than an angle-dependent factor. For nearest-neighbour the RMS difference climbs as the angle increases, indicating that, as the angle increases, more and more pixels are lost or moved to the wrong location by this method.

2. The nearest-neighbour method preserves all the entropy measures, some extremely precisely, even in cases where the other methods produce widely different values.
This is presumably because nearest-neighbour does not produce any new (interpolated) pixel values, but merely moves the old ones to new locations, and adjacent pixels in the original image usually remain adjacent in the resampled image.

3. The other three methods increase the entropy of the image. Image 1 is an interesting case here. In order to produce a visually acceptable resampled image, the resamplers produce an image with far more entropy than the original. The Catmull-Rom and FFT methods depend on the side lobe sharpening effect to produce sharp edges, requiring a far larger range of pixel values than the original, which simply had sharp edges: two very different ways of producing the same visual result. This data implies that more 'information' is required to represent a resampled image than is required to represent the original. This is not an intuitive result, as no resampling process can increase the amount of real information in an image; in fact such processes tend to lose information. This data could thus mean that a resampler compensates for this lost information by producing an image which generates the correct visual stimulus in a more complex way than the original.

4. The other two entropy measures also increase with the linear, Catmull-Rom, and FFT methods. The exceptions to this are the two natural images (2 and 3) with linear interpolation, where the entropy reduces. This may be due to linear interpolation's tendency to blur (smooth) an image, reducing the differences between adjacent pixels; but, if this is so, why is the same effect not apparent in the artificial images?

5. Of all the entropy measures, the FFT result is always higher than the linear result except in the case of image 5's second order entropy, and the Catmull-Rom result is higher than linear in all but image 3's entropy, where they are the same. The FFT and Catmull-Rom methods are not as consistently ordered: in images 2, 3, 4 and 6 the FFT value is always higher than Catmull-Rom's.
In images 1 and 5 it is not. Quite how this relates to subjective image quality has yet to be tested. We suspect that there is a relation, but a probabilistic rather than a consistent one.

6. The final 'smoothness' measure gives interesting insights. The nearest-neighbour method produces an image less smooth than (higher value), or similarly smooth to, the original. Linear and Catmull-Rom give smoother images, perhaps indicating that they blur the original in the resampling. Linear consistently produces a smoother image than Catmull-Rom. Catmull-Rom, in turn, consistently produces a smoother image than the FFT method. The most surprising result is that the FFT method gives almost exactly the same value for the resampled image as for the original in three of the six images. The consistent ordering provided by this measure, except in the nearest-neighbour case, gives some hope that it may be a step toward a useful quality measure. Further work is required to confirm or contradict this view.

While these results are interesting, they have not given us a conclusive quality measure. It may yet prove possible to find a good univariate measure.

7.4 Summary

In a resampler, it is the reconstructor and sampler which cause degradation in the image. For transforms which minify the image, good sampling is required; for transforms which magnify, good reconstruction is required; and for same-scale transforms, the quality of both reconstructor and sampler have little effect on the overall quality of the resampler, except that skeleton, delta function, or nearest-neighbour reconstruction cause obvious degradation. If the required transform is known before the original image is captured, then the image should be captured slightly larger than the final image will be, in order to ensure that no magnification is required. This will maximise the information in the resampled image.
When subjectively evaluating visual reconstructor quality, the image should be magnified using that reconstructor before being displayed on a display device. This minimises the effect of the sampler and of the display on the displayed intensity surface.

Subjective image quality is difficult to evaluate mathematically, especially as a univariate measure is required for measurements on any resampler which involves a scale change. We experimented with four univariate measures, and commented on our results. However, we could draw no definite conclusions. Further work is required on these measures, especially the last 'smoothness' measure, in order to discover their true utility.

Chapter 8

Summary & Conclusions

We have examined many facets of digital image resampling, concentrating on the reconstruction sub-process. In this final chapter we summarise the main results and conclusions, briefly mention other results, and end with three suggestions for future research.

8.1 Major results

In the area of piecewise local polynomials we have shown that zero-phase, piecewise local quadratics can be derived. Of the two that we derived, one gives practically identical visual quality to the Catmull-Rom cubic interpolant, which is the best all-round piecewise local cubic interpolant. This quadratic can be evaluated in about 60% of the time of the cubic, giving a significant time saving in situations where we require the Catmull-Rom's visual quality.

Moving to non-local reconstructors, we have derived the kernels of the interpolating cubic B-spline and the interpolating Bezier cubic, and shown them to be the same. This kernel has allowed us to create an efficient way of extending the one-dimensional cubic B-spline to a two-dimensional reconstructor.

For edge extension, we concluded that replication or windowed extrapolation are the best methods, with constant extension the fastest.
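The edge-extension schemes referred to above can be illustrated in one dimension. A sketch under assumed semantics (replication repeats the nearest edge pixel, constant fills with a fixed background value, reflection mirrors the image about its edges; windowed extrapolation is omitted, as it depends on the chosen window):

```python
def extend(image, index, mode="replication", constant=0.0):
    """Sample a 1-D image at any integer index, extending past the edges.

    The mode semantics here are assumptions for illustration, not the
    dissertation's exact definitions.
    """
    n = len(image)
    if 0 <= index < n:
        return image[index]
    if mode == "constant":
        return constant
    if mode == "replication":
        return image[0] if index < 0 else image[n - 1]
    if mode == "reflection":
        # Half-sample mirroring, period 2n: ..., img[1], img[0] | img[0], img[1], ...
        i = index % (2 * n)
        return image[i] if i < n else image[2 * n - 1 - i]
    raise ValueError("unknown edge extension mode: " + mode)
```

Reflection in this half-sample form is the extension implied by DCT-based reconstruction, mentioned below.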
With sinc reconstruction, constant edge extension or windowed extrapolation are the only practicable edge extension methods, giving an O(N⁴) resampling time for an N × N image. However, FFT reconstruction reduces this to O(N² log N), with replication edge extension.

We have provided solutions to two major problems with FFT reconstruction: (1) edge effects, for which four solutions were presented; the fourth, DCT reconstruction, uses reflection edge extension and paves the way for us to show when and how a general transform can be used for image reconstruction; (2) time and storage requirements, for which a tiling solution was developed. The tiled version greatly reduces the memory required for performing the transform and, when the untiled version must be executed using virtual memory, the tiled version can reduce the time requirement as well. The fact that the tiled version gives very similar visual results to the untiled version gives weight to the hypothesis that local, rather than global, information is important in image reconstruction.

In a priori knowledge reconstruction, we considered one particular set of one-dimensional intensity curves, coupled with one particular type of sampling. Here we proved that, given certain conditions on the original intensity curve, it can be exactly reconstructed from its exact area samples. These intensity curves cannot be exactly reconstructed using classical sampling theory. For the two-dimensional case the evidence strongly suggests that a similar proof can be found.

Finally, drawing the strands together, we showed that the quality of the resampler is not dependent solely on the quality of its reconstructor, but on all three sub-processes. We have performed some experimental work on measuring the quality of an image, which is a step in the search for a univariate quality measure for images.
8.2 Other results

An analysis of the factors involved in designing piecewise local polynomials has been given, showing that it is very difficult to include all of the desirable factors in a single piecewise local polynomial reconstructor.

An analysis of several infinite reconstructors has shown that all decay exponentially, except sinc, allowing them to be truncated locally with negligible loss of quality. Sinc can be windowed locally, greatly changing its nature, or non-locally, which does not solve the problem of undesirable ripples being generated near edges in non-bandlimited images. We have proven the result that no finite image can be bandlimited.

The three-part decomposition developed in chapter 1 has underpinned all of the other work in this dissertation. We believe it has shown its utility throughout this work.

8.3 Future work

We could extend the work on piecewise local polynomials to fifth and higher orders, to ascertain whether such extension is worthwhile. Our opinion is that there is probably little to gain from it, and that, to achieve better reconstruction, methods other than piecewise local polynomials must be considered.

The work on the two-dimensional sharp edge reconstructor in section 6.4 stopped short of a working implementation. Development of such an implementation, following the outline given in section 6.4, is a necessary piece of future work.

Finally, more analysis of the quality measures examined in section 7.3.4 is required. Special attention should be paid to the 'smoothness' measure, to ascertain its utility.

Appendix A

Literature Analysis

Many resamplers have been considered over the years. This appendix contains a table of the resamplers considered in many of the papers in the field. Each set of resamplers is decomposed into its three constituent parts, as described in chapter 1 (figure 1.14).
We have deliberately omitted the books and surveys on the subject, as these consider a very large number of resamplers and draw heavily on the papers listed here. Specifically omitted are Heckbert [1986a], Heckbert [1989], Wolberg [1990] and Foley et al [1990].

Andrews and Patterson [1976]
  Reconstructors: Nearest-neighbour; linear; approximating quadratic B-spline; approximating cubic B-spline; four-piece piecewise linear.
  Transforms: Scaling ×16. Further experiments with nearest-neighbour and linear reconstruction only: scaling ×8, ×4, ×2 and ×1.
  Samplers: Single point.

Bier and Sloan [1986]
  Reconstructors: Don't say.
  Transforms: Mapping a two-dimensional image onto a three-dimensional surface via an intermediate simple three-dimensional surface (plane, box, cylinder, or sphere).
  Samplers: Don't say.

Blinn [1989b]
  Reconstructors: Nearest-neighbour; linear; Gaussian; sinc; Lanczos windowed sinc.
  Transforms: None (frequency domain analysis of reconstructors).
  Samplers: None (frequency domain analysis of reconstructors).

Boult and Wolberg [1992]
  Reconstructors: Linear; interpolating cubic B-spline; Catmull-Rom; cubic interpolant (a = −1); quadratic APK reconstructor based on rectangular and Gaussian area samplers; two piece cubic APK reconstructor based on rectangular area sampler.
  Transforms: Scaling ×8.
  Samplers: Single point.

Braccini and Marino [1980]
  Reconstructors: Nearest-neighbour.
  Transforms: Rotation; scaling; general affine; non-affine; four-point transformations.
  Samplers: Single point.

Catmull and Smith [1980]
  Reconstructors: Don't say (but see e.g. Smith [1981, 1982]).
  Transforms: Two-pass, one-dimensional rotation; rational linear; bilinear.
  Samplers: Don't say (but see e.g. Smith [1981, 1982]).

Crow [1984]
  Reconstructors: Linear.
  Transforms: Arbitrary.
  Samplers: Exact area sampling over a rectangle in the original image space, this area based on the size and position of the final pixel inverse mapped into original image space.
Reference Durand [1989] Durand and Faguy [1990] Fant [1986] Feibush, Levoy and Cook [1980] Fournier and Fiume [1988] Fraser [1987, 1989a, 1989b] Fraser, Schowengerdt and Briggs [1985] Reconstructors Nearest-neighbour on an abutting circular pixel grid, zero between the circles. Nearest-neighbour; quadratic and cubic B-splines. Nearest-neighbour; nearest-neighbour/linear crossbred. Linear. Rotation. Samplers Exact area on an abutting circular pixel grid. Scaling by a rational factor. Exact area. Arbitrary 1D transforms in multi-pass operations. Perspective. Exact area. Piecewise polynomials (and arbitrary reconstructors). Conformal; perspective. FFT/Catmull-Rom with and without windowing; Catmull-Rom; cubic interpolation with a = 0 and a = −1; linear. Nearest-neighbour; Catmull-Rom. Rotation (one pass 2D and two pass 1D). Piecewise polynomial approximations to various sampling filters; specifically Gaussian, sinc and difference of Gaussian. Single point; sinc filtering. Frederick and Schwartz [1990] see Williams [1983]. Glassner [1986] Nearest-neighbour. Transforms Two-pass, one-dimensional, separable operations: specifically rotation and third-order polynomial warping. Also the transpose operation required between the two passes. Conformal mapping. Either (a) directly (if the mapping is invertible), or (b) approximately via triangulation and two-dimensional linear interpolation over the triangles. Arbitrary (none specifically analysed). Single point, exact area, Gaussian. Single point. see Williams [1983]. An approximation to exact area sampling.
Compares nearest-neighbour, linear, interpolating cubic B-spline and approximating cubic B-spline for magnification. Compares cubic B-spline with sinc for minification. Linear; 2-, 4- and 8-point cubic splines. Catmull-Rom cubic interpolation (but discusses the theory of perfect reconstruction). Transforms Correction of image deformation using a Bezier patch method (local scaling is ×0.97–×1.03 (±3%), thus little change in scale; shearing is also involved but no transformation is of high magnitude). Perspective. One-pass (2D), two-pass (1D) and three-pass (1D) rotation. Three-dimensional affine transformations (but discusses two-dimensional and more complex three-dimensional transformations). Arbitrary. Scaling ×r, r ∈ R (in software). Scaling ×M/N, M, N ∈ Z (in hardware). Samplers Single point. Gaussian. Single point. Single point (but discusses pre-filtering). Various area samplers over a rectangle in the original image space, this area based on the size and position of the final pixel inverse mapped into original image space. Single point. Reference Keys [1981] Lee [1983] Maeland [1988] Maisel and Morden [1981] Mitchell and Netravali [1988] Namane and Sid-Ahmed [1990] Oakley and Cunningham [1990] Reconstructors Linear and Catmull-Rom (also briefly looks at nearest-neighbour; sinc; six-point cubic and piecewise cubic Lagrange interpolant). Linear; interpolating cubic B-spline and restoring cubic B-spline (also discusses nearest-neighbour, Catmull-Rom, and sinc). Interpolating cubic B-spline; linear; cubic convolution with a = 0 (two point cubic), a = −1/2 (Catmull-Rom), a = −3/4 (also mentions a = −1). Sinc. Two parameter family of cubic reconstructors; 30 unit wide windowed sinc. Find the border pixels of the character, scale these pixels, interpolate between the scaled points and fill in the character. A matched APK reconstructor based on a Gaussian area sampler; compared against Catmull-Rom cubic and nearest-neighbour.
Transforms Scaling ×5 15/32 (horizontally), ×5 1/4 (vertically) for linear and Catmull-Rom reconstruction. Samplers Single point. Scaling ×z, only integral values of z are used by Lee but the method is not restricted to integers. Single point. Arbitrary — no specific transformation discussed. Single point. Two pass 1D rotation (frequency domain analysis). Scaling. Single point. Scaling (note that the reconstructor and transform are intimately tied together. All processing is in the discrete domain; no continuous intensity surface is produced in this method). Scaling. Single point (note that this processing is performed on a binary image and produces a binary image). Single point. Single point. Reference Oka, Tsutsui, Ohba, Kurauchi and Tago [1987] Paeth [1986, 1990] Reconstructors Nearest-neighbour. Transforms Block affine approximation to general transformation. Nearest-neighbour. Three pass 1D rotation. Park and Schowengerdt [1982] Park and Schowengerdt [1983] Nearest-neighbour; linear; cubic with a = −1; sinc. Nearest-neighbour; linear; cubic interpolating piecewise polynomials (including Catmull-Rom); sinc. Nearest-neighbour, linear, approximating cubic B-spline, cubic convolution with a = −1/2 (Catmull-Rom) and a = −1 (also mentions a = −3/4). See Schreiber and Troxel [1985]. Mitchell and Netravali's [1988] two parameter family of cubics. Digital filters: Lagrange interpolation; linear interpolation; optimum FIR interpolation. Nearest-neighbour; linear; truncated Gaussian; truncated sharpened Gaussian; truncated sinc; approximating cubic B-spline; two types of sharpened cubic B-spline. FFT; FFT with ARMA algorithm; nearest-neighbour. None (frequency domain analysis of reconstructors). None (frequency domain analysis of reconstructors).
Parker, Kenyon and Troxel [1983]; Ratzel [1980]; Reichenbach and Park [1982]; Schafer and Rabiner [1973]; Schreiber and Troxel [1985] (based on Ratzel [1980]); Smit, Smith and Nichols [1990].

Samplers:
Minimum enclosing rectangle of the pixel preimage (approximating exact-area sampling).
Approximation to exact area (both 1D and 1½D); triangle and higher-order sampling filters.
None (frequency domain analysis of reconstructors).
None (frequency domain analysis of reconstructors).
General two-dimensional transformations; specific examples used are rotation and subpixel translation, applied to image registration. Single point.
None (frequency domain analysis of reconstructors).
Scaling ×M/N, M, N ∈ Z. None (frequency domain analysis of reconstructors). Single point.
Scaling ×5 (also scaling by other factors to find the best sharpened Gaussian prefilter and the best sharpened Gaussian reconstructor). Single point.
Scaling ×2^k, k ∈ Z (with the potential to truncate the result to a sub-image). Single point.

Reference:
Smith [1981]; Smith [1982]; Smith [1987].

Reconstructors:
Sinc.
Catmull-Rom; approximating cubic B-spline; various windowed sincs.
Not stated (but see e.g. Smith [1981, 1982]).
Trussell et al [1988]: inverse sinc.
Turkowski [1990]: nearest-neighbour; linear; two Gaussians; two Lanczos windowed sincs.
Don't say.
Venot et al [1988], Herbin et al [1989].
Ward and Cok [1989]: linear; Catmull-Rom (with and without coefficient bins); Hamming windowed sinc.
Watson [1986]; Weiman [1980]: FFT; nearest-neighbour.

Transforms:
Scaling.
Scaling and arbitrary.
Two-pass; one-dimensional superquadric; bi-cubic and bi-quadratic warps.
Correction of one-dimensional sample position errors.
Scaling ×2, 3, 4; ×1/2, 1/3, 1/4.
Registration of two-dimensional images: translation, rotation, scaling, included in a general transform which also allows for grey-scale manipulation.
Scaling ×3/2, ×2 and ×4 (all three reconstructors); non-separable and separable rotation (Catmull-Rom only).
Scaling.
Rotation, scaling and shearing by multi-pass 1D algorithms.

Samplers:
Single point; sinc-filtered sampling.
None (discussion of reconstruction and transforms only).
Not stated (but see e.g. Smith [1981, 1982]).
Single point.
Single point.
Don't say.
Single point.
FFT.
Exact area.

Reference:
Williams [1983]; Wolberg and Boult [1989].

Reconstructors:
A three-dimensional intensity volume is produced. It is generated by linear interpolation between two-dimensional intensity planes. Each plane is generated by two-dimensional linear interpolation between sample points created by a two-dimensional nearest-neighbour/exact-area resampling (minification by 2^n, n ∈ {0, 1, 2, 3, ...}) of the original image.
See Fant [1986].

Transforms:
General.
Any transformation, although ones with highly non-symmetric (non-square) pixel preimages do not work well.

Samplers:
Single point sampling into the three-dimensional volume using spatial (x and y) coordinates and a "depth" coordinate which is related to the size of the destination pixel mapped onto the original image.
Arbitrary two-pass 1D transformations specified by spatial lookup tables.
See Fant [1986].

Appendix B

Proofs

B.1 The inverse Fourier transform of a box function (section 2.4.1)

s(x) = ∫_{−∞}^{∞} Box(ν) e^{i2πνx} dν
     = ∫_{−ν_b}^{ν_b} e^{i2πνx} dν
     = [e^{i2πνx} / (i2πx)]_{−ν_b}^{ν_b}
     = (1/(i2πx)) (e^{i2πν_b x} − e^{−i2πν_b x})
     = (1/(i2πx)) (cos(2πν_b x) + i sin(2πν_b x) − (cos(2πν_b x) − i sin(2πν_b x)))
     = (1/(i2πx)) (2i sin(2πν_b x))
     = sin(2πν_b x) / (πx)
     = 2ν_b sinc(2ν_b x)

sinc(x) is defined in the literature as either sinc(x) = sin(x)/x or as sinc(x) = sin(πx)/(πx). In this thesis we will stick with the latter.

B.2 The Fourier transform of the sinc function (section 5.1)

The sinc function is defined as:

h(x) = N sin(πN x)/(πN x), N > 0

Its Fourier transform is:

H(ν) = ∫_{−∞}^{∞} h(x) e^{−i2πνx} dx
     = ∫_{−∞}^{∞} (sin(πN x)/(πx)) [cos(2πνx) − i sin(2πνx)] dx
     = (1/π) ∫_{−∞}^{∞} (sin(πN x) cos(2πνx)/x) dx − (i/π) ∫_{−∞}^{∞} (sin(πN x) sin(2πνx)/x) dx
     = (1/π) A(x, ν) − (i/π) B(x, ν)

Let y = πN x.
Thus x = y/(πN) and dx = dy/(πN).

A(x, ν) = ∫_{−∞}^{∞} (sin(πN x) cos(2πνx)/x) dx = ∫_{−∞}^{∞} (sin(y) cos(2νy/N)/y) dy

sin is an odd function, 1/y is an odd function, and cos is an even function; therefore the function that is being integrated is even, and so:

A(x, ν) = 2 ∫_{0}^{∞} (sin(y) cos(2νy/N)/y) dy
        = 0,   |2ν/N| > 1
        = π/2, |2ν/N| = 1
        = π,   |2ν/N| < 1      [Dwight 858.9]

B(x, ν) = ∫_{−∞}^{∞} (sin(πN x) sin(2πνx)/x) dx
        = (1/2) ∫_{−∞}^{∞} (cos(π(N − 2ν)x)/x) dx − (1/2) ∫_{−∞}^{∞} (cos(π(N + 2ν)x)/x) dx      [Dwight 401.07]

Now, cos is an even function and 1/x is an odd function, so the function inside each of the integrals is odd, meaning that each integral equates to zero (∫_{−∞}^{0} (cos kx/x) dx = −∫_{0}^{∞} (cos kx/x) dx). Thus:

B(x, ν) = 0

and therefore:

H(ν) = 1,   |ν| < N/2
     = 1/2, |ν| = N/2
     = 0,   |ν| > N/2

B.3 Derivation of a = −2 + √3 (section 4.5.1)

From Rasala [1990] we know that:

a_k = −f_{m−k}/f_m

where f_i = −4f_{i−1} − f_{i−2}, f_0 = 1, f_1 = −3 or −4.

For i very large:

f_i/f_{i−1} = (−4f_{i−1} − f_{i−2})/f_{i−1}
            = −(4 + f_{i−2}/f_{i−1})
            = −(4 − 1/(4 + f_{i−3}/f_{i−2}))
            = −(4 − 1/(4 − 1/(4 − ···)))

From this we can see that:

lim_{i→∞} (f_{i−1}/f_{i−2} − f_i/f_{i−1}) = 0

and so:

lim_{m→∞} a_1 = lim_{m→∞} (−f_{m−1}/f_m) = lim_{m→∞} 1/(4 + f_{m−2}/f_{m−1}) = 1/(4 − lim_{m→∞} a_1)

which implies that the limit of a_1 as m goes to infinity is:

a_1 = 2 ± √3

Now, the value of a_2 as m goes to infinity can be defined in terms of this limit of a_1:

a_2 = −f_{m−2}/f_m = a_1 × (f_{m−2}/f_{m−1}) = −a_1²  (m → ∞)

A similar argument for the general case as m goes to infinity is:

a_n = −f_{m−n}/f_m = −(f_{m−1}/f_m) × (f_{m−2}/f_{m−1}) × ··· × (f_{m−n}/f_{m−n+1}) = −(−a_1)^n = −a^n  (m → ∞)

where a = −a_1 = −2 ∓ √3. The case a = −2 − √3 gives a divergent series of a_i values, while the case a = −2 + √3 gives a converging series. Thus:

a = −2 + √3

B.4 The Straightness Criteria (section 3.5.3)

For first order continuity: Given m sample points, f_{α−a}, f_{α−a+1}, …, f_α, …
, f_{α+b}, all having the same value, f_α, we want the reconstructed segment dependent on these m values to be f(x) = f_α. Thus:

f(x) = Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j f_α = f_α, ∀t ∈ T

where T = [0, 1) for m even, T = [−1/2, 1/2) for m odd, α = (x − x_0)/Δx and t = (x − x_α)/Δx.

This implies that:

Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j = 1

which gives us the result that:

Σ_{i=−a}^{b} k_{i,j} = 0, ∀j ∈ {1, 2, …, n−1}
Σ_{i=−a}^{b} k_{i,0} = 1

which is equation 3.13.

For second order continuity: Given m sample points, f_{α−a}, f_{α−a+1}, …, f_α, …, f_{α+b}, having value f_{α+i} = f_α + iδ, we want the reconstructed segment dependent on these m values to be f(x) = f_α + tδ. Thus:

f(x) = Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j (f_α + iδ) = f_α + tδ, ∀t ∈ T

with T, α and t as before. This implies that:

f_α + tδ = f_α Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j + δ Σ_{i=−a}^{b} Σ_{j=0}^{n−1} i k_{i,j} t^j

and so:

f_α = f_α Σ_{i=−a}^{b} Σ_{j=0}^{n−1} k_{i,j} t^j      (B.1)

tδ = δ Σ_{i=−a}^{b} Σ_{j=0}^{n−1} i k_{i,j} t^j      (B.2)

Equation B.1 is the same as for first order straightness, and so equation 3.13 holds for second order straightness as well. Equation B.2 reduces to the following results:

Σ_{i=−a}^{b} i k_{i,j} = 0, ∀j ∈ {0, 2, 3, …, n−1}
Σ_{i=−a}^{b} i k_{i,1} = 1

which is equation 3.14.

B.5 Proof that a signal cannot be simultaneously of finite extent in both the frequency domain and the spatial domain (section 4.3)

Assume that f(x) and F(ν) are a Fourier transform pair.
Further assume that f(x) is real, of finite extent, and infinitely piecewise differentiable:

f(x) ∈ R
f(x) = 0, |x| > x_K

and that F(ν) is bandlimited:

F(ν) = 0, |ν| > ν_N

Then:

F(ν) = ∫_{−∞}^{∞} f(x) e^{−i2πνx} dx = ∫_{−x_K}^{x_K} f(x) e^{−i2πνx} dx

Integrating by parts repeatedly [Dwight 79], and writing a = −2πν:

F(ν) = [f(x) e^{−i2πνx}/(−i2πν)]_{−x_K}^{x_K} − (1/(−i2πν)) ∫_{−x_K}^{x_K} f′(x) e^{−i2πνx} dx
     = [f(x) e^{−i2πνx}/(−i2πν)]_{−x_K}^{x_K} − [f′(x) e^{−i2πνx}/(−i2πν)²]_{−x_K}^{x_K} + (1/(−i2πν)²) ∫_{−x_K}^{x_K} f″(x) e^{−i2πνx} dx
     = [e^{iax} (f(x)/(ia) − f′(x)/(ia)² + f″(x)/(ia)³ − f‴(x)/(ia)⁴ + ···)]_{−x_K}^{x_K}
     = [(cos(ax) + i sin(ax)) (f(x)/(ia) − f′(x)/(ia)² + f″(x)/(ia)³ − f‴(x)/(ia)⁴ + ···)]_{−x_K}^{x_K}
     = [cos(ax) φ(a, x)]_{−x_K}^{x_K} + [sin(ax) χ(a, x)]_{−x_K}^{x_K} + i [cos(ax)(−χ(a, x))]_{−x_K}^{x_K} + i [sin(ax) φ(a, x)]_{−x_K}^{x_K}
     = cos(a x_K)(φ(a, x_K) − φ(a, −x_K)) + sin(a x_K)(χ(a, x_K) + χ(a, −x_K))
       + i cos(a x_K)(−χ(a, x_K) + χ(a, −x_K)) + i sin(a x_K)(φ(a, x_K) + φ(a, −x_K))      (B.3)

where:

φ(a, x) = f′(x)/a² − f‴(x)/a⁴ + ···
χ(a, x) = f(x)/a − f″(x)/a³ + ···

Now, if we set equation B.3 to be zero outside the bandlimit of F(ν) (as we assumed) and remember that f(x) is a real function, then we find that:

φ(a, x_K) = −φ(a, −x_K), |a| > 2πν_N
φ(a, x_K) = φ(a, −x_K),  |a| > 2πν_N
χ(a, x_K) = −χ(a, −x_K), |a| > 2πν_N
χ(a, x_K) = χ(a, −x_K),  |a| > 2πν_N

These four equations show that:

φ(a, x_K) = 0,  φ(a, −x_K) = 0,  χ(a, x_K) = 0,  χ(a, −x_K) = 0,  |a| > 2πν_N      (B.4)

Taking φ(a, x_K) in equation B.4 as an example:

φ(a, x_K) = f′(x_K)/a² − f‴(x_K)/a⁴ + f^(V)(x_K)/a⁶ − ··· = 0, |a| > 2πν_N

which gives the following equalities:

f′(x_K) = 0, f‴(x_K) = 0, f^(V)(x_K) = 0, …
This gives us the result:

φ(a, x_K) = 0, ∀a

A similar argument holds for φ(a, −x_K), χ(a, x_K) and χ(a, −x_K). From these and equation B.3 it can be seen that:

F(ν) = 0, ∀ν

which implies that:

f(x) = 0, ∀x

So there exists only one trivial case in which the original assumption holds, that is f(x) = 0 and F(ν) = 0. Therefore the Fourier transform of a real, finite-extent function f(x) cannot be bandlimited, unless f(x) = 0. Further, the Fourier transform of a bandlimited function F(ν) cannot be a real, finite-extent function, unless F(ν) = 0.

B.6 The Fourier transform of a finite-extent comb (section 4.3)

Let the finite-extent, one-dimensional comb function, d(x), be defined as:

d(x) = Σ_{j=−b}^{b} δ(x − jΔx)

Its Fourier transform is:

D(ν) = ∫_{−∞}^{∞} Σ_{j=−b}^{b} δ(x − jΔx) e^{−i2πνx} dx
     = Σ_{j=−b}^{b} e^{−i2πνjΔx}
     = Σ_{j=−b}^{b} (cos(2πνjΔx) − i sin(2πνjΔx))
     = cos 0 + 2 Σ_{j=1}^{b} cos(2πνjΔx)      [sin odd, cos even]
     = 1 + 2 Σ_{j=1}^{b} cos(2πνjΔx)
     = 1 + 2 cos((b+1)πνΔx) sin(bπνΔx)/sin(πνΔx)      [Dwight 420.2]
     = 1 + (sin((2b+1)πνΔx) + sin(−πνΔx))/sin(πνΔx)      [Dwight 401.05]
     = 1 + sin((2b+1)πνΔx)/sin(πνΔx) − 1
     = sin((2b+1)πνΔx)/sin(πνΔx)

In two dimensions:

d(x, y) = d(x) ∗ d(y) = Σ_{j=−b_x}^{b_x} Σ_{k=−b_y}^{b_y} δ(x − jΔx, y − kΔy)

implying that the two-dimensional Fourier transform is:

D(ν_x, ν_y) = D(ν_x) × D(ν_y) = (sin((2b_x+1)πν_x Δx)/sin(πν_x Δx)) × (sin((2b_y+1)πν_y Δy)/sin(πν_y Δy))

B.7 The not-a-knot end-condition for the B-spline (section 4.5)

The cubic B-spline function (equation 4.11) is:

f(x) = Σ_{i=−1}^{2} Σ_{j=0}^{3} k_{i,j} t^j g_{α+i}

The third derivative of this is:

f‴(x) = Σ_{i=−1}^{2} 6 k_{i,3} g_{α+i} = −g_{α−1} + 3g_α − 3g_{α+1} + g_{α+2}

The not-a-knot condition requires f‴(x) to be continuous at x_1 and x_{n−2}. At x_1:

−g_{−1} + 3g_0 − 3g_1 + g_2 = −g_0 + 3g_1 − 3g_2 + g_3
⇒ g_{−1} − 4g_0 + 6g_1 − 4g_2 + g_3 = 0      (B.5)

We know that (equation 4.12):

f_i = (1/6)(g_{i−1} + 4g_i + g_{i+1})

For f_0 we can write 6f_0 = g_{−1} + 4g_0 + g_1.
Subtracting this from equation B.5 gives:

−8g_0 + 5g_1 − 4g_2 + g_3 = −6f_0      (B.6)

For f_2 we can write 6f_2 = g_1 + 4g_2 + g_3. Subtracting this from equation B.6 produces:

−8g_0 + 4g_1 − 8g_2 = −6f_0 − 6f_2      (B.7)

And for f_1 we can write 48f_1 = 8g_0 + 32g_1 + 8g_2. Adding this to equation B.7 gives:

36g_1 = 48f_1 − 6f_0 − 6f_2      (B.8)

Making g_1 the subject of equation B.8 gives us:

g_1 = (1/6)(8f_1 − f_0 − f_2)

QED. A similar derivation holds for g_{n−2}.

B.8 Timing calculations for the B-spline kernel (section 4.5.4)

In the first method we need to sum 113 terms (assuming we adopt the diamond-shaped kernel discussed on page 139). Each term is of the form:

V_i W_j f_{x−i,y−j}

where 15 V_i and 15 W_j values must be precalculated as follows:

V_i = v_{i,3} r³ + v_{i,2} r² + v_{i,1} r + v_{i,0}
W_j = w_{j,3} t³ + w_{j,2} t² + w_{j,1} t + w_{j,0}

where i and j range from −7 to 7, the v_{i,n} and w_{j,n} are retrieved from a table, and the rⁿ and tⁿ terms are precalculated. From this we find that we require 320 multiplies, 202 adds, 120 table lookups and 113 function value lookups for each intensity surface value calculation.

Using the second method, each coefficient is calculated by summing 113 products of a table lookup (a^(|i|+|j|)) and a function value lookup (f_{x−i,y−j}). This makes the requirement 113 multiplies, 112 adds, 113 table lookups and 113 function value lookups, per coefficient.

These values are laid out in table B.1, from which we can conclude that the first method's calculation takes 2 to 3 times as long as the second method's; thus 2c < k_1 < 3c.

Operation                 Method 1   Method 2   M1/M2
Multiplies                320        113        3×
Adds                      202        112        2×
Table lookups             120        113        1×
Function value lookups    113        113        1×

Table B.1: Timing calculations for the two ways of implementing the interpolating cubic B-spline.

B.9 The error in edge position due to quantisation (section 6.3.5)

The edge's position is given by:

x_m = c + (ĝ_{c+1} − ĝ_c)/(ĝ_{c+1} − ĝ_{c−1})

(equation 6.7).
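As a concrete illustration, equation 6.7 and the effect of quantisation on it can be simulated directly. The sketch below (Python) assumes a round-to-nearest quantiser with N grey levels; the grey values g_l and g_r and the sub-pixel edge position lam are illustrative choices, not values taken from the dissertation.

```python
# Sketch of the sub-pixel edge estimator of equation 6.7, with and without
# grey-level quantisation.  The quantiser and the sample values are
# illustrative assumptions, not taken from the dissertation.

def quantise(g, n_levels):
    """Round-to-nearest quantisation of g in [0, 1] to n_levels grey levels."""
    return round((n_levels - 1) * g) / (n_levels - 1)

def edge_position(g_prev, g_curr, g_next, c=0):
    """Equation 6.7: x_m = c + (g_{c+1} - g_c) / (g_{c+1} - g_{c-1})."""
    return c + (g_next - g_curr) / (g_next - g_prev)

g_l, g_r = 0.2, 0.9   # grey levels to the left and right of the edge
lam = 0.3             # true sub-pixel position of the edge within pixel c = 0

# Pixel c straddles the edge, so its value mixes the two grey levels;
# its neighbours see pure grey levels.
samples = (g_l, g_r + lam * (g_l - g_r), g_r)

exact = edge_position(*samples)                              # recovers lam
noisy = edge_position(*(quantise(g, 256) for g in samples))  # 256 grey levels

print(exact, noisy)
```

With these values the unquantised estimate recovers lam to floating-point accuracy, while the 8-bit quantised estimate is perturbed by an amount within the ±1/(Δg(N−1)) bound derived in this section (about 0.006 with Δg = 0.7 and N = 256).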
Without loss of generality, take c = 0:

x_m = (ĝ_1 − ĝ_0)/(ĝ_1 − ĝ_{−1})      (B.9)

With quantisation, the values of ĝ_{−1}, ĝ_0, and ĝ_1 in equation B.9 are quantised:

ĝ_{−1} = Q(g_l)
ĝ_0 = Q(g_r + (g_l − g_r))
ĝ_1 = Q(g_r)

The quantisation function rounds to the nearest of the N representable levels:

Q(g) = round((N − 1)g)/(N − 1)

The error caused by this is ±1/(2(N−1)), so:

Q(g) = g ± 1/(2(N−1))      (B.10)

Substituting equation B.10 into equation B.9 gives:

x_m = [g_r ± 1/(2(N−1)) − (g_r + (g_l − g_r) ± 1/(2(N−1)))] / [g_r ± 1/(2(N−1)) − (g_l ± 1/(2(N−1)))]
    = ((g_r − g_l) ± 1/(N−1)) / (g_r − g_l ± 1/(N−1))
    = (Δg ± 1/(N−1)) / (Δg ± 1/(N−1))      (B.11)

where Δg = g_r − g_l. If Δg ≫ 1/(N−1) then equation B.11 becomes:

x_m ≈ (Δg ± 1/(N−1))/Δg = 1 ± 1/(Δg(N−1))

As x_m is an approximation, the error in x_m due to quantisation is ±1/(Δg(N−1)), provided Δg ≫ 1/(N−1).

Appendix C

Graphs

The graphs are unavailable in this online version of the report owing to the way in which they were produced. This appendix contains the results used in section 7.3.4.

There are two sets of experimental results. The first consists of three graphs showing the root mean square difference between an image and its doubly resampled image, for a range of magnification factors (each image is magnified by this factor and then minified by the inverse of the factor). The three images used are reproduced in figure C.1.

The second set of results consists of thirty graphs. There are five sets of graphs, one for each of the quality measures investigated. Each set contains six graphs, one for each of the six images involved in the experiments. These images are reproduced in figure 7.11. Note that the fifth measure, called 'smoothness' in section 7.3.4, is here called 'bumpiness'.

Figure C.1: The three images used in the first experiment.

Appendix D

Cornsweet's figures

This appendix should have contained two photocopies of Cornsweet's [1970] figures 11.2 and 11.3. Unfortunately, we were unable to get permission to copy these figures before the dissertation was bound.
Instead we present approximations to Cornsweet's figures, which attempt to illustrate the same effect. The two images in each photograph (figures D.1 and D.3) are generated from different intensity distributions (shown in figure D.2) but, in each case, they give approximately the same visual sensation.

Figure D.1: Two intensity curves which produce roughly the same response from the human visual system. The profiles of the edges in the two images are shown in figure D.2(a) and (b) (left and right images respectively). Note that, in the right-hand image, the intensity at the centre of the stripe is the same as the intensity at the left and right edges of the image. (After Cornsweet [1970] fig 11.2.)

Figure D.2: The intensity profiles of the edges: (a) and (b) in figure D.1; (c) and (d) in figure D.3.

Figure D.3: Two intensity curves which produce roughly the same response from the human visual system. The profiles of the 'edges' in the two images are shown in figure D.2(c) and (d) (left and right images respectively). The left-hand image has no edges, while the right has slowly changing edges one quarter and three quarters of the way across the image. Thus, in the right-hand image, the intensity at the centre of the image is not the same as the intensity at the left and right edges, even though it looks virtually the same. (After Cornsweet [1970] fig 11.3.)

Glossary

Approximant: a reconstructor that does not necessarily interpolate the image samples. Compare interpolant.

Area sampler: a sampler which uses intensity surface values over some area to provide a value for each pixel in the image. Compare point sampler.

Decomposition: dividing a process into sub-processes in order to better understand the process. In this dissertation decomposition is applied to the image resampling process, notably decomposing it into three sub-processes: reconstruction, transform, and sampling.
Destination image: the image that is output by a resampling process; also called resampled image or final image.

Filter: in an area sampler there is an equivalence that allows us to represent the sampler as a filter combined with single point sampling, such that the filter is the function that is convolved with the intensity surface before single point sampling the filtered surface. Some point samplers can also be represented in this way. Some reconstructors can be represented by filters, so that the reconstructed intensity surface that they produce is the convolution of the filter with the image. Also called kernel or 'filter kernel'.

Final image: the image that is output by a resampling process; also called destination image or resampled image.

Final intensity surface: the intensity surface generated by transforming the original intensity surface. The final image is generated from it by sampling.

Hedgehog: an omnivorous mammal renowned for its prickliness.

Image: in this dissertation 'image' almost exclusively refers to a discrete image. A continuous image is known as an intensity surface.

Image samples: the sample values that make up a discrete image. Also called pixels or pixel values.

Intensity surface: a continuous surface in three dimensions, with two spatial dimensions and one intensity dimension. There is a many-to-one mapping from two-dimensional spatial coordinates to intensity values. Can be extrapolated to an intensity volume (three spatial dimensions and one intensity dimension) or to a colour surface (where colour dimensions replace the single intensity dimension). Also known as 'continuous image'.

Interpolant: a reconstructor that interpolates the image samples. Compare approximant.

Kernel: an abbreviation of 'filter kernel' (see filter).

Original image: the image that is input to the resampling process; also called source image.

Original intensity surface: the intensity surface generated from the original image by reconstruction.
Pixel: (a) a single sample value in an image; (b) the theoretical area represented by such a sample value in an intensity surface which generates, or is generated by, that image. Also called 'pel'.

Point sampler: a sampler which only uses intensity surface values at a countable set of points to provide a value for each pixel in the image. Compare area sampler.

Reconstruction: applying a reconstructor to an image to produce an intensity surface.

Reconstructor: a process which reconstructs a continuous intensity surface from a discrete image.

Resample: to take a discrete image and produce another image which appears to be a spatially transformed version of the first.

Resampled image: the image that is output by a resampling process; also called final image or destination image.

Resampler: a process which performs image resampling.

Sampler: a process which produces a discrete image from a continuous intensity surface.

Sampling: applying a sampler to an intensity surface to produce an image.

Single point sampler: a point sampler which uses a single intensity surface value to provide a value for each pixel.

Source image: the image that is input to a resampling process; also called original image.

Sub-process: a process that forms part of a larger process. Sub-processes are generated by decomposition of some process. Also called a 'part' or 'stage' of the larger process.

Transform: a process which takes a continuous intensity surface and produces another, transformed, intensity surface from it.

Warp: a synonym for resample, used more in computer graphics work, whilst 'resample' is used more in signal processing work.

Bibliography

The reference style used in this dissertation follows these rules:

1. A reference is given by the author's name, or authors' names, followed by the date of publication and sometimes extra information as to the location within the quoted work of the particular item referenced.

2.
If the context requires a direct reference to the paper then the name(s) are placed in the text, with the other material placed in square brackets immediately after the name(s). If an indirect reference is required then all the information appears inside square brackets.

3. A work with three or more authors should be referenced by all the authors' names the first time it is referred to. Subsequent references use the first author's name followed by "et al" in place of all the other names.

This last rule has been bent in the case of Venot, Devaux, Herbin, Lebruchec, Dubertret, Raulo and Roucayrol [1988] and Herbin, Venot, Devaux, Walter, Lebruchec, Dubertret and Roucayrol [1989]. These are always referred to as Venot et al [1988] and Herbin et al [1989] respectively.

To aid those working at the University of Cambridge, there is a reference mark at the end of many of the entries in this bibliography which shows where the referenced work may be found. Abbreviations used in these references are:

CLL Computer Laboratory Library
SPL Scientific Periodicals Library
UL University Library

Adobe [1985a] PostScript Language Reference Manual, Addison-Wesley, Reading, Massachusetts, USA, ISBN 0–201–10174–2.

Ahmed, N., Natarajan, T. and Rao, K.R. [1974] "Discrete cosine transform", IEEE Transactions on Computers, Vol. C–23, No. 1, Jan, 1974, pp.90–93. CLL

Andrews, H.C. and Patterson, C.L. (III) [1976] "Digital interpolation of discrete images", IEEE Transactions on Computers, Vol. C–25, No. 2, Feb, 1976, pp.197–202. CLL

Bier, E.A. and Sloan, K.R. (Jr) [1986] "Two-part texture mappings", IEEE Computer Graphics and Applications, Sept, 1986, pp.40–53. CLL

Binkley, T. [1990] "Digital dilemmas", LEONARDO, Digital Image—Digital Cinema Supplemental Issue, 1990, pp.13–19. CLL

Bley, H. [1980] "Digitization and segmentation of circuit diagrams", Proceedings of the First Scandinavian Conference on Image Analysis, Studentlitteratur, Lund, Sweden, 1980, pp.264–269.
UL 348:8.c.95.1213

Blinn, J.F. [1989a] "What we need around here is more aliasing", IEEE Computer Graphics and Applications, Jan, 1989, pp.75–79. CLL

Blinn, J.F. [1989b] "Return of the jaggy", IEEE Computer Graphics and Applications, Mar, 1989, pp.82–89. CLL

Blinn, J.F. [1990] "Wonderful world of video", IEEE Computer Graphics and Applications, Vol. 10, No. 3, May, 1990, pp.83–87. SPL E0.2.96

Boult, T.E. and Wolberg, G. [1992] "Local image reconstruction and sub-pixel restoration algorithms", CVGIP: Image Understanding, in press.

Braccini, C. and Marino, G. [1980] "Fast geometrical manipulations of digital images", Computer Graphics and Image Processing, Vol. 13, 1980, pp.127–141. SPL E0.2.13

Brewer, J.A. and Anderson, D.C. [1977] "Visual interaction with Overhauser curves and surfaces", Computer Graphics, Vol. 11, No. 2, Summer, 1977, pp.132–137.

Brown, E.F. [1969] "Television: the subjective effects of filter ringing transients", Journal of the Society of Motion Picture and Television Engineers (SMPTE), Vol. 78, No. 4, Apr, 1969, pp.249–255. SPL E0.2.24

Catmull, E. [1974] "Computer display of curved surfaces", Proceedings of the Conference on Computer Graphics, Pattern Recognition and Data Structure, Los Angeles, California, May 14–16, 1974, IEEE, New York, 1975, IEEE Catalogue Number 75CHO981–1C, pp.11–17. Also reprinted as: Tutorial and Selected Readings in Interactive Computer Graphics, H. Freeman (ed), IEEE, 1980, IEEE Catalogue Number EHO 156–0, pp.309–315.

Catmull, E. and Rom, R. [1974] "A class of local interpolating splines", Computer Aided Geometric Design, R.E. Barnhill and R.F. Riesenfeld (eds.), Academic Press, New York, ISBN 0–12–079050–5, pp.317–326.

Catmull, E. and Smith, A.R. [1980] "3-D transformations of images in scanline order", Computer Graphics, Vol. 14, No. 3, 1980, pp.279–285.

Chen, W-H. and Pratt, W.K. [1984] "Scene adaptive coder", IEEE Transactions on Communications, Vol. COM–32, No. 3, Mar, 1984, pp.225–232. CLL

Chiou, A.E.T.
and Yeh, P. [1990] "Scaling and rotation of optical images using a ring cavity", Applied Optics, Vol. 29, No. 11, 10th Apr, 1990, pp.1584–1586. SPL F1.2.3

Chu, K-C. [1990] "B3-splines for interactive curve and surface fitting", Computers and Graphics, Vol. 14, No. 2, 1990, pp.281–288. SPL E0.1.34

Cook, R.L. [1986] "Stochastic sampling in computer graphics", ACM Transactions on Graphics, Vol. 5, No. 1, Jan, 1986, pp.51–72. CLL

Cornsweet, T.N. [1970] Visual Perception, Academic Press, 1970, Library of Congress Catalogue Card No. 71–107570. Physiological Lab. Library C6.143A

Crow, F.C. [1977] "The aliasing problem in computer-generated shaded images", Communications of the ACM, Vol. 20, No. 11, Nov, 1977, pp.799–805. CLL

Crow, F.C. [1978] "The use of grayscale for improved raster display of vectors and characters", Computer Graphics, Vol. 12, No. 3, Aug, 1978, pp.1–5. CLL

Crow, F.C. [1984] "Summed-area tables for texture mapping", Computer Graphics, Vol. 18, No. 3, Jul, 1984, pp.207–212. CLL

Cumani, A., Grattoni, P. and Guiducci, A. [1991] "An edge-based description of colour images", CVGIP: Graphical Models and Image Processing, Vol. 53, No. 4, Jul, 1991, pp.313–323. CLL

Data Beta [1992] Product Description; Data Beta Ltd, Unit 7, Chiltern Enterprise Centre, Theale, Berkshire, United Kingdom RG7 4AA.

Durand, C.X. [1989] "Bit map transformations in computer 2-D animation", Computers and Graphics, Vol. 13, No. 4, 1989, pp.433–440. SPL E0.1.34

Durand, C.X. and Faguy, D. [1990] "Rational zoom of bit maps using B-spline interpolation in computerized 2-D animation", Computer Graphics Forum, Vol. 9, No. 1, Mar, 1990, pp.27–37. CLL

Dwight, H.B. [1934] Tables of Integrals and other Mathematical Data, The MacMillan Company, New York, 1934. CLL X14

Euclid [300BC] Elements.

Fang, Z., Li, X. and Ni, L.M. [1989] "On the communication complexity of generalised 2-D convolution on array processors", IEEE Transactions on Computers, Vol. 38, No. 2, Feb, 1989, pp.184–194.
CLL

Fant, K.M. [1986] "A non-aliasing, real-time spatial transform technique", IEEE Computer Graphics and Applications, Jan, 1986, pp.71–80. SPL E0.2.96

Fiume, E.L. [1986] "A mathematical semantics and theory of raster graphics", PhD thesis, technical report CSRI–185, Computer Systems Research Group, University of Toronto, Toronto, Canada M5S 1A1. CLL V106–669 Published, with minor additions, as Fiume [1989].

Fiume, E.L. [1989] The Mathematical Structure of Raster Graphics, Academic Press, San Diego, 1989, ISBN 0–12–257960–7. Based on Fiume [1986].

Feibush, E.A., Levoy, M. and Cook, R.L. [1980] "Synthetic texturing using digital filters", Computer Graphics, Vol. 14, 1980, pp.294–301.

Foley, J.D. and van Dam, A. [1982] Fundamentals of Interactive Computer Graphics, Addison-Wesley, Reading, Massachusetts, USA, 1982, ISBN 0–201–14468–9.

Foley, J.D., van Dam, A., Feiner, S.K. and Hughes, J.F. [1990] Computer Graphics: Principles and Practice ('second edition' of Foley and van Dam [1982]), Addison-Wesley, Reading, Massachusetts, USA, 1990, ISBN 0–201–12110–7.

Fournier, A. and Fiume, E.L. [1988] "Constant-time filtering with space-variant kernels", Computer Graphics, Vol. 22, No. 4, Aug, 1988, pp.229–238. CLL

Fournier, A. and Fussell, D. [1988] "On the power of the frame buffer", ACM Transactions on Graphics, Vol. 7, No. 2, Apr, 1988, pp.103–128. CLL

Fraser, D. [1987] "A conceptual image intensity surface and the sampling theorem", Australian Computer Journal, Vol. 19, No. 3, Aug, 1987, pp.119–125. SPL E0.13.3

Fraser, D. [1989a] "Interpolation by the FFT revisited — an experimental investigation", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 5, May, 1989, pp.665–675. SPL E0.2.34

Fraser, D. [1989b] "Comparison at high spatial frequencies of two-pass and one-pass geometric transformation algorithms", Computer Vision, Graphics and Image Processing, Vol. 46, No. 3, Jun, 1989, pp.267–283. SPL E0.2.34

Fraser, D., Schowengerdt, R.A. and Briggs, I.
[1985] "Rectification of multichannel images in mass storage using image transposition", Computer Vision, Graphics and Image Processing, Vol. 29, No. 1, 1985, pp.23–56. SPL E0.2.13

Fredrick, C. and Schwartz, E.L. [1990] "Conformal image warping", IEEE Computer Graphics and Applications, Vol. 10, No. 2, Mar, 1990, pp.54–61. SPL E0.2.96

Gardner, I.C. and Washer, F.E. [1948] "Lenses of extremely wide angle for airplane mapping", Journal of the Optical Society of America, Vol. 38, No. 5, May, 1948, pp.421–431. SPL F1.2.1

Gibbs, A.R. [1992] Personal communication.

Glassner, A. [1986] "Adaptive precision in texture mapping", Computer Graphics, Vol. 20, No. 4, Aug, 1986, pp.297–306. CLL

Glassner, A.S. [1990b] "Interpretation of texture map indices", Graphics Gems, A.S. Glassner (ed.), Academic Press Ltd, San Diego, California, USA, 1990, ISBN 0–12–286165–5.

Gonzalez, R.C. and Wintz, P. [1977] Digital Image Processing, Addison-Wesley, 1977, ISBN 0–201–02596–5. CLL U135

Goshtasby, A. [1989] "Correction of image deformation from lens distortion using Bezier patches", Computer Vision, Graphics and Image Processing, Vol. 47, 1989, pp.385–394. SPL E0.2.13

Greene, N. and Heckbert, P.S. [1986] "Creating raster omnimax images from multiple perspective views using the elliptical weighted average filter", IEEE Computer Graphics and Applications, Vol. 6, No. 6, Jun, 1986, pp.21–27. CLL

Guibas, L.J. and Stolfi, J. [1982] "A language for bitmap manipulation", ACM Transactions on Graphics, Vol. 1, No. 3, Jul, 1982, pp.191–214.

Hall, A.D. [1989] "Pipelined logical convolvers for binary picture processing", Electronics Letters, Vol. 25, No. 15, 20 Jul, 1989, pp.961–963. SPL E0.1.36

Hall, A.D. [1991] Personal communication.

Hall, E.L. [1979] Computer image processing and recognition, Academic Press, New York, USA, 1979, ISBN 0–12–318850–4. CLL 1Y88

Hammerin, M. and Danielsson, P-E. [1991] "High accuracy rotation of images", Contributions to the SSAB symposium, March, 1991, Report No.
LiTH–ISY–I–1161, Department of Electrical Engineering, Linköping University, S–58183 Linköping, Sweden; pp.1–4.

Hanrahan, P. [1990] "Three-pass affine transforms for volume rendering", Computer Graphics, Vol. 24, No. 5, Nov, 1990, pp.71–78. CLL

Heckbert, P.S. [1986a] "Survey of texture mapping", IEEE Computer Graphics and Applications, Nov, 1986, pp.56–67. CLL

Heckbert, P.S. [1986b] "Filtering by repeated integration", Computer Graphics, Vol. 20, No. 4, 1986, pp.315–321. CLL

Heckbert, P.S. [1989] "Fundamentals of texture mapping and image warping", Master's thesis, report number UCB/CSD 89/516, Computer Science Division (EECS), University of California, Berkeley, California 94720, USA; June, 1989.

Heckbert, P.S. [1990a] "What are the coordinates of a pixel?", Graphics Gems, A.S. Glassner (ed.), Academic Press, 1990, ISBN 0–12–286165–5, pp.246–248.

Heckbert, P.S. [1990b] "Texture mapping", lecture presented to the ACM/BCS International Summer Institute, "State of the art in computer graphics", 2–6 July, 1990, Edinburgh, U.K.

Heckbert, P.S. [1991a] "Review of 'Digital Image Warping' [Wolberg, 1990]", IEEE Computer Graphics and Applications, Jan, 1991, pp.114–116. CLL

Heckbert, P.S. [1991b] Response to Wolberg's [1991] and McDonnell's [1991] replies to Heckbert's [1991a] review of Wolberg's [1990] book ("Letters to the Editor" column), IEEE Computer Graphics and Applications, May, 1991, p.5. CLL

Herbin, M., Venot, A., Devaux, J.Y., Walter, E., Lebruchec, J.F., Dubertret, L. and Roucayrol, J.C. [1989] "Automated registration of dissimilar images: application to medical imagery", Computer Vision, Graphics and Image Processing, Vol. 47, No. 1, Jul, 1989, pp.77–88. SPL E0.2.13

Horn, B.K.P. [1977] "Understanding image intensities", Readings in Computer Vision, M.A. Fischler and O. Firschein (eds.), Morgan Kaufmann Publishers Inc., 95 First Street, Los Altos, California 94022, USA, 1987, ISBN 0–934613–33–8. Originally in Artificial Intelligence, Vol. 8, No. 2, 1977, pp.201–231.
Hou, H.S. and Andrews, H.C. [1978] “Cubic splines for image interpolation and digital filtering”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP–26, No. 6, Dec, 1978, pp.508–517. SPL E0.2.34
Hyde, P.D. and Davis, L.S. [1983] “Subpixel edge estimation”, Pattern Recognition, Vol. 16, No. 4, 1983, pp.413–420. SPL E0.0.1
Kajiya, J.T. and Kay, T.L. [1989] “Rendering fur with three-dimensional textures”, Computer Graphics, Vol. 23, No. 3, Jul, 1989, pp.656–657. CLL
Keys, R.G. [1981] “Cubic convolution interpolation for digital image processing”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP–29, No. 6, Dec, 1981, pp.1153–1160. SPL E0.2.34
Kiryati, N. and Bruckstein, A. [1991] “Gray levels can improve the performance of binary image digitizers”, CVGIP: Graphical Models and Image Processing, Vol. 53, No. 1, Jan, 1991, pp.31–39. CLL
Kitchen, L.J. and Malin, J.A. [1989] “The effect of spatial discretization on the magnitude and direction response of simple differential operators on a step edge”, Computer Vision, Graphics and Image Processing, Vol. 47, No. 2, Aug, 1989, pp.234–258. SPL E0.2.13
Klassman, H. [1975] “Some aspects of the accuracy of the approximated position of a straight line on a square grid”, Computer Graphics and Image Processing, Vol. 4, No. 3, Sept, 1975, pp.225–235. SPL E0.2.13
Kornfeld, C.D. [1985] “Architectural elements for bitmap graphics”, PhD thesis, technical report CSL–85–2, June, 1985, Xerox Corporation, Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304, USA, 1985. CLL V106–900
Kornfeld, C. [1987] “The image prism: a device for rotating and mirroring bitmap images”, IEEE Computer Graphics and Applications, Vol. 7, No. 5, 1987, pp.21–30. CLL
Kuhnert, K-D. [1989] “Sensor modelling as a basis of subpixel image processing”, Proceedings of the SPIE, Vol. 1135, “Image Processing III”, chair and editor: J.F. Duverney; 27–28 April, 1989, Paris, France, pp.104–111.
SPL F1.2.6
Laser-Scan [1989] “VTRAK”, Laser-Scan Laboratories Ltd, Software Product Specification, Jan 12, 1989, issue 1.2. Laser-Scan, Science Park, Milton Rd, Cambridge, CB4 4FY.
Lee, C-H. [1983] “Restoring spline interpolation of CT images”, IEEE Transactions on Medical Imaging, Vol. MI–2, No. 2, Sept, 1983, pp.142–149.
Lynn, P.A. and Fuerst, W. [1989] Introductory digital signal processing with computer applications, John Wiley and Sons, Ltd, Chichester, 1989, ISBN 0–471–91564–5.
Mach, E. [1865] “On the visual sensations produced by intermittent excitations of the retina”, Philosophical Magazine, Vol. 30, No. 2, Fourth Series, 1865, pp.319–320. SPL A0.1.2
Maeland, E. [1988] “On the comparison of the interpolation methods”, IEEE Transactions on Medical Imaging, Vol. 7, No. 3, Sept, 1988, pp.213–217. SPL E0.2.103
Maisel, J.E. and Morden, R.E. [1981] “Rotation of a two-dimensional sampling set using one-dimensional resampling”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. ASSP–29, No. 6, Dec, 1981, pp.1218–1222. SPL E0.2.34
Mantas, J. [1987] “Methodologies in pattern recognition and image analysis — a brief survey”, Pattern Recognition, Vol. 20, No. 1, 1987, pp.1–6. SPL E0.0.1
Marion, A. [1991] An introduction to image processing, Chapman and Hall, 2–6 Boundary Row, London, 1991, ISBN 0–412–37890–6.
Marr, D. [1976] “Early processing of visual information”, Philosophical Transactions of the Royal Society of London, Vol. 257, Series B, 1976, pp.483–524.
Martin, G.E. [1982] Transformation Geometry, Springer-Verlag, New York, USA, 1982, ISBN 0–387–90636–3.
McDonnell, M. [1991] “Response to Heckbert’s [1991a] review of Wolberg’s [1990] book”, “Letters to the Editor” column, IEEE Computer Graphics and Applications, May, 1991, pp.4–5.
Mersereau, R.M. [1979] “The processing of hexagonally sampled two-dimensional signals”, Proceedings of the IEEE, Vol. 67, No. 6, Jun, 1979, pp.930–949. CLL
Mitchell, D.P. and Netravali, A.N.
[1988] “Reconstruction filters in computer graphics”, Computer Graphics, Vol. 22, No. 4, Aug, 1988, pp.221–228. CLL
Murch, G.M. and Weiman, N. [1991] “Gray-scale sensitivity of the human eye for different spatial frequencies and display luminance levels”, Proceedings of the Society for Information Display, Vol. 32, No. 1, 1991, pp.13–17.
Nagioff, O. [1991] Personal communication.
Namane, A. and Sid-Ahmed, M.A. [1990] “Character scaling by contour method”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, No. 6, Jun, 1990, pp.600–606. SPL E0.2.98
Newman, W. and Sproull, R. [1979] Principles of Interactive Computer Graphics (second edition), McGraw-Hill, 1979, ISBN 0–07–046338–7.
NMFPT [1992] National Museum of Film, Photography, and Television, Bradford, West Yorkshire, personal communication.
Oakley, J.P. and Cunningham, M.J. [1990] “A function space model for digital image sampling and its application in image reconstruction”, Computer Graphics and Image Processing, Vol. 49, No. 2, Feb, 1990, pp.171–197. SPL E0.2.13
Oka, M., Tsutsui, K., Ohba, A., Kurauchi, Y. and Tago, T. [1987] “Real-time manipulation of texture-mapped surfaces”, Computer Graphics, Vol. 21, No. 4, Jul, 1987, pp.181–188. CLL
Olsen, J. [1990] “Smoothing enlarged monochrome images”, Graphics Gems, A.S. Glassner (ed.), Academic Press Ltd, San Diego, CA, 1990, ISBN 0–12–286165–5, pp.166–170.
Oskoui-Fard, P. and Stark, H. [1988] “Tomographic image reconstruction using the theory of convex projections”, IEEE Transactions on Medical Imaging, Vol. 7, No. 1, Mar, 1988, pp.45–58. SPL E0.2.103
Overhauser, A.W. [1968] “Analytical definition of curves and surfaces by parabolic blending”, Mathematical and Theoretical Sciences Department, Scientific Laboratory, Ford Motor Company, Report No. SL–68–40, 1968.
Paeth, A.W. [1986] “A fast algorithm for general raster rotation”, Proceedings, Graphics Interface ’86/Vision Interface ’86, 26–30 May, 1986, pp.77–81. CLL 1Y712
Paeth, A.W.
[1990] “A fast algorithm for general raster rotation”, Graphics Gems, A.S. Glassner (ed.), Academic Press, 1990, ISBN 0–12–286165–5.
Park, S.K. and Schowengerdt, R.A. [1982] “Image sampling, reconstruction and the effect of sample scene phasing”, Applied Optics, Vol. 21, No. 17, 1 Sept, 1982, pp.3142–3151. SPL F1.2.3
Park, S.K. and Schowengerdt, R.A. [1983] “Image reconstruction by parametric cubic convolution”, Computer Vision, Graphics and Image Processing, Vol. 23, No. 3, Sept, 1983, pp.258–272. SPL E0.2.13
Parker, J.A., Kenyon, R.V. and Troxel, D.E. [1983] “Comparison of interpolation methods for image resampling”, IEEE Transactions on Medical Imaging, Vol. MI–2, No. 1, Mar, 1983, pp.31–39.
Pavlidis, T. [1982] Algorithms for Graphics and Image Processing, Springer-Verlag, 1982. CLL 1Y138
Pavlidis, T. [1990] “Re: Comments on ‘Stochastic Sampling in Computer Graphics’ [Cook, 1986]”, Letter to the Editor, ACM Transactions on Graphics, Vol. 9, No. 2, Apr, 1990, pp.233–326. CLL
Petersen, D.P. and Middleton, D. [1962] “Sampling and reconstruction of wave-number-limited functions in N-dimensional Euclidean spaces”, Information and Control, Vol. 5, No. 4, Dec, 1962, pp.279–323. SPL E0.2.121
Pettit, F. [1985] Fourier Transforms in Action, Chartwell-Bratt (Publishing and Training) Limited, 1985, ISBN 0–86238–088–X. CLL 1Y505
Piegl, L. [1991] “On NURBS: a survey”, IEEE Computer Graphics and Applications, Jan, 1991, pp.55–71. CLL
Prasad, K.P. and Satyanarayana, P. [1986] “Fast interpolation algorithm using FFT”, Electronics Letters, Vol. 22, No. 4, 13 Feb, 1986, pp.185–187. SPL E0.1.36
Pratt, W.K. [1978] Digital Image Processing, John Wiley and Sons, New York, USA, 1978, ISBN 0–471–01888–0. UL 9428.c.3814
Pratt, W.K. [1991] Digital Image Processing (second edition), John Wiley and Sons, New York, USA, 1991, ISBN 0–471–85766–0.
Press, W.H., Flannery, B.P., Teukolsky, S.A. and Vetterling, W.T.
[1988] Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1988, ISBN 0–521–35465–X.
Rabbani, M. and Jones, P.W. [1991] Digital Image Compression Techniques, SPIE — The Society of Photo-Optical Instrumentation Engineers, P.O. Box 10, Bellingham, Washington, USA, 1991, ISBN 0–8194–10648–1.
Rasala, R. [1990] “Explicit cubic spline interpolation formulas”, Graphics Gems, A.S. Glassner (ed.), Academic Press, 1990, ISBN 0–12–286165–5.
Ratliff, F. [1965] Mach Bands: Quantitative studies on neural networks in the retina, Holden-Day Inc, San Francisco, USA, 1965. Psychology Lab. Library: C.6.68A
Ratzel, J.N. [1980] “The discrete representation of spatially continuous images”, PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA, Aug, 1980.
Reed, T.R., Algazi, V.R., Ford, G.E. and Hussain, I. [1992] “Perceptually based coding of monochrome and color still images”, Proceedings of the Data Compression Conference, March 24–27, 1992, Snowbird, Utah, USA; J.A. Storer and M. Cohn (eds.), IEEE Computer Society Press, Los Alamitos, California, USA; ISBN 0–8186–2717–4 (case bound); pp.142–151.
Reichenbach, S.E. and Park, S.K. [1989] “Two-parameter cubic convolution for image reconstruction”, Proceedings of the SPIE, Vol. 1199, “Visual Communications and Image Processing IV (1989)”, pp.833–840.
Richards, P.I. [1966] “Sampling positive functions”, Proceedings of the IEEE (Letters), Vol. 54, No. 1, Jan, 1966, pp.81–82. CLL
Robertson, P.K. [1987] “Fast perspective views of images using one-dimensional operations”, IEEE Computer Graphics and Applications, Vol. 7, No. 2, Feb, 1987, pp.47–56. SPL E0.2.96
Rogers, D.F. and Adams, J.A. [1976] Mathematical Elements for Computer Graphics, McGraw-Hill, 1976, ISBN 0–07–053527–2.
Rosenfeld, A. [1988] “Computer vision: basic principles”, Proceedings of the IEEE, Vol. 76, No. 8, Aug, 1988, pp.863–868. CLL
Rosenfeld, A.
and Kak, A.C. [1982] Digital Picture Processing (second edition, two volumes), Academic Press, Orlando, Florida, USA, 1982, ISBN 0–12–597301–2. CLL 1Y213
Rothstein, J. and Weiman, C. [1976] “Parallel and sequential specification of a context sensitive language for straight lines on grids”, Computer Graphics and Image Processing, Vol. 5, No. 1, Mar, 1976, pp.106–124. SPL E0.2.13
van Sant, T. [1992] “The Earth/From Space”, a satellite composite view of Earth, © Tom van Sant/The Geosphere Project, Santa Monica, California, USA, 800 845–1522. Reproduced in Scientific American, Vol. 266, No. 4, April, 1992, p.6.
Schafer, R.W. and Rabiner, L.R. [1973] “A digital signal processing approach to interpolation”, Proceedings of the IEEE, Vol. 61, No. 6, Jun, 1973, pp.692–702.
Schoenberg, I.J. [1973] “Cardinal spline interpolation”, Regional Conference Series in Applied Mathematics, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, Pennsylvania, USA, Vol. 12, 1973.
Schreiber, W.F. and Troxel, D.E. [1985] “Transformation between continuous and discrete representations of images: a perceptual approach”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI–7, No. 2, Mar, 1985, pp.178–186. SPL
Sedgewick, R. [1988] Algorithms (second edition), Addison-Wesley, Reading, Massachusetts, USA, 1988, ISBN 0–201–06673–4.
Shannon, C.E. [1949] “Communication in the presence of noise”, Proceedings of the IRE, Vol. 37, No. 1, Jan, 1949, pp.10–21. SPL E0.2.17
Sloan, A.D. [1992] “Fractal image compression: a resolution independent representation from imagery”, Proceedings of the NASA Space and Earth Science Data Compression Workshop, March 27, 1992, Snowbird, Utah, USA; J.C. Tilton and S. Dolinar (eds.); NASA Conference Publications; to appear.
Smit, T., Smith, M.R. and Nichols, S.T. [1990] “Efficient sinc function interpolation technique for centre padded data”, IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 38, No.
9, Sept, 1990, pp.1512–1517. SPL E0.2.34
Smith, A.R. [1981] “Digital filtering tutorial for computer graphics”, Technical Memo No. 27, Lucasfilm Ltd, 1981 (available from PIXAR, 1001 West Cutting, Richmond, CA 94804, USA).
Smith, A.R. [1982] “Digital filtering tutorial, part II”, Technical Memo No. 44, Lucasfilm Ltd, 10 May, 1982 (available from PIXAR, 1001 West Cutting, Richmond, CA 94804, USA).
Smith, A.R. [1987] “Planar 2-pass texture mapping and warping”, Computer Graphics, Vol. 21, No. 4, Jul, 1987, pp.263–272. CLL
Smith, T.G. [1986] Industrial Light and Magic: the art of special effects, Virgin Publishing, 338 Ladbroke Grove, London, UK, 1986, ISBN 0–86287–142–5.
Sproull, R.F. [1986] “Frame-buffer display architectures”, Annual Review of Computer Science, Vol. 1, 1986, pp.19–46. SPL A0.2.39
Trussell, H.J., Ardner, L.L., Moran, P.R. and Williams, R.C. [1988] “Corrections for nonuniform sampling distributions in magnetic resonance imagery”, IEEE Transactions on Medical Imaging, Vol. 7, No. 1, Mar, 1988, pp.32–44. SPL E0.2.103
TRW [1988] Data Book: Data Converters — DSP Products, TRW LSI Products Inc., P.O. Box 2472, La Jolla, California 92038, USA, 1988.
Turkowski, K. [1986] “Anti-aliasing in topological color spaces”, Computer Graphics, Vol. 20, No. 4, Aug, 1986, pp.307–314. CLL
Turkowski, K. [1990] “Filters for common resampling tasks”, Graphics Gems, A.S. Glassner (ed.), Academic Press, 1990, ISBN 0–12–286165–5.
Venot, A., Devaux, J.Y., Herbin, M., Lebruchec, J.F., Dubertret, L., Raulo, Y. and Roucayrol, J.C. [1988] “An automated system for the registration and comparison of photographic images in medicine”, IEEE Transactions on Medical Imaging, Vol. 7, No. 4, Dec, 1988, pp.298–303. SPL E0.2.103
Vogt, R.C. [1986] “Formalised approaches to image algorithm development using mathematical morphology”, Environmental Research Institute of Michigan, Ann Arbor, Michigan, USA; personal communication, 1986.
Ward, J. and Cok, D.R.
[1989] “Resampling algorithms for image resizing and rotation”, Proceedings of SPIE, Vol. 1075, “Digital Image Processing Applications”, 17–20 Jan, 1989, Y-W. Lin and R. Srinivason (eds.), Los Angeles, California, USA. Also available (with colour plates) as: Technical Report 24800H, Hybrid Imaging Systems Division (PRL-PPG), Kodak Research Laboratories, Eastman Kodak Company, Rochester, New York, 14650–2018, USA. SPL F1.2.6
Waters, R. [1984] “Automated digitising: the ‘remaining’ obstacles”, presented to “Computer Graphics in Mapping and Exploration”, Computer Graphics ’84, Wembley, London, Oct 11, 1984. Laser-Scan, Science Park, Milton Rd, Cambridge, CB4 4FY.
Waters, R. [1985] “LASERTRAK: a vector scanner”, presented at AM/FM Keystone Conference, Aug, 1985. Laser-Scan, Science Park, Milton Rd, Cambridge, CB4 4FY.
Waters, R., Meader, D. and Reinecke, G. [1989] “Data Capture for the Nineties: VTRAK”, Laser-Scan, 12343F Sunrise Valley Drive, Reston, Virginia 22091, USA.
Watson, A.B. [1986] “Ideal shrinking and expansion of discrete sequences”, NASA Technical Memorandum 88202, Jan, 1986, NASA, Ames Research Center, Moffett Field, California 94035, USA.
Weiman, C.F.R. [1980] “Continuous anti-aliased rotation and zoom of raster images”, Computer Graphics, Vol. 14, No. 3, 1980, pp.286–293. SPL
White, B. and Brzakovic, D. [1990] “Two methods of image extension”, Computer Vision, Graphics and Image Processing, Vol. 50, No. 3, Jun, 1990, pp.342–352. SPL E0.2.13
Whittaker, E.T. [1915] “On the functions which are represented by the expansions of the interpolation theory”, Proceedings of the Royal Society of Edinburgh, Vol. 35, Part 2, 1914–1915 session (issued separately July 13, 1915), pp.181–194. SPL A1.1.10
Williams, L. [1983] “Pyramidal parametrics”, Computer Graphics, Vol. 17, No. 3, Jul, 1983, pp.1–11. CLL
Wolberg, G.
[1988] “Geometric transformation techniques for digital images: a survey”, Department of Computer Science, Columbia University, New York, New York 10027, USA; Technical Report CUCS–390–88, Dec, 1988.
Wolberg, G. [1990] Digital Image Warping, IEEE Computer Society Press Monograph, IEEE Computer Society Press, 10662 Los Vaqueros Circle, P.O. Box 3013, Los Alamitos, CA 90720–1264, USA, 1990, ISBN 0–8186–8944–7 (case bound).
Wolberg, G. [1991] “Response to Heckbert’s [1991a] review of Wolberg’s [1990] book”, “Letters to the Editor” column, IEEE Computer Graphics and Applications, May, 1991, p.4. CLL
Wolberg, G. and Boult, T.E. [1989] “Separable image warping with spatial lookup tables”, Computer Graphics, Vol. 23, No. 3, Jul, 1989, pp.369–378. CLL
Woodsford, P.A. [1988] “Vector scanning of maps”, Proceedings of IGC Scanning Conference, Amsterdam, 1988. Laser-Scan, Science Park, Milton Rd, Cambridge, CB4 4FY.
Wüthrich, C.A. and Stucki, P. [1991] “An algorithmic comparison between square and hexagonal based grids”, CVGIP: Graphical Models and Image Processing, Vol. 53, No. 4, Jul, 1991, pp.324–339. CLL
Wüthrich, C.A., Stucki, P. and Ghezal, A. [1989] “A frequency domain analysis of square- and hexagonal-grid based images”, Raster Imaging and Digital Typography, Proceedings of the International Conference, Ecole Polytechnique Fédérale, Lausanne, October, 1989; André, J. and Hersch, R.D. (eds.); Cambridge University Press, UK, 1989; ISBN 0–521–37490–1, pp.261–270.
