Instituto de Sistemas e Robótica, Instituto Superior Técnico, Av. Rovisco Pais, 1, 1096 Lisboa Codex, Portugal

SMART - Semi-Autonomous Monitoring and Robotics Technologies
Workshop Notes, Lisbon, 27-28 April 1995, Portugal

The ISR Multi-Degrees-of-Freedom Active Vision Robot Head: Design and Calibration

Jorge Batista, Jorge Dias, Helder Araujo, A. T. Almeida
Institute of Systems and Robotics (ISR), and Electrical Engineering Department, University of Coimbra, 3000 Coimbra - PORTUGAL
phone: +351-39-34876/34884; fax: +351-39-35672; e-mail: [email protected]

Abstract

At present, there is a growing trend in computer vision to consider the visual system in the context of the behaviour of a robot interacting with a dynamic environment. The Active Vision approach has shown that constraints derived from camera motion can replace other assumptions that had previously been employed to solve mathematically ill-posed problems. Experiments in active vision require the ability to manipulate the visual parameters. The central issue in developing a Multi-Degrees-of-Freedom (MDOF) Active Vision Robot Head is the design strategy of the system. Bringing all the issues of such a system and their solutions together and building a head-eye system with a reasonable performance vs. cost is an engineering problem. In this work, this problem was formulated as: "how should a head-eye system be designed, what are the design criteria, how and in accordance with what strategy should the head be designed and controlled, and what kinds of degrees of freedom must be included?". To be able to effectively use multi-degree-of-freedom (MDOF) camera systems we need to know how variations in the camera's control parameters are going to cause changes in the produced images.
For this we need to have good mathematical models describing the relationships between the control parameters and the parameters of the resulting images, i.e., we need to calibrate the system. In this paper, we present methods and algorithms for camera calibration of a head-eye system developed at the ISR - Coimbra. The calibration of the system includes two parts: the camera calibration problem, i.e., the calibration of the intrinsic and extrinsic parameters of the camera, and the so-called kinematic calibration, which calibrates the relationships (rotation and translation) between the different coordinate systems. In this paper only the camera calibration problem is addressed. A method for computing the camera parameters by tracking features in the image while the camera undergoes pure rotation is used.

I - INTRODUCTION

Experiments in active vision require the ability to manipulate the visual parameters. This ability and the associated issues form the subject of this paper. The central issue in developing a Multi-Degrees-of-Freedom (MDOF) Active Vision Robot Head is the design strategy of the system. Bringing all the issues of such a system and their solutions together and building a head-eye system with a reasonable performance vs. cost is an engineering problem. The design of a MDOF head-eye system for active vision depends on what we put into this notion. First of all, an active vision system is not just an optomechanical device feeding a computer and carrying out the commands from the computer. The degree of integration is crucial for such a system, and of course so are the issues of real-time processing and control. These factors determine the behaviour one can obtain. The more elaborately the visual system reacts to the surrounding environment, the more evolved the primary tasks will be. The nature of the visual processes we want to integrate is related to the architecture chosen.
Much work has been done in developing vision systems to study how these and other features of the human visual system are used to facilitate perception. One important aspect in the design stage of these robotic systems is the performance they should achieve. The analysis of some characteristics of the human active visual system can be useful for determining performance requirements for the velocity and acceleration of a mechanical device that intends to simulate the behaviour of the human visual system. To be able to effectively use multi-degree-of-freedom (MDOF) camera systems we need to know how variations in the camera's control parameters are going to cause changes in the produced images. For this we need to have good mathematical models describing the relationships between the control parameters and the parameters of the resulting images, i.e., we need to calibrate the system.

In modeling systems, and especially active vision robot systems, it helps to have an understanding of the underlying physical processes involved. If the processes are straightforward, or if they are accessible and can be directly measured, then the modeling task can be greatly simplified. This is the case for the extrinsic parameters of the camera calibration model, such as rotation and translation. If instead the underlying physical process is complex or inaccessible, then the modeling task can be much more difficult. This is the case for the camera's optical degrees of freedom, such as focal length, focus distance and aperture. In modeling MDOF optical systems where there is a limited understanding of the underlying physical process involved, extensive calibration is often necessary to develop the model and to collect calibration data. Until recently, most of the existing algorithms for camera calibration required the use of predefined patterns for calibration.
This is difficult to achieve and typically requires special objects, made and measured to high precision, to be placed in front of the cameras. For this reason a lot of emphasis has been placed on algorithms that do not require any camera calibration, and on camera calibration techniques that allow the camera to calibrate itself as it moves in an unstructured world. "Eyes in humans and other animals do not need any artificial assistance for calibration". The answer to the above observation comes from the fact that eye movements simplify the calibration problem. It would be a great advantage to be able to calibrate a camera using only feature coordinates in the image plane. This cannot be done with a single image; instead it requires camera motion. Methods for computing the camera parameters by tracking features in the image, without using special patterns for calibration, have been developed by Faugeras [31], Hartley [26], Dron [18], McLauchlan [24], Brady [5], Stein [6] and Basu [2]. An important aspect of the calibration of MDOF optical systems is the fact that the parameters change from time to time, which requires either real-time calibration of the parameters or a pre-calibration of these parameters to build up a look-up table.

We based our approach on the use of feature correspondences from a set of images where the camera has undergone pure rotation. The basic idea of this method is very simple. Given any pair of images obtained by a camera which undergoes pure rotation, if the intrinsic camera parameters and the angle and axis of rotation are known, one can compute where the feature points from one image will appear in the second image. If there is an error in the intrinsic parameters, the features in the second image will not coincide with those computed from the first image. The closer the features are located to the camera, the more important it is that the camera does not undergo any translation during the rotation.
A method to ensure that the axis of rotation passes close to the center of projection (front nodal point in a thick lens model) is presented. We obtain the camera's physical parameters by first correcting the distortion in the image plane and then applying the pure rotation calibration method to obtain the camera's parameters. A method based on the cross-ratio invariance is used to obtain the point of best radial distortion symmetry, and the first coefficient of the radial distortion is obtained using the principle of collinearity of three collinear projected points.

To be able to adjust the intrinsic parameters in real time, we model the underlying behaviour of the camera for a large group of different setups, changing the intrinsic parameters over these setups. We use bivariate cubic polynomials to model the relationships between the camera's control parameters and the parameters of the resulting images, such as image magnification, focus distance, optical center adjustment, etc. The calibration involves performing a least-squares fit of the model to the collected data to determine the coefficients of best fit for the bivariate polynomials. For the extrinsic parameters (pose estimation of the camera), and since the movements of the head-eye system are controlled, we know how much it has been moved relative to some initial position. With this procedure, the calibration of the extrinsic parameters must only be done at the initial position, and real-time calibration of the extrinsic parameters can be performed.

In this paper we describe the design and calibration of a MDOF Active Vision system head developed at the ISR Coimbra. The head has 16 degrees of freedom: 6 degrees of freedom for each eye (pan, tilt, zoom, focus, aperture, and optical center adjustment), the baseline degree of freedom and three degrees of freedom for the neck (swing, pan and tilt).
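The look-up-table modelling just described (bivariate cubic polynomials fitted by least squares) can be sketched as follows. This is a minimal modern illustration in NumPy, not the original ISR software; the zoom, focus and magnification values are synthetic and purely illustrative.

```python
import numpy as np

def bivariate_cubic_design(u, v):
    """Design matrix of all monomials u**i * v**j with i + j <= 3 (10 terms)."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    cols = [u ** i * v ** j for i in range(4) for j in range(4 - i)]
    return np.stack(cols, axis=-1)

def fit_bivariate_cubic(u, v, w):
    """Least-squares fit of w ~ p(u, v); returns the 10 coefficients."""
    A = bivariate_cubic_design(u, v)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(w, float), rcond=None)
    return coeffs

def eval_bivariate_cubic(coeffs, u, v):
    return bivariate_cubic_design(u, v) @ coeffs

# Synthetic example: image magnification as a function of hypothetical
# normalized zoom and focus motor positions.
rng = np.random.default_rng(0)
zoom = rng.uniform(0.0, 1.0, 200)
focus = rng.uniform(0.0, 1.0, 200)
mag = 1.0 + 2.5 * zoom + 0.3 * zoom * focus - 0.1 * focus ** 2
c = fit_bivariate_cubic(zoom, focus, mag)
pred = eval_bivariate_cubic(c, 0.5, 0.5)   # -> 2.3 (exact for this surface)
```

Since the synthetic surface lies in the span of the cubic basis, the fit recovers it exactly; with real head data the residual of the least-squares fit indicates whether a bivariate cubic is a rich enough model for that optical parameter.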
II - THE ISR-COIMBRA MDOF ACTIVE VISION HEAD

Active vision systems are often modeled on attributes of the human visual system, since this is the most well-studied visual system. The human oculomotor system is one of the best known functions of our brain. Two attributes of the human vision system, ocular motion and foveal-peripheral vision, are essential to human visual perception. Ocular motion allows movements of the eyes to direct the view point of the visual system. Foveal-peripheral vision enables humans to perceive small regions in fine detail in combination with a wide field of view at coarse detail. Taking advantage of the ocular visual system, the human head also has the capability of changing gaze and fixating on features of the environment. The combination of these (human) capabilities was our main purpose for building this robotic head.

Ignoring the dependencies enforced by the particular model of the oculomotor system, the degrees of freedom in the human head can be listed as follows:

Eyes-mechanical. Each eye has three degrees of freedom (a total of six degrees of freedom), which are:
- superior-inferior (tilt)
- lateral-medial (pan)
- cyclotorsion (swing about the optical axis)

Neck-mechanical. The neck has three degrees of freedom, which are:
- tilt
- pan
- swing or lateral tilt movement

Eyes-optical. Each eye has two degrees of freedom (a total of four), which are:
- accommodation
- iris manipulation

Table 1. Human active visual system characteristics

                                       Eye Pan   Eye Tilt   Neck Pan   Neck Tilt
  Range of Motion                      ±45°      ±45°       ±80°       +90° up, -60° down
  Peak Acceleration                    35000 deg/s²
  Peak Velocity                        600 deg/s
  Interocular distance                 ~64.0 mm
  Foveal-peripheral resolution ratio   10:1

The design of a MDOF head-eye system for active vision depends on what we put into this notion. First of all, an active vision system is not just an optomechanical device feeding a computer and carrying out the commands from the computer.
The degree of integration is crucial for such a system, and of course so is the issue of real-time processing and control. These factors determine the behaviour one can obtain. The work presented in this paper is part of a project in which the control issue was, for the first time, addressed in such a way that all dependencies in the different possible actuation schemes of the system are defined by the organization of the control system. The problem in this work was formulated as: "how should a head-eye system be designed, what are the design criteria, how and in accordance with what strategy should the head be designed and controlled, and what kinds of degrees of freedom must be included?".

Our main purpose for building this vision robot head was not just to have an active vision system with the basic degrees of freedom (vergence, pan and tilt) to perform tracking or to be used on mobile platforms, but also to have a device with which we can study and simulate some of the human visual behaviours. This active vision robot head was designed taking into account some of the geometric aspects of the human head, such as the fact that the human head has a translation between the neck pan rotation axis and the location of the eyes. These aspects will give us the chance to analyse what kind of visual information is processed by the human brain. We also included some optical degrees of freedom that human beings do not have, such as the capability of changing zoom. One important aspect in the design stage of these robotic systems is the performance they should achieve. The analysis of some characteristics of the human active visual system can be useful for determining performance requirements for the velocity and acceleration of a mechanical device that intends to simulate the behaviour of the human visual system.
As an example, the human eye executes a saccadic motion that has an initial acceleration of up to 30000°/s², which is then followed by a smaller deceleration that terminates the movement with a slight overshoot. The amplitude of a saccade of the human eye rarely reaches above 15°, at which point the acceleration of the motion saturates at 35000°/s². Saccades of amplitude larger than 20° involve both eye motion and head motion, and move at even higher speeds. The receptors in the fovea centralis of the human eye are significantly more dense than at the periphery, giving a resolution ratio between fovea and periphery of about 10:1. The fovea centralis, which presents a much higher resolution, plays a critical role in object discrimination, identification and manipulation. Some characteristics of the human visual system are summarized in table 1, with the information presented in the table obtained from Carpenter [25], Yarbus [1], Geiger and Yuille [4], Webb Associates [33] and Wurth and Goldberg [30].

Fig. 1. ISR MDOF Active Vision Head

Table 2. ISR MDOF active vision system characteristics

               Precision      Range              Velocity
  Neck Pan     0.0036°        [-110°..+110°]     ~360°/s
  Neck Swing   0.0036°        [-27.5°..+27.5°]   ~360°/s
  Neck Tilt    0.0036°        [-32°..+32°]       ~360°/s
  Eye Pan      0.0036°        [-45°..+45°]       ~360°/s
  Eye Tilt     0.0031°        [-20°..+20°]       ~330°/s
  OCA¹         8 nm           [0..80] mm         ~1 mm/s
  Baseline     20 nm          [137..287] mm      ~5 mm/s
  Zoom         Range/60000    [12.5..75] mm      ~1.8 × Range/s
  Aperture     Range/75000    [1.2..16] mm       ~1.75 × Range/s
  Focus        Range/60000    [1..∞] mm          ~1.8 × Range/s

¹OCA: Optical Center Adjustment

Compared with the degrees of freedom of the human head described above, the ISR MDOF active vision system has the following degrees of freedom (see fig. 1):

Eyes-mechanical. Each eye has three degrees of freedom (a total of six), which are:
- elevation (tilt)
- azimuth (pan)
- cyclotorsion²

Neck-mechanical. The neck has three degrees of freedom:
- tilt
- pan
- swing or lateral tilt movement

Eyes-optical. Each eye has three degrees of freedom³ (a total of six), which are:
- focusing
- zooming
- iris manipulation

Baseline. The ability to mechanically change the distance between the two eyes.

²An additional degree of freedom is included to keep the optical center at the crosspoint of the azimuth and elevation axes of the lens.

This MDOF active vision robot head is probably the head that currently has the most degrees of freedom. In addition to the common degrees of freedom of camera heads (pan, tilt and independent vergence for each of the eyes), this head includes the swing movement of the head neck, cyclotorsion of the lenses and the ability to adjust the optical center of the lenses. The latter is to ensure pure rotation when verging the cameras and to compensate for the translation of the optical center when changing the focal length of the lens. Most of the existing heads use standard motorized lenses with presets for feedback information. These lenses have the disadvantage of moving too slowly for real-time accommodation purposes (5-6 seconds for a full-range movement), and the accuracy for position control is not very good due to the type of information they provide as feedback. New motorized lenses are now being developed which will enable this head to accommodate the optical system in almost real time, with very good precision (more than 30000 discrete positions for the full range of each optical degree of freedom). Some characteristics of the ISR MDOF active vision system are summarized in table 2.

³A new motorized lens is being developed at the ISR, based on DC motors and using incremental encoders as feedback. It will have the ability to work almost in real time, with full-range travel in less than half a second for each degree of freedom.

III - OPTICAL CALIBRATION: Mathematical Background

A. The Perspective Projection Camera Model

In the following, the underlying camera model is described briefly.
We assume the existence of a Cartesian coordinate system C_c, centered at the optical center of the camera lens, with the lens viewing down the Z_c axis. The image plane is located at a distance f in front of the optical center and is orthogonal to the Z_c axis. The x_i axis of the image coordinate system I_c is parallel to the X_c axis of C_c, and the y_i axis of I_c is parallel to the Y_c axis of C_c (see fig. 2). The overall transformation from a 3D camera point P_c(x_c, y_c, z_c) to a computer image frame buffer point p_f(x_f, y_f) can be decomposed into the following steps:

1. Projection from the 3D camera coordinate point P_c(x_c, y_c, z_c) to the ideal image coordinate point p_u(x_u, y_u), using perspective projection with pin-hole camera geometry:

   x_u = f·x_c/z_c,    y_u = f·y_c/z_c.    (1)

2. Off-the-shelf cameras and lenses which are often used for computer vision have large amounts of lens distortion, in particular wide-angle lenses. The most important types of lens distortion are the radial and decentering distortion, and the model for these distortions, mapping the distorted image coordinates, which are observable, to the undistorted (or ideal) image coordinates, which are not physically measurable, is

   x_u = x_d + δ_x,    y_u = y_d + δ_y,    (2)

where δ_x and δ_y are the corrections for lens distortion and are given by

   δ_x = x_d(K1·r_d² + K2·r_d⁴ + ...) + [P1(r_d² + 2x_d²) + 2P2·x_d·y_d][1 + P3(r_d² + ...)]    (3)
   δ_y = y_d(K1·r_d² + K2·r_d⁴ + ...) + [P2(r_d² + 2y_d²) + 2P1·x_d·y_d][1 + P3(r_d² + ...)]    (4)

and where r_d² = x_d² + y_d². The first term in equations 3 and 4 is the radial distortion, with parameters K1 and K2, and the second term is the decentering (or tangential) distortion, with parameters P1, P2 and P3. Decentering distortion is due to misalignment of the lens elements and to the non-perpendicularity of the lens assembly and the image plane. This distortion is much less important than the radial distortion.
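The distortion model of equations 2-4 can be written down directly. The sketch below is a modern illustration, with coefficient values chosen arbitrarily for the example, not measured from the ISR cameras.

```python
def lens_distortion(xd, yd, K1, K2=0.0, P1=0.0, P2=0.0, P3=0.0):
    """Corrections (delta_x, delta_y) of equations 3-4, mapping distorted
    image coordinates (xd, yd) toward the undistorted ones."""
    r2 = xd ** 2 + yd ** 2
    radial = K1 * r2 + K2 * r2 ** 2
    dx = xd * radial + (P1 * (r2 + 2 * xd ** 2) + 2 * P2 * xd * yd) * (1 + P3 * r2)
    dy = yd * radial + (P2 * (r2 + 2 * yd ** 2) + 2 * P1 * xd * yd) * (1 + P3 * r2)
    return dx, dy

def undistort(xd, yd, **params):
    """Equation 2: xu = xd + delta_x, yu = yd + delta_y."""
    dx, dy = lens_distortion(xd, yd, **params)
    return xd + dx, yd + dy

xu, yu = undistort(1.0, 0.0, K1=0.01)   # pure radial case -> (1.01, 0.0)
```

For a point on the x axis only the radial term acts, which makes the first-order effect easy to check by hand.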
An alternative solution to the decentering distortion can be obtained using just the radial distortion, supposing that the optical axis of the lens is not perpendicular to the image plane. With this assumption, the optical axis will no longer pass through the principal point (which is the point where a line starting at the center of projection intersects the image plane perpendicularly), but through another point which will be called the point of best radial symmetry (c_xr, c_yr). A coordinate shift to the point of best radial symmetry is performed by translating the distorted image point p_d, substituting x'_d = x_d - c_xr and y'_d = y_d - c_yr, where x'_d and y'_d are the coordinates of a point p'_d.

Fig. 2. Pin-Hole Camera Model, with the point of best radial symmetry

Taking only the radial distortion terms of equations 3 and 4 and using the points p'_d instead of the points p_d, the lens distortion is now defined by

   δ_x = x'_d(K1·r'_d² + K2·r'_d⁴ + ...)    (5)
   δ_y = y'_d(K1·r'_d² + K2·r'_d⁴ + ...)    (6)

where, as before, r'_d² = x'_d² + y'_d². Taking only the first coefficient of the radial distortion, K1, and expanding equations 5 and 6, we get

   δ_x = K1·x_d·r_d² - K1[c_xr(r_d² + 2x_d²) + 2c_yr·x_d·y_d] + K1[x_d·c_yr² + 3x_d·c_xr² + 2y_d·c_xr·c_yr - c_xr³ - c_xr·c_yr²]    (7)
   δ_y = K1·y_d·r_d² - K1[c_yr(r_d² + 2y_d²) + 2c_xr·x_d·y_d] + K1[y_d·c_xr² + 3y_d·c_yr² + 2x_d·c_xr·c_yr - c_yr³ - c_xr²·c_yr]    (8)

The first term of equations 7 and 8 is the radial distortion term, as in equations 3 and 4. Considering P1 = -K1·c_xr and P2 = -K1·c_yr, the second term of equations 7 and 8 is the decentering distortion term, and since c_xr and c_yr are usually at most tens of pixels, the third term is small. Based on that, the two forms of representing the distortion are equivalent, and we can include the decentering distortion in our camera model taking into account only the radial distortion coefficients.
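The equivalence claimed above can be checked numerically: first-order radial distortion about a shifted centre decomposes exactly into the plain radial term, a decentering-like term with P1 = -K1·c_xr and P2 = -K1·c_yr, and a small third term. The numbers below are arbitrary illustrative values, not calibrated ones.

```python
# Illustrative (assumed) values: distortion coefficient, shifted centre
# and a sample distorted point.
K1, cxr, cyr = 2e-7, 12.0, -8.0
xd, yd = 150.0, -90.0

# Left side: first-order radial distortion about the shifted centre (eq. 5).
xs, ys = xd - cxr, yd - cyr
lhs = K1 * xs * (xs ** 2 + ys ** 2)

# Right side: the three terms of equation 7.
r2 = xd ** 2 + yd ** 2
P1, P2 = -K1 * cxr, -K1 * cyr
term1 = K1 * xd * r2                                   # plain radial term
term2 = P1 * (r2 + 2 * xd ** 2) + 2 * P2 * xd * yd     # decentering-like term
term3 = K1 * (xd * cyr ** 2 + 3 * xd * cxr ** 2
              + 2 * yd * cxr * cyr - cxr ** 3 - cxr * cyr ** 2)
rhs = term1 + term2 + term3
```

With the centre offsets at tens of pixels, term3 is two orders of magnitude smaller than term1 here, which is the basis for neglecting it.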
3. Finally, the point is converted to frame buffer coordinates, taking the distorted image coordinates and using the relationships

   x_f = k_x·x_d + c_x,    y_f = k_y·y_d + c_y    (9)

where (c_x, c_y) is the principal point in frame buffer coordinates and (k_x, k_y) are the horizontal and vertical scale factors, respectively. If we take the information provided by the camera manufacturer about the CCD array, the scale factors can be defined as k_x = S_x·N_fx·(d_x·N_cx)⁻¹ and k_y = d_y⁻¹, where N_fx is the number of frame buffer pixels per row, N_cx is the number of CCD sensors per row, (d_x, d_y) are the horizontal and vertical distances between CCD sensors, respectively, and S_x is the horizontal uncertainty scale factor.

In order to understand the meaning of the uncertainty scale factor S_x, it is important to understand the imaging process, which is the transfer of image information from the image plane to the frame buffer computer memory. In CCD cameras, the image plane is a rectangular array of light-sensing elements, which gives a discrete spatial sampling of the image plane. In digital cameras each individual element along a CCD row is digitized and transferred directly to the image frame buffer memory, so the i-th element of a row of the frame buffer corresponds to the i-th element along the row of the CCD array. However, in regular video cameras the CCD values along each row are output as a time-varying analog voltage signal, which is resampled, possibly at a frequency different from the pixel clock frequency of the CCD. Since the original pixel clock frequency is lost in this process, the one-to-one pixel correspondence is no longer valid, and the uncertainty scale factor is intended to model this frequency mismatch.
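Equation 9 and the scale-factor definitions can be sketched in a few lines; all sensor figures below are assumed for illustration and are not the ISR hardware values.

```python
# All sensor figures below are assumed for illustration only.
Sx = 1.0                 # horizontal uncertainty scale factor (to be calibrated)
Nfx = 512                # frame-buffer pixels per row
Ncx = 500                # CCD sensor elements per row
dx, dy = 0.017, 0.013    # CCD element spacing, mm

kx = Sx * Nfx / (dx * Ncx)   # horizontal scale factor, pixels/mm
ky = 1.0 / dy                # vertical scale factor, pixels/mm

def to_frame_buffer(xd, yd, cx=256.0, cy=240.0):
    """Equation 9: distorted image coordinates (mm) to frame-buffer pixels."""
    return kx * xd + cx, ky * yd + cy
```

Note that only k_x carries the uncertainty factor S_x: the vertical direction is sampled line by line and keeps the one-to-one correspondence.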
B. Correction of Radial Lens Distortion

Under the radial and decentering distortion model obtained assuming the point of best radial symmetry, we can compute the camera's physical parameters by first correcting the distortion in the image plane and then applying the pure rotation calibration method to obtain the camera's parameters.

B.1. Cross-Ratio for the Point of Best Radial Symmetry

The cross-ratio is the basic invariant in projective geometry.

Definition 1 CROSS-RATIO OF FOUR POINTS. Let A, B, C, D be four collinear 3D points; we define their cross-ratio as

   (A, B; C, D) = (AC·BD) / (CD·AB)

where AC denotes the algebraic measure of the segment AC.

Definition 2 CROSS-RATIO OF FOUR LINES. The cross-ratio of a pencil of four lines l1, l2, l3, l4 going through O is defined as the cross-ratio (A, B; C, D) of the points of intersection of the lines l_i, i = 1..4, with any line L not passing through O. This is, of course, independent of the choice of L.

Based on definition 2, consider the existence of a line L in 3D space containing four collinear points P1, P2, P3 and P4 with known distances between each other. These points are ideally projected on the image plane, resulting in the four projected points p_u1, p_u2, p_u3 and p_u4, which due to radial lens distortion have four homologous distorted points p_d1, p_d2, p_d3 and p_d4 (see fig. 3). Consider O(c_xr, c_yr) the point of best radial symmetry, which we distinguish from the image center for the reasons already described in the previous section. Since O is the distortion center, Op_di is in the radial direction, and p_ui should lie on Op_di, for i = 1..4.

Fig. 3. Projective Invariant for the Point of Best Radial Distortion Symmetry

The vectors Op_di, which will be denoted by l_i, i = 1..4, form with the points P_i, i = 1..4, on L a projective mapping, which makes the cross-ratio invariant [7;13] (see definition 2). Thus we have

   (P1, P2; P3, P4) = (l1, l2; l3, l4).    (10)
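The invariance just stated is easy to verify numerically: four collinear 3D points and their pin-hole projections have the same cross-ratio. The sketch below uses the (AC·BD)/(CD·AB) arrangement adopted in this paper, with arbitrary synthetic points.

```python
import numpy as np

def cross_ratio(p1, p2, p3, p4):
    """Cross-ratio (P1P3 * P2P4) / (P3P4 * P1P2) of four collinear
    points, using signed distances along the supporting line."""
    o = np.asarray(p1, float)
    d = np.asarray(p4, float) - o
    d = d / np.linalg.norm(d)
    t1, t2, t3, t4 = (float(np.dot(np.asarray(p, float) - o, d))
                      for p in (p1, p2, p3, p4))
    return ((t3 - t1) * (t4 - t2)) / ((t4 - t3) * (t2 - t1))

# Four collinear 3D points (parameters s = 0, 1, 2, 4 along a line that
# avoids the centre of projection) and their pin-hole images (f = 1).
P = [np.array([0.0, 0.0, 5.0]) + s * np.array([0.1, 0.05, 0.2])
     for s in (0.0, 1.0, 2.0, 4.0)]
proj = [p[:2] / p[2] for p in P]
cr_world = cross_ratio(*P)     # equals 3 for this spacing
cr_image = cross_ratio(*proj)  # identical: the cross-ratio is preserved
```

The projected points are unevenly spaced, yet the cross-ratio survives the projection exactly; this is the property the calibration of the distortion centre relies on.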
Let the cross-ratio of the 3D line points be

   (P1, P2; P3, P4) = (P1P3·P2P4) / (P3P4·P1P2) = CR.    (11)

Based on relationship 10, we can also express the cross-ratio CR as

   CR = (sin θ13 · sin θ24) / (sin θ34 · sin θ12)    (12)

where θ_ij is the angle between line Op_di and line Op_dj. Consider that the point of best radial symmetry has the coordinates (c_xrf, c_yrf) in the frame buffer coordinate system, and that each distorted image point p_di has the coordinates (x_di, y_di), i = 1..4, with respect to the point of best radial distortion symmetry. In the frame buffer coordinate system these points have the coordinates (x_fi, y_fi), i = 1..4. Based on this we have

   sin θ_ij = |Op_di × Op_dj| / (|Op_di|·|Op_dj|)    (13)

where Op_di = (x_di, y_di) = ((x_fi - c_xrf)·k_x⁻¹, (y_fi - c_yrf)·k_y⁻¹). Substituting 13 into 12, we obtain for the cross-ratio

   CR = (f13·f24) / (f34·f12)    (14)

where f_ij = (x_fi - c_xrf)(y_fj - c_yrf) - (x_fj - c_xrf)(y_fi - c_yrf); the scale factors k_x⁻¹ and k_y⁻¹, and the vector norms, cancel in the ratio. Observing equation 14 it is easy to see that it is independent of the radial distortion coefficient, the pose of the camera, the scale factors of the image and the focal length of the lens. It is a nonlinear equation in the two variables (c_xrf, c_yrf). Given more than one collinear set of calibration points we can solve for (c_xrf, c_yrf), defining the minimization function

   F = CR - (f13·f24) / (f34·f12).

Using local minimization of the previous relationship about an initial guess and proceeding with an iterative search, a solution for the point of best radial distortion symmetry can be obtained. We use the center of the frame buffer as the initial guess for the iterative search. To obtain the point of best radial symmetry we used the pattern shown in figure 4, which defines six different collinear sets of calibration points.
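The minimization rests on the fact that, about the true distortion centre, the pencil cross-ratio of the distorted points equals that of the undistorted collinear ones, since radial distortion moves each point along its ray from the centre. A synthetic check, with all numbers illustrative:

```python
import numpy as np

def f_ij(p_i, p_j, c):
    """Cross-product term f_ij of equation 14, for frame-buffer points
    p = (xf, yf) and a candidate distortion centre c."""
    xi, yi = p_i[0] - c[0], p_i[1] - c[1]
    xj, yj = p_j[0] - c[0], p_j[1] - c[1]
    return xi * yj - xj * yi

def pencil_cross_ratio(p1, p2, p3, p4, c):
    """Equation 14: cross-ratio of the pencil of rays through c."""
    return (f_ij(p1, p3, c) * f_ij(p2, p4, c)) / (f_ij(p3, p4, c) * f_ij(p1, p2, c))

# Four collinear image points, radially distorted about a known centre.
c_true = np.array([260.0, 240.0])
K1 = -1e-7
line_pts = [np.array([100.0 + 40.0 * s, 80.0 + 25.0 * s])
            for s in (0.0, 1.0, 2.0, 4.0)]

def distort(p):
    v = p - c_true
    return c_true + v * (1.0 + K1 * (v @ v))   # radial move along the ray

d_pts = [distort(p) for p in line_pts]
cr_ideal = pencil_cross_ratio(*line_pts, c_true)
cr_dist = pencil_cross_ratio(*d_pts, c_true)   # unchanged about the true centre
```

A local search (e.g. Nelder-Mead) over (c_xrf, c_yrf) of the squared residuals F, accumulated over several such collinear sets, then recovers the distortion centre as described above.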
B.2. Computing the Aspect Ratio and Distortion Coefficient by Collinearity

In the ideal situation of having an image free of distortion, and since perspective projection preserves the collinearity of collinear points, image points produced by collinear world points should be collinear.

Fig. 4. Pattern used to obtain the point of best radial symmetry

This is the basic principle used to obtain the first coefficient of the radial distortion, K1. As we presented in section III.A, assuming the existence of a point of best radial symmetry and taking only the first coefficient of the radial distortion is enough to characterize both the radial and the decentering distortion. If we consider the existence of only radial distortion, the frame buffer coordinates of a point (x_f, y_f) can be corrected to create its undistorted image coordinates by simplifying equations 3 and 4 to take into account only the radial distortion and then expressing the undistorted image coordinates x_u and y_u according to x_d = (x_f - c_xrf)·k_x⁻¹ and y_d = (y_f - c_yrf)·k_y⁻¹, resulting in

   x̃_u = (x_f - c_xrf) + (x_f - c_xrf)³·K1'·k² + (x_f - c_xrf)(y_f - c_yrf)²·K1'    (15)
   ỹ_u = (y_f - c_yrf) + (x_f - c_xrf)²(y_f - c_yrf)·K1'·k² + (y_f - c_yrf)³·K1'    (16)

where (x̃_u, ỹ_u) are defined up to the image scale factors, K1' = K1·k_y⁻² is called the equivalent radial distortion coefficient and k = k_y/k_x is the aspect ratio.

Take the example of three collinear points p_di, i = 1..3. Since the collinearity condition for the three corrected image points p_ui = (x_ui, y_ui), i = 1..3, remains valid under perspective projection, it can be written as

   F = (x_u2 - x_u1)(y_u3 - y_u1) - (x_u3 - x_u1)(y_u2 - y_u1) = 0.    (17)

Combining equations 15 and 16 with equation 17, we get a nonlinear equation in the equivalent radial distortion coefficient K1' and the aspect ratio k. Note that the radial distortion coefficient K1 can only be obtained up to the vertical scale factor k_y. Since the vertical scale factor is defined by the vertical distance between CCD sensors (k_y = d_y⁻¹), the radial distortion coefficient can be obtained easily. Since the resulting equation is nonlinear, the same procedure used to obtain the point of best radial symmetry can be used to obtain K1' and k. The initial guess values are taken to be zero for the equivalent radial distortion coefficient K1' and one for the aspect ratio k.

With the knowledge of the coordinates of the point of best radial symmetry and the correct values of the radial distortion coefficient and aspect ratio, we can now correct the radial and decentering distortion of the frame buffer coordinates. Using equations 15 and 16, we can define the correction equations for the frame buffer coordinates as

   x_fc = x_f + (x_f - c_xrf)³·K1'·k² + (x_f - c_xrf)(y_f - c_yrf)²·K1'    (18)
   y_fc = y_f + (x_f - c_xrf)²(y_f - c_yrf)·K1'·k² + (y_f - c_yrf)³·K1'    (19)

where (x_fc, y_fc) are the corrected (or ideal) coordinates of the frame buffer point (x_f, y_f).

Care must be taken with this procedure to correct for lens distortion, since it relies only on the distortion effect. If we are using lenses with very small lens distortion, the calculation of the point of best radial symmetry will have a very large uncertainty. The extreme case is a lens totally free of any kind of distortion: in this case the point of best radial symmetry could be anywhere in the image without violating the cross-ratio invariance. Fortunately this situation never happens with the off-the-shelf lenses used in computer vision, so the procedure proposed to solve the lens distortion is, as a whole, reliable.

C. The Rotation Method - Theory

Given any pair of images where the camera has undergone pure rotation about some axis, if the intrinsic camera parameters and the angle and axis of rotation are known, one can compute where the feature points from one image will appear in the second image after rotation.

Fig. 5. Pure rotation around some axis (rotation over Y_c and over X_c)
This is the main observation that allows us to use pure rotation to obtain some of the intrinsic camera parameters. Let us assume that the camera is rotated in a rigid world environment around some axis (see fig. 5). Also assume the existence of a camera coordinate system located at the lens optical center, with the Z axis viewing along the optical axis of the lens. A 3D point P_c(x_c, y_c, z_c) in the camera coordinate system will move after rotation to a point P'_c(x'_c, y'_c, z'_c) through the matricial relationship

                   | r11 r12 r13 |
   P'_c = R·P_c =  | r21 r22 r23 | · P_c.    (20)
                   | r31 r32 r33 |

Using the perspective projection pin-hole geometry, the 3D camera point P'_c projects to the undistorted image point p'_u(x'_u, y'_u), where

   x'_u = f·x'_c/z'_c = f·(r11·x_c + r12·y_c + r13·z_c) / (r31·x_c + r32·y_c + r33·z_c)    (21)
   y'_u = f·y'_c/z'_c = f·(r21·x_c + r22·y_c + r23·z_c) / (r31·x_c + r32·y_c + r33·z_c)    (22)

Multiplying the numerator and denominator of equations 21 and 22 by f/z_c and substituting x_u = f(x_c/z_c) and y_u = f(y_c/z_c) results in

   x'_u = f·(r11·x_u + r12·y_u + r13·f) / (r31·x_u + r32·y_u + r33·f)    (23)
   y'_u = f·(r21·x_u + r22·y_u + r23·f) / (r31·x_u + r32·y_u + r33·f)    (24)

Observing these last two equations, we can conclude that the position of the point in the image after pure rotation depends only on the intrinsic camera parameters, the rotation matrix and the location of the point in the image before the rotation. The 3D point coordinates are not required in the case of pure rotation. As we will see in the next section, this is not the case when we have rotation combined with translation.

C.1. The Importance of Pure Rotation

If the axis of rotation does not pass exactly through the optical center of the lens (center of projection), then there will be some translation in addition to the rotation around the center of projection.
Considering the existence of a translation vector T = [t_x t_y t_z]ᵀ, the camera coordinates of the point P_c after rotation are obtained using P'_c = R·P_c + T, and the location of an image point after rotation and translation will be

   x'_u = f·(r11·x_u + r12·y_u + r13·f + f·t_x/z_c) / (r31·x_u + r32·y_u + r33·f + f·t_z/z_c)    (25)
   y'_u = f·(r21·x_u + r22·y_u + r23·f + f·t_y/z_c) / (r31·x_u + r32·y_u + r33·f + f·t_z/z_c)    (26)

As we can see from equations 25 and 26, the location of the point in the image after rotation is no longer independent of the depth of the 3D camera point, and it also depends on the translation vector. However, if we use feature points that are located far from the camera (z_c considerably large), then the effect of the translation becomes negligible (z_c ≫ t_x, z_c ≫ t_y, z_c ≫ t_z).

C.2. How to Obtain Pure Rotation

As we change the focus or the zoom position of the lens, the optical center of the lens (center of projection) will move along its optical axis. To compensate for this displacement, the MDOF Active Vision Head built at the ISR-Coimbra has the ability to move the lens along its optical axis with an accuracy of 0.015 µm.

Fig. 6. The Optical Center Adjustment (transparent sheet, vergence, eye tilt and optical center adjustment axes)

The test of whether there is little or no translation is very simple, and it involves the use of parallax. The parallax test is based on the fact that two 3D points P_c1 and P_c2 and the center of projection all lie on a straight line, even if we perform pure rotation of the camera about its center of projection. If we do not have pure rotation, and some translation occurs during the rotation, then the three points will no longer be aligned and the two points P_c1 and P_c2 will project onto two different image points³.

Assume that the rotation is around the vertical axis (Y axis of the camera coordinate system). We placed on the wall a pattern with black vertical lines on a white background, printed on a laser printer.
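Equations 23-26 can be exercised numerically. The sketch below (modern NumPy, illustrative values only) first checks the pure-rotation prediction against direct projection of a 3D point, and then shows how the residual-translation term f·t_x/z_c fades as the features move away from the camera.

```python
import numpy as np

def rot_y(theta):
    """Rotation about the camera Y axis (pan/vergence)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def predict_after_rotation(xu, yu, f, R):
    """Equations 23-24: image position after a pure rotation R,
    computed from the image position before the rotation alone."""
    den = R[2, 0] * xu + R[2, 1] * yu + R[2, 2] * f
    return (f * (R[0, 0] * xu + R[0, 1] * yu + R[0, 2] * f) / den,
            f * (R[1, 0] * xu + R[1, 1] * yu + R[1, 2] * f) / den)

def predict_with_translation(xu, yu, f, R, t, zc):
    """Equations 25-26: with residual translation t the prediction also
    depends on the depth zc of the 3D point."""
    den = R[2, 0] * xu + R[2, 1] * yu + R[2, 2] * f + f * t[2] / zc
    return (f * (R[0, 0] * xu + R[0, 1] * yu + R[0, 2] * f + f * t[0] / zc) / den,
            f * (R[1, 0] * xu + R[1, 1] * yu + R[1, 2] * f + f * t[1] / zc) / den)

# 1) Pure rotation: the prediction matches direct projection of the rotated point.
f = 25.0
P = np.array([40.0, -15.0, 900.0])
R = rot_y(np.deg2rad(3.0))
P2 = R @ P
direct = (f * P2[0] / P2[2], f * P2[1] / P2[2])
pred = predict_after_rotation(f * P[0] / P[2], f * P[1] / P[2], f, R)

# 2) A residual translation of 2 units shifts the image by f*tx/zc, which
# shrinks as the feature depth zc grows (zc >> tx, ty, tz).
t = np.array([2.0, 0.0, 0.0])
near = predict_with_translation(5.0, 0.0, f, np.eye(3), t, 200.0)[0] - 5.0
far = predict_with_translation(5.0, 0.0, f, np.eye(3), t, 20000.0)[0] - 5.0
```

With the rotation suppressed (R = I), the image-plane error is exactly f·t_x/z_c, so taking the depth a hundred times larger makes the translation effect a hundred times smaller.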
To create the parallax effect we placed between the camera and the pattern a transparent acrylic sheet with just one vertical black line. The thickness of this line is less than the thickness of the lines of the pattern, in order to create the illusion that this single line is an extension of one of the lines of the pattern (see fig. 6). This adjustment was done by hand, and an edge detector was used to confirm the straightness of the resulting line. If after the rotation the straightness is not preserved, this means that we do not have pure rotation, and the position of the center of projection must be adjusted by displacing the lens-camera body along the optical center adjustment (OCA) degree of freedom. In our MDOF Active Vision Head we only have the ability to adjust the vergence axis for pure rotation. This only ensures that the rotation is around the Y axis of the camera coordinate system. The same procedure must be applied if we want to ensure that pure rotation also occurs around the X axis of the camera coordinate system.

C.3 The Rotation Method - Implementation and Experimental Details

The main purpose of this pure rotation calibration procedure is to find camera parameters which will enable one to best predict the effects of camera rotation in some optimal manner. Since after rotation we have a pair of images, we have chosen to minimize the sum of squared distances between the feature points in the image obtained after rotation and those computed from the initial image using the pure rotation model described in III.C, summed over all the feature points of each pair of images. To be more precise, the intrinsic parameters can be obtained using N pairs of images taken with the camera rotated at various angles. The relative angles of rotation are measured precisely. The ISR-Coimbra MDOF active vision system has a rotation degree of freedom for each camera (vergence) with a precision of 0.0036°. Corresponding features in each pair of images are found and their pixel coordinates are extracted. There is no special reason to detect the same number M of features in each image, but this is what we did in practice. To use the pure rotation calibration model we must consider the undistorted coordinates of each pair of feature points. Since we have obtained in advance the distortion correction parameters using the cross-ratio and the collinearity constraints, we can correct the frame-buffer coordinates of the feature points using equations 18 and 19. We define the cost function

E = \sum_{k=1}^{N} \sum_{n=1}^{M} \left[ \left( \hat{x}^k_{rot,n} - x^k_n \right)^2 + \left( \hat{y}^k_{rot,n} - y^k_n \right)^2 \right]   (27)

where (\hat{x}^k_{rot,n}, \hat{y}^k_{rot,n}) are the coordinates of (x^k_n, y^k_n) after rotation from image i to image j in each pair. Combining the cost function with equations 23 and 24, and defining the image points on the frame-buffer plane, the cost function can now be written as

\hat{E} = \sum_{k=1}^{N} \sum_{n=1}^{M} \left[ \left( \hat{x}^k_n - x^k_n \right)^2 + \left( \hat{y}^k_n - y^k_n \right)^2 \right]

where

\hat{x}^k_n = c_x + f_x k_x \frac{r_{11}(x^k_n - c_x) + r_{12} k (y^k_n - c_y) + r_{13} f_x k_x}{r_{31}(x^k_n - c_x) + r_{32} k (y^k_n - c_y) + r_{33} f_x k_x}

\hat{y}^k_n = c_y + f_y k_y \frac{r_{21}(x^k_n - c_x)/k + r_{22}(y^k_n - c_y) + r_{23} f_y k_y}{r_{31}(x^k_n - c_x)/k + r_{32}(y^k_n - c_y) + r_{33} f_y k_y}

The task is now to find the intrinsic parameters of the camera (f_x k_x, f_y k_y, k, c_x, c_y) by a straightforward nonlinear search. Since we know the vertical scale factor k_y = d_y^{-1}, the horizontal scale factor can be obtained using the aspect ratio k, and using the scale factor values the horizontal and vertical focal lengths can also be obtained.

³ This observation is not valid for the case of a translation along the projective line defined by the three points. Since the translation is due to rotation about some point, the direction of translation continuously changes and therefore is only momentarily aligned with the projective line.
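The nonlinear search can be sketched on synthetic data. This is an illustration only, not the paper's implementation: the intrinsic values, feature generator, rotation angles, starting guesses and the use of `scipy.optimize.least_squares` are all assumptions; the aspect ratio k is folded in as f_x k_x / f_y k_y; and rotations about both the Y (vergence) and X (tilt) axes are included because rotations about a single axis would leave the vertical scale poorly constrained:

```python
import numpy as np
from scipy.optimize import least_squares

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1.0, 0], [-s, 0, c]])

def to_frame(P, fxkx, fyky, cx, cy):
    """Pin-hole projection of 3D camera points (rows of P) to frame-buffer pixels."""
    return np.stack([cx + fxkx * P[:, 0] / P[:, 2],
                     cy + fyky * P[:, 1] / P[:, 2]], axis=1)

def predict(params, pts, R):
    """Frame-buffer position of each feature after pure rotation R
    (the hatted x, y of section III.C, with k = fxkx / fyky)."""
    fxkx, fyky, cx, cy = params
    k = fxkx / fyky
    x, y = pts[:, 0] - cx, pts[:, 1] - cy
    den = R[2, 0] * x + R[2, 1] * k * y + R[2, 2] * fxkx
    xh = cx + fxkx * (R[0, 0] * x + R[0, 1] * k * y + R[0, 2] * fxkx) / den
    yh = cy + fyky * (R[1, 0] * x / k + R[1, 1] * y + R[1, 2] * fyky) / (den / k)
    return np.stack([xh, yh], axis=1)

def residuals(params, pairs):
    """Stacked residuals of the eq. 27 cost over all image pairs."""
    return np.concatenate([(predict(params, pts, R) - obs).ravel()
                           for pts, obs, R in pairs])

# Synthetic features with known (hypothetical) intrinsics, no noise.
true = np.array([800.0, 760.0, 320.0, 240.0])          # fxkx, fyky, cx, cy
rng = np.random.default_rng(0)
P3d = rng.uniform([-300, -300, 2000], [300, 300, 4000], size=(40, 3))
Rs = [rot_y(np.radians(3)), rot_x(np.radians(-4)),
      rot_y(np.radians(6)), rot_x(np.radians(5))]      # measured rotations
pairs = [(to_frame(P3d, *true), to_frame(P3d @ R.T, *true), R) for R in Rs]

# Nonlinear search starting from rough first-guess values (cf. section C.4).
fit = least_squares(residuals, x0=[700.0, 700.0, 300.0, 230.0], args=(pairs,))
```

With noiseless synthetic correspondences the search recovers the generating parameters; with real features the distortion-corrected coordinates of equations 18 and 19 would be used instead.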
C.4 First Guess Values for the Intrinsic Camera Parameters

Observing the cost function defined by equation 27, we conclude that it is a nonlinear function, and it must be linearized before it can be used in the adjustment solution. This can be achieved using Newton's first order approximation and taking some initial guess values for the unknown parameters. Of the five unknowns included in the cost function \hat{E}, only the initial values for the focal length and the principal point coordinates are difficult to obtain. Since we already know the image aspect ratio, the initial values for the image scale factors can be obtained using the relationships defined in point III.A (k_x f_x = k \cdot k_y f_y); with the knowledge of the aspect ratio, the horizontal scale factor can be obtained directly. Let us define the procedures used to obtain initial guess values for these two parameters.

C.4.1 Determination of the First Guess Value for the Focal Length

Taking the perspective projection camera model, an object of height h is shown to generate an image of height h'. Newton's equation for magnification can be written as

m = \frac{h'}{h} = \frac{f}{z_c - f}   (28)

where h, h' are the object and image heights, respectively, z_c is the object distance measured from the front nodal point (center of projection) and f the effective focal length of the lens. To determine f, an object is imaged at sharpest focus at two different locations and the magnification is measured in each case. Employing equation 28 for two object-image conjugates, we have

m_1 = \frac{h'_1}{h_1} = \frac{f}{z_{c_1} - f}, \qquad m_2 = \frac{h'_2}{h_2} = \frac{f}{z_{c_2} - f}   (29)

If the object distance is changed by \Delta z between the two object-image arrangements, we have z_{c_2} = z_{c_1} + \Delta z.
Combining this relationship with equation 29, we have

m_2 = \frac{f}{z_{c_1} + \Delta z - f}   (30)

By subtracting equations 29 and 30, we obtain

f = \Delta z \left( \frac{1}{m_2} - \frac{1}{m_1} \right)^{-1}   (31)

The above relationship can be used to determine the effective focal length of the lens f. It should be noted that in the previous relationship \Delta z is a change in distance, and thus can be measured accurately. Since in our system we have the ability to move the lens along its optical axis using the OCA degree of freedom, we create the two conjugate object-image pairs by moving the lens from back to front with the OCA degree of freedom. This displacement corresponds to moving the lens 10 cm along its optical axis, which is the total range of the OCA degree of freedom.

C.4.2 Determination of the First Guess Value for the Principal Point Coordinates

Several methods can be found in the literature that intended to solve the problem of finding the coordinates of the principal point. Under the pin-hole camera model, focusing and zooming is equivalent to changing the distance between the optical center and the image plane. When focusing or zooming, each image point will move radially along a line passing through the principal point. If we take a sequence of images by changing the focus or the zoom, the lines defined by the projected image points taken at different focus or zoom settings will intersect at a common point, which is the principal point. However, we found this approach not suitable for our active calibration, since focus and zoom changes may cause displacements of the principal point. Another solution to this problem, and a much more reliable one, was presented by Michael Penna [21] and is based on the fact that the image of a sphere under perspective projection is an ellipse whose axis of least inertia is on a line passing through the principal point [23]. Considering a set of several spheres displaced over the field of view of the camera, the principal point coordinates can be obtained through the intersection of the least inertia axes of the spheres. Since this approach does not need to change the optical settings of the lens, the principal point coordinates are much more reliable.

V. CONCLUSIONS

Some characteristics of the ISR MDOF Active Vision system and some mathematical background of its optical calibration have been presented in this paper. The ability of active vision systems to perform accurate movements has been used to perform the active calibration using just features of the environment while the camera undergoes an accurate and controllable movement. In the case of the approach used and described in the paper, a pure rotation of the camera was considered, and a procedure to ensure pure rotation around the center of projection was presented. The correction of the lens distortion was considered, and the definition of a point of best radial symmetry was fundamental to characterize the radial and decentering distortion using just the first coefficient of the radial distortion. The great advantage of this active calibration approach is that it allows the calibration of the intrinsic parameters of the camera using only features taken from the environment, assuming that the camera is able to perform controllable and precise movements. The use of bivariate polynomials to model the relationships between the physical camera parameters and the controllable parameters (zoom, focus and aperture position) allows a pseudo-real-time calibration of the intrinsic parameters through these functional relationships.

ACKNOWLEDGEMENT

This work has been partially supported by Junta Nacional de Investigação Científica e Tecnológica (JNICT) project VARMA.

References

[1] A. Yarbus, "Eye Movements and Vision", Plenum Press, New York, 1967.
[2] A. Basu, "Active Calibration: Alternative Strategy and Analysis", in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 495-500, 1993.
[3] B. Caprile and V.
Torre, "Using Vanishing Points for Camera Calibration", in Int. Journal of Computer Vision, 4, pp. 127-140, 1990.
[4] D. Geiger, A. Yuille, "Stereo and Eye Movements", in Biological Cybernetics, vol. 62, pp. 117-128, 1989.
[5] F. Du and M. Brady, "Self-Calibration of the Intrinsic Parameters of Cameras for Active Vision", in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 477-482, 1993.
[6] G. Stein, "Internal Camera Calibration using Rotation and Geometric Shapes", MSc Thesis, MIT, 1993.
[7] G. Wei, S. De Ma, "Implicit and Explicit Camera Calibration: Theory and Experiments", in IEEE Trans. on Pattern Anal. and Machine Intelligence, vol. 16, no. 5, pp. 469-480, 1994.
[8] H. Christensen, K. Bowyer, H. Bunke, editors, "Active Robot Vision: Camera Heads, Model Based Navigation and Reactive Control", vol. 6, World Scientific, 1994.
[9] I. Faig, "Calibration of Close-Range Photogrammetric Systems: Mathematical Formulation", in Photogrammetric Eng. Remote Sensing, vol. 14, pp. 1479-1486, 1975.
[10] J. Batista, J. Dias, H. Araújo, A. T. Almeida, "Monoplanar Camera Calibration - Iterative Multi-Step Approach", in Proc. British Machine Vision Conference, pp. 479-488, 1993.
[11] J. Crowley, P. Bobet, C. Schmid, "Auto-calibration by direct observation of objects", in Image and Vision Computing Journal, vol. 11, no. 2, pp. 67-81, 1993.
[12] J. Lavest, G. Rives, and M. Dhome, "Three-Dimensional Reconstruction by Zooming", in IEEE Trans. on Robotics and Automation, vol. 9, no. 2, 1993.
[13] J. L. Mundy, A. Zisserman, "Geometric Invariance in Computer Vision", MIT Press, 1992.
[14] K. Pahlavan, "Active Robot Vision and Primary Ocular Processes", PhD Thesis, CVAP, KTH, 1993.
[15] K. Tarabanis, R. Tsai, and D. Goodman, "Modeling of a Computer-Controlled Zoom Lens", in Proc. IEEE Int. Conf. on Robotics and Automation, pp. 1545-1551, 1992.
[16] K. Tarabanis, R. Tsai, and D.
Goodman, "Calibration of a Computer Controlled Robotic Vision Sensor with a Zoom Lens", in CVGIP: Image Understanding, vol. 59, no. 2, pp. 226-241, 1994.
[17] K. Tarabanis, R. Tsai, and P. Allen, "Analytical Characterization of the Feature Detectability Constraint of Resolution, Focus, and Field-of-view for Vision Sensor Planning", in CVGIP: Image Understanding, vol. 59, no. 3, pp. 340-358, 1994.
[18] L. Dron, "Dynamic Camera Self-Calibration from Controlled Motion Sequences", in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 501-506, 1993.
[19] M. Li, "Camera Calibration of a Head-Eye System for Active Vision", in Proc. European Conf. Computer Vision, pp. 543-554, 1994.
[20] M. Li, "Camera Calibration of the KTH Head-Eye System", Technical Report CVAP-147, NADA, KTH, 1994.
[21] M. Penna, "Camera Calibration: A Quick and Easy Way to Determine the Scale Factor", in IEEE Trans. Pattern Anal. Machine Intelligence, vol. 12, no. 12, pp. 1240-1245, 1991.
[22] O. Faugeras and G. Toscani, "The Calibration Problem for Stereo", in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 15-20, 1986.
[23] P. Beardsley, D. Murray and A. Zisserman, "Camera Calibration Using Multiple Images", in Proc. European Conf. Computer Vision, pp. 312-320, 1992.
[24] P. MacLauchlan and D. Murray, "Active Camera Calibration for a Head-Eye Platform using a Variable State-Dimension Filter", submitted to IEEE Trans. on Pattern Analysis and Machine Intelligence.
[25] R. Carpenter, "Movements of the Eyes", Pion Limited, London, England, 1988.
[26] R. Hartley, "Self-Calibration from Multiple Views with a Rotating Camera", in Proc. European Conf. Computer Vision, pp. 471-478, 1994.
[27] R. Tsai, "An Efficient and Accurate Camera Calibration Technique for 3-D Machine Vision", in Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 364-374, 1988.
[28] R. Willson and S. Shafer, "What is the Center of the Image?", in Proc. IEEE Conf.
on Computer Vision and Pattern Recognition, pp. 670-671, 1993.
[29] R. Willson and S. Shafer, "Precision Imaging and Control for Machine Vision Research at Carnegie Mellon University", in Proc. SPIE Conf. on High Resolution Sensors and Hybrid Systems, 1992.
[30] R. Wurtz, M. Goldberg, "The Neurobiology of Saccadic Eye Movements", Elsevier, New York, 1989.
[31] S. Maybank and O. Faugeras, "A Theory of Self-Calibration of a Moving Camera", in International Journal of Computer Vision, 8:2, pp. 123-151, 1992.
[32] W. Gorsky and L. Tamburino, "A Unified Approach to the Linear Camera Calibration Problem", in Proc. Int. Conf. Computer Vision, pp. 511-515, 1987.
[33] Webb Associates, "Anthropometric Source Book, vol. I: Anthropometry for Designers", NASA Reference Publication 1024, 1978.
[34] Y. Abdel-Aziz and H. Karara, "Direct Linear Transformation into Object Space Coordinates in Close-Range Photogrammetry", in Symp. Close-Range Photogrammetry, pp. 1-18, 1971.
