SMART
Instituto de Sistemas e Robótica
Instituto Superior Técnico
Av. Rovisco Pais, 1, 1096 Lisboa Codex, Portugal
SMART
Semi-Autonomous Monitoring and Robotics Technologies
WorkShop Notes
A
Lisbon, 27-28 April 1995
Portugal
The ISR Multi-Degrees-of-Freedom
Active Vision Robot Head
Design and Calibration
Jorge Batista, Jorge Dias, Helder Araujo, A. T. Almeida
Institute of Systems and Robotics (ISR), and Electrical Engineering Department, University of Coimbra,
3000 Coimbra - PORTUGAL;
phone: +351-39-34876/34884 ; fax: +351-39-35672 ; e-mail: [email protected]
Abstract
At present, there is a growing trend in computer vision to consider the visual system in the context of the behaviour of a robot interacting with a dynamic environment. The active vision approach builds on the observation that constraints derived from camera motion can replace other assumptions that had previously been employed to solve mathematically ill-posed problems.
Experiments in active vision require the ability to manipulate the visual parameters. The central issue in developing a Multi-Degrees-of-Freedom (MDOF) active vision robot head is the design strategy of the system. Bringing all the issues of such a system and their solutions together and building a head-eye system with a reasonable performance vs. cost is an engineering problem.
The problem in this work was formulated as: "how should a head-eye system be designed, what are the design criteria, how and in accordance with what strategy should the head be designed and controlled, and what kind of degrees of freedom must be included?".
To be able to effectively use multi-degree-of-freedom (MDOF) camera systems we need to know how variations in the camera's control parameters are going to cause changes in the produced images. For this we need good mathematical models describing the relationships between the control parameters and the parameters of the resulting images, i.e., we need to calibrate the system.
In this paper, we present methods and algorithms for camera calibration of a head-eye system developed at the ISR - Coimbra.
The calibration of the system includes two parts: the camera calibration problem, i.e., the calibration of the intrinsic and extrinsic parameters of the camera, and the so-called kinematic calibration, which calibrates the relationships (rotation and translation) between the different coordinate systems. In this paper only the camera calibration problem is addressed.
A method for computing the camera parameters by tracking features in the image while the camera undergoes pure rotation is used.
I - INTRODUCTION
Experiments in active vision require the ability to manipulate the visual parameters. This ability and the associated issues form the subject of this paper. The central issue in developing a Multi-Degrees-of-Freedom (MDOF) active vision robot head is the design strategy of the system. Bringing all the issues of such a system and their solutions together and building a head-eye system with a reasonable performance vs. cost is an engineering problem.
The design of a MDOF head-eye system for active vision is dependent on what we put into this notion. First of all,
an active vision system is not just an optomechanical device feeding a computer and carrying out the commands from
the computer. The degree of integration is crucial for such a system, and of course the issue of real-time processing
and control. These factors determine the behaviour one can obtain.
The more elaborately the visual system reacts to the surrounding environment, the more evolved the primary tasks will be. The nature of the visual processes we want to integrate is related to the architecture chosen. Much work has been done in developing vision systems to study how these and other features of the human visual system are used to facilitate perception.
One important aspect in the design stage of these robotic systems is the performance they should achieve. The analysis of some characteristics of the human active visual system can be useful for determining performance requirements for the velocity and acceleration of a mechanical device that intends to simulate the behaviour of the human visual system.
To be able to effectively use multi-degree-of-freedom (MDOF) camera systems we need to know how variations in the camera's control parameters are going to cause changes in the produced images. For this we need good mathematical models describing the relationships between the control parameters and the parameters of the resulting images, i.e., we need to calibrate the system.
In modeling systems, and in particular active vision robot systems, it helps to have an understanding of the underlying physical processes involved. If the processes are straightforward, or if they are accessible and can be directly measured, then the modeling task can be greatly simplified. This is the case for the extrinsic parameters of the camera calibration model, such as rotation and translation. If instead the underlying physical process is complex or inaccessible, then the modeling task can be much more difficult. This is the case for the camera's optical degrees of freedom, such as focal length, focus distance and aperture. In modeling MDOF optical systems, where there is a limited understanding of the underlying physical process involved, extensive calibration is often necessary to develop the model and to collect calibration data.
Until recently, most of the existing algorithms for camera calibration required the use of predefined patterns for calibration. This is difficult to achieve and typically requires special objects, made and measured to high precision, to be placed in front of the cameras. For this reason a lot of emphasis has been placed on algorithms that do not require any camera calibration and on camera calibration techniques that allow the camera to calibrate itself as it moves in an unstructured world.
“Eyes in humans and other animals do not need any artificial assistance for calibration”.
The answer to the above observation comes from the fact that eye movements simplify the calibration problem.
It would be a great advantage to be able to calibrate a camera using only feature coordinates in the image plane.
This cannot be done with a single image. Instead it requires camera motion.
Methods for computing the camera parameters by tracking features in the image, without using special patterns for
calibration, have been developed by Faugeras [31], Hartley [26], Dron [18], McLauchlan [24], Brady [5], Stein [6] and
Basu [2].
An important aspect for the MDOF optical systems calibration is the fact that the parameters are changing from
time to time, which requires a real-time calibration of the parameters or a pre-calibration of these parameters to build
up a look-up-table.
We based our approach on the use of feature correspondences from a set of images where the camera has undergone
pure rotation.
The basic idea of this method is very simple. Given any pair of images obtained by a camera which undergoes pure
rotation, if the intrinsic camera parameters and the angle and axis of rotation are known, one can compute where the
feature points from one image will appear in the second image. If there is an error in the intrinsic parameters, the
features in the second image will not coincide with those computed from the first image.
The closer the features are located to the camera the more important it is that the camera does not undergo any
translation during the rotation. A method to ensure that the axis of rotation passes close to the center of projection
(front nodal point in a thick lens model) is presented.
We obtain the camera’s physical parameters by first correcting the distortion in image plane and then applying the
pure rotation calibration method to obtain the camera’s parameters. A method based on the cross ratio invariance is
used to obtain the point of best radial distortion symmetry, and the first coefficient of the radial distortion is obtained
using the principle of collinearity of three collinear projected points.
To be able to adjust the intrinsic parameters in real time, we model the underlying behaviour of the camera for a large group of different setups, changing the intrinsic parameters for these setups. We use bivariate cubic polynomials to model the relationships between the camera's control parameters and the parameters of the resulting images, such as image magnification, focus distance, optical center adjustment, etc. The calibration involves performing a least-squares fit of the model to the collected data to determine the best-fit coefficients of the bivariate polynomials.
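As a rough illustration of this lookup-table idea, the following is a minimal sketch (not the authors' code; function and variable names are ours) of fitting a bivariate cubic polynomial that maps two lens control parameters, e.g. zoom and focus motor positions, to one measured image parameter, using an ordinary least-squares fit:

```python
# Minimal sketch, assumed names: fit p(z, f) -> image parameter by least squares.
import numpy as np

def bivariate_cubic_design(z, f):
    """Design matrix with all monomials z**i * f**j for i + j <= 3 (10 terms)."""
    z, f = np.asarray(z, float), np.asarray(f, float)
    cols = [z**i * f**j for i in range(4) for j in range(4 - i)]
    return np.column_stack(cols)

def fit_control_model(z, f, y):
    """Coefficients minimizing ||A c - y||^2 over the calibration samples."""
    A = bivariate_cubic_design(z, f)
    c, *_ = np.linalg.lstsq(A, np.asarray(y, float), rcond=None)
    return c

def evaluate_control_model(c, z, f):
    """Predict the image parameter for control settings (z, f)."""
    return bivariate_cubic_design(np.atleast_1d(z), np.atleast_1d(f)) @ c
```

A separate polynomial of this form would be fitted for each image parameter of interest (magnification, principal point shift, etc.) and evaluated at run time from the current motor positions.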
For the extrinsic parameters (pose estimation of the camera), and since the movements of the head-eye system are controlled, we know how much it has moved relative to some initial position. With this procedure, the calibration of the extrinsic parameters need only be done at the initial position, and real-time calibration of the extrinsic parameters can be performed.
In this paper we describe the design and calibration of a MDOF active vision head developed at the ISR-Coimbra. The head has 16 degrees of freedom: 6 degrees of freedom for each eye (pan, tilt, zoom, focus, aperture, and optical center adjustment), the baseline degree of freedom and three degrees of freedom for the neck (swing, pan and tilt).
II - THE ISR-COIMBRA MDOF ACTIVE VISION HEAD
Active vision systems are often modeled on attributes of the human visual system, since this is the most well-studied visual system. The human oculomotor system is one of the best understood functions of our brain. Two attributes of the human vision system, ocular motion and foveal-peripheral vision, are essential to human visual perception. Ocular motion allows movements of the eyes to direct the view point of the visual system. Foveal-peripheral vision enables humans to perceive small regions in fine detail in combination with a wide field of view at coarse detail. Taking advantage of the ocular visual system, the human head also has the capability of changing gaze and fixating on features of the environment. The combination of these (human) capabilities was our main purpose for building this robotic head.
Ignoring the dependencies enforced by the particular model of the oculomotor system, the degrees of freedom in the human head can be listed as follows:
Eyes-mechanical Each eye has three degrees of freedom (a total of six degrees of freedom), which are:
• superior-inferior (tilt)
• lateral-medial (pan)
• cyclotorsion (swing about the optical axis)
Neck-mechanical The neck has three degrees of freedom, which are:
• tilt
• pan
• swing or lateral tilt movement
Eyes-optical Each eye has two degrees of freedom (a total of four), which are:
• accommodation
• iris manipulation

                                    Eye Pan   Eye Tilt   Neck Pan   Neck Tilt
Range of Motion                     ±45°      ±45°       ±80°       +90° up, −60° down
Peak Acceleration                   35000 deg/s²
Peak Velocity                       600 deg/s
Interocular distance                ≈ 64.0 mm
Foveal-Peripheral resolution ratio  10:1
Table 1. Human active visual system characteristics
The design of a MDOF head-eye system for active vision is dependent on what we put into this notion. First of all, an active vision system is not just an optomechanical device feeding a computer and carrying out the commands from the computer. The degree of integration is crucial for such a system, and of course so is the issue of real-time processing and control. These factors determine the behaviour one can obtain.
The work presented in this paper is part of a project in which the control issue was for the first time addressed in such a way that all dependencies between the different possible actuation schemes of the system are defined by the organization of the control system.
The problem in this work was formulated as : “how should a head-eye system be designed, what are the design
criteria, how and in accordance with what strategy should the head be designed and controlled, what kind of degrees
of freedom must be included ?”.
Our main purpose in building this vision robot head was not just to have an active vision system with the basic degrees of freedom (vergence, pan and tilt) to perform tracking or to be used on mobile platforms, but also to have a device with which we can study and simulate some of the human visual behaviours.
This active vision robot head was designed taking into account some of the geometric aspects of the human head, such as, for instance, the fact that the human head has a translation between the neck pan rotation axis and the location of the eyes. These aspects will give us the chance to analyse what kind of visual information is processed by the human brain. We also included some optical degrees of freedom that human beings do not have, such as the capability of changing zoom.
One important aspect in the design stage of these robotic systems is the performance they should achieve. The analysis of some characteristics of the human active visual system can be useful for determining performance requirements for the velocity and acceleration of a mechanical device that intends to simulate the behaviour of the human visual system.
As an example, the human eye executes a saccadic motion that has an initial acceleration of up to 30000°/s², which is then followed by a smaller deceleration to terminate the movement with a slight overshoot. The amplitude of a saccade of the human eye rarely exceeds 15°, at which point the acceleration of the motion saturates at 35000°/s². Saccades of amplitude larger than 20° involve both eye motion and head motion, and move at even higher speeds. The receptors in the fovea centralis of the human eye are significantly more dense than at the periphery, giving
            Precision      Range              Velocity
Neck Pan    0.0036°        [−110°..+110°]     ≈ 360°/s
Neck Swing  0.0036°        [−27.5°..+27.5°]   ≈ 360°/s
Neck Tilt   0.0036°        [−32°..+32°]       ≈ 360°/s
Eye Pan     0.0036°        [−45°..+45°]       ≈ 360°/s
Eye Tilt    0.0031°        [−20°..+20°]       ≈ 330°/s
OCA¹        8 nm           [0..80] mm         ≈ 1 mm/s
Baseline    20 nm          [137..287] mm      ≈ 5 mm/s
Zoom        Range/60000    [12.5..75] mm      ≈ 1.8 × Range/s
Aperture    Range/75000    [1.2..16]          ≈ 1.75 × Range/s
Focus       Range/60000    [1..∞] mm          ≈ 1.8 × Range/s
Table 2. ISR MDOF active vision system characteristics
¹OCA: Optical Center Adjustment
Fig. 1. ISR MDOF Active Vision Head
a resolution ratio between fovea and periphery of about 10:1. The fovea centralis, which presents much higher resolution, plays a critical role in object discrimination, identification and manipulation.
Some characteristics of the human visual system are summarized in Table 1, the information presented in the table being obtained from Carpenter [25], Yarbus [1], Geiger and Yuille [4], Webb Associates [33] and Wurtz and Goldberg [30].
Compared with the degrees of freedom of the human head described above, the ISR MDOF active vision system
has the following degrees of freedom (see fig. 1):
Eyes-mechanical Each eye has three degrees of freedom (a total of six):
• elevation (tilt)
• azimuth (pan)
• cyclotorsion
An additional degree of freedom is included to keep the optical center at the crosspoint of the azimuth and elevation axes of the lens.
Neck-mechanical The neck has three degrees of freedom:
• tilt
• pan
• swing or lateral tilt movement
Eyes-optical Each eye has three degrees of freedom (a total of six)²:
• focusing
• zooming
• iris manipulation
Baseline The ability to mechanically change the distance between the two eyes.
This MDOF active vision robot head is probably the head that currently has the most degrees of freedom. In addition to the common degrees of freedom for camera heads (pan, tilt and independent vergence for each of the eyes), this head includes the swing movement of the head neck, cyclotorsion of the lenses and the ability to adjust the optical center of the lenses. The latter is used to ensure pure rotation when verging the cameras and to compensate for the translation of the optical center when changing the focal length of the lens.
Most of the existing heads use standard motorized lenses with presets for feedback information. These lenses have the disadvantage of moving too slowly for real-time accommodation purposes (5-6 seconds for a full-range movement), and the accuracy of position control is not very good due to the type of feedback information they provide. New motorized lenses are now being developed which will enable this head to accommodate the optical system in almost real time, with very good precision (more than 30000 discrete positions for the full range of each optical degree of freedom).
Some characteristics of the ISR MDOF active vision system are summarized in Table 2.
²A new motorized lens is being developed at the ISR, based on DC motors and using incremental encoders as feedback information. It will have the ability to work almost in real time, with full-range travel in less than half a second for each degree of freedom.
III - OPTICAL CALIBRATION : Mathematical Background
A. The Perspective Projection Camera Model
In the following, the underlying camera model is described briefly.
We assume the existence of a Cartesian coordinate system C_c centered at the optical center of the camera lens, with the lens viewing down the Z_c axis. The image plane is located at a distance f in front of the optical center and is orthogonal to the Z_c axis. The x_i axis of the image coordinate system I_i is parallel to the X_c axis of C_c, and the y_i axis of I_i is parallel to the Y_c axis of C_c (see fig. 2).
The overall transformation from a 3D camera point P_c(x_c, y_c, z_c) to a computer image frame buffer point p_f(x_f, y_f) can be decomposed into the following steps:
1. Projection from the 3D camera coordinate point P_c(x_c, y_c, z_c) to the ideal image coordinate point p_u(x_u, y_u), using perspective projection with pin-hole camera geometry:

x_u = f · x_c / z_c ,    y_u = f · y_c / z_c .    (1)
2. Off-the-shelf cameras and lenses which are often used for computer vision have large amounts of lens distortion, particularly wide-angle lenses. The most important types of lens distortion are radial and decentering distortion, and the model for these distortions, mapping the distorted image coordinates, which are observable, to the undistorted (or ideal) image coordinates, which are not physically measurable, is

x_u = x_d + δ_x ,    y_u = y_d + δ_y ,    (2)

where δ_x and δ_y are the corrections for lens distortion and can be found from

δ_x = x_d (K_1 r_d² + K_2 r_d⁴ + ...) + [P_1 (r_d² + 2x_d²) + 2 P_2 x_d y_d] [1 + P_3 (r_d² + ...)]    (3)
δ_y = y_d (K_1 r_d² + K_2 r_d⁴ + ...) + [P_2 (r_d² + 2y_d²) + 2 P_1 x_d y_d] [1 + P_3 (r_d² + ...)]    (4)

and where r_d² = x_d² + y_d².
The first term in equations 3 and 4 is the radial distortion, with parameters K_1 and K_2, and the second term is the decentering (or tangential) distortion, with parameters P_1, P_2 and P_3. Decentering distortion is due to misalignment of the lens elements and the non-perpendicularity of the lens assembly and the image plane. This distortion is much less important than the radial distortion.
An alternative solution to the decentering distortion can be obtained using just the radial distortion, supposing that the optical axis of the lens is not perpendicular to the image plane. With this assumption, the optical axis will no longer pass through the principal point (which is the point where a line starting at the center of projection intersects the image plane perpendicularly) but through another point, which will be called the point of best radial symmetry (c_xr, c_yr).
A coordinate shift to the point of best radial symmetry is performed by translating the distorted image point p_d, substituting x_d' = x_d − c_xr and y_d' = y_d − c_yr, where x_d' and y_d' are the coordinates of a point p_r.

Fig. 2. Pin-hole camera model, with point of best radial symmetry

Taking only the radial distortion terms of equations 3 and 4 and using the p_r points instead of the p_d points, the lens distortion is now defined by

δ_x' = x_d' (K_1 r'² + K_2 r'⁴ + ...)    (5)
δ_y' = y_d' (K_1 r'² + K_2 r'⁴ + ...)    (6)

where, as before, r'² = x_d'² + y_d'².
Taking only the first coefficient of the radial distortion, K_1, and expanding equations 5 and 6, we get

δ_x = x_d' K_1 r'² = x_d K_1 r_d² − K_1 [c_xr (r_d² + 2x_d²) + 2 c_yr x_d y_d] + K_1 [x_d c_yr² + 2 x_d c_xr² + 2 y_d c_xr c_yr − c_xr³ − c_xr c_yr²]    (7)
δ_y = y_d' K_1 r'² = y_d K_1 r_d² − K_1 [c_yr (r_d² + 2y_d²) + 2 c_xr x_d y_d] + K_1 [y_d c_xr² + 2 y_d c_yr² + 2 x_d c_xr c_yr − c_yr³ − c_xr² c_yr]    (8)

The first term of equations 7 and 8 is the radial distortion term, as in equations 3 and 4. Considering P_1 = −K_1 c_xr and P_2 = −K_1 c_yr, the second term of equations 7 and 8 is the decentering distortion term, and since c_xr and c_yr are usually at most tens of pixels, the third term is small. Based on this, the two forms of representing the distortion are equivalent, and we can include the decentering distortion in our camera model by taking into account only the radial distortion coefficients.
3. Finally, the point is converted to frame buffer coordinates, taking the distorted image coordinates and using the relationships

x_f = k_x · x_d + c_x ,    y_f = k_y · y_d + c_y ,    (9)

where (c_x, c_y) is the principal point in frame buffer coordinates and (k_x, k_y) are the horizontal and vertical scale factors, respectively. If we take the information provided by the camera manufacturer about the CCD array, the scale factors can be defined as k_x = S_x · N_fx · (d_x · N_cx)⁻¹ and k_y = d_y⁻¹, where N_fx is the number of frame buffer pixels per row, N_cx is the number of CCD sensors per row, (d_x, d_y) are the horizontal and vertical distances between CCD sensors, respectively, and S_x is the horizontal uncertainty scale factor. In order to understand the meaning of the uncertainty scale factor S_x it is important to understand the imaging process, which is the transfer of image information from the image plane to the frame buffer computer memory. In CCD cameras, the image plane is a rectangular array of light sensor elements, which gives a discrete spatial sampling of the image plane. In digital cameras each individual element along a CCD row is digitized and transferred directly to the image frame buffer memory, so the i-th element on a row of the frame buffer corresponds to the i-th element along the row of the CCD array. However, in regular video cameras the CCD values along each row are output as a time-varying analog voltage signal, which is resampled, possibly at a frequency different from the pixel clock frequency of the CCD. Since the original pixel clock frequency is lost in this process, the one-to-one pixel correspondence is no longer valid, and the uncertainty scale factor is intended to model this frequency mismatch.
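To make the chain of steps concrete, the following is a minimal sketch of the forward model of this section under assumed parameter names: eq. 1 (pin-hole projection), a first-order approximate inversion of eqs. 5-6 with only K_1 (the paper itself only uses these equations in the correction direction), and eq. 9 (frame-buffer conversion):

```python
# Minimal sketch, assumed names; the undistortion of eqs. 5-6 is inverted
# only to first order so that the model can be run 3D -> frame buffer.
def camera_to_frame_buffer(P_c, f, K1, kx, ky, cx, cy, cxr=0.0, cyr=0.0):
    xc, yc, zc = P_c
    xu, yu = f * xc / zc, f * yc / zc          # eq. 1: ideal image coordinates
    xp, yp = xu - cxr, yu - cyr                # w.r.t. point of best radial symmetry
    r2 = xp * xp + yp * yp
    # approximate inversion of x_u' = x_d' (1 + K1 r'^2) for small distortion
    xd = cxr + xp * (1.0 - K1 * r2)
    yd = cyr + yp * (1.0 - K1 * r2)
    return kx * xd + cx, ky * yd + cy          # eq. 9: frame-buffer coordinates
```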
B. Correction of Radial Lens Distortion
Under the radial and decentering distortion model obtained assuming the point of best radial symmetry, we can
compute the camera’s physical parameters by first correcting the distortion in image plane and then applying the pure
rotation calibration method to obtain the camera's parameters.
B.1. Cross-Ratio for Point of Best Radial Symmetry
The cross-ratio is the basic invariant in projective geometry.
Definition 1 CROSS-RATIO OF FOUR POINTS. Let A, B, C, D be four collinear 3D points. We define their cross-ratio as

(A, B; C, D) = (CA / CB) / (DA / DB)

where AB denotes the algebraic measure of the segment AB.
Definition 2 CROSS-RATIO OF FOUR LINES. The cross-ratio of a pencil of four lines l_1, l_2, l_3, l_4 going through O is defined as the cross-ratio (A, B; C, D) of the points of intersection of the l_i, i = 1..4, with any line L not passing through O. This is, of course, independent of the choice of L.
Based on definition 2, consider the existence of a line L in 3D space containing four collinear points P_1, P_2, P_3 and P_4 with known distances between each other. These points are ideally projected onto the image plane, resulting in the four projected points p_u1, p_u2, p_u3 and p_u4, which due to radial lens distortion have four homologous distorted points p_d1, p_d2, p_d3 and p_d4 (see fig. 3).
Fig. 3. Projective invariant for the point of best radial distortion symmetry

Consider O(c_xr, c_yr), the point of best radial symmetry, which we distinguish from the image center for the reasons already described in the previous section. Since O is the distortion center, O p_di is in the radial direction, and p_ui should lie on O p_di, for i = 1..4. The vectors O p_di, which will be denoted by l_i, i = 1..4, form with the points P_i, i = 1..4, on L a projective mapping, which makes the cross-ratio invariant [7,13] (see definition 2). Thus we have

(P_1, P_2; P_3, P_4) = (l_1, l_2; l_3, l_4).    (10)
Let the cross ratio of the 3D line points be

CR = (P_1P_3 · P_2P_4) / (P_3P_4 · P_1P_2).    (11)

Based on relationship 10 we can also define the cross ratio CR as

CR = (sin θ_13 · sin θ_24) / (sin θ_34 · sin θ_12)    (12)

where θ_ij is the angle between line O p_di and line O p_dj.
Consider that the point of best radial symmetry has coordinates (c_xrf, c_yrf) in the frame buffer coordinate system, and that each distorted image point p_di has coordinates (x_di, y_di), i = 1..4, with respect to the point of best radial distortion symmetry. In the frame buffer coordinate system these points have coordinates (x_fi, y_fi), i = 1..4. Based on this we have

sin θ_ij = |O p_di × O p_dj| / (|O p_di| · |O p_dj|)    (13)

where O p_di = (x_di, y_di) = ((x_fi − c_xrf) k_x⁻¹, (y_fi − c_yrf) k_y⁻¹) and |O p_di × O p_dj| = |x_di y_dj − x_dj y_di|.
Substituting 13 into 12, we obtain for the cross ratio

CR = (f_13 · f_24) / (f_34 · f_12)    (14)

where f_ij = (x_fi − c_xrf)(y_fj − c_yrf) − (x_fj − c_xrf)(y_fi − c_yrf).
Observing equation 14 it is easy to see that it is independent of the radial distortion coefficient, the pose of the camera, the scale factors of the image and the focal length of the lens. It is a nonlinear equation in the two variables (c_xrf, c_yrf).
Given more than one collinear set of calibration points, we can solve for (c_xrf, c_yrf) by defining the minimization function

F = CR − (f_13 · f_24) / (f_34 · f_12).

Using local minimization of the previous relationship about an initial guess and proceeding with an iterative search, a solution for the point of best radial distortion symmetry can be obtained. We use the center of the frame buffer as the initial guess for the iterative search. To obtain the point of best radial symmetry we used the pattern shown in figure 4, which defines six different collinear sets of calibration points.
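A possible implementation of this search, sketched below with hypothetical helper names, builds the f_ij terms of equation 14 for each collinear set and lets a generic nonlinear least-squares routine adjust (c_xrf, c_yrf) so that the image cross ratio matches the known 3D cross ratio; the paper only specifies an iterative local search started at the frame-buffer center:

```python
# Sketch under assumed names; the actual minimization routine used by the
# authors is not specified beyond being an iterative local search.
import numpy as np
from scipy.optimize import least_squares

def image_cross_ratio(pts_f, cxr, cyr):
    """pts_f: 4x2 array of frame-buffer points; returns f13*f24/(f34*f12), eq. 14."""
    x, y = pts_f[:, 0] - cxr, pts_f[:, 1] - cyr
    f = lambda i, j: x[i] * y[j] - x[j] * y[i]
    return (f(0, 2) * f(1, 3)) / (f(2, 3) * f(0, 1))

def find_best_radial_symmetry(point_sets, cross_ratios, guess):
    """point_sets: list of 4x2 arrays; cross_ratios: known CR of each 3D point set."""
    def residuals(c):
        return [image_cross_ratio(p, c[0], c[1]) - cr
                for p, cr in zip(point_sets, cross_ratios)]
    return least_squares(residuals, np.asarray(guess, float)).x
```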
B.2. Computing the Aspect Ratio and Distortion Coefficient by Collinearity
In the ideal situation of having an image free of distortion, and since perspective projection preserves the collinearity of collinear points, image points produced by collinear world points should be collinear.
Fig. 4. Pattern used to obtain the point of best radial symmetry
This is the basic principle used to obtain the first coefficient of the radial distortion, K_1. As we presented in section III.A, assuming the existence of a point of best radial symmetry, taking only the first coefficient of the radial distortion is enough to characterize the radial and decentering distortion.
If we consider the existence of only radial distortion, the frame buffer coordinates of a point (x_f, y_f) can be corrected to obtain its undistorted image coordinates by simplifying equations 3 and 4 to take into account only the radial distortion and then expressing the undistorted image coordinates x_u and y_u according to x_d = (x_f − c_xrf) k_x⁻¹ and y_d = (y_f − c_yrf) k_y⁻¹, resulting in

x_u k_x = (x_f − c_xrf) + (x_f − c_xrf)³ K_1' k² + (x_f − c_xrf)(y_f − c_yrf)² K_1'    (15)
y_u k_y = (y_f − c_yrf) + (x_f − c_xrf)² (y_f − c_yrf) K_1' k² + (y_f − c_yrf)³ K_1'    (16)

where (x_u, y_u) are defined with respect to the point of best radial symmetry, K_1' = K_1 k_y⁻² is the equivalent radial distortion coefficient and k = k_y / k_x is called the image aspect ratio.
Take the example of three collinear points p_di, i = 1..3. Since the collinearity condition for the three corrected image points p_ui = (x_ui, y_ui), i = 1..3, remains valid under perspective projection, it can be written as

E = (x_u3 − x_u1)(y_u2 − y_u1) − (x_u2 − x_u1)(y_u3 − y_u1) = 0.    (17)

Combining equations 15 and 16 with equation 17, we get a nonlinear equation in the equivalent radial distortion coefficient K_1' and the aspect ratio k. Note that the radial distortion coefficient K_1 can only be obtained up to the scale factor k_y. Since the vertical scale factor is defined by the vertical distance between CCD sensors (k_y = d_y⁻¹), the radial distortion coefficient can be obtained easily. Since the resulting equation is nonlinear, the same procedure used to obtain the point of best radial symmetry can be used to obtain K_1' and k. The initial guess value for the equivalent radial distortion K_1' is taken to be zero and that for the aspect ratio k to be one.
With the knowledge of the coordinates of the point of best radial symmetry and the correct values of the radial distortion coefficient and aspect ratio, we can now correct the radial and decentering distortion in the frame buffer coordinates. Using equations 15 and 16, we can define the correction equations for the frame buffer coordinates as

x_fc = x_f + (x_f − c_xrf)³ K_1' k² + (x_f − c_xrf)(y_f − c_yrf)² K_1'    (18)
y_fc = y_f + (x_f − c_xrf)² (y_f − c_yrf) K_1' k² + (y_f − c_yrf)³ K_1'    (19)

where (x_fc, y_fc) are the corrected (or ideal) frame buffer coordinates of the point (x_f, y_f).
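Applying the correction is then a direct evaluation of equations 18 and 19; a small sketch (parameter names are ours, and it assumes the estimates obtained above) is given below:

```python
# Sketch of eqs. 18-19, assuming (cxr_f, cyr_f), K1p (K1') and k are known.
def correct_frame_buffer(xf, yf, cxr_f, cyr_f, K1p, k):
    dx, dy = xf - cxr_f, yf - cyr_f
    xf_c = xf + dx**3 * K1p * k**2 + dx * dy**2 * K1p    # eq. 18
    yf_c = yf + dx**2 * dy * K1p * k**2 + dy**3 * K1p    # eq. 19
    return xf_c, yf_c
```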
Care must be taken with this procedure to correct for lens distortion, since it relies only on the distortion effect. If we are using lenses with very small lens distortion, the calculation of the point of best radial symmetry will have a very large uncertainty. The extreme case is a lens totally free of any kind of distortion: in this case the point of best radial symmetry could be anywhere in the image without violating the cross ratio invariance. Fortunately this situation never happens with the off-the-shelf lenses used in computer vision, so the procedure proposed to handle the lens distortion is, as a whole, reliable.
C. The Rotation Method - Theory
Given any pair of images where the camera has undergone pure rotation about some axis, if the intrinsic camera parameters and the angle and axis of rotation are known, one can compute where the feature points from one image will appear in the second image after rotation. This is the main observation that allows us to use pure rotation to obtain some of the intrinsic camera parameters.

Fig. 5. Pure rotation around some axis (pure rotation over X_c and over Y_c)

Let us assume that the camera is rotated in a rigid world environment around some axis (see fig. 5). Also assume the existence of a camera coordinate system located at the lens optical center, with the Z axis viewing along the optical axis of the lens. A 3D point P_c(x_c, y_c, z_c) in the camera coordinate system will move after rotation to a point P_c'(x_c', y_c', z_c') through the matricial relationship

P_c' = R · P_c ,    R = [ r_11 r_12 r_13 ; r_21 r_22 r_23 ; r_31 r_32 r_33 ] .    (20)
Using the perspective projection pin-hole geometry, the 3D camera point P_c' projects to the undistorted image point p_u'(x_u', y_u'), where

x_u' = f x_c'/z_c' = f (r_11 x_c + r_12 y_c + r_13 z_c) / (r_31 x_c + r_32 y_c + r_33 z_c)    (21)
y_u' = f y_c'/z_c' = f (r_21 x_c + r_22 y_c + r_23 z_c) / (r_31 x_c + r_32 y_c + r_33 z_c)    (22)

Multiplying the numerator and denominator of equations 21 and 22 by f/z_c and substituting x_u = f (x_c/z_c) and y_u = f (y_c/z_c) results in

x_u' = f (r_11 x_u + r_12 y_u + r_13 f) / (r_31 x_u + r_32 y_u + r_33 f)    (23)
y_u' = f (r_21 x_u + r_22 y_u + r_23 f) / (r_31 x_u + r_32 y_u + r_33 f)    (24)

Observing these last two equations, we can conclude that the position of the point in the image after pure rotation depends only on the intrinsic camera parameters, the rotation matrix and the location of the point in the image before the rotation. The 3D point coordinates are not required in the case of pure rotation. As we will see in the next section, this is not the case when we have rotation with translation.
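Equations 23 and 24 are simple to evaluate; the sketch below (assuming R is a 3x3 NumPy array and f is expressed in the same units as the image coordinates) shows the prediction step that the calibration later compares against the features actually tracked in the rotated image:

```python
# Sketch of eqs. 23-24: predicted undistorted image position after rotation R.
import numpy as np

def rotate_image_point(xu, yu, f, R):
    v = np.array([xu, yu, f], dtype=float)
    return f * (R[0] @ v) / (R[2] @ v), f * (R[1] @ v) / (R[2] @ v)
```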
C.1. The Importance of Pure Rotation
If the axis of rotation does not pass exactly through the optical center of the lens (center of projection), then there will be some translation in addition to the rotation around the center of projection. Considering the existence of a translation vector T = [t_x t_y t_z]^T, the camera coordinates of the point P_c after rotation are obtained using P_c' = R · P_c + T, and the location of an image point after rotation and translation will be

x_u' = f (r_11 x_u + r_12 y_u + r_13 f + f t_x/z_c) / (r_31 x_u + r_32 y_u + r_33 f + f t_z/z_c)    (25)
y_u' = f (r_21 x_u + r_22 y_u + r_23 f + f t_y/z_c) / (r_31 x_u + r_32 y_u + r_33 f + f t_z/z_c)    (26)

As we can see from equations 25 and 26, the location of the point in the image after rotation is no longer independent of the depth of the 3D camera point, and it also depends on the translation vector. However, if we use feature points that are located far from the camera (z_c considerably large), then the effect of the translation becomes negligible (z_c >> f t_x, z_c >> f t_y, z_c >> f t_z).
C.2 How to Obtain Pure Rotation
As we change the focus or the zoom position of the lens, the optical center of the lens (center of projection) will move along its optical axis. To compensate for this displacement, the MDOF active vision head built at the ISR-Coimbra has the ability to move the lens along its optical axis with an accuracy of 0.015 µm.
The test of whether there is little or no translation is very simple, and it involves the use of parallax. The parallax test is based on the fact that two 3D points P_c1 and P_c2 and the center of projection all lie on a straight line, even if we perform a pure rotation of the camera about its center of projection. If we do not have pure rotation, and some translation occurs during the rotation, then the three points will no longer be on a straight line and the two points P_c1 and P_c2 will project onto two different image points³.
³This observation is not valid in the case of a translation along the projective line defined by the three points. Since the translation is due to rotation about some point, the direction of translation continuously changes and is therefore only momentarily aligned with the projective line.
Fig. 6. The Optical Center Adjustment.
Assume that the rotation is around the vertical axis (the Y axis of the camera coordinate system). We placed on the wall a pattern with black vertical lines on a white background, printed on a laser printer. To create the parallax effect we placed, between the camera and the pattern, a transparent acrylic sheet with just one vertical black line. The thickness of this line is less than the thickness of the lines of the pattern, in order to create the illusion that this single line is an extension of one of the lines of the pattern (see fig. 6). This adjustment was done by hand, and an edge detector was used to confirm the straightness of the resulting line. If after the rotation the straightness is not preserved, this means that we do not have pure rotation, and the position of the center of projection must be adjusted by displacing the lens-camera body along the optical center adjustment (OCA) degree of freedom.
In our MDOF active vision head we only have the ability to adjust the vergence axis for pure rotation. This only ensures that the rotation is about the Y axis of the camera coordinate system. The same procedure must be applied if we want to ensure that pure rotation also occurs about the X axis of the camera coordinate system.
C.3 The Rotation Method - Implementation and Experimental Details
The main purpose of this pure rotation calibration procedure is to find camera parameters which will enable one to
best predict the effects of camera rotation in some optimal manner. Since after rotation we have a pair of images, we
have chosen to minimize the sum of squared distances between the feature points in the image obtained after rotation
and those computed from the initial image and using the pure rotation model described in III.C, summed over all the
feature points of each pair of images.
To be more precise, the intrinsic parameters can be obtained using N pairs of images taken with the camera rotated at various angles. The relative angles of rotation are measured precisely. The ISR-Coimbra MDOF active vision system has a rotation degree of freedom for each camera (vergence) with a precision of 0.0036°. Corresponding features in each pair of images are found and their pixel coordinates are extracted. There is no special reason to detect the same number M of features in each image, but this is what we did in practice.
To use the pure rotation calibration model we must consider the undistorted coordinates of each pair of feature points. Since we have obtained in advance the distortion correction parameters, using the cross-ratio and the collinearity constraints, we can correct the frame-buffer coordinates of the feature points using equations 18 and 19.
We define the cost function:
E = Σ_{k=1..N} Σ_{n=1..M} [ (x_rot,n^k − x'_u,n^k)² + (y_rot,n^k − y'_u,n^k)² ]    (27)

where (x_rot,n^k, y_rot,n^k) are the coordinates of the feature (x_u,n^k, y_u,n^k) after rotation from image i to image j in each pair k, and (x'_u,n^k, y'_u,n^k) are the corresponding feature coordinates measured in image j.
Combining the cost function with equations 23 and 24, and defining the image points on the frame-buffer plane, the cost function E can now be written as

E = Σ_{k=1..N} Σ_{n=1..M} [ (x'_f_rot,n^k − x'_f,n^k)² + (y'_f_rot,n^k − y'_f,n^k)² ]

where

x'_f_rot,n^k = f_x k_x [ r_11 (x_f,n^k − c_x) + r_12 (y_f,n^k − c_y) k⁻¹ + r_13 f_x k_x ] / [ r_31 (x_f,n^k − c_x) + r_32 (y_f,n^k − c_y) k⁻¹ + r_33 f_x k_x ] + c_x

y'_f_rot,n^k = f_y k_y [ r_21 (x_f,n^k − c_x) k + r_22 (y_f,n^k − c_y) + r_23 f_y k_y ] / [ r_31 (x_f,n^k − c_x) k + r_32 (y_f,n^k − c_y) + r_33 f_y k_y ] + c_y
The task is now to find the intrinsic parameters of the camera (f_x k_x, f_y k_y, k, c_x, c_y) by a straightforward nonlinear search. Since we know the vertical scale factor k_y = d_y⁻¹, the horizontal scale factor can be obtained using the aspect ratio k, and using the scale factor values the horizontal and vertical focal lengths can also be obtained.
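One way to organize this search, sketched below with our own structuring and a generic least-squares solver rather than whatever routine the authors used, maps each feature of image i through the frame-buffer rotation model above and compares it with the matching feature measured in image j:

```python
# Sketch under assumed names: residual-based search for (fx*kx, fy*ky, k, cx, cy).
import numpy as np
from scipy.optimize import least_squares

def predict_rotated_fb(feat_i, R, fxkx, fyky, k, cx, cy):
    """feat_i: Nx2 frame-buffer features of image i; R: 3x3 rotation of the pair."""
    xf, yf = feat_i[:, 0] - cx, feat_i[:, 1] - cy
    nx = R[0, 0] * xf + R[0, 1] * yf / k + R[0, 2] * fxkx
    dx = R[2, 0] * xf + R[2, 1] * yf / k + R[2, 2] * fxkx
    ny = R[1, 0] * xf * k + R[1, 1] * yf + R[1, 2] * fyky
    dy = R[2, 0] * xf * k + R[2, 1] * yf + R[2, 2] * fyky
    return np.column_stack([fxkx * nx / dx + cx, fyky * ny / dy + cy])

def calibrate_intrinsics(pairs, rotations, x0):
    """pairs: list of (feat_i, feat_j) Nx2 arrays; rotations: list of 3x3 matrices R."""
    def residuals(p):
        fxkx, fyky, k, cx, cy = p
        return np.concatenate([
            (predict_rotated_fb(fi, R, fxkx, fyky, k, cx, cy) - fj).ravel()
            for (fi, fj), R in zip(pairs, rotations)])
    return least_squares(residuals, np.asarray(x0, float)).x
```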
C.4 First Guess Values for the Intrinsic Camera Parameters
Observing the cost function defined by equation 27 we conclude that it is a nonlinear function, and it must be linearized before it can be used in the adjustment solution. This can be achieved using Newton's first-order approximation and taking some initial guess values for the unknown parameters.
Of the five unknowns included in the cost function E, only the initial values for the focal length and the principal point coordinates are difficult to obtain. Since we already know the image aspect ratio, the initial values for the image scale factors can be obtained using the relationships defined in section III.A. With the knowledge of the aspect ratio, the horizontal uncertainty scale factor can be obtained directly.
Let us define the procedures used to obtain initial guess values for these two parameters.
C.4.1 Determination of the First Guess Value for the Focal Length
Taking the perspective projection camera model, an object of height h is shown to generate an image of height h'. Newton's equation for magnification can be written as

m = h'/h = f / (z_c − f)    (28)

where h and h' are the object and image heights, respectively, z_c is the object distance measured from the front nodal point (center of projection) and f is the effective focal length of the lens.
To determine f, an object is imaged at sharpest focus at two different locations and the magnification is measured in each case. Employing equation 28 for two object-image conjugates, we have

m_1 = h_1'/h_1 = f / (z_c1 − f) ,    m_2 = h_2'/h_2 = f / (z_c2 − f) .    (29)
If the object distance is changed by Δz between the two object-image arrangements, we have z_c2 = z_c1 + Δz. Combining this relationship with equation 29 we have

m_2 = h_2'/h_2 = f / (z_c1 + Δz − f) .    (30)

By subtracting equations 29 and 30, we obtain

f = Δz / (1/m_2 − 1/m_1) .    (31)
The above relationship can be used to determine the effective focal length of the lens, f. It should be noted that in the previous relationship Δz is a change in distance, and can thus be measured accurately. Since in our system we have the ability to move the lens along its optical axis using the OCA degree of freedom, we create the two conjugate object-image pairs by moving the lens from back to front with the OCA degree of freedom. This displacement corresponds to moving the lens 10 cm along its optical axis, which is the total range of the OCA degree of freedom.
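The computation behind this first guess is a one-liner; the sketch below assumes the object height and the two measured image heights at the two OCA positions are available (names are ours):

```python
# Sketch of eq. 31: first-guess focal length from two measured magnifications
# at lens positions separated by a known displacement dz (the OCA travel).
def focal_length_guess(h_obj, h_img_1, h_img_2, dz):
    m1, m2 = h_img_1 / h_obj, h_img_2 / h_obj
    return dz / (1.0 / m2 - 1.0 / m1)
```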
C.4.2 Determination of the First Guess Value for the Principal Point Coordinates
Several methods can be found in the literature that are intended to solve the problem of finding the coordinates of the principal point.
Under the pin-hole camera model, focusing and zooming are equivalent to changing the distance between the optical center and the image plane. When focusing or zooming, each image point will move radially along a line passing through the principal point. If we take a sequence of images while changing the focus or the zoom, the lines defined by the projected image points taken at different focus or zoom settings will intersect at a common point, which is the principal point. However, we find this approach not suitable for our active calibration, since focus and zoom changes may cause displacements of the principal point.
Another solution to this problem, and a much more reliable one, was presented by Michael Penna [21] and is based on the fact that the image of a sphere under perspective projection is an ellipse whose axis of least inertia is on a line passing through the principal point [23]. Considering a set of several spheres displaced in the field of view of the camera, the principal point coordinates can be obtained through the intersection of the least inertia axes of the spheres. Since this approach does not need to change the optical settings of the lens, the principal point coordinates obtained are much more reliable.
V. CONCLUSIONS
Some characteristics of the ISR MDOF active vision systems and some mathematical background of their optical system calibration have been presented in this paper.
The ability of active vision systems to perform accurate movements has been used to perform active calibration using just features of the environment as the camera undergoes an accurate and controllable movement. In the case of the approach used and described in this paper, a pure rotation of the camera was considered, and a procedure to ensure pure rotation around the center of projection was presented.
The correction of the lens distortion was considered, and the definition of a point of best radial symmetry was fundamental to characterize the radial and decentering distortion using just the first coefficient of the radial distortion.
The great advantage of this active calibration approach is that it allows calibrating the intrinsic parameters of the camera using only features taken from the environment, assuming that the camera is able to perform controllable and precise movements. The use of bivariate polynomials to model the relationships between the physical camera parameters and the controllable parameters (zoom, focus and aperture position) allows a pseudo-real-time calibration of the intrinsic parameters through these functional relationships.
ACKNOWLEDGEMENT
This work has been partially supported by Junta Nacional de Investigação Científica e Tecnológica (JNICT), project VARMA.
References
[1] A. Yarbus, “Eye Movements and Vision”, Plenum Press, New-York, 1967.
[2] A. Basu, “Active Calibration: Alternative Strategy and Analysis”, in Proc. IEEE Conf. on Computer Vision and
Pattern Recognition, pp. 495-500, 1993.
[3] B. Caprile and V. Torre, “Using Vanishing Points for Camera Calibration”, in Int. Journal of Computer Vision,
4, pp. 127-140, 1990.
[4] D. Geiger, A. Yuille, “Stereo and Eye Movements” in Biological Cybernetics, vol. 62, pp. 117-128, 1989.
[5] F. Du and M. Brady, “Self-Calibration of the Intrinsic Parameters of Cameras for Active Vision”, in Proc. IEEE
Conf. on Computer Vision and Pattern Recognition, pp. 477-482, 1993.
[6] G. Stein, “Internal Camera Calibration using Rotation and Geometric Shapes”, Msc. Thesis, MIT, 1993.
[7] G. Wei, S. De Ma, "Implicit and Explicit Camera Calibration: Theory and Experiments", in IEEE Trans. on
Pattern Anal. and Machine Intelligence, vol. 16, no. 5, pp. 469-480, 1994.
[8] H. Christensen, K. Bowyer, H. Bunke, editors, “Active Robot Vision : Camera Heads, model based navigation and
reactive control”, vol.6, World Scientific, 1994.
[9] I. Faig, "Calibration of Close-Range Photogrammetric Systems: Mathematical Formulation", in Photogrammetric
Eng. Remote Sensing, vol. 14, pp. 1479-1486, 1975.
[10] J. Batista, J. Dias, H. Araújo, A. T. Almeida, “Monoplanar Camera Calibration - Iterative Multi-Step Approach”,
in Proc. British Machine Vision Conference, pp. 479-488, 1993.
[11] J. Crowley, P. Bobet, C. Schmid, "Auto-calibration by direct observation of objects", in Image and Vision
Computing Journal, vol. 11, no. 2, pp. 67-81, 1993.
[12] J. Lavest, G. Rives, and M. Dhome, “Three-Dimensional Reconstruction by Zooming”, in IEEE Trans. on Robotics
and Automation, vol. 9, No. 2, 1993.
[13] J. L. Mundy, A. Zisserman, "Geometric Invariance in Computer Vision", MIT Press, 1992.
[14] K. Pahlavan, "Active Robot Vision and Primary Ocular Processes", PhD Thesis, CVAP, KTH, 1993.
[15] K. Tarabanis, R. Tsai, and D. Goodman, “Modeling of a Computer-Controlled Zoom Lens”, in Proc. IEEE Int.
Conf. on Robotics and Automation, pp. 1545-1551, 1992.
[16] K. Tarabanis, R. Tsai, and D. Goodman, “Calibration of a Computer Controlled Robotic Vision Sensor with a
Zoom Lens”, in CVGIP:Image Understanding, vol.59, No.2, pp. 226-241, 1994.
[17] K. Tarabanis, R. Tsai, and P. Allen, “Analytical Characterization of the Feature Detectability Constraint of
Resolution, Focus, and Field-of-view for Vision Sensor Planning”, in CVGIP:Image Understanding, vol.59, No.3,
pp. 340-358, 1994.
[18] L. Dron, “Dynamic Camera Self-Calibration from Controlled Motion Sequences”, in Proc. IEEE Conf. on Com-
puter Vision and Pattern Recognition, pp. 501-506, 1993.
[19] M. Li, “Camera Calibration of a Head-Eye System for Active Vision”, in Proc. European Conf. Computer Vision,
pp. 543-554, 1994.
[20] M. Li, "Camera Calibration of the KTH Head-Eye System", Technical report, CVAP-147, NADA, KTH, 1994.
[21] M. Penna, “Camera Calibration : A quick and easy way to determine the scale factor”, in IEEE Trans. Pattern
Anal. Machine Intelligence, vol. 12, no. 12, pp. 1240-1245, 1991.
[22] O. Faugeras and G. Toscani, “The Calibration Problem for Stereo”, in Proc. IEEE Conf. Computer Vision and
Pattern Recognition, pp. 15-20, 1986.
[23] P. Beardsley, D. Murray and A. Zisserman, "Camera Calibration Using Multiple Images", in Proc. European Conf.
Computer Vision, pp. 312-320, 1992.
[24] P. McLauchlan and D. Murray, "Active Camera Calibration for a Head-Eye Platform using a Variable State-
Dimension Filter", submitted to IEEE Trans. on Pattern Analysis and Machine Intelligence.
[25] R. Carpenter, "Movements of the Eyes", Pion Limited, London, England, 1988.
[26] R. Hartley, “Self-Calibration from Multiple Views with a Rotating Camera”, in Proc. European Conf. Computer
Vision, pp. 471-478, 1994.
[27] R. Tsai, “An Efficient and Accurate Camera Calibration Technique for 3-D Machine Vision”, in Proc. IEEE Conf.
Comput. Vision and Pattern Recognition, pp. 364-374, 1988.
[28] R. Willson and S. Shafer, "What is the Center of the Image?", in Proc. IEEE Conf. on Computer Vision and
Pattern Recognition, pp. 670-671, 1993.
[29] R. Willson and S. Shafer, “Precision Imaging and Control for Machine Vision Research at Carnegie Mellon
University”, in Proc. SPIE Conf. on High Resolution Sensors and Hybrid Systems, 1992.
[30] R. Wurtz, M. Goldberg, "The Neurobiology of Saccadic Eye Movements", Elsevier, New York, 1989.
[31] S. Maybank and O. Faugeras, “A Theory of Self-Calibration of a Moving Camera”, in International Journal of
Computer Vision, 8:2, pp. 123-151, 1992.
[32] W. Gorsky and L. Tamburino, “A Unified Approach to the Linear Camera Calibration Problem”, in Proc. Int.
Conf. Computer Vision, pp. 511-515, 1987.
[33] Webb Associates, “Anthropometric Source Book, vol I: Anthropometry for Designers”, NASA Reference Publi-
cation 1024, 1978.
[34] Y. Abdel-Aziz and H. Karara, “Direct Linear Transformation into Object Space Coordinates in Close-Range
Photogrammetry”, in Symp. Close-Range Photogrammetry, pp. 1-18, 1971.