Exploiting Single View Geometry in Pan-Tilt-Zoom Camera Networks A. Del Bimbo1 , F. Dini1 , A. Grifoni2 , F. Pernici1 1 2 MICC University of Florence, Italy Thales Security Solutions Florence, Italy Abstract. PTZ (pan-tilt-zoom) camera networks have an important role in surveillance systems. They have the ability to direct the attention to interesting events that occur in the scene. In order to achieve such behavior the cameras in the network use a process known as sensor slaving: one (or more) master camera monitors a wide area and tracks moving targets so as to provide the positional information to one (or more) slave camera. The slave camera foveates at the targets in high resolution. In this paper, we propose a simple method to solve two typical problems that are the basic building blocks to create high level functionality in PTZ camera networks: the computation of the world to image homographies and the computation of camera to camera homographies. The first one is used for computing the image sensor observation model in sequential target tracking, the second one is used for camera slaving. Finally a cooperative tracking approach exploiting the use of both homographies is presented. 1 Introduction In the last few years, with the advent of smart, computer-enabled surveillance technologies and IP cameras, the use of surveillance systems for security reasons has exploded. Moreover control equipment such as PTZ cameras (also known as dome camera) are and will be of invaluable help for monitoring wide outdoor areas with a minimal number of sensors. For these cameras, however, precalibration is almost impossible. In fact, transportation, installation, changes in temperature and humidity as present in outdoor environments, typically affect the estimated parameters. Moreover, it is impossible to recreate the full range of zoom and focus settings. A tradeoff has to be made for simplicity against strict geometric accuracy. 
It is well known that in areas where the terrain is planar the relation between image pixels and terrain locations or between the image pixels of two cameras, is a simple 2D homography. While finding at least four well distributed image point features to compute the world to image mapping is easy in an indoor environment (provided that calibration grids are pasted on the floor of the room), in outdoor environments this is proved to be more complicated especially if the scene area is not sufficiently textured. For the same reason also image to image homographies between two different views taken from the same camera or different cameras are not easy to estimate. In this paper we propose a calibration and a tracking method for PTZ cameras that greatly simplifies cooperative target tracking. The key contributions 2 M2SFA2 2008: Workshop on Multi-camera and Multi-modal Sensor Fusion of the paper are threefold: first, we show how to combine single view geometry and planar mosaic geometry to compute the world to image and the image to image homography (The first is used for computing image sensor likelihood for sequential target tracking, the second is used for camera slaving). Second we show how line features in the mosaic computed with a parameterization with a minimum number of parameters gives globally better results than using 3D known coordinate of human-measurable feature points. Third the method can be used to enhance future surveillance systems to keep track of targets at high resolution that may not necessarily be captured within one field of view. 2 Related Work In the literature, several methods exist to calibrate one or several PTZ cameras. These methods can be distinguished according to the particular task to perform. The method [8] can be used to self-calibrate (without calibration targets) a single PTZ camera by computing the homographies induced by rotating and zooming the camera. 
In [6] the same approach is analyzed by considering the effect of imposing different constraints on the intrinsic parameters of the camera. The authors report that the best results are obtained when the principal point is assumed to be constant throughout the sequence, although it is known to vary in reality. In [12] a very thorough evaluation of the same method is performed with more than one hundred images. The internal calibration of two PTZ cameras is then used for 3D reconstruction of the scene through the essential matrix and triangulation, using the mosaic images as a stereo pair. Another class of methods relies on self-calibration based on moving objects. For example, [5] and [14] use LEDs. As the LED is moved around and visits several points, these positions make up the projection of a virtual object (modeled as a 3D point cloud) with unknown position. However, the need for synchronization prevents the use of these approaches in IP camera networks. After the VSAM project [2], new methods have been proposed for calibrating PTZ cameras with simpler and more flexible approaches suitable for outdoor environments. These are mainly targeted at high resolution sensing of objects at a distance, and zoom usage is therefore essential in these methods. Of particular interest are the works [15], [7] and [4], where the PTZ camera scheduling problem is addressed.

3 Problem Formulation

Section 3.1 defines the PTZ camera network with master-slave configuration in terms of its relative geometry. Section 3.2 describes how to compute this basic geometry using the single view and the planar mosaic geometry. Finally, in Section 4 the estimated geometry is exploited to cooperatively track a target over an extended area at high resolution.
3.1 PTZ Camera Networks with Master-Slave Configuration

PTZ cameras are particularly effective when configured in a master-slave configuration [15]: the master camera is set to have a global view of the scene so that it can track objects over extended areas using simple tracking methods with adaptive background subtraction. The slave camera can then follow the trajectory to generate close-up imagery of the object. Evidently their respective roles can be exchanged. The master-slave configuration can be extended to the case of multiple PTZ cameras. Fig.1 shows the pair-wise relationship between two cameras in this configuration. $H$ is the world to image homography of the master camera $C$, $H'$ is the homography relating the image plane of $C$ with the reference image plane $\Pi'$ of the slave camera $C'$, and $H_k$ is the homography relating the reference image plane $\Pi'$ with the current image plane of the slave camera. Once $H_k$ and $H'$ are known, the imaged location $x_1$ of a moving target $X_1$ tracked by the stationary camera $C$ can be transferred to the zoomed view of $C'$ by:

$$T_k = H_k \cdot H' \qquad (1)$$

Fig. 1. The pair-wise relationship between two cameras in master-slave configuration. The camera $C$ is the tracking camera and $C'$ is the slave camera. $H$ and $H'$ are respectively the world to image homography and the image to image homography. $\Pi$ is the 3D world plane while $\Pi'$ is the mosaic plane of the slave camera.

With this pairwise relationship between cameras, the number of possible network configurations can be calculated. Given a set of PTZ cameras $C_i$ viewing a planar scene, we define $N = \{C_i^{s_i}\}_{i=1}^{M}$ as a PTZ camera network with the master-slave relationship, where $M$ denotes the number of cameras in the network and $s_i$ defines the state of each camera. At any given time each camera can be in one of two states, $s_i \in \{\text{master}, \text{slave}\}$. The network $N$ can be in one of $2^M - 2$ possible state configurations, since the two configurations with all cameras in the master state or all cameras in the slave state are not admissible. It is worth noticing that under this definition more than one camera can act as a master and/or slave camera. In principle, without any loss of generality, if all the cameras in a network have an overlapping field of view (i.e. they are in a fully connected topology), the cameras can be set in a master-slave relationship with each other (not only in a one to one relationship). For example, in order to cover large areas, several master cameras can be placed with adjacent fields of view. In this case one slave camera suffices to observe the whole area. Several master cameras can have overlapping fields of view so as to achieve higher tracking accuracy (multiple observations of the same object from different cameras can be taken into account to obtain a more accurate measurement and determine a more accurate foveation by the slave camera). Similarly, more than one camera can act as a slave camera while just one is used as a master for tracking, for example to capture high resolution images of moving objects from several viewpoints.

3.2 Minimal PTZ Camera Model Parameterization

We consider the pin-hole camera model projecting the three-dimensional world onto a two-dimensional image, with fixed principal point and without modelling radial distortion. It is assumed that the camera rotates around its optical center with no translation. The pan and tilt axes are assumed to intersect. The $3 \times 3$ matrix $K_i$ contains the intrinsic parameters of the camera for the image taken at time $i$ and the $3 \times 3$ matrix $R_i$ defines its orientation. It is possible to model the whole projection as $P_i = [K_i R_i \mid 0]$, where the equality denotes equality up to a scale factor.
As also described in [9], the inter-image homography between image $i$ and image $j$ can be derived as $H_{ji} = K_j R_{ji} K_i^{-1}$. Due to the mechanical nature of PTZ cameras it is possible to assume that there is no rotation around the optical axis: $\theta = 0$. We further assume that the principal point lies at the image center, that the pan-tilt angles between spatially overlapping images are small, and that the focal length does not vary much between two overlapping images, $f_i = f_j = f$. Under these assumptions, the image-to-image homography can be approximated by:

$$H_{ji} = \begin{bmatrix} 1 & 0 & f\psi_{ji} \\ 0 & 1 & -f\phi_{ji} \\ -\psi_{ji}/f & \phi_{ji}/f & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & h_1 \\ 0 & 1 & h_2 \\ h_3 & h_4 & 1 \end{bmatrix} \qquad (2)$$

where $\psi_{ji}$ and $\phi_{ji}$ are respectively the pan and tilt angles from image $j$ to image $i$ [1]. Each point match contributes two rows to the measurement matrix. Since there are only four unknowns, $(h_1, h_2, h_3, h_4)$, two point matches suffice to estimate the homography. Estimates for $\psi$, $\phi$ and $f$ can be calculated from the entries of $H_{ji}$. With this parameterization, matching and minimization are generally simpler than with the full 8 DOF homography. Although the calibration parameters obtained with this parameterization are not accurate, it is still possible to create a wide single view (i.e. a planar mosaic) that maintains the projective properties of image formation (i.e. straight lines are still straight lines in the mosaic). This new view, provided that only moderate radial distortion is present, can be considered as a novel wide angle single perspective image.

Recovering the Homographies. It is well known that three orthogonal vanishing points $v_1, v_2, v_3$ can be used to calibrate a natural pinhole camera, i.e. to compute the focal length (1 DOF) and the principal point (2 DOFs). This can be done by referring to the image of the absolute conic (IAC) $\omega$ and using the following constraints [10]: $v_1^T \omega v_2 = 0$, $v_2^T \omega v_3 = 0$, $v_3^T \omega v_1 = 0$. Other constraints on $\omega$ can be obtained from circles [13] [3] and can be exploited, for example, in the case of sport video analysis.
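The three orthogonality constraints on the IAC can be turned into a small linear system. The following is a minimal sketch, assuming a natural camera (zero skew, unit aspect ratio) and using the standard relation $\omega = K^{-T} K^{-1}$; the function name `iac_from_vanishing_points` and the `scale` normalization parameter are illustrative choices, not part of the original method description.

```python
import numpy as np

def iac_from_vanishing_points(v1, v2, v3, scale=1.0):
    """Estimate the IAC from three orthogonal vanishing points.

    Assumes zero skew and unit aspect ratio, so that
    omega ~ [[w1, 0, w2], [0, w1, w3], [w2, w3, w4]].
    `scale` (e.g. the image width) only improves numerical conditioning.
    Returns (omega, focal_length, (u0, v0)).
    """
    T = np.diag([1.0 / scale, 1.0 / scale, 1.0])
    vs = [T @ np.asarray(v, dtype=float) for v in (v1, v2, v3)]
    vs = [v / np.linalg.norm(v) for v in vs]  # homogeneous scale is free

    # One linear equation v_a^T omega v_b = 0 per orthogonal pair.
    rows = []
    for a, b in ((0, 1), (1, 2), (2, 0)):
        (xa, ya, za), (xb, yb, zb) = vs[a], vs[b]
        rows.append([xa * xb + ya * yb,
                     xa * zb + za * xb,
                     ya * zb + za * yb,
                     za * zb])

    # The null vector of the 3x4 system gives (w1, w2, w3, w4) up to scale.
    _, _, vt = np.linalg.svd(np.array(rows))
    w1, w2, w3, w4 = vt[-1]

    u0, v0 = -w2 / w1, -w3 / w1
    f = np.sqrt(w4 / w1 - u0 ** 2 - v0 ** 2)
    omega_n = np.array([[w1, 0.0, w2], [0.0, w1, w3], [w2, w3, w4]]) / w1
    omega = T.T @ omega_n @ T  # undo the coordinate normalization
    return omega, f * scale, (u0 * scale, v0 * scale)
```

The principal point recovered this way coincides with the orthocenter of the vanishing point triangle, which is a convenient sanity check on manually extracted lines.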
The conic $\omega$ encodes the internal camera parameters according to $\omega = K^{-T} K^{-1}$. However, as shown below, we do not need to compute the entries of $K$ explicitly in order to recover the homographies. When $\omega$ and the vanishing line $l_\infty$ of a 3D world plane are known, it is possible to compute the metric structure of the plane up to a similarity transformation. The rectifying homography [10] can be computed from the image of the absolute conic $\omega$ and the vanishing line $l_\infty$ of the scene plane as:

$$H_r = \begin{bmatrix} \beta^{-1} & -\alpha\beta^{-1} & 0 \\ 0 & 1 & 0 \\ l_1 & l_2 & 1 \end{bmatrix} \qquad (3)$$

where $l_\infty = (l_1, l_2, 1)$ is the representation of the vanishing line in homogeneous coordinates, while $\alpha$ and $\beta$ are two scalars that can be computed from the imaged circular points $\mathbf{i}$ and $\mathbf{j}$. The imaged circular points are a pair of complex conjugate points (i.e. $\mathbf{i} = \mathrm{conj}(\mathbf{j})$) that encode the metric properties of imaged planes. They are the projections of the circular points $I$ and $J$, which in the Euclidean world (the scene plane) have canonical coordinates $I = (1, i, 0)$, $J = \mathrm{conj}(I)$. It can be shown that the following relationship holds [10]: $\mathbf{i} = H_r^{-1}(1, i, 0)^T = (\alpha - i\beta,\ 1,\ -l_2 - l_1\alpha + i\,l_1\beta)^T$. The two scalars $\alpha$ and $\beta$ are therefore directly extracted from the first component of $\mathbf{i}$. The vanishing line is obtained as $l_\infty = v_1 \times v_2$, where $v_1$ and $v_2$ are the vanishing points of the two orthogonal directions in the scene plane. The imaged circular points are computed as the intersections of $l_\infty$ with $\omega$. The transformation of eq.3 relates the world to the image up to an unknown similarity transformation $H_s$. The transformation $H_s$ has 4 DOF: two for translation, one for rotation and one for scaling. Two correspondences suffice to compute $H_s$ (i.e. the world coordinates of two points together with their projections onto the mosaic).
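The rectification step of eq.3 can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name `rectifying_homography` is hypothetical, $\omega$ is assumed to be already available, and the two vanishing points are assumed to be finite and distinct so that $l_\infty$ can be scaled to $(l_1, l_2, 1)$.

```python
import numpy as np

def rectifying_homography(omega, v1, v2):
    """Build H_r of eq.3 from the IAC and two vanishing points of the plane."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)

    # Vanishing line through the two vanishing points, scaled to (l1, l2, 1).
    l = np.cross(v1, v2)
    l = l / l[2]

    # Points on the vanishing line are x(t) = v1 + t * v2. Intersecting with
    # omega gives a quadratic in t whose roots are complex conjugates.
    a = v2 @ omega @ v2
    b = 2.0 * (v1 @ omega @ v2)
    c = v1 @ omega @ v1
    t = np.roots([a, b, c])[0]

    # Imaged circular point, scaled so that its second component is 1:
    # its first component is then alpha -/+ i*beta.
    ic = v1 + t * v2
    ic = ic / ic[1]
    alpha, beta = ic[0].real, abs(ic[0].imag)

    return np.array([[1.0 / beta, -alpha / beta, 0.0],
                     [0.0, 1.0, 0.0],
                     [l[0], l[1], 1.0]])
```

Applying `H_r` to the mosaic should leave only a similarity (rotation, translation, isotropic scale) between the rectified image and the scene plane, which is why two point correspondences are enough to fix the remaining 4 DOF.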
Without any loss of generality it is possible to choose the first point as the origin $O = (0, 0)$ and to place the second point at the measured distance from the first in the 3D world reference. In practice, just one length is measured in 3D. The final world to image homography can then be computed as:

$$H = H_r H_s \qquad (4)$$

For the accuracy of the computation of the vanishing points, it is important to choose carefully the reference image onto which the mosaic is stitched. For example, it does not make sense to choose images where the image plane is either parallel or orthogonal to the scene plane or to any plane orthogonal to it. In these viewing conditions the vanishing points lie at infinity in the image, producing high uncertainty in their localization. Fig.2(b) shows the planar mosaic obtained from a set of images acquired by a PTZ camera (these images are shown in fig.2(a)). The same figure also shows the three orthogonal vanishing points $v_1, v_2, v_3$ and the vanishing line $l_\infty$ used to obtain the world to image homography $H$ of eq.4. Fig.2(c) shows the rectified mosaic of the area under surveillance. For the computation of the inter-image homography, it is necessary to choose four well spaced pairs of corresponding points or lines in the two mosaics. Due to the wide angle view of the mosaic, the problem is considerably well posed. Fig.2(d) shows four well distributed pairs of corresponding point features in the mosaic images of two PTZ cameras viewing a planar scene. Fig.2(e) shows the slave-camera view of fig.2(d)(top) as seen from the master-camera view of fig.2(d)(bottom).

4 Cooperative Target Tracking

The homographies described in the previous section are now exploited to cooperatively track a target moving in a wide area.
The image to world homography $H$ is used to compute the image sensor likelihood for sequential target tracking in the master camera, and the image to image homography $H'$ is used for camera slaving by computing the homography $T_k$ of eq.1 (i.e. to transfer the imaged target position from the master to the slave camera). This section shows how to compute the time variant homography $H_k$. We adopt a SIFT based matching approach to detect the relative location of the current image with respect to the reference image: at each time step we extract the SIFT features from the current image and match them with those extracted from the reference frame, obtaining a set of point pairs. The SIFT features extracted in the reference image can be considered as visual landmarks. Once visual landmarks are matched to the current view, the registration errors between these points are used to drive a particle filter whose state consists of the parameters defining $H_k$. This makes it possible to stabilize the recovered motion, characterize the uncertainties and reduce the area where matches are searched. Moreover, because the keypoints are detected in scale-space, the scene does not necessarily have to be well-textured, which is often the case for planar man-made scenes.

Tracking using SIFT Visual Landmarks

Let us denote with $H_k$ the homography between the PTZ camera reference view and the frame grabbed at time step $k$. We want to track the parameters that define the homography $H_k$ using a Bayesian recursive filter. Under the assumptions we made, the homography of eq.2 is completely defined once the parameters $\psi_k$, $\phi_k$ and $f_k$ are known. We use this model to estimate the homography $H_k$ relating the reference image plane $\Pi'$ with the current image at time $k$ (see fig.1). Thus we adopt the state vector $x_k$, which defines the camera parameters at time step $k$: $x_k = (\psi_k, \phi_k, f_k)$. We use a particle filter to compute estimates of the camera parameters in the state vector.
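The mapping between the state vector $(\psi, \phi, f)$ and the four-parameter homography of eq.2 can be sketched as below. This is a minimal sketch under stated assumptions: image coordinates are taken relative to the principal point, the sign convention is the one that follows from $H = K R K^{-1}$ for small angles, and the function names are hypothetical.

```python
import numpy as np

def homography_from_state(psi, phi, f):
    """Small-angle pan-tilt homography of eq.2 for state (psi, phi, f)."""
    return np.array([[1.0, 0.0, f * psi],
                     [0.0, 1.0, -f * phi],
                     [-psi / f, phi / f, 1.0]])

def apply_homography(H, pts):
    """Map an (N, 2) array of points through H."""
    hom = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]

def estimate_h(src, dst):
    """Least-squares estimate of (h1, h2, h3, h4) from >= 2 point matches.

    Each match (x, y) -> (x', y') contributes two rows:
        h1 - x'*x*h3 - x'*y*h4 = x' - x
        h2 - y'*x*h3 - y'*y*h4 = y' - y
    """
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([1.0, 0.0, -xp * x, -xp * y])
        rhs.append(xp - x)
        rows.append([0.0, 1.0, -yp * x, -yp * y])
        rhs.append(yp - y)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return h

def state_from_h(h):
    """Recover (psi, phi, f) from (h1, h2, h3, h4)."""
    h1, h2, h3, h4 = h
    # |(h1, h2)| = f * |(psi, phi)| and |(h3, h4)| = |(psi, phi)| / f.
    f = np.sqrt(np.hypot(h1, h2) / np.hypot(h3, h4))
    return h1 / f, -h2 / f, f
```

With exactly two matches the system is square, so in practice more matches and a robust estimator such as RANSAC are preferable; the focal length is not recoverable when both angles are zero.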
Given a certain observation $z_k$ of the state vector at time step $k$, particle filters build an approximate representation of the posterior pdf $p(x_k | z_k)$ through a set of weighted samples $\{(x_k^i, w_k^i)\}_{i=1}^{N_p}$ (called "particles"), where the weights sum to 1. Each particle is thus a hypothesis on the state vector value, with a probability associated to it. The estimated value of the state vector is usually obtained as the weighted sum of all the particles. Like any other Bayesian recursive filter, the particle filter algorithm requires a probabilistic model for the state evolution between time steps, from which a prior pdf $p(x_k | x_{k-1})$ can be derived, and an observation model, from which a likelihood $p(z_k | x_k)$ can be derived. There is essentially no prior knowledge of the control actions that drive the camera through the world, so we adopt a simple random walk model as the state evolution model. This is equivalent to assuming that the actual value of the state vector is constant in time and relying on a stochastic noise $v_{k-1}$ to compensate for unmodeled variations: $x_k = x_{k-1} + v_{k-1}$, where $v_{k-1} \sim N(0, Q)$ is a zero mean Gaussian process noise with covariance matrix $Q$ accounting for camera maneuvers. The way we obtain observations $z_k$ of the actual state vector value $x_k$ is a little more complex and deserves some explanation. Let us denote with $S_0 = \{s_0^j\}_{j=0}^{N}$ the set of SIFT points extracted from the reference view of the PTZ camera (let us assume for the moment a single reference view), and with $S_k = \{s_k^j\}_{j=0}^{N'}$ the set of SIFT points extracted from the frame grabbed at time step $k$. From $S_0$ and $S_k$ we can extract pairs of SIFT points that match (through their SIFT descriptors) in the two views of the PTZ camera. After removing outliers from this initial set of matches through a RANSAC algorithm, what remains can be used as an observation for the particle filter.
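One step of such a filter, with the random walk state evolution and a weighted-mean estimate, can be sketched as follows. This is a generic bootstrap particle filter offered as an assumption-laden illustration: the multinomial resampling step is a standard ingredient that the text does not describe, the observation model is passed in as a `likelihood` callable, and the function name is hypothetical.

```python
import numpy as np

def particle_filter_step(particles, weights, likelihood, Q, rng):
    """One predict/update cycle of a bootstrap particle filter.

    particles : (Np, d) array of state hypotheses, e.g. (psi, phi, f)
    weights   : (Np,) array of weights summing to 1
    likelihood: callable mapping a state vector to p(z_k | x_k), up to scale
    Q         : (d, d) covariance of the random walk noise
    rng       : numpy.random.Generator
    """
    n, dim = particles.shape

    # Resampling (assumption: multinomial, not specified in the text).
    idx = rng.choice(n, size=n, p=weights)
    resampled = particles[idx]

    # Prediction: random walk x_k = x_{k-1} + v_{k-1}, v ~ N(0, Q).
    noise = rng.multivariate_normal(np.zeros(dim), Q, size=n)
    predicted = resampled + noise

    # Update: weight each hypothesis by its likelihood and normalize.
    new_weights = np.array([likelihood(x) for x in predicted], dtype=float)
    new_weights /= new_weights.sum()

    # State estimate: weighted sum of all the particles.
    estimate = new_weights @ predicted
    return predicted, new_weights, estimate
```

The spread of the predicted particles also indicates where in the image plane matches are worth searching, which is the property the tracking scheme exploits to limit SIFT computation.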
In fact, the set of remaining $\tilde{N}$ pairs $P_k = \{(s_0^1, s_k^1), \ldots, (s_0^{\tilde{N}}, s_k^{\tilde{N}})\}$ implicitly suggests a homography between the reference view and the frame at time step $k$, one that maps the points $\{s_0^1, \ldots, s_0^{\tilde{N}}\}$ into $\{s_k^1, \ldots, s_k^{\tilde{N}}\}$. Thus, there exists a triple $(\tilde{\psi}_k, \tilde{\phi}_k, \tilde{f}_k)$ which, under the above assumptions, uniquely describes this homography and can be used as a measure $z_k$ of the actual state vector value. To define the likelihood $p(z_k | x_k^i)$ of the observation $z_k$ given the hypothesis $x_k^i$, we take into account the distance between the homography $H_k^i$ corresponding to $x_k^i$ and the one associated to the observation $z_k$:

$$p(z_k | x_k^i) \propto \exp\left(-\frac{1}{\lambda}\sqrt{\sum_{j=1}^{M}\left(H_k^i \cdot s_0^j - s_k^j\right)^2}\right) \qquad (5)$$

where $H_k^i \cdot s_0^j$ is the projection of $s_0^j$ in the image plane of frame $k$ through the homography $H_k^i$, and $\lambda$ is a normalization constant. It is worth noting that the SIFT points of frame $k$ do not need to be computed over the whole frame. In fact, after the particle filter prediction step it is possible to restrict the area of the image plane where the SIFT points are computed to the area where the particles are propagated. This reduces the computational load of the SIFT point computation and of the subsequent matching with the SIFT points of the reference image.

Fig. 3. Each landmark in the database has a set of descriptors that corresponds to location features seen from different vantage points. Once the current view of the PTZ camera matches an image $I_l$ in the database, the inter-image homography $H_{lm}$ is used to transfer the current view into the reference plane $\Pi'$.

To increase the robustness of the recursive tracking described above, a database of the scene feature points is built during a learning stage. The SIFT keypoints extracted to compute the mosaic are merged into a large KD-Tree together with the estimated mosaic geometry.
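The likelihood of eq.5 can be sketched in a few lines. This is an illustrative reading of the formula, assuming the squared term denotes the squared Euclidean distance between projected and observed points; the function names and the `lam` parameter name are hypothetical.

```python
import numpy as np

def project(H, pts):
    """Map an (N, 2) array of points through the homography H."""
    hom = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]

def reprojection_likelihood(H_i, s0, sk, lam):
    """Unnormalized likelihood of eq.5 for one particle.

    H_i : homography associated with the particle's state hypothesis
    s0  : (N, 2) matched SIFT points in the reference view
    sk  : (N, 2) corresponding SIFT points in the current frame
    lam : normalization constant (lambda in eq.5)
    """
    residuals = project(H_i, s0) - sk
    distance = np.sqrt(np.sum(residuals ** 2))
    return np.exp(-distance / lam)
```

A particle whose homography maps the reference landmarks exactly onto the observed points receives the maximum value 1, and the value decays exponentially with the registration error.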
The match for a SIFT feature extracted from the current frame is searched according to the Euclidean distance of the descriptor vectors. The search is performed so that bins are explored in the order of their closest distance from the query descriptor vector, and it is stopped after a given number of data points has been considered [11]. Once the image $I_l$ closest to the current view $I_k$ is found, the homography $G$ relating $I_k$ to $I_l$ is computed at run time with RANSAC. The homography $H_{lm}$ that relates the image $I_l$ with the mosaic plane $\Pi'$, retrieved from the database, is finally used to compute the likelihood. Eq.5 becomes:

$$p(z_k | x_k^i) \propto \exp\left(-\frac{1}{\lambda}\sqrt{\sum_{j=1}^{M}\left(H_k^i \cdot s_0^j - H_{lm} \cdot G \cdot s_k^j\right)^2}\right) \qquad (6)$$

As shown in fig.3, the image points of the nearest neighbor image $I_l$ with respect to the current view $I_k$, and those of the current view (i.e. the query to the database), are projected in $\Pi'$ to compute the likelihood of eq.6. In particular, $I_m$ in the figure is the reference image used to compute the mosaic.

5 Experimental Results

In order to test the validity of the presented method we acquired 40 images with two Sony SNC-RZ30 IP PTZ cameras in a master-slave configuration. Fig.2(a) shows the images taken from one PTZ camera. The input images are taken at a resolution of $736 \times 544$ pixels. The images from the other camera are not shown. The images were captured at the minimal zoom of the device and with pan and tilt angle increments of respectively $\psi = 27.14$ and $\phi = 10.0$, so as to have some overlap between images. For a given image, matches are searched only among its 8 neighboring images in the grid (fewer for images on the border of the grid). Once correspondences are formed through RANSAC, the homographies parameterized by eq.2 are computed and successively refined by non-linear minimization through bundle adjustment. The RANSAC strategy successfully rejects outliers, even if several moving objects are present in the scene.
The image in the second row, fifth column of the grid of fig.2(a) (i.e. the reference image) is used to stitch the mosaic so as to avoid degeneracies in the estimation of the image of the absolute conic $\omega$. Parallel lines have been manually extracted by following the imaged linear boundaries. Fig.2(b) shows the feature points along the imaged boundaries in the mosaic, representing the pairs of mutually orthogonal parallel lines used to compute the vanishing points. The lines are fitted by orthogonal regression. The same figure also shows the orthocenter $p$ of the vanishing point triangle, and the reference image is indicated with a rectangle. Fig.2(c) shows the image mosaic transformed by the rectifying homography of eq.3. The global Euclidean structure of the 3D world plane is recovered. In extended areas, obtaining a ground truth homography by measuring with a ruler is operationally difficult. This prevents an extensive statistical evaluation in a real scenario. For this reason we preferred to compare the described method with a conventional method in a real scene. We measured four world point coordinates which project onto the input image located in the first row, seventh column of the grid in fig.2(a). From these we compute the world to image homography, which is used to rectify the mosaic. The world points are no more than 15 meters from each other. Their positions are computed by measuring the inter-distances between the 3D markers and then solving for the 3D coordinates. The result is shown in fig.2(f): it is evident that the lines of the street in the courtyard are no longer parallel after the rectification. The right angle of the sidewalk of the courtyard can be used to validate the accuracy. The angle is measured to be nearly 90° by the rectification using eq.3, while the angle in fig.2(f) measures 76°.
This can be explained by the fact that the world to image homography is quite accurate locally, where the measurements are taken, while the errors increase as we move away from these positions. To further appreciate this behavior, fig.2(g) shows a regular grid of equispaced points backprojected over the imaged planar region in the mosaic. It also shows the reference image (the rectangle) and the imaged world points used to compute the homography with the conventional method. Fig.2(h) shows a global view of the same figure. The imaged grid is very inaccurate outside the reference image. The homography computed from the measured local features does not give good global results, also because large outdoor scene areas may deviate from being planar. For testing purposes, a simple algorithm has been developed to automatically track a single target using the recovered homographies. The target is localized with the wide angle stationary camera using background subtraction and its motion within the image is modeled using an Extended Kalman Filter (EKF). The observation model is obtained by the linearization of eq.4. Images are used to compute respectively the image to world homography for the master and the inter-image homography relating the mosaic planes of the two PTZ cameras. Because of the limited extension of the monitored area, a wide angle view of the master camera suffices to track the target. The feature points of the slave camera images are used to build the SIFT database for camera tracking. Fig.4(a) shows some frames extracted from an execution of the proposed system: the top row shows the position of the target observed with the master camera, the bottom row the frames of the slave camera view. The particles show the uncertainty on the position of the target. Since the slave camera does not explicitly detect the target, a background color similar to the foreground color does not influence the estimated localization of the target.
A quantitative result for the estimated camera parameters is depicted in fig.4(b). It can be seen that increasing the focal length usually causes a significant increase in the variance as well, which means that the estimated homography between the two cameras becomes more and more inaccurate. Observing the particle filter for camera tracking in detail, we noticed in our experiments that the uncertainty increases with the zoom factor. This is because the number of features at high resolution that match those extracted from the reference image decreases when zooming, causing the SIFT matching to be less accurate. However, this error remains bounded for zoom factors below about 70%.

6 Summary and Conclusions

In this paper we have shown how to combine single view geometry and planar mosaic geometry in order to define and solve the two basic building blocks of PTZ camera networks: the world to image homography and the inter-image homography. The main virtue of our results lies in the simplicity of the method. Future research will investigate the joint optimization of the single view geometry together with the mosaic registration in terms of the IAC parameterization. However, the most interesting direction (currently under investigation) is to use the presented framework to compute a selective attention strategy, aimed at tracking multiple targets by tasking the sensors in the network.

References

1. A. Bartoli, N. Dalal, and R. Horaud. Motion panoramas. Computer Animation and Virtual Worlds, 15:501–517, 2004.
2. Collins, Lipton, Kanade, Fujiyoshi, Duggins, Tsin, Tolliver, Enomoto, and Hasegawa. A system for video surveillance and monitoring: Vsam final report. Technical report CMU-RI-TR-00-12, Robotics Institute, Carnegie Mellon University, May 2000.
3. C. Colombo, A. D. Bimbo, and F. Pernici.
Metric 3d reconstruction and texture acquisition of surfaces of revolution from a single uncalibrated view. IEEE Transaction on Pattern Analysis and Machine Intelligence, 27(1):99–114, 2005.
4. C. J. Costello, C. P. Diehl, A. Banerjee, and H. Fisher. Scheduling an active camera to observe people. Proceedings of the 2nd ACM International Workshop on Video Surveillance and Sensor Networks, pages 39–45, 2004.
5. J. Davis and X. Chen. Calibrating pan-tilt cameras in wide-area surveillance networks. In Proc. of ICCV 2003, 1:144–150, 2003.
6. L. de Agapito, E. Hayman, and I. D. Reid. Self-calibration of rotating and zooming cameras. International Journal of Computer Vision, 45(2), November 2001.
7. A. del Bimbo and F. Pernici. Distant targets identification as an on-line dynamic vehicle routing problem using an active-zooming camera. IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance (VS-PETS’05) in conjunction with ICCV, Beijing, China, pages 15–21, October 2005.
8. R. Hartley. Self-calibration from multiple views with a rotating camera. in Proc. European Conf. Computer Vision, pages 471–478, 1994.
9. R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000.
10. D. Liebowitz, A. Criminisi, and A. Zisserman. Creating architectural models from images. In Proc. EuroGraphics, volume 18, pages 39–50, September 1999.
11. D. G. Lowe. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision, 60(2):91–110, 2004.
12. S. Sinha and M. Pollefeys. Towards calibrating a pan-tilt-zoom cameras network. P. Sturm, T. Svoboda, and S. Teller, editors, OMNIVIS, 2004.
13. P. P. Sturm and Y. Wu. Euclidean structure from n≥2 parallel circles: Theory and algorithms. In Proc. of the 9th European Conference on Computer Vision (ECCV’2006)., pages 238–252, 2006.
14. T. Svoboda, H. Hug, and L. V. Gool.
Viroom – low cost synchronized multicamera system and its self-calibration. In Pattern Recognition, 24th DAGM Symposium, number 2449 in LNCS, pages 515–522, September 2002.
15. X. Zhou, R. Collins, T. Kanade, and P. Metes. A master-slave system to acquire biometric imagery of humans at a distance. ACM SIGMM 2003 Workshop on Video Surveillance, pages 113–120, 2003.

Fig. 2. (a): The grid of 40 input images captured at the minimal zoom of the device and with pan and tilt angle increments of respectively $\psi = 27.14$ and $\phi = 10.0$. (b): The planar mosaic with the vanishing point triangle $v_1, v_2, v_3$, the vanishing line $l_\infty$ and the orthocenter $p$ superimposed. (c): The rectified mosaic, obtained through the homography of eq.3. The global Euclidean structure of the 3D world plane is recovered. (d): Four well spaced pairs of corresponding points used to compute the inter-image homographies relating the two PTZ camera mosaics. (e): The slave-camera field of regard as seen from the master-camera field of regard. (f): Mosaic planar rectification using known 3D measures. The image used to compute the world to image homography is also used as the reference image to stitch the mosaic. (g): The reference image (rectangle) in the mosaic. The figure also shows the backprojection of the four known 3D points and the backprojection of a grid of points. (h): A global view of fig.(g) superimposed on the mosaic (top-right rectangle). The grid is inaccurate outside the reference image.

Fig. 4. (a): On the top row, the master camera tracking a human target. The world to image homography is estimated from the vanishing points in the mosaic image and used as the observation model in an Extended Kalman Filter. On the bottom row, the slave camera viewing the target tracked from the master. The particles show the joint position uncertainty of the target and the slave camera. (b): Nine frames showing the probability distributions (histograms) of the slave camera parameters. Each frame shows (left-to-right, top-to-bottom) the current view and the pan, tilt and focal length distributions. As one would expect, uncertainty increases with the zoom factor.
