Whiteboard Scanning and Image Enhancement
Zhengyou Zhang
Li-wei He
{zhang,lhe}@microsoft.com
http://research.microsoft.com/~zhang/
June 2003
Technical Report
MSR-TR-2003-39
A whiteboard is an easy and widely used tool for collaboration such as brainstorming, but the content on a whiteboard is hard to archive and share. While digital cameras can be used to capture whiteboard content, the images are usually taken from an angle, resulting in undesired perspective distortion. They may also contain distracting regions such as walls and shadows, and their visual quality is usually poor. This paper
describes a system that automatically locates the boundary of a whiteboard, crops out the
whiteboard region, rectifies it into a rectangle, and corrects the color to make the whiteboard completely white. In case a single image is not enough (e.g., large whiteboard and
low-resolution camera), we have developed a robust feature-based technique to automatically stitch multiple overlapping images. The system has been tested extensively, and
very good results have been obtained.
Acknowledgment: A short version of this paper, entitled “Notetaking with a Camera:
Whiteboard Scanning and Image Enhancement”, appears in Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), May 17-21,
2004, Montreal, Quebec, Canada.
Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
http://www.research.microsoft.com
Contents

1 Introduction
2 Overview of the System
3 Details of Image Enhancement
  3.1 Automatic Whiteboard Detection
    3.1.1 Technical Details
    3.1.2 Experimental Results on Automatic Whiteboard Detection
  3.2 Determining the Physical Aspect Ratio of a Whiteboard
    3.2.1 Geometry of a Rectangle
    3.2.2 Estimating Camera’s Focal Length and Rectangle’s Aspect Ratio
    3.2.3 Experimental Results on Aspect Ratio Estimation
  3.3 Rectification
  3.4 White Balancing and Image Enhancement
    3.4.1 Experimental Results on Image Enhancement
4 Whiteboard Scanning Subsystem
5 Conclusions
List of Figures

1 Diagram of the system architecture drawn on a whiteboard. (a) Original image; (b) Processed image.
2 An example of bad quadrangles.
3 Example 1. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small white squares; (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
4 Example 2. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small red dots; (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
5 Example 3. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small red dots (note that the upper right corner is outside of the image); (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
6 Geometry of a rectangle.
7 Six images of the same whiteboard, taken from different angles.
8 Rectification of a whiteboard image. Left: original shape; Right: rectified shape.
9 Rectified version of the first two images shown in Figure 7, using the estimated aspect ratios.
10 An example of the S-shaped curve.
11 Whiteboard image enhancement of Example 2, shown in Figure 4: (a) Estimated whiteboard color; (b) Final enhanced image.
12 Whiteboard image enhancement of Examples 1 and 3: (a) Enhanced image from Example 1, shown in Figure 3; (b) Enhanced image from Example 3, shown in Figure 5.
13 Whiteboard image enhancement in a cluttered office: (a) Original image together with the detected corners shown in small red dots; (b) Final enhanced image.
14 Diagram of the scanning subsystem: (a) Original image; (b) Processed image.
15 User interface for whiteboard scanning. Note that half of the previously acquired image and half of the live video are shown semi-transparent to guide the user to take the next snapshot.
16 An example of whiteboard scanning. (a) Original images overlaid with detected points of interest; (b) Stitched image; (c) Processed image using the technique described in the previous section.
17 A second example of whiteboard scanning. (a) Three original images; (b) Stitched image; (c) Final processed image.
1 Introduction
A whiteboard provides a large shared space for collaborative meetings or lectures. It is not only
effective but also economical and easy to use – all you need is a flat board and several dry-ink pens.
While whiteboards are widely used, they are not perfect. The content on the whiteboard is hard
to archive or share with others who are not present in the session. Imagine that you had a fruitful brainstorming session with many nice drawings on the whiteboard, and you have to copy them into your laptop. If you have another meeting right after, you may not have time to copy the contents; if other people have reserved the meeting room and use it right after, the contents on the whiteboard will be erased. Because digital cameras are becoming accessible to average users, more and more people are using them to take images of whiteboards instead of copying manually, thus significantly increasing productivity. The system we describe in this paper aims at reproducing the whiteboard
content as a faithful, yet enhanced and easily manipulable, electronic document through the use of a
digital (still or video) camera.
However, images are usually taken from an angle to avoid highlights created by flash, resulting
in undesired perspective distortion. They can also contain other distracting regions such as walls.
Our system uses a series of image processing algorithms. It automatically locates the boundary of a
whiteboard as long as there is a reasonable contrast near the edges, crops out the whiteboard region,
rectifies it to a rectangle with the estimated aspect ratio, and finally corrects the colors to produce a
crisp image.
Besides image enhancement, our system is also able to scan a large whiteboard by stitching multiple images automatically. Imagine that you only have a built-in camera with maximum resolution
640×480; this is usually not high enough to produce a readable image of a large whiteboard. Our
usability study shows that we need about 25 pixels per inch (1 inch ≈ 2.54 cm) in order to read whiteboard images with normal writing. Our system provides an intuitive interface to assist the user in taking multiple images of the whiteboard with overlap. It then stitches them automatically to produce a high-resolution
image. The stitched image can finally be processed and enhanced as mentioned earlier.
The whiteboard scanning subsystem is similar to the ZombieBoard system developed at Xerox
PARC [11]. The difference is that they rely on a pan-tilt video camera while we can use a free-moving (still or video) camera as long as there is an overlap between successive views.
The only commercial product we are aware of is Whiteboard Photo from PolyVision [10]. Compared with our system, it lacks two features:
• Aspect ratio estimation: Whiteboard Photo uses either the original image size or the aspect ratio of the bounding box for the final image; therefore the aspect ratio of the final image does not correspond to the actual aspect ratio of the whiteboard.
• Whiteboard scanning: Whiteboard Photo does not have the functionality to scan a large whiteboard and stitch multiple images together.
In ICASSP 2003, we presented a whiteboard capture system for a conference room setup [6]. In
that system, a high-resolution digital camera is mounted on the opposite wall of the whiteboard and
fixed toward the whiteboard, and a microphone is installed in the middle of the table. Both whiteboard content and audio signals are captured during the meeting. The whiteboard image sequence is
post-analyzed, and strokes and keyframes are produced and time-stamped. Therefore the whiteboard
content serves as a visual index to efficiently browse the meeting audio. On the other hand, the system presented in this paper is very lightweight. It can be used to archive whiteboard content whenever the user feels it necessary.
The paper is organized as follows. Section 2 provides an overview of the system. Section 3 describes some details of the image processing techniques implemented in the system. Section 4 presents
the whiteboard scanning subsystem. Extensive experimental results with real data are provided.
2 Overview of the System
Before going further, let us look at Figure 1. On the top is an original image of a whiteboard taken
by a digital camera, and on the bottom is the final image produced automatically by our system. The
content on the whiteboard gives a flow chart of our system.
As is clear in the diagram shown in Fig. 1b, the first thing we need to decide is whether it is enough to take a single image of the whiteboard. If the whiteboard is small (e.g., 40 inches by 40 inches) and a high-resolution digital camera (e.g., 2 megapixels) is used, then a single image is usually enough.
Otherwise, we need to call the whiteboard scanning subsystem, to be described separately in Section 4,
to produce a composite image that has enough resolution for comfortable reading of the whiteboard
content. Below, we assume we have an image with enough resolution.
The first step is then to localize the boundaries of the whiteboard in the image. Because of perspective projection, the whiteboard in an image usually appears to be a general quadrangle, rather than
a rectangle. The quadrangle is localized by detecting four strong edges satisfying certain criteria. If
a whiteboard does not have strong edges, a GUI (graphical user interface) is provided for the user to
manually specify the quadrangle.
The second step is image rectification. For that, we first estimate the actual aspect ratio of the
whiteboard from the detected quadrangle based on the fact that it is the projection of a rectangle in
space (see Section 3.2 for details). Besides the aspect ratio, we can also estimate the focal length
of the camera. From the estimated aspect ratio, and by choosing the “largest” whiteboard pixel as
the standard pixel in the final image, we can compute the desired resolution of the final image. A
planar perspective mapping (a 3×3 homography matrix) is then computed from the original image
quadrangle to the final image rectangle, and the whiteboard image is rectified accordingly.
The last step is white balancing of the background color. This involves two procedures. The first
is the estimation of the background color (the whiteboard image under the same lighting if there were
nothing written on it). This is not a trivial task because of complex lighting environment, whiteboard
reflection and strokes written on the board. The second concerns the actual white balancing. We make
the background uniformly white and increase color saturation of the pen strokes. The output is a crisp
image ready to be integrated with any office document or to be sent to the meeting participants.
Although we have not yet implemented it in our current system, image vectorization is logically the
final step. It transforms a bitmap image into vector drawings such as free-form curves, lines and arcs.
TabletPC inks use a vector representation, and therefore a whiteboard image after vectorization can
be exported into TabletPC.
3 Details of Image Enhancement
We now provide details of the image processing techniques used in our system. The whiteboard
scanning system will be described in the next section.
Figure 1: Diagram of the system architecture drawn on a whiteboard. (a) Original image; (b) Processed image.
3.1 Automatic Whiteboard Detection
As was mentioned in the introduction, this work was motivated by developing a useful tool to capture the whiteboard content with a digital camera rather than copying the notes manually. If the user has to click on the corners of the whiteboard, we have not realized the full potential of digital technologies.
In this section, we describe our implementation of automatic whiteboard detection. It is based on
Hough transform, but needs a considerable amount of engineering because there are usually many
lines which can form a quadrangle. The procedure consists of the following steps: 1. Edge detection;
2. Hough transform; 3. Quadrangle formation; 4. Quadrangle verification; 5. Quadrangle refining. Combined with the techniques described in the rest of this section, we have a complete system for automatically rectifying whiteboard images. Experiments will be presented in subsection 3.1.2.
3.1.1 Technical Details
We describe the details of how a whiteboard boundary is automatically detected.
Edge detection. There are many operators for edge detection (see any textbook on image analysis
and computer vision, e.g., [2, 4, 7]). In our implementation, we first convert the color image into
a gray-level image, and use the Sobel filter to compute the gradients in the x and y directions with the following masks:
$$G_x = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \quad\text{and}\quad G_y = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}$$
We then compute the overall gradient approximately by absolute values: G = |Gx| + |Gy|. If the gradient G is larger than a given threshold TG, that pixel is considered an edge pixel. TG = 40 in our implementation.
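To make this concrete, here is a small sketch of the edge detector just described (Python with NumPy/SciPy; an illustration, not the authors' code). The masks and the threshold TG = 40 follow the text; everything else is an implementation choice.

```python
import numpy as np
from scipy.ndimage import correlate

def detect_edges(gray, t_g=40.0):
    """Sobel gradients, L1 gradient magnitude G = |Gx| + |Gy|,
    and a fixed threshold T_G, as described in the text."""
    gx_mask = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=float)   # Gx mask from the text
    gy_mask = np.array([[-1,  0,  1],
                        [-2,  0,  2],
                        [-1,  0,  1]], dtype=float)   # Gy mask from the text
    gx = correlate(gray.astype(float), gx_mask)
    gy = correlate(gray.astype(float), gy_mask)
    g = np.abs(gx) + np.abs(gy)       # approximate gradient magnitude
    edges = g > t_g                   # binary edge map
    theta = np.arctan2(gy, gx)        # edge orientation, used by the Hough step
    return edges, theta
```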
Hough transform. Hough transform is a robust technique to detect straight lines, and its description can be found in the books mentioned earlier. The idea is to subdivide the parameter space into
accumulator cells. An edge detected earlier has an orientation, and is regarded as a line. If the parameters of that line fall in a cell, that cell receives a vote. At the end, cells that receive a significant number
of votes represent lines that have strong edge support in the image. Our implementation differs from
those described in the textbooks in that we are detecting oriented lines. The orientation information is
useful in a later stage for forming a reasonable quadrangle, and is also useful to distinguish two nearby lines with opposite orientations. The latter is important because we usually see two lines around each border, and if we do not distinguish them, the detected line is not very accurate. We use the normal
representation of a line:
$$x \cos\theta + y \sin\theta = \rho .$$
The range of the angle θ is [−180°, 180°]. For a given edge at (x0, y0), its orientation is computed by θ = atan2(Gy, Gx), and its distance by ρ = x0 cos θ + y0 sin θ. In our implementation, the size of each cell in the ρθ-plane is 5 pixels by 2°.
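A sketch of this oriented Hough accumulation follows (Python/NumPy, illustrative only). The 5-pixel by 2° cell size and the [−180°, 180°] angle range follow the text; the binning details are assumptions of this example.

```python
import numpy as np

def hough_oriented(edges, theta, rho_step=5.0, theta_step_deg=2.0):
    """Each edge pixel casts one vote at its own gradient orientation, so
    two nearby anti-parallel border lines land in different cells."""
    h, w = edges.shape
    ys, xs = np.nonzero(edges)
    ang = theta[ys, xs]                             # in [-pi, pi]
    rho = xs * np.cos(ang) + ys * np.sin(ang)       # rho may be negative here
    rho_max = np.hypot(h, w)
    n_rho = int(np.ceil(2 * rho_max / rho_step))
    n_theta = int(np.ceil(360.0 / theta_step_deg))
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    i = np.clip(((rho + rho_max) / rho_step).astype(int), 0, n_rho - 1)
    j = np.clip(((np.degrees(ang) + 180.0) / theta_step_deg).astype(int),
                0, n_theta - 1)
    np.add.at(acc, (i, j), 1)                       # accumulate the votes
    return acc
```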
Quadrangle formation. First, we examine the votes of the accumulator cells for high edge concentrations. We detect all reasonable lines by locating local maxima whose votes are larger than five
percent of the maximum number of votes in the Hough space. Second, we form quadrangles with
these lines. Any four lines could form a quadrangle, but the total number of quadrangles to consider
could be prohibitively high. In order to reduce the number, we only retain quadrangles that satisfy the
following conditions:
• The opposite lines should have quite opposite orientations (180° within 30°).
• The opposite lines should be quite far from each other (the difference in ρ is bigger than one fifth of the image width or height).
• The angle between two neighboring lines should be close to ±90° (within 30°).
• The orientation of the lines should be consistent (either clockwise or counter-clockwise).
• The quadrangle should be big enough (the circumference should be larger than (W + H)/4).
The last one is based on the expectation that a user tries to take an image of the whiteboard as big as
possible.
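The following sketch (illustrative Python, not the authors' code) encodes the first three pruning tests for one candidate set of four (ρ, θ) lines; the orientation-consistency and circumference tests, which need the corner intersections, are omitted for brevity.

```python
def angle_diff_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def plausible_quad(sides, img_w, img_h):
    """sides: four (rho, theta_deg) lines ordered around the quadrangle, so
    sides[0]/sides[2] and sides[1]/sides[3] are the opposite pairs
    (assuming sides[0]/sides[2] are the top/bottom pair)."""
    r = [s[0] for s in sides]
    t = [s[1] for s in sides]
    # Opposite lines: roughly anti-parallel (180 deg within 30 deg).
    if angle_diff_deg(t[0] + 180.0, t[2]) > 30.0:
        return False
    if angle_diff_deg(t[1] + 180.0, t[3]) > 30.0:
        return False
    # Opposite lines: far apart. With anti-parallel normals the two rho
    # values have opposite signs, so |rho_a + rho_b| is their separation.
    if abs(r[0] + r[2]) < img_h / 5.0 or abs(r[1] + r[3]) < img_w / 5.0:
        return False
    # Neighboring lines: roughly perpendicular (90 deg within 30 deg).
    return all(abs(angle_diff_deg(t[i], t[(i + 1) % 4]) - 90.0) <= 30.0
               for i in range(4))
```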
Figure 2: An example of bad quadrangles
Quadrangle verification. The lines detected from Hough space are infinite lines: they do not
say where the supporting edges are. For example, the four lines in Figure 2 would pass all the tests
described in the previous paragraph, although the formed quadrangle is not a real one. To verify
whether a quadrangle is a real one, we walk through the sides of the quadrangle and count the number
of edges along the sides. An edge within 3 pixels from a side of the quadrangle and having similar
orientation is considered to belong to the quadrangle. We use the ratio of the number of supporting
edges to the circumference as the quality measure of a quadrangle. The quadrangle having the highest
quality measure is retained as the one we are looking for.
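A sketch of this quality measure (illustrative Python/NumPy; the 3-pixel distance follows the text, while the orientation tolerance is an assumption of this example):

```python
import numpy as np

def quad_quality(corners, edges, theta, max_dist=3.0, max_angle=np.radians(15)):
    """Ratio of supporting edge pixels to circumference. An edge supports a
    side if it lies within max_dist of the segment and its gradient
    orientation is close to the side's normal (up to sign)."""
    ys, xs = np.nonzero(edges)
    pts = np.stack([xs, ys], axis=1).astype(float)
    orient = theta[ys, xs]
    support, perimeter = 0, 0.0
    for i in range(4):
        a = np.asarray(corners[i], float)
        b = np.asarray(corners[(i + 1) % 4], float)
        d = b - a
        length = float(np.hypot(*d))
        perimeter += length
        n = np.array([-d[1], d[0]]) / length       # unit normal of the side
        dist = np.abs((pts - a) @ n)               # distance to the side line
        s = ((pts - a) @ d) / length**2            # position along the segment
        near = (dist <= max_dist) & (s >= 0.0) & (s <= 1.0)
        # Wrapped angular difference between edge orientation and side normal.
        dang = np.abs(np.angle(np.exp(1j * (orient - np.arctan2(n[1], n[0])))))
        similar = (dang < max_angle) | (dang > np.pi - max_angle)
        support += int(np.count_nonzero(near & similar))
    return support / perimeter
```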
Quadrangle refining. The lines thus detected are not very accurate because of the discretization
of the Hough space. To improve the accuracy, we perform line fitting for each side. For that, we first
find all edges within a small neighborhood (10 pixels) of the side and having similar orientation. We
then use least-median squares to detect outliers [12], and finally we perform a least-squares fitting to
the remaining good edges [2].
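A minimal sketch of this refinement step (illustrative Python/NumPy, under assumed trial counts and inlier thresholds): least-median-of-squares over random two-point line hypotheses to reject outliers, followed by a least-squares fit to the inliers.

```python
import numpy as np

def refine_side(points, n_trials=200, seed=0):
    """points: (N, 2) edge pixels found near one side of the quadrangle.
    Returns a point on the refined line and its unit direction."""
    rng = np.random.default_rng(seed)
    best_p, best_n, best_med = None, None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(points), size=2, replace=False)
        d = points[j] - points[i]
        norm = float(np.hypot(*d))
        if norm < 1e-9:
            continue
        n = np.array([-d[1], d[0]]) / norm       # unit normal of hypothesis
        med = np.median(((points - points[i]) @ n) ** 2)
        if med < best_med:                       # keep least median of squares
            best_p, best_n, best_med = points[i], n, med
    sigma = 1.4826 * np.sqrt(best_med)           # robust scale estimate
    res = np.abs((points - best_p) @ best_n)
    inliers = points[res < max(2.5 * sigma, 1.0)]
    c = inliers.mean(axis=0)                     # least-squares fit via PCA
    _, _, vt = np.linalg.svd(inliers - c)
    return c, vt[0]
```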
3.1.2 Experimental Results on Automatic Whiteboard Detection
We have tested the proposed technique with more than 50 images taken by different people with
different cameras in different rooms. All the tuning parameters have been fixed once and for all, as indicated earlier. The success rate is more than 90%. The four failures are due to poor boundary contrast or to overly noisy edge detection. In this subsection, we provide three examples
(Figures 3 to 5).
Figure 3 is a relatively simple example because the whiteboard boundary is very clear. The image
resolution is 2272×1704 pixels. The detected edges are shown in white in Fig. 3b. As can be seen
in the Hough image (Fig. 3c), the peaks are quite clear. The corners of the whiteboard are accurately estimated, as shown by the small white squares in Fig. 3a. The cropped and rectified image is shown in
Fig. 3d. The estimated aspect ratio is 1.326, very close to the ground truth 4/3. The estimated focal
length is 2149 pixels.
Figure 3: Example 1. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small white squares; (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
Figure 4: Example 2. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small red dots; (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
Figure 5: Example 3. Automatic whiteboard detection and rectification: (a) Original image together with the detected corners shown in small red dots (note that the upper right corner is outside of the image); (b) Edge image; (c) Hough image with ρ in horizontal axis and θ in vertical axis; (d) Cropped and rectified whiteboard image.
Figure 4 shows a different example. The image resolution is still 2272×1704 pixels. As can be
seen in the edge image (Fig. 4b), the actual lower border of the whiteboard does not have strong edge
information. Our technique thus detects the line corresponding to the pen holder, which is perfectly
reasonable. The whiteboard corners estimated by intersecting the detected lines are shown in small
red dots in Fig. 4a. The cropped and rectified image is shown in Fig. 4d. The estimated aspect ratio is
1.038. The ground truth is 1.05 (the whiteboard is of the same type as in Fig. 7). Since the detected
whiteboard includes the pen holder, the estimated aspect ratio (width/height) should be a little bit
smaller than 1.05. The estimated focal length is 3465 pixels. We cannot compare the focal lengths
because of different zoom settings.
Figure 5 shows yet another example. The resolution is 1536×1024 pixels. This example has one
particular thing to notice: the upper right corner is not in the image. It does not affect the performance
of our technique since we first detect boundary lines rather than corners. In Fig. 5a, the three detected
visible corners are shown in small red discs. The fourth corner, although invisible, is also accurately
estimated, as can be verified by the cropped and rectified image shown in Fig. 5d, where the invisible
region (upper right corner) is filled with black pixels due to lack of information. The estimated aspect
ratio is 1.378. We do not have the ground truth because the image was provided by an external person.
The estimated focal length is 2032 pixels.
3.2 Determining the Physical Aspect Ratio of a Whiteboard
Because of the perspective distortion, the image of a rectangle appears to be a quadrangle. However,
since we know that it is a rectangle in space, we are able to estimate both the camera’s focal length
and the rectangle’s aspect ratio.
Single-view geometry of a plane, including plane rectification and mensuration, was addressed
in [1]. The case of a rectangular shape was studied in detail in [9]. Here, we address the problem of
determining the physical aspect ratio of a whiteboard, which is assumed to be a rectangle, from a single
image. Section 3.2.1 derives the basic constraints from a single view of a rectangle. Section 3.2.2
describes how to use these constraints to estimate the camera’s focal length and the actual aspect ratio
of the rectangle. Section 3.2.3 provides experimental results with real images.
For a general description of projective geometry in computer vision, the reader is referred to [8, 2, 5, 3].
3.2.1 Geometry of a Rectangle
Figure 6: Geometry of a rectangle
Consider Figure 6. Without loss of generality, we assume that the rectangle is on the plane z = 0
in the world coordinate system. Let the width and height of the rectangular shape be w and h. Let
the coordinates of the four corners, Mi (i = 1, . . . , 4), be (0, 0), (w, 0), (0, h), and (w, h) in the
plane coordinate system (z = 0). The projection of the rectangle in the image is a quadrangle.
The observed corners in the image are denoted by m1 , m2 , m3 , and m4 , respectively. Furthermore,
we will use $\tilde{\mathbf{x}}$ to denote the augmented vector obtained by adding 1 as the last element to the vector $\mathbf{x}$, i.e., $\tilde{\mathbf{x}} = [x_1, \ldots, x_n, 1]^T$ if $\mathbf{x} = [x_1, \ldots, x_n]^T$.
We use the standard pinhole model to model the projection from a space point $\mathbf{M}$ to an image point $\mathbf{m}$:

$$\lambda \tilde{\mathbf{m}} = \mathbf{A}\,[\mathbf{R}\;\;\mathbf{t}]\,\tilde{\mathbf{M}} \tag{1}$$

with

$$\mathbf{A} = \begin{bmatrix} f & 0 & u_0 \\ 0 & sf & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{and}\quad \mathbf{R} = [\mathbf{r}_1\;\;\mathbf{r}_2\;\;\mathbf{r}_3]$$

where $f$ is the focal length of the camera, $s$ is the pixel aspect ratio, and $(\mathbf{R}, \mathbf{t})$ describes the 3D transformation between the world coordinate system, in which the rectangle is described, and the camera coordinate system. (In the above model, we assume that pixels are rectangular.) Substituting the 3D coordinates of the corners yields

$$\lambda_1 \tilde{\mathbf{m}}_1 = \mathbf{A}\mathbf{t} \tag{2}$$
$$\lambda_2 \tilde{\mathbf{m}}_2 = w\mathbf{A}\mathbf{r}_1 + \mathbf{A}\mathbf{t} \tag{3}$$
$$\lambda_3 \tilde{\mathbf{m}}_3 = h\mathbf{A}\mathbf{r}_2 + \mathbf{A}\mathbf{t} \tag{4}$$
$$\lambda_4 \tilde{\mathbf{m}}_4 = w\mathbf{A}\mathbf{r}_1 + h\mathbf{A}\mathbf{r}_2 + \mathbf{A}\mathbf{t} \tag{5}$$

Performing (3) − (2), (4) − (2) and (5) − (2) gives respectively

$$\lambda_2 \tilde{\mathbf{m}}_2 - \lambda_1 \tilde{\mathbf{m}}_1 = w\mathbf{A}\mathbf{r}_1 \tag{6}$$
$$\lambda_3 \tilde{\mathbf{m}}_3 - \lambda_1 \tilde{\mathbf{m}}_1 = h\mathbf{A}\mathbf{r}_2 \tag{7}$$
$$\lambda_4 \tilde{\mathbf{m}}_4 - \lambda_1 \tilde{\mathbf{m}}_1 = w\mathbf{A}\mathbf{r}_1 + h\mathbf{A}\mathbf{r}_2 \tag{8}$$

Performing (8) − (6) − (7) yields

$$\lambda_4 \tilde{\mathbf{m}}_4 = \lambda_3 \tilde{\mathbf{m}}_3 + \lambda_2 \tilde{\mathbf{m}}_2 - \lambda_1 \tilde{\mathbf{m}}_1 \tag{9}$$

Performing the cross product of each side with $\tilde{\mathbf{m}}_4$ yields

$$0 = \lambda_3 \tilde{\mathbf{m}}_3 \times \tilde{\mathbf{m}}_4 + \lambda_2 \tilde{\mathbf{m}}_2 \times \tilde{\mathbf{m}}_4 - \lambda_1 \tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_4 \tag{10}$$

Performing the dot product of the above equation with $\tilde{\mathbf{m}}_3$ yields

$$\lambda_2 (\tilde{\mathbf{m}}_2 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_3 = \lambda_1 (\tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_3 ,$$

i.e.,

$$\lambda_2 = k_2 \lambda_1 \quad\text{with}\quad k_2 \equiv \frac{(\tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_3}{(\tilde{\mathbf{m}}_2 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_3} \tag{11}$$

Similarly, performing the dot product of (10) with $\tilde{\mathbf{m}}_2$ yields

$$\lambda_3 = k_3 \lambda_1 \quad\text{with}\quad k_3 \equiv \frac{(\tilde{\mathbf{m}}_1 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_2}{(\tilde{\mathbf{m}}_3 \times \tilde{\mathbf{m}}_4) \cdot \tilde{\mathbf{m}}_2} \tag{12}$$

Substituting (11) into (6), we have

$$\mathbf{r}_1 = \lambda_1 w^{-1} \mathbf{A}^{-1} \mathbf{n}_2 \tag{13}$$

with

$$\mathbf{n}_2 = k_2 \tilde{\mathbf{m}}_2 - \tilde{\mathbf{m}}_1 \tag{14}$$

Similarly, substituting (12) into (7) yields

$$\mathbf{r}_2 = \lambda_1 h^{-1} \mathbf{A}^{-1} \mathbf{n}_3 \tag{15}$$

with

$$\mathbf{n}_3 = k_3 \tilde{\mathbf{m}}_3 - \tilde{\mathbf{m}}_1 \tag{16}$$

From the properties of a rotation matrix, we have $\mathbf{r}_1 \cdot \mathbf{r}_2 = 0$. Therefore, from (13) and (15), we obtain

$$\mathbf{n}_2^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{n}_3 = 0 \tag{17}$$

Again, from the properties of a rotation matrix, we have $\mathbf{r}_1 \cdot \mathbf{r}_1 = 1$ and $\mathbf{r}_2 \cdot \mathbf{r}_2 = 1$. Therefore, from (13) and (15), we obtain respectively

$$1 = \lambda_1^2 w^{-2}\, \mathbf{n}_2^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{n}_2 \tag{18}$$
$$1 = \lambda_1^2 h^{-2}\, \mathbf{n}_3^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{n}_3 \tag{19}$$

Dividing these two equations gives the aspect ratio of the rectangular shape:

$$\left(\frac{w}{h}\right)^2 = \frac{\mathbf{n}_2^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{n}_2}{\mathbf{n}_3^T \mathbf{A}^{-T} \mathbf{A}^{-1} \mathbf{n}_3} \tag{20}$$
This equation says clearly that the absolute size of the rectangle cannot be determined from an image.
This is obvious since a bigger rectangular shape will give the same image if it is located further away
from the camera.
3.2.2 Estimating Camera’s Focal Length and Rectangle’s Aspect Ratio
In the last section, we derived two fundamental constraints (17) and (20). We are now interested in
extracting all useful information from the quadrangle in the image.
We do not assume any knowledge of the rectangle in space (i.e., unknown width and height).
Since we only have two constraints, we have to assume some knowledge of the camera. Fortunately,
with modern cameras, it is very reasonable to assume that the pixels are square (i.e., s = 1) and the
principal point is at the image center (i.e., known u0 and v0 ). Given u0 , v0 and s, we are able to
compute the focal length f from equation (17). Indeed, we have
$$f^2 = -\frac{1}{n_{23}\,n_{33}\,s^2}\Big\{\big[n_{21}n_{31} - (n_{21}n_{33} + n_{23}n_{31})u_0 + n_{23}n_{33}u_0^2\big]s^2 + \big[n_{22}n_{32} - (n_{22}n_{33} + n_{23}n_{32})v_0 + n_{23}n_{33}v_0^2\big]\Big\} \tag{21}$$
where n2i (resp. n3i ) is the i-th component of n2 (resp. n3 ). The solution does not exist when n23 = 0
or n33 = 0. It occurs when k2 = 1 or k3 = 1, respectively.
As soon as f is estimated, the camera’s intrinsic parameters are all known, and the aspect ratio of
the rectangle is readily computed by equation (20).
(Equation (20) can be used in a different way. If the aspect ratio of the rectangle is given, we can also use that equation to estimate the focal length. Together with (17), we then have two equations to estimate the focal length, leading to a more reliable estimate. However, this is not what we assume in this work.)
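To summarize the computation, here is a sketch (Python/NumPy, an illustration rather than the authors' code) that takes the four observed corners, ordered as in Figure 6, and applies equations (11), (12), (14), (16), (21), and (20):

```python
import numpy as np

def aspect_ratio_and_focal(m1, m2, m3, m4, u0, v0, s=1.0):
    """Estimate the rectangle's aspect ratio w/h and the focal length f
    from the four observed corners (pixel coordinates). Corner order
    follows Figure 6: m1 -> M1=(0,0), m2 -> M2=(w,0),
    m3 -> M3=(0,h), m4 -> M4=(w,h)."""
    m1, m2, m3, m4 = [np.array([p[0], p[1], 1.0]) for p in (m1, m2, m3, m4)]
    # k2 and k3 from equations (11) and (12).
    k2 = np.cross(m1, m4) @ m3 / (np.cross(m2, m4) @ m3)
    k3 = np.cross(m1, m4) @ m2 / (np.cross(m3, m4) @ m2)
    # n2 and n3 from equations (14) and (16); components n_{2i}, n_{3i}.
    n2 = k2 * m2 - m1
    n3 = k3 * m3 - m1
    # Focal length from equation (21); no solution if k2 = 1 or k3 = 1.
    f2 = -((n2[0]*n3[0] - (n2[0]*n3[2] + n2[2]*n3[0])*u0
            + n2[2]*n3[2]*u0**2) * s**2
           + (n2[1]*n3[1] - (n2[1]*n3[2] + n2[2]*n3[1])*v0
              + n2[2]*n3[2]*v0**2)) / (n2[2] * n3[2] * s**2)
    f = np.sqrt(f2)
    # Aspect ratio from equation (20), with A the pinhole matrix of (1).
    A = np.array([[f, 0.0, u0], [0.0, s*f, v0], [0.0, 0.0, 1.0]])
    B = np.linalg.inv(A).T @ np.linalg.inv(A)      # A^{-T} A^{-1}
    return float(np.sqrt((n2 @ B @ n2) / (n3 @ B @ n3))), float(f)
```

Following the assumptions discussed above, for a 2272 × 1704 image one would pass s = 1 and the image center (u0, v0) = (1136, 852).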
Figure 7: Six images of the same whiteboard, taken from different angles
Once $\mathbf{A}$ is known, the pose of the rectangular shape can be determined. From (13) we have

$$\mathbf{r}_1 = \mathbf{A}^{-1}\mathbf{n}_2 / \|\mathbf{A}^{-1}\mathbf{n}_2\| \tag{22}$$

and from (15)

$$\mathbf{r}_2 = \mathbf{A}^{-1}\mathbf{n}_3 / \|\mathbf{A}^{-1}\mathbf{n}_3\| \tag{23}$$

In turn,

$$\mathbf{r}_3 = \mathbf{r}_1 \times \mathbf{r}_2 \tag{24}$$

The translation vector can be determined from (2), i.e.,

$$\mathbf{t} = \lambda_1 \mathbf{A}^{-1} \tilde{\mathbf{m}}_1 \tag{25}$$
Note that the translation can only be determined up to a scale factor λ1 , which depends on the size of
the rectangle as can be seen in (18) and (19). This is obvious since a bigger rectangular shape will
give the same image if it is located further away from the camera.
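For completeness, a sketch of the pose recovery in equations (22)-(25) (again illustrative; the translation is returned only up to the scale factor λ1):

```python
import numpy as np

def rectangle_pose(A, n2, n3, m1):
    """Pose of the rectangle from equations (22)-(25), given the intrinsic
    matrix A, the vectors n2 and n3, and the first corner m1 = (x, y)."""
    Ainv = np.linalg.inv(A)
    r1 = Ainv @ n2
    r1 = r1 / np.linalg.norm(r1)                   # equation (22)
    r2 = Ainv @ n3
    r2 = r2 / np.linalg.norm(r2)                   # equation (23)
    r3 = np.cross(r1, r2)                          # equation (24)
    t_up_to_scale = Ainv @ np.array([m1[0], m1[1], 1.0])  # (25), times lambda_1
    return np.column_stack([r1, r2, r3]), t_up_to_scale
```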
3.2.3 Experimental Results on Aspect Ratio Estimation
In this section, we provide experimental results with real data. Six images of the same whiteboard, as
shown in Figure 7, were taken from different angles. The most frontal view is image (b). We manually
measured the whiteboard with a ruler; the size is about 42 × 40 inches (1 inch ≈ 2.54 cm). The aspect ratio is therefore 1.05, and we use this as the ground truth.
In each image, we manually clicked on the four corners of the whiteboard, and used the technique described in the previous subsection to estimate the focal length of the camera and the aspect ratio of the
whiteboard. The results are shown in Table 1. The second row shows the estimated values of the aspect
ratio, while the third row shows its relative error compared with the ground truth. The error is mostly
Table 1: Results with images shown in Figure 7

    image            (a)     (b)     (c)     (d)     (e)     (f)
    aspect ratio     1.028   1.035   1.031   1.021   1.019   0.990
    error (%)        2.1     1.4     1.8     2.8     3.0     5.7
    bounding box     0.966   1.035   0.981   0.892   0.843   0.727
    difference (%)   5.1     1.4     6.6     15.1    19.7    30.8
    focal length     2202    2442    2073    2058    2131    2030
less than 3%, except for image (f) which was taken from a very skewed angle. There are two major
sources contributing to the errors: the first is the precision of the manually clicked points; the second
is lens distortion that is currently not modeled. Lens distortion can be clearly observed in Figure 7.
The error of the estimated aspect ratio tends to be higher for images taken from a larger angle. This
is expected because the relative precision of the corner points is decreasing. For reference, we also
provide the aspect ratio of the bounding box of the whiteboard image in the fourth row of Table 1, and
its relative difference with respect to the ground truth in the fifth row. The relative difference can go
up to 30%. It is clear that it is not reasonable to use the aspect ratio of the bounding box to rectify the
whiteboard images. The sixth row of Table 1 shows the estimated focal length, which varies around
2200.
3.3 Rectification
The next task is to rectify the whiteboard image into a rectangular shape with the estimated aspect
ratio. For that, we need to know the size of the final image. We determine the size in order to preserve
in the rectified image maximum information of the original image. In other words, a pixel in the
original image should be mapped to at least one pixel in the rectified image. Refer to Figure 8. The
side lengths of the quadrangle in the original image are denoted by W1 and W2 for the upper and lower
sides, and by H1 and H2 for the left and right sides. Let Ŵ = max(W1, W2) and Ĥ = max(H1, H2).
Let r̂ = Ŵ /Ĥ. Denote the estimated aspect ratio by r. We determine the size of the rectified image
as follows: W = Ŵ and H = W/r if r̂ ≥ r; otherwise, H = Ĥ and W = rH. Once the size
is determined, the rectifying matrix H (homography) can be easily computed, and the color in the
rectified image is computed through bilinear or bicubic interpolation from the original image.
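As an illustration of this step, here is a sketch in Python with OpenCV (the library choice is an assumption of this example, not something the paper prescribes), which computes the output size by the rule above and warps the image:

```python
import numpy as np
import cv2

def rectify(image, quad, r):
    """Rectify the whiteboard quadrangle to a rectangle of aspect ratio r.
    quad: corners ordered upper-left, upper-right, lower-right, lower-left.
    The output size keeps the largest side, so that every original pixel
    maps to at least one rectified pixel, as described in the text."""
    q = np.asarray(quad, dtype=np.float32)
    w1 = np.linalg.norm(q[1] - q[0]); w2 = np.linalg.norm(q[2] - q[3])
    h1 = np.linalg.norm(q[3] - q[0]); h2 = np.linalg.norm(q[2] - q[1])
    w_hat, h_hat = max(w1, w2), max(h1, h2)
    if w_hat / h_hat >= r:
        W, H = w_hat, w_hat / r
    else:
        H, W = h_hat, r * h_hat
    W, H = int(round(W)), int(round(H))
    dst = np.array([[0, 0], [W - 1, 0], [W - 1, H - 1], [0, H - 1]],
                   dtype=np.float32)
    Hmat = cv2.getPerspectiveTransform(q, dst)   # the 3x3 homography
    # Bilinear interpolation, one of the two options mentioned in the text.
    return cv2.warpPerspective(image, Hmat, (W, H), flags=cv2.INTER_LINEAR)
```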
Figure 9 shows two rectified images of the whiteboard using the estimated aspect ratios. They
correspond to images (a) and (b) in Figure 7. The rectified images look almost identical even though the original images were taken from quite different angles. The other rectified images are also similar, and are thus not shown.
In the examples shown in Figs. 3, 4 and 5, the last picture of each example is the corresponding cropped and rectified image.
3.4 White Balancing and Image Enhancement
The goal of color enhancement is to transform the input whiteboard image into an image with the same
pen strokes on uniform background (usually white). For each pixel, the color value Cinput captured
by the camera can be approximated by the product of the incident light Clight , the pen color Cpen ,
and the whiteboard color Cwb. Since the whiteboard is physically built to have a uniform color, we can
Figure 8: Rectification of a whiteboard image. Left: original shape; Right: rectified shape
assume Cwb is constant for all the pixels; the lack of uniformity in the input image is due to the different amount of incident light reaching each pixel. Therefore, the first procedure in the color enhancement is to estimate Clight for each pixel, the result of which is in fact an image of the blank whiteboard.
Our system computes the blank whiteboard image by inferring the value of pixels covered by the
strokes from their neighbors. Rather than computing the blank whiteboard color at the input image
resolution, our computation is done at a coarser level to lower the computational cost. This approach
is reasonable because the blank whiteboard colors normally vary smoothly. The steps are as follows:
1. Divide the whiteboard region into rectangular cells. The cell size should be roughly the expected size of a single character on the board (15 by 15 pixels in our implementation).
2. Sort the pixels in each cell by their luminance values. Since the ink absorbs the incident light,
the luminance of the whiteboard pixels is higher than stroke pixels’. The whiteboard color
within the cell is therefore the color with the highest luminance. In practice, we average the
colors of the pixels in the top 25 percentile in order to reduce the error introduced by sensor
noise.
3. Filter the colors of the cells by locally fitting a plane in the RGB space. Occasionally there are cells that are entirely covered by pen strokes; the cell color computed in Step 2 is consequently incorrect. Those colors are rejected as outliers by the locally fitted plane and are replaced by values interpolated from the neighboring cells.
Once the image of the blank whiteboard is computed, the input image is color enhanced in two
steps:
1. Make the background uniformly white. For each cell, the computed whiteboard color (equivalent to the incident light Clight ) is used to scale the color of each pixel in the cell: Cout =
min(1, Cinput /Clight ).
2. Reduce image noise and increase color saturation of the pen strokes. We remap the value of each color channel of each pixel according to an S-shaped curve: $0.5 - 0.5\cos(C_{\text{out}}^{\,p}\,\pi)$. The steepness of the S-curve is controlled by p. In our implementation, p is set to 0.75, and the corresponding curve is shown in Fig. 10.
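The following sketch (Python/NumPy, illustrative only) combines the background estimation and the two enhancement steps. The plane-fitting outlier rejection of Step 3 above is omitted for brevity, and the Rec. 601 luminance weights are an assumption of this example.

```python
import numpy as np

def enhance(img, cell=15, p=0.75):
    """White-balance and enhance a rectified whiteboard image.
    img: float RGB image with values in [0, 1]."""
    h, w, _ = img.shape
    light = np.zeros_like(img)        # estimated blank-whiteboard image
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            block = img[y:y + cell, x:x + cell].reshape(-1, 3)
            lum = block @ np.array([0.299, 0.587, 0.114])   # luminance
            # Cell's whiteboard color: mean of the brightest 25% of pixels.
            top = block[lum >= np.percentile(lum, 75)]
            light[y:y + cell, x:x + cell] = top.mean(axis=0)
    # Step 1: make the background uniformly white.
    out = np.minimum(1.0, img / np.maximum(light, 1e-6))
    # Step 2: S-curve remapping to saturate strokes and suppress noise.
    return 0.5 - 0.5 * np.cos(out**p * np.pi)
```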
Figure 9: Rectified version of the first two images shown in Figure 7, using the estimated aspect ratios
3.4.1 Experimental Results on Image Enhancement
Figure 11 shows the result on the example shown in Figure 4. Figure 11a shows the estimated whiteboard color as if there were nothing written on it, and Figure 11b shows the final enhanced image.
Figure 12 shows the enhanced images from the examples shown in Fig. 3 and Fig. 5, respectively.
Figure 13 shows a whiteboard in a cluttered office. As can be seen, the image contains a significant
portion of distracting objects, and our software correctly identifies the whiteboard, and does a great
job in cleaning up the image.
4 Whiteboard Scanning Subsystem
The major steps of the whiteboard scanning subsystem are illustrated in Figure 14 and will be explained below. The mathematical foundation is that two images of a planar object, regardless of the angle and position of the camera, are related by a plane perspectivity, represented by a 3×3 matrix called the homography H [5, 3]. The stitching process determines the homography matrices between successive images, and we have developed an automatic and robust technique based on points of interest. This has several advantages over classical stitching techniques based on minimizing color differences: (1) it is less sensitive to color changes between images due to, e.g., different focus; (2) it is less likely to converge to local minima, because the points of interest contain the most useful information and because the textureless whiteboard pixels, which would be distracting in color-based optimization, are discarded; (3) it is robust to large motion, because a global search based on random sampling is used.
During whiteboard scanning, we start by taking a snapshot of the upper left corner, take a second one pointing further right but overlapping the previous snapshot, and so on until reaching the upper right corner; we then move the camera lower and take a snapshot, then another one pointing further left, and so on until reaching the left edge; the process continues in this "S" pattern until the lower border is captured. Successive snapshots must overlap to allow later stitching, and this is assisted by visual feedback during acquisition, as shown in Figure 15.
Figure 10: An example of the S-shaped curve (horizontal axis: Cout; vertical axis: Cnew)
In the viewing region, we show both the previously acquired image and the current video view. To facilitate the image acquisition, half of the previously acquired image is shown opaque, while the other half, which is in the overlapping region, is shown semi-transparent. The current live video is likewise shown half opaque and half semi-transparent. This guides the user to take successive images with overlap. Note that the alignment does not need to be precise; our program will automatically align the images. There are also a few buttons to indicate the direction in which the user wants to move the camera (down, up, left, right). The overlapping region changes depending on the direction. We have designed the default behavior such that only the "down" button is necessary to acquire images in the "S" pattern.
Referring to Figure 14: for each image acquired, we extract points of interest using the Plessey corner detector, a well-known technique. These points correspond to high-curvature points of the intensity surface, if we view an image as a 3D surface with the third dimension being the intensity.
An example is shown in Figure 16a, where the extracted points are displayed in red +.
Next, we try to match the extracted points between images. For each point in the previous image, we choose a 15 × 15 window centered on it, and compare that window with windows of the same size, centered on the points in the current image. A zero-mean normalized cross-correlation between
two windows is computed. It ranges from -1, for two windows which are not similar at all, to 1, for
two windows which are identical. If the largest correlation score exceeds a prefixed threshold (0.707
in our case), then that point in the current image is considered to be the match candidate of the point
in the previous image. The match candidate is retained as a match if and only if its match candidate
in the previous image happens to be the point being considered. This two-way symmetric test reduces
many potential matching errors.
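A sketch of this matching step (illustrative Python/NumPy; the window size and the 0.707 threshold follow the text, and the points are assumed to lie at least 7 pixels from the image border):

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size windows,
    ranging from -1 (not similar at all) to 1 (identical)."""
    a = a - a.mean(); b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else -1.0

def match_points(img1, pts1, img2, pts2, half=7, thresh=0.707):
    """Match interest points by 15x15-window ZNCC, keeping a pair only if
    it passes the two-way symmetric test described in the text."""
    def window(img, p):
        x, y = int(p[0]), int(p[1])
        return img[y - half:y + half + 1, x - half:x + half + 1].astype(float)

    def best(src_img, src_pts, dst_img, dst_pts):
        out = []
        for p in src_pts:
            scores = [zncc(window(src_img, p), window(dst_img, q))
                      for q in dst_pts]
            j = int(np.argmax(scores))
            out.append(j if scores[j] > thresh else -1)
        return out

    fwd = best(img1, pts1, img2, pts2)
    bwd = best(img2, pts2, img1, pts1)
    # Keep (i, j) only if i's best match is j and j's best match is i.
    return [(i, j) for i, j in enumerate(fwd) if j >= 0 and bwd[j] == i]
```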
The geometric constraint between two images is the homography constraint. If two points are
correctly matched, they must satisfy this constraint, which is unknown in our case. The set of matches
established by correlation usually contains false matches because correlation is only a heuristic and
uses only local information. Inaccurate location of extracted points because of intensity variation or
Figure 11: Whiteboard image enhancement of Example 2, shown in Figure 4: (a) Estimated whiteboard color; (b) Final enhanced image.
Figure 12: Whiteboard image enhancement of Examples 1 and 3: (a) Enhanced image from Example 1, shown in Figure 3; (b) Enhanced image from Example 3, shown in Figure 5.
Figure 13: Whiteboard image enhancement in a cluttered office: (a) Original image together with the detected corners shown in small red dots; (b) Final enhanced image.
lack of strong texture features is another source of error. If we estimate the homography between the
two images based on a least-squares criterion, the result could be completely wrong even if there is
only one false match. This is because least-squares is not robust to outliers. We developed a procedure based on a robust estimation technique known as least-median-of-squares (see e.g. [12]) to detect both false matches and poorly located corners, and simultaneously estimate the homography matrix H.
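The exact estimator is described in [12]; as a rough off-the-shelf stand-in rather than the authors' implementation, OpenCV's robust homography fitting with its least-median-of-squares option behaves similarly:

```python
import numpy as np
import cv2

def robust_homography(pts_curr, pts_prev):
    """Estimate the homography mapping the current image into the previous
    image's frame while rejecting false matches. cv2.LMEDS selects a
    least-median-of-squares estimator in the spirit of [12]; the returned
    mask flags the surviving inlier matches."""
    H, inlier_mask = cv2.findHomography(
        np.asarray(pts_curr, np.float32),
        np.asarray(pts_prev, np.float32),
        method=cv2.LMEDS)
    return H, inlier_mask.ravel().astype(bool)
```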
This incremental matching procedure stops when all images have been processed. Because of its incremental nature, cumulative errors are unavoidable. For higher accuracy, we need to adjust the H's through a global optimization that considers all the images simultaneously.
Once the geometric relationships between images (in terms of the homography matrices H's) are determined, we are able to stitch all images into a single high-resolution image. There are several options, and currently we have implemented a very simple one. We use the first image as the reference frame of the final image, and map subsequent original images into the reference frame. If a pixel in the reference frame is covered by several original images, the value from the newest image is retained.
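A sketch of this compositing rule (illustrative Python/OpenCV; the "newest pixel wins" policy follows the text, while the nonzero-pixel coverage mask is a simplification):

```python
import numpy as np
import cv2

def composite(images, homographies, out_size):
    """Map each image into the reference frame of the first one and overlay
    them; where several images cover the same pixel, the newest one wins.
    homographies[i] maps image i into the reference frame (identity for
    image 0); out_size is (width, height) of the stitched result."""
    W, H = out_size
    pano = np.zeros((H, W, 3), dtype=np.uint8)
    for img, Hmat in zip(images, homographies):
        warped = cv2.warpPerspective(img, Hmat, (W, H))
        covered = warped.any(axis=2)      # crude mask of warped coverage
        pano[covered] = warped[covered]   # newest image overwrites older ones
    return pano
```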
Two examples are shown in Fig. 16 and Fig. 17.
5 Conclusions
We have presented a digital notetaking system that scans the content of a whiteboard into the computer with a camera. Images are enhanced for better visual quality. The system has been tested extensively, and very good results have been obtained. Because digital cameras are becoming ubiquitous, our technology may contribute to a significant increase in productivity.
Figure 14: Diagram of the scanning subsystem: (a) Original image; (b) Processed image.
Figure 15: User interface for whiteboard scanning. Note that half of the previously acquired image and half of the live video are shown semi-transparent to guide the user to take the next snapshot.
Figure 16: An example of whiteboard scanning. (a) Original images overlaid with detected points of interest; (b) Stitched image; (c) Processed image using the technique described in the previous section.
22
(a)
(b)
(c)
Figure 17: A second example of whiteboard scanning. (a) Three original images; (b) Stitched image;
(c) Final processed image.
23
References
[1] A. Criminisi, Accurate Visual Metrology from Single and Multiple Uncalibrated Images, Springer-Verlag, 2001.
[2] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint, MIT Press, 1993.
[3] O. Faugeras and Q.-T. Luong, The Geometry of Multiple Images, MIT Press, 2001.
[4] R.C. Gonzalez and R.E. Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2002.
[5] R. Hartley and A. Zisserman, Multiple View Geometry, Cambridge University Press, 1998.
[6] L. He, Z. Liu, and Z. Zhang, “Why take notes? Use the whiteboard system,” in Proc. International
Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), Hong Kong, Apr. 2003,
vol. V, pp. 776–779.
[7] R. Jain, R. Kasturi, and B.G. Schunck, Machine Vision, McGraw-Hill, Inc., 1995.
[8] K. Kanatani, Geometric Computation for Machine Vision, Oxford University Press, 1993.
[9] D. Liebowitz, Camera Calibration and Reconstruction of Geometry from Images, Ph.D. dissertation, University of Oxford, 2001.
[10] PolyVision, Whiteboard Photo, http://www.polyvision.com/products/wbp.asp.
[11] E. Saund, Image Mosaicing and a Diagrammatic User Interface for an Office Whiteboard Scanner. Technical Report, Xerox Palo Alto Research Center, 1999.
[12] Z. Zhang, “Parameter estimation techniques: a tutorial with application to conic fitting,” Image and Vision Computing, 15(1):59–76, 1997.