Boosting for Fast Face Recognition

Guo-Dong Guo and Hong-Jiang Zhang
Microsoft Research China, 5F, Beijing Sigma Center, No. 49, Zhichun Road, Haidian District, Beijing 100080, P. R. China
E-mail: guodong guoyahoo:om

Abstract

We propose to use the AdaBoost algorithm for face recognition. AdaBoost is a kind of large margin classifier and is efficient for on-line learning. To adapt AdaBoost to fast face recognition, the original AdaBoost, which uses all given features in each round, is compared with boosting along feature dimensions. The comparable results justify the use of the latter, which is faster for classification. AdaBoost is typically a classifier between two classes. To solve the multi-class recognition problem, a majority voting (MV) strategy can be used to combine all the pairwise classification results. However, the number of pairwise comparisons, n(n-1)/2, is huge when the number of individuals n in the face database is very large. We propose a constrained majority voting (CMV) strategy to largely reduce the number of pairwise comparisons without losing recognition accuracy. Experimental results on a large face database of 1079 faces of 137 individuals show the feasibility of our approach for fast face recognition.

Keywords: Face recognition, large margin classifiers, AdaBoost, constrained majority voting (CMV), principal component analysis (PCA).

1. Introduction

Face recognition technology can be used in a wide range of applications, such as identity authentication, access control, and surveillance. Interest and research activity in face recognition have increased significantly over the past few years [1, 8]. Two issues are central to face recognition: what features to use to represent a face, and how to classify a new face based on the chosen representation. Principal Component Analysis (PCA) is a classical technique for signal representation [9].
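As a concrete illustration of the PCA representation used throughout this paper, the sketch below (our own Python/NumPy code, not from the paper; the function name and toy data are assumptions) computes a principal-component basis from flattened face images and projects them onto it:

```python
import numpy as np

def pca_features(train_faces, num_components):
    """Project face images onto their principal components (eigenfaces).

    train_faces: (n_images, n_pixels) array, one flattened face per row.
    Returns the mean face, the component basis, and the training features.
    """
    mean_face = train_faces.mean(axis=0)
    centered = train_faces - mean_face
    # SVD of the centered data matrix yields the principal directions
    # as the rows of vt, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:num_components]          # (num_components, n_pixels)
    features = centered @ basis.T        # projection coefficients
    return mean_face, basis, features

# Toy usage: 6 random "faces" of 16 pixels, 3 principal components.
rng = np.random.default_rng(0)
faces = rng.normal(size=(6, 16))
mean_face, basis, feats = pca_features(faces, 3)
assert feats.shape == (6, 3)
```

For real face data with many more pixels than images, eigenfaces are usually obtained via the smaller Gram-matrix trick; plain SVD is used here only for brevity.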
Turk and Pentland [11] developed a well-known face recognition method, the eigenfaces, based on the PCA technique for face representation. Other, more complex methods, such as ICA or non-linear approaches [6], can also be used to extract face features. Here, we focus on the classification problem and choose the simple and efficient PCA technique for face feature extraction. In the standard eigenfaces approach [11], the nearest center (NC) criterion is used to recognize a new face. In [5], a probabilistic visual learning (PVL) method is developed for face recognition. Another Bayesian approach to face classification, called probabilistic reasoning models (PRM), is proposed in [4], based on some assumptions about the class distributions. More recently, the support vector machine (SVM) [12] has become popular for visual object recognition [7]. The SVM constructs a hyperplane between two classes of examples based on the large margin criterion. The face recognition accuracy of SVM-based methods is relatively high [3]. However, with SVM, both training and testing are somewhat time consuming if the face database is very large. Freund and Schapire [2] proposed another kind of large margin classifier, AdaBoost, which is fast and efficient for on-line learning. The AdaBoost algorithm has the potential of fast training and testing for real-time face recognition. Hence, we concentrate on the AdaBoost algorithm and evaluate its performance for face recognition.

In the next section, we describe the AdaBoost algorithm and give our strategies to adapt it for fast face recognition. Then, constrained majority voting is presented in Section 3 to tackle the multi-class recognition problem. Section 4 shows the experimental evaluation of AdaBoost in face recognition. Finally, conclusions and discussions are given in Section 5.

2. AdaBoost

Boosting is a method to combine a collection of weak classification functions (weak learners) to form a stronger classifier.
AdaBoost is an adaptive algorithm that boosts a sequence of classifiers, in which the weights are updated dynamically according to the errors in previous learning [2]. AdaBoost is a kind of large margin classifier. Tieu and Viola [10] adapted the AdaBoost algorithm for natural image retrieval. They made the weak learner work on a single feature each time, so after T rounds of boosting, T features are selected together with the T weak classifiers. If Tieu and Viola's version can obtain results comparable to Freund and Schapire's original AdaBoost [2], it will be the better choice for face recognition because it requires only T comparisons instead of T x D in the original AdaBoost, where D is the feature dimension. For clarity, we denote the original AdaBoost [2] as Boost.0. Because of space limits, we do not give the original AdaBoost algorithm here; readers can refer to [2] for a detailed explanation. Tieu and Viola's version [10] is briefly described below:

AdaBoost Algorithm
Input: 1) $n$ training examples $(x_1, y_1), \ldots, (x_n, y_n)$ with $y_i = 1$ or $0$; 2) the number of iterations $T$.
Initialize weights $w_{1,i} = \frac{1}{2m}$ or $\frac{1}{2l}$ for $y_i = 1$ or $0$, respectively, with $m + l = n$.
Do for $t = 1, \ldots, T$:
1. Train one hypothesis $h_j$ for each feature $j$ with $w_t$, with error $\epsilon_j = \sum_i w_{t,i}\,[h_j(x_i) \neq y_i]$.
2. Choose $h_t = h_k$ such that $\forall j \neq k,\ \epsilon_k < \epsilon_j$. Let $\epsilon_t = \epsilon_k$.
3. Update: $w_{t+1,i} = w_{t,i}\,\beta_t^{e_i}$, where $e_i = 1$ or $0$ for example $x_i$ classified correctly or incorrectly, respectively, with $\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}$ and $\alpha_t = \log\frac{1}{\beta_t}$.
4. Normalize the weights so that they form a distribution: $w_{t+1,i} \leftarrow \frac{w_{t+1,i}}{\sum_{j=1}^{n} w_{t+1,j}}$.
Output the final hypothesis:
$h_f(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \geq \frac{1}{2}\sum_{t=1}^{T} \alpha_t$, and $0$ otherwise.

3. Multi-class Recognition

AdaBoost is typically used to solve two-class classification problems. In a multi-class scenario, we can use a majority voting (MV) strategy to combine all pairwise classification results.
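As a minimal sketch, the feature-wise boosting above might be implemented as follows (our own Python/NumPy code; the per-feature nearest-center weak learner and the small clamp applied when a round's error is exactly zero, both discussed in this paper, are folded in, and all names are our assumptions):

```python
import numpy as np

def boost_features(X, y, T, eps_floor=0.01):
    """Feature-wise AdaBoost in the style of Tieu and Viola.

    X: (n, D) feature matrix; y: labels in {0, 1}; T: boosting rounds.
    Each round trains a 1-D nearest-center weak learner per feature,
    keeps the single best feature, and reweights the examples.
    """
    n, D = X.shape
    m, l = (y == 1).sum(), (y == 0).sum()
    w = np.where(y == 1, 1.0 / (2 * m), 1.0 / (2 * l))  # initial weights
    chosen, centers, alphas = [], [], []
    for _ in range(T):
        best = None
        for j in range(D):
            # Weighted class centers along feature j.
            c1 = np.average(X[y == 1, j], weights=w[y == 1])
            c0 = np.average(X[y == 0, j], weights=w[y == 0])
            pred = (np.abs(X[:, j] - c1) < np.abs(X[:, j] - c0)).astype(int)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, c0, c1, pred)
        err, j, c0, c1, pred = best
        err = max(err, eps_floor)     # clamp: avoids log(0) when err == 0
        beta = err / (1.0 - err)
        alpha = np.log(1.0 / beta)
        # Shrink the weights of correctly classified examples, renormalize.
        w = w * np.where(pred == y, beta, 1.0)
        w /= w.sum()
        chosen.append(j); centers.append((c0, c1)); alphas.append(alpha)
    return chosen, centers, alphas

def boost_classify(x, chosen, centers, alphas):
    # Weighted vote of the T selected single-feature weak classifiers.
    score = sum(a for j, (c0, c1), a in zip(chosen, centers, alphas)
                if abs(x[j] - c1) < abs(x[j] - c0))
    return int(score >= 0.5 * sum(alphas))
```

Only the T selected features are examined at test time, which is the source of the speed-up over the original AdaBoost.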
However, it needs n(n-1)/2 pairwise comparisons, where n is the number of classes. To speed up the process for fast face recognition, we first use the nearest center criterion to rank all classes with respect to a given query: a class label appears near the top of the list if its class center is close to the query. Then the top m classes are selected and used for voting. We call this Constrained Majority Voting (CMV); it largely reduces the number of comparisons. We compare the performance of CMV with majority voting over all pairs of classes. We also show face recognition results for the probabilistic reasoning models (PRM) method [4], which approximates the Bayesian classifier under the assumption that the covariance matrix of each class is diagonal. The recognition accuracy of the standard eigenfaces is also shown for comparison.

4. Experiments

It should be noted, however, that a problem emerges when Tieu and Viola's boosting is used for face recognition. Since the algorithm starts with the most discriminative feature and adds another one in each subsequent round, it may begin with a feature having zero classification error, i.e., $\epsilon_t = 0$; then $\beta_t = \frac{\epsilon_t}{1-\epsilon_t} = 0$, so $\alpha_t = \log\frac{1}{\beta_t}$ cannot be defined and the boosting has to stop there, because boosting depends on the classification error of the previous round. Clearly, very few rounds of boosting, and hence a very small number of features, are not sufficient for the complicated task of face recognition. In fact, we observed this phenomenon of zero boosting error in quite a few cases. To solve this problem and let the boosting process continue, we assign a small value to $\epsilon_t$ instead of zero whenever $\epsilon_t = 0$. We let $\epsilon_t = 0.1$ and $0.01$ and compare their effects on the recognition results; we call the two settings Boost.1 and Boost.2, respectively, corresponding to the different $\epsilon_t$ values. One suspicion still exists: whether we need to weight the features with the distribution $w_t$ in step 1 of AdaBoost.
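To make the CMV scheme of Section 3 concrete, here is a minimal Python sketch (our own code; the function names and the pluggable pairwise classifier are assumptions — in the paper the pairwise decision comes from a boosted two-class classifier):

```python
import numpy as np

def cmv_recognize(query, class_centers, pairwise_classify, top_m=4):
    """Constrained majority voting (CMV), a sketch.

    1. Rank all classes by nearest-center (NC) distance to the query.
    2. Keep only the top_m candidate classes.
    3. Run the pairwise (two-class) classifier on every candidate pair
       and return the class with the most votes.

    pairwise_classify(query, a, b) -> winning class label (a or b).
    """
    labels = list(class_centers.keys())
    dists = {c: np.linalg.norm(query - class_centers[c]) for c in labels}
    candidates = sorted(labels, key=lambda c: dists[c])[:top_m]
    votes = {c: 0 for c in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            votes[pairwise_classify(query, a, b)] += 1
    # With top_m = 4, only C(4,2) = 6 pairwise comparisons are needed,
    # versus n(n-1)/2 = 9316 for full majority voting with n = 137.
    return max(candidates, key=lambda c: votes[c])
```

The NC ranking step costs one distance per class, which is far cheaper than running the pairwise classifier over all class pairs.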
Whether such weighting helps should be clarified by experiments. For this purpose, we do not weight the features in Boost.1 and Boost.2, and we call the variant Boost.3 if the distribution $w_t$ is used to weight the features (simultaneously setting $\epsilon_t = 0.01$ if $\epsilon_t = 0$, which is experimentally better than $\epsilon_t = 0.1$). The weak learner is the simple nearest center classifier. The different versions of AdaBoost, from Boost.0 to Boost.3, are evaluated on a compound face database with 1079 face images of 137 persons.

4.1. Face Database

The face database is a collection of five databases: (1) the Cambridge ORL face database, which contains 40 distinct persons, each with ten different images; (2) the Bern database, which contains frontal views of 30 persons, each with 10 images; (3) the Yale database, which contains 15 persons; for each person, ten of the 11 frontal-view images are randomly selected; (4) five persons selected from the Harvard database, each with 10 images; (5) a database composed of 179 images of 47 Asian students, each with three or four images. The face images are cropped and scaled to the same size of 128 x 128 pixels in our database. The face images show large variations in facial expression and facial details, and changes in lighting, face size, and pose. Some examples from the database are shown in Fig. 1.

Figure 1. Examples of face images in our face database.

The face database is divided into two non-overlapping sets for training and testing. The training data consist of 544 images: five images per person are randomly chosen from the Cambridge, Bern, Yale, and Harvard databases, and two images per person are randomly selected from the Asian students database. The remaining 535 images are used for testing.

4.2. Experimental Results

First, the principal components are calculated from the face images in the training set.
The projection coefficients of the face images onto these principal components are computed and used as the features. Then the different algorithms are used for face recognition with respect to the number of features, or rounds of boosting in AdaBoost. We compare the original AdaBoost (Boost.0) with Boost.1 (setting $\epsilon_t = 0.1$ if $\epsilon_t = 0$, without using the distribution $w_t$ to weight a newly selected feature), Boost.2 (the same as Boost.1 except for setting $\epsilon_t = 0.01$ if $\epsilon_t = 0$), and Boost.3 (using the previous distribution $w_t$ to weight a new feature in boosting, and setting $\epsilon_t = 0.01$ if $\epsilon_t = 0$). Fig. 2 shows the recognition rates of each algorithm with respect to the rounds of boosting, or the number of features (for PRM and eigenfaces).

Figure 2. Face recognition performance with respect to rounds of boosting (or number of features). Only a small number of boosting rounds is sufficient for AdaBoost.

We can observe several results: 1) The four versions of boosting give comparable results, which justifies using the simple boosting variants instead of the original, more complex one. In detail, the best recognition rate of Boost.0 is 85.98% (T = 10), while Boost.1 reaches 84.11% (T = 15 or 30), Boost.2 85.23% (T = 15), and Boost.3 85.98% (T = 15). The results of Boost.2 and Boost.3 are slightly better than Boost.1, and comparable to Boost.0. 2) The different behavior of Boost.1 to Boost.3 shows the effect of the various settings of the interior parameters on recognition performance; Boost.3 is preferable for a small number of boosting rounds. 3) The recognition accuracy of Boost.3 is not lower than that of the approximate Bayesian classification, PRM [4], which gives 85.79% accuracy with 40 features. This demonstrates the acceptability of boosting for face recognition. 4) Both boosting and PRM improve the recognition rates over the standard eigenfaces; however, the boosting algorithms select fewer features (15 features are sufficient). 5) The problem of over-fitting is serious for boosting on face data.
When T grows larger, the performance deteriorates noticeably. It is interesting to observe that when the number of boosting rounds is small, the recognition performance ranks Boost.3 > Boost.2 > Boost.1, where ">" means "better than", while Boost.3 degenerates more rapidly as the number of boosting rounds becomes larger.

In the above, we use majority voting to solve the multi-class recognition problem, so it must perform n(n-1)/2 = 9316 pairwise comparisons for a single query with n = 137. To speed up the process, we propose a constrained majority voting (CMV) strategy. To do so, we must first demonstrate the effectiveness of class ranking with the nearest center (NC) criterion. Fig. 3 shows the recognition rates of the top m classes, for m = 2, 4, 8, 16, 32. The selection of m is arbitrary here; we simply set m to powers of 2. We find that the top 4 classes (with 25 features) cover 98.88% of the correct classes. Hence it is safe to feed only a small number of classes into the multi-class solver, CMV.

Figure 3. The recognition accuracy of the top m classes ranked by the NC criterion with respect to the number of features, for m = 2, 4, 8, 16, 32.

In our experiments with CMV, we use only the top 4 classes, ranked by NC with 25 features. The number of pairwise comparisons is thus largely reduced, from 9316 to 6. The recognition performance with CMV is shown in Fig. 4. We try both Boost.2 and Boost.3 with CMV, denoted CMVBoost.2 and CMVBoost.3 respectively, and compare their results with the Boost.3 of Fig. 2, here denoted MVBoost.3. It is interesting to observe that the boosting behavior changes somewhat with respect to the boosting rounds.
The best results of boosting with CMV now occur around 45 rounds of boosting, but the computation time is still clearly reduced compared with MVBoost.3. The best recognition rate of CMVBoost.2 is 86.17%, and that of CMVBoost.3 is 85.98%, comparable to the 85.98% of MVBoost.3 (15 rounds of boosting).

Figure 4. The recognition accuracy of boosting using the constrained majority voting (CMV) strategy, with respect to rounds of boosting.

5. Conclusions and Discussions

We have evaluated the AdaBoost algorithm for face recognition. Boosting along feature dimensions gives results comparable to those obtained using all features in each round; hence both the learning and testing processes can be largely sped up. To overcome the problem of $\epsilon_t = 0$ in some early rounds of boosting, two small substitute values were tried for $\epsilon_t$; the experiments show that $\epsilon_t = 0.01$ is better than $\epsilon_t = 0.1$. Furthermore, it makes little difference whether or not the features are weighted in the boosting process along the feature dimensions. To further speed up multi-class face recognition, the constrained majority voting (CMV) strategy can be used; it is faster than the traditional majority voting strategy over all pairs, without noticeably losing recognition accuracy. As a result, both CMVBoost.2 and CMVBoost.3 can be used for fast face recognition. An additional observation is that over-fitting is a serious problem for boosting on face data. Our experimental evaluation should stimulate more research on the boosting method itself for face recognition, which can be expected to further improve face recognition accuracy.

More recently, a new web site on AdaBoost, http://www.boosting.org, was opened for researchers to exchange results and hold discussions. It should help stimulate and speed up research on boosting methods and their applications.

6.
Acknowledgements

The authors would like to thank Kinh Tieu and Gunnar Ratsch for their helpful discussions on the AdaBoost algorithm.

References

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey", Proc. IEEE, vol. 83, 705-741, May 1995.
[2] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", J. Comp. & Sys. Sci., 55(1):119-139, 1997.
[3] G. Guo, S. Li, and K. Chan, "Face recognition by support vector machines", Proc. of the International Conference on Automatic Face and Gesture Recognition, 196-201, 2000.
[4] C. Liu and H. Wechsler, "Probabilistic reasoning models for face recognition", Proc. of Computer Vision and Pattern Recognition, 827-832, 1998.
[5] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation", IEEE Trans. Pattern Anal. Machine Intell., vol. 19, 696-710, 1997.
[6] B. Moghaddam, "Principal manifolds and Bayesian subspaces for visual recognition", Proc. of IEEE Conf. on Computer Vision, 1131-1136, 1999.
[7] M. Pontil and A. Verri, "Support vector machines for 3-D object recognition", IEEE Trans. Pattern Anal. Machine Intell., vol. 20, 637-646, 1998.
[8] A. Samal and P. A. Iyengar, "Automatic recognition and analysis of human faces and facial expressions: A survey", Pattern Recognition, vol. 25, 65-77, 1992.
[9] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces", J. Opt. Soc. Amer. A, vol. 4, no. 3, 519-524, 1987.
[10] K. Tieu and P. Viola, "Boosting image retrieval", Proc. of Computer Vision and Pattern Recognition, vol. 1, 228-235, 2000.
[11] M. A. Turk and A. P. Pentland, "Eigenfaces for recognition", J. Cognitive Neurosci., vol. 3, no. 1, 71-86, 1991.
[12] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.