Appearance Based Object Recognition with a Large Dataset using Decision Trees

Philip Blackwell1,2 (email@example.com) and David Austin1,2 (firstname.lastname@example.org)
1 Robotic Systems Lab, RSISE, Australian National University, ACT 0200, Australia
2 National ICT Australia, Locked Bag 8001, Canberra, ACT 2601, Australia

Abstract

A framework for a new object recognition system is presented that tailors the classic decision tree to the specific needs of object recognition. The key advantage of such a system is its ability to handle a large database of known objects with real-time performance, a failing of current object recognition systems. A new image database is presented that is challenging both in size (over 100,000 images) and content (highly similar objects). Early results, using the simplest features, are promising, with a recognition rate of 89%.

1 Introduction

Visual object recognition is a huge field for which a complete general solution is unattainable. Searching for a known object in a given scene and identifying a given object are inherently different problems; the problem considered here is the latter. We assume that the given object is represented by one un-occluded, reasonably segmented view, which allows for easier selection of features. However, the general method of decision trees does not require these assumptions. The aim here is to build a system that is capable of scaling to a large number of known objects, with real-time performance.

There are many ways of comparing one object with another, and much work has been done in this area. But no matter how simple and accurate the comparison is, there is no way of directly applying these methods to large object databases: scaling with the number of recognisable objects is usually poor.

Support Vector Machines (SVMs) have been used successfully in various pattern recognition problems. They provide a simple, data-driven approach to classification, finding optimal splitting hyperplanes between sets of feature vectors. They solve a two-class problem (i.e. is a face, is not a face), and as such have been successfully used for object detection [Osuna et al., 1997]. There are ways to treat multi-class problems as extensions of two-class problems. In [Pontil and Verri, 1998], splitting hyperplanes are defined between each pair of objects, and classification is carried out in the form of a tennis tournament. This achieves good results, but doesn't allow for a large number of objects, since the number of pairs grows as the square of the number of objects. An alternative method is presented in [Cortes and Vapnik, 1995], where splitting hyperplanes are generated one against k, giving linear growth, but the results are less clear; in general, the tennis tournament is the preferred method.

Aspect graphs [Faugeras et al., 1992] provide a good way of matching an object against a set of characteristic views. Although the majority of work in the area focuses on limited datasets, since indexing is otherwise a problem, there have been attempts to use such methods on reasonably large databases. In [Cyr and Kimia, 2004] shock graphs are used as an improved method of indexing; a database of 64 objects is used and good recognition results are achieved, but recognition still takes up to 45 minutes.

Good results were achieved for a large database (100 objects) in [Murase and Nayar, 1995]. A universal eigenspace is calculated from all learned images, in which each object is represented as a manifold (covering all its different views). New images are projected into the eigenspace, then classified as the object of the nearest manifold. The main difficulty is the calculation of the eigenspace: although it is an offline process that can reasonably be allowed significant time, it scales poorly with both the number of objects and the number of views. This will be a limiting factor to the size of the database.
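As an illustration of this style of appearance-based approach, a nearest-manifold classifier in the spirit of [Murase and Nayar, 1995] reduces to a few lines of linear algebra. This is only a sketch: it uses synthetic 64-pixel "images", and approximates the nearest manifold by the nearest projected training sample rather than by a parametrised manifold.

```python
import numpy as np

def build_eigenspace(images, k):
    """Compute a k-dimensional universal eigenspace from flattened training images.

    images: (n_samples, n_pixels) array; returns (mean, basis), basis shape (k, n_pixels).
    """
    mean = images.mean(axis=0)
    # SVD of the centred data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    return basis @ (image - mean)

def classify(image, mean, basis, train_points, train_labels):
    """Label of the nearest projected training sample (a crude stand-in for
    finding the nearest point on each object's manifold)."""
    q = project(image, mean, basis)
    dists = np.linalg.norm(train_points - q, axis=1)
    return train_labels[int(np.argmin(dists))]

# Toy usage: two "objects" with clearly separated appearance vectors.
rng = np.random.default_rng(0)
obj_a = rng.normal(0.0, 0.1, size=(20, 64))
obj_b = rng.normal(5.0, 0.1, size=(20, 64))
train = np.vstack([obj_a, obj_b])
labels = ["a"] * 20 + ["b"] * 20
mean, basis = build_eigenspace(train, k=4)
points = np.array([project(x, mean, basis) for x in train])
print(classify(obj_b[0] + rng.normal(0, 0.05, 64), mean, basis, points, labels))  # prints "b"
```

The scaling problem noted above is visible even here: the SVD must be recomputed over all views of all objects whenever the database grows.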
Ultimately, to scale efficiently, recognition must be at a higher level than simply matching against known objects. The general approach is to take some high-level features that describe the object and index them in some way. This is known as the indexing problem. Indexing will, in general, be highly sensitive to small changes in the features; this means that the features must be highly robust to noise, rotation, scale, etc., which is very difficult to achieve for any meaningful dataset.

A decision tree is used to achieve fast, scalable recognition. A decision tree is a very simple and intuitive way of splitting data, a "divide and conquer" approach. Although widely used in the machine learning and data mining communities, decision trees haven't been used much for object recognition. The main reason for this is that the data tends to be split in one dimension at a time, usually thought to be too simple for a task as complicated as object recognition. However, there is no real reason that more sophisticated splits can't be used. Decision trees have two distinct advantages for object recognition: they naturally handle large databases, and they allow for a rich choice of features, since not all features need to be computed for classification.

Section 2 gives a brief overview of decision trees. Section 3 describes the object database used. Section 4 gives an overview of some initial experiments. Section 5 discusses some of the features that are planned to be incorporated into the decision tree.

2 Decision Trees

The concept of a decision tree is very simple and intuitive, and a simple example is a good starting point for the unfamiliar reader (Figure 1). It plays out very much like a game of "twenty questions", where a decision tree in the interrogator's head is traversed according to the answers to the questions posed at each node.

Figure 1: A small part of an example decision tree. (The root asks "Animal, Mineral, or Vegetable?", with branches such as "Mammal?", "Plastic?", "Does it fit in my mouth?", "Is it a piece of Lego?" and "Is it black?".)

Decision trees are a classic machine learning and data mining technique, most work stemming from CART [Breiman et al., 1984], and ID3, C4.5 and C5 developed by Quinlan [Quinlan, 1990; 1992]. There are two distinct processes that need to be implemented: building the tree, and then making classifications based upon it. Before describing these processes, some terminology is introduced.

• Examples are the things being classified; they are labelled to identify the class they belong to.
• The dataset is the collection of examples.
• Features are measures that can be derived from examples; e.g. Circularity could be a measure of how much an example is like a circle.
• Questions define boundaries in feature-space; for example, if a feature took real-numbered values, a question might be that some feature of an example is less than 0.5.
• Nodes make up the tree; they contain examples and, if they are not terminal, a question whose answers are represented as further nodes.

The tree is built by successively finding questions to split the dataset, in meaningful ways, until it can be split no more (either because the features of all examples are equal, or all examples belong to the same class). The building is top-down; at each point, information theory heuristics are used to decide what is likely to be the "best" split. To classify an unseen example, the tree is traversed depending on the answers to the question at each node; when a leaf node is reached, the classification is made based on the learned examples in that node. Good classification (both in terms of accuracy and speed) depends entirely on the tree. To achieve fast, accurate recognition, the tradeoff is a complicated, possibly slow training process.
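A minimal sketch of the two processes, building by information gain and classifying by traversal, might look like the following (illustrative only; the real system's features and split-selection heuristics are more elaborate):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(examples, labels):
    """Find the (feature index, threshold) question with the highest information gain."""
    base, n = entropy(labels), len(labels)
    best = (0.0, None)
    for f in range(len(examples[0])):
        for t in sorted({x[f] for x in examples}):
            left = [y for x, y in zip(examples, labels) if x[f] <= t]
            right = [y for x, y in zip(examples, labels) if x[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if gain > best[0]:
                best = (gain, (f, t))
    return best[1]

def build(examples, labels):
    q = best_split(examples, labels)
    if q is None:  # pure node, or no informative question left: make a leaf
        return Counter(labels).most_common(1)[0][0]
    f, t = q
    le = [(x, y) for x, y in zip(examples, labels) if x[f] <= t]
    gt = [(x, y) for x, y in zip(examples, labels) if x[f] > t]
    return (f, t, build(*zip(*le)), build(*zip(*gt)))

def classify(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

# Usage on made-up feature vectors: (circularity, "redness").
X = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.9), (0.3, 0.8)]
y = ["gear", "gear", "brick", "brick"]
tree = build(X, y)
print(classify(tree, (0.85, 0.15)))  # prints "gear"
```

Classification visits only one root-to-leaf path, which is what makes the approach fast at recognition time regardless of database size.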
Ideally, there would be no limit to the questions that can be asked at any node, but time constraints impose one. The aim here, however, is to push the boundaries of what questions can be asked. The key difficulty with a decision tree is that at each node there are many possible good ways to split the data, and there is no way of knowing which split to aim for; whereas, for example, an SVM finds the best way to differentiate between two given sets of examples.

3 Object Database

A common problem in computer vision is overly simple test data. In the case of object recognition, the database of known objects can be trivially small, or its objects trivially distinct. Often, object recognition systems are only tested on a handful of objects, where scalability cannot be tested and recognition rates can be biased.

Figure 2: Four yellow bricks.

Figure 3: Four grey gears (scaled to similar sizes).

There aren't many large test image databases, and of the available databases, none really allow for proper testing of the aims of this project. Probably the most widely used is COIL-100 [Nene et al., 1996], having 100 objects with 72 views of each object, but it has some limitations. Firstly, the 72 views are quite limited, simply rotating the object around its central axis. Secondly, the objects are highly distinct. There are several simulated databases, that is, databases of CAD and VRML models. These are good because arbitrarily many views can be generated, but the absence of noise is hard to properly account for.

So a new database has been generated that has many objects from a similar class (all Lego), many views of each object (in random positions), and real noise. An important aspect is that the training and testing samples are chosen, randomly, from one database of typical views of the object. The images are taken with a 640x480 webcam, with all settings on automatic. The resulting images, once cropped, are approximately 20-100 pixels in width and height.
Reasonable segmentation is made possible by having the objects on an otherwise empty, white platform. The segmentation is still automated and, of course, not perfect: shadows usually remain, to differing extents, and specular reflections are often removed. Occlusion is avoided by only selecting images where the object is completely surrounded by the background. Lighting is controlled up to a point; there are halogen lamps directly above the platform in a fixed position, always on, but the platform is next to a window in a well-lit room. Some example images of different types of bricks in the database are shown in Figures 2, 3 and 4.

3.1 Acquisition of images

To give the database the desired features, the method for acquiring images was somewhat unconventional. The camera was in a fixed position, and each Lego brick was repeatedly dropped from a height of about 15cm to fall on a platform in a random pose. Obviously, the machine had to be made out of Lego itself (Figure 5).

Figure 4: Four wedge pieces.

3.2 Size

The database is large both in number of classes (89) and number of examples per class (at least 1000), for a current total of 100,000 images. This allows for thorough testing of performance (particularly scalability), both in terms of preprocessing/learning and recognition. Also, the difficulty of recognition increases with database size: if there aren't enough objects in the database, there will be trivial splits in feature space.

3.3 Range of Views

Given the subject matter, the method used here to acquire the images restricts the range of views - it is extremely unlikely that a block will land on anything other than one of its sides, and since the camera is in a fixed position it will be impossible to get a directly overhead view of a block. However, the limitation is nothing like that of most databases, where views are limited to a fixed set of viewpoints, usually regularly separated around a viewing sphere.
The database is representative of the views of the object that are likely to be seen. For example, consider images of a block sitting on its base and sitting on its end (Figure 6). The former will presumably be represented many more times in the database than the latter, and to good effect: the recognition system will be able to take into account that it is more important to be good at recognising the more common poses. Of course, this doesn't mean that the database accurately represents the likely poses of Lego blocks "in the wild", but the results obtained using a database with some form of real-world bias will be more representative of the results to expect when using real-world data.

Figure 5: First dropping a piece, then taking a picture; then the brick is pushed off the (raised) platform so the process can repeat.

Figure 6: Two poses of an 8x2 block.

3.4 Lego Characteristics

The Lego bricks are in general quite simple objects; in some ways this makes the task of recognition quite simple, but in others quite difficult. The colour of the bricks is very distinctive; a yellow object in this database is quite obviously yellow. On the other hand, the simplicity of the objects can make them difficult to differentiate; the difference between a six-unit-long axle and a five-unit-long axle is in many ways quite small. Another point is that everybody can get Lego. If people want to conduct further experiments to evaluate their recognition systems when trained with this database, they will have easy access to identical objects.

4 Preliminary Results

4.1 Approach

The approach taken is, at a high level, quite conventional. Each object in the database is given a list of values for various features, and then, using a subset of the dataset, a decision tree is built based on those features. The tree is then tested using the remaining data. The typical approach is to define an appropriate set of features for the dataset.
Although applying this general technique to a particular dataset can be a challenging and worthwhile problem, that is not the aim here. Rather, the aim is to show that, using simple features, a decision tree system is capable of achieving good recognition rates on a challenging, large dataset with real-time performance.

4.2 Features

Since the aim isn't to build a system that can recognise Lego bricks for the sake of recognising Lego bricks, the emphasis is not on developing new features that are descriptive for Lego bricks. The features used are fairly simple, and mostly well known. Terminology is mainly taken from [Csetverikov, 2003]. Images are assumed to be one "blob" of 8-connected pixels. Most of the features are shape based, derived from the outer contour and convex hull (Figure 7).

Figure 7: The contour, and convex hull of an image.

Colour
Due to the simplicity of the colour in the database, only the simplest colour features were used. Three features (Red, Green, and Blue) were derived from the average values of the respective image channels, normalised by dividing by the average intensity.

Elongation
The ratio of the major axis length to the minor axis length is called Elongation. The axis lengths were taken as the width and height of the rectangle fit around the convex hull, such that the Elongation was maximised.

Circularity
Another commonly used shape-based feature is the ratio of the area to the perimeter squared, specifically 4πA/p². Two measures of this direct feature were taken: ContourCircularity and ConvexHullCircularity, using the measurements for area and perimeter from the contour and convex hull respectively. A third, similar measure, Circularity, was taken incorporating the diameter d (from the convex hull) and the area A (the number of pixels): A/(πd²). No argument is made as to which of these features is a more appropriate measure of circularity - it doesn't really matter.
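For concreteness, the 4πA/p² measure can be computed as follows for a polygonal contour (a sketch using the shoelace formula; the actual system works on pixel blobs, so its area and perimeter estimates would differ in detail):

```python
import math

def polygon_area(pts):
    """Shoelace formula for a simple polygon given as a list of (x, y) vertices."""
    return 0.5 * abs(sum(x0 * y1 - x1 * y0
                         for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1])))

def polygon_perimeter(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))

def circularity(pts):
    """4*pi*A / p^2: equals 1 for a circle, smaller for elongated or jagged shapes."""
    p = polygon_perimeter(pts)
    return 4 * math.pi * polygon_area(pts) / (p * p)

# A square is noticeably less "circular" than a fine polygonal approximation of a circle.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
circle = [(math.cos(a), math.sin(a))
          for a in (2 * math.pi * i / 256 for i in range(256))]
print(round(circularity(square), 3))  # prints 0.785 (= pi/4)
print(round(circularity(circle), 3))  # prints 1.0
```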
It is up to the decision tree to decide which feature will generate a more informative split at any given node.

Ratios of Areas
The ratio of the area of the convex hull to the area inside the contour is a feature called Roughness. The ratio of the total area to the area of the contour is a feature called Filled.

Relaxed Convex Hull
One measure used that wasn't found in the literature attempts to measure how much the convex hull is like an N-sided polygon, for any given N. The convex hull is repeatedly relaxed (Figure 8), from its original number of vertices down to three (if possible); the relaxation is a greedy process, at each stage choosing the relaxation which least increases the area of the convex hull. The measure RelaxedRatio(N) was taken for N ∈ [3, 8] as the ratio of the area of the convex hull to the area of the relaxed N-sided convex hull. It should be noted that this algorithm does not find the smallest N-sided polygon that contains the convex hull, nor is it intended to.

Figure 8: Successive relaxed convex hulls.

Gradient
The above features take no account of the internal structure of the object. Various feature detectors were tried, but without much success. Due to the simple monotone nature of the images and the limited resolution available from the webcam, even edge detection proved very difficult, especially with the same parameters used across the database. So only two very simple features were used: AverageGradient is defined as the average intensity of the Sobel gradient; PercentEdge is defined as the ratio of the number of edge pixels (found by the Canny edge detector) to the total number of pixels.

Symmetry
Measures of symmetry are taken about the major and minor axes as defined by the moments. For shape, the symmetry is defined as the ratio of the area of the convex hull to the area of the convex hull including all points mirrored about the respective axis, called CHMajorAxisSymmetry and CHMinorAxisSymmetry. A measure of symmetry that takes into account the internal structure is defined as the Normalised Cross Correlation of the image and itself mirrored about the respective axis, called MajorAxisSymmetry and MinorAxisSymmetry.

4.3 Results

Despite the aim of 1000 examples of each class, the collection process was not perfect; some classes ended up with significantly more examples and some contained a few errors. These results (Figure 9) are based on a finalised database of 89 classes, each with 900 examples. When training was performed with N examples per class, testing was performed with the remaining 900 − N. The overall classification rate for each tree is defined as the percentage of correctly classified (unseen) tests. The classification of a particular example is taken as the modal class in the leaf node of the decision tree that the example descends to (in almost all cases, this modal class was the only class). Classification rates are given per block (Table 1) for the most exhaustive test (800 training examples per class).

Figure 9: Recognition rates for the whole database (89 classes), plotted against the number of examples used for training (100-800).

Table 1: Classification rates per block. For each type of block an example image is given and classification rates are given for each colour, as well as for the colour blind tests.
[Table 1 body: for each of the 89 classes, the recognition rate for each colour and for the colour-blind ("no colour") test.]

As can be seen, there are a few classes which make up the majority of errors. In many cases this is expected; for example, there are many views from which a beam-black-2x1 looks just like a block-black-2x1. Another measure of performance is the size of the produced tree; we measure this by looking at the total number of terminal nodes (Table 2). It is quite clear that there are many classes which are easily handled, and some which are more troublesome. In general, these issues can be dealt with at a lower level through matching techniques.

Table 2: Terminal nodes.

Number of examples (per class) | Number of terminal nodes
100 | 757
200 | 1,311
300 | 1,809
400 | 2,238
500 | 2,658
600 | 3,101
700 | 3,452
800 | 3,842

Colour Blind

Part of the beauty of a decision tree is that high-level splits can split well on simple attributes, and divide one large dataset into several small datasets. Unfortunately this makes our main objective, scalability, hard to test. In this case the early splits will inevitably be colour based and, due to the subject matter, will be quite accurate. In effect this divides one reasonably large dataset neatly into smaller, more manageable datasets. Since several objects only appear in one colour, their recognition is made simple; for example, there is never any need to differentiate between a hub and a gear, since they are different colours, despite the fact that they are both circular.

To alleviate this, the system was tested in an effectively colour-blind manner. That is, classes were relabelled to avoid any distinction based on colour (for example, block-black-4x2 and block-green-4x2 were both labelled block-4x2), and the three features relating to colour were removed. The resulting database was reduced to 60 classes (Figure 10). Although some of the classes now had more examples, the number used to learn and test remained the same (randomly selected from all samples of the class). The results take the same format as above.

Figure 10: Recognition rates for the limited database (60 classes) with the feature of colour removed, plotted against the number of examples used for training (100-800).

4.4 Performance

All performance benchmarks were taken on an Intel Pentium 4 CPU at 3.00GHz, with 1 Gigabyte of RAM, running Debian GNU/Linux. The features were all pre-processed, and took an average of 0.1 seconds to compute for each image. This could probably be considerably improved upon, since all code was implemented from scratch (without taking advantage of optimised image processing libraries). The generation of the decision trees and average classification time were timed for several numbers of training examples per class (Table 3).

Table 3: Performance for the first tests (89 classes).

Number of training examples (per class) | Avg time to build tree | Avg classification time
100 | 88s | 0.2ms
200 | 176s | 0.28ms
300 | 407s | 0.55ms
400 | 535s | 0.66ms
500 | 730s | 0.9ms
600 | 555s | 0.6ms
700 | 674s | 0.67ms
800 | 778s | 0.72ms

There are some small anomalies, but this is probably due to varying load on the testing system. Again, this could be significantly improved upon.
For one thing, the code is implemented in pure Python; re-implementation in C would offer significant performance gains. It is obvious though, considering the feature calculation time and the classification time, that at this point the bottleneck is the feature computation.

5 Future Work

Rather than using one of the standard "off the shelf" decision tree systems, a decision tree system has been implemented in Python. This offers many opportunities to improve upon the conventional approaches used. At the current stage, the decision tree system is loosely an implementation of that described in [Breiman et al., 1984], and at a high level this structure will remain the same. There are several possible improvements to this implementation, some taken from the machine learning and data mining literature, others somewhat specific to the task of object recognition. There are two key advantages to using decision trees in object recognition, over the traditional general use:

• Classifications can be validated. That is, we have access to more information than just the features; once a classification has been made, it can be tested against the learned examples in that terminal node.
• One of the main applications of decision trees is finding patterns in large datasets that can easily be interpreted by domain experts. That is, at least partially, why conventional trees often make overly simple splits. In this application, simple intuitive splits are less of a priority.

Lazy Feature Evaluation
Features only need to be computed for objects if they are needed by the decision tree. This in itself isn't going to be a huge performance improvement, but only by doing this are later improvements possible; ultimately, this frees us from being limited to a fixed number of features.

Computational Cost Based Feature Selection
The classical use of decision trees is based on datasets where the attribute values are given.
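The lazy feature evaluation described above might be sketched as follows (a minimal illustration; the feature extractors, tree layout, and names here are hypothetical stand-ins, not the system's actual features):

```python
class LazyExample:
    """Wraps an image so that features are computed only when a tree node asks
    for them, and at most once. The extractors below are placeholder lambdas;
    a real system would run image-processing code here."""

    FEATURES = {
        "red": lambda img: img["red"],
        "circularity": lambda img: img["circ"],
    }

    def __init__(self, image):
        self.image = image
        self._cache = {}
        self.computed = 0  # instrumentation: how many features were actually run

    def __getitem__(self, name):
        if name not in self._cache:
            self.computed += 1
            self._cache[name] = self.FEATURES[name](self.image)
        return self._cache[name]

def classify(tree, example):
    """Tree nodes are (feature_name, threshold, left, right); leaves are labels."""
    while isinstance(tree, tuple):
        name, t, left, right = tree
        tree = left if example[name] <= t else right
    return tree

# A tree that never asks for circularity on clearly red objects:
tree = ("red", 0.5, ("circularity", 0.7, "axle", "gear"), "brick")
ex = LazyExample({"red": 0.9, "circ": 0.2})
print(classify(tree, ex), ex.computed)  # prints "brick 1" - circularity never computed
```

Since classification visits only one path through the tree, features on untaken branches are never paid for, which is exactly what makes a rich (even unbounded) feature set affordable.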
The difference in object recognition is that the features must be computed, and different features have different computational costs. By incorporating the computational cost into the selection of features, deeper decision trees, which would normally be considered worse, may be considered better if, though they ask more questions, the questions are cheaper.

Multiple Questions Per Node
Going back to the example of twenty questions, it was suggested that the perfect player has access to the perfect tree, but is there really such a thing? Quite often human players, even though they might be asking a reasonable question, ask a question for which the answer is not entirely clear, in which case the interrogatee will ask the interrogator to "ask another question". Obviously, this fuzzy boundary problem will not have a precise solution, but an attempt could be made. Suppose for a moment that at each node, rather than making a pair of child trees for the best question considered, a pair was made for every question considered. Statistical analysis could be done on each set of answers to determine a range around the boundary of the answer that was considered "uncertain". Now, when classifying at that node, if the answer to the first question was "uncertain", the second question could be tried, and so on. A similar system was considered in [Kohavi and Kunz, 1997]. There are several reasons why this isn't a widely used approach in the data mining community, but they might be less important in object recognition. In particular, it requires an increased number of nodes, but there are ways to limit this, especially if misclassifications are possible to detect. Secondly, the tree is generally harder to interpret by humans, but this is much less of an issue in this application.

Dynamic Features
One problem with decision trees, and indexing algorithms in general, is that the number of features is generally required to be static.
Usually the input to the learning system is a set of N-tuples, for a fixed set of N features. In general, coming up with many features isn't too hard, but indexing them is. Several options are available with decision trees to expand the list of features as the tree is descended and the dataset is divided. This allows for features that are only applicable once other conditions have been met. For example, one could imagine a question that measured relationships between polygonal faces; this feature could only be asked at nodes where it had already been established that there were polygonal faces. Another option is parameterised features: again, as the tree is descended, the objects being split are somehow closer, and more work may need to be done to meaningfully split the data. There appears to be little work along these lines in the literature, mainly because it is only an option in a field such as object recognition, where there is more to an entry than its N features.

Multi-dimensional Splits
A common failing of decision trees is that they only split on one dimension; often this simply isn't good enough (Figure 11). Again, there are several reasons that multi-dimensional splits aren't widely used in traditional decision trees. Firstly, it has generally been thought to be computationally prohibitive; the problem lies mainly in the choice of multiple dimensions - there are too many combinations. Secondly, the trees are again of less use to domain experts, since multi-dimensional splits are harder to interpret. As previously, the intuitive splits do not concern us so much, but difficulty still lies in the combinatorial problems. This is, however, a current area of research; in [Cantu-Paz, 2003] evolutionary algorithms are used to generate so-called oblique trees (i.e. not axis-parallel).

Figure 11: A trivial example where a multiple dimension split is clearly optimal.
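The difference between an axis-parallel question and an oblique one can be made concrete with a small sketch (illustrative data; finding good weights w is exactly the hard combinatorial problem discussed above):

```python
def axis_question(f, t):
    """Conventional single-dimension question: x[f] <= t."""
    return lambda x: x[f] <= t

def oblique_question(w, b):
    """Oblique question: does the point fall on one side of the hyperplane w.x + b = 0?"""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b <= 0

# Points labelled by which side of the diagonal x = y they fall on; their
# projections onto either single axis interleave the two classes.
pts = [(0.2, 0.4), (0.6, 0.8), (0.4, 0.2), (0.8, 0.6)]
labels = ["above", "above", "below", "below"]

# One oblique split separates them perfectly:
q = oblique_question((1, -1), 0)  # x - y <= 0, i.e. above the diagonal
print([q(p) for p in pts])  # prints [True, True, False, False]

# ...whereas any single axis-parallel threshold mixes the two classes:
print([axis_question(0, 0.5)(p) for p in pts])  # prints [True, False, True, False]
```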
There are many such approaches that could be applied, and room for new approaches to this problem.

Traditional Machine Learning Improvements
There are several techniques that can sometimes be used to improve the results of classifiers. Voting techniques such as Boosting and Bagging, where an ensemble of classifiers is built, often improve results. Different techniques of tree pruning can also be useful. However, it is best not to apply these methods blindly; as shown in [Bauer and Kohavi, 1999], they can sometimes be detrimental, especially if specialisations like the above have been applied.

6 Conclusion

A new object recognition system was built, capable of promising recognition rates on a large, challenging image database. Recognition is easily sub-second, and training time is very short considering the size of the database. Decision trees are shown to handle the task of splitting up a large database very well. Several avenues of research have been considered, allowing for utilisation of the rich data available in image data within the structure of the decision tree. Clearly, there is much work that can be done in this area.

Acknowledgements

This work was supported by funding from National ICT Australia. National ICT Australia is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Centre of Excellence program.

References

[Bauer and Kohavi, 1999] Eric Bauer and Ron Kohavi. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1-2):105-139, 1999.

[Breiman et al., 1984] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. 1984.

[Cantu-Paz, 2003] Erick Cantu-Paz. Inducing oblique decision trees with evolutionary algorithms, 2003.

[Cortes and Vapnik, 1995] Corinna Cortes and Vladimir Vapnik. Support-vector networks.
Machine Learning, 20(3):273-297, 1995.

[Csetverikov, 2003] Dmitrij Csetverikov. Basic algorithms for digital image analysis: a course, 2003.

[Cyr and Kimia, 2004] C. M. Cyr and B. B. Kimia. A similarity-based aspect-graph approach to 3d object recognition, April 2004.

[Faugeras et al., 1992] Olivier Faugeras, Joe Mundy, Narendra Ahuja, Charles Dyer, Alex Pentland, Ramesh Jain, and Katsushi Ikeuchi. Why aspect graphs are not (yet) practical for computer vision. In CVGIP Image Understanding Workshop, volume 55, pages 212-218, March 1992.

[Kohavi and Kunz, 1997] Ron Kohavi and Clayton Kunz. Option decision trees with majority votes. In Doug Fisher, editor, Machine Learning: Proceedings of the Fourteenth International Conference. Morgan Kaufmann Publishers, Inc., 1997.

[Murase and Nayar, 1995] H. Murase and S. K. Nayar. Visual learning and recognition of 3d objects from appearance. International Journal of Computer Vision, 14(1):5-24, 1995.

[Nene et al., 1996] S. Nene, S. Nayar, and H. Murase. Columbia object image library: Coil, 1996.

[Osuna et al., 1997] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: an application to face detection, 1997.

[Pontil and Verri, 1998] Massimiliano Pontil and Alessandro Verri. Support vector machines for 3d object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):637-646, 1998.

[Quinlan, 1990] J. R. Quinlan. Induction of decision trees. In Jude W. Shavlik and Thomas G. Dietterich, editors, Readings in Machine Learning. Morgan Kaufmann, 1990. Originally published in Machine Learning 1:81-106, 1986.

[Quinlan, 1992] J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1992.