Automatic Defect Detection for TFT-LCD Array Process Using

Automatic Defect Detection for TFT-LCD Array Process Using
Int. J. Mol. Sci. 2011, 12, 5762-5781; doi:10.3390/ijms12095762
International Journal of
Molecular Sciences
ISSN 1422-0067
Automatic Defect Detection for TFT-LCD Array Process Using
Quasiconformal Kernel Support Vector Data Description
Yi-Hung Liu 1,* and Yan-Jen Chen 2
Department of Mechanical Engineering, Chung Yuan Christian University, Chungli 320, Taiwan
Photo Engineering Department, AU Optronics (AUO) Corporation, Taoyuan 325, Taiwan;
* Author to whom correspondence should be addressed; E-Mail:;
Tel.: +886-3-265-4306; Fax: +886-3-265-4399.
Received: 5 July 2011; in revised form: 8 August 2011 / Accepted: 16 August 2011 /
Published: 9 September 2011
Abstract: Defect detection has been considered an efficient way to increase the yield rate
of panels in thin film transistor liquid crystal display (TFT-LCD) manufacturing. In this
study we focus on the array process since it is the first and key process in TFT-LCD
manufacturing. Various defects occur in the array process, and some of them could cause
great damage to the LCD panels. Thus, how to design a method that can robustly detect
defects from the images captured from the surface of LCD panels has become crucial.
Previously, support vector data description (SVDD) has been successfully applied to LCD
defect detection. However, its generalization performance is limited. In this paper, we
propose a novel one-class machine learning method, called quasiconformal kernel SVDD
(QK-SVDD) to address this issue. The QK-SVDD can significantly improve generalization
performance of the traditional SVDD by introducing the quasiconformal transformation
into a predefined kernel. Experimental results, carried out on real LCD images provided by
an LCD manufacturer in Taiwan, indicate that the proposed QK-SVDD not only obtains a
high defect detection rate of 96%, but also greatly improves generalization performance of
SVDD. The improvement has shown to be over 30%. In addition, results also show that the
QK-SVDD defect detector is able to accomplish the task of defect detection on an LCD
image within 60 ms.
Int. J. Mol. Sci. 2011, 12
Keywords: thin film transistor liquid crystal display; array process; defect detection;
machine learning; support vector data description
1. Introduction
Over the past decade, TFT-LCD has become a very popular flat panel display in our daily life due
to its advantages over the CRT monitor such as lower power consumption and smaller volume. With
the increase of demand, every LCD manufacturer has made efforts to produce LCD panels of high
quality, especially for larger-size LCD panels. Recently, an inspection/integration department has been
introduced to set in the LCD manufacturers in order to ensure product quality. Yet the product yield
still has space to improve because the task of defect inspection heavily relies on human observers in
current practice: the inspection reliability depends on experience and physical conditions of engineers.
Therefore, automatic optical inspection (AOI) has become a solution for real-time and robust defect
inspection, and inspection method/scheme also plays a critical role in AOI in addition to the hardware
design of AOI equipment.
Previously, much of the literatures have dealt with the so-called mura defect, e.g., [1–3]. The mura
defect may be spot-type, line-type, or even region-type, and can only be observed after LCD panels are
driven to a constant gray level. Hence, the task of mura defect inspection can only be executed in the
cell process, the second process of TFT-LCD manufacturing. However, once a panel is found to have
mura defects, the panel will be discarded if not repairable, resulting in a great increase in production
costs. In practice, most mura defects are due to the defects that already occur in the former process, the
array process. For example, if the surface of a panel is scratched by a deformed cassette or glass
particles in the array process, the gate electrode of the panel is most likely to become an open circuit. If
this is the case, a line-type mura defect (usually a white line) will be observed in cell process.
Fortunately, the panels with defects can still be fixed by rework if the defects in array process are
detected in real time. Thus, building a scheme that can robustly detect defects from the surface images
of panels, which is also the focus of this paper, is still critical to every LCD manufacturer to date,
which is also the focus of this paper.
The array process is the first process in TFT-LCD manufacturing which consists of five successive
engineering processes: gate electrode (GE), semiconductor electrode (SE), source and drain (SD),
contact hole (CH), and pixel electrode (PE) engineering. Each engineering process is responsible for
generating a distinct pattern on a glass substrate. The image of a normal GE pattern is shown in Figure 1.
Moreover, various defects would occur in each engineering process due to physical factors such as
scratch, abnormal photo-resist coating, and particles. Figure 2 shows some defect images. There are
many rectangle regions on an LCD panel (please refer to Figure 3 for clearer illustration). We call
them pixel regions (PRs). The actual width of each PR is around 60 micrometers. If a defect appears in
an image, the defect would appear within one single PR or simultaneously on several PRs in the image.
Therefore, to judge whether the acquired image contains a defect or not, we just need to judge whether
the PRs in the image are all normal. If all the PRs are found to be normal, the image is normal;
otherwise the image is defective. Therefore, the defect detection problem can be regarded as a binary
Int. J. Mol. Sci. 2011, 12
classification problem: normal PR and defective PR classification. When using a binary classifier to
solve this problem, e.g., the support vector machine (SVM) [4], one has to collect a set of PRs for
training. Normal PR patterns are easy to collect and they involve only small variations in uniformity.
However, due to diverse defect modes and their occurrence frequencies, the available defective PRs
are in general under-sampled. As a result, the true distribution of the defective PR patterns is difficult
to obtain. Compared with binary classification strategy, one-class classification (also known as novelty
detection) would be more appropriate when facing the situation where one of the two classes is
under-sampled [5,6].
Figure 1. Image of a normal gate electrode (GE) pattern.
Figure 2. Examples of defect images.
Figure 3. An illustrative example for pixel region segmentation.
Int. J. Mol. Sci. 2011, 12
Liu et al. [7] have recently applied the one-class classification strategy to the defect detection in
LCD array process, and achieved a high defect detection rate on the images in GE engineering. Their
system is based on the locally linear embedding (LLE) [8] and the support vector data description
(SVDD) [9], where LLE is used for dimensionality reduction and feature extraction, and SVDD serves
as the defect detector. SVDD is a one-class machine learning method. It requires only normal PR
patterns in its training stage. By introducing a kernel function satisfying the Mercer condition, SVDD
is able to find a flexible boundary to tightly enclose all or most of the normal PRs in the original
pattern space during training. Then, the boundary is used to distinguish normal PRs from defective PRs
in the testing stage. If a test PR pattern falls inside of the boundary, it is accepted as a normal PR;
otherwise it is rejected as a defective PR. While this SVDD-based decision making strategy is simple,
it suffers from two critical problems related to testing time complexity and generalization performance.
(1) Testing time complexity. The testing time complexity of SVDD is linear in the number of
training patterns, which makes SVDD unable to classify a large number of test patterns within a
short period of time, especially for the application of LCD array defect detection where the daily
throughput is considerably high. A fast SVDD (F-SVDD) [10] has recently been proposed to
address this issue.
(2) Generalization performance. Recall that SVM embodies the principle of structural risk
minimization in its formulation. Hence, SVM is capable of finding a hyperplane with maximum
margin of separation in a kernel-induced feature space, thus having better generalization
performance than the traditional learning machines based on empirical risk minimization.
However, the formulation of SVDD does not consider the factor of class separation. More
precisely, SVDD is unable to find a decision boundary with maximum margin of separation.
The problem is not on the SVDD itself but due to the fact that only patterns of one single class
are available during training in a one-class classification problem. Consequently, although
SVDD can provide a target set with a compact description [11], satisfactory generalization
performance cannot be guaranteed, which is a shortcoming of SVDD and remains to be solved.
Although SVDD has shown success in defect detection in [7], the detection rate still has space to
improve. In addition, increasing product yield of 1% can save an LCD manufacturer at least one
million US dollars per month, according to the internal evaluation of the LCD manufacturer we
cooperate with. Accordingly, the issue of how to further improve generalization performance of SVDD
would be worth studying from both the theoretical and practical aspects. In this paper, we present a
method to address this issue by introducing a quasiconformal transform of a kernel and magnifying the
Riemannian metric around the decision boundary of SVDD. The modified version is named
quasiconformal kernel SVDD (QK-SVDD), which will be introduced in detail in Section 2. Then we
apply the proposed QK-SVDD to the LCD array defect problem described above. Remarkable
improvement in generalization performance has been indicated by our experimental results.
Int. J. Mol. Sci. 2011, 12
2. Results and Discussion
2.1. Basic Idea
According to real LCD manufacturing conditions, the number of normal LCD panels exceeds
greatly the number of defective LCD panels. Therefore, the normal PRs greatly outnumber the defective
PRs. As a result, the collected data set for training would be imbalanced if a two-class classification
approach is adopted, the SVM by Vapnik [4] for example, the class imbalance problem occurs.
The class imbalance problem has attracted growing attention in the machine learning community. In
a two-class classification problem, the class imbalance typically occurs when there are more instances of
one (majority) class than the other (minority). This problem also occurs in a multi-class classification
application if imbalances exist between the various classes. Most standard classification algorithms
assume or expect balanced class distributions or equal misclassification costs. Consequently, those
algorithms would tend to provide severely imbalanced degree of testing accuracy if the training set is
severely imbalanced.
Previously, several workshops/special issues have been held/published to discuss and address this
problem [12–15]. Various approaches for imbalanced learning have also been proposed, such as
sampling (e.g., [16–18]), integration of sampling with ensemble learning (e.g., [19,20]), cost-sensitive
learning (e.g., [21–23]), and SVM-based approach (e.g., [24–28]). These discrimination-based
(two-class) approaches have shown to be useful in dealing with class imbalance problems. In addition,
several works have also suggested that a one-class learning approach can provide a viable alternative
to the discrimination-based approaches [29–33]. Interested readers can refer to [34] for a broad
overview on the state-of-the-art methods in the field of imbalanced learning.
In practice, in addition to the class imbalance problem, the LCD defect detection also suffers from
another critical problem resulting from the absence of negative information. To facilitate the following
problem description, the normal PR class and the defective PR class are defined as the positive class
and negative class, respectively.
The main difference between a normal PR and a defective PR is that their appearances are
apparently different, as can be observed from Figure 4. The color (or gray level) of a normal PR is
nearly uniform, implying that the variation of the gray-level distribution of normal PRs is very small.
On the contrary, the surfaces of defective PR not only contain various kinds of textures, but also vary
greatly in color, implying that the variation of the true distribution for negative class in the data space
is very large. Collecting a set of positive training data that can represent the true distribution of
positive class is easy, because: (1) the variation of positive-class distribution is very small; and (2)
most of the LCD panels are normal (the number of normal PRs is considerably large). Therefore, the
positive class can be well-sampled during the data collection stage in real practice. However,
representative defective PRs are difficult to obtain in practice for several reasons. For example, there
are numerous types of defects in array process, more than 10 types at least. However, not all the
defects would occur frequently. Some of the defects seldom appear, for example the defect caused by
abnormal photo-resist coating (APRC). The defect “APRC” seldom occurs, because
equipment/process engineers maintain the coating machines periodically. Even so, the coating
machines might still break down occasionally. As a result, the number of available images containing
Int. J. Mol. Sci. 2011, 12
the APRC defects is quite limited. But, the APRC defect has a large variation in color and texture.
Unfortunately, limited APRC examples cannot stand for all kinds of APRC defects. Therefore, the
collected negative training data are most likely under-sampled. Here, the “under-sampled” means that
the collected negative training set cannot represent the true negative-class distribution in the data
space, which is the problem of absence of negative information. Due to this problem, numerous false
positive (i.e., missing defects) will be produced if a two-class classification approach (e.g., a binary
SVM) is applied to the LCD defect detection, which has been evidenced by the results reported in [7].
Compared with two-class classification approach, novelty detection approach is a better choice.
Figure 4. Different defect images contain different numbers of defective pixel regions (PRs).
The normal and defective PRs are bounded with blue and red rectangles, respectively.
Novelty detection is one-class classification [10,35], which is to solve the conventional two-class
classification problems where one of the two classes is under-sampled, or only the data of one single
class can be available for training [5,6,9–11,35–40]. As analyzed above, for the LCD defect detection
application, the normal PRs can be well-sampled, while the defective PRs are in general undersampled.
Therefore, the LCD defect detection can be treated as a typical novelty detection problem.
Accordingly, one-class classification is a better solution.
To summarize, it can be seen that the LCD defect detection suffers from two problems
simultaneously: one is the class imbalance problem, and the other is the problem of the absence of
negative information. For the first problem, there have been many sophisticated solutions, including
sampling, cost-sensitive learning, SVM-based, and one-class learning approaches. However, the only
solution to the second problem is the novelty detection approach (i.e., one-class classification
approach). Therefore, one-class classification would be a more appropriate approach to the LCD defect
detection application.
One-class classifiers (also called novelty detectors) are to find a compact description for a class
(usually being referred to target class). So, a one-class classifier is trained on the target class alone. In
a testing stage, any points that do not belong to this description are considered as outliers. In this paper
the normal PRs are treated as target data, while defective PRs are treated as outliers.
There are several approaches for one-class classification, such as density approach (e.g., Gaussian
mixture model [5]), boundary approach (e.g., SVDD [9] and one-class SVM [40]), neural network
approach [6,36], and reconstruction-based approach (e.g., the kernel principal component analysis for
novelty detection [35]). It has been proven in [9] that when a Gaussian kernel is used, the SVDD
Int. J. Mol. Sci. 2011, 12
proposed by Tax and Duin [9] is identical to the one-class SVM proposed by Schölkopf et al. [40].
This paper focuses on the SVDD since it has been applied to the same application in the works
of [7] and [10], and has shown to be effective in detecting defective PRs. However, as discussed in
Section 1, generalization performance of SVDD is limited. Therefore, the intent of this paper is on
proposing a method to improve generalization performance of SVDD, and applying the improved
SVDD to the LCD defect detection treated as a novelty detection problem. The improved SVDD is
called quasiconformal kernel SVDD (QK-SVDD). Note that the QK-SVDD and SVDD are not two
independent classifiers. To obtain QK-SVDD, one has to train an SVDD first, which will be
introduced in Section 2.4. In the following part of the paper, we first introduce the defect detection
scheme, and then derive the proposed method in details.
2.2. Overview of the Defect Detection Scheme
The array process consists of five engineering processes, each of which contains the same five
processes, including cleaning, thin film deposition, photolithography (which contains three
sub-processes: photo resist coating, exposure, and developing), etching, and stripping. By taking GE
engineering as example, in the following we introduce our defect detection scheme depicted in Figure 5.
Figure 5. Overview of the defect detection scheme.
2.2.1. Image Acquisition
After a sheet of glass substrate containing six LCD panels completes the photo process, it will be
carried to a stocker by a rail-guided vehicle (RGV). At the stocker, a cassette containing 25 sheets of
glass substrates is carried to the inspection equipment. After the cassette arrives, the inspection
equipment will start to randomly pick six out of the 25 sheets, and each of the six chosen substrates
will be put on an X-Y-theta stage by an autoloader, one at a time. Above the stage, there are four TDI
(Timing Delay Integration) line-scan cameras equipped on the inspection equipment. The cameras
begin to scan its surface once a sheet of glass substrate is placed on the stage. The scanned analog
Int. J. Mol. Sci. 2011, 12
signals are transferred to digital signals (images) via an analog-to-digital (A/D) converter. Usually, it
would take around 4 minutes to scan a sheet of glass substrate. These images will be stored in the
image computer temporarily. After the six glass substrates are scanned, all the digital images will be
stored in an image database. Each image is a 768 × 576 pixel 24-bit/pixel colored image (JPEG
format), and has the resolution of around 1.15 (pixels/µm). Finally, the cassette will be carried back to
the stocker, and sent to the next process, i.e., the etching process. Note that the inspection equipment is
placed between photo and etching processes because the defective panels can still be repaired by
rework as long as they have not yet been sent into etching process.
2.2.2. Image Preprocessing
Our scheme starts to access the images from the image databases, one image at a time. The colored
image is first transformed into a gray-level one. Following that, the PRs are automatically segmented
from the gray-level image by the projection-based PR segmentation method developed
in [7]. The segmented PR images are then resized to have the same size of 30 × 30 pixels. Then, each
PR image is represented by a vector (a datum) of 900 × 1 after row-by-row scanning. Finally, the PR
data are sent into the QK-SVDD for further classification, one PR datum at a time.
2.2.3. Defect Detection via QK-SVDD
Once the QK-SVDD receives a PR datum, it starts to judge whether the PR datum belongs to the
class “normal”. If the PR datum is classified as the class “normal”, our scheme ignores this
classification result; otherwise our scheme will output the result to engineers in the dust-free room via
intranet because the PR is defective. The engineering can repair the defective PR in real time and
diagnose the production machines to prevent the forthcoming LCD panels from suffering the same
problem, thus being able to improve the yield rate significantly.
2.3. SVDD
In order to facilitate the following introduction, a normal PR datum is simply called a target datum,
and a defective PR datum is called an outlier hereafter.
Given a target training set T = {xi ∈ R d }iN=1 , where xi are target training data and d is the dimension
of the space (d = 900), SVDD first maps the training data into a higher-dimensional feature space F
from the input space S = Rd by a nonlinear mapping φ , and then finds a minimum-volume sphere in F
such that all or most of the mapped target data are tightly enclosed by the sphere, which can be
formulated as the constrained optimization problem:
subject to
R 2 + C ∑ ξi
i =1
φ (x i ) − a F
≤ R + ξi ;
ξ i ≥ 0, ∀i,
where C ∈ [1 / N ,1] is the penalty weight; aF and R are the center and the radius of the sphere in F,
respectively; and ξi are slack variables representing training errors. The dual of (1) is
Int. J. Mol. Sci. 2011, 12
subject to
i =1
i =1 j =1
∑α i K (xi , xi ) − ∑ ∑α iα j K (xi , x j )
i =1 i
= 1; 0 ≤ α i ≤ C , ∀i,
where αi are Lagrange multipliers; and K is the kernel function defined by K ( x, y ) = φ ( x ) T φ ( y ) . We
consider only the Gaussian kernel K (x, y ) = exp( − x − y
2σ 2 ) in this paper, where
is the width of
Gaussian and a user-defined kernel parameter. The training data for which 0 <αi ≤ C are called support
vectors (SVs). The center aF of the sphere is spanned by the mapped training data:
a F = ∑ α iφ (xi ),
i =1
and the radius R of the sphere can be obtained by taking any x k ∈ UBSVs , to calculate the distance
between its image φ (x k ) and aF:
R 2 = φ (x k ) − a F
i =1
i =1 j =1
= K (x k , x k ) − 2∑ α i K (xi , x k ) + ∑ ∑ α iα j K (xi , x j ).
For a test datum x, its output can be computed by the decision function:
f ( x) = R 2 − φ ( x) − a F
= R 2 − K (x, x) + 2∑ α i K (x, xi ) − ∑∑ α iα j K (xi , x j )
i =1
i =1 j =1
If f (x) ≥ 0, x is accepted as a target (a normal PR); otherwise it is rejected as an outlier (a defective PR).
We can see from equation (5) that the decision function is nonlinearly related to the input data.
Therefore, although the decision boundary f (x) = 0 is the sphere boundary in the feature space F, it is
actually flexible (non-spherical) in the original space S , and thus being able to fit any irregular-shaped
target sets.
2.4. QK-SVDD
Looking back at equation (1), we can see that SVDD does not consider the factor of class separation
in its formulation, but consider simply the volume of the sphere in F and the number of target training
errors. Thus, the decision boundary f (x) = 0 would be too close to the target set to give satisfactory
generalization performance. In this paper, we propose a method to improve generalization performance
of SVDD, which is based on the kernel geometry in the kernel-induced feature space F.
When a Gaussian kernel is used, the associated mapping φ embeds the input space S into an
infinite-dimensional feature space F as a Riemannian manifold, and the kernel induces a Riemannian
metric in the input space S [41,42]:
 ∂φ (x) 
g ij (x) = 
 ∂xi 
 ∂φ (x)  ∂ 2 K (x, x′)
 ∂x 
x′ = x
Int. J. Mol. Sci. 2011, 12
where xi stands for the ith element of the vector x, and gij (x) is the Riemannian metric induced by a
kernel at x. The Riemannian distance ds in F caused by a small vector dx in S is given by
ds 2 = ∑∑ g ij ( x )dxi dx j
Thus, the volume form in a Riemannian space can be defined as
dV = det{G ( x )}dx1dx2 ⋅ ⋅ ⋅ dxd ,
where det{G ( x )} is a magnification factor, and G (x) is the matrix with elements gij (x). Equation (8)
shows how a local volume in S is magnified or contracted in F under the mapping of φ . Furthermore,
a quasiconformal transformation of the Riemannian metric is given by
g~ ( x ) = Ω( x ) g ( x ),
where Ω (x) is a scalar function of x. To realize this transformation, it is necessary to find a new
mapping φ . In practice, it is difficult to achieve this because the mappings are usually unknown in
kernel methods. However, if φ is defined as
φ (x ) = D( x )φ ( x ),
where D (x) is a positive real-valued quasiconformal function, then we obtain a quasiconformal
transformation of the original kernel K by using a simple kernel trick:
K ( x, x′) = D ( x ) D ( x′) K ( x, x′),
where K is called quasiconformal kernel. Finally, substituting (11) into (6) yields the new metric
g~ij ( x ) associated with K :
g~ij ( x ) = (∂D( x ) ∂xi )(∂D( x ) ∂x j ) + D( x ) 2 g ij ( x )
Suppose that the goal is to magnify the local volume around the image of a particular data point
x ∈ S , the first step is to choose a function D (x) in a way that it is the largest at the position of φ (x )
and decays with the distance from φ (x ) . By doing so, new Riemannian metric g~ij ( x ) becomes larger
around x and smaller elsewhere, as can be seen from equation (12). As a result, the local volume
around φ (x ) is magnified, and magnifying the volume around φ (x ) is equivalent to enlarging the
spatial resolution in the vicinity of φ (x ) in F.
Recently, the technique of the quasiconformal transformation of a kernel has been applied to
improve generalization performance of existing methods, including SVM [43], nearest neighbor
classifier [44], and kernel Fisher discriminant analysis (KFDA) [45]. In this paper we present a way of
introducing this technique into SVDD. The idea is as follows.
If we hope to improve generalization performance of SVDD, we need to increase the separability of
classes (target and outlier), which can be achieved by enlarging the spatial resolution around the
boundary of the minimum-enclosing sphere in F. According to the technique of quasiconformal kernel
mentioned above, the function D (x) should be chosen in a way that it is the largest at the sphere
boundary and decays with the distance from the sphere boundary in F. However, the difficulty is that
Int. J. Mol. Sci. 2011, 12
we do not know where the sphere boundary is located, because the feature space F is actually implicit.
Nevertheless, there is an indirect way. According to the Kuhn-Tucker (KT) conditions
α i [ R 2 + ξi − (φ ( x i ) − a F )T (φ ( x i ) − a F )] = 0, ∀i
ξi (C − α i ) = 0, ∀i,
the SVs can be divided into two categories: 1) the images of the SVs with 0 < αi < C are on the sphere
boundary, and 2) the images of the SVs with αi = C fall outside the sphere boundary. The SVs in the
first category called unbounded SVs (UBSVs), and the ones in the second category are called bounded
SVs (BSVs). Since the mapped UBSVs lie exactly on the SVDD sphere boundary in F, increasing the
Riemannian metric around the UBSVs in S is therefore equivalent to enlarging the spatial resolution in
the vicinity of the sphere boundary in F. As a result, the separability of classes is increased, and
generalization performance of SVDD is improved.
Accordingly, we can choose the function D (x) to have larger values at the positions of the mapped
UBSVs and smaller elsewhere. Following the suggestion from [28], the quasiconformal function D (x)
here is chosen as a set of Gaussian functions:
 φ ( x) − φ ( x )
D(x) = ∑ exp −
x i ∈UBSVs
where the parameter τi is given by
τ i2 =
∑ φ (x n ) − φ (xi )
The parameter τ i2 computes the mean squared distance from φ ( x i ) to its M nearest neighbors φ ( x n ) ,
where x n ∈ UBSVs . We set M = 3 in this study. As can be seen from (14), the function D (x) decreases
exponentially with the distance to the images of the UBSVs.
In summary, the QK-SVDD consists of three training steps:
(1) First, an SVDD is initially trained on a target training set by a primary kernel, thereby producing
a set of UBSVs and BSVs. The primary kernel is the Gaussian kernel.
(2) Second, the primary kernel is replaced by the quasiconformal kernel defined in equation (11).
(3) Then, retrain the SVDD with the quasiconformal kernel using the same target training set.
After training the QK-SVDD, a set of new Lagrange multipliers, α~ ,⋅ ⋅ ⋅, α~ , will be obtained. A new
a F = ∑i =1α~iφ ( x i ) and radius R will also be obtained. Finally, we arrive
enclosing sphere with center ~
at the decision function of QK-SVDD:
Int. J. Mol. Sci. 2011, 12
f (x) = R 2 − φ (x) − ~
~ ~
= R 2 −  φ ( x ) T φ ( x ) − 2∑ α~iφ ( x ) T φ ( x i ) + ∑ ∑ α~iα~ jφ ( x i ) T φ ( x j ) 
i =1
i =1 j =1
~ ~
= R 2 −  K ( x, x ) − 2∑ α~i K ( x, x i ) + ∑ ∑ α~iα~ j K ( x i , x j ) 
i =1
i =1 j =1
= R 2 − D ( x ) 2 + 2∑ α~i D ( x ) D ( x i ) K ( x, x i ) − ∑ ∑ α~iα~ j D ( x i ) D ( x j ) K ( x i , x j )
i =1
i =1 j =1
Note that for the Gaussian kernel, K ( x, x ) = 1, ∀x ∈ R d . For a test data point x, it is classified as a
target if f ( x ) ≥ 0 ; an outlier otherwise. Also note that the last term is a constant. Therefore, the testing
time complexity of QK-SVDD, similar to SVDD, is also linear in the number of training data.
2.5. Comparison between Our Method and the Kernel Boundary Alignment (KBA) Algorithm
Here we compare our method with the KBA algorithm proposed by Wu and Chang [28], since the
KBA algorithm is also based on the quasiconformal transformation of a kernel.
Recall that when a binary SVM is trained on an imbalanced data set, the learned optimal separating
hyperplane (OSH), denoted as f (x) = 0, would be skewed toward the minority class in a
kernel-induced feature space. The KBA was designed to deal with the class-boundary-skew problem
due to imbalanced training data sets. The KBA algorithm consists of two steps. In the first step, the
KBA algorithm estimates an “ideal” separating hyperplane within the margin of separation by an
interpolation procedure. The ideal hyperplane and the OSH are parallel to each other, but may be
different in location. If the training data set is balanced, the estimated ideal hyperplane and the OSH
will be the same; otherwise, compared with the OSH, the estimated (or interpolated) ideal hyperplane
should be closer to the majority support-instance hyperplane, defined as f (x) = −1 in [28], such that the
class-boundary-skew problem due to the imbalanced training data set can be solved. Assuming that the
distance between the ideal hyperplane and the OSH is η, the objective of this step is to find the optimal
value of η subject to the constraint: 0 ≤ η ≤ 1. Therefore, the interpolation procedure is formulated as
constrained optimization problem (see [28] for details). Then, in the second step, the KBA algorithm
chooses a feasible conformal function to enlarge the spatial resolution around the estimated ideal
hyperplane in the feature space.
The advantages of the KBA-based SVM over the regular binary SVM is two-fold: not only the
class-boundary-skew problem due to imbalanced training data sets can be solved, but also the
generalization performance can be improved simultaneously.
The design of KBA is based on information of separation margin in the interpolation procedure.
Without this information, this procedure cannot be formulated as a constrained optimization problem,
and as a result, the location of the ideal hyperplane cannot be estimated. Therefore, the KBA algorithm
cannot be applied to SVDD, since SVDD is trained on a single target class alone: there is no such margin
of separation. The decision boundary learned from SVDD is simply a sphere boundary in the feature space.
The main difference between the KBA and our method is that the KBA is designed for binary
classifier SVM while our method is designed for one-class classifier SVDD. The common is that both
Int. J. Mol. Sci. 2011, 12
KBA and our method are based on the technique of quasiconformal transformation of a kernel.
Although our method is much simpler, it works, as demonstrated in the next section.
3. Experimental Section
According to the introduction to the defect detection scheme in Section 2, we see that the
performance of the scheme highly depends on the defect detector. Therefore, in this subsection, we
conduct several experiments to test performance of the proposed QK-SVDD.
Data: A total of 100 defect images are used in the experiment. They were captured in GE engineering
in an array plant of a TFT-LCD manufacturer in Taiwan. There is a kind of defect in each image, and
the defect occupies several PRs. The numbers of the PRs in different defect images may be different,
and the numbers of the defective (or normal) PRs in different defect images may also be different. For
example, the left image in Figure 4 contains 18 PRs in which one is defective and the remaining 17
PRs are normal. In the right image, there are 30 PRs in total, where the number of defective PRs and
the number of normal PRs are 5 and 25, respectively. After performing PR segmentation on each
image, we obtain 182 defective PRs and 1706 normal PRs in total. Examples of the normal and
defective PRs are displayed in Figure 6. All the PR images are transformed into
gray-level ones, and then resized to have the same size of 30 × 30 pixels. Finally, they are represented
as vectors (data) of 900 dimensions.
Figure 6. Examples of the chosen PRs in the experiment. The PRs in the first column are
normal, while the rest are defective.
3.1. Comparison based on Balanced Test Sets
Ten different runs are executed in the experiment. In each run, we randomly collect 200 data from
the 1706 normal ones, and 100 data from the 182 defect ones. The collected 200 normal data and the
100 defect data form a target set and an outlier set, respectively. The first 100 data in the target set
Int. J. Mol. Sci. 2011, 12
were used as target training data to train the methods to be compared. The remaining 100 target data
and all the 100 outliers were used for testing the methods.
Training: In all the 10 runs, we set the penalty weight C to a constant (C = 0.4). Then, in each run we
perform the following training procedure to determine the Gaussian kernel parameter σ. The training
strategy here follows the one suggested in [9]: σ is determined by decreasing its value from a large one
(starting from a large σ is to ensure all the target training data are enclosed by the sphere at the very
beginning) until a predefined target rejection rate r on the 100 target training data is reached. The
larger the r is, the smaller the σ is. Defining such a threshold r ensures that a compact description for
the target class can be obtained. However, r cannot be too large; otherwise the trained sphere will
become too tight to get a good classification result. We set r to 0.01 and 0.05, respectively; for
example, r = 0.05 means 5% of the target training data need to be rejected by the SVDD sphere in the
training stage. Once the predefined threshold is reached, the training of SVDD is stopped, and the
value of σ is fixed. Then, the same value of σ is used to train QK-SVDD. Clearly, the values of σ in
the ten runs would not be the same because the target training sets in the runs are different.
Testing results: After training SVDD and QK-SVDD in each run, the prepared test set containing 100
target data and 100 outliers is then fed into the methods, and then three results for each of the two
methods are obtained, including target rejection rate (TRR), outlier acceptance rate (OAR), and error
rate (ER), defined as
# target data that are rejected as outliers
# target data
ER =
# outliers that are accepted as targets
# outliers
# target data that are relected as outliers + # outliers that are accepted as targets
# target data + # outliers
After the ten runs are finished, the average results are obtained and listed in Tables 1 and 2. Note
that the average ER is computed by (Average TRR + Average OAR)/2. According to the results,
QK-SVDD performs better than SVDD in both cases (r = 0.01 and r = 0.05), especially in the case of
r = 0.05 where QK-SVDD outperforms SVDD by 1.85% (5.85%–4.00%) in terms of average error rate.
The improvement in average error rate reaches 31.62% (1.85/5.85), which demonstrates the validity of
using QK-SVDD to improve generalization performance of the original SVDD.
In contrast with the average error rate, average outlier acceptance rate would be more important for
engineers in practice. As aforementioned, an outlier represents a defective PR. If a defective PR is
classified as a target (a normal PR), there will be no chance to repair the damaged LCD panel because
the defective PR is not detected, thus increasing production cost. Hence, a good defect detector should be
capable of achieving a low-enough outlier acceptance rate. We can observe from Table 2 that QK-SVDD
achieves an average outlier acceptance rate of 3.60%, which is much lower than that of SVDD (6.10%).
Moreover, improvement in average outlier acceptance rate is (6.10−3.60)/6.10 = 41.98%, which
means that the production cost can be substantially reduced if the SVDD detector is replaced by the
Int. J. Mol. Sci. 2011, 12
Table 1. Comparison of Testing Performance between support vector data description
(SVDD) and quasiconformal kernel (QK)-SVDD (r = 0.05).
Average TRR (in %) Average OAR (in %) Average ER (in %)
1.10 (±0.34)
12.80 (±2.74)
0.90 (±0.27)
10.70 (±1.95)
Table 2. Comparison of Testing Performance between SVDD and QK-SVDD (r = 0.01).
Average TRR (in %) Average OAR (in %) Average ER (in %)
5.60 (±1.54)
6.10 (±2.14)
4.40 (±1.34)
3.60 (±1.74)
Speed: During the experiment, the training time and testing time in each run are recorded in order to
compare the speeds between SVDD and QK-SVDD. Table 3 lists the average training time and testing
time. In our experiment, the methods are implemented with Matlab. A Pentium 2.80-GHz-CPU
computer (with 4 GB RAM) running on Windows 7 is used.
Table 3. Comparison of Training and Testing Time between SVDD and QK-SVDD.
Average Training Time (s) Average Testing Time (ms/PR)
Recall that QK-SVDD needs to train a SVDD by a Gaussian kernel and then retrain a SVDD by a
quasiconformal kernel. Hence, it is easy to see that SVDD has only to solve the quadratic
programming (QP) problem in equation (2) once, while QK-SVDD needs to solve the QP problem
twice, which is the main reason that QK-SVDD has a higher training time complexity (1.468 s) than
SVDD (0.623 s). The training time of QK-SVDD (1.468 s) is acceptable and can be actually ignored in
the LCD inspection application because image acquisition takes much more time: inspection
equipment takes around 4 minutes to scan a sheet of glass substrate.
However, the training time complexity of the QP problem is O (N3) 10]; hence, it is expected that
QK-SVDD will be computationally expensive in training if the proposed QK-SVDD is applied to other
problems where the training dataset is large-scale, e.g., the extended MIT face dataset [46]. A method
to reduce training time complexity of QK-SVDD is required; however, it is beyond the scope of this work.
On the other hand, QK-SVDD spends only 2.38 ms accomplishing the task of classification on a PR
datum. Also, an LCD image contains around 25 PRs in average. Therefore, QK-SVDD is able to
accomplish the defect-detection task on an LCD image within 60 ms on average.
3.2. Comparison Based on Imbalanced Test Sets
In this subsection, we further compare the two methods on imbalanced test sets with ten different
runs in the experiment. In each run, we randomly collect 1000 data from the 1706 normal ones, and
100 data from the 182 defect ones. The collected 1000 normal data and the 100 defect data form a
target set and an outlier set, respectively. The first 100 data in the target set are used as target training
data to train the methods to be compared. The remaining 900 target data and all the 100 outliers were
Int. J. Mol. Sci. 2011, 12
used for testing. During the training stage, we set r to be 0.01, 0.05, and 0.1, respectively. In each run,
a TRR and an OAR on the test set are obtained. However, since the test set in each run is highly
imbalanced, hence, compared with the usual classification error rate adopted in the last experiment on
balanced test sets, the balanced loss described in [47] would be a more appropriate performance
measure for imbalanced test sets [10,48]. The balanced loss (BL) is defined as
BL = 1 −
where TAR and ORR denote the target acceptance rate and outlier rejection rate, respectively, and
TAR = 1 – TRR and ORR = 1 – OAR. The average TRR and average ORR over the ten runs are listed
in Table 4 (r = 0.01), Table 5 (r = 0.05), and Table 6 (r = 0.1). Note that the average BL is computed
by (Average TRR + Average OAR)/2.
Table 4. Comparison of Performance on Imbalanced Test Sets (r = 0.01).
Average TRR (in %) Average OAR (in %) Average BL (in %)
0.92 (±0.44)
11.60 (±2.71)
0.89 (±0.41)
10.10 (±1.69)
Table 5. Comparison of Performance on Imbalanced Test Sets (r = 0.05).
Average TRR (in %) Average OAR (in %) Average BL (in %)
5.23 (±1.23)
5.58 (±1.87)
4.21 (±1.08)
3.70 (±1.77)
Table 6. Comparison of Performance on Imbalanced Test Sets (r = 0.1).
Average TRR (in %) Average OAR (in %) Average BL (in %)
9.81 (±1.95)
2.20 (±1.01)
7.54 (±1.31)
0.80 (±0.43)
From Table 4 to Table 6, we can see that the QK-SVDD outperforms the original SVDD under the
three different settings (r = 0.01, r = 0.05, and r = 0.1). It is worth noticing that altering r to a larger
value does not necessarily lower the average balanced loss. For example, as r is increased from 0.05 to
0.1, the values of the average BL of SVDD and QK-SVDD decrease. The main reason is that the
average TRR increases substantially and simultaneously. However, the BL (or classification error) is
not the major concern for engineers. The most important performance index is the defect detection rate
defined as:
Defect Detection Rate = 1 − OAR.
In order to maintain high competiveness in the market, quality is one of the most important factors.
If most of the defective LCD panels can be found and repaired immediately, the product yield rate can
thus be improved significantly. Therefore, For an LCD manufacture, the defect detection rate would be
the most important. In current practice, the LCD manufacture has made a specification for the defect
Int. J. Mol. Sci. 2011, 12
detection rate: it should be larger than 99%. Furthermore, under this condition, the false alarm rate
should be as small as possible: a false alarm means a normal PR is classified as a defective PR.
Namely, false alarm rate = TRR.
We can observe from Table 6 that when r = 0.1, the original SVDD obtains a low enough average
OAR (2.20%): thus the average defect detection rate is 97.8%. However, the specification made by the
manufacturer is still not satisfied. On the contrary, the QK-SVDD (99.2%) satisfies this specification.
Also, compared with the original SVDD, the QK-SVDD achieves a lower false alarm rate (7.54%).
Actually, if the value of r is increased further, say r = 0.2, it can be expected that the defect detection
rate of SVDD or QK-SVDD will go to 100% (or a value near 100%). However, it is not necessary to
do so, because the defect detection rate of QK-SVDD has been high enough (99.2%) as r = 0.1. More
importantly, it can also be predicted that as r is increased further, say r = 0.2, the resulting false alarm
rate will become too high. If it is too high, engineers will spend much time checking a great number of
LCD panels that need not to be checked, which would rise the product cycle time substantially.
Consequently, product throughput, which is also a key practical consideration in addition to product
quality, will therefore be decreased.
4. Conclusions
In this paper we have presented a defect inspection scheme for TFT-LCD array process. The core of
the scheme is a defect detector. A novel one-class classifier called quasiconformal kernel SVDD
(QK-SVDD) has been proposed as the defect detector. The QK-SVDD is designed to overcome the
weakness of the original SVDD in generalization performance. Experimental results carried out on real
LCD images have indicated that the proposed QK-SVDD substantially improves generalization
performance of SVDD in the LCD inspection application, and the improvement in generalization
performance is considerably significant, over 30%. In addition, the QK-SVDD defect detector is able
to obtain a low defect-detection error rate 4% on pixel region images, and classify each pixel region
image within 3 ms.
In this paper, the pixel region images are directly fed into QK-SVDD for classification without any
feature extractions. We believe that the error rate can be further improved by introducing useful
feature extraction methods into the defect detection scheme, such as the discrete cosine transform
(DCT) [3] and the kernel principal component analysis [49]. The feature evaluation task will be one of
our future works.
Song, Y.C.; Choi, D.H.; Park, K.H. Multiscale detection of defect in thin film transistor liquid
crystal display panel. Jpn. J. Appl. Phys. 2004, 43, 5465–5468.
Tsai, D.M.; Lin, P.C.; Lu, C.J. An independent component analysis-based filter design for defect
detection in low-contrast surface images. Pattern Recognit. 2006, 39, 1679–1694.
Chen, L.C.; Kuo, C.C. Automatic TFT-LCD mura defect inspection using discrete cosine
transform-based background filtering and ‘just noticeable difference’ quantification strategies.
Meas. Sci. Technol. 2008, 19, 015507.
Vapnik, V.N. Statistical Learning Theory; Wiley-Interscience: Hoboken, NJ, USA, 1998.
Int. J. Mol. Sci. 2011, 12
Markou, M.; Singh, S. Novelty detection: A review, part I: Statistical approaches. Signal Process.
2003, 83, 2481–2497.
Markou, M.; Singh, S. Novelty detection: A review, part II: Neural network based approaches.
Signal Process. 2003, 83, 2499–2521.
Liu, Y.H.; Huang, Y.K.; Lee, M.J. Automatic inline-defect detection for a TFT-LCD array
process using locally linear embedding and support vector data description. Meas. Sci. Technol.
2008, 19, 095501.
Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science
2000, 290, 2323–2326.
Tax, D.; Duin, R. Support vector data description. Math. Learn. 2004, 54, 45–66.
Liu, Y.H.; Liu, Y.C.; Chen, Y.Z. Fast support vector data descriptions for novelty detection.
IEEE Trans. Neural Netw. 2010, 21, 1296–1313.
Lee, K.; Kim, D.W.; Lee, K.H.; Lee, D. Density-induced support vector data description.
IEEE Trans. Neural Netw. 2007, 18, 284–289.
AAAI Tech Report WS-00-05. In Proceedings of the AAAI’2000 Workshop on Learning from
Imbalanced Data Sets, Austin, TX, USA, 31 July 2000; Japkowicz, N., Ed.; AAAI: Menlo Park,
CA, USA, 2000.
Turney, P. Types of cost in inductive concept learning. In Proceedings of the ICML’2000
Workshop on Cost-Sensitive Learning, Stanford, CA, USA, 29 June–2 July 2000.
Japkowicz, N. Class imbalance: are we focusing on the right issue? In Proceedings of the
ICML’2003 Workshop on Learning from Imbalanced Data Sets, Washington, DC, USA, 21
August 2003.
Chawla, N.V.; Japkowicz, N.; Kolcz, A. Editorial: Special issue on learning from imbalanced data
sets. ACM SIGKDD Explo. Newsl. 2004, 6, 1–6.
Liu, X.Y.; Wu, J.; Zhou, Z.H. Exploratory Under-Sampling for Class-Imbalance Learning.
In Proceedings of the 6th International Conference on Data Mining, Hong Kong, China, 18–22
December 2006; pp. 965–969.
Jo, T.; Japkowicz, N. Class imbalances versus small disjuncts. ACM SIGKDD Explor. Newsl.
2004, 6, 40–49.
Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority
over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTE Boost: Improving Prediction of
the Minority Class in Boosting. In Proceedings of the Seventh European Conference on Principles
and Practice of Knowledge Discovery in Databases, Cavtat-Dubrovnik, Croatia, 22–26 September
2003; pp. 107–119.
Mease, D.; Wyner, A.J.; Buja, A. Boosted classification trees and class probability/quantile
estimation. J. Mach. Learn. Res. 2007, 8, 409–439.
Elkan, C. The Foundations of Cost-Sensitive Learning. In Proceedings of the17th International
Joint Conference on Artificial Intelligence, Seattle, WA, USA, 4-10 August 2001; pp. 973–978.
Ting, K.M. An instance-weighting method to induce cost-sensitive trees. IEEE Trans. Knowl.
Data Eng. 2002, 14, 659–665.
Int. J. Mol. Sci. 2011, 12
23. Sun, Y.; Kamel, M.S.; Wong, A.K.C.; Wang, Y. Cost-sensitive boosting for classification of
imbalanced data. Pattern Recognit. 2007, 40, 3358–3378.
24. Veropoulos, K.; Campbell, C.; Cristianini, N. Controlling the Sensitivity of Support Vector
Machines. In Proceedings of the International Joint Conference on Artificial Intelligence,
Stockholm, Sweden, July 31–August 6 1999; pp. 55–60.
25. Kwok, J.T. Moderating the outputs of support vector machine classifiers. IEEE Trans. Neural
Netw. 1999, 10, 1018–1031.
26. Liu, Y.H.; Chen, Y.T. Face recognition using total margin-based adaptive fuzzy support vector
machines. IEEE Trans. Neural Netw. 2007, 18, 178–192.
27. Wang, B.X.; Japkowicz, N. Boosting Support Vector Machines for Imbalanced Data Sets. In
Proceedings of the 17th International Conference on Foundation of Intelligence System, Toronto,
ON, Canada, 20–23 May 2008; Volume 4994, pp. 38–47.
28. Wu, G.; Chang, E.Y. KBA: Kernel boundary alignment considering imbalanced data distribution.
IEEE Trans. Knowl. Data Eng. 2005, 17, 786–795.
29. Japkowicz, N. Supervised versus unsupervised binary-learning by feedforward neural networks.
Math. Learn. 2001, 42, 97–122.
30. Manevitz, L.M.; Yousef, M. One-class SVMs for document classification. J. Mach. Learn. Res.
2001, 2, 139–154.
31. Raskutti, B.; Kowalczyk, A. Extreme re-balancing for SVMs: A case study. ACM SIGKDD Explor.
Newsl. 2004, 6, 60–69.
32. Lee, H.J.; Cho, S. The novelty detection approach for difference degrees of class imbalance. Lect.
Note. Comput. Sci. 2006, 4233, 21–30.
33. Manevitz, L.; Yousef, M. One-class document classification via neural networks. Neurocomputing
2007, 70, 1466–1481.
34. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21,
35. Hoffmann, H. Kernel PCA for novelty detection. Pattern Recognit. 2007, 40, 863–874.
36. Ryan, J.; Lin, M.J.; Miikkulainen, R. Intrusion Detection with Neural Networks. In Advances in
Neural Information Processing Systems; Jordan, M.I., Kearns, M.J., Solla, S.A., Eds.; MIT Press:
Cambridge, MA, USA, 1998; Volume 10, pp. 943–949.
37. Campbell, C.; Bennett, K.P. A Linear Programming Approach to Novelty Detection. In Advances
in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2001; Volume 13,
38. Crammer, K.; Chechik, G. A Needle in a Haystack: Local One-Class Optimization. In Proceedings
of the 21th International Conference on Machine Learning, Banff, Canada, 4–8 July 2004.
39. Lanckriet, G.R.G.; Ghaoui, L.E.; Jordan, M.I. Robust Novelty Detection with Single-Class MPM.
In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2003;
Volume 15, pp. 929–936.
40. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support
of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
Int. J. Mol. Sci. 2011, 12
41. Burges, C.J.C. Geometry and invariance in kernel based methods. In Advances in Kernel
Methods—Support Vector Learning; Schölkopf, B., Burges, C.J.C., Smola, A., Eds.; MIT Press:
Cambridge, MA, USA, 1999; pp. 89–116.
42. Schölkopf, B.; Smola, A. Learning with Kernels; MIT Press: Cambridge, MA, USA, 2002.
43. Wu, S.; Amari, S. Conformal transformation of kernel functions: A data-dependent way to
improve support vector machine classifiers. Neural Process. Lett. 2002, 15, 59–67.
44. Peng, J.; Heisterkamp, D.R.; Dai, H.K. Adaptive quasiconformal kernel nearest neighbor
classification. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 656–661.
45. Pan, J.S.; Li, J.B.; Lu, Z.M. Adaptive quasiconformal kernel discriminant analysis.
Neurocomputing 2008, 71, 2754–2870.
46. Tsang, I.W.; Kwok, J.T.; Zurada, J.M. Generalized core vector machines. IEEE Trans. Neural
Netw. 2006, 17, 1126–1140.
47. Weston, J.; Schölkopf, B.; Eskin, E.; Leslie, C.; Noble, S. Dealing with large diagonals in kernel
matrices. Annals of the Institute of Statistical Mathematics 2003, 55, 391–408.
48. Tsang, I.W.; Kwok, J.T.; Cheung, P.M. Core vector machines: Fast SVM training on very large
data sets. J. Mach. Learn. Res. 2005, 6, 363–392.
49. Schölkopf, B.; Smola, A.; Müller, K.R. Nonlinear component analysis as a kernel eigenvalue
problem. Neural Comput. 1998, 10, 1299–1319.
© 2011 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article
distributed under the terms and conditions of the Creative Commons Attribution license
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF