A THESIS ON DATA-BASED MODELING: APPLICATION IN PROCESS IDENTIFICATION, MONITORING AND FAULT DETECTION

Submitted by Naga Chaitanya Kavuri (608CH301) in partial fulfillment of the M. Tech (Research) degree, under the guidance of Dr. Madhusree Kundu, Department of Chemical Engineering, National Institute of Technology Rourkela, January 2011.

ABSTRACT

The present thesis explores the application of different data-based modeling techniques to process identification, product quality monitoring, and fault detection. The biodegradation of the organic pollutant phenol is considered for the identification and fault detection studies, while a wine data set is used to demonstrate the application of data-based models in product quality monitoring. The theoretical and mathematical background of the data-based models, multivariate statistical models, and statistical models used in the thesis is discussed comprehensively. Given the drawbacks and complications associated with a first-principles model, the phenol biodegradation process was identified using Artificial Neural Networks (namely Multi-Layer Perceptrons) and Auto-Regressive models with eXogenous inputs (ARX). Both models captured the dynamics of the phenol biodegradation process well; the ANN proved its worth over the ARX models when trained with sufficient data, reaching an efficiency of almost 99.99%. A Partial Least Squares (PLS) based model was developed that can predict the process outcome at steady state for any level of the process variables within the range considered during model development. Three continuous process variables, namely temperature, pH, and RPM, were monitored using statistical process monitoring, with both univariate and multivariate techniques applied for fault detection.
X-bar charts along with Range charts were used for univariate SPM, and Principal Component Analysis (PCA) was used for multivariate SPM. The advantage of multivariate over univariate statistical process monitoring is demonstrated. Hierarchical and non-hierarchical clustering techniques, along with PCA, were used to find the different classes (qualities) of wine samples present in the wine data set. Once these classes were identified, statistical and ANN-based classifiers were designed and used for the authentication of unknown wine samples. A PLS-based model was used to develop the statistical classifier, which showed an identification efficiency of 98.5%. Two types of neural networks, namely Probabilistic Neural Networks (PNN) and Adaptive Resonance Theory (ART1) networks, were used to develop the ANN-based classifiers. The ART1 networks showed their superiority over the other classifiers with 100% efficiency, even when trained with a minimum amount of data.

National Institute of Technology Rourkela

CERTIFICATE

This is to certify that the thesis entitled "Data-based Modeling: Application in Process Identification, Monitoring and Fault Detection" submitted by Mr. Naga Chaitanya Kavuri, in partial fulfillment of the requirements for the award of Master of Technology (Research) in Chemical Engineering at National Institute of Technology, Rourkela (Deemed University), is an authentic work carried out by him under my supervision and guidance. To the best of my knowledge, the matter presented in the thesis has not been submitted to any other University/Institute for the award of any Degree or Diploma.

Prof. Dr. Madhusree Kundu
Date:
Place: NIT Rourkela
Department of Chemical Engineering
National Institute of Technology
Rourkela-769008

ACKNOWLEDGEMENT

It is impossible to thank one and all in this thesis. A few people, however, stand out as I complete this project.
If words can serve as symbols of approval and be taken as acknowledgement, then let these words play a heralding role in expressing my gratitude. I would like to express my deep sense of gratitude to Dr. Madhusree Kundu, Associate Professor, NIT Rourkela, for her guidance throughout this work and for her encouragement, positive support, and good wishes extended to me during the course of the investigation. I would also like to thank Dr. K. C. Biswal, Head of the Department of Chemical Engineering, NIT Rourkela, for his academic support. Special thanks to Prof. G. K. Roy, Chemical Engineering Department, NIT Rourkela, for his valuable advice and moral support. I am highly indebted to the authorities of NIT Rourkela for providing various facilities such as the library, computers, and Internet, which have been very useful. On a personal front, I would like to thank Mr. Jagajjanani Rao from the bottom of my heart; without him I would not have joined this institution and come this far. I cannot say thanks and walk away from the two people who stood as pillars of my moral stability through my bad times: Mom and Deepu. I express special thanks to all my friends for being there whenever I needed them. Thank you very much Sonali, Sonu, Gaurav Bhai, Diamond, Seshu, Vamsi and Vikky. Finally, I am forever indebted to my brother for his understanding and encouragement when it was most required. I dedicate this thesis to my family and friends.

Naga Chaitanya Kavuri

TABLE OF CONTENTS

Abstract
Acknowledgement

Chapter 1 - Introduction
1.1.
Introduction
1.2. Modeling
1.2.1. Data-based modeling
1.3. Motivation
1.4. Objectives
1.5. Organization of Thesis
References

Chapter 2 - Related Works and Computational Techniques
2.1. Related Work
2.2. Computational Techniques
2.2.1. Clustering
2.2.2. Principal Component Analysis
2.2.3.
Partial Least Squares Model
2.2.4. Statistical Process Monitoring (SPM) Charts
2.2.4.1. Range Charts
2.2.4.2. X-bar Charts
2.2.4.3. CUSUM Charts
2.2.4.4. Moving Range Charts
2.2.5. Artificial Neural Networks
2.2.5.1. Neural Networks as Classifiers
2.2.5.2. Neural Networks as Functional Approximators
2.2.6. Time-Series Identification
References

Chapter 3 - Process Identification
3.1. Phenol as an organic pollutant and its removal
3.2. Identification of dynamics for phenol biodegradation
3.3.
Bio-degradation of Phenol
3.3.1. Strain
3.3.2. Laboratory scale bench reactor
3.3.3. Media
3.3.4. Chromatographic analysis of phenol
3.4. Identification of Process Dynamics Using ANN and ARX
3.4.1. Artificial Neural Networks
3.4.2. Auto-Regressive models with eXogenous (ARX) inputs
3.5. Partial Least Squares (PLS) Regression
Tables
Figures
References

Chapter 4 - Process Monitoring & Fault Detection
4.1.
Wine Quality Monitoring
4.1.1. Wine data set
4.1.2. Development of Statistical Classifier
4.1.2.1. Identification of Classes Present in the Data Using PCA and K-means Clustering
4.1.2.2. PLS Based Classifier Development & its Performance
4.1.3. Development of Neural Classifier
4.1.3.1. Probabilistic Neural Network (PNN) Based Classifier Development & its Performance
4.1.3.2. ART1 Network Based Classifier Development & its Performance
4.2. Online Process Monitoring of Phenol Degradation
4.2.1. Monitoring of process parameters using univariate statistics
4.2.2. Monitoring of process parameters using multivariate statistics
Tables
Figures
References

Chapter 5 - Conclusion
5.1.
Conclusion
5.2. Future Recommendation

CHAPTER 1 - INTRODUCTION

1.1. INTRODUCTION

The present work addresses three different kinds of problems: process identification, product quality monitoring, and the detection of abnormal operating conditions in a process leading to process faults. Process identification helps in developing efficient monitoring and control systems for any process; it concerns detecting and understanding the dynamics present in a process from its historical data. Different machine learning algorithms can be effectively utilized for these purposes. The detection of a fault, followed by its diagnosis, is extremely important for the effective, economic, safe, and successful operation of a process. Efforts to manufacture a higher proportion of within-specification product and to reduce variability in product quality, i.e. to produce a more consistent product, have led to an increased use of Statistical Process Control (SPC). SPC refers to a collection of statistical techniques and charting methods that have been found useful in ensuring consistent production and, consequently, in obtaining significant advantages. However, most modern industrial processes have frequent on-line measurements available on many process variables and, in some instances, on several properties of the raw materials and final product. Furthermore, characteristics related to product quality are usually measured infrequently, off-line. Industrial quality problems are therefore multivariate, since they involve measurements on a number of characteristics rather than a single one.
As a result, univariate SPC methods provide little information about the interactions between characteristics and are therefore not appropriate for modern-day processes. Most of the limitations of univariate SPC can be addressed through Multivariate Statistical Process Control (MSPC), which considers all the characteristics of interest simultaneously and can extract information on the behavior of each characteristic relative to the others. Two applications of MSPC are considered here. The first is the detection and allotment of an end product to one of several predefined categories, which is called Statistical Quality Control. The second is on-line process monitoring to determine whether the process is in control, which is referred to as fault detection. A biodegradation process of the organic pollutant phenol is used for the identification and fault detection studies. Phenol, one of the major organic pollutants from the paper and pulp, pharmaceutical, iron-steel, coke-petroleum, and paint industries [1-4], is degraded by the heterotrophic bacterium Pseudomonas putida (ATCC: 11172). For the Statistical Quality Control study, the determination of wine quality was considered. Determining the quality of foodstuffs, water, and beverages, and verifying their correspondence to standards, is an urgent problem that needs to be addressed. Statistical Quality Control (SQC) was designed to sample a large population on an infrequent basis. Classical analytical techniques such as chromatography and spectrometry are used to determine the different characteristics of wine samples; however, they are time-consuming, expensive, and laborious, and can hardly be applied on-site or on-line. For quality control of perishable products, it is necessary to evaluate a group of components that reflects the ageing and spoilage of the product. These components can be numerous or unknown, and the problem appears to be quite difficult.
Besides, it is impractical and very hard to compare the results of instrumental analysis with biological sensing [5]. Chemometric techniques have been used in wine analysis by researchers such as Buratti et al., 2007; Parra et al., 2006; Buratti et al., 2004; Riul et al., 2004; Di Natale et al., 2004; Legin et al., 2003; and Di Natale et al., 2000 [6-12]. Adaptive Resonance Theory (ART1) networks, Probabilistic Neural Networks (PNN), and PLS-based classifiers can be immensely helpful for classification among wine samples. There is an inherent relation between the objectives of the present project and modeling, especially data-based modeling; a brief discussion of modeling, and hence of data-based modeling, forms an integral portion of this prologue.

1.2. MODELING

The essence of process modeling, in general, is to capture the important aspects of the physical reality while discarding irrelevant detail of the process. It is therefore often possible to devise several types of models of the same physical reality, and one can pick and choose among them depending on the desired model accuracy and on their capability for analyzing the process. An efficient and effective process model is required for the following purposes:

• Research & Development
• Planning and Scheduling
• Process Design
• Process Optimization
• Process Simulation
• Process Identification
• Process Control, Monitoring & Safety Measures
• Fault Detection & Diagnosis

Different models with different degrees of sophistication can be built. The degree of complexity chosen is a balance between accuracy and computational burden; overly sophisticated models are not always computationally affordable. The procedure for first-principles model building can be summarized in the following steps:
• Decision on the level of model complexity
• Writing the model equations
• Judicious model assumptions
• Devising a suitable mathematical structure and solution methodology
• Determination of model parameters
• Model verification
• Model validation & refinement
• Model prediction

It is important in this respect to recognize that most mathematical models are not completely based on a rigorous mathematical formulation of the physical and chemical processes taking place in the system. Every mathematical model contains a certain degree of empiricism, which limits the generality of the model; as our knowledge of the fundamentals of the process increases, the degree of empiricism decreases and the generality of the model increases. The existence of models at any stage, with their appropriate level of empiricism, greatly helps advance knowledge of the fundamentals and, therefore, helps decrease the degree of empiricism and increase the level of rigor in the mathematical models. Models always contain certain simplifying assumptions, which are believed not to affect the predictive nature of the model in any manner that undermines its purpose. There are also processes whose physics are poorly known. For example, one cannot really determine the kinetics of a biological degradation process, as it is highly difficult to recognize the rate-determining step among the numerous enzymatic reactions involved in the metabolic pathway. For modeling such processes, a different approach is followed, called 'Data-Based Modeling', 'Black-Box Modeling', or 'Empirical Modeling', in which the modeling is based only on empiricism. In this context, mathematical modeling can be divided into two categories.

Figure 1.1: Classification of Process Models

1.2.1. DATA-BASED MODELING

Data-based modeling is one of the most recently added crafts in process identification, monitoring, and control.
Deriving models from first principles for complex processes is difficult because of poor knowledge of the process kinetics, order, and parameters. Black-box models are data dependent, with model parameters determined from experimental results (wet labs); hence these models are called data-based or experimental models. Unlike white-box models derived from first principles, black-box/data-based (empirical) models do not describe the mechanistic phenomena of the process; they are based on input-output data and describe only the overall behavior of the process [13]. Data-based models are especially appropriate for problems that are data rich but hypothesis and/or information poor. In all cases, a sufficient number of quality data points is required to propose a good model; quality data means noise-free data, free of outliers, which is ensured by data mining and preconditioning. The phases of data-based modeling are:

• System analysis
• Data collection
• Data conditioning
• Key variable analysis
• Model structure design
• Model identification
• Model evaluation

Types of Data-Based Models: Data-based models can be divided into two major categories:

• Unsupervised models: models that try to extract the different features present in the data without any prior knowledge of the patterns in the data. Examples are Principal Component Analysis (PCA), hierarchical clustering techniques (dendrograms), and non-hierarchical clustering techniques (K-means).
• Supervised models: models that learn the patterns in the data under the guidance of a supervisor who trains them with inputs along with their corresponding outputs. Examples include Artificial Neural Networks (ANN), Partial Least Squares (PLS), and auto-regressive models.
In this era of data explosion, rational as well as potentially useful conclusions can be drawn from data with the help of data-based modeling techniques such as Partial Least Squares (PLS), neural networks, fuzzy, and neuro-fuzzy methods. Principal Component Analysis (PCA), Independent Component Analysis (ICA), canonical analysis, PLS, and clustering analysis, all of which are used for data-based modeling, are chemometric techniques; in this regard, we owe a profound debt to multivariate statistics. Efficient data mining, and hence efficient data-based modeling, will enable the coming era to exploit the huge databases available in new dimensions and perspectives, with never-expected possibilities. In the present work, MATLAB 7.6 and STATISTICA 9.0 were used to implement all the machine learning algorithms.

1.3. MOTIVATION

Considering the correlated, non-linear, non-stationary, and multi-scale behavior of chemical and biochemical processes, data-driven/chemometric techniques seem to be the logical choice for process identification and monitoring. Efficient data mining, and hence efficient data-based modeling, may enable the exploitation of the huge databases available in new dimensions and perspectives.

1.4. OBJECTIVES

The main objectives of the present project are as follows:

1. Process Identification: The organic pollutant phenol was degraded using the bacterium Pseudomonas putida (ATCC: 11172).
In a batch reactor, four parameters, namely temperature, pH, RPM, and phenol dosage, were varied systematically using an experimental design (Taguchi L16 method) to produce a set of useful data. Using this data set, phenol degradation as a rate process was identified (modeled) with the supervised techniques ANN and ARX. In an effort to develop an alternative rate model without the fundamental kinetic data of the process, the present work developed data-based rate models. The PLS technique was used to develop an empirical model relating the phenol degradation process to temperature, pH, RPM, and phenol loading at steady state.

2. Process Quality Monitoring: The development of a feature-based classifier can circumvent the problem of monitoring food quality without relating instrumental analysis to biological sensing (such as ageing and spoilage of the product), and it is one of the significant steps of on-line product quality monitoring. A wine data set containing 178 samples with 13 features each was taken as a case study. The unsupervised techniques PCA and K-means clustering were used to reduce the dimensionality and classify the samples into three groups. This was followed by the development of supervised classifiers using machine learning algorithms: Adaptive Resonance Theory (ART1) networks, Probabilistic Neural Networks (PNN), and Partial Least Squares (PLS).

3. Process Fault Detection: For the phenol degradation process, three experimental runs were used to produce time-series data for univariate and multivariate statistical monitoring. Different SPC charts and PCA were used to monitor the phenol degradation process and hence identify process faults, if any.

1.5.
ORGANIZATION OF THESIS

Chapter 1 presents an abridged introduction to the thesis, with an overview of the present state of the art, the objectives of the thesis, and its organization. Chapter 2 presents a detailed discussion of modeling, especially data-based modeling, with mention of PCA, PLS, clustering, ARX, and the different types of neural networks used in the subsequent chapters. This chapter also presents a concise discussion of SPC, with special mention of the different types of control charts used for process monitoring and hence fault detection. Chapter 3 presents the identification of the phenol degradation process; in the present work, the organic pollutant phenol was degraded by the bacterium Pseudomonas putida (ATCC: 11172), and four parameters, namely temperature, pH, RPM, and phenol dosage, were varied systematically using an experimental design (Taguchi method) to produce useful data. Chapter 4 is about process monitoring to ensure product/process quality: a wine data set was taken as a case study for product quality, and the chapter also covers the stringent maintenance of normal operating conditions of the phenol degradation process by detecting faults. On an ending note, Chapter 5 concludes the thesis with future recommendations.

REFERENCES:

1. Aksu, S., & Yener, J., 1998. Investigation of biosorption of phenol and monochlorinated phenols on the dried activated sludge. Process Biochem., 33, 649-655.
2. Patterson, J.N., 1997. Waste Water Treatment Technology. Ann Arbor Science, New York.
3. Bulbul, G., & Aksu, Z., 1997. Investigation of wastewater treatment containing phenol using free and Ca-alginated gel immobilized Pseudomonas putida in a batch stirred reactor. Turkish J. Eng. Environ. Sci., 21, 175-181.
4. Sung, R.H., Soydoa, V., & Hiroaki, O., 2000. Biodegradation by mixed microorganism of granular activated carbon loaded with a mixture of phenols. Biotechnol. Letters, 22, 1093-1096.
5.
Legin, A., Rudnitskaya, A., Vlasov, Y., Di Natale, C., Davide, F., & D'Amico, A., 1997. Tasting of beverages using an electronic tongue. Sensors and Actuators B, 44, 291-296.
6. Buratti, S., Ballabio, D., Benedetti, S., & Cosio, M.S., 2007. Prediction of Italian red wine sensorial descriptors from electronic nose, electronic tongue and spectrophotometric measurements by means of Genetic Algorithm regression models. Food Chemistry, 100, 211-218.
7. Parra, V., Arrieta, A.A., Fernández-Escudero, J.B., Rodríguez-Méndez, M.L., & De Saja, J.A., 2006. Electronic tongue based on chemically modified electrodes and voltammetry for the detection of adulterations in wines. Sensors and Actuators B, 118, 448-453.
8. Buratti, S., Benedetti, S., Scampicchio, M., & Pangerod, E.C., 2004. Characterization and classification of Italian Barbera wines by using an electronic nose and an amperometric electronic tongue. Analytica Chimica Acta, 525, 133-139.
9. Riul, A., de Sousa, H.C., Malmegrim, R.R., dos Santos, D.S., Carvalho, A.C.P.L.F., Fonseca, F.J., Oliveira, O.N., & Mattoso, L.H.C., 2004. Wine classification by taste sensors made from ultra-thin films and using neural networks. Sensors and Actuators B, 98, 77-82.
10. Di Natale, C., Paolesse, R., Burgio, M., Martinelli, E., Pennazza, G., & D'Amico, A., 2004. Application of metalloporphyrins-based gas and liquid sensor arrays to the analysis of red wine. Analytica Chimica Acta, 513, 49-56.
11. Legin, A., Rudnitskaya, A., Lvova, L., Vlasov, Y., Di Natale, C., & D'Amico, A., 2003. Evaluation of Italian wine by the electronic tongue: recognition, quantitative analysis and correlation with human sensory perception. Analytica Chimica Acta, 484, 33-44.
12. Di Natale, C., Paolesse, R., Macagnano, A., Mantini, A., D'Amico, A., Ubigli, M., Legin, A., Lvova, L., Rudnitskaya, A., & Vlasov, Y., 2000. Application of a combined artificial olfaction and taste system to the quantification of relevant compounds in red wine.
Sensors and Actuators B, 69, 342-347.
13. Roffel, B., & Betlem, B., 2006. Process Dynamics and Control: Modeling for Control and Prediction. John Wiley & Sons, West Sussex, England.

CHAPTER 2 - RELATED WORK AND COMPUTATIONAL TECHNIQUES

2.1. RELATED WORK

Over the last decade, the use of data-based modeling techniques has gained huge momentum in process identification, process fault diagnosis, and process control, an activity catalyzed by mature data collection technology. Chemical and biochemical processes are inherently non-linear and correlated in nature, and show non-stationary and multi-scale behavior; knowledge gathered from process data is thus a natural and logical choice for monitoring and controlling such processes. In this regard, some of the research efforts made by previous researchers deserve mention. Chen et al. (2002) integrated two data-driven techniques, neural networks (NN) and principal component analysis (PCA), into a method called NNPCA for process monitoring [1]. In this method, the NN summarizes the operating process information into a nonlinear dynamic mathematical model, and PCA generates simple monitoring charts based on the multivariable residuals derived from the difference between the process measurements and the neural network predictions. Examples from recent monitoring practice in industry and the large-scale Tennessee Eastman process problem were presented. Zhao et al. (2007) introduced a soft-transition multiple PCA (STMPCA) modeling method to avoid the misclassification problems associated with simple stage-based sub-PCA when monitoring batch processes [2]. The method was based on the idea that process transitions could be detected by analyzing changes in the loading matrices, which reveal the evolution of the underlying process behaviors.
By setting up a series of multiple PCA models with time-varying covariance structures, they reflected the diversity of transitional characteristics and could better solve the stage-transition monitoring problem in multistage batch processes. The superiority of the proposed method was illustrated by applying it to both a real three-tank system and a simulation of the benchmark fed-batch penicillin fermentation process, yielding more reliable monitoring charts. The results of both the real experiment and the simulation clearly demonstrated the effectiveness and feasibility of the proposed method. Gaetano et al. (2009) designed a novel supervised neural network-based algorithm to reliably distinguish electrocardiographic (ECG) records between normal and ischemic beats of the same patient [3]. The basic idea was to consider an ECG digital recording of two consecutive R-wave segments (RRR interval) as a noisy sample of an underlying function approximated by a fixed number of Radial Basis Functions (RBF). The linear expansion coefficients of the RRR interval represent the input signal of a feed-forward neural network which classifies a single beat as normal or ischemic. The developed system used several patient records taken from the European ST-T database. The obtained results showed that the proposed algorithm offered a good combination of sensitivity and specificity, making the design of a practical automatic ischemia detector feasible. Meleiro et al. (2009) employed a constructive learning algorithm to design a near-optimal one-hidden-layer neural network structure that approximated the dynamic behavior of a bioprocess [4]. The method determined not only a proper number of hidden neurons but also the particular shape of the activation function for each node.
Here, the projection pursuit technique was applied in association with the optimization of the solvability condition, giving rise to a more efficient and accurate computational learning algorithm. Each activation function of a hidden neuron was defined according to the peculiarities of each approximation problem, leading to parsimonious neural network architectures. The proposed constructive learning algorithm was successfully applied to identify a MIMO bioprocess, providing a multivariable model that was able to describe the complex process dynamics, even in long-range horizon predictions. The identified model was used as part of a model-based predictive control strategy, producing high-quality performance in closed-loop experiments. Sadrzadeh et al. (2009) used a simple two-layered feed-forward MLP neural network to predict the separation percent (SP) of lead ions from wastewater using electro-dialysis (ED) [5]. The aim was to predict the SP of Pb2+ as a function of concentration, temperature, flow rate and voltage. Once the optimum numbers of hidden layers and nodes in each layer were determined, the selected structure (4:6:2:1) was used for prediction of the SP of lead ions as well as the current efficiency (CE) of the ED cell for different inputs in the domain of the training data. They claimed that the ANN successfully tracked the non-linear behavior of SP and CE versus temperature, voltage, concentration and flow rate with a standard deviation of not more than 1%. Jyh-Cheng Jeng (2010) presented the use of both recursive PCA (RPCA) and moving-window PCA (MWPCA) for online updating of the PCA model and its corresponding control limits for monitoring statistics [6]. He derived an efficient algorithm based on a rank-one update of the covariance matrix, tailored for RPCA and MWPCA computations. He demonstrated the complete monitoring system through simulation examples, and the results showed the effectiveness of the proposed method. Marchitan et al.
(2010) compared two popular non-parametric modeling and optimization techniques, response surface methodology (RSM) and artificial neural networks (ANN), for the reactive extraction of tartaric acid from aqueous solution using Amberlite LA-2 (amine) [7]. The extraction efficiency was modeled and optimized as a function of three input variables, i.e. the tartaric acid concentration in the aqueous phase CAT (g/L), the pH of the aqueous solution and the amine concentration in the organic phase CA/O (% v/v). Both methodologies were compared for their modeling and optimization abilities. According to the analysis of variance (ANOVA), a coefficient of multiple determination of 0.841 was obtained for RSM and 0.974 for ANN. The optimal conditions offered by RSM and a genetic algorithm (GA) led to an experimental extraction efficiency of 83.06%. On the other hand, the optimal conditions offered by the ANN model coupled with GA led to an experimental reactive extraction efficiency of 96.08%. Bin-Shams et al. (2011) used a CUSUM-based statistical monitoring scheme to monitor a particular set of faults of the Tennessee Eastman Process (TEP) which had earlier been monitored using contribution plots [8]. Contribution plots were found to be inadequate when similar variable responses were associated with different faults. Abnormal situations from the process historical database were then used in combination with the proposed CUSUM-based PCA model to unambiguously characterize the different fault signatures. The use of a family of PCA models trained with CUSUM transformations of all the available measurements, collected during individual or simultaneous occurrence of the faults, was found effective in correctly diagnosing these faults. Pendashteh et al. (2011) employed a feed-forward neural network trained by a batch back-propagation algorithm to model a membrane sequencing batch reactor (MSBR) treating hypersaline oily wastewater [9].
The MSBR was operated at different total dissolved solids (TDS) levels (mg/L), various organic loading rates (OLRs) (kg COD/(m3 day)) and cycle times (h). A set of 193 operational data points from the wastewater treatment with the MSBR was used to train the network. The training, validation and testing procedures for the effluent COD, total organic carbon (TOC) and oil and grease (O&G) concentrations were successful, and a good correlation was observed between the measured and predicted values. The results of this study showed that ANN-GA could easily be applied to evaluate the performance of a membrane bioreactor even though it involves the highly complex physical and biochemical mechanisms associated with the membrane and the microorganisms. In view of this, the present work was taken up to identify the phenol degradation process and diagnose its faults with a view to monitoring it. ANN- and PLS-based classifiers were designed as an integral part of wine quality monitoring.

2.2. COMPUTATIONAL TECHNIQUES

2.2.1. CLUSTERING

Clustering techniques were adopted for the classification of wine samples in the present project. Before attempting classification or developing a classifier, one needs to identify the different classes present in the data accurately using historical data, thus ensuring the efficiency of the classifier developed. Clustering techniques come in handy for this purpose. Clustering is more primitive in that no a priori assumptions are made regarding the group structures. Grouping can be made on the basis of similarities or distances (dissimilarities). Hierarchical clustering techniques proceed either by a series of successive mergers or a series of successive divisions. Agglomerative hierarchical methods start with individual objects, so that initially there are as many clusters as objects. The most similar objects (those with the smallest inter-cluster distance) are grouped first, and these initial groups are merged according to their similarities.
Eventually, as the similarities decrease (distances increase), all subgroups are fused into a single cluster. The divisive hierarchical method works in the opposite direction. An initial single group of objects is divided into two subgroups such that the objects in one subgroup are far from those in the other. This subdivision continues until there are as many subgroups as objects; that is, until each object forms a group. The results of both methods may be displayed in the form of a two-dimensional diagram called a dendrogram. Inter-cluster distances are expressed by single linkage, complete linkage and average linkage. In the present work, the agglomerative hierarchical method was applied to group the different wine samples, not the variables. K-means clustering, a non-hierarchical, unsupervised method, was also applied in this work. The number of clusters K can be prespecified or can be determined iteratively as part of the clustering procedure. K-means clustering proceeds in three steps, which are as follows: 1. Partitioning of the items into K initial clusters. 2. Assigning each item to the cluster whose centroid is nearest (the distance is usually Euclidean), then recalculating the centroid for the cluster receiving the new item and for the cluster losing the item. 3. Repeating step 2 until no more reassignments take place and stable cluster tags are available for all the items. The basic algorithm of K-means is as follows:
• For a given assignment C, computation of the cluster mean m_k:
m_k = (1/N_k) Σ_{i : C(i) = k} x_i ,  k = 1, ..., K  (2.1)
• For the current set of cluster means, assigning each observation as:
C(i) = arg min_{1≤k≤K} ||x_i − m_k||² ,  i = 1, ..., N  (2.2)
• Iteration of the above two steps until convergence.
K-means clustering has the specific advantage of not requiring the distance matrix needed in hierarchical clustering, and hence ensures faster computation than the latter. The K-means algorithm has been applied to many engineering problems [10-14]. 2.2.2.
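The two alternating K-means updates of Eqs. (2.1)-(2.2) can be sketched in a few lines of Python with numpy. This is a minimal illustration only; the toy data, the random initialization and the iteration cap are assumptions of the example, not part of the thesis implementation.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: alternate the two update steps of Eqs. (2.1)-(2.2)."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct observations as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Eq. (2.2): assign each point to its nearest centroid (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Eq. (2.1): recompute each cluster mean m_k.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroids (and hence the assignments) no longer change.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy data: two well-separated blobs that K-means should recover.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(X, k=2)
```

Note that, as described above, no distance matrix between all pairs of items is ever formed; only point-to-centroid distances are needed.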
PRINCIPAL COMPONENT ANALYSIS

The two major uses of principal component analysis are pattern recognition and dimensionality reduction. Both these features of PCA were explored in wine classification, and the pattern recognition capacity was explored in the monitoring of the phenol degradation process. PCA is a multivariate statistical technique widely used to analyze data from process plants. PCA offers a new set of uncorrelated variables that are linear combinations of the original variables. The new variables capture the maximum variance in the original data set in descending order. The new variables are called the 'principal components', and they are estimated from the eigenvectors of the covariance or correlation matrix of the original variables. PCA was originally developed by Pearson (1901). The PCA model can be used to detect outliers in data, for data reconciliation, and to detect deviations from normal operating conditions that indicate excessive variation from the normal target or unusual patterns of variation. Operation under various known upsets can also be modeled if sufficient historical data are available, to develop automated diagnosis of the source causes of abnormal process behavior [15]. Depending on the field of application, it is also named the discrete Karhunen-Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD). The applicability of PCA is based on certain assumptions, which are as follows: • the assumption of linearity; • the assumption of the statistical importance of the mean and covariance; • the assumption that large variances have important dynamics. A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are, first, data reduction and, second, interpretation. Generally, this is a mathematical transform used to find correlations and explain variance in a data set.
The goal is to map a raw data vector x onto a lower-dimensional score vector z, where the vector x can be represented as a linear combination of a set of m orthonormal basis vectors u_i:
x = Σ_{i=1}^{m} z_i u_i  (2.3)
where the coefficients can be found from z_i = u_i^T x. This corresponds to a rotation of the coordinate system from the original coordinates to the new set of coordinates given by z. To reduce the dimensionality of the data set, only a subset (k < m) of the basis vectors is preserved. The remaining coefficients are replaced by constants b_i, and each vector x is then approximated as
x̂ = Σ_{i=1}^{k} z_i u_i + Σ_{i=k+1}^{m} b_i u_i  (2.4)
The basis vectors are called principal components and are equal to the eigenvectors of the covariance matrix of the data set. The coefficients and the principal components should be chosen such that the best approximation of the original vector is obtained on average. However, the reduction of dimensionality from m to k causes an approximation error. The sum of squared errors over the whole data set is minimized if we select the basis vectors that correspond to the largest eigenvalues of the covariance matrix. As a result of the PCA transformation, the original data set is represented in fewer dimensions (typically 2-3) and the measurements can be plotted in the same coordinate system. This plot shows the relation between different observations or experiments. Grouping of data points in this plot suggests some common properties, which can be used for classification. We had a number of wine samples with different features/properties. Part of the available measurements can be used as a training set to define the classes, while the rest can be kept out for validation purposes. Assuming n measurements are used for training and p for validation, the training data are organized in a single n × m matrix X (2.5), where each row in X represents one measurement and the number of columns m is equal to the length of the measurement sequence, i.e. the number of features.
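The construction around Eqs. (2.3)-(2.5) can be sketched concretely: mean-center the training matrix, take the eigenvectors of its covariance matrix with the largest eigenvalues, and project onto them. The snippet below is a minimal numpy illustration with made-up toy data, not the thesis computation.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the n x m data matrix X onto its first k principal components:
    the eigenvectors of the covariance matrix with the largest eigenvalues
    define the reduced orthonormal basis U."""
    Xc = X - X.mean(axis=0)               # mean-center each feature
    C = np.cov(Xc, rowvar=False)          # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: eigendecomposition for symmetric C
    order = np.argsort(eigvals)[::-1]     # sort by decreasing variance captured
    U = eigvecs[:, order[:k]]             # first k principal directions
    D = Xc @ U                            # n x k score matrix
    return D, U, eigvals[order]

# Toy data: the third feature nearly duplicates the first, so two
# principal components capture almost all of the variance.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + 1e-3 * rng.normal(size=100)])
D, U, lam = pca_reduce(X, k=2)
explained = lam[:2].sum() / lam.sum()     # fraction of variance retained
```

The retained basis U is exactly the matrix used later for projecting new (validation or online) measurements into the score space.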
Following the steps described above, the covariance matrix C and its eigenvalues λ were calculated. Its eigenvectors form an orthonormal basis U; that is, U^T U = I. The original data set can be represented in the new basis using the relation Z = XU. After this transformation, a new data matrix of reduced dimension can be constructed with the help of the eigenvalues of the matrix C. This is done by selecting the highest λ values, since they correspond to the principal components with the highest significance. The number of PCs to be included should be high enough to ensure good separation between the classes. Principal components with low contribution (low values of λ) should be neglected. Let the first k PCs be selected as new features, neglecting the remaining (m − k) principal components. In this way, a new data matrix D of dimension n × k is obtained (2.6). With the matrix D defined, the next step is directed towards the classification of substances. The matrix U is used during validation and also plays a key role in the online implementation of the classification algorithm. The PCA score data are grouped into a number of classes following the rule of the nearest-neighborhood clustering algorithm. The reduced data matrix above is utilized for the construction of class prototypes. Let C_1, ..., C_l denote the l pattern classes found in the n measurements, each represented by a single prototype vector p_j. The maximum value of l can reach up to n. The class centroids p_j have k latent features, each of which represents a unique feature in the reduced-dimension space. The distance between an incoming pattern x and the prototype vectors is D_j = ||x − p_j||, j = 1, ..., l. The minimum distance classifies x into the class C_j for which D_j is minimum:
D_j = min_{1≤i≤l} ||x − p_i||  (2.7)
For an online system, it may then be inferred that the incoming pattern of unknown type is similar to one of the l classes of known types. 2.2.3.
PARTIAL LEAST SQUARES MODEL

The data of the phenol degradation process were used for the identification of this process with the help of the Partial Least Squares (PLS) technique. Partial least squares is one of the important multivariate statistical techniques used to reduce the dimensionality of plant data and to find the latent variables of the plant data by capturing the largest variance in the data; it achieves the maximum correlation between the predictor (X) variables and response (Y) variables. It was first proposed by Wold (1966) [16]. PLS has been successfully applied in diverse fields including process modeling, identification of process dynamics and fault detection, and process monitoring, and it deals with noisy and highly correlated data, quite often with only a limited number of observations available. A tutorial description along with some examples of the PLS model was provided by Geladi and Kowalski (1986) [17]. When dealing with nonlinear systems, the underlying nonlinear relationship between the predictor variables (X) and response variables (Y) can be approximated by quadratic PLS (QPLS) or splines. Sometimes these may not function well when the nonlinearities cannot be described by a quadratic relationship. Qin and McAvoy (1992) [18] suggested a new approach replacing the inner model by a neural network model, followed by focused R&D activities taken up by several other researchers such as Holcomb and Morari (1992), Malthouse et al. (1997), Zhao et al. (2006) and Lee et al. (2006) [19-22]. The mathematical formulation of static PLS is as follows. Input-output data were generated by exciting the processes with pseudo-random binary signals (PRBS). The X and Y matrices are scaled in the following way before they are processed by the PLS algorithm:
X̄ = X S_X^{-1} and Ȳ = Y S_Y^{-1}  (2.8)
where S_X = diag(s_x1, s_x2, ...) and S_Y = diag(s_y1, s_y2, ...) are the scaling matrices. The idea of PLS is to develop a model relating the scores of the X and Y data.
The PLS model consists of outer relations (the X and Y data being related to their scores individually) and an inner relation that links the X-data scores to the Y-data scores. The outer relations for the input and output matrices can be written as
X = t_1 p_1^T + t_2 p_2^T + ... + t_n p_n^T + E = T P^T + E  (2.9)
Y = u_1 q_1^T + u_2 q_2^T + ... + u_n q_n^T + F = U Q^T + F  (2.10)
where T and U represent the score matrices of X and Y, while P and Q represent the loading matrices of X and Y. If all the components of X and Y are described, the errors E and F become zero. The inner model that relates X to Y is the relation between the scores T and U:
U = T B  (2.11)
where B is the regression matrix. The response Y can now be expressed as:
Y = T B Q^T + F  (2.12)
To determine the dominant directions of projection of the X and Y data, the maximization of the covariance between X and Y is used as a criterion. The first pair of weight vectors w_1 and q_1 represents the dominant direction obtained by maximization of the covariance between X and Y. Projection of the X data onto w_1 and of the Y data onto q_1 results in the first pair of score vectors t_1 and u_1, hence the establishment of the outer relation. The matrices X and Y can now be related through their respective scores, which is called the inner model, representing a linear regression between t_1 and u_1: u_1 = b_1 t_1. The calculation of the first two dimensions is shown in Fig. 2.1. The residuals calculated at this stage are given by the following equations:
E_1 = X − t_1 p_1^T  (2.13)
F_1 = Y − u_1 q_1^T = Y − t_1 b_1 q_1^T  (2.14)
The procedure for determining the score and loading vectors is continued using the newly computed residuals until they are small enough or the required number of PLS dimensions is reached. In practice, the number of PLS dimensions is determined by the percentage of variance explained and by cross-validation. The irrelevant directions originating from noise and redundancy are left in E and F.
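The outer relations, the inner relation and the deflation steps of Eqs. (2.9)-(2.14) can be sketched as a short NIPALS-style loop. This is a didactic illustration under assumed toy data, not the thesis implementation; a fixed inner-iteration count replaces a proper convergence test for brevity.

```python
import numpy as np

def pls_nipals(X, Y, n_components):
    """Didactic NIPALS-style PLS sketch of the outer/inner relations."""
    E, F = X.copy(), Y.copy()
    T, P, Q, B = [], [], [], []
    for _ in range(n_components):
        u = F[:, :1]                      # start u from a Y-residual column
        for _ in range(50):               # inner iteration (fixed count here)
            w = E.T @ u                   # X-weight: dominant covariance direction
            w /= np.linalg.norm(w)
            t = E @ w                     # X-score
            q = F.T @ t                   # Y-weight/loading
            q /= np.linalg.norm(q)
            u = F @ q                     # Y-score
        p = E.T @ t / (t.T @ t)           # X-loading
        b = (u.T @ t / (t.T @ t)).item()  # inner relation u = b t
        E = E - t @ p.T                   # X-residual, as in Eq. (2.13)
        F = F - b * (t @ q.T)             # Y-residual, as in Eq. (2.14)
        T.append(t); P.append(p); Q.append(q); B.append(b)
    return np.hstack(T), np.hstack(P), np.hstack(Q), np.array(B)

# Toy example: Y is a noisy linear function of the first two X columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X @ np.array([[1.0], [2.0], [0.0]]) + 0.01 * rng.normal(size=(50, 1))
T, P, Q, B = pls_nipals(X, Y, n_components=2)
Y_hat = T @ np.diag(B) @ Q.T              # prediction, as in Eq. (2.12)
```

Each pass of the outer loop extracts one PLS dimension and deflates the residual matrices, exactly the sequential procedure described above.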
With the application of PLS, the multivariate regression problem is decomposed into several univariate regression problems.

Figure 2.1: Standard linear PLS algorithm.

2.2.4. STATISTICAL PROCESS MONITORING (SPM) CHARTS

The goal of statistical process monitoring (SPM) is to detect the existence, magnitude, and time of occurrence of changes that cause a process to deviate from its desired operation. The methodology for detecting changes is based on statistical techniques that deal with the collection, classification, analysis, and interpretation of data. Traditional statistical process control (SPC) has focused on monitoring quality variables at the end of a batch and, if the quality variables are outside the range of their specifications, making adjustments (hence controlling the process) in subsequent batches. An improvement of this approach is to monitor quality variables during the progress of the batch and make adjustments if they deviate from their expected ranges. Monitoring quality variables usually delays the detection of abnormal process operation because the appearance of a defect in a quality variable takes time. Information about quality variations is encoded in the process variables. The measurement of process variables is often highly automated and more frequent, enabling speedy refinement of measurement information and inference about product quality. Monitoring of process variables is useful not only for assessing the status of the process, but also for controlling the product quality. When process monitoring indicates abnormal process operation, diagnosis operations are initiated to determine the source cause of this abnormal behaviour.
In this framework, each quality variable is treated as a single independent variable. The abnormal operating conditions in the phenol degradation process were detected using traditional univariate statistical control charts such as X̄ charts, R charts, moving-range charts and CUSUM charts. The theoretical postulations of the control charts are as follows. Traditional statistical monitoring techniques for the quality control of batch products relied on the use of univariate SPC tools on product quality variables. Before going into any further details of the control charts, one should have a brief idea about statistical hypothesis testing. A statistical hypothesis is an assumption or a guess about the population expressed as a statement about the parameters of the probability distributions of the populations. Procedures that enable deciding whether to accept or reject a hypothesis are called tests of hypotheses. For example, if the equality of the mean of a variable (µ) to a value 'a' is to be tested, the hypotheses are: Null hypothesis: H0 : µ = a. Alternate hypothesis: H1 : µ ≠ a. Two kinds of errors may occur while testing the hypothesis: Type I error (α): the probability of rejecting H0 when H0 is true; Type II error (β): the probability of accepting H0 when H0 is false. First, α is selected to compute the confidence limit for testing the hypothesis; then a test procedure is designed to obtain a small value for β, if possible. β is a function of the sample size and is reduced as the sample size increases. Three parameters affect the control limit selection: • the estimate of the average level of the variable, • the variable spread expressed as a range or standard deviation, • a constant based on the probability of a Type I error, α. The "3σ" (σ denoting the standard deviation of the variable) control limits are the most popular control limits. The constant 3 yields a Type I error probability of 0.00135 on each side (α = 0.0027).
The control limits expressed as a function of the population standard deviation σ are:
UCL = Target + 3σ,  LCL = Target − 3σ  (2.15)

2.2.4.1. RANGE CHARTS

Development of the X̄ chart starts with the R chart. Since the control limits of the X̄ chart depend on the process variability, its limits are not meaningful before R is in control. The range is the difference between the maximum and minimum observations in a sample:
R_i = x_max,i − x_min,i ,  R̄ = (1/m) Σ_{i=1}^{m} R_i  (2.16)
The random variable R/σ is called the relative range. The parameters of its distribution depend on the sample size n, with its mean being d_2. An estimate of σ (estimates are denoted by σ̂) can be computed from the range data by using
σ̂ = R̄/d_2  (2.17)
where d_2 is called Hartley's constant. The standard deviation of R is estimated by using d_3, the standard deviation of R/σ:
σ̂_R = d_3 σ̂ = d_3 R̄/d_2  (2.18)
The control limits of the R chart are:
UCL, LCL = R̄ ± 3 d_3 R̄/d_2  (2.19)
Defining
D_3 = 1 − 3 d_3/d_2 and D_4 = 1 + 3 d_3/d_2  (2.20)
the control limits become
UCL = D_4 R̄ and LCL = D_3 R̄  (2.21)

2.2.4.2. X-BAR CHARTS

One or more observations may be made at each sampling instant. The collection of all observations at a specific sampling time is called a sample. The sample means and the grand mean are
X̄_i = (1/n) Σ_{j=1}^{n} x_ij ,  X̿ = (1/mn) Σ_{i=1}^{m} Σ_{j=1}^{n} x_ij
where m is the number of samples and n is the number of observations in a sample (the sample size). The estimator for the mean process level (centerline) is X̿. Since the estimate of the standard deviation of the mean process level is σ̂/√n = R̄/(d_2 √n), the control limits are
UCL, LCL = X̿ ± A_2 R̄  (2.22)
where
A_2 = 3/(d_2 √n)  (2.23)
n is the number of readings and d_2 is Hartley's constant. A typical X-bar and R chart is shown in Fig. 2.2.

Figure 2.2: A typical X-bar and R chart.

2.2.4.3. CUSUM CHARTS

The cumulative sum (CUSUM) chart incorporates all the information in a data sequence to highlight changes in the process average level.
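The R-chart and X-bar-chart limits above reduce to a few lines of arithmetic once the tabulated constants for the chosen subgroup size are fixed. The snippet below is an illustrative sketch on simulated in-control data; the subgroup size, data and constants are assumptions of the example (the constants are standard SPC table values for n = 5).

```python
import numpy as np

# Tabulated SPC constants for subgroup size n = 5 (standard table values):
# d2 = mean of the relative range R/sigma, D3/D4 = R-chart factors,
# A2 = 3/(d2*sqrt(n)) = X-bar chart factor.
n, d2, D3, D4, A2 = 5, 2.326, 0.0, 2.114, 0.577

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=1.0, size=(25, n))  # 25 in-control samples

xbar = data.mean(axis=1)                   # sample means
R = data.max(axis=1) - data.min(axis=1)    # sample ranges
xbarbar, Rbar = xbar.mean(), R.mean()      # grand mean and mean range

sigma_hat = Rbar / d2                      # estimate of sigma from ranges
ucl_R, lcl_R = D4 * Rbar, D3 * Rbar        # R-chart control limits
ucl_x = xbarbar + A2 * Rbar                # X-bar chart control limits
lcl_x = xbarbar - A2 * Rbar
out_of_control = (xbar > ucl_x) | (xbar < lcl_x)
```

With in-control data, only rare false alarms (probability α = 0.0027 per sample) are expected to fall outside the limits.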
The values to be plotted on the chart are computed by subtracting the overall mean µ0 from the data and then accumulating the differences. For a sample size n ≥ 1, denote the average of the jth sample by x̄_j. The quantity
S_i = Σ_{j=1}^{i} (x̄_j − µ0)  (2.24)
is plotted against the sample number i. CUSUM charts are very effective in detecting small process shifts, since they combine information from several samples. CUSUM charts are effective even with samples of size 1. The CUSUM values can be computed recursively:
S_i = (x̄_i − µ0) + S_{i−1}  (2.25)
If the process is in control at the target value µ0, the CUSUM S_i should meander randomly in the vicinity of 0. If the process mean is shifted, an upward or downward trend will develop in the plot. Visual inspection of changes of slope indicates the sample number (and consequently the time) of the process shift. Even when the mean is on target, however, the CUSUM S_i may wander far from the zero line and give the appearance of a signal of change in the mean. Control limits in the form of a V-mask were employed when CUSUM charts were first proposed, in order to decide that a statistically significant change in slope had occurred and that the trend of the CUSUM plot differed from that of a random walk. Computer-generated CUSUM plots have become more popular in recent years, and the V-mask has been replaced by the upper and lower confidence limits of one-sided CUSUM charts. One-sided CUSUM charts are developed by plotting
S_i = Σ_{j=1}^{i} [x̄_j − (µ0 + K)]  (2.26)
where K is the reference value to detect an increase in the mean level. If S_i becomes negative for µ1 > µ0, it is reset to zero. When S_i exceeds the decision interval H, a statistically significant increase in the mean level is declared.
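A standard one-sided upper CUSUM with reference value K and decision interval H can be sketched as below. The target, shift size and the choice H = 5σ are assumptions of this toy example, not values from the thesis.

```python
import numpy as np

mu0, sigma = 10.0, 1.0
K = 0.5 * sigma    # reference value, K = Delta/2 for a target shift Delta = 1 sigma
H = 5.0 * sigma    # decision interval (a common textbook choice, assumed here)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(mu0, sigma, 50),          # in-control period
                    rng.normal(mu0 + 1.0, sigma, 50)])   # mean shifts up by 1 sigma

S, alarm_at = 0.0, None
for i, xi in enumerate(x):
    # Upper one-sided CUSUM: accumulate deviations above mu0 + K,
    # resetting to zero whenever the sum would go negative.
    S = max(0.0, S + (xi - (mu0 + K)))
    if alarm_at is None and S > H:
        alarm_at = i   # first sample at which a significant increase is declared
```

After the shift, the CUSUM drifts upward by about K per sample on average, so the alarm fires a short while after the change point, illustrating the small-shift sensitivity noted above.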
Values for K and H can be computed from the relations:
K = ∆/2 and H = d∆/2  (2.27)
Given the probabilities of Type I (α) and Type II (β) errors, the size of the shift in the mean to be detected (∆), and the standard deviation of the average value of the variable x (σ_x̄), the parameter d in the above equation is
d = (2/δ²) ln((1 − β)/α), where δ = ∆/σ_x̄  (2.28)

2.2.4.4. MOVING RANGE CHARTS

In a moving-range chart, the range of consecutive sample groups of size a is computed and plotted. For a ≥ 2:
MR_t = max(x_i) − min(x_i)  (2.29)
where i runs over the subgroup containing samples t − a + 1 to t. The computation procedure is as follows: • Selecting the moving-range size a; often a = 2. • Obtaining the estimates of M̄R and σ̂ = M̄R/d_2 by using the moving ranges MR_t of length a. For a total of m samples:
M̄R = (1/(m − a + 1)) Σ_{t=1}^{m−a+1} MR_t  (2.30)
• Computing the control limits with the center line at M̄R:
LCL = D_3 M̄R, UCL = D_4 M̄R  (2.31)

2.2.5. ARTIFICIAL NEURAL NETWORKS

Artificial Neural Networks (ANNs) are widely applied nowadays for classification, identification, control, diagnostics, recognition, etc. An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. Basically, a neural network (NN) is composed of a set of nodes (Fig. 2.3). Each node is connected to the others via a set of links. Information is transmitted from the input to the output cells depending on the strength of the links. Usually, neural networks operate in two phases. The first phase is a learning phase, where each of the nodes and links adjusts its strength in order to match the desired output. A learning algorithm is in charge of this process. When the learning phase is complete, the NN is ready to recognize the incoming information and to work as a pattern recognition system.
Figure 2.3: General neural network architecture.

2.2.5.1. NEURAL NETWORKS AS CLASSIFIERS

Neural networks, either supervised or unsupervised, have emerged as an important tool for classification. The vast recent research activity in neural classification has established that neural networks are a promising alternative to various conventional classification methods. The advantage of neural networks lies in the following theoretical aspects. First, neural networks are data-driven, self-adaptive methods in that they can adjust themselves to the data without any explicit specification of a functional or distributional form for the underlying model. Second, they are universal functional approximators in that neural networks can approximate any function with arbitrary accuracy [23-25]. Since any classification procedure seeks a functional relationship between the group membership and the attributes of the object, accurate identification of this underlying function is doubtlessly important. Third, neural networks are nonlinear models, which makes them flexible in modeling real-world complex relationships. Finally, neural networks are able to estimate posterior probabilities, which provide the basis for establishing classification rules and performing statistical analysis [26]. In the present work, two network architectures were used for the classification purpose, namely the Probabilistic Neural Network (PNN) and the Adaptive Resonance Theory network (ART1).

2.2.5.1.1. PROBABILISTIC NEURAL NETWORKS (PNN)

The probabilistic neural network (PNN) was introduced by Donald Specht (1990) [27] and can be used for classification problems. A multilayer feed-forward network can approximate nonlinear functions when the network structure is sufficiently large. Any continuous function can be approximated by carefully choosing the parameters in the network.
The determination of these highly nonlinear parameters requires learning based on non-traditional optimization techniques. A viable alternative is the radial basis function neural network. An RBF network uses radial basis functions as activation functions. Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a nonlinear RBF activation function and a linear output layer, as shown in Fig. 2.4. In the basic form, all inputs are connected to each hidden neuron. The input vector X = [x_1, x_2, x_3, ..., x_n] is applied to the neurons in the hidden layer. Each hidden-layer neuron computes the following exponential function, called a Gaussian response function:
h_i = exp(−D_i²/(2σ²)), where X is an input  (2.32)
D_i² = (x − u_i)^T (x − u_i), the squared distance between the input vector and the training vector  (2.33)
and u_i is the weight vector of hidden-layer neuron i. The weights of each hidden-layer neuron are assigned the values of an input training vector. The output neuron produces the linear weighted summation of these:
y = Σ_i h_i w_i, where w_i is a weight in the output layer  (2.34)
Sometimes the outputs are optionally normalized by dividing the output of each neuron in the output layer by the sum of all hidden-layer outputs, i.e. out_i = Σ_i h_i w_i / Σ_i h_i. Thus the output has a significant response to the input only over a range of input values called the receptive field of the neuron, the size of which is determined by the value of the spread parameter σ, as shown in Fig. 2.4. The PNN has its theoretical foundation in Bayesian classifier theory, being basically a radial basis neural network (RBNN) containing an extra competitive layer in addition to the radial basis layer. When an input is presented, the first layer computes the distances from the input vector to the training input vectors and produces a vector whose elements indicate how close the input is to each training input.
The second layer sums these contributions for each class of inputs to produce as its net output a vector of probabilities. Finally, a competitive transfer function on the output of the second layer picks the maximum of these probabilities, and produces a '1' for that class and a '0' for the other classes. The basic architecture of the PNN is presented in Fig. 2.5. The basis of the PNN is Bayes' theorem, which states:

P(y_i | x) = P(x | y_i) P(y_i) / P(x)                                      (2.35)

Figure 2.4: Radial basis network architecture, with a one-dimensional basis function h(x)

Figure 2.5: Basic architecture of the probabilistic neural network (PNN)

Let y denote the membership variable that takes the value y_i if the object belongs to group i. P(y_i) is the prior probability of class i, P(x | y_i) is the class-conditional probability density function, P(y_i | x) is the posterior probability of group i, and P(x) is the overall probability density function:

P(x) = sum_i P(x | y_i) P(y_i)                                             (2.36)

In the PNN, the probability density function is approximated by Parzen windows, typically using the exponential function:

P(x | y_i) = (K / N_i) sum_{j=1}^{N_i} exp(-D_ij^2 / (2 sigma^2)) = (K / N_i) sum_{j=1}^{N_i} h_j^i                                                      (2.37)

where K = 1 / ((2 pi)^{d/2} sigma^d) is the scaling factor that produces a multidimensional unit Gaussian, and D_ij^2 = (x - u_j^i)^T (x - u_j^i) is the squared Euclidean distance between the current input x and training vector j of class i. Applying Bayes' theorem to calculate the conditional probability of x for each class and summing over all classes yields

P(x) = sum_i P(x | y_i) P(y_i) = sum_i (N_i / N)(K / N_i) sum_{j=1}^{N_i} exp(-D_ij^2 / (2 sigma^2)) = (K / N) sum_i sum_j h_j^i                        (2.38)

where N_i is the number of training vectors in class i and N is the total number of training vectors, so that the prior P(y_i) is estimated as N_i/N. Thus the inner summation adds all hidden layer neuron outputs associated with class i, and the outer summation runs over all classes. The double summation may be eliminated by simply summing all hidden neuron outputs.
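The PNN computation just described, Gaussian Parzen-window sums per class followed by a maximum-posterior decision, can be sketched as follows. This is a minimal illustration on made-up data; the value of `sigma` and the sample vectors are assumptions, not values from this work, and equal class priors are assumed.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Classify x by summing a Gaussian kernel over each class (Parzen windows)."""
    classes = np.unique(train_y)
    scores = []
    for c in classes:
        Xc = train_X[train_y == c]              # pattern-layer vectors of class c
        d2 = np.sum((Xc - x) ** 2, axis=1)      # squared Euclidean distances D_ij^2
        # (1/N_i) * sum_j h_j; equal priors assumed, otherwise weight by N_i/N
        scores.append(np.mean(np.exp(-d2 / (2 * sigma ** 2))))
    return classes[int(np.argmax(scores))]      # competitive layer: pick the maximum

# toy two-class example
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(pnn_classify(np.array([0.05, 0.1]), train_X, train_y))   # nearest to class 0
print(pnn_classify(np.array([0.95, 1.0]), train_X, train_y))   # nearest to class 1
```

Note that the pattern layer stores every training vector verbatim, so no iterative training is needed; only the spread parameter sigma has to be chosen.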
If we have an object with a particular feature vector x and a decision is to be made about its group membership, the probability of classification error when class y_i is decided is

P(error | x) = sum_{j != i} P(y_j | x) = 1 - P(y_i | x)                    (2.39)

Hence, if the purpose is to minimize the probability of total classification error, the widely used Bayesian classification rule follows:

Decide y_i for x if P(y_i | x) = max_{j=1,...,N} P(y_j | x)                (2.40)

2.2.5.1.2. ART1 NETWORK

The stability-plasticity dilemma remained unresolved for many conventional artificial neural networks. The ability of a net to learn new patterns equally well at any stage of learning, without washing away previously learnt patterns, is called its plasticity. A stable net does not return any pattern to a previous cluster. Some nets achieve stability by gradually reducing their learning rates, provided the same training set is presented many times; such conventional nets cannot learn a pattern presented to them for the first time. A real network is constantly exposed to changing patterns and may never see the same training vector twice. Under such circumstances, back-propagation networks can learn nothing: they continuously modify their weights without ever reaching a stationary setting. Adaptive resonance theory networks (ART1) are designed to be both plastic and stable. The first version of the ART network was ART1, developed by Carpenter and Grossberg (1988). The ART1 network is a vector classifier. It accepts an input vector and classifies it into one of the existing categories depending upon which stored pattern it resembles within a specified tolerance; otherwise a new category is created by storing that input vector as a new pattern. No stored pattern is modified if it does not match the current input vector within the specified tolerance; hence the stability-plasticity dilemma is solved. ART1 is designed for classifying binary vectors.
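The ART1 behaviour just described, in which an input either resonates with the stored category it sufficiently resembles or founds a new one, can be sketched as follows. This is a minimal sketch assuming the standard fast-learning ART1 updates; the vigilance value, the parameter L and the toy patterns are illustrative assumptions.

```python
import numpy as np

def art1_train(patterns, rho=0.7, L=2.0):
    """Cluster binary row vectors with fast-learning ART1.

    Returns the stored top-down prototypes and the cluster assigned to each pattern."""
    top_down = []                                  # t_J: one binary prototype per cluster
    assignments = []
    for s in patterns:
        s = s.astype(float)
        norm_s = s.sum()
        inhibited, winner = set(), None
        while winner is None:
            # bottom-up choice; under fast learning b_iJ = L*t_i/(L - 1 + ||t||)
            best, best_val = None, -1.0
            for j, t in enumerate(top_down):
                if j in inhibited:
                    continue
                val = (L * t / (L - 1.0 + t.sum())) @ s
                if val > best_val:
                    best, best_val = j, val
            if best is None:                       # no eligible cluster: create a new one
                top_down.append(s.copy())
                winner = len(top_down) - 1
                break
            x = top_down[best] * s                 # F1(b) activation: componentwise AND
            if x.sum() / norm_s >= rho:            # vigilance test passed: resonance
                top_down[best] = x                 # fast learning: t_J <- x
                winner = best
            else:
                inhibited.add(best)                # reset: inhibit winner, search again
        assignments.append(winner)
    return np.array(top_down), assignments

data = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0],
                 [0, 0, 1, 1]])
protos, labels = art1_train(data, rho=0.7)         # three distinct clusters at rho = 0.7
```

Raising rho makes the match test stricter and hence produces more, tighter clusters; lowering it merges patterns into fewer categories.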
The classification process in ART involves three phases: recognition, comparison and search. During learning, one input vector is presented to the network at a time. The degree of similarity required is controlled by the vigilance parameter rho (0 < rho <= 1).

2.2.5.1.3. BASIC ARCHITECTURE

The ART1 network consists of three major components comprising groups of neurons (Figs. 2.6 and 2.7):

• Input processing field - F1 layer
• Cluster units - F2 layer
• Reset mechanism

Figure 2.6: Architecture of ART1

The input processing layer is divided into two parts:

• Input portion - F1(a): represents the given input vector
• Interface portion - F1(b): exchanges the input portion signal with the F2 layer

Figure 2.7: Structure of supplemental units

Cluster units - F2 layer: This is a competitive layer. The cluster unit with the largest net input is selected to learn the input pattern; the activation of all other F2 units is set to zero. F1(b) is connected to the F2 layer through bottom-up weights b_ij, and the F2 layer is connected back to the F1(b) layer by top-down weights t_ji.

Reset mechanism: Depending on the similarity between the top-down weight vector and the input vector, the cluster unit is or is not allowed to learn the pattern. This decision is made at the reset unit, based on the signals it receives from the input and interface portions of the F1 layer. If the cluster unit is not allowed to learn, it becomes inhibited and a new cluster unit is selected for learning. The reset mechanism dictates the three possible states of F2 layer neurons, namely active, inactive and inhibited. The difference between inactive and inhibited is that, although the activation of the F2 unit is zero in both cases, an inactive F2 neuron remains available for competition during the presentation of the current input vector, which is not possible when the F2 neuron is inhibited.

2.2.5.1.4. ALGORITHM

The binary input vector is presented to the F1(a) layer and is then passed on to the F1(b) layer.
The F1(b) layer sends a signal to the F2 layer over weighted interconnection paths (bottom-up weights). Each F2 unit calculates its net input. The node with the largest net input is the winner and its activation is set to 1; all other nodes in the F2 layer have activation 0 but are not inhibited, and reset is true. The winning F2 node alone is eligible to learn the input pattern. A signal is then sent from the F2 layer back to F1(b) through the weighted interconnections (top-down weights). The units of the activation vector x of the F1(b) layer are active if they receive non-zero signals from both the F1(a) and F2 layers. The norm ||x|| therefore gives the number of components in which the top-down weight vector of the winning unit (t_J) and the input vector s are both 1. Depending upon the ratio ||x||/||s||, either the weights of the winning cluster unit are adjusted or the reset mechanism is triggered. The whole process is repeated until either a match is found or all neurons in the F2 layer are inhibited. The training algorithm for ART1 is as follows.

Step 1. Initialize the parameters and weights: L > 1, 0 < rho <= 1, b_ij = 1/(1 + n) and t_ji = 1, where n is the number of components in the input vector.

For each training input:

Step 2. Set the activations of all F2 neurons to zero and set the activations of the F1(a) neurons to the input vector s.

Step 3. Compute the norm of s: ||s|| = sum_i s_i.

Step 4. Send the signal from F1(a) to the F1(b) layer: x_i = s_i.

Step 5. For each F2 node j that is not inhibited under the current reset schedule, calculate the net input to that node: y_j = sum_i b_ij x_i.

Step 6. Find the node J with the highest y_j among all the y_j's.

Step 7. Recompute the activations x of the F1(b) layer: x_i = s_i t_Ji.

Step 8. Compute the norm of the vector x: ||x|| = sum_i x_i.

Step 9. Test for reset: if ||x||/||s|| < rho, node J is inhibited; continue from Step 5.
If ||x||/||s|| >= rho, proceed to Step 10.

Step 10. Update the weights for the winning node J: b_iJ(new) = L x_i / (L - 1 + ||x||) and t_Ji(new) = x_i.

Test for the stopping criterion:

• No change in top-down or bottom-up weights
• No reset
• Maximum number of epochs exceeded

2.2.5.1.5. DATA TREATMENT & PROCESSING FOR ART1

Processing the data according to the demands of the ART1 network architecture plays an important role in its successful implementation, as follows.

• Normalization of the data matrix: all elements of the scaled matrix lie between 0 and 1. The linear scaling function transforms a variable x_k,j into x*_k,j in the following way:

x*_k,j = (x_k,j - min_j(x_k)) / (max_j(x_k) - min_j(x_k))                  (2.41)

where k and j denote the column and row of the data matrix, respectively.

• Conversion of the scaled data matrix into a binary matrix: elements of the scaled matrix below 0.5 are given the attribute 0, and elements equal to or above 0.5 are set to 1 in the binary data matrix.

2.2.5.2. NEURAL NETWORKS AS FUNCTIONAL APPROXIMATORS

As mentioned in Section 2.2.5.1, neural networks serve as universal functional approximators. A simple Multi-Layer Perceptron model is good enough to learn and adapt the functional relationship between input and output data. The detailed architecture and working of the MLP are given below.

2.2.5.2.1. MULTI LAYER PERCEPTRON (MLP)

The multilayer perceptron neural network is built up of simple components. First, a single-input neuron is described, which is then extended to multiple inputs. Next, these neurons are stacked together to produce layers [28]. Finally, the layers are cascaded together to form the network.

Single-input neuron: A single-input neuron is shown in Fig. 2.8. The scalar input p is multiplied by the scalar weight w to form wp, one of the terms that is sent to the summer. The other input, 1, is multiplied by a bias b and then passed to the summer.
The summer output n, often referred to as the net input, goes into a transfer function f, which produces the scalar neuron output a (sometimes "activation function" is used rather than transfer function, and "offset" rather than bias).

Figure 2.8: Single-input neuron

From Fig. 2.8, w and b are both adjustable scalar parameters of the neuron. Typically the transfer function is chosen by the designer, and the parameters w and b are then adjusted by some learning rule so that the neuron input/output relationship meets some specific goal. The transfer function in Fig. 2.8 may be a linear or a nonlinear function of n; a particular transfer function is chosen to satisfy some specification of the problem that the neuron is attempting to solve. One of the most commonly used functions is the log-sigmoid transfer function, shown in Fig. 2.9 [28].

Figure 2.9: Log-sigmoid transfer function

This transfer function takes the input (which may have any value between plus and minus infinity) and squashes the output into the range 0 to 1 according to the expression:

a = 1 / (1 + e^(-n))                                                       (2.42)

The log-sigmoid transfer function is commonly used in multi-layer networks that are trained using the back-propagation algorithm.

Multiple-input neuron: Typically, a neuron has more than one input. A neuron with R inputs is shown in Fig. 2.10. The individual inputs p_1, p_2, ..., p_R are each weighted by the corresponding elements W_{1,1}, W_{1,2}, ..., W_{1,R} of the weight matrix W.

Figure 2.10: Multiple-input neuron

The neuron has a bias b, which is summed with the weighted inputs to form the net input n:

n = W_{1,1} p_1 + W_{1,2} p_2 + ... + W_{1,R} p_R + b                      (2.43)

This expression can be written in matrix form as:

n = Wp + b                                                                 (2.44)

where the matrix W for the single-neuron case has only one row. The neuron output can now be written as:

a = f(Wp + b)                                                              (2.45)

A particular convention in assigning the indices of the elements of the weight matrix has been adopted [28].
The first index indicates the particular neuron destination for the weight; the second index indicates the source of the signal fed to the neuron. Thus, the indices in W_{1,2} say that this weight represents the connection to the first (and only) neuron from the second source [28]. A multiple-input neuron using abbreviated notation is shown in Fig. 2.11.

Figure 2.11: Neuron with R inputs, abbreviated notation

As shown in Fig. 2.11, the input vector p is represented by the solid vertical bar at left. The dimensions of p are displayed below the variable as R×1, indicating that the input is a single vector of R elements. These inputs go to the weight matrix W, which has R columns but only one row in this single-neuron case. A constant 1 enters the neuron as an input and is multiplied by a scalar bias b. The net input to the transfer function f is n, which is the sum of the bias b and the product Wp. The neuron's output is a scalar in this case; if there existed more than one neuron, the network output would be a vector.

2.2.5.2.1.1. MULTILAYER PERCEPTRON NETWORK ARCHITECTURES

Commonly one neuron, even with many inputs, may not be sufficient. We might need five or ten, operating in parallel, in what we will call a "layer". This concept of a layer is discussed below.

A layer of neurons: A single-layer network of S neurons is shown in Fig. 2.12. Note that each of the R inputs is connected to each of the neurons and that the weight matrix now has S rows. The layer includes the weight matrix, the summers, the bias vector b, the transfer function boxes and the output vector a. Each element of the input vector p is connected to each neuron through the weight matrix W. Each neuron has a bias b_i, a summer, a transfer function f and an output a_i. Taken together, the outputs form the output vector a. It is common for the number of inputs to a layer to be different from the number of neurons (i.e. R ≠ S).
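The neuron computations of Eqs. (2.42)-(2.45) extend directly to a layer of S neurons; a minimal sketch (the weights, bias and input values here are purely illustrative):

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function, Eq. (2.42): squashes n into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))

def layer_output(W, p, b):
    """Output of a layer of neurons, a = f(Wp + b), Eq. (2.45)."""
    return logsig(W @ p + b)

# illustrative values: S = 2 neurons, R = 3 inputs
W = np.array([[0.5, -0.3,  0.8],
              [0.1,  0.4, -0.6]])   # W[i, j]: weight to neuron i from source j
p = np.array([1.0, 2.0, -1.0])
b = np.array([0.1, -0.2])
a = layer_output(W, p, b)            # output vector of length S
```

With a single row in W and a scalar b, the same function reduces to the single multiple-input neuron of Fig. 2.10.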
The input vector elements enter the network through the weight matrix W:

    W = [ W_{1,1}  ...  W_{1,R}
           ...     ...   ...
          W_{S,1}  ...  W_{S,R} ]                                          (2.46)

The row indices of the elements of matrix W indicate the destination neuron associated with that weight, while the column indices indicate the source of the input for that weight. Thus, the indices in W_{3,2} say that this weight represents the connection to the third neuron from the second source. The S-neuron, R-input, one-layer network can also be drawn in abbreviated notation, as shown in Fig. 2.13. Here again, the symbols below the variables indicate that for this layer, p is a vector of length R, W is an S×R matrix, and a and b are vectors of length S. As defined previously, the layer includes the weight matrix, the summation and multiplication operations, the bias vector b, the transfer function boxes and the output vector.

Figure 2.12: Layer of S neurons.

Figure 2.13: Layer of S neurons, abbreviated notation.

Multiple layers of neurons: Now consider a network with several layers, which has been implemented in this project for the purpose of process identification. In this network each layer has its own weight matrix W, its own bias vector b, a net input vector n and an output vector a. Some additional notation is needed to distinguish between these layers: the number of the layer is appended as a superscript to the name of each of these variables. Thus, the weight matrix for the second layer is written as W2. This notation is used in the three-layer network shown in Fig. 2.14.

Figure 2.14: Three-layer network.

2.2.5.2.1.2. STRUCTURE AND OPERATION OF THE MULTILAYER PERCEPTRON NEURAL NETWORK (MLP)

MLP neural networks consist of units arranged in layers [29]. Each layer is composed of nodes, and in the fully connected network considered here each node connects to every node in subsequent layers.
Each MLP is composed of a minimum of three layers: an input layer, one or more hidden layers and an output layer. The input layer distributes the inputs to subsequent layers. Input nodes have linear activation functions and no thresholds. Each hidden node and each output node has a threshold associated with it in addition to the weights. The hidden nodes have nonlinear activation functions and the outputs have linear activation functions. Hence, each signal feeding into a node in a subsequent layer has the original input multiplied by a weight, with a threshold added, and is then passed through an activation function that may be linear or nonlinear (hidden units). A typical three-layer network is shown in Fig. 2.15. Only three-layer MLPs are considered in this work, since these networks have been shown to approximate any continuous function [23, 25, 30]. In the actual three-layer MLP, all of the inputs are also connected directly to all of the outputs [29]. The training data consist of a set of N_v training patterns (x_p, t_p), where p denotes the pattern number. In Fig. 2.15, x_p corresponds to the N-dimensional input vector of the pth training pattern and y_p corresponds to the M-dimensional output vector of the trained network for the pth pattern. For ease of notation and analysis, the thresholds on hidden units and output units are handled by assigning the value one to an augmented vector component denoted by x_p(N+1). The output and input units have linear activations.
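The MLP forward computation just described (linear input/output units, sigmoidal hidden units, thresholds carried as augmented weights, and direct input-to-output connections), consistent with Eqs. (2.47)-(2.52), can be sketched as follows. The weights are randomly initialized, purely for illustration; no trained values from this work are implied.

```python
import numpy as np

def sigmoid(net):
    """Sigmoid activation of the hidden units, Eq. (2.49)."""
    return 1.0 / (1.0 + np.exp(-net))

def mlp_forward(xp, W_hi, W_oi, W_oh):
    """Three-layer MLP output for one pattern.

    xp is augmented with a constant 1 so thresholds ride along as weights."""
    xa = np.append(xp, 1.0)               # augmented input, x_p(N+1) = 1
    net = W_hi @ xa                       # hidden net inputs, Eq. (2.47)
    Op = sigmoid(net)                     # hidden activations, Eq. (2.48)
    return W_oi @ xa + W_oh @ Op          # bypass + hidden terms, Eq. (2.52)

rng = np.random.default_rng(0)
N, Nh, M = 3, 4, 2                        # inputs, hidden units, outputs
W_hi = rng.normal(size=(Nh, N + 1))       # input -> hidden weights (incl. threshold)
W_oi = rng.normal(size=(M, N + 1))        # direct input -> output weights
W_oh = rng.normal(size=(M, Nh))           # hidden -> output weights

X = rng.normal(size=(5, N))               # 5 training patterns
T = rng.normal(size=(5, M))               # desired outputs
Y = np.array([mlp_forward(x, W_hi, W_oi, W_oh) for x in X])
E = np.mean(np.sum((T - Y) ** 2, axis=1)) # mean square error, Eq. (2.50)
```

Training then consists of adjusting W_hi, W_oi and W_oh (for example by back-propagation) so as to reduce E.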
The input to the jth hidden unit, net_p(j), is expressed [29] as:

net_p(j) = sum_{k=1}^{N+1} w_hi(j,k) x_p(k),  1 <= j <= N_h               (2.47)

with the output activation of the jth hidden unit for the pth training pattern, O_p(j), given by:

O_p(j) = f(net_p(j))                                                       (2.48)

The nonlinear activation is typically chosen to be the sigmoid function:

f(net_p(j)) = 1 / (1 + e^(-net_p(j)))                                      (2.49)

In (2.47) and (2.48), the N input units are indexed by k, and w_hi(j,k) denotes the weight connecting the kth input unit to the jth hidden unit. The overall performance of the MLP is measured by the mean square error (MSE):

E = (1/N_v) sum_{p=1}^{N_v} E_p = (1/N_v) sum_{p=1}^{N_v} sum_{i=1}^{M} [t_p(i) - y_p(i)]^2                                                            (2.50)

E_p corresponds to the error for the pth pattern and t_p is the desired output for the pth pattern. This also allows the mapping error for the ith output unit to be expressed as:

E_i = (1/N_v) sum_{p=1}^{N_v} [t_p(i) - y_p(i)]^2                          (2.51)

with the ith output for the pth training pattern expressed by:

y_p(i) = sum_{k=1}^{N+1} w_oi(i,k) x_p(k) + sum_{j=1}^{N_h} w_oh(i,j) O_p(j)                                                                          (2.52)

In (2.52), w_oi(i,k) represents the weights from the input nodes to the output nodes and w_oh(i,j) represents the weights from the hidden nodes to the output nodes.

Figure 2.15: Typical three-layer multilayer perceptron neural network

2.2.6. TIME-SERIES IDENTIFICATION

Dynamic input-output models are appropriate for representing the behavior of processes with a view to process monitoring, fault detection and real-time control system design. The linear model structures discussed in this section can handle mild nonlinearities; they can also result from linearization around an operating point. Inputs, outputs, disturbances and state variables are denoted as u, y, d and x, respectively.
The models can be in continuous time (differential equations) or discrete time (difference equations). For multivariable processes where u_1(t), u_2(t), ..., u_m(t) are the m inputs, the input vector u(t) at time t is written as a column vector. Similarly, the p outputs and the n state variables are defined by column vectors:

y(t) = [y_1(t), ..., y_p(t)]^T,  u(t) = [u_1(t), ..., u_m(t)]^T,  x(t) = [x_1(t), ..., x_n(t)]^T

Disturbances d(t), residuals e(t) = y(t) - y_hat(t), and random noise attributed to input and output variables are also represented by column vectors of appropriate dimensions. Time series models can be cast as a regression problem where the regressor variables are the previous values of the same variable and past values of inputs and disturbances. A general linear discrete-time model for a single variable y(t) can be written as

y(t) = eta(t) + w(t)                                                       (2.53)

where w(t) is a disturbance term such as measurement noise and eta(t) is the noise-free output

eta(t) = G(q, theta) u(t)                                                  (2.54)

with rational function G(q, theta) and input u(t). The function G(q, theta) relates the inputs to noise-free outputs, whose values are not known because the measurements of the outputs are corrupted by disturbances such as measurement noise. The parameters of G(q, theta) are represented by the vector theta, and q is called the shift operator. Assume that relevant information for the current value of the output y(t) is provided by past values of y(t) for n_y previous time instances and past values of u(t) for n_u previous instances. The relationship between these variables is

eta(t) + f_1 eta(t-1) + ... + f_{n_y} eta(t-n_y) = b_1 u(t) + b_2 u(t-1) + ... + b_{n_u} u(t-(n_u-1))                                                 (2.55)

where f_i, i = 1, 2, ..., n_y, and b_i, i = 1, 2, ..., n_u, are parameters to be determined from data. Defining the shift operator q by

y(t-1) = q^{-1} y(t)                                                       (2.56)

Eq.
(2.55) can be written using two polynomials in q:

eta(t) (1 + f_1 q^{-1} + ... + f_{n_y} q^{-n_y}) = u(t) (b_1 + b_2 q^{-1} + ... + b_{n_u} q^{-(n_u-1)})                                               (2.57)

This equation can be written in compact form by defining the polynomials

F(q) = 1 + f_1 q^{-1} + ... + f_{n_y} q^{-n_y}                             (2.58)

B(q) = b_1 + b_2 q^{-1} + ... + b_{n_u} q^{-(n_u-1)}                       (2.59)

so that eta(t) = G(q, theta) u(t) with

G(q, theta) = B(q) / F(q)                                                  (2.60)

Often the inputs have a delayed effect on the output. If there is a delay of n_k sampling times, Eq. (2.55) is modified as

eta(t) + f_1 eta(t-1) + ... + f_{n_y} eta(t-n_y) = b_1 u(t-n_k) + b_2 u(t-(n_k+1)) + ... + b_{n_u} u(t-(n_u+n_k-1))                                   (2.61)

The disturbance term can be expressed in the same way:

w(t) = H(q, theta) e(t)                                                    (2.62)

where e(t) is white noise and

H(q, theta) = C(q)/D(q) = (1 + c_1 q^{-1} + ... + c_{n_c} q^{-n_c}) / (1 + d_1 q^{-1} + ... + d_{n_d} q^{-n_d})                                       (2.63)

The model (Eq. 2.53) can then be written as

y(t) = G(q, theta) u(t) + H(q, theta) e(t)                                 (2.64)

where the parameter vector theta contains the coefficients b_i, c_i, d_i and f_i of the transfer functions G(q, theta) and H(q, theta). The model structure is described by five parameters: n_y, n_u, n_k, n_c and n_d. Since the model is based on polynomials, its structure is finalized once these parameter values are selected. The parameters and coefficients are determined by fitting candidate models to data and minimizing criteria based on reduction of the prediction error and parsimony of the model. The Auto-Regressive model with eXogenous inputs (ARX), the Auto-Regressive Moving Average model with eXogenous inputs (ARMAX), the Output Error (OE) model, and their nonlinear versions NARMAX and NARMA are the frequently used time-series identification models. One drawback of these models is their limited range of applicability; i.e., their extrapolation capability beyond the range for which they were developed is poor.
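Estimating the coefficients f_i and b_i of Eq. (2.55) from data reduces to linear least squares on lagged inputs and outputs. A minimal sketch on simulated data follows; the "true" system coefficients and noise level are illustrative assumptions, not values from the phenol study.

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate an illustrative first-order system: y(t) = 0.7 y(t-1) + 0.5 u(t-1) + e(t)
T = 500
u = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.7 * y[t - 1] + 0.5 * u[t - 1] + 0.05 * rng.normal()

# first-order model: regressors are the lagged output y(t-1) and lagged input u(t-1)
Phi = np.column_stack([y[:-1], u[:-1]])          # regressor matrix
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
a_hat, b_hat = theta                              # estimates of 0.7 and 0.5
```

Because the noise here enters additively as white noise, the least-squares estimates converge to the true coefficients as the record length grows; model order selection then proceeds by comparing prediction errors of candidate orders, as described in the text.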
The cross-correlation coefficient is a tool that can be used to check whether the process input has sufficient impact on the process output, i.e., whether two time series are correlated. In the case of the ARX model, the denominators of G and H become the same polynomial A(q) and the disturbance numerator C(q) becomes unity, leaving Eq. (2.64) as

A(q) y(t) = B(q) u(t) + e(t)                                               (2.65)

The ARX model was used in the present project for the time-series identification of the phenol degradation process. All the machine learning techniques used in the present work for process identification, product quality monitoring and fault detection, including PCA, PLS, clustering, PNN, ART1 and time-series identification, have been discussed in this chapter along with their state of the art and applications.

REFERENCES:

1. Chen, J. & Liao, C. 2002. Dynamic process fault monitoring based on neural network and PCA. Journal of Process Control, 12, 277–289.

2. Zhao, C., Wang, F., Lu, N. & Jia, M. 2007. Stage-based soft-transition multiple PCA modeling and on-line monitoring strategy for batch processes. Journal of Process Control, 17, 728–741.

3. De Gaetano, A., Panunzi, S., Rinaldi, F. & Sciandrone, M. 2009. A patient adaptable ECG beat classifier based on neural networks. Applied Mathematics and Computation, 213, 243–249.

4. Meleiro, L. C., Zuben, F. V. & Filho, R. M. 2009. Constructive learning neural network applied to identification and control of a fuel-ethanol fermentation process. Engineering Applications of Artificial Intelligence, 22, 201–215.

5. Sadrzadeh, M., Mohammadi, T., Ivakpour, J. & Kasiri, N. 2009. Neural network modeling of Pb2+ removal from wastewater using electrodialysis. Chemical Engineering and Processing, 48, 1371–1381.

6. Jeng, J.-C. 2010. Adaptive process monitoring using efficient recursive PCA and moving window PCA algorithms. Journal of the Taiwan Institute of Chemical Engineers, 41, 475–481.

7.
Marchitan, N., Cojocaru, C., Mereuta, A., Duca, Gh., Cretescu, I. & Gonta, M. 2010. Modeling and optimization of tartaric acid reactive extraction from aqueous solutions: A comparison between response surface methodology and artificial neural network. Separation and Purification Technology, 75, 273–285.

8. Bin Shams, M. A., Budman, H. M. & Duever, T. A. 2011. Fault detection, identification and diagnosis using CUSUM based PCA. Chemical Engineering Science, 66, 4488–4498.

9. Pendashteh, A. R., Fakhru'l-Razi, A., Chaibakhsh, N., Abdullah, L. C., Sayed, S. M. & Abidin, Z. Z. 2011. Modeling of membrane bioreactor treating hypersaline oily wastewater by artificial neural network. Journal of Hazardous Materials, 192, 568–575.

10. Chang, P., & Lai, C. 2005. A hybrid system combining self-organizing maps with case-based reasoning in wholesaler's new-release book forecasting. Expert Systems with Applications, 29(1), 183–192.

11. Hsieh, N. 2005. Hybrid mining approach in the design of credit scoring models. Expert Systems with Applications, 28(4), 655–665.

12. Kuo, R., Kuo, Y., & Chen, K. 2005. Developing a diagnostic system through integration of fuzzy case-based reasoning and fuzzy ant colony system. Expert Systems with Applications, 28(4), 783–797.

13. Shin, H., & Sohn, S. 2004. Segmentation of stock trading customers according to potential value. Expert Systems with Applications, 27(1), 27–33.

14. Tsai, C., Chiu, C., & Chen, J. 2005. A case-based reasoning system for PCB defect prediction. Expert Systems with Applications, 28(4), 813–822.

15. Raich, A., & Cinar, A. 1996. Statistical process monitoring and disturbance diagnosis in multivariable continuous processes. AIChE Journal, 42(4), 995–1009.

16. Wold, H. 1966. Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (Ed.), Multivariate Analysis II. Academic Press: New York, 391–420.

17. Geladi, P., & Kowalski, B. R. 1986.
Partial least-squares regression: A tutorial. Anal. Chim. Acta, 185, 1–17.

18. Qin, S. J., & McAvoy, T. J. 1992. Nonlinear PLS modeling using neural networks. Comput. Chem. Eng., 16(4), 379–391.

19. Holcomb, T. R., & Morari, M. 1992. PLS/neural networks. Comput. Chem. Eng., 16(4), 393–411.

20. Malthouse, E. C., Tamhane, A. C., & Mah, R. S. H. 1997. Nonlinear partial least squares. Comput. Chem. Eng., 21(8), 875–890.

21. Zhao, S. J., Zhang, J., Xu, Y. M., & Xiong, Z. H. 2006. Nonlinear projection to latent structures method and its applications. Ind. Eng. Chem. Res., 45, 3843–3852.

22. Lee, D. S., Lee, M. W., Woo, S. H., Kim, Y. & Park, J. M. 2006. Nonlinear dynamic partial least squares modeling of a full-scale biological wastewater treatment plant. Process Biochemistry, 41, 2050–2057.

23. Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Math. Contr. Signals Syst., 2, 303–314.

24. Hornik, K. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4, 251–257.

25. Hornik, K., Stinchcombe, M., & White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366.

26. Richard, M. D., & Lippmann, R. 1991. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Comput., 3, 461–483.

27. Specht, D. F. 1990. Probabilistic neural networks. Neural Networks, 3, 109–118.

28. Haykin, S. 1999. Neural Networks: A Comprehensive Foundation. 2nd Edn., New Jersey: Prentice-Hall.

29. Delashmit, W. H., & Manry, M. T. 2005. Recent developments in multilayer perceptron neural networks. Proceedings of the 7th Annual Memphis Area Engineering and Science Conference, MAESC 2005.

30. Hornik, K., Stinchcombe, M., & White, H. 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks, 3(5), 551–560.

CHAPTER 3 - PROCESS IDENTIFICATION

The current chapter presents the identification of the phenol degradation process.
The organic pollutant phenol was degraded by the bacterium Pseudomonas putida (ATCC: 11172). Four parameters, namely temperature, pH, RPM and phenol dosage, were varied systematically to produce 16 sets of useful data. Out of the sixteen runs, three (the 4th, 6th and 16th) were used to produce time-series data, which in turn were used for the identification of the process dynamics. The dynamics were identified using Auto-Regressive models with eXogenous inputs (ARX) and Artificial Neural Networks (ANN). A PLS regression model that can estimate the effect of each parameter on the phenol degradation process has also been developed.

3.1. PHENOL AS AN ORGANIC POLLUTANT AND ITS REMOVAL

Phenol is a commonly found pollutant in industrial waste effluents, such as those from iron and steel, coke, petroleum, pesticide, paint, solvent, pharmaceutical, wood-processing chemical, and pulp and paper plants [1-4]. Physicochemical processes (adsorption using activated carbon, for example) and biological processes are often used for the removal of phenol and its derivatives from wastewater. The drawbacks of adsorption processes include the need for adsorbent regeneration, compensation for adsorbent loss, and the possibility of inducing secondary pollution [5-9]. On the other hand, activated sludge is capable of removing both heavy metals and a variety of organic compounds from a wastewater stream at a relatively low processing cost [10-15]. Activated sludge is basically a biomass that contains mainly bacteria and protozoa. The cell wall of the bacteria primarily consists of various organic compounds, including chitin, acidic polysaccharides, lipids, amino acids and other cellular compounds, that can adsorb both heavy metals and various organics [16, 17]. The protozoa are unicellular, motile, relatively large eukaryotic cells, which can absorb organic compounds and lipids [1, 17]. Moreover, P. putida has been identified as an effective strain for the biodegradation of chlorophenol, resorcinol and related compounds [18, 19].
3.2. IDENTIFICATION OF DYNAMICS FOR PHENOL BIODEGRADATION

Bioprocess technology is currently employed for the production of several commodity and fine chemicals. Because of the complex nature of microorganism growth and of product formation or substrate degradation in batch and fed-batch cultures, which are often used in preference to continuous cultures, the control of bioprocesses continues to pose a challenge to chemical engineers. Extensive developments in the area of bioprocesses have begun, but much work remains to be done to couple model-based control methods to biochemical reactor technology. In a first-principles approach, the Haldane equation [20] has frequently been used to describe the degradation of phenol in pure or mixed cultures [21-26]; it is written as

mu = mu_m S / (K_S + S + S^2/K_i)                                          (3.1)

where mu is the specific growth rate, S is the phenol concentration, and K_S, K_i and mu_m are constants. Usually parameters derived from batch experiments are used in Equation (3.1) to predict the response. The Haldane equation predicts a global maximum specific growth rate (mu*) at a phenol concentration S*, and the specific growth rate asymptotically approaches zero as the substrate (phenol) concentration increases. Equation (3.1) is also used to describe the specific phenol uptake rate (q_p) of washed cells of Pseudomonas putida in batch systems [27-29]. Notwithstanding its popularity, questions remain as to the adequacy of Equation (3.1) as a model of phenol degradation. The basis of the Haldane equation is a hypothetical enzyme-substrate interaction:

E + S <-> ES            (equilibrium constant K_1)
ES + S <-> ESS          (inactive complex; equilibrium constant K_2)
ES -> E + P             (rate constant k)

where the constants K_1 and K_2 are equivalent to K_S and K_i, respectively, of Equation (3.1). Similarly, the rate constant k is equivalent to mu_m/E_0, where E_0 is the total concentration of the enzyme catalyzing the slowest reaction in the pathway for the consumption of substrate.
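The shape of the Haldane expression, Eq. (3.1), is easy to verify numerically: mu peaks at S* = sqrt(K_S K_i), obtained by setting dmu/dS = 0, and decays toward zero at high substrate concentrations. A short sketch follows; the parameter values are illustrative assumptions, not fitted constants from this work.

```python
import numpy as np

def haldane(S, mu_m, Ks, Ki):
    """Specific growth rate, Eq. (3.1): substrate-inhibition (Haldane) kinetics."""
    return mu_m * S / (Ks + S + S ** 2 / Ki)

mu_m, Ks, Ki = 0.5, 20.0, 300.0        # illustrative values (1/h, mg/L, mg/L)
S = np.linspace(0.1, 1500.0, 15000)    # substrate concentration grid
mu = haldane(S, mu_m, mu_m and Ks, Ki) if False else haldane(S, mu_m, Ks, Ki)

S_star = np.sqrt(Ks * Ki)              # analytical maximizer from dmu/dS = 0
print(S[np.argmax(mu)], S_star)        # numerical peak agrees with sqrt(Ks*Ki)
```

The monotone decay beyond S* is what makes high phenol loadings self-inhibitory, which is the qualitative behavior the data-based models in this chapter must also reproduce.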
Unfortunately, at a fundamental level, there is no mechanistic proof for the above scheme with respect to the degradation of phenol [30]. Moreover, from a more practical perspective, prediction discrepancies exist in the literature. Although these models have been improved in recent years [31, 32], the complexity remains because of the limited knowledge of the metabolic pathway of phenol uptake and utilization. In a first-principles approach, one who wants to model the phenol degradation process exactly has to work through the chemical mass and energy balances and must know the rate-limiting steps that control the degradation of phenol, which clearly requires knowledge of the reactions taking place inside the organism (that is, its metabolism). In data-based modeling of phenol degradation, by contrast, accuracy is achieved by improving the quality and quantity of the input-output data. Thus the complexity of knowing and analyzing all the metabolic pathways and reactions required for the development of a mechanistic process model is avoided. In this perspective, the present work proposes data-based modeling techniques, namely ANN and ARX, as an alternative to a rate model for the phenol degradation process. The PLS technique was used to develop an empirical model relating the phenol degradation process to the variables temperature, pH, RPM and phenol loading at steady state. In order to predict phenol degradation as a function of five inputs, namely temperature, pH, RPM, phenol loading and time, a Multi-Layer Perceptron (MLP) network was trained by selecting suitable numbers of input and hidden neurons, adjusting weights, selecting proper training algorithms, varying the number of iterations, selecting a proper learning rate, and so on. While using ARX for the prediction of phenol degradation, the order of the model had to be selected and established.
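Before fitting such models, a time series is commonly screened for non-stationarity via its sample autocorrelation, as done a priori in this work. The sketch below uses a synthetic series standing in for the 15-minute degradation readings; a slowly decaying autocorrelation flags a trend, which first differencing removes.

```python
import numpy as np

# Lag-k sample autocorrelation; autocorrelations near 1 over many lags
# indicate non-stationarity. The series below is a synthetic stand-in
# for the 15-min sampled degradation data (an assumption).
def autocorr(x, k):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-k], x[k:]) / np.dot(x, x)) if k > 0 else 1.0

t = np.arange(144)                      # 36 h sampled every 15 min
trend = 60 * (1 - np.exp(-t / 40.0))    # assumed degradation-like trend
noise = np.random.default_rng(0).normal(0, 0.5, t.size)
series = trend + noise

acf_raw = [autocorr(series, k) for k in range(1, 6)]          # near 1: trended
acf_diff = [autocorr(np.diff(series), k) for k in range(1, 6)]  # much smaller
```

The trended series shows lag-1 autocorrelation close to 1, while the differenced series does not, which is the kind of diagnostic used before ARX/ANN fitting.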
The time series data used in the MLP network and for ARX prediction were tested a priori for their autocorrelation and partial autocorrelation coefficients, which could reveal non-stationarity in the time series data, if any. It is not merely a question of using ARX and ANN; thorough data preprocessing is part of that activity. The ensemble of the aforesaid activities qualifies ANN and ARX as data-based/black-box models, used in the present study instead of first-principles models of the process of interest.

3.3. BIO-DEGRADATION OF PHENOL

3.3.1 STRAIN

The strain used for phenol degradation was a heterotrophic bacterium, Pseudomonas putida (ATCC: 11172), obtained from the National Collection of Industrial Microorganisms (NCIM), Pune. The strain was supplied in the form of dry spores, which were cultured in the media specified by NCIM for the preparation of the inoculum. The strain was sub-cultured every two weeks for maintenance.

3.3.2 LABORATORY SCALE BENCH REACTOR

An IIC-LABEAST bench-top bio-fermentor of 2.5 litre capacity was used for the degradation experiments (Fig. 3.1). The reactor was connected to a high- and low-temperature thermostatic water bath to maintain constant temperature. Aeration of 1 LPM was maintained for the supply of oxygen to the reactor. The fermentor was fully equipped with a pH electrode, stirrer and baffles.

3.3.3 MEDIA

The media composition was decided in such a way that it supplied all the nutrients necessary for the growth of P. putida. The constituents are those of the 'basic mineral salt medium' widely used for the growth of microorganisms. In this context, the following media composition was used for the phenol degradation [33].
K2HPO4 1.5 g/L
KH2PO4 0.5 g/L
(NH4)2SO4 0.5 g/L
NaCl 0.5 g/L
Na2SO4 3.0 g/L
Yeast extract 2.0 g/L
Glucose 0.5 g/L
FeSO4 0.002 g/L
CaCl2 0.002 g/L

The experiments were designed systematically by varying the variables temperature, pH, RPM and phenol loading at four different levels each. A total of 16 combinations were made, which can effectively give a picture of the effect imposed by each parameter on the process. All the experiments were run for an incubation period of 36 hrs. Table 3.1 shows the different combinations of the four parameters at four levels each and their corresponding phenol degradation percentages.

3.3.4 CHROMATOGRAPHIC ANALYSIS OF PHENOL

The concentration of phenol remaining in the broth solution was determined using High Performance Liquid Chromatography (HPLC) on a Jasco PU-2080 Plus instrument. The mobile phase used for the determination of phenol was a mixture of isopropanol, acetic acid and water in the ratio 20:1:79. For this mobile phase, at a flow rate of 0.6 mL/min, the retention time of phenol was found to be 4.5 min. Figure 3.2 shows a chromatogram with the phenol peak at a retention time of 4.5 min.

3.4. IDENTIFICATION OF PROCESS DYNAMICS USING ANN AND ARX

In order to predict phenol degradation, a Multi-Layer Perceptron (MLP) network and ARX models were used. The time series data used in the MLP network and for ARX prediction were tested a priori for non-stationarity, if any.

3.4.1 ARTIFICIAL NEURAL NETWORK

The inputs to the ANN model are process measurements. The measurements are weighted individually or in groups and then combined using a nonlinear 'activation function' at a node referred to as a neuron, named after the nodes in the brain that combine information sent from the natural senses. A process ANN model is often composed of an input layer, an output layer, and one or more hidden layers of nodes.
The traditionally used format of ANN is the feed-forward neural network (FFN), in which the outputs of one layer of nodes serve as the inputs to the following layer. Given a set of process measurements, the outputs of the neural network can be estimated parameter values or process variables. The weights applied to the inputs in the model are determined through the training process. To train the ANN, complete process information corresponding to the neural network inputs and outputs was required; it was obtained from the sets of fermentation runs. The set of process input and output measurements spanned by the experimental data is termed the 'experimental space', and the ANN can predict outputs accurately within this range. Here the artificial neural network acted as a function approximator. The efficiency of a neural network depends mainly on the data used for training, which must be sufficient to explain all the aspects considered in modeling the process. To develop the rate model of the phenol degradation process, three of the sixteen runs (the 4th, 6th and 16th) were used to produce time series data (a historical database) with a sampling interval of 15 minutes. Taking a reading every 15 minutes for 36 hrs yields (36 × 4 = 144) samples per run. Analyzing 144 samples by HPLC to estimate the amount of phenol degraded is a substantial task. Although one run would suffice to estimate the effect of time on the degradation process, three runs were chosen because they included the maximum variations in the input variables and hence enhanced the accuracy of the estimated dynamics. The data were augmented before training to make them sufficient to represent the effect of each parameter on the phenol degradation process, and randomized before training to remove any bias in the weight updates.
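The data-splitting and training protocol just described can be sketched with scikit-learn; this is an assumption for illustration (the thesis used a different package), with 'lbfgs' standing in as a quasi-Newton solver of the same family as the BFGS algorithm used here, and with synthetic records replacing the experimental database.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the (temperature, pH, RPM, loading, time)
# -> % degradation records; the response shape below is assumed.
rng = np.random.default_rng(1)
X = rng.uniform([25, 5.5, 150, 100, 0], [34, 7.0, 240, 400, 36], (500, 5))
y = 60 * (1 - np.exp(-X[:, 4] / 12.0)) + 0.2 * (X[:, 0] - 30)

# Hold-out split mirroring the training/testing protocol described above
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
net = make_pipeline(
    StandardScaler(),                      # scale inputs before training
    MLPRegressor(hidden_layer_sizes=(12,), activation='logistic',
                 solver='lbfgs', max_iter=2000, random_state=0))
net.fit(X_tr, y_tr)
score = net.score(X_te, y_te)              # R^2 on held-out data
```

The hidden-layer width (12) matches one of the architectures reported below; in practice it is varied, and the best few networks are retained, as described in the text.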
The neural network model used for the prediction was the Multi-Layer Perceptron (MLP). The number of neurons in the hidden layer was varied between 2 and 20, and the best five networks were chosen for the prediction purpose. There were five inputs (temperature, pH, RPM, phenol loading and time) in the input layer and one output (% phenol degradation) in the output layer. A randomly selected 70% of the data were used for training, 15% for testing and 15% for validation. The performances of the best five network combinations are presented in Table 3.2. All the networks were trained using the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, and the error function used was the SOS (Sum of Squares) function [34]. Different activation functions were tried for the hidden and output layers; the best five combinations are also listed in Table 3.2. Figures 3.3 and 3.4 depict the training and prediction performances of the five best MLPs. An MLP is denoted 'MLP a-b-c', where 'a' is the number of units in the input layer, 'b' the number of units in the hidden layer and 'c' the number of units in the output layer; for example, 'MLP 8-12-1' has 8 input units, 12 hidden units and one output unit. The MLP includes a bias unit for each input variable given to it. With five input variables, the network would ordinarily have 10 input units (five variables plus five bias units). There were, however, only 8 units in the input layer because, during training, the variable 'phenol loading' was defined as a categorical variable, as it took only four distinct values (100, 200, 300 and 400 ppm) and did not vary with time. The network therefore did not treat 'phenol loading' as an active input unit, and no bias unit was assigned to it.
Hence the architecture had only 8 input units and one output unit.

3.4.2 AUTO REGRESSION MODELS WITH EXOGENOUS (ARX) INPUTS

The time series identification using the ARX model was done with the time series data produced in the 4th run. The ARX model considers only one data channel (run) for the identification of the process dynamics, so only one run (the 4th) was used; either the 6th or the 16th run could equally have been used. In the ARX model the disturbance term is neglected. An Auto Regression model with eXogenous (ARX) inputs can be written as

A(q) y(t) = B(q) u(t) + e(t)   (3.2)

where
A(q) – polynomial operating on the output data,
B(q) – polynomial operating on the input data,
y(t) – output, i.e., phenol degradation percentage,
u(t) – input continuous variable, i.e., temperature/RPM/pH,
e(t) – error,
q – shift operator, defined by q⁻¹ y(t) = y(t−1).

Both A(q) and B(q) are polynomials whose coefficients (a1, a2, a3, …, an and b1, b2, b3, …, bn) are determined from the time series data. Three SISO transfer functions were developed by considering each of the three continuous variables (temperature, pH and RPM) as inputs, with percentage phenol degradation as the output. The coefficients of each ARX model are given in Table 3.3. The time series identification was done using the System Identification Toolbox in MATLAB. Figure 3.5 shows the measured and simulated outputs and the fit of the model developed by correlating temperature with the output (phenol degradation %). The percentage fit values for the three models developed with temperature, RPM and pH are 90.22%, 91.51% and 85.22%, respectively. Thus the ARX models were able to identify the dynamics of the phenol biodegradation process reasonably well. Again, it is not merely a question of using ARX and ANN; thorough data preprocessing is part of that activity.
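Determining the A(q) and B(q) coefficients of Equation (3.2) reduces to a linear least-squares problem. The sketch below is a minimal stand-in for MATLAB's toolbox routine; the orders and the synthetic first-order test system are assumptions for illustration.

```python
import numpy as np

# Minimal least-squares ARX fit (Eq. 3.2): regress y(t) on past outputs
# and past inputs. Orders na, nb and the data below are illustrative.
def fit_arx(y, u, na=2, nb=2):
    rows, targets = [], []
    for t in range(max(na, nb), len(y)):
        # regressor phi = [-y(t-1)...-y(t-na), u(t-1)...u(t-nb)]
        phi = [-y[t - i] for i in range(1, na + 1)]
        phi += [u[t - j] for j in range(1, nb + 1)]
        rows.append(phi)
        targets.append(y[t])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return theta[:na], theta[na:]          # (a coefficients, b coefficients)

# Smoke test on a synthetic system y(t) = 0.8 y(t-1) + 0.5 u(t-1) + noise,
# so the true ARX coefficients are a1 = -0.8 and b1 = 0.5.
rng = np.random.default_rng(0)
u = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1] + 0.01 * rng.normal()
a, b = fit_arx(y, u, na=1, nb=1)
```

The recovered coefficients match the simulated system closely, which is the same estimation principle the toolbox applies to the 4th-run data.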
The aforesaid ensemble of activities justifies the inclusion of ARX and ANN as data-based models.

3.5. PARTIAL LEAST SQUARES (PLS) REGRESSION

The partial least squares technique was used as a function approximator or regressor. PLS finds latent variables that capture the maximum variance in the data while achieving maximum correlation between the predictor and predicted variables. PLS regression generalizes and combines features of principal component analysis and multiple regression: prediction is achieved by extracting from the predictors a set of orthogonal factors, called latent variables, that have the best predictive power. The inputs and outputs of the 16 runs were used for the PLS regression. The inputs were temperature, pH, RPM and phenol dosage; the output was the phenol degradation percentage. It should be noted that the database used for developing the PLS model was a steady-state one; hence the regression model can only predict the final phenol degradation after the 36-hr incubation period for the chosen and interpolated combinations of input variables. PLS regression of the 16 input combinations of the four variables against their corresponding phenol degradation percentages gave a regression coefficient of 0.9669, which is a good fit. The predicted versus actual output using the PLS model is presented in Figure 3.6. The PLS-predicted outputs for the sixteen input combinations (temperature, pH, RPM and phenol dosage) showed close agreement with the experimental results. In summary, the ANN- and ARX-based models of the phenol degradation dynamics yielded encouraging results, and the PLS-based empirical model can be helpful in designing the process with a view to sizing the equipment and utility requirements.
TABLES:

Table 3.1: Different combinations of input parameters and their corresponding phenol degradation percentages as output.

Run | Temperature | RPM | pH | Phenol Loading | % Degradation
1 | 34.0 | 210 | 6.0 | 100 | 62.22
2 | 34.0 | 150 | 7.0 | 300 | 54.76
3 | 28.0 | 150 | 6.0 | 400 | 49.22
4 | 28.0 | 210 | 7.0 | 200 | 59.68
5 | 31.0 | 240 | 6.0 | 300 | 53.72
6 | 25.0 | 150 | 5.5 | 100 | 61.34
7 | 28.0 | 240 | 6.5 | 100 | 62.70
8 | 25.0 | 240 | 7.0 | 400 | 51.26
9 | 31.0 | 150 | 6.5 | 200 | 56.42
10 | 25.0 | 210 | 6.5 | 300 | 54.46
11 | 25.0 | 180 | 6.0 | 200 | 56.08
12 | 31.0 | 180 | 7.0 | 100 | 64.56
13 | 34.0 | 240 | 5.5 | 200 | 55.20
14 | 28.0 | 180 | 5.5 | 300 | 51.80
15 | 31.0 | 210 | 5.5 | 400 | 48.86
16 | 34.0 | 180 | 6.5 | 400 | 51.54

Table 3.2: Summary of the network performances in the ANN-identified phenol degradation process.

Index | Net name | Training perf. | Test perf. | Validation perf. | Training error | Test error | Validation error | Training algorithm | Error function | Hidden activation | Output activation
1 | MLP 8-12-1 | 0.999826 | 0.999875 | 0.999812 | 0.072456 | 0.060513 | 0.081934 | BFGS 315 | SOS | Logistic | Identity
2 | MLP 8-16-1 | 0.999772 | 0.999781 | 0.999782 | 0.084205 | 0.104174 | 0.092511 | BFGS 282 | SOS | Tanh | Tanh
3 | MLP 8-20-1 | 0.999792 | 0.999885 | 0.999812 | 0.093302 | 0.054573 | 0.082554 | BFGS 344 | SOS | Logistic | Logistic
4 | MLP 8-12-1 | 0.999811 | 0.999845 | 0.999781 | 0.102242 | 0.073340 | 0.100799 | BFGS 124 | SOS | Tanh | Exponential
5 | MLP 8-20-1 | 0.999837 | 0.999809 | 0.999793 | 0.077928 | 0.089405 | 0.087374 | BFGS 227 | SOS | Logistic | Logistic

Table 3.3: Coefficients of the three SISO transfer functions for the three ARX models developed (a dash denotes an entry not reported).

Coefficient | Temperature | RPM | pH
a1 | -0.8985 | -0.8985 | -0.8931
a2 | -0.1227 | -0.109 | -0.1227
a3 | -0.03465 | -0.02249 | -0.03465
a4 | -0.1638 | -0.1856 | -0.1638
a5 | -0.08709 | -0.0995 | -0.08709
a6 | -0.02045 | -0.01641 | -0.02045
a7 | 0.2004 | 0.2055 | 0.2004
a8 | -0.03953 | -0.04895 | -0.03953
a9 | -0.08341 | -0.07851 | -0.08341
b1–b5 | -4.247×10⁻¹⁵ | — | —
b6 | -4.247×10⁻¹⁵ | -0.009252 | —

FIGURES:

Figure 3.1: Laboratory scale bench-top bio-fermentor.
Figure 3.2: A sample chromatogram of phenol (detector signal, mV, versus retention time, min) showing the peak at RT 4.5 min.

Figure 3.3: Performance of networks in the ANN-based dynamic model for phenol degradation (training, validation and testing performance combined).

Figure 3.4: Prediction performance of the developed networks in the ANN-based dynamic model for phenol degradation.

Figure 3.5: Measured and simulated outputs and the fit of the ARX model developed with the System Identification Toolbox in MATLAB.

Figure 3.6: Predicted versus actual process outputs using the PLS model.

REFERENCES:

1. Aksu, S. & Yener, J., 1998. Investigation of biosorption of phenol and monochlorinated phenols on the dried activated sludge. Process Biochem., 33, 649–655.
2. Patterson, J.N., 1997. Waste Water Treatment Technology. Ann Arbor Science, New York.
3. Bulbul, G. & Aksu, Z., 1997. Investigation of wastewater treatment containing phenol using free and Ca-alginate gel immobilized Pseudomonas putida in a batch stirred reactor. Turkish J. Eng. Environ. Sci., 21, 175–181.
4. Sung, R.H., Soydoa, V. & Hiroaki, O., 2000. Biodegradation by mixed microorganism of granular activated carbon loaded with a mixture of phenols. Biotechnol. Letters, 22, 1093–1096.
5. Perrich, J.R., 1981. Activated Carbon Adsorption for Wastewater Treatment. CRC Press, Boca Raton, Florida.
6. Brasquet, C., Rouss, J., Subrenat, E. & Le Cloriec, P., 1996. Adsorption and selectivity of activated carbon fibers: application to organics. Environ. Technol., 17, 1245–1252.
7. Kumar, S., Upadhyay, S.N. & Upadhya, Y.D., 1987. Removal of phenols by adsorption on fly ash. J. Chem. Technol. Biotechnol., 37, 281–292.
8. Singh, B.K. & Rawas, N.S., 1994. Comparative sorption equilibrium studies of toxic phenols on fly ash and impregnated fly ash. J. Chem. Technol. Biotechnol., 61, 307–317.
9. Munaf, E., Zkin, R., Kurniad, R. & Kurniadi, I., 1993.
The use of rice husk for removal of phenol from waste water as studied using the 4-aminoantipyrine spectrophotometric method. Environ. Technol., 18, 355–358.
10. Tsezos, M. & Bell, J.P., 1989. Comparison of biosorption and desorption of hazardous organic pollutants by live and dead biomass. Water Res., 23, 563–568.
11. Chitra, S. & Chanrakasan, G., 1996. Response of phenol degrading Pseudomonas pictorium to changing loads of phenolic compounds. J. Environ. Sci. Health, A31, 599–619.
12. Vaker, D., Connell, C.H. & Wells, W.W., 1967. Phosphate removal through municipal waste water treatment at San Antonio, Texas. J. Water Poll. Cont. Fed., 39, 750–771.
13. Zumriye, A., Derya, A., Elif, R. & Burcin, K., 1999. Simultaneous biosorption of phenol and nickel from binary mixtures onto dried aerobic activated sludge. Process Biochem., 35, 301–308.
14. Bux, F., Akkinson, B. & Kasan, K., 1999. Zinc biosorption by waste activated and digested sludges. Water Sci. Technol., 39 (10–11), 127–130.
15. Ulku, Y., Goksel, N.D. & Celal, F.G., 1999. Effect of chromium (VI) on the biomass yield of activated sludge. Enzyme Microbiol. Technol., 25, 48–54.
16. Takahiro, K. & Eiichi, M., 1995. Survival of a non-flocculating bacterium Thiobacillus thioparus TK-1 inoculated to activated sludge. Water Res., 29, 2751–2754.
17. Brandt, C., Zeng, A. & Deckwer, W., 1997. Adsorption and desorption of pentachlorophenol on cells of M. chlorophenolicum PCP-1. Biotechnol. Bioeng., 55, 480–489.
18. Shuler, M.L. & Kargi, F., 1992. Bioprocess Engineering: Basic Concepts. Prentice Hall, New Jersey.
19. Clessceri, C.S., Greenberg, A.B. & Trussel, R.R., 1985. Standard Methods for the Examination of Water and Wastewater. APHA, Washington, DC, 5.48–5.53.
20. Haldane, J.B.S., 1930. Enzymes. Longmans, London.
21. Andrews, J.F., 1968. A mathematical model for the continuous culture of microorganisms utilizing inhibitory substrates. Biotechnol. Bioeng., 10, 707–723.
22. D'Adamo, P.D., Rozich, A.F. & Gaudy Jr., A.F., 1984.
Analysis of growth data with inhibitory substrate. Biotechnol. Bioeng., 26, 397–402.
23. Edwards, V.H., Ko, C.R. & Balogh, A., 1972. Dynamics and control of continuous microbial propagators subject to substrate inhibition. Biotechnol. Bioeng., 14, 939–974.
24. Pawlowsky, U. & Howell, J.A., 1973. Mixed culture biooxidation of phenol: I. Determination of kinetic parameters. Biotechnol. Bioeng., 15, 889–896.
25. Sokol, W., 1988. Dynamics of continuous stirred-tank biochemical reactor utilizing inhibitory substrate. Biotechnol. Bioeng., 31, 198–202.
26. Yang, R.D. & Humphrey, A.E., 1975. Dynamic and steady state studies of phenol biodegradation in pure and mixed cultures. Biotechnol. Bioeng., 17, 1211–1235.
27. Sokol, W., 1987. Oxidation of an inhibitory substrate by washed cells (oxidation of phenol by Pseudomonas putida). Biotechnol. Bioeng., 30, 921–927.
28. Sokol, W., 1988. Uptake rate of phenol by Pseudomonas putida grown in unsteady state. Biotechnol. Bioeng., 32, 1097–1103.
29. Sokol, W. & Howell, J.A., 1981. Kinetics of phenol degradation by washed cells. Biotechnol. Bioeng., 23, 2039–2049.
30. Li, J. & Humphrey, A.E., 1989. Kinetic and fluorimetric behavior of a phenol fermentation. Biotechnol. Lett., 11(3), 177–182.
31. Wang, S.J. & Loh, K.C., 1999. Modeling the role of metabolic intermediates in kinetics of phenol biodegradation. Enzyme Microb. Technol., 25, 177–184.
32. Alper, N. & Beste, Y., 2004. Modeling of phenol removal in a batch reactor. Process Biochemistry, 40, 1233–1239.
33. Allsop, P.J., Chisti, Y., Moo-Young, M. & Sullivan, G.R., 1993. Dynamics of phenol degradation by Pseudomonas putida. Biotechnology and Bioengineering, 41, 572–580.
34. Bishop, C., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.

CHAPTER 4 – PROCESS MONITORING & FAULT DETECTION

The present chapter explores both product quality monitoring and process monitoring (fault detection of the process).
For product quality monitoring, a wine dataset has been considered as a case study and explored with multivariate statistics, hence data-based models, in developing product quality monitoring methodologies. The unsupervised techniques PCA and K-means clustering and the supervised PLS technique were used to design statistical PLS-based classifiers useful for wine quality monitoring. Adaptive Resonance Theory networks (ART1) and Probabilistic Neural Networks (PNN) were also used to design neural classifiers. In order to detect process faults (process monitoring), the time series data produced by the phenol degradation experiments were used. CUSUM and X-bar charts, along with Moving Range and Range charts, were used for univariate process monitoring, while PCA was used for multivariate process monitoring, in order to detect process abnormalities, if any.

4.1. WINE QUALITY MONITORING

4.1.1 WINE DATA SET

The wine dataset contains the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars, representing three different qualities of wine. The chemical analysis of the 178 Italian wines from the three cultivars yielded 13 measurements. This dataset is often used to test and compare the performance of various classification algorithms. The 13 constituents determined in each of the three types of wines are:

1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline

4.1.2 DEVELOPMENT OF STATISTICAL CLASSIFIER

The use of sensor arrays to produce features, followed by multivariate data analysis (MVDA) and different clustering techniques to discriminate among various samples, paves the way to the successful design of a classifier. The use of various decision rules qualifies the classifier for authentication purposes.
Discrimination and classification of the feature variables produced from a multisensor array owe a profound debt to multivariate statistics these days. In these procedures, an underlying probability model must be assumed in order to calculate the posterior probability upon which the classification decision is made. One major limitation of statistical models is that they work well only when the underlying assumptions are satisfied; the effectiveness of these methods depends to a large extent on the assumptions and conditions under which the models are developed. The wine dataset, containing 178 wine samples with 13 feature variables, was used to develop the PLS-based classifier, which is expected to be an important development for on-line monitoring of wine samples.

4.1.2.1 IDENTIFICATION OF CLASSES PRESENT IN THE DATA USING PCA AND K-MEANS CLUSTERING

A data matrix (178 × 13) was generated from the wine dataset. Eigenvector decomposition was performed on the data matrix, and it was found that the first three principal components captured about two-thirds of the total variance. Table 4.1 shows the percentage variance covered by each principal component and the corresponding eigenvalues. Scores were generated along the first and second principal component directions. Figure 4.1 represents the scores along principal component 1 (PC1) versus principal component 2 (PC2). The projection of all the 13-dimensional data points onto a two-dimensional plane allows us to visualize the patterns present in the whole dataset. From Figure 4.1 we can see that the data are distributed into three distinct groups, which appear well separated in the figure. To confirm this finding, K-means clustering was applied with different numbers of clusters; the mean squared error (MSE) of misclassification also indicated that three clusters is optimal, with the least MSE.
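The same 178 × 13 Italian wine dataset ships with scikit-learn, so the PCA projection and K-means grouping described above can be sketched directly; autoscaling before decomposition is assumed here.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Autoscaled 178 x 13 data matrix from the wine dataset
X = StandardScaler().fit_transform(load_wine().data)

# Eigenvector decomposition; the first three PCs capture ~2/3 of the variance
pca = PCA(n_components=3).fit(X)
scores = pca.transform(X)[:, :2]             # PC1-PC2 score plane
explained = pca.explained_variance_ratio_.sum()

# K-means in the full 13 dimensions with a predefined cluster number of 3
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = km.labels_                          # cluster tag for each sample
```

Plotting the two `scores` columns reproduces the three-group pattern of Figure 4.1, and `labels` supplies the group numbers later used as classifier targets.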
The stable K-means statistics of the scores along PC1-PC2 are presented in Table 4.2, which lists all 3 cluster centroids and the number of data points pertaining to each cluster. As a part of hierarchical clustering, a distance (dissimilarity) matrix was generated, which is symmetric with zero diagonal elements. A hierarchical cluster tree was then created from that distance matrix to form a dendrogram (Figure 4.2) originating from the score vectors along PC1-PC2. In a dendrogram, the grouping of the branches down the tree represents the formation of clusters as the distance between the cluster centers becomes very low. Only 30 data points of the whole dataset are represented in the figure, so that the points can be visualized distinctly; including more data points would make the dendrogram cluttered. In the dendrogram we can see three groups: the first forming at an inter-cluster distance above 4.5, the second at around 4.0 and the third at around 3.5. Once it was confirmed that there are three distinct groups in the whole dataset, K-means clustering was applied in 13 dimensions with a predefined cluster number of three. K-means formed three groups and assigned every wine sample to one of them. Initially the data contained only the 13 attributes; now each wine sample carries its 13 attributes (inputs) along with its corresponding group number (output). This information, provided by K-means, was used for the development of the different types of classifiers.

4.1.2.2 PLS BASED CLASSIFIER DEVELOPMENT & ITS PERFORMANCE

The PLS-based classifier falls under the category of statistical classifiers. This methodology has been used successfully in a large number of areas, such as metabonomic [1] and transcriptomic [2] studies.
In the present problem PLS has been used as a classifier. The wine dataset was successfully clustered into 3 groups by K-means clustering. The 3 different classes of wine samples were represented as three sets of vectors, which were amalgamated to give the predictor X matrix. A corresponding response Y matrix (178 × 3) indicating the wine class was generated. In the Y matrix, '1' represents the presence of an individual class and '0' its absence; each of the 178 row vectors of the Y matrix is either [1 0 0], [0 1 0] or [0 0 1]. A randomly chosen 105 of the 178 samples, each with 13 feature variables, were used for training and the remaining 73 for testing. The three classes of X vectors were regressed by PLS onto the three characteristic Y vectors. Each regressed Y vector of the response matrix was given a class membership by choosing the entry closest to 1.0, uniquely existing in one of its 3 columns. The designed PLS classifier was then used for predicting the Y vectors, representing unknown sample classes, corresponding to unknown X samples. The Nonlinear Iterative Partial Least Squares (NIPALS) algorithm, as given by Chiang et al. (1992), was adapted and used in the present work for detecting the unknown wine class as follows [3]:
1. Formation of the training/predictor matrix X (105 × 13): among the row vectors, rows 1–45 = sample class 1; rows 46–90 = sample class 2; rows 91–103 = sample class 3.
2. Assignment of the response matrix Y (105 × 3) as a class identification matrix consisting of 1's and 0's only.
3. Relating X and Y by PLS regression; hence determination of the matrix of regression coefficients and the loading matrices corresponding to the X and Y data.
4. Formation of the test X matrix of dimension 73 × 13.
5. Prediction of the 73 (1 × 3) Y vectors corresponding to the 73 test vectors of the test X dataset using the model developed in step 3.
6.
Determination of the 73 (1 × 3)-dimensional abs(Y − 1) vectors.
7. Detection of an outlier, i.e., a sample not among the considered wine classes: detection of the minimum entry among the 3 columns of each of the 73 (1 × 3)-dimensional abs(Y − 1) vectors. If, for any of the 73 test vectors, the minimum entry among the 3 columns of abs(Y − 1) exceeds ±15% of 1.0, the sample corresponding to that vector is most likely an outlier.
8. Generation of class membership: detecting the minimum entry among the columns of each of the 73 (1 × 3)-dimensional abs(Y − 1) vectors is synonymous with finding the entry closest to 1.0 among the columns of each of the 73 PLS-regressed Y vectors. The column number corresponding to that minimum entry is the class identification number of the qth test vector (among the 73 test vectors). In this way every vector of the test X dataset receives a class membership: if the identification number is 1, 2 or 3, the sample is assigned membership of class 1, 2 or 3, respectively.

The classifier operated with almost 100% efficiency. Figure 4.3 shows the wine classification performance, with 2 misclassifications over the 73 samples. Misclassification means categorizing a sample of one class as another class; the misclassification rate percentage ((number of misclassifications / total number of samples) × 100) was about 2.7% for the present case. One sample belonging to class '1' was classified as class '2', and another belonging to class '2' was classified as class '3' by the designed PLS classifier. This development appears to be a potential one so far as on-line monitoring of beverage quality is concerned. The code for the statistical classifier was developed using MATLAB.
4.1.3 DEVELOPMENT OF NEURAL CLASSIFIER

Neural networks have been successfully applied to a variety of real-world classification tasks in industry, business and science [4]. Applications include bankruptcy prediction [5,6], handwriting recognition [7,8], speech recognition [9,10], product inspection [11,12], fault detection [13,14], medical diagnosis [15,16] and bond rating [17,18]. A number of performance comparisons between neural and conventional classifiers have been made in many studies [19-21], and several computational evaluations of neural networks for classification problems have been conducted under a variety of conditions [22,23]. Two network architectures, Probabilistic Neural Networks (PNN) and Adaptive Resonance Theory (ART) networks, were employed for the classification of wine samples in the current study. Their performances are presented below.

4.1.3.1 PROBABILISTIC NEURAL NETWORK (PNN) BASED CLASSIFIER DEVELOPMENT & ITS PERFORMANCE

A PNN acts better as a classifier than as a function approximator. For developing the PNN-based classifier, the target vector representing the cluster number of each sample was converted from indices (1, 2 and 3) to vectors ([1 0 0], [0 1 0] and [0 0 1]), as mentioned in Section 4.1.2.2. Once the output vectors were formed for all the samples, the whole dataset, containing 178 samples of 13 attributes along with their cluster numbers, was redistributed into six randomly selected datasets containing 20%, 30%, 40%, 50%, 60% and 70% of the data. The networks were trained using those randomly selected datasets and tested against the remaining portions of the dataset. The random selection of data was done using the method described by Box and Muller (1958) and Devroye (1986) [24, 25]. The networks were trained using the ANN toolbox of MATLAB. The performances of the developed PNN classifiers are presented in Table 4.3.
The results clearly indicated that maximum efficiency was achieved when the training was carried out using 70% of the data. Thus one can conclude that sufficient data is required for a PNN to predict the posterior probabilities accurately.

4.1.3.2 ART1 NETWORK BASED CLASSIFIER DEVELOPMENT & ITS PERFORMANCE

Processing the data according to the demands of the ART1 network architecture plays an important role in its successful implementation. Normalization of the data matrix and conversion of the scaled data matrix into a binary matrix are the major steps in this regard. The data was normalized and converted into binary form according to the procedure mentioned in section 2.5.1.2.3 of Chapter 2. Then 8%, 20%, 30%, 44%, 56%, 70%, 80% and 92% of the 178 binary data samples were randomly chosen as training data sets as well as target data sets (n×14 matrices) for the ART1 network. The first 13 columns of the data set formed the 178 input feature vectors and the 14th column served as the 178 targets or class tags. Three different classes of training and testing pools were created out of the 178 samples (feature vectors) to design three different classifiers: ART1-1, ART1-2 and ART1-3. In a particular data pool, either training or testing, the feature vectors of one of the three classes are targeted as '1' and the feature vectors of any other class in that pool are targeted as '0'. The ART1 networks developed were very robust, as reflected by their classification efficiency of 100% for all combinations of training and testing vectors. A randomly selected 20% of the data from the database was used for training each of the ART1-1, ART1-2 and ART1-3 networks (two vigilance parameters, ρ = 0.4 and 0.7, with 100 iterations, were used for training), and simulation of the three trained networks was done with the corresponding randomly selected samples containing 20%, 30%, 40%, 50%, 60% and 70% of the data.
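The preprocessing and category-formation steps described above can be sketched as below. This is an illustrative NumPy rendition of min-max binarization followed by the standard fast-learning ART1 recipe (choice function, vigilance test, template intersection), not the thesis' MATLAB code; the vigilance value and the toy patterns are made up:

```python
import numpy as np

def to_binary(X, thresh=0.5):
    """Min-max scale each column to [0, 1], then threshold to {0, 1}."""
    Xs = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return (Xs >= thresh).astype(int)

def art1(patterns, rho=0.4, beta=1.0):
    """Minimal fast-learning ART1: returns a category index per pattern.

    rho  : vigilance -- higher values force finer categories.
    beta : choice parameter in the category-selection function.
    """
    prototypes = []                          # learned binary templates
    labels = []
    for x in patterns:
        # rank categories by the choice function |x AND w| / (beta + |w|)
        order = sorted(range(len(prototypes)), reverse=True,
                       key=lambda j: (x & prototypes[j]).sum()
                       / (beta + prototypes[j].sum()))
        for j in order:
            match = (x & prototypes[j]).sum() / max(x.sum(), 1)
            if match >= rho:                 # vigilance test passed
                prototypes[j] = x & prototypes[j]   # fast learning
                labels.append(j)
                break
        else:                                # no resonance: new category
            prototypes.append(x.copy())
            labels.append(len(prototypes) - 1)
    return np.array(labels)

X = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
print(art1(X, rho=0.6))                      # [0 0 1 1]
```

Note how a pattern that fails the vigilance test against every stored template simply opens a new category; this is the plasticity property discussed below.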
A representative (10×10, including only 9 feature columns + 1 target column) normalized data matrix is shown in Table 4.4. Table 4.5 is a representative (10×12, including only 9 feature columns + 3 target columns) binary data matrix. Tables 4.6, 4.7 & 4.8 present the training time and efficiency of the networks. The code for the ART1 based classifier was developed using MATLAB.

From the above one can clearly say that adaptive resonance theory has its own superiority over the other classifiers. This is mainly due to the plasticity of the ART1 network: an ART1 network accepts an input vector and classifies it into one of the stored categories depending upon which stored pattern it resembles within a specified tolerance (the vigilance); otherwise a new category is created by storing that input vector as a new pattern. This makes ART1 networks superior to conventional feed-forward networks such as the PNN. The lower performance of the PNN can be attributed to the fact that such networks learn by continuously modifying their weights and may never settle into a stationary configuration.

4.2. ONLINE PROCESS MONITORING OF PHENOL DEGRADATION

Monitoring the process parameters that affect product quality, and keeping them in control, ensures the quality of the product. At the same time, detection of process abnormalities or faults and their diagnosis is very important as far as maintenance of the product specification is concerned. The successful functioning of any plant needs proper monitoring of the important process parameters and early detection of abnormal operating conditions, which may avoid runaway situations. The time series data produced during the phenol degradation experiments have been used to monitor three variables, namely temperature, pH and RPM.
4.2.1 MONITORING OF PROCESS PARAMETERS USING UNIVARIATE STATISTICS

X-bar charts integrated with Range (R) charts, and CUSUM charts integrated with Moving Range charts, were used to monitor the process in a univariate manner. Figures 4.4, 4.5 & 4.6 show the CUSUM and Moving Range charts of temperature, pH and RPM respectively. Figures 4.7, 4.8 & 4.9 show the X-bar and Range charts of temperature, pH and RPM respectively. The data produced in the 4th run were used for monitoring the process parameters. The control limits for these charts were customized by altering the multiples of σ; the multiples of sigma were chosen such that the control limits represent 95% confidence intervals. From the figures one can observe that the deviations between the R charts and the Moving Range charts are prominent. These are due to fluctuations in the subset values, i.e. the triplicate readings taken at each instant. Only the X-bar and CUSUM charts, however, were considered for the process fault detection purpose. Both the X-bar and CUSUM charts indicated certain common operating points where the parameters were out of control. The CUSUM chart for temperature showed 3 instances where the temperature went out of control, whereas the X-bar chart showed 8 such instances, with one in common. The CUSUM and X-bar charts for pH each showed 4 deviations from normal operating conditions, with two deviations in common. The CUSUM chart for RPM produced 5 outliers whereas the X-bar chart produced 2, with no common deviations. In total, 23 faulty situations were found over the three monitored process parameters. All the deviation points were noted and checked as to whether multivariate statistical process monitoring could identify the same abnormalities.
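The two chart types can be sketched as below. This is an illustrative Python version, assuming triplicate readings per sampling instant, 95% (1.96σ) X-bar limits estimated from the mean range (d2 = 1.693 for subgroups of three), and a standardized tabular CUSUM with the conventional k = 0.5, h = 4 design; the temperature data shown are made up:

```python
import numpy as np

D2 = 1.693                                # Hartley's d2 constant for n = 3

def xbar_limits(subgroups, z=1.96):
    """X-bar chart: centre line and z-sigma limits from the mean range."""
    means = subgroups.mean(axis=1)
    grand = means.mean()                  # centre line
    n = subgroups.shape[1]
    sigma_hat = np.ptp(subgroups, axis=1).mean() / D2   # R-bar / d2
    half = z * sigma_hat / np.sqrt(n)     # z = 1.96 -> 95 % limits
    lcl, ucl = grand - half, grand + half
    faults = np.flatnonzero((means < lcl) | (means > ucl))
    return means, (lcl, ucl), faults

def cusum(x, target, k=0.5, h=4.0):
    """Standardised tabular CUSUM; returns instants where either
    one-sided cumulative sum exceeds h (in sigma units)."""
    s = np.std(x, ddof=1)
    hi = lo = 0.0
    signals = []
    for i, v in enumerate(x):
        hi = max(0.0, hi + (v - target) / s - k)
        lo = max(0.0, lo + (target - v) / s - k)
        if hi > h or lo > h:
            signals.append(i)
            hi = lo = 0.0                 # restart after a signal
    return signals

# made-up triplicate temperature readings; the 4th subgroup drifts high
temp = np.array([[29.9, 30.0, 30.1], [30.0, 30.1, 29.9],
                 [29.9, 30.1, 30.0], [30.4, 30.5, 30.6]])
_, (lcl, ucl), faults = xbar_limits(temp)
print(faults)                             # [3]
```

The X-bar chart reacts to the subgroup means while the CUSUM accumulates small standardized deviations, which is why the two charts flag partly different instants, as observed above.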
4.2.2 MONITORING THE PROCESS PARAMETERS USING MULTIVARIATE STATISTICS

The same time series data were used for Multivariate Statistical Process Monitoring (MSPM) using PCA. In MSPM all three variables were considered at a time and their corresponding principal components were identified. The data points were projected onto these new dimensions considering two components at a time, i.e. PC1 & PC2, PC1 & PC3 and PC2 & PC3. As the principal components are the directions of maximum variance, all the data points should fall within a cluster; points that deviate from normal operating conditions should fall apart. Figures 4.10, 4.11 & 4.12 present the projections of the process variables/features onto PC1 & PC2, PC1 & PC3 and PC2 & PC3 respectively. The ellipses in the figures represent the 95% confidence limits on the axis values. The centroid and semi-axes of each ellipse are defined as

Centroid = (μx, μy)
Semi-major axis a = 0.95 Rx / 2
Semi-minor axis b = 0.95 Ry / 2

where μ is the mean value of the corresponding coordinate and R is its range. Points outside the ellipse represent outliers; one can alter the confidence level to maintain more or less stringent control limits. The projections on PC1 & PC2, PC1 & PC3 and PC2 & PC3 produced 11, 15 and 12 outliers respectively, giving a total of 26 outliers after excluding the common points. In comparison with the traditional SPM, the MSPM yielded 2 new outliers, or abnormal operating situations, namely at the 3rd and 102nd instants. Thus a multivariate approach to process monitoring helps in detecting abnormal conditions where the individual process variables/features may seem to be under control but their combination produces an abnormal situation affecting the process in an adverse way.
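The projection and range-based ellipse test described above can be sketched as follows. This is an illustrative Python version: the synthetic data, the injected fault and the random seed are made up, while the 95% semi-axis construction follows the formulas in the text:

```python
import numpy as np

def pca_scores(X, k=3):
    """Project mean-centred data onto its first k principal components."""
    Xc = X - X.mean(axis=0)
    # eigen-decomposition of the covariance matrix
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1]        # sort PCs by variance, descending
    return Xc @ vecs[:, order[:k]]

def ellipse_outliers(s1, s2, conf=0.95):
    """Range-based confidence ellipse used in the thesis: centroid
    (mean_x, mean_y), semi-axes a = conf*Rx/2 and b = conf*Ry/2."""
    a = conf * np.ptp(s1) / 2.0
    b = conf * np.ptp(s2) / 2.0
    d = ((s1 - s1.mean()) / a) ** 2 + ((s2 - s2.mean()) / b) ** 2
    return np.flatnonzero(d > 1.0)        # points outside the ellipse

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))             # stand-in for T, pH, RPM series
X[50] += 6.0                              # inject one abnormal instant
T = pca_scores(X, k=2)
print(ellipse_outliers(T[:, 0], T[:, 1])) # instant 50 is among the flags
```

Because the fault is spread across all three correlated variables, it may look mild on any single chart yet land far outside the joint ellipse, which is exactly the advantage of the multivariate view argued above.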
One of the major characteristics of multivariate data is that the variables being measured are almost never independent; rather, they are highly correlated with one another at any given time. Hence multivariate process monitoring is the best approach for monitoring a process in which many interrelated process variables are involved.

TABLES:

Table 4.1: Eigenvalues and % variance captured by the principal components

PC   Eigenvalue   % Total Variance   Cumulative Eigenvalue   Cumulative %
 1   4.705850     36.19885            4.70585                 36.1988
 2   2.496974     19.20749            7.20282                 55.4063
 3   1.446072     11.12363            8.64890                 66.5300
 4   0.918974      7.06903            9.56787                 73.5990
 5   0.853228      6.56329           10.42110                 80.1623
 6   0.641657      4.93582           11.06276                 85.0981
 7   0.551028      4.23868           11.61378                 89.3368
 8   0.348497      2.68075           11.96228                 92.0175
 9   0.288880      2.22215           12.25116                 94.2397
10   0.250902      1.93002           12.50206                 96.1697
11   0.225789      1.73684           12.72785                 97.9066
12   0.168770      1.29823           12.89662                 99.2048
13   0.103378      0.79521           13.00000                100.0000

Table 4.2: Statistics of K-means clustering (scores along PC1-PC2)

Cluster Identity   No. of Samples   Cluster Centroid
1                  49               ( 1.2613,  0.7622)
2                  61               (-1.0521,  0.6057)
3                  68               ( 0.0349, -1.0955)

Table 4.3: Classification performance of the developed PNN networks.
                     % Accuracy for test sets
Training set   20%        30%        40%        50%        60%        70%
20%            --         55.88235   69.69697   66.66667   78.78788   90.90909
30%            43.75      --         64.58333   70.83333   70.83333   83.33333
40%            48.78049   56.09756   --         71.95122   79.26829   84.14634
50%            47.31183   57.6087    70.65217   --         72.82609   83.69565
60%            46.46465   49.49495   71.71717   69.69697   --         85.85859
70%            47.72727   54.54545   68.18182   71.21212   74.24242   --

Table 4.4: Representative normalized data (10×10) matrix for training ART1 and PNN networks

Sample No   Alcohol   Ash     Alcalinity   Magnesium   Phenols   Color   Hue     Dilution   Proline   Cluster #
1           0.641     0.290   0.397        0.449       1.000     0.222   0.561   0.353      0.667     1
2           0.024     0.347   0.000        0.000       0.129     0.111   0.255   0.235      0.019     1
3           0.000     0.815   0.726        0.755       0.161     0.222   0.745   0.471      1.000     1
4           0.725     0.484   0.493        0.571       0.548     1.000   1.000   0.118      0.596     2
5           0.048     1.000   1.000        1.000       0.710     0.222   0.184   1.000      0.365     1
6           0.623     0.331   0.425        0.408       0.516     0.570   0.898   0.706      0.462     1
7           0.737     0.419   0.425        0.347       0.000     0.000   0.010   0.471      0.468     1
8           0.539     0.645   0.644        0.653       0.806     0.074   0.000   0.529      0.000     3
9           1.000     0.234   0.041        0.286       0.032     0.222   0.480   0.412      0.468     1
10          0.419     0.000   0.178        0.490       0.065     0.356   0.653   0.000      0.385     3

Table 4.5: Representative binary data (10×12) matrix for training ART1 networks

Sample No   Alcohol  Ash  Alcalinity  Magnesium  Phenols  Color  Hue  Dilution  Proline   Cluster 1  Cluster 2  Cluster 3
1           1        0    0           0          1        0      1    0         1         1          0          0
2           0        0    0           0          0        0      0    0         0         1          0          0
3           0        1    1           1          0        0      1    0         1         1          0          0
4           1        0    0           1          1        1      1    0         1         0          1          0
5           0        1    1           1          1        0      0    1         0         1          0          0
6           1        0    0           0          1        1      1    1         0         1          0          0
7           1        0    0           0          0        0      0    0         0         1          0          0
8           1        1    1           1          1        0      0    1         0         1          0          1
9           1        0    0           0          0        0      0    0         0         0          0          0
10          0        0    0           0          0        0      1    0         0         0          0          1

Table 4.6: Performance of the ART1 networks with vigilance parameter 0.4 & 100 iterations

                            ART1-1                    ART1-2                    ART1-3
Training %   Testing %      Time (s)   Efficiency %   Time (s)   Efficiency %   Time (s)   Efficiency %
8            92             0.177      100            0.199      100            0.200      100
20           80             0.275      100            0.371      100            0.305      100
30           70             0.384      100            0.360      100            0.509      100
44           56             1.214      100            1.244      100            1.309      100
56           44             1.44       100            1.419      100            1.59       100
70           30             1.29       100            1.442      100            1.816      100
80           20             1.248      100            1.495      100            1.200      100
92           8              2.646      100            2.690      100            2.015      100

Table 4.7: Performance of the ART1 networks with vigilance parameter 0.7 & 100 iterations

                            ART1-1                    ART1-2                    ART1-3
Training %   Testing %      Time (s)   Efficiency %   Time (s)   Efficiency %   Time (s)   Efficiency %
8            92             0.169      100            0.216      100            0.220      100
20           80             0.317      100            0.356      100            0.266      100
30           70             0.456      100            0.376      100            0.577      100
44           56             1.02       100            0.964      100            0.986      100
56           44             1.551      100            1.589      100            1.459      100
70           30             1.558      100            1.591      100            1.808      100
80           20             1.697      100            1.604      100            1.913      100
92           8              2.811      100            2.775      100            2.684      100

Table 4.8: Performance of ART1 networks trained with randomly selected 20% data (efficiency was 100% for every network, test set and vigilance value; computation times in seconds)

                 ART1-3              ART1-2              ART1-1
Test data set    ρ=0.4     ρ=0.7     ρ=0.4     ρ=0.7     ρ=0.4     ρ=0.7
20%              0.875     0.955     0.892     1.0505    1.0115    1.0596
30%              1.0046    0.9366    0.8673    1.1874    0.9378    0.9239
40%              0.9709    1.0522    1.0057    1.0369    1.0767    0.9363
50%              1.0073    0.8738    0.966     1.0149    0.8299    0.9807
60%              0.872     1.0361    0.9605    1.004     0.9825    1.074
70%              0.9029    0.9216    0.9626    0.9506    0.8963    1.0218

FIGURES:

Figure 4.1: Discrimination and clustering of scores along PC1-PC2.
Figure 4.2: Dendrogram on scores along PC1-PC2 (average linkage, 30 nodes; ordinate: inter-cluster distance; abscissa: wine sample).
Figure 4.3: Performance of the developed PLS based classifier.
Figure 4.4: CUSUM and Moving Range chart of temperature.
Figure 4.5: CUSUM and Moving Range chart of pH.
Figure 4.6: CUSUM and Moving Range chart of RPM.
Figure 4.7: X-bar and Range chart of temperature.
Figure 4.8: X-bar and Range chart of pH.
Figure 4.9: X-bar and Range chart of RPM.
Figure 4.10: Projections on PC-1 and PC-2 with ellipse representing 95% confidence.
Figure 4.11: Projections on PC-1 and PC-3 with ellipse representing 95% confidence.
Figure 4.12: Projections on PC-2 and PC-3 with ellipse representing 95% confidence.

REFERENCES:

1. Jonsson, P., Bruce, S.J., Moritz, T., Trygg, J., Sjöström, M., Plumb, R., Granger, J., Maibaum, E., Nicholson, J.K., Holmes, E., & Antti, H. 2005. Extraction, interpretation and validation of information for comparing samples in metabolic LC/MS data sets. Analyst, 130, 701–707.
2. Perez-Enciso, M., & Tenenhaus, M. 2003. Prediction of clinical outcome with microarray data: a partial least squares discriminant analysis (PLS-DA) approach. Hum. Genet., 112, 581–592.
3. Chiang, Y.Q., Zhuang, Y.M., & Yang, J.Y. 1992. Optimal Fisher discriminant analysis using the rank decomposition. Pattern Recognition, 25, 101–111.
4. Widrow, B., Rumelhart, D.E., & Lehr, M.A. 1994. Neural networks: Applications in industry, business and science. Commun. ACM, 37, 93–105.
5. Altman, E.I., Marco, G., & Varetto, F. 1994. Corporate distress diagnosis: Comparisons using linear discriminant analysis and neural networks (the Italian experience). J. Bank. Finance, 18, 505–529.
6. Lacher, R.C., Coats, P.K., Sharma, S.C., & Fant, L.F. 1995. A neural network for classifying the financial health of a firm. Eur. J. Oper. Res., 85, 53–65.
7. Guyon, I. 1991. Applications of neural networks to character recognition. Int. J. Pattern Recognit. Artif. Intell., 5, 353–382.
8. Knerr, S., Personnaz, L., & Dreyfus, G. 1992. Handwritten digit recognition by neural networks with single-layer training. IEEE Trans. Neural Networks, 3, 962–968.
9. Bourlard, H., & Morgan, N. 1993. Continuous speech recognition by connectionist statistical methods. IEEE Trans. Neural Networks, 4, 893–909.
10. Lippmann, R.P. 1989. Review of neural networks for speech recognition. Neural Comput., 1, 1–38.
11.
Lampinen, J., Smolander, S., & Korhonen, M. 1998. Wood surface inspection system based on generic visual features, in Industrial Applications of Neural Networks, Soulie, F.F., & Gallinari, P., Eds. Singapore: World Scientific, 35–42.
12. Petsche, T., Marcantonio, A., Darken, C., Hanson, S.J., Huhn, G.M., & Santoso, I. 1998. An autoassociator for on-line motor monitoring, in Industrial Applications of Neural Networks, Soulie, F.F., & Gallinari, P., Eds. Singapore: World Scientific, 91–97.
13. Barlett, E.B., & Uhrig, R.E. 1992. Nuclear power plant status diagnostics using artificial neural networks. Nucl. Technol., 97, 272–281.
14. Hoskins, J.C., Kaliyur, K.M., & Himmelblau, D.M. 1990. Incipient fault detection and diagnosis using artificial neural networks, in Proc. Int. Joint Conf. Neural Networks, 81–86.
15. Baxt, W.G. 1990. Use of an artificial neural network for data analysis in clinical decision-making: The diagnosis of acute coronary occlusion. Neural Comput., 2, 480–489.
16. Baxt, W.G. 1991. Use of an artificial neural network for the diagnosis of myocardial infarction. Ann. Internal Med., 115, 843–848.
17. Dutta, S., & Shekhar, S. 1988. Bond rating: A non-conservative application of neural networks, in Proc. IEEE Int. Conf. Neural Networks, 2, San Diego, CA, 443–450.
18. Surkan, J., & Singleton, J.C. 1990. Neural networks for bond rating improved by multiple hidden layers, in Proc. IEEE Int. Joint Conf. Neural Networks, 2, San Diego, CA, 157–162.
19. Curram, S.P., & Mingers, J. 1994. Neural networks, decision tree induction and discriminant analysis: An empirical comparison. J. Oper. Res. Soc., 45(4), 440–450.
20. Huang, W.Y., & Lippmann, R.P. 1987. Comparisons between neural net and conventional classifiers, in IEEE 1st Int. Conf. Neural Networks, San Diego, CA, 485–493.
21. Michie, D., Spiegelhalter, D.J., & Taylor, C.C., Eds. 1994. Machine Learning, Neural, and Statistical Classification. London, U.K.: Ellis Horwood.
22.
Patwo, E., Hu, M.Y., & Hung, M.S. 1993. Two-group classification using neural networks. Decis. Sci., 24(4), 825–845.
23. Subramanian, V., Hung, M.S., & Hu, M.Y. 1993. An experimental evaluation of neural networks for classification. Comput. Oper. Res., 20, 769–782.
24. Box, G.E.P., & Muller, M.E. 1958. A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29, 610–611.
25. Devroye, L. 1986. Non-uniform Random Variate Generation. New York: Springer.

CHAPTER 5 - CONCLUSION

5.1. CONCLUSION

The present work addressed several problems related to process identification, product quality monitoring and process fault detection with the help of machine learning algorithms. Statistical and neural techniques including PCA, K-means clustering, PLS, PNN and ART1 were used to approach the best possible solutions to the problems taken up. Efficient pre-processing of process data and the design of efficient algorithms were the key steps for the data based models developed. The models thus developed could identify and predict the process without a priori knowledge of it, as demonstrated for phenol degradation by Pseudomonas putida (ATCC: 11172). The time series data generated in this work were used to develop ANN and ARX based predictive models relating the phenol degradation process to variables such as temperature, pH, RPM and phenol loading at any particular instant. The PLS technique was used to develop an empirical model relating the phenol degradation process to the same variables at steady state. Classical analytical techniques such as chromatography and spectrometry are used for determining different characteristics of food and beverage samples; however, they are time-consuming, expensive and laborious, and can hardly be applied on-site or on-line.
The development of a feature based classifier could circumvent this problem, allowing food quality to be monitored without relating instrumental analysis to biological phenomena such as ageing and spoilage of the product. A wine data set containing 178 samples with 13 features each was taken as a case study. Unsupervised techniques, namely PCA and K-means clustering, were used to reduce the dimensionality and classify the samples into three groups, followed by the development of supervised classifiers using machine learning algorithms such as the Adaptive Resonance Theory network (ART1), the Probabilistic Neural Network (PNN) and Partial Least Squares (PLS). All the designed classifiers delivered encouraging performance.

Univariate and multivariate statistical monitoring of the phenol degradation process, a biodegradation process of the organic pollutant phenol, was carried out to detect process abnormalities. Different SPC charts and PCA were used to monitor the process and hence identify process faults, if any. The detection of a fault, followed by its diagnosis, is extremely important for the effective, economic, safe and successful operation of a process.

5.2. FUTURE RECOMMENDATIONS

The following are the future recommendations:
• Development of data based models for the identification of more complex processes.
• Development of ART2 and PLS based classifiers for different beverage and water classifications with a view to on-line product quality monitoring.
• Monitoring, detecting and diagnosing faults in more complex processes using hierarchical and multi-way PCA.
